Building a News Roundup with Docker Agent, Docker Model Runner, and Skill

Hello, I’m Philippe, and I am a Principal Solutions Architect helping customers with their usage of Docker. I wanted a lightweight way to automate my IT news roundups without burning through AI credits. So I built a Docker Agent skill that uses the Brave Search API to fetch recent articles on a topic, then hands the results to a local model running with Docker Model Runner to analyze the stories and generate a Markdown report.

In this setup, Qwen3.5-4B handles the reasoning and skill invocation, while the skill itself does the retrieval work. The result is a simple local workflow for turning a prompt like “use news roundup skill with tiny language models” into a structured news brief I can save, review, and reuse.

It is a bit slower than doing the same thing with Claude Code, but that tradeoff works for me: I keep the workflow local, I save my Claude credits, and I get a practical example of how skills make Docker Agent more useful for repeatable tasks.

Prerequisites for building the news roundup:

Docker and Docker Compose, obviously.

A Brave Search account with an API key (there's a free plan).

A local model that supports a large context window and knows how to do function calling.

I chose to use qwen3.5-4b from Qwen (I went with the Unsloth version), a 4-billion-parameter model optimized for text understanding and generation, with native support for up to 262144 context tokens.

I started my tests with qwen3.5-9b, but on my MacBook Air, it’s a bit slow and qwen3.5-4b does the job just fine.

Let’s get into the setup.

Step-by-step guide to building the news roundup

Step 1: Creating the Dockerfile

I used an ubuntu:22.04 base image and installed curl to make requests to the Brave Search API. I also copied the docker-agent binary from the docker/docker-agent:1.32.5 image to run the agents.

FROM --platform=$BUILDPLATFORM docker/docker-agent:1.32.5 AS coding-agent

FROM --platform=$BUILDPLATFORM ubuntu:22.04 AS base

LABEL maintainer="@k33g_org"
ARG TARGETOS
ARG TARGETARCH

ARG USER_NAME=docker-agent-user

ARG DEBIAN_FRONTEND=noninteractive

ENV LANG=en_US.UTF-8
ENV LANGUAGE=en_US.UTF-8
ENV LC_COLLATE=C
ENV LC_CTYPE=en_US.UTF-8

# ------------------------------------
# Install Tools
# ------------------------------------
RUN <<EOF
apt-get update
apt-get install -y wget curl
apt-get clean autoclean
apt-get autoremove --yes
rm -rf /var/lib/{apt,dpkg,cache,log}/
EOF

# ------------------------------------
# Install docker-agent
# ------------------------------------
COPY --from=coding-agent /docker-agent /usr/local/bin/docker-agent

# ------------------------------------
# Create a new user
# ------------------------------------
RUN adduser ${USER_NAME}
# Set the working directory
WORKDIR /home/${USER_NAME}
# Set the user as the owner of the working directory
RUN chown -R ${USER_NAME}:${USER_NAME} /home/${USER_NAME}
# Switch to the regular user
USER ${USER_NAME}

Let’s move on to the agent configuration.

Step 2: Creating the Docker Agent configuration file

For the Docker Agent configuration, I defined a root agent using the brain model, which is an alias for qwen3.5-4b. I also enabled skills support (skills: true) and provided detailed instructions so the agent behaves like an expert IT journalist, capable of searching, analyzing, and summarizing the latest tech news.

For the toolsets, Docker Agent ships with some ready-to-use ones, but I preferred a script-type toolset with an execute_command that can run any shell command and capture its output. This gives me the flexibility to interact with the Brave Search API directly from shell commands, without having to implement specific tools for it — and most importantly, it keeps the agent’s instructions lightweight. 

agents:
  root:
    model: brain
    description: News Roundup Expert
    skills: true
    instruction: |
      You are an expert IT journalist with deep knowledge of software engineering, cloud infrastructure, artificial intelligence, cybersecurity, and the open-source ecosystem.
      Your role is to gather, analyze, and summarize the latest technology news in a clear, accurate, and engaging way.
      You write for a technical audience and always provide context, highlight trends, and explain the impact of each piece of news.

    toolsets:
      - type: script
        shell:
          execute_command:
            description: Execute a shell command and return its stdout and stderr output.
            args:
              command:
                description: The shell command to execute.
            cmd: |
              bash -c "$command" 2>&1

models:
  brain:
    provider: dmr
    model: huggingface.co/unsloth/qwen3.5-4b-gguf:Q4_K_M
    temperature: 0.0
    top_p: 0.95
    presence_penalty: 1.5
    provider_opts:
      # llama.cpp flags
      runtime_flags: ["--context_size=65536"]

Now let’s look at the skill.

Step 3: Building the news roundup skill

I created a news-roundup skill that uses the Brave Search API to search for the latest news on a given topic, enriches each article with additional web searches, and generates a structured Markdown report.

Inside the .agents/skills folder, I created a news-roundup directory with a SKILL.md file that describes the skill in detail, with the steps to follow and the commands to execute at each step.

├── .agents
│   └── skills
│       └── news-roundup
│           └── SKILL.md

Here’s the content of SKILL.md:


---
name: news-roundup
description: search the news using Brave News Search API with a query as argument. Use this skill when the user asks to search for recent news or current events.
---

# News Roundup

## Purpose

Generate a comprehensive Markdown news report on a given topic (default: "small ai local models").

## Steps to follow

### Step 1 — Search for recent news

#### Command to execute

```bash
curl -s "https://api.search.brave.com/res/v1/news/search?q=$(echo "$ARGUMENTS_REST" | sed 's/ /+/g')&count=3&freshness=pw" \
  -H "X-Subscription-Token: ${BRAVE}" \
  -H "Accept: application/json"
```

### Step 2 — Enrich each article

For each article returned in Step 1, use the below command with the article URL to retrieve additional context and details.

#### Command to execute

```bash
curl -s "https://api.search.brave.com/res/v1/web/search?q=$(echo "$ARTICLE_URL" | sed 's/ /+/g')&count=10" \
  -H "X-Subscription-Token: ${BRAVE}" \
  -H "Accept: application/json"
```

### Step 3 — Generate the Markdown report

Using all the collected information, write a well-structured Markdown report and save it under `/workspace/data/`.

The report must follow this structure:

```markdown
# IT News Report — {topic}

> Generated on {date}

## Summary

A short paragraph summarizing the main trends found across all articles.

## Articles

### {Article Title}

- **Source**: {source name}
- **URL**: {url}
- **Published**: {date}

{2-3 sentence summary of the article content and its significance for IT professionals}

(repeat for each article)

## Key Trends

A bullet list of the main technology trends identified across all articles.
```

Save the final report to `/workspace/data/news-report-{YYYYMMDD-HHMMSS}.md` using the `write_file` tool, where `{YYYYMMDD-HHMMSS}` is the current date and time (e.g. `news-report-20260318-143012.md`).
To get the current timestamp, run:

```bash
date +"%Y%m%d-%H%M%S"
```
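The skill's quick-and-dirty URL encoding only replaces spaces with `+`. As a sanity check, here is that substitution in isolation (a sketch; a more robust encoder would also escape characters such as `&` or `?`):

```shell
#!/usr/bin/env bash
# Same substitution the skill uses to build the query string.
encode_query() {
  echo "$1" | sed 's/ /+/g'
}

encode_query "tiny language models"   # prints: tiny+language+models
```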

All that’s left is to create the compose.yml file to launch the agent.

Step 4: Updating the compose.yml file

Here’s the content of compose.yml. 

Note: you’ll need a .env file with your Brave Search API key (e.g. BRAVE=abcdef1234567890).

services:
  news-roundup:
    build:
      context: .
      dockerfile: Dockerfile
    stdin_open: true
    tty: true
    command: docker-agent run /workspace/config.yaml
    volumes:
      - ./config.yaml:/workspace/config.yaml:ro
      - ./.agents:/workspace/.agents:ro
      - ./data:/workspace/data
    working_dir: /workspace
    env_file:
      - .env
    models:
      qwen3.5-4b:

models:
  qwen3.5-4b:
    model: huggingface.co/unsloth/qwen3.5-4b-gguf:Q4_K_M
    context_size: 65536

And that’s it — everything we need to run our IT news roundup agent.

Step 5: Let’s test it out!

Just run the following command in your terminal:

docker compose run --rm --build news-roundup

And ask the agent:

use news roundup skill with tiny language models

The agent will then execute the news-roundup skill, query the Brave Search API, analyze the articles, and generate a Markdown report in the data folder. 

Note: this can take a little while, so feel free to grab a coffee (or get some work done).

The agent will detect that it needs to run tools (the curl commands from the news-roundup skill); you can validate each command manually or let the agent run them automatically.

The agent will work for a few minutes, and at the end it will give you the path of the generated report, which you can open to read your personalized IT news roundup.

You can find examples of generated reports in the data folder of the project on this repository: https://codeberg.org/docker-agents/news-roundup/src/branch/main/data.

Final Thoughts 

That’s the full setup: a Docker Agent skill for news retrieval, the Brave Search API for fresh articles, and Docker Model Runner with Qwen3.5-4B for local analysis and report generation.

You now have a fully local IT news roundup agent. I have written a lot of content on use cases for local models, including context packaging and making small LLMs smarter. See you soon for more Docker Agent use cases with local language models!

Source: https://blog.docker.com/feed/

Trivy supply chain compromise: What Docker Hub users should know

We want to inform you about a security incident we became aware of that affects customers who use the Aqua Security vulnerability scanner (Trivy) across multiple distribution channels, including Docker Hub, GitHub, and npm. Between 18:24 UTC on March 19, 2026 and 01:36 UTC on March 23, 2026, Docker Hub customers who pulled the Trivy images with the 0.69.4, 0.69.5, 0.69.6, or latest tags may have had their CI/CD secrets, cloud credentials, SSH keys, and Docker configurations compromised. Around 08:00 UTC on March 23, 2026, Docker worked with Aqua Security to remove these compromised scanner image versions.

If you downloaded any of these compromised images, you should ensure you are no longer using these images and rotate your affected credentials immediately. This issue was isolated to Aqua Security’s images.

The Docker Hardened Images (DHI) version of the Trivy image, Docker’s infrastructure, and other Docker Hub images were not compromised.

What happened 

Starting on March 19, 2026 at 18:24 UTC, threat actors compromised Aqua Security’s CI/CD pipeline to push malware into the aquasec/trivy vulnerability scanner images with the 0.69.4 and latest tags on Docker Hub. The infostealer embedded into those images has the potential to exfiltrate CI/CD secrets, cloud credentials, SSH keys, and Docker configurations. Attackers used compromised credentials to push images to Aqua Security’s own repository on Docker Hub through their build system. Since these pushes used Aqua Security’s credentials, they were authenticated by Docker Hub and were indistinguishable from normal Aqua Security activity.

On March 20, 2026 at 03:26 UTC, the attackers re-pointed the latest tag to compromised content after Aqua Security’s initial cleanup.

Another wave of compromised images were uploaded to Docker Hub with 0.69.5, 0.69.6 and latest tags on March 22. Starting around 08:00 UTC on March 23, Docker became aware of the compromised images and immediately began to investigate.

Aqua Security deleted the images in the following order, preventing any further downloads from Docker Hub:

sha256:27f446230c60bbf0b70e008db798bd4f33b7826f9f76f756606f5417100beef3 (tagged 0.69.4) on March 19, 2026 at 22:20 UTC

sha256:5aaa1d7cfa9ca4649d6ffad165435c519dc836fa6e21b729a2174ad10b057d2b (tagged 0.69.5) on March 23, 2026 at 01:26 UTC

sha256:425cd3e1a2846ac73944e891250377d2b03653e6f028833e30fc00c1abbc6d33 (tagged 0.69.6) on March 23, 2026 at 01:26 UTC

On March 23 at 15:43 UTC, Docker quarantined the content of the compromised images to make it available for internal investigation. Docker reached out to Aqua Security to address the images, confirm credential revocation, and share our findings. Aqua Security was already aware and worked to remove these tags and images from Docker Hub.

The last known clean release is 0.69.3. 

Am I affected?

You may be affected if your systems pulled aquasec/trivy between March 19 at 18:24 UTC and March 23 at 01:36 UTC, and you specifically pulled the 0.69.4, 0.69.5, or 0.69.6 tags, or the latest tag, during the affected period.

How to check your environment

Look for the compromised digests in your local image store, registry mirrors, or Artifactory/Nexus caches:

sha256:27f446230c60bbf0b70e008db798bd4f33b7826f9f76f756606f5417100beef3

sha256:5aaa1d7cfa9ca4649d6ffad165435c519dc836fa6e21b729a2174ad10b057d2b

sha256:425cd3e1a2846ac73944e891250377d2b03653e6f028833e30fc00c1abbc6d33
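One way to automate the check is a small shell helper that compares a digest against the known-bad list (a sketch; the digest you feed it would come from `docker images --digests` or your registry's API):

```shell
#!/usr/bin/env bash
# Known compromised aquasec/trivy image digests from this incident.
BAD_DIGESTS="sha256:27f446230c60bbf0b70e008db798bd4f33b7826f9f76f756606f5417100beef3
sha256:5aaa1d7cfa9ca4649d6ffad165435c519dc836fa6e21b729a2174ad10b057d2b
sha256:425cd3e1a2846ac73944e891250377d2b03653e6f028833e30fc00c1abbc6d33"

# Succeeds if the given digest exactly matches a known-bad entry.
is_compromised() {
  echo "$BAD_DIGESTS" | grep -qx "$1"
}

if is_compromised "sha256:5aaa1d7cfa9ca4649d6ffad165435c519dc836fa6e21b729a2174ad10b057d2b"; then
  echo "compromised digest found"
fi
```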

If any of these digests are present, you should remove that image and ensure you are running version 0.69.3, the last known good version.

If any of these digests are present, assume credentials on those systems are compromised. As per our current understanding the malware targets:

Docker registry tokens (~/.docker/config.json)

Cloud provider credentials (AWS, GCP, Azure)

SSH keys, Kubernetes tokens, CI/CD secrets

Environment variables and .env files

Rotate all credentials accessible from any system that ran the compromised image. Note that a few common Trivy container setup patterns mount the Docker socket (-v /var/run/docker.sock:/var/run/docker.sock), which grants the container full access to the host’s Docker daemon and effectively root-level access to the node. If you ran a compromised Trivy image with the Docker socket mounted, treat the entire host as compromised.

Pin to the clean release aquasec/trivy:0.69.3 or wait for a verified new release from Aqua Security.

If you believe you are affected and need assistance, contact Docker Support.

Other ways you may be affected

Trivy was compromised across multiple distribution channels, including Docker Hub images and GitHub. If you consumed Trivy through other channels (such as GitHub Actions; see advisory GHSA-cxm3-wv7p-598c), you should determine whether you may have been affected.

Lessons for the ecosystem

This incident highlights lessons for how you can improve your consumption of container images and CI/CD actions:

**Mutable tags are not a security boundary.** OCI image tags, including latest, are mutable pointers that can be overwritten by anyone with push access. This attack succeeded because the latest tag was silently re-pointed to malicious content multiple times. Organizations should pin images by digest (image@sha256:…) in production and CI/CD pipelines rather than relying on tags alone.

**Digest pinning is not sufficient on its own.** A pinned digest ensures you pull the same bytes every time; it does not tell you whether those bytes were built by a trusted party from a known source. Where signed provenance attestations are available, organizations should verify them, not just match digests.

**Supply chain integrity requires more than scanning.** Trivy is a vulnerability scanner, a tool organizations deploy to improve supply chain security. The compromise had nothing to do with a vulnerability in the CVE sense: no software bug was exploited in the Trivy codebase. The attacker stole publishing credentials and used them to push malicious content through a trusted distribution channel. A scanner examining the compromised image would have had no reason to flag it, because the attack was in the publishing process, not the dependency graph. Scanning tells you about known vulnerabilities in image content. It does not tell you whether the image was built by the party you trust, from the source you expect.

**Secret rotation must be atomic.** When responding to a compromise, revoke all credentials simultaneously before issuing replacements. Partial rotation can leave windows for re-exploitation.
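As a concrete illustration of digest pinning, a mutable tag reference can be rewritten into an immutable digest reference. The helper below is hypothetical, and the digest shown is a placeholder, not a real Trivy release:

```shell
#!/usr/bin/env bash
# Hypothetical helper: combine "repo:tag" and a digest into a digest-pinned reference.
pin_by_digest() {
  local ref="$1" digest="$2"
  # Strip the tag and append the immutable digest.
  echo "${ref%%:*}@${digest}"
}

# Placeholder digest for illustration only.
pin_by_digest "aquasec/trivy:0.69.3" "sha256:0123456789abcdef"   # prints: aquasec/trivy@sha256:0123456789abcdef
```

Pulling by the resulting `image@sha256:…` reference guarantees the same bytes every time, regardless of where the tag later points.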

What Docker is doing beyond incident response

The properties that made this attack possible, such as mutable references, unverifiable provenance, trust rooted in credentials rather than build systems, are all problems we have been working on. Here is where that work applies to this incident:

Docker Hardened Images (DHI): The Trivy compromise was possible because a single stolen push token gave the attacker the ability to overwrite trusted image tags on a public registry. For images in the Docker Hardened Images catalog, this class of attack does not apply. Docker rebuilds those images from source in a hermetic build environment rather than pulling and republishing upstream binaries. Each image carries signed provenance attestations that allow consumers to verify who built it, when, and from what source. When the compromise was identified, our team locked DHI builds of Trivy to prevent any auto-update to the compromised upstream release. The exposure was limited to users pulling aquasec/trivy directly from Docker Hub. 

Docker Scout: Scout can surface whether any of the three compromised digests listed above are present across your repositories, including in registry mirrors or artifact caches that may have retained copies pulled during the exposure window.

Other resources

Aqua has also published their own incident report at aquasec.com/blog.

Source: https://blog.docker.com/feed/

From the Captain’s Chair: Naga Santhosh Reddy Vootukuri

Docker Captains are leaders from the developer community who are both experts in their field and passionate about sharing their Docker knowledge with others. “From the Captain’s Chair” is a blog series where we get a closer look at one Captain to learn more about them and their experiences.

Today we are interviewing Naga Santhosh Reddy Vootukuri, known by his nickname Sunny. Sunny is a Principal Software Engineering Manager at Microsoft Azure SQL organization with 17+ years of experience in building cloud distributed scalable systems. He’s also a Dapr Meteor and an open-source contributor to Dapr and Microcks, both highly recognized CNCF projects.

Sunny is also an IEEE Senior Member who helps organize IEEE conferences in Seattle, presents workshops, and is a regular conference speaker sharing his expertise on cloud computing, microservices, Docker, and AI-related topics. He blogs regularly at DZone as an MVB core member on topics ranging from Docker and GitHub Actions to cloud-native microservices and Dapr, and has published three books on topics like Azure Container Apps, Aspire, and GitHub Copilot.

Can you share how you first got involved with Docker?

My Docker journey began back in 2016 during my time in Shanghai, China. I had just moved from Microsoft India to Microsoft Shanghai in 2015 to join the SQL Server Integration Services team, which builds a core ETL product. Being an expat, I was searching for local community events to try out networking. During one of the local meetups, an engineer from Alibaba or Tencent (I don’t remember exactly) presented a talk on Docker, and I remember he mentioned that as a developer you can forget using this sentence as an excuse with your test teams: “It works on my machine”. I got super fascinated by his talk and demos, which made me want to read up and go hands-on with Docker and Docker Desktop (the timing was also perfect, as Docker Desktop for Windows support had recently launched). Since then, Docker has become part of my DNA.

What inspired you to become a Docker Captain?

I think my love of sharing knowledge and building a stronger community is what got me started with writing blogs and speaking at conferences. At a conference where I was presenting on Docker, I met a few friends who were Docker Captains, and they told me about the Docker Captains program and the perks Captains get (from talking to product teams and trying out new features first-hand to traveling to summits). I applied immediately once I came back home. It took more than a month to receive an email for a Captain’s interview, and I hope I impressed Eva Bojorges (Docker community lead) with my passion and my contributions to the Docker community. Super happy to complete one year as a Docker Captain (soonish) and looking forward to many more years.

I was super elated when Docker invited me to their Captains Summit in Istanbul (2025), as I was in their top 20 list of most active contributors month over month. The trip was a memorable one, as I met the Docker product team and talented Docker Captains from across the world. I also can’t forget my first hot air balloon ride (my friend from Germany took that pic while I was busy with my GoPro).

What are some of your personal goals for the year 2026?

There are a few interesting goals I’ve set to challenge myself:

Writing a couple more technical books. I have finished three books in the last two years, and two more are currently in the proposal stage, with the expected titles “Docker Loves AI” and “Building Enterprise Copilots Using Copilot Studio” (anyone reading, please don’t steal these titles lol). I don’t know which one I will start first, but both are my personal projects for 2026.

I am currently working on submitting proposals to speak at a couple of really big conferences mainly about Docker and open source projects that I am involved in. I am also the technical committee chair for a couple of IEEE conferences. Hopefully I end 2026 on a big note.

Cross country road trip to the best beaches in the west coast.  

If you weren’t working in tech, what would you be doing instead?

I would have been a cricketer, maybe? My love of cricket started when I was six years old as an escape from home, and it has lasted till now. I still play in domestic leagues in Seattle. Even when I was working in China, I used to play for local clubs in Shanghai with people from different countries. I don’t know if I would have excelled at cricket in a parallel universe (I guess we’ll never know), but my love for it is unconditional.

The pic below was taken right after a game we lost in the semifinals of a local domestic league, but we were still in high spirits for trying till the last minute (easy guess that blue is my fav color :P)

Can you share a memorable story from collaborating with the Docker community?

The Docker community is one of the most active and vibrant communities, where we always encourage and cheer each other’s successes. I still remember the day I was warmly welcomed into the Slack group as a new Captain and got immediate help on a Friday evening when I was having issues with Docker Model Runner. My best memory is sitting in the hotel lounge with other Docker Captains at midnight in Türkiye after a boat party, talking about everything from Docker to startups for 3-4 hours.

What’s your favorite Docker product or feature right now, and why?

My favorite is the Docker Agent framework. When Docker Agent was released, I was playing with it first-hand as soon as it was shared in our Captains’ group. I immediately saw the potential to integrate it with GitHub Models to avoid vendor lock-in when building AI agents. I spoke to the product team and helped them understand what exactly GitHub Models is about and how it could be integrated into the product, as it also supports OpenAI standards. It was a useful chat with the Docker team lead (Djordje Lukic), and in a couple of hours we had a new release with the GitHub Models integration.

I also wrote a blog post (https://www.docker.com/blog/configure-cagent-github-models/) on this integration and why everyone should give it a try without worrying about spending money on getting your API developer keys.

Can you walk us through a tricky technical challenge you solved recently?

When I was giving AI-related workshops at some colleges in South India, they mentioned that some of the popular Microsoft open source repositories lacked support for local language translation. Many colleges there still teach in the students’ mother tongue, and that hit me hard, so I spent 3-4 weekends working on it, and we now have support for all the South Indian languages (Telugu, Tamil, Kannada, Malayalam) in those Microsoft open source repositories (100K+ GitHub stars). Check out:

https://github.com/microsoft/ML-For-Beginners

https://github.com/microsoft/AI-For-Beginners

What’s one Docker tip you wish every developer knew?

With the current AI world we are living in, it’s super easy to generate Dockerfiles, but the Docker DX VS Code extension (https://marketplace.visualstudio.com/items?itemName=docker.docker) makes it easy to live-debug them and figure out any issues. This is a must-have tool in your arsenal.

If you could containerize any non-technical object in real life, what would it be and why?

If I had powers I would containerize work sessions. Imagine a perfect containerized isolated work environment that would isolate you from distractions, whether you are at the office, home or on a cruise.

Where can people find you online?

I am always active on LinkedIn and sharing my knowledge on my blog.

Rapid Fire Questions

Cats or Dogs?

Dogs

Morning person or night owl?

Morning Person (4 am)

Favorite comfort food?

Hyderabadi Spicy Dum Biryani

One word friends would use to describe you?

Energetic

A hobby you picked up recently?

Learning Spanish on Duolingo

Source: https://blog.docker.com/feed/

Secure Agent Execution with NanoClaw and Docker Sandboxes

Agents have enormous potential to power secure, personal AI assistants that automate complex tasks and workflows. Realizing that potential, however, requires strong isolation, a codebase that teams can easily inspect and understand, and clear control boundaries they can trust. 

Today, NanoClaw, a lightweight agent framework, is integrating with Docker Sandboxes to deliver secure-by-design agent execution. With this integration, every NanoClaw agent runs inside a disposable, MicroVM-based Docker Sandbox that enforces strong operating system level isolation. Combined with NanoClaw’s minimal attack surface and fully auditable open-source codebase, the stack is purpose-built to meet enterprise security standards from day one. 

From Powerful Agents to Trusted Agents

The timing reflects a broader shift in the agent landscape. Agents are no longer confined to answering prompts. They are becoming operational systems.

Modern agents connect to live data sources, execute code, trigger workflows, and operate directly within collaboration platforms such as Slack, Discord, WhatsApp, and Telegram. They are evolving from conversational interfaces into active participants in real work.

That shift from prototype to production introduces two critical requirements: transparency and isolation.

First, transparency. 

Organizations need agents built on code they can inspect and understand, with clear visibility into dependencies, source files, and core behavior. NanoClaw delivers exactly that. Its agent behavior is powered by just 15 core source files, a codebase up to 100 times smaller than many alternatives. That simplicity makes it dramatically easier to evaluate risk, understand system behavior, and build with confidence.

Second, isolation. 

Agents must run within restricted environments, with tightly controlled filesystems and limited host access. Through the Docker Sandbox integration, each NanoClaw agent runs inside a dedicated MicroVM that mirrors your development environment, with only your project workspace mounted in. Agents can install packages, modify configurations, and even run Docker itself, while your host machine remains untouched.

In traditional environments, enabling more permissive agent modes can introduce significant risk. Inside a Docker Sandbox, that risk is contained within an isolated MicroVM that can be discarded instantly. This makes advanced modes such as --dangerously-skip-permissions practical in production because their impact is fully confined.

The result is greater autonomy without greater exposure.

Agents no longer require constant approval prompts to move forward. They can install tools, adapt their environment, and iterate independently. Because their actions are contained within secure, disposable boundaries, they can safely explore broader solution spaces while preserving enterprise-grade safeguards.

Powerful agents are easy to prototype. Trusted agents are built with isolation by design.

Together, NanoClaw and Docker make secure-by-default the standard for agent deployment.

“Infrastructure needs to catch up to the intelligence of agents. Powerful agents require isolation,” said Mark Cavage, President and Chief Operating Officer at Docker, Inc. “Running NanoClaw inside Docker Sandboxes gives the agent a secure, disposable boundary, so it can run freely, safely.”

“Teams trust agents to take on increasingly complex and valuable work, but securing agents cannot be based on trust,” said Gavriel Cohen, CEO and co-founder of NanoCo and creator of NanoClaw. “It needs to be based on a provably secure hard boundary, scoped access to data and tools, and control over the actions agents are allowed to take. The security model should not limit what agents can accomplish. It should make it safe to let them loose. NanoClaw was built on that principle, and Docker Sandboxes provides the enterprise-grade infrastructure to enforce it.”

Get Started

Ready to try it out? Deploy NanoClaw in Docker Sandboxes today:

GitHub: github.com/qwibitai/nanoclaw

Docker Sandboxes: Learn more

Source: https://blog.docker.com/feed/

How to Run Claude Code with Docker: Local Models, MCP Servers, and Secure Sandboxes

Claude Code is quickly becoming a go-to AI coding assistant for developers and increasingly for non-developers who want to build with code. But to truly unlock its potential, it needs the right local infrastructure, tool access, and security boundaries.

In this blog, we’ll show you how to run Claude Code with Docker to gain full control over your models, securely connect it to real-world tools using MCP servers, and safely give it autonomy inside isolated sandboxes. Read on for practical resources to help you build a secure, private, and cost-efficient AI-powered development workflow.

Run Claude Code Locally with Docker Model Runner

This post walks through how to configure Claude Code to use Docker Model Runner, giving you full control over your data, infrastructure, and spend. Claude Code supports custom API endpoints through the ANTHROPIC_BASE_URL environment variable. Since Docker Model Runner exposes an Anthropic-compatible API, integrating the two is simple. This allows you to run models locally while maintaining the Claude Code experience.
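A minimal sketch of that wiring: export the variable before launching Claude Code. The port below is an assumption for illustration; use the address your Docker Model Runner instance actually listens on.

```shell
#!/usr/bin/env bash
# Assumed local endpoint; check your Docker Model Runner configuration for the real one.
export ANTHROPIC_BASE_URL="http://localhost:12434"

echo "Claude Code will send its API traffic to: $ANTHROPIC_BASE_URL"
```

With the variable set, launching Claude Code in the same shell routes its API calls to the local endpoint instead of Anthropic's hosted service.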

With your model running under your control, it’s time to connect Claude Code to tools to expand its capabilities. 

How to Add MCP Servers to Claude Code with Docker MCP Toolkit

MCP is becoming the de facto standard for connecting coding agents like Claude Code to your real tools, databases, repositories, browsers, and APIs. With more than 300 pre-built, containerized MCP servers, one-click deployment in Docker Desktop, and automatic credential handling, developers can connect Claude Code to trusted environments in minutes, not hours. No dependency issues, no manual configuration, just a consistent, secure workflow across Mac, Windows, and Linux.

In this guide, you’ll learn how to:

Set up Claude Code and connect it to Docker MCP Toolkit.

Configure the Atlassian MCP server for Jira integration.  

Configure the GitHub MCP server to access repository history and run git commands.

Configure the Filesystem MCP server to scan and read your local codebase.

Automate tech debt tracking by converting 15 TODO comments into tracked Jira tickets.

See how Claude Code can query git history, categorize issues, and create tickets — all without leaving your development environment.

Prefer a video walkthrough? Check out our tutorial on how to add MCP servers to Claude Code with Docker MCP Toolkit.

Connecting tools unlocks powerful automation, but with greater capability comes greater responsibility. If you're going to let agents take action, you need to run them safely.

Docker Sandboxes: Run Claude Code and Other Coding Agents Unsupervised (but Safely)

As Claude Code moves from suggestions to real-world actions like installing packages and modifying files, isolation becomes critical.

Sandboxes provide disposable, isolated environments purpose-built for coding agents. Each agent runs in an isolated version of your development environment, so when it installs packages, modifies configurations, deletes files, or runs Docker containers, your host machine remains untouched.

This isolation lets you run agents like Claude Code with real autonomy: since they can't harm your computer, you can let them run free. Check out our announcement on more secure, easier-to-use, and more powerful Docker Sandboxes.
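
As a taste of how lightweight this is (assuming a recent Docker Desktop with sandboxes enabled; the project path is a placeholder), launching Claude Code against a sandboxed copy of a project is a single command:

```shell
# Launch Claude Code in an isolated sandbox, with the given project directory
# mounted as the workspace at the same path inside the sandbox. Anything the
# agent installs, modifies, or deletes stays inside the sandbox, not on the host.
docker sandbox run claude ~/projects/myapp
```

When the session ends, the sandbox can be discarded and recreated without leaving residue on your machine.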

Summary 

Claude Code is powerful on its own, but when used with Docker, it becomes a secure, extensible, and fully controlled AI development environment.

In this post, you learned how to:

Run Claude Code locally using Docker Model Runner with an Anthropic-compatible API endpoint, giving you full control over your data, infrastructure, and cost.

Connect Claude Code to tools using the Docker MCP Toolkit, with 300+ containerized MCP servers for services like Jira, GitHub, and local filesystems — all deployable in one click.

Run Claude Code safely in Docker Sandboxes, isolated environments that allow coding agents to operate autonomously without risking your host machine.

By combining local model execution, secure tool connectivity, and isolated runtime environments, Docker enables you to run AI coding agents like Claude Code with both autonomy and control, making them practical for real-world development workflows.
Source: https://blog.docker.com/feed/

Achieving Test Reliability for Native E2E Testing: Beyond Fixing Broken Tests

End-to-end (E2E) tests are particularly important for native applications that run on various platforms (Android/iOS), screen sizes, and OS versions. E2E testing picks up differences in behavior across this fragmented ecosystem.

But keeping E2E tests reliable is often more challenging than writing them in the first place. 

The fragmented device ecosystem, gaps in test frameworks, network inconsistencies, unstable test environments, and constantly changing UI all contribute to test flakiness. Teams easily get trapped in a cycle of constantly fixing failing tests due to UI changes or environment instability rather than improving the overall reliability of their test infrastructure. They end up frustrated and hesitant to adopt E2E tests in their workflows.

Having led the native E2E testing infrastructure setup at a mid-sized company, I learned the hard way how critical it is to define and implement strategies for test ownership, observability, and notifications in ensuring long-term test stability. In this piece, I discuss the challenges I’ve seen teams face and share lessons on how to build reliable E2E systems that you actually trust.

Challenges with Reactive Test Maintenance

After setting up periodic E2E runs on the CI, our team initially focused on triaging, investigating, and fixing every failing test to improve test stability. However, even after nearly a year of patching flaky tests, the reliability of our E2E suite didn’t improve, and engineers slowly lost confidence in the usefulness and reliability of the test suite.

I learned that teams that focus primarily on fixing broken tests often end up in a cycle of chasing failures without fixing the root causes of instability. This reactive approach creates several problems:

Test suite fragility: If teams continue patching broken tests without addressing real issues with either the underlying app changes or unstable environments, the test suite becomes increasingly brittle. Over time, tests fail for reasons unrelated to real product defects, making it harder to distinguish genuine regressions from noise.

High maintenance overhead: Debugging and fixing flaky tests often requires a significant amount of developer time and resources. Unlike unit tests, which run quickly and fail in isolation, E2E tests execute against the development, staging, or pre-production environment, making failures harder to reproduce and diagnose. Adjusting E2E tests to work across devices with different screen sizes or OS versions requires additional work, making fixes a non-trivial task.

Reduced trust in the test suite: When failures are common and noisy, teams lose confidence in the E2E suite and often start ignoring test failures. This undermines the purpose of having automated tests in the first place. Instead, teams rely on local dev testing or manual QA cycles to validate changes. Over time, the suite becomes more of a liability than a safeguard, slowing down delivery instead of enabling it.

A reactive approach to fixing E2E tests slows down release cycles. Developers must spend significant amounts of time repeatedly fixing and rerunning failing tests, while teams rely on manual QA to catch actual regressions.

Building a Reliable E2E Infrastructure

When our test suite stability didn’t improve after more than a year of chasing failures, we took a step back to analyze historical results and look for patterns. 

We discovered that a significant number of failures could be attributed to an unstable environment or an unexpected state of the test account. For example, spikes in API latencies in the test environment frequently caused false negatives, adding to the noise. Similarly, tests run against existing user accounts could become inconsistent due to a past failure or if multiple tests attempted to use the same account.

I learned that investing in improving your test infrastructure is the only way to get to a stable and reliable native E2E testing workflow. This involves stabilizing the test environment, defining clear test ownership, reducing noisy alerts, and improving observability. Let’s look at each of these in more detail.

Stabilize the Test Environment

Many flaky E2E tests can be traced back to inconsistencies in the underlying environment, such as sporadic device issues, network instability, or API downtime in a staging environment. 

To avoid noisy and unreliable tests, ensure you have a stable and standardized test environment with the following test practices:

Standardize device and environment setup: Device and test environment stability issues heavily impact test stability. To reduce API downtimes, isolate the E2E testing environment from the developer or staging environment to prevent interference from unstable builds and experimental features. Teams could either build a stable pre-prod environment that uses a production-ready artifact or spin up ephemeral environments for each E2E run to ensure consistency. Running tests on standardized device images or containerized emulators with consistent OS versions, configurations, and resources further improves stability. For critical flows, you can schedule periodic runs on physical device farms to validate against real hardware while keeping day-to-day tests stable and cost-effective.

Isolating test data per session: A test that makes modifications to any data should start from a clean slate. For instance, while testing a todo application, every test session should use a new test account to avoid unexpected scenarios because of unpredictable account state. To speed up tests, execute setup scripts in `before` hooks to handle account creation, and seed any required data automatically.

Mocking certain network responses: While an E2E test is meant to test the entire user journey with real data, in some cases, it’s necessary to mock specific API responses to maintain a predictable test environment. For instance, if your application relies on A/B tests or uses feature flags, different sessions might receive different experiences based on the user allocation. This can cause unexpected failures unrelated to actual regressions. Mocking these responses in test builds ensures consistency across sessions, and it avoids building complex test cases that handle different user experiences.
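
The per-session data isolation described above can be sketched in a few lines of shell. The naming scheme and the `create_account` helper here are hypothetical, purely for illustration of a `before` hook that gives every run a fresh, uniquely named test account:

```shell
#!/bin/sh
# Hypothetical before-hook: derive a unique account name per E2E session so
# no two runs (or parallel CI jobs) ever share mutable account state.
new_account_name() {
  # timestamp + process ID keeps names unique across parallel jobs
  echo "e2e-$(date +%s)-$$"
}

ACCOUNT="$(new_account_name)"
echo "Seeding test account: $ACCOUNT"
# create_account "$ACCOUNT"   # hypothetical API call to create and seed the account
```

Because every session gets a throwaway account, a failure in one run can never leave behind state that poisons the next one.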

Establish Clear Test Ownership

When a test fails, it’s often unclear who’s responsible for investigating and fixing it. Over time, such an absence of clear test ownership and accountability results in unreliable, unmaintained, and flaky tests. 

Assigning ownership of tests based on the ownership of product features can alleviate this problem to some extent. Ideally, the owning team should be responsible for writing, maintaining, and fixing tests for their critical flows. This ownership model ensures that failures are triaged quickly and that tests are updated as the product evolves instead of becoming stale and unstable. 

Test ownership becomes challenging in codebases where multiple product teams own parts of a single user flow. For example, in a shopping application, different teams might own the login, product catalog, and checkout experiences. If a checkout flow test fails at the login step, it can be confusing which team should triage the issue. Without a clear policy, the failure might be ignored, or multiple teams might end up duplicating the effort. 

To handle these scenarios, set a policy that defines the first point of contact (POC) per test based on the end-user experience. This ensures that a single team takes responsibility for triaging the issue, while fixes can be handed off to upstream dependencies as needed.

Reduce Noise and Improve Alerting

A common challenge with native E2E testing is noisy alerts due to flaky or failing tests. Teams are often flooded with non-actionable alerts when flaky tests fail because of transient network or device issues. Repeated failure notifications about known bugs can also lead to alert fatigue.

The following techniques reduce this noise so that teams are only notified for actionable failures:

Mute flaky tests and known bugs: Instead of reporting and notifying teams about all test failures, allow alerts from tests that are identified as flaky or linked to known issues to be muted without a code change. You can manage muted tests through a remote configuration, environment variables, or a tool like BrowserStack. Flag them for follow-up work, but let alerts only go out for new or unexpected regressions. Muting is particularly important for E2E tests since fixing failing tests often requires significant developer time and resources. Repeated alerts can be especially distracting for developers.

Enrich notifications with failure details: Instead of generic failure messages, include details such as the failing user flow, commit details, the error message, and links to logs or dashboards in your alerts. These details help developers identify and triage issues quicker, resulting in faster fixes and higher confidence in the suite.

Track test metrics and trends: In addition to test suite level reports, track and analyze the historical results of your tests to understand failure rates, flakiness trends, and failure hotspots. For example, if you observe repeated failures in the login flow, it might indicate unstable tests or sporadic bugs in that flow. Tracking these metrics over time provides visibility into whether the E2E suite is improving or degrading, and it helps you prioritize stabilization efforts based on impact.
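
The muting approach described above can be sketched as a small shell helper. The `MUTED_TESTS` variable and the test names are hypothetical stand-ins for a list injected from remote configuration in CI:

```shell
#!/bin/sh
# Sketch: mute alerts for known-flaky tests via an environment variable,
# so muting and unmuting require no code change. MUTED_TESTS is a
# hypothetical space-separated list, e.g. supplied by remote config.
MUTED_TESTS="${MUTED_TESTS:-checkout_timeout login_animation}"

is_muted() {
  case " $MUTED_TESTS " in
    *" $1 "*) return 0 ;;   # test name appears in the muted list
    *) return 1 ;;
  esac
}

notify_failure() {
  # Only alert for tests that are not muted; muted failures are logged for follow-up.
  if is_muted "$1"; then
    echo "MUTED: $1 failed (tracked, no alert sent)"
  else
    echo "ALERT: $1 failed"
  fi
}
```

For example, `notify_failure checkout_timeout` logs a muted failure, while an unlisted test name still raises an alert.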

Hybrid Strategies for Scaling E2E with Dockerized Emulators

Running native E2E tests at scale is challenging due to cost and resource constraints. Device farms that provide access to real cloud-based devices are expensive for running a large suite of tests at high frequency. This becomes a constraint for integrating E2E tests with the CI pipeline that executes with every pull request before the changes are merged. 

As mentioned earlier, a hybrid testing approach that uses Dockerized emulators for PR builds alongside real devices for periodic runs can help you overcome this challenge. When our team moved PR checks to Dockerized emulators, we got faster feedback and significantly reduced cloud device costs.

Containerized device runners can be spun up quickly in CI. For example, the docker-android image lets you run an Android emulator in a containerized Docker environment. It supports multiple device profiles, OS versions, and UI-testing frameworks such as Appium and Espresso. Teams can easily integrate these emulators into CI pipelines to run E2E tests at scale without investing in a huge testing budget.
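
As an illustration, a containerized emulator can be started with a single `docker run`. The image tag, device profile, ports, and environment variables below follow the docker-android project's conventions, but treat them as assumptions and verify them against the current docs:

```shell
# Start an Android emulator in a container (requires KVM on the CI host).
# Device profile, ports, and image tag are illustrative; check the
# docker-android documentation for currently supported values.
docker run -d --name android-emulator \
  --device /dev/kvm \
  -p 6080:6080 -p 4723:4723 \
  -e EMULATOR_DEVICE="Samsung Galaxy S10" \
  -e WEB_VNC=true \
  budtmo/docker-android:emulator_11.0

# Point your Appium tests at the container's endpoints, then tear it down:
# docker rm -f android-emulator
```

Because the emulator is just a container, each CI job can spin up a clean instance and discard it afterwards, which also helps with the environment-stability issues discussed earlier.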

If you are building E2E tests for mobile web, you can also use containerized browser images to run tests consistently across different environments to further reduce cost and setup complexity.

There’s Hope!

If your team has been chasing native E2E test failures like we were, you're probably also burning engineering time and resources without improving test stability. I hope this article has convinced you that there's a better way: improving your test environment, device setup, alerting, and observability.

Your best first step is to analyze your historical test failures and categorize them into buckets. Use these insights to define actionable items for reducing flakiness. Use this roadmap to identify test infrastructure investments or process changes that will deliver the most impact. 

After our team invested in test infrastructure improvements, we saw a clear improvement in stability. Developers had a better understanding of real failures, and the number of noisy alerts was reduced. Flakiness didn’t disappear entirely, but the improved reliability of the test suite helped us catch multiple native app regressions before the changes were released to production.

I hope this article will help you achieve similar wins.

Source: https://blog.docker.com/feed/

Flexibility Over Lock-In: The Enterprise Shift in Agent Strategy

Building agents is now a strategic priority for 95% of respondents in our latest State of Agentic AI research, which surveyed more than 800 developers and decision-makers worldwide. The shift is happening quickly: agent adoption has moved beyond experiments and demos into early operational maturity. But the road to enterprise-scale adoption is still complex. The foundations are forming, but they remain far from the fully integrated, production-grade platforms that teams can confidently build on.

Security continues to surface as a top blocker to agent adoption. But it’s not the only one. Technical complexity is rising fast as well. Vendor lock-in is a big concern for the vast majority of the respondents surveyed. 

So how do teams cut through the complexity and prepare for a world of multi-model, multi-tool, and multi-framework agents, while avoiding vendor lock-in in their agent workflows? In this blog, we break down the key findings from our research: what teams are actually using to power their agentic workloads, and what it takes to build a more scalable, future-ready agent architecture.

Multi-model and multi-cloud are the new normal. And complexity is rising

Our recent Agent AI study found that enterprises are embracing multi-model and multi-cloud architectures to gain greater control over performance, customization, privacy, and compliance. Multi-model is now the norm. Nearly two-thirds of organizations (61%) combine cloud-hosted and local models. And complexity doesn’t stop there: 46% report using between four and six models within their agents, while just 2% rely on a single model.

Deployment environments are just as diverse: 79% of respondents operate agents across two or more environments, with 51% in public clouds, 40% on-premises, and 32% on serverless platforms.

This architectural flexibility delivers control, but it also multiplies orchestration and governance efforts. Coordinating models, tools, frameworks, and environments is consistently cited as one of the hardest parts of building agents. Nearly half of respondents (48%) identify operational complexity in managing multiple components as their biggest challenge, while 43% point to increased security exposure driven by orchestration sprawl.

The strategic shift away from vendor lock-in

As organizations double down on agent investments, concerns about supply chain fragility are rising. Seventy-six percent of global respondents report active worries about vendor lock-in.

Rather than consolidating, teams are responding by diversifying. They're distributing workloads across multiple models, tools, and cloud environments to reduce dependency and maintain leverage. Among the 61% of organizations using both cloud-hosted and locally hosted models, the primary drivers are control (64%), data privacy (60%), and compliance (54%). Cost ranks significantly lower at 41%, underscoring that flexibility and governance, not cost savings, are shaping architectural decisions.

Containers power the next wave of agent adoption

Containerization is already foundational to agent development. Nearly all organizations surveyed (94%) use containers in their agent development or production workflows, and the remainder plan to adopt them.

As agent initiatives scale, teams are extending the same cloud-native practices that power their application pipelines, such as microservices architectures, CI/CD, and container orchestration, to support agent workloads. Containers are not an add-on; they are the operational backbone. In fact, 94% of teams building agents rely on them.

At the same time, early signs of orchestration standardization are emerging. Among teams building agents with Docker, 40% are using Docker Compose as their orchestration layer, a signal that familiar, container-based tooling is becoming a practical coordination layer for increasingly complex agent systems.

The agentic future won’t be monolithic

The agentic future won’t be monolithic. It’s already multi-cloud, multi-model, and multi-environment. That reality makes open standards and portable infrastructure foundational for sustaining enterprise trust and long-term flexibility.

What's needed next isn't reinvention, but standardization around an open, interoperable, and portable infrastructure: the flexibility to work across any model, tool, and agent framework; secure-by-default runtimes; consistent orchestration; and integrated policy controls. Teams that invest now in this container-based trust layer will move beyond isolated productivity gains to sustainable, enterprise-wide outcomes while reducing vendor lock-in risk.

Download the full Agentic AI report for more insights and recommendations on how to scale agents for enterprise.  

Join us on March 25, 2026, for a webinar where we’ll walk through the key findings and the strategies that can help you prioritize what comes next.

Learn more:

Get your copy of the latest State of Agentic AI report! 

Learn more about Docker’s AI solutions

Read more about why AI agents challenge existing governance approaches and explore a new framework designed for agentic AI.

Source: https://blog.docker.com/feed/

Building AI Teams: How Docker Sandboxes and Docker Agent Transform Development

It's 11 PM. You've got a JIRA ticket open, an IDE with three unsaved files, a browser tab on Stack Overflow, and another on documentation. You're context-switching between designing UI, writing backend APIs, fixing bugs, and running tests. You're wearing all the hats: product manager, designer, engineer, QA specialist. It's exhausting.

What if instead of doing it all yourself, you could describe the goal and have a team of specialized AI agents handle it for you?

One agent breaks down requirements, another designs the interface, a third builds the backend, a fourth tests it, and a fifth fixes any issues. Each agent focuses on what it does best, working together autonomously while you sip your coffee. That's not sci-fi; it's what Docker Agent plus Docker Sandboxes delivers today.

What is Docker Agent?

Docker Agent is an open source tool for building teams of specialized AI agents. Instead of prompting one general-purpose model to do everything, you define agents with specific roles that collaborate to solve complex problems.

Here’s a typical dev-team configuration:

agents:
  root:
    model: openai/gpt-5
    description: Product Manager – Leads the development team and coordinates iterations
    instruction: |
      Break user requirements into small iterations. Coordinate designer → frontend → QA.
      - Define feature and acceptance criteria
      - Ensure iterations deliver complete, testable features
      - Prioritize based on value and dependencies
    sub_agents: [designer, awesome_engineer, qa, fixer_engineer]
    toolsets:
      - type: filesystem
      - type: think
      - type: todo
      - type: memory
        path: dev_memory.db

  designer:
    model: openai/gpt-5
    description: UI/UX Designer – Creates user interface designs and wireframes
    instruction: |
      Create wireframes and mockups for features. Ensure responsive, accessible designs.
      - Use consistent patterns and modern principles
      - Specify colors, fonts, interactions, and mobile layout
    toolsets:
      - type: filesystem
      - type: think
      - type: memory
        path: dev_memory.db

  qa:
    model: openai
    description: QA Specialist – Analyzes errors, stack traces, and code to identify bugs
    instruction: |
      Analyze error logs, stack traces, and code to find bugs. Explain what's wrong and why it's happening.
      - Review test results, error messages, and stack traces
      …

  awesome_engineer:
    model: openai
    description: Awesome Engineer – Implements user interfaces based on designs
    instruction: |
      Implement responsive, accessible UI from designs. Build backend APIs and integrate.
      …

  fixer_engineer:
    model: openai
    description: Test Integration Engineer – Fixes test failures and integration issues
    instruction: |
      Fix test failures and integration issues reported by QA.
      - Review bug reports from QA

The root agent acts as product manager, coordinating the team. When a user requests a feature, root delegates to designer for wireframes, then awesome_engineer for implementation, qa for testing, and fixer_engineer for bug fixes. Each agent uses its own model, has its own context, and accesses tools like filesystem, shell, memory, and MCP servers.

Agent Configuration

Each agent is defined with five key attributes:

model: The AI model to use (e.g., openai/gpt-5, anthropic/claude-sonnet-4-5). Different agents can use different models optimized for their tasks.

description: A concise summary of the agent’s role. This helps Docker Agent understand when to delegate tasks to this agent.

instruction: Detailed guidance on what the agent should do. Includes workflows, constraints, and domain-specific knowledge.

sub_agents: A list of agents this agent can delegate work to. This creates the team hierarchy.

toolsets: The tools available to the agent. Built-in options include filesystem (read/write files), shell (run commands), think (reasoning), todo (task tracking), memory (persistent storage), and mcp (external tool connections).

This configuration system gives you fine-grained control over each agent’s capabilities and how they coordinate with each other.

Why Agent Teams Matter

One agent handling complex work means constant context-switching. Split the work across focused agents instead, and each handles what it's best at. Docker Agent manages the coordination.

The benefits are clear:

Specialization: Each agent is optimized for its role (design vs. coding vs. debugging)

Parallel execution: Multiple agents can work on different aspects simultaneously

Better outcomes: Focused agents produce higher quality work in their domain

Maintainability: Clear separation of concerns makes teams easier to debug and iterate

The Problem: Running AI Agents Safely

Agent teams are powerful, but they come with a serious security concern. These agents need to:

Read and write files on your system

Execute shell commands (npm install, git commit, etc.)

Access external APIs and tools

Run potentially untrusted code

Giving AI agents full access to your development machine is risky. A misconfigured agent could delete files, leak secrets, or run malicious commands. You need isolation: agents should be powerful but contained.

Traditional virtual machines are too heavy. Chroot jails are fragile. You need something that provides:

Strong isolation from your host machine

Workspace access so agents can read your project files

Familiar experience with the same paths and tools

Easy setup without complex networking or configuration

Docker Sandboxes: The Secure Foundation

Docker Sandboxes solves this by providing isolated environments for running AI agents. As of Docker Desktop 4.60+, sandboxes run inside dedicated microVMs, providing a hard security boundary beyond traditional container isolation. When you run docker sandbox run <agent>, Docker creates an isolated microVM workspace that:

Mounts your project directory at the same absolute path (on Linux and macOS)

Preserves your Git configuration for proper commit attribution

Does not inherit environment variables from your current shell session

Gives agents full autonomy without compromising your host

Provides network isolation with configurable allow/deny lists

Docker Sandboxes now natively supports six agent types: Claude Code, Gemini, Codex, Copilot, Agent, and Kiro (all experimental). Agent can be launched directly as a sandbox agent:

# Run Agent natively in a sandbox
docker sandbox create agent ~/path/to/workspace
docker sandbox run agent ~/path/to/workspace

Or, for more control, use a detached sandbox:

# Create a sandbox
docker sandbox run -d --name my-agent-sandbox claude

# Copy agent into the sandbox
docker cp /usr/bin/agent <container-id>:/usr/bin/agent

# Run your agent team
docker exec -it <container-id> bash -c "cd /path/to/workspace && agent run dev-team.yaml"

Your workspace /Users/alice/projects/myapp on the host is also /Users/alice/projects/myapp inside the microVM. Error messages, scripts with hard-coded paths, and relative imports all work as expected. But the agent is contained in its own microVM: it can't access files outside the mounted workspace, and any damage it causes is limited to the sandbox.

Why Docker Sandboxes Matter

The combination of agents and Docker Sandboxes gives you something powerful:

Full agent autonomy: Agents can install packages, run tests, make commits, and use tools without constant human oversight

Complete safety: Even if an agent makes a mistake, it’s contained within the microVM sandbox

Hard security boundary: MicroVM isolation goes beyond containers, each sandbox runs in its own virtual machine

Network control: Allow/deny lists let you restrict which external services agents can access

Familiar experience: Same paths, same tools, same workflow as working directly on your machine

Workspace persistence: Changes sync between host and microVM, so your work is always available

Here’s how the workflow looks in practice:

User requests a feature to the root agent: “Create a bank app with Gradio”

Root creates a todo list and delegates to the designer

Designer generates wireframes and UI specifications

Awesome_engineer implements the code, running pip install gradio and python app/main.py

QA runs tests, finds bugs, and reports them

Fixer_engineer resolves the issues

Root confirms all tests pass and marks the feature complete

All of this happens autonomously inside a sandboxed environment. The agents can install dependencies, modify files, and execute commands, but they’re isolated from your host machine.

Try It Yourself

Let’s walk through setting up a simple agent team in a Docker Sandbox.

Prerequisites

Docker Desktop 4.60+ with sandbox support (microVM-based isolation)

agent (included in Docker Desktop 4.49+)

API key for your model provider (Anthropic, OpenAI, or Google)

Step 1: Create Your Agent Team

Save this configuration as dev-team.yaml:

models:
  openai:
    provider: openai
    model: gpt-5

agents:
  root:
    model: openai
    description: Product Manager – Leads the development team
    instruction: |
      Break user requirements into small iterations. Coordinate designer → frontend → QA.
    sub_agents: [designer, awesome_engineer, qa]
    toolsets:
      - type: filesystem
      - type: think
      - type: todo

  designer:
    model: openai
    description: UI/UX Designer – Creates designs and wireframes
    instruction: |
      Create wireframes and mockups for features. Ensure responsive designs.
    toolsets:
      - type: filesystem
      - type: think

  awesome_engineer:
    model: openai
    description: Developer – Implements features
    instruction: |
      Build features based on designs. Write clean, tested code.
    toolsets:
      - type: filesystem
      - type: shell
      - type: think

  qa:
    model: openai
    description: QA Specialist – Tests and identifies bugs
    instruction: |
      Test features and identify bugs. Report issues to fixer.
    toolsets:
      - type: filesystem
      - type: think

Step 2: Create a Docker Sandbox

The simplest approach is to use agent as a native sandbox agent:

# Run agent directly in a sandbox (experimental)
docker sandbox run agent ~/path/to/your/workspace

Alternatively, use a detached Claude sandbox for more control:

# Start a detached sandbox
docker sandbox run -d --name my-dev-sandbox claude

# Copy agent into the sandbox
which agent # Find the path on your host
docker cp $(which agent) $(docker sandbox ls --filter name=my-dev-sandbox -q):/usr/bin/agent

Step 3: Set Environment Variables

# Run agent with your API key (passed inline since export doesn't persist across exec calls)
docker exec -it -e OPENAI_API_KEY=your_key_here my-dev-sandbox bash

Step 4: Run Your Agent Team

# Mount your workspace and run agent
docker exec -it my-dev-sandbox bash -c "cd /path/to/your/workspace && agent run dev-team.yaml"

Now you can describe what you want to build, and your agent team will handle the rest:

User: Create a bank application using Python. The bank app should have basic functionality like account savings, show balance, withdraw, add money, etc. Build the UI using Gradio. Create a directory called app, and inside of it, create all of the files needed by the project

Agent (root): I'll break this down into iterations and coordinate with the team…

Watch as the designer creates wireframes, the engineer builds the Gradio app, and QA tests it, all autonomously in a secure sandbox.

Final result from a one-shot prompt

Step 5: Clean Up

When you’re done:

# Remove the sandbox
docker sandbox rm my-dev-sandbox

Docker enforces one sandbox per workspace. Running docker sandbox run in the same directory reuses the existing container. To change configuration, remove and recreate the sandbox.
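
Since the sandbox is reused per workspace, applying a new configuration is a remove-and-recreate cycle, sketched here with the sandbox name from the steps above:

```shell
# Tear down the existing sandbox for this workspace...
docker sandbox rm my-dev-sandbox

# ...then recreate it with the new configuration.
docker sandbox run -d --name my-dev-sandbox claude
```

The workspace files on the host are unaffected; only the sandbox environment itself is rebuilt.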

Current Limitations

Docker Sandboxes and Docker Agent are evolving rapidly. Here are a few things to know:

Docker Sandboxes now supports six agent types natively: Claude Code, Gemini, Codex, Copilot, Agent, and Kiro. All are experimental, and breaking changes may occur between Docker Desktop versions.

The Custom Shell template doesn't include a pre-installed agent binary; instead, it provides a clean environment where you can install and configure any agent or tool

MicroVM sandboxes require macOS or Windows. Linux users can use legacy container-based sandboxes with Docker Desktop 4.57+

API keys may still need manual configuration depending on the agent type

Sandbox templates are optimized for certain workflows; custom setups may require additional configuration

Why This Matters Now

AI agents are becoming more capable, but they need infrastructure to run safely and effectively. The combination of agent and Docker Sandboxes addresses this by:

| Feature | Traditional Approach | With agent + Docker Sandboxes |
| --- | --- | --- |
| Autonomy | Limited – requires constant oversight | High – agents work independently |
| Security | Risky – agents have host access | Isolated – agents run in microVMs |
| Specialization | One model does everything | Multiple agents with focused roles |
| Reproducibility | Inconsistent across machines | MicroVM-isolated, version-controlled |
| Scalability | Manual coordination | Automated team orchestration |

This isn’t just about convenience; it’s about enabling AI agents to do real work in production environments, with the safety guarantees that developers expect.

What’s Next

Explore the Docker Agent documentation to build your own agent teams

Check out Docker Sandboxes for advanced configurations

Browse example agent configurations in the agent repository

Integrate agent with your editor or use agents as tools in MCP clients

Conclusion

We’re moving from “prompting AI to write code” to “orchestrating AI teams to build software.” agent gives you the team structure; Docker Sandboxes provides the secure foundation.

The days of wearing every hat as a solo developer are numbered. With specialized AI agents working in isolated containers, you can focus on what matters most, designing great software, while your AI team handles the implementation, testing, and iteration.

Try it out. Build your own agent team. Run it in a Docker Sandbox. See what happens when you have a development team at your fingertips, ready to ship features while you grab lunch.

Source: https://blog.docker.com/feed/

What’s Holding Back AI Agents? It’s Still Security

It’s hard to find a team today that isn’t talking about agents. For most organizations, this isn’t a “someday” project anymore: building agents is a strategic priority for 95% of the 800+ developers and decision makers we surveyed across the globe in our latest State of Agentic AI research. The shift is happening fast. Agent adoption has moved beyond experiments and demos into something closer to early operational maturity: 60% of organizations already report having AI agents in production, though a third of those remain in early stages.

Agent adoption today is driven by a pragmatic focus on productivity, efficiency, and operational transformation, not revenue growth or cost reduction. Early adoption is concentrated in internal, productivity-focused use cases, especially across software, infrastructure, and operations. The feedback loops are fast, and the risks are easier to control. 

So what’s holding back agent scaling? Wherever friction shows up, nearly all roads lead to the same place: AI agent security.

AI agent security isn’t one issue; it’s the constraint

When teams talk about what’s holding them back, AI agent security rises to the top. In the same survey, 40% of respondents cite security as their top blocker when building agents. The reason it hits so hard is that it’s not confined to a single layer of the stack. It shows up everywhere, and it compounds as deployments grow.

It starts with infrastructure: as organizations expand agent deployments, teams emphasize the need for secure sandboxing and runtime isolation, even for internal agents.

At the operations layer, complexity becomes a security problem. Once you have more tools, more integrations, and more orchestration logic, it gets harder to see what’s happening end-to-end and harder to control it. Our latest research data reflects that sprawl: over a third of respondents report challenges coordinating multiple tools, and a comparable share say integrations introduce security or compliance risk. That’s a classic pattern: operational complexity creates blind spots, and blind spots become exposure.

And at the governance layer, enterprises want something simple: consistency. They want guardrails, policy enforcement, and auditability that work across teams and workflows. But current tooling isn’t meeting that bar yet. In fact, 45% of organizations say the biggest challenge is ensuring tools are secure, trusted, and enterprise-ready. That’s not a minor complaint: it’s the difference between “we can try this” and “we can scale this.”

MCP is popular but not ready for enterprise

Many teams are adopting the Model Context Protocol (MCP) because it gives agents a standardized way to connect to tools, data, and external systems, making agents more useful and customizable. Among respondents further along in their agent journey, 85% say they’re familiar with MCP, and two-thirds say they actively use it across personal and professional projects.

Research data suggests that most teams are operating in what could be described as “leap-of-faith mode” when it comes to MCP, adopting the protocol without security guarantees and operational controls they would demand from mature enterprise infrastructure.

But the security story hasn’t caught up yet. Teams adopt MCP because it works, not because it has earned that trust: among teams earlier in their agentic journey, 46% identify security and compliance as the top challenge with MCP.

Organizations are increasingly watching for threats like prompt injection and tool poisoning, along with the more foundational issues of access control, credentials, and authentication. The immaturity and security challenges of current MCP tooling make for a fragile foundation at this stage of agentic adoption.

Conclusion and recommendations

AI agent security is what sets the speed limit for agentic AI in the enterprise. Organizations aren’t lacking interest; they’re lacking confidence that today’s tooling is enterprise-ready, that access controls can be enforced reliably, and that agents can be kept safely isolated from sensitive systems.

The path forward is clear. Unlocking agents’ full potential will require new platforms built for enterprise scale, with secure-by-default foundations, strong governance, and policy enforcement that’s integrated, not bolted on.

Download the full Agentic AI report for more insights and recommendations on how to scale agents for the enterprise.

Join us on March 25, 2026, for a webinar where we’ll walk through the key findings and the strategies that can help you prioritize what comes next.

Learn more:

Get your copy of the latest State of Agentic AI report! 

Learn more about Docker’s AI solutions

Read more about why AI agents challenge existing governance approaches and explore a new framework designed for agentic AI.

Source: https://blog.docker.com/feed/

Celebrating Women in AI: 3 Questions with Cecilia Liu on Leading Docker’s MCP Strategy

To celebrate International Women’s Day, we sat down with Cecilia Liu, Senior Product Manager at Docker, for three questions about the vision and strategy behind Docker’s MCP solutions. From shaping product direction to driving AI innovation, Cecilia plays a key role in defining how Docker enables secure, scalable AI tooling.

Cecilia leads product management for Docker’s MCP Catalog and Toolkit, our solution for running MCP servers securely and at scale through containerization. She drives Docker’s AI strategy across both enterprise and developer ecosystems, helping organizations deploy MCP infrastructure with confidence while empowering individual developers to seamlessly discover, integrate, and use MCP in their workflows. With a technical background in AI frameworks and an MBA from NYU Stern, Cecilia bridges the worlds of AI infrastructure and developer tools, turning complex challenges into practical, developer-first solutions.

What products are you responsible for?

I own Docker’s MCP solution. At its core, it’s about solving the problems that anyone working with MCP runs into: how do you find the right MCP servers, how do you actually use them without a steep learning curve, and how do you deploy and manage them reliably across a team or organization.

How does Docker’s MCP solution benefit developers and enterprise customers?

Dev productivity is where my heart is. I want to build something that meaningfully helps developers at every stage of their cycle — and that’s exactly how I think about Docker’s MCP solution.

For end-user developers and vibe coders, the goal is simple: you shouldn’t need to understand the underlying infrastructure to get value from MCP. As long as you’re working with AI, we make it easy to discover, configure, and start using MCP servers without any of the usual setup headaches. One thing I kept hearing in user feedback was that people couldn’t even tell if their setup was actually working. That pushed us to ship in-product setup instructions that walk you through not just configuration, but how to verify everything is running correctly. It sounds small, but it made a real difference.

For developers building MCP servers and integrating them into agents, I’m focused on giving them the right creation and testing tools so they can ship faster and with more confidence. That’s a big part of where we’re headed.

And for security and enterprise admins, we’re solving real deployment pain, making it faster and cheaper to roll out and manage MCP across an entire organization. Custom catalogs, role-based access controls, audit logging, policy enforcement. The goal is to give teams the visibility and control they need to adopt AI tooling confidently at scale.

Customers love us for all of the above, and there’s one more thing that ties it together: the security that comes built-in with Docker. That trust doesn’t happen overnight, and it’s something we take seriously across everything we ship.

What are you excited about when it comes to the future of MCP?

What excites me most is honestly the pace of change itself. The AI landscape is shifting constantly, and with every new tool that makes AI more powerful, there’s a whole new set of developers who need a way to actually use it productively. That’s a massive opportunity.

MCP is where that’s happening right now, and the adoption we’re seeing tells me the need is real. But what gets me out of bed is knowing the problems we’re solving: discoverability, usability, deployment. They are all going to matter just as much for whatever comes next. We’re not just building for today’s tools. We’re building the foundation that developers will reach for every time something new emerges.

Cecilia is speaking about scaling MCP for enterprises at the MCP Dev Summit in NYC on April 3, 2026. If you’re attending, be sure to stop by Docker’s booth (D/P9).

Learn more

Explore Docker’s MCP Catalog and Toolkit on our website.

Dive into our documentation to get started quickly.

Ready to go hands-on? Open Docker Desktop or the CLI and start using MCP to streamline and automate your development workflows.

Source: https://blog.docker.com/feed/