Beyond Containers: llama.cpp Now Pulls GGUF Models Directly from Docker Hub

The world of local AI is moving at an incredible pace, and at the heart of this revolution is llama.cpp—the powerhouse C++ inference engine that brings Large Language Models (LLMs) to everyday hardware (and it’s also the inference engine that powers Docker Model Runner). Developers love llama.cpp for its performance and simplicity. And we at Docker are obsessed with making developer workflows simpler.

That’s why we’re thrilled to announce a game-changing new feature in llama.cpp: native support for pulling and running GGUF models directly from Docker Hub.

This isn’t about running llama.cpp in a Docker container. This is about using Docker Hub as a powerful, versioned, and centralized repository for your AI models, just like you do for your container images.

Why Docker Hub for AI Models?

Managing AI models can be cumbersome. You’re often dealing with direct download links, manual version tracking, and scattered files. By integrating with Docker Hub, llama.cpp leverages a mature and robust ecosystem to solve these problems.

Rock-Solid Versioning: The familiar repository:tag syntax you use for images now applies to models. Easily switch between gemma3 and smollm2:135M-Q4_0 with complete confidence.

Centralized & Discoverable: Docker Hub can become the canonical source for your team’s models. No more hunting for the “latest” version on a shared drive or in a chat history.

Simplified Workflow: Forget curl, wget or manually downloading from web UIs. A single command-line flag now handles discovery, download, and caching.

Reproducibility: By referencing a model with its immutable digest or tag, you ensure that your development, testing, and production environments are all using the exact same artifact, leading to more consistent and reproducible results.

How It Works Under the Hood 

This new feature cleverly uses the Open Container Initiative (OCI) specification, which is the foundation of Docker images. The GGUF model file is treated as a layer within an OCI manifest, identified by a special media type like application/vnd.docker.ai.gguf.v3. For more details on why the OCI standard matters for models, check out our blog.

When you use the new --docker-repo flag, llama.cpp performs the following steps:

Authentication: It first requests an authentication token from the Docker registry to authorize the download.

Manifest Fetch: It then fetches the manifest for the specified model and tag (e.g., ai/gemma3:latest).

Layer Discovery: It parses the manifest to find the specific layer that contains the GGUF model file by looking for the correct media type.

Blob Download: Using the layer’s unique digest (a sha256 hash), it downloads the model file directly from the registry’s blob storage.

Caching: The model is saved to a local cache, so subsequent runs are instantaneous.

This entire process is seamless and happens automatically in the background.
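For illustration, here's a minimal Python sketch of the same four steps against Docker Hub's registry HTTP API, assuming the public ai/gemma3:latest model and the requests library. llama.cpp implements this flow in C++ internally, so treat this purely as a sketch of the protocol, not the actual implementation.

Python

import requests

REPO, TAG = "ai/gemma3", "latest"

# 1. Authentication: request a pull token for the repository.
token = requests.get(
    "https://auth.docker.io/token",
    params={"service": "registry.docker.io", "scope": f"repository:{REPO}:pull"},
).json()["token"]
headers = {"Authorization": f"Bearer {token}"}

# 2. Manifest fetch: retrieve the OCI manifest for the tag.
manifest = requests.get(
    f"https://registry-1.docker.io/v2/{REPO}/manifests/{TAG}",
    headers={**headers, "Accept": "application/vnd.oci.image.manifest.v1+json"},
).json()

# 3. Layer discovery: find the layer whose media type marks it as a GGUF file.
gguf_layer = next(layer for layer in manifest["layers"] if "gguf" in layer["mediaType"])

# 4. Blob download: stream the blob by its sha256 digest and cache it locally.
blob = requests.get(
    f"https://registry-1.docker.io/v2/{REPO}/blobs/{gguf_layer['digest']}",
    headers=headers, stream=True,
)
with open("gemma3.gguf", "wb") as f:
    for chunk in blob.iter_content(chunk_size=1 << 20):
        f.write(chunk)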

Get Started in Seconds

Ready to try it? If you have a recent build of llama.cpp, you can serve a model from Docker Hub with one simple command. The new flag is --docker-repo (or -dr).

Let’s run gemma3, a model available from Docker Hub.

# Now, serve a model from Docker Hub!
llama-server -dr gemma3

The first time you execute this, you’ll see llama.cpp log the download progress; after that, it uses the cached version. It’s that easy! The default organization is ai/, so gemma3 resolves to ai/gemma3. The default tag is :latest, but you can specify one explicitly, such as :1B-Q4_K_M.
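As a quick illustration of that resolution rule, here's a tiny Python helper that mirrors it; the function name and defaults are ours, not llama.cpp's.

Python

def resolve_model_ref(ref: str, default_org: str = "ai", default_tag: str = "latest") -> str:
    """Expand a short reference the way described above: 'gemma3' -> 'ai/gemma3:latest'."""
    repo, _, tag = ref.partition(":")
    if "/" not in repo:
        repo = f"{default_org}/{repo}"
    return f"{repo}:{tag or default_tag}"

assert resolve_model_ref("gemma3") == "ai/gemma3:latest"
assert resolve_model_ref("smollm2:135M-Q4_0") == "ai/smollm2:135M-Q4_0"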

For a complete Docker-integrated experience, with support for pushing and pulling OCI models, try Docker Model Runner. The Docker Model Runner equivalent for chatting is:

# Pull, serve, and chat with a model from Docker Hub!
docker model run ai/gemma3

The Future of AI Model Distribution

This integration represents a powerful shift in how we think about distributing and managing AI artifacts. By using OCI-compliant registries like Docker Hub, the AI community can build more robust, reproducible, and scalable MLOps pipelines.

This is just the beginning. We envision a future where models, datasets, and the code that runs them are all managed through the same streamlined, developer-friendly workflow that has made Docker an essential tool for millions.

Check out the latest llama.cpp to try it out, and explore the growing collection of models on Docker Hub today!

Learn more

Read our quickstart guide to Docker Model Runner.

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Discover curated models on Docker Hub

Source: https://blog.docker.com/feed/

Silent Component Updates & Redesigned Update Experience

Following on from our previous initiative to improve how Docker Desktop delivers updates, we are excited to announce another major improvement to how Docker Desktop keeps your development tools up to date. Starting with Docker Desktop 4.46, we’re introducing automatic component updates and a completely redesigned update experience that puts your productivity first.

Why We’re Making This Change

Your development workflow shouldn’t be interrupted by update notifications and restart requirements. With our new approach, you get:

Zero workflow interruption – components update automatically in the background when a Docker Desktop restart is not required

Always-current tools – Scout, Compose, Ask Gordon, and Model Runner stay up-to-date without manual intervention

Better security posture – automatic updates mean you’re always running the latest, most secure versions

Enterprise control – an Admin Console cloud setting to control the update behavior

What’s New in Docker Desktop 4.46

Silent Component Updates

Independent tools now update automatically in the background without any user interaction required and without impact on running containers:

Docker Scout – Latest vulnerability scanning capabilities

Docker Compose – New features and bug fixes

Ask Gordon – Enhanced AI assistance improvements

Model Runner – Updated model support and performance optimizations

Note that the component list above may change in the future as we add or remove features. 

Redesigned Update Experience

We have completely re-imagined how Docker Desktop communicates updates to you:

Streamlined update flow with clearer messaging

In-app release highlights showcasing key improvements you actually care about

Reduced notification fatigue through more thoughtful update communications

[Coming soon] Smart timing – GUI-only updates happen automatically when you close and reopen Docker 

Full Control When You Need It

Individual User Control

Want to manage updates yourself? You have complete control:

Go to Docker Desktop Settings

Navigate to Software Updates

Toggle “Automatically update components” on or off

Software updates: a new setting lets you opt in or out of automatic component updates.

Enterprise Management

For Docker Business subscribers, administrators maintain full governance through the admin console:

Access Admin Console > Desktop Settings Management

Edit your global policy

Configure “Automatically update components” to enable, disable, lock, or set defaults for your entire organization

This ensures enterprises can maintain their preferred update policies while giving individual developers the productivity benefits of seamless updates.

Admin Console: the Desktop Settings Management policy contains a new silent update setting for enterprise control.

We Want Your Feedback

The redesigned update workflow is rolling out to the majority of our users as we gather feedback and refine the experience. We’re committed to getting this right, so please share your thoughts:

In-app feedback popup – we do read those!

Docker Slack community – join the conversation with other developers

GitHub issues – report specific bugs or feature requests

Getting Started

Docker Desktop 4.46 with silent component updates is available now. The new update experience will gradually roll out to all users over the coming weeks.

Already using Docker Desktop? Update in-app to get the latest features. 

New to Docker? Download Docker Desktop here to experience the most seamless development environment we’ve ever built.

Source: https://blog.docker.com/feed/

Docker Model Runner General Availability

We’re excited to share that Docker Model Runner is now generally available (GA)! In April 2025, Docker introduced the first Beta release of Docker Model Runner, making it easy to manage, run, and distribute local AI models (specifically LLMs). Though only a short time has passed since then, the product has evolved rapidly, with continuous enhancements driving the product to a reliable level of maturity and stability.

This blog post takes a look back at the most important and widely appreciated capabilities Docker Model Runner brings to developers, and looks ahead to share what they can expect in the near future.

What is Docker Model Runner?

Docker Model Runner (DMR) is built for developers first, making it easy to pull, run, and distribute large language models (LLMs) directly from Docker Hub (in an OCI-compliant format) or HuggingFace (if models are available in the GGUF format, in which case they will be packaged as OCI Artifacts on-the-fly by the HuggingFace backend).

Tightly integrated with Docker Desktop and Docker Engine, DMR lets you serve models through OpenAI-compatible APIs, package GGUF files as OCI artifacts, and interact with them using either the command line, a graphical interface, or developer-friendly (REST) APIs.
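To give a feel for the OpenAI-compatible API, here's a minimal Python sketch that sends a chat request to a model served by DMR. It assumes host-side TCP access is enabled on the default port 12434 and that ai/gemma3 has already been pulled; check the DMR documentation for the exact endpoint paths in your setup.

Python

import requests

# Assumes Docker Model Runner with host-side TCP support enabled (default port 12434).
resp = requests.post(
    "http://localhost:12434/engines/v1/chat/completions",
    json={
        "model": "ai/gemma3",
        "messages": [
            {"role": "system", "content": "You are a concise assistant."},
            {"role": "user", "content": "In one sentence, what is a GGUF file?"},
        ],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])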

Whether you’re creating generative AI applications, experimenting with machine learning workflows, or embedding AI into your software development lifecycle, Docker Model Runner delivers a consistent, secure, and efficient way to work with AI models locally.

Check the official documentation to learn more about Docker Model Runner and its capabilities.

Why Docker Model Runner?

Docker Model Runner makes it easier for developers to experiment with and build AI applications, including agentic apps, using the same Docker commands and workflows they already use every day. No need to learn a new tool!

Unlike many new AI tools that introduce complexity or require additional approvals, Docker Model Runner fits cleanly into existing enterprise infrastructure. It runs within your current security and compliance boundaries, so teams don’t have to jump through hoops to adopt it.

Model Runner supports OCI-packaged models, allowing you to store and distribute models through any OCI-compatible registry, including Docker Hub. And for teams using Docker Hub, enterprise features like Registry Access Management (RAM) provide policy-based access controls to help enforce guardrails at scale.

11 Docker Model Runner Features Developers Love Most

Below are the features that stand out the most and have been highly valued by the community.

1. Powered by llama.cpp 

Currently, DMR is built on top of llama.cpp, which we plan to continue supporting. At the same time, DMR is designed with flexibility in mind, and support for additional inference engines (such as MLX or vLLM) is under consideration for future releases.

2. GPU acceleration across macOS and Windows platforms 

Harness the full power of your hardware with GPU support: Apple Silicon on macOS, NVIDIA GPUs on Windows, and even ARM/Qualcomm acceleration — all seamlessly managed through Docker Desktop.

3. Native Linux support 

Run DMR on Linux with Docker CE, making it ideal for automation, CI/CD pipelines, and production workflows.

4. CLI and UI experience 

Use DMR from the Docker CLI (on both Docker Desktop and Docker CE) or through Docker Desktop’s UI. The UI provides guided onboarding to help even first-time AI developers start serving models smoothly, with automatic handling of available resources (RAM, GPU, etc.).

Figure 1: Docker Model Runner works both in Docker Desktop and the CLI, letting you run models locally with the same familiar Docker commands and workflows you already know

5. Flexible model distribution 

Pull and push models from Docker Hub in OCI format, or pull directly from HuggingFace repositories hosting models in GGUF format for maximum flexibility in sourcing and sharing models.

6. Open Source and free 

DMR is fully open source and free for everyone, lowering the barrier to entry for developers experimenting with or building on AI.

7. Secure and controlled 

DMR runs in an isolated, controlled environment that doesn’t interfere with the main system or user data (sandboxing). Developers and IT admins can fine-tune security and availability by enabling/disabling DMR or configuring options like host-side TCP support and CORS.

8. Configurable inference settings 

Developers can customize context length and llama.cpp runtime flags to fit their use cases, with more configuration options coming soon.

9. Debugging support 

Built-in request/response tracing and inspect capabilities make it easier to understand token usage and framework/library behaviors, helping developers debug and optimize their applications.

Figure 2: Built-in tracing and inspect tools in Docker Desktop make debugging easier, giving developers clear visibility into token usage and framework behavior

10. Integrated with the Docker ecosystem 

DMR works out of the box with Docker Compose and is fully integrated with other Docker products, such as Docker Offload (cloud offload service) and Testcontainers, extending its reach into both local and distributed workflows.

11. Up-to-date model catalog 

Access a curated catalog of the most popular and powerful AI models on Docker Hub. These models can be pulled for free and used across development, pipelines, staging, or even production environments.

Figure 3: Curated model catalog on Docker Hub, packaged as OCI Artifacts and ready to run

The road ahead

The future is bright for Docker Model Runner, and the recent GA version is only the first milestone. Below are some of the future enhancements that you should expect to be released soon.

Streamlined User Experience 

Our goal is to make DMR simple and intuitive for developers to use and debug. This includes richer response rendering in the chat-like interface within Docker Desktop and the CLI, multimodal support in the UI (already available through the API), integration with MCP tools, and enhanced debugging features, alongside expanded configuration options for greater flexibility. Last but not least, we aim to provide smoother and more seamless integration with third-party tools and solutions across the AI ecosystem.

Enhancements and better ability to execute 

We remain focused on continuously improving DMR’s performance and flexibility for running local models. Upcoming enhancements include support for the most widely used inference libraries and engines, advanced configuration options at the engine and model level, and the ability to deploy Model Runner independently from Docker Engine for production-grade use cases, along with many more improvements on the horizon.

Frictionless Onboarding 

We want first-time AI developers to start building their applications right away, and to do so with the right foundations. To achieve this, we plan to make onboarding into DMR even more seamless. This will include a guided, step-by-step experience to help developers get started quickly, paired with a set of sample applications built on DMR. These samples will highlight real-world use cases and best practices, providing a smooth entry point for experimenting with and adopting DMR in everyday workflows.

Staying on Top of Model Launches

As we continue to enhance inference capabilities, we remain committed to maintaining a first-class catalog of AI models directly in Docker Hub, the leading registry for OCI artifacts, including models. Our goal is to ensure that new, relevant models are available in Docker Hub and runnable through DMR as soon as they are publicly released.

Conclusion

Docker Model Runner has come a long way in a short time, evolving from its Beta release into a mature and stable inference engine that’s now generally available. At its core, the mission has always been clear: make it simple, consistent, and secure for developers to pull, run, and serve AI models locally, using familiar Docker CLI commands and tools they already love!

Now is the perfect time to get started. If you haven’t already, install Docker Desktop and try out Docker Model Runner today. Follow the official documentation to explore its capabilities and see for yourself how DMR can accelerate your journey into building AI-powered applications.

Learn more

Read our quickstart guide to Docker Model Runner.

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Learn how Compose works with Model Runner, making it easier to build AI apps and agents

Learn how to build an AI tutor

Explore how to use both local and remote models in hybrid AI workflows 

Building AI agents made easy with Goose and Docker

Using Model Runner on Hugging Face

Powering AI generated testing with Docker Model Runner

Build a GenAI App With Java Using Spring AI and Docker Model Runner

Tool Calling with Local LLMs: A Practical Evaluation

Behind the scenes: How we designed Docker Model Runner and what’s next

Why Docker Chose OCI Artifacts for AI Model Packaging

What’s new with Docker Model Runner 

Publishing AI models to Docker Hub

How to Build and Run a GenAI ChatBot from Scratch using Docker Model Runner 

Source: https://blog.docker.com/feed/

Build and Distribute AI Agents and Workflows with cagent

cagent is a new open-source project from Docker that makes it simple to build, run, and share AI agents without writing a single line of code. Instead of wrangling Python versions and dependencies, you define your agent’s behavior, tools, and persona in a single YAML file, making it straightforward to create and share personalized AI assistants.

Figure 1: cagent is a powerful, easy to use, customizable multi-agent runtime that orchestrates AI agents with specialized capabilities and tools, and the interactions between agents.

cagent can use OCI registries to share and pull agents created by the community, so not only can you elegantly solve the agent creation problem, but also the agent distribution problem. 

Let’s dive into what makes cagent special and explore some real-world use cases.

What is cagent?

At its core, cagent is a command-line utility that runs AI agents defined in cagent.yaml files. The philosophy is simple: declare what you want your agent to do, and cagent handles the rest. 

There are a few features that you’ll probably like for authoring your agents. 

Declarative and Simple: Define models, instructions, and agent behavior in one YAML file. This “single artifact” approach makes agents portable, easy to version, and easy to share.

Flexible Model Support: You’re not tied to a specific provider. You can run remote models or even local ones using Docker Model Runner, ideal for privacy reasons. 

Powerful Tool Integration: cagent includes built-in tools for common tasks (like shell commands or filesystem access) and supports external tools via MCP, enabling agents to connect to virtually any API. 

Multi-Agent Systems: You’re also not limited to a single agent. cagent allows you to define a team of agents that can collaborate and delegate tasks to one another, with each agent having its own specialized skills and tools. 

Practical use cases for cagent

I’ve lived with cagent for a few weeks now, and in this article I want to share two practical agents that I actually use. 

A GitHub Task Tracker

Let’s start with a practical, developer-centric example. While tracking GitHub issues with AI might not be revolutionary, it’s surprisingly useful and demonstrates cagent’s capabilities in a real-world workflow. 

There’s no shortage of task tracking solutions to integrate with, but one of the most useful for developers is GitHub. We’ll use a repository in GitHub and issues on it as our to-do list. Does it have the best UX? It doesn’t actually matter; we’ll consume and create issues with AI, so the actual underlying UX is irrelevant. 

I have a GitHub repo: github.com/shelajev/todo, which has issues enabled, and we’d like an agent that can, among other things, create issues, list issues, and close issues. 

Figure 2

Here’s the YAML for a GitHub-based to-do list agent. The instructions for the agent were generated with the cagent new command, and then I refined them by asking Gemini to make them shorter. 

YAML

version: "2"

models:
  gpt:
    provider: openai
    model: gpt-5
    max_tokens: 64000

agents:
  root:
    model: gpt
    description: "GitHub Issue Manager – An agent that connects to GitHub to use a repo as a todo-list"
    instruction: |
      You are a to-do list agent, and your purpose is to help users manage their tasks in their "todo" GitHub repository.

      # Primary Responsibilities
      - Connect to the user's "todo" GitHub repository and fetch their to-do items, which are GitHub issues.
      - Identify and present the to-do items for the current day.
      - Provide clear summaries of each to-do item, including its priority and any labels.
      - Help the user organize and prioritize their tasks.
      - Assist with managing to-do items, for example, by adding comments or marking them as complete.

      # Key Behaviors
      - Always start by stating the current date to provide context for the day's tasks.
      - Focus on open to-do items.
      - Use labels such as "urgent," "high priority," etc., to highlight important tasks.
      - Summarize to-do items with their title, number, and any relevant labels.
      - Proactively suggest which tasks to tackle first based on their labels and context.
      - Offer to help with actions like adding notes to or closing tasks.

      # User Interaction Flow
      When the user asks about their to-do list:
      1. List the open items from the "todo" repository.
      2. Highlight any urgent or high-priority tasks.
      3. Offer to provide more details on a specific task or to help manage the list.
    add_date: true
    toolsets:
      - type: mcp
        command: docker
        args: [mcp, gateway, run]
        tools:
          [
            "get_me",
            "add_issue_comment",
            "create_issue",
            "get_issue",
            "list_issues",
            "search_issues",
            "update_issue",
          ]

It’s a good example of a well-crafted prompt that defines the agent’s persona, responsibilities, and behavior, ensuring it acts predictably and helpfully. The best part is that editing and running it is fast and frictionless: just save the YAML and run: 

cagent run github-todo.yaml

This development loop works without any IDE setup. I’ve done several iterations in Vim, all from the same terminal window where I was running the agent. 

This agent also uses a streamlined tools configuration. A lot of examples show adding MCP servers from the Docker MCP toolkit like this: 

toolsets:
  - type: mcp
    ref: docker:github-official

This would run the GitHub MCP server from the MCP catalog, but as a separate “toolkit” from your Docker Desktop’s MCP toolkit setup.

Using the manual command to connect to the MCP toolkit makes it easy to use OAuth login support in Docker Desktop. 

Figure 3

Also, the official GitHub MCP server is awfully verbose. Powerful, but verbose. So, for the issue-related agents, it makes a lot of sense to limit the list of tools exposed to the agent: 

tools:
  [
    "get_me",
    "add_issue_comment",
    "create_issue",
    "get_issue",
    "list_issues",
    "search_issues",
    "update_issue",
  ]

I made that list by running: 

docker mcp tools list | grep "issue"

and then asking an AI to format the output as an array. 

This todo-agent is available on Docker Hub, so running it is a single command away: 

cagent run docker.io/olegselajev241/github-todo:latest

Just enable the GitHub MCP server in the MCP Toolkit first, and make sure the repo exists.

The Advocu Captains Agent

At Docker, we use Advocu to track our Docker Captains, ambassadors who create content, speak at conferences, and engage with the community. Advocu records their details and contributions, such as blog posts, videos, and conference talks about Docker’s technologies.

Manually searching through Advocu is time-consuming. For a long time, I wondered: what if we could build an AI assistant to do it for us? 

My first attempt was to build a custom MCP server for our Advocu instance: https://github.com/shelajev/mcp-advocu

It’s largely “vibe-coded”, but in a nutshell, running

docker run -i --rm -e ADVOCU_CLIENT_SECRET=your-secret-here olegselajev241/mcp-advocu:stdio

will run the MCP server with tools that expose information about Docker Captains, allowing MCP clients to search through their submitted activities. 

Figure 4

However, sharing the actual agent, and especially the configuration required to run it, was a bit awkward. 

cagent solved this for me in a much neater way. Here is the complete cagent.yaml for my Advocu agent:

YAML

#!/usr/bin/env cagent run
version: "2"

agents:
  root:
    model: anthropic/claude-sonnet-4-0
    description: Agent to help with finding information on Docker Captains and their recent contributions to Docker
    toolsets:
      - type: mcp
        command: docker
        args:
          - run
          - -i
          - --rm
          - --env-file
          - ./.env
          - olegselajev241/mcp-advocu:stdio
    instruction: You have access to Advocu – a platform where Docker Captains log their contributions. You can use tools to query and process that information about captains themselves, and their activities like articles, videos, and conference sessions. You help the user to find relevant information and to connect to the captains by topic expertise, countries, and so on. And to have a hand on the pulse of their contributions, so you can summarize them or answer questions about activities and their content

With this file, we have a powerful, personalized assistant that can query Captain info, summarize their contributions, and find experts by topic. It’s a perfect example of how cagent can automate a specific internal workflow.

Users simply need to create a .env file with the appropriate secret. Even for less technical team members, I can give a shell one-liner to get them set up quickly. 

Now, everyone at Docker can ask questions about Docker captains without pinging the person running the program (hi, Eva!) or digging through giant spreadsheets. 

Figure 5

I’m also excited about the upcoming cagent 1Password integration, which will simplify the setup even more.  

All in all, agents are really just a combination of:

A system prompt

An integration with a model (ideally, the most efficient one that gets the job done)

And the right tools via MCP

With cagent, it’s incredibly easy to manage all three in a clean, Docker-native way. 

Get Started Today!

cagent empowers you to build your own fleet of AI assistants, tailored to your exact needs.

It’s a tool designed for developers who want to leverage the power of AI without getting bogged down in complexity.

You can get started right now by heading over to the cagent GitHub repository. Download the latest release and start building your first agent in minutes. 

Give the repository a star, try it out, and let us know what amazing agents you build!

Source: https://blog.docker.com/feed/

Docker and CNCF: Partnering to Power the Future of Open Source

At Docker, open source is not just something we support; it’s a core part of our culture. It’s part of our DNA. From foundational projects like Docker Compose (35.5k stars, 5.4k forks) and Moby (69.8k stars, 18.8k forks) to our continued code contributions, we remain committed to strengthening the open-source ecosystem.

Today, we are announcing a new milestone in that journey: an official partnership between Docker and the Cloud Native Computing Foundation (CNCF). This partnership brings more than just resources for open-source projects. It also reflects the CNCF’s recognition of Docker as the leading distribution platform for containerized software and as a trusted partner in modern software supply chain security.

“Docker’s mission has always been to empower developers, and we know that trust is earned through consistency, openness, and listening. This partnership with CNCF reflects a broader commitment to the open source community by helping maintainers grow their projects, reach more developers through Docker Hub, and deliver value to their communities faster with improved tools, automation, and support.”

Michael Donovan
VP Products, Docker

Why this Partnership Matters

This partnership reflects CNCF’s support of Docker as an industry leader and a strategic partner, trusted to deliver the scale, visibility, and security that today’s cloud-native ecosystem demands.

Docker Hub is the most widely used container registry in the world, serving over 22 billion image downloads per month and hosting more than 14 million images. For CNCF projects, using Docker is a natural choice, offering a trusted, reliable distribution platform with unmatched reach and adoption across the developer community.

“Docker was a founding member of CNCF, and we’ve maintained a long-term open collaboration over the past decade. This partnership marks a step forward for CNCF projects and we’re glad to work together to further secure the open source supply chain.”

Chris Aniszczyk
CTO, CNCF

For Docker, this partnership is a reinforcement of our commitment to the open source community.  We are also excited by the opportunity to deepen collaboration with the maintainers and developers building the future of cloud-native software. For maintainers, it’s an opportunity to gain access to premium infrastructure and support tailored to the needs of open-source projects.

Maintainers: Unlock Full Access to DSOS Benefits

Figure: Docker Captain James Spurin giving a talk on Docker.

Over the coming days, all CNCF projects will gain direct access to a dedicated bundle of Docker services through the Docker Sponsored Open Source (DSOS) program. Key benefits of the program include:

Unlimited image pulls

Sponsored OSS status for increased trust and discoverability

Access to Docker usage metrics and engagement insights

Streamlined support through Docker’s open-source channels

These benefits help you scale your project, grow your community, and ensure reliable access for your users.

“Docker Desktop has long been a key part of my Cloud Native workflows, and extending the Docker Sponsored Open Source Program to CNCF projects will be a game-changer for maintainers and contributors alike.”

James Spurin
Docker Captain & CNCF Ambassador

What the Partnership Offers CNCF Projects

Docker: Official CNCF Project Services Provider

As part of this collaboration, Docker will be listed as an official service provider on the CNCF Project Services page. This listing enhances the discoverability of Docker’s tools and services for CNCF maintainers, reinforcing Docker’s role as a trusted infrastructure partner. For projects, it means easier access to vetted, high-impact resources already recognized and recommended by the CNCF community.

Security with Docker Scout

CNCF projects now have unlimited access to Docker Scout, our image analysis and policy evaluation tool. Scout is a critical security layer aligned with modern supply chain practices, helping projects detect vulnerabilities, enforce policies, and maintain healthy, secure containers.

Automated Builds

CNCF projects can streamline their development pipelines with Docker autobuilds, enabling automated image creation directly from source code.

OSS Status

All participating projects receive a Sponsored OSS badge on Docker Hub, increasing trust and visibility among users.

Unlimited Image Pulls

DSOS members benefit from unrestricted public image pulls, ensuring reliable access for users and reducing friction for project adoption.

Docker Usage Metrics

Access to pull data and adoption metrics provides deeper visibility into community engagement and image usage trends.

Support and Communication Channels

DSOS projects receive priority support through Docker’s open-source outreach channels.

Reinforcing Docker’s Role in the Open-Source Supply Chain

Security and trust are foundational to sustainable open source. Docker’s continued investment in secure tooling, developer experience, and supply chain integrity reflects our long-term commitment to supporting the infrastructure that open-source projects and their users rely on. Through tools like Docker Scout, now available to all CNCF projects, Docker is helping maintainers adopt secure development practices in a way that integrates naturally into their existing workflows.

The recent launch of Docker Hardened Images (curated, security-enhanced base images) has also drawn strong interest from both the open-source community and enterprise users. 

By continuing to invest in security, reliability, and open collaboration, Docker aims to help the ecosystem move forward with confidence.

Moving Forward

This partnership with CNCF is more than a program expansion. It is a signal that Docker Hub is the preferred distribution platform for the projects that matter most in the cloud-native ecosystem. It enables us to collaborate more deeply with maintainers, deliver better tools, and ensure open-source infrastructure is built on a strong, secure foundation.

If you’re a CNCF maintainer, now is the time to make sure your project is fully supported.

Over the coming days, your project will feature the DSOS badge on Docker Hub. If it doesn’t, contact the CNCF Service Desk to get started. If you prefer not to join the DSOS program, you can use the same contact to opt out.

We’re proud to support the projects powering the modern internet, and we’re just getting started.

Learn More

Apply to the Docker Sponsored Open-Source Program

Learn about Docker’s Open Source tools

Read the CNCF blog about the partnership

Source: https://blog.docker.com/feed/

How to Build Secure AI Coding Agents with Cerebras and Docker Compose

In the recent article Building Isolated AI Code Environments with Cerebras and Docker Compose, our friends at Cerebras showed how to build a coding agent using Cerebras’ AI inference API, the world’s fastest, along with Docker Compose, ADK-Python, and MCP servers.

In this post, we’ll dive deeper into the underlying technologies and show how the pieces come together to build an AI agent environment that’s portable, secure, and fully containerized. You’ll learn how to create multi-agent systems, run some agents with local models in Docker Model Runner, and integrate custom tools as MCP servers into your AI agent’s workflow.

We’ll also touch on how to build a secure sandbox for executing the code your agent writes, an ideal use case for containers in real-world development. 

Getting Started

To begin, clone the repository from GitHub and navigate into the project directory.

Get the code for the agent: 

git clone https://github.com/dockersamples/docker-cerebras-demo && cd docker-cerebras-demo

Next, prepare the .env file to provide your Cerebras API key. You can get a key from the Cerebras Cloud platform.

# This copies the sample environment file to your local .env file
cp .env-sample .env

Now, open the .env file in your favorite editor and add your API key to the CEREBRAS_API_KEY line. Once that’s done, run the system using Docker Compose:

docker compose up --build

The first run may take a few minutes to pull the necessary Docker images and the AI model. Once it’s running, you can access the agent’s interface at http://localhost:8000. From there, you can interact with your agent and issue commands like “write code,” “initialize the sandbox environment,” or request specific tools like “cerebras, curl docker.com for me please.”

Understanding the Architecture

This demo follows the architecture from our Compose for Agents repository, which breaks down an agent into three core components:

The Agentic Loop: This is the main application logic that orchestrates the agent’s behavior. In our case, it’s an ADK-Python-based application. The ADK-Python framework also includes a visualizer that lets you inspect tool calls and trace how the system reached specific decisions.

The MCP Tools: These are the external tools the agent can use. We provide them securely via the Docker MCP Gateway. In this app, we use the context7 and node-code-sandbox MCP servers. 

The AI Model: You can define any local or remote AI model you want to use. Here, we’re using a local Qwen model to route between the local agent and the more powerful Cerebras agent, which uses the Cerebras API. 

Cerebras Cloud serves as a specialized, high-performance inference backend. It can run massive models, like a half-trillion parameter Qwen coder, at thousands of tokens per second. While our simple demo doesn’t require this level of speed, such performance is a game-changer for real-world applications.

Most of the prompts and responses are a few hundred tokens long, as they are simple commands to initialize a sandbox or write some JavaScript code in it. You’re welcome to make the agent work harder and see Cerebras’ performance on more verbose requests. 

For example, you can ask the Cerebras agent to write some JavaScript code and watch it call the MCP tools to read, write, and run the files, as shown in the screenshot below. 

Building a Custom Sandbox as an MCP Server

A key feature of this setup is the ability to create a secure sandbox for code execution. To do this, we’ll build a custom MCP server. In our example, we enable two MCP servers:

context7: This gives our agent access to the latest documentation for various application frameworks.

node-code-sandbox: This is our custom-made sandbox for executing the code our agent writes.

You can find the implementation of our Node.js sandbox server in the node-sandbox-mcp GitHub repository. It’s a Quarkus application written in Java that exposes itself as a stdio MCP server and uses the awesome Testcontainers library to create and manage the sandbox containers programmatically.

An important detail is that you have full control over the sandbox configuration. We start the container with a common Node.js development image and, as a crucial security measure, disable its networking. But since it’s a custom MCP server, you can enable any security measures you deem necessary. 

Here’s a snippet of the Testcontainers-java code used to create the container:

GenericContainer<?> sandboxContainer =
    new GenericContainer<>("mcr.microsoft.com/devcontainers/javascript-node:20")
        .withNetworkMode("none") // disable network!!
        .withWorkingDirectory("/workspace")
        .withCommand("sleep", "infinity");

sandboxContainer.start();

Testcontainers provides a flexible, idiomatic API to interact with the sandbox. Running a command or writing a file becomes a simple one-line method call:

// To execute a command inside the sandbox
sandbox.execInContainer(command);

// To write a file into the sandbox
sandbox.copyFileToContainer(Transferable.of(contents.getBytes()), filename);

The actual implementation has a bit more glue code for managing background processes or selecting the correct sandbox if you’ve created multiple, but these one-liners are the core of the interaction.

Packaging and Using the Custom Server

To use our custom server, we first need to package it as a Docker image. For Quarkus applications, a single command does the trick:

./mvnw package -DskipTests=true -Dquarkus.container-image.build=true

This command produces a local Docker image and outputs its name, something like:

[INFO] [io.quarkus.container.image.docker.deployment.DockerProcessor] Built container image shelajev/node-sandbox:1.0.0-SNAPSHOT

Since we’re running everything locally, we don’t even need to push this image to a remote registry. You can inspect this image in Docker Desktop and find its hash, which we’ll use in the next step.

Integrating the Sandbox via the MCP Gateway

With our custom MCP server image ready, it’s time to plug it into the MCP Gateway. We’ll create a custom catalog file (mcp-gateway-catalog.yaml) that enables both the standard context7 server and our new node-code-sandbox.

Currently, creating this file is a manual process, but we’re working on simplifying it. The result is a portable catalog file that mixes standard and custom MCP servers.

Notice two key things in the configuration for the node-code-sandbox MCP server in the catalog:

longLived: true: This tells the gateway that our server needs to persist between the tool calls to track the sandbox’s state. 

image:: We reference the specific Docker image using its sha256 hash to ensure reproducibility.

If you’re building the custom server for the sandbox MCP, you can replace the image reference with the one your build step produced. 

longLived: true
image: olegselajev241/node-sandbox@sha256:44437d5b61b6f324d3bb10c222ac43df9a5b52df9b66d97a89f6e0f8d8899f67

Finally, we update our docker-compose.yml to mount this catalog file and enable both servers:

mcp-gateway:
  # mcp-gateway secures your MCP servers
  image: docker/mcp-gateway:latest
  use_api_socket: true
  command:
    - --transport=sse
    # add any MCP servers you want to use
    - --servers=context7,node-code-sandbox
    - --catalog=/mcp-gateway-catalog.yaml
  volumes:
    - ./mcp-gateway-catalog.yaml:/mcp-gateway-catalog.yaml:ro

When you run docker compose up, the gateway starts, which in turn starts our node-sandbox MCP server. When the agent requests a sandbox, a third container is launched – the actual isolated environment. 

You can use tools like Docker Desktop to inspect all running containers, view files, or even open a shell for debugging.

The Security Benefits of Containerized Sandboxes 

This containerized sandbox approach is a significant security win. Containers provide a well-understood security boundary with a smaller vulnerability profile than running random internet code on your host machine, and you can harden them as needed.

Remember how we disabled networking in the sandbox container? This means any code the agent generates cannot leak local secrets or data to the internet. If you ask the agent to run code that tries to access, for example, google.com, it will fail.

This demonstrates a key advantage: granular control. While the sandbox is cut off from the network, other tools are not. The context7 MCP server can still access the internet to fetch documentation, allowing the agent to write better code without compromising the security of the execution environment.

Oh, and a neat detail: when you stop the containers managed by Compose, it also kills the sandbox MCP server, which in turn triggers Testcontainers to clean up all the sandbox containers, just like it cleans up after a typical test run. 

Next Steps and Extensibility

This coding agent is a great starting point, but it isn’t production-ready. For a real-world application, you might want to grant controlled access to resources like the npm registry. You could, for example, achieve this by mapping your local npm cache from the host system into the sandbox. This way, you, the developer, control exactly which npm libraries are accessible.

Because the sandbox is a custom MCP server, the possibilities are endless. You can build it yourself, tweak it however you want, and integrate any tools or constraints you need.

Conclusion

In this post, we demonstrated how to build a secure and portable AI coding agent using Docker Compose and the MCP Toolkit. By creating a custom MCP server with Testcontainers, we built a sandboxed execution environment that offers granular security controls, like disabling network access, without limiting the agent’s other tools. We connected this coding agent to the Cerebras API, giving us incredible inference speed.

This architecture provides a powerful and secure foundation for building your own AI agents. We encourage you to clone the repository and experiment with the code! You probably already have Docker, and you can sign up for a Cerebras API key here.
Source: https://blog.docker.com/feed/

MCP Security: A Developer’s Guide

Since its release by Anthropic in November 2024, Model Context Protocol (MCP) has gained massive adoption and is quickly becoming the connective tissue between AI agents and the tools, APIs, and data they act on. 

With just a few lines of configuration, an agent can search code, open tickets, query SaaS systems, or even deploy infrastructure. That kind of flexibility is powerful, but it also introduces new security challenges. In fact, security researchers analyzing the MCP ecosystem found command injection flaws affecting 43% of analyzed servers. A single misconfigured or malicious server can exfiltrate secrets, trigger unsafe actions, or quietly change how an agent behaves. 

This guide is for developers and platform teams building with agents. We’ll unpack what makes MCP workflows uniquely risky for AI infrastructure, highlight common missteps like prompt injection or shadow tooling, and show how secure defaults, like containerized MCP servers and policy-based gateways, can help you govern every tool call without slowing your AI roadmap.

What is MCP security?

Model Context Protocol is a standardized interface that enables AI agents to interact with external tools, databases, and services. MCP security refers to the controls and risks that govern how agents discover, connect to, and execute MCP servers. These risks span the entire development lifecycle and involve:

Supply chain: how servers are packaged, signed, versioned, and approved.

Runtime isolation: how they’re executed on the host vs. in containers, with CPU/memory/network limits.

Brokered access: how calls are mediated, logged, blocked, or transformed in real time.

Client trust: which tools a given IDE/agent is allowed to see and use.

Why does MCP security matter?

Securing MCP workflows has become more important than ever because AI agents blur the line between “code” and “runtime.” A prompt or tool description can change what your system is capable of without a code release. 

This means that security practices have to move up a layer, from static analysis to policy over agent‑tool interactions. Docker codifies that policy in a gateway and makes secure defaults practical for everyday developers.

Docker’s approach is to make MCP both easy and safe through containerized execution, a policy‑enforcing MCP Gateway, and a curated MCP Catalog & Toolkit that helps teams standardize what agents can do. If you’re building with agents, this guide will help you understand the risks, why traditional tools fall short, and how Docker reduces blast radius without slowing your AI roadmap.

Understanding MCP security risks

While MCP risks can show up in various ways across the dev lifecycle, there are specific categories they typically fall into. The section below highlights how these risks surface in real workflows, their impact, and practical guardrails that mitigate without slowing teams down. 

Misconfigurations & weak defaults

Running servers directly on the host with broad privileges or a persistent state.

Unrestricted network egress from tools to the public internet.

Unvetted catalogs/registries in client configs, exposing agents to unknown tools.

No audit trail for tool calls, making incidents hard to investigate and respond to.

Impact: Lateral movement, data exfiltration, and irreproducible behavior.

Mitigation: Always follow MCP server best practices such as leveraging containerization, applying resource and network limits, maintaining an allowlist of approved tools, and capturing call logs centrally.

Malicious or compromised servers (supply chain)

Typosquatting/poisoned images or unsigned builds.

Hidden side effects or altered tool metadata that nudges agents into risky actions.

Impact: Covert behavior change, credential theft, persistent access.

Mitigation: Require signature verification, pin versions/digests, and pull from curated sources such as the MCP Catalog & Toolkit.

Secret management failures

Plaintext credentials in environment variables, prompts, or tool arguments.

Leakage via tool outputs or model completions.

Impact: Account takeover, data loss.

Mitigation: Use managed secrets, minimize prompt exposure, and redact or block sensitive values at the broker.

Prompt injection & tool poisoning

Prompt injection: hostile content instructs the model to exfiltrate data or call dangerous tools.

Tool poisoning/shadowing: misleading tool descriptions or unexpected defaults that steer the agent.

Impact: Agents do the wrong thing, confidently.

Mitigation: Strict tool allowlists, pre/post‑call interceptors, and output filtering at the gateway. Docker’s MCP Gateway provides active security capabilities (signature checks, call logging, secret and network controls, interceptors).

What makes MCP security challenging?

Dynamic & non‑deterministic behavior: the same prompt may lead to different tool calls.

Instruction vs. data ambiguity: LLMs can treat content (including tool docs) as instructions.

Growing, shifting attack surface: every new tool expands what the agent can do instantly.

Traditional AppSec gaps: Static analysis tools don’t see agentic tool calls or MCP semantics; you need mediation between agents and tools, not just better prompts.

Implication for developers: You need a guardrail that lives at the agent–tool boundary, verifying what runs, brokering what’s allowed, and recording what happened.

How to prevent and mitigate MCP server security concerns

Use this practitioner checklist to raise the floor:

Containerize every MCP server. Run servers in containers (not on the host) with CPU/memory caps and a read-only filesystem where possible. Treat each server as untrusted code with the least privilege necessary. Why it helps: it limits blast radius and makes behavior reproducible.

Centralize enforcement at a gateway (broker). Place a policy-enforcing gateway between clients (IDE/agent) and servers. Use it to:

Verify signatures before running servers.

Maintain a tool allowlist (only approved servers are discoverable).

Apply network egress controls and secret redaction.

Log requests/responses for audit and incident response.

Govern secrets end-to-end. Store secrets in a managed system; avoid .env files. Prefer short-lived tokens. Sanitize prompts and tool outputs to reduce exposure.

Defend the prompt layer. Use pre-call interceptors (argument/type checks, safety classifiers) and post-call interceptors (redaction, PII scrub). Combine with strict tool scoping to reduce prompt-injection blast radius (see the sketch after this checklist).

Harden the supply chain. Pull servers from curated sources (e.g., the MCP Catalog & Toolkit), require signatures, and pin to immutable versions.

Monitor and rehearse. Alert on anomalous tool sequences (e.g., sudden credential access), and run tabletop exercises to rotate tokens and revoke access.
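For illustration, here's a minimal Python sketch of the kind of checks a pre-call interceptor and a post-call redactor apply, as referenced in the "Defend the prompt layer" item above. The function names, secret patterns, and allowlist handling are hypothetical, not the MCP Gateway's actual interceptor API; treat this as a conceptual sketch only.

Python

import re

SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),   # GitHub personal access tokens
    re.compile(r"AKIA[0-9A-Z]{16}"),      # AWS access key IDs
]

def pre_call_check(tool: str, args: dict, allowlist: set) -> None:
    """Reject calls to unapproved tools or arguments that appear to embed credentials."""
    if tool not in allowlist:
        raise PermissionError(f"tool '{tool}' is not on the allowlist")
    for value in map(str, args.values()):
        if any(p.search(value) for p in SECRET_PATTERNS):
            raise ValueError("argument appears to contain a credential")

def post_call_redact(output: str) -> str:
    """Scrub anything that looks like a secret from a tool's output before the model sees it."""
    for pattern in SECRET_PATTERNS:
        output = pattern.sub("[REDACTED]", output)
    return output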

How Docker makes MCP security practical

Turning MCP security from theory into practice means putting guardrails where agents meet tools and making trusted servers easy to adopt for agentic workflows. Docker’s MCP stack does both: Docker Gateway enforces policy and observability on every call, while the Docker MCP Catalog & Toolkit curates, verifies, and versions the servers your team can safely use.

Docker MCP Gateway: Your enforcement point

The gateway sits between clients and servers to provide verification, policy, and observability for every tool call. It supports active security measures like signature verification, call logging, secret and network controls, and pre/post-interceptors so you can block or transform risky actions before they reach your systems. 

Learn more in Docker MCP Gateway: Unified, Secure Infrastructure for Agentic AI and the Gateway Active Security documentation.

Docker MCP Catalog & Toolkit: Curation and convenience

Use the MCP Catalog & Toolkit to standardize the servers your organization trusts. The catalog helps reduce supply‑chain risk (publisher verification, versioning, provenance) and makes it straightforward for developers to pull approved tools into their workflow. With a growing selection of 150+ curated MCP servers, MCP Catalog is a safe and easy way to get started with MCP.

Looking for a broader view of how Docker helps with AI development? Check out Docker for AI.

Putting it all Together: A practical flow

Choose servers from the Catalog and pin them by digest.

Register servers with the Gateway so clients only see approved tooling.

Enable active security: verify signatures, log all calls, redact/deny secrets, and restrict egress.

Add pre/post interceptors: validate arguments (before), redact/normalize outputs (after).

Monitor and tune: review call logs, alert on anomalies, rotate secrets, and update allowlists as new tools are introduced.

Conclusion

MCP unlocks powerful agentic workflows but also introduces new classes of risk, from prompt injection to tool poisoning and supply‑chain tampering. MCP security isn’t just better prompts; it’s secure packaging, verified distribution, and a brokered runtime with policy.

Key takeaways

Treat MCP as a governed toolchain, not just an SDK.

Put a policy gateway between agents and tools to verify, mediate, and observe.

Pull servers from the MCP Catalog & Toolkit and pin versions/digests.

Use active security features such as signature checks, interceptors, logging, and secret/network controls to reduce blast radius.

Learn more

Browse the MCP Catalog: Discover 200+ containerized, security-hardened MCP servers

Download the MCP Toolkit in Docker Desktop: Get immediate access to secure credential management and container isolation

Submit Your Server: Help build the secure, containerized MCP ecosystem. Check our submission guidelines for more.

Follow Our Progress: Star our repository for the latest security updates and threat intelligence
Source: https://blog.docker.com/feed/

The Nine Rules of AI PoC Success: How to Build Demos That Actually Ship

That study claiming “95% of AI POCs fail” has been making the rounds. It’s clickbait nonsense, and frankly, it’s not helping anyone. The real number? Nobody knows, because nobody’s tracking it properly. But here’s what I do know after years of watching teams build AI systems: the study masks a much more important problem.

Teams are confused about how to design POCs that survive beyond the demo stage. There is no playbook.

Most AI POCs die because they were designed to die. They’re built as disposable demos, optimized for executive presentations rather than production reality. They burn through cloud credits, rely on perfect conditions and perfectly structured data, and quickly collapse when real users start to touch them. If they survive that, they often collapse later under scale, when design problems emerge under strain and lead to more serious failures.

But it doesn’t have to be this way. 

After watching hundreds of AI projects at Docker and beyond, I’ve seen the patterns that separate the 5% that make it from the 95% that don’t. Here’s the playbook I wish every platform and MLOps team had from day one.

The New Foundation: Remocal Workflows

Before we dive into the rules, let’s talk about the biggest shift in how successful teams approach AI development: remocal workflows (remote + local).

Running AI locally isn’t just about saving money—though it absolutely does that. It’s about maintaining developer velocity and avoiding the demo theater trap. Here’s how the best teams structure their work:

Test locally on laptops for fast iteration. No waiting for cloud resources, no surprise bills, no network latency killing your flow. Building with AI should feel interactive.

Burst to remote resources for scale testing, production-like validation, or when you actually need those H100s. It should feel easy to move AI workloads around.

Keep costs transparent from day one. You know exactly what each experiment costs because you’re only paying for remote compute when you choose to.

POCs that incorporate this pattern from day zero avoid both runaway bills and the classic “it worked in the demo” disaster. They’re grounded in reality because they’re built with production constraints baked in.
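Here's a minimal sketch of what the remocal switch can look like in code, assuming both endpoints expose an OpenAI-compatible chat API; the URLs and environment variable names are illustrative, not any product's defaults.

Python

import os
import requests

# Local-first by default; burst to the remote endpoint only when explicitly requested.
LOCAL_URL = "http://localhost:12434/engines/v1/chat/completions"  # e.g. a local model server
REMOTE_URL = os.environ.get("REMOTE_URL", "https://inference.example.com/v1/chat/completions")

def complete(prompt: str, remote: bool = False) -> str:
    url = REMOTE_URL if remote else LOCAL_URL
    headers = {}
    if remote:  # remote calls cost money and need credentials; make that explicit
        headers["Authorization"] = f"Bearer {os.environ['REMOTE_API_KEY']}"
    resp = requests.post(url, json={
        "model": os.environ.get("MODEL", "ai/gemma3"),
        "messages": [{"role": "user", "content": prompt}],
    }, headers=headers, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Fast local iteration by default; opt in to remote scale testing deliberately.
print(complete("Summarize our POC's success criteria in one sentence."))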

The Nine Rules of POC Survival

1. Start Small, Stay Small

Your first instinct is wrong. You don’t need the biggest model, the complete dataset, or every possible feature. Bite-sized everything: models that fit on a laptop, datasets you can actually inspect, and scope narrow enough that you can explain the value in one sentence.

Early wins compound trust. A small thing that works beats a big thing that might work.

2. Design for Production from Day Zero

Logging, monitoring, versioning, and guardrails aren’t “nice to haves” you add later. They’re the foundation that determines whether your POC can grow up to be a real system.

If your POC doesn’t have structured logging and basic metrics (observability) from the first commit, you’re building a disposable demo, not a prototype of a production system.

3. Optimize for Repeatability and Model Improvement, Not Novelty

Infrastructure should be templated. Prompt testing should be in CI/CD. Model comparisons should be apples-to-apples benchmarks, not “it felt better this time.” What’s more, POC designs can and should assume existing model families will continue to rapidly improve: larger context windows, greater accuracy, lower latency, and lower resource consumption.

The sexiest part of AI isn’t the novel algorithm—it’s how we’re learning to frame problems in ways that make AI more reliable at scale.
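For example, a prompt regression check in CI can be a single request against a locally served model. This is only a sketch: it assumes llama-server is running on its default port 8080 and that your assertion is a known-good substring:

# Fail the pipeline if the model stops producing the expected answer
curl -sf http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Reply with exactly the word OK"}]}' \
  | grep -q "OK" || { echo "prompt regression failed" >&2; exit 1; }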

4. Think in Feedback Loops

This is the big one that separates amateur hour from production-ready systems. Separate your non-deterministic AI components from your deterministic business logic. Build in layers of control and validation. Domain knowledge is still your magic ingredient.

In a remocal setup, this becomes natural: your agent loops can run locally for fast iteration, while tool execution and heavy compute burst to remote resources only when needed. You get reliability from layered control, not from hoping your model has a good day.

5. Solve Pain, Don’t Impress

Anchor everything to measurable business pain. Real users with real problems they’re willing to pay to solve. If your POC’s main value proposition is “look how cool this is,” you’re building the wrong thing.

Kill the vanity demos that only look good in slideware. Build the boring solutions that save people actual time and money.

6. Embed Cost and Risk Awareness Early

Track unit economics from day one. What does each request cost? Each user? Each workflow?

Benchmark small vs. large models. Cloud vs. local execution. Know your trade-offs with real numbers, not hand-waving about “cloud scale.”
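One way to get those numbers is llama.cpp’s bundled llama-bench tool. The sketch below assumes you already have the GGUF files on disk, and the model paths are placeholders:

# Identical prompt-processing (-p) and generation (-n) workloads for both
# models, so throughput numbers are directly comparable
llama-bench -m ./models/smollm2-135m-q4_0.gguf -p 512 -n 128
llama-bench -m ./models/gemma3-27b-q4_k_m.gguf -p 512 -n 128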

7. Make Ownership Clear

Who owns this thing when it breaks at 2 AM? What are the SLAs? Who’s responsible for retraining the model? Who pays for the compute?

Don’t let POCs drift in the organizational void between research labs and operations teams. Assign owners, responsibilities, and lifecycle management from day one.

8. Control Costs Upfront

Transparent cost per request, user, and workflow. Hard budget caps and kill switches. No surprises in the monthly cloud bill.

Remocal workflows make this natural: you default to local execution and only burst remote when you consciously choose to spend money. Your costs are predictable because they’re intentional.

9. Involve Users From Day Zero

Co-design with real users, not executives who saw a ChatGPT demo and want “AI for everything.” Measure adoption and time saved, not just accuracy scores.

The best AI POCs feel like natural extensions of existing workflows because they were built with the people who actually do the work.

Why This Actually Matters

Most failed AI POCs never had a chance. They were too big, too expensive, too disconnected from real problems, and too optimized for demo day rather than daily use.

By flipping the script—starting small, designing for production, involving real users, and building on remocal workflows—you dramatically increase your odds of building something that ships and scales.

The difference between a successful AI POC and a failed one isn’t the sophistication of the model. It’s the boring engineering decisions you make on day zero.

Stop treating AI POCs as disposable demos. Treat them as the first draft of a production system.

Jim Clark is Principal Engineer for AI at Docker, where he helps teams build AI systems that actually make it to production. He’s spent the last decade watching the gap between AI demos and AI products, and occasionally bridging it.

Source: https://blog.docker.com/feed/

From Hallucinations to Prompt Injection: Securing AI Workflows at Runtime

How developers are embedding runtime security to safely build with AI agents

Introduction: When AI Workflows Become Attack Surfaces

The AI tools we use today are powerful, but also unpredictable and exploitable.

You prompt an LLM and it generates a Dockerfile. It looks correct. A shell script? Reasonable. You run it in dev. Then something breaks: a volume is deleted. A credential leaks into a log. An outbound request hits a production API. Nothing in your CI pipeline flagged it, because the risk only became real at runtime.

This is the new reality of AI-native development: fast-moving code, uncertain behavior, and an expanding attack surface.

Hallucinations in LLM output are only part of the story. As developers build increasingly autonomous agentic tools, they’re also exposed to prompt injection, jailbreaks, and deliberate misuse of model outputs by adversaries. A malicious user, through a cleverly crafted input, can hijack an AI agent and cause it to modify files, exfiltrate secrets, or run unauthorized commands.

In one recent case, a developer ran an LLM-generated script that silently deleted a production database, an issue that went undetected until customer data was already lost. In another, an internal AI assistant was prompted to upload sensitive internal documents to an external file-sharing site, triggered entirely through user input.

These failures weren’t caught in static analysis, code review, or CI. They surfaced only when the code ran.

In this post, we’ll explore how developers are addressing both accidental failures and intentional threats by shifting runtime security into the development loop, embedding observability, policy enforcement, and threat detection directly into their workflows using Docker.

The Hidden Risks of AI-Generated Code

LLMs and AI agents are great at generating text, but they don’t always know what they’re doing. Whether you’re using GitHub Copilot, LangChain, or building with OpenAI APIs, your generated outputs might include:

Shell scripts that escalate privileges or misconfigure file systems

Dockerfiles that expose unnecessary ports or install outdated packages

Infra-as-code templates that connect to production services by default

Hardcoded credentials or tokens hidden deep in the output

Command sequences that behave differently depending on the context

The problem is compounded when teams start running autonomous agents, AI tools designed to take actions, not just suggest code. These agents can:

Execute file writes and deletions

Make outbound API calls

Spin up or destroy containers

Alter configuration state mid-execution

Execute dangerous database queries

These risks only surface at runtime, after your build has passed and your pipeline has shipped. And that’s a problem developers are increasingly solving inside the dev loop.

Why Runtime Security Belongs in the Developer Workflow

Traditional security tooling focuses on build-time checks: SAST, SCA, linters, compliance scanners. These are essential, but they don’t protect you from what AI-generated agents do at execution time.

Developers need runtime security that fits their workflow, not a blocker added later.

What runtime security enables:

Live detection of dangerous system calls or file access

Policy enforcement when an agent attempts unauthorized actions

Observability into AI-generated code behavior in real environments

Isolation of high-risk executions in containerized sandboxes

Why it matters:

Faster feedback loops: See issues before your CI/CD fails

Reduced incident risk: Catch privilege escalation, data exposure, or network calls early

Higher confidence: Ship LLM-generated code without guesswork

Secure experimentation: Enable safe iteration without slowing down teams

Developer ROI: Catching a misconfigured agent in dev avoids hours of triage and reduces production and reputational risk, saving time and cost and limiting compliance exposure.

Building Safer AI Workflows with Docker

Docker provides the building blocks to develop, test, and secure modern agentic applications:

Docker Desktop gives you an isolated, local runtime for testing unsafe code

Docker Hardened Images provide secure, minimal, production-ready base images

Docker Scout scans container images for vulnerabilities and misconfigurations

Runtime policy enforcement (with upcoming MCP Defender integration) provides live detection and guardrails while code executes

Step-by-Step: Safely Test AI-Generated Scripts

1. Run your agent or script in a hardened container

docker run --rm -it \
  --security-opt seccomp=default.json \
  --cap-drop=ALL \
  -v "$(pwd)":/workspace \
  python:3.11-slim

Applies syscall restrictions and drops unnecessary capabilities

Discards container state on exit (--rm); add :ro to the bind mount if the script shouldn’t modify your workspace

Enables safe, repeatable testing of LLM output
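Putting it together, a concrete invocation might look like this, where generated_agent.py stands in for whatever your agent produced and default.json for the seccomp profile you maintain:

docker run --rm \
  --security-opt seccomp=default.json \
  --cap-drop=ALL \
  -v "$(pwd)":/workspace:ro \
  python:3.11-slim \
  python /workspace/generated_agent.py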

2. Scan the container with Docker Scout

docker scout cves my-agent:latest

Surfaces known CVEs and outdated dependencies

Detects unsafe base images or misconfigured package installs

Available both locally and inside CI/CD workflows
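If you want the scan to fail the build only on serious findings, severity filtering can help. The flags below are a sketch, so confirm them against docker scout cves --help for your version:

# Exit non-zero only when critical or high CVEs are present
docker scout cves --only-severity critical,high --exit-code my-agent:latest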

3. Add runtime policy (beta) to block unsafe behavior

scout policy add deny-external-network \
  --rule "deny outbound to *"

This would catch an AI agent that unknowingly makes an outbound request to an internal system, third-party API, or external data store.

Note: Runtime policy enforcement in Docker Scout is currently in development. CLI and behavior may change upon release.

Best Practices for Securing AI Agent Containers

Use slim, verified base images: Minimizes attack surface and dependency drift

Avoid downloading from unverified sources: Prevents LLMs from introducing shadow dependencies

Use .dockerignore and secrets management: Keeps secrets out of containers

Run containers with dropped capabilities: Limits impact of unexpected commands

Apply runtime seccomp profiles: Enforces syscall-level sandboxing

Log agent behavior for analysis: Builds observability into experimentation
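Several of these practices can be combined in a single invocation. The following is a minimal sketch rather than a prescription, with default.json again standing in for the seccomp profile you maintain:

docker run --rm \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  --security-opt seccomp=default.json \
  --read-only \
  python:3.11-slim \
  python -c "print('sandboxed')"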

Integrating Into Your Cloud-Native Workflow

Runtime security for AI tools isn’t just for local testing; it fits cleanly into cloud-native and CI/CD workflows too.

GitHub Actions Integration Example:

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build container
        run: docker build -t my-agent:latest .
      - name: Scan for CVEs
        run: docker scout cves my-agent:latest

Works across environments:

Local dev via Docker Desktop

Remote CI/CD via GitHub Actions, GitLab, Jenkins

Kubernetes staging environments with policy enforcement and agent isolation

Cloud Development Environments (CDEs) with Docker + secure agent sandboxes

Dev teams using ephemeral workspaces and Docker containers in cloud IDEs or CDEs can now enforce the same policies across local and cloud environments.

Real-World Example: AI-Generated Infra Gone Wrong

A platform team uses an LLM agent to auto-generate Kubernetes deployment templates. A developer reviews the YAML and merges it. The agent-generated config opens an internal-only service to the internet via LoadBalancer. The CI pipeline passes. The deploy works. But a customer database is now exposed.

Had the developer run this template inside a containerized sandbox with outbound policy rules, the attempt to expose the service would have triggered an alert, and the policy would have prevented escalation.

Lesson: You can’t rely on static review alone. You need to see what AI-generated code does, not just what it looks like.
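One cheap, automated complement to the runtime policy described above (not a replacement for actually watching the code run) is a pre-merge guardrail on the generated manifests themselves. The generated-manifests/ directory below is a hypothetical layout:

# Flag agent-generated manifests that expose a Service via LoadBalancer
if grep -rl "type: LoadBalancer" generated-manifests/; then
  echo "WARNING: externally exposed Service found in generated manifests" >&2
  exit 1
fi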

Why This Matters: Secure-by-Default for AI-Native Dev Teams

As LLM-powered tools evolve from suggestion to action, runtime safety becomes a baseline requirement, not an optional add-on.

The future of secure AI development starts in the inner loop, with runtime policies, observability, and smart defaults that don’t slow you down.

Docker’s platform gives you:

Developer-first workflows with built-in security

Runtime enforcement to catch AI mistakes early

Toolchain integration across build, test, deploy

Cloud-native flexibility across local dev, CI/CD, and CDEs

Whether you’re building AI-powered automations, agent-based platforms, or tools that generate infrastructure, you need a runtime layer that sees what AI can’t, and blocks what it shouldn’t do.

What’s Next

Runtime protection is moving left, into your dev environment. With Docker, developers can:

Run LLM-generated code in secure, ephemeral containers

Observe runtime behavior before pushing to CI

Enforce policies that prevent high-risk actions

Reduce the risk of silent security failures in AI-powered apps

Docker is working to bring MCP Defender into our platform to provide this protection out-of-the-box, so hallucinations don’t turn into incidents.

Ready to Secure Your AI Workflow?

Sign up for early access to Docker’s runtime security capabilities

Watch our Tech Talk on “Building Safe AI Agents with Docker”

Explore Docker Scout for real-time vulnerability insights

Join the community conversation on Docker Community Slack or GitHub Discussions

Let’s build fast, and safely.

Source: https://blog.docker.com/feed/

Docker Acquisition of MCP Defender Helps Meet Challenges of Securing the Agentic Future

Docker, Inc.®, a provider of cloud-native and AI-native development tools, infrastructure, and services, today announced the acquisition of MCP Defender, a company founded to secure AI applications.

The rapid evolution of AI, from simple generative models to powerful agentic tools, has transformed software development in extraordinary ways. But as with all powerful technologies, new capabilities bring new security challenges. We recently highlighted critical MCP security issues on the Docker blog, emphasizing how essential it is to secure our emerging AI infrastructure. Building on that discussion, we want to offer our perspective on the current state of AI security, outline its trajectory, consider what this means for organizations developing AI agents and tools, and explore Docker’s vision for securely empowering these new AI workloads.

Today’s AI security landscape mirrors the early days of container adoption: rapid innovation, widespread enthusiasm, but significant uncertainty around risks. AI agents now routinely execute critical tasks – from automated code generation and system administration to customer interaction – often interfacing directly with sensitive data and critical infrastructure. The security stakes have never been higher.

Looking ahead, securing AI infrastructure will require a significant shift towards runtime monitoring, real-time threat detection, and continuous security evaluation. Organizations will increasingly adopt tools designed specifically to detect and respond dynamically to threats occurring at runtime. Instead of relying solely on preventative measures, security strategies will embrace active monitoring and intelligent automation.

For companies developing AI agents and MCP tools, these security shifts are profound. Security can no longer be a late-stage consideration; it must be embedded from the earliest design phase. Security tooling must transparently enforce policies, providing clear guardrails that reduce the cognitive load on development teams. Security for AI agents should be frictionless – built seamlessly into the workflows developers already use every day.

Docker’s mission has always been to simplify application development while ensuring security and portability. Extending that mission to agentic AI means integrating security deeply into the infrastructure itself. Docker’s vision is clear: secure-by-default AI infrastructure where every interaction is automatically verified, every threat proactively detected, and every policy transparently enforced. Docker’s commitment to security extends beyond AI, with products such as Docker Scout and Docker Hardened Images.

Source: https://blog.docker.com/feed/