Compose your way with Provider services!

With the release of Docker Compose v2.36.0, we’re excited to introduce a powerful new feature: provider services. This extension point opens up Docker Compose to interact not only with containers but also with any kind of external system, all while keeping the familiar Compose file at the center of the workflow.

In this blog post, we’ll walk through what provider services are, how developers can use them to streamline their workflows, how the provider system works behind the scenes, and how you can build your own provider to extend Compose for your platform needs.

Why Provider Services Are a Game-Changer

Docker Compose has long been a favorite among developers for orchestrating multi-container applications in a simple and declarative way. But as development environments have become more complex, the need to integrate non-container dependencies has become a common challenge. Applications often rely on managed databases, SaaS APIs, cloud-hosted message queues, VPN tunnels, or LLM inference engines — all of which traditionally sit outside the scope of Compose.

Developers have had to resort to shell scripts, Makefiles, or wrapper CLIs to manage these external components, fragmenting the developer experience and making it harder to onboard new contributors or maintain consistent workflows across teams.

Provider services change that. By introducing a native extension point into Compose, developers can now define and manage external resources directly in their compose.yaml. Compose delegates their lifecycle to the provider binary, coordinating with it as part of its own service lifecycle.

This makes Docker Compose a more complete solution for full-stack, platform-aware development — from local environments to hybrid or remote setups.

Using a Provider Service in Your Compose File

Provider services are declared like any other Compose service, but instead of specifying an image, you specify a provider with a type, and optionally some options. The type must correspond to the name of a binary available in your $PATH that implements the Compose provider specification.

As an example, we will use the Telepresence provider plugin, which routes Kubernetes traffic to a local service for live cloud debugging. This is especially useful for testing how a local service behaves when integrated into a real cluster:
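A sketch of what such a service could look like (reconstructed from the command Compose issues later in this post; the exact option keys are defined by the compose-telepresence plugin, so treat them as illustrative):

services:
  api:
    build: .
    ports:
      - "5732:5732"

  dev-api:
    provider:
      type: compose-telepresence
      options:
        name: api
        port: "5732:api-80"
        namespace: avatars
        service: api
    depends_on:
      # illustrative: make sure the local service is up before the intercept is created
      - api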

In this setup, when you run docker compose up, Compose will call the compose-telepresence plugin binary. The plugin performs the following actions:

Up Action:

Check if the Telepresence traffic manager is installed in the Kubernetes cluster, and install it if needed.

Establish an intercept to re-route traffic from the specified Kubernetes service to the local service.

Down Action:

Remove the previously established intercept.

Uninstall the Telepresence traffic manager from the cluster.

Quit the active Telepresence session.

The structure and content of the options field are specific to each provider. It is up to the plugin author to define and document the expected keys and values. If you’re unsure how to properly configure your provider service in your Compose file, the Compose Language Server (LSP) can guide you step by step with inline suggestions and validation.

You can find more usage examples and supported workflows in the official documentation: https://docs.docker.com/compose/how-tos/provider-services/

How Provider Services Work Behind the Scenes

Under the hood, when Compose encounters a service using the provider key, it looks for an executable in the user’s $PATH matching the provider type name (e.g., the docker-model CLI plugin or compose-telepresence). Compose then spawns the binary and passes the service options as flags, allowing the provider to receive all required configuration via command-line arguments.

The binary must respond to JSON-formatted requests on stdin and return structured JSON responses on stdout.

Here’s a diagram illustrating the interaction:

Communication with Compose

Compose sends all the necessary information to the provider binary by transforming the options attributes into flags. It also passes the project and the service name. If we look at the compose-telepresence provider example, on the up command Compose will execute the following command:

$ compose-telepresence compose --project-name my-project up --name api --port 5732:api-80 --namespace avatars --service api dev-api

On the other side, providers can also send runtime messages to Compose:

info: Reports status updates. Displayed in Compose’s logs.

error: Reports an error. Displayed as the failure reason.

setenv: Exposes environment variables to dependent services.

debug: Debug messages displayed only when running Compose with --verbose.

This flexible protocol makes it easy to add new types and build rich provider integrations.
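For illustration, each message is a small JSON object written on its own line to the provider’s stdout; a sketch of what that stream might look like (field names follow the current spec, but double-check them against the protocol documentation):

{"type": "info", "message": "Installing Telepresence traffic manager"}
{"type": "setenv", "message": "API_URL=http://localhost:5732"}
{"type": "error", "message": "intercept failed: service api not found in namespace avatars"}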

Refer to the official protocol spec for detailed structure and examples.

Building Your Own Provider Plugin

The real power of provider services lies in their extensibility. You can write your own plugin, in any language, as long as it adheres to the protocol.

A typical provider binary implements logic to handle a compose command with up and down subcommands.

The source code of compose-telepresence-plugin will be a good starting point. This plugin is implemented in Go and wraps the Telepresence CLI to bridge a local dev container with a remote Kubernetes service.

At the heart of the plugin is its up implementation, which wraps the Telepresence CLI using the options received from Compose.
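The real code lives in the plugin’s repository; as a rough illustration of the shape of such a handler, here is a simplified Go sketch (the helper names, flags, and message format are assumptions based on the protocol described above, not the plugin’s actual source):

package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// message mirrors the JSON lines the provider writes to stdout for Compose.
type message struct {
	Type    string `json:"type"`
	Message string `json:"message"`
}

func emit(msgType, text string) {
	b, _ := json.Marshal(message{Type: msgType, Message: text})
	fmt.Fprintln(os.Stdout, string(b))
}

// up establishes a Telepresence intercept using the values Compose passed as flags.
// The telepresence CLI flags shown are the common ones; exact usage depends on your version.
func up(name, namespace, port string) error {
	emit("info", "Creating Telepresence intercept for "+name)
	cmd := exec.Command("telepresence", "intercept", name,
		"--namespace", namespace, "--port", port)
	if out, err := cmd.CombinedOutput(); err != nil {
		emit("error", fmt.Sprintf("intercept failed: %v: %s", err, out))
		return err
	}
	emit("info", "Intercept established")
	return nil
}

func main() {
	// Illustrative invocation; in the real plugin these values come from
	// the --name, --namespace, and --port flags Compose passes.
	if err := up("api", "avatars", "5732:api-80"); err != nil {
		os.Exit(1)
	}
}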

This method is triggered when docker compose up is run, and it starts the service by calling the Telepresence CLI based on the received options.

To build your own provider:

Read the full extension protocol spec

Parse the options passed as flags to collect the configuration your provider needs

Implement the expected JSON response handling over stdout

Add debug messages so you have as much detail as possible during your implementation phase

Compile your binary and place it in your $PATH

Reference it in your Compose file using provider.type

You can build anything from service emulators to remote cloud service starters. Compose will automatically invoke your binary as needed.

What’s Next?

Provider services will continue to evolve, and future enhancements will be guided by real-world feedback from users to ensure provider services grow in the most useful and impactful directions.

Looking forward, we envision a future where Compose can serve as a declarative hub for full-stack dev environments, including containers, local tooling, remote services, and AI runtimes.

Whether you’re connecting to a cloud-hosted database, launching a tunnel, or orchestrating machine learning inference, Compose provider services give you a native way to extend your dev environment: no wrappers, no hacks.

Let us know what kind of providers you’d like to build or see added. We can’t wait to see how the community takes this further.

Stay tuned and happy coding!

Source: https://blog.docker.com/feed/

Introducing Docker Hub MCP Server: A New Way to Discover, Inspect, and Manage Container Images

Docker Hub has become an essential resource for developers worldwide, serving 11 billion monthly image downloads and hosting over 14 million container images. As agentic AI proliferates, we’re seeing a significant shift in how developers work. More and more developers are leveraging MCP (Model Context Protocol) servers to power their AI agents and automated workflows. These agentic systems can connect to real data and provide intelligent recommendations, but they need access to comprehensive, structured information to be truly effective.

Recognizing this shift, we’re adapting to meet developers where they are. Today, we’re excited to introduce the Docker Hub MCP Server, a tool that bridges the gap between Docker Hub’s vast catalog of container images and the intelligent capabilities of LLMs. By leveraging MCP, this server transforms how developers discover, evaluate, and manage container images, making the entire process more intuitive and efficient.

What is the Docker Hub MCP Server?

The Docker Hub MCP Server is an MCP server that interfaces with Docker Hub APIs to make them accessible to LLMs, enabling intelligent content discovery and repository management. Developers building with containers, especially in AI and LLM-powered workflows, often find that LLMs lack the context needed to select the right image from Docker Hub’s vast catalog. As a result, LLMs struggle to recommend the right images, and developers lose time manually searching instead of building.

Figure 1: The Docker Hub MCP server empowers LLMs and agents with secure content discovery and streamlined repository management.

The Docker Hub MCP Server streamlines your workflow through:

Frictionless setup: No complex local installation or runtime configuration needed. Just one-click install through the MCP Catalog and Toolkit, and you’re ready to go.

Intelligent container image discovery: Provides LLMs with detailed, structured context for Docker Hub images while enabling natural language image discovery — just describe what you need instead of remembering complex tags or repository names.

Simplified repository management: The Docker Hub MCP Server enables agents to manage repositories through natural language, fetching image details, viewing stats, searching content, and performing key operations quickly and easily.

Enable Docker Hub MCP Server in MCP Toolkit

From the MCP Toolkit menu, select the Catalog tab, search for Docker Hub, and select the plus icon to add the Docker Hub MCP server.

In the server’s Configuration tab, insert your Docker Hub username and personal access token (PAT).

Use Gordon / Docker AI

You can configure Gordon to be a host that can interact with the Docker Hub MCP server.

In the Clients tab in MCP Toolkit, ensure Gordon is connected.

From the Ask Gordon menu, you can now send requests related to your Docker Hub account, in accordance with the tools provided by the Docker Hub MCP server. To test it, ask Gordon: "List all repositories in my namespace."

Figure 2: Enabling Docker Hub MCP Server in MCP Toolkit

Use Claude Desktop

To configure the Docker Hub MCP Server with Claude Desktop, you don’t need to install the server directly in Claude Desktop. Instead, install the MCP Server using the MCP Toolkit, then add Claude Desktop as a client.

From the Clients tab, select Connect next to Claude Desktop. Restart Claude Desktop if it’s running, and it can now access all the servers in the MCP Toolkit.

Use VS Code

To configure the Docker Hub MCP Server with VS Code, follow the setup instructions in the official documentation.

What’s next?

This is just the beginning for the Docker Hub MCP Server. We’re continuing to expand its capabilities to make container workflows even more intelligent, efficient, and developer-friendly. Keep an eye on the Docker blog for deeper dives, tips, and the latest product announcements.

Learn more

Get the Docker Hub MCP Server from the Docker MCP Catalog.

Visit the repository on GitHub.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

From Dev to Deploy: Compose as the Spine of the Application Lifecycle

Nobody wants a spineless application development process. What do I mean by this? The spine is the backbone that supports and provides nerve channels for the human body. Without it, we would be floppy, weaker, and would struggle to understand how our extremities were behaving. A slightly tortured analogy, but consider the application lifecycle of the average software project. The traditional challenge has been, how do we give it a spine? How can we provide a backbone to support developers at every stage and a nerve channel to pass information back and forth, thereby cementing architectural constructs and automating or simplifying all the other processes required for modern applications?

We built Docker Compose specifically to be that spine, providing the foundation for an application from its inception in local development through testing and on to final deployment and maintenance as the application runs in the wild and interacts with real users. With Docker Compose Bridge, Docker Compose filled out the last gaps in full application lifecycle management. Using Compose Bridge, teams can now, with a single Compose file, take a multi-container, multi-tiered application from initial code and development setup all the way to production deployment in Kubernetes or other container orchestration systems.

Before and After: How Docker Compose Adds the Spine and Simplifies AppDev

So what does this mean in practice? Let’s take a “Before” and “After” view of how the spine of Docker Compose changes application lifecycle processes for the better. Imagine you’re building a customer-facing SaaS application—a classic three-tier setup:

Go API handling user accounts, payments, and anti-fraud checks

PostgreSQL + Redis for persistence and caching

TypeScript/React UI that customers log into and interact with

You are deploying to Kubernetes because you want resilience, portability, and flexibility. You’ll deploy it across multiple regions in the cloud for low latency and high availability. Let’s walk through what that lifecycle looks like before and after adopting Docker Compose + Compose Bridge.

Before: The Local Development “Works on My Machine” Status Quo

Without Compose, you set up six separate containers in a messy sprawl that might look something like this:

docker network create saas-net
docker run -d --name postgres --network saas-net \
  -e POSTGRES_PASSWORD=secret postgres:16
docker run -d --name redis --network saas-net redis:7
docker run -d --name go-api --network saas-net \
  -e DB_URL=postgres://postgres:secret@postgres/saasdb \
  -p 8080:8080 go-saas-api:latest
docker run -d --name payments --network saas-net payments-stub:1.2
docker run -d --name fraud --network saas-net anti-fraud:latest
docker run -d --name saas-ui --network saas-net \
  -p 3000:3000 saas-ui:latest

You can certainly automate the setup process with a script. But that would mean everyone else you are working with would need the same script to replicate your setup. You would also need to ensure that they all have the same updated script. And that’s not the end of it. Before Compose, setting up even a basic multi-service stack meant manually crafting networks and links—typically running docker network create and then launching each container with --network to stitch them together (see Docker run network options). Onboarding new developers only made matters worse: your README would balloon with dozens of flags and environment-variable examples, and inevitably, someone would mistype a port or misspell a variable name. Meanwhile, security and compliance tended to be afterthoughts.

There would be no standard WAF or API gateway in front of your services. In many instances, secrets were scattered in plain .env files, and you would have no consistent audit logging to prove who accessed what and who made what changes. Then, for debugging, you manually spin up phpPgAdmin; for observability, you install Prometheus and Jaeger on an ad-hoc basis. For vulnerability scanning, you would pull down Docker Scout each time. Both debugging and scanning would drag you outside your core workflow and break your vibe.

After: One Line for Universal Local Environment

Remember those six containers you had to set up individually? Now, your Docker Compose “Spine” carries the message and structure to automatically set all of those up for you with a single command and a single file (compose.yaml).
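A minimal sketch of what that compose.yaml could look like for this stack (service and image names mirror the docker run commands above; health checks, secrets handling, and the debug tooling discussed below are left out for brevity):

services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
  redis:
    image: redis:7
  go-api:
    image: go-saas-api:latest
    environment:
      DB_URL: postgres://postgres:secret@postgres/saasdb
    ports:
      - "8080:8080"
    depends_on:
      - postgres
      - redis
  payments:
    image: payments-stub:1.2
  fraud:
    image: anti-fraud:latest
  saas-ui:
    image: saas-ui:latest
    ports:
      - "3000:3000"
    depends_on:
      - go-api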

The resulting YAML pulls down and lists, in a readable format, your entire setup (database, cache, API, UI/UX), all living on a shared network with security, observability, and any other necessary services already in place. Not only does this save time and ensure consistency, but it also greatly boosts security (manual config error remains one of the leading sources of security breaches, according to the Verizon 2025 DBIR Report). This also standardizes all mounts and ports, ensuring secrets are treated uniformly. For compliance and artifact provenance, all audit logs are automatically mounted for local compliance checks.

Compose also makes debugging and hardening apps locally easier for developers who don’t want to think about setting up debug services. With Compose, the developer or platform team can add a debug profile that invokes a host of debug services (Prometheus for metrics, OpenTelemetry for distributed tracing, Grafana for dashboards, ModSecurity for firewall rules). That said, you don’t want to add debug services to production apps in Kubernetes.
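A sketch of how such a profile could look (tool images and ports here are illustrative assumptions):

services:
  prometheus:
    image: prom/prometheus
    profiles: ["debug"]
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    profiles: ["debug"]
    ports:
      - "3001:3000"

Running docker compose --profile debug up starts these services alongside the stack, while a plain docker compose up leaves them out, which is exactly what you want as code moves toward production.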

Enter Compose Bridge. This new addition to Docker Compose incorporates environmental awareness into all services, removing those that should not be deployed in production, and provides a clean Helm Chart or YAML manifest for production teams. So application developers don’t need to worry about stripping service calls before throwing code over the fence. More broadly, Compose Bridge enforces:

Clean separation – production YAML stays lean, with no leftover debug containers or extra resource definitions.

Conditional inclusion – Bridge reads profiles: settings and injects the right labels, annotations, and sidecars only when you ask for them.

Consistent templating – Bridge handles the profile logic at generation time, so all downstream manifests conform to stage and environment-specific policies and naming conventions.

The result? Platform Operations teams can maintain different Docker Compose templates for various application development teams, keeping everyone on the established paths while providing customization where needed. Application Security teams can easily review or scan standardized YAML files to simplify policy adherence across configuration verification, secret handling, and services accessed.

Before: CI & Testing Lead to Script Sprawl and Complexity

Application developers pass their code off to the DevOps team (or have the joy of running the CI/CD gauntlet themselves). Teams typically wire up their CI tool (Jenkins, GitLab CI, GitHub Actions, etc.) to run shell-based workflows. Any changes to the application, like renaming a service, adding a dependency, adjusting a port, or adding a new service, mean editing those scripts or editing every CI step that invokes them. In theory, GitOps means automating much of this. In practice, the complexity is thinly buried and the system lacks, for better or for worse, a nervous system along the spine. The result? Builds break, tests fail, and the time to launch a new version and incorporate new code lengthens. Developers are inherently discouraged from shipping code faster because they know there’s a decent chance that even when everything shows green in their local environment tests, something will break in CI/CD. This dooms them to unpleasant troubleshooting ordeals. Without a nervous system along the spine to share information and easily propagate necessary changes, application lifecycles are more chaotic, less secure and less efficient.

After: CI & Testing Run Fast, Smooth and Secure

After adopting Docker Compose as your application development spine, your CI/CD pipeline becomes a concise, reliable sequence that mirrors exactly what you run locally. A single compose.yaml declares every component so your CI job simply brings up the entire stack with docker compose up -d, orchestrating startup order and health checks without custom scripts or manual delays. You invoke your tests in the context of a real multi-container network via docker compose exec, replacing brittle mocks with true integration and end-to-end validation. When testing is complete, docker compose down tears down containers, networks, and volumes in one step, guaranteeing a clean slate for every build. Because CI consumes exactly the same manifest developers use on their workstations, feedback loops shrink to minutes, and promotions to staging or production require fewer (and often no) manual configuration tweaks.
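As a sketch, a GitHub Actions job following this pattern might look like the following (the workflow layout and the test command are assumptions about your project, not something Compose prescribes):

name: ci
on: [push]
jobs:
  integration:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Start the full stack
        run: docker compose up -d --wait
      - name: Run integration tests against real services
        run: docker compose exec -T go-api go test ./...
      - name: Tear down containers, networks, and volumes
        if: always()
        run: docker compose down -v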

Compose Bridge further elevates this efficiency and hardens security. After running tests, Bridge automatically converts your Docker Compose YAML file into Kubernetes manifests or a Helm chart, injecting network policies, security contexts, runtime protection sidecars, and audit log mounts based on your profiles and overlays. There’s no need for separate scripts or manual edits to bake in contract tests, policy validations, or vulnerability scanners. Your CI job can commit the generated artifacts directly to a GitOps repository, triggering an automated, policy-enforced rollout across all environments. This unified flow eliminates redundant configuration, prevents drift, and removes human error, turning CI/CD from a fragile sequence into a single, consistent pipeline.

Before: Production and Rollbacks are Floppy and Floundering 

When your application leaves CI and enters production, the absence of a solid spine becomes painfully clear. Platform teams must shoulder the weight of multiple files — Helm charts, raw manifests for network segmentation, pod security, autoscaling, ingress rules, API gateway configuration, logging agents, and policy enforcers. Each change ripples through, requiring manual edits in three or more places before nerves can carry the signal to a given cluster. There is no central backbone to keep everything aligned. A simple update to your service image or environment variable creates a cascade of copy-and-paste updates in values.yaml, template patches, and documentation. If something fails, your deployment collapses and you start manual reviews to find the source of the fault. Rolling back demands matching chart revisions to commits and manually issuing helm rollback. Without a nervous system to transmit clear rollback signals, each regional cluster becomes its own isolated segment. Canary and blue-green releases require separate, bespoke hooks or additional Argo CD applications, each one a new wrinkle in coordination. This floppy and floundering approach leaves your production lifecycle weak, communication slow, and the risk of human error high. The processes meant to support and stabilize your application instead become sources of friction and uncertainty, undermining the confidence of both engineering and operations teams.

After: Production and Rollbacks are Rock Solid

With Docker Compose Bridge acting as your application’s spinal cord, production and rollbacks gain the support and streamlined communication they’ve been missing. Your single compose.yaml file becomes the vertebral column that holds every service definition, environment variable, volume mount, and compliance rule in alignment. When you invoke docker compose bridge convert, the Bridge transforms that backbone into clean Kubernetes manifests or a Helm chart, automatically weaving in network policies, pod security contexts, runtime protection sidecars, scaling rules, and audit-log mounts. There is no need for separate template edits. Changes made to the Compose file propagate in real-time through all generated artifacts. Deployment can be as simple as committing the updated Compose file to your GitOps repository. Argo CD or Flux then serves as the extended nervous system, transmitting the rollout signal across every regional cluster in a consistent, policy-enforced manner. If you need to reverse course, reverting the Compose file acts like a reflex arc: Bridge regenerates the previous manifests and GitOps reverts each cluster to its prior state without manual intervention. Canary and blue-green strategies fit naturally into this framework through Compose profiles and Bridge overlays, eliminating the need for ad-hoc hooks. Your production pipeline is no longer a loose bundle of scripts and templates but a unified, resilient spine that supports growth, delivers rapid feedback, and ensures secure, reliable releases across all environments.

A Fully Composed Spine for the Full Lifecycle

To summarize, Docker Compose and Compose Bridge give your application a continuous spine running from local development through CI/CD, security validation, and multi-region Kubernetes rollout. You define every service, policy and profile once in a Compose file, and Bridge generates production-ready manifests with network policies, security contexts, telemetry, database, API and audit-log mounts already included. Automated GitOps rollouts and single-commit rollbacks make deployments reliable, auditable, and fast. This helps application developers focus on features instead of plumbing, gives AppSec consistent policy enforcement, allows SecOps to maintain standardized audit trails, helps PlatformOps simplify operations and delivers faster time to market with reduced risk for the business.

Ready to streamline your pipeline and enforce security? Give it a try in your next project by defining your stack in Compose, then adding Bridge to automate manifest generation and GitOps rollouts.
Source: https://blog.docker.com/feed/

Docker Desktop 4.43: Expanded Model Runner, Reimagined MCP Catalog, MCP Server Submissions, and Smarter Gordon

Docker Desktop 4.43 just rolled out a set of powerful updates that simplify how developers run, manage, and secure AI models and MCP tools. 

Model Runner now includes better model management, expanded OpenAI API compatibility, and fine-grained controls over runtime behavior. The improved MCP Catalog makes it easier to discover and use MCP servers, and now supports submitting your own MCP servers! Meanwhile, the MCP Toolkit streamlines integration with VS Code and GitHub, including built-in OAuth support for secure authentication. Gordon, Docker’s AI agent, now supports multi-threaded conversations with faster, more accurate responses. And with the new Compose Bridge, you can convert local compose.yaml files into Kubernetes configuration in a single command. 

Together, these updates streamline the process of building agentic AI apps and offer a preview of Docker’s ongoing efforts to make it easier to move from local development to production.

New model management commands and expanded OpenAI API support in Model Runner

This release includes improvements to the user interface of the Docker Model Runner, the inference APIs, and the inference engine under the hood.

Starting with the user interface, developers can now inspect models (including those already pulled from Docker Hub and those available remotely in the AI catalog) via model cards available directly in Docker Desktop. Below is a screenshot of what the model cards look like:

Figure 1: View model cards directly in Docker Desktop to get an instant overview of all variants in the model family and their key features.

In addition to the GUI changes, the docker model command adds three new subcommands to help developers inspect, monitor, and manage models more effectively:

docker model ps: Show which models are currently loaded into memory

docker model df: Check disk usage for models and inference engines

docker model unload: Manually unload a model from memory (before its idle timeout)

For WSL2 users who enable Docker Desktop integration, all of the docker model commands are also now available from their WSL2 distros, making it easier to work with models without changing your Linux-based workflow.

On the API side, Model Runner now offers additional OpenAI API compatibility and configurability. Specifically, tools are now supported with {"stream": true}, making agents built on Docker Model Runner more dynamic and responsive. Model Runner’s API endpoints now support OPTIONS calls for better compatibility with existing tooling. Finally, developers can now configure CORS origins in the Model Runner settings pane, offering better compatibility and control over security.

Figure 2: CORS Allowed Origins are now configurable in Docker Model Runner settings, giving developers greater flexibility and control.
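As an illustration, a streaming chat completion that also declares a tool might look like the request below. The localhost port and /engines/v1 path are the documented defaults when TCP host access is enabled in Model Runner settings; verify them for your setup, and treat the tool definition as a made-up example:

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/gemma3",
        "stream": true,
        "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
        "tools": [{
          "type": "function",
          "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city",
            "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"]
            }
          }
        }]
      }'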

For developers who need fine-grained control over model behavior, we’re also introducing the ability to set a model’s context size and even the runtime flags for the inference engine via Docker Compose, for example:

services:
  mymodel:
    provider:
      type: model
      options:
        model: ai/gemma3
        context-size: 8192
        runtime-flags: "--no-prefill-assistant"

In this example, we’re using the (optional) context-size and runtime-flags parameters to control the behavior of the inference engine underneath. In this case, the associated runtime is the default (llama.cpp), and you can find a list of flags here. Certain flags may override the stable default configuration that we ship with Docker Desktop, but we want users to have full control over the inference backend. It’s also worth noting that a particular model architecture may limit the maximum context size. You can find information about maximum context lengths on the associated model cards on Docker Hub.

Under the hood, we’ve focused on improving stability and usability. We now have better error reporting in the event that an inference process crashes, along with more aggressive eviction of crashed engine processes. We’ve also enhanced the Docker CE Model Runner experience with better handling of concurrent usage and more robust support for model providers in Compose on Docker CE.

MCP Catalog & Toolkit: Secure, containerized AI tools at scale

New and redesigned MCP Catalog 

Docker’s MCP Catalog now features an improved experience, making it easier to search, discover, and identify the right MCP servers for your workflows. You can still access the catalog through Docker Hub or directly from the MCP Toolkit in Docker Desktop, and now, it’s also available via a dedicated web link for even faster access. 

Figure 3: Quickly find the right MCP server for your agentic app and use the new Catalog to browse by specific use cases.

The MCP Catalog currently includes over 100 verified, containerized tools, with hundreds more on the way. Unlike traditional npx or uvx workflows that execute code directly on your host, every MCP server in the catalog runs inside an isolated Docker container. Each one includes cryptographic signatures, a Software Bill of Materials (SBOM), and provenance attestations. 

This approach eliminates the risks of running unverified code and ensures consistent, reproducible environments across platforms. Whether you need database connectors, API integrations, or development tools, the MCP Catalog provides a trusted, scalable foundation for AI-powered development workflows that move the entire ecosystem away from risky execution patterns toward production-ready, containerized solutions.

Submit your MCP Server to the Docker MCP Catalog

We’re launching a new submission process, giving developers flexible options to contribute by following the process here.  Developers can choose between two options: Docker-Built and Community-Built servers. 

Docker-Built Servers 

When you see “Built by Docker,” you’re getting our complete security treatment. We control the entire build pipeline, providing cryptographic signatures, SBOMs, provenance attestations, and continuous vulnerability scanning.

Community-Built Servers 

These servers are packaged as Docker images by their developers. While we don’t control their build process, they still benefit from container isolation, which is a massive security improvement over direct execution.

Docker-built servers demonstrate the gold standard for security, while community-built servers ensure we can scale rapidly to meet developer demand. Developers can change their mind after submitting a community-built server and opt to resubmit it as a Docker-built server. 

Get your MCP server featured in the Docker MCP Catalog today and reach over 20 million developers. Learn more about our new MCP Catalog in our announcement blog and get insights on best practices for building, running, and testing MCP servers. Join us in building the largest library of secure, containerized MCP servers!

MCP Toolkit adds OAuth support and streamlined Integration with GitHub and VS Code

Many MCP servers’ credentials are passed as plaintext environment variables, exposing sensitive data and increasing the risk of leaks. The MCP Toolkit eliminates that risk with secure credential storage, allowing clients to authenticate with MCP servers and third-party services without hardcoding secrets. We’re taking it a step further with OAuth support, starting with the most widely used developer tool, GitHub. This will make it even easier to integrate secure authentication into your development workflow.

Figure 4: OAuth is now supported for the GitHub MCP server.

To set up your GitHub MCP server, go to the OAuth tab, connect your GitHub account, enable the server, and authorize OAuth for secure authentication.

Figure 5: Go to the configurations tab of the GitHub MCP servers to enable OAuth for secure authentication

The MCP Toolkit allows you to connect MCP servers to any MCP client, with one-click connection to popular ones such as Claude and Cursor. We are also making it easier for developers to connect to VSCode with the docker mcp client connect vscode command. When run in your project’s root folder, it creates an mcp.json configuration file in your .vscode folder. 

Figure 6: Connect to VS Code via MCP commands in the CLI.

Additionally, you can also configure the MCP Toolkit as a global MCP server available to VSCode by adding the following config to your user settings. Check out this doc for more details. Once connected, you can leverage GitHub Copilot in agent mode with full access to your repositories, issues, and pull requests.

"mcp": {
"servers": {
"MCP_DOCKER": {
"command": "docker",
"args": [
"mcp",
"gateway",
"run"
],
"type": "stdio"
}
}
}

Gordon gets smarter: Multi-threaded conversations and 5x faster performance

Docker’s AI Agent Gordon just got a major upgrade: multi-threaded conversation support. You can now run multiple distinct conversations in parallel and switch between topics like debugging a container issue in one thread and refining a Docker Compose setup in another, without losing context. Gordon keeps each thread organized, so you can pick up any conversation exactly where you left off.

Gordon’s new multi-threaded capabilities work hand-in-hand with MCP tools, creating a powerful boost for your development workflow. Use Gordon alongside your favorite MCP tools to get contextual help while keeping conversations organized by task. No more losing focus to context switching!

Figure 7: Gordon’s new multi-threaded support cuts down on context switching and boosts productivity.

We’ve also rolled out major performance upgrades: Gordon now responds 5x faster and delivers more accurate, context-aware answers. With improved understanding of Docker-specific commands, configurations, and troubleshooting scenarios, Gordon is smarter and more helpful than ever!

Compose Bridge: Seamlessly go from local Compose to Kubernetes 

We know that developers love Docker Compose for managing local environments—it’s simple and easy to understand. We’re excited to introduce Compose Bridge to Docker Desktop. This powerful new feature helps you transform your local compose.yaml into Kubernetes configuration with a single command.

Translate Compose to Kubernetes in seconds

Compose Bridge gives you a streamlined, flexible way to bring your Compose application to Kubernetes. With smart defaults and options for customization, it’s designed to support both simple setups and complex microservice architectures.

All it takes is:

docker compose bridge convert

And just like that, Compose Bridge generates the following Kubernetes resources from your Compose file:

A Namespace to isolate your deployment

A ConfigMap for every Compose config entry

Deployments for running and scaling your services

Services for exposed and published ports—including LoadBalancer services for host access

Secrets for any secrets in your Compose file (encoded for local use)

NetworkPolicies that reflect your Compose network topology

PersistentVolumeClaims using Docker Desktop’s hostpath storage

This approach replicates your local dev environment in Kubernetes quickly and accurately, so you can test in production-like conditions, faster.
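A minimal flow might look like this, assuming the default output directory (check docker compose bridge convert --help for the exact flags and defaults in your version):

docker compose bridge convert    # generate Kubernetes resources from compose.yaml (written to ./out by default)
kubectl apply -f out/            # apply them to Docker Desktop's Kubernetes cluster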

Built-in flexibility and upcoming enhancements

Need something more customized? Compose Bridge supports advanced transformation options so you can tweak how services are mapped or tailor the resulting configuration to your infrastructure.

And we’re not stopping here—upcoming releases will allow Compose Bridge to generate Kubernetes config based on your existing cluster setup, helping teams align development with production without rewriting manifests from scratch.

Get started

You can start using Compose Bridge today:

Download or update Docker Desktop

Open your terminal and run:

docker compose bridge convert

Review the documentation to explore customization options

Conclusion 

Docker Desktop 4.43 introduces practical updates for developers building at the intersection of AI and cloud-native apps. Whether you’re running local models, finding and running secure MCP servers, using Gordon for multi-threaded AI assistance, or converting Compose files to Kubernetes, this release cuts down on complexity so you can focus on shipping. From agentic AI projects to scaling workflows from local to production, you’ll get more control, smoother integration, and fewer manual steps throughout.

Learn more

Learn more about our new MCP Catalog. 

Submit your MCP servers to the MCP Catalog. 

Authenticate and update today to receive your subscription level’s newest Docker Desktop features.

Subscribe to the Docker Navigator Newsletter.

Learn about our sign-in enforcement options.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

5 Best Practices for Building, Testing, and Packaging MCP Servers 

We recently launched a new, reimagined Docker MCP Catalog with improved discovery and a new submission process. Containerized MCP servers offer a secure way to run and scale agentic applications and minimize risks tied to host access and secret management. Developers can submit servers in two ways: Docker-built servers, which include our full security suite (signatures, SBOMs, attestations, and continuous scanning), or community-built servers, which are built and maintained by developers using their own Docker images.

In this blog, we’ll share 5 MCP server best practices for designing, testing, and packaging them for submission. These recommendations are based on our experience building and helping developers build over 100 MCP servers for the Docker MCP Catalog. They’ll help you streamline the submission process, reach over 20 million Docker developers, and deliver real utility to both agents and the developers who use them.

1. Manage your agent’s tool budget intentionally

“Tool Budget” is our internal term for the number of tools an agent can handle effectively. Like any budget, managing it well is key to a good user experience. As the creator of an MCP server, keep in mind that offering too many tools can make your server more complex and costly to use, potentially turning users away. Some AI agents now allow users to selectively enable tools, helping keep the experience streamlined. But the better strategy is to design your toolset around clear use cases and avoid mapping every API endpoint to a separate tool.

For example, when creating an MCP server to access your API, you might be tempted to make one tool for each of the API’s endpoints. While that’s a quick way to get started, it often results in an overloaded toolset that discourages adoption.

So, if one tool per endpoint isn’t ideal, how do you design a better MCP server?

This is where MCP server prompts come in. Think of them like macros. Instead of requiring users to call multiple tools, you can create a single prompt that chains multiple tools or endpoint calls behind the scenes. That way, a user can simply ask the agent to “fetch my user’s invoices,” and the agent can handle the complexity internally, calling two or three tools without exposing the overhead.

2. The end user of the tool is the agent/LLM

One important point often overlooked: it’s the agent or LLM, not the end user, that actually uses the tool. The user enables the tool, but the agent is the one calling it. Why does this matter? When you’re building an MCP server, you’re not interfacing directly with users. You’re building for the agent that acts on their behalf.

Error handling is one area where we’ve consistently seen developers run into issues. If your tool returns error messages meant for humans, you might not provide the user experience you think. The agent, not the user, is the one calling your tool, and there’s no guarantee it will pass the error message back to the user.

Agents are designed to complete tasks. When something fails, they’ll often try a different approach. That’s why your error handling should help the agent decide what to do next, not just flag what went wrong. Instead of “You don’t have access to this system”, return something along the lines of “To access this system, the MCP server needs to be configured with a valid API_TOKEN; the current API_TOKEN is not valid”.

What you’re doing here is informing the agent that access to the third-party system isn’t possible due to a misconfiguration, not because access is denied outright. The distinction matters: the lack of access is a result of the user not properly configuring the MCP server, not a hard permission issue.

3. Document for humans and agents! 

This brings us to an equally important point: documentation!

When writing for your MCP server, remember you’re serving two audiences: the end users and the AI agent. As we saw with error handling, it’s critical to understand the needs of both.

Your documentation should address each audience clearly. End users want to know why they should use your MCP server, what problems it solves and how it fits into their workflow. Agents, on the other hand, rely on well-written tool names and descriptions to decide whether your server is the right fit for a given task.

Keep in mind: the agent is the one actually using the MCP server, but it’s the end user who decides which tools the agent has access to. Your documentation needs to support both!

4. Don’t just test functionality, test user interactions

One of the best ways to validate your documentation is to test your own MCP server. By far, the easiest way of interacting with your server when developing is to use the MCP inspector (type npx @modelcontextprotocol/inspector in your terminal and off you go!).

While it’s common to test whether your MCP server works, the inspector also helps you think from the end user’s perspective. It gives you a clearer sense of how users will interact with your server and whether your documentation supports that experience.

There are three key steps to testing a server:

Connecting to the MCP Server: This step will help you validate that your server is capturing all the necessary configuration to run properly.

List Tools: This is what AI agents see when they initialize your MCP server.

Tool Calling: Make sure that the tool behaves as expected. This is where you can validate the failure modes.

One important design consideration is the MCP server lifecycle. Ask: What is necessary for the MCP client to connect to the MCP server? How should tools be listed and discovered? And what’s the process for invoking a specific tool?

For example, consider writing an MCP server for your database. In a typical API, you’d establish the database connection when the server starts. However, when writing an MCP server, you should aim to make each tool call as self-contained as possible. This means creating a connection for every tool call, not on server start. By doing this, you allow users to connect and list tools even if the server is not configured correctly.

While this might feel like an anti-pattern at first, it actually makes more sense in this context. You’re trading a bit of latency for improved usability and reliability. In reality, the only moment your MCP server will need a connection to a database (or a third-party system) is when a tool is invoked. The MCP Inspector is a great way to see this in action and gain a better understanding of how both users and agents will interact with your server.
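To make that concrete, here is a simplified Go sketch of a database-backed tool handler. The handler signature and helper names are hypothetical (real MCP SDKs differ); the point is that the connection is created inside the tool call and that errors are written for the agent:

package main

import (
	"context"
	"database/sql"
	"fmt"
	"os"

	_ "github.com/lib/pq" // Postgres driver; an example dependency
)

// listInvoices is a hypothetical tool handler. The connection is opened inside
// the call (not at server start), so the server can still start, connect, and
// list tools even when DATABASE_URL is missing or wrong.
func listInvoices(ctx context.Context, userID string) (string, error) {
	dsn := os.Getenv("DATABASE_URL")
	if dsn == "" {
		// Error text written for the agent: explain how to fix the configuration.
		return "", fmt.Errorf("the MCP server needs DATABASE_URL to be configured; it is currently unset")
	}
	db, err := sql.Open("postgres", dsn) // one connection per tool call
	if err != nil {
		return "", fmt.Errorf("could not open database connection: %w", err)
	}
	defer db.Close()

	var count int
	if err := db.QueryRowContext(ctx,
		"SELECT COUNT(*) FROM invoices WHERE user_id = $1", userID).Scan(&count); err != nil {
		return "", fmt.Errorf("query failed: %w", err)
	}
	return fmt.Sprintf("user %s has %d invoices", userID, count), nil
}

func main() {
	result, err := listInvoices(context.Background(), "42")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(result)
}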

If you are using the Docker MCP Toolkit, there are several ways to test whether your MCP server is behaving as expected. 

Run the following command to call your tool using the configuration you defined in Docker Desktop.

`docker mcp tools call my-tool`

To test what the MCP clients see, you can run the following command:

`docker mcp gateway run --verbose --dry-run`

This command simulates the call from an MCP client to your MCP server, assuming it’s enabled in the Docker MCP Catalog.

5. Packaging your MCP servers with containers

Excellent, we have written and tested our MCP server. What’s next? Packaging!

Packaging an MCP server is not so much about creating the artifact as about thinking through how the artifact is going to be used. We might be a bit biased here, but we truly believe that packaging your MCP server as a Docker image is the way to go.

MCP servers come in many different flavours: Python, TypeScript, Java… Packaging as a Docker image makes your server truly portable, and because of the nature of Docker images, you can ensure that the end user will be able to run your MCP server regardless of how their system is configured. Using Docker containers is the easiest way to avoid dealing with dependencies on other people’s machines. If they can run Docker, they can run your MCP server.

There are many resources available about how to create a good Dockerfile, but if you’re not sure whether you have done the right thing, you can always use Gordon or the `docker ai` command to improve it. Just type `docker ai improve my Dockerfile` and Gordon, the Docker AI agent, will help you optimize a Dockerfile for your MCP server.
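As a starting point, a multi-stage Dockerfile for a Node.js-based stdio MCP server might look like the sketch below (the base image, build commands, and file layout are assumptions about your project):

# Build stage: install dependencies and compile the server
FROM node:22-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: ship only what the server needs to run
FROM node:22-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/node_modules ./node_modules
# Catalog submissions currently require the stdio transport, so the server
# reads requests on stdin and writes responses on stdout -- no ports exposed.
ENTRYPOINT ["node", "dist/index.js"]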

How to submit your MCP server 

Once you have a Dockerfile in your repository, we invite you to submit your MCP server to the Docker Official Registry! At the time of this writing, all submitted MCP servers must use the stdio transport mechanism, so be sure your server supports this when running as a container. We look forward to your submission!

Conclusion

The new Docker MCP Catalog makes it easier than ever to discover and scale MCP servers securely. Whether you’re submitting a Docker-built server with full security treatment or maintaining your own as a community contributor, following these five best practices (managing tool budget, designing for the agent, writing for both users and LLMs, testing thoroughly, and packaging with containers) will help you create MCP servers that are reliable, easy to use, and ready for real-world agentic workloads.

Ready to share yours with the Docker community? Submit it to the Docker MCP Catalog and get it in front of millions of developers! 

Learn more

Check out our new MCP Catalog announcement blog

Find documentation for Docker MCP Catalog and Toolkit.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

Docker State of App Dev: Dev Ex & Productivity 

Report: What’s helping devs thrive — and what’s still holding them back? 

A look at how culture, tooling, and habits are shaping the developer experience today, per Docker’s 2025 State of Application Development Survey.

Great culture, better tools — but developers often still feel stuck. From pull requests stuck in review to tasks without clear estimates, the inner loop remains cluttered with surprisingly persistent friction points. This year’s data maps the disconnect between what developers need, where they’re blocked, and how better tooling and cultural support can keep velocity on track.

Here are six key insights into developer experience and productivity from Docker’s annual State of Application Development Survey, based on responses from over 4,500 industry professionals.

1. How devs learn — and what’s changing

Self-guided learning is on the upswing. Across all industries, fully 85% of respondents turn to online courses or certifications, far outpacing traditional sources like school (33%), books (25%), or on-the-job training (25%). 

Among IT folks, the picture is more nuanced. School is still the top venue for learning to code (65%, up from 57% in our 2024 survey), but online resources are also trending upward. Some 63% of IT pros learned coding skills via online resources (up from 54% in our 2024 survey) and 57% favored online courses or certifications (up from 45% in 2024).

Note: For this year’s report, we surveyed over three times more users across a broader spectrum of industries than for our more IT-focused 2024 report.

As for how devs prefer to learn, reading documentation tops the list, as it did in last year’s report, despite the rise of new and interactive forms of learning. Some 29% say they lean on documentation, edging out videos and side projects (28% each) and slightly ahead of structured online training (26%).

AI tools play a relatively minor role in how respondents learn, with GitHub Copilot cited by just 13% overall — and only 9% among IT pros. It’s also cited by 13% as a preferred learning method.

2. Containers: the great divide?

Among IT pros, container usage soared to 92% — up from 80% in our 2024 survey. Zoom out to a broader view across industries, however, and adoption appears considerably lower. Just 30% of developers say they use containers in any part of their workflow. 

Why the gap? Differences in app structure may offer an explanation: IT industry respondents work with microservice-based architectures more often than those in other industries (68% versus 31%). So the higher container adoption may stem from IT pros’ need for modularity and scalability — which containers provide in spades.

And among container users, needs are evolving. They want better tools for time estimation (31%), task planning (18%), and monitoring/logging (15%) — stubborn pain points across the software lifecycle.

3. An equal-opportunity headache: estimating time

No matter the role, estimating how long a task will take is the most consistent pain point across the board. Whether you’re a front-end developer (28%), data scientist (31%), or a software decision-maker (49%), precision in time planning remains elusive.

Other top roadblocks? Task planning (26%) and pull-request review (25%) are slowing teams down. Interestingly, where people say they need better tools doesn’t always match where they’re getting stuck. Case in point, testing solutions and Continuous Delivery (CD) come up often when devs talk about tooling gaps — even though they’re not always flagged as blockers.

4. Productivity by persona: different hats, same struggles

When you break it down by role, some unique themes emerge:

Experienced developers struggle most with time estimation (42%).

Engineering managers face a three-way tie: planning, time estimation, and designing from scratch (28% each).

Data scientists are especially challenged by CD (21%) — a task not traditionally in their wheelhouse.

Front-end devs, surprisingly, list writing code (28%) as a challenge, closely followed by CI (26%).

Across personas, a common thread stands out: even seasoned professionals are grappling with foundational coordination tasks — not the “hard” tech itself, but the orchestration around it.

5. Tools vs. culture: two sides of the experience equation

On the tooling side, the biggest callouts for improvement include:

Time estimation (22%)

Task planning (18%)

Designing solutions from scratch (17%)

But productivity isn’t just about tools — it’s deeply cultural. When asked what’s working well, developers pointed to work-life balance (39%), location flexibility such as work-from-home policies (38%), and flexible hours (37%) as top cultural strengths.

The weak spots? Career development (38%), recognition (36%), and meaningful work (33%). In other words: developers like where, when, and how they work, but not always why.

6. What’s easy? What’s not?

While the dev world is full of moving parts, a few areas are surprisingly not challenging:

Editing config files (8%)

Debugging in dev (8%)

Writing config files (7%)

Contrast that with the most taxing areas:

Troubleshooting in production (9%)

Debugging in production (9%)

Security-related tasks (8%)

It’s a reminder that production is still where the stress — and the stakes — are highest.

Bottom line:

Developer productivity isn’t about just one thing. It’s the compound effect of better tools, smarter learning, sharper planning — and yes, a healthy team culture. For orgs to excel, they need to invest not just in platforms, but also in people. Because when you improve the experience, you unlock the performance.

Source: https://blog.docker.com/feed/

Using Gordon to Containerize Your Apps and Work with Containers

These days, almost every tech company is looking for ways to integrate AI into their apps and workflows, and Docker is no exception. They’ve been rolling out some impressive AI capabilities across their products. This is my first post as a Docker Captain, and I want to shine a spotlight on a feature that, in my opinion, hasn’t gotten nearly enough attention: Docker’s AI Agent Gordon (also known as Docker AI), which is built into Docker Desktop and the CLI.

Gordon is really helpful when it comes to containerizing applications. Not only does it help you understand how to package your app as a container, but it also reduces the overhead of figuring out dependencies, runtime configs, and other pieces that add to a developer’s daily cognitive load. The best part? Gordon doesn’t just guide you with responses; it can also generate or update the necessary files for you.

The Problem: Containerizing apps and optimizing containers isn’t always easy

Containerizing apps can range from super simple to a bit tricky, depending on what you’re working with. If your app has a single runtime like Node.js, Python, or .NET Core, with clearly defined dependencies and no external services, it will be straightforward.

A basic Dockerfile will usually get you up and running without much effort. But once you start adding more complexity, like a backend, frontend, database, and caching layer, you now have the need for a multi-container app. At this point, you might be dealing with additional Dockerfile configurations and potentially a Docker Compose setup. That’s where things can start to get challenging.

This is where Gordon shines. It’s helpful in containerizing apps and can even handle multi-service container app setups, guiding you through what’s needed and generating the supporting config files, such as Dockerfiles and Compose files, to get you going.

Optimizing containers can be a headache too

Beyond just containerizing, there’s also the need to optimize your containers for performance, security, and image size. And let’s face it, optimizing can be tedious. You need to know what base images to use, how to slim them down, how to avoid unnecessary layers, and more.

Gordon can help here too. It provides optimization suggestions, shows you how to apply best practices like multi-stage builds or removing dev dependencies, and helps you create leaner, more secure images.

Why not just use general-purpose Generative AI?

Sure, general-purpose AI tools like ChatGPT, Claude, Gemini, etc. are great and I use them regularly. But when it comes to containers, they can lack the context needed for accurate and efficient help. Gordon, on the other hand, is purpose-built for Docker. It has access to Docker’s ecosystem and has been trained on Docker documentation, best practices, and the nuances of Docker tooling. That means its recommendations are more likely to be precise and aligned with the latest standards.

Walkthrough of Gordon

Gordon can help with containerizing applications, optimizing your containers and more. Gordon is still a Beta feature. To start using Gordon, you need Docker Desktop version 4.38 or later. Gordon is powered by Large Language Models (LLMs), and it goes beyond prompt and response: it can perform certain tasks for you as an AI agent. Gordon can have access to your local files and local images when you give it permission. It will prompt you for access if needed for a task.

Please note, the examples I will show in this post are based on a single working session. Now, let’s dive in and start to explore Gordon.

Enabling Gordon / Docker AI

In order to turn Gordon on, go to Settings > Beta features and check the Enable Docker AI box, as shown in the following screenshot.

Figure 1: screenshot of where to enable Docker AI in beta features

Accept the terms. The AI in Docker Desktop comes in two forms. The first is accessed through the Docker Desktop UI and is known as Gordon. The second, Docker AI, is accessed through the Docker CLI; you activate it by typing docker ai in the CLI. I will demonstrate this later in this blog post.

Figure 2: screenshot of Docker AI terms acceptance dialog box

Exploring Gordon in Docker Desktop

Now Gordon will appear in your Docker Desktop UI. Here you can prompt it just like any Generative AI tool. Gordon will also have examples that you can use to get started working with it.

You can access Gordon throughout Docker Desktop by clicking on the AI icon as shown in the following screenshot.

Figure 3: screenshot of Docker Desktop interface showing the AI icon for Gordon

When you click on the AI icon, a Gordon prompt box appears along with suggested prompts, as shown in the following screenshot. The suggestions are context-aware and change based on the object the AI icon appears next to.

Figure 4: Screenshot showing Gordon’s suggestion prompt box in Docker Desktop UI

Here is another example of Docker AI suggestions being context-aware based on what area of Docker Desktop you are in. 

Figure 5: Screenshot showing Docker AI context-specific suggestions

Another common use case for Gordon is listing local images and using AI to work with them. You can see this in the following set of screenshots. Notice that Gordon will prompt you for permission before showing your local images.

Figure 6: Screenshot showing Gordon referencing local images 

You can also prompt Gordon to take action. As shown in the following screenshot, I asked Gordon to run one of my images.

Figure 7: Screenshot showing Gordon prompts 

If it can’t perform the action, it will attempt to help you. 

Figure 8: Screenshot showing Gordon prompt response to failed request 

Another cool use of Gordon is having it explain a container image to you. When you ask, Gordon will ask you to select the directory where the Dockerfile is and for permission to access it, as shown in the following screenshot.

Figure 9: Screenshot showing Gordon’s request for particular directory access 

After you give it access to the directory where the Dockerfile is, it will then break down what’s in the Dockerfile.

Figure 10: Screenshot showing Gordon’s response to explaining a Dockerfile 

I followed up with a prompt asking Gordon to display what’s in the Dockerfile. It did a good job of explaining its contents, as shown in the following screenshot.

Figure 11: Screenshot showing Gordon’s response regarding Dockerfile contents

Exploring Gordon in the Docker Desktop CLI

Let’s take a quick tour through Gordon in the CLI, where it is referred to as Docker AI. To work with Docker AI, launch the Docker CLI as shown in the following screenshot.

Figure 12: Screenshot showing how to launch Docker AI from the CLI 

Once in the CLI you can type “docker ai” and it will bring you into the chat experience so you can prompt Gordon. In my example, I asked Gordon about one of my local images. You can see that it asked me for permission. 

Figure 13: Screenshot showing Docker CLI request for access

Next, I asked Docker AI to list all of my local images as shown in the following screenshot. 

Figure 14: Screenshot showing Docker CLI response to display local images 

I then tested pulling an image using Docker AI. As you can see in the following screenshot, Gordon pulled a Node.js image for me!

Figure 15: Screenshot showing Docker CLI pulling a Node.js image

Containerizing an application with Gordon

Now let’s explore the experience of containerizing an application using Gordon.

I started by clicking on the example for containerizing an application. Gordon then prompted me for the directory where my application code is. 

Figure 16: Screenshot showing where to enable access to directory for containerizing an application 

I pointed it to my app’s directory and gave it permission. It then started to analyze and containerize my app. It picked up the language and started to read through my app’s README file.

Figure 17: Screenshot showing Gordon starting to analyze and containerize app 

You can see it understood that the app was written in JavaScript and worked through the packages and dependencies.

Figure 18: Screenshot showing final steps of Gordon processing

Gordon understood that my app has a backend, frontend, and database, and recognized from this that I would need a Docker Compose file.

Figure 19: Screenshot showing successful completion of steps to complete the Dockerfiles

In the following screenshot you can see the Docker-related files needed for my app. Gordon created all of these.

Figure 20: Screenshot showing files produced from Gordon 

Gordon created the Dockerfile (on the left) and a Compose YAML file (on the right), even picking up that I needed a Postgres database for this application.

Figure 21: Screenshot showing Dockerfile and Compose yaml file produced from Gordon

I then took it a step further and asked Gordon to build and run the container for my application using the prompt “Can you build and run this application with compose?” It created the Docker Compose file, built the images, and ran the containers!

Figure 22: Screenshot showing completed containers from Gordon

Conclusion

I hope you picked up some useful insights about Docker and discovered one of its lesser-known AI features in Docker Desktop. We explored what Gordon is, how it compares to general-purpose generative AI tools like ChatGPT, Claude, and Gemini, and walked through use cases such as containerizing an application and working with local images. We also touched on how Gordon can support developers and IT professionals who work with containers. If you haven’t already, I encourage you to enable Gordon and take it for a test run. Thanks for reading and stay tuned for more blog posts coming soon.
Quelle: https://blog.docker.com/feed/

The Docker MCP Catalog: the Secure Way to Discover and Run MCP Servers

The Model Context Protocol (MCP) ecosystem is exploding. In just weeks, our Docker MCP Catalog has surpassed 1 million pulls, validating that developers are hungry for a secure way to run MCP servers. Today, we’re excited to share major updates to the Docker MCP Catalog, including enhanced discovery features and our new open submission process. With hundreds of developers already requesting to publish their MCP servers through Docker, we’re accelerating our mission to make containerized MCP servers the standard for secure AI tool distribution.

The rapid adoption of MCP servers also highlights a critical problem — the current practice of running them via npx or uvx commands exposes systems to unverified code with full host access, not to mention dependency management friction. In this post, we’ll explain why Docker is investing in the MCP ecosystem, showcase the new catalog capabilities, and share how you can contribute to building a more secure foundation for AI applications.

Figure 1: The new Docker MCP Catalog, built for easier discovery.

Why Docker is building the MCP Catalog

The security issues in MCP distribution

Every time a developer runs npx -y @untrusted/mcp-server or uvx some-mcp-tool, they’re making a dangerous trade-off: convenience over security. These commands execute arbitrary code directly on the host system with full access to:

The entire file system

Network connections

Environment variables and secrets

System resources

Some MCP clients limit environment variable access, but even that is not a universal practice. This isn’t sustainable. As MCP moves from experimentation to production, we need a fundamentally different approach.
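To make the trade-off concrete, here is a sketch of how the same server entry typically looks in an MCP client configuration (for example, Claude Desktop’s claude_desktop_config.json); the server name and image below are hypothetical.

Running directly on the host:

{
  "mcpServers": {
    "example-server": {
      "command": "npx",
      "args": ["-y", "@untrusted/mcp-server"]
    }
  }
}

Running the same server in an isolated container:

{
  "mcpServers": {
    "example-server": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp/example-server"]
    }
  }
}

In the containerized form, the server only sees what you explicitly mount or pass in, rather than inheriting your entire file system, environment variables, and network access.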

Docker’s unique position

Docker has spent over a decade solving exactly these problems for cloud-native applications. We’ve built the infrastructure, tools, and trust that developers rely on to run billions of containers in production. Now, we’re applying these same principles to the MCP ecosystem.

When you run an MCP server from our Catalog, you get:

Cryptographic signatures verifying the image hasn’t been tampered with

Software Bill of Materials (SBOMs) documenting every component

Complete isolation from your host system

Controlled access to only what the server actually needs

This isn’t about making life harder for developers—it’s about making security the path of least resistance.

Introducing the enhanced MCP Catalog

Built for MCP discovery

We’ve reimagined the MCP Catalog to make it more accessible and easier to navigate. You can still access the MCP Catalog from Docker Hub and the MCP Toolkit in Docker Desktop just like before, or go straight to the MCP catalog. We’ve gone beyond generic container image listings by building features that help you quickly find the right MCP servers for your AI applications.  

Browse by Use Case: MCP servers are organized by what they actually do:

Data Integration (databases, APIs, file systems)

Development Tools (IDEs, code analysis, testing)

Communication (email, Slack, messaging platforms)

Productivity (task management, calendars, note-taking)

Analytics (data processing, visualization, reporting)

Enhanced Search: Find servers by capability, tools, GitHub tags, and categories — not just by name.

Security Transparency: Every catalog entry clearly shows whether it’s Docker-built (with transparent build signing and verification) or community-built (containerized and maintained by the publisher).

Figure 2: Discover MCP servers by use cases.

How we classify MCP Servers: Built by Docker vs. community-built

Docker-Built Servers: When you see “Built by Docker,” you’re getting our complete security treatment. We control the entire build pipeline, providing cryptographic signatures, SBOMs, provenance attestations, and continuous vulnerability scanning.

Community-Built Servers: These servers are packaged as Docker images by their developers. While we don’t control their build process, they still benefit from container isolation, which is a massive security improvement over direct execution.

Both tiers serve important roles: Docker-built servers demonstrate the gold standard for security, while community-built servers let us scale rapidly to meet developer demand. Developers can also change their mind after submitting a community-built server and opt to resubmit it as a Docker-built one.

Figure 3: An example of Built by Docker MCP Server.

Open for MCP server submission: Join the secure MCP movement

Starting today, we’re opening our submission process to the community. Whether you’re an individual developer or an enterprise team, you can feature your MCP servers on the Docker MCP Catalog. By publishing through our catalog, you’re not just distributing your MCP server — you’re helping establish a new security standard for the entire ecosystem while getting your MCP tools available to millions of developers already using Docker via Docker Hub and Docker Desktop. Your containerized server becomes part of the solution, demonstrating that production-ready AI tools don’t require compromising on security. 

How to submit your MCP server

Containerize your server – Package your MCP server as a Docker image (see the sketch after this list)

Submit via GitHub – Create a pull request at github.com/docker/mcp-registry

Choose your tier – Opt for Docker-built (we handle the build) or community-built (you build and maintain it)
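As a rough starting point for the first step, a minimal image for a hypothetical Node.js-based MCP server might look something like this (the base image, file names, and entry point are assumptions, not requirements of the catalog):

FROM node:22-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY . .
# Most MCP servers talk to their client over stdio, so no ports need to be exposed
ENTRYPOINT ["node", "index.js"]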

We’re committed to a fast, transparent review process. Quality MCP servers that follow our security guidelines will be published quickly, helping you reach Docker’s 20+ million developer community.

ClickHouse is one of the first companies to take advantage of Docker’s MCP Catalog, and they opted for the Docker-built tier to ensure maximum security. Here’s why they chose to partner with Docker:

“At ClickHouse, we deliver the fastest analytics database – open-source, and designed for real-time data processing and analytics at scale. As agentic AI becomes more embedded in modern applications, developers are using the ClickHouse MCP server to support intelligent, data-driven workflows that demand low latency, high concurrency, and cost efficiency. To make it easier for developers to deploy these workloads, we’re featuring ClickHouse MCP Server on Docker’s MCP Catalog, which provides a powerful way to reach 20M+ developers and makes it easier for Docker users to discover and use our solution. We opted for “Built by Docker” with the highest security standard, including cryptographic signatures, SBOMs, provenance attestations, and continuous vulnerability scanning. Together with Docker, developers can run ClickHouse MCP Server with confidence, knowing it’s secured, verified, and ready for their agentic applications.” – Tanya Bragin, VP of Product and Marketing, ClickHouse

What’s coming next

Remote MCP servers

We’re preparing for the future of cloud-native AI applications. Remote MCP servers will enable:

Managed MCP services that scale automatically

Shared capabilities across teams without distributing code

Stricter security boundaries for sensitive operations

Integration with the official MCP registry

We’re actively collaborating with the MCP community on the upcoming official registry. Our vision is complementary:

The official registry provides centralized discovery – the “yellow pages” of available MCP servers

Docker provides the secure runtime and distribution for those listings

Together, we create a complete ecosystem where discovery and security work hand-in-hand

The path forward

The explosive growth of our MCP Catalog (1 million pulls and hundreds of publisher requests) tells us developers are ready for change. They want the power of MCP, but they need it delivered securely.

By establishing containers as the standard for MCP server distribution, we’re not trying to own the ecosystem — we’re trying to secure it. Every MCP server that moves from npx execution to containerized deployment is a win for the entire community.

Start today

Explore the enhanced MCP Catalog: Visit the MCP Catalog to discover MCP servers that solve your specific needs securely.

Use and test hundreds of MCP servers: Download Docker Desktop to use any MCP server in our catalog with your favorite clients: Gordon, Claude, Cursor, VS Code, and more.

Submit your server: Join the movement toward secure AI tool distribution. Check our submission guidelines for more.

Follow our progress: Star our repository and watch for updates on the MCP Gateway release and remote server capabilities.

Together, we’re building more than a catalog — we’re establishing the secure foundation that the MCP ecosystem needs to grow from experimental tool to production-ready platform. Because when it comes to AI applications, security isn’t optional. It’s fundamental.

Learn more

Check out our announcement blog

Find documentation for Docker MCP Catalog and Toolkit.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Quelle: https://blog.docker.com/feed/

Tool Calling with Local LLMs: A Practical Evaluation

Which local model should I use for tool calling?

When building GenAI and agentic applications, one of the most pressing and persistent questions is: “Which local model should I use for tool calling?” We kept hearing it again and again from colleagues within Docker and the developer community, ever since we started working on Docker Model Runner, a local inference engine that helps developers run and experiment with local models.

It’s a deceptively simple question with a surprisingly nuanced answer. Even when we tried to answer it for a very specific case (“What if I just expose 5 simple tools to the model?”), we realized we had no definitive answer. Local LLMs offer control, cost-efficiency, and privacy, but when it comes to structured tool use (deciding when and how to act), they can behave very differently. So we decided to dig deep and test this properly. We started with manual experimentation, then built a framework to scale our testing. This blog documents that journey and shares which models ranked highest on our tool-calling leaderboard.

The first attempt: Manual testing

Our first instinct was to build something quickly and try it out manually.

So we created chat2cart, an AI-powered shopping assistant that lets users interact via chat to build, modify, and check out a shopping cart. Through a natural conversation, users can discover products, add or remove items, and complete or cancel their purchase, all from the chat interface.

To support testing across different LLMs, we added a model selector that makes it easy to switch between local models (via Docker Model Runner or Ollama) and hosted models using the OpenAI API.

OpenAI’s GPT-4 and GPT-3.5 worked as expected, and the experience was fairly smooth. They:

Called tools when they were needed

Avoided unnecessary tool usage

Handled tool responses naturally

But the local models? That’s where the challenges started to surface.

What went wrong with local models

We started experimenting with some of the local models listed on the Berkeley Function-Calling Leaderboard. Our goal was to find smaller models, ideally with fewer than 10 billion parameters, so we tested xLAM-2-8b-fc-r and watt-tool-8B. We quickly ran into several recurring issues:

Eager invocation: Tools were being called even for greeting messages like “Hi there!”

Wrong tool selection: The model would search when it should have added, or tried to remove when the cart was empty

Invalid arguments: Parameters like product_name or quantity were missing or malformed

Ignored responses: The model often failed to respond to tool output, leading to awkward or incomplete conversations

At this point, it was clear that manual testing wouldn’t scale. Different models failed in different ways: some struggled with invocation logic, while others mishandled tool arguments or responses. Testing was not only slow but also unreliable. Because these models are non-deterministic, we had to run each scenario multiple times just to get a reliable read on behavior.

We needed a testing setup that was repeatable, measurable, and fast.

Our second attempt: A scalable testing tool

Our goal wasn’t academic rigor. It was: “Give us good-enough answers in 2–3 days, not weeks.”

In a couple of days, we created model-test, a flexible project with the following capabilities:

Define real-world test cases with multiple valid tool call sequences

Run them against many models (local & hosted)

Track tool-calling accuracy, tool selection, and latency

Log everything for analysis (or eventual fine-tuning)

How it works

The core idea behind model-test is simple: simulate realistic tool-using conversations, give the model room to reason and act, and check whether its behavior makes sense.

Each test case includes:

A prompt (e.g. “Add iPhone to cart”)

The initial cart state (optional)

One or more valid tool-call variants, because there’s often more than one right answer

Here’s a typical case:

{
  "prompt": "Add iPhone to cart",
  "expected_tools_variants": [
    {
      "name": "direct_add",
      "tools": [{ "name": "add_to_cart", "arguments": { "product_name": "iPhone" } }]
    },
    {
      "name": "search_then_add",
      "tools": [
        { "name": "search_products", "arguments": { "query": "iPhone" } },
        { "name": "add_to_cart", "arguments": { "product_name": "iPhone 15" } }
      ]
    }
  ]
}

In this case, we consider both “just add ‘iPhone’” and “search first, then add the result” as acceptable. Even though “iPhone” isn’t a real product name, we’re fine with it. We weren’t aiming for overly strict precision, just realistic behavior.

Each test case belongs to a test suite, and we provide two built-in suites:

Simple: Greetings, single-step actions

Complex: Multi-step reasoning and tool chaining

You can run an entire suite, individual test cases, or a selection of test cases, and you can also create your own custom suites to group tests as needed.

The agent loop

To make tests feel closer to how real agents behave, we simulate an agent loop of up to 5 rounds.

Example:

User: “Add iPhone 5 to cart”

Model: “Let me search for iPhone 5…”

Tool: (returns product list)

Model: “Adding product X to cart…”

Tool: (updates cart)

Model: “Done” → Great, test passed!

But if the model still wants to keep going after round 5?

That’s it, my friend: test failed. Time’s up.

Not all-or-nothing

We deliberately avoided designing tests that require perfect predictions.

We didn’t demand that the model always know the exact product name.

What mattered was: did the tool sequence make sense for the intent?

This helped us focus on the kind of reasoning and behavior we actually want in agents, not just perfect token matches.

What we measured

Our test output distills down to a final F1 score that encapsulates three core dimensions:

Tool Invocation – Did the model realize a tool was needed?

Tool Selection – Did it choose the right tool(s) and use them correctly?

Parameter Accuracy – Were the tool call arguments correct?

The F1 score is the harmonic mean of two things: precision (how often the model made valid tool calls) and recall (how often it made the tool calls it was supposed to).
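Written out, that is the standard formula: F1 = 2 × (precision × recall) / (precision + recall).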

We also tracked latency, the average runtime in seconds, but that wasn’t part of the F1 calculation; it simply helped us evaluate speed and user experience.

21 models and 3,570 tests later: Which models nailed tool calling?

We tested 21 models across 3,570 test cases using 210 batch runs.

Hardware: MacBook Pro M4 Max, 128GB RAM
Runner: test-all-models.sh

Overall Rankings (by Tool Selection F1):

Model – F1 Score
gpt-4 – 0.974
qwen3:14B-Q4_K_M – 0.971
qwen3:14B-Q6_K – 0.943
claude-3-haiku-20240307 – 0.933
qwen3:8B-F16 – 0.933
qwen3:8B-Q4_K_M – 0.919
gpt-3.5-turbo – 0.899
gpt-4o – 0.857
gpt-4o-mini – 0.852
claude-3-5-sonnet-20241022 – 0.851
llama3.1:8B-F16 – 0.835
qwen2.5:14B-Q4_K_M – 0.812
claude-3-opus-20240229 – 0.794
llama3.1:8B-Q4_K_M – 0.793
qwen2.5:7B-Q4_K_M – 0.753
gemma3:4B – 0.733
llama3.2:3B_F16 – 0.727
llama3grog:7B-Q4_K_M – 0.723
llama3.3:70B.Q4_K_M – 0.607
llama-xlam:8B-Q4_K_M – 0.570
watt-tool:8B-Q4_K_M – 0.484

Top performers

Among all models, OpenAI’s GPT-4 came out on top with a tool selection F1 score of 0.974, completing responses in just under 5 seconds on average. While hosted and not the focus of our local model exploration, it served as a reliable benchmark and provided some ground truths.

On the local side, Qwen 3 (14B) delivered outstanding results, nearly matching GPT-4 with a 0.971 F1 score, though with significantly higher latency (~142 seconds per interaction).

If you’re looking for something faster, Qwen 3 (8B) also achieved an F1 score of 0.933, while cutting latency nearly in half (~84 seconds), making it a compelling balance between speed and tool-use accuracy.

Hosted models like Claude 3 Haiku also performed very well, hitting 0.933 F1 with exceptional speed (3.56 seconds average latency), further illustrating the high bar set by cloud-based offerings.

Underperformers

Not all models handled tool calling well. The quantized Watt 8B model struggled with parameter accuracy and ended up with a tool selection F1 score of just 0.484. Similarly, the LLaMA-based xLAM 8B variant often missed the correct tool path altogether, finishing with an F1 score of 0.570. These models may be suitable for other tasks, but for our structured tool-use tests, they underdelivered.

Quantization

We also experimented with both quantized and non-quantized variants for some models, and in all cases observed no significant difference in tool-calling behavior or performance. This suggests that quantization is beneficial for reducing resource usage without negatively impacting accuracy or reasoning quality, at least for the models and scenarios we tested.

Our recommendations

If your goal is maximum tool-calling accuracy, then Qwen 3 (14B) or Qwen 3 (8B) are your best bets, both local, both precise, with the 8B variant being notably faster.

For a good trade-off between speed and performance, Qwen 2.5 stood out as a solid option. It’s fast enough to support real-time experiences, while still maintaining decent tool selection accuracy.

If you need something more lightweight, especially for resource-constrained environments, the LLaMA 3 Groq 7B variant offers modest performance at a much lower compute footprint.
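If you want to try any of these locally with Docker Model Runner, the pull commands look roughly like this; the exact tag names below are illustrative, so check hub.docker.com/u/ai for the variants that are actually published:

docker model pull ai/qwen3:8B-Q4_K_M
docker model pull ai/qwen2.5:7B-Q4_K_M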

What we learned and why this matters

Our testing confirmed that the Qwen family of models leads the pack among open-source options for tool calling. But as always, there’s a trade-off: you’ll need to balance accuracy and latency when designing your application.

Qwen models dominate: Even the 8B version of Qwen3 outperformed any other local model

Reasoning = latency: Higher-accuracy models take longer, often significantly.

Tool calling is core to almost every real-world GenAI application. Whether you’re building agents or creating agentic workflows, your LLM must know when to act and how. Thanks to this simple framework, “We don’t know which model to pick” became “We’ve narrowed it down to three great options, each with clear pros and cons.”

If you’re evaluating models for your agentic applications, skip the guesswork. Try model-test and make it your own for testing! 

Learn more

Get an inside look at the design architecture of the Docker Model Runner. 

Explore the story behind our model distribution specification

Read our quickstart guide to Docker Model Runner.

Find documentation for Model Runner.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Quelle: https://blog.docker.com/feed/

Building an Easy Private AI Assistant with Goose and Docker Model Runner

Goose is an innovative CLI assistant designed to automate development tasks using AI models. Docker Model Runner simplifies deploying AI models locally with Docker. Combining these technologies creates a powerful local environment with advanced AI assistance, ideal for coding and automation.

Looking for a seamless way to run AI-powered development tasks locally without compromising on privacy or flexibility? Look no further. By combining the power of Goose, a CLI-based AI assistant, with Docker Model Runner, you get a streamlined, developer-friendly setup for running large language models right on your machine.

Docker Model Runner makes it easy to run open-source AI models with Docker, no cloud APIs or external dependencies required. And the best part? It works out of the box with tools like Goose that expect an OpenAI-compatible interface. That means you can spin up advanced local assistants that not only chat intelligently but also automate tasks, run code, and interact with your system, without sending your data anywhere else.

In this guide, you’ll learn how to build your own AI assistant with these innovative tools. We’ll walk you through how to install Goose, configure it to work with Docker Model Runner, and unleash a private, scriptable AI assistant capable of powering real developer workflows. Whether you want to run one-off commands or schedule recurring automations, this local-first approach keeps you in control and gets things done faster.

Install Goose CLI on macOS

Goose is available on Windows, macOS, and Linux as a command-line tool, and also has a desktop application for macOS if that’s what you prefer. In this article, we’ll configure and demonstrate the CLI version on macOS.

To install Goose on macOS, you can use this handy curl-to-bash one-liner:

curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | bash

Enable Docker Model Runner

First, ensure you have Docker Desktop installed. Then, configure Docker Model Runner with your model of choice. Go to Settings > Beta features and check the checkboxes for Docker Model Runner.

By default, as a security precaution, it’s not exposed to your host machine over TCP, but we want to simplify the setup, so we’ll enable TCP support as well. The default port is 12434, so the base URL for the connection is http://localhost:12434.

Figure 1: Docker Desktop beta features settings showing how to enable port 12434

Now we can pull models from Docker Hub (hub.docker.com/u/ai) and run them. For this article, we’ll use ai/qwen3:30B-A3B-Q4_K_M because it gives a good balance of world knowledge and intelligence at just 3B active parameters:

docker model pull ai/qwen3:30B-A3B-Q4_K_M
docker model run ai/qwen3:30B-A3B-Q4_K_M

This command starts the interactive chat with the model.
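Optionally, you can sanity-check the TCP endpoint from the host with an OpenAI-style request; this is just a sketch, assuming TCP support was enabled on port 12434 as described above:

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen3:30B-A3B-Q4_K_M",
    "messages": [{"role": "user", "content": "Say hello in one short sentence."}]
  }'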

Configure Goose for Docker Model Runner

Edit your Goose config at ~/.config/goose/config.yaml:

GOOSE_MODEL: ai/qwen3:30B-A3B-Q4_K_M
GOOSE_PROVIDER: openai
extensions:
  developer:
    display_name: null
    enabled: true
    name: developer
    timeout: null
    type: builtin
GOOSE_MODE: auto
GOOSE_CLI_MIN_PRIORITY: 0.8
OPENAI_API_KEY: irrelevant
OPENAI_BASE_PATH: /engines/llama.cpp/v1/chat/completions
OPENAI_HOST: http://localhost:12434

The OPENAI_API_KEY is irrelevant as Docker Model Runner does not require authentication because the model is run locally and privately on your machine.

We provide the base path for the OpenAI-compatible API and choose the model (GOOSE_MODEL: ai/qwen3:30B-A3B-Q4_K_M) that we pulled earlier.

Testing It Out

Try Goose CLI by running goose in the terminal. You can see that it automatically connects to the correct model, and when you ask for something, you’ll see the GPU spike as well.

Figure 2: Goose CLI running in terminal, showing example of response to local prompts

We also configure Goose with the Developer extension enabled. It allows Goose to run various commands on your behalf, making it a much more powerful assistant with access to your machine, rather than just a chat application.

You can additionally provide custom hints to Goose to tweak its behaviour using a .goosehints file.
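A .goosehints file is just plain-text guidance; as a purely illustrative example, something like the following nudges Goose toward your preferences:

Prefer Docker-based workflows over installing tools directly on the host.
Ask for confirmation before deleting files or changing anything outside the project directory.
Keep shell commands POSIX-compatible.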

And what’s even better, you can script Goose to run tasks on your behalf with a simple one-liner:

goose run -t "your instructions here"

or

goose run -i instructions.md

where instructions.md is the file with what to do.

On macOS you have access to crontab for scheduling recurring scripts, so you can automate Goose with Docker Model Runner to run repeatedly and act on your behalf. For example, crontab -e will open the editor for the commands you want to run, and a line like the one below should do the trick:

5 8 * * 1-5 goose run -i fetch_and_summarize_news.md

This will make Goose run at 8:05 am every workday and follow the instructions in the fetch_and_summarize_news.md file, for example to skim the internet and prioritize news based on what you like.

Conclusion

All in all, integrating Goose with Docker Model Runner creates a simple but powerful setup for using local AI in your workflows. You can make it run custom instructions for you or easily script it to perform repetitive actions intelligently. It is all powered by a local model running in Docker Model Runner, so you don’t compromise on privacy either.

Learn more

Read our quickstart guide to Docker Model Runner.

Find documentation for Model Runner.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Quelle: https://blog.docker.com/feed/