Docker Model Runner now supports vLLM on Windows

Great news for Windows developers working with AI models: Docker Model Runner now supports vLLM on Docker Desktop for Windows with WSL2 and NVIDIA GPUs!

Until now, vLLM support in Docker Model Runner was limited to Docker Engine on Linux. With this update, Windows developers can take advantage of vLLM’s high-throughput inference capabilities directly through Docker Desktop, leveraging their NVIDIA GPUs for accelerated local AI development.

What is Docker Model Runner?

For those who haven’t tried it yet, Docker Model Runner is our new “it just works” experience for running generative AI models.

Our goal is to make running a model as simple as running a container.

Here’s what makes it great:

Simple UX: We’ve streamlined the process down to a single, intuitive command: docker model run <model-name>.

Broad GPU Support: While we started with NVIDIA, we’ve recently added Vulkan support. This is a big deal—it means Model Runner works on pretty much any modern GPU, including AMD and Intel, making AI accessible to more developers than ever.

vLLM: Perform high-throughput inference with an NVIDIA GPU

What is vLLM?

vLLM is a high-throughput inference engine for large language models. It’s designed for efficient memory management of the KV cache and excels at handling concurrent requests with impressive performance. If you’re building AI applications that need to serve multiple requests or require high-throughput inference, vLLM is an excellent choice. Learn more here.

Prerequisites

Before getting started, make sure you have the prerequisites for GPU support:

Docker Desktop for Windows (starting with Docker Desktop 4.54)

WSL2 backend enabled in Docker Desktop

NVIDIA GPU with compute capability >= 8.0 and up-to-date drivers

GPU support configured in Docker Desktop

Getting Started

Step 1: Enable Docker Model Runner

First, ensure Docker Model Runner is enabled in Docker Desktop. You can do this through the Docker Desktop settings or via the command line:

docker desktop enable model-runner --tcp 12434

Step 2: Install the vLLM Backend

To use vLLM, install the vLLM runner with CUDA support:

docker model install-runner --backend vllm --gpu cuda

Step 3: Verify the Installation

Check that both inference engines are running:

docker model status

You should see output similar to:

Docker Model Runner is running

Status:
llama.cpp: running llama.cpp version: c22473b
vllm: running vllm version: 0.12.0

Step 4: Run a Model with vLLM

Now you can pull and run models optimized for vLLM. Models with the -vllm suffix on Docker Hub are packaged for vLLM:

docker model run ai/smollm2-vllm "Tell me about Docker."

Troubleshooting Tips

GPU Memory Issues

If you encounter an error like:

ValueError: Free memory on device (6.96/8.0 GiB) on startup is less than desired GPU memory utilization (0.9, 7.2 GiB).

You can configure the GPU memory utilization for a specific model:

docker model configure --gpu-memory-utilization 0.7 ai/smollm2-vllm

This reduces the memory footprint, allowing the model to run alongside other GPU workloads.
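The arithmetic behind that error, and the fix, is straightforward. Here is a minimal sketch in Python (the helper function is ours for illustration; it is not part of the Docker or vLLM CLI):

```python
def fits(total_gib: float, free_gib: float, utilization: float) -> bool:
    """Return True if the vLLM GPU memory target fits in currently free VRAM.

    At startup, vLLM tries to reserve roughly `utilization * total_gib`
    of device memory, so that target must not exceed what is free.
    """
    return utilization * total_gib <= free_gib

# The failing case from the error above: 0.9 * 8.0 = 7.2 GiB requested,
# but only 6.96 GiB free on the device.
assert not fits(total_gib=8.0, free_gib=6.96, utilization=0.9)

# After lowering utilization to 0.7: 0.7 * 8.0 = 5.6 GiB, which fits.
assert fits(total_gib=8.0, free_gib=6.96, utilization=0.7)
```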

Why This Matters

This update brings several benefits for Windows developers:

Production parity: Test with the same inference engine you’ll use in production

Unified workflow: Stay within the Docker ecosystem you already know

Local development: Keep your data private and reduce API costs during development

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!
Source: https://blog.docker.com/feed/

Docker Joins the Agentic AI Foundation

Today, the Linux Foundation launched the Agentic AI Foundation with three founding projects: Anthropic’s Model Context Protocol (MCP), Block’s goose agent framework, and OpenAI’s AGENTS.md standard.

The foundation brings together the companies building the infrastructure layer for agents: Anthropic, Block, OpenAI, Amazon, Google, Microsoft, Cloudflare, and Bloomberg, alongside key tooling and platform companies. 

Docker is joining as a Gold member.

From Open Source to Production

The timing reflects how quickly the space has matured. A year ago, MCP launched as an open source project from Anthropic, solving a specific problem: how AI systems connect to tools and data. It’s now running on 10,000+ public servers and adopted across Claude, ChatGPT, Cursor, Copilot, VS Code, and Gemini.

Six months ago, companies started deploying agents that take real actions: triggering builds, accessing databases, modifying infrastructure, executing workflows. That shift from prototype to production created new questions around protocols and governance.

Today, foundational protocols that helped answer those questions, protocols like MCP, are moving to the Linux Foundation under the same governance structure that stewards Linux and PyTorch.

Why Neutral Governance Matters

When infrastructure becomes critical, developers won’t build on protocols that could change arbitrarily. And larger teams and enterprises want shared standards.

Over the past year we’ve partnered with Anthropic, Block, and other key players in the AI ecosystem to help create and embrace standards like MCP, Goose, and AGENTS.md. The Agentic AI Foundation creates a structure for the industry to unite behind these standards, building an ecosystem of interoperable tools that benefit developers.

Docker is excited to join as an active Gold member to drive innovation in developer-first, secure tools across our ecosystem.

What Happens Next

The protocols exist. Adoption is happening. The foundation ensures these protocols evolve transparently, with input from everyone building on them.

Docker helped build that structure for applications. Now we’re doing it for agents.

Learn more at aaif.io
Source: https://blog.docker.com/feed/

From Compose to Kubernetes to Cloud: Designing and Operating Infrastructure with Kanvas

Docker has long been the simplest way to run containers. Developers start with a docker-compose.yml file, run docker compose up, and get things running fast.

As teams grow and workloads expand into Kubernetes and integrate into cloud services, simplicity fades. Kubernetes has become the operating system of the cloud, but your clusters rarely live in isolation. Real-world platforms are a complex intermixing of proprietary cloud services – AWS S3 buckets, Azure Virtual Machines, Google Cloud SQL databases – all running alongside your containerized workloads. You and your teams are working with clusters and clouds in a sea of YAML.

Managing this hybrid sprawl often means context switching between Docker Desktop, the Kubernetes CLI, cloud provider consoles, and infrastructure as code. Simplicity fades as you juggle multiple distinct tools.

Bringing clarity back from this chaos is the new Docker Kanvas Extension from Layer5 – a visual, collaborative workspace built right into Docker Desktop that allows you to design, deploy, and operate not just Kubernetes resources, but your entire cloud infrastructure across AWS, GCP, and Azure.

What Is Kanvas?

Kanvas is a collaborative platform designed for engineers to visualize, manage, and design multi-cloud and Kubernetes-native infrastructure. Kanvas transforms the concept of infrastructure as code into infrastructure as design. This means your architecture diagram is no longer just documentation – it is the source of truth that drives your deployment. Built on top of Meshery (one of the Cloud Native Computing Foundation’s highest-velocity open source projects), Kanvas moves beyond simple Kubernetes manifests by using Meshery Models – definitions that describe the properties and behavior of specific cloud resources. This allows Kanvas to support a massive catalog of Infrastructure-as-a-Service (IaaS) components: 

AWS: 55+ services (e.g., EC2, Lambda, RDS, DynamoDB).

Azure: 50+ components (e.g., Virtual Machines, Blob Storage, VNet).

GCP: 60+ services (e.g., Compute Engine, BigQuery, Pub/Sub).

Kanvas bridges the gap between abstract architecture and concrete operations through two integrated modes: Designer and Operator.

Designer Mode (declarative mode)

Designer mode serves as a “blueprint studio” for cloud architects and DevOps teams, emphasizing declarative modeling – describing what your infrastructure should look like rather than how to build it step-by-step – making it ideal for GitOps workflows and team-based planning. 

Build and iterate collaboratively: Add annotations, comments for design reviews, and connections between components to visualize data flows, architectures, and relationships.

Dry-run and validate deployments: Before touching production, simulate your deployments by performing a dry-run to verify that your configuration is valid and that you have the necessary permissions. 

Import and export: Import brownfield designs by connecting your existing clusters or by importing Helm charts from your GitHub repositories.

Reuse patterns, clone, and share: Pick from a catalog of reference architectures, sample configurations, and infrastructure templates, so you can start from proven blueprints rather than a blank design. Share designs just as you would a Google Doc. Clone designs just as you would a GitHub repo. Merge designs just as you would in a pull request.

Operator Mode (imperative mode)

Kanvas Operator mode transforms static diagrams into live, managed infrastructure. When you switch to Operator mode, Kanvas stops being a configuration tool and becomes an active infrastructure console, using Kubernetes controllers (like AWS Controllers for Kubernetes (ACK) or Google Config Connector) to actively manage your designs.

Operator mode allows you to:

Load testing and performance management: With Operator’s built-in load generator, you can execute stress tests and characterize service behavior by analyzing latency and throughput against predefined performance profiles, establishing baselines to measure the impact of infrastructure configuration changes made in Designer mode.

Multi-player, interactive terminal: Open a shell session with your containers and execute commands, stream and search container logs without leaving the visual topology. Streamline your troubleshooting by sharing your session with teammates. Stay in-context and avoid context-switching to external command-line tools like kubectl.

Integrated observability: Use the Prometheus integration to overlay key performance metrics (CPU usage, memory, request latency) and quickly spot “hotspots” in your architecture visually. Import your existing Grafana dashboards for deeper analysis.

Multi-cluster, multi-cloud operations: Connect multiple Kubernetes clusters (across different clouds or regions) and manage workloads that span a GKE cluster and an EKS cluster, all from a single topology view in the Kanvas interface.

While Kanvas Designer mode is about intent (what you want to build), Operator mode is about reality (what is actually running). Designer mode and Operator mode are simply two tightly integrated sides of the same coin.

With this understanding, let’s see both modes in-action in Docker Desktop.

Walk-Through: From Compose to Kubernetes in Minutes

With the Docker Kanvas extension (install from Docker Hub), you can take any existing Docker Compose file and instantly see how it translates into Kubernetes, making it incredibly easy to understand, extend, and deploy your application at scale.

The Docker Samples repository offers a plethora of samples. Let’s use the Spring-based PetClinic example below. 

# sample docker-compose.yml

services:
  petclinic:
    build:
      context: .
      dockerfile: Dockerfile.multi
      target: development
    ports:
      - 8000:8000
      - 8080:8080
    environment:
      - SERVER_PORT=8080
      - MYSQL_URL=jdbc:mysql://mysqlserver/petclinic
    volumes:
      - ./:/app
    depends_on:
      - mysqlserver

  mysqlserver:
    image: mysql:8
    ports:
      - 3306:3306
    environment:
      - MYSQL_ROOT_PASSWORD=
      - MYSQL_ALLOW_EMPTY_PASSWORD=true
      - MYSQL_USER=petclinic
      - MYSQL_PASSWORD=petclinic
      - MYSQL_DATABASE=petclinic
    volumes:
      - mysql_data:/var/lib/mysql
      - mysql_config:/etc/mysql/conf.d

volumes:
  mysql_data:
  mysql_config:

With your Docker Kanvas extension installed:

Import sample app: Save the PetClinic docker-compose.yml file to your computer, then click to import or drag and drop the file onto Kanvas.

Kanvas renders an interactive topology of your stack showing services, dependencies (like MySQL), volumes, ports, and configurations, all mapped to their Kubernetes equivalents. Kanvas performs this rendering in phases, applying an increasing degree of scrutiny in the evaluation performed in each phase. Let’s explore the specifics of this tiered evaluation process in a moment.

Enhance the PetClinic design

From here, you can enhance the generated design in a visual, no-YAML way:

Add a LoadBalancer, Ingress, or ConfigMap

Configure Secrets for your database URL or sensitive environment variables

Modify service relationships or attach new components

Add comments or any other annotations.

Importantly, Kanvas saves your design as you make changes. This gives you production-ready deployment artifacts generated directly from your Compose file.

Deploy to a cluster

With one click, deploy the design to any cluster connected to Docker Desktop or any other remote cluster. Kanvas handles the translation and applies your configuration.

Switch modes and interact with your app

After deploying (or when managing an existing workload), switch to Operator mode to observe and manage your deployed design. You can:

Inspect Deployments, Services, Pods, and their relationships.

 Open a terminal session with your containers for quick debugging.

 Tail and search your container logs and monitor resource metrics.

 Generate traffic and analyze the performance of your deployment under heavy load.

Share your Operator View with teammates for collaborative management.

Within minutes, a Compose-based project becomes a fully managed Kubernetes workload, all without leaving Docker Desktop. This seamless flow from a simple Compose file to a fully managed, operable workload highlights the ease with which infrastructure can be visually managed, leading us to consider the underlying principle of Infrastructure as Design.

Infrastructure as Design

Infrastructure as design elevates the visual layout of your stack to be the primary driver of its configuration: the act of adjusting the proximity and connectedness of components is one and the same as the process of configuring your infrastructure. In other words, the presence, absence, proximity, or connectedness of individual components (all of which affect how one component relates to another) augments the underlying configuration of each. Kanvas is highly intelligent in this way: it understands at a granular level of detail how each individual component relates to all other components, and it augments the configuration of those components accordingly.

Kanvas renders the topology of your stack’s architecture in phases. The initial rendering performs a lightweight analysis of each component, establishing a baseline for the contents of your new design. A subsequent rendering phase applies a higher level of sophistication: Kanvas introspects the configuration of each of your stack’s components and their interdependencies, and proactively evaluates the manner in which each component relates to the others. Kanvas will add, remove, and update the configuration of your components as a result of this relationship evaluation.

This process of relationship evaluation is ongoing. Every time you make a change to your design, Kanvas re-evaluates each component configuration.

For example, if you bring a Kubernetes Deployment into the vicinity of a Kubernetes Namespace, you will find that one magnetizes to the other: your Deployment is visually placed inside the Namespace and, at the same time, the Deployment’s configuration is mutated to include its new Namespace designation. Kanvas proactively evaluates and mutates the configuration of the infrastructure resources in your design as you make changes.
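As a toy illustration of that mutation (the function and the dict shapes below are ours, not Kanvas’s actual engine or data model), dropping a Deployment into a Namespace amounts to patching its metadata:

```python
def place_in_namespace(resource: dict, namespace: dict) -> dict:
    """Toy relationship evaluation: placing a resource inside a Namespace
    mutates its configuration to reference that Namespace.

    Illustrative only; Kanvas's real engine evaluates far richer,
    model-driven relationships than this single field.
    """
    updated = {**resource}
    metadata = {**updated.get("metadata", {})}
    metadata["namespace"] = namespace["metadata"]["name"]
    updated["metadata"] = metadata
    return updated

deployment = {"kind": "Deployment", "metadata": {"name": "petclinic"}}
ns = {"kind": "Namespace", "metadata": {"name": "staging"}}

placed = place_in_namespace(deployment, ns)
assert placed["metadata"]["namespace"] == "staging"
assert placed["metadata"]["name"] == "petclinic"  # original fields preserved
```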

This ability for Kanvas to intelligently interpret and adapt to changes in your design—automatically managing configuration and relationships—is the key to achieving infrastructure as design. This power comes from a sophisticated system that gives Kanvas a level of intelligence, but with the reliability of a policy-driven engine.

AI-like Intelligence, Anchored by Deterministic Truth

In an era where generative AI dramatically accelerates infrastructure design, the risk of “hallucinations”—plausible but functionally invalid configurations—remains a critical bottleneck. Kanvas solves this by pairing the generative power of AI with a rigid, deterministic policy engine.

This engine acts as an architectural guardrail, offering you precise control over the degree to which AI is involved in assessing configuration correctness. It transforms designs from simple visual diagrams into validated, deployable blueprints.

While AI models function probabilistically, Kanvas’s policy engine functions deterministically, automatically analyzing designs to identify, validate, and enforce connections between components based on ground-truth rules. Each of these rules is statically defined and versioned in its respective Kanvas model.

Deep Contextualization: The evaluation goes beyond simple visualization. It treats relationships as context-aware and declarative, interpreting how components interact (e.g., data flows, dependencies, or resource sharing) to ensure designs are not just imaginative, but deployable and compliant.

Semantic Rigor: The engine distinguishes between semantic relationships (infrastructure-meaningful, such as a TCP connection that auto-configures ports) and non-semantic relationships (user-defined visuals, like annotations). This ensures that aesthetic choices never compromise infrastructure integrity.

Kanvas acknowledges that trust is not binary. You maintain sovereignty over your designs through granular controls that dictate how the engine interacts with AI-generated suggestions:

“Human-in-the-Loop” Slider: You can modulate the strictness of the policy evaluation. You might allow the AI to suggest high-level architecture while enforcing strict policies on security configurations (e.g., port exposure or IAM roles).

Selective Evaluation: You can disable evaluations via preferences for specific categories. For example, you may trust the AI to generate a valid Kubernetes Service definition, but rely entirely on the policy engine to validate the Ingress controller linking to it.

Kanvas does not just flag errors; it actively works to resolve them using sophisticated detection and correction strategies.

Intelligent Scanning: The engine scans for potential relationships based on component types, kinds, and subtypes (e.g., a Deployment linking to a Service via port exposure), catching logical gaps an AI might miss.

Patches and Resolvers: When a partial or a hallucinated configuration is detected, Kanvas applies patches to either propagate missing configuration or dynamically adjusts configurations to resolve conflicts, ensuring the final infrastructure-as-code export (e.g., Kubernetes manifests, Helm chart) is clean, versionable, and secure.

Turn Complexity into Clarity

Kanvas takes the guesswork out of managing modern infrastructure. For developers used to Docker Compose, it offers a natural bridge to Kubernetes and cloud services — with visibility and collaboration built in.

Import and Deploy Compose Apps: Move from Compose, Helm, or Kustomize to Kubernetes in minutes.

Visual Designer: Understand your architecture through connected, interactive diagrams.

Design Catalog: Use ready-made templates and proven infrastructure patterns.

Terminal Integration: Debug directly from the Kanvas UI, without switching tools.

Sharable Views: Collaborate on live infrastructure with your team.

Multi-Environment Management: Operate across local, staging, and cloud clusters from one dashboard.

Kanvas brings visual design and real-time operations directly into Docker Desktop. Import your Compose files, Kubernetes Manifests, Helm Charts, and Kustomize files to explore the catalog of ready-to-use architectures, and deploy to Kubernetes in minutes — no YAML wrangling required.

Designs can also be exported in a variety of formats, including as OCI-compliant images and shared through registries like Docker Hub, GitHub Container Registry, or AWS ECR — keeping your infrastructure as design versioned and portable.

Install the Kanvas Extension from Docker Hub and start designing your infrastructure today.
Source: https://blog.docker.com/feed/

Announcing vLLM v0.12.0, Ministral 3 and DeepSeek-V3.2 for Docker Model Runner

At Docker, we are committed to making the AI development experience as seamless as possible. Today, we are thrilled to announce two major updates that bring state-of-the-art performance and frontier-class models directly to your fingertips: the immediate availability of Mistral AI’s Ministral 3 and DeepSeek-V3.2, alongside the release of vLLM v0.12.0 on Docker Model Runner.

Whether you are building high-throughput serving pipelines or experimenting with edge-optimized agents on your laptop, today’s updates are designed to accelerate your workflow.

Meet Ministral 3: Frontier Intelligence, Edge Optimized

While vLLM powers your production infrastructure, we know that development needs speed and efficiency right now. That’s why we are proud to add Mistral AI’s newest marvel, Ministral 3, to the Docker Model Runner library on Docker Hub.

Ministral 3 is Mistral AI’s premier edge model. It packs frontier-level reasoning and capabilities into a dense, efficient architecture designed specifically for local inference. It is perfect for:

Local RAG applications: Chat with your docs without data leaving your machine.

Agentic Workflows: Fast reasoning steps for complex function-calling agents.

Low-latency prototyping: Test ideas instantly without waiting for API calls.

DeepSeek-V3.2: The Open Reasoning Powerhouse

We are equally excited to introduce support for DeepSeek-V3.2. Known for pushing the boundaries of what open-weights models can achieve, the DeepSeek-V3 series has quickly become a favorite for developers requiring high-level reasoning and coding proficiency.

DeepSeek-V3.2 brings Mixture-of-Experts (MoE) architecture efficiency to your local environment, delivering performance that rivals top-tier closed models. It is the ideal choice for:

Complex Code Generation: Build and debug software with a model specialized in programming tasks.

Advanced Reasoning: Tackle complex logic puzzles, math problems, and multi-step instructions.

Data Analysis: Process and interpret structured data with high precision.

Run Them with One Command

With Docker Model Runner, you don’t need to worry about complex environment setups, Python dependencies, or weight downloads. We’ve packaged both models so you can get started immediately.

To run Ministral 3:

docker model run ai/ministral3

To run DeepSeek-V3.2:

docker model run ai/deepseek-v3.2-vllm

These commands automatically pull the model, set up the runtime, and drop you into an interactive chat session. You can also point your applications to them using our OpenAI-compatible local endpoint, making them drop-in replacements for your cloud API calls during development.
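As a sketch of that drop-in usage, here is a minimal Python client against the local endpoint. It assumes the TCP endpoint is enabled on port 12434 and that your installation exposes an OpenAI-compatible chat completions route under /engines/v1; adjust the base URL to match your setup:

```python
import json
import urllib.request

# Assumed local endpoint; adjust to how you enabled Model Runner's TCP port.
BASE_URL = "http://localhost:12434/engines/v1"

def chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

body = chat_request("ai/ministral3", "Tell me about Docker.")

# Send the request. Wrapped in try/except so the sketch still runs
# without a live Model Runner on this machine.
try:
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
except OSError as exc:
    print(f"Model Runner not reachable: {exc}")
```

Because the endpoint speaks the OpenAI wire format, the same request body works with any OpenAI-compatible client library by pointing its base URL at Model Runner.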

vLLM v0.12.0: Faster, Leaner, and Ready for What’s Next

We are excited to highlight the release of vLLM v0.12.0. vLLM has quickly become the gold standard for high-throughput and memory-efficient LLM serving, and this latest version raises the bar again.

Version 0.12.0 brings critical enhancements to the engine, including:

Expanded Model Support: Day-0 support for the latest architecture innovations, ensuring you can run the newest open-weights models (like DeepSeek V3.2 and Ministral 3) the moment they drop.

Optimized Kernels: Significant latency reductions for inference on NVIDIA GPUs, making your containerized AI applications snappier than ever.

Enhanced PagedAttention: Further optimizations to memory management, allowing you to batch more requests and utilize your hardware to its full potential.

Why This Matters

The combination of Ministral 3, DeepSeek-V3.2, and vLLM v0.12.0 represents the maturity of the open AI ecosystem.

You now have access to a serving engine that maximizes data center performance, alongside a choice of models to fit your specific needs—whether you prioritize the edge-optimized speed of Ministral 3 or the deep reasoning power of DeepSeek-V3.2. All of this is easily accessible via Docker Model Runner.

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!
Source: https://blog.docker.com/feed/

Docker, JetBrains, and Zed: Building a Common Language for Agents and IDEs

As agents become capable enough to write and refactor code, they should work natively inside the environments developers work in: editors. 

That’s why JetBrains and Zed are co-developing ACP, the Agent Client Protocol. ACP gives agents and editors a shared language, so any agent can read context, take actions, and respond intelligently without bespoke wiring for every tool.

Why it matters

Every protocol that’s reshaped development (LSP for language tools, MCP for AI context) works the same way: define the standard once, unlock the ecosystem. ACP does this for the editor itself. Write an agent that speaks ACP, and it works in JetBrains, Zed, or anywhere else that adopts the protocol. 

Docker’s contribution

Docker’s cagent, an open-source multi-agent runtime, already supports ACP, alongside Claude Code, Codex CLI, and Gemini CLI. Agents built with cagent can run in any ACP-compatible IDE, like JetBrains, immediately.

We’ve also shipped Dynamic MCPs, letting agents discover and compose tools at runtime, surfaced directly in the editor where developers work.

What’s next

ACP is early, but the direction is clear. As agents embed deeper into workflows, the winners will be tools that interoperate. Open standards let everyone build on shared foundations instead of custom glue.

Docker will continue investing in ACP and standards that make development faster, more open, and more secure. When code, context, and automation converge, shared protocols ensure we move forward together.
Source: https://blog.docker.com/feed/

Securing the Docker MCP Catalog: Commit Pinning, Agentic Auditing, and Publisher Trust Levels

Trust is the most important consideration when you connect AI assistants to real tools. While MCP containerization provides strong isolation and limits the blast radius of malfunctioning or compromised servers, we’re continuously strengthening trust and security across the Docker MCP solutions to further reduce exposure to malicious code. As the MCP ecosystem scales from hundreds to tens of thousands of servers (and beyond), we need stronger mechanisms to prove what code is running, how it was built, and why it’s trusted.

To strengthen trust across the entire MCP lifecycle, from submission to maintenance to daily use, we’ve introduced three key enhancements:

Commit Pinning: Every Docker-built MCP server in the Docker MCP Registry (the source of truth for the MCP Catalog) is now tied to a specific Git commit, making each release precisely attributable and verifiable.

Automated, AI-Audited Updates: A new update workflow keeps submitted MCP servers current, while agentic reviews of incoming changes make vigilance scalable and traceable.

Publisher Trust Levels: We’ve introduced clearer trust indicators in the MCP Catalog, so developers can easily distinguish between official, verified servers and community-contributed entries.

These updates raise the bar on transparency and security for everyone building with and using MCP at scale with Docker.

Commit pins for local MCP servers

Local MCP servers in the Docker MCP Registry are now tied to a specific Git commit with source.commit. That commit hash is a cryptographic fingerprint for the exact revision of the server code that we build and publish. Without this pinning, a reference like latest or a branch name would build whatever happens to be at that reference right now, making builds non-deterministic and vulnerable to supply chain attacks if an upstream repository is compromised. Even Git tags aren’t really immutable since they can be deleted and recreated to point to another commit. By contrast, commit hashes are cryptographically linked to the content they address, making the outcome of an audit of that commit a persistent result.

To make things easier, we’ve updated our authoring tools (like the handy MCP Registry Wizard) to automatically add this commit pin when creating a new server entry, and we now enforce the presence of a commit pin in our CI pipeline (missing or malformed pins will fail validation). This enforcement is deliberate: it’s impossible to accidentally publish a server without establishing clear provenance for the code being distributed. We also propagate the pin into the MCP server image metadata via the org.opencontainers.image.revision label for traceability.
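The kind of check the CI pipeline enforces can be sketched in a few lines of Python (illustrative only; this is not the registry’s actual validation code). A valid pin is a full 40-character hexadecimal SHA-1 commit hash, and nothing else:

```python
import re

# A full 40-character lowercase hexadecimal SHA-1 hash. Branch names, tags,
# and abbreviated hashes are all rejected, because they are mutable or
# ambiguous references.
COMMIT_PIN = re.compile(r"^[0-9a-f]{40}$")

def valid_commit_pin(value: str) -> bool:
    """Return True only for a full, immutable commit hash."""
    return bool(COMMIT_PIN.fullmatch(value))

assert valid_commit_pin("7bace1f81455088b6690a44e99cabb602259ddf7")
assert not valid_commit_pin("latest")    # mutable reference
assert not valid_commit_pin("v1.2.3")    # tags can be re-pointed
assert not valid_commit_pin("7bace1f")   # abbreviated hash
```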

Here’s an example of what this looks like in the registry:

# servers/aws-cdk-mcp-server/server.yaml
name: aws-cdk-mcp-server
image: mcp/aws-cdk-mcp-server
type: server
meta:
  category: devops
  tags:
    - aws-cdk-mcp-server
    - devops
about:
  title: AWS CDK
  description: AWS Cloud Development Kit (CDK) best practices, infrastructure as code patterns, and security compliance with CDK Nag.
  icon: https://avatars.githubusercontent.com/u/3299148?v=4
source:
  project: https://github.com/awslabs/mcp
  commit: 7bace1f81455088b6690a44e99cabb602259ddf7
  directory: src/cdk-mcp-server

And here’s an example of how you can verify the commit pin for a published MCP server image:

$ docker image inspect mcp/aws-cdk-mcp-server:latest \
    --format '{{index .Config.Labels "org.opencontainers.image.revision"}}'
7bace1f81455088b6690a44e99cabb602259ddf7

In fact, if you have the cosign and jq commands available, you can perform additional verifications:

$ COSIGN_REPOSITORY=mcp/signatures cosign verify mcp/aws-cdk-mcp-server \
    --key https://raw.githubusercontent.com/docker/keyring/refs/heads/main/public/mcp/latest.pub \
    | jq -r '.[].optional["org.opencontainers.image.revision"]'

Verification for index.docker.io/mcp/aws-cdk-mcp-server:latest --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - The signatures were verified against the specified public key
7bace1f81455088b6690a44e99cabb602259ddf7

Keeping in sync

Once a server is in the registry, we don’t want maintainers to have to hand-edit pins every time they merge something into their upstream repos (they have better things to do with their time). Instead, a new automated workflow scans upstreams nightly, bumps source.commit when there’s a newer revision, and opens an auditable PR in the registry to track the incoming upstream changes. This gives you the security benefits of pinning (immutable references to reviewed code) without the maintenance toil. Updates still flow through pull requests, so you get a review gate and approval trail showing exactly what new code is entering your supply chain. The update workflow operates on a per-server basis, with each server update getting its own branch and pull request.

This raises the question, though: how do we know that the incoming changes are safe?

AI in the review loop, humans in charge

Every proposed commit pin bump (and any new local server) will now be subject to an agentic AI security review of the incoming upstream changes. The reviewers (Claude Code and OpenAI Codex) analyze MCP server behavior, flagging risky or malicious code, adding structured reports to the PR, and offering standardized labels such as security-risk:high or security-blocked. Humans remain in the loop for final judgment, but the agents are relentless and scalable.

The challenge: untrusted code means untrusted agents

When you run AI agents in CI to analyze untrusted code, you face a fundamental problem: the agents themselves become attack vectors. They’re susceptible to prompt injection through carefully crafted code comments, file names, or repository structure. A malicious PR could attempt to manipulate the reviewing agent into approving dangerous changes, exfiltrating secrets, or modifying the review process itself.

We can’t trust the code under review, but we also can’t fully trust the agents reviewing it.

Isolated agents

Our Compose-based security reviewer architecture addresses this trust problem by treating the AI agents as untrusted components. The agents run inside heavily isolated Docker containers with tightly controlled inputs and outputs:

The code being audited is mounted read-only — The agent can analyze code but never modify it. Moreover, the code it audits is just a temporary copy of the upstream repository, but the read-only access means that the agent can’t do something like modify a script that might be accidentally executed outside the container.

The agent can only write to an isolated output directory — Once the output is written, the CLI wrapper for the agent only extracts specific files (a Markdown report and a text file of labels, both with fixed names), meaning any malicious scripts or files that might be written to that directory are deleted.

The agent lacks direct Internet access — the reviewer container cannot reach external services.

CI secrets and API credentials never enter the reviewer container — Instead, a lightweight reverse proxy on a separate Docker network accepts requests from the reviewer, injects inference provider API keys on outbound requests, and shields those keys from the containerized code under review.

All of this is encapsulated in a Docker Compose stack and wrapped by a convenient CLI that allows running the agent both locally and in CI.
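As a rough sketch, the isolation boundaries described above could be expressed in Compose along these lines (service names, images, and paths here are illustrative, not the actual stack):

services:
  reviewer:
    image: example/security-reviewer        # hypothetical reviewer image
    networks: [proxy-net]                   # only network: the proxy; no direct Internet
    volumes:
      - ./checkout:/workspace/input/repository:ro   # code under review, read-only
      - ./out:/workspace/output                      # the only writable location
    # Note: no API keys or CI secrets are configured on this container.
  llm-proxy:
    image: example/key-injecting-proxy      # hypothetical reverse proxy
    networks: [proxy-net, egress-net]       # bridges the reviewer to inference providers
    environment:
      - INFERENCE_API_KEY                   # injected into outbound requests only

networks:
  proxy-net:
    internal: true                          # reviewer traffic can only reach the proxy
  egress-net: {}

The key design choice is the internal network: the reviewer can talk to the proxy, and only the proxy can reach the outside world with the credentials attached.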

Most importantly, this architecture ensures that even if a malicious PR successfully manipulates the agent through prompt injection, the damage is contained: the agent cannot access secrets, cannot modify code, and cannot communicate with external attackers.

CI integration and GitHub Checks

The review workflow is automatically triggered when a PR is opened or updated. We still maintain some control over these workflows for external PRs, requiring manual triggering to prevent malicious PRs from exhausting inference API credits. These reviews surface directly as GitHub Status Checks, with each server being reviewed receiving dedicated status checks for any analyses performed.

The resulting check status maps to the associated risk level determined by the agent: critical findings result in a failed check that blocks merging, high and medium findings produce neutral warnings, while low and info findings pass. We’re still tuning these criteria (since we’ve asked the agents to be extra pedantic) and currently reviewing the reports manually, but eventually we’ll have the heuristics tuned to a point where we can auto-approve and merge most updated PRs. In the meantime, these reports serve as a scalable “canary in the coal mine”, alerting Docker MCP Registry maintainers to incoming upstream risks — both malicious and accidental.
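The mapping from risk level to check status can be summarized as a small lookup table. This is a hypothetical encoding of the policy described above, not the actual CI code:

```python
# Hypothetical mapping of agent-assigned risk level to GitHub check conclusion,
# following the policy described in the text (not the actual CI implementation).
CHECK_CONCLUSION = {
    "critical": "failure",  # failed check: blocks merging
    "high": "neutral",      # warning, does not block
    "medium": "neutral",
    "low": "success",
    "info": "success",
}

def check_for(risk: str) -> str:
    # Unknown levels fail closed rather than slipping through.
    return CHECK_CONCLUSION.get(risk, "failure")

print(check_for("critical"), check_for("medium"), check_for("low"))
```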

It’s worth noting that the agent code in the MCP Registry repository is just an example (but a functional one available under an MIT License). The actual security review agent that we run lives in a private repository with additional isolation, but it follows the same architecture.

Reports and risk labels

Here’s an example of a report our automated reviewers produced:

# Security Review Report

## Scope Summary
- **Review Mode:** Differential
- **Repository:** /workspace/input/repository (stripe)
- **Head Commit:** 4eb0089a690cb60c7a30c159bd879ce5c04dd2b8
- **Base Commit:** f495421c400748b65a05751806cb20293c764233
- **Commit Range:** f495421c400748b65a05751806cb20293c764233...4eb0089a690cb60c7a30c159bd879ce5c04dd2b8
- **Overall Risk Level:** MEDIUM

## Executive Summary

This differential review covers 23 commits introducing significant changes to the Stripe Agent Toolkit repository, including: folder restructuring (moving tools to a tools/ directory), removal of evaluation code, addition of new LLM metering and provider packages, security dependency updates, and GitHub Actions workflow permission hardening.

The reviewers can produce both differential analyses (looking at the changes brought in by a specific set of upstream commits) and full analyses (looking at entire codebases). We intend to run differential analyses for PRs and full analyses on a regular schedule.

Why behavioral analysis matters

Traditional scanners remain essential, but they tend to focus on things like dependencies with known CVEs, syntactic errors (such as a missing break in a switch statement), or memory-safety issues (such as dereferencing an uninitialized pointer). MCP requires us to also examine how code behaves. Consider the recent malicious postmark-mcp package impersonation: a one-line backdoor quietly BCC’d outgoing emails to an attacker. Events like this reinforce why our registry couples provenance with behavior-aware reviews before updates ship.

Real-world results

In our scans so far, we’ve already found several real-world issues in upstream projects (stay tuned for a follow-up blog post), both in MCP servers and with a similar agent in our Docker Hardened Images pipeline. We’re happy to say that we haven’t run across anything malicious so far, just logic errors with security implications, but the granularity and subtlety of issues that these agents can identify is impressive.

Trust levels in the Docker MCP Catalog

In addition to the aforementioned technical changes, we’ve also introduced publisher trust levels in the Docker MCP Catalog, exposing them in both the Docker MCP Toolkit in Docker Desktop and on Docker MCP Hub. Each server will now have an associated icon indicating whether the server is from a “known publisher” or maintained by the community. In both cases, we’ll still subject the code to review, but these indicators should provide additional context on the origin of the MCP server.

Figure 1: An example of an MCP server from a known, trusted publisher: the AWS Terraform MCP server

Figure 2: The Fetch MCP server, an example of an MCP community server

What does this mean for the community?

Publishers now benefit from a steady stream of upstream improvements, backed by a documented, auditable trail of code changes. Commit pins make each release precisely attributable, while the nightly updater keeps the catalog current with no extra effort from publishers or maintainers. AI-powered reviewers scale our vigilance, freeing up human reviewers to focus on the edge cases that matter most.

At the same time, developers using MCP servers get clarity about a server’s publisher, making it easier to distinguish between official, community, and third-party contributions. These enhancements strengthen trust and security for everyone contributing to or relying on MCP servers in the Docker ecosystem.

Submit your MCP servers to Docker by following the submission guidance here!

Learn more

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers.

Get started with the MCP Toolkit: Run MCP servers easily and securely.

Find documentation for Docker MCP Catalog and Toolkit.

Source: https://blog.docker.com/feed/

Building AI agents shouldn’t be hard. According to theCUBE Research, Docker makes it easy

For most developers, getting started with AI is still too complicated. Different models, tools, and platforms don’t always play nicely together. But with Docker, that’s changing fast.

Docker is emerging as essential infrastructure for standardized, portable, and scalable AI environments. By bringing composability, simplicity, and GPU accessibility to the agentic era, Docker is helping developers and the enterprises they support move faster, safer, and with far less friction. 

Real results: Faster AI delivery with Docker

The platform is accelerating innovation: According to the latest report from theCUBE Research, 88% of respondents reported that Docker reduced the time-to-market for new features or products, with nearly 40% achieving efficiency gains of more than 25%. Docker is playing an increasingly vital role in AI development as well. 52% of respondents cut AI project setup time by over 50%, while 97% report increased speed for new AI product development.

Reduced AI project failures and delays

Reliability remains a key performance indicator for AI initiatives, and Docker is proving instrumental in minimizing risk. 90% of respondents indicated that Docker helped prevent at least 10% of project failures or delays, while 16% reported prevention rates exceeding 50%. Additionally, 78% significantly improved testing and validation of AI models. These results highlight how Docker’s consistency, isolation, and repeatability not only speed development but also reduce costly rework and downtime, strengthening confidence in AI project delivery.

Build, share, and run agents with Docker, easily and securely

Docker’s mission for AI is simple: make building and running AI and agentic applications as easy, secure, and shareable as any other kind of software.

Instead of wrestling with fragmented tools, developers can now rely on Docker’s trusted, container-based foundation with curated catalogs of verified models and tools, and a clean, modular way to wire them together. Whether you’re connecting an LLM to a database or linking services into a full agentic workflow, Docker makes it plug-and-play.

With Docker Model Runner, you can pull and run large language models locally with GPU acceleration. The Docker MCP Catalog and Toolkit connect agents to over 270 MCP servers from partners like Stripe, Elastic, and GitHub. And with Docker Compose, you can define the whole AI stack of models, tools, and services in a single YAML file that runs the same way locally or in the cloud. Cagent, our open-source agent builder, lets you easily build, run, and share AI agents, with behavior, tools, and persona all defined in a single YAML file. And with Docker Sandboxes, you can run coding agents like Claude Code in a secure, local environment, keeping your workflows isolated and your data protected.

Even hardware limits aren’t a blocker anymore when building agents. Docker Offload lets developers run heavy compute tasks on cloud GPUs with one click. 

Conclusion 

Docker’s vision is clear: to make AI development as simple and powerful as the workflows developers already know and love. And it’s working: theCUBE reports 52% of users cut AI project setup time by more than half, while 87% say they’ve accelerated time-to-market by at least 26%.

Learn more

Read more about ROI of working with Docker in our latest blog

Download theCUBE Research Report and eBook – economic validation of Docker

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Open Docker Desktop and get started with the MCP Toolkit (Requires version 4.48 or newer to launch the MCP Toolkit automatically)

Head over to the cagent GitHub repository, give the repository a star, try it out, and let us know what amazing agents you build!

Source: https://blog.docker.com/feed/

Run Embedding Models and Unlock Semantic Search with Docker Model Runner

Embeddings have become the backbone of many modern AI applications. From semantic search to retrieval-augmented generation (RAG) and intelligent recommendation systems, embedding models enable systems to understand the meaning behind text, code, or documents, not just the literal words.

But generating embeddings comes with trade-offs. Using a hosted API for embedding generation often results in reduced data privacy, higher call costs, and time-consuming model regeneration. When your data is private or constantly evolving (think internal documentation, proprietary code, or customer support content), these limitations quickly become blockers.

Instead of sending data to a remote service, you can easily run local embedding models on-premises with Docker Model Runner. Model Runner brings the power of modern embeddings to your local environment, giving you privacy, control, and cost-efficiency out of the box. 

In this post, you’ll learn how to use embedding models for semantic search. We’ll start by covering the theory behind embedding and why developers should run them. Then, we’ll wrap up with a practical example, using Model Runner, to help you get started.

Understanding semantic search embeddings 

Let’s take a moment to first demystify what embeddings are.

Embeddings represent words, sentences, and even code as high-dimensional numerical vectors that capture semantic relationships. In this vector space, similar items cluster together, while dissimilar ones are farther apart.

For example, a traditional keyword search looks for exact matches. If you search for “authentication”, you’ll only find documents containing that exact term. But with embeddings, searching for “user login” might also surface results about authentication, session management, or security tokens because the model understands that these are semantically related ideas.

This makes embeddings the foundation for more intelligent search, retrieval, and discovery — where systems understand what you mean, not just what you type.

For a deeper perspective on how language and meaning intersect in AI, check out “The Language of Artificial Intelligence”.

How Vector Similarity Enables Semantic Search with Embeddings

Here’s where the math behind semantic search comes in, and it’s elegantly simple.

Once text is converted into vectors (lists of numbers), we can measure how similar two pieces of text are using cosine similarity, the cosine of the angle between the two vectors:

similarity(A, B) = (A · B) / (‖A‖ ‖B‖)

Where:

A is your query vector (e.g., “user login”),

B is another vector (e.g., a code snippet or document).

The result is a similarity score, typically between 0 and 1, where values closer to 1 mean the texts are more similar in meaning.
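In code, the computation takes only a few lines. Here’s a minimal Python sketch using tiny made-up 3-dimensional vectors; real embedding models produce hundreds or thousands of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = [0.9, 0.1, 0.3]       # e.g. embedding of "user login" (made up)
relevant = [0.8, 0.2, 0.35]   # e.g. a doc about authentication (made up)
unrelated = [-0.5, 0.9, -0.1]

# The relevant document scores close to 1, the unrelated one much lower.
print(round(cosine_similarity(query, relevant), 3))
print(round(cosine_similarity(query, unrelated), 3))
```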

In practice:

A search query and a relevant document will have a high cosine similarity.

Irrelevant results will have low similarity.

This simple mathematical measure allows you to rank documents by how semantically close they are to your query, which powers features like:

Natural language search over docs or code

RAG pipelines that retrieve contextually relevant snippets

Deduplication or clustering of related content

With Model Runner, you can generate these embeddings locally, feed them into a vector database (like Milvus, Qdrant, or pgvector), and start building your own semantic search system without sending a single byte to a third-party API.

Why use Docker Model Runner to run embedding models

With Model Runner, you don’t have to worry about setting up environments or dependencies. Just pull a model, start the runner, and you’re ready to generate embeddings, all inside a familiar Docker workflow.

Full data privacy 

Your sensitive data never leaves your environment. Whether you’re embedding source code, internal documents, or customer content, you can rest assured that everything stays local — no third-party API calls, no network exposure.

Zero cost per embedding

There are no usage-based API costs. Once you have the model running locally, you can generate, update, or rebuild your embeddings as often as you need, at no extra cost.

That means iterating on your dataset or experimenting with new prompts won’t affect your budget.

Performance and control

Run the model that best fits your use case, leveraging your own CPU or GPU for inference.

Models are distributed as OCI artifacts, so they integrate seamlessly into your existing Docker workflows, CI/CD pipelines, and local development setups. This means you can manage and version models just like any other container image, ensuring consistency and reproducibility across environments.

Model Runner lets you bring models to your data, not the other way around, unlocking local, private, and cost-effective AI workflows.

Hands-on: Generating embeddings with Docker Model Runner

Now that we understand what embeddings are and how they capture semantic meaning, let’s see how simple it is to generate embeddings locally using Model Runner.

Step 1. Pull the model

docker model pull ai/qwen3-embedding

Step 2. Generate Embeddings

You can now send text to this endpoint via curl or your preferred HTTP client:

curl http://localhost:12434/engines/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen3-embedding",
    "input": "A dog is an animal"
  }'

The response includes a list containing the embedding vector, a numerical representation of your input text.
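The response body follows the OpenAI-compatible embeddings format, so extracting the vector is straightforward. The JSON below is a hand-written, truncated illustration, not real model output:

```python
import json

# Truncated, hand-written illustration of an OpenAI-compatible
# /embeddings response; real vectors have far more dimensions.
raw = '''{
  "object": "list",
  "data": [
    {"object": "embedding", "index": 0,
     "embedding": [0.0123, -0.0456, 0.0789]}
  ],
  "model": "ai/qwen3-embedding"
}'''

response = json.loads(raw)
vector = response["data"][0]["embedding"]
print(len(vector))
```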

You can store these vectors in a vector database like Milvus, Qdrant, or pgvector to perform semantic search or similarity queries.

Example use case: Semantic search over your codebase

Let’s make it practical.

Imagine you want to enable semantic code search across your project repository.

The process will look like:

Step 1. Chunk and embed your code

Split your codebase into logical chunks. Generate embeddings for each chunk using your local Docker Model Runner endpoint.

Step 2. Store embeddings 

Save those embeddings along with metadata (file name, path, etc.). You would usually use a vector database to store these embeddings, but in this demo, we’re going to store them in a file for simplicity.

Step 3. Query by meaning

When a developer searches “user login”, you embed the query and compare it to your stored vectors using cosine similarity.
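Putting the three steps together, a minimal in-memory sketch might look like this. The file names and vectors are made up for illustration; a real pipeline would obtain the vectors from the Model Runner embeddings endpoint:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical pre-computed embeddings for code chunks (tiny vectors for
# illustration; real models produce hundreds or thousands of dimensions).
index = {
    "auth/login.py":      [0.91, 0.10, 0.05],
    "billing/invoice.py": [0.05, 0.88, 0.20],
    "auth/session.py":    [0.85, 0.15, 0.10],
}

query_vec = [0.90, 0.12, 0.08]  # pretend this is the embedding of "user login"

# Rank chunks by semantic closeness to the query.
ranked = sorted(index, key=lambda f: cosine(query_vec, index[f]), reverse=True)
print(ranked)
```

With these made-up vectors, the authentication-related files rank ahead of the billing code even though none of them contain the literal phrase "user login".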

We have included a demo in the Docker Model Runner repository that does exactly that.

Figure 1: Codebase example demo with embeddings stats, example queries, and search results.

Conclusion

Embeddings let applications work with meaning, not just keywords. The old hassle was wiring up third-party APIs, juggling data privacy, and watching per-call costs creep up.

Docker Model Runner flips the script. Now, you can run embedding models locally where your data lives with full control over your data and infrastructure. Ship semantic search, RAG pipelines, or custom search with a consistent Docker workflow — private, cost-effective, and reproducible. 

No usage fees. No external dependencies. By bringing models directly to your data, Docker makes it easier than ever to explore, experiment, and innovate, safely and at your own pace.

How you can get involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Get started with Docker Model Runner →

Learn more

Check out Docker Model Runner integration with vLLM announcement

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Get started with Docker Model Runner with a simple hello GenAI application

Source: https://blog.docker.com/feed/

You Want Microservices, But Do You Really Need Them?

Do you know who managed to cut costs by a staggering 90% by abandoning microservices for a monolith in May 2023? Not a cash-strapped startup or an indie project—Amazon itself, for its Prime Video service. The same AWS that earns billions every year by selling microservices infrastructure admitted that, sometimes, a good old monolith wins. 

This reversal from the company that practically wrote the playbook on distributed systems sent shockwaves through the cloud-native community. Amazon later removed the original blog post, but the internet never forgets, as you’ll see later.

I’ve been speaking up against unnecessary or premature use of microservices architecture for five or six years now. After Amazon Prime Video went back to a monolith, I’ve come across several eminent architects who are also speaking out against microservices as the default.

And yet in most tech circles, microservices are still viewed as the only way to build modern software. They dominate conferences, blogs, and job listings. Teams adopt them not because their requirements justify it, but because it feels like the obvious (and résumé-boosting) choice. “Cloud-native” has become synonymous with “microservices-by-default”, as if other approaches are as obsolete as floppy disks. 

Microservices do solve real problems, but at a massive scale. Most teams don’t actually operate at that scale.

With this article, I urge you to reflect on the question the industry has mostly stopped asking: Should microservices be the default choice for building at scale? We’ll look at reversal stories and insights from seasoned architects, and weigh the trade-offs and alternatives. After considering all of this, you can decide whether your problem really needs a constellation of microservices.

Microservices: The Agility-Complexity Trade-Off

On paper, microservices look impressive. Instead of one big monolith, you split your application into many small services. Each one can be written in any language, owned by a small team, and deployed on its own schedule. If you need more capacity, you can scale only the part that’s under load. The promise is elegant: independent deployability, autonomous teams, multi-language stacks, and elastic scaling.

But the catch is that every split creates a seam, and every seam is a potential failure point. Inside a monolith, function calls are instant and predictable. Across services, those same calls become network requests: slower, failure-prone, sometimes returning inconsistent data. With dozens (or hundreds) of services, you need version management, schema evolution, distributed transactions, tracing, centralized logging, and heavy-duty CI/CD pipelines just to keep things running.

This Gartner diagram captures the trade-off perfectly: microservices exchange the simplicity of one codebase for the complexity of many.

At a massive scale (think Netflix), that trade-off may be worth it. But when operational benefits don’t outweigh the costs, teams end up paying a steep price in debugging, coordination, and glue code just to hold their product together.

Microservices make sense in very specific scenarios where distinct business capabilities need independent scaling and deployment. For example, payment processing (security-critical, rarely updated) differs fundamentally from a recommendation engine (memory-intensive, constantly A/B tested). These components have different scaling patterns, deployment cycles, and risk profiles, which justify separate services.

The success of microservices hinges on clear business domain boundaries that match your team structure, as Conway’s Law predicts. If your organization naturally splits into autonomous teams that own distinct capabilities, microservices might work. (So, most “one-and-a-half pizza” startups don’t qualify, do they?) 

That’s why microservices work effectively for companies like Amazon and Uber—although not always.

In fact, most organizations lack the prerequisites: dedicated service ownership, mature CI/CD, robust monitoring, and crucially, scale that justifies the operational overhead. Startups that adopt microservices prematurely often regret their decision.

So ask yourself:

Are you using microservices to solve an independent scaling problem, or are you inviting more complexity than your solution needs?

The Great Microservices Reversal

Ironically, even though tech giants are the ones that are most likely to benefit from microservices, many of these very same companies are walking back their microservices architectures, and the results are eye-opening.

Amazon Prime Video: 90% Cost Reduction with a Monolith

In May 2023, Amazon engineers admitted the unthinkable: Prime Video had abandoned microservices for a monolith. Their Video Quality Analysis (VQA) team had built what looked like a textbook distributed system: AWS Step Functions and Lambda monitored thousands of video streams through independent, scalable components. On paper, it was serverless perfection.

In practice, it was a disaster. “We realized that distributed approach wasn’t bringing a lot of benefits in our specific use case,” said Marcin Kolny in the now-archived Prime Video Engineering blog. Their “infinitely scalable” system crumbled at just 5% of expected load due to orchestration overhead.

The fix was embarrassingly simple: collapse everything into a single process. It resulted in 90% lower costs and faster performance. 

Twilio Segment: From 140 Services to One Fast Monolith

Back in 2018, Twilio Segment, a customer data platform, documented a similar reversal in their brutally honest post “Goodbye Microservices”.

Their system had sprawled into 140+ services, creating operational chaos. At one point, three full-time engineers spent most of their time firefighting instead of building. As they admitted, “Instead of enabling us to move faster, the small team found themselves mired in exploding complexity. Essential benefits of this architecture became burdens. As our velocity plummeted, our defect rate exploded.”

Their solution was radical but effective: collapse all 140+ services into a single monolith. The impact was immediate. Test suites that once took an hour now finished in milliseconds. Developer productivity soared: they shipped 46 improvements to shared libraries in a year, up from 32 in the microservices era. 

Shopify: Sanity over Hype

Shopify runs one of the largest Ruby on Rails codebases in the world (2.8M+ lines). Instead of chasing microservices, they deliberately chose a modular monolith: a single codebase with clear component boundaries.

Shopify’s engineers concluded that “microservices would bring their own set of challenges”, so they chose modularity without the operational overhead.

All these examples raise the question:

If even the pioneers of microservices are retreating, why are we still treating it as gospel?

Expert Voices against Microservices Mania

Some of the most respected voices in software architecture—people behind many of the systems we all admire—are also cautioning against microservices and repeating mistakes they’ve seen play out at scale. (After all, cheerleaders don’t play the game; cloud DevRels rarely build at scale.)

Rails Creator: Simplicity over Sophistication

David Heinemeier Hansson (DHH), the creator of Ruby on Rails, has long advocated simplicity over architectural trends. His analysis of the Amazon Prime Video reversal puts it bluntly:

“The real-world results of all this theory are finally in, and it’s clear that in practice, microservices pose perhaps the biggest siren song for needlessly complicating your system.”

DHH’s image of a siren song is apt: microservices promise elegance but leave teams wrecked on the rocks of complexity.

Microservices: Mistake of The Decade?

Jason Warner, former CTO of GitHub, doesn’t mince words while commenting on microservices: 

“I’m convinced that one of the biggest architectural mistakes of the past decade was going full microservice.”

Warner understands scale: GitHub runs at internet scale, and he’s led engineering at Heroku and Canonical. His critique cuts deeper because it’s grounded in lived experience, not just theory:

“90% of all companies in the world could probably just be a monolith running against a primary db cluster with db backups, some caches and proxies and be done with it.”

GraphQL Co-Creator: “Don’t”

Then there’s Nick Schrock, co-creator of GraphQL. If anyone had a reason to cheer for distributed systems, it’d be him. Instead, he says:

“Microservices are such a fundamentally and catastrophically bad idea that there are going to be an entire cohort of multi-billion companies built that do nothing but contain the damage that they have wrought.”

He goes on to describe microservices as organizational gambles:

“[Y]ou end up with these services that you have to maintain forever that match the org structure and the product requirements from five years ago. Today, they don’t make a lot of sense.”

The person who literally built tools to fix distributed system pain says don’t distribute unless you must, maybe it’s time to listen.

Other Voices Questioning Microservice Maximalism

Other engineering leaders are also reconsidering microservice maximalism. 

At Uber, Gergely Orosz admitted:

“We’re moving many of our microservices to macroservices (well-sized services). Exactly b/c testing and maintaining thousands of microservices is not only hard – it can cause more trouble long-term than it solves the short-term.”

Uber still runs microservices where they’re justified, but they’re choosing their battles.

Kelsey Hightower, known for his work with Kubernetes and Google Cloud, cut through the microservices hype with CS101:

“I’m willing to wager a monolith will outperform every microservice architecture. Just do the math on the network latency between each service and the amount of serialization and deserialization of each request.”

He subsequently deleted this tweet, but the network math still stands.
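Hightower’s “do the math” is easy to reproduce. The figures below are rough, commonly cited orders of magnitude, not measurements, but they make the point:

```python
# Rough orders of magnitude (illustrative, not measured):
IN_PROCESS_CALL_NS = 50   # a function call within one process: tens of nanoseconds
NETWORK_HOP_US = 500      # intra-datacenter RPC incl. (de)serialization: ~0.5 ms

hops = 5                  # one user request crossing five services

monolith_us = hops * IN_PROCESS_CALL_NS / 1000  # convert ns -> microseconds
microservices_us = hops * NETWORK_HOP_US

print(f"monolith: {monolith_us} us, microservices: {microservices_us} us")
print(f"overhead factor: ~{microservices_us / monolith_us:.0f}x")
```

Even with generous assumptions, the distributed version pays orders of magnitude more latency per request before any business logic runs.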

When pioneers like these, including those who actually solved distributed systems at scale, start waving red flags, it’s worth taking note. 

My question here is:

If GitHub’s CTO thinks 90% of companies don’t need microservices, are you sure yours is part of the 10%?

The Hidden Costs of Microservices

Microservices demand such caution because of these hidden costs that teams often underestimate.

Operational Costs

A monolith is simple: in-process function calls. 

Microservices replace that with networks. Every request now travels across machines, through load balancers, service meshes, and authentication layers, creating more failure points and infrastructure needs. You suddenly need service discovery (how services find each other), distributed tracing (tracking requests across services), centralized logging (aggregating logs from multiple services), and monitoring systems that understand service topology.

Each of these is necessary, but together they’re complex and expensive. Duplicated data requires extra storage. Constant service-to-service calls rack up network egress fees. Cloud costs scale faster than the apps they host. Prime Video’s workflow spent more on orchestrating S3 data transfers between services than on actual processing. 

Developer Productivity Drain

In microservices, the hard part isn’t writing code; it’s navigating distributed system interactions.

In “The macro problem with microservices”, Stack Overflow identifies a critical productivity drain: distributed state forces developers to write defensive code that constantly checks for partial failures.

In a monolith, a developer can follow a code path end-to-end within one repo. In microservices, one feature might span four or five repos with different dependencies and deploy cycles. Adding a single field triggers weeks of coordination: you need to update one service, then wait for consumers to adopt, version your APIs, manage rollouts, and so on. Different teams will also typically maintain different microservices using different tech stacks, so there’s a risk that they unintentionally break something as well. Breaking changes that a compiler would catch in a monolith now surface as runtime errors in production.
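The “defensive code” tax is concrete: what is a plain function invocation in a monolith becomes a network call that can time out or fail halfway, and every caller must account for that. A minimal sketch, with invented service names and an invented fallback policy:

```python
# In a monolith, fetching a profile is a plain in-process call:
def get_profile(user_id: int) -> dict:
    return {"id": user_id, "name": "Ada"}

# Across a service boundary, the same call needs retries and a fallback.
class ServiceUnavailable(Exception):
    pass

def call_profile_service(user_id: int, fail: bool = False) -> dict:
    # Stand-in for an HTTP call that may time out mid-request.
    if fail:
        raise ServiceUnavailable("profile-service timed out")
    return {"id": user_id, "name": "Ada"}

def get_profile_defensive(user_id: int, retries: int = 2, fail: bool = False) -> dict:
    for attempt in range(retries + 1):
        try:
            return call_profile_service(user_id, fail=fail)
        except ServiceUnavailable:
            if attempt == retries:
                # Degrade gracefully instead of failing the whole request.
                return {"id": user_id, "name": None, "degraded": True}
```

Multiply this pattern across every cross-service call and the productivity drain becomes visible in every code review.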

Testing and Deployment Complexity

Monolith integration and end-to-end tests are faster because they run locally, in memory. Distributed systems don’t allow that luxury: real confidence requires integration and end-to-end tests across numerous service boundaries. So these tests are slower, more brittle, and require staging environments that resemble production, all of which effectively double infrastructure costs and slow feedback loops.

Many teams discover this only after their test suite becomes a bottleneck. Deployment orchestration adds another layer: rolling updates across interdependent services require careful sequencing to avoid breaking contracts. Version incompatibilities surface frequently: Service A works with Service B v2.1 but breaks with v2.2.

Failed deployments leave systems partially updated and difficult to recover.

Data Management and Consistency

The most underestimated complexity of microservices lies in data consistency across service boundaries.

Monoliths benefit from ACID transactions: operations complete entirely or fail entirely. Microservices split that across services, forcing you to build distributed sagas (multi-step workflows with rollback logic), live with eventual consistency (data only becomes correct after a delay), or write compensation logic (extra code to undo partial failures). What was once a single database transaction now spans network hops, retries, and partial failures. Debugging inconsistent orders or payments gets much harder when state is duplicated across services.
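The compensation logic described above can be sketched as a minimal saga runner: execute steps in order, and when one fails, run the completed steps’ undo actions in reverse. The step names here are hypothetical:

```python
def run_saga(steps):
    """Run (name, action, compensation) triples; on failure, undo completed steps in reverse."""
    completed = []
    log = []
    try:
        for name, action, compensate in steps:
            action()
            log.append(f"done:{name}")
            completed.append((name, compensate))
    except Exception:
        # A step failed partway through: roll back what already happened.
        for name, compensate in reversed(completed):
            compensate()
            log.append(f"undone:{name}")
    return log

def charge_card():
    raise RuntimeError("payment service unavailable")

log = run_saga([
    ("reserve-stock", lambda: None, lambda: None),
    ("charge-card", charge_card, lambda: None),
    ("ship-order", lambda: None, lambda: None),
])
```

In a monolith, all three steps would sit inside one database transaction; here you write and test the rollback paths yourself.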

As research confirms, data duplication, correctness challenges, and transactional complexity are the top pain points in microservice systems.

The Compounding Effect

These complexities multiply. Operational overhead makes debugging harder, which slows testing, which makes deployments riskier, which creates more incidents. Microservices don’t just shift complexity from code to operations; they tax every part of your engineering process. 

Unless your scale demands it, that tax often outweighs the benefits. 

Think about it:

If every network hop adds complexity and cost, does your use case really justify the price?

Beyond Microservices: Smarter Architectural Alternatives

Before defaulting to microservices, it’s worth considering how simpler, well-structured architectures can deliver comparable scalability without the distributed complexity tax. Two noteworthy alternatives are modular monoliths and service-oriented architectures.

Modular Monoliths: Structure without Distribution

Unlike traditional monoliths that become tangled messes, modular monoliths enforce strict internal boundaries through clear module APIs and disciplined separation. Each module exposes well-defined interfaces, enabling teams to work independently while deploying a single, coherent system.

As Kent Beck explains in “Monolith -> Services: Theory & Practice”, modular monoliths manage coupling through organizational discipline rather than distributed networks. The key difference: modules still communicate via explicit contracts like microservices, but they use fast, reliable function calls instead of HTTP requests that are vulnerable to network latency and partial failures.
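What “explicit contracts, but in-process” looks like in practice: each module publishes a small public interface and hides its internals, so team boundaries hold without a network boundary. A sketch with invented module names:

```python
from dataclasses import dataclass

# Billing module: only this interface is "public"; internals stay private.
@dataclass
class Invoice:
    order_id: int
    total_cents: int

class BillingModule:
    """The contract other modules program against."""
    def create_invoice(self, order_id: int, subtotal_cents: int) -> Invoice:
        return Invoice(order_id=order_id, total_cents=self._apply_tax(subtotal_cents))

    def _apply_tax(self, cents: int) -> int:  # private detail, free to change
        return round(cents * 1.2)

# The orders module depends on the billing contract, not its internals,
# and the call is a plain function call: no HTTP, no serialization, no retries.
class OrdersModule:
    def __init__(self, billing: BillingModule):
        self.billing = billing

    def checkout(self, order_id: int, subtotal_cents: int) -> Invoice:
        return self.billing.create_invoice(order_id, subtotal_cents)

invoice = OrdersModule(BillingModule()).checkout(42, 1000)
```

Frameworks like Spring Modulith make these boundaries enforceable at build time; in plain code, they are held by convention and review.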

Why does it work?

Simpler operations: microservices-level organization with monolithic simplicity

Stronger consistency: full ACID transactions

Easier debugging: one traceable system, no hunting for bugs in the ELK haystack

Better performance: function calls beat network hops

Here’s some real-world proof: Shopify’s 2.8 million-line codebase handles 30TB per minute with separate teams owning distinct modules, yet everything deploys together. Facebook runs similarly. (And principal architect Keith Adams jokes that if you want to be talked out of microservices, he’s your guy.)

With recent developments in frameworks like Spring Modulith, Django, Laravel, and Rails (as seen at scale with Shopify), modular monoliths are poised to gain wider traction in the years ahead.

Service-Oriented Architecture: The Middle Ground

Service-oriented architecture (SOA) sits between monoliths and microservices, favoring larger, domain-driven services instead of dozens or hundreds of tiny ones. These services often communicate via an enterprise service bus (ESB), which reduces orchestration overhead while preserving separation of concerns.

Instead of splitting authentication, user preferences, and notifications into separate microservices, SOA might combine them into a single “User Service”, simplifying coordination while preserving autonomy and targeted scaling. SOA provides enterprise-grade modularity without ultra-fine-grained distribution overhead.

Here’s why it works:

Right-sized boundaries: fewer, domain-aligned services instead of sprawl

Targeted scalability: scale services tied to real business domains

Pragmatic complexity: avoids ultra-fine-grained overhead while retaining modular reasoning

SOA has also been proven to work at scale. Norwegian Air Shuttle, Europe’s 9th-largest airline, used SOA to boost agility across complex flight operations. Credit Suisse’s SOA rollout powered millions of service calls per day back in the early 2000s.

Choosing Wisely: Fit over Hype

The problem you’re solving should justify your architecture.

I often use this analogy in consulting: You don’t need a sword to cut a lemon—a knife suffices. And as timeless wisdom reminds us, simplicity is the ultimate sophistication. 

In all likelihood, you’re not Google (you don’t need Google-level fault tolerance), Amazon (you don’t need massive write availability), or LinkedIn (you don’t handle billions of events a day). Most applications simply don’t operate at a scale that demands ultra-distributed architectures.

For most systems, well-structured modular monoliths (for most common applications, including startups) or SOA (for enterprises) deliver scalability and resilience comparable to microservices, without the distributed complexity tax. Alternatively, you may consider well-sized services (macroservices, or what Gartner proposed as miniservices) instead of dozens of tiny ones.

It’s worth asking:

If simpler architectures can deliver comparable scalability, why are you choosing the complexity of microservices?

Docker: Built for Any Architecture

Docker isn’t just for microservices—it works great across all kinds of architectures like monoliths, SOA, APIs, and event-driven systems. The real benefit is that Docker gives you consistent performance, easier deployment, and flexibility to scale up your apps no matter what architectural approach you’re using.

Docker packages applications cleanly, keeps environments consistent from laptop to production, simplifies dependency management, and isolates applications from the host system. A Dockerized monolith offers all these benefits, minus the orchestration overhead of microservices. 

Microsoft’s guidance on containerizing monoliths clarifies that scaling containers is “far faster and easier than deploying additional VMs”, whether you run one service or fifty. Twilio Segment observed that containerized monoliths can “horizontally scale your environment easily by spinning up more containers and shutting them down when demand subsides.” For many applications, scaling the whole app is exactly what’s needed.
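Scaling a containerized monolith “by spinning up more containers” is a one-flag operation. A minimal sketch of a Compose file, where the image name and port are placeholders:

```yaml
# compose.yaml: one monolith service, horizontally scaled
services:
  app:
    image: my-monolith:latest   # placeholder image name
    ports:
      - "8080"                  # container port only; Docker assigns host ports so replicas don't collide
    deploy:
      replicas: 3               # or scale at runtime: docker compose up -d --scale app=3
```

Run `docker compose up -d --scale app=5` when demand spikes and scale back down afterward; because every replica is identical, log collection and monitoring stay uniform.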

As for DevOps, a monolith in Docker is lighter to operate than a full-blown microservices setup. Logging aggregation becomes simpler when you’re collecting from identical containers rather than disparate services with different formats. Monitoring and debugging remain centralized, and troubleshooting avoids tracing requests across service boundaries.

So, it’s definitely worth considering:

Even without the complexity of microservices, Docker gives you the same advantages — clean deployments, easy scaling, and consistent environments. So why not keep it?

Wrapping Up

A few years ago, my then-8-year-old wanted a bicycle. He’d mostly ride around our apartment complex, maybe venture into the nearby lane. He didn’t need 21 gears, but those shiny shifters had him smitten—imagine riding faster by changing those gears! He absolutely wanted that mechanically complex beauty. (It’s hard to argue with a starry-eyed kid… or a founder :P).

Once he started riding the new bike, the gears slipped, the chain jammed, and the bicycle spent more time broken than on the road. Eventually, we had to dump it. 

I wasn’t able to convince him back then that a simpler bicycle could’ve served him better, but maybe this article will convince a few grown-ups making architectural decisions.

We techies love indulging in complex systems. (Check: were you already thinking, What’s complex about bicycles with gears??) But the more moving parts you add, the more often they break. Complexity often creates more problems than it solves.

The point I’m making isn’t to dump microservices entirely—it’s to pick an architecture that fits your actual needs, not what the cloud giant is pushing (while quietly rolling back their own commit). Most likely, modular monoliths or well-designed SOA will serve your needs better and make your team more productive.

So here’s the million-dollar question: 

Will you design for cloud-native hype or for your own business requirements?

Do you really need microservices?

Source: https://blog.docker.com/feed/

Security that strengthens the ecosystem: Docker’s upstream approach to CVE-2025-12735

On November 24, 2025, Docker Hardened Images resolved CVE-2025-12735 in Kibana, the visualization and user interface layer for Elasticsearch. The CVE is a critical remote code execution vulnerability that scored 9.8 on the CVSS scale. While images from other hardened-image vendors were still carrying the vulnerability, Docker’s security team and tooling not only patched the CVE for Docker Hardened Images users, but also submitted the fix to the upstream LangChain.js project. Once that pull request merges, every application that depends on LangChain.js will benefit from a more secure foundation across the entire ecosystem.

We always default to upstream patching when possible because it protects everyone who depends on these libraries – not just Docker users. Upstream patches require effort. You have to submit a PR and get it approved by the project. That can mean back and forth with maintainers. Security teams are under intense time pressures. But when we fix expr-eval for LangChain.js, we’re protecting not just Kibana users but every application that depends on that library. That’s over one million weekly downloads that become more secure.

Another Nested Dependency, Another Ticking Time Bomb

CVE-2025-12735 originated in expr-eval, a JavaScript expression parser and evaluator library that hadn’t been updated in years. The vulnerability allowed attackers to inject crafted variables into evaluate(), enabling untrusted code paths to execute logic the application never intended; in practice, it gave attackers a pathway to run malicious behavior inside affected applications. LangChain.js depends on expr-eval, so any application or service built with LangChain.js inherits the vulnerability. That includes AI assistants, workflow tools, and LLM-powered applications widely deployed across the industry. Kibana was affected through the same dependency chain: three layers deep, a critical RCE sat in unmaintained code.
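The underlying bug class is evaluating expressions over attacker-controlled input without restricting what the expression language can reach. As a hedged illustration (a Python analogue of the general pattern, not the actual expr-eval code), an evaluator that whitelists AST node types accepts arithmetic over known variables and rejects everything else:

```python
import ast
import operator

# Whitelist of permitted operations: arithmetic only.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def safe_eval(expr: str, variables: dict) -> float:
    """Evaluate arithmetic over named variables; reject everything else."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in variables:
            return variables[node.id]
        # Function calls, attribute access, subscripts, etc. all land here.
        raise ValueError("disallowed expression")
    return walk(ast.parse(expr, mode="eval"))

safe_eval("price * qty + 2", {"price": 3.0, "qty": 4})   # plain arithmetic: allowed
# safe_eval("__import__('os').system('...')", {})        # raises ValueError
```

An evaluator without such a whitelist, handed attacker-supplied variables, is exactly the kind of three-levels-deep time bomb this CVE represents.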

This matters because LangChain.js has become a foundational component in modern application development. The library provides a framework for building applications powered by large language models, and it has been downloaded millions of times from npm. As of November 18, 2025, the npm package langchain (which includes LangChain.js) receives approximately 1,018,076 weekly downloads. Organizations use LangChain.js to build chatbots, document analysis systems, customer service platforms, and AI-powered search tools. When a vulnerability exists in LangChain.js or its dependencies, it potentially affects thousands of production applications across the technology industry.

This is exactly the attack surface that sophisticated adversaries target. The 2024 XZ Utils backdoor attempt demonstrated how attackers focus on dependencies precisely because they affect so many downstream projects. Old vulnerabilities remain a persistent challenge because organizations focus on direct dependencies while nested dependencies slip through the cracks.

Why We Must Fix at the Source, Fast

Many security and hardened image vendors scan for CVEs, flag them, and patch their own images. The vulnerability remains in the upstream project. The next build cycle reintroduces it. The problem persists for every other user of that dependency chain. This approach treats symptoms instead of causes. You patch your copy of Kibana. The next developer who builds from upstream gets the vulnerable version. Other container image providers may still ship the vulnerable dependency until their next update cycle. When the next CVE gets assigned to expr-eval, the cycle repeats.

Docker takes a different approach. When the Docker Security team identified CVE-2025-12735 in Kibana, we traced it back through the dependency chain to expr-eval. Rather than applying a surface-level patch, we replaced the unmaintained library with math-expression-evaluator, an actively maintained alternative that did not have the vulnerability. Then we contributed that fix upstream to LangChain.js: Pull Request #9391.

This approach delivers three outcomes:

Docker Hardened Images users got immediate protection. The updated Kibana image shipped without the vulnerable dependency. There was no waiting for upstream maintainers and no emergency patching required.

The entire LangChain.js ecosystem will benefit. Once the PR merges, every project using LangChain.js inherits the fix automatically. Web applications, data processing pipelines, AI tools, and analytics platforms all get safer because the fix lives where it belongs.

Future builds are secure by default. Docker doesn’t have to maintain downstream patches or worry about the vulnerability reappearing in the next release cycle. The fix lives in the upstream project where it belongs.

Docker Hardened Images responded faster than other vendors. We identified the root cause, selected a maintained replacement, verified it worked correctly, and contributed the fix back to the upstream project. This is possible because Docker’s security architecture is designed for a high-speed workflow without sacrificing thoroughness or attention to detail. (We are also, as a team, strongly committed to contributing back to open source!) Continuous dependency analysis through Docker Scout identifies issues the moment they’re disclosed. Deep supply chain visibility shows not just what packages are in an image but the entire dependency chain. Direct upstream engagement means we can contribute fixes rather than wait for maintainers to respond to bug reports.

What This Means for Your Organization

If you’re running Kibana in production, CVE-2025-12735 posed a critical risk. Organizations using Docker Hardened Images received immediate protection with secure, minimal, production-ready container images built from source and backed by a fast SLA that ensures rapid remediation. The updated image shipped with expr-eval replaced by a maintained alternative. No emergency patching was required and there was no downtime. Organizations using other container distributions may still be exposed. Check your Kibana images for the vulnerable expr-eval dependency. If you’re running upstream Kibana, monitor for the LangChain.js update that incorporates Docker’s fix.
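To audit your own images and projects for this dependency chain, the usual tooling applies; the image name below is a placeholder:

```shell
# Ask Docker Scout for known CVEs in an image (placeholder image name)
docker scout cves my-registry/kibana:8.11.0

# Inside a Node.js project, show every path that pulls in expr-eval
npm ls expr-eval --all
```

If either command surfaces expr-eval, check whether your base image or LangChain.js version has already picked up the fix.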

But the implications extend beyond this single CVE. The nested dependency problem affects every modern application. Your development teams probably don’t know what libraries are three or four levels deep in your dependency trees. Your security scanners might not catch them. Your vendors might not fix them upstream.

Helping Open Source Projects Helps Us All

The container ecosystem depends on thousands of open source projects. Most are maintained by small teams, often volunteers, who juggle security alongside feature development, bug fixes, and user support. When vulnerabilities emerge, maintainers may lack resources for immediate response.

Commercial vendors who benefit from open source have a responsibility to contribute back. When Docker Security fixes vulnerabilities upstream, open source maintainers get security support at no cost. The entire community benefits from hardened dependencies. Docker builds trust with the projects that power modern infrastructure. Future vulnerabilities become easier to address as relationships deepen. Together, we are more secure.

Docker is not the only company that pushes patches upstream, but doing so is a core part of our DNA. We don’t just protect our own customers; we strengthen the entire ecosystem. Fixes go upstream so everyone benefits. The focus is on eliminating vulnerabilities at their source rather than playing endless rounds of patch-and-scan.

Modern supply chain attacks move faster than traditional security response times. Docker Hardened Images and Docker Scout are designed to match that speed while strengthening the entire ecosystem through upstream contributions. When vulnerabilities emerge, our customers get immediate protection. When our fixes go upstream, everyone gets safer.

Learn more about how Docker Hardened Images deliver security that protects your organization and strengthens the ecosystem.
Source: https://blog.docker.com/feed/