Docker State of App Dev: Dev Ex & Productivity 

Report: What’s helping devs thrive — and what’s still holding them back? 

A look at how culture, tooling, and habits are shaping the developer experience today, per Docker’s 2025 State of Application Development Survey.

Great culture, better tools — but developers often still feel stuck. From pull requests stuck in review to tasks without clear estimates, the inner loop remains cluttered with surprisingly persistent friction points. This year’s data maps the disconnect between what developers need, where they’re blocked, and how better tooling and cultural support can keep velocity on track.

Here are six key insights into developer experience and productivity from Docker’s annual State of Application Development Survey, based on responses from over 4,500 industry professionals.

1. How devs learn — and what’s changing

Self-guided learning is on the upswing. Across all industries, fully 85% of respondents turn to online courses or certifications, far outpacing traditional sources like school (33%), books (25%), or on-the-job training (25%). 

Among IT folks, the picture is more nuanced. School is still the top venue for learning to code (65%, up from 57% in our 2024 survey), but online resources are also trending upward. Some 63% of IT pros learned coding skills via online resources (up from 54% in our 2024 survey) and 57% favored online courses or certifications (up from 45% in 2024).

Note: For this year’s report, we surveyed over three times more users across a broader spectrum of industries than for our more IT-focused 2024 report.

As for how devs prefer to learn, reading documentation tops the list, as in last year’s report — and that despite the rise of new, interactive forms of learning. Some 29% say they lean on documentation, edging out videos and side projects (28% each) and slightly ahead of structured online training (26%).

AI tools play a relatively minor role in how respondents learn, with GitHub Copilot cited as a learning resource by just 13% overall (and only 9% among IT pros); the same share, 13%, name it a preferred way to learn.

2. Containers: the great divide?

Among IT pros, container usage soared to 92% — up from 80% in our 2024 survey. Zoom out to a broader view across industries, however, and adoption appears considerably lower. Just 30% of developers say they use containers in any part of their workflow. 

Why the gap? Differences in app structure may offer an explanation: IT industry respondents work with microservice-based architectures more often than those in other industries (68% versus 31%). So the higher container adoption may stem from IT pros’ need for modularity and scalability — which containers provide in spades.

And among container users, needs are evolving. They want better tools for time estimation (31%), task planning (18%), and monitoring/logging (15%) — stubborn pain points across the software lifecycle.

3. An equal-opportunity headache: estimating time

No matter the role, estimating how long a task will take is the most consistent pain point across the board. Whether you’re a front-end developer (28%), data scientist (31%), or a software decision-maker (49%), precision in time planning remains elusive.

Other top roadblocks? Task planning (26%) and pull-request review (25%) are slowing teams down. Interestingly, where people say they need better tools doesn’t always match where they’re getting stuck. Case in point: testing solutions and Continuous Delivery (CD) come up often when devs talk about tooling gaps — even though they’re not always flagged as blockers.

4. Productivity by persona: different hats, same struggles

When you break it down by role, some unique themes emerge:

Experienced developers struggle most with time estimation (42%).

Engineering managers face a three-way tie: planning, time estimation, and designing from scratch (28% each).

Data scientists are especially challenged by CD (21%) — a task not traditionally in their wheelhouse.

Front-end devs, surprisingly, list writing code (28%) as a challenge, closely followed by CI (26%).

Across personas, a common thread stands out: even seasoned professionals are grappling with foundational coordination tasks — not the “hard” tech itself, but the orchestration around it.

5. Tools vs. culture: two sides of the experience equation

On the tooling side, the biggest callouts for improvement include:

Time estimation (22%)

Task planning (18%)

Designing solutions from scratch (17%)

But productivity isn’t just about tools — it’s deeply cultural. When asked what’s working well, developers pointed to work-life balance (39%), location flexibility such as work from home policies (38%), and flexible hours (37%) as top cultural strengths.

The weak spots? Career development (38%), recognition (36%), and meaningful work (33%). In other words: developers like where, when, and how they work, but not always why.

6. What’s easy? What’s not?

While the dev world is full of moving parts, a few areas stand out as surprisingly unchallenging:

Editing config files (8%)

Debugging in dev (8%)

Writing config files (7%)

Contrast that with the most taxing areas:

Troubleshooting in production (9%)

Debugging in production (9%)

Security-related tasks (8%)

It’s a reminder that production is still where the stress — and the stakes — are highest.

Bottom line:

Developer productivity isn’t about just one thing. It’s the compound effect of better tools, smarter learning, sharper planning — and yes, a healthy team culture. For orgs to excel, they need to invest not just in platforms, but also in people. Because when you improve the experience, you unlock the performance.

Source: https://blog.docker.com/feed/

Using Gordon to Containerize Your Apps and Work with Containers

These days, almost every tech company is looking for ways to integrate AI into its apps and workflows, and Docker is no exception. The company has been rolling out some impressive AI capabilities across its products. This is my first post as a Docker Captain, and I want to shine a spotlight on a feature that, in my opinion, hasn’t gotten nearly enough attention: Docker’s AI Agent Gordon (also known as Docker AI), which is built into Docker Desktop and the CLI.

Gordon is really helpful when it comes to containerizing applications. Not only does it help you understand how to package your app as a container, but it also reduces the overhead of figuring out dependencies, runtime configs, and other pieces that add to a developer’s daily cognitive load. The best part? Gordon doesn’t just guide you with responses; it can also generate or update the necessary files for you.

The Problem: Containerizing apps and optimizing containers isn’t always easy

Containerizing apps can range from super simple to a bit tricky, depending on what you’re working with. If your app has a single runtime like Node.js, Python, or .NET Core, with clearly defined dependencies and no external services, containerizing it is usually straightforward.
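For that simple case, a minimal Dockerfile sketch might be all you need (this assumes a Node.js app with a package.json and a start script; your runtime, paths, and versions will differ):

# Minimal single-runtime example (assumes a Node.js app with a package.json)
FROM node:20-slim
WORKDIR /app
# Install only production dependencies first to take advantage of layer caching
COPY package*.json ./
RUN npm ci --omit=dev
# Copy the rest of the application source
COPY . .
EXPOSE 3000
CMD ["npm", "start"]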

A basic Dockerfile like this will usually get you up and running without much effort. But once you start adding more complexity, like a backend, frontend, database, and caching layer, you’re looking at a multi-container app. At this point, you might be dealing with additional Dockerfile configurations and potentially a Docker Compose setup. That’s where getting started can become challenging.

This is where Gordon shines. It helps you containerize apps and can even handle multi-service setups, guiding you through what’s needed and generating the supporting files, such as Dockerfiles and a Compose file, to get you going.

Optimizing containers can be a headache too

Beyond just containerizing, there’s also the need to optimize your containers for performance, security, and image size. And let’s face it, optimizing can be tedious. You need to know what base images to use, how to slim them down, how to avoid unnecessary layers, and more.

Gordon can help here too. It provides optimization suggestions, shows you how to apply best practices like multi-stage builds or removing dev dependencies, and helps you create leaner, more secure images.
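As one illustration of the practices mentioned above, a multi-stage build keeps build tooling out of the runtime image. A rough sketch for a Node.js app might look like this (base images, build command, and output path are assumptions):

# Build stage: includes dev dependencies and build tooling
FROM node:20 AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: ships only the built output and production dependencies
FROM node:20-slim
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY package*.json ./
RUN npm ci --omit=dev
CMD ["node", "dist/index.js"]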

Why not just use general-purpose Generative AI?

Sure, general-purpose AI tools like ChatGPT, Claude, Gemini, etc. are great and I use them regularly. But when it comes to containers, they can lack the context needed for accurate and efficient help. Gordon, on the other hand, is purpose-built for Docker. It has access to Docker’s ecosystem and has been trained on Docker documentation, best practices, and the nuances of Docker tooling. That means its recommendations are more likely to be precise and aligned with the latest standards.

Walkthrough of Gordon

Gordon can help with containerizing applications, optimizing your containers, and more. It’s still a beta feature; to start using Gordon, you need Docker Desktop version 4.38 or later. Gordon is powered by large language models (LLMs) and goes beyond prompt and response: it can perform certain tasks for you as an AI agent. Gordon can access your local files and local images when you give it permission, and it will prompt you for access if a task requires it.

Please note, the examples I will show in this post are based on a single working session. Now, let’s dive in and start to explore Gordon.

Enabling Gordon / Docker AI

In order to turn Gordon on, go to Settings > Beta features and check the Enable Docker AI box, as shown in the following screenshot.

Figure 1: screenshot of where to enable Docker AI in beta features

Accept the terms. The AI in Docker Desktop comes in two forms: the first is available through the Docker Desktop UI and is known as Gordon; the second, Docker AI, is accessed through the Docker CLI by typing docker ai. I will demonstrate this later in this blog post.

Figure 2: screenshot of Docker AI terms acceptance dialog box

Exploring Gordon in Docker Desktop

Now Gordon will appear in your Docker Desktop UI. Here you can prompt it just like any Generative AI tool. Gordon will also have examples that you can use to get started working with it.

You can access Gordon throughout Docker Desktop by clicking on the AI icon as shown in the following screenshot.

Figure 3: screenshot of Docker Desktop interface showing the AI icon for Gordon

When you click on the AI icon, a Gordon prompt box appears along with suggested prompts, as shown in the following screenshot. The suggestions are context-aware and change based on the object the AI icon sits next to.

Figure 4: Screenshot showing Gordon’s suggestion prompt box in Docker Desktop UI

Here is another example of Docker AI suggestions being context-aware based on what area of Docker Desktop you are in. 

Figure 5: Screenshot showing Docker AI context-specific suggestions

Another common use case for Gordon is listing local images and using AI to work with them. You can see this in the following set of screenshots. Notice that Gordon will prompt you for permission before showing your local images.

Figure 6: Screenshot showing Gordon referencing local images 

You can also prompt Gordon to take action. As shown in the following screenshot, I asked Gordon to run one of my images.

Figure 7: Screenshot showing Gordon prompts 

If it can’t perform the action, it will attempt to help you. 

Figure 8: Screenshot showing Gordon prompt response to failed request 

Another cool use of Gordon is having it explain a container image to you. When you ask this, Gordon will ask you to select the directory where the Dockerfile is and to grant permission to access it, as shown in the following screenshot.

Figure 9: Screenshot showing Gordon’s request for particular directory access 

After you give it access to the directory where the Dockerfile is, it will then break down what’s in the Dockerfile.

Figure 10: Screenshot showing Gordon’s response to explaining a Dockerfile 

I followed up with a prompt asking Gordon to display what’s in the Dockerfile, and it did a good job of explaining the contents, as shown in the following screenshot.

Figure 11: Screenshot showing Gordon’s response regarding Dockerfile contents

Exploring Gordon in the Docker Desktop CLI

Let’s take a quick tour through Gordon in the CLI. Gordon is referred to as Docker AI in the CLI. To work with Docker AI, you need to launch the Docker CLI as shown in the following screenshot. 

Figure 12: Screenshot showing how to launch Docker AI from the CLI 

Once in the CLI you can type “docker ai” and it will bring you into the chat experience so you can prompt Gordon. In my example, I asked Gordon about one of my local images. You can see that it asked me for permission. 

Figure 13: Screenshot showing Docker CLI request for access

Next, I asked Docker AI to list all of my local images as shown in the following screenshot. 

Figure 14: Screenshot showing Docker CLI response to display local images 

I then tested pulling an image using Docker AI. As you can see in the following screenshot, Gordon pulled a Node.js image for me!

Figure 15: Screenshot showing Docker CLI pulling nodeJS image

Containerizing an application with Gordon

Now let’s explore the experience of containerizing an application using Gordon.

I started by clicking on the example for containerizing an application. Gordon then prompted me for the directory where my application code is. 

Figure 16: Screenshot showing where to enable access to directory for containerizing an application 

I pointed it to my apps directory and gave it permission. It then started to analyze and containerize my app. It picked up the language and started to read through my app’s README file.

Figure 17: Screenshot showing Gordon starting to analyze and containerize app 

You can see it understood the app was written in JavaScript and worked through the packages and dependencies.

Figure 18: Screenshot showing final steps of Gordon processing

Gordon understood that my app has a backend, a frontend, and a database, and recognized from this that I would need a Docker Compose file.

Figure 19: Screenshot showing successful completion of steps to complete the Dockerfiles

From the following screenshot, you can see the Docker-related files needed for my app. Gordon created all of these.

Figure 20: Screenshot showing files produced from Gordon 

Gordon created the Dockerfile (on the left) and a Compose YAML file (on the right), even picking up that I needed a Postgres database for this application.

Figure 21: Screenshot showing Dockerfile and Compose yaml file produced from Gordon

I then took it a step further and asked Gordon to build and run the containers for my application using the prompt “Can you build and run this application with compose?” It created the Docker Compose file, built the images, and ran the containers!

Figure 22: Screenshot showing completed containers from Gordon
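For reference, what Gordon did at this step is roughly what you would do manually with Compose:

# Build the images and start every service defined in the Compose file
docker compose up --build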

Conclusion

I hope you picked up some useful insights about Docker and discovered one of its lesser-known AI features in Docker Desktop. We explored what Gordon is, how it compares to general-purpose generative AI tools like ChatGPT, Claude, and Gemini, and walked through use cases such as containerizing an application and working with local images. We also touched on how Gordon can support developers and IT professionals who work with containers. If you haven’t already, I encourage you to enable Gordon and take it for a test run. Thanks for reading and stay tuned for more blog posts coming soon.
Source: https://blog.docker.com/feed/

The Docker MCP Catalog: the Secure Way to Discover and Run MCP Servers

The Model Context Protocol (MCP) ecosystem is exploding. In just weeks, our Docker MCP Catalog has surpassed 1 million pulls, validating that developers are hungry for a secure way to run MCP servers. Today, we’re excited to share major updates to the Docker MCP Catalog, including enhanced discovery features and our new open submission process. With hundreds of developers already requesting to publish their MCP servers through Docker, we’re accelerating our mission to make containerized MCP servers the standard for secure AI tool distribution.

The rapid adoption of MCP servers also highlights a critical problem — the current practice of running them via npx or uvx commands exposes systems to unverified code with full host access, not to mention dependency management friction. In this post, we’ll explain why Docker is investing in the MCP ecosystem, showcase the new catalog capabilities, and share how you can contribute to building a more secure foundation for AI applications.

Figure 1: The new Docker MCP Catalog, built for easier discovery.

Why Docker is building the MCP Catalog

The security issues in MCP distribution

Every time a developer runs npx -y @untrusted/mcp-server or uvx some-mcp-tool, they’re making a dangerous trade-off: convenience over security. These commands execute arbitrary code directly on the host system with full access to:

The entire file system

Network connections

Environment variables and secrets

System resources

Some MCP clients limit environment variable access, but even that is not a universal practice. This isn’t sustainable. As MCP moves from experimentation to production, we need a fundamentally different approach.
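To illustrate the trade-off, here’s a sketch comparing direct execution with a containerized run (the image name is hypothetical; in practice, catalog servers are wired up through the MCP Toolkit):

# Direct execution: unverified code runs on the host with full access
npx -y @untrusted/mcp-server

# Containerized execution: the server sees no host filesystem, environment
# variables, or devices unless you explicitly mount or pass them
docker run --rm -i example/mcp-server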

Docker’s unique position

Docker has spent over a decade solving exactly these problems for cloud-native applications. We’ve built the infrastructure, tools, and trust that developers rely on to run billions of containers in production. Now, we’re applying these same principles to the MCP ecosystem.

When you run an MCP server from our Catalog, you get:

Cryptographic signatures verifying the image hasn’t been tampered with

Software Bill of Materials (SBOMs) documenting every component

Complete isolation from your host system

Controlled access to only what the server actually needs

This isn’t about making life harder for developers—it’s about making security the path of least resistance.

Introducing the enhanced MCP Catalog

Built for MCP discovery

We’ve reimagined the MCP Catalog to make it more accessible and easier to navigate. You can still access the MCP Catalog from Docker Hub and the MCP Toolkit in Docker Desktop just like before, or go straight to the MCP catalog. We’ve gone beyond generic container image listings by building features that help you quickly find the right MCP servers for your AI applications.  

Browse by Use Case: MCP servers are organized by what they actually do:

Data Integration (databases, APIs, file systems)

Development Tools (IDEs, code analysis, testing)

Communication (email, Slack, messaging platforms)

Productivity (task management, calendars, note-taking)

Analytics (data processing, visualization, reporting)

Enhanced Search: Find servers by capability, tools, GitHub tags, and categories — not just by name.

Security Transparency: Every catalog entry clearly shows whether it’s Docker-built (with transparent build signing and verification) or community-built (containerized and maintained by the publisher).

Figure 2: Discover MCP servers by use cases.

How we classify MCP Servers: Built by Docker vs. community-built

Docker-Built Servers: When you see “Built by Docker,” you’re getting our complete security treatment. We control the entire build pipeline, providing cryptographic signatures, SBOMs, provenance attestations, and continuous vulnerability scanning.

Community-Built Servers: These servers are packaged as Docker images by their developers. While we don’t control their build process, they still benefit from container isolation, which is a massive security improvement over direct execution.

Both tiers serve important roles: Docker-built servers demonstrate the gold standard for security, while community-built servers ensure we can scale rapidly to meet developer demand. Developers can change their mind after submitting a community-built server and opt to resubmit it as a Docker-built server.

Figure 3: An example of Built by Docker MCP Server.

Open for MCP server submission: Join the secure MCP movement

Starting today, we’re opening our submission process to the community. Whether you’re an individual developer or an enterprise team, you can feature your MCP servers on the Docker MCP Catalog. By publishing through our catalog, you’re not just distributing your MCP server — you’re helping establish a new security standard for the entire ecosystem while getting your MCP tools available to millions of developers already using Docker via Docker Hub and Docker Desktop. Your containerized server becomes part of the solution, demonstrating that production-ready AI tools don’t require compromising on security. 

How to submit your MCP server

Containerize your server – Package your MCP server as a Docker image

Submit via GitHub – Create a pull request at github.com/docker/mcp-registry

Choose your tier – Opt for Docker-built (we handle the build) or community-built (you build and maintain it)

We’re committed to a fast, transparent review process. Quality MCP servers that follow our security guidelines will be published quickly, helping you reach Docker’s 20+ million developer community.

ClickHouse is one of the first companies to take advantage of Docker’s MCP Catalog, and they opted for the Docker-built tier to ensure maximum security. Here’s why they chose to partner with Docker:

“At ClickHouse, we deliver the fastest analytics database – open-source, and designed for real-time data processing and analytics at scale. As agentic AI becomes more embedded in modern applications, developers are using the ClickHouse MCP server to support intelligent, data-driven workflows that demand low latency, high concurrency, and cost efficiency. To make it easier for developers to deploy these workloads, we’re featuring ClickHouse MCP Server on Docker’s MCP Catalog, which provides a powerful way to reach 20M+ developers and makes it easier for Docker users to discover and use our solution. We opted for “Built by Docker” with the highest security standard, including cryptographic signatures, SBOMs, provenance attestations, and continuous vulnerability scanning. Together with Docker, developers can run ClickHouse MCP Server with confidence, knowing it’s secured, verified, and ready for their agentic applications.” – Tanya Bragin, VP of Product and Marketing, ClickHouse.

What’s coming next

Remote MCP servers

We’re preparing for the future of cloud-native AI applications. Remote MCP servers will enable:

Managed MCP services that scale automatically

Shared capabilities across teams without distributing code

Stricter security boundaries for sensitive operations

Integration with the official MCP registry

We’re actively collaborating with the MCP community on the upcoming official registry. Our vision is complementary:

The official registry provides centralized discovery – the “yellow pages” of available MCP servers

Docker provides the secure runtime and distribution for those listings

Together, we create a complete ecosystem where discovery and security work hand-in-hand

The path forward

The explosive growth of our MCP Catalog, 1 million pulls and hundreds of publisher requests, tells us developers are ready for change. They want the power of MCP, but they need it delivered securely.

By establishing containers as the standard for MCP server distribution, we’re not trying to own the ecosystem — we’re trying to secure it. Every MCP server that moves from npx execution to containerized deployment is a win for the entire community.

Start today

Explore the enhanced MCP Catalog: Visit the MCP Catalog to discover MCP servers that solve your specific needs securely.

Use and test hundreds of MCP servers: Download Docker Desktop to use any MCP server in our catalog with your favorite clients: Gordon, Claude, Cursor, VS Code, and more.

Submit your server: Join the movement toward secure AI tool distribution. Check our submission guidelines for more.

Follow our progress: Star our repository and watch for updates on the MCP Gateway release and remote server capabilities.

Together, we’re building more than a catalog — we’re establishing the secure foundation that the MCP ecosystem needs to grow from experimental tool to production-ready platform. Because when it comes to AI applications, security isn’t optional. It’s fundamental.

Learn more

Check out our announcement blog

Find documentation for Docker MCP Catalog and Toolkit.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

Tool Calling with Local LLMs: A Practical Evaluation

Which local model should I use for tool calling?

When building GenAI and agentic applications, one of the most pressing and persistent questions is: “Which local model should I use for tool calling?” We kept hearing this question again and again from colleagues within Docker and the developer community, ever since we started working on Docker Model Runner, a local inference engine that helps developers run and experiment with local models.

It’s a deceptively simple question with a surprisingly nuanced answer. Even when we tried to answer it for a very specific case, “What if I just expose five simple tools to the model?”, we realized we had no definite answer. Local LLM models offer control, cost-efficiency, and privacy, but when it comes to structured tool use, deciding when and how to act, they can behave very differently. We decided to dig deep and test this properly. We started with manual experimentation, then built a framework to scale our testing. This blog documents that journey and shares which models ranked highest on our tool-calling leaderboard.

The first attempt: Manual testing

Our first instinct was to build something quickly and try it out manually.

So we created chat2cart, an AI-powered shopping assistant that lets users interact via chat to build, modify, and check out a shopping cart. Through a natural conversation, users can discover products, add or remove items, and complete or cancel their purchase, all from the chat interface.

To support testing across different LLMs, we added a model selector that makes it easy to switch between local models (via Docker Model Runner or Ollama) and hosted models using the OpenAI API.

OpenAI’s GPT-4 or GPT-3.5 worked as expected, and the experience was fairly smooth. 

Called tools when they were needed

Avoided unnecessary tool usage

Handled tool responses naturally

But the local models? That’s where the challenges started to surface.

What went wrong with local models

We started experimenting with some of the local models listed on the Berkeley Function-Calling Leaderboard. Our goal was to find smaller models, ideally with fewer than 10 billion parameters, so we tested xLAM-2-8b-fc-r and watt-tool-8B. We quickly ran into several recurring issues:

Eager invocation: Tools were being called even for greeting messages like “Hi there!”

Wrong tool selection: The model would search when it should have added, or tried to remove when the cart was empty

Invalid arguments: Parameters like product_name or quantity were missing or malformed

Ignored responses: The model often failed to respond to tool output, leading to awkward or incomplete conversations

At this point, it was clear that manual testing wouldn’t scale. Different models failed in different ways: some struggled with invocation logic, while others mishandled tool arguments or responses. Testing was not only slow but also unreliable. Because these models are non-deterministic, we had to run each scenario multiple times just to get a reliable read on behavior.

We needed a testing setup that was repeatable, measurable, and fast.

Our second attempt: A scalable testing tool

Our goal wasn’t academic rigor. It was: “Give us good-enough answers in 2–3 days, not weeks.”

In a couple of days, we created model-test, a flexible project with the following capabilities:

Define real-world test cases with multiple valid tool call sequences

Run them against many models (local & hosted)

Track tool-calling accuracy, tool selection, and latency

Log everything for analysis (or eventual fine-tuning)

How it works

The core idea behind model-test is simple: simulate realistic tool-using conversations, give the model room to reason and act, and check whether its behavior makes sense.

Each test case includes:

A prompt (e.g. “Add iPhone to cart”)

The initial cart state (optional)

One or more valid tool-call variants, because there’s often more than one right answer

Here’s a typical case:

{
  "prompt": "Add iPhone to cart",
  "expected_tools_variants": [
    {
      "name": "direct_add",
      "tools": [{ "name": "add_to_cart", "arguments": { "product_name": "iPhone" } }]
    },
    {
      "name": "search_then_add",
      "tools": [
        { "name": "search_products", "arguments": { "query": "iPhone" } },
        { "name": "add_to_cart", "arguments": { "product_name": "iPhone 15" } }
      ]
    }
  ]
}

In this case, we consider both “just add ‘iPhone’” and “search first, then add the result” as acceptable. Even though “iPhone” isn’t a real product name, we’re fine with it. We weren’t aiming for overly strict precision, just realistic behavior.

Each test case belongs to a test suite. We provide two built-in suites:

Simple: Greetings, single-step actions

Complex: Multi-step reasoning and tool chaining

You can run an entire suite, individual test cases, or a selection of multiple test cases, and you can create your own custom suites to group tests as needed.

The agent loop

To make tests feel closer to how real agents behave, we simulate an agent loop of up to 5 rounds.

Example:

User: “Add iPhone 5 to cart”

Model: “Let me search for iPhone 5…”

Tool: (returns product list)

Model: “Adding product X to cart…”

Tool: (updates cart)

Model: “Done” → Great, test passed!

But if the model still wants to keep going after round 5?

That’s it, my friend: test failed. Time’s up.
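For the curious, here is a minimal sketch of such a loop (simplified, and not the actual model-test implementation). It assumes an OpenAI-compatible endpoint (the Docker Model Runner URL below is just one option) and a run_tool() dispatcher that you provide:

# Simplified agent loop sketch; base URL, model name, and run_tool() are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")

def run_tool(name: str, arguments: str) -> str:
    # Hypothetical dispatcher: execute the named tool and return its output as text.
    raise NotImplementedError

def agent_loop(messages, tools, model, max_rounds=5):
    for _ in range(max_rounds):
        reply = client.chat.completions.create(
            model=model, messages=messages, tools=tools
        ).choices[0].message
        messages.append(reply)
        if not reply.tool_calls:
            # The model answered without requesting a tool: the conversation is done.
            return True
        for call in reply.tool_calls:
            result = run_tool(call.function.name, call.function.arguments)
            messages.append(
                {"role": "tool", "tool_call_id": call.id, "content": result}
            )
    # Still asking for tools after the final round: the test fails.
    return False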

Not all-or-nothing

We deliberately avoided designing tests that require perfect predictions.

We didn’t demand that the model always know the exact product name.

What mattered was: did the tool sequence make sense for the intent?

This helped us focus on the kind of reasoning and behavior we actually want in agents, not just perfect token matches.

What We Measured

Our test outputs distilled down to a final F1 score, encapsulating three core dimensions:

Metric               What it tells us
Tool invocation      Did the model realize a tool was needed?
Tool selection       Did it choose the right tool(s) and use them correctly?
Parameter accuracy   Were the tool call arguments correct?

The F1 score is the harmonic mean of two things: precision (how often the model made valid tool calls) and recall (how often it made the tool calls it was supposed to).
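Written out, that’s:

F1 = 2 × (precision × recall) / (precision + recall)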

We also tracked latency, the average runtime in seconds, but that wasn’t part of the F1 calculation; it simply helped us evaluate speed and user experience.

21 models and 3,570 tests later: Which models nailed tool calling?

We tested 21 models across 3,570 test cases using 210 batch runs.

Hardware: MacBook Pro M4 Max, 128GB RAM
Runner: test-all-models.sh

Overall Rankings (by Tool Selection F1):

Model                          F1 Score
gpt-4                          0.974
qwen3:14B-Q4_K_M               0.971
qwen3:14B-Q6_K                 0.943
claude-3-haiku-20240307        0.933
qwen3:8B-F16                   0.933
qwen3:8B-Q4_K_M                0.919
gpt-3.5-turbo                  0.899
gpt-4o                         0.857
gpt-4o-mini                    0.852
claude-3-5-sonnet-20241022     0.851
llama3.1:8B-F16                0.835
qwen2.5:14B-Q4_K_M             0.812
claude-3-opus-20240229         0.794
llama3.1:8B-Q4_K_M             0.793
qwen2.5:7B-Q4_K_M              0.753
gemma3:4B                      0.733
llama3.2:3B_F16                0.727
llama3grog:7B-Q4_K_M           0.723
llama3.3:70B.Q4_K_M            0.607
llama-xlam:8B-Q4_K_M           0.570
watt-tool:8B-Q4_K_M            0.484

Top performers

Among all models, OpenAI’s GPT-4 came out on top with a tool selection F1 score of 0.974, completing responses in just under 5 seconds on average. While hosted and not the focus of our local model exploration, it served as a reliable benchmark and provided some ground truths.

On the local side, Qwen 3 (14B) delivered outstanding results, nearly matching GPT-4 with a 0.971 F1 score, though with significantly higher latency (~142 seconds per interaction).

If you’re looking for something faster, Qwen 3 (8B) also achieved an F1 score of 0.933, while cutting latency nearly in half (~84 seconds), making it a compelling balance between speed and tool-use accuracy.

Hosted models like Claude 3 Haiku also performed very well, hitting 0.933 F1 with exceptional speed (3.56 seconds average latency), further illustrating the high bar set by cloud-based offerings.

Underperformers

Not all models handled tool calling well. The quantized Watt 8B model struggled with parameter accuracy and ended up with a tool selection F1 score of just 0.484. Similarly, the LLaMA-based XLam 8B variant often missed the correct tool path altogether, finishing with an F1 score of 0.570. These models may be suitable for other tasks, but for our structured tool use test, they underdeliver.

Quantization

We also experimented with both quantized and non-quantized variants for some models, and in all cases observed no significant difference in tool-calling behavior or performance. This suggests that quantization is beneficial for reducing resource usage without negatively impacting accuracy or reasoning quality, at least for the models and scenarios we tested.

Our recommendations

If your goal is maximum tool-calling accuracy, then Qwen 3 (14B) or Qwen 3 (8B) are your best bets, both local, both precise, with the 8B variant being notably faster.

For a good trade-off between speed and performance, Qwen 2.5 stood out as a solid option. It’s fast enough to support real-time experiences, while still maintaining decent tool selection accuracy.

If you need something more lightweight, especially for resource-constrained environments, the LLaMA 3 Groq 7B variant offers modest performance at a much lower compute footprint.

What we learned and why this matters

Our testing confirmed that the Qwen family of models leads the pack among open-source options for tool calling. But as always, there’s a trade-off: you’ll need to balance accuracy and latency when designing your application.

Qwen models dominate: Even the 8B version of Qwen3 outperformed any other local model

Reasoning = latency: Higher-accuracy models take longer, often significantly.

Tool calling is core to almost every real-world GenAI application. Whether you’re building agents or creating agentic workflows, your LLM must know when to act and how. Thanks to this simple framework, “We don’t know which model to pick” became “We’ve narrowed it down to three great options, each with clear pros and cons.”

If you’re evaluating models for your agentic applications, skip the guesswork. Try model-test and adapt it for your own testing!

Learn more

Get an inside look at the design architecture of the Docker Model Runner. 

Explore the story behind our model distribution specification

Read our quickstart guide to Docker Model Runner.

Find documentation for Model Runner.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

Building an Easy Private AI Assistant with Goose and Docker Model Runner

Goose is an innovative CLI assistant designed to automate development tasks using AI models. Docker Model Runner simplifies deploying AI models locally with Docker. Combining these technologies creates a powerful local environment with advanced AI assistance, ideal for coding and automation.

Looking for a seamless way to run AI-powered development tasks locally without compromising on privacy or flexibility? Look no further. By combining the power of Goose, a CLI-based AI assistant, with Docker Model Runner, you get a streamlined, developer-friendly setup for running large language models right on your machine.

Docker Model Runner makes it easy to run open-source AI models with Docker, no cloud APIs or external dependencies required. And the best part? It works out of the box with tools like Goose that expect an OpenAI-compatible interface. That means you can spin up advanced local assistants that not only chat intelligently but also automate tasks, run code, and interact with your system, without sending your data anywhere else.

In this guide, you’ll learn how to build your own AI assistant with these innovative tools. We’ll walk you through how to install Goose, configure it to work with Docker Model Runner, and unleash a private, scriptable AI assistant capable of powering real developer workflows. Whether you want to run one-off commands or schedule recurring automations, this local-first approach keeps you in control and gets things done faster.

Install Goose CLI on macOS

Goose is available on Windows, macOS, and Linux as a command-line tool, and also has a desktop application for macOS if that’s what you prefer. In this article, we’ll configure and show the CLI version on macOS. 

To install Goose on macOS, you can use this handy curl-to-bash one-liner:

curl -fsSL https://github.com/block/goose/releases/download/stable/download_cli.sh | bash

Enable Docker Model Runner

First, ensure you have Docker Desktop installed. Then, configure Docker Model Runner with your model of choice. Go to Settings > Beta features and check the checkboxes for Docker Model Runner.

By default, as a security precaution, it’s not available from your host machine, but to simplify the setup we’ll also enable TCP host access. The default port is 12434, so the base URL for the connection is: http://localhost:12434

Figure 1: Docker Desktop beta features settings showing how to enable port 12434

Now we can pull models from Docker Hub (hub.docker.com/u/ai) and run them. For this article, we’ll use ai/qwen3:30B-A3B-Q4_K_M because it gives a good balance of world knowledge and intelligence at just 3B active parameters:

docker model pull ai/qwen3:30B-A3B-Q4_K_M
docker model run ai/qwen3:30B-A3B-Q4_K_M

This command starts the interactive chat with the model.
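Optionally, you can sanity-check the TCP endpoint from the host before wiring up Goose; the OpenAI-compatible model listing route for the default llama.cpp engine should respond with the models you’ve pulled:

curl http://localhost:12434/engines/llama.cpp/v1/models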

Configure Goose for Docker Model Runner

Edit your Goose config at ~/.config/goose/config.yaml:

GOOSE_MODEL: ai/qwen3:30B-A3B-Q4_K_M
GOOSE_PROVIDER: openai
extensions:
  developer:
    display_name: null
    enabled: true
    name: developer
    timeout: null
    type: builtin
GOOSE_MODE: auto
GOOSE_CLI_MIN_PRIORITY: 0.8
OPENAI_API_KEY: irrelevant
OPENAI_BASE_PATH: /engines/llama.cpp/v1/chat/completions
OPENAI_HOST: http://localhost:12434

The OPENAI_API_KEY is irrelevant as Docker Model Runner does not require authentication because the model is run locally and privately on your machine.

We provide the base path for the OpenAI-compatible API and set GOOSE_MODEL to ai/qwen3:30B-A3B-Q4_K_M, the model we pulled earlier.

Testing It Out

Try Goose CLI by running goose in the terminal. You can see that it automatically connects to the correct model, and when you ask for something, you’ll see the GPU spike as well.

Figure 2: Goose CLI running in terminal, showing example of response to local prompts

We also configure Goose with the Developer extension enabled. This allows Goose to run various commands on your behalf, making it a much more powerful assistant with access to your machine rather than just a chat application.

You can additionally give Goose custom hints to tweak its behavior using the .goosehints file.

And what’s even better, you can script Goose to run tasks on your behalf with a simple one-liner:

goose run -t "your instructions here"

or

goose run -i instructions.md

where instructions.md is a file containing the instructions to follow.

On macOS you have access to crontab for scheduling recurring scripts, so you can automate Goose with Docker Model Runner to run repeatedly and act on your behalf. For example, crontab -e will open the editor for the jobs you want to schedule, and a line like the one below should do the trick:

5 8 * * 1-5 goose run -i fetch_and_summarize_news.md

This will make Goose run at 8:05 am every workday and follow the instructions in the fetch_and_summarize_news.md file, for example, to skim the internet and prioritize news based on what you like.
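As an illustration, a hypothetical fetch_and_summarize_news.md could be as simple as:

Fetch the top stories from my usual tech news sources, pick the five most relevant to containers and AI tooling, and write a short summary of each to ~/news/today.md.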

Conclusion

All in all, integrating Goose with Docker Model Runner creates a simple but powerful setup for using local AI in your workflows. You can make it run custom instructions for you or easily script it to perform repetitive actions intelligently. It’s all powered by a local model running in Docker Model Runner, so you don’t compromise on privacy either.

Learn more

Read our quickstart guide to Docker Model Runner.

Find documentation for Model Runner.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

Docker State of App Dev: AI

AI is changing software development — but not how you think

The hype is real, but so are the challenges. Here’s what developers, teams, and tech leaders need to know about AI’s uneven, evolving role in software.

Rumors of AI’s pervasiveness in software development have been greatly exaggerated. A look under the hood shows adoption is far from uniform. While some dev teams are embedding AI into daily workflows, others are still kicking the tires or sitting it out entirely. Real-world usage reveals a nuanced picture shaped by industry, role, and data readiness.

Here are six key insights into AI tools and development from Docker’s second annual State of Application Development Survey, based on responses from over 4,500 industry professionals.

1. How are people using AI?

Right off the bat, we saw a split between two classes of respondents: 

Those who use AI tools like ChatGPT and GitHub Copilot for everyday work-related tasks such as writing, documentation, and research 

Those who build applications with AI/ML functionality

2. IT leads the way in AI tool usage and app development 

Only about 1 in 5 respondents (22%) report using AI tools for work. But there’s a huge spread across industries — from 1% to 84%. Among the top AI users are IT/SaaS folks (76%). And because we surveyed over three times more users this year than for last year’s report, the snapshot covers a broader spectrum of industries beyond just those focused on IT.

Underscoring tech’s embrace of AI: 34% of IT/SaaS respondents say they develop AI/ML apps, compared to just 8% outside that bubble.

And strategy reflects this gulf. Only 16% of companies outside IT report having a real AI strategy. Within tech, the number soars to 73%. Translation: AI is gaining traction, but it’s concentrated in certain industries — at least for now.

3. AI tools are overhyped — and incredibly useful

Here’s the paradox: 64% of users say AI tools make work easier, yet almost as many (59%) think AI tools are overhyped. The hype may be loud, but utility is speaking louder, especially for those who’ve stuck with it. In fact, 65% of current users say they’re using AI more than they did a year ago, and that same percentage use it every day.

This tracks roughly with findings in our 2024 report, in which 61% of respondents agreed AI made their job easier, even as 45% reported feeling AI was overhyped. And 65% agreed that AI was a positive option.

4. AI tool usage is up — and ChatGPT leads the pack

No surprises here. The most-used AI-powered tools are the same as in our 2024 survey — ChatGPT (especially among full-stack developers), GitHub Copilot, and Google Gemini. 

But usage this year far outstrips what users reported last year, with 80% selecting ChatGPT (versus 46% in our 2024 report), 53% Copilot (versus 30%), and 23% Gemini (versus 19%).

5. Developers don’t use AI the same way

The top overall use case is coding. Beyond that, it depends.

Seasoned devs turn to AI to write documentation and tests but use it sparingly. 

DevOps engineers use it for CLI help and writing docs.

Software devs tap AI to write tests and do research.

And not all devs lean on AI equally. Seasoned devs are the least reliant, most often rating themselves as not at all dependent (0/10), while DevOps engineers rate their dependence at 7/10. Software devs are somewhere in the middle, usually landing at a 5/10 on the dependence scale. For comparison, the overall average dependence on AI in our 2024 survey was about 4 out of 10 (all users).

Looking ahead, it will be interesting to see how dependence on AI shifts and becomes further integrated by role. 

6. Data is the bottleneck no one talks about

The use of AI/ML in app development is a new and rapidly growing phenomenon that, not surprisingly, brings new pain points. For teams building AI/ML apps, one headache stands out: data prep. A full 24% of AI builders say they’re not confident in how to identify or prepare the right datasets.

Even with the right intent and tools, teams hit friction where it hurts productivity most — upfront.

Bottom line: We’re in the early stages of the next tech revolution — complex, fast-evolving, and rife with challenges. Developers are meeting it head-on, quickly ramping up on new tools and architectures, and driving innovation at every layer of the stack. And Docker is right there with them, empowering innovation every step of the way.

Source: https://blog.docker.com/feed/

Behind the scenes: How we designed Docker Model Runner and what’s next

The last few years have made it clear that AI models will continue to be a fundamental component of many applications. The catch is that they’re also a fundamentally different type of component, with complex software and hardware requirements that don’t (yet) fit neatly into the constraints of container-oriented development lifecycles and architectures. To help address this problem, Docker launched the Docker Model Runner with Docker Desktop 4.40. Since then, we’ve been working aggressively to expand Docker Model Runner with additional OS and hardware support, deeper integration with popular Docker tools, and improvements to both performance and usability. For those interested in Docker Model Runner and its future, we offer a behind-the-scenes look at its design, development, and roadmap.

Note: Docker Model Runner is really two components: the model runner and the model distribution specification. In this article, we’ll be covering the former, but be sure to check out the companion blog post by Emily Casey for the equally important distribution side of the story.

Design goals

Docker Model Runner’s primary design goal was to allow users to run AI models locally and to access them from both containers and host processes. While that’s simple enough to articulate, it still leaves an enormous design space in which to find a solution. Fortunately, we had some additional constraints: we were a small engineering team, and we had some ambitious timelines. Most importantly, we didn’t want to compromise on UX, even if we couldn’t deliver it all at once. In the end, this motivated design decisions that have so far allowed us to deliver a viable solution while leaving plenty of room for future improvement.

Multiple backends

One thing we knew early on was that we weren’t going to write our own inference engine (Docker’s wheelhouse is containerized development, not low-level inference engines). We’re also big proponents of open-source, and there were just so many great existing solutions! There’s llama.cpp, vLLM, MLX, ONNX, and PyTorch, just to name a few.

Of course, being spoiled for choice can also be a curse — which to choose? The obvious answer was: as many as possible, but not all at once.

We decided to go with llama.cpp for our initial implementation, but we intentionally designed our APIs with an additional, optional path component (the {name} in /engines/{name}) to allow users to take advantage of multiple future backends. We also designed interfaces and stubbed out implementations for other backends to enforce good development hygiene and to avoid becoming tethered to one “initial” implementation.

OpenAI API compatibility

The second design choice we had to make was how to expose inference to consumers in containers. While there was also a fair amount of choice in the inference API space, we found that the OpenAI API standard seemed to offer the best initial tooling compatibility. We were also motivated by the fact that several teams inside Docker were already using this API for various real-world products. While we may support additional APIs in the future, we’ve so far found that this API surface is sufficient for most applications. One gap that we know exists is full compatibility with this API surface, which is something we’re working on iteratively.

This decision also drove our choice of llama.cpp as our initial backend. The llama.cpp project already offered a turnkey option for OpenAI API compatibility through its server implementation. While we had to make some small modifications (e.g. Unix domain socket support), this offered us the fastest path to a solution. We’ve also started contributing these small patches upstream, and we hope to expand our contributions to these projects in the future.

First-class citizenship for models in the Docker API

While the OpenAI API standard was the most ubiquitous option amongst existing tooling, we also knew that we wanted models to be first-class citizens in the Docker Engine API. Models have a fundamentally different execution lifecycle than the processes that typically make up the ENTRYPOINTs of containers, and thus, they don’t fit well under the standard /containers endpoints of the Docker Engine API. However, much like containers, images, networks, and volumes, models are such a fundamental component that they really deserve their own API resource type. This motivated the addition of a set of /models endpoints, closely modeled after the /images endpoints, but separate for reasons that are best discussed in the distribution blog post.

GPU acceleration

Another critical design goal was support for GPU acceleration of inference operations. Even the smallest useful models are extremely computationally demanding, while more sophisticated models (such as those with tool-calling capabilities) would be a stretch to fit onto local hardware at all. GPU support was going to be non-negotiable for a useful experience.

Unfortunately, passing GPUs across the VM boundary in Docker Desktop, especially in a way that would be reliable across platforms and offer a usable computation API inside containers, was going to be either impossible or very flaky.

As a compromise, we decided to run inference operations outside of the Docker Desktop VM and simply proxy API calls from the VM to the host. While there are some risks with this approach, we are working on initiatives to mitigate these with containerd-hosted sandboxing on macOS and Windows. Moreover, with Docker-provided models and application-provided prompts, the risk is somewhat lower, especially given that inference consists primarily of numerical operations. We assess the risk in Docker Desktop to be about on par with accessing host-side services via host.docker.internal (something already enabled by default).

However, agents that drive tool usage with model output can cause more significant side effects, and that’s something we needed to address. Fortunately, using the Docker MCP Toolkit, we’re able to perform tool invocation inside ephemeral containers, offering reliable encapsulation of the side effects that models might drive. This hybrid approach allows us to offer the best possible local performance with relative peace of mind when using tools.

Outside the context of Docker Desktop, for example, in Docker CE, we’re in a significantly better position due to the lack of a VM boundary (or at least a very transparent VM boundary in the case of a hypervisor) between the host hardware and containers. When running in standalone mode in Docker CE, the Docker Model Runner will have direct access to host hardware (e.g. via the NVIDIA Container Toolkit) and will run inference operations within a container.

Modularity, iteration, and open-sourcing

As previously mentioned, the Docker Model Runner team is relatively small, which meant that we couldn’t rely on a monolithic architecture if we wanted to effectively parallelize the development work for Docker Model Runner. Moreover, we had an early and overarching directive: open-source as much as possible.

We decided on three high-level components around which we could organize development work: the model runner, the model distribution tooling, and the model CLI plugin.

Breaking up these components allowed us to divide work more effectively, iterate faster, and define clean API boundaries between different concerns. While there have been some tricky dependency hurdles (in particular when integrating with closed-source components), we’ve found that the modular approach has facilitated faster incremental changes and support for new platforms.

The High-Level Architecture

At a high level, the Docker Model Runner architecture is composed of the three components mentioned above (the runner, the distribution code, and the CLI), but there are also some interesting sub-components within each:

Figure 1: Docker Model Runner high-level architecture

How these components are packaged and hosted (and how they interact) also depends on the platform where they’re deployed. In each case it looks slightly different. Sometimes they run on the host, sometimes they run in a VM, sometimes they run in a container, but the overall architecture looks the same.

Model storage and client

The core architectural component is the model store. This component, provided by the model distribution code, is where the actual model tensor files are stored. These files are stored differently (and separately) from images because (1) they’re high-entropy and not particularly compressible and (2) the inference engine needs direct access to the files so that it can do things like mapping them into its virtual address space via mmap(). For more information, see the accompanying model distribution blog post.

The model distribution code also provides the model distribution client. This component performs operations (such as pulling models) using the model distribution protocol against OCI registries.

Model runner

Built on top of the model store is the model runner. The model runner maps inbound inference API requests (e.g. /v1/chat/completions or /v1/embeddings requests) to processes hosting pairs of inference engines and models. It includes scheduler, loader, and runner components that coordinate the loading of models in and out of memory so that concurrent requests can be serviced, even if models can’t be loaded simultaneously (e.g. due to resource constraints). This makes the execution lifecycle of models different from that of containers, with engines and models operating as ephemeral processes (mostly hidden from users) that can be terminated and unloaded from memory as necessary (or when idle). A different backend process is run for each combination of engine (e.g. llama.cpp) and model (e.g. ai/qwen3:8B-Q4_K_M) as required by inference API requests (though multiple requests targeting the same pair will reuse the same runner and backend processes if possible).

The runner also includes an installer service that can dynamically download backend binaries and libraries, allowing users to selectively enable features (such as CUDA support) that might require downloading hundreds of MBs of dependencies.

Finally, the model runner serves as the central server for all Docker Model Runner APIs, including the /models APIs (which it routes to the model distribution code) and the /engines APIs (which it routes to its scheduler). This API server will always opt to hold in-flight requests until the resources (primarily RAM or VRAM) are available to service them, rather than returning something like a 503 response. This is critical for a number of usage patterns, such as multiple agents running with different models or concurrent requests for both embedding and completion.

Model CLI

The primary user-facing component of the Docker Model Runner architecture is the model CLI. This component is a standard Docker CLI plugin that offers an interface very similar to the docker image command. While the lifecycle of model execution is different from that of containers, the concepts (such as pushing, pulling, and running) should be familiar enough to existing Docker users.

The model CLI communicates with the model runner’s APIs to perform almost all of its operations (though the transport for that communication varies by platform). The model CLI is context-aware, allowing it to determine if it’s talking to a Docker Desktop model runner, Docker CE model runner, or a model runner on some custom platform. Because we’re using the standard Docker CLI plugin framework, we get all of the standard Docker Context functionality for free, making this detection much easier.

API design and routing

As previously mentioned, the Docker Model Runner comprises two sets of APIs: the Docker-style APIs and the OpenAI-compatible APIs. The Docker-style APIs (modeled after the /images APIs) include the following endpoints:

POST /models/create (Model pulling)

GET /models (Model listing)

GET /models/{namespace}/{name} (Model metadata)

DELETE /models/{namespace}/{name} (Model deletion)

The bodies for these requests look very similar to their image analogs. There’s no documentation at the moment, but you can get a glimpse of the format by looking at their corresponding Go types.
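
For a quick look at the format, you can hit the listing endpoint directly; this minimal sketch uses the host TCP endpoint described further below, and the response fields mirror the Go types, so they may change:

import requests

resp = requests.get("http://localhost:12434/models")  # default host TCP endpoint
resp.raise_for_status()
for model in resp.json():
    print(model)  # field names follow the corresponding Go types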

In contrast, the OpenAI endpoints follow a different but still RESTful convention:

GET /engines/{engine}/v1/models (OpenAI-format model listing)

GET /engines/{engine}/v1/models/{namespace}/{name} (OpenAI-format model metadata)

POST /engines/{engine}/v1/chat/completions (Chat completions)

POST /engines/{engine}/v1/completions (Text completions, legacy endpoint)

POST /engines/{engine}/v1/embeddings (Create embeddings)

At this point in time, only one {engine} value is supported (llama.cpp), and it can also be omitted to use the default (llama.cpp) engine.

We make these APIs available on several different endpoints:

First, in Docker Desktop, they’re available on the Docker socket (/var/run/docker.sock), both inside and outside containers. This is in service of our design goal of making models first-class citizens in the Docker Engine API. At the moment, these endpoints are prefixed with a /exp/vDD4.40 path (to avoid dependencies on APIs that may evolve during development), but we’ll likely remove this prefix in the next few releases since these APIs have now mostly stabilized and will evolve in a backward-compatible way.

Second, also in Docker Desktop, we make the APIs available on a special model-runner.docker.internal endpoint that’s accessible just from containers (though not currently from ECI containers, because we want to have inference sandboxing implemented first). This TCP-based endpoint exposes just the /models and /engines API endpoints (not the whole Docker API) and is designed to serve existing tooling (which likely can’t access APIs via a Unix domain socket). No /exp/vDD4.40 prefix is used in this case.

Finally, in both Docker Desktop and Docker CE, we make the /models and /engines API endpoints available on a host TCP endpoint (localhost:12434 by default, again without any /exp/vDD4.40 prefix). In Docker Desktop this is optional and not enabled by default. In Docker CE, it’s a critical part of how the API endpoints are accessed: we currently lack the integration to add endpoints to Docker CE’s /var/run/docker.sock or to inject a custom model-runner.docker.internal hostname, so we advise using the standard 172.17.0.1 host gateway address to access this localhost-exposed port (e.g. setting your OpenAI API base URL to http://172.17.0.1:12434/engines/v1). Hopefully we’ll be able to unify this across Docker platforms in the near future (see our roadmap below).
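
In practice, pointing an existing OpenAI SDK at these endpoints only requires changing the base URL. Here is a small sketch using the OpenAI Python client; the base URL is the default host endpoint (use http://172.17.0.1:12434/engines/v1 from a container on Docker CE), the model name is just an example, and the API key is unused but required by most SDKs:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",  # Docker Model Runner, default engine
    api_key="not-needed",                          # no key is checked locally
)

completion = client.chat.completions.create(
    model="ai/qwen3:8B-Q4_K_M",  # any model you have pulled locally
    messages=[{"role": "user", "content": "Say hello from Docker Model Runner."}],
)
print(completion.choices[0].message.content)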

First up: Docker Desktop

The natural first step for Docker Model Runner was integration into Docker Desktop. In Docker Desktop, we have more direct control over integration with the Docker Engine, as well as existing processes that we can use to host the model runner components. In this case, the model runner and model distribution components live in the Docker Desktop host backend process (the com.docker.backend process you may have seen running) and we use special middleware and networking magic to route requests on /var/run/docker.sock and model-runner.docker.internal to the model runner’s API server. Since the individual inference backend processes run as subprocesses of com.docker.backend, there’s no risk of a crash in Docker Desktop if, for example, an inference backend is killed by an Out Of Memory (OOM) error.

We started with support for macOS on Apple Silicon, because it provided the most uniform platform for developing the model runner functionality, but along the way we implemented most of what was needed to build and test for all Docker Desktop platforms. This made it significantly easier to port to Windows on AMD64 and ARM64, as well as to the GPU variations that we found there.

The one complexity with Windows was the larger size of the supporting library dependencies for the GPU-based backends. Adding another 500 MB – 1 GB to the Docker Desktop for Windows installer wouldn’t have been feasible (or tolerated), so we decided to default to a CPU-based backend in Docker Desktop for Windows, with optional support for the GPU backend. This was the primary motivating factor for the dynamic installer component of the model runner (in addition to our desire for incremental updates to different backends).

This all sounds like a very well-planned exercise, and we did indeed start with a three-component design and strictly enforced API boundaries, but in truth we started with the model runner service code as a sub-package of the Docker Desktop source code. This made it much easier to iterate quickly, especially as we were exploring the architecture for the different services. Fortunately, by sticking to a relatively strict isolation policy for the code, and enforcing clean dependencies through APIs and interfaces, we were able to easily extract the code (kudos to the excellent git-filter-repo tool) into a separate repository for the purposes of open-sourcing.

Next stop: Docker CE

Aside from Docker’s penchant for open-sourcing, one of the main reasons that we wanted to make the Docker Model Runner source code publicly available was to support integration into Docker CE. Our goal was to package the docker model command in the same way as docker buildx and docker compose.

The trick with Docker CE is that we wanted to ship Docker Model Runner as a “vanilla” Docker CLI plugin (i.e. without any special privileges or API access), which meant that we didn’t have a backend process that could host the model runner service. However, in the Docker CE case, the boundary between host hardware and container processes is much less disruptive, meaning that we could actually run Docker Model Runner in a container and simply make any accelerator hardware available to it directly. So, much like a standalone BuildKit builder container, we run the Docker Model Runner as a standalone container in Docker CE, with a special named volume for model storage (meaning you can uninstall the runner without having to re-pull models). This “installation” is performed by the model CLI automatically (and when necessary) by pulling the docker/model-runner image and starting a container. Explicit configuration for the runner can also be specified using the docker model install-runner command. If you want, you can also remove the model runner (and optionally the model storage) using docker model uninstall-runner.

This unfortunately leads to one small compromise with the UX: we don’t currently support the model runner APIs on /var/run/docker.sock or on the special model-runner.docker.internal URL. Instead, the model runner API server listens on the host system’s loopback interface at localhost:12434 (by default), which is available inside most containers at 172.17.0.1:12434. If desired, users can also make this available on model-runner.docker.internal:12434 by utilizing something like --add-host=model-runner.docker.internal:host-gateway when running docker run or docker create commands. This can also be achieved by using the extra_hosts key in a Compose YAML file. We have plans to make this more ergonomic in future releases.
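
If you script container creation rather than using the CLI, the same mapping can be added with the Docker SDK for Python; a small sketch (the image name is a placeholder):

import docker

client = docker.from_env()

# Equivalent to `docker run --add-host=model-runner.docker.internal:host-gateway ...`
container = client.containers.run(
    "my-genai-app:latest",  # placeholder image
    detach=True,
    extra_hosts={"model-runner.docker.internal": "host-gateway"},
)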

The road ahead…

Today, Docker Model Runner is supported in Docker Desktop on macOS and Windows, and in Docker CE on Linux (including WSL2), but that’s definitely not the end of the story. Over the next few months, we have a number of initiatives planned that we think will reshape the user experience, performance, and security of Docker Model Runner.

Additional GUI and CLI functionality

The most visible functionality coming out over the next few months will be in the model CLI and the “Models” tab in the Docker Desktop dashboard. Expect to see new commands (such as df, ps, and unload) that will provide more direct support for monitoring and controlling model execution. Also, expect to see new and expanded layouts and functionality in the Models tab.

Expanded OpenAI API support

A less-visible but equally important aspect of the Docker Model Runner user experience is our compatibility with the OpenAI API. There are dozens of endpoints and parameters to support (and we already support many), so we will keep expanding the API surface we cover, focusing on practical use cases and prioritizing compatibility with existing tools.

containerd and Moby integration

One of the longer-term initiatives that we’re looking at is integration with containerd. containerd already provides a modular runtime system that allows for task execution coordinated with storage. We believe this is the right way forward and that it will allow us to better codify the relationship between model storage, model execution, and model execution sandboxing.

In combination with the containerd work, we would also like tighter integration with the Moby project. While our existing Docker CE integration offers a viable and performant solution, we believe that better ergonomics could be achieved with more direct integration. In particular, niceties like support for model-runner.docker.internal DNS resolution in Docker CE are on our radar. Perhaps the biggest win from this tighter integration would be to expose Docker Model Runner APIs on the Docker socket and to include the API endpoints (e.g. /models) in the official Docker Engine API documentation.

Kubernetes

One of the product goals for Docker Model Runner was a consistent experience from development inner loop to production, and Kubernetes is inarguably a part of that path. The existing Docker Model Runner images that we’re using for Docker CE will also work within a Kubernetes cluster, and we’re currently developing instructions to set up a Docker Model Runner instance in a Kubernetes cluster. The big difference with Kubernetes is the variety of cluster and application architectures in use, so we’ll likely end up with different “recipes” for how to configure the Docker Model Runner in different scenarios.

vLLM

One of the things we’ve heard from a number of customers is that vLLM forms a core component of their production stack. This was also the first alternate backend that we stubbed out in the model runner repository, and the time has come to start poking at an implementation.

Even more to come…

Finally, there are some bits that we just can’t talk about yet, but they will fundamentally shift the way that developers interact with models. Be sure to tune in to Docker’s sessions at WeAreDevelopers from July 9–11 for some exciting announcements around AI-related initiatives at Docker.

Learn more

Explore the story behind our model distribution specification.

Read our quickstart guide to Docker Model Runner.

Find documentation for Model Runner.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

Why Docker Chose OCI Artifacts for AI Model Packaging

As AI development accelerates, developers need tools that let them move fast without having to reinvent their workflows. Docker Model Runner introduces a new specification for packaging large language models (LLMs) as OCI artifacts — a format developers already know and trust. It brings model sharing into the same workflows used for containers, with support for OCI registries like Docker Hub.

By using OCI artifacts, teams can skip custom toolchains and work with models the same way they do with container images. In this post, we’ll share why we chose OCI artifacts, how the format works, and what it unlocks for GenAI developers.

Why OCI artifacts?

One of Docker’s goals is to make genAI application development accessible to a larger community of developers. We can do this by helping models become first-class citizens within the cloud native ecosystem. 

When models are packaged as OCI artifacts, developers can get started with AI development without the need to learn, vet, and adopt a new distribution toolchain. Instead, developers can discover new models on Hub and distribute variants publicly or privately via existing OCI registries, just like they do with container images today! For teams using Docker Hub, enterprise features like Registry Access Management (RAM) provide policy-based controls and guardrails to help enforce secure, consistent access.

Packaging models as OCI artifacts also paves the way for deeper integration between inference runners like Docker Model Runner and existing tools like containerd and Kubernetes.

Understanding OCI images and artifacts

Many of these advantages apply equally to OCI images and OCI artifacts. To understand why images can be a less optimal fit for LLMs and why a custom artifact specification conveys additional advantages, it helps to first revisit the components of an OCI image and its generic cousin, the OCI artifact.

What are OCI images?

OCI images are a standardized format for container images, defined by the Open Container Initiative (OCI). They package everything needed to run a container: metadata, configuration, and filesystem layers.

An OCI image is composed of three main components:

An image manifest – a JSON file containing references to an image configuration and a set of filesystem layers.

An image configuration – a JSON file containing the layer ordering and OCI runtime configuration.

One or more layers – TAR archives (typically compressed), containing filesystem changesets that, applied in order, produce a container root filesystem.

Below is an example manifest from the busybox image:

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:7b4721e214600044496305a20ca3902677e572127d4d976ed0e54da0137c243a",
    "size": 477
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:189fdd1508372905e80cc3edcdb56cdc4fa216aebef6f332dd3cba6e300238ea",
      "size": 1844697
    }
  ],
  "annotations": {
    "org.opencontainers.image.url": "https://github.com/docker-library/busybox",
    "org.opencontainers.image.version": "1.37.0-glibc"
  }
}

Because the image manifest contains content-addressable references to all image components, the hash of the manifest file, otherwise known as the image digest, can be used to uniquely identify an image.
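
In other words, the digest is nothing more than the SHA-256 hash of the raw manifest bytes; a small sketch (the local file name is hypothetical):

import hashlib

with open("manifest.json", "rb") as f:  # hypothetical local copy of a manifest
    digest = "sha256:" + hashlib.sha256(f.read()).hexdigest()
print(digest)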

What are OCI artifacts?

OCI artifacts offer a way to extend the OCI image format to support distributing content beyond container images. They follow the same structure: a manifest, a config file, and one or more layers. 

The artifact guidance in the OCI image specification describes how this same basic structure (manifest + config + layers) can be used to distribute other types of content. The artifact type is designated by the config file’s media type. For example, in the manifest below, config.mediaType is set to application/vnd.cncf.helm.config.v1+json. This indicates to registries and other tooling that the artifact is a Helm chart and should be parsed accordingly.

{
  "schemaVersion": 2,
  "config": {
    "mediaType": "application/vnd.cncf.helm.config.v1+json",
    "digest": "sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111",
    "size": 117
  },
  "layers": [
    {
      "mediaType": "application/vnd.cncf.helm.chart.content.v1.tar+gzip",
      "digest": "sha256:1b251d38cfe948dfc0a5745b7af5ca574ecb61e52aed10b19039db39af6e1617",
      "size": 2487
    }
  ]
}

In an OCI artifact, layers may be of any media type and are not restricted to filesystem changesets. Whoever defines the artifact type defines the supported layer types and determines how the contents should be used and interpreted.

Using container images vs. custom artifact types

With this background in mind, while we could have packaged LLMs as container images, defining a custom type has some important advantages:

A custom artifact type allows us to define a domain-specific config schema. Programmatic access to key metadata provides a support structure for an ecosystem of useful tools specifically tailored to AI use-cases.

A custom artifact type allows us to package content in formats other than compressed TAR archives, thus avoiding performance issues that arise when LLMs are packaged as image layers. For more details on how model layers are different and why it matters, see the Layers section below.

A custom type ensures that models are packaged and distributed separately from inference engines. This separation is important because it allows users to consume the variant of the inference engine optimized for their system without requiring every model to be packaged in combination with every engine.

A custom artifact type frees us from the expectations that typically accompany a container image. Standalone models are not executable without an inference engine. Packaging as a custom type makes clear that they are not independently runnable, thus avoiding confusion and unexpected errors.

Docker Model Artifacts

Now that we understand the high-level goals, let’s dig deeper into the details of the format.

Media Types

The model specification defines the following media types:

application/vnd.docker.ai.model.config.v0.1+json – identifies a model config JSON file. This value in config.mediaType in a manifest identifies an artifact as a Docker model with a config file adhering to v0.1 of the specification.

application/vnd.docker.ai.gguf.v3 – indicates that a layer contains a model packaged as a GGUF file.

application/vnd.docker.ai.license – indicates that a layer contains a plain text software license file.

Expect more media types to be defined in the future as we add runtime configuration, add support for new features like projectors and LoRA adapters, and expand the supported packaging formats for model files.

Manifest

A model manifest is formatted like an image manifest and distinguished by its config.mediaType. The following example manifest, taken from the ai/gemma3 model, references a model config JSON and two layers, one containing a GGUF file and the other containing the model’s license.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
    "size": 372,
    "digest": "sha256:22273fd2f4e6dbaf5b5dae5c5e1064ca7d0ff8877d308eb0faf0e6569be41539"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.ai.gguf.v3",
      "size": 2489757856,
      "digest": "sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7"
    },
    {
      "mediaType": "application/vnd.docker.ai.license",
      "size": 8346,
      "digest": "sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
    }
  ]
}

Model ID

The manifest digest uniquely identifies the model and is used by Docker Model Runner as the model ID.

Model Config JSON

The model configuration is a JSON file that surfaces important metadata about the model, such as size, parameter count, and quantization, as well as metadata about the artifact’s provenance (like the creation timestamp). The following example comes from the ai/gemma3 model on Docker Hub:

{
  "config": {
    "format": "gguf",
    "quantization": "IQ2_XXS/Q4_K_M",
    "parameters": "3.88 B",
    "architecture": "gemma3",
    "size": "2.31 GiB"
  },
  "descriptor": {
    "created": "2025-03-26T09:57:32.086694+01:00"
  },
  "rootfs": {
    "type": "rootfs",
    "diff_ids": [
      "sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7",
      "sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
    ]
  }
}

By defining a domain-specific configuration schema, we allow tools to access and use model metadata cheaply (by fetching and parsing a small JSON file), only fetching the model itself when needed. For example, a registry frontend like Docker Hub can directly surface this data to users, who can, in turn, use it to compare models or select one based on system capabilities and requirements. Tooling might use this data to estimate memory requirements for a given model, and then assist in the selection process by suggesting the best variant that is compatible with the available resources.
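
As a rough sketch of what such tooling could look like (the field names follow the example config above; the fit heuristic is deliberately simplistic):

import json

def fits_in_memory(config_json: str, available_gib: float) -> bool:
    """Crude check: does this model variant plausibly fit in the available memory?
    Uses only the small config JSON, without downloading the model itself."""
    cfg = json.loads(config_json)["config"]
    value, unit = cfg["size"].split()  # e.g. "2.31 GiB"
    size_gib = float(value) if unit == "GiB" else float(value) / 1024  # assume MiB otherwise
    return size_gib * 1.2 <= available_gib  # leave some headroom for KV cache and overhead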

Layers

Layers in a model artifact differ from layers within an OCI image in two important respects.

Unlike an image layer, where compression is recommended, model layers are always uncompressed. Because models are large, high-entropy files, compressing them provides a negligible reduction in size, while compressing and uncompressing them is time- and compute-intensive.

In contrast to a layer in an OCI image, which contains multiple files in an archive, each “layer” in a model artifact must contain a single raw file. This allows runtimes like Docker Model Runner to reduce disk usage on the client machine by storing a single uncompressed copy of the model. This file can then be directly memory mapped by the inference engine at runtime.

The lack of file names, hierarchy, and metadata (e.g. modification time) ensures that identical model files always result in identical reusable layer blobs. This prevents unnecessary duplication, which is particularly important when working with LLMs, given the file size.

You may have noticed that these “layers” are not really filesystem layers at all. They are files, but they do not specify a filesystem. So, how does this work at runtime? When Docker Model Runner runs a model, instead of finding the GGUF file by name in a model filesystem, the desired file is identified by its media type (application/vnd.docker.ai.gguf.v3) and fetched from the model store. For more information on the Model Runner architecture, please see the architecture overview in this accompanying blog post.
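
A minimal sketch of that lookup is shown below; the on-disk blob layout is hypothetical, and the real model store is described in the architecture post:

import json
import os

GGUF_MEDIA_TYPE = "application/vnd.docker.ai.gguf.v3"

def gguf_blob_path(manifest_json: str, store_root: str) -> str:
    """Locate the model weights by media type rather than by file name, then
    resolve them in a content-addressed store (hypothetical blobs/<algo>/<hex> layout)."""
    manifest = json.loads(manifest_json)
    layer = next(l for l in manifest["layers"] if l["mediaType"] == GGUF_MEDIA_TYPE)
    algorithm, hex_digest = layer["digest"].split(":", 1)
    return os.path.join(store_root, "blobs", algorithm, hex_digest)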

Distribution

Like OCI images and other OCI artifacts, Docker model artifacts are distributed via registries that comply with the OCI distribution specification, such as Docker Hub, Artifactory, or Azure Container Registry.
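
Because distribution rides on the standard OCI distribution API, resolving a model manifest looks the same as it does for any other artifact. A simplified sketch (token handling is reduced to a bare bearer header, and the registry and repository names are placeholders):

import requests

def fetch_manifest(registry: str, repository: str, reference: str, token: str | None = None) -> dict:
    """Resolve a manifest via GET /v2/<name>/manifests/<reference>.
    Most registries require a bearer token, even for public pulls."""
    headers = {"Accept": "application/vnd.oci.image.manifest.v1+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    resp = requests.get(f"https://{registry}/v2/{repository}/manifests/{reference}", headers=headers)
    resp.raise_for_status()
    return resp.json()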

Discovery

Docker Hub

The Docker Hub Gen AI catalog aids in the discovery of popular models. These models are packaged in the format described here and are compatible with Docker Model Runner and any other runtime that supports the OCI specification.

Hugging Face

If you are accustomed to exploring models on Hugging Face, there’s good news! Hugging Face now supports on-demand conversion to the Docker Model Artifact format when you pull from Hugging Face with docker model pull.

What’s Next?

Hopefully, you now have a better understanding of the Docker OCI Model format and how it supports our goal of making AI app development more accessible to developers via familiar workflows and commands. But this version of the artifact format is just the beginning! In the future, you can expect enhancements to the packaging format to bring this level of accessibility and flexibility to a broader range of use cases. Future versions will support:

Additional runtime configuration options like templates, context size, and default parameters. This will allow users to configure models for specific use cases and distribute that config alongside the model, as a single immutable artifact.

LoRA adapters, allowing users to extend existing model artifacts with use-case-specific fine-tuning.

Multi-modal projectors, enabling users to package multi-modal models, such as language-and-vision models, using LLaVA-style projectors.

Model index files that provide a set of models with different parameter counts and quantizations, allowing runtimes to pick the best option for the available resources.

In addition to adding features, we are committed to fostering an open ecosystem. Expect:

Deeper integrations into containerd for a more native runtime experience.

Efforts to harmonize with ModelPack and other model packaging standards to improve interoperability.

These advancements show our ongoing commitment to making the OCI artifact a versatile and flexible way to package and run AI models, delivering the same ease and reliability developers already expect from Docker.

Learn more

Get an inside look at the design architecture of the Docker Model Runner. 

Read our quickstart guide to Docker Model Runner.

Find documentation for Model Runner.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.

Source: https://blog.docker.com/feed/

Why Docker Chose OCI Artifacts for AI Model Packaging

As AI development accelerates, developers need tools that let them move fast without having to reinvent their workflows. Docker Model Runner introduces a new specification for packaging large language models (LLMs) as OCI artifacts — a format developers already know and trust. It brings model sharing into the same workflows used for containers, with support for OCI registries like Docker Hub.

By using OCI artifacts, teams can skip custom toolchains and work with models the same way they do with container images. In this post, we’ll share why we chose OCI artifacts, how the format works, and what it unlocks for GenAI developers.

Why OCI artifacts?

One of Docker’s goals is to make genAI application development accessible to a larger community of developers. We can do this by helping models become first-class citizens within the cloud native ecosystem. 

When models are packaged as OCI artifacts, developers can get started with AI development without the need to learn, vet, and adopt a new distribution toolchain. Instead, developers can discover new models on Hub and distribute variants publicly or privately via existing OCI registries, just like they do with container images today! For teams using Docker Hub, enterprise features like Registry Access Management (RAM) provide policy-based controls and guardrails to help enforce secure, consistent access.

Packaging models as OCI artifacts also paves the way for deeper integration between inference runners like Docker Model Runner and existing tools like containerd and Kubernetes.

Understanding OCI images and artifacts

Many of these advantages apply equally to OCI images and OCI artifacts. To understand why images can be a less optimal fit for LLMs and why a custom artifact specification conveys additional advantages, it helps to first revisit the components of an OCI image and its generic cousin, the OCI artifact.

What are OCI images?

OCI images are a standardized format for container images, defined by the Open Container Initiative (OCI). They package everything needed to run a container: metadata, configuration, and filesystem layers.

An OCI image is composed of three main components:

An image manifest – a JSON file containing references to an image configuration and a set of filesystem layers.

An image configuration – a JSON file containing the layer ordering and OCI runtime configuration.

One or more layers – TAR archives (typically compressed), containing filesystem changesets that, applied in order, produce a container root filesystem.

Below is an example manifest from the busybox image:

{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"config": {
"mediaType": "application/vnd.oci.image.config.v1+json",
"digest": "sha256:7b4721e214600044496305a20ca3902677e572127d4d976ed0e54da0137c243a",
"size": 477
},
"layers": [
{
"mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
"digest": "sha256:189fdd1508372905e80cc3edcdb56cdc4fa216aebef6f332dd3cba6e300238ea",
"size": 1844697
}
],
"annotations": {
"org.opencontainers.image.url": "https://github.com/docker-library/busybox",
"org.opencontainers.image.version": "1.37.0-glibc"
}
}

Because the image manifest contains content-addressable references to all image components, the hash of the manifest file, otherwise known as the image digest, can be used to uniquely identify an image.

What are OCI artifacts?

OCI artifacts offer a way to extend the OCI image format to support distributing content beyond container images. They follow the same structure: a manifest, a config file, and one or more layers. 

The artifact guidance in the OCI image specifications describes how this same basic structure (manifest + config + layers) can be used to distribute other types of content.The artifact type is designated by the config file’s media type. For example, in the manifest below config.mediaType is set to application/vnd.cncf.helm.config.v1+json. This indicates to registries and other tooling that the artifact is a Helm chart and should be parsed accordingly.

{
"schemaVersion": 2,
"config": {
"mediaType": "application/vnd.cncf.helm.config.v1+json",
"digest": "sha256:8ec7c0f2f6860037c19b54c3cfbab48d9b4b21b485a93d87b64690fdb68c2111",
"size": 117
},
"layers": [
{
"mediaType": "application/vnd.cncf.helm.chart.content.v1.tar+gzip",
"digest": "sha256:1b251d38cfe948dfc0a5745b7af5ca574ecb61e52aed10b19039db39af6e1617",
"size": 2487
}
]
}

In an OCI artifact, layers may be of any media type and are not restricted to filesystem changesets. Whoever defines the artifact type defines the supported layer types and determines how the contents should be used and interpreted.

Using container images vs. custom artifact types

With this background in mind, while we could have packaged LLMs as container images, defining a custom type has some important advantages:

A custom artifact type allows us to define a domain-specific config schema. Programmatic access to key metadata provides a support structure for an ecosystem of useful tools specifically tailored to AI use-cases.

A custom artifact type allows us to package content in formats other than compressed TAR archives, thus avoiding performance issues that arise when LLMs are packaged as image layers. For more details on how model layers are different and why it matters, see the Layers section below.

A custom type ensures that models are packaged and distributed separately from inference engines. This separation is important because it allows users to consume the variant of the inference engine optimized for their system without requiring every model to be packaged in combination with every engine.

A custom artifact type frees us from the expectations that typically accompany a container image. Standalone models are not executable without an inference engine. Packaging as a custom type makes clear that they are not independently runnable, thus avoiding confusion and unexpected errors.

Docker Model Artifacts

Now that we understand the high-level goals, let’s dig deeper into the details of the format.

Media Types

The model specification defines the following media types:

application/vnd.docker.ai.model.config.v0.1+json – identifies a model config JSON file. This value in config.mediaType in a manifest identifies an artifact as a Docker model with config file adhering to v0.1 of the specification.

application/vnd.docker.ai.gguf.v3 – indicates that a layer contains a model packaged as a GGUF file.

application/vnd.docker.ai.license – indicates that a layer contains a plain text software license file.

Expect more media types to be defined in the future as we add runtime configuration, add support for new features like projectors and LoRA adaptors, and expand the supported packaging formats for model files.

Manifest

A model manifest is formatted like an image manifest and distinguished by the config.MediaType. The following example manifest, taken from the ai/gemma3, references a model config JSON and two layers, one containing a GGUF file and the other containing the model’s license.

{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"config": {
"mediaType": "application/vnd.docker.ai.model.config.v0.1+json",
"size": 372,
"digest": "sha256:22273fd2f4e6dbaf5b5dae5c5e1064ca7d0ff8877d308eb0faf0e6569be41539"
},
"layers": [
{
"mediaType": "application/vnd.docker.ai.gguf.v3",
"size": 2489757856,
"digest": "sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7"
},
{
"mediaType": "application/vnd.docker.ai.license",
"size": 8346,
"digest": "sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
}
]
}

Model ID

The manifest digest uniquely identifies the model and is used by Docker Model Runner as the model ID.

Model Config JSON

The model configuration is a JSON file that surfaces important metadata about the model, such as size, parameter count, quantization, as well as metadata about the artifact provenance (like the creation timestamp).The following example comes from the ai/gemma model on Dockerhub:

{
"config": {
"format": "gguf",
"quantization": "IQ2_XXS/Q4_K_M",
"parameters": "3.88 B",
"architecture": "gemma3",
"size": "2.31 GiB"
},
"descriptor": {
"created": "2025-03-26T09:57:32.086694+01:00"
},
"rootfs": {
"type": "rootfs",
"diff_ids": [
"sha256:09b370de51ad3bde8c3aea3559a769a59e7772e813667ddbafc96ab2dc1adaa7",
"sha256:a4b03d96571f0ad98b1253bb134944e508a4e9b9de328909bdc90e3f960823e5"
]
}
}

By defining a domain-specific configuration schema, we allow tools to access and use model metadata cheaply — by fetching and parsing a small JSON file — only fetching the model itself when needed.For example, a registry frontend like Docker Hub can directly surface this data to users who can, in turn, use it to compare models or select based on system capabilities and requirements. Tooling might use this data to estimate memory requirements for a given model. It could then assist in the selection process by suggesting the best variant that is compatible with the available resources.

Layers

Layers in a model artifact differ from layers within an OCI image in two important respects.

Unlike an image layer, where compression is recommended, model layers are always uncompressed. Because models are large, high-entropy files, compressing them provides a negligible reduction in size, while compression and decompression are time- and compute-intensive.

In contrast to a layer in an OCI image, which contains multiple files in an archive, each “layer” in a model artifact must contain a single raw file. This allows runtimes like Docker Model Runner to reduce disk usage on the client machine by storing a single uncompressed copy of the model. This file can then be directly memory mapped by the inference engine at runtime.

The lack of file names, hierarchy, and metadata (e.g. modification time) ensures that identical model files always result in identical reusable layer blobs. This prevents unnecessary duplication, which is particularly important when working with LLMs, given the file size.

You may have noticed that these “layers” are not really filesystem layers at all. They are files, but they do not specify a filesystem. So how does this work at runtime? When Docker Model Runner runs a model, instead of finding the GGUF file by name in a model filesystem, it identifies the desired file by its media type (application/vnd.docker.ai.gguf.v3) and fetches it from the model store. For more information on the Model Runner architecture, please see the architecture overview in this accompanying blog post.
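
To make that concrete, here is a minimal sketch of the lookup, assuming a simple content-addressed blobs/<algorithm>/<hex> directory layout (the actual Model Runner store layout may differ): find the layer whose media type is application/vnd.docker.ai.gguf.v3, then memory-map the corresponding blob.

import mmap

GGUF_MEDIA_TYPE = "application/vnd.docker.ai.gguf.v3"

def gguf_digest(manifest: dict) -> str:
    # Return the digest of the layer holding the raw GGUF file.
    for layer in manifest["layers"]:
        if layer["mediaType"] == GGUF_MEDIA_TYPE:
            return layer["digest"]
    raise ValueError("manifest contains no GGUF layer")

def mmap_blob(store_dir: str, digest: str) -> mmap.mmap:
    # Memory-map an uncompressed blob from a content-addressed store.
    algorithm, hex_digest = digest.split(":")
    path = f"{store_dir}/blobs/{algorithm}/{hex_digest}"  # assumed layout
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)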

Distribution

Like OCI images and other OCI artifacts, Docker model artifacts are distributed via registries that comply with the OCI distribution specification, such as Docker Hub, Artifactory, or Azure Container Registry.
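
Individual layers are fetched from those same registries through the distribution API’s blobs endpoint. A small sketch, reusing REPO, token, and manifest from the manifest example above, that downloads the license layer by digest:

import requests

license_digest = next(
    layer["digest"]
    for layer in manifest["layers"]
    if layer["mediaType"] == "application/vnd.docker.ai.license"
)

blob = requests.get(
    f"https://registry-1.docker.io/v2/{REPO}/blobs/{license_digest}",
    headers={"Authorization": f"Bearer {token}"},
)
blob.raise_for_status()
print(blob.text[:200])  # beginning of the license text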

Discovery

Docker Hub

The Docker Hub Gen AI catalog aids in the discovery of popular models. These models are packaged in the format described here and are compatible with Docker Model Runner and any other runtime that supports the OCI specification.

Hugging Face

If you are accustomed to exploring models on Hugging Face, there’s good news! Hugging Face now supports on-demand conversion to the Docker model artifact format, so you can pull models straight from Hugging Face with docker model pull.

What’s Next?

Hopefully, you now have a better understanding of the Docker OCI model format and how it supports our goal of making AI app development more accessible to developers via familiar workflows and commands. But this version of the artifact format is just the beginning! In the future, you can expect enhancements to the packaging format to bring this level of accessibility and flexibility to a broader range of use cases. Future versions will support:

Additional runtime configuration options like templates, context size, and default parameters. This will allow users to configure models for specific use cases and distribute that config alongside the model, as a single immutable artifact.

LoRA adapters, allowing users to extend existing model artifacts with use-case-specific fine-tuning.

Multi-modal projectors, enabling users to package multi-modal models, such as language-and-vision models that use LLaVA-style projectors.

Model index files that reference a set of models with different parameter counts and quantizations, allowing runtimes to select the best option for the available resources.

In addition to adding features, we are committed to fostering an open ecosystem. Expect:

Deeper integrations into containerd for a more native runtime experience.

Efforts to harmonize with ModelPack and other model packaging standards to improve interoperability.

These advancements show our ongoing commitment to making the OCI artifact a versatile and flexible way to package and run AI models, delivering the same ease and reliability developers already expect from Docker.

Learn more

Get an inside look at the design architecture of the Docker Model Runner. 

Read our quickstart guide to Docker Model Runner.

Find documentation for Model Runner.

Subscribe to the Docker Navigator Newsletter.

New to Docker? Create an account. 

Have questions? The Docker community is here to help.
