How to add MCP Servers to OpenAI’s Codex with Docker MCP Toolkit

AI assistants are changing how we write code, but their true power is unleashed when they can interact with specialized, high-precision tools. OpenAI’s Codex is a formidable coding partner, but what happens when you connect it directly to your running infrastructure?

Enter the Docker MCP Toolkit.

Built on the Model Context Protocol (MCP), the Docker MCP Toolkit acts as a secure bridge, allowing AI models like Codex to safely discover and use any of the 200+ MCP servers in the trusted MCP catalog curated by Docker.

In this post, we’ll walk through an end-to-end demo, just like our Claude Code and Gemini CLI tutorials. But this time, we’re pairing Codex with Neo4j MCP servers.

First, we’ll connect Codex to the Neo4j server using the MCP Toolkit. Then, we’ll show a fun example: building a graph of Pokémon species and their types, and exploring the data visually. While playful, this example highlights how Codex + MCP can be applied to real-world, semi-structured data pipelines.

Read on to see how a generic AI assistant, when supercharged with Docker and MCP, can evolve into a specialized data engineering powerhouse!

Why use Codex with Docker MCP

While Codex provides powerful AI capabilities and MCP provides the protocol, Docker MCP Toolkit makes automated data modeling and graph engineering practical. Without containerization, building a knowledge graph means managing local Neo4j installations, dealing with database driver versions, writing boilerplate connection and authentication code, and manually scripting the entire data validation and loading pipeline. A setup that should take minutes can easily stretch into hours for each developer.

Docker MCP Toolkit eliminates this friction:

200+ pre-built MCP servers in the Catalog

One-click deployment through Docker Desktop

Neo4j Data Modeling MCP for schema design and validation

Neo4j Cypher MCP for direct database queries and ingestion

Secure credential management for database passwords

Consistent configuration across Mac, Windows, and Linux

Automatic updates when new server versions are released

We built Docker MCP Toolkit to meet developers where they are. If you’re using Codex, you should be able to engineer a knowledge graph without wrestling with database infrastructure.

The Setup: Connecting Codex to Neo4j Tools

Prerequisites

First, we need to give Codex access to the specialized Neo4j tools. 

Install Codex and run it at least once to get authentication out of the way

Install Docker Desktop 4.40 or later

Enable MCP Toolkit 

Step 1: Add the Neo4j MCP Servers

The Neo4j Cypher and Data Modeling servers are available out-of-the-box in the main MCP Toolkit catalog.

In Docker Desktop, navigate to the MCP Toolkit tab.

Click the Catalog tab.

Search for “Neo4j” and click + Add for both the Neo4j Cypher and Neo4j Data Modeling servers.

They will now appear in your “My servers” list.

Step 2: Connect Codex to the MCP Toolkit

With our tools ready, we run a one-time command to make Codex aware of the MCP Toolkit:

docker mcp-client configure codex

We can also do this from the Docker Desktop UI: navigate to the Clients tab and hit the Connect button next to Codex and any other assistants you use:

Docker will edit the corresponding configuration files, and the next time Codex starts, it’ll connect to the MCP Toolkit and you’ll have the tools at your disposal!
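
If you’re curious what changed, you can peek at Codex’s configuration afterwards. This is a minimal sketch; the exact file path is an assumption based on Codex CLI defaults, so adjust it if your install keeps its config elsewhere:

# The config path is an assumption based on Codex CLI defaults.
cat ~/.codex/config.toml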

Step 3: Start and Configure Neo4j

The Neo4j Cypher MCP server still needs to be pointed at an actual Neo4j database, so let’s set that up now. We’ll use Codex both to start the database and to work out the connection details. First, we ask Codex to create the container:

› Spin up a Neo4j container for me in Docker please.

Codex will run the necessary Docker command and get our Neo4j container running. You can of course do this manually, use a cloud service, or download the Neo4j Desktop application, but since we’re having fun in Codex, why not let it handle these mundane tasks for us?
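
If you’d rather create the container yourself, here is a minimal sketch of the kind of command Codex typically ends up running (the image tag and the neo4j/password credentials are assumptions; use whatever you plan to configure in the next step):

# Expose the HTTP console (7474) and Bolt (7687) ports and set initial credentials.
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest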

With Neo4j available in the container, we now need to configure the Neo4j Cypher MCP server to connect to it. The Neo4j Data Modeling MCP server works without any configuration. To simplify things, you can take a screenshot of the configuration form and ask Codex how to fill it in to connect to the container we just started.

Next, we asked Codex for help, pointing it to a screenshot of the configuration page:

› There is a screenshot in this directory. It has the information we need to configure the MCP server to connect to the Neo4j database. Please look at it and give me the information to configure the MCP server to connect to our Neo4j container.

Codex inspected the Docker environment, found the running Neo4j container, and determined its authentication details (e.g., NEO4J_AUTH=neo4j/password). It then provided the correct configuration values: we just need to update the Neo4j Cypher MCP server configuration in the Docker Desktop UI with the neo4j-cypher.url (neo4j://localhost:7687) and the password.
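
To reproduce that inspection by hand, here is a small sketch; the container name neo4j is an assumption, so check docker ps for yours:

# Print the container's environment variables; NEO4J_AUTH holds the user/password pair.
docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' neo4j
# Confirm which host ports map to Bolt (7687) and the HTTP console (7474).
docker port neo4j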

Fill in the form for the Neo4j Cypher MCP server in the Docker MCP Toolkit, then restart Codex.

Use Case: Building a Pokémon Graph with Codex + Neo4j

Now we’re ready to tackle our original problem of building the graph of Pokémon species. First, enter YOLO mode by typing /approvals and selecting the mode that gives Codex the freedom to do whatever it pleases:

Next, we can throw the full power of Codex models at our task. We give it the full problem statement: 

› Here's the page with all pokemon in generation 1: https://bulbapedia.bulbagarden.net/wiki/List_of_Pok%C3%A9mon_by_index_number_in_Generation_I
and here's the types information: https://bulbapedia.bulbagarden.net/wiki/Type

we're going to create a graph with data about these pokemon (ignore MissingNo entries – they are a glitch and ignore other glitched pokemon).
you have a neo4j container running, and the MCP tools for modelling and cypher.

Codex will accept the task, break it down into a plan, and start cranking at it. 

There are a few interesting moments in how it approaches the task, though much depends on the model you configure Codex with. For me it was “model: gpt-5-codex”, which tends to favor scripting over chaining curl commands, for example.

Here’s the plan you might see if you’re following along:

• I’m preparing to build a Pokemon graph by fetching data from a website using network access. Since this is a complex task, I’ll create a detailed plan involving data gathering, modeling
entities like Pokémon and Types, defining relationships such as evolutions and type effectiveness, and then using tools like MCP Docker to build the Neo4j graph. The first step is to draft
• Updated Plan
└ □ Inspect requirements and gather Generation I Pokémon data with types and evolution chains
□ Define Neo4j graph data model for Pokémon, Types, and relationships
□ Load data into Neo4j via Cypher ingestion and validate

Then it immediately started cranking out Python scripts with requests, BeautifulSoup, and other web-scraping friends to fetch the data from the provided wiki. Interestingly, the Pokémon data is not actually on a single page but spread across subpages for every Pokémon species, so scraping with Python makes a ton of sense over curling the pages and spending tokens on processing the data in the model. And it leaves the scripts behind, so you can verify and reproduce the results if needed.

Eventually the model proceeds to analyze the data, and here the MCP tools become really helpful. You can see below that it’s calling the MCP_DOCKER.validate_data_model from the server we configured earlier.

I'm outlining a comprehensive data model for Pokemon, types, and their relationships with precise property types and units converted for better clarity. I plan to generate ingestion-ready
JSON files via a Python script to integrate cleanly with Neo4j ingestion tools, ensuring the data pipeline is robust and structured.
└ wrote data model to data/gen1_pokemon_model.json

• Called MCP_DOCKER.validate_data_model({"data_model":"data/gen1_pokemon_model.json","return_validated":true})
└ Input validation error: 'data/gen1_pokemon_model.json' is not of type 'object'
….

It took Codex a few attempts to figure out the data model JSON schema and produce one for the Pokémon that satisfied the Neo4j Data Modeling MCP server.

Then it returned to Python for creating the data ingestion script and loaded the data into the Neo4j instance.

A few MCP tool calls later, Codex queries the data with Cypher (the query language for graph databases), which it can do because it has access to the Neo4j Cypher MCP server. With it, Codex and the MCP servers can answer analytical questions about our data.

– Greedy type-coverage search suggests trios such as (Rhydon, Parasect, Dragonite) or (Rhydon, Parasect, Jynx) hit 13 of the 15 defending types super-effectively; no trio can cover Normal/Rock simultaneously because Normal has no offensive 2× matchup.

Now what’s really fun about Neo4j is that it comes with a terrific console where you can explore the data. 

While our Neo4j container with the Pokémon data is still running, we can go to http://localhost:7474, enter the neo4j/password credentials, and explore the data visually.

Here for example is a subset of the Pokémon and their type relationships.
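
You can pull a similar subset straight from the terminal as well. Here is a minimal sketch using cypher-shell inside the container; the container name, credentials, and the Pokemon/Type labels are assumptions based on the model Codex generated:

# List a handful of Pokémon together with their types.
echo "MATCH (p:Pokemon)-[:HAS_TYPE]->(t:Type)
RETURN p.name AS pokemon, collect(t.name) AS types LIMIT 10;" \
  | docker exec -i neo4j cypher-shell -u neo4j -p password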

And if you know Cypher, or have an AI assistant that can generate Cypher queries (and verify they work with an MCP tool call), you can build more complex projections of your data. For example, the following query (generated by Codex) shows all Pokémon, their evolution relationships, and their primary/secondary types.

MATCH (p:Pokemon)
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:EVOLVES_TO*1..]->(evo:Pokemon)
  WITH collect(DISTINCT evo) AS evos
  RETURN [e IN evos WHERE e IS NOT NULL | {node: e, relType: 'EVOLVES_TO'}] AS evolutionConnections
}
CALL {
  WITH p
  OPTIONAL MATCH (p)-[:HAS_TYPE]->(type:Type)
  WITH type
  ORDER BY type.name // ensures a stable primary/secondary ordering
  RETURN collect(type) AS orderedTypes
}
WITH p, evolutionConnections, orderedTypes,
     CASE WHEN size(orderedTypes) >= 1 THEN orderedTypes[0] END AS primaryType,
     CASE WHEN size(orderedTypes) >= 2 THEN orderedTypes[1] END AS secondaryType
WITH p,
     evolutionConnections +
     CASE WHEN primaryType IS NULL THEN [] ELSE [{node: primaryType, relType: 'HAS_PRIMARY_TYPE'}] END +
     CASE WHEN secondaryType IS NULL THEN [] ELSE [{node: secondaryType, relType: 'HAS_SECONDARY_TYPE'}] END AS connections
UNWIND connections AS connection
RETURN p AS pokemon,
       connection.node AS related,
       connection.relType AS relationship
ORDER BY pokemon.name, relationship, related.name;
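
If you prefer to stay in the terminal, you can also save a projection like the one above to a file and pipe it through cypher-shell in the container. This is a sketch; the file name, container name, and credentials are assumptions carried over from the earlier steps:

# Run a saved Cypher projection against the containerized database.
docker exec -i neo4j cypher-shell -u neo4j -p password < projection.cypher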

Turn Your AI Coding Assistant into a Data Engineer, Architect, Analyst and More

While this Pokémon demo is a fun example, it’s also a practical blueprint for working with real-world, semi-structured data. Graph databases like Neo4j are especially well-suited for this kind of work. Their relationship-first model makes it easier to represent the complexity of real-world systems.

In this walkthrough, we showed how to connect OpenAI’s Codex to the Neo4j MCP Servers via Docker MCP Toolkit, enabling it to take on multiple specialized roles:

Data Engineer: Writing Python to scrape and transform web data

Data Architect: Designing and validating graph models using domain-specific tools

DevOps Engineer: Starting services and configuring tools based on its environment

Data Analyst: Running complex Cypher and Python queries to extract insights

In your own projects, you might ask your AI assistant to “Analyze production logs and identify the cause of performance spikes,” “Migrate the user database schema to a new microservice,” or “Model our product catalog from a set of messy CSVs.”

Summary

The Docker MCP Toolkit bridges the gap between powerful AI coding agents and the specialized tools they need to be truly useful. By providing secure, one-click access to a curated catalog of over 200 MCP servers, it enables AI agents to interact with real infrastructure, including databases, APIs, command-line tools, and more. Whether you’re automating data workflows, querying complex systems, or orchestrating services, the MCP Toolkit equips your assistant to work like a real developer. If you’re building with AI coding assistants and want it to go beyond code generation, it’s time to start integrating with the tools your stack already relies on!

Learn more

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Open Docker Desktop and get started with the MCP Toolkit (Requires version 4.48 or newer to launch the MCP Toolkit automatically)

Read our tutorial on How to Add MCP Servers to Claude Code with Docker MCP Toolkit

Read our tutorial on How to Add MCP Servers to Gemini CLI with Docker MCP Toolkit

Source: https://blog.docker.com/feed/

Debug Docker Builds with Visual Studio Code

Building Docker images is an important component of the software delivery pipeline for modern applications. It’s how we package our apps and services so that they can be distributed to others and deployed to production. While the Dockerfile has long been the standard for defining container images, it is notoriously challenging to change and debug when issues arise. It’s currently a real pain to understand the build-time state during the different stages of the build. What was the value of an ARG? Which files were copied into the image?

Recently, we have been making updates to the Docker Build clients (Buildx) and our VS Code extension (Docker DX) to improve the developer experience when using Docker. Today, we are sharing the next stage of that process with the introduction of Build Debugging in VS Code and Docker Build.

With the new debugging feature in Buildx from Docker, you will be able to reduce the time you spend fixing your Docker builds. In this post, you’ll learn how to configure the Buildx debugger in Visual Studio Code, step through a build and inspect variables and the image’s file system, and open a shell inside the image being built. Finally, you will learn a little about the debugger’s implementation and how it can be integrated into other editors.

Configuring Visual Studio Code

To start debugging Dockerfiles in Visual Studio Code:

Install the latest version of the Docker DX extension.

Update to the latest version of Docker Desktop to ensure you have the latest Docker build tooling.

Run docker buildx version and verify that your Buildx is at least version 0.29.x.

Creating a Launch Configuration

Open up your Dockerfile and open the Run and Debug view in Visual Studio Code. If you do not have any launch configurations, you will see something like the following.

Figure 1: Run and Debug view opened in Visual Studio Code with no launch configurations defined.

Click on the “create a launch.json file” hyperlink. If you already have launch configurations, open your launch.json file by clicking the cog icon in the top right-hand corner of the Run and Debug view.

In your launch.json file, create a new launch configuration for debugging your Docker build. You can use the sample below to get started. For a full description of the various attributes in a launch configuration, see here.

{
  "name": "Docker: Build",
  "type": "dockerfile",
  "request": "launch",
  "dockerfile": "Dockerfile",
  "contextPath": "${workspaceFolder}"
}

Adding a Breakpoint

Now that you have completed setting up your launch configuration, let’s add a breakpoint to our Dockerfile. Place a breakpoint next to one of your RUN instructions by clicking in the editor’s left margin or by pressing F9. A circle should appear to indicate that a breakpoint has been added.

Launching the Debugger

We are now ready to start the debugger. Select the launch configuration you created and then hit F5. The build should pause at the RUN line where you placed the breakpoint.

Figure 2: Docker build suspended by a breakpoint in Visual Studio Code.

Debugging Features

We will now walk you through the three different features that the Buildx Debugger provides.

Inspecting Variables

When a build is in a suspended state, you can look at any variables that may have been defined. In this example, by looking at the executed command’s workdir value on the left-hand side, we can see that the command is not being run in the right folder, since we had copied the contents into /app. We can fix this by adding WORKDIR /app before the RUN line. Also note that we can view variables defined by our image and the base image, as shown by VAR and NODE_VERSION.

Figure 3: Docker build encounters an error and is suspended by the debugger instead of terminating.

File Explorer

In addition to inspecting variables, you can also look at the structure of the file system to see what is already there and what you have copied in. For text files, you can also see their content in the file’s data field.

Figure 4: View the file system of the Docker image being built.

Interactive Debugging

Creating the right Dockerfile is often an iterative process. Part of this is usually because the host system you are developing on shares few similarities with the image you are building. Consider the differences between running Ubuntu locally but trying to build an Alpine Linux image. The small differences in package names create a lot of back and forth between your editor and your browser as you search for the right name. You add a line here, maybe comment out another line somewhere else, then run docker build again and hope for the best.

This iterative process can now be streamlined with the help of the debugger. When your build is in a suspended state, open the Debug Console view and then place your cursor in the input field at the bottom. Type in exec and then hit the enter key. The Terminal view should now open with a shell that is attached to the image that is being built.

Figure 5: Use the Debug Console to open a shell into the Docker image being built by running exec.

Figure 6: The Docker image that is being built can now be accessed and inspected with a terminal.

This feature is a game changer, as you can now easily open the image being built at any given step, inspect its contents, and run commands for testing. Previously, we would have to comment out everything after the buggy line, build the Docker image, and then manually run and open a shell into the image. All of that is now condensed into adding a breakpoint in your editor and starting a debug session! Keep in mind that none of the changes you make in the terminal are persisted, so this is purely for experimentation. In the figure below, we can see that a file was created while the debugger was paused at line 3. When the debugger advanced to line 4, the file disappeared.

Figure 7: Changes to the Docker image inside the exec terminal will be reset when the debugger steps to another line.

Integrations powered by an Open Specification

Just like our work with the Docker Language Server that implements the Language Server Protocol, the Buildx debugger is built on open standards: it implements the Debug Adapter Protocol, which means you can debug Dockerfile builds with any editor that supports the protocol. Besides Visual Studio Code, we also provide an official plugin for Neovim. For the JetBrains users out there, we have verified that it integrates well with the LSP4IJ plugin. If your favourite editor supports the Debug Adapter Protocol, there should be a way for the Buildx debugger to integrate with it.

Thank You

We want to take this opportunity to thank Kohei Tokunaga (ktock) for his ideas and initial work around this feature. His contributions to Buildx gave us a great foundation to build on and complete this feature. This release would not have been possible without his help. Thank you, Kohei!

Next Steps

Download the Docker DX extension and try out the new debugging feature.

Share feedback and issues with us in our GitHub repositories for Docker DX and Buildx.

You can also submit feedback through the Docker feedback page.

Learn More

Setup the Buildx debugger in Neovim with nvim-dap-docker.

Setup the Buildx debugger in a JetBrains editor with the LSP4IJ plugin.

Read the Buildx documentation about our implementation of DAP and configuring launches.

Source: https://blog.docker.com/feed/

How to add MCP Servers to Gemini CLI with Docker MCP Toolkit

In the rapidly evolving landscape of AI-assisted development, most developers continue to struggle with clunky web interfaces, resource-intensive IDEs, and fragmented toolchains. But what if we told you there’s a combination that pairs Google’s Gemini CLI (76.3K GitHub stars in just 5 months) with Docker’s innovative MCP Toolkit, quietly revolutionizing how modern AI developers work?

Enter the powerhouse duo: Gemini CLI and Docker MCP Toolkit. This isn’t just another tool combination — it’s a paradigm shift that delivers developer AI assistance with zero complexity overhead. A one-time setup of just 5 minutes can save you 20 minutes per test scenario — that’s a 97% time reduction that compounds with every test you run.

Let’s look at a concrete example: browser testing and performance analysis that developers do regularly can be simplified and completely automated. What used to require opening browsers, clicking through flows, analyzing DevTools, taking screenshots, and manually documenting bugs can now happen in a single 30-second conversation.

In this guide, you’ll learn how to:

Set up Gemini CLI and connect it to Docker MCP Toolkit

Configure the Playwright MCP server for browser automation

Configure the GitHub MCP server for issue creation

Configure the Filesystem MCP server for saving test artifacts

Automate browser testing that discovers real bugs and creates documented GitHub issues

See how Gemini CLI can analyze performance, capture screenshots, and report findings—all without leaving your terminal

With 220+ pre-built MCP servers, browser automation becomes as simple as having a conversation. No Selenium WebDriver configuration, no CI/CD pipeline complexity, no manual screenshot management — just natural language instructions that execute real browser tests.

Why Gemini CLI and Docker MCP Toolkit Work Better Together

Manual browser testing and performance analysis are broken. You open Chrome DevTools, click through pages, check network requests, analyze performance metrics, take screenshots, write bug reports, and then manually create GitHub issues. This context-switching nightmare wastes hours every sprint.

Traditional automation tools don’t solve the real problem. Selenium requires brittle selectors and a complex setup. Playwright needs JavaScript knowledge and test frameworks. Both require maintaining test scripts that break with every UI change. The “solution” often takes more time than manual testing.

While Gemini provides powerful AI capabilities and MCP provides the protocol, Docker MCP Toolkit makes browser automation practical. Without containerization, setting up browser testing means managing Chrome/Firefox installations, dealing with WebDriver versions, configuring Node.js dependencies, handling screenshot directories manually, and different configurations for every developer’s machine. The setup that should take 2 minutes takes 2-6 hours per developer.

Docker MCP Toolkit eliminates this friction:

220+ pre-built MCP servers in the catalog

One-click deployment through Docker Desktop

Playwright MCP with browsers pre-installed (Chrome, Firefox, WebKit)

GitHub MCP for automated issue creation

Filesystem MCP for artifact storage

Secure credential management via OAuth or encrypted storage

Consistent configuration across Mac, Windows, and Linux

Automatic updates when new server versions are released

We built Docker MCP Toolkit to meet developers where they are. If you’re using Gemini CLI, you should be able to automate browser testing without wrestling with infrastructure.

Your tests run securely on your machine. Everything executes in isolated Docker containers on your local system. Test data, screenshots, and application access never leave your computer. No cloud uploads, no third-party services, no compliance concerns. You get enterprise-grade browser automation with complete privacy.

Setting up Gemini CLI in Docker MCP Toolkit

Prerequisites

Install Docker Desktop 4.40 or later

Enable MCP Toolkit 

Step 1. Install Gemini CLI

Install via npm:

npm install -g @google/gemini-cli

Step 2. Launch and authenticate

Once installed, just type gemini in your terminal window.

gemini

Step 3. Log in via Google

Follow the setup wizard:

Select a preferred theme style from the options.

Choose a login method. I recommend “Login with Google”, which allows up to 60 requests/minute and 1,000 requests/day for free

If you need higher rate limits or enterprise access, I suggest using an API key from Google AI Studio. You can easily set it as an environment variable:

export GEMINI_API_KEY="YOUR_API_KEY"

After you select your sign-in method, a browser window will open. Simply log in with your Google account.

Step 4. Start chatting with Gemini

Just type “gemini” in your terminal window to start chatting with Gemini and enter your prompt.

Connect Gemini CLI to Docker MCP Toolkit

Option 1: One-Click Connection (Recommended)

Open Docker Desktop

Navigate to MCP Toolkit in the sidebar

Click the Clients tab

Find “Gemini” in the list.

Click Connect

Docker Desktop automatically configures the MCP Gateway connection – the underlying infrastructure that routes requests between Gemini CLI and your MCP servers, handling authentication, containerization, and secure communication seamlessly.

Option 2: Manual Command Line Setup

If you prefer a command-line setup or need to configure a specific project:

Navigate to your project folder in the terminal

Run this command:

docker mcp client connect gemini --global

You’ll see output like this:

=== System-wide MCP Configurations ===
● gemini: connected
MCP_DOCKER: Docker MCP Catalog (gateway server) (stdio)
● gordon: connected
MCP_DOCKER: Docker MCP Catalog (gateway server) (stdio)

You might have to restart 'gemini'.

The connected status confirms Gemini CLI is linked to the Docker MCP Gateway.

What’s happening under the hood?

The Gemini CLI uses the mcpServers configuration in your settings.json file to locate and connect to MCP servers. This configuration supports multiple servers with different transport mechanisms. The mcpServers object is where you define each MCP server you want the CLI to connect to.

Whenever you hit the “Connect” button under the Docker MCP Client for Gemini CLI, it adds the following Docker MCP Gateway configuration to the ~/.gemini/settings.json file.

{
  "theme": "Default",
  "selectedAuthType": "oauth-personal",
  "mcpServers": {
    "MCP_DOCKER": {
      "command": "docker",
      "args": ["mcp", "gateway", "run"],
      "env": {}
    }
  }
}
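
If Gemini CLI ever reports the server as unavailable, you can sanity-check that the command this configuration points at actually starts on your machine by running it directly. It speaks MCP over stdio, so it will simply wait for a client; exit with Ctrl+C once it starts without errors:

docker mcp gateway run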

Learn more about MCP and Gemini CLI interaction through this link.

Step 5. Restart Gemini CLI

# Exit Gemini CLI if running, then restart
gemini

Step 6. Verify the Connection

Inside Gemini CLI, type /mcp to see available MCP servers.

You should see the Docker MCP Gateway listed, which provides access to all enabled MCP servers. The /MCP_DOCKER tools indicate a successful connection. As you enable more MCP servers in Docker Desktop, they’ll appear here automatically.

First Run: What to Expect

When you start Gemini CLI for the first time after connecting to Docker MCP Toolkit, you’ll see a prompt about the new MCP server:

Choose Option 1 (recommended). This configures your project to automatically use Docker MCP Toolkit and any MCP servers you enable in Docker Desktop. You won’t need to approve MCP servers individually each time.

You’re now ready to use Gemini with MCP servers from Docker Desktop.

Real-World Demo: Automated Browser Testing and Performance Analysis

Now that you’ve connected Gemini CLI to Docker MCP Toolkit, let’s see it in action with a practical example. We’ll automatically discover real bugs through browser testing and identify performance bottlenecks through detailed analysis — the kind that would take 20 minutes of manual testing, DevTools monitoring, and performance profiling.

What Makes This Realistic?

This isn’t a trivial “Hello World” demo. We’re performing comprehensive browser testing and performance analysis on a real e-commerce application with the kinds of issues you encounter in production:

Uses actual application running on localhost

Executes functional browser tests (navigation, element inspection, console monitoring)

Discovers genuine performance bottlenecks through browser DevTools analysis

Identifies accessibility violations that affect real users

Captures evidence with screenshots and console logs

Measures real performance metrics: page load times, network requests, resource usage

Creates properly formatted GitHub issues with actionable recommendations

Time investment:

Manual process: ~20 minutes (opening browsers, clicking through flows, DevTools analysis, performance profiling, documentation, issue creation)

Automated with Gemini CLI + MCP: ~30 seconds total

That’s a 97% time reduction, but more importantly, it’s consistent, thorough, and documented every time.

What We’re Testing

The catalog-service-node application is a realistic e-commerce catalog with intentional issues that mirror common production problems:

Performance Issues:

No pagination – loads all 15 products at once (will degrade with scale)

Duplicate API calls – requests /api/products twice unnecessarily

Missing optimization – unoptimized loading patterns

Accessibility Issues:

Missing product images – placeholder buttons instead of actual images

Vague button labels – “Fetch” and “Upload” aren’t descriptive for screen readers

Missing ARIA labels – table structure not properly announced

Browser Issues:

Missing favicon – generates 404 errors in console

Console warnings – duplicate request warnings

Let’s see if Gemini CLI can discover all of these automatically through intelligent browser testing and performance analysis, then create a comprehensive GitHub issue.

Step 1: Set Up a Real-World e-Commerce Catalog application

For this demo, we’ll use a real e-commerce catalog application. This gives us realistic performance and accessibility issues to discover.

Clone the repository:

git clone https://github.com/ajeetraina/catalog-service-node
cd catalog-service-node

Start all services:

# Start Docker services (database, S3, Kafka)
docker compose up -d

# Install dependencies
npm install --omit=optional

# Start the application
npm run dev

Verify it’s running:

Frontend: http://localhost:5173

API: http://localhost:3000
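
Before moving on, a quick sanity check from the terminal can confirm both pieces are up. This is a small sketch, assuming the /api/products route used by the seed script in the next step and that jq is installed:

# Frontend should answer with HTTP 200.
curl -s -o /dev/null -w "frontend: %{http_code}\n" http://localhost:5173
# API should return a JSON array of products (0 until you seed data).
curl -s http://localhost:3000/api/products | jq length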

Step 2: Seed Test Data

To make testing realistic, create sample products:

# Create seed script
cat > seed-data.sh << 'EOF'
#!/bin/bash
API_URL="http://localhost:3000/api"

echo "Seeding test products…"

curl -s -X POST "$API_URL/products" \
  -H "Content-Type: application/json" \
  -d '{"name":"Vintage Camera","description":"Classic 35mm film camera","price":299.99,"upc":"CAM001"}' \
  > /dev/null && echo "✅ Vintage Camera"

curl -s -X POST "$API_URL/products" \
  -H "Content-Type: application/json" \
  -d '{"name":"Rare Vinyl Record – LAST ONE!","description":"Limited edition. Only 1 left!","price":149.99,"upc":"VINYL001"}' \
  > /dev/null && echo "✅ Rare Vinyl Record"

curl -s -X POST "$API_URL/products" \
  -H "Content-Type: application/json" \
  -d '{"name":"Professional DSLR Camera","description":"50MP camera with 8K video","price":2499.99,"upc":"CAMPRO001"}' \
  > /dev/null && echo "✅ Professional DSLR"

# Add bulk test products
for i in {4..15}; do
  curl -s -X POST "$API_URL/products" \
    -H "Content-Type: application/json" \
    -d "{\"name\":\"Test Product $i\",\"description\":\"Bulk test product $i\",\"price\":$((50 + RANDOM % 450)).99,\"upc\":\"BULK$(printf '%03d' $i)\"}" \
    > /dev/null && echo "✅ Test Product $i"
done

echo ""
TOTAL=$(curl -s "$API_URL/products" | jq '. | length')
echo "Total products: $TOTAL"
echo "Ready! Visit http://localhost:5173"
EOF

chmod +x seed-data.sh
./seed-data.sh

Expected output:

Seeding test products…
✅ Vintage Camera
✅ Rare Vinyl Record
✅ Professional DSLR
✅ Test Product 4
✅ Test Product 5

✅ Test Product 15

Total products: 15
Ready! Visit http://localhost:5173

Now you have a realistic environment with 15 products to analyze.

Configure MCP Servers

For browser testing and performance analysis automation, you’ll orchestrate three MCP servers:

Playwright MCP – Controls browsers, takes screenshots, captures console logs

GitHub MCP – Creates issues automatically with full context

Filesystem MCP – Saves screenshots and test artifacts

Let’s configure each one.

Configure Playwright MCP (Browser Automation)

The Playwright MCP server gives Gemini the ability to control real browsers, Chrome, Firefox, and WebKit, just like a human would.

In Docker Desktop:

Open Docker Desktop → MCP Toolkit → Catalog

Search for “Playwright” or “Browser”

Find Playwright (Browser Automation) in the results

Click + Add

The server will be added with default configuration (no additional setup needed)

Click Start Server

What you get:

21+ browser automation tools including:

browser_navigate – Navigate to URLs

browser_snapshot – Capture page state for analysis

browser_take_screenshot – Save visual evidence

browser_click, browser_type – Interact with elements

browser_console_messages – Get console errors

browser_network_requests – Analyze HTTP requests

The Playwright MCP runs in a secure Docker container with browsers pre-installed. No manual ChromeDriver setup, no WebDriver conflicts, no OS-specific browser installations.

Configure GitHub MCP (Issue Creation)

The GitHub MCP enables Gemini to create issues, PRs, and manage repositories on your behalf.

Option 1: OAuth Authentication (Recommended – Easiest)

In MCP Toolkit → Catalog, search “GitHub Official”

Click + Add

Go to the OAuth tab in Docker Desktop

Find the GitHub entry

Click “Authorize”

Your browser opens GitHub’s authorization page

Click “Authorize Docker” on GitHub

You’re redirected back to Docker Desktop

Return to Catalog tab, find GitHub Official

Click Start Server

Advantage: No manual token creation. Authorization happens through GitHub’s secure OAuth flow with automatic token refresh.

Option 2: Personal Access Token (For Granular Control)

If you prefer manual control or need specific scopes:

Step 1: Create GitHub Personal Access Token

Go to https://github.com  and sign in

Click your profile picture → Settings

Scroll to “Developer settings” in the left sidebar

Click “Personal access tokens” → “Tokens (classic)”

Click “Generate new token” → “Generate new token (classic)”

Name it: “Docker MCP Browser Testing”

Select scopes:

repo (Full control of repositories)

workflow (Update GitHub Actions workflows)

Click “Generate token”

Copy the token immediately (you won’t see it again!)

Step 2: Configure in Docker Desktop

In MCP Toolkit → Catalog, find GitHub Official

Click + Add (if not already added)

Go to Configuration tab

Select “Personal Access Token” as the authentication method

Paste your token

Click Start Server

Or via CLI:

docker mcp secret set GITHUB.PERSONAL_ACCESS_TOKEN=github_pat_YOUR_TOKEN_HERE
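
If you’d rather keep the token out of your shell history, here is a small sketch that reads it into a variable first and reuses the same secret name as above:

# Prompt for the token without echoing it, then store it as the MCP secret.
read -r -s -p "GitHub PAT: " GITHUB_TOKEN && echo
docker mcp secret set GITHUB.PERSONAL_ACCESS_TOKEN="$GITHUB_TOKEN"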

Configure Filesystem MCP (Screenshot Storage)

The Filesystem MCP allows Gemini to save screenshots and test artifacts to your local machine.

In Docker Desktop:

Go to MCP Toolkit → Catalog

Search for “Filesystem”

Find Filesystem (Reference) and click + Add

Go to the Configuration tab

Under filesystem.paths, add your project directory:

Example: /Users/yourname/catalog-service-node

Or wherever you cloned the repository

You can add multiple paths by clicking the + button

Click Save

Click Start Server

Important Security Note: Only grant access to directories you’re comfortable with Gemini reading and writing to. The Filesystem MCP is scoped to these specific paths for your protection.

Run the Automation

Change directory to the root of the catalog-service-node project repository and open Gemini CLI:

gemini

Paste this prompt:

Navigate to http://host.docker.internal:5173 and perform a detailed
performance and accessibility analysis:

1. Take a screenshot of the full page
2. Count how many products are displayed
3. Open browser DevTools and check:
   - Console for any JavaScript errors
   - Network tab: how many HTTP requests are made?
   - Performance: how long does the page take to load?
4. Identify performance issues:
   - Are all products loading at once with no pagination?
   - Are images optimized?
   - Any unnecessary network requests?
5. Check for accessibility issues:
   - Missing alt text on images
   - Color contrast problems
   - Vague button labels

Create a GitHub issue titled "Product catalog performance and
accessibility issues" with:
- Screenshots attached
- List of specific problems found
- Severity: Medium
- Labels: performance, accessibility, enhancement

Watch Gemini Work

Here’s what happens behind the scenes as Gemini orchestrates multiple MCP servers:

Step 1: Browser Navigation

The web server is running. I'll navigate to http://host.docker.internal:5173, take a screenshot, and then proceed with the rest of the performance analysis.

Gemini launches a real Chromium browser in the Docker container and navigates to your application.

Step 2: Console Analysis

✦ I can see 15 products listed on the page: * Vintage Camera * Rare Vinyl Record – LAST ONE! * Professional DSLR Camera * Test Product 4-15

Step 3: Performance Check

✦ Checking browser console and network requests… Opening DevTools to analyze performance metrics…

Step 4: Issue Creation

✦ Creating GitHub issue with findings…

create_issue (MCP_DOCKER MCP Server) {"labels":["performance","accessibility","enhancement"],"body":"### Summary…"}

✦ I have successfully created the issue. You can find it at: https://github.com/ajeetraina/catalog-service-node/issues/1

The Complete GitHub Issue

Gemini automatically created this comprehensive issue with structured information:

Workflow Comparison: Before vs After MCP

| Step | Before MCP (Manual Testing) | Time | After MCP (Gemini CLI + MCP) | Time |
|------|-----------------------------|--------|------------------------------|--------|
| 1 | Open browser manually | 1 min | Paste prompt in Gemini CLI | 5 sec |
| 2 | Navigate to application | 30 sec | Gemini analyzes automatically | 25 sec |
| 3 | Click through pages | 3 min | GitHub issue created | auto |
| 4 | Open DevTools manually | 30 sec | Screenshots attached | auto |
| 5 | Take screenshots | 2 min | ✓ Complete | |
| 6 | Check browser console | 1 min | | |
| 7 | Analyze network requests | 2 min | | |
| 8 | Document findings | 3 min | | |
| 9 | Write detailed bug report | 5 min | | |
| 10 | Create GitHub issue | 2 min | | |
| Summary | Total | ~ 20 minutes per test | | 30 sec per test |

Time saved per test: 19.5 minutes (97% faster!)

Impact over time:

Per day (5 tests): 97 minutes saved → 1.6 hours

Per week (25 tests): 8 hours saved → 1 full workday

Per sprint (50 tests): 16 hours saved → 2 full workdays

Per year (1,000 tests): 325 hours saved → 40 workdays

Wrapping Up

You’ve just witnessed how Docker MCP Toolkit transforms Gemini CLI from a chat assistant into a complete browser testing and performance analysis platform. What used to require opening browsers, clicking through flows, analyzing DevTools, documenting bugs, and creating issues manually now happens in one 30-second conversation.

The combination of Gemini CLI and Docker MCP Toolkit represents a paradigm shift in AI-assisted development. By leveraging terminal-native tools and containerized services, you get:

Unmatched flexibility in tool selection

Superior performance with minimal overhead

Future-proof architecture that scales with your needs

This setup isn’t just about convenience — it’s about building a development environment that adapts to your workflow rather than forcing you to adapt to it. The developer productivity revolution is here. The question isn’t whether you’ll adopt AI-assisted development — it’s whether you’ll lead with the best tools available or play catch-up later.

Ready to try it? Enable Docker MCP Toolkit in Docker Desktop and start building your own Gemini-powered development workflow today.

Learn more

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Open Docker Desktop and get started with the MCP Toolkit (Requires version 4.48 or newer to launch the MCP Toolkit automatically)

Explore our guide on adding MCP Servers to Claude Code with the Docker MCP Toolkit

Check out our MCP Horror Stories series to see common MCP security pitfalls and how you can avoid them.

Source: https://blog.docker.com/feed/

Docker Hardened Images: crafted by humans, protected by AI

At Docker, we are building our hardened images to exacting standards. That means carefully crafting by hand, because humans are still the best security architects. That said, we understand the value of AI and deploy it as an extra set of eyes at critical junctures in our Docker Hardened Image (DHI) build process. With this dual approach, our users get better coverage and more secure products. Humans aren’t perfect, but AI can help them become better. Here’s a recent example of how the AI tools we use for DHI caught a bug, made a product more secure, and contributed back to the community.

How the AI guardrail stepped in

Our upstream release automation opened a routine version bump for nginx-exporter. As part of our release process, the DHI AI guardrail fetched the upstream diffs that the bump would pull in and scanned them with language-aware checks. It spotted a classic logic inversion in the exporter’s new proxy-protocol path and immediately blocked our PR from auto-merging. I reproduced the issue, confirmed the diagnosis, and submitted a small fix upstream. Because the guardrail held the line, customers never saw the bug.

That sequence matters. A normal dependency update would have sailed through and shipped a regression. Instead, Docker’s AI guardrail acted like a release bouncer at the door of a bar. The AI guardrail looked at what was about to enter, recognized a risk pattern, and stopped it from coming in and causing trouble. A human Docker engineer then read the flagged issue, proved the failure and sent the fix. The fix was then accepted by the project, improving their code base. So, this fix not only improved DHI but also improved the project for everyone. 

While the fix pull request has been reviewed and approved by the upstream maintainers, our DHI build pipeline applied the patch and shipped the new, patched version to our customers, making sure that customers never saw the regression in their environments.

The AI assisted DHI process

Ironically, standard AI coding assistants didn’t spot the flaw. This is why having our own internal AI guardrails is so critical. They provide the extra layer of support and the specialization that is hard to get from general-purpose coding assistants.

This is how we want AI to show up in our pipeline. It is not a replacement for engineering judgment, but a force multiplier for safety. The guardrail focuses on a narrow set of high-leverage issues that can cause big problems such as inverted error checks, ignored failures, and resource mishandling or suspicious contributor activity. 

The example also shows the value of layered safeguards. We don’t just accept an upstream version and hope for the best. The AI guardrail scrutinizes what changed. Our policy treats high-confidence findings as a hard stop. Humans then verify, reproduce the behavior, and apply the smallest correct patch. Only then does the release move forward. The best security is proactive, not reactive. 

As I alluded to above, there’s an important broader open source benefit to our AI guardrails. DHI depends on hundreds of community projects, some of which are downloaded millions or even billions of times per month. When the DHI AI guardrail surfaces an issue, our default is to fix it upstream rather than carry a private patch. That keeps our images clean, reduces long-term maintenance, and gives every downstream user a better baseline. It also helps the upstream projects, which benefit from our AI anomaly detection, our human judgment, and our subsequent code fixes.

So back to the title. DHI is crafted by engineers, but it is protected by AI. Our guardrail is an active and continuously improving AI that reads upstream diffs with context, recognizes risky patterns, scores confidence, and blocks merges that don’t pass muster. Well-applied AI helps the human work better with faster reviews, tighter patches, and fewer regressions.

This is the partnership we’re optimizing for. Humans set intent, exercise design judgment, and ship features. AI enforces discipline at critical gates. Best of all, every catch feeds the model signals for future scans, so protections improve as our ecosystem evolves. Collectively, this adds to the security of the entire open source ecosystem. 

It’s a win for Docker, for our customers, and the community.
Source: https://blog.docker.com/feed/

Join Us in Rebooting the Docker Model Runner Community!

We’re thrilled to announce that we’re breathing new life into the Docker Model Runner community, and we want you to be a part of it! Our goal is to make it easier than ever for you to contribute, collaborate, and help shape the future of running AI models with Docker.

From a Limited Beta to a Universe of Possibilities

When we first announced Docker Model Runner, it was in its beta phase, exclusively available on Docker Desktop and limited to Apple and Nvidia hardware. We received a ton of valuable feedback, and we’ve been hard at work making it more accessible and powerful.

Today, we’re proud to say that Docker Model Runner is now Generally Available (GA) and can be used in all versions of Docker! But that’s not all. We’ve added Vulkan support, which means you can now run your models on virtually any GPU. This is a huge leap forward, and it’s all thanks to the incredible potential we see in this project and the community that surrounds it.

Making Contributions a Breeze

We’ve listened to your feedback about the contribution process, and we’ve made some significant changes to make it as smooth as possible.

To start, we’ve consolidated all the repositories into a single, unified home. This makes it much easier to find everything you need in one place.

We have also invested a lot of effort in updating our documentation for contributors. Whether you’re a seasoned open-source veteran or a first-time contributor, you’ll find the information you need to get started.

Your Mission, Should You Choose to Accept It

The success of Docker Model Runner depends on you, our amazing community. We’re calling on you to help us make this project the best it can be. Here’s how you can get involved:

Star our repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Fork and Contribute: Have an idea for a new feature or a bug fix? Fork the repository, make your changes, and submit a pull request. We’re excited to see what you come up with!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn more

Check out the Docker Model Runner General Availability announcement

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Get started with Docker Model Runner with a simple hello GenAI application

Source: https://blog.docker.com/feed/

Build a Multi-Agent System in 5 Minutes with cagent

Models are advancing quickly. GPT-5, Claude Sonnet, Gemini. Each release gives us more capabilities. But most real work isn’t solved by a single model.

Developers are realizing they need a system of agents: different types of agents working together to accomplish more complex tasks. For example, a researcher to find information, a writer to summarize, a planner to coordinate, and a reviewer to check accuracy.

The challenge is that today, building a multi-agent system is harder than it should be. Context doesn’t flow cleanly between agents. Tools require custom integration. Sharing with a teammate means sending instructions and hoping they can re-create your setup.

That’s the problem cagent solves.

In this blog, we’ll walk you through the basics, how to create a multi-agent AI system in minutes, and how cagent makes this possible. 

What’s a multi-agent system?

A multi-agent system is a coordinated group of AI agents that collaborate to complete complex tasks. Using cagent, you can build and run these systems declaratively, no complex wiring or reconfiguration needed.

Meet cagent: The best (and open source) way to build multi-agent systems

Figure 1: cagent workflow for multi-agent orchestration. 

cagent is an open-source tool for building agents and a part of Docker’s growing ecosystem of AI tools. 

Instead of writing glue code to wire up models, tools, and workflows, you describe an agent (or a team of agents) in a single YAML file:

Which model the agent uses (OpenAI, Anthropic, Gemini, or a local one)

What its role or instructions are

Which tools it can use (like GitHub, search, or the filesystem)

And, if needed, which sub-agents it delegates to

This turns agents into portable, reproducible artifacts you can run anywhere and share with anyone. 

Multi-agent challenges that cagent is solving

Create, run, and share multi-agent AI systems more easily with cagent.

Orchestrate agents (and sub-agents) more easily – Define roles and delegation (sub-agents). cagent manages calls and context.

Let agents use tools with guardrails – Grant capabilities with MCP: search, GitHub, files, databases. Each agent gets only the tools you list and is auditable.

Use (and swap) models – OpenAI, Anthropic, Gemini, or local models through Docker Model Runner. Swap providers without rewriting your system.

Treat agents like artifacts – Package, version, and share agents like containers.

How to build a multi-agent system with Docker cagent

Here’s what that looks like in practice.

Step 1: Define your multi-agent system

version: "2"

agents:
  root:
    model: anthropic/claude-sonnet-4-0
    instruction: |
      Break down a user request.
      Ask the researcher to gather facts, then pass them to the writer.
    sub_agents: ["researcher", "writer"]

  researcher:
    model: openai/gpt-5-mini
    description: Agent to research and gather information.
    instruction: Collect sources and return bullet points with links.
    toolsets:
      - type: mcp
        ref: docker:duckduckgo

  writer:
    model: dmr/ai/qwen3
    description: Agent to summarize notes.
    instruction: Write a concise, clear summary from the researcher's notes.

Step 2: Run the YAML file

cagent run team.yaml

The coordinator delegates, the researcher gathers, and the writer drafts. You now have a functioning team of agents.

Step 3: Share it on Docker Hub

cagent push ./team.yaml org/research-writer

Now, anyone on your team can run the exact same setup with:

cagent run docker.io/org/research-writer

That’s a full multi-agent workflow, built and shared in under 5 minutes.

First principles: Why cagent works

These principles keep cagent an easy-to-use and customizable multi-agent runtime to orchestrate AI agents.

Declarative > imperative. Multi-agent systems are mostly wiring: roles, tools, and topology. YAML keeps that wiring declarative, making it easy to define, read, and review.

Agents as artifacts. Agents become portable artifacts you can pull, pin, and trust.

Small surface area. A thin runtime that does one job well: coordinate agents.

What developers are building with cagent

Developers are already exploring different multi-agent use cases with cagent. Here are some examples:

1. PR and issue triaging

Collector reads PRs/issues, labels, failing checks

Writer drafts comments or changelogs

Coordinator enforces rules, routes edge cases

2. Research summarizing

Researcher finds and cites sources

Writer produces a clean summary

Reviewer checks for hallucinations and tone

3. Knowledge routing

Router classifies requests

KB agent queries internal docs

Redactor strips PII before escalation

Each one starts the same way: a YAML file and an idea. And they can be pushed to a registry and run by anyone.

Get started

cagent gives you the fastest path forward to build multi-agent systems. It’s open-source, easy to use, and built for the way developers already work. Define your agents, run them locally, and share them, all in a few lines of YAML.

YAML in, agents out.

Run the following to get started:

brew install cagent
cagent new
cagent run agent.yaml

Learn more

Get the technical details from our cagent documentation. 

We’d love to hear what you think. Join us in the Docker Community Slack. 

Dive into more topics about AI and Docker. 

Subscribe to the Docker Navigator Newsletter.

Source: https://blog.docker.com/feed/

100% Transparency and Five Pillars

How to Do Hardened Images (and Container Security) Right

Container security is understandably a hot topic these days, with more and more workloads running atop this mainstay of the cloud native landscape. While I might be biased because I work at Docker, it is safe to say that containers are the dominant form factor for running applications today. Equally important, the next generation of applications focused on AI are already running on containers. Because the world runs on containers, getting container security right is of paramount importance.

I am sad to say that most organizations who claim to be delivering container security are not. Particularly troubling are the growing ranks of hardened image providers who claim to be providing highly secure containers but are missing important components of what makes a container secure. Granted, we have a strong opinion on container security. We run the world’s largest repository and infrastructure for container hosting and management. And to be clear, our company’s future fate depends on the continued perception that containers are secure. So we have real skin in this game. 

The Essential Elements of Container Security

All of this being said, as the lead security engineer at Docker, and someone with a very long history with containers, I want to lay down our vision for container security. That vision is actually uncomplicated. There are five essential ingredients of maximum container security and hardened images. Those ingredients are:

Minimal Attack Surface: A proper hardened image includes only absolutely necessary software in the container. This means stripping out the majority of libraries, agents, and modules that may deliver useful functionality but are put into software distributions by default and add both complexity and CVE exposure. Our hardening process on average eliminates over 98% of the CVE exposure of a container.

A 100% Complete Software Bill of Materials: This is the baseline and must be 100% complete (as per CISA guidance) with no minimum depth. It provides an accurate inventory including direct dependencies, transitive dependencies, and explicit relationships. SBOMs must be fully verifiable back to source, through open standards like SPDX or CycloneDX, standard component identifiers like PURLs, and honest gap disclosure.

Verifiable Build Provenance establishes chain of custody from source code to deployed artifact. SLSA Build Level 3 provenance provides non-falsifiable attestations about what was built, where, and by what process. If you don’t know how or where it was built and who built it, you can’t be sure it’s not tainted.

Standardized Exploitability Assessment clarifies which vulnerabilities affect specific deployment contexts. OpenVEX provides machine-readable statements about vulnerability status, enabling auditors and compliance tools to process assessments independently and properly leverage SBOMs. VEX statement transparency and interoperability make container security viable and allow teams to focus only on real risks.

Cryptographic Verification proves authenticity and integrity. Modern approaches like Sigstore and Cosign enable signing with public verification, allowing anyone to verify signatures without proprietary infrastructure. The signature and provenance chain must be transparent and easy to produce or query (see the sketch after this list).

100% Transparency to Bind These Pillars Together. All of the above five elements must be transparent, not just in what they produce but in how they produce attestations, evidence, and any data or statements. This means using public sources for vulnerability intelligence (National Vulnerability Database or NVD, distribution security feeds, language ecosystem advisories, GitHub Security Advisories) with visible synchronization cadence. When CVEs listed in the KEV (Known Exploited Vulnerabilities) catalog  appear, transparency ensures alignment without negotiation. This means making the CVE selection process and criteria public and allowing users to see the process. This means making the SBOM creation process transparent so users can understand how the manifests are built. Ultimately, radical transparency transforms security from a trust exercise into a verification process where you can prove your posture, auditors can validate your evidence, and customers can independently assess your claims.
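
To make the verification point concrete, here is a minimal sketch of what public signature verification looks like in practice with Cosign; the image reference and key file are placeholders, and keyless setups would pass identity and issuer flags instead of a key:

# Verify an image signature against a published public key (image and key are placeholders).
cosign verify --key cosign.pub registry.example.com/org/hardened-image:1.2.3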

Of course, container security also extends to the container runtime, which must execute containers to the highest security standards, as well as to continuous observability and enforcement of organizational policies across the entire container lifecycle. I’ll cover Docker’s activities in this area in a later post.

Why You Need to Verify All Vendor Claims on “Hardened Images”

For enterprises looking to better secure containers, I want to be very, very clear. Any “hardened” container image that cannot meet these requirements is a lie. Unfortunately, a number of hardened image vendors cannot meet these requirements. Here are some of the problems we have seen with competitors’ hardened images that our users and customers have brought us for comparison:

SBOMs that don’t pass the sniff test: A Node server with no npm packages is an oxymoron. Yet, that’s what we saw. Did they rewrite Node.js to remove any need for npm? I don’t think so. This means they left key elements out of their SBOMs.

SBOMs missing transitive dependencies: CISA guidance is clear. Every SBOM must contain 100% of all dependencies. Leaving them out may be convenient because it hand-waves away the problem of securing those dependencies. But it’s not right.

Proprietary and opaque CVE designation: A vendor doesn’t get to decide whether a CVE is relevant and what its severity level is. That’s what public, transparent CVE feeds are for. Any vendor that won’t reveal their exact methodology for CVE assessment and provide it on demand is hiding something.

Incomplete SLSA Build claims: SLSA Build Level 3 is a binary. You either are or you are not meeting the requirements. Calling a build “transitional” is the same as checking the “no” box.

Why We’re Flipping the Table (and Resetting Expectations) on Container Security

It’s not news to say that supply chain attacks on the open source ecosystem are out of control. The smartest Black Hat minds in the world at the most advanced APTs are laser-focused on compromising supply chains because these are among the best ways to compromise entire ecosystems. Supply chain attacks can expose a huge swath of organizations to critical breaches leading to data exfiltration, ransomware and extortion, and espionage. Because we sit at a central position in the container ecosystem, we are also exposed any time the container supply chain is compromised. 

That’s why I’m writing this post. Docker has designed our hardened images explicitly to deliver on all five of the core pillars while also providing 100% transparency into process, inputs and outputs. I want to make it very easy for any platform, team, security team, CISO, or even CEO or business leader to be able to ask the right questions to determine whether their container security posture is valid, and whether the hardened images they are buying are actually delivering on their promise. (As a side note, container security is so important that we also think hardened images should be affordable to all. That’s why we’re now offering them at extremely reasonable prices, making them accessible even to two-person startups.) 

Container security is not hard. Container security is not rocket science. Container security is about radical transparency, honesty, and doing right for your users. In a perfect world, everyone would be doing container security the right way, and every organization would have easy access to rock-solid containers that are properly hardened by default and completely transparent. 

In this perfect world, Docker as a company is better off, the users are better off, the enterprises are better off, and the world is better off. Frankly, our competitors are also better off and their products are better. That’s a good thing. This is more than a sales pitch or an engineering rant. I guess you can call it a mission. Making the technology world safer is of fundamental importance and that’s the outcome we seek.
Source: https://blog.docker.com/feed/

Docker Model Runner on the new NVIDIA DGX Spark: a new paradigm for developing AI locally

We’re thrilled to bring NVIDIA DGX™ Spark support to Docker Model Runner. The new NVIDIA DGX Spark delivers incredible performance, and Docker Model Runner makes it accessible. With Model Runner, you can easily run and iterate on larger models right on your local machine, using the same intuitive Docker experience you already trust.

In this post, we’ll show how DGX Spark and Docker Model Runner work together to make local model development faster and simpler, covering the unboxing experience, how to set up Model Runner, and how to use it in real-world developer workflows.

What is NVIDIA DGX Spark

NVIDIA DGX Spark is the newest member of the DGX family: a compact, workstation-class AI system, powered by the Grace Blackwell GB10 Superchip, that delivers incredible performance for local model development. Designed for researchers and developers, it makes prototyping, fine-tuning, and serving large models fast and effortless, all without relying on the cloud.

Here at Docker, we were fortunate to get a preproduction version of DGX Spark. And yes, it’s every bit as impressive in person as it looks in NVIDIA’s launch materials.

Why Run Local AI Models and How Docker Model Runner and NVIDIA DGX Spark Make It Easy 

Many of us at Docker and across the broader developer community are experimenting with local AI models. Running locally has clear advantages:

Data privacy and control: no external API calls; everything stays on your machine

Offline availability: work from anywhere, even when you’re disconnected

Ease of customization: experiment with prompts, adapters, or fine-tuned variants without relying on remote infrastructure

But there are also familiar tradeoffs:

Local GPUs and memory can be limiting for large models

Setting up CUDA, runtimes, and dependencies often eats time

Managing security and isolation for AI workloads can be complex

This is where DGX Spark and Docker Model Runner (DMR) shine. DMR provides an easy and secure way to run AI models in a sandboxed environment, fully integrated with Docker Desktop or Docker Engine. When combined with the DGX Spark’s NVIDIA AI software stack and large 128GB unified memory, you get the best of both worlds: plug-and-play GPU acceleration and Docker-level simplicity.

Unboxing NVIDIA DGX Spark

The device arrived well-packaged, sleek, and surprisingly small, resembling more a mini-workstation than a server.

Setup was refreshingly straightforward: plug in power, network, and peripherals, then boot into NVIDIA DGX OS, which comes with NVIDIA drivers, CUDA, and the NVIDIA AI software stack pre-installed.

Once on the network, enabling SSH access makes it easy to integrate the Spark into your existing workflow.

This way, the DGX Spark becomes an AI co-processor for your everyday development environment, augmenting, not replacing, your primary machine.

Getting Started with Docker Model Runner on NVIDIA DGX Spark

Installing Docker Model Runner on the DGX Spark is simple and can be done in a matter of minutes.

1. Verify Docker CE is Installed

DGX OS comes with Docker Engine (CE) preinstalled. Confirm you have it:

docker version

If it’s missing or outdated, install it by following the regular Ubuntu installation instructions.

2. Install the Docker Model CLI Plugin

The Model Runner CLI is distributed as a Debian package via Docker’s apt repository. Once the repository is configured (see the linked instructions above), install it with the following commands:

sudo apt-get update
sudo apt-get install docker-model-plugin

Or use Docker’s handy installation script:

curl -fsSL https://get.docker.com | sudo bash

You can confirm it’s installed with:

docker model version

3. Pull and Run a Model

Now that the plugin is installed, let’s pull a model from the Docker Hub AI Catalog. For example, the Qwen 3 Coder model:

docker model pull ai/qwen3-coder

The Model Runner container will automatically expose an OpenAI-compatible endpoint at:

http://localhost:12434/engines/v1

You can verify it’s live with a quick test:

# Test via API

curl http://localhost:12434/engines/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"ai/qwen3-coder","messages":[{"role":"user","content":"Hello!"}]}'

# Or via CLI
docker model run ai/qwen3-coder

GPUs are allocated to the Model Runner container via nvidia-container-runtime and the Model Runner will take advantage of any available GPUs automatically. To see GPU usage:

nvidia-smi

4. Architecture Overview

Here’s what’s happening under the hood:

[ DGX Spark Hardware (GPU + Grace CPU) ]
                   │
       (NVIDIA Container Runtime)
                   │
         [ Docker Engine (CE) ]
                   │
  [ Docker Model Runner Container ]
                   │
     OpenAI-compatible API :12434

The NVIDIA Container Runtime bridges the NVIDIA GB10 Grace Blackwell Superchip drivers and Docker Engine, so containers can access CUDA directly. Docker Model Runner then runs inside its own container, managing the model lifecycle and providing the standard OpenAI API endpoint. (For more info on Model Runner architecture, see this blog).

From a developer’s perspective, you interact with models just as you would with any other Dockerized service: docker model pull, list, inspect, and run all work out of the box.

Using Local Models in Your Daily Workflows

If you’re using a laptop or desktop as your primary machine, the DGX Spark can act as your remote model host. With a few SSH tunnels, you can both access the Model Runner API and monitor GPU utilization via the DGX dashboard, all from your local workstation.

1. Forward the DMR Port (for Model Access)

To access the DGX Spark via SSH, first set up an SSH server on it:

sudo apt install openssh-server
sudo systemctl enable --now ssh

Run the following command to access Model Runner via your local machine. Replace user with the username you configured when you first booted the DGX Spark and replace dgx-spark.local with the IP address of the DGX Spark on your local network or a hostname configured in /etc/hosts. 

ssh -N -L localhost:12435:localhost:12434 user@dgx-spark.local

This forwards the Model Runner API from the DGX Spark to your local machine. Now, in your IDE, CLI tool, or app that expects an OpenAI-compatible API, just point it to:

http://localhost:12435/engines/v1

Set the model name (e.g. ai/qwen3-coder) and you’re ready to use local inference seamlessly.
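
If your tool speaks the OpenAI API through a client library rather than raw HTTP, the same tunnel works there too. Here is a minimal sketch, assuming the openai Python package is installed, ai/qwen3-coder has been pulled on the DGX Spark, and the SSH port-forward from the previous step is running:

from openai import OpenAI

# Talk to Docker Model Runner on the DGX Spark through the forwarded port.
client = OpenAI(
    base_url="http://localhost:12435/engines/v1",  # forwarded DMR endpoint
    api_key="not-needed",  # DMR doesn't require an API key for local use
)

response = client.chat.completions.create(
    model="ai/qwen3-coder",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)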

2. Forward the DGX Dashboard Port (for Monitoring)

The DGX Spark exposes a lightweight browser dashboard showing real-time GPU, memory, and thermal stats, usually served locally at:

http://localhost:11000

You can forward it through the same SSH session or a separate one:

ssh -N -L localhost:11000:localhost:11000 user@dgx-spark.local

Then open http://localhost:11000 in your browser on your main workstation to monitor the DGX Spark performance while running your models.

This combination makes the DGX Spark feel like a remote, GPU-powered extension of your development environment. Your IDE or tools still live on your laptop, while model execution and resource-heavy workloads happen securely on the Spark.

Example application: Configuring OpenCode with Qwen3-Coder

Let’s make this concrete.

Suppose you use OpenCode, an open-source, terminal-based AI coding agent.

Once your DGX Spark is running Docker Model Runner with ai/qwen3-coder pulled and the port is forwarded, you can configure OpenCode by adding the following to ~/.config/opencode/opencode.json (the baseURL is DMR’s OpenAI-compatible endpoint, reached through the SSH tunnel):

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12435/engines/v1"
      },
      "models": {
        "ai/qwen3-coder": { "name": "Qwen3 Coder" }
      }
    }
  },
  "model": "ai/qwen3-coder"
}

Now run opencode and select Qwen3 Coder with the /models command.

That’s it! Completions and chat requests will be routed through Docker Model Runner on your DGX Spark, meaning Qwen3-Coder now powers your agentic development experience locally.

You can verify that the model is running by opening http://localhost:11000 (the DGX dashboard) to watch GPU utilization in real time while coding.

This setup lets you:

Keep your laptop light while leveraging the DGX Spark GPUs

Experiment with custom or fine-tuned models through DMR

Stay fully within your local environment for privacy and cost-control

Summary

Running Docker Model Runner on the NVIDIA DGX Spark makes it remarkably easy to turn powerful local hardware into a seamless extension of your everyday Docker workflow.

You install one plugin and use familiar Docker commands (docker model pull, docker model run).

You get full GPU acceleration through NVIDIA’s container runtime.

You can forward both the model API and monitoring dashboard to your main workstation for effortless development and visibility.

This setup bridges the gap between developer productivity and AI infrastructure, giving you the speed, privacy, and flexibility of local execution with the reliability and simplicity Docker provides. As local model workloads continue to grow, the DGX Spark + Docker Model Runner combo represents a practical, developer-friendly way to bring serious AI compute to your desk — no data center or cloud dependency required.

Learn more:

Read the official announcement of DGX Spark launch on NVIDIA newsroom

Check out the Docker Model Runner General Availability announcement

Visit our Model Runner GitHub repo. Docker Model Runner is open-source, and we welcome collaboration and contributions from the community! Star, fork and contribute.

Source: https://blog.docker.com/feed/

How to Add MCP Servers to Claude Code with Docker MCP Toolkit

AI coding assistants have evolved from simple autocomplete tools into full development partners. Yet even the best of them, like Claude Code, can’t act directly on your environment. Claude Code can suggest a database query, but can’t run it. It can draft a GitHub issue, but can’t create it. It can write a Slack message, but can’t send it. You’re still copying, pasting, and context-switching between tools.

That’s where Model Context Protocol (MCP) and Docker MCP Toolkit come in. MCP connects Claude Code to your real tools, databases, repositories, browsers, and APIs, while Docker MCP Toolkit makes setup effortless and secure. We recently added Claude Code as a client that you can easily enable with one click in Docker Desktop.

In this guide, you’ll learn how to:

Set up Claude Code and connect it to Docker MCP Toolkit.

Configure the Atlassian MCP server for Jira integration.  

Configure the GitHub MCP server to access repository history and run git commands.

Configure the Filesystem MCP server to scan and read your local codebase.

Automate tech debt tracking by converting 15 TODO comments into tracked Jira tickets.

See how Claude Code can query git history, categorize issues, and create tickets — all without leaving your development environment.

With more than 200 pre-built, containerized MCP servers, one-click deployment in Docker Desktop, and automatic credential handling, developers can connect Claude Code to trusted environments in minutes — not hours. No dependency issues, no manual configuration, just a consistent, secure workflow across Mac, Windows, and Linux.

Why Claude Code and Docker MCP Toolkit work better together 

While MCP provides the protocol, Docker MCP Toolkit makes it practical. Without containerization, setting up MCP servers means managing Node.js versions, Python dependencies, credentials in plaintext config files, and different configurations for every developer’s machine. The setup that should take 2 minutes takes 2-6 hours per developer.

Docker MCP Toolkit eliminates this friction:

200+ pre-built MCP servers in the catalog

One-click deployment through Docker Desktop

Secure credential management via OAuth or encrypted storage

Consistent configuration across Mac, Windows, and Linux

Automatic updates when new server versions release

We built Docker MCP Toolkit to meet developers where they are. If you’re using Claude Code, you should be able to connect it to your tools without wrestling with infrastructure.

Setting up Claude Code in Docker MCP Toolkit

Prerequisites

Install Docker Desktop 4.40 or later

Enable MCP Toolkit

Step 1. Install Claude Code

To install Claude Code, run the following command:

# Install Claude Code
curl -fsSL https://claude.ai/install.sh | sh

# Verify installation
claude --version # Should show 2.0.5+

Step 2. Connect Claude Code to Docker MCP Toolkit

Option 1: One-Click Connection (Recommended)

Open Docker Desktop

Navigate to MCP Toolkit in the sidebar

Click the Clients tab

Find “Claude Code” in the list.

Click Connect

Docker Desktop automatically configures the MCP Gateway connection.

Option 2: Manual Command Line Setup

If you prefer a command-line setup or need to configure a specific project:

Navigate to your project folder in the terminal

Run this command:

docker mcp client connect claude-code

You’ll see output like this:

=== Project-wide MCP Configurations (/your/project/path) ===
● claude-code: connected
MCP_DOCKER: Docker MCP Catalog (gateway server) (stdio)
● cursor: no mcp configured
● vscode: no mcp configured
You might have to restart 'claude-code'.

The connected status confirms Claude Code is linked to the Docker MCP Gateway.

What’s happening under the hood?

The connection command creates a .mcp.json file in your project directory:

{
"mcpServers": {
"MCP_DOCKER": {
"command": "docker",
"args": ["mcp", "gateway", "run"],
"type": "stdio"
}
}
}

This configuration tells Claude Code to use Docker’s MCP Gateway for all MCP server access. The gateway handles routing to your containerized servers.

Step 3. Restart Claude Code

# Exit Claude Code if running, then restart
claude code

Step 4. Verify the Connection

Inside Claude Code, type /mcp to see available MCP servers.

You should see the Docker MCP Gateway listed, which provides access to all enabled MCP servers. The /MCP_DOCKER tools indicate a successful connection. As you enable more MCP servers in Docker Desktop, they’ll appear here automatically.

First Run: What to Expect

When you start Claude Code for the first time after connecting to Docker MCP Toolkit, you’ll see a prompt about the new MCP server:

New MCP server found in .mcp.json: MCP_DOCKER

MCP servers may execute code or access system resources. All tool calls require approval.
Learn more in the MCP documentation (https://docs.claude.com/s/claude-code-mcp).

❯ 1. Use this and all future MCP servers in this project
2. Use this MCP server
3. Continue without using this MCP server

Enter to confirm · Esc to reject

Choose Option 1 (recommended). This configures your project to automatically use Docker MCP Toolkit and any MCP servers you enable in Docker Desktop. You won’t need to approve MCP servers individually each time.

After confirming, you’ll see the Claude Code home screen:

Claude Code v2.0.5

Welcome back!

Sonnet 4.5 · API Usage Billing
/…/your/project/path

Tips for getting started
Run /init to create a CLAUDE.md file with…
Run /terminal-setup to set up terminal in…
Use claude to help with file analysis, ed…
Be as specific as you would with another …

Recent activity
No recent activity

You’re now ready to use Claude Code with MCP servers from Docker Desktop.

Real-World Demo: TODO-to-Ticket Automation

Now that you’ve connected Claude Code to Docker MCP Toolkit, let’s see it in action with a practical example. We’ll automatically convert TODO comments in a real codebase into tracked Jira tickets — complete with git history, priority categorization, and proper linking.

Configuring the required MCP Servers

For this automation, we’ll orchestrate three MCP servers:

Filesystem MCP – to scan your codebase and read source files

GitHub MCP – to run git blame and extract author information

Atlassian (Jira) MCP – to create and manage Jira issues

We’ll walk through enabling and configuring all three MCP servers. 

What makes this realistic?

Uses actual codebase (catalog-service-node) 

Extracts git blame info to identify code authors 

Categorizes by business priority using keyword analysis 

Creates properly formatted Jira issues with context 

Links back to exact file/line numbers for easy navigation

Time investment: 

Manual process: ~20-30 minutes 

Automated with Claude Code + MCP: ~2 minutes total 

Let’s walk through it step-by-step.

1. Configure the Atlassian MCP Server

In Docker Desktop → MCP Toolkit → Catalog:

Search “Atlassian”

Click + Add

Go to Configuration tab

Add your Atlassian credentials:

atlassian.jira.url: https://yourcompany.atlassian.net

atlassian.jira.username: your email

API tokens in the Secrets section

Important notes:

For Atlassian API authentication, the “username” is always your Atlassian account email address, which you use together with the API token for basic authentication

Click Start Server

As shown in the screenshot, the Atlassian MCP provides 37 tools, including:

jira_create_issue – Create Jira issues

jira_add_comment – Add comments

jira_batch_create_issues – Bulk create

And many more Jira operations

For this demonstration, I created a new JIRA project called “TODO Demo” with a project key “TD”.

2. Configure GitHub MCP Server

The GitHub MCP server supports two authentication methods. We recommend OAuth for the easiest setup.

Option A: OAuth Authentication (Recommended – Easiest)

Open Docker Desktop → MCP Toolkit → Catalog

Search for “GitHub”

Find GitHub Official and click + Add

Go to the Configuration tab

Select OAuth as the authentication method

Click the “authorize with the GitHub OAuth provider” link

You’ll be redirected to GitHub to authorize the connection

After authorization, return to Docker Desktop

Click Start Server

Advantage: No manual token creation needed. Authorization happens through GitHub’s secure OAuth flow.

Option B: Personal Access Token (PAT)

If you prefer to use a Personal Access Token or need more granular control:

Step 1: Create a GitHub Personal Access Token

Go to GitHub.com and sign in to your account

Click your profile picture in the top-right corner

Select “Settings”

Scroll down to “Developer settings” in the left sidebar

Click on “Personal access tokens” → “Tokens (classic)”

Click “Generate new token” → “Generate new token (classic)”

Give your token a descriptive name like “Docker MCP GitHub Access”

Select the following scopes (permissions):

repo (Full control of private repositories)

workflow (if you need workflow actions)

read:org (if you need organization access)

Click “Generate token” and copy the token immediately (you won’t see it again!)

Step 2: Configure in Docker Desktop

In Docker Desktop → MCP Toolkit → Catalog

Find GitHub Official and click + Add

Go to the Configuration tab

Select Personal Access Token as the authentication method

Paste your token in the provided field

Click Start Server

Or via CLI:

docker mcp secret set GITHUB.PERSONAL_ACCESS_TOKEN=github_pat_YOUR_TOKEN_HERE

This gives users the flexibility to choose the method that works best for their workflow, with OAuth being the simpler path for most users.

3. Configure Filesystem MCP Server

The Filesystem MCP server allows Claude Code to read files from your local system. You need to specify which directories it can access.

Step 1: Enable Filesystem MCP Server

Open Docker Desktop → MCP Toolkit → Catalog

Search for “Filesystem”

Find Filesystem (Reference) and click + Add

Step 2: Configure Allowed Paths

Go to the Configuration tab

Under filesystem.paths, add the directories Claude Code should access

For this demo, add your project directory (e.g., /Users/your_username/ or your project path)

You can add multiple paths by clicking the + button

Click Save

Click Start Server

Important: Only grant access to directories you’re comfortable with Claude Code reading. The Filesystem MCP server is scoped to these specific paths for security.

Run the Automation

Clone the repository

git clone https://github.com/ajeetraina/catalog-service-node

Change directory to catalog-service-node and run the following command:

claude code

Paste this instruction into Claude Code:

Scan this codebase for all TODO and FIXME comments.
For each one:
1. Extract the comment and surrounding code context (5 lines before/after)
2. Use git blame to identify who wrote it and when
3. Determine priority based on keywords:
– High: "race condition", "data loss", "security", "failure", "crash"
– Medium: "performance", "slow", "cache", "optimization"
– Low: "documentation", "refactor", "cleanup"
4. Create a Jira issue with:
– Project: TD
– Issue Type: Task
– Summary: Extract from the TODO/FIXME comment
– Description: Include code context and explanation
– Priority: Based on categorization above (use additional_fields: {"priority": {"name": "High"}})
– Labels: ["tech-debt"]
– Add a comment with link to exact GitHub file and line number

Provide a summary showing:
– Total TODOs/FIXMEs found
– Breakdown by priority (High/Medium/Low)
– List of created Jira issues with links
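
For reference, the priority rule in step 3 is plain keyword matching. A rough Python equivalent, purely illustrative and not something the MCP servers actually run, looks like this:

# Illustrative only: the keyword-based priority rule described in the prompt above.
HIGH = ["race condition", "data loss", "security", "failure", "crash"]
MEDIUM = ["performance", "slow", "cache", "optimization"]

def categorize(comment: str) -> str:
    text = comment.lower()
    if any(keyword in text for keyword in HIGH):
        return "High"
    if any(keyword in text for keyword in MEDIUM):
        return "Medium"
    return "Low"

print(categorize("FIXME: No error handling for S3 upload failures"))  # -> High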

Watch Claude Code Work

Claude Code will orchestrate multiple MCP servers to complete this task:

Filesystem MCP to scan your src/ directory

GitHub MCP to run git blame and get author info

Atlassian MCP to create Jira issues in your TODO Demo project

Here’s what the output looks like:

I'll scan the codebase for TODO and FIXME comments, analyze them, and create Jira issues for tracking.

⏺ Search(pattern: "(TODO|FIXME|XXX|HACK):", output_mode: "content")
⎿ Found 22 lines (ctrl+o to expand)

⏺ Now let me read each file to get the full context around the TODO/FIXME comments (excluding git hooks):

⏺ Read(src/services/StorageService.js)
⎿ Read 70 lines (ctrl+o to expand)

⏺ Read(src/services/InventoryService.js)
⎿ Read 55 lines (ctrl+o to expand)

⏺ Read(src/services/ProductService.js)
⎿ Read 117 lines (ctrl+o to expand)

⏺ Read(src/services/PublisherService.js)
⎿ Read 67 lines (ctrl+o to expand)

⏺ Read(src/index.js)
⎿ Read 100 lines (ctrl+o to expand)

Next, Claude Code runs git blame to identify who created each TODO:

⏺ Now let me run git blame on each file to identify authors and dates:

⏺ Bash(git blame -L 1,15 src/services/StorageService.js)
⎿ 4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 1) // FIXME: No error handling for S3 upload failures
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 2) // Failed uploads leave products without images
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 3) // Need to rollback product creation or retry upload
… +12 lines (ctrl+o to expand)

⏺ Bash(git blame -L 1,15 src/services/InventoryService.js)
⎿ 4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 1) // FIXME: No retry logic for inventory service failures
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 2) // When WireMock/external service is down, requests fail immediately
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 3) // Need exponential backoff retry mechanism
… +12 lines (ctrl+o to expand)

⏺ Bash(git blame -L 1,13 src/services/ProductService.js)
⎿ 4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 1)
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 2) // TODO: Add pagination support for product listing
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 3) // Currently returns all products – will cause performance issues with large
… +11 lines (ctrl+o to expand)

⏺ Bash(git blame -L 1,16 src/services/PublisherService.js)
⎿ 4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 1) // FIXME: Kafka publish failures are silently ignored
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 2) // If Kafka is unavailable, product updates aren't tracked
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 3) // Need dead letter queue or retry mechanism
… +13 lines (ctrl+o to expand)

⏺ Bash(git blame -L 1,12 src/index.js)
⎿ 4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 1) // TODO: Add request rate limiting
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 2) // API has no throttling, vulnerable to abuse
4bdb4454 (Ajeet Raina 2025-10-03 20:08:01 +0530 3) // Need per-IP rate limiting
… +9 lines (ctrl+o to expand)

⏺ Now let me categorize by priority and create Jira issues. First, let me get the GitHub remote URL to include in the Jira issues:

⏺ Bash(git remote get-url origin)
⎿ https://github.com/ajeetraina/catalog-service-node

Claude Code then categorizes issues by priority and creates Jira tickets:

Now I'll create Jira issues for all the TODO/FIXME comments. Based on my analysis:

HIGH Priority (contains keywords: race condition, data loss, security, failure, crash):
1. Inventory sync race condition
2. S3 upload failures (data loss potential)
3. Kafka publish failures (data loss)
4. Kafka acknowledgment handling (data loss)
5. Error responses expose internal details (security)
6. Inventory service failures

MEDIUM Priority (performance, slow, cache, optimization):
1. Cache inventory data (performance)
2. Add pagination support (performance)
3. Image resizing/optimization (performance)
4. Batch publishing (performance)

LOW Priority (documentation, refactor, cleanup):
1. LocalStack configuration not documented
2. Product deletion doesn't clean up S3 images (cleanup)
3. No validation for duplicate SKUs
4. Request rate limiting
5. Health check endpoint
6. Event schema versioning
7. Multiple images per product
8. Search functionality
9. Inventory low-stock alerts

In under 2 minutes, Claude Code has:

Scanned the entire codebase

Identified 15+ TODO/FIXME comments

Retrieved git history for each one

Categorized them by business priority

Created properly formatted Jira issues with full context

Linked each issue back to the exact file and line number

What would have taken 20-30 minutes of manual work is now automated and consistent. This Jira automation is just one example. Here’s how MCP transforms other common development workflows:

Workflow Comparison: Before vs. After MCP

Task: Debug Checkout Failures

Before MCP (Manual):

1. Ask Claude for SQL query
2. Copy query to database client
3. Run query and copy results
4. Paste results back to Claude
5. Get analysis
6. Ask Claude to draft GitHub issue
7. Manually create issue in GitHub
8. Notify team in Slack

After MCP (Automated):

You: “Why are checkouts failing? Investigate and create a GitHub issue.”
Claude Code: Queries production database, finds 23% payment timeouts, identifies root cause as connection pool exhaustion, creates GitHub issue #1847, posts to #backend-alerts on Slack. Result: Done. Issue link provided.

Time Saved: ~15 min → ~2 min

Task: Investigate Performance Issue

Before MCP (Manual):

1. Check multiple monitoring dashboards
2. Export slow query logs
3. Analyze locally
4. Document findings in Google Doc
5. Create Jira ticket manually
6. Add links between doc and ticket
7. Notify team

After MCP (Automated):

You: “Why are API response times spiking?”
Claude Code:
• Queries slow_queries log
• Finds: 127 queries taking >2s
• Identifies missing index on users.email
• Creates migration file
• Opens PR with fix
• Posts summary to #backend

Time Saved: ~20 min → ~3 min

Task: Code Review for Security

Before MCP (Manual):

1. Review PR manually
2. Run security scanner separately
3. Document findings in notepad
4. Post review comments one by one
5. Create tracking ticket for issues
6. Update security dashboard

After MCP (Automated):

You: “Review PR #234 for security issues”
Claude Code:
• Analyzes 12 changed files
• Finds hardcoded API key (config.js:47)
• Finds: SQL injection risk in query build
• Posts inline review comments with fixes
• Creates security ticket SEC-445
• Updates security tracking board

Time Saved: ~25 min → ~4 min

From 15 minutes of context switching to 2-3 minutes of continuous flow.

Conclusion

You’ve just seen how Docker MCP Toolkit transforms Claude Code from an isolated coding assistant into an integrated development partner. What used to require copying between terminals, web browsers, and documentation now happens in one continuous flow. 

Next steps:

Explore the 220+ MCP servers in the Docker MCP catalog

Connect Claude Code to your databases, APIs, and tools

Share your setup with your team for consistent workflows

The future of development isn’t about switching between tools — it’s about tools that work together seamlessly. Docker MCP Toolkit makes that future available today.

Learn more

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Download Docker Desktop to get started with the MCP Toolkit: Run MCP servers easily and securely

Check out our MCP Horror Stories series to see common MCP security pitfalls and how you can avoid them

Source: https://blog.docker.com/feed/

LoRA Explained: Faster, More Efficient Fine-Tuning with Docker

Fine-tuning a language model doesn’t have to be daunting. In our previous post on fine-tuning models with Docker Offload and Unsloth, we walked through how to train small, local models efficiently using Docker’s familiar workflows. This time, we’re narrowing the focus.

Instead of asking a model to be good at everything, we can specialize it: teaching it a narrow but valuable skill, like consistently masking personally identifiable information (PII) in text. Thanks to techniques like LoRA (Low-Rank Adaptation), this process is not only feasible on modest resources, it’s fast and efficient.

Even better, with Docker’s ecosystem the entire fine-tuning pipeline (training, packaging, and sharing) becomes approachable. You don’t need a bespoke ML setup or a research lab workstation. You can iterate quickly, keep your workflow portable, and publish results for others to try with the same Docker commands you already know.

In this post, I’ll walk through a hands-on fine-tuning experiment: adapting the Gemma 3 270M model into a compact assistant capable of reliably masking PII.

What’s Low-Rank Adaptation (LoRA)?

Fine-tuning starts with a pre-trained model, one that has already learned the general structure and patterns of language.

Instead of retraining all of its weights (which would consume massive amounts of compute and risk catastrophic forgetting, where the model loses its prior knowledge), we can use a more efficient method called LoRA (Low-Rank Adaptation).

LoRA allows us to teach the model new tasks or behaviors without overwriting what it already knows, by adding small, trainable adapter layers while keeping the base model frozen.

How does LoRA work?

At a high level, LoRA works like this:

Freeze the base model: The model’s original weights (its core knowledge of language) remain unchanged.

Add adapter layers: Small, trainable “side modules” are inserted into specific parts of the model. These adapters learn only the new behavior or skill you want to teach.

Train efficiently: During fine-tuning, only the adapter parameters are updated. The rest of the model stays static, which dramatically reduces compute and memory requirements.
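
To make the mechanics concrete, here is a toy sketch of a LoRA-wrapped linear layer, assuming PyTorch. Libraries like peft and Unsloth implement this for you, so treat it purely as an illustration: the frozen layer computes its usual output, and two small matrices A and B add a trainable low-rank correction.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy illustration: frozen base layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # 1. freeze the base model weights
        # 2. small adapter matrices: (in_features x r) and (r x out_features)
        self.A = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.B = nn.Parameter(torch.zeros(r, base.out_features))
        self.scale = alpha / r

    def forward(self, x):
        # 3. only A and B receive gradient updates during fine-tuning
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(1, 512))  # same shape as the base layer's output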

LoRA experiment: Fine-tune Gemma 3 270M to mask PII

For this experiment, the model already knows how to read, write, and follow instructions. Our job is simply to teach it the specific pattern we care about, for example:

“Given some text, replace PII with standardized placeholders while leaving everything else untouched.”

The fine-tuning process consists of four steps:

Prepare the dataset

Prepare LoRA adapter

Train the model

Export the resulting model

Figure 1: Four steps of fine-tuning with LoRA

In this example, we use Supervised Fine-Tuning (SFT): each training example pairs raw text containing PII with its correctly redacted version. Over many such examples, the model internalizes the pattern and learns to generalize the redaction rules.

The quality of the dataset is critical: the cleaner and more representative your dataset, the better your fine-tuned model will perform.

Before we dive into the steps, it’s crucial to understand Chat Templates.

Understanding Chat Templates

When you send a request like below to Gemma 3 270M, the model doesn’t see this JSON structure directly.

"messages": [
{
"role": "user",
"content": "Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021"
}
]

Instead, the input is transformed into a chat-formatted prompt with special tokens:

<start_of_turn>user Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, ' ' and punctuation exactly. Return ONLY the redacted text. Text: This is an example of text that contains some data. The author of this text is Ignacio López Luna, but everybody calls him Ignasi. His ID number is 123456789. He has a son named Arnau López, who was born on 21-07-2021<end_of_turn>

Notice how the message has been rewrapped and extra tokens like <start_of_turn> and <end_of_turn> have been inserted. These tokens are part of the model’s chat template, the standardized structure it expects at inference time.

Different models use different templates. For example, Gemma uses <start_of_turn> markers, while other models might rely on <bos> or others.

This is exactly why the first step is “Prepare the dataset.” When fine-tuning, you must format your training data with the same chat template that the model will use during inference. This alignment ensures the fine-tuned model is robust, because it has been trained on data that looks exactly like what it will encounter in production.

Prepare the dataset: Teaching through examples

The dataset is the bridge between general-purpose language ability and task-specific expertise. Each example is a demonstration of what we want the model to do: a prompt with raw text containing PII, and a response showing the redacted version.

In the script, this is how the original dataset is formatted using the model’s chat template (see the apply_chat_template call):

import json

from datasets import Dataset
from unsloth import FastModel

max_seq_length = 2048
model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=max_seq_length,
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=False,
)

with open("pii_redaction_train.json", "r", encoding="utf-8") as f:
    data = json.load(f)

ds = Dataset.from_list(data)

def to_text(ex):
    # Pair each prompt with its redacted response and render both
    # through the model's chat template.
    resp = ex["response"]
    if not isinstance(resp, str):
        resp = json.dumps(resp, ensure_ascii=False)
    msgs = [
        {"role": "user", "content": ex["prompt"]},
        {"role": "assistant", "content": resp},
    ]
    return {
        "text": tokenizer.apply_chat_template(
            msgs, tokenize=False, add_generation_prompt=False
        )
    }

dataset = ds.map(to_text, remove_columns=ds.column_names)

You can print a few of the pairs to see what they look like:

for i in range(3):
    print(dataset[i]["text"])
    print("=" * 80)

An example of a dataset entry:

<bos><start_of_turn>user
Mask all PII in the following text. Replace each entity with the exact UPPERCASE label in square brackets (e.g., [PERSON], [EMAIL], [PHONE], [USERNAME], [ADDRESS], [CREDIT_CARD], [TIME], etc.). Preserve all non-PII text, whitespace, and punctuation exactly. Return ONLY the redacted text.

Text:
<p>My child faozzsd379223 (DOB: May/58) will undergo treatment with Dr. faozzsd379223, office at Hill Road. Our ZIP code is 28170-6392. Consult policy M.UE.227995. Contact number: 0070.606.322.6244. Handle transactions with 6225427220412963. Queries? Email: faozzsd379223@outlook.com.</p><end_of_turn>
<start_of_turn>model
<p>My child [USERNAME_2] (DOB: [DATEOFBIRTH_1]) will undergo treatment with Dr. [USERNAME_1], office at [STREET_1]. Our ZIP code is [ZIPCODE_1]. Consult policy M.UE.227995. Contact number: [TELEPHONENUM_1]. Handle transactions with [CREDITCARDNUMBER_1]. Queries? Email: [EMAIL_1].</p><end_of_turn>

Prepare LoRA adapter: Standing on the shoulders of a base model

Instead of starting from a blank slate, we begin with Gemma-3 270M-IT, a small but capable instruction-tuned model. By loading both the weights and the tokenizer, we get not just a model that understands text, but also the exact rules it uses to split and reconstruct sentences.

Fine-tuning isn’t reinventing language, it’s layering task-specific expertise on top of a foundation that already knows how to read and write.

For that, we’ll use the LoRA technique. 

Why we use LoRA

Training a large language model from scratch is extremely costly, because it means adjusting billions of parameters.

But the good news is: you usually don’t need to change everything to teach the model a new skill.

That’s where LoRA comes in. Instead of re-training the entire model, LoRA adds a few small, extra components, like “add-ons.” When we fine-tune the model, we only adjust these add-ons, while the main model stays the same.

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
)

model = get_peft_model(base_model, lora_config)

These few lines tell the model: keep your parameters frozen, but learn through a small set of low-rank adapters. That’s why fine-tuning is efficient and affordable.
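
You can sanity-check that the base weights really are frozen by counting trainable parameters. A quick check under the setup above might look like this (the exact counts depend on the rank and target modules you chose):

# Only the LoRA adapters should show up as trainable.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,} ({100 * trainable / total:.2f}%)")

# peft also provides a convenience method that prints the same kind of report:
model.print_trainable_parameters()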

Train the model: Fine-tuning in practice

With the dataset ready and LoRA adapters in place, the actual training looks like classic supervised learning.

Feed in the input (a user prompt).

Compare the model’s output with the expected response.

Adjust the adapter weights to minimize the difference.

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    eval_dataset = None, # Can set up evaluation!
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 8,
        gradient_accumulation_steps = 1, # Use GA to mimic batch size!
        warmup_steps = 5,
        num_train_epochs = 1, # Set this for 1 full training run.
        # max_steps = 100,
        learning_rate = 5e-5, # Reduce to 2e-5 for long training runs
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none", # Use this for WandB etc
    ),
)

trainer_stats = trainer.train()

Over many iterations, the model internalizes the rules of PII masking, learning not only to replace emails with [EMAIL] but also to preserve punctuation, whitespace, and all non-PII content exactly as instructed.

What’s important here is that fine-tuning doesn’t overwrite the model’s general capabilities. The model still knows how to generate coherent text; we’re just biasing it toward one more skill.

Export the resulting model: Merging weights

Once training finishes, we have a base model plus a set of LoRA adapters. That’s useful for experimentation, but for deployment we often prefer a single consolidated model.

By merging the adapters back into the base weights, we produce a standalone checkpoint that behaves just like the original model, except it now has PII masking expertise built in.

model.save_pretrained_merged("result", tokenizer, save_method = "merged_16bit")
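
To sanity-check the merged checkpoint before moving on, you can load it back with plain transformers and try a redaction prompt. A minimal sketch, assuming the model was saved to ./result as above and using a made-up example text:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("result")
model = AutoModelForCausalLM.from_pretrained("result")

messages = [{
    "role": "user",
    "content": "Mask all PII in the following text. Return ONLY the redacted text. "
               "Text: Contact Jane Doe at jane.doe@example.com or +1 555 0100.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
# Print only the newly generated tokens (the redacted text).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))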

Try and share your model

After fine-tuning, the next natural step is to try your model in action and, if it works well, share it with others. With Docker Model Runner, you can package your fine-tuned model, push it to Docker Hub, and make it instantly runnable anywhere. No messy setup, no GPU-specific headaches, just a familiar Docker workflow for distributing and testing AI models.

So once your adapters are trained and merged, don’t stop there: run it, publish it, and let others try it too. In the previous post, I showed how easy it is to do that step-by-step.

Fine-tuning makes your model specialized, but Docker makes it accessible and shareable. Together, they turn small local models from curiosities into practical tools ready to be used, and reused, by the community.

We’re building this together!

Docker Model Runner is a community-friendly project at its core, and its future is shaped by contributors like you. If you find this tool useful, please head over to our GitHub repository. Show your support by giving us a star, fork the project to experiment with your own ideas, and contribute. Whether it’s improving documentation, fixing a bug, or adding a new feature, every contribution helps. Let’s build the future of model deployment together!

Learn more

Learn how to fine-tune local models with Docker Offload and Unsloth

Check out the Docker Model Runner General Availability announcement

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Get started with Model Runner with a simple hello GenAI application

Source: https://blog.docker.com/feed/