Clawdbot with Docker Model Runner, a Private Personal AI Assistant

Personal AI assistants are transforming how we manage our daily lives—from handling emails and calendars to automating smart homes. However, as these assistants gain more access to our private data, concerns about privacy, data residency, and long-term costs are at an all-time high.

By combining Clawdbot with Docker Model Runner (DMR), you can build a high-performance, agentic personal assistant while keeping full control over your data, infrastructure, and spending.

This post walks through how to configure Clawdbot to utilize Docker Model Runner, enabling a privacy-first approach to personal intelligence.

What Are Clawdbot and Docker Model Runner?

Clawdbot is a self-hosted AI assistant designed to live where you already are. Unlike browser-bound bots, Clawdbot integrates directly with messaging apps like Telegram, WhatsApp, Discord, and Signal. It acts as a proactive digital coworker capable of executing real-world actions across your devices and services.

Docker Model Runner (DMR) is Docker’s native solution for running and managing large language models (LLMs) as OCI artifacts. It exposes an OpenAI-compatible API, allowing it to serve as the private “brain” for any tool that supports standard AI endpoints.

Together, they create a unified assistant that can browse the web, manage your files, and respond to your messages without ever sending your sensitive data to a third-party cloud.

Benefits of the Clawdbot + DMR Stack

Privacy by Design

In a “Privacy-First” setup, your assistant’s memory, message history, and files stay on your hardware. Docker Model Runner isolates model inference, meaning:

No third-party training: Your personal emails and schedules aren’t used to train future commercial models.

Sandboxed execution: Models run in isolated environments, protecting your host system.

Data Sovereignty: You decide exactly which “Skills” (web browsing, file access) the assistant can use.

Cost Control and Scaling

Cloud-based agents often become expensive when they use “long-term memory” or “proactive searching,” which consume massive amounts of tokens. With Docker Model Runner, inference runs on your own GPU/CPU. Once a model is pulled, there are no per-token fees. You can let Clawdbot summarize thousands of unread emails or research complex topics for hours without worrying about a surprise API bill at the end of the month.

Configuring Clawdbot with Docker Model Runner

Modifying the Clawdbot Configuration

Clawdbot uses a flexible configuration system to define which models and providers drive its reasoning. While the onboarding wizard (clawdbot onboard) is the standard setup path, you can manually point Clawdbot to your private Docker infrastructure.

You can define your provider configuration in:

Global configuration: ~/.config/clawdbot/config.json

Workspace-specific configuration: clawdbot.json in your active workspace root.

Using Clawdbot with Docker Model Runner

To bridge the two, update your configuration to point to the DMR server. The examples below assume Docker Model Runner is listening at its default address, http://localhost:12434/v1.

Your config.json should be updated as follows:

{
  "models": {
    "providers": {
      "dmr": {
        "baseUrl": "http://localhost:12434/v1",
        "apiKey": "dmr-local",
        "api": "openai-completions",
        "models": [
          {
            "id": "gpt-oss:128K",
            "name": "gpt-oss (128K context window)",
            "contextWindow": 128000,
            "maxTokens": 128000
          },
          {
            "id": "glm-4.7-flash:128K",
            "name": "glm-4.7-flash (128K context window)",
            "contextWindow": 128000,
            "maxTokens": 128000
          }
        ]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "dmr/gpt-oss:128K"
      }
    }
  }
}

This configuration tells Clawdbot to bypass external APIs and route all “thinking” to your private models.

Note for Docker Desktop users: Ensure TCP access is enabled so Clawdbot can communicate with the runner. Run the following command in your terminal:

docker desktop enable model-runner --tcp
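Before restarting Clawdbot, it is worth confirming the runner is actually reachable over TCP. Two quick checks (the /v1/models path is an assumption based on the baseUrl configured above; adjust it if your DMR version serves the OpenAI-compatible API under a different prefix):

# List the models DMR has pulled locally
docker model ls

# Probe the OpenAI-compatible endpoint Clawdbot will call
curl http://localhost:12434/v1/models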

Recommended Models for Personal Assistants

While coding models focus on logic, personal assistant models need a balance of instruction-following, tool-use capability, and long-term memory.

| Model | Best For | DMR Pull Command |
| --- | --- | --- |
| gpt-oss | Complex reasoning & scheduling | docker model pull gpt-oss |
| glm-4.7-flash | Fast coding assistance and debugging | docker model pull glm-4.7-flash |
| qwen3-coder | Agentic coding workflows | docker model pull qwen3-coder |

Pulling models from the ecosystem

DMR can pull models directly from Hugging Face and convert them into OCI artifacts automatically:

docker model pull huggingface.co/bartowski/Llama-3.3-70B-Instruct-GGUF

Context Length and “Soul”

For a personal assistant, context length is critical. Clawdbot relies on a SOUL.md file (which defines its personality) and a Memory Vault (which stores your preferences).

If a model’s default context is too small, it will “forget” your instructions mid-conversation. You can use DMR to repackage a model with a larger context window:

docker model package --from llama3.3 --context-size 128000 llama-personal:128k

Once packaged, reference llama-personal:128k in your Clawdbot config to ensure your assistant always remembers the full history of your requests.
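Before editing the config, you can confirm the repackaged model exists locally:

docker model ls

The llama-personal:128k tag should appear alongside the original llama3.3 model.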

Putting Clawdbot to Work: Running Scheduled Tasks 

With Clawdbot and DMR running, you can move beyond simple chat. Let’s set up a “Morning Briefing” task.

Verify the Model: docker model ls (Ensure your model is active).

Initialize the Soul: Run clawdbot init-soul to define how the assistant should talk to you.

Assign a Task: “Clawdbot, every morning at 8:00 AM, check my unread emails, summarize the top 3 priorities, and message me the summary on Telegram.”

Because Clawdbot is connected to your private Docker Model Runner, it can parse those emails and reason about your schedule privately. No data leaves your machine; you simply receive a helpful notification on your phone via your chosen messaging app.

How You Can Get Involved

The Clawdbot and Docker Model Runner ecosystems are growing rapidly. Here’s how you can help:

Share Model Artifacts: Push your optimized OCI model packages to Docker Hub for others to use.

Join the Community: Visit the Docker Model Runner GitHub repo.


Run Claude Code Locally with Docker Model Runner

We recently showed how to pair OpenCode with Docker Model Runner for a privacy-first, cost-effective AI coding setup. Today, we’re bringing the same approach to Claude Code, Anthropic’s agentic coding tool.

This post walks through how to configure Claude Code to use Docker Model Runner, giving you full control over your data, infrastructure, and spend.

Figure 1: Using local models like gpt-oss to power Claude Code

What Is Claude Code?

Claude Code is Anthropic’s command-line tool for agentic coding. It lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows through natural language commands.

Docker Model Runner (DMR) allows you to run and manage large language models locally. It exposes an Anthropic-compatible API, making it straightforward to integrate with tools like Claude Code.

Install Claude Code

macOS / Linux:

curl -fsSL https://claude.ai/install.sh | bash

Windows PowerShell:

irm https://claude.ai/install.ps1 | iex

Using Claude Code with Docker Model Runner

Claude Code supports custom API endpoints through the ANTHROPIC_BASE_URL environment variable. Since Docker Model Runner exposes an Anthropic-compatible API, integrating the two is simple.

Note for Docker Desktop users: If you are running Docker Model Runner via Docker Desktop, make sure TCP access is enabled:

docker desktop enable model-runner --tcp

Once enabled, Docker Model Runner will be accessible at http://localhost:12434.
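Before launching Claude Code, you can sanity-check that the runner answers on that port; docker model status reports whether the runner is active, and any HTTP response from the endpoint confirms TCP access is on:

docker model status

curl http://localhost:12434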

Increasing Context Size

For coding tasks, context length matters. While models like glm-4.7-flash, qwen3-coder and devstral-small-2 come with 128K context by default, gpt-oss defaults to 4,096 tokens.

Docker Model Runner makes it easy to repackage any model with an increased context size:

docker model pull gpt-oss
docker model package --from gpt-oss --context-size 32000 gpt-oss:32k

Once packaged, use it with Claude Code:

ANTHROPIC_BASE_URL=http://localhost:12434 claude --model gpt-oss:32k

ANTHROPIC_BASE_URL=http://localhost:12434 claude --model gpt-oss "Describe this repo."

That’s it. Claude Code will now send all requests to your local Docker Model Runner instance.

Run Claude Code locally with gpt-oss using Docker Model Runner

Here’s what it looks like in action:

Figure 2: Claude Code powered by Docker Model Runner keeps everything local and in your control

$ docker model pull gpt-oss

$ docker model package --from gpt-oss --context-size 32000 gpt-oss:32k

$ ANTHROPIC_BASE_URL=http://localhost:12434 claude --model gpt-oss:32k "Describe this repo."

▐▛███▜▌ Claude Code v2.1.12
▝▜█████▛▘ gpt-oss:32k · API Usage Billing
▘▘ ▝▝ ~/workspace/model-runner

❯ Describe this repo.

Read(README.md)
⎿ Read 423 lines

Docker Model Runner (DMR)
– A Go-based tool that lets developers pull, run, and serve AI models
directly from Docker Hub or any OCI-compatible registry.
– Consists of two main components:
a. model-runner – a server/daemon that manages model lifecycle,
inference, and metrics.
b. model-cli – a CLI plugin (docker model) that communicates with
the server, listing models, pulling new ones, and executing inference.

– Supports multiple inference backends: llama.cpp, vLLM, and others.
– Built with Docker Desktop in mind, but also works with Docker Engine
on Linux.
– Provides a REST API for model management and inference, plus a
/metrics endpoint for Prometheus-style monitoring.

Claude Code reads your repository, reasons about its structure, and provides an accurate summary, all while keeping your code entirely on your local machine.

Monitor the requests sent by Claude Code

Want to see exactly what Claude Code sends to Docker Model Runner? Use the docker model requests command:

docker model requests --model gpt-oss:32k | jq .

Figure 3: Monitor requests sent by Claude Code to the LLM

This outputs the raw requests, which is useful for understanding how Claude Code communicates with the model and debugging any compatibility issues.

Making It Persistent

For convenience, set the environment variable in your shell profile:

# Add to ~/.bashrc, ~/.zshrc, or equivalent
export ANTHROPIC_BASE_URL=http://localhost:12434

Then simply run:

claude --model gpt-oss:32k "Describe this repo."
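If you would rather not pass --model at all, Claude Code also reads a default model from the ANTHROPIC_MODEL environment variable (this variable name is taken from Claude Code’s settings documentation; double-check it against your installed version):

export ANTHROPIC_MODEL=gpt-oss:32k
claude "Describe this repo."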

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. To get involved:

Star the repository: Show your support by starring the Docker Model Runner repo.

Contribute your ideas: Create an issue or submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends and colleagues who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn More

Read the companion post: OpenCode with Docker Model Runner for Private AI Coding

Check out the Docker Model Runner General Availability announcement

Visit our Model Runner GitHub repo

Get started with a simple hello GenAI application

Learn more about Claude Code from Anthropic’s documentation


Making the Most of Your Docker Hardened Images Enterprise Trial – Part 3

Customizing Docker Hardened Images

In Part 1 and Part 2, we established the baseline. You migrated a service to a Docker Hardened Image (DHI), witnessed the vulnerability count drop to zero, and verified the cryptographic signatures and SLSA provenance that make DHI a compliant foundation.

But no matter how secure a base image is, it is useless if you can’t run your application on it. This brings us to the most common question engineers ask during a DHI trial: what if I need a custom image?

Hardened images are minimal by design. They lack package managers (apt, apk, yum), utilities (wget, curl), and even shells like bash or sh. This is a security feature: if a bad actor breaks into your container, they find an empty toolbox.

However, developers often need these tools during setup. You might need to install a monitoring agent, a custom CA certificate, or a specific library.

In this final part of our series, we will cover the two strategies for customizing DHI: the Docker Hub UI (for platform teams creating “Golden Images”) and the multi-stage build pattern (for developers building applications).

Option 1: The Golden Image (Docker Hub UI)

If you are a Platform or DevOps Engineer, your goal is likely to provide a “blessed” base image for your internal teams. For example, you might want a standard Node.js image that always includes your corporate root CA certificate and your security logging agent.

The Docker Hub UI is the preferred path for this. The strongest argument for using the Hub UI is maintenance automation.

The Killer Feature: Automatic Rebuilds

When you customize an image via the UI, Docker understands the relationship between your custom layers and the hardened base. If Docker releases a patch for the underlying DHI base image (e.g., a fix in glibc or openssl), Docker Hub automatically rebuilds your custom image.

You don’t need to trigger a CI pipeline. You don’t need to monitor CVE feeds. The platform handles the patching and rebuilding, ensuring your “Golden Image” is always compliant with the latest security standards.

How It Works

Since you have an Organization set up for this trial, you can explore this directly in Docker Hub.

First, navigate to Repositories in your organization dashboard. Locate the image you want to customize (e.g., dhi-node), open the Customizations tab, and click the “Create customization” action. This initiates a customization workflow as follows:

In the “Add packages” section, you can search for and select OS packages directly from the distribution’s repository. For example, here we are adding bash to the image for debugging purposes. You can also add “OCI Artifacts” to inject custom files like certificates or agents.

Finally, configure the runtime settings (User, Environment Variables) and review your build. Docker Hub will verify the configuration and queue the build. Once complete, this image will be available in your organization’s private registry and will automatically rebuild whenever the base DHI image is updated.

This option is best suited for creating standardized “golden” base images that are used across the entire organization. The primary advantage is zero-maintenance security patching due to automatic rebuilds by Docker Hub. However, it is less flexible for rapid, application-specific iteration by individual development teams.

Option 2: Multi-Stage Build

If you are a developer, you likely define your environment in a Dockerfile that lives alongside your code. You need flexibility, and you need it to work locally on your machine.

Since DHI images don’t have apt-get or curl, you cannot simply RUN apt-get install my-lib in your Dockerfile. It will fail.

Instead, we use the multi-stage build pattern. The concept is simple:

Stage 1 (Builder): Use a standard “fat” image (like debian:bookworm-slim) to download, compile, and prepare your dependencies.

Stage 2 (Runtime): Copy only the resulting artifacts into the pristine DHI base.

This keeps your final image minimal, non-root, and secure, while still allowing you to install whatever you need.

Hands-on Tutorial: Adding a Monitoring Agent

Let’s try this locally. We will simulate a common real-world scenario: adding the Datadog APM library (dd-trace) globally to a Node.js DHI image.

1. Setup

Create a new directory for this test and add a simple server.js file. This script attempts to load the dd-trace library to verify our installation.

app/server.js

// Simple Express server to demonstrate DHI customization
console.log('Node.js version:', process.version);
try {
require('dd-trace');
console.log('dd-trace module loaded successfully!');
} catch (e) {
console.error('Failed to load dd-trace:', e.message);
process.exit(1);
}
console.log('Running as UID:', process.getuid(), 'GID:', process.getgid());
console.log('DHI customization test successful!');

2. Hardened Dockerfile

Now, create the Dockerfile. We will use a standard Debian image to install the library, and then copy it to our DHI Node.js image.

# Stage 1: Builder – a standard Debian Slim image that has apt, curl, and full shell access.
FROM debian:bookworm-slim AS builder

# Install Node.js (matching our target version) and tools
RUN apt-get update && \
    apt-get install -y curl && \
    curl -fsSL https://deb.nodesource.com/setup_24.x | bash - && \
    apt-get install -y nodejs

# Install Datadog APM agent globally (we force the install prefix to /usr/local so we know exactly where files go)
RUN npm config set prefix /usr/local && \
    npm install -g dd-trace@5.0.0

# Stage 2: Runtime – we switch to the Docker Hardened Image.
FROM <your-org-namespace>/dhi-node:24.11-debian13-fips

# Copy only the required library from the builder stage
COPY --from=builder /usr/local/lib/node_modules/dd-trace /usr/local/lib/node_modules/dd-trace

# Environment Configuration
# DHI images are strict. We must explicitly tell Node where to find global modules.
ENV NODE_PATH=/usr/local/lib/node_modules

# Copy application code
COPY app/ /app/

WORKDIR /app

# DHI Best Practice: Use the exec form (["node", ...])
# because there is no shell to process strings.
CMD ["node", "server.js"]

3. Build and Run

Build the custom image:

docker build -t dhi-monitoring-test .

Now run it. If successful, the container should start, find the library, and exit cleanly.

docker run --rm dhi-monitoring-test

Output:

Node.js version: v24.11.0
dd-trace module loaded successfully!
Running as UID: 1000 GID: 1000
DHI customization test successful!

Success! We have a working application with a custom global library, running on a hardened, non-root base.

Security Check

We successfully customized the image. But did we compromise its security?

This is the most critical lesson of operationalizing DHI: hardened base images protect the OS, but they do not protect you from the code you add.

Let’s verify our new image with Docker Scout.

docker scout cves dhi-monitoring-test --only-severity critical,high

Sample Output:

✗ Detected 1 vulnerable package with 1 vulnerability

0C 1H 0M 0L lodash.pick 4.4.0
pkg:npm/lodash.pick@4.4.0

✗ HIGH CVE-2020-8203 [Improperly Controlled Modification of Object Prototype Attributes]

This result is accurate and important. The base image (OS, OpenSSL, Node.js runtime) is still secure. However, the dd-trace library we just installed pulled in a dependency (lodash.pick) that contains a High severity vulnerability.

This proves that your verification pipeline works.

If we hadn’t scanned the custom image, we might have assumed we were safe because we used a “Hardened Image.” By using Docker Scout on the final artifact, we caught a supply chain vulnerability introduced by our customization.

Let’s check how much “bloat” we added compared to the clean base.

docker scout compare --to <your-org-namespace>/dhi-node:24.11-debian13-fips dhi-monitoring-test

You will see that the only added size corresponds to the dd-trace library (~5MB) and our application code. We didn’t accidentally inherit apt, curl, or the build caches from the builder stage. The attack surface remains minimized.
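If you want one more sanity check that the hardened runtime still has no shell, try overriding the entrypoint; this should fail because DHI ships neither bash nor sh (the exact error message depends on your container runtime):

docker run --rm --entrypoint /bin/sh dhi-monitoring-test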

A Note on Provenance: Who Signs What?

In Part 2, we verified the SLSA Provenance and cryptographic signatures of Docker Hardened Images. This is crucial for establishing a trusted supply chain. When you customize an image, the question of who “owns” the signature becomes important.

Docker Hub UI Customization: When you customize an image through the Docker Hub UI, Docker itself acts as the builder for your custom image. This means the resulting customized image inherits signed provenance and attestations directly from Docker’s build infrastructure. If the base DHI receives a security patch, Docker automatically rebuilds and re-signs your custom image, ensuring continuous trust. This is a significant advantage for platform teams creating “golden images.”

Local Dockerfile: When you build a custom image using a multi-stage Dockerfile locally (as we did in our tutorial), you are the builder. Your docker build command produces a new image with a new digest. Consequently, the original DHI signature from Docker does not apply to your final custom image (because the bits have changed and you are the new builder).

However, the chain of trust is not entirely broken:

Base Layers: The underlying DHI layers within your custom image still retain their original Docker attestations.

Custom Layer: Your organization is now the “builder” of the new layers.

For production deployments using the multi-stage build, you should integrate Cosign or Docker Content Trust into your CI/CD pipeline to sign your custom images. This closes the loop, allowing you to enforce policies like: “Only run images built by MyOrg, which are based on verified DHI images and have our internal signature.”
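As a minimal sketch of what that could look like with Cosign (keyless signing shown; the image reference is illustrative and the flags assume a recent Cosign 2.x release):

# Sign the custom image after pushing it to your registry (keyless, via OIDC)
cosign sign myorg/dhi-monitoring-test:1.0.0

# Verify it in CI or at admission time
cosign verify \
  --certificate-identity-regexp 'https://github.com/myorg/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  myorg/dhi-monitoring-test:1.0.0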

Measuring Your ROI: Questions for Your Team

As you conclude your Docker Hardened Images trial, it’s critical to quantify the value for your organization. Reflect on the concrete results from your migration and customization efforts using these questions:

Vulnerability Reduction: How significantly did DHI impact your CVE counts? Compare the “before and after” vulnerability reports for your migrated services. What is the estimated security risk reduction?

Engineering Effort: What was the actual engineering effort required to migrate an image to DHI? Consider the time saved on patching, vulnerability triage, and security reviews compared to managing traditional base images.

Workflow: How well does DHI integrate into your team’s existing development and CI/CD workflows? Do developers find the customization patterns (Golden Image / Builder Pattern) practical and efficient? Is your team likely to adopt this long-term?

Compliance & Audit: Has DHI simplified your compliance reporting or audit processes due to its SLSA provenance and FIPS compliance? What is the impact on your regulatory burden?

Conclusion

Thanks for following through to the end! Over this 3-part blog series, you have moved from a simple trial to a fully operational workflow:

Migration: You replaced a standard base image with DHI and saw immediate vulnerability reduction.

Verification: You independently validated signatures, FIPS compliance, and SBOMs.

Customization: You learned to extend DHI using the Hub UI (for auto-patching) or multi-stage builds, while checking for new vulnerabilities introduced by your own dependencies.

The lesson here is that the “Hardened” in Docker Hardened Images isn’t a magic shield but a clean foundation. By building on top of it, you ensure that your team spends time securing your application code, rather than fighting a never-ending battle against thousands of upstream vulnerabilities.

Making the Most of Your Docker Hardened Images Enterprise Trial – Part 2

Verifying Security and Compliance of Docker Hardened Images

In Part 1 of this series, we migrated a Node.js service to Docker Hardened Images (DHI) and measured impressive results: 100% vulnerability elimination, 90% package reduction, and 41.5% size decrease. We extracted the SBOM and saw compliance labels for FIPS, STIG, and CIS.

The numbers look compelling. But how do you verify these claims independently?

Security tools earn trust through verification, not promises. When evaluating a security product for production, you need cryptographic proof. This is especially true for images that form the foundation of every container you deploy.

This post walks through the verification process: signature validation, provenance analysis, compliance evidence examination, and SBOM analysis. We’ll focus on practical verification you can run during your trial, with links to the official DHI documentation for deeper technical details. By the end, you’ll have independently confirmed DHI’s security posture and built confidence for a production scenario.

Understanding Security Attestations available with Docker Hardened Images

Before diving into verification, you need to understand what you’re verifying.

Docker Hardened Images include attestations: cryptographically-signed metadata about the image’s build process, contents, and compliance posture. These are signed statements that can be independently verified.

Important: If you’ve pulled the image locally, you need to use the registry:// prefix when working with attestations. This tells Docker Scout to look for attestations in the registry, not just the local image cache.

List all attestations for your hardened image:

docker scout attestation list registry://<your-org-namespace>/dhi-node:24.11-debian13-fips

This lists 16 different attestation types for the image, including:

https://slsa.dev/provenance/v0.2 SLSA provenance
https://docker.com/dhi/fips/v0.1 FIPS compliance
https://docker.com/dhi/stig/v0.1 STIG scan
https://cyclonedx.org/bom/v1.6 CycloneDX SBOM
https://spdx.dev/Document SPDX SBOM
https://scout.docker.com/vulnerabilities Scout vulnerabilities
https://scout.docker.com/secrets/v0.1 Scout secret scan
https://scout.docker.com/virus/v0.1 Scout virus/malware
https://scout.docker.com/tests/v0.1 Scout test report
https://openvex.dev/ns/v0.2.0 OpenVEX

Each attestation is a JSON document describing a specific aspect of the image. The most critical attestations for verification:

SLSA provenance: Build source, builder identity, and build process details

SBOM: Complete software bill of materials

FIPS compliance: Evidence of FIPS 140-3 certified cryptographic modules

STIG scan: Security Technical Implementation Guide compliance results

Vulnerability scan: CVE assessment

VEX report: CVE exploitability

These attestations follow the in-toto specification, an open framework for supply chain security. Each attestation includes:

Subject: What the attestation describes (the container image)

Predicate: The actual claims (FIPS certified, STIG compliant, etc.)

Signature: Cryptographic signature from the builder

Let’s see how you can verify the signatures yourself.

Verifying Attestations with Docker Scout

The attestations we’re about to examine are cryptographically signed by Docker’s build infrastructure. Docker Scout provides a simple, integrated approach that handles DHI attestations natively, without the hassle of managing public keys or certificate chains.

To validate an attestation, simply append the --verify flag, which provides explicit validation feedback. This process relies on cryptographic hashing: the digest is a hash of the attestation content, so even a single character change completely alters the hash. Moreover, the attestation’s signature is cryptographically bound to the specific image digest it describes, guaranteeing that the metadata you’re verifying corresponds exactly to the image you have and preventing substitution attacks.

Retrieving an Attestation

To extract a specific attestation (like SLSA provenance), use the attestation get command with the full predicate type URI:

docker scout attestation get registry://<your-org-namespace>/dhi-node:24.11-debian13-fips \
  --predicate-type https://slsa.dev/provenance/v0.2 \
  --output provenance.json

Success looks like this:

✓ SBOM obtained from attestation, 32 packages found
✓ Provenance obtained from attestation
✓ Report written to provenance.json

The checkmarks confirm Docker Scout successfully retrieved and verified the attestation. Behind the scenes, Scout validated:

The attestation signature matches Docker’s signing key

The signature hasn’t expired

The attestation applies to this specific image digest

The attestation hasn’t been tampered with

If signature verification fails, Scout returns an error and won’t output the attestation file.

To learn more about available predicate types, check out the DHI verification documentation.
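If you want to eyeball the in-toto structure described earlier, a short jq query over the exported file works (the top-level subject and predicateType fields are an assumption based on the in-toto statement layout; your export may contain only the predicate):

jq '{subject: [.subject[].name], predicateType: .predicateType}' provenance.json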

Validating SLSA Provenance

Signatures prove attestations are authentic. Provenance shows where the image came from.

SLSA (Supply-chain Levels for Software Artifacts) is a security framework developed by Google, the Linux Foundation, and other industry partners. It defines levels of supply chain security maturity, from SLSA 0 (no guarantees) to SLSA 4 (highest assurance).

Docker Hardened Images target SLSA 3, which requires:

Build process fully scripted/automated

All build steps defined in version control

Provenance generated automatically by build service

Provenance includes source, builder, and build parameters

Using our previously extracted SLSA provenance.json, we can check the source repository and commit hash:

jq '.predicate.invocation.environment.github_repository' provenance.json

Output:

"docker-hardened-images/definitions"
jq '.predicate.invocation.environment.github_sha1' provenance.json

Output:

"698b367344efb3a7d443508782de331a84216ae4"

Similarly, you can see exactly what GitHub Actions workflow produced this image.

jq '.predicate.builder.id' provenance.json

Output:

"https://github.com/docker-hardened-images/definitions/actions/runs/18930640220/attempts/1"

For DHI Enterprise Users: Verifying High-Assurance Claims

While the free hardened images are built with security best practices, DHI Enterprise images carry the specific certifications required for FedRAMP, HIPAA, and financial audits. Here is how to verify those high-assurance claims.

FIPS 140-3 Validation

FIPS (Federal Information Processing Standard) 140-3 is a U.S. government standard for cryptographic modules. Think of it as a certification that proves the cryptography in your software has been tested and validated by independent labs against federal requirements.

If you’re building software for government agencies, financial institutions, or healthcare providers, FIPS compliance is often mandatory: without it, your software can’t be used in those environments!

Check if the image includes FIPS-certified cryptography:

docker scout attestation get registry://<your-org-namespace>/dhi-node:24.11-debian13-fips \
  --predicate-type https://docker.com/dhi/fips/v0.1 \
  --output fips-attestation.json

Output:

{
"certification": "CMVP #4985",
"certificationUrl": "https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/4985",
"name": "OpenSSL FIPS Provider",
"package": "pkg:dhi/openssl-provider-fips@3.1.2",
"standard": "FIPS 140-3",
"status": "active",
"sunsetDate": "2030-03-10",
"version": "3.1.2"
}

The certificate number (4985) is the key piece. This references a specific FIPS validation in the official NIST CMVP database.

STIG Compliance

STIG (Security Technical Implementation Guide) is the Department of Defense’s (DoD) checklist for securing systems. It’s a comprehensive security configuration standard needed for deploying software for defense or government work.

DHI images undergo STIG scanning before release. Docker uses a custom STIG based on the DoD’s General Operating System Security Requirements Guide. Each scan checks dozens of security controls and reports findings. You can extract and review STIG scan results:

docker scout attestation get registry://<your-org-namespace>/dhi-node:24.11-debian13-fips \
  --predicate-type https://docker.com/dhi/stig/v0.1 \
  --output stig-attestation.json

Check the STIG scan summary:

jq '.predicate[0].summary' stig-attestation.json

Output:

{
"failedChecks": 0,
"passedChecks": 91,
"notApplicableChecks": 107,
"totalChecks": 198,
"defaultScore": 100,
"flatScore": 91
}

This shows DHI passed all 91 applicable STIG controls with zero failed checks and a 100% score. The 107 “notApplicableChecks” typically refer to controls that are irrelevant to the specific minimal container environment or its configuration. For a complete list of STIG controls and DHI compliance details, including how to extract and view the full STIG scan report, see the DHI STIG documentation.

CIS Benchmark Hardening

CIS (Center for Internet Security) Benchmarks are security configuration standards created by security professionals across industries. Much like STIGs, they represent consensus best practices, but unlike government-mandated frameworks (FIPS, STIG), CIS benchmarks are community-developed.

CIS compliance isn’t legally required, but it demonstrates you’re following industry-standard security practices—valuable for customer trust and audit preparation.

You can verify CIS compliance through image labels:

docker inspect <your-org-namespace>/dhi-node:24.11-debian13-fips |
jq '.[0].Config.Labels["com.docker.dhi.compliance"]'

Output: "fips,stig,cis"

The CIS label indicates that an image is hardened according to the CIS Docker Benchmark.

What exactly is a SBOM used for?

Compliance frameworks tell you what standards you meet. The SBOM tells you what’s actually in your container—and that’s where the real security work begins.

Identifying Transitive Dependencies

When you add a package to your project, you see the direct dependency. What you don’t see: that package’s dependencies, and their dependencies, and so on. This is the transitive dependency problem.

A vulnerability in a transitive dependency you’ve never heard of can compromise your entire application. Real example: the Log4Shell vulnerability affected millions of applications because Log4j was a transitive dependency buried several levels deep in dependency chains.

Most vulnerabilities hide in transitive dependencies because:

Developers don’t know they exist

They’re not updated when the direct dependency updates

Scanning tools miss them without an SBOM

Minimal images reduce this risk dramatically. Fewer packages = fewer transitive dependencies = smaller attack surface.

Compare dependency counts:

Official Node.js image: 321 packages, ~1,500 dependency relationships

DHI Node.js image: 32 packages, ~150 dependency relationships

90% reduction in packages means 90% reduction in transitive dependency risk.
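Once you export the SPDX SBOM (the command is shown in the License Compliance Analysis section below), you can reproduce these counts yourself; packages and relationships are standard top-level arrays in SPDX JSON:

jq '.packages | length' node-sbom-spdx.json

jq '.relationships | length' node-sbom-spdx.json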

Scanning for Known (Exploitable) Vulnerabilities

With the SBOM extracted, scan for known vulnerabilities:

docker scout cves registry://<your-org-namespace>/dhi-node:24.11-debian13-fips

Output:

Target: <your-org-namespace>/dhi-node:24.11-debian13-fips

0C 0H 0M 8L

8 vulnerabilities found in 2 packages
CRITICAL 0
HIGH 0
MEDIUM 0
LOW 8

Zero critical, high, or medium severity vulnerabilities. Docker Scout cross-references the SBOM against multiple vulnerability databases (NVD, GitHub Security Advisories, etc.).

This is the payoff of minimal images: fewer packages means fewer potential vulnerabilities. The official Node.js image had 25 CVEs across CRITICAL, HIGH, and MEDIUM severities. The hardened version has zero actionable vulnerabilities—not because vulnerabilities were patched, but because vulnerable packages were removed entirely.

Understanding Exploitability with VEX

Not all CVEs are relevant to your deployment. A vulnerability in a library function your application never calls, or a flaw in a service that isn’t running, doesn’t pose real risk. Docker Hardened Images include signed VEX attestations that identify which reported CVEs are not actually exploitable in the image’s runtime context. This helps you distinguish between CVEs that exist in a package (reported), and CVEs that can actually be exploited given how the package is used in this specific image (exploitable). In other words, VEX reduces false positives.

Docker Scout applies VEX statements automatically when scanning DHI images: when you run docker scout cves, Scout uses VEX attestations to suppress vulnerabilities marked as non-exploitable.

You can see which CVEs have been evaluated with this command:

docker scout attestation get registry://<your-org-namespace>/dhi-node:24.11-debian13-fips \
  --predicate-type https://openvex.dev/ns/v0.2.0 \
  --output vex.json
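To skim the individual statements, a jq filter along these lines helps (the .predicate.statements path is an assumption based on the OpenVEX document layout; adjust it to the actual structure of your vex.json):

jq '.predicate.statements[] | {vulnerability: .vulnerability.name, status: .status}' vex.json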

License Compliance Analysis

When you use open source software, you’re bound by license terms. Some licenses (MIT, Apache) are permissive and you can use them freely, even in commercial products. Others (GPL, AGPL) are copyleft: they require you to release your source code if you distribute software using them.

SBOMs make license compliance visible. Without an SBOM, you’re blind to what licenses your containers include.

Export the SBOM in SPDX format:

docker scout sbom registry://<your-org-namespace>/dhi-node:24.11-debian13-fips \
  --format spdx \
  --output node-sbom-spdx.json

Analyze license distribution:

jq '.packages[].licenseConcluded' node-sbom-spdx.json |
sort | uniq -c | sort -rn

Output:

15 "MIT"
8 "Apache-2.0"
5 "GPL-2.0-or-later"
2 "BSD-3-Clause"
1 "OpenSSL"
1 "NOASSERTION"

In this example:

✅ MIT and Apache-2.0 are permissive (safe for commercial use)

⚠️ GPL-2.0-or-later requires review (is this a runtime dependency or build tool?)

⚠️ NOASSERTION needs investigation

Conclusion: What You’ve Proven

You’ve independently verified critical security claims Docker makes about Hardened Images:

Authenticity: Cryptographic signatures prove images are genuine and unmodified

Provenance: SLSA attestations trace builds to specific source commits in public repositories

Compliance: FIPS certification, STIG controls passed, and CIS benchmark hardening

Security posture: zero critical, high, or medium severity CVEs, with VEX statements documenting exploitability

Every claim you verified (except CIS) has a corresponding attestation you can check yourself, audit, and validate in your CI/CD pipeline.

You can customize a Docker Hardened Image (DHI) to suit your specific needs using the Docker Hub UI. This allows you to select a base image, add packages, add OCI artifacts (such as custom certificates or additional tools), and configure settings. In addition, the build pipeline ensures that your customized image is built securely and includes attestations.

In Part 3, we’ll cover how to customize Docker Hardened Images to suit your specific needs, while keeping all the benefits we just explored.

You’ve confirmed DHI delivers on security promises. Next, we’ll make it operational.

If you missed reading part 1, where we discussed how you can get to 100% vulnerability elimination and 90% package reduction, read the blog here.


How to Automate Arm Migration with Docker MCP Toolkit, VS Code, and GitHub Copilot

This post is a collaboration between Docker and Arm, demonstrating how Docker MCP Toolkit and the Arm MCP Server work together to simplify architecture migrations.

Moving workloads from x86 to Arm64 has become increasingly important as organizations seek to reduce cloud costs and improve performance. AWS Graviton, Azure Cobalt, and Google Cloud Axion have made Arm-based computing mainstream, promising 20-40% cost savings and better performance for many workloads.

But here’s the challenge: How do you migrate your applications to Arm without breaking things?

Traditional migration approaches require:

Manual code analysis for x86-specific dependencies

Tedious compatibility checks across multiple tools

Manual performance evaluation

What if you could orchestrate the entire Arm migration workflow from a single interface? Docker MCP Toolkit makes this possible. 

By connecting specialized Arm migration tools directly to GitHub Copilot, you can automate compatibility analysis, intrinsic conversion, and performance prediction—all through natural conversation in VS Code.

Here’s what that looks like in practice: You ask GitHub Copilot to migrate your legacy C++ application to Arm64. Copilot doesn’t just tell you what needs changing—it actually executes: scanning your code for x86 intrinsics, converting x86 SIMD intrinsics to Arm SIMD intrinsics, updating your Dockerfile, predicting Arm performance improvements, and creating a pull request with all changes. All through natural conversation in VS Code. No manual porting. No up-front architecture expertise required.

If you have questions about any step in the process, you can ask Copilot directly, and it will invoke the Arm MCP Server knowledge base tool. The knowledge base has information pulled directly from all Learning Paths on learn.arm.com, as well as knowledge of all Arm intrinsics, and will both summarize that information for you and provide links to the concrete documentation so you can read it yourself.

Now you might ask – “Can’t I just rebuild my Docker image for Arm64?” True, for most applications. But when you hit that one legacy app with hand-optimized x86 assembly, AVX2 intrinsics, or architecture-specific compiler flags? That’s when Docker MCP Toolkit with the Arm MCP Server becomes essential.

By the end of this guide, you’ll migrate a real-world legacy application—a matrix multiplication benchmark written with AVX2 intrinsics for x86—to Arm64 automatically using GitHub Copilot and Docker MCP Toolkit.

What normally takes 5-7 hours of manual work will take you about 25 to 30 minutes.

The Arm Migration Challenge

Let me show you exactly what we’re solving. Consider a matrix multiplication benchmark originally written for x86-64 with AVX2 optimizations—the kind of code that makes Arm migration painful.

Here’s a Dockerfile that will cause problems when trying to migrate to Graviton:

FROM centos:6

# CentOS 6 reached EOL, need to use vault mirrors
RUN sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-Base.repo && \
    sed -i 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-Base.repo

# Install EPEL repository (required for some development tools)
RUN yum install -y epel-release && \
    sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/epel.repo && \
    sed -i 's|^#baseurl=http://download.fedoraproject.org/pub/epel|baseurl=http://archives.fedoraproject.org/pub/archive/epel|g' /etc/yum.repos.d/epel.repo

# Install Developer Toolset 2 for better C++11 support (GCC 4.8)
RUN yum install -y centos-release-scl && \
    sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-SCLo-scl.repo && \
    sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-SCLo-scl-rh.repo && \
    sed -i 's|^# baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-SCLo-scl.repo && \
    sed -i 's|^# baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-SCLo-scl-rh.repo

# Install build tools
RUN yum install -y \
    devtoolset-2-gcc \
    devtoolset-2-gcc-c++ \
    devtoolset-2-binutils \
    make \
    && yum clean all

WORKDIR /app
COPY *.h *.cpp ./

# AVX2 intrinsics are used in the code
RUN scl enable devtoolset-2 "g++ -O2 -mavx2 -o benchmark \
    main.cpp \
    matrix_operations.cpp \
    -std=c++11"

CMD ["./benchmark"]

Now you might ask why this won’t work on Arm? Looking at this Dockerfile, there are two immediate blockers for Graviton migration:

No Arm64 support in base image – The centos:6 image was built for x86 only, so this container won’t even start on Arm hardware.

x86-specific compiler flag – The -mavx2 flag tells the compiler to use AVX2 vector instructions, which don’t exist on Arm processors.

Even experienced developers miss these issues in larger codebases.

The source code uses AVX2 intrinsics for vectorized operations:

#include "matrix_operations.h"
#include <iostream>
#include <random>
#include <chrono>
#include <stdexcept>
#include <immintrin.h> // AVX2 intrinsics

Matrix::Matrix(size_t r, size_t c) : rows(r), cols(c) {
data.resize(rows, std::vector<double>(cols, 0.0));
}

void Matrix::randomize() {
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<> dis(0.0, 10.0);

for (size_t i = 0; i < rows; i++) {
for (size_t j = 0; j < cols; j++) {
data[i][j] = dis(gen);
}
}
}

Matrix Matrix::multiply(const Matrix& other) const {
if (cols != other.rows) {
throw std::runtime_error("Invalid matrix dimensions for multiplication");
}

Matrix result(rows, other.cols);

// x86-64 optimized using AVX2 for double-precision
for (size_t i = 0; i < rows; i++) {
for (size_t j = 0; j < other.cols; j++) {
__m256d sum_vec = _mm256_setzero_pd();
size_t k = 0;

// Process 4 elements at a time with AVX2
for (; k + 3 < cols; k += 4) {
__m256d a_vec = _mm256_loadu_pd(&data[i][k]);
__m256d b_vec = _mm256_set_pd(
other.data[k+3][j],
other.data[k+2][j],
other.data[k+1][j],
other.data[k][j]
);
sum_vec = _mm256_add_pd(sum_vec, _mm256_mul_pd(a_vec, b_vec));
}

// Horizontal add using AVX
__m128d sum_high = _mm256_extractf128_pd(sum_vec, 1);
__m128d sum_low = _mm256_castpd256_pd128(sum_vec);
__m128d sum_128 = _mm_add_pd(sum_low, sum_high);

double sum_arr[2];
_mm_storeu_pd(sum_arr, sum_128);
double sum = sum_arr[0] + sum_arr[1];

// Handle remaining elements
for (; k < cols; k++) {
sum += data[i][k] * other.data[k][j];
}

result.data[i][j] = sum;
}
}

return result;
}

double Matrix::sum() const {
double total = 0.0;
for (size_t i = 0; i < rows; i++) {
for (size_t j = 0; j < cols; j++) {
total += data[i][j];
}
}
return total;
}

void benchmark_matrix_ops() {
std::cout << "n=== Matrix Multiplication Benchmark ===" << std::endl;

const size_t size = 200;
Matrix a(size, size);
Matrix b(size, size);

a.randomize();
b.randomize();

auto start = std::chrono::high_resolution_clock::now();
Matrix c = a.multiply(b);
auto end = std::chrono::high_resolution_clock::now();

auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

std::cout << "Matrix size: " << size << "x" << size << std::endl;
std::cout << "Time: " << duration.count() << " ms" << std::endl;
std::cout << "Result sum: " << c.sum() << std::endl;
}

Looking at this code, you can see it is heavily optimized for Intel/AMD x86 processors and won’t work on Arm:

x86-exclusive header – #include <immintrin.h> only exists on x86 systems. Arm uses <arm_neon.h> instead.

AVX2 intrinsics throughout – Every _mm256_* function is Intel-specific:

_mm256_setzero_pd() – Creates a 256-bit zero vector (Arm NEON is 128-bit)

_mm256_loadu_pd() – Loads 4 doubles at once (NEON loads 2)

_mm256_set_pd() – Sets 4 doubles (no direct NEON equivalent)

_mm256_add_pd() / _mm256_mul_pd() – 256-bit operations (NEON uses 128-bit)

_mm256_extractf128_pd() – Extracts high 128 bits (not needed on NEON)

Vector width mismatch – AVX2 processes 4 doubles per operation, while Arm NEON processes 2. The entire loop structure needs adjustment. (SVE/SVE2 on newer Arm cores (Neoverse V1/V2, Graviton 3/4) provides 256-bit or wider vector-length agnostic (VLA) registers, matching or exceeding AVX2 registers.)

Horizontal reduction logic – The horizontal add pattern using _mm256_extractf128_pd and _mm256_castpd256_pd128 is x86-specific and must be completely rewritten for Arm SIMD.

Manual conversion requires rewriting 30+ lines of intrinsic code, adjusting loop strides, and testing numerical accuracy. This is exactly where automated migration tools become essential.

Each of these issues blocks Arm migration in different ways. Manual migration requires not just converting intrinsics, but also modernizing the entire build infrastructure, finding Arm equivalents, and validating performance. For any substantial codebase, this becomes prohibitively expensive.

What GitHub Copilot Can and Can’t Do Without Arm MCP

Let’s be clear about what changes when you add the Arm MCP Server to Docker MCP Toolkit.

Without Arm MCP

You ask GitHub Copilot to migrate your C++ application from x86 to Arm64. Copilot responds with general advice: “Convert AVX2 intrinsics to NEON”, “Update your Dockerfile to use ARM64 base image”, “Change compiler flags”. Then you must manually research NEON equivalents, rewrite hundreds of lines of intrinsic code, update the Dockerfile yourself, hope you got the conversion right, and spend hours debugging compilation errors.

Yes, Copilot can write code. But without specialized tools, it’s guessing based on training data—not using concrete knowledge base documentation or using purpose-built tools to analyze your actual application architecture.

With Arm MCP + Docker MCP Toolkit

You ask GitHub Copilot the same thing. Within minutes, it:

Uses check_image tool to verify your base image supports ARM64

Runs migrate_ease_scan on your actual codebase to find x86-specific code

Uses knowledge_base_search to find correct Arm SIMD equivalents for every x86 intrinsic

Converts your code with architecture-specific accuracy

Updates your Dockerfile with Arm-compatible base images

Creates a pull request with all changes.

Real code gets scanned. Real intrinsics get converted. Real pull requests appear in your repository. Close VS Code, come back tomorrow, and the migration is ready to test, complete with documentation explaining every change.

The difference? Docker MCP Toolkit gives GitHub Copilot access to actual Arm migration tooling, not just general knowledge about Arm architecture.

Why This Is Different from Manual Migration

You could manually use Arm migration tools: install utilities locally, run checks, research intrinsics, update code. Here’s what that process looks like:

Manual process:

Install Arm migration tools (15 minutes)

Run compatibility scans (5 minutes)

Research each x86 intrinsic equivalent (30 minutes per intrinsic)

Manually rewrite code (2-3 hours)

Update Dockerfile (15 minutes)

Fix compilation errors (1-2 hours)

Document changes (30 minutes)

Total: 5-7 hours per application

With Docker MCP Toolkit + Arm MCP:

Ask GitHub Copilot to migrate (20 minutes)

Review and approve changes (10-20 minutes)

Merge pull request

Total: 30-40 minutes per application

Setting Up Visual Studio Code with Docker MCP Toolkit

Prerequisites

Before you begin, make sure you have:

A machine with 8 GB RAM minimum (16GB recommended)

The latest Docker Desktop release

VS Code with GitHub Copilot extension

GitHub account with personal access token

Step 1. Enable Docker MCP Toolkit

Open Docker Desktop and enable the MCP Toolkit from Settings.

To enable:

Open Docker Desktop

Go to Settings → Beta Features

Toggle Docker MCP Toolkit ON

Click Apply

Caption: Enabling Docker MCP Toolkit under Docker Desktop 

Add Required MCP Servers from Catalog

Add Arm, Sequential Thinking, and GitHub Official by following the links below, or by selecting “Catalog” in the Docker Desktop MCP Toolkit:

Arm MCP Server – Arm migration tools and architecture expertise

GitHub MCP Server – Repository operations and pull request management

Sequential Thinking MCP Server – Complex problem decomposition and planning

Caption: Searching for Arm MCP Server in the Docker MCP Catalog

Step 2. Configure the Servers

Configure the Arm MCP Server

To access your local code for the migrate-ease scan and MCA tools, the Arm MCP Server needs a directory configured to point to your local code.

Caption: Arm MCP Server configuration

Once you click ‘Save’, the Arm MCP Server will know where to look for your code. If you want to give a different directory access in the future, you’ll need to change this path.

Available Arm Migration Tools

Click Tools to view all the six MCP tools available under Arm MCP Server.

Caption: List of MCP tools provided by the Arm MCP Server

knowledge_base_search – Semantic search of Arm learning resources, intrinsics documentation, and software compatibility

migrate_ease_scan – Code scanner supporting C++, Python, Go, JavaScript, and Java for Arm compatibility analysis

check_image – Docker image architecture verification (checks if images support Arm64)

skopeo – Remote container image inspection without downloading

mca – Machine Code Analyzer for assembly performance analysis and IPC predictions

sysreport_instructions – System architecture information gathering

Configure GitHub MCP Server

The GitHub MCP Server lets GitHub Copilot create pull requests, manage issues, and commit changes.

Caption: Steps to configure GitHub Official MCP Server

Configure Authentication:

Select GitHub official

Choose your preferred authentication method 

For Personal Access Token, you’ll need to get the token from GitHub > Settings > Developer Settings

Caption: Setting up Personal Access Token in GitHub MCP Server

Configure Sequential Thinking MCP Server

Click “Sequential Thinking”

No configuration needed

Caption: Sequential MCP Server requires zero configuration

This server helps GitHub Copilot break down complex Arm migration decisions into logical steps.

Step 3. Add the Servers to VS Code

The Docker MCP Toolkit makes it incredibly easy to configure MCP servers for clients like VS Code.

To configure, click “Clients” and scroll down to Visual Studio Code. Click the “Connect” button:

Caption: Setting up Visual Studio Code as MCP Client

Now open VS Code and click on the ‘Extensions’ icon in the left toolbar:

Caption: Configuring MCP_DOCKER under VS Code Extensions

Click the MCP_DOCKER gear, and click ‘Start Server’:

Caption: Starting MCP Server under VS Code

Now you’re ready to perform an Arm migration!

Step 4. Verify Connection

Open GitHub Copilot Chat in VS Code and ask:

What Arm migration tools do you have access to?

You should see tools from all three servers listed. If you see them, your connection works. Let’s migrate some code.

Caption: Playing around with GitHub Copilot

Real-World Demo: Migrating a Legacy x86 Application

Now that you’ve connected GitHub Copilot to Docker MCP Toolkit, let’s migrate that matrix multiplication benchmark we looked at earlier.

Time to migrate: 20 minutes

Infrastructure: $0 (all runs in Docker containers)

Prerequisites: The code we showed earlier in this post

The Workflow

Docker MCP Toolkit orchestrates the migration through a secure MCP Gateway that routes requests to specialized tools: the Arm MCP Server scans code and converts intrinsics, GitHub MCP Server creates pull requests, and Sequential Thinking plans multi-step migrations. Each tool runs in an isolated Docker container: secure, reproducible, and under your control.

Step 1. Clone the repo

git clone https://github.com/JoeStech/docker-blog-arm-migration

Give GitHub Copilot Migration Instructions

Open your project in VS Code. In GitHub Copilot Chat, paste this prompt:

Your goal is to migrate this codebase from x86 to Arm64. Use the Arm MCP Server tools to help you with this migration.

Steps to follow:
1. Check all Dockerfiles – use check_image and/or skopeo tools to verify Arm compatibility, changing the base image if necessary
2. Scan the codebase – run migrate_ease_scan with the appropriate language scanner and apply the suggested changes
3. Use knowledge_base_search when you need Arm architecture guidance or intrinsic equivalents
4. Update compiler flags and dependencies for Arm64 compatibility
5. **Create a pull request with all changes using GitHub MCP Server**

Important notes:
– Your current working directory is mapped to /workspace on the MCP server
– NEON lane indices must be compile-time constants, not variables
– If you're unsure about Arm equivalents, use knowledge_base_search to find documentation
– Be sure to find out from the user or system what the target machine is, and use the appropriate intrinsics. For instance, if neoverse (Graviton, Axion, Cobalt) is targeted, use the latest SME/SME2.

**After completing the migration:**
– Create a pull request with a detailed description of changes
– Include performance predictions and cost savings in the PR description
– List all tools used and validation steps needed

Step 2. Watch Docker MCP Toolkit Execute

GitHub Copilot orchestrates the migration using Docker MCP Toolkit. Here’s what happens:

Phase 1: Image Analysis

GitHub Copilot starts by analyzing the Dockerfile’s base image using the Arm MCP Server’s skopeo tool.

Caption: GitHub Copilot uses the skopeo tool from the Arm MCP Server to analyze the centos:6 base image. The tool reports that this image has no arm64 build available. This is the first blocker identified – the container won’t even start on Arm hardware.

This immediately identifies that CentOS 6 has no Arm64 builds and must be replaced.

Phase 2: Code Analysis

Next, Copilot runs the migrate_ease_scan tool with the C++ scanner on the codebase.

Caption: The migrate_ease_scan tool analyzes the C++ source code and detects AVX2 intrinsics, the -mavx2 compiler flag, and x86-specific headers. This automated scan identifies all architecture-dependent code that requires conversion – work that could take hours to find manually.

The scan results show exactly what needs to change for Arm compatibility. Each detected issue includes the file location, line number, and specific code that requires modification. This precision eliminates guesswork and ensures nothing is missed.

Phase 3: Arm Optimization and Best Practices

For the x86 intrinsics found in Phase 2, Copilot queries the Arm MCP Server’s knowledge base for Arm equivalents as needed, then makes replacements.

Caption: GitHub Copilot uses the knowledge_base_search tool to find Arm NEON equivalents for each AVX2 intrinsic.

The tool returns official Arm documentation showing the conversions: _mm256_loadu_pd() becomes vld1q_f64(), _mm256_add_pd() becomes vaddq_f64(), and so on. This knowledge comes from learn.arm.com learning paths and intrinsic documentation.

The knowledge base provides not just the conversion mappings, but also architectural context: AVX2’s 256-bit vectors vs NEON’s 128-bit vectors, which means loop adjustments are needed. Copilot uses this information to rewrite the matrix multiplication code correctly.
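To make the width difference concrete, here is a simplified before/after of a vectorized loop over doubles. This is an illustration of the conversion pattern only, not the repository’s actual matrix code:

#include <arm_neon.h>

// Illustrative sketch: AVX2 processes 4 doubles per 256-bit register, so the
// original loop advanced by 4; NEON's 128-bit registers hold 2 doubles, so the
// converted loop advances by 2 and keeps a scalar tail for leftover elements.
void add_arrays(const double *a, const double *b, double *out, int n) {
    int i = 0;
    for (; i + 2 <= n; i += 2) {
        float64x2_t va = vld1q_f64(a + i);      // was _mm256_loadu_pd
        float64x2_t vb = vld1q_f64(b + i);
        vst1q_f64(out + i, vaddq_f64(va, vb));  // was _mm256_add_pd + _mm256_storeu_pd
    }
    for (; i < n; i++) {
        out[i] = a[i] + b[i];                   // scalar tail
    }
}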

Phase 4: Create the GitHub PR and Summarize

After completing the migration, Copilot creates a PR in GitHub and summarizes the changes made.

The changes are substantial: 

Replaced centos:6 → ubuntu:22.04, added TARGETARCH for multi-arch builds

Added Arm64 detection and -march=armv8-a+simd compiler flag

Converted AVX2 → NEON intrinsics with architecture guards

The build is now simpler, modern, and Arm-compatible.
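The Dockerfile pattern behind those changes looks roughly like the sketch below. The file layout and make invocation are hypothetical; see the actual pull request for the real diff:

# Multi-arch base image plus TARGETARCH-based flag selection (sketch only)
FROM ubuntu:22.04 AS build
ARG TARGETARCH
RUN apt-get update && apt-get install -y --no-install-recommends g++ make
WORKDIR /src
COPY . .
# Architecture guards in the C++ code pick NEON or AVX2 intrinsics at compile time
RUN if [ "$TARGETARCH" = "arm64" ]; then \
      make CXXFLAGS="-O3 -march=armv8-a+simd"; \
    else \
      make CXXFLAGS="-O3 -mavx2"; \
    fi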

Phase 5: Checking the Pull Request

You can verify the Pull Request by visiting https://github.com/JoeStech/docker-blog-arm-migration/pull/1/

To verify performance, you can build and run the benchmark:

docker buildx build --platform linux/arm64 -t benchmark:arm64 . --load

docker run --rm benchmark:arm64

This should output:

SIMD Matrix Operations Benchmark
================================
Running on Arm64 architecture with NEON optimizations
=== Matrix Multiplication Benchmark ===
Matrix size: 200×200
Time: 17 ms
Result sum: 1.98888e+08

Caveats

Not all models produce equal results: the Arm MCP Server provides deterministic context, but the models themselves are stochastic. Use a flagship, latest-generation model for best results, and validate any predictions the model makes about performance improvements.

How Docker MCP Toolkit Changes Development

Docker MCP Toolkit changes how developers interact with specialized knowledge and capabilities. Rather than learning new tools, installing dependencies, or managing credentials, developers connect their AI assistant once and immediately access containerized expertise.

The benefits extend beyond Arm migration:

Consistency – Same tools, same results across all developers

Security – Containerized isolation prevents tool interference

Version Control – MCP server versions tracked with application code

Reproducibility – Migrations behave identically across environments

Discoverability – Docker MCP Catalog makes finding the right server straightforward

Most importantly, developers remain in their existing workflow. VS Code. GitHub Copilot. Git. No context switching to external tools or dashboards.

Wrapping Up

You’ve just automated Arm64 migration using Docker MCP Toolkit, the Arm MCP Server, and GitHub Copilot. What used to require architecture expertise, manual intrinsic conversion, and hours of debugging now happens through natural conversation, safely executed in Docker containers.

Ready to try it? Open Docker Desktop and explore the MCP Catalog. Start with the Arm MCP Server, add GitHub, experiment with Sequential Thinking. Each server unlocks new capabilities.

The future of migration isn’t manually porting every application. It’s having an AI assistant that can execute tasks across your entire stack securely, reproducibly, and at the speed of thought.

Learn More

New to Docker? Download Docker Desktop

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Get Started with MCP Toolkit: Official Documentation

Source: https://blog.docker.com/feed/

Your Dependencies Don’t Care About Your FIPS Configuration

FIPS compliance is a great idea that makes the entire software supply chain safer. But teams adopting FIPS-enabled container images are running into strange errors that can be challenging to debug. What they are learning is that correctness at the base image layer does not guarantee compatibility across the ecosystem. Change is complicated, and changing complicated systems with intricate dependency webs often yields surprises. We are in the early adoption phase of FIPS, and that actually provides interesting opportunities to optimize how things work. Teams that recognize this will rethink how they build for FIPS and get ahead of the game.

FIPS in practice

FIPS is a U.S. government standard for cryptography. In simple terms, if you say a system is “FIPS compliant,” that means the cryptographic operations like TLS, hashing, signatures, and random number generation are performed using a specific, validated crypto module in an approved mode. That sounds straightforward until you remember that modern software is built not as one compiled program, but as a web of dependencies that carry their own baggage and quirks.
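As a quick sanity check of that last point, OpenSSL 3.x can report which providers are loaded; in a correctly configured FIPS image you would expect the fips provider to appear as active:

# List the loaded OpenSSL providers inside the container
openssl list -providers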

The FIPS crypto error that caught us off guard

We got a ticket recently for a Rails application in a FIPS-enabled container image. On the surface, everything looked right. Ruby was built to use OpenSSL 3.x with the FIPS provider. The OpenSSL configuration was correct. FIPS mode was active.

However, the application started throwing cryptography module errors from the pg Rubygem (the Postgres driver). Even more confusing, a minimal reproducer using a basic Ruby app and a stock Postgres did not trigger the error; the connection was established successfully. The issue only manifested when using ActiveRecord.

The difference came down to code paths. A basic Ruby script using the pg gem directly exercises a simpler set of operations. ActiveRecord triggers additional functionality that exercises different parts of libpq. The non-FIPS crypto was there all along, but only certain operations exposed it.

Your container image can be carefully configured for FIPS, and your application can still end up using non-FIPS crypto because a dependency brought its own crypto along for the ride. In this case, the culprit was a precompiled native artifact associated with the database stack. When you install pg, Bundler may choose to download a prebuilt binary dependency such as libpq.

Unfortunately those prebuilt binaries are usually built with assumptions that cause problems. They may be linked against a different OpenSSL than the one in your image. They may contain statically embedded crypto code. They may load crypto at runtime in a way that is not obvious.

This is the core challenge with FIPS adoption. Your base image can do everything right, but prebuilt dependencies can silently bypass your carefully configured crypto boundary.

Why we cannot just fix it in the base image yet

The practical fix for the Ruby case was adding this to your Gemfile.

gem "pg", "~> 1.1", force_ruby_platform: true

You also need to install libpq-dev to allow compiling from source. This forces Bundler to build the gem from source on your system instead of using a prebuilt binary. When you compile from source inside your controlled build environment, the resulting native extension is linked against the OpenSSL that is actually in your FIPS image.

Bundler also supports an environment/config knob for the same idea called BUNDLE_FORCE_RUBY_PLATFORM. The exact mechanism matters less than the underlying strategy of avoiding prebuilt native artifacts when you are trying to enforce a crypto boundary.

You might reasonably ask why we do not just add BUNDLE_FORCE_RUBY_PLATFORM to the Ruby FIPS image by default. We discussed this internally, and the answer illustrates why FIPS complexity cascades.

Setting that flag globally is not enough on its own. You also need a C compiler and the relevant libraries and headers in the build stage. And not every gem needs this treatment. If you flip the switch globally, you end up compiling every native gem from source, which drags in additional headers and system libraries that you now need to provide. The “simple fix” creates a new dependency management problem.

Teams adopt FIPS images to satisfy compliance. Then they have to add back build complexity to make the crypto boundary real and verify that every dependency respects it. This is not a flaw in FIPS or in the tooling. It is an inherent consequence of retrofitting a strict cryptographic boundary onto an ecosystem built around convenience and precompiled artifacts.

The patterns we are documenting today will become the defaults tomorrow. The tooling will catch up. Prebuilt packages will get better. Build systems will learn to handle the edge cases. But right now, teams need to understand where the pitfalls are.

What to do if you are starting a FIPS journey

You do not need to become a crypto expert to avoid the obvious traps. You only need a checklist mindset. The teams working through these problems now are building real expertise that will be valuable as FIPS requirements expand across industries.

Treat prebuilt native dependencies as suspect. If a dependency includes compiled code, assume it might carry its own crypto linkage until you verify otherwise. You can use ldd on Linux to inspect dynamic linking and confirm that binaries link against your system OpenSSL rather than a bundled alternative.
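For example, one way to do that for the pg gem (the exact paths depend on your Ruby and Bundler layout) is:

# Locate the gem's compiled extension and check which SSL/crypto libraries it links against
find "$(bundle info pg --path)" -name '*.so' -exec ldd {} \; | grep -iE 'ssl|crypto'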

Use a multi-stage build and compile where it matters. Keep your runtime image slim, but allow a builder stage with the compiler and headers needed to compile the few native pieces that must align with your FIPS OpenSSL.
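A rough sketch of that pattern is shown below; the base image name, bundle path, and app layout are placeholders for whatever your FIPS image actually uses:

# Build stage: carries the toolchain and headers needed to compile native gems from source
FROM your-ruby-fips-image AS build
RUN apt-get update && apt-get install -y --no-install-recommends build-essential libpq-dev
WORKDIR /app
COPY Gemfile Gemfile.lock ./
# Force native gems to compile against the OpenSSL shipped in this image
ENV BUNDLE_FORCE_RUBY_PLATFORM=true
RUN bundle install

# Runtime stage: slim image without the compiler toolchain
FROM your-ruby-fips-image
RUN apt-get update && apt-get install -y --no-install-recommends libpq5
WORKDIR /app
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY . .
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]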

Test the real execution path, not just “it starts.” For Rails, that means running a query, not only booting the app or opening a connection. The failures we saw appeared when using the ORM, not on first connection.

Budget for supply-chain debugging. The hard part is not turning on FIPS mode. The hard part is making sure all the moving parts actually respect it. Expect to spend time tracing crypto usage through your dependency graph.

Why this matters beyond government contracts

FIPS compliance has traditionally been seen as a checkbox for federal sales. That is changing. As supply chain security becomes a board-level concern across industries, validated cryptography is moving from “nice to have” to “expected.” The skills teams build solving FIPS problems today translate directly to broader supply chain security challenges.

Think about what you learn when you debug a FIPS failure. You learn to trace crypto usage through your dependency graph, to question prebuilt artifacts, to verify that your security boundaries are actually enforced at runtime. Those skills matter whether you are chasing a FedRAMP certification or just trying to answer your CISO’s questions about software provenance.

The opportunity in the complexity

FIPS is not “just a switch” you flip in a base image. View FIPS instead as a new layer of complexity that you might have to debug across your dependency graph. That can sound like bad news, but switch the framing and it becomes an opportunity to get ahead of where the industry is going.

The ecosystem will adapt and the tooling will improve. The teams investing in understanding these problems now will be the ones who can move fastest when FIPS or something like it becomes table stakes.

If you are planning a FIPS rollout, start by controlling the prebuilt native artifacts that quietly bypass the crypto module you thought you were using. Recognize that every problem you solve is building institutional knowledge that compounds over time. This is not just compliance work. It is an investment in your team’s security engineering capability.

Source: https://blog.docker.com/feed/

Using MCP Servers: From Quick Tools to Multi-Agent Systems

Model Context Protocol (MCP) servers implement a spec for exposing tools, models, or services to language models through a common interface. Think of them as smart adapters: they sit between a tool and the LLM, speaking a predictable protocol that lets the model interact with things like APIs, databases, and agents without needing to know implementation details.

But like most good ideas, the devil’s in the details.

The Promise—and the Problems of Running MCP Servers

Running an MCP server sounds simple: spin up a Python or Node server that exposes your tool. Done, right? Not quite.

You run into problems fast:

Runtime friction: If an MCP is written in Python, your environment needs Python (plus dependencies, plus maybe a virtualenv strategy, plus maybe GPU drivers). Same goes for Node. This multiplies fast when you’re managing many MCPs or deploying them across teams.

Secrets management: MCPs often need credentials (API keys, tokens, etc.). You need a secure way to store and inject those secrets into your MCP runtime. That gets tricky when different teams, tools, or clouds are involved.

N×N integration pain: Let’s say you’ve got three clients that want to consume MCPs, and five MCPs to serve up. Now you’re looking at 15 individual integrations. No thanks.

To make MCPs practical, you need to solve these three core problems: runtime complexity, secret injection, and client-to-server wiring. 

If you’re wondering where I’m going with all this, take a look at those problems. We already have a technology that has been used by developers for over a decade that helps solve them: Docker containers.

In the rest of this blog I’ll walk through three different approaches, going from least complex to most complex, for integrating MCP servers into your developer experience. 

Option 1 — Docker MCP Toolkit & Catalog

For the developer who already uses containers and wants a low-friction way to start with MCP.

If you’re already comfortable with Docker but just getting your feet wet with MCP, this is the sweet spot. In the raw MCP world, you’d clone Python/Node servers, manage runtimes, inject secrets yourself, and hand-wire connections to every client. That’s exactly the pain Docker’s MCP ecosystem set out to solve.

Docker’s MCP Catalog is a curated, containerized registry of MCP servers. Each entry is a prebuilt container with everything you need to run the MCP server. 

The MCP Toolkit (available via Docker Desktop) is your control panel: search the catalog, launch servers with secure defaults, and connect them to clients.

How it helps:

No language runtimes to install

Built-in secrets management

One-click enablement via Docker Desktop

Easily wire the MCPs to your existing agents (Claude Desktop, Copilot in VS Code, etc)

Centralized access via the MCP Gateway

Figure 1: Docker MCP Catalog: Browse hundreds of MCP servers with filters for local or remote and clear distinctions between official and community servers

A Note on the MCP Gateway

One important piece working behind the scenes in both the MCP Toolkit and cagent (a framework for easily building multi-agent applications that we cover below) is the MCP Gateway, an open-source project from Docker that acts as a centralized frontend for all your MCP servers. Whether you’re using a GUI to start containers or defining agents in YAML, the Gateway handles all the routing, authentication, and translation between clients and tools. It also exposes a single endpoint that custom apps or agent frameworks can call directly, making it a clean bridge between GUI-based workflows and programmatic agent development.

Moving on: Using MCP servers alongside existing AI agents is often the first step for many developers. You wire up a couple tools, maybe connect to a calendar or a search API, and use them in something like Claude, ChatGPT, or a small custom agent. For step-by-step tutorials on how to automate dev workflows with Docker’s MCP Catalog and Toolkit with popular clients, check out these guides on ChatGPT, Claude Desktop, Codex, Gemini CLI, and Claude Code. Once that pattern clicks, the next logical step is to use those same MCP servers as tools inside a multi-agent system.

Option 2 — cagent: Declarative Multi-Agent Apps

For the developer who wants to build custom multi-agent applications but isn’t steeped in traditional agentic frameworks.

If you’re past simple MCP servers and want agents that can delegate, coordinate, and reason together, cagent is your next step. It’s Docker’s open-source, YAML-first framework for defining and running multi-agent systems—without needing to dive into complex agent SDKs or LLM loop logic.

Cagent lets you describe:

The agents themselves (model, role, instructions)

Who delegates to whom

What tools each agent can access (via MCP or local capabilities)

Below is an example of a pirate-flavored chatbot:

agents:
  root:
    description: An agent that talks like a pirate
    instruction: Always answer by talking like a pirate.
    welcome_message: |
      Ahoy! I be yer pirate guide, ready to set sail on the seas o' knowledge! What be yer quest?
    model: auto

cagent run agents.yaml

You don’t write orchestration code. You describe what you want, and Cagent runs the system.

Why it works:

Tools are scoped per agent

Delegation is explicit

Uses MCP Gateway behind the scene

Ideal for building agent systems without writing Python

If you’d like to give cagent a try, we have a ton of examples in the project’s GitHub repository. Check out this guide on building multi-agent systems in 5 minutes. 

Option 3 — Traditional Agent Frameworks (LangGraph, CrewAI, ADK)

For developers building complex, custom, fully programmatic agent systems.

Traditional agent frameworks like LangGraph, CrewAI, or Google’s Agent Development Kit (ADK) let you define, control, and orchestrate agent behavior directly in code. You get full control over logic, state, memory, tools, and workflows.

They shine when you need:

Complex branching logic

Error recovery, retries, and persistence

Custom memory or storage layers

Tight integration with existing backend code

Example: LangGraph + MCP via Gateway

import requests
from langgraph.graph import StateGraph, MessagesState, START, END
from langchain_core.tools import Tool
from langchain_openai import ChatOpenAI

# Discover MCP endpoint from Gateway
resp = requests.get("http://localhost:6600/v1/servers")
servers = resp.json()["servers"]
duck_url = next(s["url"] for s in servers if s["name"] == "duckduckgo")

# Define a callable tool that proxies the request to the MCP server
def mcp_search(query: str) -> str:
    return requests.post(duck_url, json={"input": query}).json()["output"]

search_tool = Tool(name="web_search", func=mcp_search, description="Search via MCP")

# Wire it into a LangGraph loop
llm = ChatOpenAI(model="gpt-4").bind_tools([search_tool])

def agent(state: MessagesState):
    # Call the model with the accumulated message history
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("agent", agent)
graph.add_edge(START, "agent")
# A full agent would add a tool-executor node and a conditional edge back to "agent"
graph.add_edge("agent", END)

app = graph.compile()
app.invoke({"messages": [("user", "What's the latest in EU AI regulation?")]})

In this setup, you decide which tools are available. The agent chooses when to use them based on context, but you’ve defined the menu. And yes, this is still true in the Docker MCP Toolkit: you decide what to enable. The LLM can’t call what you haven’t made visible.

Choosing the Right Approach

| Approach | Best For | You Manage | You Get |
| --- | --- | --- | --- |
| Docker MCP Toolkit + Catalog | Devs new to MCP, already using containers | Tool selection | One-click setup, built-in secrets, Gateway integration |
| cagent | YAML-based multi-agent apps without custom code | Roles & tool access | Declarative orchestration, multi-agent workflows |
| LangGraph / CrewAI / ADK | Complex, production-grade agent systems | Full orchestration | Max control over logic, memory, tools, and flow |

Wrapping Up

Whether you’re just connecting a tool to Claude, designing a custom multi-agent system, or building production workflows by hand, Docker’s MCP tooling helps you get started easily and securely.

Check out the Docker MCP Toolkit, cagent, and MCP Gateway for example code, docs, and more ways to get started.

Source: https://blog.docker.com/feed/

Making (Very) Small LLMs Smarter

Hello, I’m Philippe, and I am a Principal Solutions Architect helping customers with their usage of Docker. I started getting seriously interested in generative AI about two years ago. What interests me most is the ability to run language models (LLMs) directly on my laptop (For work, I have a MacBook Pro M2 max, but on a more personal level, I run LLMs on my personal MacBook Air M4 and on Raspberry Pis – yes, it’s possible, but I’ll talk about that another time).

Let’s be clear: reproducing Claude Desktop or ChatGPT on a laptop with small language models is not possible, especially since I limit myself to models that have between 0.5 and 7 billion parameters. But I find it an interesting challenge to see how far we can go with these small models. So, can we do really useful things with small LLMs? The answer is yes, but you need to be creative and put in a bit of effort.

I’m going to take a concrete use case, related to development (but in the future I’ll propose “less technical” use cases).

(Specific) Use Case: Code Writing Assistance

I need help writing code

Currently, I’m working in my free time on an open-source project, which is a Golang library for quickly developing small generative AI agents. It’s both to get my hands dirty with Golang and prepare tools for other projects. This project is called Nova; there’s nothing secret about it, you can find it here.

If I use Claude AI and ask it to help me write code with Nova: “I need a code snippet of a Golang Nova Chat agent using a stream completion.”

The response will be quite disappointing, because Claude doesn’t know Nova (which is normal, it’s a recent project). But Claude doesn’t want to disappoint me and will still propose something which has nothing to do with my project.

And it will be the same with Gemini.

You might say: just feed the source code of your repository to Claude AI or Gemini. OK, but imagine I don’t have access to these services for various reasons, such as confidentiality, or because I’m on a project where internet access isn’t allowed. That already disqualifies Claude AI and Gemini. So how can I get help writing code? As you guessed, with a local LLM, and moreover a “very small” one.

Choosing a language model

When you develop a solution based on generative AI, the choice of language model(s) is crucial. You’ll have to do a lot of research, experimentation, and testing to find the model that best fits your use case, and know that this is non-negligible work.

For this article (and also because I use it), I’m going to use hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, which you can find here. It’s a 3 billion parameter language model, optimized for code generation. You can install it with Docker Model Runner with the following command:

docker model pull hf.co/Qwen/Qwen2.5-Coder-3B-Instruct-GGUF:Q4_K_M

And to start chatting with the model, you can use the following command:

docker model run hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

Or use Docker Desktop:

So, of course, as you can see in the illustration above, this little “Qwen Coder” doesn’t know my Nova library either. But we’re going to fix that.

Feeding the model with specific information

For my project, I have a markdown file in which I save the code snippets I use to develop examples with Nova. You can find it here. For now, there’s little content, but it will be enough to prove and illustrate my point.

So I could add the entire content of this file to a user prompt that I would give to the model. But that will be ineffective. Indeed, small models have a relatively small context window. But even if my “Qwen Coder” was capable of ingesting all the content of my markdown file, it would have trouble focusing on my request and on what it should do with this information. So,

1st essential rule: when you use a very small LLM, the larger the content provided to the model, the less effective the model will be.

2nd essential rule: the more you keep the conversation history, the more the content provided to the model will grow, and therefore it will decrease the effectiveness of the model.

So, to work around this problem, I’m going to use a technique called RAG (Retrieval Augmented Generation). The principle is simple: instead of providing all the content to the model, we’re going to store this content in a “vector” type database, and when the user makes a request, we’re going to search in this database for the most relevant information based on the user’s request. Then, we’re going to provide only this relevant information to the language model. For this blog post, the data will be kept in memory (which is not optimal, but sufficient for a demonstration).

RAG?

There are already many articles on the subject, so I won’t go into detail. But here’s what I’m going to do for this blog post:

My snippets file is composed of sections: a markdown title (## snippet name), possibly a free-text description, and a fenced Golang code block.

I’m going to split this file by section into chunks of text,

Then, for each section I’m going to create an “embedding” (vector representation of text == mathematical representation of the semantic meaning of the text) with the ai/embeddinggemma:latest model (a relatively small and efficient embedding model). Then I’m going to store these embeddings (and the associated text) in an in-memory vector database (a simple array of JSON objects).

If you want to learn more about embedding, please read this article: Run Embedding Models and Unlock Semantic Search with Docker Model Runner

Diagram of the vector database creation process:

Similarity search and user prompt construction

Once I have this in place, when I make a request to the language model (so hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m), I’m going to:

Create an embedding of the user’s request with the embedding model.

Compare this embedding with the embeddings stored in the vector database to find the most relevant sections (by calculating the distance between the vector representation of my question and the vector representations of the snippets). This is called a similarity search.

From the most relevant sections (the most similar), I’ll be able to construct a user prompt that includes only the relevant information and my initial request.

Diagram of the search and user prompt construction process:

So the final user prompt will contain:

The system instructions. For example: “You are a helpful coding assistant specialized in Golang and the Nova library. Use the provided code snippets to help the user with their requests.”

The relevant sections were extracted from the vector database.

The user’s request.

Remarks:

I explain the principles and results, but all the source code (NodeJS with LangchainJS) used to arrive at my conclusions is available in this project 

To calculate distances between vectors, I used cosine similarity (A cosine similarity score of 1 indicates that the vectors point in the same direction. A cosine similarity score of 0 indicates that the vectors are orthogonal, meaning they have no directional similarity.)

You can find the JavaScript function I used in the example project; a minimal sketch of the idea also appears after these remarks.

And the piece of code that I use to split the markdown snippets file: 

Warning: embedding models are limited by the size of text chunks they can ingest. So you have to be careful not to exceed this size when splitting the source file. And in some cases, you’ll have to change the splitting strategy (fixed-size chunks, for example, with or without overlap).
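For reference, the cosine similarity computation itself can be as small as the sketch below (illustrative; the function in the repository may differ):

// Minimal cosine similarity helper for two equal-length vectors
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, value, i) => sum + value * b[i], 0);
  const magnitude = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denominator = magnitude(a) * magnitude(b);
  return denominator === 0 ? 0 : dot / denominator;
}

// Vectors pointing in the same direction score 1, orthogonal vectors score 0
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // ~1
console.log(cosineSimilarity([1, 0], [0, 1]));       // 0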

Implementation and results, or creating my Golang expert agent

Now that we have the operating principle, let’s see how to put this into practice with LangchainJS, Docker Model Runner, and Docker Agentic Compose.

Docker Agentic Compose configuration

Let’s start with the Docker Agentic Compose project structure:

services:
  golang-expert:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      TERM: xterm-256color

      HISTORY_MESSAGES: 2
      MAX_SIMILARITIES: 3
      COSINE_LIMIT: 0.45

      OPTION_TEMPERATURE: 0.0
      OPTION_TOP_P: 0.75
      OPTION_PRESENCE_PENALTY: 2.2

      CONTENT_PATH: /app/data

    volumes:
      - ./data:/app/data

    stdin_open: true # docker run -i
    tty: true        # docker run -t

    configs:
      - source: system.instructions.md
        target: /app/system.instructions.md

    models:
      chat-model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_LLM_CHAT

      embedding-model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_LLM_EMBEDDING

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

configs:
  system.instructions.md:
    content: |
      Your name is Bob (the original replicant).
      You are an expert programming assistant in Golang.
      You write clean, efficient, and well-documented code.
      Always:
      - Provide complete, working code
      - Include error handling
      - Add helpful comments
      - Follow best practices for the language
      - Explain your approach briefly

      Use only the information available in the provided data and your KNOWLEDGE BASE.

What’s important here is:

I only keep the last 2 messages in my conversation history, and I select at most the 3 best similarities found (to limit the size of the user prompt):

HISTORY_MESSAGES: 2
MAX_SIMILARITIES: 3
COSINE_LIMIT: 0.45

You can adjust these values according to your use case and your language model’s capabilities.

The models section, where I define the language models I’m going to use:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

One of the advantages of this section is that it will allow Docker Compose to download the models if they’re not already present on your machine.

As well as the models section of the golang-expert service, where I map the environment variables to the models defined above:

models:
  chat-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_CHAT

  embedding-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_EMBEDDING

And finally, the system instructions configuration file:

configs:
  - source: system.instructions.md
    target: /app/system.instructions.md

Which I define a bit further down in the configs section:

configs:
  system.instructions.md:
    content: |
      Your name is Bob (the original replicant).
      You are an expert programming assistant in Golang.
      You write clean, efficient, and well-documented code.
      Always:
      - Provide complete, working code
      - Include error handling
      - Add helpful comments
      - Follow best practices for the language
      - Explain your approach briefly

      Use only the information available in the provided data and your KNOWLEDGE BASE.

You can, of course, adapt these system instructions to your use case. And also persist them in a separate file if you prefer.

Dockerfile

It’s rather simple:

FROM node:22.19.0-trixie

WORKDIR /app
COPY package*.json ./
RUN npm install
COPY *.js .

# Create non-root user
RUN groupadd --gid 1001 nodejs && \
    useradd --uid 1001 --gid nodejs --shell /bin/bash --create-home bob-loves-js

# Change ownership of the app directory
RUN chown -R bob-loves-js:nodejs /app

# Switch to non-root user
USER bob-loves-js

Now that the configuration is in place, let’s move on to the agent’s source code.

Golang expert agent source code, a bit of LangchainJS with RAG

The JavaScript code is rather simple (probably improvable, but functional) and follows these main steps:

1. Initial configuration

Connection to both models (chat and embeddings) via LangchainJS

Loading parameters from environment variables

2. Vector database creation (at startup)

Reading the snippets.md file

Splitting into sections (chunks)

Generating an embedding for each section

Storing in an in-memory vector database

3. Interactive conversation loop

The user asks a question

Creating an embedding of the question

Similarity search in the vector database to find the most relevant snippets

Construction of the final prompt with: history + system instructions + relevant snippets + question

Sending to the LLM and displaying the response in streaming

Updating the history (limited to the last N messages)

import { ChatOpenAI } from "@langchain/openai";
import { OpenAIEmbeddings } from '@langchain/openai';

import { splitMarkdownBySections } from './chunks.js'
import { VectorRecord, MemoryVectorStore } from './rag.js';

import prompts from "prompts";
import fs from 'fs';

// Define [CHAT MODEL] Connection
const chatModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CHAT || `ai/qwen2.5:latest`,
  apiKey: "",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: parseFloat(process.env.OPTION_TEMPERATURE) || 0.0,
  top_p: parseFloat(process.env.OPTION_TOP_P) || 0.5,
  presencePenalty: parseFloat(process.env.OPTION_PRESENCE_PENALTY) || 2.2,
});

// Define [EMBEDDINGS MODEL] Connection
const embeddingsModel = new OpenAIEmbeddings({
  model: process.env.MODEL_RUNNER_LLM_EMBEDDING || "ai/embeddinggemma:latest",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
    apiKey: ""
  }
})

const maxSimilarities = parseInt(process.env.MAX_SIMILARITIES) || 3
const cosineLimit = parseFloat(process.env.COSINE_LIMIT) || 0.45

// ----------------------------------------------------------------
// Create the embeddings and the vector store from the content file
// ----------------------------------------------------------------

console.log("========================================================")
console.log(" Embeddings model:", embeddingsModel.model)
console.log(" Creating embeddings…")
let contentPath = process.env.CONTENT_PATH || "./data"

const store = new MemoryVectorStore();

let contentFromFile = fs.readFileSync(contentPath + "/snippets.md", 'utf8');
let chunks = splitMarkdownBySections(contentFromFile);
console.log(" Number of documents read from file:", chunks.length);

// -------------------------------------------------
// Create and save the embeddings in the memory vector store
// -------------------------------------------------
console.log(" Creating the embeddings…");

for (const chunk of chunks) {
  try {
    // EMBEDDING COMPLETION:
    const chunkEmbedding = await embeddingsModel.embedQuery(chunk);
    const vectorRecord = new VectorRecord('', chunk, chunkEmbedding);
    store.save(vectorRecord);

  } catch (error) {
    console.error(`Error processing chunk:`, error);
  }
}

console.log(" Embeddings created, total of records", store.records.size);
console.log();

console.log("========================================================")

// Load the system instructions from a file
let systemInstructions = fs.readFileSync('/app/system.instructions.md', 'utf8');

// ----------------------------------------------------------------
// HISTORY: Initialize a Map to store conversations by session
// ----------------------------------------------------------------
const conversationMemory = new Map()

let exit = false;

// CHAT LOOP:
while (!exit) {
  const { userMessage } = await prompts({
    type: "text",
    name: "userMessage",
    message: `Your question (${chatModel.model}): `,
    validate: (value) => (value ? true : "Question cannot be empty"),
  });

  if (userMessage == "/bye") {
    console.log(" See you later!");
    exit = true;
    continue
  }

  // HISTORY: Get the conversation history for this session
  const history = getConversationHistory("default-session-id")

  // ----------------------------------------------------------------
  // SIMILARITY SEARCH:
  // ----------------------------------------------------------------
  // -------------------------------------------------
  // Create embedding from the user question
  // -------------------------------------------------
  const userQuestionEmbedding = await embeddingsModel.embedQuery(userMessage);

  // -------------------------------------------------
  // Use the vector store to find similar chunks
  // -------------------------------------------------
  // Create a vector record from the user embedding
  const embeddingFromUserQuestion = new VectorRecord('', '', userQuestionEmbedding);

  const similarities = store.searchTopNSimilarities(embeddingFromUserQuestion, cosineLimit, maxSimilarities);

  let knowledgeBase = "KNOWLEDGE BASE:\n";

  for (const similarity of similarities) {
    console.log(" CosineSimilarity:", similarity.cosineSimilarity, "Chunk:", similarity.prompt);
    knowledgeBase += `${similarity.prompt}\n`;
  }

  console.log("\n Similarities found, total of records", similarities.length);
  console.log();
  console.log("========================================================")
  console.log()

  // -------------------------------------------------
  // Generate CHAT COMPLETION:
  // -------------------------------------------------

  // MESSAGES == PROMPT CONSTRUCTION:
  let messages = [
    ...history,
    ["system", systemInstructions],
    ["system", knowledgeBase],
    ["user", userMessage]
  ]

  let assistantResponse = ''
  // STREAMING COMPLETION:
  const stream = await chatModel.stream(messages);
  for await (const chunk of stream) {
    assistantResponse += chunk.content
    process.stdout.write(chunk.content);
  }
  console.log("\n");

  // HISTORY: Add both user message and assistant response to history
  addToHistory("default-session-id", "user", userMessage)
  addToHistory("default-session-id", "assistant", assistantResponse)

}

// Helper function to get or create a conversation history
function getConversationHistory(sessionId, maxTurns = parseInt(process.env.HISTORY_MESSAGES)) {
  if (!conversationMemory.has(sessionId)) {
    conversationMemory.set(sessionId, [])
  }
  return conversationMemory.get(sessionId)
}

// Helper function to add a message to the conversation history
function addToHistory(sessionId, role, content) {
  const history = getConversationHistory(sessionId)
  history.push([role, content])

  // Keep only the last maxTurns conversations
  const maxTurns = parseInt(process.env.HISTORY_MESSAGES) // Adjust this value based on your needs
  if (history.length > maxTurns * 2) { // *2 because each turn has user & assistant message
    history.splice(0, 2) // Remove oldest turn (user + assistant messages)
  }
}

You can find it here: 

All that’s left is to launch everything and verify my hypotheses

In the project folder, run the following command:

docker compose up --build --no-log-prefix -d

Then connect to the container and launch the application:

docker compose exec golang-expert /bin/bash
node index.js

Nova Chat Agent with streaming completion

Once the application has started, it’s time to ask our Golang expert agent a question about developing with the Nova library:

========================================================
Embeddings model: ai/embeddinggemma:latest
Creating embeddings…
Number of documents read from file: 4
Creating the embeddings…
Embeddings created, total of records 4

========================================================
? Your question (hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m): › I need a code snippet of a Golang Nova Chat agent using a stream completion

After a quick similarity search, our agent found two relevant snippets in the vector database. And it was able to provide me with complete, functional, and commented code to answer my request:

Certainly! Below is a complete, working Go code snippet for a Nova Chat agent that uses streaming completion. This example demonstrates how to set up an agent with the necessary configurations and generate a stream completion response.

```go
package main

import (
"context"
"fmt"

"github.com/snipwise/nova/nova-sdk/agents"
"github.com/snipwise/nova/nova-sdk/agents/chat"
"github.com/snipwise/nova/nova-sdk/messages"
"github.com/snipwise/nova/nova-sdk/messages/roles"
"github.com/snipwise/nova/nova-sdk/models"
)

func main() {

ctx := context.Background()

// Create a new chat agent with streaming capabilities
agent, err := chat.NewAgent(
ctx,
agents.Config{
EngineURL: "http://localhost:12434/engines/llama.cpp/v1",
SystemInstructions: "You are Bob, a helpful AI assistant.",
KeepConversationHistory: true,
},
models.Config{
Name: "ai/qwen2.5:1.5B-F16",
Temperature: models.Float64(0.8),
},
)
if err != nil {
panic(err)
}

// Generate a stream completion response
result, err := agent.GenerateStreamCompletion(
[]messages.Message{
{Role: roles.User, Content: "Who is James T Kirk?"},
},
func(chunk string, finishReason string) error {
// Simple callback that receives strings only
if chunk != "" {
fmt.Print(chunk)
}
if finishReason == "stop" {
fmt.Println()
}
return nil
},
)
if err != nil {
panic(err)
}

// Print the final response and finish reason
fmt.Println("Final Response:", result.Response)
fmt.Println("Finish Reason:", result.FinishReason)
}
```

### Explanation:

1. **Imports**: The code imports necessary packages from the Nova SDK to interact with chat agents.

2. **Context**: A context is created using `context.Background()` to manage the lifecycle of requests.

3. **Agent Creation**:
– An agent is created using `chat.NewAgent` with specified configurations:
– `EngineURL`: The URL of the engine to use for generating text.
– `SystemInstructions`: Instructions for the AI assistant, such as who Bob is.
– `KeepConversationHistory`: Whether to keep a history of previous messages.

4. **Model Configuration**:
– A model configuration is provided with:
– `Name`: The name of the model to use.
– `Temperature`: Controls the randomness of the output (higher values make responses more random).
– `MaxTokens`: Limits the maximum number of tokens in the response.

5. **Stream Completion Generation**:
– The `GenerateStreamCompletion` method is called with a list of messages and a callback function that processes each chunk of text as it arrives.
– The callback prints chunks to the console until the finish reason is "stop", indicating the completion has been reached.

6. **Output**:
– The final response and finish reason are printed after the stream completes.

This code snippet demonstrates how to set up a streaming chat agent using Nova, allowing you to receive text responses incrementally as they become available.

I then asked other questions.

Nova Structured Agent

This time, I asked for a structured agent: “I need a code snippet of a Nova Structured Agent.”

And the result was conclusive:

Extract from the response:

Certainly! Below is a complete, working example of a Nova Structured Agent in Go. This agent will generate structured data about a specified country based on the user's input.

```go
package main

import (
"context"
"fmt"

"github.com/snipwise/nova/nova-sdk/agents"
"github.com/snipwise/nova/nova-sdk/agents/structured"
"github.com/snipwise/nova/nova-sdk/messages"
"github.com/snipwise/nova/nova-sdk/messages/roles"
"github.com/snipwise/nova/nova-sdk/models"
)

type Country struct {
Name string `json:"name"`
Capital string `json:"capital"`
Population int `json:"population"`
Languages []string `json:"languages"`
}

func main() {
ctx := context.Background()

// Create a new Structured Agent
agent, err := structured.NewAgent[Country](
ctx,
agents.Config{
EngineURL: "http://localhost:12434/engines/llama.cpp/v1",
SystemInstructions: `
Your name is Bob.
You are an assistant that answers questions about countries around the world.
`,
},
models.Config{
Name: "hf.co/menlo/jan-nano-gguf:q4_k_m",
Temperature: models.Float64(0.0),
},
)
if err != nil {
panic(err)
}

// Generate structured data about Canada
response, finishReason, err := agent.GenerateStructuredData([]messages.Message{
{Role: roles.User, Content: "Tell me about Canada."},
})

if err != nil {
panic(err)
}

// Print the generated country information
fmt.Println("Country Name:", response.Name)
fmt.Println("Capital:", response.Capital)
fmt.Println("Population:", response.Population)
fmt.Println("Languages:", strings.Join(response.Languages, ", "))
}
```

### Explanation:

1. **Imports**: The code imports necessary packages from the Nova SDK to interact with agents and models.

2. **Country Struct**: A simple struct `Country` is defined to represent the data structure expected by the agent.

3. **Main Function**:
– A context is created using `context.Background()`.
– A new Structured Agent is created for generating structured data about countries.
– The system instructions are set to provide a friendly greeting and instruction on answering questions about countries.
– The model configuration specifies the engine URL, temperature, and other parameters.

4. **GenerateStructuredData Method**: This method generates structured data based on the user's input. It takes an array of messages as input, where each message has a role (e.g., `User`) and content (e.g., "Tell me about Canada").

5. **Error Handling**: The code includes error handling to ensure that any issues during agent creation or data generation are properly caught and reported.

6. **Output**: The generated country information is printed, including the name, capital, population, and languages.

This example demonstrates how to use a Nova Structured Agent to generate structured data about countries based on user input. You can modify the system instructions and model configuration as needed for your specific use case.

Then, I moved on to the next question.

Nova RAG Agent with vector store. Be careful with your similarity settings!

This time, I asked for a “RAG” agent: “I need a snippet of a Nova RAG agent with a vector store.”

And once again, I got a relevant response.

However, when I tried with this question (after restarting the agent to start from a clean base without conversation history): “I need a snippet of a Nova RAG agent.”

The similarity search returned no relevant results (because the words “vector store” were not present in the snippets), and the agent responded with generic code that had nothing to do with Nova, or reused code from the Nova chat agent snippets.

There may be several possible reasons:

The embedding model is not suitable for my use case,

The embedding model is not precise enough,

The splitting of the code snippets file is not optimal (you can add metadata to chunks to improve similarity search, for example, but don’t forget that chunks must not exceed the maximum size that the embedding model can ingest).

In that case, there’s a simple solution that works quite well: you lower the similarity thresholds and/or increase the number of returned similarities. This allows you to have more results to construct the user prompt, but be careful not to exceed the maximum context size of the language model. And you can also do tests with other “bigger” LLMs (more parameters and/or larger context window).

In the latest version of the snippets file, I added a KEYWORDS: … line below the markdown titles to help with similarity search, which greatly improved the results.

Conclusion

Using “Small Language Models” (SLM) or “Tiny Language Models” (TLM) requires a bit of energy and thought to work around their limitations. But it’s possible to build effective solutions for very specific problems. And once again, always think about the context size for the chat model and how you’ll structure the information for the embedding model. And by combining several specialized “small agents”, you can achieve very interesting results. This will be the subject of future articles.

Learn more

Check out Docker Model Runner

Learn more about Docker Agentic Compose

Read more about embedding in our recent blog Run Embedding Models and Unlock Semantic Search with Docker Model Runner

Source: https://blog.docker.com/feed/

OpenCode with Docker Model Runner for Private AI Coding

AI-powered coding assistants are becoming a core part of modern development workflows. At the same time, many teams are increasingly concerned about where their code goes, how it’s processed, and who has access to it.

By combining OpenCode with Docker Model Runner, you can build a powerful AI-assisted coding experience while keeping full control over your data, infrastructure and spend.

This post walks through how to configure OpenCode to use Docker Model Runner and explains why this setup enables a privacy-first and cost-aware approach to AI-assisted development.

What Are OpenCode and Docker Model Runner?

OpenCode is an open-source coding assistant designed to integrate directly into developer workflows. It supports multiple model providers and exposes a flexible configuration system that makes it easy to switch between them.

Docker Model Runner (DMR) allows you to run and manage large language models easily. It exposes an OpenAI-compatible API, making it straightforward to integrate with existing tools that already support OpenAI-style endpoints.

Together, they provide a familiar developer experience backed by models running entirely within infrastructure you control.

Modifying the OpenCode Configuration

OpenCode can be customized using a configuration file that controls how providers and models are defined.

You can define this configuration in one of two places:

Global configuration: ~/.config/opencode/opencode.json

Project-specific configuration: opencode.json in the root of your project

When a project-level configuration is present, it takes precedence over the global one.

Using OpenCode with Docker Model Runner

Docker Model Runner (DMR) exposes an OpenAI-compatible API, which makes integrating it with OpenCode straightforward. To enable this integration, you simply need to update your opencode.json file to point to the DMR server and declare the locally available models.

Assuming Docker Model Runner is running at: http://localhost:12434/v1

your opencode.json configuration could look like this:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12434/v1"
      },
      "models": {
        "qwen3-coder": {
          "name": "qwen3-coder"
        },
        "devstral-small-2": {
          "name": "devstral-small-2"
        }
      }
    }
  }
}

This configuration allows OpenCode to utilize locally hosted models through DMR, providing a powerful and private coding assistant.

Note for Docker Desktop users:

If you are running Docker Model Runner via Docker Desktop, make sure TCP access is enabled. OpenCode connects to Docker Model Runner over HTTP, which requires the TCP port to be exposed:

docker desktop enable model-runner --tcp

Once enabled, Docker Model Runner will be accessible at http://localhost:12434/v1.

Figure 1: Enabling OpenCode to utilize locally hosted models through Docker Model Runner

Figure 2: Models like qwen3-coder, devstral-small-2, gpt-oss are good for coding use cases.

Benefits of using OpenCode with Model Runner

Privacy by Design

Using OpenCode with Docker Model Runner enables a privacy-first approach to AI-assisted development by keeping all model inference within the infrastructure you control.

Docker Model Runner runs models behind an OpenAI-compatible API endpoint. OpenCode sends prompts, source code, and context only to that endpoint, and nowhere else.

This means:

No third-party AI providers are involved

No external data sharing or vendor-side retention

No training on your code by external services

From OpenCode’s perspective, the provider is simply an API endpoint. Where that endpoint runs, on a developer machine, an internal server, or a private cloud, is entirely up to you.

Cost Control

Beyond privacy, running models with Docker Model Runner provides a significant cost advantage over hosted AI APIs.

Cloud-hosted coding assistants can become expensive very quickly, especially when:

Working with large repositories

Passing long conversational or code context

Running frequent iterative prompts during development

With Docker Model Runner, inference runs on your own hardware. Once the model is pulled, there are no per-token fees, no request-based pricing, and no surprise bills. Teams can scale usage freely without worrying about escalating API costs.

Recommended Models for Coding

When using OpenCode with Docker Model Runner, model choice has a direct impact on both quality and developer experience. While many general-purpose models might work reasonably well, coding-focused models are optimized for long context windows and code-aware reasoning, which is especially important for real-world repositories.

The following models are well suited for use with OpenCode and Docker Model Runner:

qwen3-coder

devstral-small-2

gpt-oss

Each of these models can be served through Docker Model Runner and exposed via its OpenAI-compatible API.

You can pull these models by simply running:

docker model pull qwen3-coder

Pulling Models from Docker Hub and Hugging Face

Docker Model Runner can pull models not only from Docker Hub, but also directly from Hugging Face and automatically convert them into OCI artifacts that can be run and shared like any other Docker model.

For example, you can pull a model directly from Hugging Face with:

docker model pull huggingface.co/unsloth/Ministral-3-14B-Instruct-2512-GGUF

This gives teams access to the broader open model ecosystem without sacrificing consistency or operability.

Context Length Matters

For coding tasks, context length is often more important than raw parameter count. Large repositories, multi-file refactors, and long conversational histories all benefit from being able to pass more context to the model.

By default:

qwen3-coder → 128K context

devstral-small-2 → 128K context

gpt-oss → 4,096 tokens

The difference comes down to model intent.

qwen3-coder and devstral-small-2 are coding-focused models, designed to ingest large amounts of source code, project structure, and related context in a single request. A large default context window is critical for these use cases.

gpt-oss, on the other hand, is a general-purpose model. Its default context size reflects a broader optimization target, where extremely long inputs are less critical than they are for code-centric workflows.

Increasing Context Size for GPT-OSS

If you want to use gpt-oss for coding tasks that benefit from a larger context window, Docker Model Runner makes it easy to repackage the model with an increased context size.

For example, to create a version of gpt-oss with a 128K context window, you can run:

docker model pull gpt-oss # In case it's not pulled
docker model package --from gpt-oss --context-size 128000 gpt-oss:128K

This creates a new model artifact with an expanded context length that can be served by Docker Model Runner like any other model.

Once packaged, you can reference this model in your opencode.json configuration:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12434/v1"
      },
      "models": {
        "gpt-oss:128K": {
          "name": "gpt-oss (128K)"
        }
      }
    }
  }
}

Sharing Models Across Your Team

Packaging models as OCI Artifacts has an additional benefit: the resulting model can be pushed to Docker Hub or a private registry.

This allows teams to:

Standardize on specific model variants (including context size)

Share models across developers without local reconfiguration

Ensure consistent behavior across environments

Version and roll back model changes explicitly

Instead of each developer tuning models independently, teams can treat models as first-class artifacts, built once and reused everywhere.
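For example, assuming you have push access to a (hypothetical) myorg namespace on Docker Hub, sharing the packaged variant could look like this:

# Tag the locally packaged model under your organization and push it to a registry
docker model tag gpt-oss:128K myorg/gpt-oss:128K
docker model push myorg/gpt-oss:128K

# Teammates can then pull the exact same variant
docker model pull myorg/gpt-oss:128K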

Putting It All Together: Using the Model from the CLI

With Docker Model Runner configured and the gpt-oss:128K model packaged, you can start using it immediately from OpenCode.

This section walks through selecting the model and using it to generate an agents.md file directly inside the Docker Model Runner project.

Step 1: Verify the Model Is Available

First, confirm that the packaged model is available locally:

docker model ls

You should see gpt-oss:128K listed among the available models. If not, make sure the packaging step is completed successfully.

Step 2: Configure OpenCode to Use the Model

Ensure your project’s opencode.json includes the packaged model:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12434/v1"
      },
      "models": {
        "gpt-oss": {
          "name": "gpt-oss:128K"
        }
      }
    }
  }
}

This makes the model available to OpenCode under the dmr provider.

Step 3: Start OpenCode in the Project

From the root of the Docker Model Runner project, start OpenCode:

opencode

Select the model from the list by running:

/models

Figure 3: Selecting gpt-oss model powered by Docker Model Runner in OpenCode

Step 4: Ask OpenCode to Generate agents.md

Once OpenCode is running, prompt the model to generate an agents.md file using the repository as context:

Generate an agents.md file in the project root following the agents.md specification and examples.

Use this repository as context and include sections that help an AI agent work effectively with this project, including:
– Project overview
– Build and test commands
– Code style guidelines
– Testing instructions
– Security considerations

Base the content on the actual structure, tooling, and conventions used in this repository.
Keep the file concise, practical, and actionable for an AI agent contributing to the project.

Because OpenCode is connected to Docker Model Runner, it can safely pass repository structure and relevant files to the model without sending any data outside your infrastructure.

The expanded 128K context window allows the model to reason over a larger portion of the project, resulting in a more accurate and useful agents.md.

Figure 4: The resulting agents.md file

Step 5: Review and Contribute to Docker Model Runner

Once the file is generated:

cat agents.md

Make any necessary adjustments so it accurately reflects the project, then commit it like any other project artifact:

git add agents.md
git commit -m "Add agents documentation"

At this point, you’re ready to open your first Docker Model Runner pull request.

Using OpenCode with Docker Model Runner makes it easy to contribute high-quality documentation and project artifacts, while keeping all model inference and repository context within the infrastructure you control.

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there's always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn more

Check out the Docker Model Runner General Availability announcement

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Get started with Docker Model Runner with a simple hello GenAI application

Source: https://blog.docker.com/feed/

Permission-Aware RAG: End-to-End Testing with the SpiceDB Testcontainer

We use GenAI in every facet of technology now – internal knowledge bases, customer support systems, and code review bots, to name just a few use cases. And in nearly every one of these, someone eventually asks:

“What stops the model from returning something the user shouldn’t see?”

This is a roadblock that companies building RAG features or AI agents eventually hit – the moment an LLM returns data from a document the user was not authorized to access, introducing potential legal, financial, and reputational risk to all parties. Unfortunately, traditional methods of authorization are not suited to the hierarchical, dynamic nature of access control in RAG. This is exactly where a modern authorization system such as SpiceDB shines: it provides the fine-grained authorization needed to filter content in your AI-powered applications.

In fact, OpenAI uses SpiceDB to secure 37 billion documents for 5 million users of ChatGPT Connectors – a feature that brings your data from sources such as Google Drive, Dropbox, and GitHub into ChatGPT.

This blog post shows how you can pair SpiceDB with Testcontainers to test the permission logic inside your RAG pipeline end-to-end, automatically, and with zero infrastructure dependencies. The example repo can be found here.

Quick Primer on Authorization

Before diving into implementation, let’s clarify two foundational concepts: Authentication (verifying who a user is) and Authorization (deciding what they can access).

Authorization is commonly implemented via techniques such as:

Access Control Lists (ACLs)

Role-Based Access Control (RBAC)

Attribute-Based Access Control (ABAC)

However, for complex, dynamic, and context-rich applications like RAG pipelines, traditional methods such as RBAC or ABAC fall short. The new kid on the block, ReBAC (Relationship-Based Access Control), is ideal because it models access as a graph of relationships rather than fixed rules, providing the flexibility and scalability these systems require.

ReBAC was popularized in Google Zanzibar, the internal authorization system Google built to manage permissions across all its products (e.g., Google Docs, Drive). Zanzibar systems are optimized for low-latency, high-throughput authorization checks, and global consistency – requirements that are well-suited for RAG systems.

SpiceDB is the most scalable open-source implementation of Google’s Zanzibar authorization model. It stores access as a relationship graph, where the fundamental check reduces to: 

Is this actor allowed to perform this action on this resource?

For a Google Docs-style example:

definition user {}

definition document {
  relation reader: user
  relation writer: user

  permission read = reader + writer
  permission write = writer
}

This schema defines object types (user and document), explicit Relations between the objects (reader, writer), and derived Permissions (read, write). SpiceDB evaluates the relationship graph in microseconds, enabling real-time authorization checks at massive scale.
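Access itself is granted by writing relationships against that schema. For instance, a single relationship such as the following (the document and user IDs are purely illustrative) makes anne a writer, and therefore also a reader, of that document:

document:quarterly-report#writer@user:anne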

Access Control for RAG 

RAG (Retrieval-Augmented Generation) is an architectural pattern that enhances Large Language Models (LLMs) by letting them consult an external knowledge base, typically involving a Retriever component finding document chunks and the LLM generating an informed response.

This pattern is now used by businesses and enterprises for apps like chatbots that query sensitive data such as customer playbooks or PII – all stored in a vector database for performance. However, the fundamental risk in this flow is data leakage: the Retriever component ignores permissions, and the LLM will happily summarize unauthorized data. In fact, OWASP’s Top 10 Risks for Large Language Model Applications list includes Sensitive Information Disclosure, Excessive Agency, and Vector and Embedding Weaknesses. The consequences of this leakage can be severe, ranging from loss of customer trust to massive financial and reputational damage from compliance violations.

This setup desperately needs fine-grained authorization, and that’s where SpiceDB comes in. SpiceDB can post-filter retrieved documents by performing real-time authorization checks, ensuring the model only uses data the querying user is permitted to see. The only requirement is that the documents have metadata that indicates where the information came from. But testing this critical permission logic without mocks, manual Docker setup, or flaky Continuous Integration (CI) environments is tricky. Testcontainers provides the perfect solution, allowing you to spin up a real, production-grade, and disposable SpiceDB instance inside your unit tests to deterministically verify that your RAG pipeline respects permissions end-to-end.

Spin Up Real Authorization for Every Test

Instead of mocking your authorization system or manually running it on your workstation, you can add this line of code in your test:

container, _ := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")

And Testcontainers will:

Pull the real SpiceDB image

Start it in a clean, isolated environment

Assign it dynamic ports

Wait for it to be ready

Hand you the gRPC endpoint

Clean up afterwards

Because Testcontainers handles the full lifecycle – pulling the container, exposing dynamic ports, and tearing it down automatically – you eliminate manual steps such as running Docker commands and writing cleanup scripts. This isolation ensures that every single test runs with a fresh, clean authorization graph, preventing data conflicts and making your permission tests completely reproducible in your IDE and across parallel Continuous Integration (CI) builds.
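In Go, that teardown is typically just a deferred call on the container returned by Run, for example (a minimal sketch):

defer func() { _ = container.Terminate(ctx) }()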

Suddenly you have a real, production-grade, Zanzibar-style permissions engine inside your unit test. 

Using SpiceDB & Testcontainers

Here’s a walkthrough of how you can achieve end-to-end permissions testing using SpiceDB and Testcontainers. The source code for this tutorial can be found here.

1. Testing Our RAG 

For the sake of simplicity, we use a minimal RAG pipeline, and the retrieval mechanism is trivial too.

We’re going to test three documents whose doc_ids (doc1, doc2, ...) act as metadata.

doc1: Internal roadmap

doc2: Customer playbook

doc3: Public FAQ

And three users:

Emilia owns doc1

Beatrice can view doc2

Charlie (or anyone) can view doc3

This SpiceDB schema defines a user and a document object type. A user has read permission on a document if they are the direct viewer or the owner of the document.

definition user {}

definition document {
  relation owner: user
  relation viewer: user
  permission read = owner + viewer
}

2. Starting the Testcontainer 

Here’s the single line of code that launches the disposable SpiceDB instance inside a test:

container, err := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")
require.NoError(t, err)

Next, we connect to the running containerized service:

host, _ := container.Host(ctx)
port, _ := container.MappedPort(ctx, "50051/tcp")
endpoint := fmt.Sprintf("%s:%s", host, port.Port())

client, err := authzed.NewClient(
  endpoint,
  grpc.WithTransportCredentials(insecure.NewCredentials()),
  grpcutil.WithInsecureBearerToken("somepresharedkey"),
)

This is now a fully functional SpiceDB instance running inside your test runner.

3. Load the Schema + Test Data

The test seeds data the same way your application would:

_, err := client.WriteSchema(ctx, &apiv1.WriteSchemaRequest{Schema: schema})
require.NoError(t, err)

Then:

rel("document", "doc1", "owner", "user", "emilia")
rel("document", "doc2", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "emilia")
rel("document", "doc3", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "charlie")

We now have a predictable, reproducible authorization graph for every test run.
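The rel calls above are shorthand; a minimal sketch of what such a helper could look like with the authzed-go client is below (it assumes client, ctx, and t are in scope in the test):

// rel writes a single relationship such as document:doc1#owner@user:emilia.
rel := func(resType, resID, relation, subType, subID string) {
  _, err := client.WriteRelationships(ctx, &apiv1.WriteRelationshipsRequest{
    Updates: []*apiv1.RelationshipUpdate{{
      Operation: apiv1.RelationshipUpdate_OPERATION_TOUCH,
      Relationship: &apiv1.Relationship{
        Resource: &apiv1.ObjectReference{ObjectType: resType, ObjectId: resID},
        Relation: relation,
        Subject: &apiv1.SubjectReference{
          Object: &apiv1.ObjectReference{ObjectType: subType, ObjectId: subID},
        },
      },
    }},
  })
  require.NoError(t, err)
}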

4. Post-Filtering With SpiceDB

Before the LLM sees anything, we check permissions with SpiceDB, which acts as the source of truth for document permissions.

resp, err := r.spiceClient.CheckPermission(ctx, &apiv1.CheckPermissionRequest{
  Resource:   docObject,
  Permission: "read",
  Subject:    userSubject,
})

If SpiceDB says no, the doc is never fed into the LLM, ensuring the user’s answer is based only on documents they have permission to read.
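Applied across the retrieval results, the post-filter is just a loop over that check. A minimal sketch, assuming an illustrative retriever receiver and a Document type whose ID field carries the doc_id metadata:

// filterAuthorized keeps only the documents the user is allowed to read.
func (r *retriever) filterAuthorized(ctx context.Context, userID string, docs []Document) ([]Document, error) {
  allowed := make([]Document, 0, len(docs))
  for _, doc := range docs {
    resp, err := r.spiceClient.CheckPermission(ctx, &apiv1.CheckPermissionRequest{
      Resource:   &apiv1.ObjectReference{ObjectType: "document", ObjectId: doc.ID},
      Permission: "read",
      Subject: &apiv1.SubjectReference{
        Object: &apiv1.ObjectReference{ObjectType: "user", ObjectId: userID},
      },
    })
    if err != nil {
      return nil, err
    }
    // Only HAS_PERMISSION results are passed on to the LLM.
    if resp.Permissionship == apiv1.CheckPermissionResponse_PERMISSIONSHIP_HAS_PERMISSION {
      allowed = append(allowed, doc)
    }
  }
  return allowed, nil
}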

This avoids:

Accidental data leakage

Overly permissive vector search

Compliance problems

Traditional access controls break down once data becomes embeddings; guardrails like this are what prevent that leakage.

End-to-End Permission Checks in a Single Test

Here’s what the full test asserts:

Emilia queries “roadmap” → gets doc1, because she’s the owner.

Beatrice queries “playbook” → gets doc2, because she’s a viewer.

Charlie queries “public” → gets doc3, because it’s the only doc he can read (it’s public).

If there is a single failing permission rule, the end-to-end test will immediately fail, which is critical given the constant changes in RAG pipelines (such as new retrieval modes, embeddings, document types, or permission rules). 
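Each of those assertions boils down to a few lines of test code. A rough sketch, assuming a hypothetical Query method that returns the doc_ids used as sources (not the exact API of the example repo):

// Emilia owns doc1, so the roadmap query must surface it and nothing she can’t read.
sources, err := pipeline.Query(ctx, "emilia", "roadmap")
require.NoError(t, err)
require.Contains(t, sources, "doc1")
require.NotContains(t, sources, "doc2")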

What If Your RAG Pipeline Isn’t in Go?

First, a shoutout to Guillermo Mariscal for his original contribution to the SpiceDB Go Testcontainers module. 

What if your RAG pipeline is written in a different language such as Python? Not to worry, there’s also a community Testcontainers module written in Python that you can use similarly. The module can be found here.

Typically, you would integrate it in your integration tests like this:

# Your RAG pipeline test
def test_rag_pipeline_respects_permissions():
    with SpiceDBContainer() as spicedb:
        # Set up permissions schema
        client = create_spicedb_client(
            spicedb.get_endpoint(),
            spicedb.get_secret_key()
        )

        # Load your permissions model
        client.WriteSchema(your_document_permission_schema)

        # Write test relationships
        # User A can access Doc 1
        # User B can access Doc 2

        # Test RAG pipeline with User A
        results = rag_pipeline.search(query="…", user="A")
        assert "Doc 1" in results
        assert "Doc 2" not in results  # Should be filtered out!

Similar to the Go module, this container gives you a clean, isolated SpiceDB instance for every test run.

Why This Approach Matters

Authorization testing in RAG pipelines can be tricky given the scale and latency requirements, and it gets trickier in systems handling sensitive data. By combining the flexibility and scale of SpiceDB with the automated, isolated environments of Testcontainers, you shift to a completely reliable, deterministic approach to authorization.

Every time your code ships, a fresh, production-grade authorization engine is spun up, loaded with test data, and torn down cleanly, guaranteeing zero drift between your development machine and CI. This pattern can ensure that your RAG system is safe, correct, and permission-aware as it scales from three documents to millions.

Try It Yourself

The complete working example in Go along with a sample RAG pipeline is here: https://github.com/sohanmaheshwar/spicedb-testcontainer-rag

Clone it. Run go test -v. Watch it spin up a fresh SpiceDB instance, load permissions, and assert RAG behavior.

Also, find the community modules for the SpiceDB testcontainer in Go and Python.

Source: https://blog.docker.com/feed/