How to Automate Arm Migration with Docker MCP Toolkit, VS Code, and GitHub Copilot

This post is a collaboration between Docker and Arm, demonstrating how Docker MCP Toolkit and the Arm MCP Server work together to simplify architecture migrations.

Moving workloads from x86 to Arm64 has become increasingly important as organizations seek to reduce cloud costs and improve performance. AWS Graviton, Azure Cobalt, and Google Cloud Axion have made Arm-based computing mainstream, promising 20-40% cost savings and better performance for many workloads.

But here’s the challenge: How do you migrate your applications to Arm without breaking things?

Traditional migration approaches require:

Manual code analysis for x86-specific dependencies

Tedious compatibility checks across multiple tools

Manual performance evaluation

What if you could orchestrate the entire Arm migration workflow from a single interface? Docker MCP Toolkit makes this possible. 

By connecting specialized Arm migration tools directly to GitHub Copilot, you can automate compatibility analysis, intrinsic conversion, and performance prediction—all through natural conversation in VS Code.

Here’s what that looks like in practice: You ask GitHub Copilot to migrate your legacy C++ application to Arm64. Copilot doesn’t just tell you what needs changing—it actually executes the work: scanning your code for x86 intrinsics, converting x86 SIMD intrinsics to Arm SIMD intrinsics, updating your Dockerfile, predicting Arm performance improvements, and creating a pull request with all changes. No manual porting. No up-front architecture expertise required.

If you have questions about any step in the process, you can directly ask Copilot, which will invoke the Arm MCP Server knowledge base tool. The knowledge base has information pulled directly from all Learning Paths on learn.arm.com, as well as knowledge of all Arm intrinsics, and will both summarize that information for you as well as provide links to the concrete documentation that you can peruse yourself. 

Now you might ask – “Can’t I just rebuild my Docker image for Arm64?” True, for most applications. But when you hit that one legacy app with hand-optimized x86 assembly, AVX2 intrinsics, or architecture-specific compiler flags? That’s when Docker MCP Toolkit with the Arm MCP Server becomes essential.

By the end of this guide, you’ll migrate a real-world legacy application—a matrix multiplication benchmark written with AVX2 intrinsics for x86—to Arm64 automatically using GitHub Copilot and Docker MCP Toolkit.

What normally takes 5-7 hours of manual work will take you about 30 to 40 minutes.

The Arm Migration Challenge

Let me show you exactly what we’re solving. Consider a matrix multiplication benchmark originally written for x86-64 with AVX2 optimizations—the kind of code that makes Arm migration painful.

Here’s a Dockerfile that will cause problems when trying to migrate to Graviton:

FROM centos:6

# CentOS 6 reached EOL, need to use vault mirrors
RUN sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-Base.repo && \
    sed -i 's|^#baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-Base.repo

# Install EPEL repository (required for some development tools)
RUN yum install -y epel-release && \
    sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/epel.repo && \
    sed -i 's|^#baseurl=http://download.fedoraproject.org/pub/epel|baseurl=http://archives.fedoraproject.org/pub/archive/epel|g' /etc/yum.repos.d/epel.repo

# Install Developer Toolset 2 for better C++11 support (GCC 4.8)
RUN yum install -y centos-release-scl && \
    sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-SCLo-scl.repo && \
    sed -i 's|^mirrorlist=|#mirrorlist=|g' /etc/yum.repos.d/CentOS-SCLo-scl-rh.repo && \
    sed -i 's|^# baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-SCLo-scl.repo && \
    sed -i 's|^# baseurl=http://mirror.centos.org|baseurl=http://vault.centos.org|g' /etc/yum.repos.d/CentOS-SCLo-scl-rh.repo

# Install build tools
RUN yum install -y \
    devtoolset-2-gcc \
    devtoolset-2-gcc-c++ \
    devtoolset-2-binutils \
    make \
    && yum clean all

WORKDIR /app
COPY *.h *.cpp ./

# AVX2 intrinsics are used in the code
RUN scl enable devtoolset-2 "g++ -O2 -mavx2 -o benchmark \
    main.cpp \
    matrix_operations.cpp \
    -std=c++11"

CMD ["./benchmark"]

You might ask why this won’t work on Arm. Looking at this Dockerfile, there are two immediate blockers for Graviton migration:

No Arm64 support in base image – The centos:6 image was built for x86 only, so this container won’t even start on Arm hardware.

x86-specific compiler flag – The -mavx2 flag tells the compiler to use AVX2 vector instructions, which don’t exist on Arm processors.

Even experienced developers miss these issues in larger codebases.

The source code uses AVX2 intrinsics for vectorized operations:

#include "matrix_operations.h"
#include <iostream>
#include <random>
#include <chrono>
#include <stdexcept>
#include <immintrin.h> // AVX2 intrinsics

Matrix::Matrix(size_t r, size_t c) : rows(r), cols(c) {
    data.resize(rows, std::vector<double>(cols, 0.0));
}

void Matrix::randomize() {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_real_distribution<> dis(0.0, 10.0);

    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            data[i][j] = dis(gen);
        }
    }
}

Matrix Matrix::multiply(const Matrix& other) const {
    if (cols != other.rows) {
        throw std::runtime_error("Invalid matrix dimensions for multiplication");
    }

    Matrix result(rows, other.cols);

    // x86-64 optimized using AVX2 for double-precision
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < other.cols; j++) {
            __m256d sum_vec = _mm256_setzero_pd();
            size_t k = 0;

            // Process 4 elements at a time with AVX2
            for (; k + 3 < cols; k += 4) {
                __m256d a_vec = _mm256_loadu_pd(&data[i][k]);
                __m256d b_vec = _mm256_set_pd(
                    other.data[k+3][j],
                    other.data[k+2][j],
                    other.data[k+1][j],
                    other.data[k][j]
                );
                sum_vec = _mm256_add_pd(sum_vec, _mm256_mul_pd(a_vec, b_vec));
            }

            // Horizontal add using AVX
            __m128d sum_high = _mm256_extractf128_pd(sum_vec, 1);
            __m128d sum_low = _mm256_castpd256_pd128(sum_vec);
            __m128d sum_128 = _mm_add_pd(sum_low, sum_high);

            double sum_arr[2];
            _mm_storeu_pd(sum_arr, sum_128);
            double sum = sum_arr[0] + sum_arr[1];

            // Handle remaining elements
            for (; k < cols; k++) {
                sum += data[i][k] * other.data[k][j];
            }

            result.data[i][j] = sum;
        }
    }

    return result;
}

double Matrix::sum() const {
    double total = 0.0;
    for (size_t i = 0; i < rows; i++) {
        for (size_t j = 0; j < cols; j++) {
            total += data[i][j];
        }
    }
    return total;
}

void benchmark_matrix_ops() {
    std::cout << "\n=== Matrix Multiplication Benchmark ===" << std::endl;

    const size_t size = 200;
    Matrix a(size, size);
    Matrix b(size, size);

    a.randomize();
    b.randomize();

    auto start = std::chrono::high_resolution_clock::now();
    Matrix c = a.multiply(b);
    auto end = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    std::cout << "Matrix size: " << size << "x" << size << std::endl;
    std::cout << "Time: " << duration.count() << " ms" << std::endl;
    std::cout << "Result sum: " << c.sum() << std::endl;
}

Looking at this code, you can see it is heavily optimized for Intel/AMD x86 processors and won’t work on Arm:

x86-exclusive header – #include <immintrin.h> only exists on x86 systems. Arm uses <arm_neon.h> instead.

AVX2 intrinsics throughout – Every _mm256_* function is Intel-specific:

_mm256_setzero_pd() – Creates a 256-bit zero vector (Arm NEON is 128-bit)

_mm256_loadu_pd() – Loads 4 doubles at once (NEON loads 2)

_mm256_set_pd() – Sets 4 doubles (no direct NEON equivalent)

_mm256_add_pd() / _mm256_mul_pd() – 256-bit operations (NEON uses 128-bit)

_mm256_extractf128_pd() – Extracts high 128 bits (not needed on NEON)

Vector width mismatch – AVX2 processes 4 doubles per operation, while Arm NEON processes 2, so the entire loop structure needs adjustment. (SVE/SVE2 on newer Arm cores, such as the Neoverse V1/V2 in Graviton 3 and 4, provides 256-bit or wider vector-length-agnostic (VLA) registers, matching or exceeding AVX2’s register width.)

Horizontal reduction logic – The horizontal add pattern using _mm256_extractf128_pd and _mm256_castpd256_pd128 is x86-specific and must be completely rewritten for Arm SIMD.

Manual conversion requires rewriting 30+ lines of intrinsic code, adjusting loop strides, and testing numerical accuracy. This is exactly where automated migration tools become essential.

Each of these issues blocks Arm migration in different ways. Manual migration requires not just converting intrinsics, but also modernizing the entire build infrastructure, finding Arm equivalents, and validating performance. For any substantial codebase, this becomes prohibitively expensive.

What GitHub Copilot Can and Can’t Do Without Arm MCP

Let’s be clear about what changes when you add the Arm MCP Server to Docker MCP Toolkit.

Without Arm MCP

You ask GitHub Copilot to migrate your C++ application from x86 to Arm64. Copilot responds with general advice: “Convert AVX2 intrinsics to NEON”, “Update your Dockerfile to use ARM64 base image”, “Change compiler flags”. Then you must manually research NEON equivalents, rewrite hundreds of lines of intrinsic code, update the Dockerfile yourself, hope you got the conversion right, and spend hours debugging compilation errors.

Yes, Copilot can write code. But without specialized tools, it’s guessing based on training data—not using concrete knowledge base documentation or using purpose-built tools to analyze your actual application architecture.

With Arm MCP + Docker MCP Toolkit

You ask GitHub Copilot the same thing. Within minutes, it:

Uses the check_image tool to verify your base image supports Arm64

Runs migrate_ease_scan on your actual codebase to find x86-specific code

Uses knowledge_base_search to find correct Arm SIMD equivalents for every x86 intrinsic

Converts your code with architecture-specific accuracy

Updates your Dockerfile with Arm-compatible base images

Creates a pull request with all changes.

Real code gets scanned. Real intrinsics get converted. Real pull requests appear in your repository. Close VS Code, come back tomorrow, and the migration is ready to test, complete with documentation explaining every change.

The difference? Docker MCP Toolkit gives GitHub Copilot access to actual Arm migration tooling, not just general knowledge about Arm architecture.

Why This Is Different from Manual Migration

You could manually use Arm migration tools: install utilities locally, run checks, research intrinsics, update code. Here’s what that process looks like:

Manual process:

Install Arm migration tools (15 minutes)

Run compatibility scans (5 minutes)

Research each x86 intrinsic equivalent (30 minutes per intrinsic)

Manually rewrite code (2-3 hours)

Update Dockerfile (15 minutes)

Fix compilation errors (1-2 hours)

Document changes (30 minutes)

Total: 5-7 hours per application

With Docker MCP Toolkit + Arm MCP:

Ask GitHub Copilot to migrate (20 minutes)

Review and approve changes (10-20 minutes)

Merge pull request

Total: 30-40 minutes per application

Setting Up Visual Studio Code with Docker MCP Toolkit

Prerequisites

Before you begin, make sure you have:

A machine with 8 GB RAM minimum (16 GB recommended)

The latest Docker Desktop release

VS Code with GitHub Copilot extension

GitHub account with personal access token

Step 1. Enable Docker MCP Toolkit

Open Docker Desktop and enable the MCP Toolkit from Settings.

To enable:

Open Docker Desktop

Go to Settings → Beta Features

Toggle Docker MCP Toolkit ON

Click Apply

Caption: Enabling Docker MCP Toolkit under Docker Desktop 

Add Required MCP Servers from Catalog

Add Arm, Sequential Thinking, and GitHub Official by following the links below, or by selecting “Catalog” in the Docker Desktop MCP Toolkit:

Arm MCP Server – Arm migration tools and architecture expertise

GitHub MCP Server – Repository operations and pull request management

Sequential Thinking MCP Server – Complex problem decomposition and planning

Caption: Searching for Arm MCP Server in the Docker MCP Catalog

Step 2. Configure the Servers

Configure the Arm MCP Server

To access your local code for the migrate-ease scan and MCA tools, the Arm MCP Server needs a directory configured to point to your local code.

Caption: Arm MCP Server configuration

Once you click ‘Save’, the Arm MCP Server will know where to look for your code. If you later want to grant access to a different directory, you’ll need to change this path.

Available Arm Migration Tools

Click Tools to view the six MCP tools available under the Arm MCP Server.

Caption: List of MCP tools provided by the Arm MCP Server

knowledge_base_search – Semantic search of Arm learning resources, intrinsics documentation, and software compatibility

migrate_ease_scan – Code scanner supporting C++, Python, Go, JavaScript, and Java for Arm compatibility analysis

check_image – Docker image architecture verification (checks if images support Arm64)

skopeo – Remote container image inspection without downloading

mca – Machine Code Analyzer for assembly performance analysis and IPC predictions

sysreport_instructions – System architecture information gathering

Configure GitHub MCP Server

The GitHub MCP Server lets GitHub Copilot create pull requests, manage issues, and commit changes.

Caption: Steps to configure GitHub Official MCP Server

Configure Authentication:

Select GitHub official

Choose your preferred authentication method 

For Personal Access Token, you’ll need to get the token from GitHub > Settings > Developer Settings

Caption: Setting up Personal Access Token in GitHub MCP Server

Configure Sequential Thinking MCP Server

Click “Sequential Thinking”

No configuration needed

Caption: Sequential MCP Server requires zero configuration

This server helps GitHub Copilot break down complex Arm migration decisions into logical steps.

Step 3. Add the Servers to VS Code

The Docker MCP Toolkit makes it incredibly easy to configure MCP servers for clients like VS Code.

To configure, click “Clients” and scroll down to Visual Studio Code. Click the “Connect” button:

Caption: Setting up Visual Studio Code as MCP Client

Now open VS Code and click on the ‘Extensions’ icon in the left toolbar:

Caption: Configuring MCP_DOCKER under VS Code Extensions

Click the MCP_DOCKER gear, and click ‘Start Server’:

Caption: Starting MCP Server under VS Code

Now you’re ready to perform an Arm migration!

Step 4. Verify Connection

Open GitHub Copilot Chat in VS Code and ask:

What Arm migration tools do you have access to?

You should see tools from all three servers listed. If you see them, your connection works. Let’s migrate some code.

Caption: Playing around with GitHub Copilot

Real-World Demo: Migrating a Legacy x86 Application

Now that you’ve connected GitHub Copilot to Docker MCP Toolkit, let’s migrate that matrix multiplication benchmark we looked at earlier.

Time to migrate: 20 minutes
Infrastructure: $0 (all runs in Docker containers)
Prerequisites: The code we showed earlier in this post

The Workflow

Docker MCP Toolkit orchestrates the migration through a secure MCP Gateway that routes requests to specialized tools: the Arm MCP Server scans code and converts intrinsics, GitHub MCP Server creates pull requests, and Sequential Thinking plans multi-step migrations. Each tool runs in an isolated Docker container: secure, reproducible, and under your control.

Step 1. Clone the repo

git clone https://github.com/JoeStech/docker-blog-arm-migration

Give GitHub Copilot Migration Instructions

Open your project in VS Code. In GitHub Copilot Chat, paste this prompt:

Your goal is to migrate this codebase from x86 to Arm64. Use the Arm MCP Server tools to help you with this migration.

Steps to follow:
1. Check all Dockerfiles – use check_image and/or skopeo tools to verify Arm compatibility, changing the base image if necessary
2. Scan the codebase – run migrate_ease_scan with the appropriate language scanner and apply the suggested changes
3. Use knowledge_base_search when you need Arm architecture guidance or intrinsic equivalents
4. Update compiler flags and dependencies for Arm64 compatibility
5. **Create a pull request with all changes using GitHub MCP Server**

Important notes:
– Your current working directory is mapped to /workspace on the MCP server
– NEON lane indices must be compile-time constants, not variables
– If you're unsure about Arm equivalents, use knowledge_base_search to find documentation
– Be sure to find out from the user or system what the target machine is, and use the appropriate intrinsics. For instance, if Neoverse (Graviton, Axion, Cobalt) is targeted, use the latest SVE/SVE2.

**After completing the migration:**
– Create a pull request with a detailed description of changes
– Include performance predictions and cost savings in the PR description
– List all tools used and validation steps needed

Step 2. Watch Docker MCP Toolkit Execute

GitHub Copilot orchestrates the migration using Docker MCP Toolkit. Here’s what happens:

Phase 1: Image Analysis

GitHub Copilot starts by analyzing the Dockerfile’s base image using the Arm MCP Server’s skopeo tool.

Caption: GitHub Copilot uses the skopeo tool from the Arm MCP Server to analyze the centos:6 base image. The tool reports that this image has no arm64 build available. This is the first blocker identified – the container won’t even start on Arm hardware.

This immediately identifies that CentOS 6 has no Arm64 builds and must be replaced.

Phase 2: Code Analysis

Next, Copilot runs the migrate_ease_scan tool with the C++ scanner on the codebase.

Caption: The migrate_ease_scan tool analyzes the C++ source code and detects AVX2 intrinsics, the -mavx2 compiler flag, and x86-specific headers. This automated scan identifies all architecture-dependent code that requires conversion – work that could take hours to find manually.

The scan results show exactly what needs to change for Arm compatibility. Each detected issue includes the file location, line number, and specific code that requires modification. This precision eliminates guesswork and ensures nothing is missed.

Phase 3: Arm Optimization and Best Practices

For the x86 intrinsics found in Phase 2, Copilot queries the Arm MCP Server’s knowledge base for Arm equivalents as needed, then makes the replacements.

Caption: GitHub Copilot uses the knowledge_base_search tool to find Arm NEON equivalents for each AVX2 intrinsic.

The tool returns official Arm documentation showing the conversions: _mm256_loadu_pd() becomes vld1q_f64(), _mm256_add_pd() becomes vaddq_f64(), and so on. This knowledge comes from learn.arm.com learning paths and intrinsic documentation.

The knowledge base provides not just the conversion mappings, but also architectural context: AVX2’s 256-bit vectors vs NEON’s 128-bit vectors, which means loop adjustments are needed. Copilot uses this information to rewrite the matrix multiplication code correctly.

Phase 4: Create the GitHub PR and Summarize

After completing the migration, Copilot creates a PR in GitHub and summarizes the changes made.

The changes are substantial: 

Replaced centos:6 → ubuntu:22.04, added TARGETARCH for multi-arch builds

Added Arm64 detection and -march=armv8-a+simd compiler flag

Converted AVX2 → NEON intrinsics with architecture guards

The build is now simpler, modern, and Arm-compatible.
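To illustrate the changes listed above, a migrated, multi-arch Dockerfile might look something like the following. This is a sketch, not the exact file from the pull request; the package names and flag selection logic are assumptions for the example.

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04

# TARGETARCH is populated automatically by buildx (e.g. amd64, arm64)
ARG TARGETARCH

RUN apt-get update && apt-get install -y --no-install-recommends \
    g++ make && rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY *.h *.cpp ./

# Pick architecture-appropriate flags: NEON on Arm64, AVX2 on x86-64
RUN if [ "$TARGETARCH" = "arm64" ]; then \
        g++ -O2 -march=armv8-a+simd -std=c++11 -o benchmark main.cpp matrix_operations.cpp; \
    else \
        g++ -O2 -mavx2 -std=c++11 -o benchmark main.cpp matrix_operations.cpp; \
    fi

CMD ["./benchmark"]
```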

Phase 5: Checking the Pull Request

You can verify the Pull Request by visiting https://github.com/JoeStech/docker-blog-arm-migration/pull/1/

To verify performance, you can build and run the benchmark:

docker buildx build --platform linux/arm64 -t benchmark:arm64 . --load

docker run --rm benchmark:arm64

Which should output:

SIMD Matrix Operations Benchmark
================================
Running on Arm64 architecture with NEON optimizations
=== Matrix Multiplication Benchmark ===
Matrix size: 200x200
Time: 17 ms
Result sum: 1.98888e+08

Caveats

An important caveat: not all models will produce equal results, and while the Arm MCP Server provides deterministic context, the models themselves are stochastic. Use a flagship, latest-generation model for the best results, and test any guesses the model makes regarding performance improvement.

How Docker MCP Toolkit Changes Development

Docker MCP Toolkit changes how developers interact with specialized knowledge and capabilities. Rather than learning new tools, installing dependencies, or managing credentials, developers connect their AI assistant once and immediately access containerized expertise.

The benefits extend beyond Arm migration:

Consistency – Same tools, same results across all developers

Security – Containerized isolation prevents tool interference

Version Control – MCP server versions tracked with application code

Reproducibility – Migrations behave identically across environments

Discoverability – Docker MCP Catalog makes finding the right server straightforward

Most importantly, developers remain in their existing workflow. VS Code. GitHub Copilot. Git. No context switching to external tools or dashboards.

Wrapping Up

You’ve just automated Arm64 migration using Docker MCP Toolkit, the Arm MCP Server, and GitHub Copilot. What used to require architecture expertise, manual intrinsic conversion, and hours of debugging now happens through natural conversation, safely executed in Docker containers.

Ready to try it? Open Docker Desktop and explore the MCP Catalog. Start with the Arm MCP Server, add GitHub, experiment with Sequential Thinking. Each server unlocks new capabilities.

The future of migration isn’t manually porting every application. It’s having an AI assistant that can execute tasks across your entire stack securely, reproducibly, and at the speed of thought.

Learn More

New to Docker? Download Docker Desktop

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Get Started with MCP Toolkit: Official Documentation

Source: https://blog.docker.com/feed/

Your Dependencies Don’t Care About Your FIPS Configuration

FIPS compliance is a great idea that makes the entire software supply chain safer. But teams adopting FIPS-enabled container images are running into strange errors that can be challenging to debug. What they are learning is that correctness at the base image layer does not guarantee compatibility across the ecosystem. Change is complicated, and changing complicated systems with intricate dependency webs often yields surprises. We are in the early adoption phase of FIPS, and that actually provides interesting opportunities to optimize how things work. Teams that recognize this will rethink how they build for FIPS and get ahead of the game.

FIPS in practice

FIPS is a U.S. government standard for cryptography. In simple terms, if you say a system is “FIPS compliant,” that means the cryptographic operations like TLS, hashing, signatures, and random number generation are performed using a specific, validated crypto module in an approved mode. That sounds straightforward until you remember that modern software is built not as one compiled program, but as a web of dependencies that carry their own baggage and quirks.

The FIPS crypto error that caught us off guard

We got a ticket recently for a Rails application in a FIPS-enabled container image. On the surface, everything looked right. Ruby was built to use OpenSSL 3.x with the FIPS provider. The OpenSSL configuration was correct. FIPS mode was active.

However, the application started throwing cryptography errors from the Postgres Rubygem (pg). Even more confusing, a minimal reproducer consisting of a basic Ruby app and a stock Postgres did not reproduce the error; a connection was established successfully. The issue only manifested when using ActiveRecord.

The difference came down to code paths. A basic Ruby script using the pg gem directly exercises a simpler set of operations. ActiveRecord triggers additional functionality that exercises different parts of libpq. The non-FIPS crypto was there all along, but only certain operations exposed it.

Your container image can be carefully configured for FIPS, and your application can still end up using non-FIPS crypto because a dependency brought its own crypto along for the ride. In this case, the culprit was a precompiled native artifact associated with the database stack. When you install pg, Bundler may choose to download a prebuilt binary dependency such as libpq.

Unfortunately those prebuilt binaries are usually built with assumptions that cause problems. They may be linked against a different OpenSSL than the one in your image. They may contain statically embedded crypto code. They may load crypto at runtime in a way that is not obvious.

This is the core challenge with FIPS adoption. Your base image can do everything right, but prebuilt dependencies can silently bypass your carefully configured crypto boundary.

Why we cannot just fix it in the base image yet

The practical fix for the Ruby case was adding this to your Gemfile.

gem "pg", "~> 1.1", force_ruby_platform: true

You also need to install libpq-dev to allow compiling from source. This forces Bundler to build the gem from source on your system instead of using a prebuilt binary. When you compile from source inside your controlled build environment, the resulting native extension is linked against the OpenSSL that is actually in your FIPS image.

Bundler also supports an environment/config knob for the same idea called BUNDLE_FORCE_RUBY_PLATFORM. The exact mechanism matters less than the underlying strategy of avoiding prebuilt native artifacts when you are trying to enforce a crypto boundary.

You might reasonably ask why we do not just add BUNDLE_FORCE_RUBY_PLATFORM to the Ruby FIPS image by default. We discussed this internally, and the answer illustrates why FIPS complexity cascades.

Setting that flag globally is not enough on its own. You also need a C compiler and the relevant libraries and headers in the build stage. And not every gem needs this treatment. If you flip the switch globally, you end up compiling every native gem from source, which drags in additional headers and system libraries that you now need to provide. The “simple fix” creates a new dependency management problem.

Teams adopt FIPS images to satisfy compliance. Then they have to add back build complexity to make the crypto boundary real and verify that every dependency respects it. This is not a flaw in FIPS or in the tooling. It is an inherent consequence of retrofitting a strict cryptographic boundary onto an ecosystem built around convenience and precompiled artifacts.

The patterns we are documenting today will become the defaults tomorrow. The tooling will catch up. Prebuilt packages will get better. Build systems will learn to handle the edge cases. But right now, teams need to understand where the pitfalls are.

What to do if you are starting a FIPS journey

You do not need to become a crypto expert to avoid the obvious traps. You only need a checklist mindset. The teams working through these problems now are building real expertise that will be valuable as FIPS requirements expand across industries.

Treat prebuilt native dependencies as suspect. If a dependency includes compiled code, assume it might carry its own crypto linkage until you verify otherwise. You can use ldd on Linux to inspect dynamic linking and confirm that binaries link against your system OpenSSL rather than a bundled alternative.

Use a multi-stage build and compile where it matters. Keep your runtime image slim, but allow a builder stage with the compiler and headers needed to compile the few native pieces that must align with your FIPS OpenSSL.

Test the real execution path, not just “it starts.” For Rails, that means running a query, not only booting the app or opening a connection. The failures we saw appeared when using the ORM, not on first connection.

Budget for supply-chain debugging. The hard part is not turning on FIPS mode. The hard part is making sure all the moving parts actually respect it. Expect to spend time tracing crypto usage through your dependency graph.

Why this matters beyond government contracts

FIPS compliance has traditionally been seen as a checkbox for federal sales. That is changing. As supply chain security becomes a board-level concern across industries, validated cryptography is moving from “nice to have” to “expected.” The skills teams build solving FIPS problems today translate directly to broader supply chain security challenges.

Think about what you learn when you debug a FIPS failure. You learn to trace crypto usage through your dependency graph, to question prebuilt artifacts, to verify that your security boundaries are actually enforced at runtime. Those skills matter whether you are chasing a FedRAMP certification or just trying to answer your CISO’s questions about software provenance.

The opportunity in the complexity

FIPS is not “just a switch” you flip in a base image. View FIPS instead as a new layer of complexity that you might have to debug across your dependency graph. That can sound like bad news, but switch the framing and it becomes an opportunity to get ahead of where the industry is going.

The ecosystem will adapt and the tooling will improve. The teams investing in understanding these problems now will be the ones who can move fastest when FIPS or something like it becomes table stakes.

If you are planning a FIPS rollout, start by controlling the prebuilt native artifacts that quietly bypass the crypto module you thought you were using. Recognize that every problem you solve is building institutional knowledge that compounds over time. This is not just compliance work. It is an investment in your team’s security engineering capability.


Using MCP Servers: From Quick Tools to Multi-Agent Systems

Model Context Protocol (MCP) servers are a spec for exposing tools, models, or services to language models through a common interface. Think of them as smart adapters: they sit between a tool and the LLM, speaking a predictable protocol that lets the model interact with things like APIs, databases, and agents without needing to know implementation details.

But like most good ideas, the devil’s in the details.

The Promise—and the Problems of Running MCP Servers

Running an MCP sounds simple: spin up a Python or Node server that exposes your tool. Done, right? Not quite.

You run into problems fast:

Runtime friction: If an MCP is written in Python, your environment needs Python (plus dependencies, plus maybe a virtualenv strategy, plus maybe GPU drivers). Same goes for Node. This multiplies fast when you’re managing many MCPs or deploying them across teams.

Secrets management: MCPs often need credentials (API keys, tokens, etc.). You need a secure way to store and inject those secrets into your MCP runtime. That gets tricky when different teams, tools, or clouds are involved.

N×N integration pain: Let’s say you’ve got three clients that want to consume MCPs, and five MCPs to serve up. Now you’re looking at 15 individual integrations. No thanks.

To make MCPs practical, you need to solve these three core problems: runtime complexity, secret injection, and client-to-server wiring. 

If you’re wondering where I’m going with all this, take a look at those problems. We already have a technology that has been used by developers for over a decade that helps solve them: Docker containers.

In the rest of this blog I’ll walk through three different approaches, going from least complex to most complex, for integrating MCP servers into your developer experience. 

Option 1 — Docker MCP Toolkit & Catalog

For the developer who already uses containers and wants a low-friction way to start with MCP.

If you’re already comfortable with Docker but just getting your feet wet with MCP, this is the sweet spot. In the raw MCP world, you’d clone Python/Node servers, manage runtimes, inject secrets yourself, and hand-wire connections to every client. That’s exactly the pain Docker’s MCP ecosystem set out to solve.

Docker’s MCP Catalog is a curated, containerized registry of MCP servers. Each entry is a prebuilt container with everything you need to run the MCP server. 

The MCP Toolkit (available via Docker Desktop) is your control panel: search the catalog, launch servers with secure defaults, and connect them to clients.

How it helps:

No language runtimes to install

Built-in secrets management

One-click enablement via Docker Desktop

Easily wire the MCPs to your existing agents (Claude Desktop, Copilot in VS Code, etc.)

Centralized access via the MCP Gateway

Figure 1: Docker MCP Catalog: Browse hundreds of MCP servers with filters for local or remote and clear distinctions between official and community servers

A Note on the MCP Gateway

One important piece working behind the scenes in both the MCP Toolkit and cagent (a framework for easily building multi-agent applications that we cover below) is the MCP Gateway, an open-source project from Docker that acts as a centralized frontend for all your MCP servers. Whether you’re using a GUI to start containers or defining agents in YAML, the Gateway handles all the routing, authentication, and translation between clients and tools. It also exposes a single endpoint that custom apps or agent frameworks can call directly, making it a clean bridge between GUI-based workflows and programmatic agent development.

Moving on: Using MCP servers alongside existing AI agents is often the first step for many developers. You wire up a couple of tools, maybe connect to a calendar or a search API, and use them in something like Claude, ChatGPT, or a small custom agent. For step-by-step tutorials on how to automate dev workflows with Docker’s MCP Catalog and Toolkit with popular clients, check out these guides on ChatGPT, Claude Desktop, Codex, Gemini CLI, and Claude Code. Once that pattern clicks, the next logical step is to use those same MCP servers as tools inside a multi-agent system.

Option 2 — cagent: Declarative Multi-Agent Apps

For the developer who wants to build custom multi-agent applications but isn’t steeped in traditional agentic frameworks.

If you’re past simple MCP servers and want agents that can delegate, coordinate, and reason together, cagent is your next step. It’s Docker’s open-source, YAML-first framework for defining and running multi-agent systems—without needing to dive into complex agent SDKs or LLM loop logic.

Cagent lets you describe:

The agents themselves (model, role, instructions)

Who delegates to whom

What tools each agent can access (via MCP or local capabilities)

Below is an example of a pirate-flavored chatbot:

agents:
  root:
    description: An agent that talks like a pirate
    instruction: Always answer by talking like a pirate.
    welcome_message: |
      Ahoy! I be yer pirate guide, ready to set sail on the seas o' knowledge! What be yer quest?
    model: auto

cagent run agents.yaml

You don’t write orchestration code. You describe what you want, and Cagent runs the system.

Why it works:

Tools are scoped per agent

Delegation is explicit

Uses the MCP Gateway behind the scenes

Ideal for building agent systems without writing Python
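For instance, building on the pirate example above, a two-agent setup with delegation and an MCP tool might look like the sketch below. The agent names are made up, and the exact `toolsets` reference for a Docker MCP Catalog server (`docker:duckduckgo`) is an assumption; check the cagent documentation for the current schema:

```yaml
agents:
  root:
    description: Coordinator that routes research questions to a specialist
    instruction: Delegate web research to the researcher agent, then summarize its findings.
    model: auto
    sub_agents:
      - researcher

  researcher:
    description: An agent that searches the web
    instruction: Use the web search tool to find and cite relevant sources.
    model: auto
    toolsets:
      - type: mcp
        ref: docker:duckduckgo
```

You run it exactly the same way as the single-agent example: cagent reads the YAML and handles the delegation loop for you.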

If you’d like to give cagent a try, we have a ton of examples in the project’s GitHub repository. Check out this guide on building multi-agent systems in 5 minutes. 

Option 3 — Traditional Agent Frameworks (LangGraph, CrewAI, ADK)

For developers building complex, custom, fully programmatic agent systems.

Traditional agent frameworks like LangGraph, CrewAI, or Google’s Agent Development Kit (ADK) let you define, control, and orchestrate agent behavior directly in code. You get full control over logic, state, memory, tools, and workflows.

They shine when you need:

Complex branching logic

Error recovery, retries, and persistence

Custom memory or storage layers

Tight integration with existing backend code

Example: LangGraph + MCP via Gateway

import requests
from langgraph.graph import StateGraph, MessagesState, END
from langchain.agents import Tool
from langchain_openai import ChatOpenAI

# Discover MCP endpoint from Gateway
resp = requests.get("http://localhost:6600/v1/servers")
servers = resp.json()["servers"]
duck_url = next(s["url"] for s in servers if s["name"] == "duckduckgo")

# Define a callable tool that proxies requests to the MCP server
def mcp_search(query: str) -> str:
    return requests.post(duck_url, json={"input": query}).json()["output"]

search_tool = Tool(name="web_search", func=mcp_search, description="Search via MCP")

# Wire it into a LangGraph agent node; MessagesState tracks the message history
llm = ChatOpenAI(model="gpt-4").bind_tools([search_tool])

def agent(state: MessagesState):
    return {"messages": [llm.invoke(state["messages"])]}

graph = StateGraph(MessagesState)
graph.add_node("agent", agent)
graph.set_entry_point("agent")
graph.add_edge("agent", END)

app = graph.compile()
app.invoke({"messages": [("user", "What's the latest in EU AI regulation?")]})

In this setup, you decide which tools are available. The agent chooses when to use them based on context, but you’ve defined the menu. And yes, this is still true in the Docker MCP Toolkit: you decide what to enable. The LLM can’t call what you haven’t made visible.

Choosing the Right Approach

| Approach | Best For | You Manage | You Get |
| --- | --- | --- | --- |
| Docker MCP Toolkit + Catalog | Devs new to MCP, already using containers | Tool selection | One-click setup, built-in secrets, Gateway integration |
| cagent | YAML-based multi-agent apps without custom code | Roles & tool access | Declarative orchestration, multi-agent workflows |
| LangGraph / CrewAI / ADK | Complex, production-grade agent systems | Full orchestration | Max control over logic, memory, tools, and flow |

Wrapping Up

Whether you’re just connecting a tool to Claude, designing a custom multi-agent system, or building production workflows by hand, Docker’s MCP tooling helps you get started easily and securely.

Check out the Docker MCP Toolkit, cagent, and MCP Gateway for example code, docs, and more ways to get started.


Making (Very) Small LLMs Smarter

Hello, I’m Philippe, and I am a Principal Solutions Architect helping customers with their usage of Docker. I started getting seriously interested in generative AI about two years ago. What interests me most is the ability to run language models (LLMs) directly on my laptop (for work, I have a MacBook Pro M2 Max, but on a more personal level I run LLMs on my MacBook Air M4 and on Raspberry Pis – yes, it’s possible, but I’ll talk about that another time).

Let’s be clear: reproducing Claude Desktop or ChatGPT on a laptop with small language models is not possible. Especially since I limit myself to models that have between 0.5 and 7 billion parameters. But I find it an interesting challenge to see how far we can go with these small models. So, can we do really useful things with small LLMs? The answer is yes, but you need to be creative and put in a bit of effort.

I’m going to take a concrete use case, related to development (but in the future I’ll propose “less technical” use cases).

(Specific) Use Case: Code Writing Assistance

I need help writing code

Currently, I’m working in my free time on an open-source project: a Golang library for quickly developing small generative AI agents. It’s a way both to get my hands dirty with Golang and to prepare tools for other projects. This project is called Nova; there’s nothing secret about it, you can find it here.

If I use Claude AI and ask it to help me write code with Nova: “I need a code snippet of a Golang Nova Chat agent using a stream completion.”

The response will be quite disappointing, because Claude doesn’t know Nova (which is normal, it’s a recent project). But Claude doesn’t want to disappoint me and will still propose something which has nothing to do with my project.

And it will be the same with Gemini.

So, you’ll tell me: just feed the source code of your repository to Claude AI or Gemini. OK, but imagine the following situation: I don’t have access to these services, for various reasons, such as confidentiality, or being on a project where internet access isn’t allowed. That already disqualifies Claude AI and Gemini. So how can I get help writing code? As you’ve guessed, with a local LLM. And moreover, a “very small” LLM.

Choosing a language model

When you develop a solution based on generative AI, the choice of language model(s) is crucial. You’ll have to do a lot of research, testing, and keeping up with new releases to find the model that best fits your use case. Know that this is a non-negligible amount of work.

For this article (and also because I use it), I’m going to use hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, which you can find here. It’s a 3 billion parameter language model, optimized for code generation. You can install it with Docker Model Runner with the following command:

docker model pull hf.co/Qwen/Qwen2.5-Coder-3B-Instruct-GGUF:Q4_K_M

And to start chatting with the model, you can use the following command:

docker model run hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

Or use Docker Desktop:

So, of course, as you can see in the illustration above, this little “Qwen Coder” doesn’t know my Nova library either. But we’re going to fix that.

Feeding the model with specific information

For my project, I have a markdown file in which I save the code snippets I use to develop examples with Nova. You can find it here. For now, there’s little content, but it will be enough to prove and illustrate my point.

So I could add the entire content of this file to a user prompt that I would give to the model. But that will be ineffective. Indeed, small models have a relatively small context window. But even if my “Qwen Coder” was capable of ingesting all the content of my markdown file, it would have trouble focusing on my request and on what it should do with this information. So,

1st essential rule: when you use a very small LLM, the larger the content provided to the model, the less effective the model will be.

2nd essential rule: the more you keep the conversation history, the more the content provided to the model will grow, and therefore it will decrease the effectiveness of the model.

So, to work around this problem, I’m going to use a technique called RAG (Retrieval Augmented Generation). The principle is simple: instead of providing all the content to the model, we’re going to store this content in a “vector” type database, and when the user makes a request, we’re going to search in this database for the most relevant information based on the user’s request. Then, we’re going to provide only this relevant information to the language model. For this blog post, the data will be kept in memory (which is not optimal, but sufficient for a demonstration).

RAG?

There are already many articles on the subject, so I won’t go into detail. But here’s what I’m going to do for this blog post:

My snippets file is composed of sections: a markdown title (## snippet name), possibly a description in free text, and a fenced Go code block (```golang … ```).

I’m going to split this file by section into chunks of text,

Then, for each section I’m going to create an “embedding” (vector representation of text == mathematical representation of the semantic meaning of the text) with the ai/embeddinggemma:latest model (a relatively small and efficient embedding model). Then I’m going to store these embeddings (and the associated text) in an in-memory vector database (a simple array of JSON objects).
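The idea can be sketched in a few lines of JavaScript. This is a minimal, hypothetical illustration of such a store, not the project’s actual rag.js; the class and method names (VectorRecord, MemoryVectorStore, searchTopNSimilarities) mirror those used later in the article, but the implementation details are assumptions:

```javascript
// Minimal in-memory vector store (illustrative sketch, not the project's rag.js)
class VectorRecord {
  constructor(id, prompt, embedding) {
    this.id = id;               // optional identifier
    this.prompt = prompt;       // the original chunk of text
    this.embedding = embedding; // its vector representation
  }
}

class MemoryVectorStore {
  constructor() {
    this.records = new Map();
  }

  save(record) {
    // Fall back to a positional key when no id is provided
    this.records.set(record.id || String(this.records.size), record);
  }

  // Cosine similarity: dot(a, b) / (|a| * |b|)
  static cosine(a, b) {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  }

  // Keep records above the cosine threshold, best matches first, at most n of them
  searchTopNSimilarities(query, limit, n) {
    return [...this.records.values()]
      .map(r => ({
        prompt: r.prompt,
        cosineSimilarity: MemoryVectorStore.cosine(r.embedding, query.embedding),
      }))
      .filter(r => r.cosineSimilarity >= limit)
      .sort((a, b) => b.cosineSimilarity - a.cosineSimilarity)
      .slice(0, n);
  }
}
```

The "database" really is just an in-memory collection plus a similarity function; everything else in a RAG pipeline is plumbing around it.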

If you want to learn more about embeddings, please read this article: Run Embedding Models and Unlock Semantic Search with Docker Model Runner

Diagram of the vector database creation process:

Similarity search and user prompt construction

Once I have this in place, when I make a request to the language model (so hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m), I’m going to:

Create an embedding of the user’s request with the embedding model.

Compare this embedding with the embeddings stored in the vector database to find the most relevant sections (by calculating the distance between the vector representation of my question and the vector representations of the snippets). This is called a similarity search.

From the most relevant sections (the most similar), I’ll be able to construct a user prompt that includes only the relevant information and my initial request.

Diagram of the search and user prompt construction process:

So the final user prompt will contain:

The system instructions. For example: “You are a helpful coding assistant specialized in Golang and the Nova library. Use the provided code snippets to help the user with their requests.”

The relevant sections extracted from the vector database.

The user’s request.

Remarks:

I explain the principles and results, but all the source code (NodeJS with LangchainJS) used to arrive at my conclusions is available in this project 

To calculate distances between vectors, I used cosine similarity (a score of 1 indicates that the vectors point in the same direction; a score of 0 indicates that the vectors are orthogonal, meaning they have no directional similarity).

You can find the JavaScript function I used here: 

And the piece of code that I use to split the markdown snippets file: 

Warning: embedding models are limited by the size of the text chunks they can ingest. So you have to be careful not to exceed this size when splitting the source file. And in some cases, you’ll have to change the splitting strategy (fixed-size chunks, for example, with or without overlap).
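As a rough illustration of that alternative strategy, here is a hypothetical fixed-size splitter with overlap. Sizes are counted in characters for simplicity; a real implementation would count tokens against the embedding model’s limit:

```javascript
// Split text into fixed-size chunks; consecutive chunks share `overlap` characters
function splitFixedSize(text, chunkSize, overlap) {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end of the text
  }
  return chunks;
}

// splitFixedSize("abcdefghij", 4, 2) yields ["abcd", "cdef", "efgh", "ghij"]
```

The overlap keeps a sentence that straddles a chunk boundary from being lost to the similarity search, at the cost of some duplicated text in the store.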

Implementation and results, or creating my Golang expert agent

Now that we have the operating principle, let’s see how to put it into practice with LangchainJS, Docker Model Runner, and Docker Agentic Compose.

Docker Agentic Compose configuration

Let’s start with the Docker Agentic Compose project structure:

services:
  golang-expert:
    build:
      context: .
      dockerfile: Dockerfile
    environment:
      TERM: xterm-256color

      HISTORY_MESSAGES: 2
      MAX_SIMILARITIES: 3
      COSINE_LIMIT: 0.45

      OPTION_TEMPERATURE: 0.0
      OPTION_TOP_P: 0.75
      OPTION_PRESENCE_PENALTY: 2.2

      CONTENT_PATH: /app/data

    volumes:
      - ./data:/app/data

    stdin_open: true # docker run -i
    tty: true        # docker run -t

    configs:
      - source: system.instructions.md
        target: /app/system.instructions.md

    models:
      chat-model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_LLM_CHAT

      embedding-model:
        endpoint_var: MODEL_RUNNER_BASE_URL
        model_var: MODEL_RUNNER_LLM_EMBEDDING

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

configs:
  system.instructions.md:
    content: |
      Your name is Bob (the original replicant).
      You are an expert programming assistant in Golang.
      You write clean, efficient, and well-documented code.
      Always:
      - Provide complete, working code
      - Include error handling
      - Add helpful comments
      - Follow best practices for the language
      - Explain your approach briefly

      Use only the information available in the provided data and your KNOWLEDGE BASE.

What’s important here is:

I only keep the last 2 messages in my conversation history, and I select at most the 3 best similarities found (to limit the size of the user prompt):

HISTORY_MESSAGES: 2
MAX_SIMILARITIES: 3
COSINE_LIMIT: 0.45

You can adjust these values according to your use case and your language model’s capabilities.

The models section, where I define the language models I’m going to use:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

One of the advantages of this section is that it will allow Docker Compose to download the models if they’re not already present on your machine.

There is also the models section of the golang-expert service, where I map the environment variables to the models defined above:

models:
  chat-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_CHAT

  embedding-model:
    endpoint_var: MODEL_RUNNER_BASE_URL
    model_var: MODEL_RUNNER_LLM_EMBEDDING

And finally, the system instructions configuration file:

configs:
  - source: system.instructions.md
    target: /app/system.instructions.md

Which I define a bit further down in the configs section:

configs:
  system.instructions.md:
    content: |
      Your name is Bob (the original replicant).
      You are an expert programming assistant in Golang.
      You write clean, efficient, and well-documented code.
      Always:
      - Provide complete, working code
      - Include error handling
      - Add helpful comments
      - Follow best practices for the language
      - Explain your approach briefly

      Use only the information available in the provided data and your KNOWLEDGE BASE.

You can, of course, adapt these system instructions to your use case. And also persist them in a separate file if you prefer.

Dockerfile

It’s rather simple:

FROM node:22.19.0-trixie

WORKDIR /app
COPY package*.json ./
RUN npm install
COPY *.js .

# Create non-root user
RUN groupadd --gid 1001 nodejs && \
    useradd --uid 1001 --gid nodejs --shell /bin/bash --create-home bob-loves-js

# Change ownership of the app directory
RUN chown -R bob-loves-js:nodejs /app

# Switch to non-root user
USER bob-loves-js

Now that the configuration is in place, let’s move on to the agent’s source code.

Golang expert agent source code, a bit of LangchainJS with RAG

The JavaScript code is rather simple (probably improvable, but functional) and follows these main steps:

1. Initial configuration

Connection to both models (chat and embeddings) via LangchainJS

Loading parameters from environment variables

2. Vector database creation (at startup)

Reading the snippets.md file

Splitting into sections (chunks)

Generating an embedding for each section

Storing in an in-memory vector database

3. Interactive conversation loop

The user asks a question

Creating an embedding of the question

Similarity search in the vector database to find the most relevant snippets

Construction of the final prompt with: history + system instructions + relevant snippets + question

Sending to the LLM and displaying the response in streaming

Updating the history (limited to the last N messages)

import { ChatOpenAI } from "@langchain/openai";
import { OpenAIEmbeddings } from '@langchain/openai';

import { splitMarkdownBySections } from './chunks.js'
import { VectorRecord, MemoryVectorStore } from './rag.js';

import prompts from "prompts";
import fs from 'fs';

// Define [CHAT MODEL] Connection
const chatModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CHAT || `ai/qwen2.5:latest`,
  apiKey: "",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: parseFloat(process.env.OPTION_TEMPERATURE) || 0.0,
  top_p: parseFloat(process.env.OPTION_TOP_P) || 0.5,
  presencePenalty: parseFloat(process.env.OPTION_PRESENCE_PENALTY) || 2.2,
});

// Define [EMBEDDINGS MODEL] Connection
const embeddingsModel = new OpenAIEmbeddings({
  model: process.env.MODEL_RUNNER_LLM_EMBEDDING || "ai/embeddinggemma:latest",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
    apiKey: ""
  }
})

const maxSimilarities = parseInt(process.env.MAX_SIMILARITIES) || 3
const cosineLimit = parseFloat(process.env.COSINE_LIMIT) || 0.45

// ----------------------------------------------------------------
// Create the embeddings and the vector store from the content file
// ----------------------------------------------------------------

console.log("========================================================")
console.log(" Embeddings model:", embeddingsModel.model)
console.log(" Creating embeddings…")
let contentPath = process.env.CONTENT_PATH || "./data"

const store = new MemoryVectorStore();

let contentFromFile = fs.readFileSync(contentPath + "/snippets.md", 'utf8');
let chunks = splitMarkdownBySections(contentFromFile);
console.log(" Number of documents read from file:", chunks.length);

// -------------------------------------------------
// Create and save the embeddings in the memory vector store
// -------------------------------------------------
console.log(" Creating the embeddings…");

for (const chunk of chunks) {
  try {
    // EMBEDDING COMPLETION:
    const chunkEmbedding = await embeddingsModel.embedQuery(chunk);
    const vectorRecord = new VectorRecord('', chunk, chunkEmbedding);
    store.save(vectorRecord);

  } catch (error) {
    console.error(`Error processing chunk:`, error);
  }
}

console.log(" Embeddings created, total of records", store.records.size);
console.log();

console.log("========================================================")

// Load the system instructions from a file
let systemInstructions = fs.readFileSync('/app/system.instructions.md', 'utf8');

// ----------------------------------------------------------------
// HISTORY: Initialize a Map to store conversations by session
// ----------------------------------------------------------------
const conversationMemory = new Map()

let exit = false;

// CHAT LOOP:
while (!exit) {
  const { userMessage } = await prompts({
    type: "text",
    name: "userMessage",
    message: `Your question (${chatModel.model}): `,
    validate: (value) => (value ? true : "Question cannot be empty"),
  });

  if (userMessage == "/bye") {
    console.log(" See you later!");
    exit = true;
    continue
  }

  // HISTORY: Get the conversation history for this session
  const history = getConversationHistory("default-session-id")

  // ----------------------------------------------------------------
  // SIMILARITY SEARCH:
  // ----------------------------------------------------------------
  // -------------------------------------------------
  // Create embedding from the user question
  // -------------------------------------------------
  const userQuestionEmbedding = await embeddingsModel.embedQuery(userMessage);

  // -------------------------------------------------
  // Use the vector store to find similar chunks
  // -------------------------------------------------
  // Create a vector record from the user embedding
  const embeddingFromUserQuestion = new VectorRecord('', '', userQuestionEmbedding);

  const similarities = store.searchTopNSimilarities(embeddingFromUserQuestion, cosineLimit, maxSimilarities);

  let knowledgeBase = "KNOWLEDGE BASE:\n";

  for (const similarity of similarities) {
    console.log(" CosineSimilarity:", similarity.cosineSimilarity, "Chunk:", similarity.prompt);
    knowledgeBase += `${similarity.prompt}\n`;
  }

  console.log("\n Similarities found, total of records", similarities.length);
  console.log();
  console.log("========================================================")
  console.log()

  // -------------------------------------------------
  // Generate CHAT COMPLETION:
  // -------------------------------------------------

  // MESSAGES == PROMPT CONSTRUCTION:
  let messages = [
    ...history,
    ["system", systemInstructions],
    ["system", knowledgeBase],
    ["user", userMessage]
  ]

  let assistantResponse = ''
  // STREAMING COMPLETION:
  const stream = await chatModel.stream(messages);
  for await (const chunk of stream) {
    assistantResponse += chunk.content
    process.stdout.write(chunk.content);
  }
  console.log("\n");

  // HISTORY: Add both user message and assistant response to history
  addToHistory("default-session-id", "user", userMessage)
  addToHistory("default-session-id", "assistant", assistantResponse)

}

// Helper function to get or create a conversation history
function getConversationHistory(sessionId, maxTurns = parseInt(process.env.HISTORY_MESSAGES)) {
  if (!conversationMemory.has(sessionId)) {
    conversationMemory.set(sessionId, [])
  }
  return conversationMemory.get(sessionId)
}

// Helper function to add a message to the conversation history
function addToHistory(sessionId, role, content) {
  const history = getConversationHistory(sessionId)
  history.push([role, content])

  // Keep only the last maxTurns conversations
  const maxTurns = parseInt(process.env.HISTORY_MESSAGES) // Adjust this value based on your needs
  if (history.length > maxTurns * 2) { // *2 because each turn has user & assistant message
    history.splice(0, 2) // Remove oldest turn (user + assistant messages)
  }
}

You can find it here: 

All that’s left is to launch everything and verify my hypotheses.

In the project folder, run the following command:

docker compose up --build --no-log-prefix -d

Then connect to the container and launch the application:

docker compose exec golang-expert /bin/bash
node index.js

Nova Chat Agent with streaming completion

Once the application has started, it’s time to ask our Golang expert agent our question about developing with the Nova library:

========================================================
Embeddings model: ai/embeddinggemma:latest
Creating embeddings…
Number of documents read from file: 4
Creating the embeddings…
Embeddings created, total of records 4

========================================================
? Your question (hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m): › I need a code snippet of a Golang Nova Chat agent using a stream completion

After a quick similarity search, our agent found two relevant snippets in the vector database. And it was able to provide me with complete, functional, and commented code to answer my request:

Certainly! Below is a complete, working Go code snippet for a Nova Chat agent that uses streaming completion. This example demonstrates how to set up an agent with the necessary configurations and generate a stream completion response.

```go
package main

import (
	"context"
	"fmt"

	"github.com/snipwise/nova/nova-sdk/agents"
	"github.com/snipwise/nova/nova-sdk/agents/chat"
	"github.com/snipwise/nova/nova-sdk/messages"
	"github.com/snipwise/nova/nova-sdk/messages/roles"
	"github.com/snipwise/nova/nova-sdk/models"
)

func main() {

	ctx := context.Background()

	// Create a new chat agent with streaming capabilities
	agent, err := chat.NewAgent(
		ctx,
		agents.Config{
			EngineURL:               "http://localhost:12434/engines/llama.cpp/v1",
			SystemInstructions:      "You are Bob, a helpful AI assistant.",
			KeepConversationHistory: true,
		},
		models.Config{
			Name:        "ai/qwen2.5:1.5B-F16",
			Temperature: models.Float64(0.8),
		},
	)
	if err != nil {
		panic(err)
	}

	// Generate a stream completion response
	result, err := agent.GenerateStreamCompletion(
		[]messages.Message{
			{Role: roles.User, Content: "Who is James T Kirk?"},
		},
		func(chunk string, finishReason string) error {
			// Simple callback that receives strings only
			if chunk != "" {
				fmt.Print(chunk)
			}
			if finishReason == "stop" {
				fmt.Println()
			}
			return nil
		},
	)
	if err != nil {
		panic(err)
	}

	// Print the final response and finish reason
	fmt.Println("Final Response:", result.Response)
	fmt.Println("Finish Reason:", result.FinishReason)
}
```

### Explanation:

1. **Imports**: The code imports necessary packages from the Nova SDK to interact with chat agents.

2. **Context**: A context is created using `context.Background()` to manage the lifecycle of requests.

3. **Agent Creation**:
– An agent is created using `chat.NewAgent` with specified configurations:
– `EngineURL`: The URL of the engine to use for generating text.
– `SystemInstructions`: Instructions for the AI assistant, such as who Bob is.
– `KeepConversationHistory`: Whether to keep a history of previous messages.

4. **Model Configuration**:
– A model configuration is provided with:
– `Name`: The name of the model to use.
– `Temperature`: Controls the randomness of the output (higher values make responses more random).
– `MaxTokens`: Limits the maximum number of tokens in the response.

5. **Stream Completion Generation**:
– The `GenerateStreamCompletion` method is called with a list of messages and a callback function that processes each chunk of text as it arrives.
– The callback prints chunks to the console until the finish reason is "stop", indicating the completion has been reached.

6. **Output**:
– The final response and finish reason are printed after the stream completes.

This code snippet demonstrates how to set up a streaming chat agent using Nova, allowing you to receive text responses incrementally as they become available.

I then asked other questions.

Nova Structured Agent

This time, I asked for a structured agent: “I need a code snippet of a Nova Structured Agent.”

And the result was conclusive:

Extract from the response:

Certainly! Below is a complete, working example of a Nova Structured Agent in Go. This agent will generate structured data about a specified country based on the user's input.

```go
package main

import (
	"context"
	"fmt"
	"strings"

	"github.com/snipwise/nova/nova-sdk/agents"
	"github.com/snipwise/nova/nova-sdk/agents/structured"
	"github.com/snipwise/nova/nova-sdk/messages"
	"github.com/snipwise/nova/nova-sdk/messages/roles"
	"github.com/snipwise/nova/nova-sdk/models"
)

type Country struct {
	Name       string   `json:"name"`
	Capital    string   `json:"capital"`
	Population int      `json:"population"`
	Languages  []string `json:"languages"`
}

func main() {
	ctx := context.Background()

	// Create a new Structured Agent
	agent, err := structured.NewAgent[Country](
		ctx,
		agents.Config{
			EngineURL: "http://localhost:12434/engines/llama.cpp/v1",
			SystemInstructions: `
			Your name is Bob.
			You are an assistant that answers questions about countries around the world.
			`,
		},
		models.Config{
			Name:        "hf.co/menlo/jan-nano-gguf:q4_k_m",
			Temperature: models.Float64(0.0),
		},
	)
	if err != nil {
		panic(err)
	}

	// Generate structured data about Canada
	response, _, err := agent.GenerateStructuredData([]messages.Message{
		{Role: roles.User, Content: "Tell me about Canada."},
	})

	if err != nil {
		panic(err)
	}

	// Print the generated country information
	fmt.Println("Country Name:", response.Name)
	fmt.Println("Capital:", response.Capital)
	fmt.Println("Population:", response.Population)
	fmt.Println("Languages:", strings.Join(response.Languages, ", "))
}
```

### Explanation:

1. **Imports**: The code imports necessary packages from the Nova SDK to interact with agents and models.

2. **Country Struct**: A simple struct `Country` is defined to represent the data structure expected by the agent.

3. **Main Function**:
– A context is created using `context.Background()`.
– A new Structured Agent is created for generating structured data about countries.
– The system instructions are set to provide a friendly greeting and instruction on answering questions about countries.
– The model configuration specifies the engine URL, temperature, and other parameters.

4. **GenerateStructuredData Method**: This method generates structured data based on the user's input. It takes an array of messages as input, where each message has a role (e.g., `User`) and content (e.g., "Tell me about Canada").

5. **Error Handling**: The code includes error handling to ensure that any issues during agent creation or data generation are properly caught and reported.

6. **Output**: The generated country information is printed, including the name, capital, population, and languages.

This example demonstrates how to use a Nova Structured Agent to generate structured data about countries based on user input. You can modify the system instructions and model configuration as needed for your specific use case.

Then, I moved on to the next question.

Nova RAG Agent with vector store. Be careful with your similarity settings!

This time, I asked for a “RAG” agent: “I need a snippet of a Nova RAG agent with a vector store.”

And once again, I got a relevant response.

However, when I tried with this question (after restarting the agent to start from a clean base without conversation history): “I need a snippet of a Nova RAG agent.”

The similarity search returned no relevant results (because the words “vector store” were not present in the snippets). And the agent responded with generic code that had nothing to do with Nova, or reused code from the Nova Chat Agent snippets.

There may be several possible reasons:

The embedding model is not suitable for my use case,

The embedding model is not precise enough,

The splitting of the code snippets file is not optimal (you can add metadata to chunks to improve similarity search, for example, but don’t forget that chunks must not exceed the maximum size that the embedding model can ingest).

In that case, there’s a simple solution that works quite well: you lower the similarity thresholds and/or increase the number of returned similarities. This allows you to have more results to construct the user prompt, but be careful not to exceed the maximum context size of the language model. And you can also do tests with other “bigger” LLMs (more parameters and/or larger context window).
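To see why loosening the threshold helps, here is a minimal, self-contained sketch of threshold-based similarity search. The vectors and chunks are toy values, and the plain cosine function stands in for a real embedding model: a strict threshold can filter out every candidate, while a lower threshold with a higher k surfaces near-matches.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, chunks, threshold=0.8, k=3):
    """Return up to k (score, text) pairs whose similarity meets the threshold."""
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in chunks),
        reverse=True,
    )
    return [(round(s, 2), t) for s, t in scored[:k] if s >= threshold]

# Toy chunk store: (text, embedding) pairs with made-up vectors.
chunks = [
    ("Nova RAG agent with vector store", [0.9, 0.1, 0.3]),
    ("Nova chat agent basics",           [0.6, 0.4, 0.2]),
    ("Unrelated deployment notes",       [0.1, 0.9, 0.1]),
]
query = [0.8, 0.2, 0.3]

strict  = search(query, chunks, threshold=0.995, k=1)  # too strict: nothing passes
relaxed = search(query, chunks, threshold=0.80, k=3)   # more candidates to build the prompt
```

With the relaxed settings, two chunks clear the bar instead of zero, which is exactly the trade-off described above: more material for the user prompt, at the cost of having to watch the chat model's context size.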

In the latest version of the snippets file, I added a KEYWORDS: … line below the markdown titles to help with similarity search, which greatly improved the results.

Conclusion

Using “Small Language Models” (SLM) or “Tiny Language Models” (TLM) requires a bit of energy and thought to work around their limitations. But it’s possible to build effective solutions for very specific problems. And once again, always think about the context size for the chat model and how you’ll structure the information for the embedding model. And by combining several specialized “small agents”, you can achieve very interesting results. This will be the subject of future articles.

Learn more

Check out Docker Model Runner

Learn more about Docker Agentic Compose

Read more about embedding in our recent blog Run Embedding Models and Unlock Semantic Search with Docker Model Runner

Quelle: https://blog.docker.com/feed/

OpenCode with Docker Model Runner for Private AI Coding

AI-powered coding assistants are becoming a core part of modern development workflows. At the same time, many teams are increasingly concerned about where their code goes, how it’s processed, and who has access to it.

By combining OpenCode with Docker Model Runner, you can build a powerful AI-assisted coding experience while keeping full control over your data, infrastructure and spend.

This post walks through how to configure OpenCode to use Docker Model Runner and explains why this setup enables a privacy-first and cost-aware approach to AI-assisted development.

What Are OpenCode and Docker Model Runner?

OpenCode is an open-source coding assistant designed to integrate directly into developer workflows. It supports multiple model providers and exposes a flexible configuration system that makes it easy to switch between them.

Docker Model Runner (DMR) allows you to run and manage large language models easily. It exposes an OpenAI-compatible API, making it straightforward to integrate with existing tools that already support OpenAI-style endpoints.

Together, they provide a familiar developer experience backed by models running entirely within infrastructure you control.

Modifying the OpenCode Configuration

OpenCode can be customized using a configuration file that controls how providers and models are defined.

You can define this configuration in one of two places:

Global configuration: ~/.config/opencode/opencode.json

Project-specific configuration: opencode.json in the root of your project

When a project-level configuration is present, it takes precedence over the global one.
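That precedence can be illustrated with a small sketch. The resolve_opencode_config helper is hypothetical (OpenCode's actual loader may merge keys rather than replace the whole file); it simply picks the first configuration that exists, project before global:

```python
import json
import tempfile
from pathlib import Path

def resolve_opencode_config(project_root, home):
    """Return the project-level opencode.json if present, else the global one.

    Illustrative only: assumes a whole-file override, not a key-by-key merge.
    """
    candidates = (
        Path(project_root) / "opencode.json",
        Path(home) / ".config" / "opencode" / "opencode.json",
    )
    for candidate in candidates:
        if candidate.exists():
            return json.loads(candidate.read_text())
    return {}

# Demo: both files exist, so the project-level file wins.
root = Path(tempfile.mkdtemp())
home = Path(tempfile.mkdtemp())
(home / ".config" / "opencode").mkdir(parents=True)
(home / ".config" / "opencode" / "opencode.json").write_text('{"scope": "global"}')
(root / "opencode.json").write_text('{"scope": "project"}')

cfg = resolve_opencode_config(root, home)
```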

Using OpenCode with Docker Model Runner

Docker Model Runner (DMR) exposes an OpenAI-compatible API, which makes integrating it with OpenCode straightforward. To enable this integration, you simply need to update your opencode.json file to point to the DMR server and declare the locally available models.

Assuming Docker Model Runner is running at: http://localhost:12434/v1

your opencode.json configuration could look like this:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12434/v1"
      },
      "models": {
        "qwen3-coder": {
          "name": "qwen3-coder"
        },
        "devstral-small-2": {
          "name": "devstral-small-2"
        }
      }
    }
  }
}

This configuration allows OpenCode to use locally hosted models through DMR, providing a powerful and private coding assistant.

Note for Docker Desktop users:

If you are running Docker Model Runner via Docker Desktop, make sure TCP access is enabled. OpenCode connects to Docker Model Runner over HTTP, which requires the TCP port to be exposed:

docker desktop enable model-runner --tcp

Once enabled, Docker Model Runner will be accessible at http://localhost:12434/v1.
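It can also help to sanity-check opencode.json itself, since the file must be strict JSON (no trailing commas). A minimal sketch that builds the configuration in Python and round-trips it through the standard json module to catch formatting mistakes early:

```python
import json

# The same provider configuration as above, expressed as a Python dict.
config = {
    "$schema": "https://opencode.ai/config.json",
    "provider": {
        "dmr": {
            "npm": "@ai-sdk/openai-compatible",
            "name": "Docker Model Runner",
            "options": {"baseURL": "http://localhost:12434/v1"},
            "models": {
                "qwen3-coder": {"name": "qwen3-coder"},
                "devstral-small-2": {"name": "devstral-small-2"},
            },
        }
    },
}

# Round-trip through strict JSON: json.dumps never emits trailing commas,
# and json.loads rejects them, so the result is safe to write to opencode.json.
text = json.dumps(config, indent=2)
parsed = json.loads(text)
dmr = parsed["provider"]["dmr"]
```

Writing `text` out to your project's opencode.json guarantees a syntactically valid configuration.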

Figure 1: Enabling OpenCode to utilize locally hosted models through Docker Model Runner

Figure 2: Models like qwen3-coder, devstral-small-2, and gpt-oss are good for coding use cases.

Benefits of using OpenCode with Model Runner

Privacy by Design

Using OpenCode with Docker Model Runner enables a privacy-first approach to AI-assisted development by keeping all model inference within the infrastructure you control.

Docker Model Runner runs models behind an OpenAI-compatible API endpoint. OpenCode sends prompts, source code, and context only to that endpoint, and nowhere else.

This means:

No third-party AI providers are involved

No external data sharing or vendor-side retention

No training on your code by external services

From OpenCode’s perspective, the provider is simply an API endpoint. Where that endpoint runs, on a developer machine, an internal server, or a private cloud, is entirely up to you.

Cost Control

Beyond privacy, running models with Docker Model Runner provides a significant cost advantage over hosted AI APIs.

Cloud-hosted coding assistants can become expensive very quickly, especially when:

Working with large repositories

Passing long conversational or code context

Running frequent iterative prompts during development

With Docker Model Runner, inference runs on your own hardware. Once the model is pulled, there are no per-token fees, no request-based pricing, and no surprise bills. Teams can scale usage freely without worrying about escalating API costs.

Recommended Models for Coding

When using OpenCode with Docker Model Runner, model choice has a direct impact on both quality and developer experience. While many general-purpose models might work reasonably well, coding-focused models are optimized for long context windows and code-aware reasoning, which is especially important for real-world repositories.

The following models are well suited for use with OpenCode and Docker Model Runner:

qwen3-coder

devstral-small-2

gpt-oss

Each of these models can be served through Docker Model Runner and exposed via its OpenAI-compatible API.

You can pull these models by simply running:

docker model pull qwen3-coder

Pulling Models from Docker Hub and Hugging Face

Docker Model Runner can pull models not only from Docker Hub, but also directly from Hugging Face and automatically convert them into OCI artifacts that can be run and shared like any other Docker model.

For example, you can pull a model directly from Hugging Face with:

docker model pull huggingface.co/unsloth/Ministral-3-14B-Instruct-2512-GGUF

This gives teams access to the broader open model ecosystem without sacrificing consistency or operability.

Context Length Matters

For coding tasks, context length is often more important than raw parameter count. Large repositories, multi-file refactors, and long conversational histories all benefit from being able to pass more context to the model.

By default:

qwen3-coder → 128K context

devstral-small-2 → 128K context

gpt-oss → 4,096 tokens

The difference comes down to model intent.

qwen3-coder and devstral-small-2 are coding-focused models, designed to ingest large amounts of source code, project structure, and related context in a single request. A large default context window is critical for these use cases.

gpt-oss, on the other hand, is a general-purpose model. Its default context size reflects a broader optimization target, where extremely long inputs are less critical than they are for code-centric workflows.
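A rough way to decide whether a repository fits a given window is a character-count heuristic (roughly 4 characters per token for English text and code; this is an approximation, not a real tokenizer):

```python
def estimated_tokens(text):
    """Very rough heuristic: ~4 characters per token for English text and code."""
    return len(text) // 4

def fits_in_context(files, context_window, reserve_for_reply=2048):
    """Check whether a set of source files plausibly fits in one request,
    leaving room for the model's reply."""
    total = sum(estimated_tokens(body) for body in files.values())
    return total + reserve_for_reply <= context_window

# A hypothetical repo: ~100K characters of source, so roughly 25K tokens.
repo = {"main.go": "x" * 40_000, "handler.go": "x" * 60_000}

small = fits_in_context(repo, 4_096)    # gpt-oss default window
large = fits_in_context(repo, 128_000)  # coding-focused defaults
```

Under this heuristic, the same repository overflows a 4,096-token window but fits comfortably in 128K, which is why the repackaging step below matters for code-centric workflows.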

Increasing Context Size for GPT-OSS

If you want to use gpt-oss for coding tasks that benefit from a larger context window, Docker Model Runner makes it easy to repackage the model with an increased context size.

For example, to create a version of gpt-oss with a 128K context window, you can run:

docker model pull gpt-oss # In case it's not pulled
docker model package --from gpt-oss --context-size 128000 gpt-oss:128K

This creates a new model artifact with an expanded context length that can be served by Docker Model Runner like any other model.

Once packaged, you can reference this model in your opencode.json configuration:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12434/v1"
      },
      "models": {
        "gpt-oss:128K": {
          "name": "gpt-oss (128K)"
        }
      }
    }
  }
}

Sharing Models Across Your Team

Packaging models as OCI Artifacts has an additional benefit: the resulting model can be pushed to Docker Hub or a private registry.

This allows teams to:

Standardize on specific model variants (including context size)

Share models across developers without local reconfiguration

Ensure consistent behavior across environments

Version and roll back model changes explicitly

Instead of each developer tuning models independently, teams can treat models as first-class artifacts, built once and reused everywhere.

Putting It All Together: Using the Model from the CLI

With Docker Model Runner configured and the gpt-oss:128K model packaged, you can start using it immediately from OpenCode.

This section walks through selecting the model and using it to generate an agents.md file directly inside the Docker Model project.

Step 1: Verify the Model Is Available

First, confirm that the packaged model is available locally:

docker model ls

You should see gpt-oss:128K listed among the available models. If not, make sure the packaging step is completed successfully.

Step 2: Configure OpenCode to Use the Model

Ensure your project’s opencode.json includes the packaged model:

{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "dmr": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "Docker Model Runner",
      "options": {
        "baseURL": "http://localhost:12434/v1"
      },
      "models": {
        "gpt-oss:128K": {
          "name": "gpt-oss:128K"
        }
      }
    }
  }
}

This makes the model available to OpenCode under the dmr provider.

Step 3: Start OpenCode in the Project

From the root of the Docker Model project, start OpenCode:

opencode

Select the model from the list by running:

/models

Figure 3: Selecting gpt-oss model powered by Docker Model Runner in OpenCode

Step 4: Ask OpenCode to Generate agents.md

Once OpenCode is running, prompt the model to generate an agents.md file using the repository as context:

Generate an agents.md file in the project root following the agents.md specification and examples.

Use this repository as context and include sections that help an AI agent work effectively with this project, including:
– Project overview
– Build and test commands
– Code style guidelines
– Testing instructions
– Security considerations

Base the content on the actual structure, tooling, and conventions used in this repository.
Keep the file concise, practical, and actionable for an AI agent contributing to the project.

Because OpenCode is connected to Docker Model Runner, it can safely pass repository structure and relevant files to the model without sending any data outside your infrastructure.

The expanded 128K context window allows the model to reason over a larger portion of the project, resulting in a more accurate and useful agents.md.

Figure 4: The resulting agents.md file

Step 5: Review and Contribute to Docker Model Runner

Once the file is generated:

cat agents.md

Make any necessary adjustments so it accurately reflects the project, then commit it like any other project artifact:

git add agents.md
git commit -m "Add agents documentation"

At this point, you’re ready to open your first Docker Model Runner pull request.

Using OpenCode with Docker Model Runner makes it easy to contribute high-quality documentation and project artifacts, while keeping all model inference and repository context within the infrastructure you control.

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there's always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Learn more

Check out the Docker Model Runner General Availability announcement

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Get started with Docker Model Runner with a simple hello GenAI application

Quelle: https://blog.docker.com/feed/

Permission-Aware RAG: End-to-End Testing with the SpiceDB Testcontainer

We use GenAI in every facet of technology now – internal knowledge bases, customer support systems, and code review bots, to name just a few use cases. And in nearly every one of these, someone eventually asks:

“What stops the model from returning something the user shouldn’t see?”

This is a roadblock that companies building RAG features or AI agents eventually hit – the moment where an LLM returns data from a document that the user was not authorized to access, introducing potential legal, financial, and reputational risk to all parties. Unfortunately, traditional methods of authorization are not suited to the hierarchical, dynamic nature of access control in RAG. This is exactly where modern authorization systems such as SpiceDB shine: in building fine-grained authorization for filtering content in your AI-powered applications.

In fact, OpenAI uses SpiceDB to secure 37 billion documents for 5 million users of ChatGPT Connectors – a feature that brings your data from sources such as Google Drive, Dropbox, and GitHub into ChatGPT.

This blog post shows how you can pair SpiceDB with Testcontainers to give you the ability to test your permission logic inside your RAG pipeline, end-to-end, automatically, with zero infrastructure dependencies. The example repo can be found here.

Quick Primer on Authorization

Before diving into implementation, let’s clarify two foundational concepts: Authentication (verifying who a user is) and Authorization (deciding what they can access).

Authorization is commonly implemented via techniques such as:

Access Control Lists (ACLs)

Role-Based Access Control (RBAC)

Attribute-Based Access Control (ABAC)

However, for complex, dynamic, and context-rich applications like RAG pipelines, traditional methods such as RBAC or ABAC fall short. The new kid on the block – ReBAC (Relationship-Based Access Control) is ideal as it models access as a graph of relationships rather than fixed rules, providing the necessary flexibility and scalability required.

ReBAC was popularized in Google Zanzibar, the internal authorization system Google built to manage permissions across all its products (e.g., Google Docs, Drive). Zanzibar systems are optimized for low-latency, high-throughput authorization checks, and global consistency – requirements that are well-suited for RAG systems.

SpiceDB is the most scalable open-source implementation of Google’s Zanzibar authorization model. It stores access as a relationship graph, where the fundamental check reduces to: 

Is this actor allowed to perform this action on this resource?

For a Google Docs-style example:

definition user {}

definition document {
  relation reader: user
  relation writer: user

  permission read = reader + writer
  permission write = writer
}

This schema defines object types (user and document), explicit Relations between the objects (reader, writer), and derived Permissions (read, write). SpiceDB evaluates the relationship graph in microseconds, enabling real-time authorization checks at massive scale.
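The check semantics can be sketched as a toy evaluator over relationship tuples. This is illustrative only — a few lines of Python, not SpiceDB itself, which resolves a full relationship graph including nested and wildcard subjects; the `document:plan` resource and users `alice`/`bob` are made up for the example:

```python
# Relationship tuples: (resource, relation, subject).
relationships = {
    ("document:plan", "reader", "user:alice"),
    ("document:plan", "writer", "user:bob"),
}

# Permission rules from the schema above: read = reader + writer; write = writer.
PERMISSIONS = {"read": {"reader", "writer"}, "write": {"writer"}}

def check(resource, permission, subject):
    """Is this actor allowed to perform this action on this resource?"""
    return any(
        (resource, relation, subject) in relationships
        for relation in PERMISSIONS[permission]
    )
```

Here `alice` can read but not write, while `bob`, as a writer, can do both — exactly the derivation `read = reader + writer` expresses.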

Access Control for RAG 

RAG (Retrieval-Augmented Generation) is an architectural pattern that enhances Large Language Models (LLMs) by letting them consult an external knowledge base, typically involving a Retriever component finding document chunks and the LLM generating an informed response.

This pattern is now used by businesses and enterprises for apps like chatbots that query sensitive data such as customer playbooks or PII – all stored in a vector database for performance. However, the fundamental risk in this flow is data leakage: the Retriever component ignores permissions, and the LLM will happily summarize unauthorized data. In fact, the OWASP Top 10 Risks for Large Language Model Applications include Sensitive Information Disclosure, Excessive Agency, and Vector and Embedding Weaknesses. The consequences of this leakage can be severe, ranging from loss of customer trust to massive financial and reputational damage from compliance violations.

This setup desperately needs fine-grained authorization, and that's where SpiceDB comes in. SpiceDB can post-filter retrieved documents by performing real-time authorization checks, ensuring the model only uses data the querying user is permitted to see. The only requirement is that the documents have metadata indicating where the information came from.

But testing this critical permission logic without mocks, manual Docker setup, or flaky Continuous Integration (CI) environments is tricky. Testcontainers provides the perfect solution, allowing you to spin up a real, production-grade, and disposable SpiceDB instance inside your unit tests to deterministically verify that your RAG pipeline respects permissions end-to-end.

Spin Up Real Authorization for Every Test

Instead of mocking your authorization system or manually running it on your workstation, you can add this line of code in your test:

container, _ := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")

And Testcontainers will:

Pull the real SpiceDB image

Start it in a clean, isolated environment

Assign it dynamic ports

Wait for it to be ready

Hand you the gRPC endpoint

Clean up afterwards

Because Testcontainers handles the full lifecycle – from pulling the container, exposing dynamic ports, and tearing it down automatically, you eliminate manual processes such as running Docker commands, and writing cleanup scripts. This isolation ensures that every single test runs with a fresh, clean authorization graph, preventing data conflicts, and making your permission tests completely reproducible in your IDE and across parallel Continuous Integration (CI) builds.

Suddenly you have a real, production-grade, Zanzibar-style permissions engine inside your unit test. 

Using SpiceDB & Testcontainers

Here’s a walkthrough of how you can achieve end-to-end permissions testing using SpiceDB and Testcontainers. The source code for this tutorial can be found here.

1. Testing Our RAG 

For the sake of simplicity, we use a minimal RAG pipeline with a trivial retrieval mechanism.

We're going to test three documents whose doc_ids (doc1, doc2, …) act as metadata.

doc1: Internal roadmap

doc2: Customer playbook

doc3: Public FAQ

And three users:

Emilia owns doc1

Beatrice can view doc2

Charlie (or anyone) can view doc3

This SpiceDB schema defines a user and a document object type. A user has read permission on a document if they are the direct viewer or the owner of the document.

definition user {}

definition document {
  relation owner: user
  relation viewer: user
  permission read = owner + viewer
}

2. Starting the Testcontainer 

Here’s how a line of code can start a test to launch the disposable SpiceDB instance:

container, err := spicedbcontainer.Run(ctx, "authzed/spicedb:v1.47.1")
require.NoError(t, err)

Next, we connect to the running containerized service:

host, _ := container.Host(ctx)
port, _ := container.MappedPort(ctx, "50051/tcp")
endpoint := fmt.Sprintf("%s:%s", host, port.Port())

client, err := authzed.NewClient(
    endpoint,
    grpc.WithTransportCredentials(insecure.NewCredentials()),
    grpcutil.WithInsecureBearerToken("somepresharedkey"),
)

This is now a fully-functional SpiceDB instance running inside your test runner.

3. Load the Schema + Test Data

The test seeds data the same way your application would:

_, err := client.WriteSchema(ctx, &apiv1.WriteSchemaRequest{Schema: schema})
require.NoError(t, err)

Then:

rel("document", "doc1", "owner", "user", "emilia")
rel("document", "doc2", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "emilia")
rel("document", "doc3", "viewer", "user", "beatrice")
rel("document", "doc3", "viewer", "user", "charlie")

We now have a predictable, reproducible authorization graph for every test run.

4. Post-Filtering With SpiceDB

Before the LLM sees anything, we check permissions with SpiceDB, which acts as the source of truth for document permissions.

resp, err := r.spiceClient.CheckPermission(ctx, &apiv1.CheckPermissionRequest{
    Resource:   docObject,
    Permission: "read",
    Subject:    userSubject,
})

If SpiceDB says no, the doc is never fed into the LLM, ensuring the user's answer is based only on what they have permission to read.

This avoids:

Accidental data leakage

Overly permissive vector search

Compliance problems

Traditional access controls break down once data becomes embeddings; guardrails like this are what prevent leakage at the retrieval layer.
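The post-filtering step itself is just a loop over retrieved chunks. Here is a hedged sketch in Python, with a hard-coded stand-in for the SpiceDB check; in the real pipeline, allowed would call CheckPermission against the running SpiceDB instance instead of consulting a local set:

```python
def allowed(user, doc_id):
    """Stand-in for a SpiceDB CheckPermission call
    (resource=document:doc_id, permission=read, subject=user:user).
    Hard-coded here to mirror the test relationships above."""
    grants = {("emilia", "doc1"), ("beatrice", "doc2"),
              ("emilia", "doc3"), ("beatrice", "doc3"), ("charlie", "doc3")}
    return (user, doc_id) in grants

def post_filter(user, retrieved):
    """Drop every retrieved chunk the user cannot read, before the LLM sees it."""
    return [chunk for chunk in retrieved if allowed(user, chunk["doc_id"])]

# Chunks the (permission-unaware) retriever returned for some query.
hits = [{"doc_id": "doc1", "text": "roadmap"},
        {"doc_id": "doc2", "text": "playbook"},
        {"doc_id": "doc3", "text": "faq"}]
```

Calling post_filter("charlie", hits) leaves only the public doc3, so the LLM never sees the roadmap or the playbook on Charlie's behalf.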

End-to-End Permission Checks in a Single Test

Here’s what the full test asserts:

Emilia queries "roadmap" → gets doc1, because she's the owner.

Beatrice queries "playbook" → gets doc2, because she's a viewer.

Charlie queries "public" → gets doc3, because it's the only doc he can read, as it's public.

If there is a single failing permission rule, the end-to-end test will immediately fail, which is critical given the constant changes in RAG pipelines (such as new retrieval modes, embeddings, document types, or permission rules). 

What If Your RAG Pipeline Isn’t in Go?

First, a shoutout to Guillermo Mariscal for his original contribution to the SpiceDB Go Testcontainers module. 

What if your RAG pipeline is written in a different language such as Python? Not to worry, there’s also a community Testcontainers module written in Python that you can use similarly. The module can be found here.

Typically, you would integrate it in your integration tests like this:

# Your RAG pipeline test
def test_rag_pipeline_respects_permissions():
    with SpiceDBContainer() as spicedb:
        # Set up permissions schema
        client = create_spicedb_client(
            spicedb.get_endpoint(),
            spicedb.get_secret_key()
        )

        # Load your permissions model
        client.WriteSchema(your_document_permission_schema)

        # Write test relationships
        # User A can access Doc 1
        # User B can access Doc 2

        # Test RAG pipeline with User A
        results = rag_pipeline.search(query="…", user="A")
        assert "Doc 1" in results
        assert "Doc 2" not in results  # Should be filtered out!
Similar to the Go module, this container gives you a clean, isolated SpiceDB instance for every test run.

Why This Approach Matters

Authorization testing in RAG pipelines can be tricky, given the scale and latency requirements, and it gets trickier in systems handling sensitive data. By integrating the flexibility and scale of SpiceDB with the automated, isolated environments of Testcontainers, you shift to a completely reliable, deterministic approach to authorization.

Every time your code ships, a fresh, production-grade authorization engine is spun up, loaded with test data, and torn down cleanly, guaranteeing zero drift between your development machine and CI. This pattern can ensure that your RAG system is safe, correct, and permission-aware as it scales from three documents to millions.

Try It Yourself

The complete working example in Go, along with a sample RAG pipeline, is here: https://github.com/sohanmaheshwar/spicedb-testcontainer-rag

Clone it, run go test -v, and watch it spin up a fresh SpiceDB instance, load permissions, and assert RAG behavior. You can also find the community modules for the SpiceDB testcontainer in Go and Python.

Quelle: https://blog.docker.com/feed/

Safer Docker Hub Pulls via a Sonatype-Protected Proxy

Why a “protected repo”?

Modern teams depend on public container images, yet most environments lack a single, auditable control point for what gets pulled and when. This often leads to three operational challenges:

Inconsistent or improvised base images that drift across teams and pipelines.

Exposure to new CVEs when tags remain unchanged but upstream content does not.

Unreliable workflows due to rate limiting, throttling, or pull interruptions.

A protected repository addresses these challenges by evaluating images at the boundary between public sources and internal systems, ensuring only trusted content is available to the build process. Routing upstream pulls through a Nexus Repository Docker proxy that authenticates to Docker Hub and caches approved layers creates a security and reliability checkpoint. Repository Firewall inspects image layers and their components against configured policies and enforces the appropriate action, such as allow, quarantine, or block, based on the findings. This gives teams a standard, dependable entry point for base images. Approved content is cached to accelerate subsequent pulls, while malware and high-severity vulnerabilities are blocked before any layer reaches the developer's environment.

Combining this workflow with curated sources such as Docker Official Images or Docker Hardened Images provides a stable, vetted baseline for the entire organization.

Docker Hub authentication (PAT/OAT) quick setup

Before configuring a Nexus Docker proxy, set up authenticated access to Docker Hub. Authentication prevents anonymous-pull rate limits and ensures that shared systems do not rely on personal developer credentials. Docker Hub supports two types of access tokens, and for proxies or CI/CD systems the recommended option is an Organization Access Token (OAT).

Choose the appropriate token type

Personal Access Token (PAT): Use a PAT when authentication is tied to an individual account, such as local development or small teams.

Tied to a single user account

Required for CLI logins when the user enables two-factor authentication

Not recommended for shared infrastructure

Organization Access Token (OAT) (recommended): Use an OAT when authentication is needed for systems that serve multiple users or teams.

Associated with an organization rather than an individual

Suitable for CI/CD systems, build infrastructure, and Nexus Docker proxies

Compatible with SSO and 2FA enforcement

Supports granular permissions and revocation

Requires a Docker Hub Team or Business plan

Create an access token

To create a Personal Access Token (PAT):

Open Docker Hub account settings (click on your Docker Hub avatar in the top-right corner).

Select “Personal access tokens”.

Click on “Generate new token”.

Define token Name, Expiration and Access permissions.

Choose “Generate” and save the value immediately, as it cannot be viewed again.

To create an Organization Access Token (OAT):

Sign in to Docker Home and select your organization.

Select Admin Console, then Access tokens.

Select Generate access token.

Expand the Repository drop-down and assign only the required permissions, typically read/pull for proxies or CI systems.

Select Generate token. Copy the token that appears on the screen and save it. You won’t be able to retrieve the token once you exit the screen.

Recommended practices

Scope tokens to the minimum necessary permissions

Rotate tokens periodically

Revoke tokens immediately if they are exposed

Monitor last-used timestamps to confirm expected usage patterns

Step-by-step: create a Docker Hub proxy

The next step after configuring authentication is to make your protected repo operational by turning Nexus into your organization's Docker Hub proxy. A Docker proxy repository in Nexus Repository provides a single, policy-enforced registry endpoint that performs upstream pulls on behalf of developers and CI, caches layers locally for faster and more reliable builds, and centralizes access and audit trails so teams can manage credentials and image usage from one place.

To create the proxy:

As an administrator, navigate to the Settings view (gear icon).

Open Repositories and select Create repository.

Choose docker (proxy) as the repository type.

Configure the following settings:

Remote storage: https://registry-1.docker.io

Docker V1 API: Enabled

Index type: Select “Use Docker Hub”

Blob store and network settings as appropriate for your environment

Save the repository to finalize the configuration.

Provide a Clean Pull Endpoint

To keep developer workflows simple, expose the proxy at a stable, organization-wide hostname. This avoids custom ports or per-team configurations and makes the proxy a transparent drop-in replacement for direct Docker Hub pulls. Common examples include:

docker-proxy.company.com

hub.company.internal

Use a reverse proxy or ingress controller to route this hostname to the Nexus proxy repository.

Validate Connectivity

Once the proxy is exposed, verify that it responds correctly and can authenticate to Docker Hub. Run:

docker login docker-proxy.company.com
docker pull docker-proxy.company.com/dhi/node:24

A successful pull confirms that the proxy is functioning correctly, upstream connectivity is working, and authenticated access is in place.

Turn on Repository Firewall for containers

Once the Docker proxy is in place, enable Repository Firewall so images are inspected before they reach internal systems. Repository Firewall enforces policy at download time, stopping malware and high-severity vulnerabilities at the registry edge, reducing the blast radius of newly disclosed issues and cutting remediation work for engineering teams.

To enable Firewall for the proxy repository:

As an administrator, navigate to the Settings view (gear icon).

Navigate to Capabilities under the System menu.

Create a ‘Firewall Audit and Quarantine’ capability for your Docker proxy repository.

Configure your policies to quarantine new violating components and protect against introducing risk.

Inform your development teams of the change to set expectations.

Understanding “Quarantine” vs. “Audit”

Repository Firewall evaluates each image as it is requested:

Quarantine – Images that violate a policy are blocked and isolated. They do not reach the developer or CI system. The user receives clear feedback indicating the reason for the failure.

Audit – Images that pass the policies are served normally and cached. This improves performance and makes the proxy a consistent, reliable source of trusted base images.

Enabling Repository Firewall gives you immediate, download-time protection and the telemetry to operate it confidently. Start with conservative policies (quarantine on malware, and on CVSS ≥ 8), monitor violations and cache hit rate, tune thresholds based on real-world telemetry, and move to stricter block enforcement once false positives are resolved and teams are comfortable with the workflow.
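The conservative starting policy described above can be sketched as a simple decision function. This is illustrative logic only, not Sonatype's actual policy engine, and the inputs (a malware flag and the highest CVSS score found) are assumptions about what a scan would report:

```python
def firewall_decision(malware, max_cvss, cvss_threshold=8.0):
    """Illustrative download-time policy: quarantine on malware, or on any
    component whose CVSS score meets the configured threshold; allow otherwise."""
    if malware or max_cvss >= cvss_threshold:
        return "quarantine"
    return "allow"

# A critical CVE (9.8) or any malware finding is quarantined at the edge;
# a moderate finding (5.5) is served and cached as usual.
decisions = [
    firewall_decision(malware=False, max_cvss=9.8),
    firewall_decision(malware=True,  max_cvss=0.0),
    firewall_decision(malware=False, max_cvss=5.5),
]
```

Tightening the policy later is then just a matter of lowering cvss_threshold once false positives are under control.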

What a blocked pull looks like

After enabling Repository Firewall and configuring your baseline policies, any pull that fails those checks is denied at the registry edge and no image layers are downloaded. By default, Nexus returns a non-descriptive 404 to avoid exposing policy or vulnerability details, though you can surface a short, internal-facing failure message.

As an example, if Firewall is enabled and your CVSS threshold policy is configured correctly, the following pull should fail with a 404:

docker pull docker-proxy.company.com/library/node:20

This confirms that:

The request is passing through the proxy.

Repository Firewall is inspecting the image metadata.

Policy violations are blocked before any image layers are downloaded.

In the Firewall UI, you can open the proxy repository and view the recorded violations. The details can include detected CVEs, severity information, and the policy that triggered the denial. This provides administrators with visibility and confirms that enforcement is functioning as expected.

Additionally, the Quarantined Containers dashboard lists every image that Repository Firewall has blocked, showing the triggering policy and severity so teams can triage with full context. Administrators use this view to review evidence, add remediation notes, and release or delete quarantined items; note that malware is quarantined by default while other violations are quarantined only when their rules are set to Fail at the Proxy stage.

Fix forward: choose an approved base and succeed

Once policy enforcement is validated, the next step is to pull a base image that complies with your organization’s security rules. This shows what the normal developer experience looks like when using approved and trusted content.

Pull a compliant tag through the proxy:

docker pull docker-proxy.company.com/dhi/node:24

This request passes the Repository Firewall checks, and the image is pulled successfully. The proxy caches each layer locally so that future pulls are faster and no longer affected by upstream rate limits or registry availability. If you repeat the pull, the second request is noticeably quicker because it is served directly from the cache. This illustrates the everyday workflow developers should expect: trusted images, predictable performance, and fewer interruptions.

Get started: protect your Docker pulls

A Sonatype-protected Docker proxy gives developers one policy-compliant registry endpoint for image pulls. Layers are cached for speed, policy violations surface with actionable guidance, and teams work with vetted base images using the same Docker CLI workflows they already rely on. When paired with trusted sources such as Docker Hardened Images, this pattern delivers predictable baselines with minimal developer friction.

Ready to try this pattern? Check the following pages:

Sonatype Nexus Repository basic documentation

Integration with Docker Hub

Register for Nexus Repository trial here

Source: https://blog.docker.com/feed/

Security Is a Developer Experience Problem, Rooted in Our Foundations

For more than a decade, the industry has tried to improve software security by pushing it closer to developers. We moved scanners into CI, added security checks to pull requests, and asked teams to respond faster to an ever-growing stream of vulnerabilities. And yet, the underlying problems have not gone away.

The issue is not that developers care too little about security. It is that we keep trying to fix security at the edges, instead of fixing the foundations. Hardened container images change that dynamic by reducing attack surface and eliminating much of the low-signal security noise before it ever reaches development teams.

Security Fails When It Becomes Noise

Most developers I know care deeply about building secure software. What they do not care about is security theater.

The way we handle security issues today, especially CVEs, often creates a steady stream of low-signal work for development teams. Alerts fire constantly. Many are technically valid but practically irrelevant. Others ask developers to patch components they did not choose and do not meaningfully control. Over time, this turns security into background noise.

When that happens, the system has already failed. Developers are forced to context switch, teams burn time debating severity scores, and real risk gets buried alongside issues that do not matter. This is not a motivation problem. It is a system design problem.

The industry responded by trying to “shift left” and push security earlier in the development cycle. In practice, this often meant pushing more work onto developers without giving them better defaults or foundations. The result was more toil, more alerts, and more reasons to tune it all out.

Shifting left was the right instinct but the wrong execution. The goal should not be making developers do more security work. It should be making secure choices the painless, obvious default so developers do less security work while achieving better outcomes.

Why Large Images Were the Default

To understand how we got here, it helps to be honest about why most teams start with large, generic base images.

When Docker launched in 2013, containers were unfamiliar. Developers reached for what they knew: full Linux distributions and familiar Debian or Ubuntu environments with all the debugging tools they relied on. 

Large images that had everything were a rational default. This approach optimized for ease and flexibility. When everything you might ever need is already present, development friction goes down. Builds fail less often. Debugging is simpler. Unknown dependencies are less likely to surprise you at the worst possible time.

For a long time, doing something more secure has required real investment. Teams needed a platform group that could design, harden, and continuously maintain custom base images. That work had to compete with product features and infrastructure priorities. Most organizations never made that tradeoff, and that decision was understandable.

So the industry converged on a familiar pattern. Start with a big image. Ship faster in the short term. Deal with the consequences later.

Those consequences compound. Large images dramatically increase the attack surface. They accumulate stale dependencies. They generate endless CVEs that developers are asked to triage long after the original choice was made. What began as a convenience slowly turns into persistent security and operational drag that slows development velocity and software shipments.

Secure Foundations Can Improve Developer Experience

There is a widely held belief that better security requires worse developer experience. In practice, the opposite is often true.

Starting from a secure, purpose-built foundation, like Docker Hardened Images, reduces complexity rather than adding to it. Smaller images contain fewer packages, which means fewer vulnerabilities and fewer alerts. Developers spend less time chasing low-impact CVEs and more time building actual product.

The key is that security is built into the foundation itself. Image contents are explicit and reproducible. Supply chain metadata like signatures, SBOMs, and provenance are part of the image by default, not additional steps developers have to wire together themselves. At the same time, these foundations are easy to customize securely. Teams can extend or tweak their images without undoing the hardening, thanks to predictable layering and supported customization patterns. This eliminates entire categories of hidden dependencies and security toil that would otherwise fall on individual teams.

There are also tangible performance benefits. Smaller images pull faster, build faster, and deploy faster. In larger environments, these gains add up quickly.

Importantly, this does not require sacrificing flexibility. Developers can still use rich build environments and familiar tools, while shipping minimal, hardened runtime images into production.

This is one of the rare cases where improving security directly improves developer experience. The tradeoff we have accepted for years is not inevitable.

What Changes When Secure Foundations Are the Default

When secure foundations and hardened images become the default starting point, the system behaves differently. Developers keep using the same Docker workflows they already know. The difference is the base they start from. 

Security hardening, patching, and supply chain hygiene are handled once in the foundation instead of repeatedly in every service. Secure foundations are not limited to operating system base images. The same principles apply to the software teams actually build on top of, such as databases, runtimes, and common services. Starting from a hardened MySQL or application image removes an entire class of security and maintenance work before a single line of application code is written.

This is the problem Docker Hardened Images are designed to address. The same hardening principles are applied consistently across widely used open source container images, not just at the operating system layer, so teams can start from secure defaults wherever their applications actually begin. The goal is not to introduce another security workflow or tool. It is to give developers better building blocks from day one.

Because the foundation is maintained by experts, teams see fewer interruptions. Fewer emergency rebuilds. Fewer organization-wide scrambles when a widely exploited vulnerability appears. Security teams can focus on adoption and posture instead of asking dozens of teams to solve the same problem independently.

The result is less security toil and more time spent on product work. That is a win for developers, security teams, and the business.

Build on Better Defaults

For years, we have tried to improve security by asking developers to do more. Patch faster. Respond to more alerts. Learn more tools. That approach does not scale.

Security scales when defaults are strong. When foundations are designed to be secure and maintained over time. When developers are not forced to constantly compensate for decisions that were made far below their code.

If we want better security outcomes without slowing teams down, we should start where software actually starts. That requires secure foundations, like hardened images, that are safe by default. With better foundations, security becomes quieter, development becomes smoother, and the entire system works the way it should.

That is the bar we should be aiming for.
Source: https://blog.docker.com/feed/

Deterministic AI Testing with Session Recording in cagent

AI agents introduce a challenge that traditional software doesn’t have: non-determinism. The same prompt can produce different outputs across runs, making reliable testing difficult. Add API costs and latency to the mix, and developer productivity takes a hit.

Session recording in cagent addresses this directly. Record an AI interaction once, replay it indefinitely—with identical results, zero API costs, and millisecond execution times.

How session recording works

cagent implements the VCR pattern, a proven approach for HTTP mocking. During recording, cagent proxies requests to the AI provider, captures the full request/response cycle, and saves it to a YAML “cassette” file. During replay, incoming requests are matched against the recording and served from cache—no network calls required.

One implementation detail worth noting: tool call IDs are normalized before matching. OpenAI generates random IDs on each request, which would otherwise break replay. cagent handles this automatically.
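The normalization-then-match step can be illustrated with a minimal sketch. The regex, placeholder, and function names below are assumptions for illustration only; cagent’s actual implementation differs, but the idea is the same: volatile IDs are rewritten to a stable form before the live request is compared against the recording.

```python
import re

# Minimal sketch of VCR-style request matching with tool-call ID
# normalization. OpenAI-style IDs ("call_<random>") differ on every run,
# which would break exact matching against a cassette, so both sides are
# normalized before comparison. All names here are illustrative.

def normalize(body: str) -> str:
    """Replace volatile tool call IDs with a stable placeholder."""
    return re.sub(r"call_[A-Za-z0-9]+", "call_NORMALIZED", body)

def find_recorded_response(cassette: list, request_body: str):
    """Return the recorded response whose normalized request matches."""
    wanted = normalize(request_body)
    for interaction in cassette:
        if normalize(interaction["request"]) == wanted:
            return interaction["response"]
    return None  # no match: in replay mode this would surface as an error

cassette = [{"request": '{"tool_call_id": "call_abc123"}',
             "response": '{"content": "recorded answer"}'}]

# A live request with a different random ID still matches the recording.
print(find_recorded_response(cassette, '{"tool_call_id": "call_Zz999"}'))
```

Because matching happens on normalized bodies, the cassette stays valid across runs even though the provider mints fresh IDs every time.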

Getting started

Recording a session requires a single flag:

cagent run my-agent.yaml --record "What is Docker?"
# creates: cagent-recording-1736089234.yaml

cagent run my-agent.yaml --record my-test "Explain containers"
# creates: my-test.yaml

Replaying uses the --fake flag with the cassette path:

cagent exec my-agent.yaml --fake my-test.yaml "Explain containers"

The replay completes in milliseconds with no API calls.

Example: CI/CD integration testing

Consider a code review agent:

# code-reviewer.yaml
agents:
  root:
    model: anthropic/claude-sonnet-4-0
    description: Code review assistant
    instruction: |
      You are an expert code reviewer. Analyze code for best practices,
      security issues, performance concerns, and readability.
    toolsets:
      - type: filesystem

Record the interaction with --yolo to auto-approve tool calls:

cagent exec code-reviewer.yaml --record code-review --yolo "Review pkg/auth/handler.go for security issues"

In CI, replay without API keys or network access:

cagent exec code-reviewer.yaml --fake code-review.yaml "Review pkg/auth/handler.go for security issues"

Cassettes can be version-controlled alongside test code. When agent instructions change significantly, delete the cassette and re-record to capture the new behaviour.

Other use cases

Cost-effective prompt iteration. Record a single interaction with an expensive model, then iterate on agent configuration against that recording. The first run incurs API costs; subsequent iterations are free.

cagent exec ./agent.yaml --record expensive-test "Complex task"
for i in {1..100}; do
cagent exec ./agent-v$i.yaml --fake expensive-test.yaml "Complex task"
done

Issue reproduction. Users can record a session with --record bug-report and share the cassette file. Support teams replay the exact interaction locally for debugging.

Multi-agent systems. Recording captures the complete delegation graph: root agent decisions, sub-agent tool calls, and inter-agent communication.

Security and provider support

Cassettes automatically strip sensitive headers (Authorization, X-Api-Key) before saving, making them safe to commit to version control. The format is human-readable YAML:

version: 2
interactions:
  - id: 0
    request:
      method: POST
      url: https://api.openai.com/v1/chat/completions
      body: "{…}"
    response:
      status: 200 OK
      body: "data: {…}"
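Header redaction of this kind can be sketched in a few lines. This is a simplified illustration: the header names follow the post (Authorization, X-Api-Key), but the function name and cassette schema are assumptions, not cagent’s actual code.

```python
# Sketch of stripping sensitive headers before a cassette is written,
# so recordings are safe to commit to version control. The header list
# follows the post; everything else here is illustrative.

SENSITIVE_HEADERS = {"authorization", "x-api-key"}

def sanitize_headers(headers: dict) -> dict:
    """Drop credential-bearing headers (case-insensitively), keep the rest."""
    return {name: value for name, value in headers.items()
            if name.lower() not in SENSITIVE_HEADERS}

recorded = sanitize_headers({
    "Authorization": "Bearer sk-secret",
    "X-Api-Key": "abc123",
    "Content-Type": "application/json",
})
print(recorded)  # only Content-Type survives
```

Redacting at write time, rather than relying on reviewers to spot secrets in diffs, is what makes committing cassettes alongside test code a reasonable default.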

Session recording works with all supported providers: OpenAI, Anthropic, Google, Mistral, xAI, and Nebius.

Get started

Session recording is available now in cagent. To try it:

cagent run ./your-agent.yaml --record my-session "Your prompt here"

For questions, feedback, or feature requests, visit the cagent repository or join the GitHub Discussions.
Source: https://blog.docker.com/feed/

2025 Recap: The Year Software Development Changed Shape

2025 was the year software teams stopped optimizing models and started optimizing systems.

By December, a few truths were impossible to ignore.

1. Developer Productivity Became the Real Competitive Advantage

By mid-year, every major AI lab had cleared the “good enough reasoning” bar. With model quality converging, the differentiator was no longer raw intelligence. It was how fast teams could ship.

The fastest teams used systems that were:

Declarative: automation defined in YAML and config, not code

Composable: agents calling tools with minimal glue

Observable: evaluated, traced, and versioned 

Reproducible: identical behavior every run

Productivity became a platform problem, not a talent problem.

2. Security Went From “Filters” to “Blast Radius”

The real problem wasn’t what models say. It was what they could do.

Once agents can act, blast radius matters more than the prompt.

Production incidents across the industry made it clear:

Agents leaking internal data within minutes

Malicious plugins shipping ransomware

Supply-chain bugs in AI tooling 

Agents deleting repos or months of work

Smart teams adopted the same guardrails they use for privileged system processes:

Sandboxed runtimes

Containerized toolchains

Signed artifacts

Policies in front of tool calls

Hardened bases and reproducible builds

The industry stopped filtering danger out of the model. They focused on containing it.

3. Agents Stopped Being Demos 

Agents became good enough to do real jobs.

At the start of the year, “agent” meant a clever prototype. By the end, agents were doing operational work: updating infrastructure, resolving customer issues, moving money, managing SaaS tools.

Two shifts unlocked this:

Reasoning took a leap. OpenAI’s o3 solved 25% of FrontierMath, problems that take researchers hours or days. DeepSeek made waves with its R1 model, showing that the frontier had moved from model size to compute at inference time.

Tools became standardized. MCP became the USB-C port of AI, a universal way for agents to safely access tools, data, and workflows. Once the ecosystem aligned on a common port, everything accelerated.

4. Containers Quietly Remained the Execution Layer for Everything

Containers continued doing the quiet work of powering every stack.

More than 90% of companies used containers as the default environment where:

Applications run

Build systems operate

Agents execute real tasks

Infrastructure is tested before hitting production

Even in an agent-driven world, developers need environments that act the same way every time. Containers remained a universal, stable execution surface.

5. Hardened Images Became the New Starting Point

You can’t trust the system if you don’t trust the base image. 

Docker Hardened Images (DHI) solved the first question every team had to ask: “What are we actually running?” 

DHI answered that with:

A known, verified base image

A transparent bill of materials

Reproducible builds

Signed artifacts

When hardened images became free, the cost of doing the right thing dropped to zero. Teams no longer layered security patches on top of unknown upstream risk. They began from a secure, trusted baseline.

What’s Next for 2026

The race for raw model intelligence is over. What separates winners in 2026 will be everything around the model.

Agents become a standard runtime target. Versioned and deployed like services.

Security frameworks treat agents as users. With permissions, onboarding, and monitoring.

Ecosystem gravity increases. MCP is the start. The dominant tool interface becomes the center of the agent economy.

Trust becomes infrastructure. Signed models to verified tools to hardened bases. Winners will have the smallest blast radius when things break.

The term “AI engineer” fades. What remains is what has always been: software engineers who build secure, governable, and resilient systems.

In 2023, we learned to talk to models. In 2024, we learned to chain them. In 2025, we gave them real power.

2026 will be about earning the right to keep it.
Source: https://blog.docker.com/feed/