Securing the Docker MCP Catalog: Commit Pinning, Agentic Auditing, and Publisher Trust Levels

Trust is the most important consideration when you connect AI assistants to real tools. While MCP containerization provides strong isolation and limits the blast radius of malfunctioning or compromised servers, we’re continuously strengthening trust and security across the Docker MCP solutions to further reduce exposure to malicious code. As the MCP ecosystem scales from hundreds to tens of thousands of servers (and beyond), we need stronger mechanisms to prove what code is running, how it was built, and why it’s trusted.

To strengthen trust across the entire MCP lifecycle, from submission to maintenance to daily use, we’ve introduced three key enhancements:

Commit Pinning: Every Docker-built MCP server in the Docker MCP Registry (the source of truth for the MCP Catalog) is now tied to a specific Git commit, making each release precisely attributable and verifiable.

Automated, AI-Audited Updates: A new update workflow keeps submitted MCP servers current, while agentic reviews of incoming changes make vigilance scalable and traceable.

Publisher Trust Levels: We’ve introduced clearer trust indicators in the MCP Catalog, so developers can easily distinguish between official, verified servers and community-contributed entries.

These updates raise the bar on transparency and security for everyone building with and using MCP at scale with Docker.

Commit pins for local MCP servers

Local MCP servers in the Docker MCP Registry are now tied to a specific Git commit with source.commit. That commit hash is a cryptographic fingerprint for the exact revision of the server code that we build and publish. Without this pinning, a reference like latest or a branch name would build whatever happens to be at that reference right now, making builds non-deterministic and vulnerable to supply chain attacks if an upstream repository is compromised. Even Git tags aren’t really immutable since they can be deleted and recreated to point to another commit. By contrast, commit hashes are cryptographically linked to the content they address, making the outcome of an audit of that commit a persistent result.
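To make the tag concern concrete, here's a hedged illustration of how a tag can silently change what it points to (the tag name and the second commit are placeholders):

# an upstream tag can be deleted and recreated against entirely different code
git tag -d v1.2.3
git tag v1.2.3 deadbeef0123456789abcdef0123456789abcdef   # placeholder commit hash
git push --force origin v1.2.3
# a pinned hash like 7bace1f8... can only ever address one specific revision of the content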

To make things easier, we’ve updated our authoring tools (like the handy MCP Registry Wizard) to automatically add this commit pin when creating a new server entry, and we now enforce the presence of a commit pin in our CI pipeline (missing or malformed pins will fail validation). This enforcement is deliberate: it makes it impossible to accidentally publish a server without establishing clear provenance for the code being distributed. We also propagate the pin into the MCP server image metadata via the org.opencontainers.image.revision label for traceability.

Here’s an example of what this looks like in the registry:

# servers/aws-cdk-mcp-server/server.yaml
name: aws-cdk-mcp-server
image: mcp/aws-cdk-mcp-server
type: server
meta:
  category: devops
  tags:
    - aws-cdk-mcp-server
    - devops
about:
  title: AWS CDK
  description: AWS Cloud Development Kit (CDK) best practices, infrastructure as code patterns, and security compliance with CDK Nag.
  icon: https://avatars.githubusercontent.com/u/3299148?v=4
source:
  project: https://github.com/awslabs/mcp
  commit: 7bace1f81455088b6690a44e99cabb602259ddf7
  directory: src/cdk-mcp-server
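
Because the pin is an ordinary Git commit hash, you can fetch exactly the revision we build from and audit it yourself. A minimal sketch using the entry above:

$ git clone https://github.com/awslabs/mcp
$ cd mcp
$ git checkout 7bace1f81455088b6690a44e99cabb602259ddf7   # the pinned revision (detached HEAD)
$ ls src/cdk-mcp-server                                    # the directory this server is built from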

And here’s an example of how you can verify the commit pin for a published MCP server image:

$ docker image inspect mcp/aws-cdk-mcp-server:latest \
    --format '{{index .Config.Labels "org.opencontainers.image.revision"}}'
7bace1f81455088b6690a44e99cabb602259ddf7

In fact, if you have the cosign and jq commands available, you can perform additional verifications:

$ COSIGN_REPOSITORY=mcp/signatures cosign verify mcp/aws-cdk-mcp-server \
    --key https://raw.githubusercontent.com/docker/keyring/refs/heads/main/public/mcp/latest.pub \
  | jq -r '.[].optional["org.opencontainers.image.revision"]'

Verification for index.docker.io/mcp/aws-cdk-mcp-server:latest --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - The signatures were verified against the specified public key
7bace1f81455088b6690a44e99cabb602259ddf7
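
You can also cross-check the image label against the registry entry itself. Here's a minimal sketch, assuming you have yq installed and are in a local clone of the Docker MCP Registry:

$ pinned=$(yq '.source.commit' servers/aws-cdk-mcp-server/server.yaml)
$ labeled=$(docker image inspect mcp/aws-cdk-mcp-server:latest \
    --format '{{index .Config.Labels "org.opencontainers.image.revision"}}')
$ [ "$pinned" = "$labeled" ] && echo "commit pin matches" || echo "mismatch"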

Keeping in sync

Once a server is in the registry, we don’t want maintainers to hand-edit pins every time they merge something into their upstream repos (they have better things to do with their time). Instead, a new automated workflow scans upstreams nightly, bumps source.commit when there’s a newer revision, and opens an auditable PR in the registry to track the incoming upstream changes. This gives you the security benefits of pinning (immutable references to reviewed code) without the maintenance toil. Updates still flow through pull requests, so you get a review gate and an approval trail showing exactly what new code is entering your supply chain. The update workflow operates on a per-server basis, with each server update getting its own branch and pull request.
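
Conceptually, the nightly check boils down to something like the following sketch (illustrative only, not the actual updater code; it assumes yq and reuses the aws-cdk-mcp-server entry from above):

# compare the pinned commit against the current upstream HEAD
pinned=$(yq '.source.commit' servers/aws-cdk-mcp-server/server.yaml)
latest=$(git ls-remote https://github.com/awslabs/mcp HEAD | cut -f1)

if [ "$pinned" != "$latest" ]; then
  echo "bump needed: $pinned -> $latest"
  # the real workflow updates source.commit on a per-server branch and opens a PR for review
fi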

This raises the question, though: how do we know that the incoming changes are safe?

AI in the review loop, humans in charge

Every proposed commit pin bump (and any new local server) will now be subject to an agentic AI security review of the incoming upstream changes. The reviewers (Claude Code and OpenAI Codex) analyze MCP server behavior, flagging risky or malicious code, adding structured reports to the PR, and offering standardized labels such as security-risk:high or security-blocked. Humans remain in the loop for final judgment, but the agents are relentless and scalable.

The challenge: untrusted code means untrusted agents

When you run AI agents in CI to analyze untrusted code, you face a fundamental problem: the agents themselves become attack vectors. They’re susceptible to prompt injection through carefully crafted code comments, file names, or repository structure. A malicious PR could attempt to manipulate the reviewing agent into approving dangerous changes, exfiltrating secrets, or modifying the review process itself.

We can’t trust the code under review, but we also can’t fully trust the agents reviewing it.

Isolated agents

Our Compose-based security reviewer architecture addresses this trust problem by treating the AI agents as untrusted components. The agents run inside heavily isolated Docker containers with tightly controlled inputs and outputs:

The code being audited is mounted read-only — The agent can analyze code but never modify it. Moreover, the code it audits is just a temporary copy of the upstream repository, but the read-only access means that the agent can’t do something like modify a script that might be accidentally executed outside the container.

The agent can only write to an isolated output directory — Once the output is written, the CLI wrapper for the agent only extracts specific files (a Markdown report and a text file of labels, both with fixed names), meaning any malicious scripts or files that might be written to that directory are deleted.

The agent lacks direct Internet access — The reviewer container cannot reach external services.

CI secrets and API credentials never enter the reviewer container — Instead, a lightweight reverse proxy on a separate Docker network accepts requests from the reviewer, injects inference provider API keys on outbound requests, and shields those keys from the containerized code under review.

All of this is encapsulated in a Docker Compose stack and wrapped by a convenient CLI that allows running the agent both locally and in CI.
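
For a sense of how this isolation can be expressed in Compose, here's a minimal sketch; the service names, image names, and variables are assumptions for illustration, not the actual stack we run:

# illustrative compose.yaml sketch, not Docker's actual reviewer stack
services:
  reviewer:
    image: mcp-security-reviewer:local        # hypothetical image name
    networks: [agent_net]                     # internal-only network: no direct Internet access
    volumes:
      - ./repository:/workspace/input/repository:ro   # code under review, mounted read-only
      - ./output:/workspace/output                     # the only writable location
    environment:
      - LLM_PROXY_URL=http://llm-proxy:8080            # hypothetical: inference requests go via the proxy
    # note: no inference API keys are ever passed to this container
  llm-proxy:
    image: llm-api-proxy:local                # hypothetical image name
    networks: [agent_net, egress_net]         # bridges to the outside and injects provider API keys
    environment:
      - ANTHROPIC_API_KEY                     # secrets live only here, never in the reviewer
      - OPENAI_API_KEY
networks:
  agent_net:
    internal: true                            # blocks outbound traffic from the reviewer
  egress_net: {}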

Most importantly, this architecture ensures that even if a malicious PR successfully manipulates the agent through prompt injection, the damage is contained: the agent cannot access secrets, cannot modify code, and cannot communicate with external attackers.

CI integration and GitHub Checks

The review workflow is automatically triggered when a PR is opened or updated. We still maintain some control over these workflows for external PRs, requiring manual triggering to prevent malicious PRs from exhausting inference API credits. These reviews surface directly as GitHub Status Checks, with each server under review receiving dedicated status checks for the analyses performed.

The resulting check status maps to the risk level determined by the agent: critical findings result in a failed check that blocks merging, high and medium findings produce neutral warnings, while low and info findings pass. We’re still tuning these criteria (since we’ve asked the agents to be extra pedantic) and currently reviewing the reports manually, but eventually we’ll have the heuristics tuned to a point where we can auto-approve and merge most update PRs. In the meantime, these reports serve as a scalable “canary in the coal mine”, alerting Docker MCP Registry maintainers to incoming upstream risks, both malicious and accidental.
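
For illustration, the mapping from agent labels to a check conclusion amounts to something like this sketch (the label file name and exact criteria are assumptions, not the registry's actual CI code):

# read the risk label emitted by the agent (hypothetical file name: labels.txt)
risk=$(grep -oE 'security-risk:(critical|high|medium|low|info)' output/labels.txt | head -n1 | cut -d: -f2)

case "$risk" in
  critical)    conclusion=failure ;;   # blocks merging
  high|medium) conclusion=neutral ;;   # surfaced as a warning
  *)           conclusion=success ;;   # low/info findings pass
esac

echo "GitHub check conclusion: $conclusion"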

It’s worth noting that the agent code in the MCP Registry repository is just an example (but a functional one available under an MIT License). The actual security review agent that we run lives in a private repository with additional isolation, but it follows the same architecture.

Reports and risk labels

Here’s an example of a report our automated reviewers produced:

# Security Review Report

## Scope Summary
- **Review Mode:** Differential
- **Repository:** /workspace/input/repository (stripe)
- **Head Commit:** 4eb0089a690cb60c7a30c159bd879ce5c04dd2b8
- **Base Commit:** f495421c400748b65a05751806cb20293c764233
- **Commit Range:** f495421c400748b65a05751806cb20293c764233...4eb0089a690cb60c7a30c159bd879ce5c04dd2b8
- **Overall Risk Level:** MEDIUM

## Executive Summary

This differential review covers 23 commits introducing significant changes to the Stripe Agent Toolkit repository, including: folder restructuring (moving tools to a tools/ directory), removal of evaluation code, addition of new LLM metering and provider packages, security dependency updates, and GitHub Actions workflow permission hardening.

The reviewers can produce both differential analyses (looking at the changes introduced by a specific set of upstream commits) and full analyses (looking at an entire codebase). We intend to run differential analyses for PRs and full analyses on a regular cadence.
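
Since the commit range in a differential report is an ordinary Git range, you can inspect the same changes yourself from a clone of the upstream repository; for example:

# list the commits covered by the differential review above (run inside the upstream clone)
git log --oneline f495421c400748b65a05751806cb20293c764233..4eb0089a690cb60c7a30c159bd879ce5c04dd2b8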

Why behavioral analysis matters

Traditional scanners remain essential, but they tend to focus on things like dependencies with CVEs, syntactic errors (such as a missing break in a switch statement), or memory safety issues (such as dereferencing an uninitialized pointer). MCP requires us to also examine the code's behavior. Consider the recent malicious postmark-mcp package impersonation: a one-line backdoor quietly BCC’d outgoing emails to an attacker. Events like this reinforce why our registry couples provenance with behavior-aware reviews before updates ship.

Real-world results

In our scans so far, we’ve already found several real-world issues in upstream projects (stay tuned for a follow-up blog post), both in MCP servers and with a similar agent in our Docker Hardened Images pipeline. We’re happy to say that we haven’t run across anything malicious so far, just logic errors with security implications, but the granularity and subtlety of issues that these agents can identify is impressive.

Trust levels in the Docker MCP Catalog

In addition to the aforementioned technical changes, we’ve also introduced publisher trust levels in the Docker MCP Catalog, exposing them in both the Docker MCP Toolkit in Docker Desktop and on Docker MCP Hub. Each server will now have an associated icon indicating whether the server is from a “known publisher” or maintained by the community. In both cases, we’ll still subject the code to review, but these indicators should provide additional context on the origin of the MCP server.

Figure 1: The AWS Terraform MCP server, an example of an MCP server published by a known, trusted publisher

Figure 2: The Fetch MCP server, an example of an MCP community server

What does this mean for the community?

Publishers now benefit from a steady stream of upstream improvements, backed by a documented, auditable trail of code changes. Commit pins make each release precisely attributable, while the nightly updater keeps the catalog current with no extra effort from publishers or maintainers. AI-powered reviewers scale our vigilance, freeing up human reviewers to focus on the edge cases that matter most.

At the same time, developers using MCP servers get clarity about a server’s publisher, making it easier to distinguish between official, community, and third-party contributions. These enhancements strengthen trust and security for everyone contributing to or relying on MCP servers in the Docker ecosystem.

Submit your MCP servers to Docker by following the submission guidance here!

Learn more

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers.

Get started with the MCP Toolkit: Run MCP servers easily and securely.

Find documentation for Docker MCP Catalog and Toolkit.

Source: https://blog.docker.com/feed/

Introducing elastic training on Amazon SageMaker HyperPod

Amazon SageMaker HyperPod now supports elastic training, enabling organizations to accelerate foundation model training by automatically scaling training workloads based on resource availability and workload priorities. This represents a fundamental shift from training with a fixed set of resources, as it saves hours of engineering time spent reconfiguring training jobs based on compute availability.
Any change in compute availability previously required manually halting training, reconfiguring training parameters, and restarting jobs, a process that requires distributed training expertise and leaves expensive AI accelerators sitting idle during training job reconfiguration. Elastic training automatically expands training jobs to absorb idle AI accelerators and seamlessly contracts them when higher-priority workloads need resources, all without halting training entirely.
By eliminating manual reconfiguration overhead and ensuring continuous utilization of available compute, elastic training can help save time previously spent on infrastructure management, reduce costs by maximizing cluster utilization, and accelerate time-to-market. Training can start immediately with minimal resources and grow opportunistically as capacity becomes available.
Elastic training is available in all AWS Regions where Amazon SageMaker HyperPod is currently available. Organizations can enable elastic training with zero code changes using HyperPod recipes for publicly available models including Llama and GPT OSS. For custom model architectures, customers can integrate elastic training capabilities through lightweight configuration updates and minimal code modifications, making it accessible to teams without requiring distributed systems expertise.
To get started, visit the Amazon SageMaker HyperPod product page and see the elastic training documentation for implementation guidance.
Source: aws.amazon.com

Amazon Bedrock now supports reinforcement fine-tuning delivering 66% accuracy gains on average over base models

Amazon Bedrock now supports reinforcement fine-tuning, helping you improve model accuracy without needing deep machine learning expertise or large amounts of labeled data. Amazon Bedrock automates the reinforcement fine-tuning workflow, making this advanced model customization technique accessible to everyday developers. Models learn to align with your specific requirements using a small set of prompts rather than the large amounts of data needed for traditional fine-tuning methods, enabling teams to get started quickly. This capability teaches models through feedback on multiple possible responses to the same prompt, improving their judgment of what makes a good response. Reinforcement fine-tuning in Amazon Bedrock delivers 66% accuracy gains on average over base models so you can use smaller, faster, and more cost-effective model variants while maintaining high quality.
Organizations struggle to adapt AI models to their unique business needs, forcing them to choose between generic models with average performance or expensive, complex customization that requires specialized talent, infrastructure, and risky data movement. Reinforcement fine-tuning in Amazon Bedrock removes this complexity by making advanced model customization fast, automated, and secure. You can train models by uploading training data directly from your computer or choose from datasets already stored in Amazon S3, eliminating the need for any labeled datasets. You can define reward functions using verifiable rule-based graders or AI-based judges along with built-in templates to optimize your models for both objective tasks such as code generation or math reasoning, and subjective tasks such as instruction following or chatbot interactions. Your proprietary data never leaves AWS’s secure, governed environment during the entire customization process, mitigating security and compliance concerns.
You can get started with reinforcement fine-tuning in Amazon Bedrock through the Amazon Bedrock console and via the Amazon Bedrock APIs. At launch, you can use reinforcement fine-tuning with Amazon Nova 2 Lite with support for additional models coming soon. To learn more about reinforcement fine-tuning in Amazon Bedrock, read the launch blog, pricing page, and documentation.
Source: aws.amazon.com

New serverless model customization capability in Amazon SageMaker AI

Amazon Web Services (AWS) announces a new serverless model customization capability that empowers AI developers to quickly customize popular models with supervised fine-tuning and the latest techniques like reinforcement learning. Amazon SageMaker AI is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost AI model development for any use case. 
Many AI developers seek to customize models with proprietary data for improved accuracy, but this often requires lengthy iteration cycles. For example, AI developers must define a use case and prepare data, select a model and customization technique, train the model, then evaluate the model for deployment. Now AI developers can simplify the end-to-end model customization workflow, from data preparation to evaluation and deployment, and accelerate the process. With an easy-to-use interface, AI developers can quickly get started and customize popular models, including Amazon Nova, Llama, Qwen, DeepSeek, and GPT-OSS, with their own data. They can use supervised fine-tuning and the latest customization techniques such as reinforcement learning and direct preference optimization. In addition, AI developers can use the AI agent-guided workflow (in preview), and use natural language to generate synthetic data, analyze data quality, and handle model training and evaluation—all entirely serverless. 
You can use this easy-to-use interface in the following AWS Regions: Europe (Ireland), US East (N. Virginia), Asia Pacific (Tokyo), and US West (Oregon). To join the waitlist to access the AI agent-guided workflow, visit the sign-up page. 
To learn more, visit the SageMaker AI model customization page and blog.
Source: aws.amazon.com

Announcing TypeScript support in Strands Agents (preview) and more

In May, we open sourced the Strands Agents SDK, an open source Python framework that takes a model-driven approach to building and running AI agents in just a few lines of code. Today, we’re announcing that TypeScript support is available in preview. Now, developers can choose between Python and TypeScript for building Strands Agents. TypeScript support in Strands has been designed to provide an idiomatic TypeScript experience with full type safety, async/await support, and modern JavaScript/TypeScript patterns. Strands can be easily run in client applications, in browsers, and in server-side applications in runtimes like AWS Lambda and Bedrock AgentCore. Developers can also build their entire stack in TypeScript using the AWS CDK.
We’re also announcing three additional updates for the Strands SDK. First, edge device support for Strands Agents is generally available, extending the SDK with bidirectional streaming and additional local model providers like llama.cpp that let you run agents on small-scale devices using local models. Second, Strands steering is now available as an experimental feature, giving developers a modular prompting mechanism that provides feedback to the agent at the right moment in its lifecycle, steering agents toward a desired outcome without rigid workflows. Finally, Strands evaluations is available in preview. Evaluations gives developers the ability to systematically validate agent behavior, measure improvements, and deploy with confidence during development cycles. Head to the Strands Agents GitHub to get started building.
Source: aws.amazon.com

Amazon SageMaker HyperPod now supports checkpointless training

Amazon SageMaker HyperPod now supports checkpointless training, a new foundational model training capability that mitigates the need for a checkpoint-based, job-level restart for fault recovery. Checkpointless training maintains forward training momentum despite failures, reducing recovery time from hours to minutes. This represents a fundamental shift from traditional checkpoint-based recovery, where failures require pausing the entire training cluster, diagnosing issues manually, and restoring from saved checkpoints, a process that can leave expensive AI accelerators idle for hours and waste your organization’s compute spend.
Checkpointless training transforms this paradigm by preserving the model training state across the distributed cluster, automatically swapping out faulty training nodes on the fly and using peer-to-peer state transfer from healthy accelerators for failure recovery. By mitigating checkpoint dependencies during recovery, checkpointless training can help your organization save on idle AI accelerator costs and spend less time recovering from failures. Even at larger scales, checkpointless training on Amazon SageMaker HyperPod enables upwards of 95% training goodput on cluster sizes with thousands of AI accelerators.
Checkpointless training on SageMaker HyperPod is available in all AWS Regions where Amazon SageMaker HyperPod is currently available. You can enable checkpointless training with zero code changes using HyperPod recipes for popular publicly available models such as Llama and GPT OSS. For custom model architectures, you can integrate checkpointless training components with minimal modifications for PyTorch-based workflows, making it accessible to your teams regardless of their distributed training expertise.
To get started, visit the Amazon SageMaker HyperPod product page and see the checkpointless training GitHub page for implementation guidance.
Source: aws.amazon.com