Breaking Free From AI Vendor Lock-in: Integrating GitHub Models with Docker cagent

The landscape of AI development is evolving rapidly, and one of the most exciting developments from Docker in 2025 is the release of Docker cagent, an open-source multi-agent runtime that orchestrates AI agents through declarative YAML configuration. In this article, we'll explore how cagent's integration with GitHub Models delivers true vendor independence, demonstrate building a real-world podcast generation agent that leverages multiple specialized sub-agents, and show you how to package and distribute your AI agents through Docker Hub. By the end, you'll understand how to break free from vendor lock-in and build AI agent systems that remain flexible, cost-effective, and production-ready throughout their entire lifecycle.

What is Docker cagent?

cagent is Docker’s open-source multi-agent runtime that orchestrates AI agents through declarative YAML configuration. Rather than managing Python environments, SDK versions, and orchestration logic, developers define agent behavior in a single configuration file and execute it with “cagent run”. 

Some of the key features of Docker cagent:

Declarative YAML Configuration: Single-file agent definitions with model configuration, clear instructions, tool access, and delegation rules for coordinating with sub-agents

Multi-Provider Support: OpenAI, Anthropic, Google Gemini, and Docker Model Runner (DMR) for local inference. 

MCP Integration Support: Leverage MCP (stdio, HTTP, SSE) to connect external tools and services

Secured Registry Distribution: Package and share agents securely via Docker Hub using standard container registry infrastructure.

Built-In Reasoning Tools: "think", "todo", and "memory" capabilities for complex problem-solving workflows.

The core value proposition is simple: declare what your agent should do, and cagent handles the execution. Each agent operates with isolated context, specialized tools via the Model Context Protocol (MCP), and configurable models. Agents can delegate tasks to sub-agents, creating hierarchical teams that mirror human organizational structures.
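A minimal sketch of what such a declaration can look like, assuming the default OpenAI provider; the file name minimal.yaml, the model key assistant-model, and the instruction text are illustrative placeholders rather than content from this article:

#!/usr/bin/env cagent run

# Hypothetical minimal single-agent configuration
agents:
  root:
    description: "A minimal helpful assistant"
    instruction: |
      You are a helpful assistant. Answer questions clearly and concisely.
    model: assistant-model

models:
  assistant-model:
    provider: openai   # reads OPENAI_API_KEY from the environment
    model: gpt-4o

Saved as minimal.yaml, it would be executed with cagent run ./minimal.yaml once OPENAI_API_KEY is set.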

What are GitHub Models?

GitHub Models is a suite of developer tools that take you from AI idea to deployment, including a model catalog, prompt management, and quantitative evaluations. GitHub Models provides rate-limited free access to production-grade language models from OpenAI (GPT-4o, GPT-5, o1-preview), Meta (Llama 3.1, Llama 3.2), Microsoft (Phi-3.5), and DeepSeek. The advantage of GitHub Models is that you authenticate only once via a GitHub Personal Access Token and can then plug and play any model the platform supports.

You can browse the GitHub Marketplace at https://github.com/marketplace to see the list of all supported models. GitHub currently supports all the popular models, and the list continues to grow; Anthropic Claude models were recently added.

Figure 1.1: GitHub Marketplace displaying list of all models available on the platform

GitHub has designed its platform, including GitHub Models and GitHub Copilot agents, to support production-level agentic AI workflows, offering the necessary infrastructure, governance, and integration points. GitHub Models employs a number of content filters. These filters cannot be turned off as part of the GitHub Models experience. If you decide to use models through Azure AI or another paid service, configure your content filters to meet your requirements.

To get started with GitHub Models, visit https://docs.github.com/en/github-models/quickstart which contains detailed quick start guides. 

Configuring cagent with GitHub Models

GitHub Models' OpenAI-compatible API allows straightforward integration with cagent: you treat it as a custom OpenAI provider with a modified base URL and authentication.
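As a quick illustration of that compatibility, the endpoint can be exercised directly with curl before wiring it into cagent. This is a sketch, assuming a valid PAT exported as GITHUB_TOKEN and the illustrative model ID openai/gpt-4o-mini (any model listed in the marketplace should work):

# Call the GitHub Models OpenAI-compatible chat completions endpoint
curl https://models.github.ai/inference/chat/completions \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "openai/gpt-4o-mini",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'

The base URL and Bearer token here are exactly what the cagent configuration later in this article passes to its OpenAI provider.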

In this article, we will create and deploy a Podcast Generator agent using GitHub Models and show how easy it is to share AI agents by deploying them to the Docker Hub registry. You first need to create a fine-grained personal access token by navigating to https://github.com/settings/personal-access-tokens/new

Figure 1.2: Generating a new personal access token (PAT) from GitHub developer settings.

Prerequisites

Docker Desktop 4.49+ with MCP Toolkit enabled

GitHub Personal Access Token with models scope

Download the cagent binary from the https://github.com/docker/cagent repository and place it in a folder such as C:\Docker\cagent. Run .\cagent.exe --help to see the available options.

Define your agent

I will showcase a simple podcast generator agent, which I created a few months ago while testing Docker cagent. The agent's purpose is to generate podcast content from blogs, articles, or YouTube videos. The Podcastgenerator YAML file below describes a multi-agent workflow for automated podcast production, leveraging GitHub Models and MCP tools (DuckDuckGo) for external data access. The DuckDuckGo MCP server runs in an isolated Docker container managed by the MCP gateway; to learn more about the Docker MCP server and MCP Gateway, refer to the official product documentation at https://docs.docker.com/ai/mcp-catalog-and-toolkit/mcp-gateway/. The root agent uses sub_agents: ["researcher", "scriptwriter"] to create a hierarchical structure where specialized agents handle domain-specific tasks.

sunnynagavo55_podcastgenerator.yaml

#!/usr/bin/env cagent run

agents:
  root:
    description: "Podcast Director – Orchestrates the entire podcast creation workflow and generates text file"
    instruction: |
      You are the Podcast Director responsible for coordinating the entire podcast creation process.

      Your workflow:
      1. Analyze input requirements (topic, length, style, target audience)
      2. Delegate research to the research agent which can open duck duck go browser for researching
      3. Pass the researched information to the scriptwriter for script creation
      4. Output is generated as a text file which can be saved to file or printed out
      5. Ensure quality control throughout the process

      Always maintain a professional, engaging tone and ensure the final podcast meets broadcast standards.
    model: github-model
    toolsets:
      - type: mcp
        command: docker
        args: ["mcp", "gateway", "run", "--servers=duckduckgo"]
    sub_agents: ["researcher", "scriptwriter"]

  researcher:
    model: github-model
    description: "Podcast Researcher – Gathers comprehensive information for podcast content"
    instruction: |
      You are an expert podcast researcher who gathers comprehensive, accurate, and engaging information.

      Your responsibilities:
      - Research the given topic thoroughly using web search
      - Find current news, trends, and expert opinions
      - Gather supporting statistics, quotes, and examples
      - Identify interesting angles and story hooks
      - Create detailed research briefs with sources
      - Fact-check information for accuracy

      Always provide well-sourced, current, and engaging research that will make for compelling podcast content.
    toolsets:
      - type: mcp
        command: docker
        args: ["mcp", "gateway", "run", "--servers=duckduckgo"]

  scriptwriter:
    model: github-model
    description: "Podcast Scriptwriter – Creates engaging, professional podcast scripts"
    instruction: |
      You are a professional podcast scriptwriter who creates compelling, conversational content.

      Your expertise:
      - Transform research into engaging conversational scripts
      - Create natural dialogue and smooth transitions
      - Add hooks, sound bite moments, and calls-to-action
      - Structure content with clear intro, body, and outro
      - Include timing cues and production notes
      - Adapt tone for target audience and podcast style
      - Create multiple format options (interview, solo, panel discussion)

      Write scripts that sound natural when spoken and keep listeners engaged throughout.
    toolsets:
      - type: mcp
        command: docker
        args: ["mcp", "gateway", "run", "--servers=filesystem"]

models:
  github-model:
    provider: openai
    model: openai/gpt-5
    base_url: https://models.github.ai/inference
    env:
      OPENAI_API_KEY: ${GITHUB_TOKEN}

Note: Since we are using the DuckDuckGo MCP server, make sure to add and install this MCP server from the MCP catalog in Docker Desktop.
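If you prefer the command line over the Docker Desktop UI, recent Docker Desktop releases ship a docker mcp CLI plugin (the same plugin the agent's toolsets invoke via docker mcp gateway run). The subcommands below are an assumption based on the MCP Toolkit tooling; verify them against docker mcp --help on your installation:

# Assumed MCP Toolkit commands; confirm with `docker mcp --help`
docker mcp catalog show              # browse servers available in the MCP catalog
docker mcp server enable duckduckgo  # enable the DuckDuckGo server for the gateway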

Running your Agent on your Local Machine

Make sure your GitHub PAT is available to cagent, then run the command below from the folder where your cagent binary resides.

cagent run ./sunnynagavo55_podcastgenerator.yaml
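Because the models section maps OPENAI_API_KEY to ${GITHUB_TOKEN}, cagent expects the PAT in the GITHUB_TOKEN environment variable. A minimal sketch, assuming PowerShell on Windows (with the binary placed as described in the prerequisites) and bash on Linux/macOS; the token value is a placeholder:

# PowerShell (Windows): export the PAT, then start the agent
$env:GITHUB_TOKEN = "github_pat_XXXXXXXXXXXX"
.\cagent.exe run .\sunnynagavo55_podcastgenerator.yaml

# bash (Linux/macOS)
export GITHUB_TOKEN="github_pat_XXXXXXXXXXXX"
cagent run ./sunnynagavo55_podcastgenerator.yaml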

Pushing your Agent as a Docker Image

Run the command below to push your agent as a Docker image to your favorite registry and share it with your team.

cagent push Sunnynagavo55/Podcastgenerator

You can see your published images inside your repositories as shown below. 

Congratulations! Now we have our first AI Agent created using cagent and deployed to Docker Hub.

Pulling your Agent as a Docker Image on a Different Machine

Run the command below to pull the agent image created by your teammate; it retrieves the agent YAML file and saves it in the current directory.

cagent pull Sunnynagavo55/Podcastgenerator

Alternatively, you can run the same agent directly, without pulling it first, by using the command below.

cagent run Sunnynagavo55/Podcastgenerator

Note: The Podcastgenerator example agent above has been added to the docker/cagent GitHub repository under the examples folder. Give it a try and share your experience: https://github.com/docker/cagent/blob/main/examples/podcastgenerator_githubmodel.yaml

Conclusion

The traditional AI development workflow locks you into specific providers, requiring separate API keys, managing multiple billing accounts, and navigating vendor-specific SDKs. cagent with GitHub Models fundamentally changes this equation by combining Docker’s declarative agent framework with GitHub’s unified model marketplace. This integration grants you true vendor independence—a single GitHub Personal Access Token provides access to models from OpenAI, Meta, Microsoft, Anthropic, and DeepSeek, eliminating the friction of managing multiple credentials and authentication schemes.

The future of AI development isn’t about choosing a vendor and committing to their ecosystem. Instead, it’s about building systems flexible enough to adapt as the landscape evolves, new models emerge, and your business requirements change. cagent and GitHub Models make that architectural freedom possible today.

What are you waiting for? Start building now with the power of cagent and GitHub Models and share your story with us.

Resources

To learn more about Docker cagent, read the product documentation at https://docs.docker.com/ai/cagent/

For more information about cagent, see the GitHub repository. Give this repository a star and let us know what you build.

Source: https://blog.docker.com/feed/

Docker Model Runner now supports vLLM on Windows

Great news for Windows developers working with AI models: Docker Model Runner now supports vLLM on Docker Desktop for Windows with WSL2 and NVIDIA GPUs!

Until now, vLLM support in Docker Model Runner was limited to Docker Engine on Linux. With this update, Windows developers can take advantage of vLLM’s high-throughput inference capabilities directly through Docker Desktop, leveraging their NVIDIA GPUs for accelerated local AI development.

What is Docker Model Runner?

For those who haven’t tried it yet, Docker Model Runner is our new “it just works” experience for running generative AI models.

Our goal is to make running a model as simple as running a container.

Here’s what makes it great:

Simple UX: We’ve streamlined the process down to a single, intuitive command: docker model run <model-name>.

Broad GPU Support: While we started with NVIDIA, we’ve recently added Vulkan support. This is a big deal—it means Model Runner works on pretty much any modern GPU, including AMD and Intel, making AI accessible to more developers than ever.

vLLM: Perform high-throughput inference with an NVIDIA GPU

What is vLLM?

vLLM is a high-throughput inference engine for large language models. It’s designed for efficient memory management of the KV cache and excels at handling concurrent requests with impressive performance. If you’re building AI applications that need to serve multiple requests or require high-throughput inference, vLLM is an excellent choice. Learn more here.

Prerequisites

Before getting started, make sure you have the prerequisites for GPU support:

Docker Desktop for Windows (starting with Docker Desktop 4.54)

WSL2 backend enabled in Docker Desktop

NVIDIA GPU with updated drivers and compute capability >= 8.0

GPU support configured in Docker Desktop

Getting Started

Step 1: Enable Docker Model Runner

First, ensure Docker Model Runner is enabled in Docker Desktop. You can do this through the Docker Desktop settings or via the command line:

docker desktop enable model-runner --tcp 12434

Step 2: Install the vLLM Backend

To use vLLM, install the vLLM runner with CUDA support:

docker model install-runner --backend vllm --gpu cuda

Step 3: Verify the Installation

Check that both inference engines are running:

docker model status

You should see output similar to:

Docker Model Runner is running

Status:
llama.cpp: running llama.cpp version: c22473b
vllm: running vllm version: 0.12.0

Step 4: Run a Model with vLLM

Now you can pull and run models optimized for vLLM. Models with the -vllm suffix on Docker Hub are packaged for vLLM:

docker model run ai/smollm2-vllm "Tell me about Docker."
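Model Runner also exposes an OpenAI-compatible API, which is useful when you want your application code rather than the CLI to drive the vLLM backend. A rough sketch, assuming the TCP endpoint was enabled on port 12434 in Step 1 and the model has already been pulled (the exact path may vary slightly between Model Runner versions):

# Query the locally served model through the OpenAI-compatible endpoint
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/smollm2-vllm",
        "messages": [{"role": "user", "content": "Tell me about Docker."}]
      }'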

Troubleshooting Tips

GPU Memory Issues

If you encounter an error like:

ValueError: Free memory on device (6.96/8.0 GiB) on startup is less than desired GPU memory utilization (0.9, 7.2 GiB).

You can configure the GPU memory utilization for a specific model:

docker model configure --gpu-memory-utilization 0.7 ai/smollm2-vllm

This reduces the memory footprint, allowing the model to run alongside other GPU workloads.

Why This Matters

This update brings several benefits for Windows developers:

Production parity: Test with the same inference engine you’ll use in production

Unified workflow: Stay within the Docker ecosystem you already know

Local development: Keep your data private and reduce API costs during development

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!
Source: https://blog.docker.com/feed/

AWS Application Migration Service supports IPv6

AWS Application Migration Service (MGN) now supports Internet Protocol version 6 (IPv6) for both service communication and application migrations. Organizations can migrate applications that use IPv6 addressing, enabling transitions to modern network infrastructures.
You can connect to AWS MGN using new dual-stack service endpoints that support both IPv4 and IPv6 communications. When migrating applications, you can transfer replication data using IPv4 or IPv6 while maintaining network connections and security. Then, during testing and cutover phases, you can use your chosen network configuration (IPv4, IPv6, or dual-stack) to launch servers in your target environment.

This feature is available in every AWS Region that supports AWS MGN and Amazon Elastic Compute Cloud (Amazon EC2) dual-stack endpoints. For supported regions, see the AWS MGN Supported AWS Regions and Amazon EC2 Endpoints documentation. To learn more about AWS MGN, visit our product page or documentation. To get started, sign in to the AWS Application Migration Service Console.
Source: aws.amazon.com

Amazon EC2 High Memory U7i instances now available in additional regions

Amazon EC2 High Memory U7i instances with 24TB of memory (u7in-24tb.224xlarge) are now available in AWS Europe (Frankfurt), U7i instances with 16TB of memory (u7in-16tb.224xlarge) are now available in AWS Asia Pacific (Mumbai), and U7i instances with 6TB of memory (u7i-6tb.112xlarge) are now available in the AWS Europe (Paris) region. U7i instances are part of AWS 7th generation and are powered by custom fourth generation Intel Xeon Scalable Processors (Sapphire Rapids). U7in-24tb instances offer 24TiB of DDR5 memory, U7in-16tb instances offer 16TiB of DDR5 memory, and U7i-6tb instances offer 6TiB of DDR5 memory, enabling customers to scale transaction processing throughput in a fast-growing data environment.
U7i-6tb instances offer 448 vCPUs, support up to 100Gbps Elastic Block Storage (EBS) for faster data loading and backups, deliver up to 100Gbps of network bandwidth, and support ENA Express. U7in-16tb instances offer 896 vCPUs, support up to 100Gbps Elastic Block Storage (EBS) for faster data loading and backups, deliver up to 200Gbps of network bandwidth, and support ENA Express. U7in-24tb instances offer 896 vCPUs, support up to 100Gbps Elastic Block Storage (EBS) for faster data loading and backups, deliver up to 200Gbps of network bandwidth, and support ENA Express. U7i instances are ideal for customers using mission-critical in-memory databases like SAP HANA, Oracle, and SQL Server.
To learn more about U7i instances, visit the High Memory instances page.
Source: aws.amazon.com

Amazon EC2 I7i instances now available in additional AWS regions

Amazon Web Services (AWS) announces the availability of high performance Storage Optimized Amazon EC2 I7i instances in the AWS Asia Pacific (Singapore, Jakarta) and Europe (Stockholm) regions. Powered by 5th generation Intel Xeon Scalable processors with an all-core turbo frequency of 3.2 GHz, these instances deliver up to 23% better compute performance and more than 10% better price performance over previous generation I4i instances. Powered by 3rd generation AWS Nitro SSDs, I7i instances offer up to 45TB of NVMe storage with up to 50% better real-time storage performance, up to 50% lower storage I/O latency, and up to 60% lower storage I/O latency variability compared to I4i instances.

I7i instances are ideal for I/O intensive and latency-sensitive workloads that demand very high random IOPS performance with real-time latency to access small to medium size datasets (multi-TBs). I7i instances support the torn write prevention feature with up to 16KB block sizes, enabling customers to eliminate database performance bottlenecks. I7i instances are available in eleven sizes – nine virtual sizes up to 48xlarge and two bare metal sizes – delivering up to 100Gbps of network bandwidth and 60Gbps of Amazon Elastic Block Store (EBS) bandwidth. To learn more, visit the I7i instances page.
Source: aws.amazon.com

Amazon Cognito identity pools now support private connectivity with AWS PrivateLink

Amazon Cognito identity pools now support AWS PrivateLink, enabling you to securely exchange federated identities for AWS credentials through private connectivity between your virtual private cloud (VPC) and Cognito. This eliminates the need to route authentication traffic over the public internet, providing enhanced security for your workloads. Identity pools map authenticated and guest identities to your AWS Identity and Access Management (IAM) roles and provide temporary AWS credentials, with this new feature, through a secure and private connection.

You can use PrivateLink connections in all AWS Regions where Amazon Cognito identity pools are available, except AWS China (Beijing) Region, operated by Sinnet, and AWS GovCloud (US) Regions. Creating VPC endpoints on AWS PrivateLink will incur additional charges; refer to the AWS PrivateLink pricing page for details.

You can get started by creating an AWS PrivateLink VPC interface endpoint for Amazon Cognito identity pools using the AWS Management Console, AWS Command Line Interface (CLI), AWS Software Development Kits (SDKs), AWS Cloud Development Kit (CDK), or AWS CloudFormation. To learn more, refer to the documentation on creating a VPC interface endpoint and Amazon Cognito's developer guide.
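For reference, creating such an interface endpoint from the AWS CLI looks roughly like the sketch below. The Cognito identity pools service name is an assumption (confirm it with aws ec2 describe-vpc-endpoint-services in your Region), and the VPC, subnet, and security group IDs are placeholders:

# Create an interface VPC endpoint for Cognito identity pools (service name assumed)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-0123456789abcdef0 \
  --vpc-endpoint-type Interface \
  --service-name com.amazonaws.us-east-1.cognito-identity \
  --subnet-ids subnet-0123456789abcdef0 \
  --security-group-ids sg-0123456789abcdef0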
Source: aws.amazon.com

Amazon WorkSpaces Secure Browser introduces Web Content Filtering

Amazon WorkSpaces Secure Browser now includes Web Content Filtering, a comprehensive security and compliance feature that enables organizations to control and monitor web content access. This new capability allows administrators to define granular access policies, block specific URLs or entire domain categories using 25+ predefined categories, and seamlessly integrate with Session Logger for enhanced monitoring and compliance reporting.

While existing Chrome policies for domain control remain supported, Web Content Filtering provides a more comprehensive way to control web access through category-based filtering and improved logging capabilities. Organizations can better manage their remote work security and compliance requirements through centralized policy management that scales across the enterprise. IT security teams can implement default-deny policies for high-security environments, while compliance officers benefit from detailed logging and monitoring capabilities. The feature maintains flexibility by allowing customized policies and exceptions based on specific business needs.

This feature is available at no additional cost in 10 AWS Regions, including US East (N. Virginia), US West (Oregon), Canada (Central), Europe (Frankfurt, London, Ireland), and Asia Pacific (Tokyo, Mumbai, Sydney, Singapore). WorkSpaces Secure Browser offers pay-as-you-go pricing. To get started with WorkSpaces Secure Browser, see Getting Started with Amazon WorkSpaces Secure Browser. You can enable this feature in your AWS console and automatically migrate any browser policies for URL Blocklists or URL Allowlists. To learn more about the feature, please refer to the feature documentation.
Source: aws.amazon.com