Docker Model Runner now included with the Universal Blue family

Running large language models (LLMs) and other generative AI models can be a complex, frustrating process of managing dependencies, drivers, and environments. At Docker, we believe this should be as simple as docker model run.

That’s why we built Docker Model Runner, and today, we’re thrilled to announce a new collaboration with Universal Blue. Thanks to the fantastic work of these contributors,  Docker Model Runner is now included in OSes such as Aurora and Bluefin, giving developers a powerful, out-of-the-box AI development environment.

What is Docker Model Runner?

For those who haven’t tried it yet, Docker Model Runner is our new “it just works” experience for running generative AI models.

Our goal is to make running a model as simple as running a container.

Here’s what makes it great:

Simple UX: We’ve streamlined the process down to a single, intuitive command: docker model run <model-name>.

Broad GPU Support: While we started with NVIDIA, we’ve recently added Vulkan support. This is a big deal—it means Model Runner works on pretty much any modern GPU, including AMD and Intel, making AI accessible to more developers than ever.

vLLM: Perform high-throughput inference with an NVIDIA GPU

The Perfect Home for Model Runner

If you’re new to it, Universal Blue is a family of next-generation, developer-focused Linux desktops. They provide modern, atomic, and reliable environments that are perfect for “cloud-native” workflows.

As Jorge Castro who leads developer relations at Cloud Native Computing Foundation explains, “Bluefin and Aurora are reference architectures for bootc, which is a CNCF Sandbox Project. They are just two examples showing how the same container pattern used by application containers can also apply to operating systems. Working with AI models is no different – one common set of tools, built around OCI standards.”

The team already ships Docker as a core part of its developer-ready experience. By adding Docker Model Runner to the default installation (specifically in the -dx mode for developers), they’ve created a complete, batteries-included AI development environment.

There’s no setup, no config. If you’re on Bluefin/Aurora, you just open a terminal and start running models.

Get Started Today

If you’re running the latest Bluefin LTS, you’re all set when you turn on developer mode. The Docker engine and Model Runner CLI are already installed and waiting for you. Aurora’s enablement instructions are documented here.

You can run your first model in seconds:

This command will download the model (if not already cached) and run it, ready for you to interact with.

If you’re on another Linux, you can get started just as easily. Just follow the instructions on our GitHub repository.

What’s Next?

This collaboration is a fantastic example of community-driven innovation. We want to give a huge shoutout to the greater bootc enthusiast community for their forward-thinking approach and for integrating Docker Model Runner so quickly.

This is just the beginning. We’re committed to making AI development accessible, powerful, and fun for all developers.

How You Can Get Involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!
Quelle: https://blog.docker.com/feed/

Develop and deploy voice AI apps using Docker

Voice is the next frontier of conversational AI. It is the most natural modality for people to chat and interact with another intelligent being. However, the voice AI software stack is complex, with many moving parts. Docker has emerged as one of the most useful tools for AI agent deployment.

In this article, we’ll explore how to use open-source technologies and Docker to create voice AI agents that utilize your custom knowledge base, voice style, actions, fine-tuned AI models, and run on your own computer. It is based on a talk I recently gave at the Docker Captains Summit in Istanbul.

Docker and AI

Most developers consider Docker the “container store” for software. The Docker container provides a reliable and reproducible environment for developing software locally on your own machine and then shipping it to the cloud. It also provides a safe sandbox to isolate, run, and scale user-submitted software in the cloud. For complex AI applications, Docker provides a suite of tools that makes it easy for developers and platform engineers to build and deploy.

The Docker container is a great tool for running software components and functions in an AI agent system. It can run web servers, API servers, workflow orchestrators, LLM actions or tool calls, code interpreters, simulated web browsers, search engines, and vector databases.

With the NVIDIA Container Toolkit, you can access the host machine’s GPU from inside Docker containers, enabling you to run inference applications such as LlamaEdge that serve open-source AI models inside the container.

The Docker Model Runner runs OpenAI-compatible API servers for open-source LLMs locally on your own computer.

The Docker MCP Toolkit provides an easy way to run MCP servers in containers and make them available to AI agents.

The EchoKit platform provides a set of Docker images and utilizes Docker tools to simplify the deployment of complex AI workflows.

EchoKit

The EchoKit consists of a server and a client. The client could be an ESP32-based hardware device that can listen for user voices using a microphone, stream the voice data to the server, receive and play the server’s voice response through a speaker. EchoKit provides the device hardware specifications and firmware under open-source licenses. To see it in action, check out the following video demos.

EchoKit tells the story about the Diana exhibit at the MET museum

EchoKit recommends BBQ in a Texas accent

EchoKit helps a user practice for the US Civics test

You can check out the GitHub repo for EchoKit.

The AI agent orchestrator

The EchoKit server is an open-source AI service orchestrator focused on real-time voice use cases. It starts up a WebSocket server that listens for streaming audio input and returns streaming audio responses. It ties together multiple AI models, including voice activity detection (VAD), automatic speech recognition (ASR), large language models (LLM), and text-to-speech (TTS), using one model’s output as the input for the next model.

You can start an EchoKit server on your local computer and configure the EchoKit device to access it over the local WiFi network. The “edge server” setup reduces network latency, which is crucial for voice AI applications.

The EchoKit team publishes a multi-platform Docker image that you can use directly to start an EchoKit server. The following command starts the EchoKit server with your own config.toml file and runs in the background.

docker run –rm
-p 8080:8080
-v $(pwd)/config.toml:/app/config.toml
secondstate/echokit:latest-server &amp;

The config.toml file is mapped into the container to configure how the EchoKit server utilizes various AI services in its voice response workflow. The following is an example of config.toml. It starts the WebSocket server on port 8080. That’s why in the Docker command, we map the container’s port 8080 to the same port on the host. That allows the EchoKit server to be accessible through the host computer’s IP address. The rest of the config.toml specifies how to access the ASR, LLM, and TTS models to generate a voice response for the input voice data.

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_XYZ"
model = "whisper-large-v3"
lang = "en"
prompt = "Hellon你好n(noise)n(bgm)n(silence)n"

[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_XYZ"
model = "openai/gpt-oss-20b"
history = 20

[tts]
platform = "elevenlabs"
url = "wss://api.elevenlabs.io/v1/text-to-speech/"
token = "sk_xyz"
voice = "VOICE-ID-ABCD"

[[llm.sys_prompts]]
role = "system"
content = """
You are a comedian. Engage in lighthearted and humorous conversation with the user. Tell jokes when appropriate.

"""

The AI services configured for the above EchoKit server are as follows.

It utilizes Groq for ASR (voice-to-text) and LLM tasks. You will need to fill in your own Groq API key.

It utilizes ElevenLabs for streaming TTS (text-to-speech). You will need to fill in your own ElevenLabs API key.

Then, in the EchoKit device setup, you just need to point your device to the local EchoKit server.

ws://local-network-ip.address:8080/ws/

For more options on the EchoKit server configuration, please refer to our documentation!

The VAD server

The voice-to-text ASR is not sufficient by itself. It could hallucinate and generate nonsensical text if the input voice is not human speech (e.g., background noise, street noise, or music). It also would not know when the user has finished speaking, and the EchoKit server needs to ask the LLM to start generating a response.

A VAD model is used to detect human voice and conversation turns in the voice stream. The EchoKit team has a multi-platform Docker image that incorporates the open-source Silero VAD model. The image is much larger than the plain EchoKit server, and it requires more CPU resources to run. But it delivers substantially better voice recognition results. Here is the Docker command to start the EchoKit server with VAD in the background.

docker run –rm
-p 8080:8080
-v $(pwd)/config.toml:/app/config.toml
secondstate/echokit:latest-server-vad &amp;

The config.toml file for this Docker container also needs an additional line in the ASR section, so that the EchoKit server knows to stream incoming audio data to the local VAD service and act on the VAD signals. The Docker container runs the Silero VAD model as a WebSocket service inside the container on port 8000. There is no need to expose the container port 8000 to the host.

addr = "0.0.0.0:8080"
hello_wav = "hello.wav"

[asr]
platform = "openai"
url = "https://api.groq.com/openai/v1/audio/transcriptions"
api_key = "gsk_XYZ"
model = "whisper-large-v3"
lang = "en"
prompt = "Hellon你好n(noise)n(bgm)n(silence)n"
vad_url = "http://localhost:9093/v1/audio/vad"

[llm]
platform = "openai_chat"
url = "https://api.groq.com/openai/v1/chat/completions"
api_key = "gsk_XYZ"
model = "openai/gpt-oss-20b"
history = 20

[tts]
platform = "elevenlabs"
url = "wss://api.elevenlabs.io/v1/text-to-speech/"
token = "sk_xyz"
voice = "VOICE-ID-ABCD"

[[llm.sys_prompts]]
role = "system"
content = """
You are a comedian. Engage in lighthearted and humorous conversation with the user. Tell jokes when appropriate.

"""

We recommend using the VAD enabled EchoKit server whenever possible.

MCP services

A key feature of AI agents is to perform actions, such as making web-based API calls, on behalf of LLMs. For example, the “US civics test prep” example for EchoKit requires the agent to get exam questions from a database, and then generate responses that guide the user toward the official answer.

The MCP protocol is the industry standard for providing tools (function calls) to LLM agents. For example, the DuckDuckGo MCP server provides a search tool for LLMs to search the internet if the user asks for current information that is not available in the LLM’s pre-training data. The Docker MCP Toolkit provides a set of tools that make it easy to run MCP servers that can be utilized by EchoKit.

The command below starts a Docker MCP gateway server. The MCP protocol defines several ways for agents or LLMs to access MCP tools. Our gateway server is accessible through the streaming HTTP protocol at port 8011.

docker mcp gateway run –port 8011 –transport streaming

Next, you can add the DuckDuckGo MCP server to the gateway. The search tool provided by the DuckDuckGo MCP server is now available on HTTP port 8011.

docker mcp server enable duckduckgo

You can simply configure the EchoKit server to use the DuckDuckGo MCP tools in the config.toml file.

[[llm.mcp_server]]
server = "http://localhost:8011/mcp"
type = "http_streamable"
call_mcp_message = "Please hold on a few seconds while I am searching for an answer!"

Now, when you ask EchoKit a current event question, such as “What is the latest Tesla stock price?”, it will first call the DuckDuckGo MCP’s search tool to retrieve this information and then respond to the user.

The call_mcp_message field is a message the EchoKit device will read aloud when the server calls the MCP tool. It is needed since the MCP tool call could introduce significant latency in the response.

Docker Model Runner

The EchoKit server orchestrates multiple AI services. In the examples in this article so far, the EchoKit server is configured to use cloud-based AI services, such as Groq and ElevenLabs. However, many applications—especially in the voice AI area—require the AI models to run locally or on-premises for security, cost, and performance reasons.

Docker Model Runner is Docker’s solution to run LLMs locally. For example, the following command downloads and starts OpenAI’s open-source gpt-oss-20b model on your computer.

docker model run ai/gpt-oss

The Docker Model Runner starts an OpenAI-compatible API server at port 12434. It could be directly utilized by the EchoKit server via config.toml.

[llm]
platform = "openai_chat"
url = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"
model = "ai/gpt-oss"
history = 20

At the time of this writing, the Docker Model Runner only supports LLMs. The EchoKit server still relies on cloud services, or local AI solutions such as LlamaEdge, for other types of AI services.

Conclusion

The complexity of the AI agent software stack has created new challenges in software deployment and security. Docker is a proven and extremely reliable tool for delivering software to production. Docker images are repeatable and cross-platform deployment packages. The Docker container isolates software execution to eliminate large categories of security issues.

With new AI tools, such as the Docker Model Runner and MCP Toolkit, Docker continues to address emerging challenges in AI portability, discoverability, and security.

The easiest, most reliable, and most secure way to set up your own EchoKit servers is to use Docker.

Learn more

Check out the GitHub repo for EchoKit

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers.

Get started with the MCP Toolkit: Run MCP servers easily and securely.

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Quelle: https://blog.docker.com/feed/

Amazon Quick Suite now supports memory for chat agents

We are announcing memory for chat agents in Amazon Quick Suite – a feature that allows users to get personalized responses based on their previous conversations. With this feature, Quick Suite remembers the preferences users specify in chat and generate responses that are tailored to them. Users can also view their inferred preferences and remove any memory they don’t want Quick chat agents to use. Previously, chat users needed to repeat their preferences around response format, acronyms, dashboards, and integrations in every conversation. They also had to clarify ambiguous topics and entities in chat, increasing the tedious back and forth needed to get accurate and insightful responses. Memory addresses this pain point by remembering facts and details about users in a way that ensures responses provided to users continuously learn and improve. Users also control what Quick Suite remembers about them – all the memories are viewable and removable by users, and users have the choice to start chat in Private Mode in which conversations are not used to infer memories. Memory in Quick Suite chat agents is available in US East (N. Virginia) and US West (Oregon). To learn more, visit the Amazon Quick Suite User Guide.
Quelle: aws.amazon.com

AWS Clean Rooms publishes events for member invitations and table readiness to EventBridge

AWS Clean Rooms now publishes events to Amazon EventBridge for new member invitations and table readiness, delivering real-time insights and increasing transparency to collaboration members. Invited members to a collaboration now receive an EventBridge notification when invited to a Clean Rooms collaboration, making it easier for members to review new invitations and join collaborations. Collaboration members are also notified when AWS Entity Resolution resources are associated to a collaboration, such as ID mapping tables and ID namespaces, enabling you to automatically start analysis that uses related records across collaborators’ datasets. For example, when a publisher invites an advertiser to a collaboration, the publisher can automatically run their media planning analyses as soon as the advertiser has created their ID mapping table in the collaboration, reducing time-to-action from hours to minutes and increasing transparency between collaboration members. With AWS Clean Rooms, customers can create a secure data clean room in minutes and collaborate with any company on AWS or Snowflake to generate unique insights about advertising campaigns, investment decisions, and research and development. For more information about the AWS Regions where AWS Clean Rooms is available, see the AWS Regions table. To learn more about collaborating with AWS Clean Rooms, visit AWS Clean Rooms or AWS Entity Resolution.
Quelle: aws.amazon.com

Amazon EC2 M8i-flex instances are now available in Asia Pacific (Sydney) Region

Starting today, Amazon EC2 M8i-flex instances are now available in Asia Pacific (Sydney) Region. These instances are powered by custom Intel Xeon 6 processors, available only on AWS, delivering the highest performance and fastest memory bandwidth among comparable Intel processors in the cloud. The M8i-flex instances offer up to 15% better price-performance, and 2.5x more memory bandwidth compared to previous generation Intel-based instances. They deliver up to 20% better performance than M7i-flex instances, with even higher gains for specific workloads. The M8i-flex instances are up to 30% faster for PostgreSQL databases, up to 60% faster for NGINX web applications, and up to 40% faster for AI deep learning recommendation models compared to M7i-flex instances. M8i-flex instances are the easiest way to get price performance benefits for a majority of general-purpose workloads like web and application servers, microservices, small and medium data stores, virtual desktops, and enterprise applications. They offer the most common sizes, from large to 16xlarge, and are a great first choice for applications that don’t fully utilize all compute resources. To get started, sign in to the AWS Management Console. For more information about the new instances, visit the M8i-flex instance page or visit the AWS News blog.
Quelle: aws.amazon.com

Amazon Route 53 launches detailed metrics for Resolver endpoints

Starting today, you can enable Amazon CloudWatch metrics for Route 53 to monitor the performance and health of Route 53 Resolver endpoints and any target name servers associated with your outbound Resolver endpoints. Amazon Route 53 Resolver endpoints make hybrid cloud configurations easier to manage by enabling seamless DNS query resolution between your on-premises data center and Amazon Virtual Private Cloud (Amazon VPC). The new metrics can be enabled on CloudWatch for each of your Resolver endpoints allowing you to assess the response latency of DNS queries received by the Resolver endpoints, the number of query responses that resulted in SERVFAIL, NXDOMAIN, REFUSED, or FORMERR statuses. You can also view the availability of target name servers associated with outbound Resolver endpoints, by viewing the response latency of the target server, and the number of queries that resulted in a timeout. The new metrics make it easy for you to quickly evaluate the health of your Route 53 Resolver endpoints and the associated target name servers. To learn more about the metrics, read the documentation or visit the Route 53 VPC Resolver page. Standard CloudWatch and Resolver endpoint charges are applied for using the Route 53 Resolver endpoint detailed metrics. For more information, see Amazon CloudWatch pricing and Resolver endpoint pricing.
Quelle: aws.amazon.com