Amazon EC2 M8azn instances are now available in US East (Ohio) Region

Starting today, Amazon EC2 M8azn instances are available in the US East (Ohio) Region. These high-frequency, network-optimized general purpose instances are powered by fifth generation AMD EPYC (formerly code named Turin) processors and offer the highest maximum CPU frequency in the cloud at 5 GHz. M8azn instances offer up to 2x the compute performance of previous generation M5zn instances and up to 24% higher performance than M8a instances. M8azn instances deliver up to 4.3x higher memory bandwidth and a 10x larger L3 cache compared to M5zn instances, allowing latency-sensitive and compute-intensive workloads to achieve results faster. These instances also offer up to 2x networking throughput and up to 3x EBS throughput versus M5zn instances. Built on the AWS Nitro System using sixth generation Nitro Cards, these instances are ideal for applications such as real-time financial analytics, high-performance computing, high-frequency trading (HFT), CI/CD, intensive gaming, and simulation modeling for the automotive, aerospace, energy, and telecommunication industries. M8azn instances are available in 9 sizes ranging from 2 to 96 vCPUs with up to 384 GiB of memory, including two bare metal variants. To get started, sign in to the AWS Management Console. For more information, visit the Amazon EC2 M8azn instance page.
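
For teams that script their provisioning, a minimal boto3 sketch of launching one of the new instances follows; the AMI and subnet IDs are placeholders, and the size name assumes the smallest 2-vCPU size follows EC2's usual naming convention.

```python
# Minimal boto3 sketch: launch an M8azn instance in US East (Ohio).
# The AMI ID and subnet are placeholders; "m8azn.large" assumes the
# 2-vCPU size follows EC2's usual naming convention.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-2")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",  # placeholder AMI
    InstanceType="m8azn.large",       # assumed size name
    SubnetId="subnet-xxxxxxxx",       # placeholder subnet
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```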
Source: aws.amazon.com

Amazon EC2 R8a instances are now available in Asia Pacific (Tokyo) Region

Starting today, Amazon EC2 R8a instances are available in the Asia Pacific (Tokyo) Region. These instances feature 5th Gen AMD EPYC processors (formerly code named Turin) with a maximum frequency of 4.5 GHz and deliver up to 30% higher performance and up to 19% better price-performance compared to R7a instances. R8a instances deliver 45% more memory bandwidth than R7a instances, making them ideal for latency-sensitive workloads. Compared to Amazon EC2 R7a instances, R8a instances provide up to 60% faster performance for GroovyJVM, allowing higher request throughput and better response times for business-critical applications. Built on the AWS Nitro System using sixth generation Nitro Cards, R8a instances are ideal for high-performance, memory-intensive workloads such as SQL and NoSQL databases, distributed web-scale in-memory caches, in-memory databases, real-time big data analytics, and Electronic Design Automation (EDA) applications. R8a instances are offered in 12 sizes, including two bare metal sizes. Amazon EC2 R8a instances are SAP-certified and provide 38% more SAPS compared to R7a instances. To get started, sign in to the AWS Management Console. For more information about the new instances, visit the Amazon EC2 R8a instance page.
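
Before migrating, you can confirm which R8a sizes the Region actually offers. A hedged boto3 sketch (wildcard values are supported by this EC2 filter):

```python
# List R8a instance types offered in Asia Pacific (Tokyo).
import boto3

ec2 = boto3.client("ec2", region_name="ap-northeast-1")

offerings = ec2.describe_instance_type_offerings(
    LocationType="region",
    Filters=[{"Name": "instance-type", "Values": ["r8a.*"]}],
)
for offering in offerings["InstanceTypeOfferings"]:
    print(offering["InstanceType"])
```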
Source: aws.amazon.com

Amazon CloudWatch Application Signals adds new SLO capabilities

Amazon CloudWatch Application Signals now offers three new console-based capabilities for Service Level Objectives (SLOs): SLO Recommendations, Service-Level SLOs, and SLO Performance Report. CloudWatch Application Signals helps customers monitor and improve application performance on AWS. It automatically collects data from applications running on services like Amazon EC2, Amazon ECS, and Lambda. Previously, customers had to set SLO thresholds manually without data-driven guidance, often leading to misconfigured targets and alert fatigue. They also lacked visibility into overall service health across operations and had no way to track reliability trends over time or generate performance reports for calendar periods. These new capabilities address each of those gaps, making it easier to set data-driven reliability targets, monitor overall service health, and identify reliability trends before they become incidents. SLO Recommendations analyzes 30 days of service metrics (P99 latency and error rates) to suggest appropriate reliability targets. Customers can validate proposed targets before implementation, reducing the cognitive and operational effort needed for new SLO deployments. Service-Level SLOs provide a holistic view of service reliability across all operations, simplifying alignment between technical monitoring and business objectives. SLO Performance Report provides historical analysis aligned with calendar periods, supporting daily, weekly, and monthly intervals. These capabilities support key use cases including proactive reliability management, SLO threshold optimization, and calendar-aligned business reporting. These features are available in all AWS Regions where Amazon CloudWatch Application Signals is available. Pricing is based on the number of inbound and outbound requests to and from applications, plus Service Level Objective charges, with each SLO generating 2 application signals per service level indicator metric period.
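
To make the reporting model concrete, here is a purely illustrative sketch (not an AWS API call) of how SLO attainment and error-budget consumption are computed for a calendar period, the kind of figure an SLO performance report surfaces; all numbers are made up.

```python
# Conceptual sketch: SLO attainment and error-budget burn for one
# reporting period. Illustrative numbers only.
total_requests = 1_200_000  # requests in the reporting period
failed_requests = 480       # requests breaching the SLI threshold
slo_target = 99.9           # target attainment, percent

attainment = 100 * (1 - failed_requests / total_requests)
error_budget = total_requests * (1 - slo_target / 100)  # allowed failures
budget_consumed = failed_requests / error_budget

print(f"attainment: {attainment:.3f}% (target {slo_target}%)")
print(f"error budget consumed: {budget_consumed:.0%}")
```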
Source: aws.amazon.com

Secure Agent Execution with NanoClaw and Docker Sandboxes

Agents have enormous potential to power secure, personal AI assistants that automate complex tasks and workflows. Realizing that potential, however, requires strong isolation, a codebase that teams can easily inspect and understand, and clear control boundaries they can trust. 

Today, NanoClaw, a lightweight agent framework, is integrating with Docker Sandboxes to deliver secure-by-design agent execution. With this integration, every NanoClaw agent runs inside a disposable, MicroVM-based Docker Sandbox that enforces strong operating-system-level isolation. Combined with NanoClaw’s minimal attack surface and fully auditable open-source codebase, the stack is purpose-built to meet enterprise security standards from day one.

From Powerful Agents to Trusted Agents

The timing reflects a broader shift in the agent landscape. Agents are no longer confined to answering prompts. They are becoming operational systems.

Modern agents connect to live data sources, execute code, trigger workflows, and operate directly within collaboration platforms such as Slack, Discord, WhatsApp, and Telegram. They are evolving from conversational interfaces into active participants in real work.

That shift from prototype to production introduces two critical requirements: transparency and isolation.

First, transparency. 

Organizations need agents built on code they can inspect and understand, with clear visibility into dependencies, source files, and core behavior. NanoClaw delivers exactly that. Its agent behavior is powered by just 15 core source files, a codebase up to 100 times smaller by lines of code than many alternatives. That simplicity makes it dramatically easier to evaluate risk, understand system behavior, and build with confidence.

Second, isolation. 

Agents must run within restricted environments, with tightly controlled filesystems and limited host access. Through the Docker Sandbox integration, each NanoClaw agent runs inside a dedicated MicroVM that mirrors your development environment, with only your project workspace mounted in. Agents can install packages, modify configurations, and even run Docker itself, while your host machine remains untouched.
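
As an illustration only, a CI script might wrap the sandbox launch as in the sketch below; the `docker sandbox run` verb and agent entrypoint are assumptions based on the description above, not documented syntax, so check the Docker Sandboxes documentation for the real invocation.

```python
# Illustration only: the CLI verb and agent name below are assumptions,
# not documented Docker Sandboxes syntax.
import subprocess

completed = subprocess.run(
    ["docker", "sandbox", "run", "nanoclaw"],  # hypothetical invocation
    capture_output=True,
    text=True,
)
print(completed.stdout)
```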

In traditional environments, enabling more permissive agent modes can introduce significant risk. Inside a Docker Sandbox, that risk is contained within an isolated MicroVM that can be discarded instantly. This makes advanced modes such as --dangerously-skip-permissions practical in production because their impact is fully confined.

The result is greater autonomy without greater exposure.

Agents no longer require constant approval prompts to move forward. They can install tools, adapt their environment, and iterate independently. Because their actions are contained within secure, disposable boundaries, they can safely explore broader solution spaces while preserving enterprise-grade safeguards.

Powerful agents are easy to prototype. Trusted agents are built with isolation by design.

Together, NanoClaw and Docker make secure-by-default the standard for agent deployment.

“Infrastructure needs to catch up to the intelligence of agents. Powerful agents require isolation,” said Mark Cavage, President and Chief Operating Officer at Docker, Inc. “Running NanoClaw inside Docker Sandboxes gives the agent a secure, disposable boundary, so it can run freely, safely.”

“Teams trust agents to take on increasingly complex and valuable work, but securing agents cannot be based on trust,” said Gavriel Cohen, CEO and co-founder of NanoCo and creator of NanoClaw. “It needs to be based on a provably secure hard boundary, scoped access to data and tools, and control over the actions agents are allowed to take. The security model should not limit what agents can accomplish. It should make it safe to let them loose. NanoClaw was built on that principle, and Docker Sandboxes provides the enterprise-grade infrastructure to enforce it.”

Get Started

Ready to try it out? Deploy NanoClaw in Docker Sandboxes today:

GitHub: github.com/qwibitai/nanoclaw

Docker Sandboxes: Learn more

Source: https://blog.docker.com/feed/

How to Run Claude Code with Docker: Local Models, MCP Servers, and Secure Sandboxes

Claude Code is quickly becoming a go-to AI coding assistant for developers and increasingly for non-developers who want to build with code. But to truly unlock its potential, it needs the right local infrastructure, tool access, and security boundaries.

In this blog, we’ll show you how to run Claude Code with Docker to gain full control over your models, securely connect it to real-world tools using MCP servers, and safely give it autonomy inside isolated sandboxes. Read on for practical resources to help you build a secure, private, and cost-efficient AI-powered development workflow.

Run Claude Code Locally with Docker Model Runner

This post walks through how to configure Claude Code to use Docker Model Runner, giving you full control over your data, infrastructure, and spend. Claude Code supports custom API endpoints through the ANTHROPIC_BASE_URL environment variable. Since Docker Model Runner exposes an Anthropic-compatible API, integrating the two is simple. This allows you to run models locally while maintaining the Claude Code experience.
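
As a hedged illustration of what "Anthropic-compatible" buys you, the same endpoint that Claude Code reads from ANTHROPIC_BASE_URL can also be exercised directly with the official anthropic Python SDK; the base URL and model tag below are placeholders for whatever Docker Model Runner reports on your machine.

```python
# Hedged sketch: call a local, Anthropic-compatible endpoint with the
# official `anthropic` Python SDK. Base URL and model tag are
# placeholders -- use the address Docker Model Runner reports locally,
# and point Claude Code's ANTHROPIC_BASE_URL at the same endpoint.
from anthropic import Anthropic

client = Anthropic(
    base_url="http://localhost:12434",  # assumed local Model Runner address
    api_key="not-needed-locally",       # local engines typically ignore the key
)

message = client.messages.create(
    model="ai/qwen2.5",                 # placeholder: a locally pulled model
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(message.content[0].text)
```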

With your model running under your control, it’s time to connect Claude Code to tools to expand its capabilities. 

How to Add MCP Servers to Claude Code with Docker MCP Toolkit

MCP is becoming the de facto standard to connect coding agents like Claude Code to your real tools, databases, repositories, browsers, and APIs. With more than 300 pre-built, containerized MCP servers, one-click deployment in Docker Desktop, and automatic credential handling, developers can connect Claude Code to trusted environments in minutes — not hours. No dependency issues, no manual configuration, just a consistent, secure workflow across Mac, Windows, and Linux.

In this guide, you’ll learn how to:

Set up Claude Code and connect it to Docker MCP Toolkit.

Configure the Atlassian MCP server for Jira integration.  

Configure the GitHub MCP server to access repository history and run git commands.

Configure the Filesystem MCP server to scan and read your local codebase.

Automate tech debt tracking by converting 15 TODO comments into tracked Jira tickets.

See how Claude Code can query git history, categorize issues, and create tickets — all without leaving your development environment.

Prefer a video walkthrough? Check out our tutorial on how to add MCP servers to Claude Code with Docker MCP Toolkit.

Connecting tools unlocks powerful automation, but with greater capability comes greater responsibility. If you’re going to let agents take action, you need to run them safely.

Docker Sandboxes: Run Claude Code and Other Coding Agents Unsupervised (but Safely)

As Claude Code moves from suggestions to real-world actions like installing packages and modifying files, isolation becomes critical.

Sandboxes provide disposable, isolated environments purpose-built for coding agents. Each agent runs in an isolated version of your development environment, so when it installs packages, modifies configurations, deletes files, or runs Docker containers, your host machine remains untouched.

This isolation lets you run agents like Claude Code with autonomy. Since they can’t harm your computer, you can let them run free. Check out our announcement on more secure, easier-to-use, and more powerful Docker Sandboxes.

Summary 

Claude Code is powerful on its own, but when used with Docker it becomes a secure, extensible, and fully controlled AI development environment.

In this post, you learned how to:

Run Claude Code locally using Docker Model Runner with an Anthropic-compatible API endpoint, giving you full control over your data, infrastructure, and cost.

Connect Claude Code to tools using the Docker MCP Toolkit, with 300+ containerized MCP servers for services like Jira, GitHub, and local filesystems — all deployable in one click.

Run Claude Code safely in Docker Sandboxes, isolated environments that allow coding agents to operate autonomously without risking your host machine.

By combining local model execution, secure tool connectivity, and isolated runtime environments, Docker enables you to run AI coding agents like Claude Code with both autonomy and control, making them practical for real-world development workflows.
Source: https://blog.docker.com/feed/

Achieving Test Reliability for Native E2E Testing: Beyond Fixing Broken Tests

End-to-end (E2E) tests are particularly important for native applications that run on various platforms (Android/iOS), screen sizes, and OS versions. E2E testing picks up differences in behavior across this fragmented ecosystem.

But keeping E2E tests reliable is often more challenging than writing them in the first place. 

The fragmented device ecosystem, gaps in test frameworks, network inconsistencies, unstable test environments, and constantly changing UI all contribute to test flakiness. Teams easily get trapped in a cycle of constantly fixing failing tests due to UI changes or environment instability rather than improving the overall reliability of their test infrastructure. They end up frustrated and hesitant to adopt E2E tests in their workflows.

Having led the native E2E testing infrastructure setup at a mid-sized company, I learned the hard way how critical it is to define and implement strategies for test ownership, observability, and notifications to ensure long-term test stability. In this piece, I discuss the challenges I’ve seen teams face and share lessons on how to build reliable E2E systems that you can actually trust.

Challenges with Reactive Test Maintenance

After setting up periodic E2E runs on the CI, our team initially focused on triaging, investigating, and fixing every failing test to improve test stability. However, even after nearly a year of patching flaky tests, the reliability of our E2E suite didn’t improve, and engineers slowly lost confidence in the usefulness and reliability of the test suite.

I learned that teams that focus primarily on fixing broken tests often end up in a cycle of chasing failures without fixing the root causes of instability. This reactive approach creates several problems:

Test suite fragility: If teams continue patching broken tests without addressing real issues with either the underlying app changes or unstable environments, the test suite becomes increasingly brittle. Over time, tests fail for reasons unrelated to real product defects, making it harder to distinguish genuine regressions from noise.

High maintenance overhead: Debugging and fixing flaky tests often requires a significant amount of developer time and resources. Unlike unit tests, which run quickly and fail in isolation, E2E tests execute against the development, staging, or pre-production environment, making failures harder to reproduce and diagnose. Adjusting E2E tests to work across devices with different screen sizes or OS versions requires additional work, making fixes a non-trivial task.

Reduced trust in the test suite: When failures are common and noisy, teams lose confidence in the E2E suite and often start ignoring test failures. This undermines the purpose of having automated tests in the first place. Instead, teams rely on local dev testing or manual QA cycles to validate changes. Over time, the suite becomes more of a liability than a safeguard, slowing down delivery instead of enabling it.

A reactive approach to fixing E2E tests slows down release cycles. Developers must spend significant amounts of time repeatedly fixing and rerunning failing tests, while teams rely on manual QA to catch actual regressions.

Building a Reliable E2E Infrastructure

When our test suite stability didn’t improve after more than a year of chasing failures, we took a step back to analyze historical results and look for patterns. 

We discovered that a significant number of failures could be attributed to an unstable environment or an unexpected state of the test account. For example, spikes in API latencies in the test environment frequently caused false negatives, adding to the noise. Similarly, tests run against existing user accounts could become inconsistent due to a past failure or if multiple tests attempted to use the same account.

I learned that investing in improving your test infrastructure is the only way to get to a stable and reliable native E2E testing workflow. This involves stabilizing the test environment, defining clear test ownership, reducing noisy alerts, and improving observability. Let’s look at each of these in more detail.

Stabilize the Test Environment

Many flaky E2E tests can be traced back to inconsistencies in the underlying environment, such as sporadic device issues, network instability, or API downtime in a staging environment. 

To avoid noisy and unreliable tests, ensure you have a stable and standardized test environment with the following test practices:

Standardize device and environment setup: Device and test environment stability issues heavily impact test stability. To reduce API downtimes, isolate the E2E testing environment from the developer or staging environment to prevent interference from unstable builds and experimental features. Teams could either build a stable pre-prod environment that uses a production-ready artifact or spin up ephemeral environments for each E2E run to ensure consistency. Running tests on standardized device images or containerized emulators with consistent OS versions, configurations, and resources further improves stability. For critical flows, you can schedule periodic runs on physical device farms to validate against real hardware while keeping day-to-day tests stable and cost-effective.

Isolate test data per session: A test that modifies any data should start from a clean slate. For instance, while testing a todo application, every test session should use a new test account to avoid unexpected scenarios caused by unpredictable account state. To speed up tests, execute setup scripts in `before` hooks to handle account creation and seed any required data automatically (see the sketch after this list).

Mock certain network responses: While an E2E test is meant to exercise the entire user journey with real data, in some cases it’s necessary to mock specific API responses to maintain a predictable test environment. For instance, if your application relies on A/B tests or feature flags, different sessions might receive different experiences based on user allocation. This can cause unexpected failures unrelated to actual regressions. Mocking these responses in test builds ensures consistency across sessions and avoids building complex test cases that handle different user experiences.
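
To make the per-session isolation idea concrete, here is a minimal pytest sketch; the /test-accounts provisioning endpoint, payloads, and token field are hypothetical stand-ins for whatever your backend exposes.

```python
# Hedged pytest sketch of per-session account isolation. The
# /test-accounts endpoint, payload, and token field are hypothetical
# stand-ins for whatever provisioning API your backend exposes.
import uuid

import pytest
import requests

BASE_URL = "https://staging.example.com/api"  # hypothetical test environment

@pytest.fixture
def fresh_account():
    """Provision a brand-new account per test so no state leaks between runs."""
    username = f"e2e-{uuid.uuid4().hex[:12]}"
    resp = requests.post(f"{BASE_URL}/test-accounts", json={"username": username})
    resp.raise_for_status()
    account = resp.json()
    yield account
    # Tear down so abandoned accounts don't accumulate in the environment.
    requests.delete(f"{BASE_URL}/test-accounts/{account['id']}")

def test_create_todo(fresh_account):
    resp = requests.post(
        f"{BASE_URL}/todos",
        json={"title": "buy milk"},
        headers={"Authorization": f"Bearer {fresh_account['token']}"},
    )
    assert resp.status_code == 201
```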

Establish Clear Test Ownership

When a test fails, it’s often unclear who’s responsible for investigating and fixing it. Over time, such an absence of clear test ownership and accountability results in unreliable, unmaintained, and flaky tests. 

Assigning ownership of tests based on the ownership of product features can alleviate this problem to some extent. Ideally, the owning team should be responsible for writing, maintaining, and fixing tests for their critical flows. This ownership model ensures that failures are triaged quickly and that tests are updated as the product evolves instead of becoming stale and unstable. 

Test ownership becomes challenging in codebases where multiple product teams own parts of a single user flow. For example, in a shopping application, different teams might own the login, product catalog, and checkout experiences. If a checkout flow test fails at the login step, it can be confusing which team should triage the issue. Without a clear policy, the failure might be ignored, or multiple teams might end up duplicating the effort. 

To handle these scenarios, set a policy that defines the first point of contact (POC) per test based on the end-user experience. This ensures a single team takes responsibility for triaging the issue while fixes can be handed off to upstream dependencies as needed.

Reduce Noise and Improve Alerting

A common challenge with native E2E testing is noisy alerts due to flaky or failing tests. Teams are often flooded with non-actionable alerts when flaky tests fail because of transient network or device issues. Repeated failure notifications about known bugs can also lead to alert fatigue.

The following techniques reduce this noise so that teams are only notified for actionable failures:

Mute flaky tests and known bugs: Instead of reporting and notifying teams about all test failures, allow alerts from tests that are identified as flaky or linked to known issues to be muted without a code change. You can manage muted tests through a remote configuration, environment variables, or a tool like BrowserStack. Flag them for follow-up work, but let alerts go out only for new or unexpected regressions (a minimal sketch follows this list). Muting is particularly important for E2E tests since fixing failing tests often requires significant developer time and resources, and repeated alerts can be especially distracting for developers.

Enrich notifications with failure details: Instead of generic failure messages, include details such as the failing user flow, commit details, the error message, and links to logs or dashboards in your alerts. These details help developers identify and triage issues quicker, resulting in faster fixes and higher confidence in the suite.

Track test metrics and trends: In addition to test suite level reports, track and analyze the historical results of your tests to understand failure rates, flakiness trends, and failure hotspots. For example, if you observe repeated failures in the login flow, it might indicate unstable tests or sporadic bugs in that flow. Tracking these metrics over time provides visibility into whether the E2E suite is improving or degrading, and it helps you prioritize stabilization efforts based on impact.
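
A minimal sketch of the muting-plus-enrichment pattern, assuming a remotely hosted mute list and a generic alert webhook; both URLs and the failure record shape are hypothetical.

```python
# Hedged sketch: filter known-flaky tests via a remote mute list and
# emit enriched alerts only for actionable failures. The mute-list URL,
# webhook, and failure record shape are hypothetical.
import requests

MUTE_LIST_URL = "https://config.example.com/e2e/muted-tests.json"  # hypothetical
WEBHOOK_URL = "https://hooks.example.com/alerts"                   # hypothetical

def notify_actionable_failures(failures):
    muted = set(requests.get(MUTE_LIST_URL).json()["muted"])
    for failure in failures:
        if failure["test_id"] in muted:
            continue  # known-flaky or tied to a tracked bug: no alert
        requests.post(WEBHOOK_URL, json={
            "test": failure["test_id"],
            "flow": failure["user_flow"],      # which user journey broke
            "commit": failure["commit_sha"],   # what changed
            "error": failure["error_message"], # why it failed
            "logs": failure["log_url"],        # where to dig deeper
        })

notify_actionable_failures([
    {"test_id": "checkout_happy_path", "user_flow": "checkout",
     "commit_sha": "ab12cd3", "error_message": "timeout waiting for #pay-btn",
     "log_url": "https://ci.example.com/runs/4711"},
])
```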

Hybrid Strategies for Scaling E2E with Dockerized Emulators

Running native E2E tests at scale is challenging due to cost and resource constraints. Device farms that provide access to real cloud-based devices are expensive for running a large suite of tests at high frequency. This becomes a constraint for integrating E2E tests with the CI pipeline that executes with every pull request before the changes are merged. 

As mentioned earlier, a hybrid testing approach that uses Dockerized emulators for PR builds alongside real devices for periodic runs can help you overcome this challenge. When our team moved PR checks to Dockerized emulators, we got faster feedback and significantly reduced cloud device costs.

Containerized device runners can be spun up quickly in CI. For example, the docker-android image lets you run an Android emulator inside a Docker container. It supports multiple device profiles, OS versions, and UI-testing frameworks such as Appium and Espresso. Teams can easily integrate these emulators into CI pipelines to run E2E tests at scale without a huge testing budget.
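
For example, a CI step might start the emulator container before the test run, as in this hedged sketch; the image tag and environment variables follow the docker-android project's README as I recall it, so verify them against the repository before relying on them.

```python
# Hedged CI sketch: start a docker-android emulator container ahead of
# an E2E run. Image tag and env vars follow the budtmo/docker-android
# README as recalled here -- verify against the project before use.
import subprocess

subprocess.run(
    [
        "docker", "run", "-d",
        "--name", "android-emulator",
        "--device", "/dev/kvm",              # KVM for hardware acceleration
        "-p", "6080:6080",                   # web-based emulator screen
        "-e", "EMULATOR_DEVICE=Samsung Galaxy S10",
        "-e", "WEB_VNC=true",
        "budtmo/docker-android:emulator_11.0",
    ],
    check=True,  # fail the CI step early if the container can't start
)
```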

If you are building E2E tests for mobile web, you can also use containerized browser images to run tests consistently across different environments to further reduce cost and setup complexity.

There’s Hope!

If your team has been chasing native E2E test failures like we were, you’re probably also burning engineering time and resources without improving test stability. I hope this article has convinced you that there’s a better way: improving your test environment, device setup, alerting, and observability.

Your best first step is to analyze your historical test failures and categorize them into buckets. Use these insights to define actionable items for reducing flakiness, and use that roadmap to identify the test infrastructure investments or process changes that will deliver the most impact.

After our team invested in test infrastructure improvements, we saw a clear improvement in stability. Developers had a better understanding of real failures, and the number of noisy alerts was reduced. Flakiness didn’t disappear entirely, but the improved reliability of the test suite helped us catch multiple native app regressions before the changes were released to production.

I hope this article will help you achieve similar wins.

Source: https://blog.docker.com/feed/

AWS Lambda Managed Instances now supports Rust

AWS Lambda Managed Instances now supports Rust, enabling developers to run high-performance Rust-based functions on Lambda-managed Amazon EC2 instances while maintaining Lambda’s operational simplicity. This combination makes it easier than ever to run performance-critical applications without the complexity of managing servers. Lambda Managed Instances gives Lambda developers access to specialized compute configurations, including the latest-generation processors and high-bandwidth networking. Lambda Managed Instances are fully managed EC2 instances with built-in routing, load balancing, and auto scaling, and no operational overhead. They combine Lambda’s serverless experience with EC2 pricing advantages, including Compute Savings Plans and Reserved Instances. Rust support for Lambda Managed Instances combines these benefits with the performance and efficiency of Rust, including parallel request processing within each execution environment. Together, using Lambda Managed Instances with Rust maximizes utilization and price-performance. Rust support for Lambda Managed Instances is available today in all AWS Regions where Lambda Managed Instances is available. To get started with Rust on Lambda Managed Instances, see the Lambda documentation. To learn more about this release, see the release notes.
Source: aws.amazon.com

AWS Network Firewall Launch in the AWS European Sovereign Cloud

Starting today, AWS Network Firewall is available in the AWS European Sovereign Cloud. With this launch, European customers, particularly those in highly regulated industries, government agencies, and organizations with strict data sovereignty requirements, can deploy AWS Network Firewall to protect their most sensitive workloads while maintaining full compliance with European Union (EU) data protection regulations. Through this expansion, customers using the AWS European Sovereign Cloud can leverage the same AWS Network Firewall capabilities available in other AWS Regions, while ensuring that all data and operations remain entirely within EU borders and under EU-based control. AWS Network Firewall is a managed firewall service that provides essential network protections for your Amazon Virtual Private Clouds (VPCs). The service automatically scales with network traffic volume to provide high-availability protections without the need to set up or maintain the underlying infrastructure. To learn more about AWS Network Firewall availability, visit the AWS Region Table. For more information, please see the AWS Network Firewall product page and the service documentation.
Source: aws.amazon.com

Amazon MSK announces support for Graviton3 instances for Standard brokers in the Africa (Cape Town) Region

You can now create provisioned Amazon Managed Streaming for Apache Kafka (Amazon MSK) clusters with Standard brokers running on AWS Graviton3-based M7g instances in the Africa (Cape Town) Region.
Graviton3 M7g instances for Standard brokers deliver up to 24% compute cost savings and up to 29% higher write and read throughput over comparable MSK clusters running on M5 instances. To get started, create a new cluster with M7g brokers or upgrade your M5 cluster to M7g through the Amazon MSK console or the AWS CLI, and read our Amazon MSK Developer Guide for more information.
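
A hedged boto3 sketch of creating a provisioned cluster on M7g brokers; the subnet IDs, Kafka version, and volume size are placeholders to adapt to your environment.

```python
# Hedged boto3 sketch: create a provisioned MSK cluster on M7g brokers
# in Africa (Cape Town). Subnets, version, and sizes are placeholders.
import boto3

kafka = boto3.client("kafka", region_name="af-south-1")

resp = kafka.create_cluster(
    ClusterName="demo-m7g",
    KafkaVersion="3.6.0",              # placeholder Kafka version
    NumberOfBrokerNodes=3,
    BrokerNodeGroupInfo={
        "InstanceType": "kafka.m7g.large",
        "ClientSubnets": ["subnet-aaa", "subnet-bbb", "subnet-ccc"],  # placeholders
        "StorageInfo": {"EbsStorageInfo": {"VolumeSize": 100}},       # GiB
    },
)
print(resp["ClusterArn"])
```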
Source: aws.amazon.com