Building AI agents shouldn’t be hard. According to theCUBE Research, Docker makes it easy

For most developers, getting started with AI is still too complicated. Different models, tools, and platforms don’t always play nicely together. But with Docker, that’s changing fast.

Docker is emerging as essential infrastructure for standardized, portable, and scalable AI environments. By bringing composability, simplicity, and GPU accessibility to the agentic era, Docker is helping developers and the enterprises they support move faster, more safely, and with far less friction.

Real results: Faster AI delivery with Docker

The platform is accelerating innovation: According to the latest report from theCUBE Research, 88% of respondents reported that Docker reduced the time-to-market for new features or products, with nearly 40% achieving efficiency gains of more than 25%. Docker is playing an increasingly vital role in AI development as well. 52% of respondents cut AI project setup time by over 50%, while 97% report increased speed for new AI product development.

Reduced AI project failures and delays

Reliability remains a key performance indicator for AI initiatives, and Docker is proving instrumental in minimizing risk. 90% of respondents indicated that Docker helped prevent at least 10% of project failures or delays, while 16% reported prevention rates exceeding 50%. Additionally, 78% significantly improved testing and validation of AI models. These results highlight how Docker’s consistency, isolation, and repeatability not only speed development but also reduce costly rework and downtime, strengthening confidence in AI project delivery.

Build, share, and run agents with Docker, easily and securely

Docker’s mission for AI is simple: make building and running AI and agentic applications as easy, secure, and shareable as any other kind of software.

Instead of wrestling with fragmented tools, developers can now rely on Docker’s trusted, container-based foundation with curated catalogs of verified models and tools, and a clean, modular way to wire them together. Whether you’re connecting an LLM to a database or linking services into a full agentic workflow, Docker makes it plug-and-play.

With Docker Model Runner, you can pull and run large language models locally with GPU acceleration. The Docker MCP Catalog and Toolkit connect agents to over 270 MCP servers from partners like Stripe, Elastic, and GitHub. And with Docker Compose, you can define the whole AI stack of models, tools, and services in a single YAML file that runs the same way locally or in the cloud. Cagent, our open-source agent builder, lets you easily build, run, and share AI agents, with behavior, tools, and persona all defined in a single YAML file. And with Docker Sandboxes, you can run coding agents like Claude Code in a secure, local environment, keeping your workflows isolated and your data protected.

Even hardware limits aren’t a blocker anymore when building agents. Docker Offload lets developers run heavy compute tasks on cloud GPUs with one click. 

Conclusion 

Docker’s vision is clear: to make AI development as simple and powerful as the workflows developers already know and love. And it’s working: theCUBE reports 52% of users cut AI project setup time by more than half, while 87% say they’ve accelerated time-to-market by at least 26%.

Learn more

Read more about the ROI of working with Docker in our latest blog

Download theCUBE Research Report and eBook – economic validation of Docker

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Open Docker Desktop and get started with the MCP Toolkit (Requires version 4.48 or newer to launch the MCP Toolkit automatically)

Head over to the cagent GitHub repository, give the repository a star, try it out, and let us know what amazing agents you build!

Source: https://blog.docker.com/feed/

Run Embedding Models and Unlock Semantic Search with Docker Model Runner

Embeddings have become the backbone of many modern AI applications. From semantic search to retrieval-augmented generation (RAG) and intelligent recommendation systems, embedding models enable systems to understand the meaning behind text, code, or documents, not just the literal words.

But generating embeddings comes with trade-offs. Using a hosted API for embedding generation often means reduced data privacy, higher per-call costs, and time-consuming re-embedding whenever your data changes. When your data is private or constantly evolving (think internal documentation, proprietary code, or customer support content), these limitations quickly become blockers.

Instead of sending data to a remote service, you can easily run local embedding models on-premises with Docker Model Runner. Model Runner brings the power of modern embeddings to your local environment, giving you privacy, control, and cost-efficiency out of the box. 

In this post, you’ll learn how to use embedding models for semantic search. We’ll start by covering the theory behind embeddings and why developers should run them locally. Then, we’ll wrap up with a practical example, using Model Runner, to help you get started.

Understanding semantic search embeddings 

Let’s take a moment to first demystify what embeddings are.

Embeddings represent words, sentences, and even code as high-dimensional numerical vectors that capture semantic relationships. In this vector space, similar items cluster together, while dissimilar ones are farther apart.

For example, a traditional keyword search looks for exact matches. If you search for “authentication”, you’ll only find documents containing that exact term. But with embeddings, searching for “user login” might also surface results about authentication, session management, or security tokens because the model understands that these are semantically related ideas.

This makes embeddings the foundation for more intelligent search, retrieval, and discovery — where systems understand what you mean, not just what you type.

For a deeper perspective on how language and meaning intersect in AI, check out “The Language of Artificial Intelligence”.

How vector similarity enables semantic search with embeddings

Here’s where the math behind semantic search comes in, and it’s elegantly simple.

Once text is converted into vectors (lists of numbers), we can measure how similar two pieces of text are using cosine similarity:

similarity(A, B) = (A · B) / (||A|| × ||B||)

Where:

A is your query vector (e.g., “user login”),

B is another vector (e.g., a code snippet or document).

The result is a similarity score, typically between 0 and 1, where values closer to 1 mean the texts are more similar in meaning.

In practice:

A search query and a relevant document will have a high cosine similarity.

Irrelevant results will have low similarity.

This simple mathematical measure allows you to rank documents by how semantically close they are to your query, which powers features like:

Natural language search over docs or code

RAG pipelines that retrieve contextually relevant snippets

Deduplication or clustering of related content

With Model Runner, you can generate these embeddings locally, feed them into a vector database (like Milvus, Qdrant, or pgvector), and start building your own semantic search system without sending a single byte to a third-party API.
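To make that concrete, here’s a minimal cosine similarity function in plain Python; the vectors are toy values standing in for real embeddings, not output from any particular model.

import math

def cosine_similarity(a, b):
    # Cosine similarity = dot product divided by the product of the vector lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embeddings
query_vec = [0.12, 0.85, 0.33]
doc_vec = [0.10, 0.80, 0.40]

print(cosine_similarity(query_vec, doc_vec))  # close to 1.0, i.e. semantically similar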

Why use Docker Model Runner to run embedding models

With Model Runner, you don’t have to worry about setting up environments or dependencies. Just pull a model, start the runner, and you’re ready to generate embeddings, all inside a familiar Docker workflow.

Full data privacy 

Your sensitive data never leaves your environment. Whether you’re embedding source code, internal documents, or customer content, you can rest assured that everything stays local — no third-party API calls, no network exposure.

Zero cost per embedding

There are no usage-based API costs. Once you have the model running locally, you can generate, update, or rebuild your embeddings as often as you need, at no extra cost.

That means iterating on your dataset or experimenting with new prompts won’t affect your budget.

Performance and control

Run the model that best fits your use case, leveraging your own CPU or GPU for inference.

Models are distributed as OCI artifacts, so they integrate seamlessly into your existing Docker workflows, CI/CD pipelines, and local development setups. This means you can manage and version models just like any other container image, ensuring consistency and reproducibility across environments.

Model Runner lets you bring models to your data, not the other way around, unlocking local, private, and cost-effective AI workflows.

Hands-on: Generating embeddings with Docker Model Runner

Now that we understand what embeddings are and how they capture semantic meaning, let’s see how simple it is to generate embeddings locally using Model Runner.

Step 1. Pull the model

docker model pull ai/qwen3-embedding

Step 2. Generate embeddings

Once the model is pulled, Docker Model Runner exposes an embeddings endpoint on localhost. You can send text to it via curl or your preferred HTTP client:

curl http://localhost:12434/engines/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/qwen3-embedding",
    "input": "A dog is an animal"
  }'

The response includes a list of embedding vectors, each one a numerical representation of the corresponding input text.

You can store these vectors in a vector database like Milvus, Qdrant, or pgvector to perform semantic search or similarity queries.
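As a rough sketch of that flow, the snippet below calls the same local endpoint with the Python requests library and writes the returned vectors to a plain JSON file. The embed() helper, the example documents, and the embeddings.json file name are illustrative choices for this post rather than part of Model Runner, and the parsing assumes an OpenAI-style response body.

import json
import requests

EMBEDDINGS_URL = "http://localhost:12434/engines/v1/embeddings"
MODEL = "ai/qwen3-embedding"

def embed(text):
    # Ask the local Model Runner endpoint for an embedding of a single string
    response = requests.post(
        EMBEDDINGS_URL,
        json={"model": MODEL, "input": text},
        timeout=60,
    )
    response.raise_for_status()
    # Assumes an OpenAI-style response body: {"data": [{"embedding": [...]}], ...}
    return response.json()["data"][0]["embedding"]

# Embed a few example documents and persist the vectors next to their text
documents = ["A dog is an animal", "Containers package applications with their dependencies"]
store = [{"text": doc, "embedding": embed(doc)} for doc in documents]

with open("embeddings.json", "w") as f:
    json.dump(store, f)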

Example use case: Semantic search over your codebase

Let’s make it practical.

Imagine you want to enable semantic code search across your project repository.

The process will look like:

Step 1. Chunk and embed your code

Split your codebase into logical chunks. Generate embeddings for each chunk using your local Docker Model Runner endpoint.

Step 2. Store embeddings 

Save those embeddings along with metadata (file name, path, etc.). You would usually use a vector database to store these embeddings, but in this demo, we’re going to store them in a file for simplicity.

Step 3. Query by meaning

When a developer searches “user login”, you embed the query and compare it to your stored vectors using cosine similarity.
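Here’s a minimal sketch of that query step, under the same assumptions as the earlier snippet: it reuses the illustrative embeddings.json file, embeds the query through the local Model Runner endpoint, and ranks the stored chunks by cosine similarity.

import json
import math
import requests

EMBEDDINGS_URL = "http://localhost:12434/engines/v1/embeddings"
MODEL = "ai/qwen3-embedding"

def embed(text):
    # Embed the search query using the local Model Runner endpoint
    resp = requests.post(EMBEDDINGS_URL, json={"model": MODEL, "input": text}, timeout=60)
    resp.raise_for_status()
    return resp.json()["data"][0]["embedding"]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Load the chunks embedded in the previous step
with open("embeddings.json") as f:
    store = json.load(f)

query_vec = embed("user login")

# Rank stored chunks by how semantically close they are to the query
ranked = sorted(
    store,
    key=lambda item: cosine_similarity(query_vec, item["embedding"]),
    reverse=True,
)

for item in ranked[:5]:
    score = cosine_similarity(query_vec, item["embedding"])
    print(f"{score:.3f}  {item['text']}")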

We have included a demo in the Docker Model Runner repository that does exactly that.

Figure 1: Codebase example demo with embeddings stats, example queries, and search results.

Conclusion

Embeddings help applications work with intelligent meaning, not just keywords. The old hassle was wiring up third-party APIs, juggling data privacy, and watching per-call costs creep up.

Docker Model Runner flips the script. Now, you can run embedding models locally where your data lives with full control over your data and infrastructure. Ship semantic search, RAG pipelines, or custom search with a consistent Docker workflow — private, cost-effective, and reproducible. 

No usage fees. No external dependencies. By bringing models directly to your data, Docker makes it easier than ever to explore, experiment, and innovate, safely and at your own pace.

How you can get involved

The strength of Docker Model Runner lies in its community, and there’s always room to grow. We need your help to make this project the best it can be. To get involved, you can:

Star the repository: Show your support and help us gain visibility by starring the Docker Model Runner repo.

Contribute your ideas: Have an idea for a new feature or a bug fix? Create an issue to discuss it. Or fork the repository, make your changes, and submit a pull request. We’re excited to see what ideas you have!

Spread the word: Tell your friends, colleagues, and anyone else who might be interested in running AI models with Docker.

We’re incredibly excited about this new chapter for Docker Model Runner, and we can’t wait to see what we can build together. Let’s get to work!

Get started with Docker Model Runner →

Learn more

Check out the Docker Model Runner integration with vLLM announcement

Visit our Model Runner GitHub repo! Docker Model Runner is open-source, and we welcome collaboration and contributions from the community!

Get started with Docker Model Runner with a simple hello GenAI application

Source: https://blog.docker.com/feed/

You Want Microservices, But Do You Really Need Them?

Do you know who managed to cut costs by a staggering 90% by abandoning microservices for a monolith in May 2023? Not a cash-strapped startup or an indie project—Amazon itself, for its Prime Video service. The same AWS that earns billions every year by selling microservices infrastructure admitted that, sometimes, a good old monolith wins. 

This reversal from the company that practically wrote the playbook on distributed systems sent shockwaves through the cloud-native community. Amazon later removed the original blog post, but the internet never forgets, as you’ll see later.

I’ve been speaking up against unnecessary or premature use of microservices architecture for five or six years now. After Amazon Prime Video went back to a monolith, I came across several eminent architects who are also speaking out against microservices as the default.

And yet in most tech circles, microservices are still viewed as the only way to build modern software. They dominate conferences, blogs, and job listings. Teams adopt them not because their requirements justify it, but because it feels like the obvious (and résumé-boosting) choice. “Cloud-native” has become synonymous with “microservices-by-default”, as if other approaches are as obsolete as floppy disks. 

Microservices do solve real problems, but mostly problems that appear at massive scale. Most teams don’t actually operate at that scale.

With this article, I urge you to reflect on the question the industry has mostly stopped asking: Should microservices be the default choice for building at scale? We’ll look at reversal stories and insights from seasoned architects, and weigh the trade-offs and alternatives. After considering all of this, you can decide whether your problem really needs a constellation of microservices.

Microservices: The Agility-Complexity Trade-Off

On paper, microservices look impressive. Instead of one big monolith, you split your application into many small services. Each one can be written in any language, owned by a small team, and deployed on its own schedule. If you need more capacity, you can scale only the part that’s under load. The promise is elegant: independent deployability, autonomous teams, multi-language stacks, and elastic scaling.

But the catch is that every split creates a seam, and every seam is a potential failure point. Inside a monolith, function calls are instant and predictable. Across services, those same calls become network requests: slower, failure-prone, sometimes returning inconsistent data. With dozens (or hundreds) of services, you need version management, schema evolution, distributed transactions, tracing, centralized logging, and heavy-duty CI/CD pipelines just to keep things running.

This Gartner diagram captures the trade-off perfectly: microservices exchange the simplicity of one codebase for the complexity of many.

At a massive scale (think Netflix), that trade-off may be worth it. But when operational benefits don’t outweigh the costs, teams end up paying a steep price in debugging, coordination, and glue code just to hold their product together.

Microservices make sense in very specific scenarios where distinct business capabilities need independent scaling and deployment. For example, payment processing (security-critical, rarely updated) differs fundamentally from a recommendation engine (memory-intensive, constantly A/B tested). These components have different scaling patterns, deployment cycles, and risk profiles, which justify separate services.

The success of microservices hinges on clear business domain boundaries that match your team structure, as Conway’s Law predicts. If your organization naturally splits into autonomous teams that own distinct capabilities, microservices might work. (So, most “one-and-a-half pizza” startups don’t qualify, do they?) 

That’s why microservices work effectively for companies like Amazon and Uber—although not always.

In fact, most organizations lack the prerequisites: dedicated service ownership, mature CI/CD, robust monitoring, and crucially, scale that justifies the operational overhead. Startups that adopt microservices prematurely often regret their decision.

So ask yourself:

Are you using microservices to solve an independent scaling problem, or are you inviting more complexity than your solution needs?

The Great Microservices Reversal

Ironically, even though tech giants are the ones that are most likely to benefit from microservices, many of these very same companies are walking back their microservices architectures, and the results are eye-opening.

Amazon Prime Video: 90% Cost Reduction with a Monolith

In May 2023, Amazon engineers admitted the unthinkable: Prime Video had abandoned microservices for a monolith. Their Video Quality Analysis (VQA) team had built what looked like a textbook distributed system: AWS Step Functions and Lambda monitored thousands of video streams through independent, scalable components. On paper, it was serverless perfection.

In practice, it was a disaster. “We realized that distributed approach wasn’t bringing a lot of benefits in our specific use case,” said Marcin Kolny in the now-archived Prime Video Engineering blog. Their “infinitely scalable” system crumbled at just 5% of expected load due to orchestration overhead.

The fix was embarrassingly simple: collapse everything into a single process. It resulted in 90% lower costs and faster performance. 

Twilio Segment: From 140 Services to One Fast Monolith

Back in 2018, Twilio Segment, a customer data platform, documented a similar reversal in their brutally honest post “Goodbye Microservices”.

Their system had sprawled into 140+ services, creating operational chaos. At one point, three full-time engineers spent most of their time firefighting instead of building. As they admitted, “Instead of enabling us to move faster, the small team found themselves mired in exploding complexity. Essential benefits of this architecture became burdens. As our velocity plummeted, our defect rate exploded.”

Their solution was radical but effective: collapse all 140+ services into a single monolith. The impact was immediate. Test suites that once took an hour now finished in milliseconds. Developer productivity soared: they shipped 46 improvements to shared libraries in a year, up from 32 in the microservices era. 

Shopify: Sanity over Hype

Shopify runs one of the largest Ruby on Rails codebases in the world (2.8M+ lines). Instead of chasing microservices, they deliberately chose a modular monolith: a single codebase with clear component boundaries.

Shopify’s engineers concluded that “microservices would bring their own set of challenges”, so they chose modularity without the operational overhead.

All these examples raise the question:

If even the pioneers of microservices are retreating, why are we still treating it as gospel?

Expert Voices against Microservices Mania

Some of the most respected voices in software architecture—people behind many of the systems we all admire—are also cautioning against microservices and repeating mistakes they’ve seen play out at scale. (After all, cheerleaders don’t play the game; cloud DevRels rarely build at scale.)

Rails Creator: Simplicity over Sophistication

David Heinemeier Hansson (DHH), the creator of Ruby on Rails, has long advocated simplicity over architectural trends. His analysis of the Amazon Prime Video reversal puts it bluntly:

“The real-world results of all this theory are finally in, and it’s clear that in practice, microservices pose perhaps the biggest siren song for needlessly complicating your system.”

DHH’s image of a siren song is apt: microservices promise elegance but leave teams wrecked on the rocks of complexity.

Microservices: Mistake of the Decade?

Jason Warner, former CTO of GitHub, doesn’t mince words while commenting on microservices: 

“I’m convinced that one of the biggest architectural mistakes of the past decade was going full microservice.”

Warner understands scale: GitHub runs at internet scale, and he’s led engineering at Heroku and Canonical. His critique cuts deeper because it’s lived experience, beyond theoretical advice:

“90% of all companies in the world could probably just be a monolith running against a primary db cluster with db backups, some caches and proxies and be done with it.”

GraphQL Co-Creator: “Don’t”

Then there’s Nick Schrock, co-creator of GraphQL. If anyone had a reason to cheer for distributed systems, it’d be him. Instead, he says:

“Microservices are such a fundamentally and catastrophically bad idea that there are going to be an entire cohort of multi-billion companies built that do nothing but contain the damage that they have wrought.”

He goes on to describe microservices as organizational gambles:

“[Y]ou end up with these services that you have to maintain forever that match the org structure and the product requirements from five years ago. Today, they don’t make a lot of sense.”

When the person who literally built tools to fix distributed system pain says don’t distribute unless you must, maybe it’s time to listen.

Other Voices Questioning Microservice Maximalism

Other engineering leaders are also reconsidering microservice maximalism. 

At Uber, Gergely Orosz admitted:

“We’re moving many of our microservices to macroservices (well-sized services). Exactly b/c testing and maintaining thousands of microservices is not only hard – it can cause more trouble long-term than it solves the short-term.”

Uber still runs microservices where they’re justified, but they’re choosing their battles.

Kelsey Hightower, known for his work with Kubernetes and Google Cloud, cut through the microservices hype with CS101:

“I’m willing to wager a monolith will outperform every microservice architecture. Just do the math on the network latency between each service and the amount of serialization and deserialization of each request.”

He subsequently deleted this tweet, but the network math still holds against microservices.

When pioneers like these, including those who actually solved distributed systems at scale, start waving red flags, it’s worth taking note. 

My question here is:

If GitHub’s CTO thinks 90% of companies don’t need microservices, are you sure yours is part of the 10%?

The Hidden Costs of Microservices

Microservices demand such caution because of these hidden costs that teams often underestimate.

Operational Costs

A monolith is simple: in-process function calls. 

Microservices replace that with networks. Every request now travels across machines, through load balancers, service meshes, and authentication layers, creating more failure points and infrastructure needs. You suddenly need service discovery (how services find each other), distributed tracing (tracking requests across services), centralized logging (aggregating logs from multiple services), and monitoring systems that understand service topology.

Each of these is necessary, but together they’re complex and expensive. Duplicated data requires extra storage. Constant service-to-service calls rack up network egress fees. Cloud costs scale faster than the apps they host. Prime Video’s workflow spent more on orchestrating S3 data transfers between services than on actual processing. 

Developer Productivity Drain

In microservices, the hard part isn’t writing code; it’s navigating distributed system interactions.

In “The macro problem with microservices“, Stack Overflow identifies a critical productivity drain: distributed state forces developers to write defensive code that constantly checks for partial failures. 

In a monolith, a developer can follow a code path end-to-end within one repo. In microservices, one feature might span four or five repos with different dependencies and deploy cycles. Adding a single field triggers weeks of coordination: you need to update one service, then wait for consumers to adopt, version your APIs, manage rollouts, and so on. Different teams will also typically maintain different microservices using different tech stacks, so there’s a risk that they unintentionally break something as well. Breaking changes that a compiler would catch in a monolith now surface as runtime errors in production.

Testing and Deployment Complexity

Monolith integration and end-to-end tests are faster because they run locally, in memory. Distributed systems don’t allow that luxury: real confidence requires integration and end-to-end tests across numerous service boundaries. So these tests are slower, more brittle, and require staging environments that resemble production, all of which effectively double infrastructure costs and slow feedback loops.

Many teams discover this only after their test suite becomes a bottleneck. Deployment orchestration adds another layer. Rolling updates across interdependent services require careful sequencing to avoid breaking contracts. Version incompatibilities surface frequently: Service A works with Service B v2.1 but breaks with v2.2.

Failed deployments leave systems partially updated and difficult to recover.

Data Management and Consistency

The most underestimated complexity of microservices lies in data consistency across service boundaries.

Monoliths benefit from ACID transactions: operations complete entirely or fail entirely. Microservices split that across services, forcing you to build distributed sagas (multi-step workflows with rollback logic), live with eventual consistency (data only becomes correct after a delay), or write compensation logic (extra code to undo partial failures). What was once a single database transaction now spans network hops, retries, and partial failures. Debugging inconsistent orders or payments gets much harder when state is duplicated across services.

As research confirms, data duplication, correctness challenges, and transactional complexity are the top pain points in microservice systems.

The Compounding Effect

These complexities multiply. Operational overhead makes debugging harder, which slows testing, which makes deployments riskier, which creates more incidents. Microservices don’t just shift complexity from code to operations; they tax every part of your engineering process. 

Unless your scale demands it, that tax often outweighs the benefits. 

Think about it:

If every network hop adds complexity and cost, does your use case really justify the price?

Beyond Microservices: Smarter Architectural Alternatives

Before defaulting to microservices, it’s worth considering how simpler, well-structured architectures can deliver comparable scalability without the distributed complexity tax. Two noteworthy alternatives are modular monoliths and service-oriented architectures.

Modular Monoliths: Structure without Distribution

Unlike traditional monoliths that become tangled messes, modular monoliths enforce strict internal boundaries through clear module APIs and disciplined separation. Each module exposes well-defined interfaces, enabling teams to work independently while deploying a single, coherent system.

As Kent Beck explains in “Monolith -> Services: Theory & Practice”, modular monoliths manage coupling through organizational discipline rather than distributed networks. The key difference: modules still communicate via explicit contracts like microservices, but they use fast, reliable function calls instead of HTTP requests that are vulnerable to network latency and partial failures.
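To make the pattern concrete, here’s a hypothetical sketch in Python; the module and function names are invented for illustration and aren’t taken from any particular framework. Two modules live in one deployable unit, each exposes a small public contract, and calls between them are ordinary in-process function calls.

class BillingModule:
    # Public contract of the billing module; everything else is internal.

    def charge(self, customer_id: str, amount_cents: int) -> str:
        return self._create_invoice(customer_id, amount_cents)

    def _create_invoice(self, customer_id: str, amount_cents: int) -> str:
        # Internal detail other modules must not depend on
        return f"invoice-{customer_id}-{amount_cents}"


class OrdersModule:
    # Depends on billing only through its public interface.

    def __init__(self, billing: BillingModule):
        self.billing = billing

    def place_order(self, customer_id: str, amount_cents: int) -> dict:
        # An explicit contract, but still an in-process call:
        # no network hop, no serialization, no partial failure to handle
        invoice_id = self.billing.charge(customer_id, amount_cents)
        return {"customer": customer_id, "invoice": invoice_id, "status": "placed"}


if __name__ == "__main__":
    orders = OrdersModule(BillingModule())
    print(orders.place_order("c-42", 1999))

In a real codebase, the modules would live in separate packages with import rules or tooling enforcing the boundary; the point is that the contract stays explicit while the call stays in-process.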

Why does it work?

Simpler operations: microservices-level organization with monolithic simplicity

Stronger consistency: full ACID transactions

Easier debugging: one traceable system, no hunting for bugs in the ELK haystack

Better performance: function calls beat network hops

Here’s some real-world proof: Shopify’s 2.8 million-line codebase handles 30TB per minute with separate teams owning distinct modules, yet everything deploys together. Facebook runs similarly. (And principal architect Keith Adams jokes that if you want to be talked out of microservices, he’s your guy.)

With recent developments in frameworks like Spring Modulith, Django, Laravel, and Rails (as seen at scale with Shopify), modular monoliths are poised to gain wider traction in the years ahead.

Service-Oriented Architecture: The Middle Ground

Service-oriented architecture (SOA) sits between monoliths and microservices, favoring larger, domain-driven services instead of dozens or hundreds of tiny ones. These services often communicate via an enterprise service bus (ESB), which reduces orchestration overhead while preserving separation of concerns.

Instead of splitting authentication, user preferences, and notifications into separate microservices, SOA might combine them into a single “User Service”, simplifying coordination while preserving autonomy and targeted scaling. SOA provides enterprise-grade modularity without ultra-fine-grained distribution overhead.

Here’s why it works:

Right-sized boundaries: fewer, domain-aligned services instead of sprawl

Targeted scalability: scale services tied to real business domains

Pragmatic complexity: avoids ultra-fine-grained overhead while retaining modular reasoning

SOA has also been proven to work at scale. Norwegian Air Shuttle, Europe’s 9th-largest airline, used SOA to boost agility across complex flight operations. Credit Suisse’s SOA rollout powered millions of service calls per day back in the early 2000s.

Choosing Wisely: Fit over Hype

The problem you’re solving should justify your architecture.

I often use this analogy in consulting: You don’t need a sword to cut a lemon—a knife suffices. And as timeless wisdom reminds us, simplicity is the ultimate sophistication. 

In all likelihood, you’re not Google (you don’t need Google-level fault tolerance), or Amazon (you don’t need massive write availability), or LinkedIn (you don’t handle billions of events a day). Most applications don’t operate at that scale, and they call for fundamentally different solutions than ultra-distributed architectures.

For most systems, well-structured modular monoliths (for most common applications, including startups) or SOA (enterprises) deliver comparable scalability and resilience as microservices, without the distributed complexity tax. Alternatively, you may also consider well-sized services (macroservices, or what Gartner proposed as miniservices) instead of tons of microservices.

It’s worth asking:

If simpler architectures can deliver comparable scalability, why are you choosing the complexity of microservices?

Docker: Built for Any Architecture

Docker isn’t just for microservices—it works great across all kinds of architectures like monoliths, SOA, APIs, and event-driven systems. The real benefit is that Docker gives you consistent performance, easier deployment, and flexibility to scale up your apps no matter what architectural approach you’re using.

Docker packages applications cleanly, keeps environments consistent from laptop to production, simplifies dependency management, and isolates applications from the host system. A Dockerized monolith offers all these benefits, minus the orchestration overhead of microservices. 

Microsoft’s guidance on containerizing monoliths clarifies that scaling containers is “far faster and easier than deploying additional VMs”, whether you run one service or fifty. Twilio Segment observed that containerized monoliths can “horizontally scale your environment easily by spinning up more containers and shutting them down when demand subsides.” For many applications, scaling the whole app is exactly what’s needed.

As for DevOps, a monolith in Docker is lighter to operate than a full-blown microservices setup. Logging aggregation becomes simpler when you’re collecting from identical containers rather than disparate services with different formats. Monitoring and debugging remain centralized, and troubleshooting avoids tracing requests across service boundaries.

So, it’s definitely worth considering:

Even without the complexity of microservices, Docker gives you the same advantages — clean deployments, easy scaling, and consistent environments. So why not keep it?

Wrapping Up

A few years ago, my then-8-year-old wanted a bicycle. He’d mostly ride around our apartment complex, maybe venture into the nearby lane. He didn’t need 21 gears, but those shiny shifters had him smitten—imagine riding faster by changing those gears! He absolutely wanted that mechanically complex beauty. (It’s hard to argue with a starry-eyed kid… or a founder :P).

Once he started riding the new bike, the gears slipped, the chain jammed, and the bicycle spent more time broken than on the road. Eventually, we had to dump it. 

I wasn’t able to convince him back then that a simpler bicycle could’ve served him better, but maybe this article will convince a few grown-ups making architectural decisions.

We techies love indulging in complex systems. (Check: were you already thinking, What’s complex about bicycles with gears??) But the more moving parts you add, the more often they break. Complexity often creates more problems than it solves.

The point I’m making isn’t to dump microservices entirely—it’s to pick an architecture that fits your actual needs, not what the cloud giant is pushing (while quietly rolling back their own commit). Most likely, modular monoliths or well-designed SOA will serve your needs better and make your team more productive.

So here’s the million-dollar question: 

Will you design for cloud-native hype or for your own business requirements?

Do you really need microservices?

Source: https://blog.docker.com/feed/

Security that strengthens the ecosystem: Docker’s upstream approach to CVE-2025-12735

On November 24, 2025, Docker Hardened Images resolved CVE-2025-12735 in the Kibana project, which is the visualization and user interface layer for Elasticsearch. This CVE is a critical remote code execution vulnerability that scored 9.8 on the CVSS scale. While images from other hardened image vendors were still carrying the vulnerability, Docker’s security team and tooling not only patched the CVE for Docker Hardened Images users, but also submitted the fix to the upstream LangChain.js project. Once that pull request merges, every application that depends on LangChain.js will benefit from a more secure foundation across the entire ecosystem.

We always default to upstream patching when possible because it protects everyone who depends on these libraries – not just Docker users. Upstream patches require effort. You have to submit a PR and get it approved by the project. That can mean back and forth with maintainers. Security teams are under intense time pressures. But when we fix expr-eval for LangChain.js, we’re protecting not just Kibana users but every application that depends on that library. That’s over one million weekly downloads that become more secure.

Another Nested Dependency, Another Ticking Time Bomb

CVE-2025-12735 originated in expr-eval, a JavaScript expression parser and evaluator library. The vulnerability allowed attackers to inject crafted variables into evaluate(), enabling untrusted code paths to execute logic the application never intended. Three layers deep into the dependency chain, there was a critical RCE vulnerability in unmaintained code. In practice, this gave attackers a pathway to execute malicious behavior within affected applications. The library hadn’t been updated in years. LangChain.js depends on expr-eval, which means any application or service built with LangChain.js inherits the vulnerability. This includes AI assistants, workflow tools, and LLM-powered applications widely deployed across the industry. Kibana was affected by the same dependency chain. 

This matters because LangChain.js has become a foundational component in modern application development. The library provides a framework for building applications powered by large language models, and it has been downloaded millions of times from npm. As of November 18, 2025, the npm package langchain (which includes LangChain.js) receives approximately 1,018,076 weekly downloads. Organizations use LangChain.js to build chatbots, document analysis systems, customer service platforms, and AI-powered search tools. When a vulnerability exists in LangChain.js or its dependencies, it potentially affects thousands of production applications across the technology industry.

This is exactly the attack surface that sophisticated adversaries target. The 2024 XZ Utils backdoor attempt demonstrated how attackers focus on dependencies precisely because they affect so many downstream projects. Old vulnerabilities remain a persistent challenge because organizations focus on direct dependencies while nested dependencies slip through the cracks.

Why We Must Fix at the Source, Fast

Many security and hardened image vendors scan for CVEs, flag them, and patch their own images. The vulnerability remains in the upstream project. The next build cycle reintroduces it. The problem persists for every other user of that dependency chain. This approach treats symptoms instead of causes. You patch your copy of Kibana. The next developer who builds from upstream gets the vulnerable version. Other container image providers may still ship the vulnerable dependency until their next update cycle. When the next CVE gets assigned to expr-eval, the cycle repeats.

Docker takes a different approach. When the Docker Security team identified CVE-2025-12735 in Kibana, we traced it back through the dependency chain to expr-eval. Rather than applying a surface-level patch, we replaced the unmaintained library with math-expression-evaluator, an actively maintained alternative that did not have the vulnerability. Then we contributed that fix upstream to LangChain.js: Pull Request #9391.

This approach delivers three outcomes:

Docker Hardened Images users got immediate protection. The updated Kibana image shipped without the vulnerable dependency. There was no waiting for upstream maintainers and no emergency patching required.

The entire LangChain.js ecosystem will benefit. Once the PR merges, every project using LangChain.js will inherit the fix automatically. Web applications, data processing pipelines, AI tools, and analytics platforms all get safer because the fix lives where it belongs.

Future builds are secure by default. Docker doesn’t have to maintain downstream patches or worry about the vulnerability reappearing in the next release cycle. The fix lives in the upstream project where it belongs.

Docker Hardened Images responded faster than other vendors. We identified the root cause, selected a maintained replacement, verified it worked correctly, and contributed the fix back to the upstream project. This is possible because Docker’s security architecture is designed for a high-speed workflow without sacrificing thoroughness or attention to detail. (We are also, as a team, strongly committed to contributing back to open source!) Continuous dependency analysis through Docker Scout identifies issues the moment they’re disclosed. Deep supply chain visibility shows not just what packages are in an image but the entire dependency chain. Direct upstream engagement means we can contribute fixes rather than wait for maintainers to respond to bug reports.

What This Means for Your Organization

If you’re running Kibana in production, CVE-2025-12735 posed a critical risk. Organizations using Docker Hardened Images received immediate protection with secure, minimal, production-ready container images built from source and backed by a fast SLA that ensures rapid remediation. The updated image shipped with expr-eval replaced by a maintained alternative. No emergency patching was required and there was no downtime. Organizations using other container distributions may still be exposed. Check your Kibana images for the vulnerable expr-eval dependency. If you’re running upstream Kibana, monitor for the LangChain.js update that incorporates Docker’s fix.

But the implications extend beyond this single CVE. The nested dependency problem affects every modern application. Your development teams probably don’t know what libraries are three or four levels deep in your dependency trees. Your security scanners might not catch them. Your vendors might not fix them upstream.

Helping Open Source Projects Helps Us All

The container ecosystem depends on thousands of open source projects. Most are maintained by small teams, often volunteers, who juggle security alongside feature development, bug fixes, and user support. When vulnerabilities emerge, maintainers may lack resources for immediate response.

Commercial vendors who benefit from open source have a responsibility to contribute back. When Docker Security fixes vulnerabilities upstream, open source maintainers get security support at no cost. The entire community benefits from hardened dependencies. Docker builds trust with the projects that power modern infrastructure. Future vulnerabilities become easier to address as relationships deepen. Together, we are more secure.

Docker is not the only company to push patches upstream, but it is a core part of our DNA. We don’t just protect our own customers but strengthen the entire ecosystem. Fixes go upstream so everyone benefits. The focus is on eliminating vulnerabilities at their source rather than playing endless rounds of patch-and-scan.

Modern supply chain attacks move faster than traditional security response times. Docker Hardened Images and Docker Scout are designed to match that speed while strengthening the entire ecosystem through upstream contributions. When vulnerabilities emerge, our customers get immediate protection. When our fixes go upstream, everyone gets safer.

Learn more about how Docker Hardened Images deliver security that protects your organization and strengthens the ecosystem.
Source: https://blog.docker.com/feed/

Securing the software supply chain shouldn’t be hard. According to theCUBE Research, Docker makes it simple

In today’s software-driven economy, securing software supply chains is no longer optional; it’s mission-critical. Yet enterprises often struggle to balance developer speed and security. According to theCUBE Research, 95% of organizations say Docker improved their ability to identify and remediate vulnerabilities, while 79% rate it highly effective at maintaining compliance with security standards. Docker embeds security directly into the developer workflow so that protection happens by default, not as an afterthought.

At the foundation are Docker Hardened Images, which are ultra-minimal, continuously patched containers that cut the attack surface by up to 95% and achieve near-zero CVEs. These images, combined with Docker Scout’s real-time vulnerability analysis, allow teams to prevent, detect, and resolve issues early, keeping innovation and security in sync. The result: 92% of enterprises report fewer application vulnerabilities, and 60% see reductions of 25% or more.

Docker also secures agentic AI development through the MCP Catalog, Toolkit, and Gateway. These tools provide a trusted, containerized way to run Model Context Protocol (MCP) servers that power AI agents, ensuring communication happens in a secure, auditable, and isolated environment. According to theCUBE Research, 87% of organizations reduced AI setup time by over 25%, and 95% improved AI testing and validation, demonstrating that Docker makes AI development both faster and safer.

With built-in Zero Trust principles, role-based access controls, and compliance support for SOC 2, ISO 27001, and FedRAMP, Docker simplifies adherence to enterprise-grade standards without slowing developers down. The payoff is clear: 69% of enterprises report ROI above 101%, driven in part by fewer security incidents, faster delivery, and improved productivity. In short, Docker’s modern approach to DevSecOps enables enterprises to build, ship, and scale software that’s not only fast, but fundamentally secure.

Docker’s impact on software supply chain security

Docker has evolved into a complete development platform that helps enterprises build, secure, and deploy modern and agentic AI applications with trusted DevSecOps and containerization practices. From Docker Hardened Images, which are secure, minimal, and production-ready container images with near-zero CVEs, to Docker Scout’s real-time vulnerability insights and the MCP Toolkit for trusted AI agents, teams gain a unified foundation for software supply chain security.

Every part of the Docker ecosystem is designed to blend in with existing developer workflows while making security affordable, transparent, and universal. Whether you want to explore the breadth of the Docker Hardened Images catalog, analyze your own image data with Docker Scout, or test secure AI integration through the MCP Gateway, it is easy to see how Docker embeds security by default, not as an afterthought.

Review additional resources

Read more in our latest blog about the ROI of working with Docker

theCUBE Research Report and eBook – economic validation of Docker

Explore Docker Hardened Images and start a 30-day free trial 

View Hardened Images and Helm Charts on Docker Hub

Explore Docker Scout

Source: https://blog.docker.com/feed/

A New Approach for Coding Agent Safety

Coding agents like Claude Code, Gemini CLI, Codex, Kiro, and OpenCode are changing how developers work. But as these agents become more autonomous with capabilities like deleting repos, modifying files, and accessing secrets, developers face a real problem: how do you give agents enough access to be useful without adding unnecessary risk to your local environment?

A More Effective Way to Run Local Coding Agents Safely.

We’re working on an approach that lets you run coding agents in purpose-built, isolated local environments: local sandboxes from Docker that wrap agents in containers mirroring your local workspace and enforcing strict boundaries across all the coding agents you use. The idea is to give agents the access they need while maintaining isolation from your local system.

Today’s experimental release runs agents as containers inside Docker Desktop’s VM, but we will be switching to running them inside of dedicated microVMs for more defense in depth and to improve the experience of agents executing Docker containers securely. 

What’s Available Now (Experimental Preview).

This is an experimental preview. Commands may change and you shouldn’t rely on this for production workflows yet.

Here’s what you get today:

Container-based isolation: Agents can run code, install packages, and modify files within a bind mounted workspace directory.

Filesystem isolation: Process containment, resource limits, and filesystem scoping, protecting your local system.

Broad agent support: Native support for Claude Code and Gemini CLI, with support for more coding agents coming soon.

Why We Are Taking This Approach.

We don’t think the operating system-level approaches have the right long-term shape:

They sandbox only the agent process itself, not the full environment the agent needs. This means the agent constantly needs to access the host system for basic tasks (installing packages, running code, managing dependencies), leading to constant permission prompts that interrupt workflows.

They aren’t consistent across platforms.

Container-based isolation is designed for exactly the kind of dynamic, iterative workflows that coding agents need. You get flexibility without brittleness.

Although this structure is meant to be general-purpose, we’re starting with specific, pre-configured coding agents. Rather than trying to be a solution for all kinds of agents out of the box, this approach lets us solve real developer problems and deliver a great experience. We’ll support other use cases in the future, but for now, coding agents are where we can make the biggest impact.

Here’s How You Can Try It.

Today’s experimental preview works natively with Claude Code and Gemini CLI. We’re building for other agents developers use.

With Docker Desktop 4.50 or later installed, run: docker sandbox run <agent>

This creates a new isolated environment with your current working directory bind mounted.

What’s Next.

Better support and UX for running multiple agents in parallel

Granular network access controls

Granular token and secret management for multi-agent workflows

Centralized policy management and auditability

MicroVM-based isolation architecture

Support for additional coding agents

Try It and Share Your Feedback.

We’re building this alongside developers. As you experiment with Docker Sandboxes, we want to hear about your use cases and what matters most to your workflow.

Send your feedback to: coding-sandboxes-feedback@docker.com

We believe sandboxing should be how every coding agent runs, everywhere. This is an early step, and we need your input to get there. We’re building toward a future where there’s no compromise: where you can let your agents run free while protecting everything that matters. 
Source: https://blog.docker.com/feed/

Security that moves fast: Docker’s response to Shai Hulud 2.0

On November 21, 2025, security researchers detected the beginning of what would become one of the most aggressive npm supply chain attacks to date. The Shai Hulud 2.0 campaign compromised over 25,000 GitHub repositories within 72 hours, targeting packages from major organizations including Zapier, ENS Domains, PostHog, and Postman. The malware’s self-propagating design created a compounding threat that moved at container speed, not human speed.

This variant executed during npm’s preinstall phase, harvesting developer credentials, GitHub tokens, and cloud provider secrets before packages even finished installing. Stolen credentials appeared in public GitHub repositories labeled “Sha1-Hulud: The Second Coming,” creating a secondary attack vector as threat actors recycled tokens to publish additional malicious packages. Researchers tracked approximately 1,000 new compromised repositories appearing every 30 minutes at the attack’s peak.

For teams using npm packages in their containerized applications, this attack represented exposure not just to initial credential theft but also to systematic supply chain compromise that could persist across rebuild cycles and burrow deep into supply chains.

Docker’s real-time response architecture

According to Google Mandiant’s 2023 vulnerability analysis, the average time-to-exploit for vulnerabilities has collapsed from 63 days in 2018-19 to just five days. With Shai Hulud-type attacks on the rise, that window is likely to compress further, from days to hours.

Within hours of security researchers publishing indicators of compromise, Docker Security created DSA-2025-1124, a Docker Security Advisory that encoded detection rules for the Shai Hulud 2.0 malware signatures. This advisory immediately entered Docker Scout’s continuous monitoring pipeline, where it followed the same automated workflow that handles CVE ingestion.

Here’s how the protection deployed:

Automatic threat intelligence ingestion: Docker Scout continuously ingests security intelligence from multiple published sources. Scout’s ingestion pipeline identified the malicious package indicators and malware signatures from these sources and propagated them within seconds.

Instant supply chain analysis: Docker Scout cross-referenced the threat intelligence against SBOMs from all Docker Hardened Images and customer images under Scout protection. This analysis identified which images, if any, contained dependencies from the compromised package ecosystem, enabling immediate risk assessment across the entire Docker registry. 

Automated detection distribution: The DSA containing Shai Hulud 2.0 detection rules propagated through Scout’s monitoring infrastructure automatically. Every Docker Scout-protected environment gained the ability to flag malicious packages based on the latest threat intelligence, without requiring manual policy updates or signature downloads.

Continuous verification: As Docker Security performed immediate scans of all Docker GitHub Enterprise repositories (which returned no findings), the same SBOM-based verification confirmed that Docker Hardened Images contained no compromised packages. 

From threat disclosure to deployed protection, the response cycle completed in hours. Organizations using Docker Scout received alerts identifying any exposure to the compromised packages while the attack was still unfolding, allowing them to mount a timely response and protect their infrastructure.

Why Docker’s approach creates verifiable protection

Docker’s response to Shai Hulud 2.0 demonstrates why security architecture must assume attacks will move faster than human response times.

Real-time protection: Traditional vulnerability management treats each threat as a discrete event requiring investigation, triage, and manual remediation. Docker Scout’s architecture treats threat intelligence as streaming data, continuously updating detection capabilities the moment new indicators become available. 

Unified telemetry eliminates blind spots: The integration between Scout’s monitoring, DHI’s build pipeline, and Docker’s supply chain tracking provides complete visibility into what’s running and where it came from. When the Shai Hulud malware attempted to compromise the npm ecosystem, Docker’s architecture could immediately answer: “Do we have exposure?” 

Cryptographic verification enables trust under fire: Every Docker Hardened Image ships with complete SBOMs, cryptographic signatures, and verifiable build provenance. During an active supply chain attack, this transparency becomes operational capability. Security teams can prove to auditors, incident responders, and leadership exactly what’s running in production, which versions are deployed, and whether any compromised packages made it through the supply chain. 

Speed that matches attack velocity: Self-propagating malware spreads through automated exploitation. This means you have to move fast. Docker’s remediation pipeline doesn’t wait for security teams to file tickets or schedule maintenance windows. When threats emerge, the pipeline automatically initiates detection updates, verifies image integrity, and flags exposure based on factual SBOM data. 

The five pillars prove themselves under pressure

Docker’s security architecture rests on five pillars that proved themselves under pressure: minimal attack surface, complete SBOMs, verifiable provenance, exploitability context, and cryptographic verification. During Shai Hulud 2.0, these worked together as implemented controls that functioned automatically, enabling teams to verify exposure immediately through SBOMs, prove integrity through cryptographic signatures, and focus response on actually weaponized packages. Even if your organization does not use Docker Hardened Images, by using Docker Scout you get the same detection speed via Scout-generated SBOMs, which are optimized for transparency and speed. 

Supply chain security at container speed

We believe that, increasingly, modern supply chain attacks targeting the package infrastructure will be designed to outrun traditional security response times. The only viable response is a security architecture and response mechanism that can match this speed.

If your security team is still chasing alerts from last month’s supply chain attack, or if you’re uncertain whether your container images contain compromised dependencies, Docker offers a different approach. 

Learn more about how Docker Scout and Hardened Images deliver continuous, verifiable protection, or contact our team to discuss how real-time security architecture applies to your specific environment.
Source: https://blog.docker.com/feed/

The Rising Importance of Governance at SwampUP Berlin 2025

On November 12-14, the Docker team was out in numbers at JFrog SwampUP Berlin 2025. We joined technical sessions, put on a fireside chat, and had conversations with attendees there. We’d like to thank the folks at JFrog for having us there and putting on such a great show!

Here are our takeaways from the event about software supply chain security trends:

Software supply chain attacks reach unprecedented scale leveraging open source packages

An analysis of recent software supply chain attacks by JFrog’s CTO Asaf Karas shed light on how malicious actors leverage AI and software supply chains in their exploits. Recent attacks combine existing techniques, like phishing, with AI prompts that recursively write and execute code in order to compromise hundreds of thousands of systems running popular open source packages. A few examples include Shai Hulud, Red Donkey, and the recent NPM package phishing attack. So far, despite these attacks’ scale, damages have been limited due to the still rudimentary nature of these exploits. Expect more software supply chain attacks, as well as more sophistication, in the coming year.

New Roles of Governance as a Security Layer

The best way to avoid software supply chain attacks is to keep malicious code from entering the software supply chain in the first place. That’s where governance comes into play. Taking control of gate points during the software development lifecycle, for example during dependency scanning, build pipelines, and deployments, is not enough. Malicious or risky code needs to be blocked before it ever enters the supply chain, and tools need increased interoperability to detect all potential attack vectors.

Addressing MCP Challenges in AI Development

MCP’s ability to leverage both deterministic and non-deterministic outcomes by connecting an LLM client to many different servers seems to be the main reason companies are betting on the technology to build applications that deliver value to customers. Moreover, because each server can run independently of the others, it becomes possible to add governance layers on MCP servers, reducing the risk of hallucinations or unexpected results. Overall, we agree with JFrog’s assessment and look forward to opportunities where Docker and JFrog MCP technologies can work together for a safer and smoother enterprise AI developer experience.

Building on Strong Open Source Foundations Is Core in the AI Era

The fireside chat between Gal Marder, JFrog’s Chief Strategy Officer, and Michael Donovan, Docker’s VP of Product, explored how organizations can protect themselves from risks in unverified open source dependencies. They emphasized the importance of starting with strong foundations: using hardened images, maintaining them throughout their lifecycle, including those that have reached end of life, and ensuring visibility and governance across every stage. Strong third-party integrations are essential to manage this complexity effectively and extend security and trust from development to delivery.

Conclusion: Build strong foundations, keep it consistent, stay ahead

Software development is changing fast as AI becomes part of everyone’s workflow, for developers and attackers alike. The best way to stay ahead is to build protection early, starting with strong foundations, and to keep it consistent across every stage with governance, visibility, and strong partnerships. Only then can teams innovate with confidence and speed as the landscape evolves. Exciting times!

Learn more

Subscribe to the Docker Navigator Newsletter

Explore the MCP Catalog: Discover containerized, security-hardened MCP servers

Explore the DHI Catalog: Discover secure, minimal, production-ready container images

Docker Partner Programs: Discover trusted partners, tools, and integrations

New to Docker? Create an account

Have questions? The Docker community is here to help

Source: https://blog.docker.com/feed/

Beyond the Hype: How to Use AI to Actually Increase Your Productivity as a Dev

When I started incorporating AI tools into my workflow, I was frustrated at first. I didn’t get the 5x or 10x gains others raved about on social media. In fact, the tools slowed me down.

But I persisted. Partly because I see it as my professional duty as a software engineer to be as productive as possible, partly because I’d volunteered to be a guinea pig in my organization.

After wrestling with it for some time, I finally got my breakthrough discovery—the way to use AI tools well involves the same disciplines we’ve applied in software development for decades:

Break work down into reasonable chunks

Understand the problem before trying to solve it    

Identify what worked well and what didn’t

Tweak variables for the next iteration

In this article, I share the patterns of AI use that have led me to higher productivity. 

These aren’t definitive best practices. AI tools and capabilities are changing too quickly, and codebases differ too much. And that’s before we even take the probabilistic nature of AI into account.

But I do know that incorporating these patterns into your workflow can help you become one of the developers who benefit from AI instead of being frustrated or left behind.

A Cycle for Effective AI Coding

Too many people treat AI like a magic wand that will write their code _and_ do their thinking. It won’t. These tools are just that: tools. Like every developer tool before them, their impact depends on how well you use them.

To get the most from AI tools, you need to constantly tweak and refine your approach. 

The exact process you follow will also differ depending on the capabilities of the tools you use. 

For this article, I’ll assume you’re using an agentic AI tool like Claude Code, or something similar: a well-rounded coding agent with levers you can tweak and a dedicated planning mode, something that more tools are adopting. I’ve found this type of tool to be the most impactful.

With such a tool, an effective AI coding cycle should look something like this:

The cycle consists of four phases:

Prompting: Giving instructions to the AI

Planning: Working with the AI to construct a change plan

Producing: Guiding the AI as it makes changes to the code

Refining: Using learnings from this iteration to update your approach for the next cycle

You might think this is overly complicated. Surely you could simply go between prompting and producing repeatedly? Yes, you could do that, and it might work well enough for small changes. 

But you’ll soon find that it doesn’t help you write sustainable code quickly. 

Without each step in this loop, you risk that the AI tool will lose its place or context, and the quality of its output will plummet. One of the major limitations of these tools is that they will not stop and warn you when this happens; they’ll just keep on trying their best. As the operator of the tool and ultimately the owner of the code, it’s your responsibility to set the AI up for success. 

Let’s look at what this workflow looks like in practice.

1. Prompting

AI tools are not truly autonomous: the quality of the output reflects the input you provide. That’s why prompting is arguably the most important phase in the loop: how well you do it will determine the quality of output you get, and by extension, how productive your use of AI will be.

This phase has two main considerations: context management and prompt crafting.

Context Management

A common characteristic of current-gen AI tools is that the quality of their output tends to decrease as the amount of context they hold increases. This happens for several reasons:

Poisoning: errors or hallucinations linger in context

Distractions: the model reuses mediocre context instead of searching for better info    

Confusion: irrelevant details lower output quality

Clashes: outdated or conflicting info leads to errors

As long as AI tools have this limitation, you get better results by strictly managing the context.

In practice, this means rather than having one long-running conversation with your agent, you should “wipe” its context in between tasks. Start from a fresh slate each time, and re-prompt it with the information it needs for the next task so that you don’t implicitly rely on accumulated context. With Claude Code, you do this with the /clear slash command. 

If you don’t clear context, tools like Claude will “auto-compact” it, a lossy process that can carry forward errors and reduce quality over time.

If you need any knowledge to persist between sessions, you can have the AI dump it into a markdown file. You can then either reference these markdown files in your tool’s agent file (CLAUDE.md for Claude Code) or mention the relevant files when working on specific tasks and have the agent load them in.

Structure varies, but it might look something like this…

```
.
├── CLAUDE.md
└── docs
    └── agents
        └── backend
            ├── api.md
            ├── architecture.md
            └── testing.md
```

Prompt Crafting

After ensuring you’re working with a clean context window, the next most important thing is the input you provide. Here are the different approaches you can take depending on the task you are dealing with.

Decomposition

Generally, you want to break work down into discrete, actionable chunks. Avoid ambiguous high-level instructions like “implement an authentication system”, as this has too much variability. Instead, think about how you would actually do the work if you were going to do it manually, and try to guide the AI along the same path.

Here’s an example from a document management system task I gave Claude. You can view the whole interaction summary in this GitHub repo.

Prompt: “Look at DocumentProcessor and tell me which document types reference customers, projects, or contracts.”

Output: AI identified all references

Prompt: “Update the mapping functions at {location} to use those relationships and create tests.”

Output: Implemented mappings + tests

Prompt: “Update documentIncludes to ensure each type has the right relations. Check backend transformers to see what exists.”

Output: Filled in missing relationships

Notice how the task is tackled in steps. A single mega-prompt would have likely failed at some point due to multiple touchpoints and compounding complexity. Instead, small prompts with iterative context led to a high success rate. 

Once the task is done, wipe the context again before moving on to avoid confusing the AI.
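In practice, that hand-off between tasks can be as simple as clearing the session and re-anchoring the agent on the files it needs. A minimal sketch, where the file path and task wording are illustrative rather than the exact prompts from the repo:

```
/clear
Read docs/agents/backend/api.md, then update the contract mapping in
DocumentProcessor and add tests following the existing conventions.
```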

Chaining

Sometimes you do need a more detailed prompt, such as when tasking the AI with a larger investigation. In this case, you can greatly improve your chances of success by chaining prompts together.

The most common way of doing this is by providing your initial prompt to a separate LLM, such as ChatGPT or Claude chat, and asking it to draft a prompt for you for a specific purpose. Once you’re satisfied with the parameters of the detailed prompt, feed it into your coding agent. 

Here’s an example:

Prompt (ChatGPT): “Draft me a prompt for a coding agent to investigate frontend testing patterns in this codebase, and produce comprehensive documentation that I can provide to an AI to write new tests that follow codebase conventions.”

This prompt produces a fairly detailed second-stage prompt that you can review, refine, and feed to your coding agent.

You can see the full output here. 

This approach obviously works best when you ensure the output aligns with the reality of your code. For example, this prompt talks about `jest.config.js`, but if you don’t use jest, you should change this to whatever you do use. 

Reuse

Sometimes, you’ll find a pattern that works really well for your codebase or way of working. Often, this will happen after Step 4: Refining, but it can happen at any time. 

When you find something that works well, you should set it aside for reuse. Claude Code has a few ways you can do this, with the most idiomatic one being custom slash commands. The idea here is that if you have a really solid prompt, you can encode it as a custom command for reuse.

For example, one great time saver I found was using an agent to examine a Laravel API and produce a Postman collection. This was something I used to do manually when creating new modules, which can be quite time-consuming.

Using the chaining approach, I produced a prompt that would:

Generate a new Postman collection for a given backend module

Use the Controller/API test suite to inform the request body values

Use the Controller and route definitions to determine the available endpoints

Running the prompt through an agent consistently produced a working Postman collection almost instantly. You can see the prompt here. 
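In Claude Code, a custom slash command is just a markdown file under .claude/commands/, and its contents become the prompt when you invoke the command. Here’s a hedged sketch of how the Postman generator above might be packaged; the file name, wording, and output path are illustrative, not the exact prompt from the repo:

```md
<!-- .claude/commands/postman.md (illustrative sketch) -->
Generate a Postman collection for the backend module named in $ARGUMENTS.

1. Read the module's Controller and route definitions to list every available endpoint.
2. Use the Controller/API test suite to infer realistic request body values.
3. Write the collection to postman/<module>.postman_collection.json.
```

You would then invoke it in a fresh session as, for example, /postman invoices.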

When you find a valuable pattern or prompt like this, you should consider sharing it with your team, too. Increasing productivity across your team is where the real compounding benefits can happen.

2. Planning

Tools like Claude Code have a planning mode that allows you to run prompts to build context without making any changes. While you don’t always need this functionality, it’s invaluable if you’re dealing with a change with any appreciable amount of complexity.

Typically, the tool will perform an investigation to find all the information it needs to determine what it would do if it weren’t in planning mode. It will then present you with a summary of the intended changes. The key inflection point here is that it allows you to review what the AI is planning to do.

In the screenshot below, I used planning mode to ask Claude what’s needed to add “Login with Google” to an existing app that already supports “Login with Discord”:

I could see everything the AI planned to change to decide whether it makes sense for my use case.

Important: read the plan carefully! Make sure you understand what the AI is saying, and make sure it makes sense. If you don’t understand or if it seems inaccurate, ask it to clarify or investigate more. 

You should not move on from the planning phase until the plan looks exactly like what you would expect.

If the AI proposes rewriting a huge amount of code, treat it as a red flag. Most development should be evolutionary and iterative. If you break work into small chunks, the AI should propose and make small changes, which in turn will be easier to review. If the plan includes far more changes than you expected, review your input to see if the AI is missing important context.

Once you’ve iterated on the plan, you can give the AI the go-ahead to execute the plan.

3. Producing

During the third phase, the AI will begin to make changes to your codebase. Although the AI will produce most of the output here, you’re not off the hook. You still own any code it produces at your behest, for better or worse. It’s therefore better to see the producing phase as a collaboration between you and the AI: the AI produces code and you’re guiding it in real-time.

To get the most out of your AI tool and spend the least amount of time on rework, you need to guide it. Remember, your goal is maximum productivity—real productivity, not just lines of code. That requires actively engaging with the tool and working with it as it builds things, rather than leaving it to its own devices.

If you take sufficient care with creating your prompt and doing planning, there shouldn’t be too many surprises during the actual coding phase. However, AI can still make mistakes, and it will certainly overlook things, especially in larger systems. (This is one of the major reasons why fully “vibe coded” projects break down quickly as they increase in scope. Even when the entire system has been built by AI, it will not remember or know everything that exists in the codebase.)

A day has yet to pass where I haven’t caught the AI making a mistake. The mistakes might be small, like using string literals in place of pre-existing constants, or inconsistent naming conventions. They might not even stop the code from working.

However, if you let these changes through unchecked, it will be the start of a slippery slope that is hard to recover from. Be diligent, and treat any AI-generated code as you would code from another team member. Better still, understand that this code has your name attached to it, and don’t accept anything that you aren’t willing to “own” in perpetuity.

So if you notice a mistake has been made, point it out and suggest how it can be fixed. If the tool deviates from the plan or forgets something, try to catch it early and course-correct. Because your prompts are now small and focused, the features the AI builds should also be smaller. This makes reviewing them easier.

4. Refining

Luckily, rather than constantly fighting the machine and going back and forth on minor issues, the final phase of the loop—refining—offers a more sustainable way to calibrate your AI tool over time.

You might not make a change to your setup after every loop, but every loop will yield insight into what is working well and what needs to change. 

The most common way to tweak the behavior of AI tools is to use their specific steering documents. For instance, Claude has CLAUDE.md, and Cursor has Rules. 

These steering documents are typically markdown files that get loaded into the agent’s context automatically. In them, you can define project-specific rules, style guides, architecture notes, and more. If you find, for example, that the AI constantly struggles with how to set up mocks in your tests, you can add a section that explains what it needs to know, with examples it can use for reference, or links to known-good files in the codebase it can look at.

This file shouldn’t get too big, as it does take up space in the LLM’s context. Treat it like an index, where you include information that is always needed directly in the file, and link out to more specialized information that AI can pull in when needed. 

Here’s an excerpt from one of my CLAUDE.md files that works well:

```md

## Frontend

### Development Guidelines

For detailed frontend development patterns, architecture, and conventions, see:
**[Frontend Module Specification](./docs/agents/frontend/frontend-architecture.md)**

This specification covers:

– Complete module structure and file organization
– Component patterns and best practices
– Type system conventions
– Testing approaches
– Validation patterns
– State management
– Performance considerations

```

The AI understands the hierarchy of markdown files, so it will see that there’s a section about frontend development guidelines, and it will see a link to a module specification. The tool will then decide internally whether it needs this information. For instance, if it’s working on a backend feature, it will skip it, but if it’s working on a frontend module, it will pull in this extra file. 

This feature allows you to conditionally expand and refine the agent’s behavior, tweaking it each time it has trouble in a specific area, until it can work in your codebase effectively more often than not.

Exceptions to the Cycle

There are some cases where it makes sense to deviate from this flow.

For quick fixes or trivial changes, you might only need Prompting → Producing. For anything beyond that, skipping planning and refinement usually backfires, so I don’t recommend it.

Refinement will likely need to be done quite often when first starting or when moving to a new codebase. As your prompts, workflows, and setup mature, the need to refine drops. Once things are dialed in, you likely won’t need to tweak much at all.

Finally, while AI can be a real accelerator for feature work and bug fixes, there are situations where it will slow you down. This varies by team and codebase, but as a rule of thumb: if you’re deep in performance tuning, refactoring critical logic, or working in a highly regulated domain, AI is more likely to be a hindrance than a help.

Other Considerations

Beyond optimizing your workflow with AI tools, a few other factors strongly affect output quality and are worth keeping in mind.

Well-Known Libraries and Frameworks

One thing you’ll notice quickly is that AI tools perform much better with well-known libraries. These are usually well-documented and likely included in the model’s training data. In contrast, newer libraries, poorly documented ones, or internal company libraries tend to cause problems. Internal libraries are often the hardest, since many have little to no documentation. This makes them difficult not only for AI tools but also for human developers. It’s one of the biggest reasons AI productivity can lag on existing codebases.

In these situations, your refinement phase often means creating guiding documentation for the AI so it can work with your libraries effectively. Consider investing time up front to have the AI generate comprehensive tests and documentation for them. Without it, the AI will have to reanalyze the library from scratch every time it works on your code. By producing documentation and tests once, you pay that cost up front and make future use much smoother.
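What that documentation looks like will differ per library, but even a short, AI-generated reference that you review and commit can pay for itself. A hypothetical sketch, where the library name, paths, and API names are invented for illustration:

```md
<!-- docs/agents/libs/http-client.md (hypothetical example) -->
## internal-http-client

- Purpose: thin wrapper around fetch with auth, retries, and tracing built in.
- Always use `createClient()` from `libs/http-client`; never call fetch directly.
- Errors surface as `ApiError` with `status` and `requestId` fields.
- Known-good usage examples: the tests under `services/billing/`.
```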

Project Discoverability

The way your project is organized has a huge impact on how effectively AI can work with it. A clean, consistent directory structure makes it easier for both humans and AI to navigate, understand, and extend your code. Conversely, a messy or inconsistent structure increases confusion and lowers the quality of output you get.

For instance, a clean, consistent structure might look like this:

```
.
├── src
│   ├── components
│   ├── services
│   └── utils
├── tests
│   ├── unit
│   └── integration
└── README.md
```

Compare that with this confusing structure:

```
.
├── components
│   └── Button.js
├── src
│   └── utils
├── shared
│   └── Modal.jsx
├── pages
│   ├── HomePage.js
│   └── components
│       └── Card.jsx
├── old
│   └── helpers
│       └── api.js
└── misc
    └── Toast.jsx
```

In the clear structure, everything lives in predictable places. In the confusing one, components are scattered across multiple folders (`components`, `pages/components`, `shared`, `misc`), utilities are duplicated, and old code lingers in `old/`. 

An AI, like any developer, will struggle to build a clear mental model of the project, which increases the chance of duplication and errors. 

If your codebase has a confusing structure and restructuring it is not an option, map out common patterns—even if there are multiple patterns for similar things—and add these to your steering document to reduce the amount of discovery and exploration the AI tool needs to do.
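Using the messy layout above as an example, that mapping might be a short section in your steering document, something like this (the conventions shown are illustrative):

```md
## Where things live (legacy layout)

- UI components are split across `components/`, `shared/`, `misc/`, and `pages/components/`.
  New shared components go in `shared/`; page-specific ones stay next to their page.
- Utilities live under `src/utils`.
- Do not add new code under `old/`; it is kept for reference only.
```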

Wrapping Up

Adding AI tools to your workflow won’t make you a 10x developer overnight. You might even find that they slow you down a bit initially, as all new tools do. But if you invest the time to learn them and adapt your workflow, the payoff can come surprisingly quickly.

The AI tooling space is evolving fast, and the tools you use today will likely feel primitive a year from now. However, the habits you build and the workflow you develop—the way you prompt, plan, act, and refine—will carry forward in one form or another. Get those fundamentals right, and you’ll not only keep up with the curve, you’ll stay ahead of it.

Source: https://blog.docker.com/feed/

How Docker Hardened Images Patches Vulnerabilities in 24 hours

On November 19, 2025, the Golang project published two Common Vulnerabilities and Exposures (CVEs) affecting the widely-used golang.org/x/crypto/ssh package. While neither vulnerability received a critical CVSS score, both presented real risks to applications using SSH functionality in Go-based containers.

CVE-2025-58181 affects SSH servers parsing GSSAPI authentication requests. The vulnerability allows attackers to trigger unbounded memory consumption by exploiting the server’s failure to validate the number of mechanisms specified in authentication requests. CVE-2025-47914 impacts SSH Agent servers that fail to validate message sizes when processing identity requests, potentially causing system panics when malformed messages arrive. (These two vulnerabilities came just days after CVE-2025-47913, a high-severity vulnerability affecting the same Golang component, which Docker also patched quickly.)

For teams running Go applications with SSH functionality in their containers, leaving these vulnerabilities unpatched creates exposure to denial-of-service attacks and potential system instability.

How Docker achieves lightning fast vulnerability response

When these CVEs hit the Golang project’s security feed, Docker Hardened Images customers had patched versions available in less than 24 hours. This rapid response stems from Docker Scout’s continuous monitoring architecture and DHI’s automated remediation pipeline.

Here’s how it works:

Continuous CVE ingestion: Unlike vulnerability scanning that runs on batch schedules, Docker Scout continuously ingests CVE information from upstream sources including GitHub security advisories, the National Vulnerability Database, and project-specific feeds. The moment CVE data becomes available, Scout begins analysis.

Instant impact assessment: Within seconds of CVE ingestion, Scout identifies which Docker Hardened Images are affected based on Scout’s comprehensive SBOM database. This immediate notification allows the remediation process to start without delay.

Automated patching workflow: Depending on the vulnerability and package, Docker either patches automatically or triggers a manual review process for complex changes. For these Golang SSH vulnerabilities, the team initiated builds immediately after upstream patches became available.

Cascading builds: Once the patched Golang package builds successfully, the system automatically triggers rebuilds of all dependent packages and images. Every Docker Hardened Image containing the affected golang.org/x/crypto/ssh package gets rebuilt with the security fix.

The entire process, from CVE disclosure to patched images available to customers, was completed in under 24 hours. Customers using Docker Scout received immediate notifications about the vulnerabilities and the availability of patched versions.
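From the customer side, verifying and picking up the fix is a couple of commands. The sketch below uses placeholder image names and tags, and flag availability depends on your Docker Scout CLI version:

```
# See whether a Go-based image is flagged for the new x/crypto/ssh CVEs
docker scout cves myorg/ssh-gateway:1.4.0

# Pull the rebuilt Docker Hardened Image tag once it is published
docker pull <your-dhi-namespace>/golang:<patched-tag>

# Rebuild on the patched base, then re-scan to confirm the CVEs are gone
docker scout cves myorg/ssh-gateway:1.4.1
```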

Why Docker’s Security Response Is Different

One of Docker’s key differentiators is its continuous, real-time monitoring, rather than periodic batch scanning. Traditional vulnerability management relies on daily or weekly scans, leaving containers exposed to known vulnerabilities for hours or even days.

With Docker Scout’s real-time CVE ingestion, detection starts the moment a vulnerability is published, enabling remediation within seconds and minimizing exposure.

This foundation powers Docker Hardened Images (DHI), where packages and dependencies are continuously tracked and automatically updated when issues arise. For example, when vulnerabilities were found in the golang.org/x/crypto library, all affected images were rebuilt and released within a day. Customers simply pull the latest tags to stay secure, no manual patching, emergency maintenance, or impact triage required.

But continuous monitoring is just the foundation. What truly sets Docker apart is how that real-time intelligence flows into an automated, transparent, and trusted remediation pipeline, built on over a decade of experience securing and maintaining the Docker Official Images program. These are the same images trusted and used by millions of developers and organizations worldwide, forming the foundation of countless production environments. That long-standing operational experience in continuously maintaining, rebuilding, and distributing secure images at global scale gives Docker a proven track record in delivering reliability, consistency, and trust that few others can match.

Beyond automation, Docker’s AI guardrails add yet another layer of protection. Purpose-built for the Hardened Images pipeline, these AI systems continuously analyze upstream code changes, flag risky patterns, and prevent flawed dependencies from entering the supply chain. Unlike standard coding assistants, Docker’s AI guardrails are informed by manual, project-specific reviews, blending human expertise with adaptive intelligence. When the system detects a high-confidence issue such as an inverted error check, ignored failure, or resource mismanagement, it halts the release until a Docker engineer verifies and applies the fix. This human-in-the-loop model ensures vulnerabilities are caught long before they can reach customers, turning AI into a force multiplier for safety, not a replacement for human judgment.

Another critical differentiator is complete transparency. Consider what happens when a security scanner still flags a vulnerability even after you’ve pulled a patched image. With DHI, every image includes a comprehensive and accurate Software Bill of Materials (SBOM) that provides definitive visibility into what’s actually inside your container. When a scanner reports a supposedly remediated image as vulnerable, teams can verify the exact package versions and patch status directly from the SBOM instead of relying on scanner heuristics.
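For instance, if a scanner still flags CVE-2025-58181 on an image you have already updated, you can check the SBOM itself to see which version of golang.org/x/crypto is actually present. A minimal sketch; the image name is a placeholder and the available output formats vary by Scout CLI version:

```
# Dump the SBOM and confirm which golang.org/x/crypto version the image contains
# (the "list" format is assumed here; check `docker scout sbom --help` for your version)
docker scout sbom --format list myorg/ssh-gateway:1.4.1 | grep "golang.org/x/crypto"
```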

This transparency also extends to how Docker Scout handles CVE data. Docker relies entirely on independent, third-party sources for vulnerability decisions and prioritization, including the National Vulnerability Database (NVD), GitHub Security Advisories, and upstream project maintainers. This approach is essential because traditional scanners often depend on pattern matching and heuristics that can produce false positives. They may miss vendor-specific patches, overlook backported fixes, or flag vulnerabilities that have already been remediated due to database lag. In some cases, even vendor-recommended scanners fail to detect unpatched vulnerabilities, creating a false sense of security.

Without an accurate SBOM and objective CVE data, teams waste valuable time chasing phantom vulnerabilities or debating false positives with compliance auditors. Docker’s approach eliminates that uncertainty. Because the SBOM is generated directly from the build process, not inferred after the fact, it provides definitive evidence of what’s inside each image and why certain CVEs do or don’t apply. This transforms vulnerability management from guesswork and debate into objective, verifiable security assurance, backed by transparent, third-party data.

CVEs don’t have to disrupt your week

Managing vulnerabilities consumes significant engineering time. When critical CVEs drop, teams rush to assess impact, test patches, and coordinate deployments. Docker Hardened Images eliminate this overhead by continuously updating base images, giving you complete transparency into their contents and rapid turnarounds that shrink your exposure window.

If you’re tired of vulnerability whack-a-mole disrupting your team’s roadmap, Docker Hardened Images offer a better path forward. Learn more about how Docker Scout and Hardened Images can reduce your vulnerability management burden, or contact our team to discuss your specific security requirements.

Source: https://blog.docker.com/feed/