AWS Batch now provides Job Queue and Share Utilization Visibility

AWS Batch now provides Queue and Share Utilization Visibility, giving you insight into how your workloads are distributed across compute resources. This feature introduces queue utilization data in job queue snapshots, revealing the compute capacity used by your first-in-first-out (FIFO) and fair-share job queues, along with capacity consumption by individual fair-share allocations. Additionally, the ListServiceJobs API now includes a scheduledAt timestamp for AWS Batch service jobs, allowing you to track when jobs are scheduled for execution.

Queue and Share Utilization Visibility helps you understand which fair-share allocations consume the most capacity and pinpoint the specific jobs driving resource consumption. You can monitor overall queue utilization and drill down into active shares to optimize resource distribution, or filter jobs by share identifier to analyze consumption patterns and scheduling behavior across your workloads.

You can access this feature using the GetJobQueueSnapshot, ListJobs, and ListServiceJobs APIs, or through the AWS Batch Management Console by navigating to your job queue details page and selecting the new Share Utilization tab. This feature is available today in all AWS Regions where AWS Batch is available. To learn more, visit the Job Queue Snapshot, List Jobs, and List Service Jobs pages of the AWS Batch API Reference Guide.
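As a sketch of how you might consume this data, the snippet below ranks fair-share identifiers by consumed capacity from snapshot-style data. The field names (`shareIdentifier`, `vcpus`) and the response shape are illustrative assumptions, not the documented schema; consult the GetJobQueueSnapshot entry in the AWS Batch API Reference for the actual fields.

```python
def top_shares(share_utilization, n=3):
    """Rank fair-share identifiers by consumed vCPUs, descending.

    `share_utilization` would come from a GetJobQueueSnapshot call
    (e.g. via boto3's batch client); the keys used here are
    illustrative assumptions, not the documented response schema.
    """
    ranked = sorted(share_utilization, key=lambda s: s.get("vcpus", 0), reverse=True)
    return [s["shareIdentifier"] for s in ranked[:n]]

# Hypothetical snapshot data for illustration only.
sample = [
    {"shareIdentifier": "research", "vcpus": 512},
    {"shareIdentifier": "etl", "vcpus": 128},
    {"shareIdentifier": "adhoc", "vcpus": 64},
]
print(top_shares(sample, n=2))  # → ['research', 'etl']
```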
Source: aws.amazon.com

Enhanced storage resiliency with Azure NetApp Files Elastic zone-redundant service 

In today’s globally connected environment, even minutes of downtime can disrupt supply chains, hinder customer experiences, and affect regulatory commitments, costing organizations thousands of dollars per minute and risking lasting reputational harm. Data resiliency is no longer optional—it is the foundation that keeps mission‑critical applications running, teams productive, and compliance intact. Organizations must ensure continuous data availability and zero data loss to meet stringent regulatory and audit standards.

Azure NetApp Files (ANF) is a first-party, enterprise-grade Azure file storage offering built to deliver unmatched performance and security, instant provisioning, and enterprise data management capabilities like snapshots, replication, backups, and flexible, cost-efficient service levels. It’s the trusted platform for critical workloads and the engine behind numerous migrations that drive Azure enterprise cloud journeys.

As businesses scale, the impact of service disruptions grows, and storage platforms must evolve to help maintain operational continuity. To support these needs, we are introducing Azure NetApp Files Elastic zone‑redundant storage (ANF Elastic ZRS)—a new service that provides enhanced resiliency across availability zones (AZs) and advanced data management capabilities. Built on Azure infrastructure, ANF Elastic ZRS can be deployed rapidly, accelerating availability across a broad set of regions. As a customer, you benefit from a service that can quickly adapt, whether that means expanding to a new Azure region you require, or adopting new technological advancements faster.

Synchronous replication and service managed failover

Built on Microsoft Azure’s zone-redundant storage (ZRS) architecture, ANF Elastic ZRS synchronously replicates data across multiple AZs within a region.

The graphic below shows the high-level architecture of ANF Elastic ZRS.

The following summarizes the operational methodology used: 

Synchronous replication: An ANF Elastic ZRS volume replicates data within your NetApp Elastic Account across three or more AZs in the primary region of your choice.

Service managed failover: If one AZ goes down (for example, a single-datacenter failure or power outage), ANF Elastic ZRS automatically routes traffic to the failover zone chosen during initial setup, without customer intervention. The mount target and service endpoint remain the same, so your applications barely notice.

Zero data loss and high availability for mission-critical applications: ANF Elastic ZRS ensures uninterrupted operations for mission-critical applications and regulated systems by combining synchronous replication with service-managed failover. This architecture guarantees zero data loss and seamless continuity even during zone-level failures, meeting enterprise resiliency standards.

Capabilities for enterprise mission-critical applications

NFS, SMB protocol support, multi-AZ file service: ANF Elastic ZRS supports NFS and SMB independently with the added benefit of zonal redundancy.

Rich enterprise data management capabilities: Instant writeable space-efficient snapshots, clones, tiering, and backup integration, all powered by the NetApp ONTAP® unified data management platform.

Metadata performance: Beyond consistent throughput, ANF Elastic ZRS provides efficient handling of metadata operations such as rapid file creation and fast enumeration of numerous small files, improving responsiveness of metadata-heavy workloads. Its shared QoS architecture dynamically allocates IOPS across volumes to maintain low-latency, metadata-intensive operations consistently.

Cost optimized: ANF Elastic ZRS delivers multi-AZ high availability in a single volume at a lower cost than cross-zone replication with three separate ANF volumes, making it an efficient and valuable investment. You can create volumes as small as 1 GiB, giving you the flexibility to optimize storage for workloads of any size.

Beyond the capabilities already supported above, we will be adding the following in the future:

Simultaneous multi-protocol, multi-AZ file service: The service will support simultaneous NFS, SMB, and Object REST API access to the same dataset. This flexibility is crucial for environments where Windows and Linux workloads share data, such as analytics platforms and enterprise file shares.

Custom region pair for cross-region replication: ANF Elastic ZRS will be offering the flexibility to choose custom region pairs for cross-region replication, meeting strict compliance and disaster recovery requirements for large enterprise customers. This capability ensures business continuity while aligning with unique geographic or regulatory needs.

Migration Assistant: This capability will simplify data movement from on-premises or other ONTAP® systems to the Elastic ZRS service level for seamless cloud adoption.

Targeted use cases

General file shares: Corporate user data, including home directories and departmental file shares, remains accessible even during zone outages, ensuring seamless employee productivity and business continuity.

Financial services and trading platforms: Delivers zero data loss and nonstop trading even in a zone outage, so financial apps stay compliant and customers stay confident.

Kubernetes/containerized applications: Keeps Kubernetes and container workloads running even during zone outages by instantly synchronizing data across zones and enabling rapid, automated failover. Your stateful apps stay online with zero data loss and minimal downtime.

Applications: Ensures nonstop access to in-house and line-of-business apps, even if an AZ fails, as data stays online with no outages or lost transactions.

Innovation in action

A global healthcare enterprise running a mission-critical application on-premises is looking to modernize its infrastructure by moving those workloads to the cloud. The organization’s top priorities are eliminating downtime during maintenance and ensuring seamless failover to maintain application availability.

By adopting ANF Elastic ZRS, the organization gained a fully managed, zone-resilient storage solution that synchronously replicates data across three Azure AZs. This ensures zero data loss and near-instant failover, even during zone outages or platform maintenance. The mission-critical application remains highly available without requiring operational intervention, dramatically improving uptime and simplifying infrastructure management.

ANF Elastic ZRS delivers the same enterprise-grade features as existing Azure NetApp Files service levels—such as NFS/SMB support, snapshots, encryption, and backup—all while removing the complexity of managing high availability (HA) clusters or VM-level failover. For the healthcare provider, this translates into higher SLA compliance, reduced operational overhead, and a more resilient foundation for its mission-critical application landscape.

Take the next step toward resilient storage

NetApp on-premises customers: this is the inflection point you’ve been waiting for to move to Azure. ANF Elastic ZRS brings the ONTAP®-powered data management you trust to a fully managed, multi-AZ file service, so you can modernize mission‑critical workloads with enterprise resilience, meet availability commitments, and reduce operational overhead—without re‑architecting your applications.

ANF Elastic ZRS is available now in these Azure regions and we will be rapidly expanding region availability.

Learn more:

Understand Azure NetApp Files Elastic zone-redundant storage

Create an SMB volume for Azure NetApp Files

Create an NFS volume for Azure NetApp Files

Comparison of Elastic vs other ANF service levels

Explore ANF Elastic ZRS in Azure today
Join the ranks of enterprises who demand the best in performance, availability, and simplicity.

Start experimenting

The post Enhanced storage resiliency with Azure NetApp Files Elastic zone-redundant service  appeared first on Microsoft Azure Blog.
Source: Azure

Claude Opus 4.6: Anthropic’s powerful model for coding, agents, and enterprise workflows is now available in Microsoft Foundry

At Microsoft we believe that intelligence and trust are the core requirements of agentic AI at scale. Built on Azure, Microsoft Foundry brings these capabilities together on a secure, scalable cloud foundation for enterprise AI. Today, with the launch of Claude Opus 4.6 in Microsoft Foundry, we bring even more capability to agents that increasingly learn from and act on business systems.

Explore Claude in Microsoft Foundry today

Claude Opus 4.6 brings Anthropic’s most advanced reasoning capabilities into Microsoft Foundry, our interoperable platform where intelligence and trust come together to enable autonomous work. In Foundry, Opus 4.6 can activate knowledge from everywhere: leveraging Foundry IQ to access data from M365 Work IQ, Fabric IQ, and the web. The model is best applied to complex tasks across coding, knowledge work, and agent-driven workflows, supporting deeper reasoning while offering superior instruction following for reliability.

Developers can now delegate their most complex work to AI systems for full-lifecycle development, from requirements gathering to implementation and maintenance. Business users can generate documents, perform research, and draft copy with professional polish and domain awareness.

At Adobe, we’re continuously evaluating new AI capabilities that can help us deliver more powerful, responsible, and intuitive experiences for our customers. We’ve been testing Claude models in Microsoft Foundry and are excited about the direction of Anthropic’s model roadmap. Foundry gives us a flexible, enterprise-ready environment to explore frontier models while maintaining the trust, governance, and scale that are critical for Adobe.
—Michael Marth, VP Engineering for Experience Manager and LLM Optimizer, Adobe

Introducing Opus 4.6: Frontier intelligence, built for real work

Claude Opus 4.6 is the latest version of Anthropic’s most intelligent model, and is considered the best Opus model for coding, enterprise agents, and professional work. With a 1M token context window (beta) and 128K max output, Opus 4.6 is ideal for:

Production code

Sophisticated agents

Office tasks

Financial analysis

Cybersecurity

Computer use

By combining Anthropic’s most advanced model with Foundry’s end-to-end tooling, customers can move from experimentation to production faster, without stitching together infrastructure.

1. Autonomous coding at a new level

Opus 4.6 handles large codebases well and is particularly effective at long-running tasks like refactoring, bug detection, and complex implementations.

This allows senior engineers to delegate work that previously took days, covering everything from requirements gathering to implementation and maintenance, while staying focused on reviews and decision-making.

With Foundry’s managed infrastructure and operational controls, teams can compress development timelines from days into hours, while maintaining the rigor required for real-world production systems.

Macroscope has relied on Anthropic’s frontier models to push what’s possible in AI code review, helping us catch the bugs that matter most to customers before they ever reach production. We’re incredibly excited to deepen this partnership by scaling on Azure infrastructure and delivering our product to teams around the world.
—Kayvon Beykpour, CEO & Co-Founder, Macroscope

2. Better knowledge work across the enterprise

Opus 4.6 delivers a step-change in how enterprises approach knowledge work, which Anthropic defines across three pillars: search, analyze, and create. In Microsoft Foundry, these capabilities can be applied directly to enterprise data, workflows, and agent-driven applications.

Opus 4.6 understands the conventions and norms of professional domains, producing documents, spreadsheets, and presentations that look and read like expert-created work. Combined with Foundry’s governance and access controls, this makes Opus 4.6 a great fit for finance, legal, and other precision-critical industries where quality, context, and compliance matter.

At Dentons, we are scaling generative AI across drafting, review, and research workflows across our global teams. Claude in Microsoft Foundry brings the frontier reasoning strength we need for legal work, backed by the governance and operational controls required in an enterprise environment. Better model reasoning reduces rework and improves consistency, so our lawyers can focus on higher value judgment.
—Matej Jambrich, CTO, Dentons Europe

High context financial analysis

Opus 4.6 excels at connecting insights across regulatory filings, market reports, and internal enterprise data, surfacing conclusions that would traditionally take analysts days to compile.

Its advanced reasoning capabilities allow it to:

Navigate nuanced financial and regulatory contexts

Generate compliance-sensitive outputs

Maintain consistency and traceability across complex analytical workflows

When deployed through Microsoft Foundry, these workflows benefit from Azure’s security, compliance, and auditability, helping organizations apply frontier AI to high-stakes analysis with confidence.

3. Advancing agentic and computer use capabilities

According to Anthropic, Opus 4.6 sets a new standard for computer use, delivering major gains with strong performance on industry benchmarks for visual understanding and multi-step navigation. Claude can now operate computers more accurately, handle more complex tasks, and work across multiple applications seamlessly.

It can interact with software, navigate interfaces, complete forms, and move data across applications. When deployed in Microsoft Foundry, these actions can run as secure, governed agents, enabling automation of workflows that span legacy systems, document processing, and operational tools.

Opus 4.6 can manage complex, multi-tool workflows with less oversight—an essential requirement for teams operating AI systems at scale.

Anthropic on Azure provides Momentic with the reliability guarantees needed to process millions of tokens per hour on state-of-the-art models like Opus 4.5. Azure’s platform works seamlessly with Anthropic’s SDK, even supporting beta features such as reasoning effort out of the box.
—Jeff Ann, CTO & Co-Founder, Momentic AI

4. Agents, security, and high-stakes reasoning

Opus 4.6 is also best suited for agentic workflows, reliably orchestrating complex tasks across dozens of tools. It can proactively spin up sub-agents, parallelize work, and drive tasks forward with minimal oversight.

For security workflows, Opus 4.6 delivers deep reasoning, enabling teams to identify subtle patterns and complex attack vectors with high accuracy.

Anthropic is a trusted partner for governments and companies alike. Their speed, accuracy, and toolkit are already helping Everstar make fast, safe nuclear energy deployments a reality. I’m excited to see these capabilities integrated natively on Azure for secure deployments for our government and nuclear customers.
—Kevin Kong, Founder & CEO, Everstar

New API capabilities co-launching with Opus 4.6

Alongside Opus 4.6, Anthropic is introducing new API capabilities, available through Microsoft Foundry, that give developers greater control, scalability, and efficiency:

Adaptive thinking: Allows Claude to dynamically decide when and how much reasoning is required, optimizing performance and speed on simpler tasks, while allowing Claude to think harder on complex tasks.

Context Compaction (beta): Supports long-running conversations and agentic workflows by summarizing older context as token limits are reached.

1M Context Window (beta): Support for up to 1M tokens, with premium pricing applied beyond 200K tokens.

Max effort control: A new max effort level joins high, medium, and low, offering finer control over token allocation across thinking, tools, and output.

128K Output Tokens: Enables richer, more comprehensive outputs in a single response.
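Context Compaction is handled by the API itself, but the underlying idea is easy to picture. The sketch below is a conceptual illustration only, not Anthropic's implementation: when an estimated token count exceeds a budget, the oldest turns are folded into a single summary placeholder so the newest messages stay intact.

```python
def compact_context(messages, max_tokens, estimate=lambda m: len(m["content"]) // 4):
    """Conceptual sketch of context compaction (illustrative only).

    Keeps the newest messages that fit within `max_tokens` and replaces
    everything older with one summary placeholder. The real beta feature
    summarizes server-side; the token estimator here is a crude stand-in.
    """
    if sum(estimate(m) for m in messages) <= max_tokens:
        return list(messages)
    kept, budget = [], max_tokens
    for m in reversed(messages):          # walk newest -> oldest
        cost = estimate(m)
        if cost > budget:
            break                         # everything older gets folded
        kept.append(m)
        budget -= cost
    older = messages[: len(messages) - len(kept)]
    summary = {"role": "user",
               "content": f"[summary of {len(older)} earlier messages]"}
    return [summary] + list(reversed(kept))

history = [{"role": "user", "content": "x" * 40} for _ in range(5)]
print(len(compact_context(history, max_tokens=25)))  # → 3
```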

Claude Opus 4.6 in Microsoft Foundry

As AI systems move from assistance to autonomy, success depends on more than frontier intelligence: it requires intelligence that can be trusted to operate in real-world environments. Claude Opus 4.6 brings advanced reasoning and long-horizon execution, and Microsoft Foundry provides the system context where that intelligence can be applied responsibly, at scale.

Together, Claude in Foundry enables organizations to move beyond isolated experiments and toward intelligent, agent-driven systems that deliver real business impact—grounded in trust, governance, and operational rigor.

Opus 4.6 is also available through Microsoft Copilot Studio, enabling organizations to quickly build, orchestrate, and deploy advanced AI agents without custom code.

Explore Claude Opus 4.6 in Microsoft Foundry
Explore how Claude Opus 4.6 can power trusted, autonomous work in Microsoft Foundry.

Discover Opus 4.6

The post Claude Opus 4.6: Anthropic’s powerful model for coding, agents, and enterprise workflows is now available in Microsoft Foundry appeared first on Microsoft Azure Blog.
Source: Azure

Five Reasons to attend SQLCon

The SQL community is gathering in Atlanta this March for the first‑ever SQLCon, co‑located with FabCon, the Microsoft Fabric Community Conference, March 16-20. One registration unlocks both events, giving you access to deep SQL expertise and the latest in Fabric, Power BI, data engineering, real‑time intelligence, and AI. Whether you’re a DBA, developer, data engineer, architect, or a leader building data‑driven teams, this is your chance to learn, connect, and shape what’s next.

One pass, two conferences—double the value

Register once, benefit twice. With SQLCon and FabCon under the same roof, you can mix a deep SQL session in the morning with a Fabric or AI talk in the afternoon, then drop into the shared expo and community lounge. It’s a seamless, high‑impact week that lets specialists go deep while cross‑functional teams build a common language across data, analytics, and AI.

Dive Deep with Interactive Sessions and Hands-On Workshops

There are 50 SQL sessions at SQLCon. Fifty! The program is designed for momentum. Across the week, you’ll find practical content on SQL Server, Azure SQL, SQL database in Fabric, performance tuning, security and governance, migration and modernization, and building AI‑powered experiences with SQL. Monday and Tuesday are hands‑on workshop days—bring your laptop and leave with repeatable scripts, patterns, and demos you can apply immediately. Wednesday through Friday, you’ll stack conference sessions to round out your plan for the year.

Experience Atlanta: The Perfect Setting for SQLCon

SQLCon + FabCon take place at the Georgia World Congress Center, in the heart of a walkable downtown that’s tailor‑made for a great conference week. You’ll be just a short walk from Centennial Olympic Park, near State Farm Arena—home to major keynote events—and amid lively dining and music options. The attendee party is at the Georgia Aquarium, an unforgettable after‑hours experience with spectacular exhibits and a perfect setting for relaxed conversations with peers and product teams. Want a quick vibe check on the city and the conference energy? Watch the short video of Guy in a Cube and me:

Announcements on roadmap, engineering insights, and live updates

If you want to understand where SQL Server, Azure SQL, and SQL database in Fabric are heading, this is the place. Expect direct updates from engineering (we’re sending over 30 members from the SQL product team); first‑look announcements; and live demos of upcoming capabilities across SQL tooling and drivers, SSMS/VS Code extensions, Copilot integrations, and Fabric SQL experiences. You’ll leave with clarity on what’s coming, how it impacts your environment, and where to invest next.

The SQL Community: Revitalized and Engaged

SQLCon goes beyond a conference—it’s a gathering where the lounge hosts meetups and active conversations. Ask‑the‑Experts sessions connect you with engineers, MVPs, and product teams. Shared keynotes bring everyone together, and the city makes it easy to extend conversations into the evening. Bring your toughest questions, real-world challenges, and bold goals—you’ll leave with practical solutions, valuable connections, and new inspiration.

Bonus: make the budget work

Depending on timing, look for early‑bird pricing, team discounts, or buy‑one‑get‑one offers on the registration page. These deals move fast, so check what’s live when you register. You can always use SQLCMTY200 for $200 off!

Wrap‑up: build the next chapter of your data strategy at SQLCon

SQLCon + FabCon is the highest‑leverage week of the year to sharpen your technical skills, understand SQL’s next chapter, accelerate modernization and performance, and build meaningful connections across the global community. If SQL plays any role in your data estate, this is the one event you shouldn’t miss.

See you in Atlanta!
The post Five Reasons to attend SQLCon appeared first on Microsoft Azure Blog.
Source: Azure

Can high-temperature superconductors transform the power infrastructure of datacenters?

As the demand for AI and data-intensive computing rises, the need for efficient and reliable power delivery becomes critical. Enter high-temperature superconductors (HTS), a game-changing technology that can improve energy efficiency by reducing transmission losses. Microsoft is investigating HTS technology to understand how our datacenters can meet the growing demand for power and how to improve our operational sustainability. Superconductors offer a ‘lossless’ advantage, making power transmission more efficient.

See here how Microsoft datacenters support cloud around the globe

Superconductors let electricity flow with no resistance. This means we can move power more efficiently and increase capacity more quickly. Microsoft is exploring how this technology could make electrical grids stronger and reduce the impact datacenters have on nearby communities. Because superconductors take up less space to move large amounts of power, they could help us build cleaner, more compact systems.

Using this technology could change how power moves through the cloud and support AI and other demanding workloads. To make this possible, we need to rethink traditional power designs and how datacenters move electricity today. By working with superconducting technology partners and system integrators, we aim to turn this advanced science into real solutions that help our customers and communities.

Judy Priest, corporate vice president and chief technical officer of Cloud Operations and Innovation at Microsoft, and Erhan Karaca, Chief Technology Officer at VEIR, during factory test of 3MW superconducting cable.

How superconductors boost datacenter performance and efficiency

Reduction of datacenter impact through HTS capabilities.

Copper and aluminum are good conductors and are used today in most cloud infrastructure wiring and power lines. But HTS cables can do even better because they carry electricity with zero resistance. They are also smaller and lighter, and they don’t produce heat or introduce voltage drops as electricity travels through them. At the center of the technology are scalable, high-availability cooling systems that maintain HTS cables at the cryogenic temperatures required to support the operational excellence of Microsoft’s datacenters. In copper, electrical current encounters resistance at every step, losing efficiency, generating heat, and limiting how much current we can move. Superconducting materials behave differently: once cooled, they create a pathway for current to move with zero resistance, eliminating losses and heat buildup and removing limits on how far the power can travel.
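The difference is easy to quantify with the standard resistive-loss formula P = I²R. The numbers below (3 MW delivered at 800 V over a 100 m copper run with a 1,000 mm² cross-section) are illustrative assumptions for a back-of-envelope comparison, not Microsoft figures:

```python
# Back-of-envelope I^2 * R loss for a copper feeder vs. an HTS cable.
RHO_CU = 1.68e-8  # resistivity of copper at room temperature, ohm*m

def copper_loss_w(power_w, voltage_v, length_m, area_m2):
    """Resistive loss P = I^2 * R for a copper run of given geometry."""
    current = power_w / voltage_v           # I = P / V
    resistance = RHO_CU * length_m / area_m2  # R = rho * L / A
    return current ** 2 * resistance

loss = copper_loss_w(power_w=3e6, voltage_v=800, length_m=100, area_m2=1e-3)
print(f"copper loss over 100 m: {loss / 1e3:.1f} kW")  # → 23.6 kW, ~0.8% of 3 MW
# An HTS cable has zero DC resistance, so the equivalent I^2*R term is 0 W;
# the real energy cost shifts to the cryogenic cooling overhead instead.
```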

Why does this matter for datacenters specifically?

HTS is not new; it has been researched for decades across energy, transportation, and advanced science. Only recently have the economics and manufacturing of this technology made it viable at Microsoft’s cloud scale. Datacenters can benefit from HTS because they concentrate massive electrical loads in compact footprints. Traditional conductors force operators to choose between expanding substations, adding more feeders, reducing deployment densities, or curtailing growth. Superconductors break this tradeoff: they increase electrical density without expanding the physical footprint, allowing modern facilities to support AI-era power requirements within the same or even smaller physical constraints.

Inside the datacenters, more power delivered directly to the racks supports high-density, high-performance workloads with improved efficiency. HTS cables are lighter than copper and can carry current over longer distances, enabling further optimization of power distribution across racks and pods and reducing potential bottlenecks. We shared our vision for these novel architectures at OCP 2025 Summit.

In practice, HTS has already demonstrated the potential to reduce the size of the power cables by an order of magnitude when delivering power directly to a server rack—opening new possibilities for how power is distributed within a datacenter.

Ruslan Nagimov, principal infrastructure engineer for Cloud Operations and Innovation at Microsoft, near world’s first HTS-powered rack-prototype (superconducting line seen above the rack).

Increasing capacity with next‑gen power infrastructure

HTS technology also supports Microsoft’s long-term cloud plans. As our AI systems grow, power is still the biggest limit we face. By updating our power systems with superconductors, we can build electrical infrastructure that grows more easily with the rising demand for cloud services. This could even allow us to design new kinds of datacenter facilities in the future.

We need modern power systems that allow electrical capacity to scale dynamically without requiring entirely new power infrastructure. Next-gen superconducting transmission lines deliver an order of magnitude higher capacity than conventional lines at the same voltage level. In turn, they can accelerate the expansion and interconnection of datacenter sites, speeding up compute deployment to meet the growing global demand for cloud services. Superconductors represent a foundational shift for datacenters and the electrical grid, but unlocking their full potential will require reexamining traditional power system assumptions and rethinking today’s approaches to power transmission and datacenter design.

Superconductors are a category defining technology poised to transform how power is moved across the electricity value chain, stretching from generation to datacenter chips. At VEIR, we build complete power delivery solutions that take advantage of these remarkable materials, enabling customers to overcome critical bottlenecks in energy infrastructure, unlock new datacenter capacity faster, and achieve higher power and compute density.
Tim Heidel, CEO at VEIR (a Microsoft Climate Innovation Fund portfolio company)

Reduced impact on the grid and local communities

HTS systems reduce energy loss and require significantly less physical space for power delivery. From a grid perspective, they minimize voltage drop along transmission lines and can be used to introduce fault-current limiting capabilities, with the potential to enhance overall grid stability for high-demand facilities such as datacenters, but also for nearby homes, schools, hospitals and businesses.

Superconducting cables require smaller trenches and reduce the need for intrusive overhead power lines [Source: AMSC, LIPA Superconductor Project].

More importantly, this technology reduces the physical and social footprint of the power infrastructure, reducing the impact on local communities. Furthermore, expanding the electricity supply typically requires a complex effort that includes increasing electrical generation capacity and improving transmission and substation systems. Unlike traditional power lines, which rely on wider corridors and heavier, more visible infrastructure (tall overhead lines and expansive substations), HTS supports smaller, quieter, and far less intrusive systems. HTS transmission lines can transfer the same amount of power as conventional systems at lower voltage, reducing the setbacks and required rights-of-way. This translates to a better use of space, which reduces construction impact, shortens build timelines, and lowers pressure on surrounding communities.

Superconductors enabled ComEd to interconnect electrical grid substations in Chicago without disrupting local businesses or communities. Our proprietary solution uniquely increases grid resilience.
Daniel McGahn, CEO at American Superconductor Corporation (AMSC)

We are striving to accelerate indoor and outdoor applications of advanced power technologies like superconductors for faster, more effective deployment of real-world datacenter infrastructure systems. Alongside breakthroughs in networking and cooling technologies like hollow-core fiber and microfluidics, high-temperature superconductors complete a strategic triad of power, network, and thermal innovation in our datacenters. You may never see the power lines, but HTS technology could be working behind the scenes to keep power, capacity, and AI infrastructure efficient, resilient, and future-ready, so our customers can focus on what matters most: building and running their cloud infrastructure workloads.

Explore the future of datacenters

HTS is just one of the new technologies shaping the future of datacenters. As the cloud continues to grow, many other innovations—from advanced cooling systems to cleaner power solutions—are helping us build faster, smarter, and more sustainable facilities. Learn more about some of the other projects driving the next generation of datacenter design.

Learn more about Microfluidics cooling: Cooling at the micro level for Microsoft’s datacenters.

Learn more about how Microsoft Azure scales Hollow Core Fiber (HCF) production through outsourced manufacturing.

Learn more about building community-first AI infrastructure.

Get started with Azure today with a free account.

The post Can high-temperature superconductors transform the power infrastructure of datacenters? appeared first on Microsoft Azure Blog.
Quelle: Azure

Agentic cloud operations: A new way to run the cloud

Cloud operations have reached an inflection point. For more than a decade, the industry has focused on scale—more infrastructure, more data, more services, more dashboards to build and manage both infrastructure and applications. While today’s cloud delivers extraordinary flexibility, the rapid growth of modern applications and AI workloads has introduced levels of scale and complexity that traditional operations were not designed for.

See how you can run agents with Azure Copilot

As modern applications and AI workloads expand in scale, speed, and interconnectedness, operational demands are evolving just as quickly. Organizations are now looking for an operating model that builds on their existing practices—one that brings intelligence into the flow of work and translates the constant stream of signals into coordinated action across the cloud lifecycle.

A new operating model for a dynamic cloud

Macro trends are pointing towards major shifts in operations. In the era of AI, workloads can move from experimentation to full production in weeks, making constant change the new norm. Infrastructure and applications are continuously updated, scaled, and reconfigured. Telemetry now streams from every layer—health, configuration, cost, performance, and security—while programmable infrastructure enables action at machine speed. At the same time, AI agents are emerging as practical operational partners—able to correlate signals, understand context, and take action within defined guardrails. Together, these shifts are driving the need for a new operating model—one where operations are dynamic, context-aware, and continuously optimized rather than reactive and manual.

Introducing agentic cloud operations

Agentic cloud operations brings this model to life by enabling teams to harness AI-powered agents that infuse contextual intelligence into everyday workflows. These agents help accelerate development, migration, and optimization by connecting operational signals directly to coordinated action across the lifecycle. They bring people, tools, and data together, so insights don’t stay passive—they become execution. The result is faster performance, reduced risk, and cloud operations that improve over time instead of falling behind as complexity grows.

Azure Copilot: The agentic interface

Azure Copilot brings agentic cloud operations to life as the agentic interface for Azure. Rather than adding yet another dashboard, it delivers a unified, immersive experience grounded in a customer’s real environment—subscriptions, resources, policies, and operational history. Teams can work through natural language, chat, console, or CLI, invoking agents directly within their workflows. A centralized management environment brings observability, configuration, resiliency, optimization, and security together—enabling operators to move seamlessly from insight to action in one place.

Full-lifecycle agents, working in context

At Ignite, we unveiled the agentic capabilities of Azure Copilot. These capabilities span key operational domains—migration, deployment, optimization, observability, resiliency, and troubleshooting—each designed to bring contextual intelligence into the flow of work. Azure Copilot correlates signals, understands operational context, and takes governed action where it matters. Rather than functioning as discrete bots, these agents operate as a coordinated, context-aware system that continuously strengthens cloud operations.

Plan and prepare

Azure Copilot and its agents help teams start with clarity and confidence. Copilot migration agent can assist with discovering existing environments, mapping application and infrastructure dependencies, and identifying modernization paths before workloads move. Deployment agent then guides well-architected design and generates infrastructure-as-code artifacts that set strong operational patterns from the outset. In parallel, resiliency agent identifies gaps across availability, recovery, backup, and continuity—so reliability is designed in, not patched later.

Deploy and launch

When teams are ready to go live, Copilot deployment agent supports governed, repeatable deployment workflows that validate both infrastructure and application rollout. Observability agent establishes baseline health from the moment production traffic hits, while troubleshooting agent accelerates early-life issue resolution by diagnosing root causes, recommending fixes, and initiating support actions if needed. Throughout this phase, resiliency agent verifies that recovery and failover configurations hold up under real-world conditions.

Operate, optimize, and evolve

In ongoing operations, Azure Copilot’s agentic capabilities deliver compounding value. Observability agent provides continuous, full-stack visibility and diagnosis across applications and infrastructure. Optimization agent identifies and executes improvements across cost, performance, and sustainability—often comparing financial and carbon impact in real time. Resiliency agent moves from validation to proactive posture management, continuously strengthening protection against emerging risks such as ransomware. Troubleshooting agent helps make the shift from reactive firefighting to rapid, context-aware incident resolution. Last but not least, migration agent reenters the lifecycle to identify new opportunities to refactor or evolve workloads—not as a one-time event, but as continuous modernization.


A connected system, not disparate tools

These capabilities don’t operate as isolated bots. They work within connected, context-aware workflows—correlating real time signals, understanding operational context, and taking governed action where it matters most. This allows teams to anticipate issues earlier, resolve them faster, and continuously improve their cloud posture across development, migration, and operations. The outcome isn’t fewer tools—it’s better flow, where people, data, and automation operate as a unified system.

Governance and human oversight by design

Agentic cloud operations are built for mission-critical systems, where governance and control are nonnegotiable. Azure Copilot embeds governance at every layer, allowing enterprises to define boundaries, apply policies consistently, and maintain clear oversight. Features such as Bring Your Own Storage (BYOS) for conversation history give customers even greater control—keeping operational data within their own Azure environment to ensure sovereignty, compliance, and visibility on their terms. All of this is grounded in Microsoft’s Responsible AI principles, ensuring autonomy and safety advance together. Every agent-initiated action honors existing policy, security, and RBAC controls. Actions are always reviewable, traceable, and auditable, ensuring human oversight remains central to automated workflows—not removed from them.

Operating with confidence as the cloud evolves

As cloud environments grow more dynamic and complex, operational models must evolve to match them. With Azure Copilot and agentic cloud operations, Microsoft is enabling organizations to operate mission-critical environments with greater speed, clarity, and control—providing the confidence to move forward as the cloud continues to change.

Explore more resources to deepen your understanding of agentic cloud operations

Access white paper on Intelligent Operations: How Agentic AI Is Aiming to Reshape IT.

Find resources, use cases, and get started with Azure Copilot.

From cloud to edge, see how Azure Copilot can help
Gain new insights, discover more benefits of the cloud, and orchestrate data across both the cloud and the edge.

Start here

The post Agentic cloud operations: A new way to run the cloud appeared first on Microsoft Azure Blog.
Quelle: Azure

How to solve context size issues with context packing using Docker Model Runner and Agentic Compose

If you’ve worked with local language models, you’ve probably run into the context window limit, especially when using smaller models on less powerful machines. While it’s an unavoidable constraint, techniques like context packing make it surprisingly manageable.

Hello, I’m Philippe, and I am a Principal Solutions Architect helping customers with their use of Docker. In my previous blog post, I wrote about how to make a very small model useful by using RAG. I had limited the message history to 2 messages to keep the context length short.

But in some cases, you’ll need to keep more messages in your history. For example, a long conversation to generate code:

– generate an http server in golang
– add a human structure and a list of humans
– add a handler to add a human to the list
– add a handler to list all humans
– add a handler to get a human by id
– etc…

Let’s imagine we have a conversation for which we want to keep 10 messages in the history. Moreover, we’re using a very verbose model (which produces a lot of tokens), so we’ll quickly encounter this type of error:

error: {
  code: 400,
  message: 'request (8860 tokens) exceeds the available context size (8192 tokens), try increasing it',
  type: 'exceed_context_size_error',
  n_prompt_tokens: 8860,
  n_ctx: 8192
},
code: 400,
param: undefined,
type: 'exceed_context_size_error'
}

What happened?

Understanding context windows and their limits in local LLMs

Our LLM has a context window, which has a limited size. This means that if the conversation becomes too long, the request will fail.

This window is the total number of tokens the model can process at once, like a short-term working memory. Read this IBM article for a deep dive on context windows.

In the code snippet above, this size was set to 8192 tokens, a common default for the engines that power local LLMs, such as Docker Model Runner, Ollama, or Llamacpp.

This window includes everything: system prompt, user message, history, injected documents, and the generated response. Refer to this Redis post for more info. 

Example: if the model has a 32k context, the sum (input + history + generated output) must remain ≤ 32k tokens. Learn more here.
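As a rough illustration of this budget (my own sketch, not from the article), here is how you might check whether a request fits, using the crude 4-characters-per-token estimate that the article’s code also relies on. The constants and function names are assumptions for illustration; real tokenizers vary by model:

```javascript
// Rough token budget check: input + history + reserved output space
// must fit in the context window. The 4-characters-per-token rule is
// a crude heuristic, not a real tokenizer.
const CONTEXT_SIZE = 8192;      // e.g. the default seen in the error above
const MAX_OUTPUT_TOKENS = 1024; // reserve room for the generated response

function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

function fitsInContext(systemPrompt, history, userMessage) {
  const used =
    estimateTokens(systemPrompt) +
    history.reduce((acc, [, content]) => acc + estimateTokens(content), 0) +
    estimateTokens(userMessage);
  return used + MAX_OUTPUT_TOKENS <= CONTEXT_SIZE;
}

// A short exchange easily fits in an 8192-token window
const ok = fitsInContext(
  "You are a helpful Go expert.",
  [["user", "generate an http server in golang"]],
  "add a handler to list all humans"
);
console.log(ok); // true
```

Running a check like this before each request lets you decide when to trim or compress the history instead of waiting for the 400 error.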

It’s possible to change the default context size (up or down) in the compose.yml file:

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m
    # Increased context size for better handling of larger inputs
    context_size: 16384

You can also do this with the following command: docker model configure --context-size 8192 ai/qwen2.5-coder

And so we solve the problem, but only part of it. Indeed, it’s not guaranteed that your model supports a larger context size (like 16384), and even if it does, a large context can very quickly degrade the model’s performance.

Thus, with hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m, when the number of tokens in the context approaches 16384 tokens, generation can become (much) slower (at least on my machine). Again, this will depend on the model’s capacity (read its documentation). And remember, the smaller the model, the harder it will be to handle a large context and stay focused.

Tip: always provide an option in your application (a /clear command, for example) to empty or shrink the message list, whether automatically or manually. Keep the initial system instructions, though.
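A minimal sketch of such a /clear command (the variable and function names here are hypothetical, not taken from the article’s index.js): intercept the command before calling the model, empty the history, and keep the system instructions:

```javascript
// Hypothetical /clear handling for a chat loop: wipe the message history
// but keep the initial system instructions so the model stays on task.
const systemInstructions = "You are a Golang programming expert.";
let history = []; // [role, content] pairs, excluding the system message

function handleUserInput(input) {
  if (input.trim() === "/clear") {
    history = []; // free up the context window
    return "History cleared. System instructions kept.";
  }
  history.push(["user", input]);
  // Messages actually sent to the model: system prompt first, then history
  return [["system", systemInstructions], ...history];
}
```

The same hook is a natural place to trigger automatic trimming once the estimated token count gets too high.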

So we’re at an impasse. How can we go further with our small models?

Well, there is still a solution, which is called context packing.

Using context packing to fit more information into limited context windows

We can’t indefinitely increase the context size. To still fit more information into the context, we can use a technique called “context packing”, which consists of having the model itself summarize the previous messages (or entrusting the task to another model) and replacing the history with this summary, thus freeing up space in the context.

So we decide that beyond a certain token limit, we’ll have the history of previous messages summarized and replace this history with the generated summary.

I’ve therefore modified my example to add a context packing step. For the exercise, I decided to use another model to do the summarization.

Modification of the compose.yml file

I added a new model in the compose.yml file: ai/qwen2.5:1.5B-F16

models:
  chat-model:
    model: hf.co/qwen/qwen2.5-coder-3b-instruct-gguf:q4_k_m

  embedding-model:
    model: ai/embeddinggemma:latest

  context-packing-model:
    model: ai/qwen2.5:1.5B-F16

Then:

– I added the model to the models section of the service that runs our program.
– I increased the number of messages kept in the history to 10 (instead of 2 previously).
– I set a token limit of 5120 before triggering context compression.
– Finally, I defined instructions for the “context packing” model, asking it to summarize previous messages.

excerpt from the service:

golang-expert-v3:
  build:
    context: .
    dockerfile: Dockerfile
  environment:
    HISTORY_MESSAGES: 10
    TOKEN_LIMIT: 5120
    # …
  configs:
    - source: system.instructions.md
      target: /app/system.instructions.md
    - source: context-packing.instructions.md
      target: /app/context-packing.instructions.md
  models:
    chat-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CHAT
    context-packing-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_CONTEXT_PACKING
    embedding-model:
      endpoint_var: MODEL_RUNNER_BASE_URL
      model_var: MODEL_RUNNER_LLM_EMBEDDING

You’ll find the complete version of the file here: compose.yml

System instructions for the context packing model

Still in the compose.yml file, I added a new system instruction for the “context packing” model, in a context-packing.instructions.md file:

context-packing.instructions.md:
  content: |
    You are a context packing assistant.
    Your task is to condense and summarize provided content to fit within token limits while preserving essential information.
    Always:
    - Retain key facts, figures, and concepts
    - Remove redundant or less important details
    - Ensure clarity and coherence in the condensed output
    - Aim to reduce the token count significantly without losing critical information

    The goal is to help fit more relevant information into a limited context window for downstream processing.

All that’s left is to implement the context packing logic in the assistant’s code.

Applying context packing to the assistant’s code

First, I define the connection with the context packing model in the Setup part of my assistant:

const contextPackingModel = new ChatOpenAI({
  model: process.env.MODEL_RUNNER_LLM_CONTEXT_PACKING || `ai/qwen2.5:1.5B-F16`,
  apiKey: "",
  configuration: {
    baseURL: process.env.MODEL_RUNNER_BASE_URL || "http://localhost:12434/engines/llama.cpp/v1/",
  },
  temperature: 0.0,
  top_p: 0.9,
  presencePenalty: 2.2,
});

I also retrieve the system instructions I defined for this model, as well as the token limit:

let contextPackingInstructions = fs.readFileSync('/app/context-packing.instructions.md', 'utf8');

let tokenLimit = parseInt(process.env.TOKEN_LIMIT) || 7168

Once in the conversation loop, I’ll estimate the number of tokens consumed by previous messages, and if this number exceeds the defined limit, I’ll call the context packing model to summarize the history of previous messages and replace this history with the generated summary (the assistant-type message: [“assistant”, summary]). Then I continue generating the response using the main model.

excerpt from the conversation loop:

let estimatedTokenCount = messages.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);
console.log(` Estimated token count for messages: ${estimatedTokenCount} tokens`);

if (estimatedTokenCount >= tokenLimit) {
  console.log(` Warning: Estimated token count (${estimatedTokenCount}) exceeds the model's context limit (${tokenLimit}). Compressing conversation history…`);

  // Calculate original history size
  const originalHistorySize = history.reduce((acc, [role, content]) => acc + Math.ceil(content.length / 4), 0);

  // Prepare messages for context packing
  const contextPackingMessages = [
    ["system", contextPackingInstructions],
    ...history,
    ["user", "Please summarize the above conversation history to reduce its size while retaining important information."]
  ];

  // Generate summary using context packing model
  console.log(" Generating summary with context packing model…");
  let summary = '';
  const summaryStream = await contextPackingModel.stream(contextPackingMessages);
  for await (const chunk of summaryStream) {
    summary += chunk.content;
    process.stdout.write('\x1b[32m' + chunk.content + '\x1b[0m');
  }
  console.log();

  // Calculate compressed size
  const compressedSize = Math.ceil(summary.length / 4);
  const reductionPercentage = ((originalHistorySize - compressedSize) / originalHistorySize * 100).toFixed(2);

  console.log(` History compressed: ${originalHistorySize} tokens → ${compressedSize} tokens (${reductionPercentage}% reduction)`);

  // Replace all history with the summary
  conversationMemory.set("default-session-id", [["assistant", summary]]);

  estimatedTokenCount = compressedSize;

  // Rebuild messages with compressed history
  messages = [
    ["assistant", summary],
    ["system", systemInstructions],
    ["system", knowledgeBase],
    ["user", userMessage]
  ];
}

You’ll find the complete version of the code here: index.js

All that’s left is to test our assistant and have it hold a long conversation, to see context packing in action.

docker compose up --build -d
docker compose exec golang-expert-v3 node index.js

And after a while in the conversation, you should see the warning message about the token limit, followed by the summary generated by the context packing model, and finally, the reduction in the number of tokens in the history:

Estimated token count for messages: 5984 tokens
Warning: Estimated token count (5984) exceeds the model's context limit (5120). Compressing conversation history…
Generating summary with context packing model…
Sure, here's a summary of the conversation:

1. The user asked for an example in Go of creating an HTTP server.
2. The assistant provided a simple example in Go that creates an HTTP server and handles GET requests to display "Hello, World!".
3. The user requested an equivalent example in Java.
4. The assistant presented a Java implementation that uses the `java.net.http` package to create an HTTP server and handle incoming requests.

The conversation focused on providing examples of creating HTTP servers in both Go and Java, with the goal of reducing the token count while retaining essential information.
History compressed: 4886 tokens → 153 tokens (96.87% reduction)

This way, we ensure that our assistant can handle a long conversation while maintaining good generation performance.

Summary

The context window is an unavoidable constraint when working with local language models, particularly with small models and on machines with limited resources. However, by using techniques like context packing, you can easily work around this limitation. Using Docker Model Runner and Agentic Compose, you can implement this pattern to support long, verbose conversations without overwhelming your model.

All the source code is available on Codeberg: context-packing. Give it a try! 
Quelle: https://blog.docker.com/feed/