Introducing Azure Accelerate for Databases: Modernize your data for AI with experts and investments

In this article

Database modernization: Why now?
Why Azure Accelerate for Databases?
What you can do with Azure Accelerate for Databases
Unlock savings and investments
Get started with Azure Accelerate for Databases

Database modernization: Why now?

We consistently hear common realities from leaders: data infrastructure is a critical accelerator for AI adoption, and many organizations haven’t been able to fully realize the value of their data. 60% of AI projects unsupported by AI-ready data will be abandoned.¹ Modernization is a key enabler of AI readiness, with 75% of organizations that migrated to Azure reporting significantly reduced barriers to AI and machine learning.²

This highlights a clear opportunity. Organizations that modernize with fully managed, AI-optimized databases can unlock faster performance, real-time insights, and the ability to build intelligent applications and agents at scale.

Today, I am excited to introduce Azure Accelerate for Databases—an offering designed to help organizations modernize their databases and build AI‑ready capabilities on Azure, faster and with greater confidence. Save up to 35% (vs. pay-as-you-go) with the savings plan for databases, receive delivery funding and Azure credits, and benefit from zero-cost delivery support. Azure Accelerate for Databases brings together expert guidance, investments, savings, and skilling into a single offering, helping teams move from legacy constraints to systems ready to support real-time, intelligent applications. 

Explore Azure Accelerate for Databases

Why Azure Accelerate for Databases?

Azure Accelerate for Databases is built for organizations modernizing at scale while preparing both their platforms and teams for what comes next. Modernization initiatives are often complex, requiring time, investment, and coordination across teams, while legacy environments can leave data fragmented and difficult to operationalize for AI.

Azure Accelerate for Databases is designed to simplify this journey. It brings together Microsoft Cloud Accelerate Factory delivery support, Azure specialized partner expertise, flexible savings and investments, AI-enhanced tooling and assessments, and role-based skilling into a cohesive experience.

The goal is straightforward: to help organizations move faster, reduce friction, and turn database modernization into a durable, AI-enabling strategy.

What you can do with Azure Accelerate for Databases

With Azure Accelerate for Databases, customers can:

Access trusted experts

Modernization outcomes depend on execution as much as strategy. With the right expertise in place, organizations can reduce risk and move forward with greater confidence.

Engage with Microsoft’s Cloud Accelerate Factory for zero-cost delivery support.³

Tap into Azure’s specialized partner ecosystem for deep technical and industry expertise.

Use assessments and AI-enhanced tooling to guide modernization and new development.

Unlock savings and investments

These savings and investments remove financial barriers so customers can modernize faster, with more predictable economics and more flexibility to keep momentum as needs evolve.

Access savings up to 35% (vs. pay-as-you-go) with Savings Plan for Databases.

Advance your project with delivery funding.

Lower initial costs with Azure credits.⁴

How the Savings Plan for Databases works

The savings plan for databases⁵ offers a flexible, spend-based pricing model that adapts to evolving database needs. Customers commit to a fixed hourly spend, and savings are automatically applied to the most valuable usage each hour on select services. This helps reduce the complexity of managing multiple reservations and supports scaling without managing individual SKUs, regions, or configurations. When usage exceeds the commitment, pay-as-you-go pricing applies, helping costs remain predictable as usage grows.
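The hourly mechanics described above can be illustrated with a small Python sketch. It is not the Azure billing algorithm: the `Usage` type, the `hourly_bill` function, the service names, and the per-service discount values are all hypothetical, and the sketch assumes that "most valuable usage" means the usage with the largest discount and that the hourly commitment is billed in full even when it is not fully consumed.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    name: str          # hypothetical service label
    payg_cost: float   # pay-as-you-go cost of this usage for one hour
    discount: float    # assumed savings-plan discount, 0.0 to 0.35


def hourly_bill(commitment: float, usages: list[Usage]) -> float:
    """Estimate one hour's charge under a spend-based commitment (illustrative)."""
    remaining = commitment
    overage = 0.0
    # Assumption: the commitment is applied to the highest-discount usage first.
    for u in sorted(usages, key=lambda u: u.discount, reverse=True):
        discounted = u.payg_cost * (1.0 - u.discount)
        if discounted <= remaining:
            # Fully covered by the commitment at the discounted rate.
            remaining -= discounted
        else:
            # Partly covered; the uncovered share is billed at pay-as-you-go.
            covered_fraction = remaining / discounted if discounted > 0 else 0.0
            overage += u.payg_cost * (1.0 - covered_fraction)
            remaining = 0.0
    # Assumption: the commitment itself is always billed, used or not.
    return commitment + overage
```

For example, with a commitment of 10.00 per hour and two hypothetical usages, `Usage("sql-a", 10.0, 0.35)` and `Usage("sql-b", 5.0, 0.20)`, the sketch returns 10.625, compared with 15.00 at pure pay-as-you-go rates.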

Empower skilled teams

Modernization succeeds when teams can operate and innovate confidently. Skilling helps organizations build durable capability, not just complete a project.

Build capable and confident teams with free, on-demand, self-paced skilling content.

Grow skills with on‑demand, expert‑led training.

Cultivate technical specialization with 50% discounts on certification exams.⁶

One example is Thomson Reuters, which modernized its tax preparation platform by migrating more than 18,000 databases, totaling over 500 terabytes of data, to Azure SQL Managed Instance. The goal was not only to address performance and scalability challenges during peak tax season, but to establish a more resilient and reliable data foundation for the future.

Running on Azure has helped improve application performance and scalability for 7,000 tax firms and 70,000 users. With a modern, fully-managed platform in place, Thomson Reuters is now better positioned to scale services and support continued innovation. The migration was accelerated through Microsoft’s Cloud Accelerate Factory, the zero-cost delivery benefit of Azure Accelerate, which provided hands-on engineering support, automation, and structured execution to help reduce risk and streamline the transition at scale.

Azure Accelerate for Databases is designed to support this kind of modernization progress, so organizations can build a stronger data foundation for AI.

Get started with Azure Accelerate for Databases

Modernizing your database estate is a critical step in preparing for AI. Azure Accelerate for Databases is designed to make that step more achievable by bringing together the resources, expertise, and investments needed to move forward with confidence.

To learn more, visit the Azure Accelerate for Databases page and explore savings, as well as access expert-led resources.

Join us at the Migrate & Modernize Summit (April 23 and on demand) to learn more about modernizing your database estate.

For more details, connect with your Microsoft account team.

Get expert-led resources for modernizing your database estate

¹ Lack of AI-Ready Data Puts AI Projects at Risk

² The Total Economic Impact™ Of Migrating To Microsoft Azure For AI-Readiness. Commissioned study.

³ Zero‑cost delivery support for eligible customers through Microsoft‑funded programs. Availability and eligibility criteria apply.

⁴ Eligible customers may receive delivery funding (for partner-led services) and Azure credits through approved Azure Accelerate programs. Funding is subject to application, project scope, and regional availability.

⁵ Customers may see savings estimated to be between 0% and 35%. The 35% savings estimate is based on one Azure SQL Database serverless running for 12 months at a pay-as-you-go rate versus a reduced rate for a 1-year savings plan. Based on Azure pricing as of March 2026. Prices are subject to change. Actual savings may vary based on location, database service, and/or usage.

⁶ Skilling benefits are subject to eligibility, approval, and availability.
The post Introducing Azure Accelerate for Databases: Modernize your data for AI with experts and investments appeared first on Microsoft Azure Blog.
Source: Azure

Microsoft Discovery: Advancing agentic R&D at scale

Transforming R&D with agentic AI: Introducing Microsoft Discovery

Read the blog

Over the past year, we’ve made significant progress with Microsoft Discovery by working closely with research and development (R&D) organizations. Today, we’re sharing how those efforts are translating into real momentum for customers and partners, while also expanding preview access to Microsoft Discovery. This next phase reflects what we’ve learned as we continue to broaden access to enterprise-grade, agentic AI capabilities for R&D. The Microsoft Discovery platform continues to evolve with new capabilities, expanded partner interoperability, and a growing set of results with real-world scientific outcomes and engineering transformation. We believe what comes next can meaningfully change how R&D teams operate and empower them to achieve more.

Learn how to get started with Microsoft Discovery

The era of agentic AI for research and development 

Agentic AI opens a new chapter for R&D where autonomous agent teams, guided by human expertise, perform the core research and engineering tasks in a redefined agentic loop. Specialized agents can reason on top of vast amounts of organizational and public-domain knowledge, create hypotheses on an expanded search space, test and validate those hypotheses at scale, analyze the results, and feed conclusions into iterative loops. Empowering science and engineering experts with agentic AI has the potential to reshape the future of science and engineering, enabling organizations to lead boldly in the new Frontier R&D era.

This fundamental shift requires a deep transformation that encompasses both technological and organizational challenges. Scientific discovery has always been defined by ambition and the relentless pursuit of what comes next—a more sustainable material, a cleaner source of energy, a more effective treatment. But for many R&D teams the hardest work can begin after an idea shows promise. Turning concepts into outcomes requires repeated development cycles that involve reformulating candidates as new datasets emerge, re-engineering existing materials to meet evolving regulatory and performance requirements, or adjusting designs when performance, yield, or manufacturability fall short. As R&D grows more complex, tooling must evolve to help close the distance between what researchers and engineers want to pursue and what they can practically deliver.

Earlier generations of AI offered incremental relief through faster search and better retrieval, but lacked the deeper reasoning that genuinely complex, multi-disciplinary science demands. Tradeoffs across cost, performance, yield, compliance, and timelines must be revisited repeatedly as development progresses. But the convergence of large-scale reasoning models, agentic AI architectures, and high-performance cloud infrastructure has created a genuine opportunity to rethink how R&D work gets done—not only to improve existing processes at the margins, but to help teams iterate faster and move from hypothesis to candidate development to outcome with greater confidence.

Figure 1

When Microsoft Discovery was introduced in private preview last year, it was an early expression of that possibility: an agentic AI platform purpose-built for R&D, bringing together the reasoning depth and collaborative intelligence that complex, real-world R&D requires. The response from engineers and researchers across life sciences, chemistry and materials science, physics, semiconductors, and other fields made clear that the need was real and the approach was right.

The Microsoft Discovery platform 

Microsoft Discovery is an extensible platform that brings together agentic orchestration, advanced reasoning, a graph-based knowledge foundation, and high-performance computing. It helps drive the three principles outlined in Figure 1 for effective agentic discovery—enabling agent empowerment, discovery loop automation, and quality at scale. Because it is built on Microsoft Azure’s enterprise cloud infrastructure, Microsoft Discovery is designed to operate within the security, compliance, transparency, and governance frameworks used to manage sensitive real-world R&D environments.

Figure 2

Agents are equipped with a broad range of digital, physical, and analytical tools used across R&D. This includes in silico experimentation environments such as high-performance compute (HPC) clusters, specialized large quantitative models (LQMs) and agents, and potential future integration with quantum capabilities as they become applicable to commercial R&D. The platform also interoperates with physical labs, facilitating lab procedure generation and even direct operation of robotics, lab instrumentation, and Internet of Things (IoT)-enabled devices under human oversight.

At the heart of Microsoft Discovery is the Discovery Engine, which mimics the scientific method: specialized agents reason over large amounts of knowledge, generate hypotheses, and validate them in a complex tree across a vast search space. The Discovery Engine connects proprietary research data with external scientific literature, not solely to retrieve isolated facts but to reason across conflicting theories, experimental results, and domain-specific assumptions in a way that reflects how science actually works. This contextual depth is what separates Microsoft Discovery from general-purpose AI tools and enables the platform to function as a genuine thinking partner across the full arc of a research program.

Built-in governance controls help ensure that agent-driven research remains aligned with strategic priorities, security and compliance standards, and safety requirements. These systems provide centralized management, audit trails, and checkpoints that help maintain reliability as agentic throughput grows. The platform is extensible by design, which enables integration with existing business tools and assets, partner solutions, and open-source models. Integration with Microsoft 365, Microsoft Foundry, and Microsoft Fabric enables organizations to interoperate across business agents, enterprise data, and institutional knowledge.

Real-world impact of Microsoft Discovery 

Previously we shared how a team of Microsoft researchers leveraged advanced AI models and HPC tools from Microsoft Discovery to identify a novel, non-PFAS, immersion datacenter coolant prototype in about 200 hours. We’re excited to share a few examples of how customers have been using the platform during preview.

Syensqo

A global leader in advanced materials and specialty chemicals, Syensqo is advancing a bold, multi-year transformation of its technology landscape to accelerate data-driven science, advanced simulation, and AI-enabled discovery. Building on early success with Microsoft Discovery, Syensqo is now scaling these capabilities enterprise-wide to unlock greater scientific and business impact. This next phase focuses on modernizing R&D knowledge foundations, expanding access to scalable, cost-efficient, cloud-based compute, and establishing a unified operating model that brings together data, high-performance computing, and emerging agentic AI to power the future of innovation.

As Microsoft Discovery workflows gained momentum, Syensqo expanded its ambition to scale these capabilities across both R&D and commercial organizations, unlocking new opportunities for end-to-end innovation. This evolution is enabling teams to unify scientific and business datasets, scale simulation environments in line with increasingly complex development needs, and integrate engineering workflows within a connected digital ecosystem. Together, these advancements are establishing a strong, future-ready foundation to accelerate innovation-led growth—from early-stage discovery through engineering and large-scale formulation. 

To realize this vision, Syensqo is advancing its science and commercial data and simulation platforms on Azure. By centralizing critical datasets within a governed, enterprise-grade data backbone and extending Microsoft Discovery workflows onto highly scalable cloud compute, the company is establishing a modern, standardized operating model for innovation. This shift enables more seamless collaboration, supports advanced analytics and simulation at scale, and lays the groundwork for next-generation, AI-powered workflows across priority research and innovation (R&I) domains.

“We are entering a new phase of our partnership with Microsoft, focused on scaling AI agents across research, sales and marketing to drive near-term growth. By connecting customer demand to scientific development and back to market execution, agentic AI is enabling faster cycles, sharper prioritization, and tangible impact on revenue growth and business performance.”

—Mike Radossich, Chief Executive Officer (CEO), Syensqo

GigaTIME  

Modern oncology increasingly depends on understanding tumors not only by appearance, but by the biological signals that shape cell behavior, immune response, and treatment outcomes. GigaTIME addresses this need by using AI to infer spatially resolved tumor microenvironment signals from routine hematoxylin and eosin (H&E) pathology slides. This approach makes insights such as immune infiltration, checkpoint context, and tumor proliferation more accessible at scale without the cost and throughput constraints of experimental assays. GigaTIME and its outputs within Microsoft Discovery are intended for research use only. They are not a medical device and are not intended for clinical diagnosis, treatment, prevention, or patient-management decisions. 

The impact of GigaTIME increases when its outputs are embedded into real research workflows. Within Microsoft Discovery, virtual multiplex immunofluorescence (mIF) predictions move beyond standalone visualizations and become inputs to ongoing scientific reasoning. Spatial phenotypes can be generated consistently across cohorts, localized to single cell context, and connected to supporting evidence such as literature, biomarkers, and downstream endpoints. This allows researchers to interpret results systematically, question assumptions, and refine biological hypotheses over time.

Microsoft Discovery supports this work in a way that is reproducible, scalable, and governed end to end. GigaTIME can be used alongside additional models, data sources, and tools within a shared environment that supports iteration, comparison, and validation. Rather than accelerating a single analytical step, Discovery supports a full discovery loop—where spatial biology informs hypotheses, hypotheses guide validation, and results feed the next cycle of learning with clarity and confidence.

Learn more about the GigaTIME and Microsoft Discovery integration to see how virtual mIF outputs are applied within Microsoft Discovery for oncology R&D.

PhysicsX

PhysicsX, a leader in physics AI for industrial engineering and manufacturing, is partnering with Microsoft to bring agentic engineering into production through Microsoft Discovery. At the core of this collaboration is the PhysicsX platform—combining Large Physics Models and AI-native workflows to deliver near-real-time simulation by inference across the full engineering lifecycle.

Integrated into Discovery’s agentic environment, the PhysicsX platform enables engineers to move beyond sequential, solver-driven workflows and explore significantly larger design spaces, evaluating thousands of manufacturable candidates in days, without compromising physical fidelity.

The collaboration is already delivering impact at Microsoft Surface. Faced with tightly coupled constraints across thermal performance, acoustics, and form factor, the Surface engineering team used the PhysicsX platform through Discovery to reimagine their cooling fan design process. What previously required weeks of simulation and manual setup is now compressed into days. Discovery agents orchestrate the generation, evaluation, and optimization of thousands of geometries, surfacing high-performing, production-ready designs for validation.

The result is a step change in engineering productivity: faster iteration, broader design-space coverage, and more confident decision-making. The approach is now being extended across additional components in the Surface portfolio.

Engineering is still constrained by workflows built for the pre-AI era. This partnership changes that. PhysicsX’s frontier physics AI models, combined with Microsoft Discovery’s agentic orchestration and Azure infrastructure, give engineers the ability to explore design spaces that were previously out of reach—at the speed and scale that modern industrial development demands.

—Jacomo Corbo, CEO, PhysicsX

Synopsys

Synopsys is a leader in electronic design automation (EDA), computer aided engineering (CAE) tools, and intellectual property (IP), and plays a central role in the design and development of the most complex chips and systems for the leading semiconductor and systems companies of the world.  

Synopsys and Microsoft have been partnering since 2019, helping pioneer software-as-a-service (SaaS) models on Microsoft Azure. Synopsys also launched the first Silicon Copilot in collaboration with Microsoft and is continuing that journey by leveraging Microsoft Discovery to roll out solutions for chip design.

The semiconductor industry is facing an unprecedented set of challenges: exponentially growing demand for high-performance chips, the complexity of sustainable, power-efficient chip design, and a critical shortage of skilled engineers. Agentic systems can help mitigate these challenges while accelerating design cycles.

The Synopsys agentic AI stack, with multi-agent workflows built on AgentEngineer™ technology and supported by Microsoft Discovery, has defined a new paradigm for the industry.

Chip design sits at the intersection of extreme complexity and outsized impact—exactly where AI can make the biggest difference. By bringing together Synopsys’ AI‑driven design leadership with Microsoft Discovery, we are enabling agentic AI to redefine semiconductor engineering workflows, unlock step‑function productivity gains, and accelerate the next era of technology innovation.

—Ravi Subramanian, Chief Product Management Officer, Product Management & Markets Group, Synopsys

A growing ecosystem

Microsoft Discovery works with an expanding ecosystem of partners offering integrated tools and specialized expertise.

Expanding what is possible for R&D 

Expanding the preview marks an important step in making agentic AI available to a broader set of R&D organizations. Microsoft Discovery reflects our belief that the next generation of scientific progress can come from systems that combine human expertise with AI that can reason, plan, and act at scale. 

We look forward to partnering with organizations that want to rethink how discovery happens and to help shape the future of enterprise R&D. 

Organizations looking to get started with Microsoft Discovery should review the technical documentation to understand requirements, onboarding prerequisites, and infrastructure considerations.

Get started with Microsoft Discovery
Learn how Microsoft Discovery enables agent‑driven discovery across complex, governed R&D environments.

Explore more

Microsoft Discovery is offered in preview. Features, availability, integrations, and performance characteristics described in this post may change prior to, or without, general availability and are not commitments. Statements about future capabilities (including any potential quantum integration) are forward-looking and subject to change. Customer and internal outcomes described reflect specific workflows and data; individual results will vary. 
The post Microsoft Discovery: Advancing agentic R&D at scale appeared first on Microsoft Azure Blog.

Optimize object storage costs automatically with smart tier—now generally available

We are excited to announce the general availability (GA) of smart tier for Azure Blob and Data Lake Storage. Smart tier is a fully managed, automated tiering capability for Azure Blob Storage and Data Lake Storage that helps optimize storage costs without ongoing operational effort. By continuously optimizing data placement, smart tier ensures your storage costs are aligned with actual usage.

Get in-depth details about smart tier

As data estates expand and access patterns evolve, managing lifecycle rules at scale becomes complex. Customers need automated, continuous tiering to keep costs aligned with usage.

Smart tier continuously evaluates your data access patterns and automatically moves objects across the hot, cool, and cold tiers to keep your costs aligned with usage without manual configuration.

Since the public preview of smart tier launched at Ignite in November 2025, customers and partners have adopted it across a range of data estates, and over 50% of smart-tier–managed capacity has automatically shifted to cooler tiers based on actual access patterns:

We see a significant and measurable benefit from adopting smart tier in Azure Storage for our Azure Data Explorer (ADX) clusters. By intelligently placing data in the most cost‑effective tier based on actual usage patterns, smart tier allows us to optimize storage spend without sacrificing performance. Hot data remains instantly accessible for query workloads, while cooler, less frequently accessed data is automatically shifted to lower‑cost tiers. Smart tier effectively removed the guesswork from storage optimization, enabling us to focus on delivering insights rather than managing data placement.
Brad Watts, Principal PM for Azure Data Explorer

The Azure Blob and Data Lake Storage partner ecosystem is also integrating smart tier into their solutions:

Smart Tier represents a major step forward in simplifying how enterprises optimize storage in the cloud. The ability to automate tiering while maintaining resilience and predictable economics is highly complementary to Qumulo’s data services on Azure. Together with Microsoft, we’re enabling customers to modernize file workloads on Azure while reducing operational complexity and improving long‑term cost efficiency.
Brandon Whitelaw, SVP and Head of Product at Qumulo

Smart tier is generally available today in nearly all zonal public cloud regions, supporting both Azure Blob and Data Lake Storage.

How smart tier makes tiering decisions

Smart tier continuously evaluates the last access time of each individual object on the storage account where smart tier is enabled.

Frequently accessed data stays in the hot tier to support performance and transaction efficiency; inactive data transitions to the cool tier after 30 days and to the cold tier after an additional 60 days. When data is accessed again, it is immediately promoted back to hot and the tiering cycle restarts. This means your datasets remain in the most cost-effective tier automatically, removing the need to predict access patterns.

Read and write operations against an object, such as Get Blob or Put Blob, restart the tiering cycle. Metadata operations, such as Get Blob Properties, do not affect transitions. These static tiering rules are part of the underlying service and ensure automatic optimization without the need for manual maintenance.
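The tiering rules above can be expressed as a short Python sketch. This is an illustration of the stated rules, not the service's implementation: the function and constant names are hypothetical, the operation names are written as plain strings for the example, and treating the 30-day and 90-day marks as inclusive boundaries is an assumption.

```python
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    COOL = "cool"
    COLD = "cold"


COOL_AFTER = timedelta(days=30)        # inactive for 30 days: cool
COLD_AFTER = timedelta(days=30 + 60)   # an additional 60 days: cold

# Read and write operations restart the tiering cycle; metadata operations do not.
ACCESS_OPERATIONS = {"GetBlob", "PutBlob"}


def record_operation(last_access: datetime, operation: str, when: datetime) -> datetime:
    """Return the last-access time after an operation (illustrative names)."""
    return when if operation in ACCESS_OPERATIONS else last_access


def expected_tier(last_access: datetime, now: datetime) -> Tier:
    """Return the tier an object would be expected to sit in at `now`."""
    idle = now - last_access
    if idle >= COLD_AFTER:   # boundary handling is an assumption
        return Tier.COLD
    if idle >= COOL_AFTER:
        return Tier.COOL
    return Tier.HOT
```

In this sketch, an object last read on day 0 is expected in hot through day 29, in cool from day 30, and in cold from day 90. A Get Blob on day 40 resets the clock, while a Get Blob Properties call on the same day leaves it unchanged.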

Setting up smart tier

Enabling smart tier is straightforward and designed to minimize change management while delivering immediate cost-optimization benefits:

During storage account creation, select smart tier as the default access tier in the storage account configuration. This works for any storage account with zonal redundancy and is supported through both the API and the Azure portal.

Enable smart tier on existing accounts with zonal redundancy by switching the default blob access tier to smart through the same tooling.

Let Azure optimize automatically: Objects inheriting the default tier are continuously managed without manual intervention.

Please note: Smart tier doesn’t support legacy account types such as Standard general-purpose v1 (GPv1) and does not apply to page or append blobs.

For objects managed by smart tier, you pay standard hot, cool, and cold capacity rates, without additional charges for tier transitions, early deletion, or data retrieval. Moving existing objects into smart tier does not incur tier-change fees; a monitoring fee covers the orchestration.

Over time, automated down-tiering of inactive data combined with smart tier’s simplified billing can translate into meaningful savings at scale.

Best practices for maximizing smart tier value

After enabling smart tier at the account level, you can explicitly pin objects that you don’t want to be managed by smart tier to other tiers. No monitoring fee will apply to those objects.

Don’t exclude small objects. Objects smaller than 128 KiB stay in hot, don’t tier down, and don’t incur the monitoring fee. If an object later grows to 128 KiB or larger, smart tier policies apply automatically.

Common pitfall: Avoid trying to influence tiering behavior using lifecycle rules or other tier optimization mechanisms for smart tier–managed objects.
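To make the billing rules concrete, here is a rough Python sketch of a monthly estimate for an account with smart tier enabled. All prices are made-up placeholders rather than Azure list prices, and the `Blob` type and function names are hypothetical. The sketch encodes only what is stated above: capacity is charged at the rate of the tier each object currently sits in, a monthly monitoring fee applies per managed object, and pinned objects and objects smaller than 128 KiB carry no monitoring fee. Any charges that pinned objects incur under standard tier billing are ignored.

```python
from dataclasses import dataclass

KIB = 1024
GIB = 1024 ** 3
MIN_MANAGED_SIZE = 128 * KIB  # objects below this stay in hot and are not monitored

# Placeholder prices for illustration only; not Azure list prices.
RATE_PER_GIB_MONTH = {"hot": 0.020, "cool": 0.010, "cold": 0.004}
MONITORING_FEE_PER_OBJECT = 0.000004


@dataclass
class Blob:
    size_bytes: int
    tier: str              # "hot", "cool", or "cold"
    pinned: bool = False   # explicitly set to a tier, outside smart tier


def is_monitored(blob: Blob) -> bool:
    """True when the monitoring fee would apply to this object."""
    return not blob.pinned and blob.size_bytes >= MIN_MANAGED_SIZE


def monthly_estimate(blobs: list[Blob]) -> float:
    """Capacity charge plus monitoring fee; no transition or retrieval charges."""
    capacity = sum(b.size_bytes / GIB * RATE_PER_GIB_MONTH[b.tier] for b in blobs)
    monitoring = sum(MONITORING_FEE_PER_OBJECT for b in blobs if is_monitored(b))
    return capacity + monitoring
```

Substituting real regional prices and an inventory of object sizes and tiers would turn this into a first-order estimate, though actual bills depend on details the sketch does not model.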

Based on patterns observed across multiple large smart tier preview deployments, customers commonly see the following outcomes after enabling smart tier:

Smart tier adoption for a large analytics workload

During public preview, a large data analytics customer enabled smart tier across hundreds of tebibytes of telemetry and log data with mixed and evolving access patterns.

Before enabling smart tier, the team relied on custom lifecycle rules that required frequent retuning as access patterns evolved and often led to unexpected cost spikes after re-access.

After enabling smart tier:

More than half of this customer’s managed data footprint automatically transitioned to cooler tiers based on actual usage patterns.

The team eliminated lifecycle policy management entirely, freeing engineering time.

Storage costs became more predictable and resilient to re-access spikes, since rehydration occurred automatically without retrieval or early deletion charges.

While savings vary by workload, this pattern reflects how smart tier helps align object storage costs with real usage.

Who should use smart tier?

Smart tier is well suited for organizations that:

Manage large or fast-growing object data estates.

Have mixed, evolving, or unpredictable access patterns.

Want to optimize costs without maintaining lifecycle rules.

Need data to remain online and immediately accessible, even when infrequently used.

Want safeguards against billing spikes caused by unplanned rehydration of cooler-tier datasets.

This includes analytics pipelines, data lakes, logs, telemetry, and application data where usage naturally changes over time.

Why enable smart tier now?

Reduce operational overhead: No lifecycle rules to design, test, or maintain.

Align costs with real usage: Data continuously moves to the most appropriate tier based on access patterns.

Preserve performance: Frequently accessed data remains hot; re‑access is automatic.

Simplify billing: No tier transition, early deletion, or retrieval charges within smart tier; a monthly monitoring fee applies to each object in scope.

Scale with confidence: Built for large, evolving data estates.

What’s next for smart tier?

Smart tier is designed as a foundational capability that will continue to evolve. Upcoming improvements focus on:

Broader regional availability, including additional public cloud regions as GA rollout progresses.

Client tooling support: Watch for upcoming releases of our Storage SDKs and tooling that support this new capability.

Get started with smart tier

Enable smart tier during storage account creation or update an existing zonal storage account by setting smart tier as the default access tier. Once enabled, Azure continuously optimizes data placement—no ongoing configuration required.

Optimize data placement with smart tier

The post Optimize object storage costs automatically with smart tier—now generally available appeared first on Microsoft Azure Blog.

Cloud Cost Optimization: Principles that still matter

In this article

What is cloud cost optimization and why does it still matter?
How AI workloads change traditional cost optimization
Cloud cost optimization best practices for AI and modern workloads
Cloud cost management versus cost optimization
Measuring value alongside cloud cost optimization
Next steps for cloud cost optimization on Azure

This blog post is the second in a multi-part series called Cloud Cost Optimization. Throughout this series, we’ll share practical strategies, best practices, and actionable guidance to help you plan, design, and manage AI investments for sustainable value and efficiency.

Cloud cost optimization continues to be a top priority for organizations of every size. As cloud environments grow and workloads scale, leaders are under constant pressure to control spend, reduce waste, and ensure that resources are being used efficiently. What was once a secondary operational concern has become a strategic capability tied directly to business performance, resilience, and long‑term growth.

At the same time, the rapid growth of AI workloads is adding a new layer of complexity to managing cloud costs. AI‑powered workloads and evolving usage patterns are transforming how organizations approach cloud optimization and investment planning. However, these changes do not replace the need for strong cost optimization practices. Instead, they make cloud cost optimization and AI cost management more critical than ever.

Maximize the return on your AI investment with Azure

This article provides a practical, evergreen overview of cloud cost optimization, how AI changes the cost landscape, and the principles organizations can apply to optimize cloud and AI workloads over time.

What is cloud cost optimization and why does it still matter?

Cloud cost optimization refers to the ongoing practice of analyzing cloud usage and making informed decisions to reduce unnecessary spend while maintaining performance, reliability, and scalability. It is not about cutting costs indiscriminately, but about ensuring that cloud resources are aligned to real workload demand and business value.

Unlike traditional IT environments, cloud platforms operate on consumption‑based pricing models. This means costs are directly tied to how resources are used, not just what is deployed. As a result, cost optimization is not a one‑time exercise. It requires continuous attention as environments evolve, workloads change, and new services are introduced.

Organizations that invest in cloud cost optimization benefit from:

Improved visibility into where cloud spend is going.

Reduced waste from underutilized or idle resources.

Better alignment between cloud usage and business needs.

Greater confidence when scaling workloads.

As cloud environments grow more complex (spanning multiple services, regions, and architectures), the importance of structured cloud cost management and optimization only increases. For organizations operating in the cloud, this makes cost optimization a foundational capability rather than an operational afterthought.

How AI workloads change traditional cost optimization

AI workloads introduce new cost dynamics that can challenge traditional cloud cost optimization approaches. While many principles still apply, the pace and variability of AI usage amplify the need for strong cost governance.

AI consumption patterns are often less predictable. Training models, running inference, and experimenting with different architectures can cause rapid fluctuations in compute and storage usage. Costs may spike during experimentation phases and stabilize later in production or shift again as models evolve.

AI development typically involves a higher degree of iteration. Teams may test multiple models, datasets, or configurations before settling on a production approach. Without strong visibility and controls, these experiments can quietly drive significant cloud costs and complicate efforts to optimize cloud costs effectively.

AI workloads often rely on specialized infrastructure and services that increase cost sensitivity. As a result, maintaining visibility and control requires intentional AI cost optimization and disciplined cloud cost management practices.

This makes cloud cost optimization critical, not optional, in AI‑powered environments.

Cloud cost optimization best practices for AI and modern workloads

While technologies change, many cloud cost optimization best practices remain consistent across traditional and AI workloads. The key is applying them continuously and adapting them to modern usage patterns.

Visibility and usage awareness

Effective cost optimization starts with understanding how resources are being consumed. Organizations need clear insight into usage patterns across environments, workloads, and services to identify inefficiencies and optimization opportunities. Visibility is the foundation of both cloud cost management and AI cost management.

Governance guardrails

Guardrails help prevent unnecessary spend before it occurs. These can include usage boundaries, policy‑driven controls, and standardized approaches that encourage efficient resource consumption without slowing innovation. Strong governance supports sustainable cost optimization as environments scale.

Rightsizing and lifecycle thinking

Workloads change over time. Resources that were appropriate during development may be inefficient in production, or vice versa. Rightsizing and lifecycle awareness help ensure resources match actual needs at every stage, which is essential to optimizing cloud costs over the long term.
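
To make the rightsizing idea concrete, here is a minimal shell sketch that flags resources whose average CPU utilization falls below a threshold. The input format (one resource_name,average_cpu_percent pair per line), the function name flag_underutilized, and the threshold value are all assumptions made for illustration; real data would come from whatever usage export your environment provides.

```shell
# Hypothetical helper: list resources whose average CPU use is below a threshold.
# Assumed input on stdin: one "resource_name,average_cpu_percent" pair per line.
# $1: utilization threshold in percent.
flag_underutilized() {
  # "+ 0" forces a numeric comparison rather than a string comparison.
  awk -F',' -v limit="$1" '$2 + 0 < limit + 0 { print $1 }'
}
```

For example, `printf 'vm-a,3.5\nvm-b,62\n' | flag_underutilized 10` lists vm-a as a rightsizing candidate. A single CPU average is only a starting signal; memory, I/O, and the workload's lifecycle stage should inform the actual decision.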

Continuous review and iteration

Cloud cost optimization is not static. Regular review cycles allow teams to adapt to changing usage patterns, new workloads, and evolving priorities, especially as AI solutions move from experimentation to scale.

These cloud cost optimization best practices apply whether organizations are optimizing traditional applications, data platforms, or AI workloads running at scale.

Cloud cost management versus cost optimization

Cloud cost management and cost optimization are closely related, but not the same.

Cloud cost management focuses on tracking, reporting, and understanding cloud spend. It answers questions like:

Where is money being spent?

How is usage trending over time?

Which workloads or services are driving costs?

Cloud cost optimization, on the other hand, is about action and decision‑making. It builds on cost management insights to determine:

Where inefficiencies exist.

What changes can reduce waste.

How to improve efficiency without compromising outcomes.

Organizations need both. Cloud cost management provides visibility, while cost optimization turns that visibility into informed decisions that improve efficiency, scalability, and resiliency (especially in AI‑heavy environments).

Measuring value alongside cloud cost optimization

Reducing cloud costs alone is rarely the goal. The real objective is ensuring that cloud and AI investments deliver sustainable value over time.

Effective cost optimization balances efficiency with outcomes. This means considering how resources contribute to workload performance, reliability, and long‑term viability (not just minimizing spend). For AI workloads, this balance is particularly important, as experimentation and innovation are essential but must still be managed responsibly.

By measuring efficiency and aligning cloud cost optimization and AI cost optimization efforts with workload value, organizations can avoid short‑term savings that undermine long‑term success. This value‑driven approach to managing cloud costs ensures optimization supports growth rather than constraining it.

Explore how Azure can help maximize your AI return on investment

Next steps for cloud cost optimization on Azure

Azure provides a broad set of resources designed to help organizations manage and optimize cloud and AI costs over time.

To explore guidance, best practices, and curated resources that support cost optimization across cloud and AI workloads, visit the solutions pages:

Maximize ROI from AI.

FinOps on Azure.

For deeper perspectives on related topics, you may also find these resources helpful:

Defining roles and responsibilities for cloud cost optimization.

Optimize your Azure costs to help meet your financial objectives.

Cost optimization is a continuous journey, one that becomes even more important as AI adoption accelerates. By applying durable principles and maintaining ongoing visibility and control, organizations can scale cloud and AI investments responsibly while maximizing long‑term value.

To go deeper, explore the Cloud Cost Optimization series for best practices and guidance on optimizing cloud and AI investments for long-term business impact.

Did you miss these posts in the Cloud Cost Optimization series?

Cloud Cost Optimization: How to maximize ROI from AI, manage costs, and unlock real business value

The post Cloud Cost Optimization: Principles that still matter appeared first on Microsoft Azure Blog.
Source: Azure

How Drasi used GitHub Copilot to find documentation bugs

For early-stage open-source projects, the “Getting started” guide is often the first real interaction a developer has with the project. If a command fails, an output doesn’t match, or a step is unclear, most users won’t file a bug report—they will just move on.

Drasi, a CNCF sandbox project that detects changes in your data and triggers immediate reactions, is supported by our small team of four engineers in Microsoft Azure’s Office of the Chief Technology Officer, and we move fast. We have comprehensive tutorials, but we are shipping code faster than we can manually test them.

Detect and react to your first database change using Drasi

The team didn’t realize how big this gap was until late 2025, when GitHub updated its Dev Container infrastructure, bumping the minimum Docker version. The update broke the Docker daemon connection, and every single tutorial stopped working. Because we relied on manual testing, we didn’t immediately know the extent of the damage. Any developer trying Drasi during that window would have hit a wall.

This incident forced a realization: with advanced AI coding assistants, documentation testing can be converted to a monitoring problem.

The problem: Why does documentation break?

Documentation usually breaks for two reasons:

The curse of knowledge

Experienced developers write documentation with implicit context. When we write “wait for the query to bootstrap,” we know to run drasi list query and watch for the Running status, or even better, run the drasi wait command. A new user has no such context. Neither does an AI agent. They read the instructions literally and don’t know what to do. They get stuck on the “how,” while we only document the “what.”

Silent drift

Documentation doesn’t fail loudly like code does. When you rename a configuration file in your codebase, the build fails immediately. But when your documentation still references the old filename, nothing happens. The drift accumulates silently until a user reports confusion.

This is compounded for tutorials like ours, which spin up sandbox environments with Docker, k3d, and sample databases. When any upstream dependency changes—a deprecated flag, a bumped version, or a new default—our tutorials can break silently.

The solution: Agents as synthetic users

To solve this, we treated tutorial testing as a simulation problem. We built an AI agent that acts as a “synthetic new user.”

This agent has three critical characteristics:

It is naïve: It has no prior knowledge of Drasi and knows only what is explicitly written in the tutorial.

It is literal: It executes every command exactly as written. If a step is missing, it fails.

It is unforgiving: It verifies every expected output. If the doc says, “You should see ‘Success’”, and the command line interface (CLI) just returns silently, the agent flags it and fails fast.

The stack: GitHub Copilot CLI and Dev Containers

We built a solution using GitHub Actions, Dev Containers, Playwright, and the GitHub Copilot CLI.

Our tutorials require heavy infrastructure:

A full Kubernetes cluster (k3d)

Docker-in-Docker

Real databases (such as PostgreSQL and MySQL)

We needed an environment that exactly matches what our human users experience. If users run in a specific Dev Container on GitHub Codespaces, our test must run in that same Dev Container.

The architecture

Inside the container, we invoke the Copilot CLI with a specialized system prompt (view the full prompt here):

bash

copilot -p "$(cat prompt.md)" --allow-all-tools --allow-all-paths --deny-tool 'fetch' --deny-tool 'websearch' --deny-tool 'githubRepo' --deny-tool 'shell(curl *)'

# … additional deny-tool flags

--allow-url localhost --allow-url 127.0.0.1

Running the CLI in prompt mode (-p) with this prompt gives us an agent that can execute terminal commands, write files, and run browser scripts, just like a human developer sitting at their terminal. For the agent to simulate a real user, it needs these capabilities.

To let the agent open webpages and interact with them as a human following the tutorial steps would, we also install Playwright in the Dev Container. The agent takes screenshots, which it then compares against those provided in the documentation.

Security model

Our security model is built around one principle: the container is the boundary.

Rather than trying to restrict individual commands (a losing game when the agent needs to run arbitrary node scripts for Playwright), we treat the entire Dev Container as an isolated sandbox and control what crosses its boundaries: no outbound network access beyond localhost, a Personal Access Token (PAT) with only “Copilot Requests” permission, ephemeral containers destroyed after each run, and a maintainer-approval gate for triggering workflows.

Dealing with non-determinism

One of the biggest challenges with AI-based testing is non-determinism. Large language models (LLMs) are probabilistic: sometimes the agent retries a command; other times it gives up.

We handled this with a three-stage retry with model escalation (start with Gemini-Pro, on failure try with Claude Opus), semantic comparison for screenshots instead of pixel-matching, and verification of core-data fields rather than volatile values.
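
The escalation idea can be sketched in a few lines of shell. This is a minimal illustration under stated assumptions, not the team's actual workflow: the function name run_with_escalation, the convention that the evaluation command takes a model name and exits 0 on a passing run, and any model identifiers you pass in are all hypothetical.

```shell
# Hypothetical helper: try each model in order until one evaluation passes.
# $1: a command (or shell function) that takes a model name as its only
#     argument and exits 0 when the tutorial evaluation passes.
# Remaining arguments: model names, cheapest or preferred first.
run_with_escalation() {
  runner="$1"
  shift
  for model in "$@"; do
    if "$runner" "$model"; then
      echo "passed with $model"
      return 0
    fi
    # Report the failed attempt on stderr and move on to the next model.
    echo "failed with $model, trying next model" >&2
  done
  echo "all attempts failed" >&2
  return 1
}
```

A caller would supply its own evaluation command and model list, for example `run_with_escalation evaluate_tutorial first-model second-model`, where evaluate_tutorial and both model names are placeholders. Keeping the model list as plain arguments makes it easy to add or reorder stages without touching the loop.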

We also put tight constraints in our prompts that prevent the agent from going on a debugging journey, directives that control the structure of the final report, and skip directives that tell the agent to bypass optional tutorial sections, such as setting up external services.

Artifacts for debugging

When a run fails, we need to know why. Since the agent is running in a transient container, we can’t just Secure Shell (SSH) in and look around.

So, our agent preserves evidence of every run: screenshots of web UIs, terminal output of critical commands, and a final markdown report detailing its reasoning, as shown here:

Drasi Getting Started Tutorial Evaluation

Environment

Timestamp: 2026-02-20T13:32:07.998Z

Directory: /workspaces/learning/tutorial/getting-started

Step 1: Setup Drasi Environment

Skipped as per instructions (already in DevContainer).

Verified environment setup by checking resources folder existence.

Step 2: Create PostgreSQL Source

Command: drasi apply -f ./resources/hello-world-source.yaml

………………………………………… more steps ……….………………………………

Scenario 1: hello-world-from

Initial check: “Brian Kernighan” present. (Screenshot: 09_hello-world-from.png)

Action: Insert ‘Allen’, ‘Hello World’.

Verification: “Allen” appeared in UI. (Screenshot: 10_hello-world-from-updated.png)

Result: PASSED

……………………………………………………….. more validation by playwright taking screenshots …..……………………………………………………

Conclusion

The tutorial instructions were clear and the commands executed successfully. The expected behavior matches the actual behavior observed via the Debug Reaction UI.

STATUS: SUCCESS

These artifacts are uploaded to the GitHub Action run summary, allowing us to “time travel” back to the exact moment of failure and see what the agent saw.

Parsing the agent’s report

With LLMs, getting a definitive “Pass/Fail” signal that a machine can understand can be challenging. An agent might write a long, nuanced conclusion instead of a clear verdict.

To make this actionable in a CI/CD pipeline, we had to do some prompt engineering. We explicitly instructed the agent to end its report with a fixed status line, like the STATUS: SUCCESS line that closes the sample report above.

In our GitHub Action, we then simply grep for this specific string to set the exit code of the workflow.

Simple techniques like this bridge the gap between AI’s fuzzy, probabilistic outputs and CI’s binary pass/fail expectations.
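
As a rough sketch of that grep step, the following shell function returns success only when a report contains the status marker on a line of its own. The marker STATUS: SUCCESS is taken from the sample report above; the function name report_passed and the report file path are hypothetical, and the exact string and command used in the real workflow are not shown in this post.

```shell
# Hypothetical helper: succeed only if the report contains the success marker.
# $1: path to the agent's markdown report.
report_passed() {
  # -q: print nothing, just set the exit status.
  # -x: match only when the whole line equals the pattern, so the marker
  #     embedded in a longer sentence does not count as a pass.
  grep -qx 'STATUS: SUCCESS' "$1"
}
```

In a workflow step, something like `report_passed report.md || exit 1` would then turn the agent's prose into the binary exit code that CI expects.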

Automation

We now have an automated version of the workflow that runs weekly. It evaluates all our tutorials in parallel: each tutorial gets its own sandbox container and a fresh perspective from the agent acting as a synthetic user. If any tutorial evaluation fails, the workflow is configured to file an issue on our GitHub repo.

This workflow can optionally also run on pull requests. To prevent attacks, we have added a maintainer-approval requirement and a pull_request_target trigger, which means that even on pull requests from external contributors, the workflow that executes is the one in our main branch.

Running the Copilot CLI requires a PAT, which is stored in the environment secrets for our repo. To make sure it does not leak, each run requires maintainer approval, except the automated weekly run, which only runs on the main branch of our repo.

What we found: Bugs that matter

Since implementing this system, we have run over 200 “synthetic user” sessions. The agent identified 18 distinct issues including some serious environment issues and other documentation issues like these. Fixing them improved the docs for everyone, not just the bot.

Implicit dependencies: In one tutorial, we instructed users to create a tunnel to a service. The agent ran the command, and then, following the next instruction, killed the process to run the next command.

The fix: We realized we hadn’t told the user to keep that terminal open. We added a warning: “This command blocks. Open a new terminal for subsequent steps.”

Missing verification steps: We wrote: “Verify the query is running.” The agent got stuck: “How, exactly?”

The fix: We replaced the vague instruction with an explicit command: drasi wait -f query.yaml.

Format drift: Our CLI output had evolved. New columns were added; older fields were deprecated. The documentation screenshots still showed the 2024 version of the interface. A human tester might gloss over this (“it looks mostly right”). The agent flagged every mismatch, forcing us to keep our examples up to date.

AI as a force multiplier

We often hear about AI replacing humans, but in this case, the AI is providing us with a workforce we never had.

To replicate what our system does—running six tutorials across fresh environments every week—we would need a dedicated QA resource or a significant budget for manual testing. For a four-person team, that is impossible. By deploying these Synthetic Users, we have effectively hired a tireless QA engineer who works nights, weekends, and holidays.

Our tutorials are now validated weekly by synthetic users—try the Getting Started guide yourself and see the results firsthand. And if you’re facing the same documentation drift in your own project, consider GitHub Copilot CLI not just as a coding assistant, but as an agent—give it a prompt, a container, and a goal—and let it do the work a human doesn’t have time for.
The post How Drasi used GitHub Copilot to find documentation bugs appeared first on Microsoft Azure Blog.
Source: Azure

Microsoft named a Leader in The Forrester Wave™ for Sovereign Cloud Platforms

Digital sovereignty is no longer a niche requirement. For organizations operating across borders, regulated industries, and complex supply chains, sovereignty is now table stakes for cloud strategy.

That’s why we’re pleased that Microsoft has been named a Leader in The Forrester Wave™: Sovereign Cloud Platforms, Q2 2026 – an evaluation that assessed the most significant sovereign cloud providers based on current offerings, strategy, and customer feedback.

We believe this recognition reflects Microsoft’s long-term commitment to helping organizations adopt cloud and AI without compromising on control, compliance, operational independence, or innovation.

Read the full report

Why this recognition matters

Forrester’s research highlights a key reality of sovereign clouds: there is no single deployment model that fits every sovereignty requirement. Instead, organizations combine public cloud, private cloud, and disconnected environments to achieve the level of sovereignty they need – balancing risk, regulations, functionality, and cost.

In this context, leadership isn’t about offering a single “sovereign cloud.” The goal is not isolation; it is to provide consistent sovereign controls across multiple environments while maintaining access to modern cloud capabilities.

Forrester places Microsoft in the Leaders category based on its scores in the current offering and strategy categories. The report also notes Microsoft’s vision to offer sovereign controls across cloud, AI, and productivity services, and specifically its ability to extend sovereignty across AI, productivity, security, and the cloud platform.

A platform approach to sovereignty

The Forrester report notes that Microsoft’s sovereign capabilities are available consistently for both private and public cloud. In practice, digital sovereignty is achieved through a combination of technical controls, operational practices, and contractual commitments applied consistently across deployment models.

Microsoft Sovereign Cloud brings together:

Public cloud with data residency and access controls, including region-specific residency controls such as EU Data Boundary.

Private cloud with hybrid deployments, enabled through Azure Local and consistent policy and management via Azure Arc.

Partner-operated national clouds, with Bleu and Delos Cloud, where infrastructure is independently owned and operated to meet national requirements.

This approach allows organizations to grow their sovereign IT posture over time, adapting to evolving regulatory, operational, or geopolitical conditions without having to abandon the Microsoft cloud ecosystem.

Read the Microsoft Sovereign Cloud in Europe white paper

Consistency across sovereign environments

One of the differentiators cited in the evaluation is Microsoft’s ability to make key capabilities available across sovereign public and sovereign private cloud. Forrester specifically calls out Microsoft’s container and Kubernetes capabilities, including the use of Azure Arc and Azure Local to run Kubernetes clusters in connected or disconnected environments, supported by infrastructure-as-code and GitOps tooling.

This consistency matters because sovereign cloud isn’t just about where data resides, but about whether organizations can:

Operate and secure workloads the same way across environments.

Maintain development and operation standards.

Avoid fragmenting teams, tools, and processes.

By extending common management, governance, and deployment models across environments, Microsoft Sovereign Cloud helps reduce complexity while giving organizations control.

Read the full report

Looking ahead

Sovereign cloud platforms are evolving quickly, especially as customers look to apply AI, analytics, and modern application services across different environments. Forrester notes that customers don’t “buy” sovereignty as a standalone product; they architect for it over time.

Microsoft’s recognition as a Leader in this evaluation underscores our commitment to keep investing in sovereign cloud innovation such as:

Advanced AI development and runtime capabilities.

Increasing consistency and parity across deployment models.

Supporting customers as sovereignty requirements continue to mature and evolve.

We’re grateful to our customers and partners who continue to shape our approach and we remain focused on helping organizations adopt cloud and AI with confidence, flexibility, and transparency wherever their workloads need to run.

Forrester does not endorse any company, product, brand, or service included in its research publications and does not advise any person to select the products or services of any company or brand based on the ratings included in such publications. Information is based on the best available resources. Opinions reflect judgment at the time and are subject to change. For more information, read about Forrester’s objectivity here.
The post Microsoft named a Leader in The Forrester Wave™ for Sovereign Cloud Platforms appeared first on Microsoft Azure Blog.
Source: Azure

Cloud Cost Optimization: How to maximize ROI from AI, manage costs, and unlock real business value

In this article

Why ROI from AI is now a top business priority

AI cost management: Strategic considerations

Usage patterns are variable

AI workloads tend to rely on specialized infrastructure

AI initiatives frequently span teams and stages

AI cost optimization vs. cloud cost optimization: Why they're different

Connecting AI cost optimization to AI business value

Managing ROI across the AI lifecycle

Planning for long‑term AI success

Designing AI solutions for efficiency

Managing and optimizing AI investments

How Microsoft supports sustainable AI adoption

Turning AI adoption into measurable ROI

A centralized resource for maximizing ROI from AI

This blog post is the first in a multi-part series called Cloud Cost Optimization. Throughout this series, we’ll share practical strategies, best practices, and actionable guidance to help you plan, design, and manage AI investments for sustainable value and efficiency.

As AI adoption accelerates across industries, organizations are asking a more nuanced question than ever before: How do we maximize return on investment (ROI) from AI while keeping costs under control?

Start maximizing ROI from AI with Azure

AI promises transformative business value, from productivity gains to new digital experiences, but it also introduces new cost dynamics. As organizations scale, they are embracing a more dynamic financial landscape shaped by compute-intensive workloads and evolving pricing models.

This new reality has elevated AI cost management and optimization to a board-level priority. As a result, leaders are focusing not only on deploying AI, but also on ensuring investments are sustainable, measurable, and aligned with long-term business outcomes.

This article explores how organizations can think holistically about ROI from AI, manage AI costs effectively, and turn AI adoption into lasting business value.

Why ROI from AI is now a top business priority

AI has moved beyond isolated experiments. Today, organizations are embedding AI into core business processes, modern applications, and customer‑facing experiences. As AI becomes more pervasive, its financial impact and strategic value are becoming increasingly clear.

AI costs are often consumption based. Model usage, inference frequency, training cycles, and infrastructure choices all influence spend. This makes AI pricing dynamic and ROI more difficult to assess without deliberate governance.

As a result, business and technical leaders are asking critical questions:

Which AI use cases will deliver the greatest business value?

How do we balance performance, scalability, and cost as AI solutions grow?

How do we continuously optimize AI investments to increase ROI?

Answering these questions requires a shift from short‑term experimentation to long‑term AI cost optimization and value management.

AI cost management: Strategic considerations

Effective AI cost management starts with understanding what actually drives AI costs. While the specifics vary by workload, several common factors influence AI spend across environments:

Usage patterns are variable

Development and experimentation often involve bursts of activity, while production workloads may scale dynamically based on demand. Without visibility, these fluctuations can lead to unexpected cost increases.

AI workloads tend to rely on specialized infrastructure

Compute‑intensive resources, data pipelines, and supporting services all contribute to the overall cost profile. As models evolve, these requirements often change.

AI initiatives frequently span teams and stages

It’s critical to maintain oversight from research to deployment. AI cost management must be ongoing and adaptive, rather than reactive.

AI cost optimization vs. cloud cost optimization: Why they’re different

While many cloud cost optimization principles still apply, AI introduces unique considerations that require a more intentional approach:

Traditional optimization sometimes focuses on static workloads and predictable demand. AI workloads, by contrast, are iterative and exploratory by nature. Teams may test multiple models, adjust parameters, or retrain systems regularly. Each iteration has cost implications.

AI success is not defined by cost reduction alone. Over‑optimizing too early can limit experimentation and slow innovation. The goal of AI cost optimization is not simply to spend less, but to spend more efficiently in pursuit of measurable business outcomes.

This is why AI cost optimization must be closely tied to value creation, not isolated cost controls.

Connecting AI cost optimization to AI business value

To truly maximize ROI from AI, organizations must connect cost decisions to business value. AI investments should be evaluated based on their contribution to outcomes such as productivity, customer satisfaction, operational efficiency, and revenue growth.

This means shifting the conversation from “How much does AI cost?” to “What value does this AI workload deliver relative to its cost?”

By continuously measuring efficiency and impact, organizations can identify which AI initiatives justify further investment, and which require refinement or reevaluation. This approach helps ensure AI adoption remains aligned with strategic priorities rather than becoming an unchecked expense.

Managing ROI across the AI lifecycle

One of the most important principles in measuring ROI from AI is recognizing that value is realized over time. ROI is not a single calculation performed before or after deployment; it evolves across the AI lifecycle.

Planning for long‑term AI success

At the planning stage, organizations should focus on identifying AI use cases with clear, high‑confidence value. Understanding expected outcomes, usage patterns, and cost drivers early helps set realistic expectations for ROI.

Designing AI solutions for efficiency

Architectural decisions play a significant role in long‑term AI costs. Model selection, deployment approaches, and scalability considerations all influence how efficiently AI resources are consumed. Designing with cost awareness from the start reduces the need for corrective optimization later.

Managing and optimizing AI investments

Once AI solutions are in production, ongoing AI cost management becomes critical. Monitoring usage, evaluating performance, and adjusting resources over time help prevent waste while supporting growth. This continuous approach is essential for sustaining ROI from AI.

How Microsoft supports sustainable AI adoption

As organizations scale AI adoption, they need platforms that support both innovation and responsible cost management. Microsoft provides a broad ecosystem designed to help organizations build, deploy, and manage AI solutions efficiently.

By combining scalable infrastructure, governance capabilities, and optimization resources, Microsoft supports organizations as they navigate the financial and operational realities of AI adoption. The goal is not just to deploy AI, but to do so in a way that maximizes long‑term business value.

Turning AI adoption into measurable ROI

AI adoption is no longer about proving technical feasibility. It is about delivering sustained business impact while managing complexity and cost. Organizations that succeed are those that treat AI cost management and optimization as strategic disciplines, not afterthoughts.

By aligning AI cost optimization with business value, embracing lifecycle‑based ROI thinking, and maintaining continuous visibility into AI spend, organizations can transform AI from an experimental technology into a reliable driver of growth.

A centralized resource for maximizing ROI from AI

To support organizations on this journey, Azure provides a hub that centralizes guidance, research, and resources focused on helping organizations maximize ROI from AI.

The Maximize ROI from AI page brings together insights on AI cost management, optimization best practices, and value measurement to help organizations plan, design, and manage AI investments more effectively.

Explore resources for maximizing ROI from AI

As AI continues to reshape industries, the organizations that win will be those that combine innovation with discipline, turning AI adoption into sustainable, measurable business value.

For deeper perspectives, read more:

Bridging the AI divide: How Frontier Firms are transforming business

What If You Could Cut AI Costs by 60% Without Losing Quality?

Unlock Cost Savings with Azure AI Foundry Provisioned Throughput reservations

Unlock Savings with Copilot Credit Pre-Purchase Plan

Explore the Cloud Cost Optimization series for best practices and guidance on optimizing cloud and AI investments for long-term business impact.
The post Cloud Cost Optimization: How to maximize ROI from AI, manage costs, and unlock real business value appeared first on Microsoft Azure Blog.
Source: Azure

Navigating digital sovereignty at the frontier of transformation

Digital sovereignty is no longer a theoretical debate or a narrow compliance exercise. For leaders across governments, regulated industries, and critical infrastructure sectors, it has become a practical leadership discipline grounded in risk management, continuity planning, and long-term accountability.

Over the past several years, we have seen customer concerns evolve materially. Early conversations focused primarily on privacy and lawful data handling. Today, those concerns have expanded. Leaders are now asking how they maintain operational continuity during disruption, how they adopt AI responsibly without losing control, and how they protect national, organizational, and customer interests in an increasingly volatile global environment.

These questions are not abstract. They surface in boardrooms, procurement decisions, architecture reviews, and crisis simulations. They reflect a broader shift in how trust is evaluated in digital systems. Today in Brussels we brought together attendees from around the world, including policy makers, IT leaders, and enterprises, to approach these questions from multiple perspectives and move the conversation from headlines to action.

Learn about our vision for Microsoft Sovereign Cloud in Europe

From privacy to resilience and beyond

Privacy remains foundational. But it is no longer the sole lens through which sovereignty is assessed.

Customers are increasingly concerned about business continuity in the face of cyber incidents, geopolitical tension, supply chain disruption, and network instability. They want to understand how critical workloads operate if connectivity is constrained, if dependencies fail, or if policy conditions change with little warning.

At the same time, innovation pressures have intensified. AI is becoming central to public service delivery, national competitiveness, and economic growth. Organizations cannot afford to pause progress while sovereignty questions are debated in isolation. They need approaches that allow them to move forward responsibly, balancing opportunity with control.

What we hear consistently is this: sovereignty concerns will continue to evolve. Any approach that treats them as static is already behind.

For four decades, Microsoft has operated under some of the world’s most demanding data protection, competition, and digital governance frameworks. Working closely with European institutions, regulators, and customers has shaped how we think about sovereignty—not as a regional exception, but as a discipline that must function at scale, under scrutiny, and over time. That experience matters because many of the sovereignty questions now emerging globally were first tested in Europe, long before they became mainstream elsewhere.

A consultative approach to risk management

This is why we believe digital sovereignty must be approached as consultative risk management, not a checkbox or a predefined deployment model.

Every organization faces a unique mix of regulatory obligations, cyber risk, operational exposure, and innovation goals. Even within a single institution, sovereignty requirements differ by workload. Some demand strict isolation and local control. Others require global scale, advanced security capabilities, and rapid innovation.

[Image: “Five key digital sovereignty scenarios with Microsoft Sovereign Cloud.”]

Our role is to help customers navigate these tradeoffs deliberately. That means working with them to assess risk, align architecture to policy realities, and design environments that reflect both today’s constraints and tomorrow’s unknowns.

This work sits at the intersection of cybersecurity, compliance, resilience, and frontier transformation. It requires ongoing engagement, transparency, and the willingness to adapt as conditions change.

Digital sovereignty posture in practice

A digital sovereignty posture that is flexible recognizes that no single approach can address every requirement. Instead, it focuses on giving organizations options, visibility, and control across a continuum of environments.

Customers operating in public cloud environments expect clear data residency options, strong encryption and access controls, and visible operational discipline. Just as important, they look for transparency into how cloud systems are governed and how exceptional situations are managed, particularly as regulatory scrutiny increases.

Those expectations do not disappear when workloads move closer to the edge. In fact, they intensify. For workloads that require greater isolation, local processing, or operation in constrained environments, hybrid and disconnected solutions become essential. In February, Microsoft announced the expansion of disconnected operations, enabling customers to run critical workloads in air-gapped environments while retaining consistent governance and operational control. This capability extends cloud-based practices into disconnected settings, supporting operational continuity without abandoning security and innovation.

That commitment shows up in concrete safeguards that customers can independently evaluate and apply. The EU Data Boundary is one example, supporting data storage and processing within the EU and European Free Trade Association (EFTA) regions for cloud services, alongside longstanding investments in encryption, access controls, auditability, and operational transparency. These measures provide practical mechanisms for aligning cloud operations with regulatory and risk requirements, rather than relying on abstract assurances.

At the same time, we are expanding options across hybrid and private cloud environments to support continuity, resilience, and local control where required. These investments reflect a simple reality: customer needs are not converging toward one model. They are diversifying.

Underpinning all of this are Microsoft’s digital commitments, which frame how we approach privacy, security, transparency, and responsible AI. These commitments are not marketing statements. They guide how systems are built, operated, and governed, and they provide a foundation for long-term accountability.

Practical guidance for leaders navigating sovereignty

As digital sovereignty becomes embedded in policy and procurement decisions, leaders benefit from a practical lens. Based on what we hear from customers and stakeholders, there are a few consistent themes shaping successful approaches:

Sovereignty requirements will continue to expand beyond privacy to include continuity, resilience, and AI governance.

Risk management is now inseparable from digital transformation strategy.

Flexibility and optionality matter more than rigid architectures.

Transparency and accountability are as important as technical capability.

Sovereignty posture must consider protections against cyberthreats.

Addressing these realities requires partners who understand the full scope of the challenge and are willing to engage over the long term. It requires platforms and collaboration designed with sovereignty in mind from the start.

So what does this mean for you?

Digital sovereignty is not a destination. It is an ongoing discipline shaped by changing technology, regulation, and global conditions.

At Microsoft, we approach this work with humility and responsibility. We recognize that customer concerns will continue to evolve, and that our own platforms and practices must evolve with them. We remain committed to expanding our sovereign cloud continuum, strengthening our cloud capabilities, and delivering solutions that balance innovation with control.

Most importantly, we remain focused on delivery. Because in moments of uncertainty, what matters most is not what technology promises, but what it allows organizations to do with confidence.

Where does digital sovereignty go from here?

The future of digital sovereignty will be defined by implementation, not rhetoric. Success will depend on collaboration between governments, industry, and civil society, as well as a shared commitment to transparency and continuous improvement.

As we look ahead, our focus remains on helping organizations turn sovereignty principles into durable, scalable outcomes. That means continuing to invest in capabilities that support trust, engaging constructively with policymakers, and listening closely to the evolving needs of our customers.

Digital trust is built over time, through consistent action and openness, and that trust is one of the most important foundations we can help create.

Read the Microsoft Sovereign Cloud in Europe white paper
The post Navigating digital sovereignty at the frontier of transformation appeared first on Microsoft Azure Blog.
Source: Azure

Building sovereign AI at the edge: Microsoft and Armada collaborate to deliver Azure Local on Galleon modular datacenters

As governments and regulated industries continue their digital transformation, one requirement consistently rises to the top: the ability to run mission-critical workloads where data originates, while maintaining sovereignty, resilience, and control.

Build sovereign edge solutions with Azure Local

Today, I am pleased to announce a collaboration between Microsoft and Armada to deliver a practical path to sovereign AI at the edge. Together, we are bringing Microsoft Sovereign Private Cloud capabilities to Armada’s Galleon modular datacenters (MDC), enabling customers to run secure, compliant workloads designed to operate in intermittently connected, contested, and even fully disconnected environments. This customer-controlled cloud environment delivers Azure’s operating model, security, and AI-ready capabilities where traditional cloud approaches are not feasible.

Meeting sovereign requirements anywhere

Defense, public safety, energy, and critical infrastructure operators increasingly need cloud capabilities in locations where using public clouds is not feasible. They require workloads to run in environments that are disconnected, mobile, or operationally constrained. In these settings, cloud capabilities must move closer to the point of need. These scenarios often demand:

Disconnected or limited connectivity.

Portable or rapidly deployable infrastructure.

Strict data residency and regulatory controls.

Support for modern AI and analytics workloads at the edge.

Through this collaboration, Microsoft and Armada are delivering a validated sovereign reference architecture that shows how Sovereign Private Cloud operates on and interoperates with the Armada Edge Platform, enabling customers to deploy Azure services closer to where data is created, while retaining full control over their data, operations, and governance.

See how Microsoft supports digital sovereignty

Azure Local in Armada’s Galleon modular datacenters

At the core of this collaboration is Azure Local, Microsoft’s on-premises cloud platform that can be used in disconnected and sovereign scenarios, combined with Armada’s Galleon MDC and Armada Edge Platform (AEP).

Together, the solution supports:

Azure Local control plane and managed clusters, including multi-rack scalability.

Flexible storage architectures, including hyperconverged and SAN-backed deployments.

Resilient multi-network connectivity, spanning satellite, LTE/5G, RF, and SD-WAN.

Security, compliance, and hardening aligned to sovereign, government, and regulated workloads.

The result is an edge platform that can be deployed in remote, mobile, or constrained environments while still benefiting from Azure’s consistent cloud operating model.

Enabling sovereign AI and mission-critical workloads

Beyond infrastructure, this collaboration is focused on delivering sovereign AI capabilities at the edge.

As part of Microsoft Sovereign Private Cloud, Foundry Local and Azure Local enable customers to deploy, govern, and operate AI entirely within their own trusted boundary, supporting national sovereignty, classified workloads, and highly regulated data pipelines. With Foundry Local, customers can run AI inference and analytics locally, even when disconnected from the public cloud.

This approach helps customers:

Process sensitive data locally to meet sovereignty requirements.

Reduce latency for real-time decision-making.

Operate AI workloads in austere or bandwidth-constrained environments.

By combining the cloud-consistent platform of Foundry Local and Azure Local with Armada’s deployable infrastructure, customers gain a practical path to operational AI where it matters most.

Explore how Azure Local powers sovereign edge deployments

A shared vision for sovereign edge infrastructure

Customers operating in the world’s most demanding environments don’t have the luxury of choosing between sovereignty, resilience, and modern cloud capabilities; they need all three. By partnering with Microsoft, we’re combining Armada’s deployable, mission ready infrastructure with Azure Local’s consistent cloud platform to help governments and regulated industries run secure, AI enabled workloads anywhere they operate, even when connectivity is limited or unavailable. Together, Microsoft and Armada are delivering a practical path to sovereign AI at the edge, one that respects local control, supports disconnected operations, and scales from today’s mission critical needs to tomorrow’s intelligent systems.
—Dan Wright, Co-Founder and CEO of Armada

Looking ahead

Achieving digital sovereignty is no longer just about where data lives, but where intelligence runs, who controls it, and how resilient it remains under real-world conditions.

With this collaboration, Microsoft and Armada are extending Azure to the edge in a way that respects sovereignty, enables AI, and meets customers where they operate, whether that’s in remote locations, mobile deployments, or highly regulated environments.

Learn more about Azure Local and sovereign edge solutions.

Discover Armada MDC and edge platform.

The post Building sovereign AI at the edge: Microsoft and Armada collaborate to deliver Azure Local on Galleon modular datacenters appeared first on Microsoft Azure Blog.
Source: Azure

Azure IaaS: Keep critical applications running with built-in resiliency at scale

This blog post is the second part of a blog series called Azure IaaS, which shares best practices and guidance to help you build a trusted infrastructure platform, from performance, resiliency, and security to scalability and cost efficiency.

Disruption should not be treated as an edge case. It is a reality organizations must be prepared to navigate. That preparation starts with resiliency as a core design principle, not an afterthought. Businesses depend on a broad set of applications to run daily operations, from essential internal systems to mission-critical workloads. And across that landscape, hardware issues, maintenance events, zonal disruptions, and even regional incidents can all affect availability.

The goal of a resilient infrastructure is not to assume disruptions will never happen. It is to ensure services remain available, impacts stay contained, and recovery happens quickly when events occur. In that sense, resiliency is what helps organizations maintain continuity, protect customer trust, and operate with confidence even when conditions change.

Azure IaaS is purpose-built to offer a resilient operating environment, delivering enterprise-grade resiliency. But outcomes ultimately depend on how product features across compute, storage, and networking are brought together within customer environments to help maintain availability through disruptions. Resiliency is a shared responsibility: Azure IaaS helps organizations start from a resilient platform foundation with built-in capabilities for availability, continuity, and recovery, while customers design and configure workloads to meet their specific business and operational requirements.

Designing for resiliency is not a one-time decision, and it is rarely simple. As architectures grow more distributed and workload requirements become more demanding, the Azure IaaS Resource Center provides a centralized destination for tutorials, best practices, and guidance organizations need to build and operate resilient infrastructure with greater confidence.

Explore the Azure IaaS Resource Center

Resiliency built into the foundation of mission-critical applications

When an application is truly mission critical, downtime is not just inconvenient; it can disrupt customer transactions, delay operations, interrupt employee productivity, and create real financial and reputational impact. That is why resilient design starts with one important shift in mindset: not asking whether disruption will happen but designing for how the application will behave when it does.

Azure IaaS helps customers do that with built-in capabilities that support isolation, redundancy, failover, and recovery across the infrastructure stack. The value of those capabilities is not just technical. It is operational. They help organizations reduce the blast radius of disruption, improve continuity, and recover with greater predictability when critical services are under pressure.

Keep applications available with resilient compute design

Compute resiliency starts with placement and isolation. For example, if all the virtual machines supporting an application sit too close together from an infrastructure perspective, a localized event can affect more of the workload than expected.

For applications that need both scale and availability, Virtual Machine Scale Sets help automate deployment and management while distributing instances across availability zones and fault domains. This is especially valuable for front-end tiers, application tiers, and other distributed services where maintaining enough healthy instances is key to staying online.

For broader protection, availability zones provide datacenter-level isolation within a region. Each zone has independent power, cooling, and networking, which allows organizations to architect applications across zones so that if one zone is affected, healthy instances in another zone can continue serving the workload.

Together, these capabilities help organizations reduce single points of failure and design compute architectures that are better prepared to absorb localized infrastructure events, planned maintenance, and zonal disruptions.
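
To make the zone-distribution idea concrete, here is a minimal sketch that spreads instances across zones round-robin and then checks how many remain if one zone is lost. It only illustrates the principle; it is not how Virtual Machine Scale Sets place instances internally, and the function names, instance names, and zone labels are hypothetical.

```python
from itertools import cycle


def spread_across_zones(instance_count: int, zones: list[str]) -> dict[str, str]:
    """Assign instances to zones round-robin so each zone gets a near-equal share."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {f"vm-{i}": next(zone_cycle) for i in range(instance_count)}


def surviving_instances(placement: dict[str, str], failed_zone: str) -> list[str]:
    """Return the instances that are not located in the failed zone."""
    return [vm for vm, zone in placement.items() if zone != failed_zone]


if __name__ == "__main__":
    placement = spread_across_zones(6, ["1", "2", "3"])
    remaining = surviving_instances(placement, failed_zone="2")
    print(f"{len(remaining)} of {len(placement)} instances still serving")
```

With six instances over three zones, losing any single zone leaves four instances available, which is why capacity planning for zonal designs usually asks whether the remaining share can carry peak load.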

Build continuity and recovery on a resilient storage foundation

When disruption occurs, organizations need confidence that application data is still durable, accessible, and recoverable. Azure provides multiple storage redundancy models to support those needs. Locally redundant storage (LRS) keeps multiple copies of data within a single datacenter. Zone-redundant storage (ZRS) replicates data synchronously across availability zones within a region, helping protect against zonal failures. For broader cross-geographical resiliency scenarios, geo-redundant storage (GRS) and read-access geo-redundant storage (RA-GRS) extend protection to a secondary region.
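
One way to reason about these options is to rank each by the largest failure scope its data copies are described as surviving. The sketch below encodes that ranking as a lookup. It is a deliberate simplification with hypothetical names: it considers only where copies are kept, and ignores factors such as replication lag to the secondary region, failover procedures, and cost.

```python
from enum import IntEnum


class FailureScope(IntEnum):
    HARDWARE = 1  # a disk, server, or rack inside one datacenter
    ZONE = 2      # an entire availability zone
    REGION = 3    # an entire region


# Largest failure scope each option keeps a copy beyond, following the
# descriptions above (simplified; an assumption for illustration only).
TOLERATES = {
    "LRS": FailureScope.HARDWARE,
    "ZRS": FailureScope.ZONE,
    "GRS": FailureScope.REGION,
    "RA-GRS": FailureScope.REGION,
}


def options_for(required: FailureScope) -> list[str]:
    """List redundancy options whose copies survive at least the required scope."""
    return [name for name, scope in TOLERATES.items() if scope >= required]


if __name__ == "__main__":
    print(options_for(FailureScope.ZONE))
```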

For managed disks and virtual machine-based workloads, recovery is also shaped by capabilities such as snapshots, Azure Backup, and Azure Site Recovery. These are not just backup features in the abstract. They are mechanisms that help define how much data an organization could lose and how quickly an application can be restored after an incident.

That is why storage decisions should not be treated as only a performance or capacity conversation. For stateful applications especially, storage is central to recovery point objectives, recovery time objectives, and the broader question of how the business resumes operation after disruption.
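
The link between backup cadence and recovery objectives can be expressed as simple arithmetic. The sketch below assumes a periodic backup schedule, where the worst-case data loss equals one full backup interval, and compares it with the recovery point objective (RPO) and recovery time objective (RTO). The function names and sample figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """A failure just before the next backup loses up to one full interval of data."""
    return backup_interval


def meets_targets(
    backup_interval: timedelta,
    estimated_restore_time: timedelta,
    rpo: timedelta,
    rto: timedelta,
) -> bool:
    """True when worst-case data loss fits the RPO and restore time fits the RTO."""
    return worst_case_data_loss(backup_interval) <= rpo and estimated_restore_time <= rto


if __name__ == "__main__":
    # Backups every 4 hours cannot satisfy a 1-hour RPO, however fast the restore is.
    print(meets_targets(timedelta(hours=4), timedelta(hours=1),
                        rpo=timedelta(hours=1), rto=timedelta(hours=2)))
```

The estimated restore time is only trustworthy if it comes from an actual recovery drill rather than a guess.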

Keep network traffic moving when conditions change

A workload is not truly available if users and dependent services cannot reach it. Even when compute and storage remain healthy, traffic disruption can still turn a manageable infrastructure event into a customer-facing outage.

That is where networking plays a distinct resiliency role. Azure networking services help maintain reachability by distributing traffic across healthy resources and redirecting around issues when conditions change. Azure Load Balancer helps spread traffic across available instances. Application Gateway adds intelligent Layer 7 routing for web applications. Traffic Manager uses DNS-based routing across endpoints, while Azure Front Door helps direct and fail over internet traffic at a global level.

For customers, the value here is practical. Good networking design means that when one instance, zone, or endpoint becomes unavailable, traffic can move to a healthy path instead of stopping altogether. That can be the difference between a brief, invisible reroute and an outage your users immediately feel.
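
The rerouting behavior described above can be sketched as a priority-based selection over healthy endpoints. This is a generic illustration of health-aware failover, not the algorithm of any specific Azure service; the `Endpoint` type, its fields, and the endpoint names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    name: str
    priority: int  # lower number means more preferred
    healthy: bool  # result of the most recent health probe


def pick_endpoint(endpoints: list[Endpoint]) -> Optional[Endpoint]:
    """Return the most-preferred healthy endpoint, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.priority)


if __name__ == "__main__":
    endpoints = [
        Endpoint("primary-region", priority=1, healthy=False),
        Endpoint("secondary-region", priority=2, healthy=True),
    ]
    chosen = pick_endpoint(endpoints)
    print(chosen.name if chosen else "no healthy endpoint")
```

In practice the quality of the health probe matters as much as the routing rule: a probe that only checks that a port is open can keep sending traffic to an instance whose dependencies have failed.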

In mission-critical environments, resilient networking is what connects healthy infrastructure to real-world continuity.

Tailor resiliency to what each workload demands

Not all workloads require the same resiliency approach, and recognizing those differences is central to effective architecture and design. A stateless application tier may benefit most from autoscaling, zone distribution, and rapid instance replacement. A stateful workload may require stronger replication, backup, and failover planning because continuity depends just as much on the integrity of the data as the availability of the compute layer.

Mission-critical workloads often demand more from every layer of the stack. They may need tighter recovery targets, broader failure isolation, and more rigorously tested recovery paths than lower-priority internal systems. That does not mean every workload requires the highest possible level of redundancy. It means resiliency architecture should be guided by business impact.

Azure IaaS gives customers flexibility. The same platform can support different patterns depending on workload criticality, operational needs, and acceptable tradeoffs around cost, complexity, and recovery speed.

Make every migration a chance to build greater resiliency

Whether organizations are migrating existing applications or deploying new ones on Azure, the transition point is one of the best opportunities to build resiliency in from the start. It is the moment to reexamine architecture choices, eliminate inherited single points of failure, and design for stronger continuity across compute, storage, and networking.

Too often, a move to the cloud simply recreates existing infrastructure patterns and carries forward the same risks. But migration or new deployment can be much more valuable than that. For example, Carne Group recently shared how its move to Azure helped turn migration into a broader resiliency strategy, combining Azure Site Recovery with Terraform-based landing zones to streamline cutover while strengthening recovery readiness and operational resilience.

With IaC in place, we could easily build a duplicate site in another region. Even in the event of a worst-case scenario, we could be back up and running more or less in the same day.
Stéphane Bebrone, Global Technology Lead at Carne Group

Learn more about how Carne Group uses Azure

This is also where infrastructure as code and deployment automation play an important role. Using repeatable deployment templates and CI/CD workflows helps teams standardize resilient architectures, reduce configuration drift, and recover environments more consistently when changes or disruptions occur.
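
Configuration drift can be pictured as the difference between the settings declared in a template and the settings observed in the running environment. The sketch below compares two flat dictionaries; the setting names and the `ABSENT` marker are hypothetical, and real infrastructure-as-code tools perform a far richer comparison.

```python
ABSENT = "<absent>"  # hypothetical marker for a setting missing on one side


def find_drift(desired: dict, actual: dict) -> dict:
    """Return {setting: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


if __name__ == "__main__":
    desired = {"zones": [1, 2, 3], "sku": "Standard", "backup_enabled": True}
    actual = {"zones": [1], "sku": "Standard", "public_ip": True}
    for setting, (want, have) in find_drift(desired, actual).items():
        print(f"{setting}: expected {want!r}, found {have!r}")
```

Running a comparison like this on a schedule, rather than only at deployment time, is what turns a template into an ongoing check.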

Azure Site Recovery is a foundational Azure capability for regional resilience, enabling workloads to be replicated and restarted in another Azure region on demand. Customers retain control over where and when workloads move, aligning recovery behavior with capacity, compliance, and regional availability needs.

Services such as Azure Migrate, Azure Storage Mover, and Azure Data Box support different migration scenarios. GitHub and pipeline-based deployment practices then help operationalize resiliency over time.

In that sense, this is bigger than migration alone. Whether a workload is being moved, modernized, or built new on Azure, resiliency should be part of the deployment strategy from the beginning, not added later.

Maintain resiliency after deployment as workloads evolve

Resiliency must also be maintained over time. As workloads grow and change, configuration drift, new dependencies, and evolving recovery expectations can weaken the architecture originally put in place. The most resilient organizations periodically validate readiness through testing, drills, fault simulations, and observability practices that help teams identify issues early, understand root cause, and make informed corrections. Resiliency in Azure was released in preview at Ignite to help organizations assess, improve, and validate application resiliency, with a public preview planned for Microsoft Build 2026.

Sign up for Resiliency in Azure

Azure IaaS provides foundational capabilities across compute, storage, and networking, but resilient outcomes result from how those capabilities are combined and operationalized. By designing with disruption in mind, organizations can create architectures that stay available more consistently, protect critical data more effectively, and recover more predictably when incidents occur.

To go deeper, explore the Azure IaaS Resource Center for tutorials, best practices, and guidance across compute, storage, and networking to help you design and operate resilient infrastructure with greater confidence.

Did you miss these posts in the Azure IaaS series?

Explore new resources for building a stronger, more efficient infrastructure

Create a resilient infrastructure with Azure
Visit the Azure IaaS Resource Center to start building a stronger, more efficient infrastructure today.

Get started with Azure

The post Azure IaaS: Keep critical applications running with built-in resiliency at scale appeared first on Microsoft Azure Blog.
Source: Azure