Agent Factory: From prototype to production—developer tools and rapid agent development

This blog post is the fourth in a six-part series called Agent Factory, which shares best practices, design patterns, and tools to help guide you through adopting and building agentic AI.

Developer experiences as the key to scale

AI agents are moving quickly from experimentation to real production systems. Across industries, we see developers testing prototypes in their Integrated Development Environment (IDE) one week and deploying production agents to serve thousands of users the next. The key differentiator is no longer whether you can build an agent—it’s how fast and seamlessly you can go from idea to enterprise-ready deployment.

Deploy AI agents quickly with Azure AI Foundry

Industry trends reinforce this shift:

In-repo AI development: Models, prompts, and evaluations are now first-class citizens in GitHub repos—giving developers a unified space to build, test, and iterate on AI features. 

More capable coding agents: GitHub Copilot’s new coding agent can open pull requests after completing tasks like writing tests or fixing bugs, acting as an asynchronous teammate.

Open frameworks maturing: Communities around LangGraph, LlamaIndex, CrewAI, AutoGen, and Semantic Kernel are rapidly expanding, with “agent templates” on GitHub repos becoming common.

Open protocols emerging: Standards like the Model Context Protocol (MCP) and Agent-to-Agent (A2A) are creating interoperability across platforms.

Developers increasingly expect to stay in their existing workflow—GitHub, VS Code, and familiar frameworks—while tapping into enterprise-grade runtimes and integrations. The platforms that win will be those that meet developers where they are—with openness, speed, and trust.

What a modern agent platform should deliver

From our work with customers and the open-source community, we’ve seen a clear picture emerge of what developers really need. A modern agent platform must go beyond offering models or orchestration—it has to empower teams across the entire lifecycle:

Local-first prototyping: Developers want to stay in their flow. That means designing, tracing, and evaluating AI agents directly in their IDE with the same ease as writing and debugging code. If building an agent requires jumping into a separate UI or unfamiliar environment, iteration slows and adoption drops.

Frictionless transition to production: A common frustration we hear is that an agent that runs fine locally becomes brittle or requires heavy rewrites in production. The right platform provides a single, consistent API surface from experimentation to deployment, so what works in development works in production—with scale, security, and governance layered in automatically.

Open by design: No two organizations use the exact same stack. Developers may start with LangGraph for orchestration, LlamaIndex for data retrieval, or CrewAI for coordination. Others prefer Microsoft’s first-party frameworks like Semantic Kernel or AutoGen. A modern platform must support this diversity without forcing lock-in, while still offering enterprise-grade pathways for those who want them.

Interop by design: Agents are rarely self-contained. They must talk to tools, databases, and even other agents across different ecosystems. Proprietary protocols create silos and fragmentation. Open standards like the Model Context Protocol (MCP) and Agent-to-Agent (A2A) unlock collaboration across platforms, enabling a marketplace of interoperable tools and reusable agent skills.

One-stop integration fabric: An agent’s real value comes when it can take meaningful action: updating a record in Dynamics 365, triggering a workflow in ServiceNow, querying a SQL database, or posting to Teams. Developers shouldn’t have to rebuild connectors for every integration. A robust agent platform provides a broad library of prebuilt connectors and simple ways to plug into enterprise systems.

Built-in guardrails: Enterprises cannot afford agents that are opaque, unreliable, or non-compliant. Observability, evaluations, and governance must be woven into the development loop—not added as an afterthought. The ability to trace agent reasoning, run continuous evaluations, and enforce identity, security, and compliance policies is as critical as the models themselves.

How Azure AI Foundry delivers this experience 

Azure AI Foundry is designed to meet developers where they are, while giving enterprises the trust, security, and scale they need. It connects the dots across IDEs, frameworks, protocols, and business channels—making the path from prototype to production seamless.

Build where developers live: VS Code, GitHub, and Foundry

Developers expect to design, debug, and iterate AI agents in their daily tools—not switch into unfamiliar environments. Foundry integrates deeply with both VS Code and GitHub to support this flow.

VS Code extension for Foundry: Developers can create, run, and debug agents locally with direct connection to Foundry resources. The extension scaffolds projects, provides integrated tracing and evaluation, and enables one-click deployment to Foundry Agent Service—all inside the IDE they already use.

Model Inference API: With a single, unified inference endpoint, developers can evaluate performance across models and swap them without rewriting code. This flexibility accelerates experimentation while future-proofing applications against a fast-moving model ecosystem.
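The idea is easiest to see in code. The sketch below uses a simplified, hypothetical client (not the actual Azure AI Inference SDK; class and method names are invented) to show why a single inference endpoint makes a model swap a configuration change rather than a rewrite:

```python
# Hypothetical stand-in for a unified inference client. The point: application
# code passes the model name as data, so swapping models needs no code changes.

from dataclasses import dataclass

@dataclass
class ChatResponse:
    model: str
    content: str

class UnifiedInferenceClient:
    """One client, one endpoint; the model is just a parameter."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def complete(self, model: str, messages: list[dict]) -> ChatResponse:
        # A real client would POST to the endpoint; this stub echoes deterministically.
        last_user = next(m["content"] for m in reversed(messages)
                         if m["role"] == "user")
        return ChatResponse(model=model, content=f"[{model}] {last_user}")

client = UnifiedInferenceClient("https://example-resource.services.ai.azure.com")
messages = [{"role": "user", "content": "Summarize our Q3 results."}]

# Swapping models is a one-line change, not a rewrite:
for model in ("gpt-4o-mini", "mistral-large"):
    reply = client.complete(model=model, messages=messages)
    print(reply.model, "->", reply.content)
```

Because the model name never leaks into the calling code's structure, evaluation harnesses can loop over candidate models against the same prompts and compare results side by side.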

GitHub Copilot and the coding agent: Copilot has grown beyond autocomplete into an autonomous coding agent that can take on issues, spin up a secure runner, and generate a pull request, a sign that agentic AI development is becoming a normal part of the developer loop. Used alongside Azure AI Foundry, Copilot can generate agent code while pulling in the Foundry models, agent runtime, and observability tools needed to build, deploy, and monitor production-ready agents.

Use your frameworks

Agents are not one-size-fits-all, and developers often start with the frameworks they know best. Foundry embraces this diversity:

First-party frameworks: Foundry supports both Semantic Kernel and AutoGen, with a convergence into a modern unified framework coming soon. This future-ready framework is designed for modularity, enterprise-grade reliability, and seamless deployment to Foundry Agent Service.

Third-party frameworks: Foundry Agent Service integrates directly with CrewAI, LangGraph, and LlamaIndex, enabling developers to orchestrate multi-turn, multi-agent conversations across platforms. This ensures you can work with your preferred OSS ecosystem while still benefiting from Foundry’s enterprise runtime.

Interoperability with open protocols

Agents don’t live in isolation—they need to interoperate with tools, systems, and even other agents. Foundry supports open protocols by default:

MCP: Foundry Agent Service allows agents to call any MCP-compatible tools directly, giving developers a simple way to connect external systems and reuse tools across platforms.
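To make the pattern concrete, here is a minimal, self-contained sketch of the tool-calling model that MCP standardizes: tools are advertised with names and JSON-schema inputs, and an agent dispatches calls by name. The registry and function names are hypothetical; the real protocol runs over JSON-RPC via the official MCP SDKs.

```python
# Illustrative sketch of the MCP tool pattern: advertise tools with
# machine-readable schemas, then call them by name with JSON arguments.

TOOLS = {}

def tool(name: str, description: str, input_schema: dict):
    """Register a function as a callable tool with a JSON-schema contract."""
    def register(fn):
        TOOLS[name] = {"description": description,
                       "inputSchema": input_schema, "fn": fn}
        return fn
    return register

@tool("get_weather", "Current weather for a city",
      {"type": "object",
       "properties": {"city": {"type": "string"}},
       "required": ["city"]})
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny"}  # stub data

def list_tools() -> list[dict]:
    """What a 'tools/list' response conveys: names and schemas, no code."""
    return [{"name": n, "description": t["description"],
             "inputSchema": t["inputSchema"]} for n, t in TOOLS.items()]

def call_tool(name: str, arguments: dict) -> dict:
    """What a 'tools/call' request does: dispatch by name with JSON args."""
    return TOOLS[name]["fn"](**arguments)

print(list_tools()[0]["name"])
print(call_tool("get_weather", {"city": "Seattle"}))
```

Because the contract is the schema rather than the implementation, any MCP-aware agent can discover and reuse the same tool without bespoke glue code.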

A2A: Semantic Kernel supports A2A, implementing the protocol to enable agents to collaborate across different runtimes and ecosystems. With A2A, multi-agent workflows can span vendors and frameworks, unlocking scenarios like specialist agents coordinating to solve complex problems.

Ship where the business runs

Building an agent is just the first step—impact comes when users can access it where they work. Foundry makes it easy to publish agents to both Microsoft and custom channels:

Microsoft 365 and Copilot: Using the Microsoft 365 Agents SDK, developers can publish Foundry agents directly to Teams, Microsoft 365 Copilot, BizChat, and other productivity surfaces.

Custom apps and APIs: Agents can be exposed as REST APIs, embedded into web apps, or integrated into workflows using Logic Apps and Azure Functions—with thousands of prebuilt connectors to SaaS and enterprise systems.

Observe and harden

Reliability and safety can’t be bolted on later—they must be integrated into the development loop. As we explored in the previous blog, observability is essential for delivering AI that is not only effective, but also trustworthy. Foundry builds these capabilities directly into the developer workflow:

Tracing and evaluation tools to debug, compare, and validate agent behavior before and after deployment.

CI/CD integration with GitHub Actions and Azure DevOps, enabling continuous evaluation and governance checks on every commit.

Enterprise guardrails—from networking and identity to compliance and governance—so that prototypes can scale confidently into production.

Why this matters now

Developer experience is the new productivity moat. Enterprises need to enable their teams to build and deploy AI agents quickly, confidently, and at scale. Azure AI Foundry delivers an open, modular, and enterprise-ready path—meeting developers in GitHub and VS Code, supporting both open-source and first-party frameworks, and ensuring agents can be deployed where users and data already live.

With Foundry, the path from prototype to production is smoother, faster, and more secure—helping organizations innovate at the speed of AI.

What’s next

In Part 5 of the Agent Factory series, we’ll explore how agents connect and collaborate at scale. We’ll demystify the integration landscape—from agent-to-agent collaboration with A2A, to tool interoperability with MCP, to the role of open standards in ensuring agents can work across apps, frameworks, and ecosystems. Expect practical guidance and reference patterns for building truly connected agent systems.

Did you miss these posts in the series?

Agent Factory: The new era of agentic AI—common use cases and design patterns.

Agent Factory: Building your first AI agent with the tools to deliver real-world outcomes.

Agent Factory: Top 5 agent observability best practices for reliable AI.

Azure AI Foundry
Build AI agents that automate tasks, enhance user experiences, and deliver results.

Learn more

The post Agent Factory: From prototype to production—developer tools and rapid agent development appeared first on Microsoft Azure Blog.
Source: Azure

Microsoft Cost Management updates—July & August 2025

Whether you’re a new student, a thriving startup, or the largest enterprise, you have financial constraints. You need to know what you’re spending, where your money is being spent, and how to plan for the future. Nobody wants a surprise bill—and that’s where Microsoft Cost Management comes in. We’re always looking for ways to learn more about your challenges and help you do more with less.

See the newest features on Microsoft Cost Management

Here are a few of the latest improvements and updates (July and August 2025):

Service Principal support for Partner Admin Reader role (EA indirect partners)

Azure Pricing Calculator: Tip of the Month

New ways to save money with Microsoft Cloud

New videos and learning opportunities

Documentation updates

Let’s dig into the details.

Service Principal support for Partner Admin Reader role (EA indirect partners)

We’re excited to announce that Azure now supports assigning the Partner Admin Reader role to Service Principals. This enhancement empowers Enterprise Agreement indirect partners (CSPs who manage customer Azure costs) to programmatically access cost data across their customers’ enrollments under their Partner Customer Number (PCN) via Azure Active Directory applications—without relying on interactive user accounts.

Why this matters:

Managing cloud costs across multiple customers is complex and can be error-prone when it relies on manual exports or shared credentials. Partners need secure, scalable, and automated access to cost data to integrate insights into their tools and optimize spend in real time.

With these enhancements, partners can now:

Automate cost data retrieval securely using Azure Active Directory service principals (no shared user credentials).

Integrate Cost Management data into partner billing tools, dashboards, or workflows using APIs.

Maintain strong governance and control access to billing scopes with Azure Role-Based Access Control.

Enable near real-time monitoring, invoice reconciliation, and proactive cost optimization across multiple customers.

To get started, learn more about how to assign Enterprise Agreement roles to service principals.

Azure Pricing Calculator: Tip of the Month 

When working with estimates in the Azure Pricing Calculator that include multiple services, scrolling through all the details can become overwhelming. To simplify your view, click the collapse button on your estimate. This instantly minimizes the detailed configuration for all services in your estimate, leaving just the summary line visible.

Why this helps:

Reduces unnecessary scrolling when managing large estimates.

Makes it easier to focus on the services you want to review or adjust.

Keeps your workspace clean and organized, especially when sharing estimates with others.

Try collapsing services the next time you build a complex estimate. It’s a small trick that makes a big difference in navigating your pricing scenarios!

New ways to save money with Microsoft Cloud

Here are new and updated offers you might be interested in for cost savings and optimization from July and August 2025:

Generally available: Azure Firewall ingestion-time transformation for cost-efficient logging. Now you can filter or transform Azure Firewall logs before they’re ingested into Log Analytics, reducing the amount of data stored and lowering your logging costs without losing critical security insights. 

Public preview: Azure Storage Mover–free Amazon Web Services S3-to-Azure Blob migration. Now you can move data from Amazon Web Services S3 to Azure Blob Storage securely and at no additional cost using Azure Storage Mover. This fully managed service simplifies multi-cloud or full migration scenarios without third-party tools, reducing complexity and expenses.

New videos and learning opportunities

We added several new videos for your viewing and learning. Whether you are new to Cost Management or need a refresher, you should find these videos helpful:

Managing Access to Cost Management Data

How to use the Azure Copilot to understand your costs

Configuring Cost Allocation Rules and Tags

Documentation updates

The Cost Management and Billing documentation continues to evolve. Here are some of our new and updated documents from July and August:

Pay your Microsoft Customer Agreement or Microsoft Online Subscription Program bill: Updated on July 2 to add partial payment options and India-specific payment methods.

Manage Azure Reservations: Updated on July 8 to clarify reservation scope changes, splitting reservations, and limitations on billing subscription changes.

Charge back Azure saving plan costs: Published on July 9 to explain chargeback/showback for savings plans using amortized cost and API queries.

Calculate Enterprise Agreement (EA) savings plan cost savings: Published on July 9 to guide EA customers in calculating savings plan benefits using amortized usage data.

Manage Azure costs with automation: Updated on July 10 to add best practices for Cost Details API, automation workflows, and handling large datasets.

Understand and work with Cost Management scopes: Updated on July 25 to clarify Role-Based Access Control vs. billing scopes and role requirements for cost visibility.

Manage a Microsoft Azure Consumption Commitment resource: Published on August 14 to describe MACC resources, movement between subscriptions, and deletion rules.

Set up your billing account for a Microsoft Customer Agreement: Updated on August 14 to detail EA-to-MCA transition steps, prerequisites, and common migration issues.

Manage a Microsoft Azure credit resource under a subscription: Published on August 19 to introduce Azure Credit resources for MCA accounts and explain moving or deleting them.

Permissions to view and manage Azure reservations: Updated on August 21 to expand guidance on Role-Based Access Control roles, billing roles, and delegation for reservation access.

Want to keep an eye on all documentation updates? Check out the change history of the Cost Management and Billing documentation in the Azure Docs repository on GitHub. If you see something missing, select Edit at the top of the document and submit a quick pull request. You can also submit a GitHub issue. We welcome and appreciate all contributions!

What’s next for Cost Management

These are just a few of the updates from the last two months. Don’t forget to check out previous Microsoft Cost Management updates for more tips and features. We’re always listening and making continuous improvements based on your feedback—please keep it coming!

Follow the team, share your ideas, and get involved:

Submit ideas and vote on requests in the Cost Management feedback forum.

Watch and subscribe to the Microsoft Cost Management YouTube channel.

Stay tuned for more in next month’s update.

Microsoft Cost Management
Manage your cloud costs with confidence.

Get started >

The post Microsoft Cost Management updates—July & August 2025 appeared first on Microsoft Azure Blog.

Azure mandatory multifactor authentication: Phase 2 starting in October 2025

As cyberattacks become increasingly frequent, sophisticated, and damaging, safeguarding your digital assets has never been more critical, and at Microsoft, your security is our top priority. Microsoft research shows that multifactor authentication (MFA) can block more than 99.2% of account compromise attacks, making it one of the most effective security measures available.

As announced in August 2024, Azure started to implement mandatory MFA for Azure Public Cloud sign-ins. By enforcing MFA for Azure sign-ins, we aim to provide you with the best protection against cyber threats as part of Microsoft’s commitment to enhance security for all customers, taking one step closer to a more secure future.

As previously announced, Azure MFA enforcement was rolled out gradually in phases to provide customers with enough time to plan and execute their implementations:

Phase 1: MFA enforcement on Azure Portal, Microsoft Entra admin center, and Intune admin center sign-ins.

Phase 2: Gradual enforcement of the MFA requirement for users performing Azure resource management operations through any client, including but not limited to the Azure Command-Line Interface (CLI), Azure PowerShell, the Azure Mobile App, REST APIs, Azure Software Development Kit (SDK) client libraries, and Infrastructure as Code (IaC) tools.

We are proud to announce that multifactor enforcement for Azure Portal sign-ins was rolled out for 100% of Azure tenants in March 2025. Now, Azure is announcing the start of Phase 2 MFA enforcement at the Azure Resource Manager layer, starting October 1, 2025. Phase 2 enforcement will be gradually applied across Azure tenants through Azure Policy, following Microsoft safe deployment practices.

This week, Microsoft began sending notices to all Microsoft Entra Global Administrators by email and through Azure Service Health notifications, announcing the enforcement start date and how to prepare for the upcoming MFA enforcement.

Prepare for mandatory MFA enforcement

Customer impact

Users will be required to authenticate with MFA before performing resource management operations. Workload identities, such as managed identities and service principals, aren’t impacted by either phase of this MFA enforcement.

Learn more about the scope of enforcement.

How to prepare

1. Enable MFA for your users

To ensure your users can perform resource management actions, enable MFA for your users by October 1, 2025. To identify which users in your environment are set up for mandatory MFA, follow these steps. 

2. Understand potential impact

To understand potential impact ahead of Phase 2 enforcement, assign built-in Azure Policy definitions to block resource management operations if the user has not authenticated with MFA.

Customers can gradually apply this enforcement across different resource hierarchy scopes, resource types, or regions.

3. Update your Azure CLI and PowerShell clients

For the best compatibility experience, users in your tenant should use Azure CLI version 2.76 and Azure PowerShell version 14.3 or later.
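If you want to verify clients programmatically, the check reduces to a version-tuple comparison. The helper below is a generic sketch of that minimum-version test (Azure CLI 2.76, Azure PowerShell 14.3); wiring it to the actual output of `az version` or your PowerShell module inventory is an assumption left to the reader.

```python
# Generic minimum-version check: compare dotted version strings numerically,
# not lexically ("2.9" < "2.76" as versions, but not as plain strings).

def parse_version(v: str) -> tuple[int, ...]:
    """Turn '2.76.0' into (2, 76, 0) for correct numeric comparison."""
    return tuple(int(part) for part in v.split("."))

def meets_minimum(installed: str, minimum: str) -> bool:
    return parse_version(installed) >= parse_version(minimum)

print(meets_minimum("2.76.0", "2.76"))   # Azure CLI: OK
print(meets_minimum("14.2.1", "14.3"))   # Azure PowerShell: too old
```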

Next steps for multifactor authentication for Azure sign-in

To ensure your users can perform resource management actions, enable MFA for your users by October 1, 2025. 

To understand the potential impact, apply a built-in Azure Policy definition in audit or enforcement mode.

For the best compatibility experience, users in your tenant should use Azure CLI version 2.76 and Azure PowerShell version 14.3 or later.

If you can’t enable MFA for your tenant by October 1, 2025, the Global Administrator for your tenant can postpone the enforcement date through Azure Portal.

Keep an eye out for further communications through the previously communicated notification channels.

MFA for Azure sign-in
Prepare for Phase 2 of multifactor authentication enforcement.

Learn more

The post Azure mandatory multifactor authentication: Phase 2 starting in October 2025 appeared first on Microsoft Azure Blog.

Agent Factory: Top 5 agent observability best practices for reliable AI

This blog post is the third in a six-part series called Agent Factory, which shares best practices, design patterns, and tools to help guide you through adopting and building agentic AI.

Seeing is knowing—the power of agent observability

As agentic AI becomes more central to enterprise workflows, ensuring reliability, safety, and performance is critical. That’s where agent observability comes in. Agent observability empowers teams to:

Detect and resolve issues early in development.

Verify that agents uphold standards of quality, safety, and compliance.

Optimize performance and user experience in production.

Maintain trust and accountability in AI systems.

With the rise of complex, multi-agent and multi-modal systems, observability is essential for delivering AI that is not only effective, but also transparent, safe, and aligned with organizational values. Observability empowers teams to build with confidence and scale responsibly by providing visibility into how agents behave, make decisions, and respond to real-world scenarios across their lifecycle.

Learn more about building agentic AI in Azure AI Foundry

What is agent observability?

Agent observability is the practice of achieving deep, actionable visibility into the internal workings, decisions, and outcomes of AI agents throughout their lifecycle—from development and testing to deployment and ongoing operation. Key aspects of agent observability include:

Continuous monitoring: Tracking agent actions, decisions, and interactions in real time to surface anomalies, unexpected behaviors, or performance drift.

Tracing: Capturing detailed execution flows, including how agents reason through tasks, select tools, and collaborate with other agents or services. This helps answer not just “what happened,” but “why and how did it happen?”

Logging: Recording agent decisions, tool calls, and internal state changes to support debugging and behavior analysis in agentic AI workflows.

Evaluation: Systematically assessing agent outputs for quality, safety, compliance, and alignment with user intent—using both automated and human-in-the-loop methods.

Governance: Enforcing policies and standards to ensure agents operate ethically, safely, and in accordance with organizational and regulatory requirements.

Traditional observability vs agent observability

Traditional observability relies on three foundational pillars: metrics, logs, and traces. These provide visibility into system performance, help diagnose failures, and support root-cause analysis. They are well-suited for conventional software systems where the focus is on infrastructure health, latency, and throughput.

However, AI agents are non-deterministic and introduce new dimensions—autonomy, reasoning, and dynamic decision making—that require a more advanced observability framework. Agent observability builds on traditional methods and adds two critical components: evaluations and governance. Evaluations help teams assess how well agents resolve user intent, adhere to tasks, and use tools effectively. Agent governance can ensure agents operate safely, ethically, and in compliance with organizational standards.

This expanded approach enables deeper visibility into agent behavior—not just what agents do, but why and how they do it. It supports continuous monitoring across the agent lifecycle, from development to production, and is essential for building trustworthy, high-performing AI systems at scale.

Azure AI Foundry Observability provides end-to-end agent observability

Azure AI Foundry Observability is a unified solution for evaluating, monitoring, tracing, and governing the quality, performance, and safety of your AI systems end to end in Azure AI Foundry—all built into your AI development loop. From model selection to real-time debugging, Foundry Observability capabilities empower teams to ship production-grade AI with confidence and speed. It’s observability, reimagined for the enterprise AI era.

With built-in capabilities like the Agents Playground evaluations, Azure AI Red Teaming Agent, and Azure Monitor integration, Foundry Observability brings evaluation and safety into every step of the agent lifecycle. Teams can trace each agent flow with full execution context, simulate adversarial scenarios, and monitor live traffic with customizable dashboards. Seamless CI/CD integration enables continuous evaluation on every commit, and governance integrations with Microsoft Purview, Credo AI, and Saidot help enable alignment with regulatory frameworks like the EU AI Act—making it easier to build responsible, production-grade AI at scale.

Five best practices for agent observability

1. Pick the right model using benchmark driven leaderboards

Every agent needs a model and choosing the right model is foundational for agent success. While planning your AI agent, you need to decide which model would be the best for your use case in terms of safety, quality, and cost.

You can pick the best model by either evaluating the model on your own data or use Azure AI Foundry’s model leaderboards to compare foundation models out-of-the-box by quality, cost, and performance—backed by industry benchmarks. With Foundry model leaderboards, you can find model leaders in various selection criteria and scenarios, visualize trade-offs among the criteria (e.g., quality vs cost or safety), and dive into detailed metrics to make confident, data-driven decisions.

Azure AI Foundry’s model leaderboards gave us the confidence to scale client solutions from experimentation to deployment. Comparing models side by side helped customers select the best fit—balancing performance, safety, and cost with confidence.
—Mark Luquire, EY Global Microsoft Alliance Co-Innovation Leader, Managing Director, Ernst & Young, LLP*

2. Evaluate agents continuously in development and production

Agents are powerful productivity assistants. They can plan, make decisions, and execute actions. Agents typically first reason through user intents in conversations, select the correct tools to call and satisfy the user requests, and complete various tasks according to their instructions. Before deploying agents, it’s critical to evaluate their behavior and performance.

Azure AI Foundry makes agent evaluation easier with several agent evaluators supported out-of-the-box, including Intent Resolution (how accurately the agent identifies and addresses user intentions), Task Adherence (how well the agent follows through on identified tasks), Tool Call Accuracy (how effectively the agent selects and uses tools), and Response Completeness (whether the agent’s response includes all necessary information). Beyond agent evaluators, Azure AI Foundry also provides a comprehensive suite of evaluators for broader assessments of AI quality, risk, and safety. These include quality dimensions such as relevance, coherence, and fluency, along with comprehensive risk and safety checks that assess for code vulnerabilities, violence, self-harm, sexual content, hate, unfairness, indirect attacks, and the use of protected materials. The Azure AI Foundry Agents Playground brings these evaluation and tracing tools together in one place, letting you test, debug, and improve agentic AI efficiently.
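To show the shape of one such metric, here is a deliberately simplified, set-overlap version of Tool Call Accuracy: compare the tools an agent actually called against the tools a reference run labels as expected. Foundry's built-in evaluators are considerably richer (and typically model-graded); this sketch only illustrates what the evaluator measures.

```python
# Simplified Tool Call Accuracy: fraction of expected tool calls the agent
# actually made. Real evaluators also judge argument quality and ordering.

def tool_call_accuracy(called: list[str], expected: list[str]) -> float:
    """Return 1.0 when nothing was expected; otherwise the hit fraction."""
    if not expected:
        return 1.0
    hits = sum(1 for t in expected if t in called)
    return hits / len(expected)

run = {"called": ["search_docs", "send_email"],
       "expected": ["search_docs", "create_ticket"]}
print(tool_call_accuracy(run["called"], run["expected"]))  # 0.5
```

Scoring agent runs as numbers like this is what lets the later steps (CI gates, production monitoring) treat quality as something you can threshold and alert on.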

The robust evaluation tools in Azure AI Foundry help our developers continuously assess the performance and accuracy of our AI models, including meeting standards for coherence, fluency, and groundedness.
—Amarender Singh, Director, AI, Hughes Network Systems

3. Integrate evaluations into your CI/CD pipelines

Automated evaluations should be part of your CI/CD pipeline so every code change is tested for quality and safety before release. This approach helps teams catch regressions early and can help ensure agents remain reliable as they evolve.

Azure AI Foundry integrates with your CI/CD workflows using GitHub Actions and Azure DevOps extensions, enabling you to auto-evaluate agents on every commit, compare versions using built-in quality, performance, and safety metrics, and leverage confidence intervals and significance tests to support decisions—helping to ensure that each iteration of your agent is production ready.
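One hedged sketch of that statistical step: given per-example evaluation scores for the current and candidate agent versions, bootstrap a confidence interval on the mean improvement and promote the candidate only when the interval excludes zero. The scores and thresholds below are invented for illustration; Foundry's own comparison tooling handles this for you.

```python
# Bootstrap a 95% confidence interval for mean(candidate) - mean(current)
# by independent resampling, then gate promotion on the interval being > 0.

import random

def bootstrap_ci_diff(a, b, n_boot=2000, alpha=0.05, seed=0):
    """CI for mean(b) - mean(a) via independent bootstrap resampling."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sa = [rng.choice(a) for _ in a]
        sb = [rng.choice(b) for _ in b]
        diffs.append(sum(sb) / len(sb) - sum(sa) / len(sa))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

current   = [0.62, 0.70, 0.65, 0.68, 0.60, 0.66, 0.64, 0.69]  # made-up scores
candidate = [0.78, 0.82, 0.75, 0.80, 0.77, 0.81, 0.79, 0.76]

lo, hi = bootstrap_ci_diff(current, candidate)
promote = lo > 0  # improvement is significant at ~95% confidence
print(f"95% CI for improvement: [{lo:.3f}, {hi:.3f}] -> promote={promote}")
```

Gating on the interval rather than the raw mean difference keeps a lucky evaluation run on a small test set from promoting a regression.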

We’ve integrated Azure AI Foundry evaluations directly into our GitHub Actions workflow, so every code change to our AI agents is automatically tested before deployment. This setup helps us quickly catch regressions and maintain high quality as we iterate on our models and features.
—Justin Layne Hofer, Senior Software Engineer, Veeam

4. Scan for vulnerabilities with AI red teaming before production

Security and safety are non-negotiable. Before deployment, proactively test agents for security and safety risks by simulating adversarial attacks. Red teaming helps uncover vulnerabilities that could be exploited in real-world scenarios, strengthening agent robustness.

Azure AI Foundry’s AI Red Teaming Agent automates adversarial testing, measuring risk and generating readiness reports. It enables teams to simulate attacks and validate both individual agent responses and complex workflows for production readiness.

Accenture is already testing the Microsoft AI Red Teaming Agent, which simulates adversarial prompts and detects model and application risk posture proactively. This tool will help validate not only individual agent responses, but also full multi-agent workflows in which cascading logic might produce unintended behavior from a single adversarial user. Red teaming lets us simulate worst-case scenarios before they ever hit production. That changes the game.
—Nayanjyoti Paul, Associate Director and Chief Azure Architect for Gen AI, Accenture

5. Monitor agents in production with tracing, evaluations, and alerts

Continuous monitoring after deployment is essential to catch issues, performance drift, or regressions in real time. Using evaluations, tracing, and alerts helps maintain agent reliability and compliance throughout its lifecycle.

Azure AI Foundry observability enables continuous agentic AI monitoring through a unified dashboard powered by Azure Monitor Application Insights and Azure Workbooks. This dashboard provides real-time visibility into performance, quality, safety, and resource usage, allowing you to run continuous evaluations on live traffic, set alerts to detect drift or regressions, and trace every evaluation result for full-stack observability. With seamless navigation to Azure Monitor, you can customize dashboards, set up advanced diagnostics, and respond swiftly to incidents—helping to ensure you stay ahead of issues with precision and speed.
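A minimal sketch of the alerting idea, assuming you already compute a per-interaction evaluation score on live traffic: keep a rolling window of scores and fire an alert when the rolling mean falls more than a tolerance below the baseline established at deployment. All numbers and names are illustrative.

```python
# Rolling-window drift monitor: alert when live quality drops below
# baseline - tolerance. A real setup would route alerts through Azure Monitor.

from collections import deque

class DriftMonitor:
    def __init__(self, baseline: float, tolerance: float, window: int = 50):
        self.baseline = baseline
        self.tolerance = tolerance
        self.scores = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Record one live evaluation score; return True if an alert fires."""
        self.scores.append(score)
        rolling_mean = sum(self.scores) / len(self.scores)
        return rolling_mean < self.baseline - self.tolerance

monitor = DriftMonitor(baseline=0.85, tolerance=0.05, window=5)
for s in (0.86, 0.84, 0.85, 0.70, 0.68):  # quality starts degrading
    alert = monitor.record(s)
print("alert fired:", alert)
```

The rolling window smooths over individual bad interactions, so the alert reflects sustained drift rather than one-off noise.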

Security is paramount for our large enterprise customers, and our collaboration with Microsoft allays any concerns. With Azure AI Foundry, we have the desired observability and control over our infrastructure and can deliver a highly secure environment to our customers.
—Ahmad Fattahi, Sr. Director, Data Science, Spotfire

Get started with Azure AI Foundry for end-to-end agent observability

To summarize, traditional observability includes metrics, logs, and traces. Agent Observability needs metrics, traces, logs, evaluations, and governance for full visibility. Azure AI Foundry Observability is a unified solution for agent governance, evaluation, tracing, and monitoring—all built into your AI development lifecycle. With tools like the Agents Playground, smooth CI/CD, and governance integrations, Azure AI Foundry Observability empowers teams to ensure their AI agents are reliable, safe, and production ready. Learn more about Azure AI Foundry Observability and get full visibility into your agents today!

What’s next

In part four of the Agent Factory series, we’ll focus on how you can go from prototype to production faster with developer tools and rapid agent development.

Did you miss these posts in the series?

Agent Factory: The new era of agentic AI—common use cases and design patterns.

Agent Factory: Building your first AI agent with the tools to deliver real-world outcomes.

Azure AI Foundry
Build adaptable AI agents that automate tasks and enhance user experiences.

Learn more

*The views reflected in this publication are the views of the speaker and do not necessarily reflect the views of the global EY organization or its member firms.
The post Agent Factory: Top 5 agent observability best practices for reliable AI appeared first on Microsoft Azure Blog.
Source: Azure

Transforming scientific discovery with Microsoft Azure and NVIDIA

From startups to industry-leading enterprises, organizations have an insatiable desire to experiment with how the world could work differently. They explore bold, new ways for society to operate and meet market demand by focusing their efforts on developing innovative products and services. With 78% of companies using AI in at least one business function,1 we’re now seeing AI adopted by organizations of every size and sector around the world. And startups that use AI have been reported to obtain funding 2.5 times more quickly than those without it.2

Organizations use AI as catalysts for change—taking risks, challenging the status quo, scaling niche ideas, and reshaping entire industries. In season 1 of the Catalyst series, the possibilities unfold through the lens of three startups using Microsoft Azure infrastructure and NVIDIA’s domain-specific models and accelerated computing technologies to advance science, health, and technology. From digital twins to drug discovery, meet the companies driving social and scientific progress through new methods that amplify human ingenuity and skill. 

Watch The Catalyst Series

Pangaea Data helps doctors close care gaps with AI 

Diagnosing and treating health conditions, whether rare or prevalent, can be difficult because patient records can span hundreds of pages and clinicians have limited time to review them thoroughly, leaving critical information buried and patients missed. Helping clinicians identify optimal clinical pathways faster leads to better care and improved patient outcomes. Pangaea Data set out to close care gaps by applying AI configured on clinical guidelines to emulate the process clinicians use to review electronic health records. This helps find untreated and under-treated patients across hard-to-diagnose conditions at the point of care, without disrupting clinicians’ existing workflows.

For example, the UK’s National Health Service (NHS) applied Pangaea’s platform to identify six times more untreated and under-treated patients with the hard-to-diagnose condition, cancer cachexia, which can help halve the cost of treatment. Similarly, a US health system deployed Pangaea’s platform to recognize $9 million in additional revenue each year by closing care gaps for only one condition. 

Pangaea’s platform runs on Microsoft Azure‘s globally compliant and secure cloud infrastructure and is integrated with NVIDIA’s comprehensive AI stack. By combining high-performance GPU-accelerated computing with optimized software frameworks and tools, Pangaea’s platform delivers timely and dependable insights trusted by clinicians and health care providers. This helps ensure equitable patient care and improve outcomes in a financially sustainable and scalable manner.

You’ve got to make sure that information is delivered to the physician in a compliant manner. And this is where Azure does a phenomenal job. The fact that they have already done the groundwork to be compliant across all different countries and territories. It immediately instills that confidence that you are a known entity, a trusted entity, beyond just the technology.
–Vibhor Gupta, PhD, Founder and CEO of Pangaea Data

Basecamp Research connects biotechnology with biodiversity 

Basecamp Research sees nature as the key to scientific discovery. They have built one of the world’s largest biological databases to support life science researchers and companies working on drug discovery, product development, and biological research. This database of more than 10 billion novel proteins includes valuable tools such as gene editing proteins that have been collected from diverse natural environments around the world. The dataset is then used to train AI models, like Evolutionary Scale Modeling (ESM), that can help identify new targets, design novel proteins, and predict metabolic pathways, supporting faster development of innovative medicines and biological products.

AI has enabled Basecamp Research to set new benchmarks in control, novelty, and efficiency. Increasing their database’s annual growth rate tenfold required scalable computing resources, advanced models, and extensive data. By using Microsoft Azure infrastructure to operate their models and employing NVIDIA’s BioNeMo framework for training, they significantly increased operational speed and achieved scalability.

We’ve been leveraging Azure for this quite a lot, leveraging Azure Kubernetes Services (AKS) and the container registry to orchestrate this and to basically massively scale these annotation pipelines. NVIDIA comes in at the next stage when it comes to utilizing these datasets for model training.
–Glen Gowers, Co-Founder of Basecamp Research

Global Objects digitizes physical artifacts across industries

Microsoft sees AI digital twins as a foundational technology with many applications that can help bridge the physical and digital worlds. Digital twins provide virtual replicas of physical assets and systems, whether it’s a piece of artwork in a gallery, a prop in a film, or a simulation that can be used to expedite the production process in manufacturing. Global Objects shared how they’re using advanced AI and 3D scanning technologies to create photorealistic digital twins of props and objects across different locations. The company set out to address the challenge of producing high-quality photorealistic content by training AI to map physical environments with precision, producing the highest-fidelity datasets.

By blending artistic skills with advanced technology, they were able to make an authentic digital version of novel objects in the cloud. To accomplish this, they gathered large datasets of varied objects to generate new content that accurately replicates the exact specifications of each object digitally. Digitization of the objects can be used not just for preserving art or making films, but also for creating new products and services across gaming, robotics, and healthcare.

The most important thing around all of this is processing. The files, the size, the speed by which we’re operating all require the best systems, servers, operators, GPUs that are in the market today. The relationship with Microsoft and NVIDIA is so powerful not only for us learning internally on our own systems and operations but engaging with our clients and ensuring we always have uptime, ensuring that we can always deliver with the most speed, efficiency, and fidelity as possible.
–Jesse Redniss, Chief Strategy Officer of Global Objects

Become a catalyst for change with Microsoft Azure and NVIDIA

Microsoft and NVIDIA offer industry-leading development tools, frameworks, and pre-trained models that are designed to support developers, researchers, and business leaders alike. By combining Azure’s trusted cloud foundation with NVIDIA’s cutting-edge compute capabilities, companies can deploy AI faster, iterate smarter, and stay ahead of the competition. The examples you’ve seen today are catalysts for a more inclusive and intelligent future, driving progress across diverse fields and industries. Watch The Catalyst Series.

Learn more about Azure AI infrastructure

1 McKinsey, The state of AI: How organizations are rewiring to capture value, March 12, 2025.

2 Crunchbase, The state of startups in mid-2025 in 8 charts: Global funding and M&A surge as AI fervor continues, July 22, 2025.

The post Transforming scientific discovery with Microsoft Azure and NVIDIA appeared first on Microsoft Azure Blog.

Protecting Azure Infrastructure from silicon to systems

At Microsoft, secure design begins at the foundation of our computing stack—the silicon level—and extends through every layer of the cloud. Since launching the Secure Future Initiative (SFI), a company-wide commitment to security, we have continued to prioritize delivering products and solutions that are secure by design, secure by default, and secure in operation.

SFI sits at the core of Azure’s approach to designing hardware systems, reshaping the architecture, deployment, and operations of our physical cloud and AI infrastructure. Azure is redefining workload security from the datacenter to the edge. Our comprehensive approach includes Azure Boost, which isolates control and data planes for virtual machines; the Azure Integrated HSM, a custom security chip providing FIPS 140-3 Level 3 key protection within our servers; and confidential computing, which protects data in-use through hardware-based trusted execution environments (TEEs). 

Microsoft has also made ongoing efforts to advance systematic security audits and contribute to the open-source community through Caliptra—a silicon root-of-trust designed to anchor security in hardware. Combining these initiatives with technologies such as confidential computing and code transparency services, our aim is to ensure that every component in the supply chain can be securely verified. 

Figure 1: Building blocks of Azure’s hardware security architecture.

Purpose built silicon for defense-in-depth 

Azure Boost serves as the system’s security controller, managing secure access to Azure. Only systems with a verified and approved security configuration are granted access. 

To enhance security and performance, Azure Boost offloads control plane services from the host CPU to a dedicated Azure Boost system controller. This setup creates a hardware separation between Azure’s control plane—running on the Azure Boost controller—and customer workloads—running on the CPU—ensuring strong isolation and protection.

Azure Integrated HSM is a server-local Hardware Security Module (HSM) for high-assurance workloads. It is designed to meet the stringent Federal Information Processing Standards (FIPS) 140-3 Level 3 security requirements, which call for strong isolation, tamper-resistant hardware, identity-based authentication, and automatic zeroization. Azure Integrated HSM protects keys in use by ensuring keys always remain within the bounds of the HSM.

Unlike centralized remote HSM services, Azure Integrated HSM eliminates network roundtrips for key operations and avoids the need to release keys into the workload environment. Instead of relying on remote access, the Azure Integrated HSM is securely bound to the local workload and provides oracle-style key usage to authorized services within the local environment. 

Learn more about Azure Integrated HSM

Azure Datacenter Secure Control Module (DC-SCM) is a security and server control module that contains Hydra—a security-focused Board Management Controller (BMC) design with an integrated root of trust and hardware-based security protection on all management interfaces. This root of trust restricts unauthorized access to BMC firmware and ensures the firmware is authenticated and cryptographically measured.

Confidential Computing offers a spectrum of guarantees

Confidential computing makes use of hardware-based Trusted Execution Environments (TEEs) to protect workloads—such as virtual machines—from other system software, including the hypervisor. 

Microsoft, a founding member of the Confidential Computing Consortium, works closely with CPU and GPU manufacturers to design and integrate confidential computing technologies directly into their hardware. Earlier this year at the Confidential Computing Summit, we defined a spectrum of guarantees users can enable with confidential computing when they port their applications, including:

On by default: lift and shift existing applications with minimal change.

Build in confidential computing by designing services and applications that make deeper use of hardware-based protections.

Leverage transparent confidential computing to gain deeper insights into how confidential services interact and operate securely.

Figure 2: Confidential computing—spectrum of guarantees.

Azure has the most comprehensive portfolio of confidential computing solutions, including confidential virtual machines, containers, generative AI, and services like Azure Confidential Ledger, Azure Attestation, and Managed HSM—each designed to protect code and data throughout its lifecycle using hardware-backed security. 

Hardware Security Transparency

Caliptra is a hardware root of trust that plays a critical role in securing devices. It anchors the chain of trust directly in silicon, establishing foundational security properties that support the integrity of higher-level features. This foundation gives workloads the ability to verify the code and configuration of the underlying platform, enabling them to establish trust in the hardware platform.
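The chain-of-trust idea can be illustrated with a TPM/PCR-style "measure then extend" sketch: each boot stage is hashed into a measurement register before it runs, so the final register value commits to every stage in order. This is a conceptual illustration of what a silicon root of trust does, not Caliptra's actual implementation; the stage names are made up:

```python
import hashlib

def extend(register: bytes, component: bytes) -> bytes:
    """Extend a measurement register with the hash of the next boot
    component: new = SHA-256(old_register || SHA-256(component))."""
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()

# Measure a simple boot chain stage by stage.
register = b"\x00" * 32
for stage in [b"firmware-v1.2", b"bootloader-v3", b"os-kernel-v6.8"]:
    register = extend(register, stage)

# A verifier that knows the expected stage images recomputes the chain
# and compares: any modified or reordered stage changes the final value.
expected = b"\x00" * 32
for stage in [b"firmware-v1.2", b"bootloader-v3", b"os-kernel-v6.8"]:
    expected = extend(expected, stage)
print(register == expected)  # True
```

Because the extend operation is one-way and order-sensitive, a tampered bootloader cannot produce the same final measurement, which is what lets an attestation service detect it.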

Caliptra is a fully open-source silicon root of trust developed through a collaboration between Microsoft, AMD, Google, and NVIDIA. In April, we released Caliptra 2.0, which included Adams Bridge—an open-source accelerator designed for post-quantum resilient cryptography. This integration made Caliptra the first open-source root of trust to feature hardened post-quantum cryptography, driving broader adoption across the hardware ecosystem. 

Systematic Security Reviews play a critical role in protecting hardware infrastructure, which relies on both low-level hardware features and the firmware that runs on top of them. Ensuring firmware security involves rigorous code scanning, continuous security reviews, and hardware-based attestation. 

Historically, Microsoft conducted these reviews internally. To enhance transparency and extend security assurance beyond Microsoft, we partnered with Google and the Open Compute Project in 2023 to establish OCP SAFE—a framework for systematic security reviews.

Under OCP SAFE, approved Security Review Providers (SRPs) perform independent evaluations and issue verifiable endorsements of a manufacturer’s security compliance. These endorsements serve as trusted evidence, supporting secure device attestations and reinforcing confidence in the hardware supply chains. 

Enhancing security with Code Transparency Services

Code Transparency Services (CTS) is an immutable ledger technology built to meet the standards of Supply Chain Integrity, Transparency, and Trust (SCITT). It operates exclusively within confidential computing environments, enhancing trust in Azure’s hardware and firmware by ensuring that every component is verifiably secure.

CTS addresses key challenges in firmware provenance, integrity, and auditability across both first-party and third-party supply chains. When paired with a silicon root of trust like Caliptra, and supported by audits such as OCP-SAFE, CTS ensures that hardware and firmware are authorized, non-repudiable, and immutably auditable. 

Today, CTS is a central component in Azure’s confidential cloud services. In addition to Azure’s own usage, CTS will be available as a managed service—allowing Azure customers to create and operate their own transparency service instances. 

Staying secure with Microsoft

Microsoft’s Secure Future Initiative (SFI) provides a guiding framework for building secure and trustworthy cloud infrastructure. By embedding security into every layer—from silicon to systems to services—Azure takes a defense-in-depth approach to cloud security. Through innovations like Azure Boost, Azure Integrated HSM, and confidential computing, and through collaborative efforts such as Caliptra, OCP SAFE, and SCITT, Microsoft is not only securing today’s workloads but also laying the foundation for a more secure and transparent future. 

Azure Boost
Explore next-generation Azure infrastructure.

Learn more >

The post Protecting Azure Infrastructure from silicon to systems appeared first on Microsoft Azure Blog.

Microsoft is a Leader in the 2025 Gartner® Magic Quadrant™ for Cloud-Native Application Platforms 

We’re proud to announce that Microsoft has been named a Leader in the 2025 Gartner® Magic Quadrant™ for Cloud-Native Application Platforms for a second year in a row, and the furthest to the right in Completeness of Vision. We believe this recognition reflects our continued product innovation, seamless developer experience, and AI leadership, enabling customers to innovate faster with cloud-native apps and AI agents.

As AI reshapes the application landscape, Microsoft continues to lead with a comprehensive application platform that spans web apps, APIs, event-driven applications, serverless functions, containers, and AI agents backed by global scale and deep enterprise expertise. We’re committed to helping developers and customers innovate with AI while delivering scalable, cost-efficient operations for organizations of all sizes.

Read more about Magic Quadrant for Cloud-Native Application Platforms

A developer-first experience, built for productivity 

We continue to invest deeply in improving the developer experience across our application platform—offering choice, simplicity, and integration at every layer. Whether customers are building with containers, functions, APIs, or web frameworks, Azure provides a seamless and productive environment to accelerate from idea to impact. Azure offers: 

Azure App Service: An enterprise-grade platform-as-a-service (PaaS) for web apps, with support for .NET, Java, Node.js, Python, and PHP runtimes, Windows and Linux containers, custom domain management, and deep integration with GitHub and DevOps pipelines.

Azure Container Apps: A serverless container service ideal for microservices and event-driven workloads, now enhanced with scaling improvements, serverless GPU support, and Azure AI Foundry integration. 

Azure Functions: A powerful serverless compute platform for event-driven architectures. We recently added enhanced performance with the Flex Consumption plan, .NET 8 remote MCP server support, and improved support for long-running durable workflows. 

Azure Static Web Apps: A managed service for building full stack web applications with frameworks like React, Vue, Angular, and Blazor with fully managed global distribution of static content. 

Agentic DevOps: All of these services are integrated with GitHub Copilot and Visual Studio. GitHub Copilot has transformed the way we code, which is why 20 million developers rely on it to build faster. 50 million developers are actively using Visual Studio and Visual Studio Code each month. With Agentic DevOps using GitHub Copilot and Azure SRE Agent, developers can seamlessly go from code to cloud scale faster, accelerating every stage of their application lifecycle.

Powering the next wave of AI-native apps 

AI is fundamentally changing how software is built and experienced. From personalized digital experiences to autonomous agents, applications increasingly rely on embedded intelligence, real-time inference, and orchestration of complex workflows.

Microsoft is leading the way in AI innovation with Azure AI Foundry—unifying agents, models, and tools with built-in enterprise-readiness capabilities such as tracing, monitoring, and evaluations. Azure AI Foundry offers a rich catalog of AI models such as OpenAI’s GPT-5, GPT-4o, Meta’s Llama, Microsoft’s Phi-4, and more. AI Foundry integrates with our application platform services like Azure Container Apps and Azure Functions to enable developers to build agentic applications using composable, event-driven, and containerized building blocks. 

In the past year, we introduced key innovations across our application platform to make Azure the home for AI-native applications: 

Azure Container Apps Serverless GPUs let customers run AI model inference on demand without managing infrastructure. Serverless GPUs help teams scale generative AI workloads with ease and cost efficiency.

Dynamic sessions in Azure Container Apps offer secure, on-demand, sandboxed compute environments, ideal for large language model (LLM) generated code, AI agents, and ephemeral workloads. 

Azure Container Apps integration with Azure AI Foundry lets customers deploy models from the Foundry model catalog directly to Azure Container Apps, simplifying the model deployment experience. 

Sidecars in Azure App Service simplify AI-native app deployment by integrating small language model (SLM) hosting, routing, and scaling into existing web apps.

By combining apps, data, and AI in a single platform, Azure enables organizations to build and scale the next generation of intelligent, adaptive applications. 

Built to scale securely and cost effectively 

Applications need to do more than run: they must scale, perform reliably, and remain cost-effective. Azure is designed to meet the demands of enterprises and startups alike with built-in capabilities that reduce operational overhead and improve efficiency. Recent enhancements include:

Azure App Service Premium v4 plan (public preview) brings a next-generation, fully managed PaaS experience on both Windows and Linux with superior performance, scalability, and cost efficiency powered by modern Azure hardware. It’s built to reduce total cost of ownership, with early testing showing up to 24% savings for Windows web apps compared to the previous Premium v3 tier. 

Azure Functions Flex Consumption offers concurrency‑based scaling, instantaneous scale‑from‑zero up to 1,000 instances with no cold starts via the “Always Ready” feature, and seamless virtual network integration. 

App Service plans with 2+ instances automatically support Availability Zones without extra setup, while still maintaining a 99.99% SLA. Support for regions with two zones, enhanced zone visibility, and mutable zone redundancy settings further simplify deployment and reduce costs.
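The concurrency-based scaling described for Flex Consumption can be sketched as simple arithmetic: provision enough instances to keep each one at or below its configured concurrency, scale to zero when idle, and cap at the plan's 1,000-instance limit. This is illustrative only; the real scaler is managed by Azure:

```python
import math

def target_instances(inflight_requests: int, per_instance_concurrency: int,
                     max_instances: int = 1000) -> int:
    """Concurrency-based scaling sketch: instance count is driven by how
    many requests are in flight relative to each instance's concurrency,
    bounded by the plan's instance cap. Not the actual Azure scaler."""
    if inflight_requests <= 0:
        return 0  # scale to zero when there is no work
    return min(max_instances, math.ceil(inflight_requests / per_instance_concurrency))

print(target_instances(0, 16))         # 0   (idle)
print(target_instances(250, 16))       # 16  (ceil(250 / 16))
print(target_instances(1_000_000, 16)) # 1000 (capped at the plan limit)
```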

Customers are innovating with Azure’s application platform 

We’re honored by this recognition and thankful for the trust our customers and partners place in us. Their innovation and feedback continue to drive our roadmap and our mission. Here are just a few examples of that innovation in practice. 

Coca‑Cola leveraged Azure AI Foundry and Azure application services to build a custom, real‑time conversational “Santa” AI model that interacted with over one million people across 43 markets in 26 languages—launching the immersive “Create Real Magic” holiday campaign in just 60 days. The initiative showcased how the company’s multi‑phase migration to an AI‑ready Azure platform enabled rapid innovation through scalable services like Azure AI Speech, Functions, and Container Apps. 

Medigold Health migrated its applications to Azure App Service and adopted Azure OpenAI Service, along with Azure Cosmos DB and Azure SQL Database, to automate clinicians’ report generation and significantly cut down administrative effort. This transformation led to a 58% increase in clinician retention and greatly improved job satisfaction and workflow efficiency. 

The NFL enhanced its Combine App with a real‑time conversational AI assistant, powered by Azure OpenAI Service, Azure Container Apps, and Azure Cosmos DB, enabling scouts and coaches to ask natural‑language questions and receive fast, accurate player insights while drills are still underway. This innovation eliminated hours of manual data crunching, turning it into instantaneous “grab‑and‑go” insights, transforming talent evaluation during the Combine event. 

Audi AG used Azure AI Foundry, Azure App Service, and Azure Cosmos DB to rapidly deploy secure, scalable AI assistants that addressed urgent HR needs. The project delivers fast, modular access to information, earning leadership and employee confidence. Audi is moving from experimentation to operational AI, ready for deeper system integration and next-gen capabilities like multi-agent orchestration. 

As AI transforms the application landscape, Microsoft is committed to helping customers build what’s next with speed, intelligence, and resilience. 

Learn more about Azure

Explore Azure App Service, Azure Container Apps, and Azure Functions. 

Get started building with AI App Templates. Edit and deploy to Azure using Visual Studio Code or GitHub Codespaces. 

Get expert guidance from skilled Microsoft partners through the new Azure Accelerate program. 

Gartner, Magic Quadrant for Cloud-Native Application Platforms, By Tigran Egiazarov, Mukul Saha, Prasanna Lakshmi Narasimha, 4 August 2025. 

*Gartner is a registered trademark and service mark and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved. 

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request here. 

Gartner does not endorse any vendor, product, or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose. 
The post Microsoft is a Leader in the 2025 Gartner® Magic Quadrant™ for Cloud-Native Application Platforms  appeared first on Microsoft Azure Blog.

Microsoft’s open source journey: From 20,000 lines of Linux code to AI at global scale

Microsoft’s engagement with the open source community has transformed the company from a one-time skeptic to now being one of the world’s leading open source contributors. In fact, over the past three years, Microsoft Azure has been the largest public cloud contributor (and the second largest overall contributor) to the Cloud Native Computing Foundation (CNCF). So, how did we get here? Let’s look at some milestones in our journey and explore how open-source technologies are at the heart of the platforms powering many of Microsoft’s biggest products, like Microsoft 365, and massive-scale AI workloads, including OpenAI’s ChatGPT. Along the way, we have also introduced and contributed to several open-source projects inspired by our own experiences, contributing back to the community and accelerating innovation across the ecosystem.  

Innovate faster with open source on Azure

Embracing open source: Key milestones in Microsoft’s journey

2009—A new leaf: 20,000 lines to Linux. In 2009, Microsoft contributed more than 20,000 lines of code to the Linux kernel—initially Hyper‑V drivers—under the GNU General Public License, version 2 (GPLv2). It wasn’t our first open source contribution, but it was a visible moment that signaled a change in how we build and collaborate. In 2011, Microsoft was in the top 5 companies contributing to Linux. Today, 66% of customer cores in Azure run Linux.

2015—Visual Studio Code: An open source hit. In 2015, Microsoft released Visual Studio Code (VS Code), a lightweight, open-source, cross-platform code editor. Today, Visual Studio and VS Code together have more than 50 million monthly active developers, with VS Code itself widely regarded as the most popular development environment. We believe AI experiences can thrive by leveraging the open-source community, just as VS Code has successfully done over the past decade. With AI becoming an integral part of the modern coding experience, we’ve released the GitHub Copilot Chat extension as open source on GitHub.

2018—GitHub and the “all-in” commitment. In 2018, Microsoft acquired GitHub, the world’s largest developer community platform, which was already home to 28 million developers and 85 million code repositories. This acquisition underscored Microsoft’s transformation. As CEO Satya Nadella said in the announcement, “Microsoft is all-in on open source… When it comes to our commitment to open source, judge us by the actions we have taken in the recent past, our actions today, and in the future.” In the 2024 Octoverse, GitHub reported 518 million public or open-source projects, over 1 billion contributions in 2024, about 70,000 new public or open-source generative AI projects, and about a 59% year-over-year surge in contributions to generative AI projects. 

Open source at enterprise scale: Powering the world’s most demanding workloads 

Open-source technologies, like Kubernetes and PostgreSQL, have become foundational pillars of modern cloud-native infrastructure—Kubernetes is the second largest open-source project after Linux and now powers millions of containerized workloads globally, while PostgreSQL is one of the most widely adopted relational databases. Azure Kubernetes Service (AKS) and Azure’s managed PostgreSQL take the best of these open-source innovations and elevate them into robust, enterprise-ready managed services. By abstracting away the operational complexity of provisioning, scaling, and securing these platforms, AKS and managed PostgreSQL let organizations focus on building and innovating. The combination of open source flexibility with cloud-scale reliability allows services like Microsoft 365 and OpenAI’s ChatGPT to operate at massive scale while staying highly performant.

COSMIC: Microsoft’s geo-scale, managed container platform powers Microsoft 365’s transition to containers on AKS. It runs millions of cores and is one of the largest AKS deployments in the world. COSMIC bakes in security, compliance, and resilience while embedding architectural and operational best practices into our internal services. The result: drastically reduced engineering effort, faster time-to-market, and improved cost management, even while scaling to millions of monthly users around the world. COSMIC uses Azure and open-source technologies to operate at planet-wide scale: Kubernetes Event-driven Autoscaling (KEDA) for autoscaling, and Prometheus and Grafana for real-time telemetry and dashboards, to name a few.

OpenAI’s ChatGPT: ChatGPT is built on Azure using AKS for container orchestration, Azure Blob Storage for user and AI-generated content, and Azure Cosmos DB for globally distributed data. The scale is staggering: ChatGPT has grown to almost 700 million weekly active users, making it the fastest-growing consumer app in history.1 And yet, OpenAI operates this service with a surprisingly small engineering team. As Microsoft’s Cloud and AI Group Executive Vice President Scott Guthrie highlighted at Microsoft Build in May, ChatGPT “needs to scale … across more than 10 million compute cores around the world,” with approximately 12 engineers to manage all that infrastructure. How? By relying on managed platforms like AKS that combine enterprise capabilities with the best of open source innovation to do the heavy lifting of provisioning, scaling, and healing Kubernetes clusters across the globe.

Consider what happens when you chat with ChatGPT: Your prompt and conversation state are stored in an open-source database (Azure Database for PostgreSQL) so the AI can remember context. The model runs in containers across thousands of AKS nodes. Azure Cosmos DB then replicates data in milliseconds to the datacenter closest to the user, ensuring low latency. All of this is powered by open-source technologies under the hood and delivered as cloud services on Azure. The result: ChatGPT can handle “unprecedented” load—over one billion queries per day, without a hitch, and without needing a giant operations team. 

Deploy containers on Azure Kubernetes Service

What Azure teams are building in the open

At Microsoft, our commitment to building in the open runs deep, driven by engineers across Azure who actively shape the future of open-source infrastructure. Our teams don’t just use open-source technologies, they help build and evolve them.  

Our open-source philosophy is straightforward: we contribute upstream first and then integrate those innovations into our downstream products. To support this, we play a pivotal role in upstream open-source projects, collaborating across the industry with partners, customers, and even competitors. Examples of projects we have built or contributed to include:  

Dapr (Distributed Application Runtime): A CNCF-graduated project launched by Microsoft in 2019, Dapr simplifies cloud-agnostic app development with modular building blocks for service invocation, state, messaging, and secrets.

Radius: A CNCF Sandbox project that lets developers define application services and dependencies, while operators map them to resources across Azure, AWS, or private clouds—treating the app, not the cluster, as the unit of intent.

Copacetic: A CNCF Sandbox tool that patches container images without full rebuilds, speeding up security fixes—originally built to secure Microsoft’s cloud images.

Dalec: A declarative tool for building secure OS packages and containers, generating software bills of materials (SBOMs) and provenance attestations to produce minimal, reproducible base images.

SBOM Tool: A command line interface (CLI) for generating SPDX-compliant SBOMs from source or builds—open-sourced by Microsoft to boost transparency and compliance.

Drasi: A CNCF Sandbox project released in 2024, Drasi reacts to real-time data changes using a Cypher-like query language for change-driven workflows. 

Semantic Kernel and AutoGen: Open-source frameworks for building collaborative AI apps—Semantic Kernel orchestrates large language models (LLMs) and memory, while AutoGen enables multi-agent workflows.

Phi-4 Mini: A compact 3.8 billion-parameter AI model released in 2025, optimized for reasoning and mathematics on edge devices; available on Hugging Face.

Kubernetes AI Toolchain Operator (KAITO): A CNCF Sandbox Kubernetes operator that automates AI workload deployment—supporting LLMs, fine-tuning, and retrieval-augmented generation (RAG) across cloud and edge with AKS integration. 

KubeFleet: A CNCF Sandbox project for managing applications across multiple Kubernetes clusters. It offers smart scheduling, progressive deployments, and cloud-agnostic orchestration. 

This is just a small sampling of the open-source projects Microsoft is involved in—each one sharing, in code, the lessons we’ve learned from running systems at global scale and inviting the community to build alongside us.

Open Source + Azure = Empowering the next generation of innovation

Microsoft’s journey with open source has come a long way from that 20,000-line Linux patch in 2009. Today, open-source technologies are at the heart of many Azure solutions. And conversely, Microsoft’s contributions are helping drive many open-source projects forward—whether it’s commits to Kubernetes; new tools like KAITO, Dapr, and Radius; or research advancements like Semantic Kernel and Phi-4. Our engineers understand that the success of end user solutions like Microsoft 365 and ChatGPT relies on scalable, resilient platforms like AKS—which in turn are built on and sustained by strong, vibrant open source communities.

Join us at Open Source Summit Europe 2025

As we continue to contribute to the open source community, we’re excited to be part of Open Source Summit Europe 2025, taking place August 25–27. You’ll find us at booth D3 with live demos, in-booth sessions covering a wide range of topics, and plenty of opportunities to connect with our Open Source team. Be sure to catch our conference sessions as well, where Microsoft experts will share insights, updates, and stories from our work across the open source ecosystem. 

1 TechRepublic, ChatGPT’s On Track For 700M Weekly Users Milestone: OpenAI Goes Mainstream, August 5, 2025.

The post Microsoft’s open source journey: From 20,000 lines of Linux code to AI at global scale appeared first on Microsoft Azure Blog.
Source: Azure

Agent Factory: Building your first AI agent with the tools to deliver real-world outcomes

This blog post is the second out of a six-part blog series called Agent Factory which will share best practices, design patterns, and tools to help guide you through adopting and building agentic AI.

Tools are defining the next wave of agentic AI

In the previous blog, we explored five common design patterns of agentic AI—from tool use and reflection to planning, multi-agent collaboration, and adaptive reasoning. These patterns show how agents can be structured to achieve reliable, scalable automation in real-world environments.

Across the industry, we’re seeing a clear shift. Early experiments focused on single-model prompts and static workflows. Now, the conversation is about extensibility—how to give agents a broad, evolving set of capabilities without locking into one vendor or rewriting integrations for each new need. Platforms are competing on how quickly developers can:

Integrate with hundreds of APIs, services, data sources, and workflows.

Reuse those integrations across different teams and runtime environments.

Maintain enterprise-grade control over who can call what, when, and with what data.

The lesson from the past year of agentic AI evolution is simple: agents are only as capable as the tools you give them—and only as trustworthy as the governance behind those tools.

Develop agentic AI in Azure AI Foundry

Extensibility through open standards

In the early stages of agent development, integrating tools was often a bespoke, platform-specific effort. Each framework had its own conventions for defining tools, passing data, and handling authentication. This created several consistent blockers:

Duplication of effort—the same internal API had to be wrapped differently for each runtime.

Brittle integrations—small changes to schemas or endpoints could break multiple agents at once.

Limited reusability—tools built for one team or environment were hard to share across projects or clouds.

Fragmented governance—different runtimes enforced different security and policy models.

As organizations began deploying agents across hybrid and multi-cloud environments, these inefficiencies became major obstacles. Teams needed a way to standardize how tools are described, discovered, and invoked, regardless of the hosting environment.

That’s where open protocols entered the conversation. Just as HTTP transformed the web by creating a common language for clients and servers, open protocols for agents aim to make tools portable, interoperable, and easier to govern.

One of the most promising examples is the Model Context Protocol (MCP)—a standard for defining tool capabilities and I/O schemas so any MCP-compliant agent can dynamically discover and invoke them. With MCP:

Tools are self-describing, making discovery and integration faster.

Agents can find and use tools at runtime without manual wiring.

Tools can be hosted anywhere—on-premises, in a partner cloud, or in another business unit—without losing governance.

Azure AI Foundry supports MCP, enabling you to bring existing MCP servers directly into your agents. This gives you the benefits of open interoperability plus enterprise-grade security, observability, and management. Learn more about MCP at MCP Dev Days.
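To make the self-describing idea concrete, here is a minimal sketch of an MCP-style tool registry. Everything in it—the registry, the `get_order_status` tool, and the JSON-Schema field names—is invented for illustration; it is not the MCP SDK or a real server, only a demonstration of how descriptors enable discovery and runtime invocation without manual wiring.

```python
# Hypothetical sketch of an MCP-style self-describing tool registry.
# The descriptor carries what an agent needs to discover and invoke
# the tool at runtime: name, description, and an input schema.

TOOL_REGISTRY = {}

def register_tool(name, description, input_schema):
    """Attach a self-describing descriptor to a plain function."""
    def decorator(fn):
        TOOL_REGISTRY[name] = {
            "name": name,
            "description": description,
            "inputSchema": input_schema,  # JSON Schema for arguments
            "fn": fn,
        }
        return fn
    return decorator

@register_tool(
    name="get_order_status",
    description="Look up the fulfillment status of a customer order.",
    input_schema={
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
)
def get_order_status(order_id: str) -> dict:
    # A real tool would call a backend system here.
    return {"order_id": order_id, "status": "shipped"}

def discover_tools():
    """What an agent sees at runtime: descriptors, not code."""
    return [
        {k: v for k, v in tool.items() if k != "fn"}
        for tool in TOOL_REGISTRY.values()
    ]

def invoke(name, arguments):
    """Dynamic invocation by name, validated against the schema."""
    tool = TOOL_REGISTRY[name]
    missing = [a for a in tool["inputSchema"].get("required", [])
               if a not in arguments]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return tool["fn"](**arguments)
```

An agent first calls `discover_tools()` to learn what is available, then invokes by name—`invoke("get_order_status", {"order_id": "A-42"})`—with no hand-wired integration code between agent and tool.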

The enterprise toolchain in Azure AI Foundry

Once you have a standard for portability through open protocols like MCP, the next question becomes: what kinds of tools should your agents have, and how do you organize them so they can deliver value quickly while staying adaptable?

In Azure AI Foundry, we think of this as building an enterprise toolchain—a layered set of capabilities that balance speed (getting something valuable running today), differentiation (capturing what makes your business unique), and reach (connecting across all the systems where work actually happens).

1. Built-in tools for rapid value: Azure AI Foundry includes ready-to-use tools for common enterprise needs: searching across SharePoint and data lakes, executing Python for data analysis, performing multi-step web research with Bing, and triggering browser automation tasks. These aren’t just conveniences—they let teams stand up functional, high-value agents in days instead of weeks, without the friction of early integration work.

2. Custom tools for your competitive edge: Every organization has proprietary systems and processes that can’t be replicated by off-the-shelf tools. Azure AI Foundry makes it straightforward to wrap these as agentic AI tools—whether they’re APIs from your ERP, a manufacturing quality control system, or a partner’s service. By invoking them through OpenAPI or MCP, these tools become portable and discoverable across teams, projects, and even clouds, while still benefiting from Foundry’s identity, policy, and observability layers.
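As a rough illustration of wrapping a proprietary API as a tool, the sketch below builds a minimal OpenAPI 3.0 description for a hypothetical ERP inventory endpoint (the URL, path, and field names are all assumptions, not a real system). A spec like this is what makes the API discoverable as an agent tool.

```python
# Minimal OpenAPI 3.0 description for a hypothetical ERP endpoint,
# expressed as a Python dict. Registering a spec like this is what
# turns a proprietary API into a discoverable, portable agent tool.

def inventory_tool_spec(base_url: str) -> dict:
    return {
        "openapi": "3.0.0",
        "info": {
            "title": "ERP Inventory Lookup",
            "version": "1.0.0",
            "description": "Check on-hand stock for a SKU.",
        },
        "servers": [{"url": base_url}],
        "paths": {
            "/inventory/{sku}": {
                "get": {
                    # operationId and summary are what the agent's model
                    # reads when deciding which tool to call.
                    "operationId": "getInventoryBySku",
                    "summary": "Return on-hand quantity for a SKU.",
                    "parameters": [{
                        "name": "sku",
                        "in": "path",
                        "required": True,
                        "schema": {"type": "string"},
                    }],
                    "responses": {
                        "200": {
                            "description": "Stock level for the SKU",
                            "content": {
                                "application/json": {
                                    "schema": {
                                        "type": "object",
                                        "properties": {
                                            "sku": {"type": "string"},
                                            "on_hand": {"type": "integer"},
                                        },
                                    }
                                }
                            },
                        }
                    },
                }
            }
        },
    }
```

Clear operation names and summaries are not cosmetic: they double as the documentation the agent uses to choose the right tool at the right moment.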

3. Connectors for maximum reach: Through Azure Logic Apps, Foundry can connect agents to over 1,400 SaaS and on-premises systems—CRM, ERP, ITSM, data warehouses, and more. This dramatically reduces integration lift, allowing you to plug into existing enterprise processes without building every connector from scratch.

One example of this toolchain in action comes from NTT DATA, which built agents in Azure AI Foundry that integrate Microsoft Fabric Data Agent alongside other enterprise tools. These agents allow employees across HR, operations, and other functions to interact naturally with data—revealing real-time insights and enabling actions—reducing time-to-market by 50% and giving non‑technical users intuitive, self-service access to enterprise intelligence.

Enterprise-grade management for tools

Extensibility must be paired with governance to move from prototype to enterprise-ready automation. Azure AI Foundry addresses this with a secure-by-default approach to tool management:

Authentication and identity in built-in connectors: Enterprise-grade connectors—like SharePoint and Microsoft Fabric—already use on-behalf-of (OBO) authentication. When an agent invokes these tools, Foundry ensures that the call respects the end user’s permissions via managed Entra IDs, preserving existing authorization rules. With Microsoft Entra Agent ID, every agentic project created in Azure AI Foundry automatically appears in an agent-specific application view within the Microsoft Entra admin center. This provides security teams with a unified directory view of all agents and agent applications they need to manage across Microsoft. This integration marks the first step toward standardizing governance for AI agents company wide. While Entra ID is native, Azure AI Foundry also supports integrations with external identity systems. Through federation, customers who use providers such as Okta or Google Identity can still authenticate agents and users to call tools securely.
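To illustrate the on-behalf-of pattern itself, here is a toy stub—not MSAL, not the Entra token endpoint, and the token format and permission table are invented—showing the key property: the agent exchanges the user's incoming token for a downstream token bound to that same identity, so the agent never gains more access than the end user has.

```python
# Toy illustration of the on-behalf-of (OBO) pattern. All names and
# token formats are invented; a real implementation would use an
# identity library (such as MSAL) against the Entra token endpoint.

USER_PERMISSIONS = {
    "alice": {"sharepoint.read"},
    "bob": set(),  # bob has no SharePoint access
}

def exchange_on_behalf_of(user_token: str, scope: str) -> str:
    """Exchange the user's token for a downstream token with the same identity."""
    user = user_token.removeprefix("user-token:")
    if scope not in USER_PERMISSIONS.get(user, set()):
        # The exchange fails if the *user* lacks the permission,
        # regardless of what the agent itself is allowed to do.
        raise PermissionError(f"{user} lacks scope {scope}")
    return f"downstream-token:{user}:{scope}"

def agent_calls_sharepoint(user_token: str) -> str:
    # The downstream call carries the end user's identity and scope.
    token = exchange_on_behalf_of(user_token, "sharepoint.read")
    user = token.split(":")[1]
    return f"documents visible to {user}"
```

The design point is that authorization decisions stay with the backend's existing rules: the agent forwards identity rather than substituting its own.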

Custom tools with OpenAPI and MCP: OpenAPI-specified tools enable seamless connectivity using managed identities, API keys, or unauthenticated access. These tools can be registered directly in Foundry, and align with standard API design best practices. Foundry is also expanding MCP security to include stored credentials, project-level managed identities, and third-party OAuth flows, along with secure private networking—advancing toward a fully enterprise-grade, end-to-end MCP integration model.

API governance with Azure API Management (APIM): APIM provides a powerful control plane for managing tool calls: it enables centralized publishing, policy enforcement (authentication, rate limits, payload validation), and monitoring. Additionally, you can deploy self-hosted gateways within VNets or on-prem environments to enforce enterprise policies close to backend systems. Complementing this, Azure API Center acts as a centralized, design-time API inventory and discovery hub—allowing teams to register, catalog, and manage private MCP servers alongside other APIs. These capabilities provide the same governance you expect for your APIs—extended to agentic AI tools without additional engineering.

Observability and auditability: Every tool invocation in Foundry—whether internal or external—is traced with step-level logging. This includes identity, tool name, inputs, outputs, and outcomes, enabling continuous reliability monitoring and simplified auditing.
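As a rough sketch of step-level tracing (the wrapper and field names below are illustrative, not Foundry's actual logging format), each tool call can be wrapped so every invocation records identity, tool name, inputs, output, and outcome—whether it succeeds or fails:

```python
import time

TRACE_LOG = []  # in production this would feed a telemetry pipeline

def traced_call(user_id: str, tool_name: str, fn, **inputs):
    """Invoke a tool and record a structured trace entry for auditing."""
    entry = {
        "timestamp": time.time(),
        "identity": user_id,
        "tool": tool_name,
        "inputs": inputs,
    }
    try:
        result = fn(**inputs)
        entry.update({"outcome": "success", "output": result})
        return result
    except Exception as exc:
        entry.update({"outcome": "error", "error": str(exc)})
        raise
    finally:
        # The entry is appended on both paths, so failures are audited too.
        TRACE_LOG.append(entry)
```

Capturing the error path in the same structure as the success path is what makes regressions and misbehaving tools show up in monitoring instead of disappearing silently.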

Five best practices for secure, scalable tool integration

Enterprise-grade management ensures tools are secure and observable—but success also depends on how you design and operate them from day one. Drawing on Azure AI Foundry guidance and customer experience, a few principles stand out:

Start with the contract. Treat every tool like an API product. Define clear inputs, outputs, and error behaviors, and keep schemas consistent across teams. Avoid overloading a single tool with multiple unrelated actions; smaller, single-purpose tools are easier to test, monitor, and reuse.

Choose the right packaging. For proprietary APIs, decide early whether OpenAPI or MCP best fits your needs. OpenAPI tools are straightforward for well-documented REST APIs, while MCP tools excel when portability and cross-environment reuse are priorities.

Centralize governance. Publish custom tools behind Azure API Management or a self-hosted gateway so authentication, throttling, and payload inspection are enforced consistently. This keeps policy logic out of tool code and makes changes easier to roll out.

Bind every action to identity. Always know which user or agent is invoking the tool. For built-in connectors, leverage identity passthrough or OBO. For custom tools, use Entra ID or the appropriate API key/credential model, and apply least-privilege access.

Instrument early. Add tracing, logging, and evaluation hooks before moving to production. Early observability lets you track performance trends, detect regressions, and tune tools without downtime.

Following these practices ensures that the tools you integrate today remain secure, portable, and maintainable as your agent ecosystem grows.
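The "start with the contract" principle above can be sketched as follows. The refund tool, its types, and its limits are hypothetical—the point is that inputs, outputs, and error behavior are pinned down before any integration code is written, and the tool does exactly one thing:

```python
from dataclasses import dataclass

# Contract-first sketch of a single-purpose tool (names are illustrative).
# Schemas are explicit types, so they can stay consistent across teams.

@dataclass(frozen=True)
class RefundRequest:
    order_id: str
    amount_cents: int

@dataclass(frozen=True)
class RefundResult:
    order_id: str
    refunded_cents: int
    status: str  # "approved" or "rejected"

class RefundError(Exception):
    """Well-defined error behavior is part of the contract."""

def issue_refund(req: RefundRequest) -> RefundResult:
    if req.amount_cents <= 0:
        raise RefundError("amount must be positive")
    # A real implementation would call the payments backend here;
    # the illustrative rule auto-approves refunds up to $100.
    status = "approved" if req.amount_cents <= 10_000 else "rejected"
    return RefundResult(req.order_id, req.amount_cents, status)
```

A narrow, typed contract like this is easier to test, monitor, and reuse than a broad "do payments" tool, and the schema doubles as documentation for both humans and agents.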

What’s next

In part three of the Agent Factory series, we’ll focus on observability for AI agents—how to trace every step, evaluate tool performance, and monitor agent behavior in real time. We’ll cover the built-in capabilities in Azure AI Foundry, integration patterns with Azure Monitor, and best practices for turning telemetry into continuous improvement.

Did you miss the first post in the series? Check it out: The new era of agentic AI—common use cases and design patterns.

Azure AI Foundry
Build AI agents that automate tasks, enhance user experiences, and deliver results.

Learn more

The post Agent Factory: Building your first AI agent with the tools to deliver real-world outcomes appeared first on Microsoft Azure Blog.

Building the Frontier Firm with Microsoft Azure: The business case for cloud and AI modernization

A new kind of enterprise is emerging—and it’s rewriting the rules of leadership.

Microsoft’s 2025 Work Trend Index calls them Frontier Firms: organizations that are not just adopting AI but rearchitecting their operations around it, powered by intelligence on tap, human-agent teams, and a new role for everyone as agent boss. These firms are moving beyond experimentation into scaled transformation: bringing agents onto their teams as members, reinventing processes, equipping employees with new skills to scale their impact, and building on a modern cloud foundation.

Read Microsoft’s 2025 Work Trend Index

What’s most inspiring is that they’re not waiting for perfect conditions. They’re building now for a competitive advantage.

81% of business leaders expect AI agents to be deeply integrated into workflows within the next 12–18 months.

82% say 2025 is a pivotal year to rethink how their organization operates.

And yet, only a fraction have modernized the infrastructure needed to support this shift.

This is the leadership gap. AI is the defining force of this era—but it can’t thrive on legacy systems. The ability to scale AI responsibly, securely, and enterprise-wide depends on a modern cloud foundation. And that’s where the leaders are separating from the laggards.

The future belongs to the Frontier Firm. And the foundation of every Frontier Firm is a modern cloud. The companies that will lead in AI tomorrow are the ones modernizing their digital core today. Cloud + AI isn’t a tech trend. It’s the new business model.

The gap between digital leaders and digital laggards is no longer theoretical—it’s measurable. While some organizations are reimagining their operating models with AI at the core, others are still debating pilot programs. This divergence is creating a new kind of competitive pressure—one that’s not just about technology, but about time.

In fact, according to the recent Microsoft 2025 Work Trend Index, Frontier Firms are already reshaping their industries. They are faster to market, more adaptive to change, and more efficient in how they scale intelligence across the enterprise. Meanwhile, digital laggards are stuck in a holding pattern, attempting to run AI on infrastructure that wasn’t built for it, managing data in silos that AI can’t learn from, and spending more time maintaining systems than innovating with them.

Technical debt is now one of the most significant barriers to digital transformation. It includes legacy systems that consume a disproportionate share of IT budgets, limit agility, and introduce security and compliance risks. Shifting from project-based funding to continuous modernization models can address this growing burden.

This is where cloud modernization becomes a company-level strategic—not just technical—imperative.

Legacy systems weren’t built for AI. Modernizing with Azure is how leaders unlock its full value—securely, responsibly, and at scale.

Modernization is a strategic imperative for competitive advantage, requiring rearchitecting business processes, people, and technology to unlock the full power of cloud-native services, advanced analytics, and enterprise-grade AI. Migrating and modernizing to Azure delivers more than technical efficiencies—it enables agility, innovation, and resilience at scale. A research report by IDC highlights numerous business outcomes from customers who have migrated and modernized on Azure. Let’s dive into some of the top strategic outcomes coming out of this report:

Agility: Faster execution of business change

IDC reports a 78% improvement in the speed of executing business changes for organizations modernizing with Azure. In a world where agility is a competitive advantage, this speed enables companies to respond to market shifts, regulatory changes, and customer demands with unprecedented responsiveness. Business leaders can pivot strategies, launch new offerings, or adapt operations quickly, turning agility into a core differentiator.

Innovation: More time focused on strategic initiatives

IDC found that organizations reported a 69% time gain for IT teams to focus on innovation. By reducing time spent on maintenance and triage, teams can invest in building new capabilities, experimenting with AI, and driving digital transformation. This shift allows technology leaders to move from “keeping the lights on” to driving strategic initiatives—such as AI adoption, automation, and digital transformation—that fuel long-term growth.

Speed to market: Accelerated product and service delivery

Modernized organizations saw a 43% faster time to market for new products and services. In fast-moving industries, being first—or fast—can mean the difference between market leadership and irrelevance. This outcome empowers leaders to seize new opportunities, capture market share, and drive revenue growth through innovation.

Resilience: Reduced downtime and risk

IDC reports a 90% reduction in unplanned downtime. That’s a business continuity imperative. In sectors like healthcare, manufacturing, and financial services, downtime can mean lost revenue, reputational damage, or worse. By modernizing on Azure, organizations ensure their critical systems are available and resilient, supporting business continuity and customer trust.

ROI and cost efficiency: Tangible business impact

IDC found a 344% three-year ROI and a 14-month payback period. Organizations also reported $902,700 in annual benefits per migrated and modernized application. These gains come from reduced infrastructure costs, improved performance, and better resource utilization. For business leaders, this means that investments in Azure modernization are not just cost centers, they are profit drivers. Rapid payback frees up capital for further innovation, while high ROI demonstrates that modernization is a growth strategy, not just an IT upgrade.

Modernization is a growth engine. The data proves it, and the leaders are already moving.

IDC found that organizations modernizing with Azure saw a 47% improvement in IT agility. That’s not just faster systems; it’s faster strategy, faster pivots, and faster innovation cycles. The business case is clear: the sooner you modernize, the sooner you unlock the full value of AI.

The future belongs to those who can scale AI responsibly. And that future starts with a modern cloud foundation.

Modernizing with Azure helps you and your team navigate the AI transformation with confidence—building a resilient, secure, and intelligent digital core that empowers your teams, delights your customers, and positions your organization as a leader in the era of AI.

To learn more about how your organization can modernize with Azure, check out the latest App Modernization Guidance and get in touch with your Microsoft expert.

Get the App Modernization Guidance

The post Building the Frontier Firm with Microsoft Azure: The business case for cloud and AI modernization appeared first on Microsoft Azure Blog.