Azure Files Entra-Only identities: Advancing cloud-native identity and security

We are excited to announce the general availability (GA) of Entra-Only identities for Azure Files SMB. With native Microsoft Entra ID authentication, organizations can now grant secure, identity-based access to SMB file shares using cloud-only identities.

Enable Microsoft Entra Kerberos authentication

This means no Active Directory, hybrid sync, or managed domain controllers are required, which significantly simplifies architecture and reduces ongoing management and maintenance costs. Entra-Only identities give Azure Files a highly integrated, modern identity experience for secure, seamless, and comprehensive cloud-native access.

As customers look to migrate to Azure Files, reliance on on-premises Active Directory authentication has been seen as a key blocker to a complete cloud-native experience. Entra-Only identities support for Azure Files SMB removes that blocker, enabling organizations to authenticate users and devices directly through Microsoft Entra ID, helping modernize storage, compute and identity, while aligning with Zero-Trust principles.

Entra-Only identities enable seamless virtual desktop infrastructure (VDI) profile management on Azure Files while meeting modern security standards. In Azure Virtual Desktop (AVD), built-in B2B support extends this further, allowing external partners to use their existing identities with FSLogix profiles, without creating duplicate accounts.

For general-purpose scenarios, this unlocks migration of on-premises Windows-based workloads to a fully cloud-native platform, retaining native SMB compatibility while delivering a highly integrated identity, security, and management experience. Users can securely access files from anywhere without domain setup, VPNs, or complex networking requirements. Together, these capabilities help organizations reduce operational complexity while strengthening their security posture.

Why choose Entra-Only identities with Azure Files

Modern, cloud-native identity with simplified operations. Access to Azure Files is secured using native Entra ID authentication with client-side Intune integration. This eliminates the overhead of identity lifecycle maintenance and compliance, VPNs, and hybrid sync, which simplifies deployment and streamlines management.

Coexistence with hybrid identity setups. Organizations with a mix of hybrid and cloud-native identities can use both concurrently while on the journey to retire Active Directory.

Secure access from anywhere. Users can access file shares via Entra-joined clients, enabling seamless remote work without duplicating identities.

Extended support for macOS clients (limited preview). Secure file share access is extended to modern macOS clients that are Entra-joined via Platform SSO, enabling creative and cross-platform workloads to integrate with Azure Files using Entra-based identity.

What’s new with Entra-Only identities

Portal-based NTFS permissions management: Granular file and directory ACLs for Entra-Only (and hybrid) users and groups can be configured directly from the Azure portal, eliminating the need for domain-joined clients or legacy tools. This is now available for all users across all regions.

Expanded RBAC support for secure authorization: Share-level RBAC assignments for specific users and groups are now available for Entra-Only users and groups in limited regions. For regional availability, please check here.

How Entra-Only identities work with Azure Files

This feature modernizes SMB authentication by making Microsoft Entra ID the primary Kerberos Key Distribution Center (KDC). Clients authenticate directly with Microsoft Entra ID to obtain Kerberos tickets for cloud identities, eliminating the need for Active Directory or Entra Connect sync. While the SMB protocol remains unchanged for compatibility, ticket issuance and identity validation are completely handled by Entra.

How it works:

When accessing the file share, the client requests a Kerberos ticket from Entra ID for Azure Files.

This ticket, containing cloud-based security identifiers (SIDs), is presented during the SMB session setup.

Azure Files validates the ticket and establishes the session—enabling secure, identity-based access. Authorization continues to use NTFS ACLs, now extended to Entra-Only users and groups. Permissions can be managed directly in the Azure portal, removing reliance on domain-joined clients or legacy tools.

Together, this preserves Kerberos security and scale while shifting identity control entirely to Entra, enabling a clean transition to cloud-native file access.
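The three steps under "How it works" can be sketched as a toy model. This is not the Kerberos protocol and not an Azure API: the class names (`ToyKdc`, `ToyFileShare`), the HMAC-signed ticket, the shared key, and the SID values are all hypothetical, chosen only to show how ticket issuance, ticket validation, and ACL-based authorization fit together.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Ticket:
    payload: bytes    # serialized user and group SIDs
    signature: bytes  # proof that the KDC issued this payload


class ToyKdc:
    """Stands in for Entra ID acting as the KDC (hypothetical model)."""

    def __init__(self, service_key: bytes, directory: dict[str, list[str]]):
        self._service_key = service_key
        self._directory = directory  # user SID -> group SIDs

    def issue_ticket(self, user_sid: str) -> Ticket:
        # Step 1: the client asks the KDC for a ticket for the file service.
        sids = [user_sid, *self._directory[user_sid]]
        payload = json.dumps(sids).encode()
        signature = hmac.new(self._service_key, payload, hashlib.sha256).digest()
        return Ticket(payload, signature)


class ToyFileShare:
    """Stands in for the file service: validates tickets, then checks ACLs."""

    def __init__(self, service_key: bytes, acls: dict[str, set[str]]):
        self._service_key = service_key
        self._acls = acls  # path -> SIDs allowed to read

    def open_session(self, ticket: Ticket) -> list[str]:
        # Steps 2 and 3: the ticket is presented and validated at session setup.
        expected = hmac.new(self._service_key, ticket.payload, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, ticket.signature):
            raise PermissionError("ticket validation failed")
        return json.loads(ticket.payload)

    def can_read(self, session_sids: list[str], path: str) -> bool:
        # Authorization: any SID in the session that appears in the ACL grants access.
        return bool(self._acls.get(path, set()) & set(session_sids))


key = b"hypothetical-key-shared-by-kdc-and-service"
kdc = ToyKdc(key, {"S-1-12-1-1001": ["S-1-12-1-5000"]})
share = ToyFileShare(key, {"/finance/report.xlsx": {"S-1-12-1-5000"}})

ticket = kdc.issue_ticket("S-1-12-1-1001")
session = share.open_session(ticket)
print(share.can_read(session, "/finance/report.xlsx"))
print(share.can_read(session, "/hr/salaries.xlsx"))
```

In this sketch the user gets access to the finance file only through group membership, which mirrors the idea of granting access via groups rather than per-user ACL entries.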

Hero workloads modernized with Entra-Only identities

Re-imagining VDI deployments with Azure Files and Entra-Only identities

Entra-Only identities simplify and modernize VDI deployments with Azure Files by enabling a fully cloud-native identity, compute and storage stack for user profile management. In Azure Virtual Desktop (AVD), FSLogix profile containers can be stored on Azure Files Premium and accessed using Microsoft Entra-based users via Kerberos, preserving secure, seamless SMB access.

Why this matters:

It removes dependencies on hybrid identity infrastructure.

It simplifies deployments.

It reduces operational overhead, especially for distributed or remote workforces.

With Entra ID as the authentication authority, users can sign in to their virtual desktops and access profiles using cloud-native identities, enabling end-to-end single sign-on without line-of-sight to on-premises systems.

By adopting Entra-Only identity access with Azure Files, WTW has been able to deliver insurance applications to customers on AVD using their existing Entra identities. FSLogix profile containers stored on Azure File Shares ensure users receive a consistent profile experience across any AVD host they connect to. This solution removes the dependency on legacy domain controllers and file share infrastructure, replacing it with a fully Entra-joined environment backed by AVD hosts and Azure File Shares—resulting in a more secure, streamlined, and less complex architecture.
—Gordon Griffin, Technical Director, Willis Towers Watson

B2B identities support further extends VDI scenarios by allowing external users to access desktops, loading their profiles securely using existing identities. Together, this enables organizations to deliver a consistent, scalable, and secure VDI experience while accelerating their transition to a fully cloud-native architecture.

Entra-Only identities with Azure Files mark a major step forward in simplifying and securing modern desktop and application environments. By enabling Kerberos-based Entra user access, we can deliver a truly cloud-native experience for our customers, with identity, compute and storage all in Azure, while maintaining seamless SMB compatibility. This significantly reduces deployment complexity and allows organizations to adopt secure, scalable VDI and file access solutions faster than ever before.
—Chuck Mikuzis, Product Manager, Nerdio

Watch: Azure Files Native Entra ID support with AVD FSLogix

Simplifying file sharing for the modern workforce

Entra-Only identities streamline general-purpose file sharing and information worker (IW) collaboration. Access to shared folders is governed directly through Entra ID, enabling consistent, identity-driven access across distributed teams without requiring domain-joined devices or network connectivity to on-premises infrastructure.

This simplifies onboarding and day-to-day operations—new users can be granted access through Entra groups, and permissions are enforced consistently across locations. Combined with NTFS ACL portal support, organizations can maintain familiar file-level security while modernizing their access model.

The result:

Faster onboarding.

Reduced helpdesk overhead.

Seamless collaboration across geographies.

Seamless cloud native access for remote and distributed energy workforces

Entra-Only identities enable oil and gas organizations to securely access critical datasets from remote and field locations without relying on complex multi-domain or multi-forest Active Directory configurations or hybrid infrastructure. Engineers and geoscientists working across offshore rigs, exploration sites, and global offices can authenticate directly with Entra ID and access Azure Files, eliminating VPN dependencies and improving reliability in low-connectivity environments.

This approach simplifies deployment and operations while maintaining enterprise-grade security and compliance. Combined with support for thin clients and remote access, teams can collaborate in real-time on large datasets without managing distributed infrastructure.

Continued investments in Azure Files identity

Secure Entra-native application access with Managed Identities (GA)

Managed Identities support brings Entra-native application access to Azure Files, removing the need for shared keys or secrets. Applications, virtual machines, or Azure services use Managed Identities with Entra-issued OAuth tokens to establish secure SMB sessions, reducing credential sprawl and simplifying access. This helps simplify DevOps workflows and enables scalable integration across Azure Kubernetes Service (AKS) and enterprise applications.

Bringing secure, cloud-native access to macOS workloads (limited preview)

Secure Azure Files support for macOS clients allows creative design teams and educational institutions to work seamlessly across operating system (OS) platforms with uninterrupted access. Designers, media professionals, and higher education professionals can authenticate directly with Entra ID and access SMB file shares, aligning Mac workflows with the same enterprise-grade identity used organization-wide.

Sign up for limited preview

What’s next with Azure Files Entra-Only Identities

Native NTFS ACL editing experience

We are continuing to enhance the permissions management experience by bringing native support for editing NTFS ACLs directly through familiar client workflows. This closes a key gap between cloud and traditional file server environments, enabling administrators and users to manage fine-grained file and directory permissions using the same tools and experiences they rely on today.

Adding support in sovereign cloud environments

We are working to expand Entra-Only identities for Azure Files to sovereign cloud regions, enabling organizations in highly regulated environments to adopt cloud-native identity for SMB workloads. This unlocks the same benefits of Kerberos-based SMB authentication and centralized identity management while meeting compliance and enterprise-grade regulatory requirements.

Get started with Entra-Only identities and other Azure Files investments

Entra-Only identities for Azure Files SMB is generally available today, supported across HDD and SSD shares and all billing models, at no additional cost. Explore our documentation for step-by-step guidance. Make your workload ready for the future!

For questions about enabling this feature on macOS platforms, please register here. For other questions, reach out to azurefiles@microsoft.com.

Explore the documentation
Enable Microsoft Entra Kerberos authentication for hybrid and cloud-only identities (preview) on Azure Files.

Read now

The post Azure Files Entra-Only identities: Advancing cloud-native identity and security appeared first on Microsoft Azure Blog.
Source: Azure

Azure IaaS: Deploy high-performance workloads with a system-level approach

In this article

Rethinking performance in the cloud

Accelerating AI workloads with system-level performance

Scaling cloud-native applications without sacrificing performance

Sustaining performance for business-critical systems

Performance as a coordinated system

Practical guidance: Optimizing for your workload

Build on a foundation designed for performance

This blog post is the third in the Azure IaaS series, which shares best practices and guidance to help you build a trusted infrastructure platform, from performance, resiliency, and security to scalability and cost efficiency.

Performance has become one of the most defining factors in how applications succeed or fail in the cloud. Whether you’re training AI models, scaling a Kubernetes platform, or running a business-critical database, performance is no longer a single decision about CPU, storage, or networking. It’s the outcome of how all three work together and requires a system-level approach.

Learn how Azure IaaS delivers system-level performance

Many organizations still approach performance by provisioning more resources—larger virtual machines (VMs), faster disks, or higher network bandwidth. But modern workloads don’t behave predictably enough for that strategy to hold. Bottlenecks shift dynamically. A database may be constrained by storage latency at one moment and network bandwidth shortly after that. An AI pipeline may stall not because of compute limitations, but because data cannot move fast enough between nodes.

This is why performance in the cloud has evolved from a resource-level concern to a system-level challenge. And it’s why Azure approaches performance differently, engineering it into the platform so customers can achieve consistent, scalable outcomes without manually tuning every layer.

Rethinking performance in the cloud

Performance today is not just about peak speed. It’s about consistency, scalability, and responsiveness under real-world conditions.

For customers, that means evaluating performance across multiple dimensions:

Latency—including tail latency (P99/P99.9), which directly impacts user experience.

Throughput—or how much work can be completed over time.

Scalability—the ability to maintain performance as demand increases.

Consistency—ensuring performance doesn’t degrade unpredictably under load.
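To make the tail-latency dimension concrete, here is a minimal sketch that computes nearest-rank percentiles over latency samples. The `percentile` helper and the synthetic data (98% fast requests, 2% slow outliers) are assumptions for illustration, not measurements from any Azure service.

```python
import math
import random


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


random.seed(7)
# Synthetic latencies in milliseconds: mostly fast, with 2% slow outliers.
latencies = [random.uniform(1.0, 2.0) for _ in range(9_800)]
latencies += [random.uniform(20.0, 50.0) for _ in range(200)]

mean = sum(latencies) / len(latencies)
print(f"mean  = {mean:.2f} ms")
for pct in (50, 99, 99.9):
    print(f"P{pct} = {percentile(latencies, pct):.2f} ms")
```

With data shaped like this, the mean and the median stay within a few milliseconds while P99 and P99.9 land in the outlier range, which is why tail percentiles are tracked separately from averages.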

Equally important is time-to-performance: how quickly infrastructure can be provisioned, scaled, or recovered. In many cases, how fast you can respond to change matters just as much as how fast your system runs.

Azure IaaS brings these dimensions together, aligning compute, storage, and networking capabilities to the needs of specific workloads. The result is performance that is delivered as a coordinated system, not assembled from isolated components.

Accelerating AI workloads with system-level performance

AI workloads are among the most demanding environments for performance. Training and inference pipelines require massive parallel compute, high-throughput data access, and low-latency communication between distributed components.

In these scenarios, performance is only as strong as the weakest layer. Azure addresses this by optimizing the full data path.

Compute efficiency through platform acceleration

Azure Boost helps improve VM performance by offloading storage and networking processing from the host CPU to dedicated hardware and software components. This helps reduce hypervisor overhead and free up compute cycles for model training and inference, improving both throughput and latency consistency.

High-throughput storage for sustained data access

AI workloads depend on continuous access to large datasets. Azure storage options are designed to help deliver sustained I/O performance so that compute resources are not left idle waiting on data. Services like Azure Blob Storage and Azure Data Lake Storage (ADLS) help deliver the high-throughput, low-latency, and massively scalable data foundation AI workloads need, enabling fast ingestion and retrieval of large datasets for training and inference. Their optimized parallel data access and integration with AI tools help maximize compute utilization and eliminate pipeline bottlenecks.

Low-latency, high-bandwidth networking

Distributed training requires rapid communication between nodes. Azure’s networking services, such as Azure ExpressRoute, help enable fast data movement across clusters, reducing synchronization delays and improving overall training efficiency. This can help prevent compute resources from sitting idle.

Together, these capabilities help ensure that performance improvements in compute are not constrained by storage or networking bottlenecks. This helps organizations process more data and train models faster without unnecessary infrastructure overhead.
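The "weakest layer" idea can be shown with a few lines of arithmetic: the end-to-end rate of a pipeline is bounded by its slowest stage. The function name `pipeline_throughput` and the per-stage figures below are hypothetical and exist only to illustrate the reasoning.

```python
def pipeline_throughput(stage_gbps: dict[str, float]) -> tuple[str, float]:
    """Return the limiting stage and the end-to-end throughput it allows."""
    bottleneck = min(stage_gbps, key=stage_gbps.get)
    return bottleneck, stage_gbps[bottleneck]


# Hypothetical per-stage capacities in GB/s for one training job.
stages = {"compute": 12.0, "storage": 4.0, "network": 8.0}
name, rate = pipeline_throughput(stages)
print(f"bottleneck: {name} at {rate} GB/s")

# Doubling compute alone changes nothing, because storage still limits the pipeline.
stages["compute"] = 24.0
print(pipeline_throughput(stages))

# Relieving storage moves the bottleneck to the next-slowest stage.
stages["storage"] = 10.0
print(pipeline_throughput(stages))
```

This is the simplest possible model; it ignores queuing and burst behavior, but it captures why adding capacity to a non-limiting layer does not raise overall throughput.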

Scaling cloud-native applications without sacrificing performance

Cloud-native applications introduce a different kind of performance challenge. Instead of fixed workloads, they must handle unpredictable demand, scaling up and down dynamically while maintaining responsiveness.

Azure Kubernetes Service (AKS) helps provide the foundation for this elasticity, enabling workloads to scale horizontally across nodes. But compute scaling alone is not enough; stateful services must scale with the same level of performance.

This is where Azure’s integrated approach becomes critical.

Dynamic, high-performance storage for Kubernetes

Azure Container Storage enables AKS workloads to consume local NVMe disks through Kubernetes-native provisioning. This helps remove the need for manual disk configuration while delivering sub-millisecond latency and high IOPS for stateful services.

Production-ready data platforms on Kubernetes

With tools like CloudNativePG, organizations can run PostgreSQL and other databases directly on AKS with built-in high availability, failover, and backup capabilities without sacrificing performance. Adding flexible data access across both file and object storage further enhances this foundation, enabling applications to use the most appropriate storage interface for their needs while simplifying data movement and recovery across environments.

Low-latency service communication

Microservices architectures depend on frequent communication between components. Using eBPF host routing in Cilium, Advanced Container Networking Services improves datapath efficiency by reducing latency and increasing throughput, enabling high-performance communication across large-scale microservices environments. Azure’s networking helps ensure interactions remain fast and consistent, helping to prevent inter-service latency from becoming a bottleneck.

The result is a platform where both stateless and stateful workloads can scale dynamically while maintaining performance. And because resources can be provisioned and scaled on demand, organizations benefit from improved cost efficiency, paying only for what they use while maintaining application responsiveness.
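As one illustration of horizontal scaling, the sketch below applies the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas equal the current count multiplied by the ratio of the observed metric to its target, rounded up. The article does not say which autoscaler a given workload uses, so treat this as a general Kubernetes example; the function name, the replica bounds, and the sample numbers are assumptions.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count following the documented HPA ratio rule, clamped to bounds."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is within the tolerance band around the target.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# 4 replicas averaging 90% CPU against a 60% target scale out.
print(desired_replicas(4, current_metric=90, target_metric=60))
# 4 replicas averaging 30% CPU against a 60% target scale in.
print(desired_replicas(4, current_metric=30, target_metric=60))
```

The clamp to `min_replicas` and `max_replicas` is what keeps cost bounded while demand fluctuates.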

Sustaining performance for business-critical systems

For business-critical workloads (enterprise databases, SAP environments, and transactional systems), performance is not just about speed. It’s about predictability and reliability.

These systems must deliver consistent performance under sustained load, often with strict latency and availability requirements. Variability, even at the margins, can have significant business impact.

Azure addresses this through precise control and platform-level optimization.

Consistent compute performance

Azure helps deliver consistent compute performance through purpose-built VM architectures, intelligent placement, and platform-level orchestration. Virtual Machine Scale Sets (VMSS) automatically distribute and scale workloads across fault and update domains, helping maintain predictable performance under changing demand. Azure further enhances consistency with Azure Boost, which offloads virtualization and I/O processing to dedicated hardware, reducing contention and improving efficiency.

Tunable storage performance

Azure Ultra Disk and Premium SSD v2 allow customers to independently configure capacity, IOPS, and throughput. This decoupling helps enable precise alignment of storage performance to workload requirements, avoiding both underperformance and unnecessary cost.

In addition to tunable block storage options like Ultra Disk and Premium SSD v2, Azure also offers highly durable object and file storage services—such as Azure Blob Storage and Azure Files—that provide geo-redundancy and long-term data protection for unstructured data and shared workloads, complementing performance tuning with enterprise-grade durability and scale.

Reliable, low-latency networking

Consistent communication between application tiers is essential for transactional systems. Azure’s networking infrastructure helps ensure that latency remains low and predictable across environments through features such as Accelerated Networking, which reduces network latency by bypassing the virtual switch path, and proximity placement groups, which keep latency-sensitive workloads physically close together within the datacenter. Combined with Azure Boost, which offloads networking processing to dedicated hardware, and support for high-bandwidth, multi-NIC configurations on optimized VM series, these capabilities help enable fast, deterministic data movement and help maintain consistent application performance at scale.

Faster recovery

Performance also includes how quickly systems can recover. Instant Access Snapshots help enable disks to be restored immediately—without waiting for data hydration—reducing downtime and accelerating recovery from failures.

This is complemented by Azure Backup fast restore capabilities, which further shorten restore times, while zone-redundant storage (ZRS) maintains data across availability zones to reduce the impact of localized disruptions.

For broader incidents, Azure Site Recovery orchestrates failover across regions to rapidly bring workloads back online. Together, these capabilities are enhanced by Azure Disk incremental snapshots, which capture only changed data to reduce recovery point objectives (RPO) with minimal overhead, enabling faster, more efficient recovery across scenarios.

This combination helps ensure that performance is maintained not only during normal operations, but also during peak demand and recovery scenarios—where it matters most.
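The idea behind incremental snapshots, capturing only changed data, can be sketched with a toy block-level diff. This is not how Azure Disk snapshots are implemented; the block size, the hashing approach, and the function names are assumptions made to keep the example small.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; tiny on purpose so the example is easy to follow


def block_hashes(data: bytes) -> list[str]:
    """Hash each fixed-size block so blocks can be compared cheaply."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


def incremental_snapshot(previous: bytes, current: bytes) -> dict[int, bytes]:
    """Return only the blocks of `current` that differ from `previous`."""
    old = block_hashes(previous)
    changed = {}
    for index, digest in enumerate(block_hashes(current)):
        if index >= len(old) or old[index] != digest:
            start = index * BLOCK_SIZE
            changed[index] = current[start:start + BLOCK_SIZE]
    return changed


def restore(base: bytes, delta: dict[int, bytes], length: int) -> bytes:
    """Rebuild the newer state from the base image plus the changed blocks."""
    blocks = [base[i:i + BLOCK_SIZE] for i in range(0, length, BLOCK_SIZE)]
    for index, block in delta.items():
        blocks[index] = block
    return b"".join(blocks)[:length]


base = b"AAAABBBBCCCCDDDD"
current = b"AAAAXXXXCCCCDDDDEE"
delta = incremental_snapshot(base, current)
print(sorted(delta))  # indices of the blocks that were stored
print(restore(base, delta, len(current)) == current)
```

Only the modified and newly appended blocks are stored, so snapshot size tracks the amount of change rather than the size of the disk.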

Performance as a coordinated system

Across AI, cloud-native, and business-critical workloads, a clear pattern emerges: performance is not achieved by optimizing a single component in isolation.

Instead, it depends on how compute, storage, and networking are tailored in tandem for the workload at hand.

This alignment helps reduce bottlenecks and helps ensure that improvements in one area are reinforced by capabilities in others. It also simplifies operations, allowing teams to focus on workload design and business outcomes rather than infrastructure tuning.

Practical guidance: Optimizing for your workload

While Azure provides a strong foundation, achieving optimal performance still requires aligning infrastructure choices to workload needs:

For AI workloads, prioritize balanced throughput across compute, storage, and networking to avoid idle resources and maximize efficiency.

For cloud-native applications, design for horizontal scaling and leverage Kubernetes-native storage to maintain performance for stateful services.

For business-critical systems, focus on consistency and predictability, using tunable storage and optimized compute to meet strict performance requirements.

Across all scenarios, evaluate performance holistically and leverage platform capabilities to reduce overhead and simplify optimization.

Build on a foundation designed for performance

Performance directly impacts every aspect of your application—from user experience to operational efficiency and the ability to scale innovation.

By integrating compute, storage, and networking into a cohesive platform, Azure enables organizations to deliver high performance across their most demanding workloads—without the complexity of managing each layer independently.

Explore how Azure delivers performance across AI, cloud-native, and business-critical workloads.

To go deeper, explore the Azure IaaS Resource Center for tutorials, best practices, and guidance across compute, storage, and networking to help you design and operate resilient infrastructure with greater confidence.

Create a resilient infrastructure with Azure
Visit the Azure IaaS Resource Center to start building a stronger, more efficient infrastructure today.

Get started with Azure

Did you miss these posts in the Azure IaaS series?

Explore new resources for building a stronger, more efficient infrastructure

Keep critical applications running with built-in resiliency at scale

Defense in depth built on secure-by-design principles

The post Azure IaaS: Deploy high-performance workloads with a system-level approach appeared first on Microsoft Azure Blog.

Azure NetApp Files for EDA workloads: From revolution to breakthrough at scale

Last year, we outlined how Azure NetApp Files helped reshape silicon design by delivering the low-latency, high-throughput storage required for Electronic Design Automation (EDA) workloads at cloud scale. Since then, we have continued to extend performance and scalability. Today, we are advancing that progress with another significant step forward.

Modern semiconductor design is defined by scale. Thousands of concurrent EDA jobs spanning simulation, synthesis, and verification run continuously against shared datasets, where even small variations in storage latency can ripple across entire design cycles. For many teams, this has historically limited how far EDA workflows could scale in the cloud.

That constraint is now changing.

Azure NetApp Files (ANF) is redefining what is possible for EDA in the cloud by delivering predictable, high-performance shared storage at massive concurrency. With new independent benchmark results and growing adoption by leading semiconductor companies, Azure is establishing itself as a viable—and in many cases superior—platform for modern EDA environments.

See how Azure NetApp Files delivers scalable, high-performance storage for EDA

Why EDA storage has been difficult to scale in the cloud

EDA workloads combine three characteristics that have traditionally challenged cloud storage architectures:

Extremely high concurrency, with thousands of jobs accessing shared file systems simultaneously.

Strict latency sensitivity, where even minor delays reduce compute efficiency and extend runtimes.

Intensive shared data access patterns, creating contention under load.

While cloud compute scales easily, shared storage has often introduced variability that limits overall system efficiency. As concurrency increases, storage becomes the bottleneck, impacting regression cycles, increasing tool license costs, and slowing time to tape-out.

For EDA teams evaluating cloud transformation, the central question has remained consistent: can storage scale with compute while maintaining predictable performance?

A modern approach: Azure NetApp Files for EDA at scale

Azure NetApp Files is designed specifically to address this challenge. Its architecture aligns directly to the requirements of highly parallel, shared workloads like EDA.

At its core, ANF enables independent scaling of compute and storage, so EDA clusters can grow without storage becoming the constraint, and additional compute nodes do not introduce hotspots or contention at the storage layer. It natively supports concurrent metadata operations at scale, handling the millions of small file interactions typical of EDA workflows without degradation. And its service-level performance model ensures that throughput and IOPS scale predictably with capacity, eliminating the need for complex tuning.
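A capacity-linked performance model of this kind reduces to simple arithmetic: the throughput limit is the provisioned capacity multiplied by a per-TiB rate for the chosen service level. The sketch below assumes rates of 16, 64, and 128 MiB/s per TiB for Standard, Premium, and Ultra. These figures are illustrative assumptions that should be checked against the current Azure NetApp Files documentation, and large volumes may follow different limits.

```python
# Assumed throughput per provisioned TiB, in MiB/s, for each service level.
# Illustrative values only; confirm against current Azure NetApp Files documentation.
MIBPS_PER_TIB = {"Standard": 16, "Premium": 64, "Ultra": 128}


def throughput_limit_mibps(service_level: str, quota_tib: float) -> float:
    """Throughput ceiling implied by the provisioned quota."""
    return MIBPS_PER_TIB[service_level] * quota_tib


def quota_needed_tib(service_level: str, target_mibps: float) -> float:
    """Smallest quota that reaches the target throughput at this service level."""
    return target_mibps / MIBPS_PER_TIB[service_level]


print(throughput_limit_mibps("Premium", 10))
print(quota_needed_tib("Ultra", 1024))
```

Because the relationship is linear, sizing becomes a lookup rather than a tuning exercise: pick the service level, then provision the capacity that yields the required throughput.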

More recently, innovations such as large volumes and large volumes breakthrough mode have expanded the concurrency envelope even further. These capabilities allow thousands of parallel jobs to share a single storage environment while maintaining consistent latency under sustained load.

This delivers what cloud-based EDA systems have long struggled to provide: consistent, repeatable performance, not only at low utilization, but also under full production load.

Independent validation: SPECstorage® Solution 2020 benchmark results

To validate these capabilities in a real-world context, Azure NetApp Files was measured using the industry-standard SPECstorage® Solution 2020 EDA_BLENDED benchmark. This benchmark simulates realistic EDA workflows by combining metadata-intensive frontend operations with throughput-heavy backend processing, all under strict latency requirements.

The Azure NetApp Files large volume breakthrough mode scale configuration reached 17,280 SPECstorage® Solution 2020 EDA_BLENDED JOBS with an overall response time of 0.60 milliseconds (ms).

These results demonstrate several important characteristics:

The ability to sustain very high levels of concurrent EDA workloads.

Consistently low response times under load.

Linear scaling behavior as concurrency increases.

No requirement for overprovisioning.

Historically, top benchmark outcomes in this category have been associated with tightly integrated on-premises systems. This validation underscores a broader shift in the industry: when architected correctly, cloud-based EDA infrastructure can not only match on-premises approaches, but in some scenarios surpass them in both scale and operational efficiency.

Proven in production: EDA workloads already running on ANF

This performance is not limited to benchmarks. Organizations such as AMD and ASML are already using Azure NetApp Files to run EDA and high-performance design workloads in production environments.

These companies operate at the leading edge of semiconductor innovation, where infrastructure must support both extreme scale and precise predictability. Their adoption of ANF reflects a broader industry trend: moving EDA workloads to the cloud is no longer experimental; it is becoming a strategic advantage.

These customers, along with others, consistently report the same operational benefits:

The ability to increase regression concurrency without performance degradation.

Improved utilization of compute resources and reduced EDA tool license fees.

Greater predictability in design cycles, enabling more confident scheduling of key milestones.

In this context, storage is no longer the limiting factor—it becomes an enabler of scale.

How Azure helps EDA teams scale with confidence

Organizations have flexibility in how they deploy EDA environments with Azure NetApp Files, depending on workload characteristics and operational priorities.

Some teams choose a centralized model built around a single large volume to maximize throughput and tightly control latency. Others adopt a multi-volume approach to distribute workloads and scale concurrency across different job types. Many enterprises extend existing on-premises environments into Azure, using cloud capacity to absorb peak demand without permanent infrastructure expansion.

Across all of these patterns, one principle remains consistent: storage performance must scale predictably alongside compute. Azure NetApp Files provides that foundation.

Azure NetApp Files delivers the consistent, high‑throughput NFS performance that modern EDA workloads demand, shrinking runtimes, accelerating tape‑out schedules, and giving chip designers the confidence that storage will never be the bottleneck.
Srikanth Gubbala, Head of Global HPC Infrastructure, Applied Materials

Bringing it all together

The evolution of cloud storage for EDA marks an important inflection point for the semiconductor industry. What was once considered a tradeoff—scale versus predictability—is no longer a constraint.

With Azure NetApp Files, organizations can confidently run highly concurrent EDA workloads in the cloud, supported by architecture designed for their specific demands and validated by independent benchmarking.

For teams exploring how to modernize their EDA infrastructure, the path forward is increasingly clear. Cloud-based storage can now meet the requirements of even the most demanding design environments, while offering the flexibility to scale as workloads continue to grow.

For a deeper technical exploration of the benchmark configuration and design considerations, see the companion Azure Tech Community technical blog: “From scale to breakthrough: Azure NetApp Files sets a new cloud benchmark for EDA.”

For further information, explore the Azure NetApp Files documentation or email askanf@microsoft.com.

The post Azure NetApp Files for EDA workloads: From revolution to breakthrough at scale appeared first on Microsoft Azure Blog.
Source: Azure

Powering multi-cluster workloads with seamless cross‑cluster networking for Azure Kubernetes Fleet Manager

In this article

The challenge of multi-cluster networking

Our vision: Multi-cluster management with seamless networking

Strategic resilience with cross-cluster networking

Getting started with cross-cluster networking

Documentation and resources

As organizations modernize their application portfolios, we are witnessing a fundamental shift in how cloud-native infrastructure is architected. No longer is the question “How do we scale a cluster?” but rather “How do we scale across clusters, regions, and clouds while maintaining operational simplicity?”

Today, we are thrilled to announce the public preview of cross-cluster networking for Azure Kubernetes Fleet Manager. This capability represents the next evolution in multi-cluster management by introducing transparent east-west (E-W) multi-cluster networking powered by Advanced Container Networking Services.

Learn more about cross-cluster networking for Azure Kubernetes Fleet Manager

The challenge of multi-cluster networking

Whether driven by regulatory requirements, regional disaster recovery, or the need to isolate blast domains, organizations of every size often run multiple Azure Kubernetes Service (AKS) clusters. However, managing these clusters has historically introduced a “networking tax.” Traditional approaches rely on complex VPNs, gateways, and manual service discovery, adding latency and operational complexity.

Whether operating just a few clusters or a large-scale fleet, teams need consistent, reliable cross‑cluster connectivity to support scenarios like failover, shared services architectures, and seamlessly shifting workloads across regions for capacity or latency. At the same time, platform teams want to abstract infrastructure details from developers, enabling seamless cluster-level changes without disrupting applications.

Our vision: Multi-cluster management with seamless networking

In response to these challenges, we built Azure Kubernetes Fleet Manager. Fleet Manager is designed to simplify multi-cluster Kubernetes for everyone. While Fleet Manager has already simplified workload propagation (deploying to many clusters) and update orchestration (safe, staged upgrades), the network remained a challenge.

With the introduction of Cilium-based cross-cluster networking in Azure Kubernetes Fleet Manager, we are delivering a managed, high-performance network that can span your entire fleet.

This capability extends the Kubernetes networking model across clusters, enabling services and workloads to communicate across cluster boundaries as if they were local, while preserving cluster-level isolation and governance.

Built on an open-source foundation, this capability uses Cilium for the dataplane and KubeFleet for fleet-level orchestration, both active Cloud Native Computing Foundation (CNCF) projects. This ensures transparency, portability, and alignment with the broader Kubernetes ecosystem, while benefiting from continuous innovation from the open-source community.

The following diagram shows how clusters in a fleet are connected through a unified, managed network, enabling seamless communication, service discovery, and policy enforcement.

Key capabilities include:

Seamless east-west connectivity: Using eBPF-based routing from Azure CNI powered by Cilium and Advanced Container Networking Services, pods can communicate across clusters with native performance, with no proxies or gateways required.

Global service discovery: With a simple annotation (service.cilium.io/global=true), a standard Kubernetes Service becomes “global.” Cross-cluster networking automatically discovers endpoints across joined member clusters, providing transparent load balancing and failover.

Multi‑cluster observability: Gain a unified view of network health across clusters with aggregated metrics, logs, and flow visibility. Advanced Container Networking Services integrates Cilium telemetry to provide consistent insights, faster troubleshooting, and end‑to‑end visibility across the fleet.

Unified security and governance: Security policies are no longer confined by cluster boundaries. Through Advanced Container Networking Services, you can now enforce enterprise-grade network policies and gain deep observability across your entire global footprint, ensuring identity-based security follows your workloads wherever they run.

Zero-touch management: Fleet Manager handles the complex lifecycle, including certificates and network configurations, so you don’t have to.

These capabilities use eBPF to enable efficient routing, policy enforcement, and observability for high-performance networking.
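To make the global service discovery capability above concrete, here is a minimal sketch in Python that builds a standard Kubernetes Service manifest carrying the service.cilium.io/global annotation mentioned in the list. The service name, namespace, selector, and port are hypothetical placeholders, and the helper function is illustrative rather than part of any Azure or Cilium SDK.

```python
import json


def global_service_manifest(name, namespace, selector, port):
    """Build a ClusterIP Service manifest marked as global via the Cilium annotation."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                # Annotation named in the article; Kubernetes annotation
                # values must be strings, hence "true" rather than True.
                "service.cilium.io/global": "true",
            },
        },
        "spec": {
            "type": "ClusterIP",
            "selector": selector,
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


if __name__ == "__main__":
    # "checkout", "shop", and port 8080 are hypothetical example values.
    manifest = global_service_manifest("checkout", "shop", {"app": "checkout"}, 8080)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to kubectl apply -f - on each member cluster. In upstream Cilium Cluster Mesh, a global service is matched across clusters by having the same name and namespace in each one; the article does not say whether Fleet Manager adds or relaxes any requirement here, so treat that as something to confirm in the documentation.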

Strategic resilience with cross-cluster networking

In a digital-first economy, resilience is a competitive advantage. Cross-cluster networking enables customers to build architectures that are inherently resilient to single-cluster or single-region failures.

Whether you are running “Shared Services” clusters to support hundreds of tenants or building “Global Services” that route traffic to the healthiest available endpoints, cross-cluster networking for Azure Kubernetes Fleet Manager ensures your infrastructure is as agile as your business needs.

We are committed to providing the most robust, secure, and performant platform for multi-cluster environments. Cross-cluster networking is a big step towards a future where the physical boundaries of a cluster no longer limit the innovation within it.

Getting started with cross-cluster networking

Cross-cluster networking for Azure Kubernetes Fleet Manager is designed to minimize operational complexity:

Prerequisites for your clusters:

Azure CNI powered by Cilium as the networking dataplane.

Advanced Container Networking Services enabled.

Set up cross-cluster networking:

Join clusters to a Fleet.

Associate the members with a cross-cluster network profile.

Deploy services with global annotations to enable cross-cluster communication.

Once configured, Fleet Manager automatically deploys and manages the required components, enabling direct pod-to-pod communication across clusters without additional gateways or overlays.

This managed approach removes the burden of setting up and maintaining Cilium multi-cluster components manually, allowing teams to focus on application delivery instead of infrastructure management.
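Before joining clusters to a fleet, it can help to confirm the two prerequisites listed above. The following is a rough pre-flight sketch in Python, not an official tool. It assumes each cluster is described by a dictionary shaped like an AKS cluster description, and the field names networkDataplane and advancedNetworking.enabled are assumptions that should be verified against the actual output for your API version.

```python
def missing_prerequisites(cluster):
    """Return a list of unmet prerequisites for one cluster description (a dict)."""
    # Field names below are assumed, not taken from the article.
    network = cluster.get("networkProfile") or {}
    problems = []
    if network.get("networkDataplane") != "cilium":
        problems.append("Azure CNI powered by Cilium is not the networking dataplane")
    advanced = network.get("advancedNetworking") or {}
    if not advanced.get("enabled", False):
        problems.append("Advanced Container Networking Services is not enabled")
    return problems


if __name__ == "__main__":
    # Hypothetical, hand-written cluster descriptions for illustration only.
    clusters = {
        "member-eastus": {
            "networkProfile": {
                "networkDataplane": "cilium",
                "advancedNetworking": {"enabled": True},
            }
        },
        "member-westeu": {"networkProfile": {"networkDataplane": "azure"}},
    }
    for name, description in clusters.items():
        problems = missing_prerequisites(description)
        print(name, "ready" if not problems else "; ".join(problems))
```

Returning a list of problems rather than a single boolean lets a platform team report every gap for a cluster at once before attempting to associate it with a cross-cluster network profile.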

See it in action: Watch the Cross-Cluster Networking for Azure Kubernetes Fleet Manager Video Guide to learn more and see a quick demo.

Documentation and resources

Cross-cluster networking for Azure Kubernetes Fleet Manager

How to enable cross-cluster networking for Azure Kubernetes Fleet Manager

Azure CNI powered by Cilium

Advanced Container Networking Services

If you have feedback or would like to learn more, reach out to your Microsoft account team or share feedback through the Azure Kubernetes Service community channels. We would love to hear from you!

Get started with cross-cluster networking

The post Powering multi-cluster workloads with seamless cross‑cluster networking for Azure Kubernetes Fleet Manager appeared first on Microsoft Azure Blog.
Source: Azure