H&R Block’s blockbuster data strategy

If you think your company isn’t making the most of its data, you’re probably right.

Just ask H&R Block. In 2019, before most people began sheltering in place due to COVID, the company rolled out a new data platform that would allow it to digitally serve the 13 million people who usually get their taxes done via tax prep experts at one of the company’s 10,000-plus retail offices.

"The upgrades we made came just in time because otherwise, we would have faced a much bigger problem," says Aditya Thadani, the company's vice president in charge of digital transformation and enterprise architecture.

H&R Block’s new data strategy, based on Microsoft’s Intelligent Data Platform, has been an essential ingredient in possibly the most sweeping strategic expansion in the company’s 67-year history. Rather than just focusing on tax-preparation services—a large percentage of the company’s revenues are booked in the two weeks before Tax Day—the company has become a player in the much larger market for small business services and has introduced a mobile banking app for consumers.

As a result, H&R Block is now a Wall Street darling. As of January 2023, the company’s share price had nearly tripled since the new data strategy was put in place.

While the timing may have been fortunate, the story nonetheless shows how a well-executed data modernization strategy can have a massive impact on everything from the speed of innovation to customer experience to bottom-line results.

From obstacle to enabler

The seeds of this resurgence were planted in 2017, when new CEO Jeff Jones began thinking about ways to leverage H&R Block’s reputation and retail presence to establish a broader, deeper relationship with clients throughout the year. Why stop at tax help, when small business owners who were short on time and sometimes financial expertise also needed help with payroll, invoicing, and strategy? Why not empower its 8 million "under-banked" clients—typically less-affluent people who don’t have, or don’t take full advantage of, a bank account—to participate more fully in the digital economy?

"We tried to serve the client the way they want, where they want, but could we really?"

But there was a problem. The company’s previous approach to data was not up to the job. In fact, the walk-in business, which serves 13 million clients, and the online software business, which serves an additional 7 million, felt like two different companies. Even though the company had offered an online service since 2004, if a "DIY" client who did their taxes online walked into a retail location for some extra help, the tax pros there would have no record that the client existed. To get help, the client would have had to begin a new tax return from scratch, starting with their name and address. Similarly, walk-in clients couldn’t easily update or continue working on their tax return online once they got home.

"We tried to serve the client the way they want, where they want, but could we really?" says Thadani. "We had two operating units running on two different technology stacks that didn't talk to each other. The information was completely siloed, introducing friction as clients switched from one channel to another."

In early 2018, Jones approved investments to make IT a catalyst, rather than an obstacle, to fully realizing this vision.

A new data platform strategy

Step one was obvious, but formidable: to transform the company's underlying architecture from on-premises technology running in the company's Kansas City data center to the cloud. Over an eight-month period, a team led by Sameer Agarwal, IT director of data platforms, migrated a million lines of code running on legacy AS/400 and Netezza appliances to Microsoft SQL Server, and consolidated five racks of data appliances into a single rack of servers to handle applications that still needed to run on-premises. Today, 75 percent of the company's workload is managed on Microsoft Azure, and Agarwal expects that to rise to 90 percent by mid-2023.

"I've never seen a company like ours move this fast," says Agarwal, who joined H&R Block in 2007 after years as a consultant with Tata Consultancy Services.

Because of its intrinsic scalability, the cloud was an obvious cost-saving choice for a company that does so much of its business during just a few weeks of the year. But management's focus was less on short-term cost savings and more on unleashing innovation that could provide year-round value to clients. To that end, Thadani's team insisted on creating a unified architecture that brought all of that siloed data and business intelligence into a single platform.

"Management’s focus was less on short-term cost savings and more on unleashing innovation that could provide year-round value to clients."

The team made fast progress, and by 2019 the benefits were becoming evident internally. With an architecture built around cloud-native technology, business unit leaders found they could use data from across multiple products to quickly convert Jones's high-level vision into crisp business insights and execution. In the past, it could take several months just to get IT to incorporate a new type of data into reports—say, to study the impact a new digital service was having on the walk-in business. Now they can deliver it in days, says Thadani.

Building "the Block experience"

The strongest evidence of the changes began showing up in the products themselves. In 2018, the company rolled out a new price preview feature that let filers know up front how much their return would cost, rather than having the price change based on the number of tax forms required. Later that year, it introduced a number of hybrid options. One service lets clients have a tax pro do their return without ever visiting an office; another lets DIY filers have a tax expert check their work and then sign and file the return on their behalf; a third lets clients leave their documents with a preparer at an H&R Block office and go home.

"In the past, it could take several months just to get IT to incorporate a new type of data into reports. Now we can deliver it in days."

Since then, the company's product developers have continued to let clients use whatever mix of virtual and human help serves their needs—a vision that was tailor-made to serve clients after the pandemic hit. For example, many clients who had grown used to filing their taxes online suddenly had questions about how to account for COVID stimulus checks. Thanks to the new architecture, the company was able to quickly create an online assistance add-on service that lets clients ask such questions for an extra $20. And in January 2022, it introduced a mobile banking app just six months after development began. Thadani thinks it would have taken considerably longer if the old architecture were still in place.

The benefits go far beyond speed. In the old approach, Thadani's team lacked the in-house data science skills or tools to make use of machine learning and other emerging technologies. Now, by taking advantage of the capabilities in Azure, the company has streamlined the data-gathering process so that its more than 90,000 tax pros spend less time on routine work, like determining which tax documents are required to complete a return, and more time advising clients.

Originally, this was viewed as a productivity-enhancing process improvement. But it is also making the company's clients more satisfied. The use of online capabilities by traditional walk-in clients tripled in the 2022 tax year. In early October 2022, the company announced a business formation service to help entrepreneurs choose the right corporate structure, whether on their own or with the help of an H&R Block adviser.

The great unlearning

Thadani believes an increased appreciation for the power of data has set off a cultural shift inside the IT organization that will continue to pay dividends in the coming years. The move to the cloud and away from high-maintenance legacy systems means engineers spend less time worrying about technical details and more on how to use technology to solve business problems. "It makes their jobs more fun and allows them an opportunity to have a bigger impact on our business," he says.

Looking back, Thadani says the biggest challenge wasn’t getting people to learn new tricks but to unlearn old ones.

"When you have a company with a long history and people with long tenures, all that experience sometimes gets in the way," he says. "We have people who are very good at their jobs and are used to doing things the same way for years. Now we're offering them a new way to do it. That's not always easy."

While it may not be easy, it certainly is becoming more compelling, as the company's tax pros come to understand how much more expansive their business relationships with clients can be.

"We are having deeper, richer conversations with our clients about their overall financial health, as opposed to simply saying, 'Here are your tax returns,'" says Thadani. "And that’s only possible because we now have a full view of every client."

Learn more about how H&R Block modernized its data platform with Azure

If you’d like to hear more on this Azure customer story, watch the H&R Block Learn from the Leaders: Optimize your Data Estate webinar.
Source: Azure

Microsoft joins the FinOps Foundation

In today’s economic climate, cost efficiency is more critical than ever. Organizations need high-quality guidance backed by products and services that help them achieve and maintain that efficiency. This is a large part of what we do today within the Cost Management team and the larger Commerce organization here at Microsoft. In that vein, we are excited to announce that Microsoft has joined the FinOps Foundation as a premier member and has taken a seat on the Governing Board, which defines the strategy and vision of the organization. Together, we can deliver unparalleled guidance and innovative solutions that empower organizations to increase efficiency and accelerate growth.

"I’m very enthusiastic about our partnership with the FinOps Foundation and our membership as part of the FinOps community. Optimizing cloud workloads is more important than ever for companies of all sizes in all industries. For Microsoft this collaboration with the FinOps Foundation and our industry partners will empower Microsoft Cloud customers and partners to leverage the cost management best practices and industry-standard operating procedures cultivated by the FinOps community." —Vivek Dalvi, Corporate Vice President, Commerce Platform and Experiences

What is the FinOps Foundation?

The FinOps Foundation is a non-profit organization hosted at the Linux Foundation dedicated to advancing people who practice the discipline of cloud financial management via best practices, education, and standards. The FinOps Foundation community is made up of practitioners around the world, including many of our valued Microsoft Cloud customers and partners. The FinOps Foundation hosts working groups and special interest groups covering topics like cost and usage data standardization, containers and Kubernetes, and sustainability based on real-world stories and expertise from the community.

“Microsoft is a bellwether technology leader who is aligned to our vision of accelerating the growth of FinOps practitioners with its presence, leadership, and innovation. We welcome Microsoft as a Premier Member as its membership will be a huge asset to the larger FinOps community and development and maturation of best practices across industries and the world.” —JR Storment, Executive Director of the FinOps Foundation.

Microsoft and the FinOps Foundation

My colleague, Jimin Li, joined the Foundation Governing Board in January and we’ve already begun participating in working groups and special interest groups, but that’s just the beginning. As we look toward our future as part of the FinOps Foundation, we’re exploring five primary focus areas over the coming months:

Defining specifications and evolving best practices

We are excited to join the FinOps Foundation and our industry partners in defining, evangelizing, and implementing best practices and specifications like the FinOps Open Cost and Usage Specification (FOCUS). We’re already actively contributing to this program and looking forward to sharing our joint developments broadly.
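As a rough illustration of what a cost and usage specification enables, the sketch below maps a provider-specific billing row onto a small set of shared columns. The column and field names here are simplified stand-ins for illustration only; they are not the normative FOCUS schema, nor Azure's actual cost-export format.

```python
# Illustrative sketch: normalizing a hypothetical provider billing row
# onto a small, FOCUS-style common schema so costs from different
# providers can be aggregated with one set of column names.

def normalize_row(row: dict) -> dict:
    """Map a made-up provider export row onto shared column names."""
    return {
        "BilledCost": float(row["costInBillingCurrency"]),
        "BillingCurrency": row["billingCurrency"],
        "ServiceName": row["meterCategory"],
        "ChargePeriodStart": row["date"],
    }

rows = [
    {"costInBillingCurrency": "1.25", "billingCurrency": "USD",
     "meterCategory": "Virtual Machines", "date": "2023-03-01"},
    {"costInBillingCurrency": "0.40", "billingCurrency": "USD",
     "meterCategory": "Storage", "date": "2023-03-01"},
]

normalized = [normalize_row(r) for r in rows]
total = sum(r["BilledCost"] for r in normalized)
print(f"Total billed cost: {total:.2f} {normalized[0]['BillingCurrency']}")
```

Once every provider's export lands in one shape like this, reporting and optimization tooling only has to be written once—which is the core promise of a shared specification.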

Aligning our collective guidance

We offer a wealth of guidance from architecture documentation, like Microsoft Cloud Adoption Framework and Azure Well-Architected Framework, to our products, like Microsoft Cost Management and Azure Advisor. While we tell the same underlying story as the FinOps Foundation, we believe the closer our guidance is aligned to the FinOps Framework, the easier it will be for individuals and organizations to understand and implement. As a first step, I’ve contributed to the second edition of the O’Reilly Cloud FinOps book.

Improving our products and services

Similar to how we plan to align our guidance, we see opportunities to align our products and services to the FinOps Framework while learning more about customer needs from the vibrant multicloud community of practitioners in the FinOps Foundation forums. We view FinOps adoption as an end-to-end experience, and for many people and organizations that experience can start and end with the product itself. We aim to be good citizens in the community by contributing and listening.

Advancing training and certification programs

The FinOps Foundation offers several great training and certification programs, such as FinOps Certified Practitioner and FinOps Certified Professional, geared toward helping people advance their careers and grow the community at large. We look forward to working with the FinOps Foundation to improve material specifically focused on the Microsoft Cloud and to certify relevant Microsoft teams in FinOps.

Fun fact: Microsoft has the largest number of FinOps Certified Professionals of any organization in the world!

Engaging with the community

I’ve mentioned the FinOps Foundation community several times now, but I’m not sure I’ve really done it justice. With over 8,700 members and growing rapidly, the community behind the FinOps Foundation is truly the driving force of the success of the organization. We are extremely enthusiastic about this opportunity to collaborate with and learn from this passionate community as we engage in various programs and initiatives, like the upcoming FinOps X conference where Microsoft is a platinum sponsor. The more we learn, the better we can support you and help you achieve more.

What’s next?

We’re looking forward to the many exciting opportunities ahead of us as we partner with the FinOps Foundation, seeking to make cost management and optimization—or "FinOps"—easier to adopt and implement within the Microsoft Cloud. We only scratched the surface here, so stay tuned by following Cost Management updates over the coming months.

To learn more about the FinOps Foundation and to participate in the community, join us at finops.org or in person at FinOps X in June.
Source: Azure

Microsoft Azure Security expands variant hunting capacity at a cloud tempo

In the first blog in this series, we discussed our extensive investments in securing Microsoft Azure, including more than 8500 security experts focused on securing our products and services, our industry-leading bug bounty program, our 20-year commitment to the Security Development Lifecycle (SDL), and our sponsorship of key Open-Source Software security initiatives. We also introduced some of the updates we are making in response to the changing threat landscape including improvements to our response processes, investments in Secure Multitenancy, and the expansion of our variant hunting efforts to include a global, dedicated team focused on Azure. In this blog, we’ll focus on variant hunting as part of our larger overall security program.

Variant hunting is an inductive learning technique, going from the specific to the general. Using newly discovered vulnerabilities as a jumping-off point, skilled security researchers look for additional and similar vulnerabilities, generalize the learnings into patterns, and then partner with engineering, governance, and policy teams to develop holistic and sustainable defenses. Variant hunting also looks at positive patterns, trying to learn from success as well as failure, but through the lens of real vulnerabilities and attacks, asking the question, “why did this attack fail here, when it succeeded there?”

In addition to detailed technical lessons, variant hunting also seeks to understand the frequency at which certain bugs occur, the contributing causes that permitted them to escape SDL controls, the architectural and design paradigms that mitigate or exacerbate them, and even the organizational dynamics and incentives that promote or inhibit them. It is popular to do root cause analysis, looking for the single thing that led to the vulnerability, but variant hunting seeks to find all of the contributing causes.

While rigorous compliance programs like the Microsoft SDL define an overarching scope and repeatable processes, variant hunting provides the agility to respond to changes in the environment more quickly. In the short term, variant hunting augments the SDL program by delivering proactive and reactive changes faster for cloud services, while in the long term, it provides a critical feedback loop necessary for continuous improvement. 

Leveraging lessons to identify anti-patterns and enhance security

Starting with lessons from internal security findings, red team operations, penetration tests, incidents, and external MSRC reports, the variant hunting team tries to extract the anti-patterns that can lead to vulnerabilities. In order to be actionable, anti-patterns must be scoped at a level of abstraction more specific than, for example, “validate your input” but less specific than “there’s a bug on line 57.” 

Having distilled an appropriate level of abstraction, variant hunting researchers look for instances of the anti-pattern and perform a deeper assessment of the service, called a “vertical” variant hunt. In parallel, the researcher investigates the anti-pattern’s prevalence across other products and services, conducting a “horizontal” variant hunt using a combination of static analysis tools, dynamic analysis tools, and skilled review.

Insights derived from vertical and horizontal variant hunting inform architecture and product updates needed to eliminate the anti-pattern broadly. Results include improvements to processes and procedures, changes to security tooling, architectural changes, and, ultimately, improvements to SDL standards where the lessons rapidly become part of the routine engineering system.

For example, one of the static analysis tools used in Azure is CodeQL. When a newly identified vulnerability does not have a corresponding query in CodeQL the variant hunting team works with other stakeholders to create one. New “specimens”—that is, custom-built code samples that purposely exhibit the vulnerability—are produced and incorporated into a durable test corpus to ensure learnings are preserved even when the immediate investigation has ended. These improvements provide a stronger security safety net, helping to identify security risks earlier in the process and reducing the re-introduction of known anti-patterns into our products and services.
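The specimen-and-detection loop described above can be sketched in miniature. In the sketch below, the "specimen" is an invented vulnerable snippet (not Azure code), and the detector is a naive regex standing in for a real CodeQL query; the point being illustrated is that the specimen lives in a durable test corpus, so the detector must keep flagging it or the regression check fails.

```python
import re

# "Specimen": a deliberately vulnerable, invented code sample kept in a
# test corpus. Anti-pattern shown: a user-supplied URL passed straight
# to a fetch without validation (an SSRF-style bug).
SPECIMEN_SSRF = '''
def handler(request):
    target = request.args["url"]          # attacker-controlled
    return http_get(target)               # fetched without validation
'''

# A real pipeline would run a CodeQL query here; this regex is only a
# simplified stand-in for a detection rule.
ANTI_PATTERN = re.compile(r'request\.args\[.+\].*\n.*http_get')

def detector_flags(source: str) -> bool:
    """Return True if the (toy) detector finds the anti-pattern."""
    return bool(ANTI_PATTERN.search(source))

# Regression check: if the detector ever stops flagging the specimen,
# the lesson learned from the original vulnerability has been lost.
assert detector_flags(SPECIMEN_SSRF), "detector must keep flagging the specimen"
print("specimen still detected")
```

Keeping specimen and detector together turns a one-off security finding into a permanent test: any change that weakens the detection rule breaks the build rather than silently reopening the gap.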

Azure Security's layered approach to protecting against server-side threats

Earlier in this series, we highlighted security improvements in Azure Automation, Azure Data Factory, and Azure Open Management Infrastructure that arose from our variant hunting efforts. We would call those efforts “vertical” variant hunting.

Our work on Server-Side Request Forgery (SSRF) is an example of “horizontal” variant hunting. The impact and prevalence of SSRF bugs have been increasing across the industry for some time. In 2021 OWASP added SSRF to its top 10 list based on feedback from the Top 10 community survey—it was the top requested item to include. Around the same time, we launched a number of initiatives, including:

Externally, Azure Security recognized the importance of identifying and hardening against SSRF vulnerabilities and ran the Azure SSRF Research Challenge in the fall of 2021.
Internally, we ran a multi-team, multi-division effort to better address SSRF vulnerabilities using a layered approach.
Findings from the Azure SSRF Research challenges were incorporated to create new detections using CodeQL rules to identify more SSRF bugs.
Internal research drove investment in new libraries for parsing URLs to prevent SSRF bugs and new dynamic analysis tools to help validate suspected SSRF vulnerabilities.
New training has been created to enhance prevention of SSRF vulnerabilities from the start.
Targeted investments by product engineering and security research contributed to the creation of new Azure SDK libraries for Azure Key Vault that will help prevent SSRF vulnerabilities in applications that accept user-provided URIs for a customer-owned Azure Key Vault or Azure Managed HSM.
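To make the URL-parsing idea concrete, here is a minimal sketch of the kind of check an SSRF-hardened fetch path might perform before requesting a user-supplied URL. This is an illustration of the general technique, not the actual logic of the Azure SDK libraries mentioned above; production defenses also pin the resolved address for the actual request, restrict redirects, and enforce allow-lists.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Only fetch over schemes we explicitly trust.
ALLOWED_SCHEMES = {"https"}

def is_safe_url(url: str) -> bool:
    """Reject URLs whose scheme is untrusted or whose host resolves to a
    non-public address (private, loopback, link-local, reserved)."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        return False
    try:
        # Resolve the host and inspect every returned address.
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        addr = ipaddress.ip_address(info[4][0])
        # is_global is False for private, loopback, link-local (including
        # 169.254.169.254-style metadata endpoints), and reserved ranges.
        if not addr.is_global:
            return False
    return True

print(is_safe_url("http://169.254.169.254/metadata"))  # scheme rejected
print(is_safe_url("https://localhost/admin"))          # loopback rejected
```

The essential move is validating the *resolved* addresses, not just the URL string, since string-level checks are easily bypassed with DNS tricks or alternate IP encodings.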

This investment in new technology to reduce the prevalence of SSRF vulnerabilities helps ensure the security of Azure applications for our customers. By identifying and addressing these vulnerabilities, we are able to provide a more secure platform for our customers on which to build and run their applications.

In summary, Azure has been a leader in the development and implementation of variant hunting as a method for identifying and addressing potential security threats. We have hired and deployed a global team focused exclusively on variant hunting, working closely with the rest of the security experts at Microsoft. This work has resulted in more than 800 distinct security improvements to Azure services since July 2022. We encourage security organizations all over the world to adopt or expand variant hunting as part of their continuous learning efforts to further improve security.

Learn more about Azure security and variant hunting

Read the first blog in this series to learn about Azure’s security approach, which focuses on defense in depth, with layers of protection throughout all phases of design, development, and deployment of our platforms and technologies.
Learn more about the out-of-the-box security capabilities embedded in our cloud platforms.
Register today for Microsoft Secure on March 28 to view our session covering built-in security across the Microsoft Cloud.

Source: Azure

Microsoft and NVIDIA experts talk AI infrastructure

This post has been co-authored by Sheila Mueller, Senior GBB HPC+AI Specialist, Microsoft; Gabrielle Davelaar, Senior GBB AI Specialist, Microsoft; Gabriel Sallah, Senior HPC Specialist, Microsoft; Annamalai Chockalingam, Product Marketing Manager, NVIDIA; J Kent Altena, Principal GBB HPC+AI Specialist, Microsoft; Dr. Lukasz Miroslaw, Senior HPC Specialist, Microsoft; Uttara Kumar, Senior Product Marketing Manager, NVIDIA; Sooyoung Moon, Senior HPC + AI Specialist, Microsoft.

As AI emerges as a crucial tool in so many sectors, it’s clear that the need for optimized AI infrastructure is growing. Going beyond just GPU-based clusters, cloud infrastructure that provides low-latency, high-bandwidth interconnects, and high-performance storage can help organizations handle AI workloads more efficiently and produce faster results.

HPCwire recently sat down with Microsoft Azure and NVIDIA’s AI and cloud infrastructure specialists and asked a series of questions to uncover AI infrastructure insights, trends, and advice based on their engagements with customers worldwide.

How are your most interesting AI use cases dependent on infrastructure?

Sheila Mueller, Senior GBB HPC+AI Specialist, Healthcare & Life Sciences, Microsoft: Some of the most interesting AI use cases are in patient health care, spanning both clinical care and research. Research in science, engineering, and health is creating significant improvements in patient care, enabled by high-performance computing and AI insights. Common use cases include molecular modeling, therapeutics, genomics, and health treatments. Predictive analytics and AI, coupled with cloud infrastructure purpose-built for AI, are the backbone for improvements and simulations in these use cases and can lead to a faster prognosis and the ability to research cures. See how Elekta brings hope to more patients around the world with the promise of AI-powered radiation therapy.

Gabrielle Davelaar, Senior GBB AI Specialist, Microsoft: Many manufacturing companies need to train inference models at scale while being compliant with strict local and European-level regulations. AI is placed on the edge with high-performance compute. Full traceability with strict security rules on privacy and security is critical. This can be a tricky process as every step must be recorded for reproduction, from simple things like dataset versions to more complex things such as knowing which environment was used with what machine learning (ML) libraries with its specific versions. Machine learning operations (MLOps) for data and model auditability now make this possible. See how BMW uses machine learning-supported robots to provide flexibility in quality control for automotive manufacturing.

Gabriel Sallah, Senior HPC Specialist, Automotive Lead, Microsoft: We’ve worked with car makers to develop advanced driver assistance systems (ADAS) and advanced driving systems (ADS) platforms in the cloud, using integrated services to build a highly scalable deep learning pipeline for creating AI/ML models. HPC techniques were applied to schedule, scale, and provision compute resources while ensuring effective monitoring, cost management, and data traceability. The result: faster simulation and training times than their existing solutions, thanks to the close integration of data inputs, compute runs, and data outputs.

Annamalai Chockalingam, Product Marketing Manager, Large Language Models & Deep Learning Products, NVIDIA: Progress in AI has led to the explosion of generative AI, particularly with advancements to large language models (LLMs) and diffusion-based transformer architectures. These models now recognize, summarize, translate, predict, and generate languages, images, videos, code, and even protein sequences, with little to no training or supervision, based on massive datasets. Early use cases include improved customer experiences through dynamic virtual assistants, AI-assisted content generation for blogs, advertising, marketing, and AI-assisted code generation. Infrastructure purpose-built for AI that can handle compute power and scalability demands is key.

What AI challenges are customers facing, and how does the right infrastructure help?

John Lee, Azure AI Platforms & Infrastructure Principal Lead, Microsoft: When companies try to scale AI training models beyond a single node to tens and hundreds of nodes, they quickly realize that AI infrastructure matters. Not all accelerators are alike. Optimized scale-up node-level architecture matters. How the host CPUs connect to groups of accelerators matter. When scaling beyond a single node, the scale-out architecture of your cluster matters. Selecting a cloud partner that provides AI-optimized infrastructure can be the difference between an AI project’s success or failure. Read the blog: AI and the need for purpose-built cloud infrastructure.

Annamalai Chockalingam: AI models are becoming increasingly powerful due to a proliferation of data, continued advancements in GPU compute infrastructure, and improvements in techniques across both training and inference of AI workloads. Yet, combining the trifecta of data, compute infrastructure, and algorithms at scale remains challenging. Developers and AI researchers require systems and frameworks that can scale, orchestrate, crunch mountains of data, and manage MLOps to optimally create deep learning models. End-to-end tools for production-grade systems incorporating fault tolerance for building and deploying large-scale models for specific workflows are scarce.

Kent Altena, Principal GBB HPC+AI Specialist, Financial Services, Microsoft: A common challenge is choosing between the open flexibility of a true HPC environment and the robust MLOps pipelines and capabilities of a machine learning platform. Traditional HPC approaches, whether scheduled by a legacy scheduler like HPC Pack or SLURM or by a cloud-native scheduler like Azure Batch, are great when customers need to scale to hundreds of GPUs, but in many cases AI environments need the DevOps approach to AI model management and control over which models are authorized, or conversely need overall workflow management.

Dr. Lukasz Miroslaw, Senior HPC Specialist, Microsoft: AI infrastructure is not only GPU-based clusters but also the low-latency, high-bandwidth interconnect between the nodes and high-performance storage. The storage requirement is often the limiting factor for large-scale distributed training, as the amount of data used for training in autonomous driving projects can grow to petabytes. The challenge is to design an AI platform that meets strict requirements in terms of storage throughput, capacity, support for multiple protocols, and scalability.

What are the most frequently asked questions about AI infrastructure?

John Lee: “Which platform should I use for my AI project/workload?” There is no single magic product or platform that is right for every AI project. Customers usually have a good understanding of what answers they are looking for but aren’t sure which AI products or platforms will get them that answer in the fastest, most economical, and most scalable way. A cloud partner with a wide portfolio of AI products, solutions, and expertise can help find the right solution for specific AI needs.

Uttara Kumar, Senior Product Marketing Manager, NVIDIA: “How do I select the right GPU for our AI workloads?” Customers want the flexibility to provision the right-sized GPU acceleration for different workloads to optimize cloud costs (fractional GPU, single GPU, multiple GPUs all the way up to multiple GPUs across multi-node clusters). Many also ask, “How do you make the most of the GPU instance/virtual machines and leverage it within applications/solutions?” Performance-optimized software is key to doing that.

Sheila Mueller: “How do I leverage the cloud for AI and HPC while ensuring data security and governance?” Customers want to automate the deployment of these solutions, often across multiple research labs with specific simulations. Customers want a secure, scalable platform that provides control over data access to provide insight. Cost management is also a focus in these discussions.

Kent Altena: “How best should we implement GPU infrastructure to run our models?” We know what we need to run and have built the models, but we also need to understand the final mile. The answer is not always a straightforward, one-size-fits-all one. It requires understanding their models, what they are attempting to solve, and what their inputs, outputs, and workflow look like.

What have you learned from customers about their AI infrastructure needs?

John Lee: The majority of customers want to leverage the power of AI but are struggling to put an actionable plan in place to do so. They worry about what their competition is doing and whether they are falling behind but, at the same time, are not sure what first steps to take on their journey to integrate AI into their business.

Annamalai Chockalingam: Customers are looking for AI solutions to improve operational efficiency and deliver innovative solutions to their end customers. Easy-to-use, performant, platform-agnostic, and cost-effective solutions across the compute stack are incredibly desirable to customers.

Gabriel Sallah: All customers are looking to reduce the cost of training an ML model. Thanks to the flexibility of the cloud resources, customers can select the right GPU, storage I/O, and memory configuration for the given training model.

Gabrielle Davelaar: Costs are critical. With the current economic uncertainty, companies need to do more with less and want their AI training to be more efficient and effective. Something a lot of people are still not realizing is that training and inferencing costs can be optimized through the software layer.

What advice would you give to businesses looking to deploy AI or speed innovation?

Uttara Kumar: Invest in a platform that is performant, versatile, scalable, and can support the end-to-end workflow—start to finish—from importing and preparing data sets for training, to deploying a trained network as an AI-powered service using inference.

John Lee: Not every AI solution is the same. AI-optimized infrastructure matters, so be sure to understand the breadth of products and solutions available in the marketplace. And just as importantly, make sure you engage with a partner that has the expertise to help navigate the complex menu of possible solutions that best match what you need.

Sooyoung Moon, Senior HPC + AI Specialist, Microsoft: No amount of investment can guarantee success without thorough early-stage planning. Reliable and scalable infrastructure for continuous growth is critical.

Kent Altena: Understand your workflow first. What do you want to solve? Is it primarily a calculation-driven solution, or is it built upon a data graph-driven workload? Having that in mind will go a long way to determining the best or optimal approach to start down.

Gabriel Sallah: What are the dependencies across various teams responsible for creating and using the platform? Create an enterprise-wide architecture with common toolsets and services to avoid duplication of data, compute monitoring, and management.

Sheila Mueller: Involve stakeholders from IT and Lines of Business to ensure all parties agree to the business benefits, technical benefits, and assumptions made as part of the business case.

Learn more about Azure and NVIDIA

Visit our HPCwire Solution Channel.
Learn more about Microsoft Azure purpose-built infrastructure for AI.

Source: Azure

Automate your attack response with Azure DDoS Protection solution for Microsoft Sentinel

DDoS attacks are best known for their ability to take down applications and websites by overwhelming servers and infrastructure with large amounts of traffic. However, cybercriminals also use DDoS attacks to exfiltrate data, extort victims, or advance political or ideological aims. One of the most devastating features of DDoS attacks is their unique ability to disrupt and create chaos in targeted organizations or systems. This plays well for bad actors that leverage DDoS as a smokescreen for more sophisticated attacks, such as data theft, and it demonstrates the increasingly sophisticated tactics cybercriminals use to intertwine multiple attack vectors to achieve their goals.

Azure offers several network security products that help organizations protect their applications: Azure DDoS Protection, Azure Firewall, and Azure Web Application Firewall (WAF). Customers deploy and configure each of these services separately to enhance the security posture of their protected environment and application in Azure. Each product has a unique set of capabilities to address specific attack vectors, but the greatest benefit comes from combining them: together, these three products provide more comprehensive protection. Indeed, to combat modern attack campaigns, one should use a suite of products and correlate security signals between them to detect and block multi-vector attacks.

We are announcing a new Azure DDoS Protection Solution for Microsoft Sentinel. It allows customers to identify bad actors from Azure’s DDoS security signals and block possible new attack vectors in other security products, such as Azure Firewall.

Using Microsoft Sentinel as the glue for attack remediation

Each of Azure’s network security services is fully integrated with Microsoft Sentinel, a cloud-native security information and event management (SIEM) solution. However, the real power of Sentinel is in collecting security signals from these separate security services and analyzing them to create a centralized view of the attack landscape. Sentinel correlates events and creates incidents when anomalies are detected. It then automates the response to mitigate sophisticated attacks.

In our example case, when cybercriminals use a DDoS attack as a smokescreen for data theft, Sentinel detects the DDoS attack and uses the information it gathers on attack sources to prevent the next phases of the adversary lifecycle. By using remediation capabilities in Azure Firewall, and other network security services in the future, the attacking DDoS sources are blocked. This cross-product detection and remediation strengthens the security posture of the organization, with Sentinel as the orchestrator.

Automated detection and remediation of sophisticated attacks

Our new Azure DDoS Protection Solution for Sentinel provides a single consumable solution package that allows customers to achieve this level of automated detection and remediation. The solution includes the following components:

Azure DDoS Protection data connector and workbook.
Alert rules that help retrieve the source DDoS attackers. These are new rules we created specifically for this solution, and customers can also use them for other objectives in their security strategy.
A Remediation IP Playbook that automatically creates rules in Azure Firewall to block the source DDoS attackers. Although we document and demonstrate how to use Azure Firewall for remediation, any third-party firewall that has a Sentinel playbook can be used instead, giving customers the flexibility to use this new DDoS solution with any firewall.

The solution is initially released for Azure Firewall (or any third-party firewall), and we plan to enhance it to support Azure WAF soon.

Let’s see a couple of use cases for this cross-product attack remediation.

Use case #1: remediation with Azure Firewall

Let’s consider an organization that uses Azure DDoS Protection and Azure Firewall, and the attack scenario in the following figure:

An adversary controls a compromised bot. They start with a DDoS smokescreen attack targeting the resources in the organization’s virtual network. They then attempt to access network resources through scanning and phishing until they gain access to sensitive data.

Azure DDoS Protection detects the smokescreen attack and mitigates this volumetric network flood. In parallel, it starts sending log signals to Sentinel. Next, Sentinel retrieves the attacking IP addresses from the logs and deploys remediation rules in Azure Firewall. These rules will prevent any non-DDoS attack from reaching the resources in the virtual network, even after the DDoS attack ends and DDoS mitigation ceases.
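The correlate-then-remediate flow described above can be sketched in a few lines of Python. Note that the log schema and rule shape below are invented for illustration only; the real Azure DDoS Protection log schema and Azure Firewall rule format differ:

```python
# Hypothetical DDoS mitigation log records; the real Azure schema differs.
logs = [
    {"source_ip": "203.0.113.7", "action": "dropped"},
    {"source_ip": "203.0.113.7", "action": "dropped"},
    {"source_ip": "198.51.100.4", "action": "dropped"},
    {"source_ip": "192.0.2.10", "action": "forwarded"},
]

def extract_attack_sources(records):
    """Collect the unique source IPs whose traffic was dropped during mitigation."""
    return sorted({r["source_ip"] for r in records if r.get("action") == "dropped"})

def build_deny_rules(ips, rule_prefix="Sentinel-DDoS-Block"):
    """Turn attacker IPs into firewall deny-rule descriptions (illustrative shape)."""
    return [{"name": f"{rule_prefix}-{i}", "source": ip, "action": "Deny"}
            for i, ip in enumerate(ips, start=1)]

# Two deny rules result: one each for 198.51.100.4 and 203.0.113.7.
rules = build_deny_rules(extract_attack_sources(logs))
```

In the actual solution, the alert rules perform the extraction step with a query over the DDoS logs, and the playbook performs the rule-creation step against Azure Firewall.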

Use case #2: remediation with Azure WAF (coming soon)

Now, let’s consider another organization that runs a web application in Azure and uses Azure DDoS Protection and Azure WAF to protect it. The adversary’s objective in this case is to attack the web application and exfiltrate sensitive data, starting with a DDoS smokescreen attack and then launching web attacks on the application.

When Azure DDoS Protection service detects the volumetric smokescreen attack, it starts mitigating it, and signals logs to Sentinel. Sentinel retrieves the attack sources and applies remediation in Azure WAF to block future web attacks on the application.

Get started with Azure DDoS protection today

As attackers employ advanced multi-vector attack techniques during the adversary lifecycle, it’s important to harness security services as much as possible to automatically orchestrate attack detection and mitigation.

For this reason, we created the new Azure DDoS Protection solution for Microsoft Sentinel that helps organizations to protect their resources and applications better against these advanced attacks. We will continue to enhance this solution and add more security services and use cases.

Follow our step-by-step configuration guidance on how to deploy the new solution.
Source: Azure

Tune in today: Learn Live experts are ready to accelerate your skilling

There is a treasure trove of information available at Microsoft Learn, including self-paced lessons, assessments, and certifications waiting to be explored—for every skill level. Whether you’re kicking off your IT career or a seasoned pro looking to master new skills to stay competitive, anyone can get blocked on their path to educational growth. That’s where our interactive Learn Live video series comes to the rescue: instructors with deep knowledge of Microsoft technology walk participants through a wide array of Microsoft Learn modules in real time and answer your questions live. It’s like having your own virtual tutor for anything Azure.

Many learners benefit from instantaneous communication and feedback. Whether you tune in live or watch the videos on demand, our 60-to-90-minute Learn Live episodes take the time to guide learners through lessons and provide guidance on Microsoft Learn modules. Have a nagging uncertainty about evaluating classification models? Don’t know where to begin implementing your Cosmos DB SQL API account? Glean insights from a live Q&A hosted by professionals—sometimes from the very engineers who built the solutions you’re studying.  From Azure engineers to program managers, cloud advocates, and technical trainers, our team of experts is available for your questions during the episode as well as after through social media.

Your learning path is waiting

If your goal is to get certified in your role, Learn Live sessions are your chance to practice technical skills in an interactive environment with Azure experts and other developers from around the world. Every episode is closed-captioned and offered in several languages. If time or location keeps you from watching live, each episode can be watched on demand after it airs, and live airings are staggered to accommodate learners worldwide.

Register for one of our ongoing or upcoming series, or dive into our deep roster of previous episodes.

Current Learn Live series

Azure Core IaaS Study Hall

22 episodes; began Dec. 1, 2022, runs every week through May 18, 2023

Discover how to build infrastructure solutions with Azure infrastructure as a service (IaaS) services and products. Start maximizing the value of your IT investments by learning about highly secure, available, and scalable cloud services. Modernize your IT with enterprise-grade cloud infrastructure and migrate your apps with confidence.

Automate your Azure deployments by using Bicep and GitHub Actions

8 episodes; began Nov. 30, 2022, runs every week through Feb. 8, 2023

Gain all the benefits of infrastructure as code by using an automated workflow to deploy your Bicep templates and integrate other deployment activities with your workflows. You'll build workflows using GitHub Actions.

FastTrack for Azure, Live, and On-Demand Series

Beginning February 2023, runs every week through June 2023

Join expert Azure engineers in our regular virtual sessions, designed for Azure users to come together and discuss a specific Azure technology or theme in an informal, interactive, multi-customer environment.

On-Demand Learn Live series (see complete list here)

FastTrack for Azure, Season 1

13 episodes, ran Sept. 13-Dec. 15, 2022

Accelerate your Azure solution implementation with hands-on exercises and demos. This on-demand series will cover a variety of Azure solution areas as directly requested by customers.

Build mobile and desktop apps with .NET MAUI

7 episodes, ran Sept. 7-Nov. 16, 2022

Learn how to use .NET MAUI to build apps that run on mobile devices and on the desktop using C# and Visual Studio. You'll learn the fundamentals of building an app with .NET MAUI and more advanced topics such as local data storage and invoking REST-based web services.

Create machine learning models with R and tidymodels

4 episodes, ran Sept. 2-Sept. 23, 2022

Explore and analyze data by using R, and get an introduction to regression models, classification models, and clustering models by using tidymodels and R.

Azure Hybrid Cloud Study Hall

14 episodes

Learn how to configure, deploy, and manage your hybrid cloud resources using services and hybrid cloud technologies, and walk through Microsoft Learn modules focused on Azure Arc and Azure Stack HCI. You will learn how to manage your on-premises, edge, and multicloud resources and deploy Azure services anywhere with Azure Arc and Azure Stack.

Use Bicep to deploy your Azure infrastructure as code

15 episodes

Discover how to deploy Azure resources by using Bicep, a language and set of tools to help you to deploy your infrastructure as code. Bicep makes your deployments more consistent and repeatable.

Run VMware workloads on Azure VMware Solution

3 episodes (launched at the VMware Solution Event 2022)

Learn how to easily extend your VMware workloads, skills, and tools to the cloud with Azure VMware Solution—the cloud service that lets you run VMware infrastructure natively on Azure.

Hybrid Infrastructure Study Hall

7 episodes

Hone your skills in configuring advanced Windows Server services using on-premises, hybrid, and cloud technologies, and walk through Microsoft Learn modules related to the new Windows Server Hybrid Administrator Associate certification.

Start your learning journey into Azure AI with a Helping Hand (powered by Women in AI)

3 episodes

No matter your previous AI knowledge, this series will take you through what is available in Azure AI, Computer Vision, and Conversational AI. Through a partnership with Microsoft Certified, you will be well on your way to taking the Azure AI Fundamentals certification exam.

Create microservices with .NET and ASP.NET

8 episodes

Create independently deployable, highly scalable, and resilient services using the free and open-source .NET platform. In addition, learn how to develop microservices with .NET and ASP.NET.

Azure Cosmos DB certification study hall

24 episodes

Learn how to design, implement, and monitor cloud-native applications that store and manage data. Work on getting certified with the Azure Cosmos DB Developer Specialty certification.

Deploy your apps with Java on Azure using familiar tools and frameworks

3 episodes

Discover how you can build, migrate, and scale Java applications on Azure using the tools and frameworks you know and love, from Spring to Kubernetes to Java EE.

Additional video paths for growth

Learn Live is only part of the Microsoft Learn TV ecosystem—in other words, just one of the hit shows on the Learn TV network. Even with the extensive lineup of series provided by Learn Live, there is still more video content to discover on Learn TV.

Keep up to date on Azure tips, demos, and technical skill-building resources with episodes of Inside Azure for IT and stay on the forefront of key insights, tools, and best practices for optimizing all of your infrastructure for performance, cost efficiency, security, and reliability on Azure.

On the Azure Enablement Show, experts share technical advice, tips, and best practices to accelerate your cloud journey, build well-architected cloud apps, and optimize your solutions in Azure.

Finally, the SAP on Microsoft Azure video tutorial series provides technical guidance and enablement for customers and partners. Improve your cloud infrastructure skills with advanced guidance from Azure experts in this on-demand video series.
Source: Azure

Secure your application traffic with Application Gateway mTLS

I am happy to share that Azure Application Gateway now supports mutual transport layer security (mTLS) and online certificate status protocol (OCSP). This was one of the top requests from our customers looking for more secure communication options for their cloud workloads. Here, I cover what mTLS is, how it works, when to consider it, and how to verify it in Application Gateway.

What is mTLS?

Mutual transport layer security (mTLS) is a communication process in which both parties verify and authenticate each other’s digital certificates before setting up an encrypted TLS connection. mTLS is an extension of the standard TLS protocol, and it provides an additional layer of security over TLS. With traditional TLS, the server is authenticated but the client is not. This means that anyone can connect to the server and initiate a secure connection, even if the client or user is not authorized to do so. By using mTLS, you ensure that both the client and the server authenticate each other before the secure connection is established, so that no unauthorized access is possible on either side. mTLS works on the framework of zero trust—never trust, always verify—which ensures that no connection is trusted automatically.

How does mTLS work?

mTLS works by using a combination of secure digital certificates and private keys to authenticate both the client and the server. The client and the server each have their own digital certificate and private key, which are used to establish trust and a secure connection. The client verifies the server's certificate, and the server verifies the client's certificate—this ensures that both parties are who they claim to be.
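As a minimal sketch of that server-side requirement, here is how Python's standard-library `ssl` module distinguishes plain TLS from mTLS (the certificate file names are hypothetical placeholders, not an Application Gateway configuration):

```python
import ssl

def require_client_certs(ctx, client_ca=None):
    """The extra step that turns a plain TLS server context into an mTLS one."""
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails unless the client presents a valid cert
    if client_ca:
        ctx.load_verify_locations(cafile=client_ca)  # CA used to validate client certs
    return ctx

def make_mtls_server_context(cert="server.crt", key="server.key",
                             client_ca="client-ca.crt"):
    """Server context: prove our own identity AND demand the client's."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert, keyfile=key)  # what the server presents to clients
    return require_client_certs(ctx, client_ca)
```

With plain TLS, `verify_mode` stays at its default of `ssl.CERT_NONE`, so any client can connect; setting it to `ssl.CERT_REQUIRED` is what makes the handshake fail unless the client proves its identity too.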

How are TLS and mTLS different?

The TLS and mTLS protocols are both used to encrypt network communication between client and server. In the TLS protocol, only the client verifies the validity of the server before establishing encrypted communication; the server does not validate the client during the TLS handshake. mTLS, on the other hand, is a variation of TLS that requires mutual authentication between client and server: both must present a valid certificate before the encrypted connection can be established. This makes mTLS more secure than TLS, since it validates the authenticity of both client and server.

TLS call flow:

mTLS call flow:


When to consider mTLS

mTLS is useful where organizations follow a zero-trust approach, so that a server can verify the validity of the specific client or device that wants to use its information. For example, an organization may have a web application that employees or clients use to access very sensitive information, such as financial data, medical records, or personal information. By using mTLS, the organization can ensure that only authorized employees, clients, or devices are able to access the web application and the sensitive information it contains.
Internet of Things (IoT) devices talk to each other with mTLS. Each IoT device presents its own certificate to each other to get authenticated.
Most new applications are built on a microservices-based architecture. Microservices communicate with each other via application programming interfaces (APIs); by using mTLS, you can make sure that API communication is secure and that malicious clients are not communicating with your APIs.
To prevent various attacks, such as brute force or credential stuffing. If an attacker obtains a leaked password, or a bot tries to force its way in with random passwords, it will be of no use: without a valid client certificate, the attacker will not be able to complete the TLS handshake.

At a high level, you now understand what mTLS is and how it offers more secure communication by following the zero trust security model. If you are new to Application Gateway and have never set up TLS in it, follow the link to create an Application Gateway and backend servers. This tutorial uses self-signed certificates for demonstration purposes; for a production environment, use publicly trusted CA-signed certificates. Once end-to-end TLS is set up, you can follow this link to set up mTLS. To test this setup, the prerequisites are having the OpenSSL and curl tools installed on your machine and having access to the client certificate and client private key.

Let’s dive into how to test mTLS on Application Gateway. In the command below, the client's private key is used to create a signature for the Certificate Verify message. The private key never leaves the client device during the mTLS handshake.

Verify your mTLS setup by using curl/openssl

curl -vk https://<yourdomain.com> --key client.key --cert client.crt
<yourdomain.com> -> Your domain address
client.key -> Client’s private key
client.crt -> Client certificate

This command verifies whether mTLS is correctly set up. If it is, the server will request the client certificate during the TLS handshake. Next in the handshake, verify that the client has presented a client certificate along with the Certificate Verify message. If the client certificate is valid, the handshake succeeds and the application responds with an HTTP "200" response.

If the client certificate is not signed by the root CA file that was uploaded as per the link in step 8, the handshake will fail and the server will reject the connection.

Alternatively, you can verify the mTLS connectivity with an OpenSSL command.

openssl s_client -connect <IPaddress>:443 -key client.key -cert client.crt

Once the SSL connection is established, type the following request:

GET / HTTP/1.1

Host: <IP of host>

You should get response code 200, which validates that mutual authentication was successful.
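If you prefer scripting the same check, here is a small Python sketch using only the standard library that mirrors the curl and openssl commands above (the host and certificate file names are placeholders):

```python
import http.client
import ssl

def mtls_get(host, path="/", client_cert="client.crt",
             client_key="client.key", ca="ca.crt"):
    """GET over mTLS: present a client certificate during the handshake."""
    ctx = ssl.create_default_context(cafile=ca)                    # trust the server's CA
    ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)  # our side of mTLS
    conn = http.client.HTTPSConnection(host, 443, context=ctx)
    try:
        conn.request("GET", path, headers={"Host": host})
        return conn.getresponse().status  # expect 200 when mutual auth succeeds
    finally:
        conn.close()
```

Libraries such as requests expose the same idea via a `cert=(client_cert, client_key)` parameter; either way, the client's private key never leaves the machine.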

Conclusion

I hope you have learned what mTLS is, what problem it solves, how to set it up in Application Gateway, and how to validate the setup. It is one of several great features of Application Gateway that give our customers an extra layer of security for the use cases discussed above. One thing to note is that Application Gateway currently supports mTLS on the frontend only (between the client and Application Gateway). If your backend server expects a client certificate during SSL negotiation between Application Gateway and the backend server, that request will fail. If you want to learn how to send certificates to the backend application via an HTTP header, watch for the next blog in our mTLS series, where I will cover how to use the Rewrite feature to send the client certificate as an HTTP header and how to do OCSP validation of the client certificate.

Learn more and get started with Azure Application Gateway

What is Azure Application Gateway | Microsoft Learn

Overview of mutual authentication on Azure Application Gateway | Microsoft Learn

Frequently asked questions about Azure Application Gateway | Microsoft Learn
Source: Azure

Roundup of AI breakthroughs by Microsoft and NVIDIA

In terms of AI developments, 2022 proved to be a banner year. AI production and usage increased, and both deep learning and machine learning models steadily conquered increasingly complex problems. But AI evolution goes beyond impressive production and adoption levels to include breakthroughs in new computing methods, applications, and mega-scale supercomputers. Two of the leaders in this space are Microsoft Azure and NVIDIA, and here are some of their most noteworthy achievements this year.

Massive cloud AI supercomputer for mega-sized AI models

The two computing giants are collaborating to provide one of the most powerful AI supercomputing platforms in the world. Powered by Microsoft Azure’s advanced supercomputing infrastructure combined with NVIDIA GPUs, networking, and the full stack of AI software, this cloud supercomputer will help enterprises train, deploy, and scale AI, including large, state-of-the-art models.

Azure is the first public cloud to incorporate NVIDIA’s advanced AI stack, adding tens of thousands of NVIDIA A100 and H100 GPUs, NVIDIA Quantum-2 400Gb/s InfiniBand networking, and the NVIDIA AI enterprise software suite to its platform.

This effort isn’t limited to spinning up better outputs in lean times—there’s also the need to advance knowledge and discoveries to benefit all of mankind’s endeavors and needs. All told, it’s an impressive AI system that performs at scales rarely imagined to be possible.

A quantum leap for AI-fueled scientific discoveries

AI can do far more than traditional data mining and advanced analytics. AI is dramatically changing scientific methods, handling complexities that a human mind cannot comprehend.

At Supercomputing 2022 (SC22), NVIDIA announced broad adoption of its next-generation H100 Tensor Core GPUs and Quantum-2 InfiniBand, including new offerings on Microsoft Azure cloud and over 50 new partner systems for accelerating scientific discovery.

H100 and Quantum-2 are part of NVIDIA’s high-performance computing (HPC) platform—a full technology stack with CPUs, GPUs, DPUs, systems, networking, and a broad range of AI and HPC software—that provides researchers the ability to efficiently accelerate their work on powerful systems, on-premises or in the cloud.

Microsoft Azure will be the first to offer NVIDIA Quantum-2 for HPC Workloads, providing a world-class supercomputing cloud infrastructure that allows researchers and scientists using Azure to achieve their life’s work.

Innovation is the lifeblood of all industries and this Microsoft Azure and NVIDIA breakthrough puts it at the fingertips of all users.

AI brings medical imagery diagnostics into sharper focus

A powerful collaboration between Microsoft Azure, NVIDIA, and the Nuance Precision Imaging Network puts AI-based medical image diagnostic tools directly into the hands of radiologists and other clinicians. This enables the capture of economies at scale, meaning patient care improves while costs drop.

Powered by Microsoft Azure, the Nuance Precision Imaging Network provides access to an entire ecosystem of AI-powered tools and insights within clinical workflows to more than 12,000 healthcare facilities and the 80 percent of U.S. radiologists who use Nuance's PowerScribe radiology reporting and PowerShare image sharing solutions1. 

Mass General Brigham will be among the first to accelerate end-to-end AI model development and deployment in clinical workflows on the Nuance Precision Imaging Network.

Wrapping up the roundup

Azure collaborates with Hazy Research and NVIDIA to achieve unmatched MLPerf results. Azure-Hazy Research is the only submitter that reached below the 2-minute mark with BERT on 16 virtual machines.
Azure scales 530B Parameter GPT-3 Model with NVIDIA NeMo Megatron. Combining NVIDIA NeMo Megatron with Azure AI infrastructure offers a powerful platform that anyone can spin up in minutes without having to incur the costs and burden of managing their own on-premises infrastructure.
Azure and NVIDIA scored Top500 and Green500 ranking for AI purpose-built infrastructure
HPCwire bestowed Microsoft Azure and NVIDIA top awards in Best Use of HPC in Financial Services Editors’ Choice, Best AI Product or Technology—Readers’ Choice, and Best Use of High-Performance Data Analytics & Artificial Intelligence.
At Supercomputing 2022, Nidhi Chappell, General Manager, Microsoft Azure HPC + AI, and Ian Buck, VP/General Manager Hyperscale and HPC Computing, NVIDIA discussed game changers in AI and cloud infrastructure. And NVIDIA founder and CEO Jensen Huang shared the latest news, innovations, and technologies in NVIDIA’s SC22 Special Address.

Stay tuned for our 2023 advancements

Keep up to date with Microsoft and NVIDIA. Visit our HPCwire Solution Channel.

Sources: NVIDIA Teams With Microsoft to Build Massive Cloud AI Computer, NVIDIA news, November 16, 2022.
 
Source: Azure

Azure Red Hat OpenShift for Microsoft Azure Government—now generally available

Today we’re pleased to announce the general availability of Azure Red Hat OpenShift on Microsoft Azure Government. With this release, we are combining world-class Azure infrastructure with a leading enterprise Kubernetes platform as a jointly operated and supported service for Azure Government customers to run their containerized workloads in production.

Azure Red Hat OpenShift (ARO) on Azure Government enables compliance with strict government regulations and certifications, such as FedRAMP and CJIS, which makes it a secure and compliant option for running containerized workloads at production scale. Agencies can take advantage of the stringent security and compliance features of Azure Government and leverage the flexibility and scalability of OpenShift.

Azure Red Hat OpenShift for Azure Government includes key IT security and regulatory certifications, including:

FedRAMP High Authorization.
International Traffic in Arms Regulations (ITAR).
Defense Federal Acquisition Regulation Supplement (DFARS).
Internal Revenue Service (IRS) 1075 forms.
Criminal Justice Information Services (CJIS).

As a managed service, Azure Red Hat OpenShift also offers several benefits for agencies looking to innovate, including:

Scalability: OpenShift provides automatic scaling, self-healing, and rolling updates, which help to ensure that applications can handle increased loads and recover from failures quickly.
Flexibility: OpenShift allows developers to use their preferred languages, frameworks, and tools to build and deploy containerized applications, making it easy to work with existing applications and technologies.
Enterprise-grade management: OpenShift provides a centralized management console, role-based access control, and built-in monitoring and logging, making it easy for IT teams to manage and troubleshoot containerized applications.
Interoperability: Azure Red Hat OpenShift runs on top of Azure and integrates with other Azure services, such as Azure Database for PostgreSQL, Azure Cosmos DB, and Azure Virtual Network, making it easy to build and deploy applications that leverage the full range of Azure services.
Support: Azure Red Hat OpenShift is a jointly managed product, which means that it is supported by both Microsoft and Red Hat, providing customers with access to the expertise and resources of both companies.

Launched in 2019, Azure Red Hat OpenShift was the first codeveloped, jointly operated Red Hat OpenShift service on the public cloud, offering a powerful on-ramp to the hybrid cloud by extending the same enterprise-grade Kubernetes used in private datacenters to the scale of Microsoft Azure.

Get started with Azure Red Hat OpenShift

Kickstart your Azure Red Hat OpenShift journey in Azure Government.
Check out our documentation and the tutorial on how to create an Azure Red Hat OpenShift 4 cluster.
Stay in touch with us on our GitHub and follow our roadmap.
Connect with us on Q&A, we would love to hear from you.

Source: Azure

Scale Azure Firewall SNAT ports with NAT Gateway for large workloads

This post was co-authored by Suren Jamiyanaa, Product Manager II, Azure Networking.

As large organizations across all industries expand their cloud business and operations, one core criterion for their cloud infrastructure is the ability to make connections over the internet at scale. However, a common outbound connectivity issue encountered when handling large-scale outbound traffic is source network address translation (SNAT) port exhaustion. Each time a new connection is made to the same destination endpoint over the internet, a new SNAT port is used. SNAT port exhaustion occurs when all available SNAT ports run out. Environments that often require making many connections to the same destination, such as accessing a database hosted in a service provider’s data center, are susceptible to SNAT port exhaustion. When connecting outbound to the internet, customers need to consider not only potential risks such as SNAT port exhaustion but also how to secure their outbound traffic.

Azure Firewall is an intelligent security service that protects cloud infrastructures against new and emerging attacks by filtering network traffic. All outbound internet traffic using Azure Firewall is inspected, secured, and undergoes SNAT to conceal the original client IP address. To bolster outbound connectivity, Azure Firewall can be scaled out by associating multiple public IPs with the firewall. Some large-scale environments may require manually associating up to hundreds of public IPs with Azure Firewall in order to meet the demand of large-scale workloads, which can be a challenge to manage long-term. Partner destinations also commonly have a limit on the number of IPs that can be whitelisted at their destination sites, which can create challenges when firewall outbound connectivity needs to be scaled out with many public IPs. Without scaling this outbound connectivity, customers are more susceptible to outbound connectivity failures due to SNAT port exhaustion.

This is where network address translation (NAT) gateway comes in. NAT gateway can be easily deployed to an Azure Firewall subnet to automatically scale connections and filter traffic through the firewall before connecting to the internet. NAT gateway not only provides a larger SNAT port inventory with fewer public IPs, but its unique method of SNAT port allocation is also specifically designed to handle dynamic and large-scale workloads. NAT gateway’s dynamic allocation and randomized selection of SNAT ports significantly reduce the risk of SNAT port exhaustion while also keeping overhead management of public IPs at a minimum.

In this blog, we’ll explore the benefits of using NAT Gateway with Azure Firewall as well as how to integrate both into your architecture to ensure you have the best setup for meeting your security and scalability needs for outbound connectivity.

Benefits of using NAT Gateway with Azure Firewall

One of the greatest benefits of integrating NAT gateway into your Firewall architecture is the scalability that it provides for outbound connectivity. SNAT ports are a key component to making new connections over the internet and distinguishing different connections from one another coming from the same source endpoint. NAT gateway provides 64,512 SNAT ports per public IP and can scale out to use 16 public IP addresses. This means, when fully scaled out with 16 public IP addresses, NAT gateway provides over 1 million SNAT ports. Azure Firewall, on the other hand, supports 2,496 SNAT ports per public IP per virtual machine instance within a virtual machine scale set (minimum of 2 instances). This means that to achieve the same volume of SNAT port inventory as NAT gateway when fully scaled out, Firewall may require up to 200 public IPs. Not only does NAT gateway offer more SNAT ports with fewer public IPs, but these SNAT ports are allocated on demand to any virtual machine in a subnet. On-demand SNAT port allocation is key to how NAT gateway significantly reduces the risk of common outbound connectivity issues like SNAT port exhaustion.
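The port counts above can be checked with some quick arithmetic (all figures are taken from this post; the script is purely illustrative):

```shell
#!/usr/bin/env bash
# Back-of-the-envelope SNAT capacity comparison using the figures quoted in this post.

natgw_ports_per_ip=64512   # SNAT ports per public IP on NAT gateway
natgw_max_ips=16           # maximum public IPs per NAT gateway
natgw_total=$((natgw_ports_per_ip * natgw_max_ips))
echo "NAT gateway, fully scaled out: ${natgw_total} SNAT ports"   # "over 1 million"

fw_ports_per_instance=2496  # Azure Firewall SNAT ports per public IP per VM instance
fw_min_instances=2          # minimum virtual machine scale set size
fw_ports_per_ip=$((fw_ports_per_instance * fw_min_instances))
echo "Azure Firewall at minimum scale: ${fw_ports_per_ip} SNAT ports per public IP"
```

At minimum firewall scale, matching the fully scaled NAT gateway inventory takes on the order of two hundred public IPs, which is where the "up to 200 public IPs" figure comes from.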

NAT gateway also provides 50 Gbps of data throughput for outbound traffic that can be used in line with a standard SKU Azure Firewall, which provides 30 Gbps of data throughput. Premium SKU Azure Firewall provides 100 Gbps of data throughput.

With NAT gateway you also ensure that your outbound traffic is entirely secure since no inbound traffic can get through NAT gateway. All inbound traffic is subject to security rules enabled on the Azure Firewall before it can reach any private resources within your cloud infrastructure.

To learn more about the other benefits that NAT gateway offers in Azure Firewall architectures, see NAT gateway integration with Azure Firewall.

How to get the most out of using NAT Gateway with Azure Firewall

Let’s take a look at how to set up NAT gateway with Azure Firewall and how connectivity to and from the internet works upon integrating both into your cloud architecture.

Production-ready outbound connectivity with NAT Gateway and Azure Firewall

For production workloads, Azure recommends separating Azure Firewall and production workloads into a hub and spoke topology. Introducing NAT gateway into this setup is simple and can be done in just a couple of short steps. First, deploy Azure Firewall to an Azure Firewall Subnet within the hub virtual network (VNet). Attach NAT gateway to the Azure Firewall Subnet and add up to 16 public IP addresses, and you’re done. Once configured, NAT gateway becomes the default route for all outbound traffic from the Azure Firewall Subnet. This means that internet-directed traffic (traffic with the prefix 0.0.0.0/0) routed from the spoke VNets to the hub VNet’s Azure Firewall Subnet will automatically use the NAT gateway to connect outbound. Because NAT gateway is fully managed by Azure, it allocates SNAT ports and scales to meet your outbound connectivity needs automatically. No additional configuration is required.

 

Figure: Separate the Azure Firewall from the production workloads in a hub and spoke topology and attach NAT gateway to the Azure Firewall Subnet in the hub virtual network. Once configured, all outbound traffic from your spoke virtual networks is directed through NAT gateway and all return traffic is directed back to the Azure Firewall Public IP to maintain flow symmetry. 

How to set up NAT Gateway with Azure Firewall

To ensure that you have set up your workloads to route to the Azure Firewall Subnet and use NAT gateway for connecting outbound, follow these steps:

Deploy your Firewall to an Azure Firewall Subnet within its own virtual network. This will be the hub VNet.
Add NAT gateway to the Azure Firewall Subnet and attach at least one public IP address.
Deploy your workloads to subnets in separate virtual networks. These virtual networks will be the spokes. Create as many spoke VNets for your workload as needed.
Set up VNet peering between the hub and spoke VNets.
Add a route to the spoke subnets that sends 0.0.0.0/0 internet-bound traffic to the Azure Firewall.
Add a network rule to the Firewall policy to allow traffic from the spoke VNets to the internet.

Refer to this tutorial for step-by-step guidance on how to deploy NAT gateway and Azure Firewall in a hub and spoke topology.
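The steps above might be sketched with the Azure CLI roughly as follows. This is an illustrative outline only: the resource group, region, resource names, and address prefixes are placeholders of our own choosing, not values from this post, and running it requires an Azure subscription and the azure-firewall CLI extension.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout; only the subnet name AzureFirewallSubnet is required by Azure.
RG=my-rg; LOC=eastus

# 1. Hub VNet with the Azure Firewall Subnet, plus the firewall and its public IP
az network vnet create -g $RG -n hub-vnet -l $LOC \
  --address-prefix 10.0.0.0/16 \
  --subnet-name AzureFirewallSubnet --subnet-prefixes 10.0.1.0/24
az network public-ip create -g $RG -n fw-pip --sku Standard
az network firewall create -g $RG -n hub-fw -l $LOC
az network firewall ip-config create -g $RG -f hub-fw -n fw-ipcfg \
  --public-ip-address fw-pip --vnet-name hub-vnet

# 2. NAT gateway with one public IP, attached to the Azure Firewall Subnet
az network public-ip create -g $RG -n natgw-pip --sku Standard
az network nat gateway create -g $RG -n hub-natgw --public-ip-addresses natgw-pip
az network vnet subnet update -g $RG --vnet-name hub-vnet \
  -n AzureFirewallSubnet --nat-gateway hub-natgw

# 3. A spoke VNet holding the workload subnet
az network vnet create -g $RG -n spoke-vnet -l $LOC \
  --address-prefix 10.1.0.0/16 \
  --subnet-name workload-subnet --subnet-prefixes 10.1.1.0/24

# 4. Peer hub and spoke in both directions
az network vnet peering create -g $RG -n hub-to-spoke \
  --vnet-name hub-vnet --remote-vnet spoke-vnet --allow-vnet-access
az network vnet peering create -g $RG -n spoke-to-hub \
  --vnet-name spoke-vnet --remote-vnet hub-vnet --allow-vnet-access

# 5. Route 0.0.0.0/0 from the spoke subnet to the firewall's private IP
az network route-table create -g $RG -n spoke-rt
az network route-table route create -g $RG --route-table-name spoke-rt \
  -n default-to-fw --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance --next-hop-ip-address 10.0.1.4
az network vnet subnet update -g $RG --vnet-name spoke-vnet \
  -n workload-subnet --route-table spoke-rt

# 6. Firewall network rule allowing spoke traffic out to the internet
az network firewall network-rule create -g $RG -f hub-fw \
  --collection-name allow-outbound --name spoke-to-internet --priority 100 \
  --action Allow --protocols Any \
  --source-addresses 10.1.0.0/16 --destination-addresses '*' --destination-ports '*'
```

Note that the firewall's private IP (10.0.1.4 here) should be read back from the deployed firewall rather than hard-coded; it is shown inline only to keep the sketch short.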

Once NAT gateway is deployed to the Azure Firewall Subnet, all outbound traffic is directed through the NAT gateway. Normally, NAT gateway also receives any return traffic. However, in the presence of Azure Firewall, NAT gateway is used for outbound traffic only. All inbound and return traffic is directed through the Azure Firewall in order to ensure traffic flow symmetry.

FAQ

Can NAT gateway be used in a secure hub virtual network architecture with Azure Firewall?

No, NAT gateway is not supported in a secure hub (vWAN) architecture. A hub virtual network architecture as described above must be used instead.

How does NAT gateway work with a zone-redundant Azure Firewall?

NAT gateway is a zonal resource that can provide outbound connectivity from a single zone for a virtual network, regardless of whether it is used with a zonal or zone-redundant Azure Firewall. To learn more about how to optimize your availability zone deployments with NAT gateway, refer to our last blog.

Benefits of NAT Gateway with Azure Firewall

When it comes to providing outbound connectivity to the internet from cloud architectures using Azure Firewall, look no further than NAT gateway. The benefits of using NAT gateway with Azure Firewall include:

Simple configuration. Attach NAT gateway to the Azure Firewall Subnet in a matter of minutes and start connecting outbound right away. No additional configurations required.
Fully managed by Azure. NAT gateway is fully managed by Azure and automatically scales to meet the demand of your workload.
Requires fewer static public IPs. NAT gateway can be associated with up to 16 static public IP addresses which allows for easy whitelisting at destination endpoints and simpler management of downstream IP filtering rules.
Provides a greater volume of SNAT ports for connecting outbound. NAT gateway can scale to over 1 million SNAT ports when configured to 16 public IP addresses.
Dynamic SNAT port allocation ensures that the full inventory of SNAT ports is available to every virtual machine in your workload. This in turn helps to significantly reduce the risk of SNAT port exhaustion that is common with other SNAT methods.
Secure outbound connectivity. Ensures that no inbound traffic from the internet can reach private resources within your Azure network. All inbound and response traffic is subject to security rules on the Azure Firewall.
Higher data throughput. A standard SKU NAT gateway provides 50 Gbps of data throughput. A standard SKU Azure Firewall provides 30 Gbps of data throughput.

Learn more

For more information on NAT Gateway, Azure Firewall, and how to integrate both into your architectural setup, see:

What is Azure Virtual Network NAT?
Azure Firewall documentation.
Scale SNAT ports with Azure Virtual Network NAT.
Integrate NAT gateway with Azure Firewall in a hub and spoke network.

Source: Azure