Achieve higher performance and cost savings on Azure with virtual machine bursting

Selecting the right combination of virtual machines (VMs) and disks is extremely important, as the wrong mix can impact your application’s performance. One way to choose which VMs and disks to use is based on your disk performance pattern, but that’s not always easy. For example, a common scenario is unexpected or cyclical disk traffic where the peak disk performance is temporary and significantly higher than the baseline. We frequently get asked by our customers, "should I provision my VM for baseline or peak performance?" Over-provisioning can lead to higher costs, while under-provisioning can result in poor application performance and customer dissatisfaction. Azure Disk Storage now makes it easier for you to decide, and we’re pleased to announce VM bursting support on Azure virtual machines.

Get short-term, higher performance with no additional steps or costs

VM bursting, which is enabled by default, lets your virtual machine instance achieve higher throughput for a short duration at no additional cost and with no additional steps. Currently available on all Lsv2-series VMs in all supported regions, VM bursting is great for a wide range of scenarios, like smoothly handling unforeseen spiky disk traffic or processing batch jobs with speed. With VM bursting, you can see up to an 8X improvement in throughput. Additionally, you can combine VM and disk bursting (generally available in April) to get higher performance on your VM or disks without overprovisioning. If you have workloads running on-premises with unpredictable or cyclical disk traffic, you can migrate to Azure and take advantage of our VM bursting support to improve your application performance.

Bursting flow

VM bursting is governed by a credit-based system. Your VM starts with a full allotment of credits, which allow it to burst for 30 minutes at the maximum burst rate. Bursting credits accumulate when your VM instance is running below its performance limits, and are consumed when it is running above them. For detailed examples of how bursting works, check out the disk bursting documentation.
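The credit model described above behaves like a token bucket. The sketch below illustrates the accounting in Python; the numbers mirror the Standard_L8s_v2 row of the table further down, but the exact accounting Azure uses is defined in the disk bursting documentation, so treat this as an illustration only:

```python
def simulate_bursting(demand_mbps, baseline=160, burst=1280, burst_minutes=30):
    """Illustrative credit-bucket model of VM bursting.

    demand_mbps: list of requested throughput values, one per minute.
    Returns the achieved throughput per minute. Credits are tracked in MB
    of headroom above the baseline rate.
    """
    max_credits = (burst - baseline) * burst_minutes * 60  # full bucket
    credits = max_credits  # a VM starts with a full allotment
    achieved = []
    for d in demand_mbps:
        if d <= baseline:
            # Running below baseline: accrue credits, capped at the maximum.
            credits = min(max_credits, credits + (baseline - d) * 60)
            achieved.append(d)
        else:
            # Bursting: spend credits on throughput above the baseline.
            wanted = min(d, burst) - baseline
            granted = min(wanted, credits / 60)
            credits -= granted * 60
            achieved.append(baseline + granted)
    return achieved
```

With a full bucket, a sustained maximum-rate burst lasts exactly 30 minutes; at minute 31 the model falls back to the baseline rate.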

Benefits of virtual machine bursting

Cost savings: If your daily peak performance time is less than the burst duration, you can use bursting VMs or disks as a cost-effective solution. You can build your VM and disk combination so the bursting limits match the required peak performance and the baseline limits match the average performance.
Preparedness for traffic spikes: Web servers and their applications can experience traffic surges at any time. If your web server is backed by VMs or disks using bursting, the servers are better equipped to handle traffic spikes.
Handling batch jobs: Some workloads are cyclical in nature, requiring baseline performance most of the time and higher performance only for short periods. An example is an accounting program that processes transactions daily with a small amount of disk traffic, but at the end of the month runs reconciliation reports that need a much higher amount of disk traffic.

Get started with disk bursting

Create new virtual machines on burst-supported VM sizes using the Azure portal, PowerShell, or the command-line interface (CLI) now. Bursting comes enabled by default on VMs that support it, so you don't need to do anything but deploy the instance to get the benefits. Any of your existing VMs that support bursting will have the capability enabled automatically. You can find the specifications of burst-eligible virtual machines in the table below. The bursting feature is available in all regions where Lsv2-series VMs are available.

Size             | Uncached data disk throughput (MB/s) | Max burst uncached data disk throughput (MB/s)
-----------------|--------------------------------------|-----------------------------------------------
Standard_L8s_v2  | 160                                  | 1280
Standard_L16s_v2 | 320                                  | 1280
Standard_L32s_v2 | 640                                  | 1280
Standard_L48s_v2 | 960                                  | 2000
Standard_L64s_v2 | 1280                                 | 2000
Standard_L80s_v2 | 1400                                 | 2000
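The "up to 8X" figure quoted earlier falls out of this table directly: the burst multiple for each size is the max burst throughput divided by the baseline, and the smallest size gains the most.

```python
# Baseline and max burst uncached data disk throughput (MB/s), from the table.
sizes = {
    "Standard_L8s_v2": (160, 1280),
    "Standard_L16s_v2": (320, 1280),
    "Standard_L32s_v2": (640, 1280),
    "Standard_L48s_v2": (960, 2000),
    "Standard_L64s_v2": (1280, 2000),
    "Standard_L80s_v2": (1400, 2000),
}

# Burst multiple = max burst / baseline.
multiples = {name: round(burst / base, 2) for name, (base, burst) in sizes.items()}
# Standard_L8s_v2 bursts at 8x its baseline; larger sizes gain proportionally less.
```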

Next steps

Support for more VM types as well as IOPS bursting on VMs will be available soon.

If you’d like to learn more about how the bursting feature works for both our virtual machines and disks, check out the disk bursting documentation.

Please email us at AzureDisks@microsoft.com to share your feedback on our bursting feature, or leave a post in the Azure Storage feedback forum.
Source: Azure

Cost optimization strategies for cloud-native application development

Today, we’ll explore some strategies that you can leverage on Azure to optimize your cloud-native application development process using Azure Kubernetes Service (AKS) and managed databases, such as Azure Cosmos DB and Azure Database for PostgreSQL.

Optimize compute resources with Azure Kubernetes Service

AKS makes it simple to deploy a managed Kubernetes cluster in Azure. AKS reduces the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility to Azure. As a managed Kubernetes service, Azure handles critical tasks like health monitoring and maintenance for you.

When you’re using AKS to deploy your container workloads, there are a few strategies to save costs and optimize the way you run development and testing environments.

Create multiple user node pools and enable scale to zero

In AKS, nodes of the same configuration are grouped together into node pools. To support applications that have different compute or storage demands, you can create additional user node pools. User node pools serve the primary purpose of hosting your application pods. For example, you can use these additional user node pools to provide GPUs for compute-intensive applications or access to high-performance SSD storage.

When you have multiple node pools, which run on virtual machine scale sets, you can configure the cluster autoscaler to set the minimum number of nodes, and you can also manually scale down the node pool size to zero when it is not needed, for example, outside of working hours.

For more information, learn how to manage node pools in AKS.

Spot node pools with cluster autoscaler

A spot node pool in AKS is a node pool backed by a virtual machine scale set running spot virtual machines. Using spot VMs allows you to take advantage of unused capacity in Azure at significant cost savings. Spot instances are great for workloads that can handle interruptions like batch processing jobs and developer and test environments.

When you create a spot node pool, you can define the maximum price you want to pay per hour, as well as enable the cluster autoscaler, which is recommended for spot node pools. Based on the workloads running in your cluster, the cluster autoscaler scales the number of nodes in the node pool up and down. For spot node pools, the cluster autoscaler will scale up the number of nodes after an eviction if additional nodes are still needed.

Follow the documentation for more details and guidance on how to add a spot node pool to an AKS cluster.

Enforce Kubernetes resource quotas using Azure Policy

Apply Kubernetes resource quotas at the namespace level and monitor resource usage to adjust quotas as needed. This provides a way to reserve and limit resources across a development team or project. These quotas are defined on a namespace and can be used to set quotas for compute resources, such as CPU and memory, GPUs, or storage resources. Quotas for storage resources include the total number of volumes or amount of disk space for a given storage class and object count, such as a maximum number of secrets, services, or jobs that can be created.
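A namespace-level quota like the one described can be expressed as a standard Kubernetes ResourceQuota manifest. The namespace name and the limits below are illustrative values, not recommendations:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-team-quota
  namespace: dev-team        # hypothetical namespace
spec:
  hard:
    requests.cpu: "4"        # total CPU requested across all pods
    requests.memory: 8Gi
    limits.cpu: "8"          # total CPU limits across all pods
    limits.memory: 16Gi
    persistentvolumeclaims: "5"
    secrets: "20"
```

Applying this with `kubectl apply -f quota.yaml` caps the aggregate compute, storage, and object counts for everything created in that namespace.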

Azure Policy integrates with AKS through built-in policies to apply at-scale enforcements and safeguards on your cluster in a centralized, consistent manner. When you enable the Azure Policy add-on, it checks with Azure Policy for assignments to the AKS cluster, downloads and caches the policy details, runs a full scan, and enforces the policies.

Follow the documentation to enable the Azure Policy add-on on your cluster and apply the Ensure CPU and memory resource limits policy which ensures CPU and memory resource limits are defined on containers in an Azure Kubernetes Service cluster.

Optimize the data tier with Azure Cosmos DB

Azure Cosmos DB is Microsoft's fast NoSQL database with open APIs for any scale. A fully managed service, Azure Cosmos DB offers guaranteed speed and performance with service-level agreements (SLAs) for single-digit millisecond latency and 99.999 percent availability, along with instant and elastic scalability worldwide. With the click of a button, Azure Cosmos DB enables your data to be replicated across all Azure regions worldwide, and it supports a variety of open-source APIs including MongoDB, Cassandra, and Gremlin.

When you’re using Azure Cosmos DB as part of your development and testing environment, there are a few ways you can save some costs. With Azure Cosmos DB, you pay for provisioned throughput (Request Units, RUs) and the storage that you consume (GBs).

Use the Azure Cosmos DB free tier

Azure Cosmos DB free tier makes it easy to get started, develop, and test your applications, or even run small production workloads for free. When a free tier is enabled on an account, you'll get the first 400 RUs per second (RU/s) throughput and 5 GB of storage. You can also create a shared throughput database with 25 containers that share 400 RU/s at the database level, all covered by free tier (limit 5 shared throughput databases in a free tier account). Free tier lasts indefinitely for the lifetime of the account and comes with all the benefits and features of a regular Azure Cosmos DB account, including unlimited storage and throughput (RU/s), SLAs, high availability, turnkey global distribution in all Azure regions, and more.

Try Azure Cosmos DB for free.

Autoscale provisioned throughput with Azure Cosmos DB

Provisioned throughput can automatically scale up or down in response to application patterns. Once a throughput maximum is set, Azure Cosmos DB containers and databases will automatically and instantly scale provisioned throughput based on application needs.

Autoscale removes the requirement for capacity planning and management while maintaining SLAs. For that reason, it is ideally suited for scenarios of highly variable and unpredictable workloads with peaks in activity. It is also suitable for when you’re deploying a new application and you’re unsure about how much provisioned throughput you need. For development and test databases, Azure Cosmos DB containers will scale down to a pre-set minimum (starting at 400 RU/s or 10 percent of maximum) when not in use. Autoscale can also be paired with the free tier.
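The scale-down floor described above reduces to a simple rule: an autoscale container's provisioned throughput ranges between 10 percent of the configured maximum (never below 400 RU/s) and the maximum itself. A small sketch:

```python
def autoscale_range(max_rus):
    """Return the (min, max) RU/s range for a Cosmos DB autoscale resource.

    The minimum is 10 percent of the configured maximum, floored at
    400 RU/s, matching the behavior described in the autoscale docs.
    """
    return max(400, max_rus // 10), max_rus

# A container configured with a 4,000 RU/s maximum idles at 400 RU/s;
# one configured with 20,000 RU/s idles at 2,000 RU/s.
```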

Follow the documentation for more details on the scenarios and how to use Azure Cosmos DB autoscale.

Share throughput at the database level

In a shared throughput database, all containers inside the database share the provisioned throughput (RU/s) of the database. For example, if you provision a database with 400 RU/s and have four containers, all four containers will share the 400 RU/s. In a development or testing environment, where each container may be accessed less frequently and thus require lower than the minimum of 400 RU/s, putting containers in a shared throughput database can help optimize cost.

For example, suppose your development or test account has four containers. If you create four containers with dedicated throughput (minimum of 400 RU/s each), your total will be 1,600 RU/s. In contrast, if you create a shared throughput database (minimum 400 RU/s) and put your containers there, your total will be just 400 RU/s. In general, shared throughput databases are great for scenarios where you don't need guaranteed throughput on any individual container.
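The arithmetic from that example, written out (the 400 RU/s figure is the minimum cited above; actual minimums can grow with stored data):

```python
def min_rus(containers, shared=False, min_per_resource=400):
    """Minimum billable RU/s for N containers.

    Dedicated throughput: each container carries its own minimum.
    Shared throughput: the database-level minimum covers all containers.
    """
    return min_per_resource if shared else containers * min_per_resource

dedicated = min_rus(4)                 # four dedicated containers
shared = min_rus(4, shared=True)       # same containers, shared database
savings = 1 - shared / dedicated       # fraction of RU/s no longer provisioned
```

For four containers this works out to 1,600 RU/s dedicated versus 400 RU/s shared, a 75 percent reduction in provisioned throughput.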

Follow the documentation to create a shared throughput database that can be used for development and testing environments.

Optimize the data tier with Azure Database for PostgreSQL

Azure Database for PostgreSQL is a fully managed service providing enterprise-grade features for community edition PostgreSQL. With the continued growth of open-source technologies, especially in times of crisis, PostgreSQL has seen increased adoption by users who want the consistency, performance, security, and durability of their applications while staying open source. With developer-focused experiences and features optimized for cost, Azure Database for PostgreSQL lets developers focus on their applications while database management is taken care of by Azure.

Reserved capacity pricing—Now on Azure Database for PostgreSQL

Manage the cost of running your fully-managed PostgreSQL database on Azure through reserved capacity now made available on Azure Database for PostgreSQL. Save up to 60 percent compared to regular pay-as-you-go payment options available today.

Check out pricing on Azure Database for PostgreSQL to learn more.

High performance scale-out on PostgreSQL

Leverage the power of high-performance horizontal scale-out of your single-node PostgreSQL database through Hyperscale. Save time by doing transactions and analytics in one database while avoiding the high costs and efforts of manual sharding.

Get started with Hyperscale on Azure Database for PostgreSQL today.

Stay compatible with open source PostgreSQL

By leveraging Azure Database for PostgreSQL, you can continue enjoying the many innovations, versions, and tools of community edition PostgreSQL without major re-architecture of your application. Azure Database for PostgreSQL is extension-friendly so you can continue achieving your best scenarios on PostgreSQL while ensuring top-quality, enterprise-grade features like Intelligent Performance, Query Performance Insights, and Advanced Threat Protection are constantly at your fingertips.

Check out the product documentation on Azure Database for PostgreSQL to learn more.
Source: Azure

Making your data residency choices easier with Azure

Azure is now available in over 140 countries and offers customers more than 60 datacenter regions worldwide (and growing) from which to choose. These Azure regions provide customers with the benefits of data residency and latency optimization and may enable regional compliance.

We understand that with Azure’s over 200 services, advances in architecture, and data protection promises, there are a lot of options available to customers. To help you make the right decisions, we have summarized the answers to your questions on Azure regions, data residency, data access, and retention. Download the white paper, Enabling Data Residency and Data Protection in Azure Regions to learn more.

When customers move workloads to Azure, they face a number of choices, such as datacenter regions, high availability (HA) and disaster recovery (DR) architecture, and encryption models. To make the right decisions, customers need to consider both technical and regulatory requirements. To optimize latency, customers should determine the appropriate region based on the location of their users or customer base.

For regulatory compliance, data residency requirements may restrict or even mandate the physical locations where data can be stored, and how and when it can be transferred internationally. These regulations can differ significantly depending on jurisdiction. Azure’s regions and service features provide customers with different avenues so they can select and limit data residency and data access. This enables customers in regulated industries to successfully run mission-critical workloads in the cloud and leverage all the advantages of the Microsoft hyperscale cloud.

The purpose of the white paper is to give customers specific guidance in navigating these decisions, including:

Understanding Azure’s regional infrastructure, including high availability, availability zones, disaster recovery, latency, and service availability considerations, and how to make optimal architecture decisions.
Data residency assurances and how customers can control data residency. Most Azure services are deployed regionally and enable the customer to specify the region into which the service will be deployed and control where the customer data will be stored. Certain services and regions have some exceptions and limitations to these rules, which are outlined fully in the white paper.
Data access to telemetry data, including elevated access for support data, and how customers can manage data access. The collection and use of telemetry and support data has raised questions from some of our customers, and the white paper provides detailed answers.
How Microsoft protects customer data from unauthorized access and how Microsoft handles government requests, including implications of the CLOUD Act. Customers have asked us for specific details about when Microsoft engineers may access data and how we respond to government requests for data. The white paper provides clarity.
Tools customers can use to protect from unauthorized and authorized data access. Customers have a wealth of tools available to restrict, protect, and encrypt data at rest, in transit, and in some cases, in use.
Data retention and deletion. The white paper details Microsoft’s policies and practices for the retention and disposal of customer data.

We appreciate all of the feedback and questions we have received from customers regarding data residency and data protection in recent months, and we will continue to strive to provide you the most complete and current answers we can, so expect this white paper to be updated in the future.

Download Enabling Data Residency and Data Protection in Azure Regions, and visit Azure Global Infrastructure and Microsoft Trust Center to learn more.
Source: Azure

Minimize disruption with cost-effective backup and disaster recovery solutions on Azure

A top of mind concern among our customers is keeping their applications and data workloads running and recoverable in the case of unforeseen events or disasters. For example, COVID-19 has presented daunting challenges for IT, which are only compounded by growing threats from ransomware or setbacks related to technical or operational failure. These considerations further highlight the importance of a plan to ensure business continuity. IT admins are looking to cloud-based backup and disaster recovery solutions as part of their business continuity strategy because of the ability to quickly onboard, scale based on storage needs, remotely manage, and save costs by avoiding additional on-premises investments.

Azure provides native cloud solutions for customers to implement simple, secure and cost-effective business continuity and disaster recovery (BCDR) strategies for their applications and data whether they are on-premises or on Azure. Once enabled, customers benefit from minimal maintenance and monitoring overhead, remote management capabilities, enhanced security, and the ability to immutably recover services in a timely and orchestrated manner. Customers can also use their preferred backup and disaster recovery providers from a range of our partner solutions to extend their on-premises BCDR solutions to Azure.

All of this is possible without the need to learn new tools for configuration or management. Simply create an Azure Storage account and, within a few minutes, you have petabytes of offsite storage available to add to your BCDR solution.

Reduce complexity, cost, and enhance security with Azure solutions

Azure Backup is a service designed to back up and restore data, and Azure Site Recovery is designed to perform seamless application disaster recovery. Together, these services provide a more complete backup and recovery solution that can be implemented and scaled with just a few clicks.

By not having to build on-premises solutions or maintain a costly secondary datacenter, users can reduce the cost of deploying, monitoring, and patching disaster recovery infrastructure. Azure Backup uses flexible policies to automatically allocate and manage storage to optimize cost and meet business objectives. Together, Azure Backup and Azure Site Recovery use the underlying power of Azure’s highly available storage to store customer data. These native capabilities are available through a pay-as-you-use model that only bills for storage consumed.

Azure’s centralized management interface for Azure Backup and Azure Site Recovery makes it simple and easy to define policies to natively protect a wide range of enterprise workloads including Azure Virtual Machines, SQL and SAP databases, Azure File shares and on-premises Windows servers or Linux VMs. Using Azure Site Recovery, users can set up and manage replication, failover, and failback from the Azure portal. Customers can also take advantage of the Windows Admin Center Azure Hybrid Services Hub to protect on-premises virtual machines (VMs) and enable Azure Backup and Site Recovery right from the Windows Admin Center console.

We are committed to providing the best-in-class security capabilities to protect customer resources on Azure. Azure Backup protects backups of on-premises and cloud-resources from ransomware attacks by isolating backup data from source data, combined with multi-factor authentication (MFA) and the ability to recover maliciously or accidentally deleted backup data. With Azure Site Recovery you can fail over VMs to the cloud or between cloud datacenters and secure them with network security groups.

Peace of mind is paramount when it comes to recovering from the unexpected. In the case of a disruption, accidental deletion, or corruption of data, customers can rest assured that they will be able to recover their business services and data in a timely and orchestrated manner. These native capabilities support low recovery-point objective (RPO) and recovery-time objective (RTO) targets for any critical workload. Azure is here to help customers pivot towards a strengthened BCDR strategy.

Extend solutions to Azure with our trusted partner ecosystem

We understand that organizations may be using an on-premises BCDR solution from another technology provider. A number of popular BCDR solutions are integrated with Azure enabling customers to extend their existing solutions into the cloud.

Some examples include:

Commvault supports all tiers of Azure Storage as an offsite backup and data management target and enables backup and recovery from on-premises to Azure and for Azure VMs. Customers can quickly and easily restore applications, workloads and data to Azure as a cost-effective disaster recovery (DR) site and use Commvault Live Sync to achieve low recovery point objectives (RPOs).
Rubrik offers built-for-Azure features like Smart Tiering for easy backup to Azure, cost-effective data storage in the tier of choice, and quick recovery of data and apps to Azure in the event of a disaster or for dev-test scenarios. Rubrik enables backup and recovery from on-premises to Azure and for Azure VMs.
Veeam Backup and Replication integrates with Azure to easily protect and recover on-premises VMs, physical servers, and endpoints into Azure. Veeam Backup for Microsoft Azure leverages native Azure functionality and a built-in cost-calculator to provide an integrated, simple and cost-effective backup for Azure VMs.
Veritas’ NetBackup and Backup Exec offer backup, disaster recovery and migration to Azure. NetBackup CloudCatalyst and CloudPoint enable backup and recovery of on-premises assets to Azure, and protection of Azure VMs respectively. NetBackup Resiliency enables integrated disaster recovery and migration experiences to Azure, between Azure regions and Azure Stack.

Discover the available partner solutions in the Azure Marketplace.

Learn more

Strengthen your BCDR strategy today by taking these next steps:

Sign up for the webinar, Minimize Business Disruption with Azure BCDR Solutions.
Review options to extend your current BCDR solution to Azure with our trusted partners.
Get started with Azure Backup and Azure Site Recovery today.

Source: Azure

New Azure maps make identifying local compliance options easy

Countries around the world are placing more compliance requirements on organizations of all types, and customers want to more easily understand which compliance offerings are available in their locale before they deploy cloud resources. Today we're releasing a new infographic, along with a 37-page e-book showing compliance details in over 30 key geographies.

Organizations around the world are taking advantage of digital transformation by moving data and services to the cloud. Yet for organizations to feel secure in taking advantage of the cloud, they must first trust in the security and privacy protections offered by cloud providers. Compliance plays a critical role in building that trust.

Azure is a cloud platform that is built for global compliance, being certified by independent auditors for a set of rigorous and widely-recognized compliance standards, including ISO/IEC 27001 and ISO/IEC 27018, FedRAMP, and SOC 1, 2, and 3 Reports. Azure compliance offerings are global, with over 90 compliance offerings, including offerings specific to separate geographies, regions, and industries.

Azure global compliance infographic

The Azure global compliance infographic provides a full-page, single view of all of Azure’s more than 90 compliance offerings in a global context. The infographic displays global offerings, which apply to all Azure regions; regional offerings, such as the GDPR, which applies to all EU countries; and local offerings that apply to specific geographies. The compliance map is overlaid with Azure’s global network of over 60 regions, providing customers with a complete view of Azure’s global compliance. Download the infographic and see for yourself the global reach of Azure, which leads with the broadest and deepest compliance portfolio in the industry.

Azure enables a world of compliance e-book

The Azure enables a world of compliance e-book features illustrated compliance maps for over 30 geographies, each map displaying all the applicable compliance offerings for every geography. Included in each compliance map are:

Local offerings that are specific to the geography: In the example map of France below, local offerings are Autorité des Marchés Financiers (AMF) and Autorité de Contrôle Prudentiel et de Résolution (ACPR), standards for financial services in France, and Hébergeurs de Données de Santé (HDS), a healthcare certification.
Regional offerings that are applied to a geographic region: The France map includes offerings that apply across the EU, including the GDPR and other EU-wide laws and standards.
Global offerings: All maps include international offerings like International Standards Organization (ISO) standards or Cloud Security Alliance (CSA) certifications that apply in all Azure public cloud regions.
Azure Regions: Each map also includes the location of any Azure regions in the geography. It’s important to note that most country-specific compliance offerings do not require that data be retained in local datacenters.
Links: Each compliance offering listed on the maps is also hyperlinked to a more detailed listing on the Microsoft Trust Center.

Download the e-book here.

Navigating your way to the cloud

Microsoft subsidiaries also provide detailed information about compliance for specific geographies through a set of web pages titled Navigating your way to the cloud. These documents address data residency requirements generally, with special emphasis on the financial services and healthcare sectors in over 40 countries. These resources include:

Navigating your way to the Cloud in Europe: A Compliance Guide.
Navigating your way to the Cloud in Asia: A Guide for the Legal & Compliance Professionals.
Navigating your way to the Cloud Middle East and Africa: Guide for Legal and Compliance Professionals.

To learn more about why Azure is the most trusted cloud, visit Trust your cloud and learn about Azure’s security, privacy, compliance, and resiliency features.
Source: Azure

Optimize your Azure costs to help meet your financial objectives

Many of our customers are facing difficult decisions about how to meet their funding needs for critical IT projects. We’re in this together to help you meet your financial objectives. Ensuring your Azure workloads are cost optimized can help free up funds to support essential surge areas like remote work.

For the fourth year running, cost optimization is the top cloud initiative according to Flexera’s 2020 State of the Cloud Report.

Today, we’ll cover the Azure tools, offers, and guidance that can help you manage and optimize your cloud costs. You’ll learn how to understand and forecast your bill, cost optimize your workload, and control your spending. Then we’ll show you seven things you can do today to optimize your cloud costs and start saving.

Understand and forecast your costs

To manage and optimize your Azure costs, you first need to understand what you’re spending now and forecast what your bill is likely to be in the future for your current and planned projects.

Azure Cost Management + Billing gives you a full set of cloud cost management capabilities. You can use Cost Management + Billing to:

Monitor and analyze your Azure bill.
Set budgets and spending alerts.
Allocate costs to your teams and projects.

As you embark on new priority workloads or revisit existing workloads for cost optimization opportunities, you’ll want to estimate the costs involved. Cost Management + Billing surfaces rich operational and financial insights, including cost forecasts. In addition, the Azure pricing calculator and Total Cost of Ownership (TCO) calculator can help you estimate costs for your next Azure projects.

 

Cost optimize your workloads

Once you have a better understanding of your spending now and in the future, you can focus on cost optimizing your Azure resources and workloads. In this area, Azure Advisor and the Microsoft Azure Well-Architected Framework are our primary optimization tools, while key Azure offers and licensing terms, like the Azure Hybrid Benefit and Azure Reservations can help you realize significant savings.

Azure Advisor helps you cost optimize your Azure resources with personalized best practice recommendations based on your configurations and usage data. For example, Azure Advisor can help you identify unused resources like idle virtual machines (VMs) and find opportunities to right-size resources like SQL databases.

The Azure Well-Architected Framework provides guidance and best practices at a broader architectural level to help you cost optimize your workloads. You can also take the Azure Well-Architected Review to assess your current or planned workloads from a cost perspective to get tailored recommendations on how to improve.

In addition, Azure offers and licensing terms can make a big difference when it comes to cost savings including:

Azure Hybrid Benefit: AWS is five times more expensive than Azure for Windows Server and SQL Server. Save when you migrate your on-premises workloads to Azure.
Azure Reservations: Get a discount of up to 72 percent over pay-as-you-go pricing on Azure services when you prepay for a one- or three-year term with reservation pricing.
Azure Spot Virtual Machines: Utilize unused Azure compute capacity at deep discounts—up to 90 percent compared to pay-as-you-go prices.
Azure Dev/Test Pricing: Take advantage of significant rate discounts for your ongoing development and testing—and avoid paying Microsoft software charges on your VMs.
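To make the quoted maximums concrete, here is a rough comparison of a pay-as-you-go monthly bill against the same usage at the reservation and spot discount ceilings. The hourly rate is a hypothetical placeholder, not a real Azure price; actual discounts vary by region, size, and term:

```python
def monthly_cost(payg_hourly, hours=730, discount=0.0):
    """Monthly cost at a pay-as-you-go hourly rate, less a fractional discount.

    730 is the commonly used average number of hours in a month.
    """
    return payg_hourly * hours * (1 - discount)

rate = 0.10  # hypothetical $/hour, for illustration only
payg = monthly_cost(rate)                    # pay-as-you-go baseline
reserved = monthly_cost(rate, discount=0.72) # "up to 72 percent" reservation ceiling
spot = monthly_cost(rate, discount=0.90)     # "up to 90 percent" spot ceiling
```

At this placeholder rate, the baseline is $73.00/month, while the reservation and spot ceilings work out to $20.44 and $7.30 respectively.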

Control your costs

Finally, you’ll want to put in place organization-wide cost management policies and guardrails, so you can free up your teams to go fast while keeping your costs under control.

For many customers, governing cost is a major concern when it comes to the cloud. Balancing workload requirements like performance demands and reliability needs with cloud costs can be challenging, especially during times of business transformation like we see today. The Microsoft Cloud Adoption Framework for Azure provides organizational cost management guidance as part of a cloud governance strategy. This guidance can help mitigate cloud spending risks through cost management best practices.

Once you’ve set your organizational policies for cost management, you can implement these cost controls and guardrails directly in your Azure environment with Azure Policy to achieve real-time cloud compliance at scale.

7 ways to optimize your Azure costs today

With many tools to use, offers to take advantage of, and best practices to follow, you may wonder how to get started. To help you get going fast, we recommend starting with these seven ways you can optimize your Azure costs today.

Shut down unused resources: Identify idle virtual machines, ExpressRoute circuits, and other resources with Azure Advisor. Get recommendations on which resources to shut down and see how much you would save.
Right-size underused resources: Find underutilized resources with Azure Advisor and get recommendations on how to reduce spending by reconfiguring or consolidating them.
Reserve instances for consistent workloads: Save money when you reserve resources in advance. Plus, enjoy monthly payment options at no extra cost.
Take advantage of the Azure Hybrid Benefit: Realize major savings when you bring your Windows Server and SQL Server on-premises licenses with Software Assurance to Azure.
Get Azure Dev/Test pricing for development environments: Get rate discounts on Azure to support development and testing scenarios.
Set up budgets and allocate costs to teams and projects: Create and manage budgets for the Azure services you use or subscribe to—and monitor your organization’s cloud spending—with Azure Cost Management.
Explore serverless technologies and new architectures: Optimize your workloads with guidance for building apps and solutions on Azure using best practices and proven patterns.

Get started with cost optimization

Cloud cost optimization has never been more critical than it is today. Azure is here to support you with tools, offers, and best practices to help you optimize your cloud costs and meet your technical and business goals during this difficult time.

We’re also kicking off a new blog series to help you optimize your Azure environment. Check the Azure blog soon for the latest cost optimization guidance and best practices or visit our cost optimization on Azure page to learn more.
Source: Azure

Seven ways to achieve cost savings and deliver efficiencies with Azure infrastructure

The global health crisis has transformed the way we work and live. At Microsoft, we are committed to doing what we can to help our customers respond to the crisis and plan ahead for future success.

In the past couple of months, I have been learning from—and inspired by—IT leaders around the globe who have been quickly adjusting IT priorities to enable remote work, and optimize costs and efficiencies while investing in smart ways to prepare for recovery and future growth. To achieve these goals, many IT leaders are accelerating the adoption of cloud computing.

Achieve cost savings and deliver efficiencies with Azure infrastructure

Below are seven ways in which Azure infrastructure can help you today to improve cash flow, achieve cost savings, increase operational efficiencies, and unify security and management.

1. Enable remote work anywhere and ensure productivity

To ensure that users have access to desktops and apps they need to work from anywhere, you can spin up and scale virtual desktops quickly with Windows Virtual Desktop—no need to provision new hardware. Windows Virtual Desktop delivers the best Windows 10 and Office 365 virtual desktop experience with support for multi-session. You only pay for the infrastructure that you use and save money by turning off machines when they are not in use. Read the blog to learn about new Windows Virtual Desktop capabilities we recently released to enable even faster deployment, further enhance security and compliance, and provide the Microsoft Teams user experience you would expect on a desktop from within your virtual desktop.

In addition to virtual desktops, employees need access to resources across on-premises and cloud. With Azure, you can quickly extend and expand your on-premises virtual private network (VPN) solution with Azure VPN Gateway. It can be provisioned quickly and scale up and down easily. The Azure network is designed to withstand sudden changes in resource utilization and can easily manage peak utilization periods. Read the blog to learn about best practices and tips on how Azure VPN helps organizations scale remote work.

2. Maintain business continuity to avoid costly disruptions

Many companies are taking a closer look at their backup and disaster recovery strategy to make sure data and applications are fully protected from business disruptions. With just a few clicks, Azure Backup and Azure Site Recovery let customers easily back up their data and applications to the cloud, and quickly deploy cloud-based replication, failover, and recovery processes.

Also, to ensure reliable connectivity, customers are using Azure Virtual WAN with supported partner SD-WAN devices to connect remote branch offices to on-premises and public cloud where applications and data are hosted. Azure Virtual WAN routes traffic from the closest network location to the branch office over Microsoft’s dedicated global network to its destination, providing fast and reliable connectivity at scale.

3. Secure on-premises and cloud workloads from increased cyberattacks

With the increased dependence on digital infrastructure to enable remote work, there has been an increase in cybercrime. Microsoft invests more than $1 billion every year in cybersecurity and draws on a massive threat intelligence network that processes more than eight trillion signals a day.

Azure Security Center provides a unified view of the security state of all your cloud and on-premises workloads and gives you security recommendations including turning on multi-factor authentication (MFA), ensuring secure shell (SSH) and remote desktop protocol (RDP) ports are protected, and more. Azure Web Application Firewall and Azure DDoS Protection protect your web workloads with zero trust security, including verifying identity on every access and ensuring your networks are correctly segmented. Many customers are also taking advantage of Azure Sentinel, a cloud-native security information and event management (SIEM) solution, to access advanced AI-enabled threat protection. Learn more about strengthening security operations and open-sourcing new threat intelligence during COVID-19.

4. Efficiently govern and manage your hybrid environments

Customers’ IT environments are evolving with different types of applications often running on a diverse set of hardware across distributed locations. How to efficiently manage IT resources without slowing down developer innovation is a key challenge that IT leaders face today. Azure Arc lets customers seamlessly govern, manage, and secure Windows and Linux servers, Kubernetes clusters and applications across on-premises, multicloud, and the edge from a single control plane. Azure Arc also brings Azure services such as Azure data services to any infrastructure, so customers can take advantage of Azure innovation—including the latest cloud capabilities, scalability, rapid deployment, and cloud billing—on any cloud and any Kubernetes cluster.

5. Migrate to the cloud and save money

Organizations are accelerating cloud migration to reduce capital expenditure, realize cost savings and speed up time to value. With offers including Azure Hybrid Benefit and free extended security updates, Azure is five times cheaper than AWS to migrate Windows Server and SQL Server workloads. Migrate your Linux workloads to Azure to gain more performance and efficiency on your favorite distributions with integrated support from Microsoft. In addition, we help customers reduce migration costs and accelerate their migration journey with free migration tools, guidance, and best practices through the Azure Migration Program.

6. Modernize on-premises infrastructure

Many customers keep certain workloads on-premises to meet regulatory requirements, address latency issues, or maximize existing investments. In these cases, customers can take advantage of cloud innovation by modernizing their on-premises infrastructure and increasing efficiency.

For example, Azure Stack HCI lets customers consolidate on-premises virtualized applications on cloud-connected, hyper-converged infrastructure to access the best price-performance for storage and compute. Azure Stack Hub helps customers in over 60 countries build and deploy cloud-native applications on-premises and run their own private, autonomous cloud—connected or disconnected—from Azure.

7. Continuously cost optimize your workloads

Azure offers many ways for customers to optimize their costs. Azure Cost Management + Billing gives customers free tools to monitor and analyze cloud spend, set budgets and spending alerts, and allocate cloud costs across teams. Azure Advisor helps organizations optimize cloud resource utilization with personalized recommendations. Offers such as Azure Spot Virtual Machines combined with Azure Reservations let customers save up to 90 percent over pay-as-you-go pricing on Azure services by prepaying with reservation pricing or utilizing unused Azure compute capacity at deep discounts. Learn more about all the ways to optimize your Azure costs.

Get started with Azure infrastructure services today

We are publishing more blog posts in the coming weeks to go deeper on some of the topics covered above. Be sure to come back to the Azure blog and check out our new post. Also visit Azure infrastructure as a service (IaaS) to learn more about how Azure infrastructure services and solutions can help you.
Source: Azure

Announcing Azure Machine Learning scholarships and courses with Udacity

The demand for artificial intelligence (AI) and data science roles continues to rise. According to LinkedIn’s Emerging Jobs Report for 2020, AI specialist roles are most sought after with a 74 percent annual growth rate in hiring over the last four years. Additionally, the current global health pandemic has powered a shift towards remote working as well as an increased interest in professional training resources. To address this demand, we’re announcing our collaboration with Udacity to launch new machine learning courses for both beginners and advanced users, as well as a scholarship program.

Through these new offerings, Microsoft aims to help expand the talent pool of data scientists and improve access to education and resources to anyone interested. I recently sat down for a chat with Udacity CEO, Gabe Dalporto, to talk about this collaboration.

Udacity is a digital education platform with over 250,000 currently active students. Their students have expressed continued interest in introductory machine learning (ML) content that doesn’t require advanced programming knowledge. In response, Microsoft Azure and Udacity have created a unique free course based on Azure Machine Learning. This Introduction to machine learning on Azure course will help students learn the basics of ML through a low-code experience powered by Azure Machine Learning’s automated ML and drag-and-drop capabilities. Students will have the opportunity to learn using Azure Machine Learning hands-on labs directly within the Udacity classroom and develop the foundations for their data science skills.

For advanced users, we’re offering a new machine learning Nanodegree Program with Microsoft Azure. In this program, students will further enhance their skills by building and deploying sophisticated ML solutions using popular open source tools and frameworks such as PyTorch, TensorFlow, scikit-learn, and ONNX. Using Azure Machine Learning’s responsible ML and MLOps capabilities, students will gain experience in understanding their ML models, protecting people and their data, and controlling the end-to-end ML lifecycle at scale.

As part of this collaboration, we are offering the top 300 performers of the free introductory course with scholarships to the Nanodegree Program, so they can continue to develop their data science skills. These new courses will empower more students to gain proficiency in data science and AI. More details on the program can be found on the course page.

Sign up today!
Source: Azure

Azure responds to COVID-19

The global health pandemic continues to impact every organization—large or small—their employees, and the customers they serve. Over the last several months, we have seen firsthand the role that cloud computing plays in sustaining the operations that help us live, work, learn, and play.

During this unparalleled time, all of Microsoft’s cloud services, in particular Azure, Microsoft Teams, Windows Virtual Desktop, and Xbox Live, experienced unprecedented demand. It has been our privilege to provide support and the infrastructure needed to help our customers successfully accelerate their cloud adoption to enable digital transformation during such a critical time.

Over the last 90 days, we have learned a lot and I want to share those observations with you all. The following video has been developed to provide a more technical look at how we scaled Azure as the COVID-19 outbreak rapidly pushed demand for cloud services.

Related post: Advancing Microsoft Teams on Azure – Operating at pandemic scale.
Related article: Growing Azure’s capacity to help customers, Microsoft during the COVID-19 pandemic.
Source: Azure

Advancing Microsoft Teams on Azure—operating at pandemic scale

“The COVID-19 pandemic has reset what it means to work, study, and socialize. Like many of us, I have come to rely on Microsoft Teams as my connection to my colleagues. In this post, our friends from the Microsoft Teams product group—Rish Tandon (Corporate Vice President), Aarthi Natarajan (Group Engineering Manager), and Martin Taillefer (Architect)—share some of their learnings about managing and scaling an enterprise-grade, secure productivity app.” – Mark Russinovich, CTO, Azure

Scale, resiliency, and performance do not happen overnight—it takes sustained and deliberate investment, day over day, and a performance-first mindset to build products that delight our users. Since its launch in 2017, Teams has experienced strong growth: 13 million daily users by July 2019, and 20 million by November 2019. In April, we shared that Teams has more than 75 million daily active users, 200 million daily meeting participants, and 4.1 billion daily meeting minutes. We thought we were accustomed to the ongoing work necessary to scale a service at such a pace, given the rapid growth Teams had experienced to date. COVID-19 challenged this assumption: would this experience be enough to keep the service running amidst a previously unthinkable growth period?

A solid foundation

Teams is built on a microservices architecture, with a few hundred microservices working cohesively to deliver our product’s many features including messaging, meetings, files, calendar, and apps. Using microservices helps each of our component teams to work and release their changes independently.

Azure is the cloud platform that underpins all of Microsoft’s cloud services, including Microsoft Teams. Our workloads run in Azure virtual machines (VMs), with our older services being deployed through Azure Cloud Services and our newer ones on Azure Service Fabric. Our primary storage stack is Azure Cosmos DB, with some services using Azure Blob Storage. We count on Azure Cache for Redis for increased throughput and resiliency. We leverage Traffic Manager and Azure Front Door to route traffic where we want it to be. We use Queue Storage and Event Hubs to communicate, and we depend on Azure Active Directory to manage our tenants and users.


While this post is mostly focused on our cloud backend, it’s worth highlighting that the Teams client applications also use modern design patterns and frameworks, providing a rich user experience, and support for offline or intermittently connected experiences. The core ability to update our clients quickly and in tandem with the service is a key enabler for rapid iteration. If you’d like to go deeper into our architecture, check out this session from Microsoft Ignite 2019.

Agile development

Our CI/CD pipelines are built on top of Azure Pipelines. We use a ring-based deployment strategy with gates based on a combination of automated end-to-end tests and telemetry signals. Our telemetry signals integrate with incident management pipelines to provide alerting over both service- and client-defined metrics. We rely heavily on Azure Data Explorer for analytics.

In addition, we use an experimentation pipeline with scorecards that evaluate the behavior of features against key product metrics like crash rate, memory consumption, application responsiveness, performance, and user engagement. This helps us figure out whether new features are working the way we want them to.

All our services and clients use a centralized configuration management service. This service provides configuration state to flip product features on and off, adjust cache time-to-live values, control network request frequencies, and set network endpoints to contact for APIs. This provides a flexible framework to “launch darkly,” and to conduct A/B testing such that we can accurately measure the impact of our changes to ensure they are safe and efficient for all users.
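
As an illustration of the centralized configuration pattern described above, here is a minimal, hypothetical Python sketch of a config client with feature flags and percentage rollouts. The class, setting keys, and rollout mechanics are all invented for illustration; the real service is internal to Teams.

```python
# Hypothetical sketch of a client for a centralized configuration service:
# feature flags, cache TTLs, and request frequencies are fetched at runtime
# so behavior can change without redeploying. Names and keys are invented.

import hashlib

class ConfigClient:
    def __init__(self, defaults):
        self._settings = dict(defaults)

    def refresh(self, fetched):
        """Overlay the latest settings fetched from the config service."""
        self._settings.update(fetched)

    def get(self, key, default=None):
        return self._settings.get(key, default)

    def is_enabled(self, feature, user_id=None):
        """Feature-flag check with an optional percentage rollout (A/B testing)."""
        rollout = self._settings.get(f"{feature}.rollout", 0.0)
        if user_id is not None:
            # Deterministic bucketing: a given user always lands in the same bucket,
            # so they see a stable experience during an experiment.
            bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
            return bucket / 100.0 < rollout
        return rollout >= 1.0

cfg = ConfigClient({"typing_indicator.rollout": 1.0, "cache.ttl_seconds": 300})
print(cfg.is_enabled("typing_indicator"))   # True: fully rolled out

# During a surge, operators can flip a flag centrally (a "brownout") with no deploy:
cfg.refresh({"typing_indicator.rollout": 0.0})
print(cfg.is_enabled("typing_indicator"))   # False after the brownout
```

The same mechanism supports "launching darkly": a feature ships in the client at 0 percent rollout and is ramped up gradually while its metrics are watched.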

Key resiliency strategies

We employ several resiliency strategies across our fleet of services:

Active-active fault tolerant systems: An active-active fault tolerant system is defined as two (or more) operationally independent, heterogeneous paths, with each path not only serving live traffic at steady state but also capable of serving 100 percent of expected traffic, with client- and protocol-level path selection providing seamless failover. We adopt this strategy for cases where there is a very large failure domain or customer impact, and where the cost of building and maintaining heterogeneous systems is justified. For example, we use the Office 365 DNS system for all externally visible client domains. In addition, static CDN-class data is hosted on both Azure Front Door and Akamai.
Resiliency-optimized caches: We leverage caches between our components extensively, for both performance and resiliency. Caches help reduce average latency and provide a source of data in case a downstream service is unavailable. Keeping data in caches for a long time introduces data freshness issues, yet it is also the best defense against downstream failures. We focus on Time to Refresh (TTR) for our cache data as well as Time to Live (TTL). By setting a long TTL and a shorter TTR value, we can fine-tune how fresh to keep our data versus how long we want data to stick around whenever a downstream dependency fails.
Circuit Breaker: This is a common design pattern that prevents a service from doing an operation that is likely to fail. It provides a chance for the downstream service to recover without being overwhelmed by retry requests. It also improves the response of a service when its dependencies are having trouble, helping the system be more tolerant of error conditions.
Bulkhead isolation: We partition some of our critical services into completely isolated deployments. If something goes wrong in one deployment, bulkhead isolation is designed to help the other deployments to continue operating. This mitigation preserves functionality for as many customers as possible.
API level rate limiting: We ensure our critical services can throttle requests at the API level. These rate limits are managed through the centralized configuration management system explained above. This capability enabled us to rate limit non-critical APIs during the COVID-19 surge.
Efficient Retry patterns: We ensure and validate all API clients implement efficient retry logic, which prevents traffic storms when network failures occur.
Timeouts: Consistent use of timeout semantics prevents work from getting stalled when a downstream dependency is experiencing some trouble.
Graceful handling of network failures: We have made long-term investments to improve our client experience when offline or with poor connections. Major improvements in this area launched to production just as the COVID-19 surge began, enabling our client to provide a consistent experience regardless of network quality.
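
The TTR/TTL distinction described above can be sketched in a few lines. This is a hypothetical illustration, not the actual cache implementation: values remain servable until TTL expires (the defense against downstream failures), but are considered due for refresh once TTR passes (the freshness control).

```python
# Illustrative cache entry carrying both a TTL and a shorter TTR.
# A long TTL survives downstream outages; a short TTR keeps data fresh.

import time

class CacheEntry:
    def __init__(self, value, ttl_s, ttr_s):
        self.value = value
        self.created = time.monotonic()
        self.ttl_s = ttl_s   # how long the value may be served at all
        self.ttr_s = ttr_s   # how soon a background refresh should be attempted

    def state(self, now=None):
        age = (now if now is not None else time.monotonic()) - self.created
        if age >= self.ttl_s:
            return "expired"   # must not be served; a downstream call is required
        if age >= self.ttr_s:
            return "stale"     # serve it, but kick off a background refresh
        return "fresh"

entry = CacheEntry("user-profile", ttl_s=3600, ttr_s=60)
print(entry.state(entry.created + 30))     # fresh
print(entry.state(entry.created + 300))    # stale
print(entry.state(entry.created + 7200))   # expired
```

When a downstream dependency fails, refresh attempts on "stale" entries simply fail and the cached value keeps being served until the full TTL runs out.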

If you have seen the Azure Cloud Design Patterns, many of these concepts may be familiar to you.  We also use the Polly library extensively in our microservices, which provides implementations for some of these patterns.
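
As a concrete example of one of these patterns, here is a minimal circuit breaker sketch. Polly provides production-grade implementations of this pattern for .NET services; this hypothetical Python version only illustrates the mechanics of tripping open after repeated failures and failing fast while open.

```python
# Minimal circuit breaker: after enough consecutive failures, stop calling
# the downstream dependency and fail fast, giving it a chance to recover.

import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None   # half-open: allow one trial request through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0   # a success closes the circuit again
        return result

breaker = CircuitBreaker(failure_threshold=2)

def flaky():
    raise ConnectionError("downstream unavailable")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

# After two failures the breaker is open and fails fast without calling flaky:
try:
    breaker.call(flaky)
except RuntimeError as e:
    print(e)   # circuit open: failing fast
```

Combined with timeouts and bounded retries, this keeps a struggling dependency from being overwhelmed by retry storms.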

Our architecture had been working out well for us: Teams usage was growing month over month, and the platform easily scaled to meet the demand. However, scalability is not a “set and forget” consideration; it needs continuous attention to address emergent behaviors that manifest in any complex system.

When COVID-19 stay-at-home orders started to kick in around the world, we needed to leverage the architectural flexibility built into our system, and turn all the knobs we could, to effectively respond to the rapidly increasing demand.

Capacity forecasting

Like any product, we build and constantly iterate models to anticipate where growth will occur, both in terms of raw users and usage patterns. The models are based on historical data, cyclic patterns, new incoming large customers, and a variety of other signals.

As the surge began, it became clear that our previous forecasting models were quickly becoming obsolete, so we needed to build new ones that took the tremendous growth in global demand into account. We were seeing new usage patterns from existing users, new usage from existing but dormant users, and many new users onboarding to the product, all at the same time. Moreover, we had to make accelerated resourcing decisions to deal with potential compute and networking bottlenecks. We used multiple predictive modeling techniques (ARIMA, additive, multiplicative, logarithmic), and added basic per-country caps to avoid over-forecasting. We tuned the models by trying to understand inflection and growth patterns in usage per industry and geographic area. We incorporated external data sources, including Johns Hopkins’ research on COVID-19 impact dates by country, to augment the peak load forecasting for bottleneck regions.

Throughout the process, we erred on the side of caution and favored over-provisioning—but as the usage patterns stabilized, we also scaled back as necessary.
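
The per-country capping idea can be illustrated with a toy model. This sketch is not the forecasting system described above (which used ARIMA-style techniques); it only shows how a cap guards an exponential projection against over-forecasting. All numbers are invented.

```python
# Toy capped-growth projection: exponential growth clipped to a per-country
# ceiling so the forecast cannot exceed the addressable population.

def forecast_daily_users(current, daily_growth, days, country_cap):
    """Project exponential growth day by day, clipped to a per-country cap."""
    series = []
    users = current
    for _ in range(days):
        users = min(users * (1 + daily_growth), country_cap)
        series.append(users)
    return series

# 1M users growing 20%/day, capped at 3M addressable users in the country:
projection = forecast_daily_users(1_000_000, 0.20, days=10, country_cap=3_000_000)
print([round(u) for u in projection[:3]])   # early days grow freely
print(round(projection[-1]))                # later days sit at the cap
```

Without the cap, a 20 percent daily growth rate would project over 6 million users by day 10, twice the assumed addressable population.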

Scaling our compute resources

In general, we design Teams to withstand natural disasters. Using multiple Azure regions helps us to mitigate risk, not just from a datacenter issue, but also from interruptions to a major geographic area. However, this means we provision additional resources to be ready to take on an impacted region’s load during such an eventuality. To scale out, we quickly expanded deployment of every critical microservice to additional regions in every major Azure geography. By increasing the total number of regions per geography, we decreased the total amount of spare capacity each region needed to hold to absorb emergency load, thereby reducing our total capacity needs. Dealing with load at this new scale gave us several insights into ways we could improve our efficiency:

We found that by redeploying some of our microservices to favor a larger number of smaller compute clusters, we were able to avoid some per-cluster scaling considerations, speed up our deployments, and achieve more fine-grained load balancing.
Previously, we depended on specific virtual machine (VM) types we use for our different microservices. By being more flexible in terms of a VM type or CPU, and focusing on overall compute power or memory, we were able to make more efficient use of Azure resources in each region.
We found opportunities for optimization in our service code itself. For example, some simple improvements led to a substantial reduction in the amount of CPU time we spend generating avatars (those little bubbles with initials in them, used when no user pictures are available).

Networking and routing optimization

Most of Teams’ capacity consumption occurs within daytime hours for any given Azure geography, leading to idle resources at night. We implemented routing strategies to leverage this idle capacity (while always respecting compliance and data residency requirements):

Non-interactive background work is dynamically migrated to the currently idle capacity. This is done by programming API-specific routes in Azure Front Door to ensure traffic lands in the right place.
Calling and meeting traffic was routed across multiple regions to handle the surge. We used Azure Traffic Manager to distribute load effectively, leveraging observed usage patterns. We also worked to create runbooks which did time-of-day load balancing to prevent wide area network (WAN) throttling.

Some of Teams’ client traffic terminates in Azure Front Door. However, as we deployed more clusters in more regions, we found new clusters were not getting enough traffic. This was an artifact of the distribution of our users’ locations relative to Azure Front Door nodes. To address this uneven distribution of traffic, we used Azure Front Door’s ability to route traffic at a country level. In the example below, you can see the improved traffic distribution after routing additional France traffic to the UK West region for one of our services.

Figure 1: Improved traffic distribution after routing traffic between regions.

Cache and storage improvements

We use a lot of distributed caches. A lot of big, distributed caches. As our traffic increased, so did the load on our caches to a point where the individual caches would not scale. We deployed a few simple changes with significant impact on our cache use:

We started to store cache state in a binary format rather than raw JSON. We used the protocol buffer format for this.
We started to compress data before sending it to the cache. We used LZ4 compression due to its excellent speed versus compression ratio.

We were able to achieve a 65 percent reduction in payload size, 40 percent reduction in deserialization time, and 20 percent reduction in serialization time. A win all around.
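
The payload reductions above came from protocol buffers and LZ4. As a stand-in using only the Python standard library, the following sketch shows the same shape of change, compressing a JSON payload with zlib before it would be written to a cache; the record and the exact savings here are illustrative only.

```python
# Illustrative only: Teams used protocol buffers + LZ4; this sketch
# substitutes zlib over JSON to show how payload size shrinks before
# a value is written to a distributed cache.

import json
import zlib

record = {
    "userId": "u-12345",
    "displayName": "Contoso User",
    "presence": "Available",
    "teams": [f"team-{i}" for i in range(50)],
}

raw = json.dumps(record).encode("utf-8")   # what the cache stored before
compressed = zlib.compress(raw, level=6)   # compressed cache payload

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes, "
      f"saved {(1 - len(compressed) / len(raw)):.0%}")

# The round trip is lossless:
restored = json.loads(zlib.decompress(compressed))
assert restored == record
```

LZ4 was chosen in production specifically for its speed-versus-ratio trade-off; zlib compresses harder but slower, which matters at cache call rates.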

Investigation revealed that several of our caches had overly aggressive TTL settings, resulting in unnecessary eager data eviction. Increasing those TTLs helped both reduce average latency and load on downstream systems.

Purposeful degradation (feature brownouts)

As we didn’t really know how far we’d need to push things, we decided it was prudent to put in place mechanisms that let us quickly react to unexpected demand spikes in order to buy us time to bring additional Teams capacity online.

Not all features have equal importance to our customers. For example, sending and receiving messages is more important than the ability to see that someone else is currently typing a message. Because of this, we turned off the typing indicator for a duration of two weeks while we worked on scaling up our services. This reduced peak traffic by 30 percent to some parts of our infrastructure.

We normally use aggressive prefetching at many layers of our architecture so that needed data is close at hand, which reduces average end-to-end latency. Prefetching, however, can get expensive, as it results in some amount of wasted work when fetching data that will never be used, and it requires storage resources to hold the prefetched data. In some scenarios we chose to disable prefetching, freeing up capacity on some of our services at the cost of higher latency. In other cases, we increased the duration of prefetch sync intervals. One such example was suppressing calendar prefetch on mobile, which reduced request volume by 80 percent:

Figure 2: Disable prefetch of calendar event details in mobile.

Incident management

While we have a mature incident management process that we use to track and maintain the health of our system, this experience was different. Not only were we dealing with a huge surge in traffic, but our engineers and colleagues were also going through personal and emotional challenges while adapting to working from home.

To ensure that we not only supported our customers but also our engineers, we put a few changes in place:

We switched our incident management rotations from a weekly cadence to a daily cadence.
We ensured every on-call engineer had at least 12 hours off between shifts.
We brought in more incident managers from across the company.
We deferred all non-critical changes across our services.

These changes helped ensure that all of our incident managers and on-call engineers had enough time to focus on their needs at home while meeting the demands of our customers.

The future of Teams

It is fascinating to look back and wonder what this situation would have been like if it happened even a few years ago. It would have been impossible to scale like we did without cloud computing. What we can do today by simply changing configuration files could previously have required purchasing new equipment or even new buildings. As the current scaling situation stabilizes, we have been returning our attention to the future. We think there are many opportunities for us to improve our infrastructure:

We plan to transition from VM-based deployments to container-based deployments using Azure Kubernetes Service, which we expect will reduce our operating costs, improve our agility, and align us with the industry.
We expect to minimize the use of REST and favor more efficient binary protocols such as gRPC. We will be replacing several instances of polling throughout the system with more efficient event-based models.
We are systematically embracing chaos engineering practices to ensure all those mechanisms we put in place to make our system reliable are always fully functional and ready to spring into action.

By keeping our architecture aligned with industry approaches and by leveraging best practices from the Azure team, when we needed to call for assistance, experts could quickly help us solve problems ranging from data analysis and monitoring to performance optimization and incident management. We are grateful for the openness of our colleagues across Microsoft and the broader software development community. While architectures and technologies are important, it is the team of people you have that keeps your systems healthy.

Related post: Azure responds to COVID-19.
Related article: Growing Azure’s capacity to help customers, Microsoft during the COVID-19 pandemic.
Source: Azure