Aiming for more than just net zero

Climate experts across the globe agree: if we can’t drastically reduce carbon emissions, our planet will face catastrophic consequences. Microsoft has operated carbon neutral since 2012, and in January 2020 Brad Smith announced our commitment to going carbon negative by 2030. This isn’t a goal we can reach in one fell swoop—it will take time, dedication, and many small steps that coalesce into something greater.

As the cloud business grows, our datacenter footprint grows. In our journey toward carbon negative, Microsoft is taking steps to roll back the effect datacenters have on the environment. Reaching this goal will require many incremental changes, along with innovative technologies that have yet to be developed.

Many companies are reaching for net zero emissions, but we’re taking it even further. We’re not just reducing our output to zero. We’re committed to reducing our emissions by half, and then removing the carbon we’ve emitted since 1975, to truly go carbon negative.

The journey to carbon negative

A big part of going carbon negative means completely changing the way datacenters operate. Datacenters have adopted some sustainable methods around cooling, including open-air and adiabatic cooling. These methods have helped to drastically reduce the water and energy consumption of datacenters, but they’re not enough. Currently, datacenters and the backup that powers them in peak load times depend on fossil fuels like diesel. Microsoft is working to change that.

Our ambitious goals to cut down our carbon footprint have necessitated exploration into various technologies. With each kind of technology, we’re determining the best combination to implement based on our overall goal as well as the specific datacenter locations and their local needs.

Liquid immersion cooling

Liquid immersion cooling is predicted not only to help eliminate water consumption but also to lower energy consumption by at least 5 to 15 percent. As a further benefit, this closed-loop cooling system leads to fewer server racks and smaller datacenter configurations. Datacenters take up a massive amount of space in their current configuration, making this a huge advantage.

Learn more about liquid immersion cooling.

Grid-interactive UPS batteries

Grid-interactive Uninterruptible Power Supply (UPS) batteries help to balance supply and lower energy demand on the grid by directing microbursts of electricity to datacenters or the grid as needed. These batteries store energy at close to 90 percent efficiency and smooth out intermittency from renewables. As we continue to explore this technology further, we could potentially extend the duration of the batteries from a few minutes to several hours—potentially using these long-duration batteries as a replacement for traditional backup generators.

Learn more about powering sustainability goals.

Clean power backup

Clean power backup has the potential to replace conventional diesel with fuels that produce less harmful emissions. Synthetic diesel causes less harm to the environment and provides a much-needed bridge to using renewables. Synthetic diesel can even be used in existing diesel generators without any modifications, reducing emissions on the way to carbon negative.

Hydrogen fuel cells provide another option for green backup energy to datacenters and are almost two times more efficient than combustion engines. The only output is food-grade steam that is then recaptured and reused.

Learn more about clean power generators.

Learn more about hydrogen innovations.

Power your sustainability goals

These small steps add up to something big. According to a 2018 study, workloads on Azure can be up to 98 percent more carbon-efficient than when running in traditional on-premises datacenters, and we’re making more investments in the future of sustainability. By moving your workloads to Azure, you ensure they run in datacenters with reduced emissions and lower energy consumption. We’re committed to not only reducing our own carbon footprint but also helping you reduce yours.

Work with us toward a carbon-negative future.

Visit Microsoft Sustainability.
Source: Azure

Advancing resilience through chaos engineering and fault injection

“When I first kicked off this Advancing Reliability blog series in my post last July, I highlighted several initiatives underway to keep improving platform availability, as part of our commitment to provide a trusted set of cloud services. One area I mentioned was fault injection, through which we’re increasingly validating that systems will perform as designed in the face of failures. Today I’ve asked our Principal Program Manager in this space, Chris Ashton, to shed some light on these broader ‘chaos engineering’ concepts, and to outline Azure examples of how we’re already applying these, together with stress testing and synthetic workloads, to improve application and service resilience.” – Mark Russinovich, CTO, Azure

 

Developing large-scale, distributed applications has never been easier, but there is a catch. Yes, infrastructure is provided in minutes thanks to your public cloud, there are many language options to choose from, swaths of open source code available to leverage, and abundant components and services in the marketplace to build upon. Yes, there are good reference guides that help give a leg up on your solution architecture and design, such as the Azure Well-Architected Framework and other resources in the Azure Architecture Center. But while application development is easier, there’s also an increased risk of impact from dependency disruptions. However rare, outages beyond your control could occur at any time, your dependencies could have incidents, or your key services/systems could become slow to respond. Minor disruptions in one area can be magnified or have longstanding side effects in another. These service disruptions can rob developer productivity, negatively affect customer trust, cause lost business, and even impact an organization’s bottom line.

Modern applications, and the cloud platforms upon which they are built, need to be designed and continuously validated for failure. Developers need to account for known and unknown failure conditions, applications and services must be architected for redundancy, algorithms need retry and back-off mechanisms. Systems need to be resilient to the scenarios and conditions caused by infrequent but inevitable production outages and disruptions. This post is designed to get you thinking about how best to validate typical failure conditions, including examples of how we at Microsoft validate our own systems.

Resilience

Resilience is the ability of a system to fail gracefully in the face of—and eventually recover from—disruptive events. Validating that an application, service, or platform is resilient is equally as important as building for failure. It is easy and tempting to validate the reliability of individual components in isolation and infer that the entire system will be just as reliable, but that could be a mistake. Resilience is a property of an entire system, not just its components. To understand if a system is truly resilient, it is best to measure and understand the resilience of the entire system in the environment where it will run. But how do you do this, and where do you start?

Chaos engineering and fault injection

Chaos engineering is the practice of subjecting a system to the real-world failures and dependency disruptions it will face in production. Fault injection is the deliberate introduction of failure into a system in order to validate its robustness and error handling.

Through the use of fault injection and the application of chaos engineering practices generally, architects can build confidence in their designs – and developers can measure, understand, and improve the resilience of their applications. Similarly, Site Reliability Engineers (SREs) and in fact anyone who holds their wider teams accountable in this space can ensure that their service level objectives are within target, and monitor system health in production. Likewise, operations teams can validate new hardware and datacenters before rolling out for customer use. Incorporation of chaos techniques in release validation gives everyone, including management, confidence in the systems that their organization is building.

Throughout the development process, as you are hopefully doing already, test early and test often. As you prepare to take your application or service to production, follow normal testing practices by adding and running unit, functional, stress, and integration tests. Where it makes sense, add test coverage for failure cases, and use fault injection to confirm error handling and algorithm behavior. For even greater impact, and this is where chaos engineering really comes into play, augment end-to-end workloads (such as stress tests, performance benchmarks, or a synthetic workload) with fault injection. Start in a pre-production test environment before performing experiments in production, and understand how your solution behaves in a safe environment with a synthetic workload before introducing potential impact to real customer traffic.
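
To make this concrete, here is a minimal sketch of augmenting a synthetic workload with an injected network fault on a Linux test machine. The workload script name and the network interface are assumptions, and tc/netem is only one of many ways to introduce latency:

#!/bin/bash
# Minimal sketch: run a synthetic workload with and without an injected network-latency fault.
# Assumes a Linux test VM with root access, the tc/netem utility, and a hypothetical
# ./run_synthetic_workload.sh script that exercises the system end to end.
set -e

# Baseline run, no fault applied.
./run_synthetic_workload.sh > baseline.log

# Inject 200 ms of latency on eth0.
tc qdisc add dev eth0 root netem delay 200ms

# Run the same workload under the fault, then always remove the fault.
./run_synthetic_workload.sh > with_fault.log || true
tc qdisc del dev eth0 root netem

# Compare the two runs (error rates, latency percentiles) to judge resilience.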

Healthy use of fault injection in a validation process might include one or more of the following:

Ad hoc validation of new features in a test environment:
A developer could stand up a test virtual machine (VM) and run new code in isolation. While executing existing functional or stress tests, faults could be injected to block network access to a remote dependency (such as SQL Server) to prove that the new code handles the scenario correctly (see the sketch after this list).
Automated fault injection coverage in a CI/CD pipeline, including deployment or resiliency gates:
Existing end-to-end scenario tests (such as integration or stress tests) can be augmented with fault injection. Simply insert a new step after normal execution that continues running, or runs again, with some faults applied. The addition of faults can find issues that would not normally be found by the tests, or accelerate discovery of issues that might be found eventually.
Incident fix validation and incident regression testing:
Fault injection can be used in conjunction with a workload or manual execution to induce the same conditions that caused an incident, enabling validation of a specific incident fix or regression testing of an incident scenario.
BCDR drills in a pre-production environment:
Faults that cause database failover or take storage offline can be used in BCDR drills, to validate that systems behave appropriately in the face of these faults and that data is not lost during any failover tests.
Game days in production:
A ‘game day’ is a coordinated simulation of an outage or incident, to validate that systems handle the event correctly. This typically includes validation of monitoring systems as well as human processes that come into play during an incident. Teams that perform game days can leverage fault injection tooling, to orchestrate faults that represent a hypothetical scenario in a controlled manner.
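
As an illustration of the first scenario above (ad hoc validation in a test environment), the following sketch blocks access to a SQL Server dependency while functional tests run, then restores connectivity. The test script name is hypothetical, and iptables is just one way to cut off a dependency:

#!/bin/bash
# Sketch: block outbound traffic to a SQL Server dependency (TCP 1433) while a
# hypothetical functional test suite runs, then remove the rule again.
iptables -A OUTPUT -p tcp --dport 1433 -j DROP

# The tests should exercise the new code's error handling and retry behavior.
./run_functional_tests.sh; TEST_RESULT=$?

# Restore connectivity regardless of the test outcome, then report the result.
iptables -D OUTPUT -p tcp --dport 1433 -j DROP
exit $TEST_RESULT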

Typical release pipeline

This figure shows a typical release pipeline, and the opportunities to include fault injection at each stage of the pipeline.

An investment in fault injection will be more successful if it is built upon a few foundational components:

Coordinated deployment pipeline.
Automated ARM deployments.
Synthetic runners and synthetic end-to-end workloads.
Monitoring, alerting, and livesite dashboards.

With these things in place, fault injection can be integrated in the deployment process with little to no additional overhead – and can be used to gate code flow on its way to production.
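
As a rough sketch of such a gate (not an Azure feature, just an illustration), a pipeline step could compare the results of a fault-injected run against the baseline and fail the stage when the regression is too large; the metric files and threshold below are assumptions:

#!/bin/bash
# Sketch of a resilience gate: block code flow when a fault-injected run regresses
# too far from the baseline run. The metric files and the tolerance are illustrative;
# a real gate would pull these numbers from your test or monitoring system.
BASELINE_ERRORS=$(cat baseline_error_count.txt)
FAULT_ERRORS=$(cat fault_run_error_count.txt)
MAX_ALLOWED=$((BASELINE_ERRORS + 5))   # tolerate a small, expected increase

if [ "$FAULT_ERRORS" -gt "$MAX_ALLOWED" ]; then
  echo "Fault-injected run produced $FAULT_ERRORS errors (limit $MAX_ALLOWED); blocking promotion."
  exit 1
fi
echo "Resilience gate passed."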

Localized rack power outages and equipment failures have been found as single points of failure in root cause analysis of past incidents. Learning that a service is impacted by, and not resilient to, one of these events in production is a timebound, painful, and expensive process for an on-call engineer. There are several opportunities to use fault injection to validate resilience to these failures throughout the release pipeline in a controlled environment and timeframe, which also gives more opportunity for the code author to lead an investigation of issues uncovered. A developer who has code changes or new code can create a test environment, deploy the code, and perform ad hoc experiments using functional tests and tools with faults that simulate taking dependencies offline – such as killing VMs, blocking access to services, or simply altering permissions. In a staging environment, injection of similar faults can be added to automated end-to-end and integration tests or other synthetic workloads. Test results and telemetry can then be used to determine impact of the faults and compared against baseline performance to block code flow if necessary.

In a pre-production or ‘Canary’ environment, automated runners can be used with faults that again block access to dependencies or take them offline. Monitoring, alerting, and livesite dashboards can then be used to validate that the outages were observed as well as that the system reacted and compensated for the issue—that it demonstrated resilience. In this same environment, SREs or operations teams may also perform business continuity/disaster recovery (BCDR) drills, using fault injection to take storage or databases offline and once again monitoring system metrics to validate resilience and data integrity. These same Canary activities can also be performed in production where there is real customer traffic, but doing so incurs a higher possibility of impact to customers so it is recommended only to do this after leveraging fault injection earlier in the pipeline. Establishing these practices and incorporating fault injection into a deployment pipeline allows systematic and controlled resilience validation which enables teams to mitigate issues, and improve application reliability, without impacting end customers.

Fault injection at Microsoft

At Microsoft, some teams incorporate fault injection early in their validation pipeline and automated test passes. Different teams run stress tests, performance benchmarks, or synthetic workloads in their automated validation gates as normal and a baseline is established. Then the workload is run again, this time with faults applied – such as CPU pressure, disk IO jitter, or network latency. Workload results are monitored, telemetry is scanned, crash dumps are checked, and Service Level Indicators (SLIs) are compared with Service Level Objectives (SLOs) to gauge the impact. If results are deemed a failure, code may not flow to the next stage in the pipeline.
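
For example, a minimal sketch of one such check might apply CPU pressure with the stress-ng tool while a benchmark runs, and then compare the measured SLI against the SLO; the benchmark script, its output format, and the 300 ms latency SLO are assumptions for illustration:

#!/bin/bash
# Sketch: apply CPU pressure while a performance benchmark runs, then compare the
# measured p99 latency (the SLI) against a 300 ms SLO. The benchmark script and its
# "p99_ms=<value>" output line are hypothetical.
stress-ng --cpu 0 --cpu-load 80 --timeout 120s &   # load all CPUs to roughly 80%
STRESS_PID=$!

./run_benchmark.sh > results.txt
wait "$STRESS_PID"

P99_MS=$(grep -oP 'p99_ms=\K[0-9]+' results.txt)
SLO_MS=300

if [ "$P99_MS" -gt "$SLO_MS" ]; then
  echo "p99 latency ${P99_MS}ms exceeds the ${SLO_MS}ms SLO under CPU pressure."
  exit 1
fi
echo "SLI within SLO under CPU pressure."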

Other Microsoft teams use fault injection in regular Business Continuity/Disaster Recovery (BCDR) drills and Game Days. Some teams have monthly, quarterly, or half-yearly BCDR drills and use fault injection to induce a disaster and validate both the recovery process and the alerting, monitoring, and live site processes. This is often done in a pre-production Canary environment before being used in production itself with real customer traffic. Some teams also carry out Game Days, where they come up with a hypothetical scenario, such as replication of a past incident, and use fault injection to help orchestrate it. Faults, in this case, might be more destructive—such as crashing VMs, turning off network access, causing database failover, or simulating an entire datacenter going offline. Again, normal live site monitoring and alerting are used, so the DevOps and incident management processes are also validated. To be kind to all involved, these activities are typically performed during business hours and not overnight or over a weekend.

Our operations teams also use fault injection to validate new hardware before it is deployed for customer use. Drills are performed where the power is shut off to a rack or datacenter, so the monitoring and backup systems can be observed to ensure they behave as expected.

At Microsoft, we use chaos engineering principles and fault injection techniques to increase resilience, and confidence, in the products we ship. They are used to validate the applications we deliver to customers, and the services we make available to developers. They are used to validate the underlying Azure platform itself, and to test new hardware before it is deployed. Separately and together, these contribute to the overall reliability of the Azure platform—and improved quality in our services all up.

Unintended consequences

Remember, fault injection is a powerful tool and should be used with caution. Safeguards should be in place to ensure that faults introduced in a test or pre-production environment will not also affect production. The blast radius of a fault scenario should be contained to minimize impact to other components and to end customers. The ability to inject faults should have restricted access, to prevent accidents and prevent potential use by hackers with malicious intent. Fault injection can be used in production, but plan carefully, test first in pre-production, limit the blast radius, and have a failsafe to ensure that an experiment can be ended abruptly if needed. The 1986 Chernobyl nuclear accident is a sobering example of a fault injection drill gone wrong. Be careful to insulate your system from unintended consequences.

Chaos as a service?

As Mark Russinovich mentioned in this earlier blog post, our goal is to make native fault injection services available to customers and partners so they can perform the same validation on their own applications and services. This is an exciting space with so much potential to improve cloud service reliability and reduce the impact of rare but inevitable disruptions. There are many teams doing lots of interesting things in this space, and we’re exploring how best to bring all these disparate tools and faults together to make our lives easier—for our internal developers building Azure services, for built-on-Azure services like Microsoft 365, Microsoft Teams, and Dynamics, and eventually for our customers and partners to use the same tooling to wreak havoc on (and ultimately improve the resilience of) their own applications and solutions.
Source: Azure

NFS 3.0 support for Azure Blob storage is now in preview

Many enterprises and organizations are moving their data to Microsoft Azure Blob storage for its massive scale, security capabilities, and low total cost of ownership. At the same time, they continue running many apps on different storage systems using the Network File System (NFS) protocol. Companies that use different storage systems due to protocol requirements are challenged by data silos where data resides in different places and requires additional migration or app rewrite steps.

To help break down these silos and enable customers to run NFS-based applications at scale, we are announcing the preview of NFS 3.0 protocol support for Azure Blob storage. Azure Blob storage is the only storage platform that supports NFS 3.0 protocol over object storage natively (no gateway or data copying required), with object storage economics, which is essential for our customers.

One of our Media and Entertainment (M&E) customers said, “NFS access to blob storage will enable our customers to preserve their legacy data access methods when migrating the underlying storage to Azure Blob storage.” Other customers have requested NFS for blob storage so they can reuse the same code from an on-premises solution to access files while controlling the overall cost of the solution. Financial services customers want an NFS-based offering for their analytics workloads. These are a few of the many examples from customers that have embraced the private preview of NFS 3.0 support for Azure Blob Storage.

NFS 3.0 support for Azure Blob storage helps with large-scale, read-heavy, sequential-access workloads where data is ingested once and minimally modified afterward, including large-scale analytic data, backup and archive, NFS apps for seismic and subsurface processing, media rendering, genomic sequencing, and line-of-business applications.

During the preview, NFS 3.0 is available to BlockBlobStorage accounts with premium performance in the following regions: US East, US Central, and Canada Central. Support for GPV2 accounts with standard tier performance will be announced soon.

Mount blob container using NFS 3.0

Each container in a newly created NFS 3.0 enabled storage account is automatically exported. NFS clients within the same network can mount it using this sample command:

mount -o sec=sys,vers=3,nolock,proto=tcp <storage-account-name>.blob.core.windows.net:/<storage-account-name>/<container-name> /mnt/test

Replace the <storage-account-name> placeholders with the name of your storage account.
Replace the <container-name> placeholder with the name of your container.
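
For example, with a hypothetical storage account named contosonfsaccount and a container named data, the command would look like this (the mount point is also just an example):

# Create the local mount point, then mount the container over NFS 3.0.
mkdir -p /mnt/test
mount -o sec=sys,vers=3,nolock,proto=tcp contosonfsaccount.blob.core.windows.net:/contosonfsaccount/data /mnt/test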

During preview, the test data stored in your NFS 3.0 enabled storage accounts are billed the same capacity rate that Azure Blob Storage has per GB per month. Pricing for transactions is subject to change and will be determined when it is generally available. To learn more, visit our documentation, NFS 3.0 protocol support in Azure Blob storage (preview).

Next steps

We are confident that NFS 3.0 on Azure Blob storage can simplify your workload migration to Azure. To register the NFS 3.0 protocol feature with your subscription, see the step-by-step guide. We look forward to hearing your feedback on this feature and suggestions for future improvements through email at azurenfspreview@microsoft.com.
Source: Azure

New Windows Virtual Desktop capabilities now generally available

With the global pandemic, customers are relying on remote work more than ever, and Windows Virtual Desktop is helping customers rapidly deliver a secure Windows 10 desktop experience to their users. Charlie Anderson, CIO of Fife Council in the United Kingdom, was planning to modernize the council’s existing Remote Desktop Services (RDS) infrastructure, and then business requirements changed. He needed increased agility and scale to meet the changing requirements. In his own words:

“Windows Virtual Desktop was absolutely essential for us in terms of our response to the COVID-19 pandemic. Like many, we were faced with a continuity issue unparalleled in recent times. For us, this meant not only the continuation of services we already delivered, but also responding very quickly to new demands arising as a result of our public response to the pandemic.

To do that, we needed to provide as close to the “in-office” experience as we could to a workforce now working away from our offices. This meant multiplying previous remote working capacities by a factor of 15 almost overnight – something which would have been impossible without a scalable and cloud-based approach, which also worked well on a range of Council and self-provided devices.

There is little doubt that the Windows Virtual Desktop solution will not only be vital to the future resilience of our public services to the people of Fife, but it will also form a key part of our future device strategy as we seek to develop new, agile, and cost-effective approaches going forward.”

In April 2020, we released the public preview of Azure portal integration, which made it easier to deploy and manage Windows Virtual Desktop. We also announced a new audio/video redirection (A/V redirect) capability that provided a seamless meeting and collaboration experience for Microsoft Teams. We are humbled by the amazing feedback we’ve received from you on these capabilities, and that’s been a huge motivation for our team to accelerate development. We are happy to announce that both the Azure portal integration and A/V redirect in Microsoft Teams are now generally available.

Azure portal integration

With the Azure portal integration, you get a simple interface to deploy and manage your apps and virtual desktops. Host pools, workspaces, and all other objects you create are Azure Resource Manager objects and are managed the same way you manage other Azure resources.

 
Customers who have existing deployments based on the previous (classic) model can continue using it. We will soon publish guidance on migrating to the new Azure Resource Manager-based deployment model so you can take advantage of all the new capabilities, including:

Azure role-based access control (RBAC)

You can use Azure RBAC to provide fine-grained access control to your Windows Virtual Desktop resources. There are four built-in admin roles that you can get started with, and you can create custom roles if necessary.

User management

Previously, you could only publish Remote Apps and Desktops to individual users. You can now publish resources to Azure Active Directory (Azure AD) groups, which makes it much easier to scale.
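
As a quick sketch of what this can look like from the Azure CLI, you can assign the built-in Desktop Virtualization User role to an Azure AD group at the application group scope; the subscription, resource group, application group name, and group object ID below are placeholders:

# Sketch: publish an application group to an Azure AD group by assigning the
# built-in "Desktop Virtualization User" role at the application group scope.
az role assignment create \
  --role "Desktop Virtualization User" \
  --assignee-object-id "<aad-group-object-id>" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.DesktopVirtualization/applicationGroups/<app-group-name>"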

Monitoring

The monitoring logs are now stored in Azure Monitor Logs. You can analyze the logs with Log Analytics and create visualizations to help you quickly troubleshoot issues.
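
As a sketch, once the diagnostics are routed to a Log Analytics workspace, you might query them from the Azure CLI as shown below; the workspace ID is a placeholder, and the WVDConnections table and State column are assumptions to verify against the current diagnostics schema:

# Sketch: summarize Windows Virtual Desktop connection states over the last day.
# The workspace GUID, table name, and column names are placeholders to verify
# against the current Windows Virtual Desktop diagnostics schema.
az monitor log-analytics query \
  --workspace "<log-analytics-workspace-guid>" \
  --analytics-query "WVDConnections | where TimeGenerated > ago(1d) | summarize count() by State"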

A/V redirect for Microsoft Teams

Many of you use Microsoft Teams to collaborate with your colleagues. Traditionally, virtual desktops have not been ideal for audio and video conferencing due to latency issues. That changes with the new A/V redirect feature in Windows Virtual Desktop. Once you enable A/V redirect in the Desktop client for Windows, the audio and video will be handled locally for Microsoft Teams calls and meetings. You can still use Microsoft Teams on Windows Virtual Desktop with other clients without optimized calling and meetings. Microsoft Teams chat and collaboration features are supported on all platforms.

Next steps

You can read more about these updates in the Azure portal integration and Microsoft Teams integration documentation pages.

Thank you for your support during the preview. If you have any questions, please reach out to us on Tech Community and UserVoice. 
Source: Azure

Multi-arch build, what about GitLab CI?

Following the previous article, where we saw how to build multi-arch images using GitHub Actions, we will now show how to do the same thing using another CI system: GitLab CI, which is part of GitLab.

To start building your image with GitLab CI, you will first need to create a .gitlab-ci.yml file at the root of your repository, commit it and push it.

image: docker:stable
variables:
  DOCKER_HOST: tcp://docker:2375/
  DOCKER_DRIVER: overlay2
services:
  - docker:dind
build:
  stage: build
  script:
    - docker version

This should result in a build output that shows the version of the Docker CLI and Engine.

We will now install Docker buildx. Because GitLab CI runs everything in containers and lets you use any image you want to start this container, we can use one with buildx preinstalled, like the one we used for CircleCI. And as with CircleCI, we need to start a builder instance.

image: jdrouet/docker-with-buildx:stable
variables:
  DOCKER_HOST: tcp://docker:2375/
  DOCKER_DRIVER: overlay2
services:
  - docker:dind
build:
  stage: build
  script:
    - docker buildx create --use
    - docker buildx build --platform linux/arm/v7,linux/arm64/v8,linux/amd64 --tag your-username/multiarch-example:gitlab .

And that’s it, your image will now be built for both ARM and x86 platforms.

The last step is now to store the image on Docker Hub. To do so, we’ll need an access token from Docker Hub with write access.

Once you’ve created it, you’ll have to set it in your project’s CI/CD settings, in the Variables section.

We can then add DOCKER_USERNAME and DOCKER_PASSWORD variables to GitLab CI so that we can log in to push our images.

Once this is done, you can add the login step and the --push option to the buildx command as follows.

build:
  stage: build
  script:
    - docker login -u "$DOCKER_USERNAME" -p "$DOCKER_PASSWORD"
    - docker buildx create --use
    - docker buildx build --push --platform linux/arm/v7,linux/arm64/v8,linux/386,linux/amd64 --tag your-username/multiarch-example:gitlab .

And voila, you can now create a multi-arch image each time you make a change in your codebase.
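
If you want to double-check that the pushed image really contains all the requested platforms, one quick way (using your own image name) is to inspect its manifest list with buildx:

# Inspect the pushed image's manifest list to confirm each platform is present.
docker buildx imagetools inspect your-username/multiarch-example:gitlab
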
Source: https://blog.docker.com/feed/

10 Years of OpenStack – Alan Clark at SUSE

Happy 10 years of OpenStack! Millions of cores, 100,000 community members, 10 years of you. Storytelling is one of the most powerful means to influence, teach, and inspire the people around us. To celebrate OpenStack’s 10th anniversary, we are spotlighting stories from the individuals in various roles from the community who have helped to make… Read more »
Source: openstack.org