Amazon Transcribe introduces Spanish language support for PII identification and redaction

We are excited to announce today that Amazon Transcribe now supports identification and redaction of personally identifiable information (PII) for Spanish (US) audio transcription, in addition to its existing support for English audio. Amazon Transcribe uses state-of-the-art machine learning technology to identify sensitive information such as Social Security numbers, credit card/bank account information, and contact information (i.e., name, email address, phone number, and mailing address).
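As an illustrative sketch (not official AWS sample code), a transcription job with Spanish PII redaction could be requested through boto3 by passing parameters like the following to `start_transcription_job`; the job name and S3 URI are placeholders:

```python
def build_redaction_job(job_name, media_uri):
    """Build StartTranscriptionJob parameters for Spanish (US) audio
    with PII identification and redaction enabled."""
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "es-US",  # Spanish (US)
        "Media": {"MediaFileUri": media_uri},
        "ContentRedaction": {
            "RedactionType": "PII",
            # "redacted" stores only the redacted transcript;
            # "redacted_and_unredacted" keeps both versions.
            "RedactionOutput": "redacted",
        },
    }

# Usage (requires AWS credentials and the boto3 package):
#   boto3.client("transcribe").start_transcription_job(
#       **build_redaction_job("demo-job", "s3://my-bucket/call.wav"))
params = build_redaction_job("demo-job", "s3://my-bucket/call.wav")
```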
Source: aws.amazon.com

10 WordPress Influencers to Follow in 2024  

In this “Build and Beyond” video, Jamie Marsland highlights 10 WordPressers to keep an eye on in 2024. 

A couple of weeks ago, we shared a list of 15 WordPress developers you should follow to stay on top of WordPress development news and tips. This video broadens the scope and features folks worth following, regardless of your role or experience with WordPress. If you’re at all interested in or curious about WordPress, these are folks to pay attention to.

Interested in a free trial that allows you to test out all that WordPress.com has to offer? Click below:

WordPress.com/Jamie

Remkus de Vries

Remkus is a well-known figure in the WordPress community, recognized for his contributions to WordPress development and his overall expertise in web technology.

Website | YouTube

Kevin Geary

Kevin helps digital agency owners, freelancers, and web designers to learn best practices for UX/UI design, development, and CSS.

Website | YouTube

Tyler Moore

Tyler has free video lessons on YouTube that teach people how to create their own professional website without any coding experience.

Website | YouTube

Sabrina Zeidan

Sabrina is a WordPress performance engineer whose daily work is to speed up WordPress websites, plugins, and themes.

YouTube

Mike McAlister

Mike is a designer and principal software engineer from the USA. He builds killer products and brands that people love, including the fantastic Ollie WordPress theme.

Website | X (Twitter)

Jonathan Jernigan

Jonathan runs a small web development agency, creates courses, and makes YouTube videos. He started his WordPress-focused YouTube channel in late 2018.

Website | YouTube

Birgit Pauli-Haack

Birgit works as developer advocate for WordPress, curates community voices on Gutenberg Times, and co-hosts the Gutenberg Changelog podcast.

Website | X (Twitter)

David McCan

For the past 20 years David has worked professionally developing websites and in IT management.

Website | Facebook

Paul Charlton

Paul has over 15 years of commercial web design and development experience working on a large range of diverse projects, with clients ranging from start-ups to blue-chip companies.

Website | YouTube

Matt Medeiros

The WP Minute, founded by Matt, is a website dedicated to delivering the most important news and topics from the WordPress ecosystem, keeping WordPress professionals informed, educated, and entertained.

Website | Podcast

Imran Sadiq

Imran has 17+ years of web design and marketing experience. His YouTube channel has over 55k subscribers.

Website | YouTube

Rich Tabor

Rich describes himself as a multidisciplinary maker specializing in the intersection of product, design, and engineering.

Website | X (Twitter)

Jamie Marsland

Jamie has trained over 5,000 people on WordPress in the past 10 years, and he also makes WordPress plugins. His YouTube channel is dedicated to helping people with WordPress Blocks.

Website | YouTube

Source: RedHat Stack

What’s new in Azure Data, AI, and Digital Applications: Helping you navigate the fast pace of change

I’ve met with customers and partners from around the world over the past few months and almost every one of them mentioned the same thing: how fast AI innovation is happening—much faster than the pace of change many are used to. While most see the transformative potential AI poses for their organization, many feel the technology isn’t sitting still long enough for them to fully understand what it means for them before something new is announced. The number of new items I share each month in this blog series is a great example of this ongoing change. 

I don’t expect the pace to slow down. We’re at a moment of generational change and are rapidly building and scaling the tools and infrastructure to empower the new AI era. This is potentially as big as the electrification of industry and households, which took decades. Nothing that big has ever happened as quickly as generative AI has captured the attention and interest of businesses and individuals alike. 

Azure AI Platform

Reimagine your business with generative AI

Learn more

The good news is we at Microsoft can help you determine how rapidly evolving technology can best support your strategic objectives. 

We’re proud to be a trusted partner for thousands of organizations already embracing generative AI to transform their business. Customers like SymphonyAI, whose Sensa Copilot uses Microsoft Azure OpenAI Service to identify potential financial crimes, helping investigators complete cases 60-70% faster with 70% less human effort. And KPMG, who harnessed the capabilities of vector searching across their operational databases and saw a 41% increase in search and response quality of a critical AI assistant used by their workforce. 

These are just two brief stories of how generative AI-powered assistants, or copilots, are driving efficiency and innovation across industries. They are also great examples of how no two organizations are alike, and neither are the apps, data, and workflows that drive your business.  

While the pace can feel relentless, many businesses are eager to make this year about moving from possibility to production with AI transformation. Our job at Microsoft is to help you understand how the AI era can benefit your organization and trust the AI solutions you deploy are done so safely and responsibly. On that note, here’s a look at some of the innovations and updates we’ve announced recently. 

What’s new in data, AI, and digital applications 

Build, buy, or both?  

There are many different options for organizations to take advantage of this new wave of AI solutions, and the pathways are by no means mutually exclusive. Microsoft Copilot can help boost employee productivity—it’s like providing every employee with a personal assistant—but you may also need customized solutions for your business. Our out-of-the-box copilots are easily customized with the low-code Microsoft Copilot Studio, and with Microsoft Azure AI Studio, we provide a pro-code generative AI environment for developing intelligent applications from end-to-end, including custom copilots. See my blog post on how to determine if you should take the build, buy, or both approach for more. 

The Microsoft Intelligent Data Platform—Unleash your data and accelerate your transformation  

With the emergence of generative AI, the Microsoft Intelligent Data Platform has evolved into what I now regard as our strongest, simplest recommendation: a collection of deeply integrated solutions that solves the data fragmentation problems of the past and present and stands ready to empower uncapped innovation for even the most ambitious organizations. Read my blog post on the Microsoft Intelligent Data Platform to learn more.  

New capabilities for Microsoft Fabric 

Thousands gathered at the Microsoft Fabric Community Conference last week to hear the latest updates from experts and get hands-on with everything Fabric has to offer, from data warehousing to data movement to AI, real-time analytics, and business intelligence. See the blog post from Arun Ulag for a rundown of everything we announced last week, from transformed continuous integration and continuous delivery (CI/CD), dataflows, and data pipelines experiences to new security and governance features, and more. 

New Microsoft Azure AI tools help you build more secure and trustworthy generative AI apps 

We’re introducing new tools, like Prompt Shields to detect and block prompt injection attacks, and Groundedness detection to detect and block “hallucinations” in model outputs, along with other responsible AI capabilities to help prevent, evaluate, and monitor AI risks and attacks. Learn more in the blog post from Sarah Bird, Chief Product Officer of Responsible AI. 

Cohere’s Command R+ now available first on Azure AI 

We’re pleased to announce a new collaboration between Cohere and Microsoft to integrate Cohere’s latest LLM, Command R+, into the Azure AI model catalog as part of our Models as a Service (MaaS) offering. This latest addition furthers our commitment to enriching the toolkit available to Azure AI users, empowering them to create advanced, user-centric generative AI applications that prioritize security, compliance, adaptability, and scalability. 

Azure AI for accessibility 

Our annual Ability Summit in March focused on how AI is fueling accessibility innovation. (Check it out on demand. Great stuff!) Leading up to the event, we shared two blog posts highlighting specific AI apps that turn obstacles into stepping stones. In “6 ways generative AI helps improve accessibility for all with Azure”, we see how AI ignites the spark of inclusion in six apps that are a testament to innovation, breaking down barriers and redefining possibilities for people with disabilities. And in “Paralympian Lex Gillette’s top 7 AI tools for accessibility”, we get a personal look at how AI-powered assistive technology helps Paralympian Champion long jumper Lex Gillette thrive on and off the field. 

OpenAI Whisper on Azure AI now generally available 

OpenAI Whisper is a speech-to-text model from OpenAI that developers can use to transcribe audio files. Developers can now use the generally available Whisper API in both Microsoft Azure OpenAI Service as well as Microsoft Azure AI Speech services on production workloads, knowing that it is backed by the Microsoft Azure enterprise-readiness promise. With all our speech-to-text models generally available, customers have greater choice and flexibility to enable AI-powered transcription and other speech scenarios. 

AI in action: 5 real-world intelligent apps you can build on Azure 

We dive into 5 types of AI-powered apps our customers are building with Azure generative AI technology, cloud-scale data, and modern application platforms. Check out how Mercedes-Benz built their connected car platform by first modernizing then adding AI capabilities, or how KPMG created KymChat, a highly-secure AI assistant with a conversational interface to help employees surface insights from unstructured data. Check out sample architectures and solution accelerators to help you build similar apps quickly and easily. 

Azure AI Speech launches new zero-shot text-to-speech (TTS) models for Personal Voice 

Azure AI Speech service has upgraded its Personal Voice feature with new zero-shot TTS models. Compared to the initial model, these new models improve the authenticity of synthesized voices and better resemble the speech characteristics of the voice in the prompt. Learn more about the models and apply for access. 

Database watcher for Microsoft Azure SQL public preview 

Enable in-depth, managed monitoring of Azure SQL databases, elastic pools, and managed instances, and view detailed database performance and health data in near real time, visualized on a single-pane-of-glass dashboard in the Azure portal. Explore detailed performance and configuration data to detect, investigate, and troubleshoot a wide variety of database performance and health problems.

Huge performance gains in Azure Database for PostgreSQL – Flexible Server   

It’s time to take advantage of the faster architecture and stronger value of Microsoft Azure Database for PostgreSQL – Flexible Server. A new benchmark study by Principled Technologies compares the performance and cost of Azure Database for PostgreSQL – Flexible Server to Azure Database for PostgreSQL – Single Server (slated for retirement March 28, 2025). Azure Database for PostgreSQL – Flexible Server offers up to 4.71 times faster performance and up to 3.88 times better performance per dollar. To learn more about migrating from Azure Database for PostgreSQL – Single Server to Flexible Server, read the full report.

Realize better value and performance in Azure Database for MySQL – Flexible Server 

A new benchmark study by Principled Technologies compares the performance and cost of Azure Database for MySQL – Flexible Server to Azure Database for MySQL – Single Server (slated for retirement September 16, 2024). The results show Azure Database for MySQL – Flexible Server offers up to 117% better performance and up to 266% better performance per dollar. Customers can quickly and easily migrate to Flexible Server using Azure Database for MySQL Import. To learn more about Azure Database for MySQL – Flexible Server, read the full report.  

Microsoft Defender for Azure Database for PostgreSQL generally available 

Microsoft Defender for Cloud is now generally available for Azure Database for PostgreSQL – Flexible Server. It detects anomalous activities indicating unusual and potentially harmful attempts to access or exploit databases. When deployed, it provides an additional important security barrier to your Azure Database for PostgreSQL server beyond the built-in security features. Learn more about Defender for Cloud.

Public preview of Azure SQL Managed Instance Next-gen General Purpose (GP)

Azure SQL Managed Instance next-gen General Purpose is now in public preview, making migration and modernization faster and easier than ever for more customer scenarios. In addition, moving from Azure VMs to SQL Managed Instance gives the customer a better Azure experience with a fully managed database that’s always up-to-date and an availability SLA of up to 99.99%. Maximize your SQL workload efficiently with higher performance, flexible compute, and storage options. With 2x IOPS, 5x database capacity, and 32 TB storage, scale up to 128 vCores and manage your unique workload demands. Learn more about Azure SQL Managed Instance Next-gen GP. 

Learn something new 

Build intelligent apps with Microsoft Azure Kubernetes Service (AKS)

A new four-part, hands-on learning series, Building Intelligent Apps with AKS, is available on demand. In this series, developers explore how to build enterprise-grade solutions by deconstructing open-source reference samples step by step. Episodes cover leveraging Azure OpenAI Service with AKS, running your own AI models on AKS with KAITO, using OpenCost, Prometheus, and Grafana for intelligent apps, and taking your intelligent app global with AKS and Azure. Plus, there is a bonus “Ask the Expert” session with the AKS product team.  

New Microsoft Mechanics video: “What is the database behind ChatGPT?” 

Find out why OpenAI chose Microsoft Azure Cosmos DB as the database behind ChatGPT in this episode of Microsoft Mechanics. General Manager for the Azure Cosmos DB team Kirill Gavrylyuk shows how to take advantage of Azure Cosmos DB for AI-driven applications and how to seamlessly integrate with large language models like ChatGPT. Discover how to achieve real-time operational efficiency at any scale. 

Large language model operations (LLMOps) skilling workshop available for anyone on GitHub 

Organizations eager to fully leverage generative AI recognize the critical step of transitioning from prototyping to production. Orchestrating LLM workflows, evaluating prompt variations, managing version control for prompt assets, and ensuring the continuous quality and safety of generative AI applications can present challenges. Microsoft created the LLMOps workshop to address these challenges and help any organization operationalize their LLM-powered applications with Azure AI. 

Customers innovating with generative AI 

IWill Therapy and CARE Azure AI-powered bot provides access to cognitive behavioral therapy 

IWill Therapy and IWill CARE, a leading online mental health care provider in India, needed to reach more clients. It created a Hindi-speaking chatbot named IWill GITA using the cutting-edge products and services included in the unified Azure AI Studio platform, incorporating the latest LLMs in Azure OpenAI Service and integrated Azure AI Content Safety filters. IWill’s scalable, AI-powered bot helps expand access to cognitive behavioral therapy throughout India. Learn more about building with customizable APIs and models on Azure AI.

Schneider Electric fast-tracks innovation with Azure OpenAI Service 

Schneider Electric, a leader in energy management and industrial automation, is innovating with Azure OpenAI Service to enhance sustainability and efficiency and address complex global issues like carbon emissions. Schneider’s commitment to AI is evident in their creation of a Chief AI Officer role and their collaboration with Microsoft’s Industry Solutions Engineering. Schneider’s EcoStruxure platform, powered by Microsoft Azure IoT, optimizes energy usage and drives strategic decisions. Schneider’s vision is to integrate AI more deeply into their products, enabling smarter, more sustainable energy solutions worldwide. Learn more about our Azure OpenAI Service.

Albert Heijn uses Azure OpenAI in its effort to reduce food waste 

Albert Heijn (AH), the Netherlands’ leading supermarket chain, collaborated with Microsoft to develop a recipe scan feature that allows users to conveniently add all the ingredients from a cookbook recipe to the shopping cart in the AH app, making healthy cooking more accessible. There is ongoing work aimed at helping users find recipes based on leftover ingredients, contributing to the effort to reduce food waste. In addition, Dynamic Markdown automatically reduces prices of products that are approaching expiration throughout the day within stores. Learn more about our Azure OpenAI Service.

Opportunities to connect 

Join the free virtual Azure Cosmos DB Conf on April 16, 2024 

Go beyond traditional analysis and leverage AI-powered tools to build smarter, more scalable applications. Discover how companies are using Azure Cosmos DB with AI features to gain real-time insights, personalize experiences, and optimize performance. Register now and let AI transform your data strategy! This free virtual developer event offers a dynamic mix of 15-minute sessions showcasing innovative projects from Microsoft and community experts.

Microsoft Build happening May 21-23, 2024 

Register to attend Microsoft Build in person or online and learn before everyone else about every announcement and product update across the Microsoft Intelligent Data Platform and how AI can shape your future. Find solutions, get your questions answered, and have conversations with experts. There will be plenty of opportunities to grow and differentiate your skill set with hands-on demos and breakout sessions. 

What’s new?  

Jessica shares insights on technology transformation along with important updates and resources about the data, AI, and digital application solutions that make Microsoft Azure the platform for the era of AI. Find Jessica’s blog posts here and be sure to follow Jessica on LinkedIn. 
The post What’s new in Azure Data, AI, and Digital Applications: Helping you navigate the fast pace of change appeared first on Azure Blog.
Source: Azure

Get ready for AI at the Migrate to Innovate digital event

Organizations of all sizes are recognizing that using AI fuels the kind of innovation that’s needed to maintain a competitive edge. What’s often less clear is how to prepare your organization to be able to take full advantage of AI. For organizations running business-critical workloads on Windows Servers and SQL Server, how do you get from running in a traditional, on-premises environment to operating in an environment that supports AI and other modern technologies?

Get ready for AI

To get a solid understanding of how to navigate this type of move, join us at the free digital event Migrate to Innovate: Be AI Ready, Be Secure on Tuesday, April 16, 2024, at 9:00 AM–11:00 AM Pacific Time. Register now to attend information-packed sessions that will help you understand the challenges organizations face in preparing for AI and cloud-native technologies, and what you need to have in place to solve those challenges.

Address your most pressing business challenges

One of the biggest obstacles in the path to modernization is balancing the need to embrace the latest advancements with the need to meet current business challenges. Whether it’s managing rising costs, safeguarding against security threats, maintaining compliance, or controlling IT sprawl as business expands, there are a lot of different priorities competing for your focus, time, and resources.

At the Migrate to Innovate digital event, you’ll learn how Azure provides an optimized platform to fully embrace AI while addressing your most pressing business priorities by maximizing ROI, performance, and resilience. Sessions will focus on how to optimize migration of your Windows Server and SQL Server workloads to Azure to position your organization for innovation, efficiency, growth, and long-term success.

Migrate to Innovate digital event

Join us to learn how migrating Windows Server and SQL Server to Azure helps fuel innovation and address your business challenges

Register now

Be the first to see product updates, deep dives, and demos

Attend the Migrate to Innovate digital event to be among the first to see what’s included in the upcoming Windows Server 2025 release. View product demos of the newest AI innovations, including Microsoft Copilot. Join product experts for deep-dive product sessions covering Windows Server, SQL Server, AI, security, and a range of modernization-related capabilities. Learn about the latest updates on intelligent Azure databases to power your data and AI workloads, and discover strategies for gaining cloud agility, including running VMware workloads across cloud, hybrid, and on-premises environments.

Session highlights include:

Keynote address—Understand current business challenges and learn how migrating to Azure provides the agility that’s needed to address them. Hear about the latest product announcements and advancements that will help you get ready to take advantage of AI.

Migrate to Azure to be AI Ready—Learn the steps that organizations need to take to be ready for AI. Watch demos showing how customers are using AI solutions—including Microsoft Copilot in Azure—to solve complex problems, and how migration accelerates innovation with AI.

Customer interview—Hear customers discuss why they chose to make the move to Azure, and how migration has provided them with the business outcomes they need for success, including AI-readiness, security, cost savings, and performance.

Migrate your Windows Server to Azure—Learn how Azure is optimized to help you migrate and modernize your Windows Server workloads. Discover on-premises, hybrid, and cloud scenarios for VMware. Watch a demo on Windows Server 2025 and its support for AI capabilities as well as hybrid and edge scenarios.

Migrate your Data to Azure—In the era of AI, learn how to power mission-critical applications with Azure databases. See how to simplify migration with the help of Azure Arc migration assessments.

New risks, New rules: Secure Code-to-Cloud Migration—Find out how Azure helps you secure your entire migration journey using a cloud-native application platform (CNAPP), Microsoft Defender for Cloud, and Microsoft Copilot for Security.

Get cloud agility anywhere: Strategies for VMware workloads—Understand the issues that on-premises VMware users face, and learn how taking an adaptive cloud approach with Azure helps address these challenges.  

Optimize your migration with key guidance, offerings, and tools—Learn about the three most important optimization activities, and discover resources, guidance, and tools that will help you plan and implement your migration solution.

Discover the business outcomes of migrating to Azure

Register for the event to understand how to get your organization on the path to modernization and hear about the business outcomes that customers are seeing when they migrate Windows Server and SQL Server workloads to Azure, including:

AI readiness: Organizations get results with an AI-ready foundation on Azure. In a study of customers using Azure AI services, a composite organization based on the experiences of six interviewees achieved a three-year ROI of 284%. Work output and operational efficiency increased, employee collaboration and safety improved, and organizations reported faster and more data-driven decision-making.

Code-to-cloud security: With Azure, you get a complete code-to-cloud security platform. From foundational security to cloud-native workload protection, replacing multiple third-party security tools with comprehensive, multilayered security reduces risk and costs.

Maximizing ROI and performance: Workloads run faster and at a lower cost on Azure than with other cloud providers. AWS is up to 5 times more expensive than Azure for Windows Server and SQL Server—and SQL Server runs up to 5.5 times faster on Azure than on AWS.

Cloud agility anywhere: Azure meets organizations where they are in their migration journey through an adaptive cloud approach. Azure provides the tools and support to help you secure and govern your entire digital estate across hybrid, multicloud, and edge environments on your own terms.

Register now

Register for the Migrate to Innovate digital event today to join us for this free digital event on Tuesday, April 16, 9:00 AM–11:00 AM Pacific Time.
The post Get ready for AI at the Migrate to Innovate digital event appeared first on Azure Blog.
Source: Azure

Advancing memory leak detection with AIOps—introducing RESIN

“Operating a cloud infrastructure at global scale is a large and complex task, particularly when it comes to service standard and quality. In a previous blog, we shared how AIOps was leveraged to improve service quality, engineering efficiency, and customer experience. In this blog, I’ve asked Jian Zhang, Principal Program Manager from the AIOps Platform and Experiences team to share how AI and machine learning is used to automate memory leak detection, diagnosis, and mitigation for service quality.”—Mark Russinovich, Chief Technology Officer, Azure.

In the ever-evolving landscape of cloud computing, memory leaks represent a persistent challenge—affecting performance, stability, and ultimately, the user experience. Memory leak detection is therefore important to cloud service quality. Memory leaks happen when allocated memory is unintentionally not released in a timely manner. A leak causes potential performance degradation of the component and possible crashes of the operating system (OS). Even worse, it often affects other processes running on the same machine, causing them to be slowed down or even killed.
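As a minimal Python illustration of this failure mode (the cache and handler names are invented for the example), an allocation that stays reachable from a long-lived container is never reclaimed, so memory grows with every call; `tracemalloc` makes the growth visible:

```python
import tracemalloc

_cache = []  # long-lived container that keeps every allocation reachable

def handle_request(payload):
    # Bug: each call appends a copy that is never evicted, so the
    # allocation stays reachable and the process leaks memory.
    _cache.append(payload * 1000)

tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
for _ in range(1000):
    handle_request("x")
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

leaked = after - before  # grows roughly linearly with the number of calls
```

This is exactly the kind of runtime instrumentation the next paragraph describes as "dynamic" detection: accurate, but intrusive and costly at cloud scale.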

Given the impact of memory leak issues, there are many studies and solutions for memory leak detection. Traditional detection solutions fall into two categories: static and dynamic. Static leak detection techniques analyze software source code and deduce potential leaks, whereas dynamic methods detect leaks by instrumenting a program and tracking object references at runtime.

However, these conventional techniques for detecting memory leaks are not adequate for leak detection in a cloud environment. The static approaches have limited accuracy and scalability, especially for leaks that result from cross-component contract violations, which require rich domain knowledge to capture statically. In general, the dynamic approaches are more suitable for a cloud environment, but they are intrusive and require extensive instrumentation. Furthermore, they introduce high runtime overhead, which is costly for cloud services.

RESIN

Designed to address memory leaks in production cloud infrastructure

Explore the research

Introducing RESIN

Today, we are introducing RESIN, an end-to-end memory leak detection service designed to holistically address memory leaks in large cloud infrastructure. RESIN has been used in Microsoft Azure production and demonstrated effective leak detection with high accuracy and low overhead.

RESIN: a holistic service for memory leaks

Read the report

RESIN system workflow

A large cloud infrastructure can consist of hundreds of software components owned by different teams. Prior to RESIN, memory leak detection was an individual team’s effort in Microsoft Azure. As shown in Figure 1, RESIN utilizes a centralized approach, which conducts leak detection in multiple stages for the benefit of low overhead, high accuracy, and scalability. This approach does not require access to components’ source code, extensive instrumentation, or re-compilation.

Figure 1: RESIN workflow

RESIN conducts low-overhead monitoring using agents that collect memory telemetry data at the host level. A remote service aggregates and analyzes the data from different hosts using a bucketization-pivot scheme. When leaking is detected in a bucket, RESIN triggers an analysis of the process instances in that bucket. For highly suspicious leaks, RESIN performs live heap snapshotting and compares the snapshots to regular heap snapshots in a reference database. After generating multiple heap snapshots, RESIN runs a diagnosis algorithm to localize the root cause of the leak and generates a diagnosis report, attached to the alert ticket, to assist developers with further analysis. Ultimately, RESIN automatically mitigates the leaking process.

Detection algorithms

There are unique challenges in memory leak detection in cloud infrastructure:

Noisy memory usage, caused by changing workloads and interference in the environment, results in high noise when detection uses a static threshold-based approach.

Memory leaks in production systems are usually fail-slow faults that can last days, weeks, or even months, making it difficult to capture the gradual change in a timely manner.

At the scale of Azure’s global cloud, it is not practical to collect fine-grained data over long periods of time.

To address these challenges, RESIN uses a two-level scheme to detect memory leak symptoms: A global bucket-based pivot analysis to identify suspicious components and a local individual process leak detection to identify leaking processes.

With the bucket-based pivot analysis at the component level, we categorize raw memory usage into a number of buckets and transform the usage data into a summary of the number of hosts in each bucket. In addition, a severity score for each bucket is calculated based on the deviations and the host count in the bucket. Anomaly detection is performed on the time-series data of each bucket of each component. The bucketization approach not only robustly represents the workload trend with noise tolerance but also reduces the computational load of the anomaly detection.
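A toy version of this bucketization, with an invented bucket width and severity formula (RESIN’s actual scoring is more involved), might look like:

```python
from collections import Counter

def bucketize(usage_by_host, bucket_gb=1.0):
    """Summarize raw per-host memory usage (GB) as host counts per bucket."""
    return Counter(int(usage // bucket_gb) for usage in usage_by_host.values())

def severity(current, baseline):
    """Toy severity score: extra hosts in a bucket relative to the
    baseline, weighted so that higher-usage buckets count more."""
    score = 0.0
    for bucket, count in current.items():
        drift = count - baseline.get(bucket, 0)
        if drift > 0:
            score += drift * bucket
    return score

# Two hosts drift from the ~1 GB bucket into the ~3 GB bucket.
baseline = bucketize({"host1": 1.2, "host2": 1.4, "host3": 1.1})
current = bucketize({"host1": 1.2, "host2": 3.6, "host3": 3.9})
```

Anomaly detection then runs on each bucket's host-count time series rather than on raw per-host usage, which is what keeps the scheme cheap and noise-tolerant.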

However, detection at the component level alone is not sufficient for developers to investigate a leak efficiently because many processes normally run on a component. When a leaking bucket is identified at the component level, RESIN runs a second-level detection scheme at process granularity to narrow down the scope of investigation. It outputs the suspected leaking process, its start and end time, and a severity score.
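One simple way to realize this process-level step (a sketch; RESIN’s real detector is more sophisticated, and the threshold here is invented) is to fit a trend to each process’s memory samples and flag sustained growth:

```python
def leak_slope(samples):
    """Least-squares slope of a process's memory samples (MB per interval)."""
    n = len(samples)
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def is_leaking(samples, threshold_mb=0.5):
    """Flag a process whose memory grows faster than the threshold."""
    return leak_slope(samples) > threshold_mb
```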

Diagnosis of detected leaks

Once a memory leak is detected, RESIN takes a snapshot of the live heap, which contains all memory allocations referenced by the running application, and analyzes the snapshots to pinpoint the root cause of the detected leak. This makes memory leak alerts actionable.

RESIN leverages the Windows heap manager’s snapshot capability to perform live profiling. However, heap collection is expensive and can be intrusive to the host’s performance. To minimize this overhead, several factors shape how snapshots are taken:

The heap manager stores only limited information in each snapshot, such as the stack trace and size of each active allocation.

RESIN prioritizes candidate hosts for snapshotting based on leak severity, noise level, and customer impact. By default, the top three hosts in the suspected list are selected to ensure successful collection.

RESIN utilizes a long-term, trigger-based strategy to ensure the snapshots capture the complete leak. To facilitate the decision regarding when to stop the trace collection, RESIN analyzes memory growth patterns (such as steady, spike, or stair) and takes a pattern-based approach to decide the trace completion triggers.

RESIN uses a periodic fingerprinting process to build reference snapshots, which are compared with the snapshot of a suspected leaking process to support diagnosis.

RESIN analyzes the collected snapshots to output the stack traces of the root cause.
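The comparison between a reference fingerprint and a suspect snapshot can be sketched as grouping allocations by stack trace and ranking traces by growth in outstanding bytes. The snapshot representation and field names here are assumptions, not RESIN's format.

```python
from collections import Counter

def outstanding_bytes(snapshot):
    """Sum allocation sizes per stack trace for one snapshot.
    Each snapshot entry is an assumed (stack_trace, bytes) pair."""
    totals = Counter()
    for stack, size in snapshot:
        totals[stack] += size
    return totals

def rank_leak_stacks(reference, suspect, top=3):
    """Stack traces ordered by byte growth between the reference
    fingerprint and the suspect snapshot."""
    ref, sus = outstanding_bytes(reference), outstanding_bytes(suspect)
    growth = {s: sus[s] - ref.get(s, 0) for s in sus}
    return sorted(growth.items(), key=lambda kv: -kv[1])[:top]

reference = [("alloc_buf", 1000), ("cache_put", 2000)]
suspect = [("alloc_buf", 9000), ("cache_put", 2100)]
print(rank_leak_stacks(reference, suspect))
# [('alloc_buf', 8000), ('cache_put', 100)]
```

The stack trace with the largest growth is the natural starting point for the developer's investigation.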

Mitigation of detected leaks

When a memory leak is detected, RESIN attempts to automatically mitigate the issue to avoid further customer impact. Depending on the nature of the leak, RESIN chooses among several mitigation actions, using a rule-based decision tree to select the one that minimizes impact.

If the memory leak is localized to a single process or Windows service, RESIN attempts the lightest mitigation by simply restarting the process or the service. An OS reboot can resolve software memory leaks but takes much longer and can cause virtual machine downtime; as such, it is normally reserved as the last resort. For a non-empty host, RESIN utilizes solutions such as Project Tardigrade, which skips hardware initialization and performs only a kernel soft reboot, after live virtual machine migration, to minimize user impact. A full OS reboot is performed only when the soft reboot is ineffective.
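The escalation order described above can be captured in a small decision function. The rule structure is an assumption that only mirrors the text: restart the process or service first, then kernel soft reboot (after live migration on non-empty hosts), then full OS reboot as the last resort.

```python
def choose_mitigation(leak_scope, host_has_vms, soft_reboot_failed=False):
    """Pick the lightest mitigation expected to clear the leak.
    leak_scope: 'process', 'service', or 'kernel' (assumed labels)."""
    if leak_scope in ("process", "service"):
        return f"restart {leak_scope}"
    if soft_reboot_failed:
        return "full OS reboot"          # last resort
    if host_has_vms:
        return "live-migrate VMs, then kernel soft reboot"
    return "kernel soft reboot"

print(choose_mitigation("process", host_has_vms=True))
# restart process
print(choose_mitigation("kernel", host_has_vms=True))
# live-migrate VMs, then kernel soft reboot
```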

RESIN stops applying mitigation actions to a target once the detection engine no longer considers the target leaking.

Result and impact of memory leak detection

RESIN has been running in production in Azure since late 2018 and to date, it has been used to monitor millions of host nodes and hundreds of host processes daily. Overall, we achieved 85% precision and 91% recall with RESIN memory leak detection,1 despite the rapidly growing scale of the cloud infrastructure monitored.

The end-to-end benefits brought by RESIN are clearly demonstrated by two key metrics:

Virtual machine unexpected reboots: the average number of reboots per one hundred thousand hosts per day due to low memory.

Virtual machine allocation error: the ratio of erroneous virtual machine allocation requests due to low memory.

Between September 2020 and December 2023, the virtual machine reboots were reduced by nearly 100 times, and allocation error rates were reduced by over 30 times. Furthermore, since 2020, no severe outages have been caused by Azure host memory leaks.1

Learn more about RESIN

RESIN’s end-to-end memory leak detection capabilities are designed to holistically address memory leaks in large cloud infrastructure, improving the reliability and performance of the cloud and preventing issues caused by leaks. To learn more, read the publication.

1 RESIN: A Holistic Service for Dealing with Memory Leaks in Production Cloud Infrastructure, Chang Lou, Johns Hopkins University; Cong Chen, Microsoft Azure; Peng Huang, Johns Hopkins University; Yingnong Dang, Microsoft Azure; Si Qin, Microsoft Research; Xinsheng Yang, Meta; Xukun Li, Microsoft Azure; Qingwei Lin, Microsoft Research; Murali Chintalapati, Microsoft Azure, OSDI’22.
The post Advancing memory leak detection with AIOps—introducing RESIN appeared first on Azure Blog.
Source: Azure