How to accelerate DevOps with Machine Learning lifecycle management

DevOps is the union of people, processes, and products to enable the continuous delivery of value to end users. DevOps for machine learning is about bringing the lifecycle management of DevOps to machine learning. With it, teams can easily manage, monitor, and version models while simplifying workflows and the collaboration process.

Effectively managing the machine learning lifecycle is critical for DevOps’ success. And the first step in machine learning lifecycle management is building your machine learning pipeline(s).

What is a Machine Learning Pipeline? 

DevOps for Machine Learning includes data preparation, experimentation, model training, model management, deployment, and monitoring while also enhancing governance, repeatability, and collaboration throughout the model development process. Pipelines allow for the modularization of phases into discrete steps and provide a mechanism for automating, sharing, and reproducing models and ML assets. They create and manage workflows that stitch together machine learning phases. Essentially, pipelines allow you to optimize your workflow with simplicity, speed, portability, and reusability.

There are four steps involved in deploying machine learning that data scientists, engineers, and IT experts collaborate on:

Data Ingestion and Preparation
Model Training and Retraining
Model Evaluation
Deployment

Together, these steps make up the machine learning pipeline. Below is an excerpt from the documentation on building machine learning pipelines with Azure Machine Learning service, which explains it well.

“Using distinct steps makes it possible to rerun only the steps you need, as you tweak and test your workflow. A step is a computational unit in the pipeline. As shown in the preceding diagram, the task of preparing data can involve many steps. These include, but aren't limited to, normalization, transformation, validation, and featurization. Data sources and intermediate data are reused across the pipeline, which saves compute time and resources.”
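The reuse that the excerpt describes can be sketched in plain Python. This is an illustrative toy, not the Azure Machine Learning SDK; the `Step` class and its caching scheme are hypothetical:

```python
import hashlib
import json

class Step:
    """A pipeline step that caches its output, keyed by a fingerprint of its inputs."""
    def __init__(self, name, func):
        self.name = name
        self.func = func
        self._cache = {}  # input fingerprint -> cached output

    def run(self, *inputs):
        key = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
        if key in self._cache:
            # Inputs unchanged: reuse the intermediate data instead of recomputing
            return self._cache[key]
        out = self.func(*inputs)
        self._cache[key] = out
        return out

# Two discrete phases: data preparation and training
prepare = Step("prepare", lambda raw: [x / max(raw) for x in raw])
train   = Step("train",   lambda data: sum(data) / len(data))

raw = [3, 6, 9]
model_a = train.run(prepare.run(raw))  # both steps compute
model_b = train.run(prepare.run(raw))  # both steps reuse cached output
```

Only steps whose input fingerprints change would recompute on a rerun, which is the property that saves compute time and resources across tweak-and-test cycles.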

4 benefits of accelerating Machine Learning pipelines for DevOps

1. Collaborate easily across teams

Data scientists, data engineers, and IT professionals using machine learning pipelines need to collaborate on every step involved in the machine learning lifecycle: from data prep to deployment.
Azure Machine Learning service workspace is designed to make the pipelines you create visible to the members of your team. You can use Python to create your machine learning pipelines and interact with them in Jupyter notebooks, or in another preferred integrated development environment.

2. Simplify workflows

Data prep and modeling can last days or weeks, taking time and attention away from other business objectives.
The Azure Machine Learning SDK offers imperative constructs for sequencing and parallelizing the steps in your pipelines when no data dependency is present. You can also templatize pipelines for specific scenarios and deploy them to a REST endpoint, so you can schedule batch-scoring or retraining jobs. When you rerun a pipeline as you tweak and test your workflow, only the steps that need to run again are executed.
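Parallelizing steps with no data dependency can be pictured with the standard library. This is a conceptual sketch, not SDK syntax; the normalize and validate steps are made-up examples:

```python
from concurrent.futures import ThreadPoolExecutor

def normalize(data):
    top = max(data)
    return [x / top for x in data]

def validate(data):
    return all(x >= 0 for x in data)

data = [2, 4, 8]

# normalize and validate have no data dependency on each other,
# so they can execute concurrently; training waits on both results.
with ThreadPoolExecutor() as pool:
    norm_future = pool.submit(normalize, data)
    valid_future = pool.submit(validate, data)
    normalized, is_valid = norm_future.result(), valid_future.result()

if is_valid:
    model = sum(normalized) / len(normalized)
```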

3. Centralize management

Tracking models and their version histories is a hurdle many DevOps teams face when building and maintaining their machine learning pipelines.
The Azure Machine Learning service model registry tracks models, their version histories, their lineage, and their artifacts. Once a model is in production, the Application Insights service collects both application and model telemetry, allowing the model to be monitored in production for operational and model correctness. The data captured during inferencing is presented back to the data scientists, who can use it to assess model performance, data drift, and model decay. The service also provides the tools to train, manage, and deploy machine learning experiments and web services in one central view.
The Azure Machine Learning SDK also allows you to submit and track individual pipeline runs. You can explicitly name and version your data sources, inputs, and outputs instead of manually tracking data and result paths as you iterate. You can also manage scripts and data separately for increased productivity. For each step in your pipeline, Azure coordinates between the various compute targets you use, so that your intermediate data can be shared with the downstream compute targets easily. You can track the metrics for your pipeline experiments directly in the Azure portal.

4. Track your experiments easily

DevOps capabilities for machine learning further improve productivity by enabling experiment tracking and management of models deployed in the cloud and on the edge. All these capabilities can be accessed from any Python environment running anywhere, including data scientists’ workstations. The data scientist can compare runs, and then select the “best” model for the problem statement.
The Azure Machine Learning workspace keeps a list of compute targets that you can use to train your model. It also keeps a history of the training runs, including logs, metrics, output, and a snapshot of your scripts. You can create multiple workspaces, or share a common workspace among multiple people.
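Run tracking and “best model” selection can be pictured as a history of named runs with their metrics. A toy sketch, not the workspace API; the run names and metrics here are invented:

```python
runs = []  # each training run records its parameters and metrics

def track_run(run_id, params, accuracy):
    runs.append({"run_id": run_id, "params": params, "accuracy": accuracy})

# Three hypothetical training runs with different hyperparameters
track_run("run-1", {"lr": 0.1},  0.81)
track_run("run-2", {"lr": 0.01}, 0.88)
track_run("run-3", {"lr": 0.5},  0.74)

# Compare runs and select the "best" model for the problem statement
best = max(runs, key=lambda r: r["accuracy"])
```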

Conclusion

As you can see, DevOps for machine learning can be streamlined across the ML pipeline with more visibility into training, experiment metrics, and model versions. Azure Machine Learning service seamlessly integrates with Azure services to provide end-to-end capabilities for the entire machine learning lifecycle, making it simpler and faster than ever.

This is part two of a four-part series on the pillars of Azure Machine Learning services. Check out part one if you haven’t already, and be sure to look out for our next blog, where we’ll be talking about ML at scale.

Learn More

Visit our product site to learn more about the Azure Machine Learning service, and get started with a free trial of Azure Machine Learning service.
Source: Azure

How do teams work together on an automated machine learning project?

When it comes to executing a machine learning project in an organization, data scientists, project managers, and business leads need to work together to deploy the best models to meet specific business objectives. A central objective of this step is to identify the key business variables that the analysis needs to predict. We refer to these variables as the model targets, and we use the metrics associated with them to determine the success of the project.

In this use case, available to the public on GitHub, we’ll see how a data scientist, project manager, and business lead at a retail grocer can leverage automated machine learning and Azure Machine Learning service to reduce product overstock. Azure Machine Learning service is a cloud service that you use to train, deploy, automate, and manage machine learning models, all at the broad scale that the cloud provides. Automated machine learning within Azure Machine Learning service is the process of taking training data with a defined target feature, and iterating through combinations of algorithms and feature selections to automatically select the best model for your data based on the training scores.

Excess stock quickly becomes a liquidity problem: it is not converted back to cash unless margins are cut through discounts and promotions or, even worse, it accumulates until it must be sent to other channels such as outlets, delaying its sale. Identifying in advance which products will not achieve the expected rate of turnover, and controlling replenishment with stock cover aligned to sales forecasts, are key factors in helping retailers achieve ROI on their investments. Let’s see how the team goes about solving this problem and how automated machine learning enables the democratization of artificial intelligence across the company.

Identify the right business objective for the company

Strong sales and profits are the result of having the right product mix and level of inventory. Achieving this ideal mix requires having current and accurate inventory information. Manual processes not only take time, causing delays in producing current and accurate inventory information, but also increase the likelihood of errors. These delays and errors are likely to cause lost revenue due to inventory overstocks, understocks, and out-of-stocks.

Overstock inventory can also take valuable warehouse space and tie up cash that ought to be used to purchase new inventory. But selling it in liquidation mode can cause its own set of problems, such as tarnishing your reputation and cannibalizing sales of other current products.

The project manager, being the bridge between data scientists and business operations, reaches out to the business lead to discuss the possibilities of using some of their internal and historical sales data to solve their overstock inventory problem. The project manager and the business lead define project goals by asking and refining tangible questions that are relevant for the business objective.

There are two main tasks addressed in this stage:

Define objectives: The project manager and the business lead need to identify the business problems and, most importantly, formulate questions that define the business goals that the data science techniques can target.
Identify data sources: The project manager and data scientist need to find relevant data that helps answer the questions that define the objectives of the project.

Look for the right data and pipeline

It all starts with data. The project manager and the data scientist need to identify data sources that contain known examples of answers to the business problem. They look for the following types of data:

Data that is relevant to the question. Do they have measures of the target and features that are related to the target?
Data that is an accurate measure of their model target and the features of interest.

There are three main tasks that the data scientist needs to address in this stage:

Ingest the data into the target analytics environment
Explore the data to determine if the data quality is adequate to answer the question
Set up a data pipeline to score new or regularly refreshed data

After setting up the process to move the data from the source locations to the target locations where it’s possible to run analytics operations, the data scientist starts working on raw data to produce a clean, high-quality data set whose relationship to the target variables is understood. Before training machine learning models, the data scientist needs to develop a sound understanding of the data and create a data summarization and visualization to audit the quality of the data and provide the information needed to process the data before it's ready for modeling.
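A first data-quality audit of this kind can start from simple summary statistics on each column. A minimal sketch using only the standard library, with invented sales figures:

```python
import statistics

# A raw sales-quantity column with missing values (None)
quantity = [120, 98, None, 143, 101, None, 87]

# Summarize the column before modeling: completeness and distribution
present = [q for q in quantity if q is not None]
summary = {
    "rows": len(quantity),
    "missing": quantity.count(None),
    "mean": statistics.mean(present),
    "stdev": statistics.stdev(present),
}
```

A summary like this tells the data scientist whether the data quality is adequate before any processing or modeling begins.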

Finally, the data scientist is also in charge of developing a solution architecture of the data pipeline that refreshes and scores the data regularly.

Forecast orange juice sales with automated machine learning

The data scientist and project manager decide to use automated machine learning for a few reasons: automated machine learning empowers customers, with or without data science expertise, to identify an end-to-end machine learning pipeline for any problem, achieving higher accuracy while spending far less of their time. And it also enables a significantly larger number of experiments to be run, resulting in faster iteration toward production-ready intelligent experiences.

Let’s look at how their process using automated machine learning for orange juice sales forecasting delivers on these benefits.

After agreeing on the business objective and what type of internal and historical data should be used to meet that objective, the data scientist creates a workspace. This workspace is the top-level resource for the service and provides data scientists with a centralized place to work with all the artifacts they create. When a workspace is created in the Azure Machine Learning service, the following Azure resources are added automatically (if they are regionally available):

Azure Container Registry
Azure Storage
Azure Application Insights
Azure Key Vault

To run automated machine learning, the data scientist also needs to create an Experiment. An Experiment is a named object in a workspace that represents a predictive task, the output of which is a trained model and a set of evaluation metrics for the model.

The data scientist is now ready to load the historical orange juice sales data, reading the CSV file into a plain pandas DataFrame. The time column in the CSV is called WeekStarting, so it is parsed into the datetime type.

Each row in the DataFrame holds a quantity of weekly sales for an orange juice brand at a single store. The data also includes the sales price, a flag indicating if the orange juice brand was advertised in the store that week, and some customer demographic information based on the store location. For historical reasons, the data also includes the logarithm of the sales quantity.

The task is now to build a time series model for the Quantity column. It’s important to note that this data set comprises many individual time series: one for each unique combination of Store and Brand. To distinguish the individual time series, we define the grain: the columns whose values determine the boundaries between time series.
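Defining the grain amounts to grouping rows by the columns that identify each series. A sketch with the standard library rather than the SDK, using invented rows shaped like the data described above:

```python
from collections import defaultdict

rows = [
    {"Store": 2, "Brand": "tropicana",   "WeekStarting": "1990-06-14", "Quantity": 8256},
    {"Store": 2, "Brand": "minute.maid", "WeekStarting": "1990-06-14", "Quantity": 6144},
    {"Store": 5, "Brand": "tropicana",   "WeekStarting": "1990-06-14", "Quantity": 9216},
    {"Store": 2, "Brand": "tropicana",   "WeekStarting": "1990-06-21", "Quantity": 7168},
]

# The grain columns determine the boundaries between individual series
grain = ("Store", "Brand")
series = defaultdict(list)
for row in rows:
    key = tuple(row[c] for c in grain)
    series[key].append((row["WeekStarting"], row["Quantity"]))
```

Each key in `series` now identifies one time series to be forecast independently.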

After splitting the data into a training and a testing set for later forecast evaluation, the data scientist starts the modeling step. For forecasting tasks, automated machine learning uses pre-processing and estimation steps that are specific to time series. Automated machine learning will undertake the following pre-processing steps:

Detect the time series sample frequency (e.g., hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span.
Impute missing values in the target via forward-fill and feature columns using median column values.
Create grain-based features to enable fixed effects across different series.
Create time-based features to assist in learning seasonal patterns.
Encode categorical variables to numeric quantities.
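Forward-fill imputation of the target, for example, carries the last observed value forward into absent time points. A simplified illustration of that one step, not the service’s implementation:

```python
def forward_fill(values):
    """Replace None with the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# A weekly series with two absent time points after regularization
quantity = [100, None, None, 130, 125]
result = forward_fill(quantity)  # -> [100, 100, 100, 130, 125]
```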

The AutoMLConfig object defines the settings and data for an automated machine learning training job. Below is a summary of automated machine learning configuration parameters that were used for training the orange juice sales forecasting model:

Visit GitHub for more information on forecasting. Each iteration runs within the experiment and stores a serialized pipeline; once the iterations complete, the data scientist retrieves the pipeline with the best performance on the validation data set.

Once the evaluation has been performed, the data scientist, project manager, and business lead meet again to review the forecasting results. It’s the project manager and business lead’s job to make sense of the outputs and choose practical steps based on those results. The business lead needs to confirm that the best model and pipeline meet the business objective and that the machine learning solution answers the questions with acceptable accuracy to deploy the system to production for use by their internal sales forecasting application.

Microsoft invests in Automated Machine Learning

Automated machine learning is based on a breakthrough from the Microsoft Research division. The approach combines ideas from collaborative filtering and Bayesian optimization to search an enormous space of possible machine learning pipelines intelligently and efficiently. It’s essentially a recommender system for machine learning pipelines. Similar to how streaming services recommend movies for users, automated machine learning recommends machine learning pipelines for data sets.

It’s now offered as part of the Azure Machine Learning service. As you’ve seen here, automated machine learning empowers customers, with or without data science expertise, to identify an end-to-end machine learning pipeline for any problem and save time while increasing accuracy. It also enables a larger number of experiments to be run and faster iterations. How could automated machine learning benefit your organization? How could your team work more closely on using machine learning to meet your business objectives?

Resources

Learn more about Azure Machine Learning service
Learn more about automated machine learning
Get started with a free trial of the Azure Machine Learning service

Source: Azure

How to stay informed about Azure service issues

Azure Service Health helps you stay informed and take action when Azure service issues like outages and planned maintenance affect you, and provides a personalized dashboard that can help you understand issues that may be impacting resources in your Azure subscriptions.
Source: Azure

Bitnami Apache Airflow Multi-Tier now available in Azure Marketplace

A few months ago, we released a blog post that provided guidance on how to deploy Apache Airflow on Azure. The template in the blog provided a good quick start solution for anyone looking to quickly run and deploy Apache Airflow on Azure in sequential executor mode for testing and proof of concept study. However, the template was not designed for enterprise production deployments and required expert knowledge of Azure app services and container deployments to run it in Celery Executor mode. This is where we partnered with Bitnami to help simplify production grade deployments of Airflow on Azure for customers.

We are excited to announce that the Bitnami Apache Airflow Multi-Tier solution and the Apache Airflow Container are now available for customers in the Azure Marketplace. The Bitnami Apache Airflow Multi-Tier template provides a one-click solution for customers looking to deploy Apache Airflow for production use cases. To see how easy it is to launch and start using them, check out the short video tutorial.

We are proud to say that the main committers to the Apache Airflow project have also tested this application to ensure that it performs to the standards they would expect.

Apache Airflow PMC Member and Core Committer Kaxil Naik said, “I am excited to see that Bitnami provided an Airflow Multi-Tier in the Azure Marketplace. Bitnami has removed the complexity of deploying the application for data scientists and data engineers, so they can focus on building the actual workflows or DAGs instead. Now, data scientists can create a cluster for themselves within about 20 minutes. They no longer need to wait for DevOps or a data engineer to provision one for them.”

What is Apache Airflow?

Apache Airflow is a popular open source workflow management tool used in orchestrating ETL pipelines, machine learning workflows, and many other creative use cases. It provides a scalable, distributed architecture that makes it simple to author, track and monitor workflows.

Users of Airflow create Directed Acyclic Graph (DAG) files to define the processes and tasks that must be executed, in what order, and their relationships and dependencies. DAG files are synchronized across nodes and the user will then leverage the UI or automation to schedule, execute and monitor their workflow.
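The execution order implied by a DAG’s dependencies is a topological sort. A sketch of the idea using Python’s standard library graphlib module (Python 3.9+), not Airflow’s own API; the task names are invented:

```python
from graphlib import TopologicalSorter

# task -> set of tasks it depends on, as a DAG file would declare
dag = {
    "extract": set(),
    "transform": {"extract"},
    "validate": {"extract"},
    "load": {"transform", "validate"},
}

# static_order yields every task after all of its dependencies
order = list(TopologicalSorter(dag).static_order())
```

In Airflow itself, the scheduler performs this ordering and dispatches ready tasks to the workers.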

Introduction to Bitnami’s Apache Airflow Multi-tier architecture

Bitnami Apache Airflow has a multi-tier distributed architecture that uses Celery Executor, which is recommended by Apache Airflow for production environments.

It comprises several synchronized node types:

Web server (UI)
Scheduler
Workers

It includes two managed Azure services:

Azure Database for PostgreSQL
Azure Cache for Redis

All nodes have a shared volume to synchronize DAG files.

DAG files are stored in a directory of the node. This directory is an external volume mounted in the same location on all nodes (workers, scheduler, and web server). Since it is a shared volume, the files are automatically synchronized between servers. Add, modify, or delete DAG files from this shared volume and the entire Airflow system will be updated.

You can also use DAGs from a GitHub repository. By using Git, you won’t have to access any of the Airflow nodes and you can just push the changes through the Git repository instead.

To automatically synchronize DAG files with Airflow, please refer to Bitnami’s documentation.

Bitnami’s secret sauce – Packaging for production use

Bitnami specializes in packaging multi-tier applications to work right out of the box, leveraging managed Azure services like Azure Database for PostgreSQL.

When packaging the Apache Airflow Multi-Tier solution, Bitnami added a few optimizations to ensure that it would work for production needs.

Pre-packaged to leverage the most popular deployment strategies. For example, using PostgreSQL as the relational metadata store and the Celery executor.
Role-based access control is enabled by default to secure access to the UI.
The cache and the metadata store are Azure-native PaaS services, which brings the additional benefits those services offer, such as data redundancy and retention/recovery options, and allows Airflow to scale out to large jobs.
All communication between Airflow nodes and the PostgreSQL database service is secured using SSL.

To learn more, join Azure, Apache Airflow, and Bitnami for a webinar on Wednesday, May 1st at 11:00 am PST. Register now.

Get Started with Apache Airflow Multi-Tier Certified by Bitnami today!
Source: Azure

Want to evaluate your cloud analytics provider? Here are the three questions to ask.

We all want the truth. To properly assess your cloud analytics provider, ask them about the only three things that matter:

Independent benchmark results
Company-wide access to insights
Security and privacy

What are their results on independent, industry-standard benchmarks? 

Perhaps you’ve heard from other providers that benchmarks are irrelevant. If that’s what you’re hearing, maybe you should be asking yourself why. Independent, industry-standard benchmarks are important because they help you measure price and performance on both common and complex analytics workloads. They are essential indicators of value because as data volumes grow, it is vital to get the best performance you can at the lowest price possible.

In February, an independent study by GigaOm compared Azure SQL Data Warehouse, Amazon Redshift, and Google BigQuery using the highly recognized TPC-H benchmark. They found that Azure SQL Data Warehouse is up to 14x faster and costs 94 percent less than other cloud providers. And today, we are pleased to announce that in GigaOm’s second benchmark report, this time with the equally important TPC-DS benchmark, Azure SQL Data Warehouse is again the industry leader. Not Amazon Redshift. Not Google BigQuery. These results prove that Azure is the best place for all your analytics.

This is why customers like Columbia Sportswear choose Azure.

“Azure SQL Data Warehouse instantly gave us equal or better performance as our current system, which has been incrementally tuned over the last 6.5 years for our demanding performance requirements.”

Lara Minor, Sr. Enterprise Data Manager, Columbia Sportswear

Can they easily deliver powerful insights across your organization?

Insights from your analytics must be accessible to everyone in your organization. While other providers may say they can deliver this, the end result is often catered to specific workgroups versus being an enterprise-wide solution. Data can become quickly siloed in these situations, making it difficult to deliver insights across all users.

With Azure, employees can get their insights in seconds from all enterprise data. Data can seamlessly flow from your SQL Data Warehouse to Power BI. And without limitations on concurrency, Power BI can be used across teams to create the most beautiful visualizations that deliver powerful insights. This combination of powerful analytics with easy-to-use BI is quite unique. In fact, if you look at the Gartner 2019 Magic Quadrant for Analytics and Business Intelligence Platforms and the Gartner 2019 Magic Quadrant for Data Management Solutions for Analytics, you’ll see that Microsoft is a Leader.

Our leadership position in BI, coupled with our undisputed performance in analytics means that customers can truly provide business-critical insights to all. As the TPC-DS benchmark demonstrates, Azure SQL Data Warehouse provides unmatched performance on complex analytics workloads that mimic the realities of your business. This means that Power BI users can effortlessly gain granular-level insights across all their data.

The TPC-DS industry benchmark I mentioned above is particularly useful for organizations that run intense analytics workloads because it uses demanding queries to test actual performance. For instance, one of the queries used in the TPC-DS benchmark report calculates the number of orders, time window for the orders, and filters by state on non-returned orders shipped from a single warehouse. This type of complex query, which spans across billions of rows and multiple tables, is a real-world example of how companies use a data warehouse for business insights. And with Power BI, users can perform intense queries like this by easily integrating with SQL Data Warehouse for fast, industry-leading performance.

How robust is their security?

Everyone is a target. When it comes to data, privacy and security are non-negotiable. No matter how cautious you are, there is always a threat lurking around the corner. Your analytics system contains the most valuable business data and must have both stringent security and privacy capabilities.

Azure has you covered. As illustrated by Donald Farmer, a well-respected thought leader in the analytics space, analytics in Azure has the most advanced security and privacy features in the market. From proactive threat detection to providing custom recommendations that enhance security, Azure SQL Data Warehouse uses machine learning and AI to secure your data. It also enables you to encrypt your data, both in flight and at rest. You can provide users with appropriate levels of access, from a single source, using row and column level security. This not only secures your data, but also helps you meet stringent privacy requirements.

“It was immediately clear to us that with Azure, particularly Azure Key Vault, we would be able to meet our own rigorous requirements for data protection and security.”

Guido Vetter, Head of Corporate Center of Excellence Advanced Analytics & Big Data, Daimler

Azure’s leading security and data privacy features not only make it the most trusted cloud in the market, but also complements its leadership in other areas, such as price-performance, making it simply unmatched.

Get started today

To learn more about Azure’s industry-leading price-performance and security, get started today!

Gartner Magic Quadrant for Analytics and Business Intelligence Platforms Cindi Howson, James Richardson, Rita Sallam, Austin Kronz, 11 February 2019.

Gartner Magic Quadrant for Data Management Solutions for Analytics, Adam Ronthal, Roxane Edjlali, Rick Greenwald, 21 January 2019.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from Microsoft.

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Source: Azure

Smarter, faster, safer: Azure SQL Data Warehouse is simply unmatched

Today, we want to call attention to the exciting news that Azure SQL Data Warehouse has again outperformed other cloud providers in the most recent GigaOm benchmark report.

This is the result of relentless innovation and laser-focused execution on providing new features our customers need, all while reducing prices so customers get industry-leading performance at the best possible value. In just the past year, SQL Data Warehouse has released 130+ features focused on providing customers with enhanced speed, flexibility, and security. And today we are excited to announce three additional enhancements that continue to make SQL Data Warehouse the industry leader:

Unparalleled query performance
Intelligent workload management
Unmatched security and privacy

In this blog, we’ll take a closer look at the technical capabilities of these new features and, most importantly, how you can start using them today.

Unparalleled query performance

In our March 2019 release, a collection of newly available features improved workload performance by up to 22x compared to previous versions of Azure SQL Data Warehouse, which contributed to our leadership position in both the TPC-H and TPC-DS benchmark reports.

This didn’t just happen overnight. With decades of experience building industry-leading database systems, like SQL Server, Azure SQL Data Warehouse is built on top of the world’s largest cloud architectures.

Key innovations that have improved query performance include:

Query Optimizer enhancements
Instant Data Movement
Additional advanced analytic functions

Query Optimizer enhancements

Query Optimizer is one of the most critical components in any database. Making optimal choices on how to best execute a query can and does yield significant improvement. When executing complex analytical queries, the number of operations to be executed in a distributed environment matters. Every opportunity to eliminate redundant computation, such as repeated subqueries, has a direct impact to query performance. For instance, the following query is reduced from 13 down to 5 operations using the latest Query Optimizer enhancements.

Instant Data Movement

For a distributed database system, having the most efficient data movement mechanism is also a critical ingredient in achieving great performance. Instant Data Movement was introduced with the launch of the second generation of Azure SQL Data Warehouse. To improve instant data movement performance, broadcast and partition data movement operations were added. In addition, performance optimizations around how strings are processed during the data movement operations yielded improvements of up to 2x.

Advanced analytic functions

Having a rich set of analytic functions simplifies how you write SQL across multiple dimensions, which not only streamlines the query but also improves its performance. Examples of such functions include GROUP BY ROLLUP, GROUPING(), and GROUPING_ID(). See the example of a GROUP BY query from the online documentation below:

SELECT Country
      ,Region
      ,SUM(Sales) AS TotalSales
FROM Sales
GROUP BY ROLLUP (Country, Region)
ORDER BY Country
        ,Region;
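To see what ROLLUP adds beyond a plain GROUP BY, here is the same shape of aggregation computed by hand in Python: one row per (Country, Region) pair, a subtotal per Country, and a grand total. An illustrative sketch with invented figures, not how the engine executes it:

```python
from collections import defaultdict

sales = [
    ("Canada", "Alberta", 100),
    ("Canada", "British Columbia", 200),
    ("United States", "Montana", 300),
]

totals = defaultdict(int)
for country, region, amount in sales:
    # ROLLUP produces one result row per prefix of the grouping list;
    # None stands in for SQL's NULL in the rolled-up columns.
    totals[(country, region)] += amount  # GROUP BY Country, Region
    totals[(country, None)] += amount    # subtotal per Country
    totals[(None, None)] += amount       # grand total
```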

Intelligent workload management

The new workload importance feature in Azure SQL Data Warehouse enables prioritization over workloads that need to be executed on the data warehouse system. Workload importance provides administrators the ability to prioritize workloads based on business requirements (e.g., executive dashboard queries, ELT executions).

Workload classification

It all starts with workload classification. SQL Data Warehouse classifies a request based on a set of criteria, which administrators can define. In the absence of a matching classifier, the default classifier is chosen. SQL Data Warehouse supports classification at different levels including at the SQL query level, a database user, database role, Azure Active Directory login, or Azure Active Directory group, and maps the request to a system defined workload group classification.

Workload importance

Each workload classification can be assigned one of five levels of importance: low, below_normal, normal, above_normal, and high. Access to resources during compilation, lock acquisition, and execution is prioritized based on the importance associated with the request.

The diagram below illustrates the workload classification and importance function:

Classifying requests with importance

Classifying requests is done with the new CREATE WORKLOAD CLASSIFIER syntax. Below is an example that maps members of the ExecutiveReports role to above_normal importance and members of the AdhocUsers role to below_normal importance. With this configuration, members of the ExecutiveReports role have their queries complete sooner because they get access to resources before members of the AdhocUsers role.

CREATE WORKLOAD CLASSIFIER ExecReportsClassifier
    WITH (WORKLOAD_GROUP = 'mediumrc'
         ,MEMBERNAME     = 'ExecutiveReports'
         ,IMPORTANCE     = above_normal);

CREATE WORKLOAD CLASSIFIER AdhocClassifier
    WITH (WORKLOAD_GROUP = 'smallrc'
         ,MEMBERNAME     = 'AdhocUsers'
         ,IMPORTANCE     = below_normal);

For more information on workload importance, refer to the classification importance and CREATE WORKLOAD CLASSIFIER documents.

Unmatched security and privacy

When using a data warehouse, customers often have questions regarding security and privacy. As illustrated by Donald Farmer, a well-respected thought leader in the analytics space, Azure SQL Data Warehouse has the most advanced security and privacy features in the market. This wasn’t achieved by chance. In fact, SQL Server, the core technology of SQL Data Warehouse, has been the least vulnerable database over the last eight years in the NIST vulnerabilities database.

One of our newest security and privacy features in SQL Data Warehouse is Data Discovery and Classification. This feature enables automated discovery of columns potentially containing sensitive data, recommends metadata tags to associate with the columns, and can persistently attach those tags to your tables.

These tags will appear in the audit log for queries against sensitive data, in addition to being included alongside the query results for clients that support this feature.

The Azure SQL Database Data Discovery & Classification article walks you through enabling the feature via the Azure portal. While the article was written for Azure SQL Database, it is now equally applicable to SQL Data Warehouse.

Next steps

Visit the Azure SQL Data Warehouse page to learn more.
Get started with a free Azure SQL Data Warehouse account.
Discover the seven essential security and privacy principles for your cloud data warehouse.

Azure is the best place for data analytics

Azure continues to be the best cloud for analytics. Learn more why analytics in Azure is simply unmatched.
Source: Azure

Azure.Source – Volume 77


Now in preview

Announcing the Azure Functions Premium plan for enterprise serverless workloads

We are pleased to announce the Azure Functions Premium plan in preview, our newest Functions hosting model. This plan enables a suite of long requested scaling and connectivity options without compromising on event-based scale. With the Premium plan you can use pre-warmed instances to run your app with no delay after being idle, you can run on more powerful instances, and you can connect to VNETs, all while automatically scaling in response to load.

Windows Server 2019 support now available for Windows Containers on Azure App Service

We are happy to announce Windows Server 2019 Container support in public preview. Using a custom Windows container in App Service lets you make the OS changes your app needs, so it’s easy to migrate an on-premises app that requires a custom OS and software configuration. Windows Container support is available in our West US, East US, West Europe, North Europe, East Asia, and Australia East regions. Windows Containers are not supported in App Service Environments at present.

Web application firewall at Azure Front Door service

We have heard from many of you that security is a top priority when moving web applications onto the cloud. Today, we are very excited to announce our public preview of the Web Application Firewall (WAF) for the Azure Front Door service. By combining the global application and content delivery network with a natively integrated WAF engine, we now offer a highly available platform that helps you deliver your web applications to the world, secure and fast!

Azure Media Services: The latest Video Indexer updates from NAB Show 2019

After sweeping up multiple awards with the general availability release of Azure Media Services’ Video Indexer, including the 2018 IABM award for innovation in content management and the prestigious Peter Wayne award, our team has remained focused on building a wealth of new features and models to allow any organization with a large archive of media content to unlock insights from that content, and use those insights to improve searchability, enable new user scenarios and accessibility, and open new monetization opportunities. At NAB Show 2019, we are announcing a wealth of new enhancements to Video Indexer’s models and experiences.

Now generally available

Extending Azure security capabilities

As more organizations are delivering innovation faster by moving their businesses to the cloud, increased security is critically important for every industry. Azure has built-in security controls across data, applications, compute, networking, identity, threat protection, and security management so you can customize protection and integrate partner solutions. Microsoft Azure Security Center is the central hub for monitoring and protecting against related incidents within Azure. We love making Azure Security Center richer for our customers, and were excited to share some great updates last week at Hannover Messe 2019. Read on to learn about them.

Event-driven Java with Spring Cloud Stream Binder for Azure Event Hubs

Spring Cloud Stream Binder for Azure Event Hubs is now generally available. It is now easier to build highly scalable, event-driven Java apps using Spring Cloud Stream with Event Hubs, a fully managed, real-time data ingestion service on Azure that remains resilient and reliable in any situation, including emergencies, thanks to its geo-disaster recovery and geo-replication features.

Fast and optimized connectivity and delivery solutions on Azure

We’re announcing the availability of innovative and industry-leading Azure services that will help the attendees of the National Association of Broadcasters Show realize their future vision to deliver for their audiences: Azure Front Door Service (AFD), ExpressRoute Direct and Global Reach, as well as some cool new additions to both AFD and our Content Delivery Network (CDN). April 6-11, Microsoft will be at NAB Show 2019 in Las Vegas, bringing together an industry centered on the ability to deliver richer content experiences for audiences around the world.

Azure Front Door Service is now generally available

We’re announcing the general availability of Azure Front Door Service (AFD), which we launched in preview last year – a scalable and secure entry point for fast delivery of your global applications. AFD is your one-stop solution for your global website or application. Azure Front Door Service enables you to define, manage, and monitor the global routing for your web traffic, optimizing for best performance and instant global failover for high availability. With Front Door, you can transform your global (multi-region) consumer and enterprise applications into robust, high-performance, personalized modern applications, APIs, and content that reach a global audience with Azure.

News and updates

Unlock dedicated resources and enterprise features by migrating to Service Bus Premium

Azure Service Bus has been the Messaging as a Service (MaaS) option of choice for our enterprise customers. We’ve seen tremendous growth to our customer base and usage of the existing namespaces, which inspires us to bring more features to the service. We recently expanded Azure Service Bus to support all Azure regions with Availability Zones to help our customers build more resilient solutions. We also expanded the Azure Service Bus Premium tier to more regions to enable our customers to leverage many enterprise ready features on their Azure Service Bus namespaces while also being closer to their customers.

Device template library in IoT Central

With the new addition of a device template library into our Device Templates page, we are making it easier than ever to onboard and model your devices. Now, when you get started with creating a new template, you can choose between building one from scratch or you can quickly select from a library of existing device templates. Today you’ll be able to choose from our MXChip, Raspberry Pi, or Windows 10 IoT Core templates. We will be working to improve this library by adding more device templates which provide customer value.

Azure Updates

Learn about important Azure product updates, roadmap, and announcements. Subscribe to notifications to stay informed.

Technical content

Step up your machine learning process with Azure Machine Learning service

The Azure Machine Learning service provides a cloud-based service you can use to develop, train, test, deploy, manage, and track machine learning models. With automated machine learning and other advancements available, training and deploying machine learning models is easier and more approachable than ever. Automated machine learning helps users of all skill levels accelerate their pipelines, leverage open source frameworks, and scale easily, making machine learning more accessible across an organization.

Schema validation with Event Hubs

Event Hubs is a fully managed, real-time data ingestion Azure service. It integrates seamlessly with other Azure services. It also allows Apache Kafka clients and applications to talk to Event Hubs without any code changes. Apache Avro is a binary serialization format. It relies on schemas (defined in JSON format) that define what fields are present and their types. Since it's a binary format, you can produce and consume Avro messages to and from Event Hubs. Event Hubs' focus is on the data pipeline. It doesn't validate the schema of the Avro events.
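Because Event Hubs doesn't validate schemas, producers and consumers must enforce the Avro schema themselves. The sketch below illustrates client-side validation of a record against an Avro-style record schema using only the standard library; real Avro serialization would use a library such as avro or fastavro, and the schema and record here are hypothetical:

```python
import json

# An Avro-style record schema (JSON). Event Hubs itself performs no
# schema validation, so this check has to happen on the client side.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Telemetry",
  "fields": [
    {"name": "deviceId", "type": "string"},
    {"name": "temperature", "type": "double"}
  ]
}
""")

# Mapping from Avro primitive type names to Python types.
AVRO_TO_PY = {"string": str, "double": float, "int": int, "boolean": bool}

def validate(record, schema):
    """Check that a record dict has every schema field with the right type."""
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in record:
            return False
        if not isinstance(record[name], AVRO_TO_PY[ftype]):
            return False
    return True

ok = validate({"deviceId": "dev-1", "temperature": 21.5}, SCHEMA)
bad = validate({"deviceId": "dev-1"}, SCHEMA)  # missing "temperature"
```

A producer would run a check like this before serializing to Avro and sending; a consumer can run the same check after decoding, since the pipeline in between will happily carry malformed events.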

SheHacksPurple: Changes to Azure Security Center Subscription

In this short video, Tanya Janca describes recent changes to Azure Security Center subscription coverage; it now covers storage containers and App Service.

PowerShell Basics: Finding the right VM size with Get-AzVMSize

Finding the right virtual machine for your needs can be difficult, especially with all of the options available. New options seem to come around often, so you may need to regularly check the VM sizes available within your Azure region. Using PowerShell makes it quick and easy to see all of the VM sizes so you can get on with building your infrastructure, and Get-AzVMSize will help you determine which VM sizes you can deploy in specific regions or into availability sets, and what size a machine in your environment is running.

Hands-on Lab: Creating an IoT Solution with Kotlin Azure Functions

Dave Glover walks through building an end-to-end IoT solution with Azure IoT Hub, Kotlin-based Azure Functions, and Azure SignalR.

An Ambivert’s Guide to Azure Functions

Chloe Condon will walk you through how to use Azure Functions, Twilio, and a Flic Button to create an app to trigger calls/texts to your phone.

Making Machine Learning Approachable

Often we hear about machine learning and deep learning as topics that only researchers, mathematicians, or PhDs are smart enough to grasp. In fact, it is possible to explain the seemingly complex fundamental concepts and algorithms of machine learning without cryptic terminology or confusing notation.

Azure shows

Episode 273 – Application Patterns in Azure | The Azure Podcast

Rasmus Lystrøm, a Senior Microsoft consultant from Denmark, shares his thoughts and ideas around building applications that take advantage of Azure and allow developers to focus on the business problem at hand.


Azure Blob Storage on Azure IoT Edge | Internet of Things Show

Azure Blob Storage on IoT Edge is a lightweight, Azure-consistent module that provides local block blob storage. It comes with configurable abilities to automatically tier data from the IoT Edge device to Azure, and to automatically delete data from the IoT Edge device after a specified time.

Azure Pipelines | Visual Studio Toolbox

In this episode, Robert is joined by Mickey Gousset, who takes us on a tour of Azure Pipelines. He shows how straightforward it is to automate your builds and deployments using Azure Pipelines. They are a great way to get started on your path to using DevOps practices to ship faster at higher quality.

Deploy WordPress with Azure Database for MariaDB | Azure Friday

Learn how to deploy WordPress backed by Azure Database for MariaDB. It is the latest addition to the open source database services available on the Azure platform and further strengthens Azure's commitment to open source and its communities. The service offers built-in high availability, automatic backups, and scaling of resources to meet your workload's needs.

Hybrid enterprise serverless in Microsoft Azure | Microsoft Mechanics

Apply serverless compute securely and confidently to any workload with new enterprise capabilities. Jeff Hollan, Sr. Program Manager from the Azure Serverless team, demonstrates how you can turn on managed service identities and protect secrets with Key Vault integration, control virtual network connectivity for both Functions and Logic Apps, build apps that integrate with systems inside your virtual network using event-driven capabilities and set cost thresholds to control how much you want to scale with the Azure Functions Premium plan.

Virtual node autoscaling and Azure Dev Spaces in Azure Kubernetes Service (AKS) | Microsoft Mechanics

Recent updates to the Azure Kubernetes Service (AKS) for developers and ops. Join Ria Bhatia, Program Manager for Azure Kubernetes Service, as she shows you the new autoscaling options using virtual nodes, as well as how you can use Azure Dev Spaces to test your AKS apps without simulating dependencies. Also, check out the new ways to troubleshoot and monitor your Kubernetes apps with Azure Monitor.

How to host a static website with Azure Storage | Azure Tips and Tricks

In this edition of Azure Tips and Tricks, learn how you can host a static website running in Azure Storage in a few steps.

How to use the Azure Activity Log | Azure Portal Series

The Azure Activity Log informs you of the who, the what, and the when for operations on your Azure resources. In this video of the Azure Portal “How To” Series, learn what activity logs are in the Azure portal, how to access them, and how to make use of them.

Ted Neward on the ‘Ops’ Side of DevOps | Azure DevOps Podcast

Ted Neward and Jeffrey Palermo are going to be talking about the ‘Ops’ (AKA the operations) side of DevOps. They discuss how operations is implemented in the DevOps movement, the role of operations, how Dev and Ops should work together, what companies should generally understand around the different roles, where the industry is headed, and Ted’s many recommendations in the world of DevOps.


Episode 5 – CodeCamping with Philly.NET founder Bill Wolff | AzureABILITY

Philly.NET founder and coding-legend Bill Wolff visits the podcast to talk about both the forthcoming Philly Code Camp 2019.1 and the user-group experience in general.


Events

Welcome to NAB Show 2019 from Microsoft Azure!

At NAB Show 2019 this week in Las Vegas we’re announcing new Azure rendering, Azure Media Services, Video Indexer and Azure Networking capabilities to help you achieve more. We’ll also showcase how partners such as Zone TV and Nexx.TV are using Microsoft AI and Azure Cognitive Services to create more personalized content and improve monetization of existing media assets.

Deliver New Services | Hannover Messe 2019

With intelligent manufacturing technology, you can deliver new services, innovate faster to reduce time to market, and increase your margins. At the Hannover Messe 2019 event, discover how Microsoft and partners are empowering companies to create new business value with digital services to develop data-driven and AI-enhanced products and services.

Database administrators, discover gold in the cloud

Data is referred to these days as “the new oil” or “black gold” of industry. If the typical Fortune 100 company gains access to a mere 10 percent more of their data, that can result in increased revenue of millions of dollars. Recently, one of our teams discovered new technology that enables us to do more with less—like agile development helping us deploy new features and software faster to market, and DevOps ensuring it was done with less impact to mission-critical systems. To learn more, attend a free webinar where we’ll be sharing more on the many advantages of managing data in the cloud, and how your company’s “black gold” will make you tomorrow’s data hero.

Customers, partners, and industries

IoT in Action: Enabling cloud transformation across industries

The intelligent cloud and intelligent edge are sparking massive transformation across industries. As computing gets more deeply embedded in the real world, powerful new opportunities arise to transform revenue, productivity, safety, customer experiences, and more. According to a white paper by Keystone Strategy, digital transformation leaders generate eight percent more per year in operating income than other enterprises. Here we lay out a typical cloud transformation journey and provide examples of how the cloud is transforming city government, industrial IoT, and oil and gas innovators.

Enabling precision medicine with integrated genomic and clinical data

Kanteron Systems Platform is a patient-centric, workflow-aware, precision medicine solution. Their answer to data trapped in silos, detached from the point of care, integrates many key types of healthcare data (medical imaging, digital pathology, clinical genomics, and pharmacogenomic data) into a complete longitudinal patient record to power precision medicine.

Spinnaker continuous delivery platform now with support for Azure

Spinnaker is an open source, multi-cloud continuous delivery platform for releasing software changes with high velocity and confidence. It is being chosen by a growing number of enterprises as the open source continuous deployment platform used to modernize their application deployments. With this blog post and the recent release of Spinnaker (1.13), we are excited to announce that Microsoft has worked with the core Spinnaker team to ensure Azure deployments are integrated into Spinnaker.

 

Azure Stack HCI solutions, Premium Block Blob Storage and new capabilities in the Azure AI space! | Azure This Week – A Cloud Guru

This time on Azure This Week, Lars discusses Microsoft’s hybrid cloud strategy which gets another push with hyper-converged infrastructure, Azure Premium Block Blob Storage is now generally available, and AI developers get more goodies on the Azure platform.

Be sure to check out the new series from A Cloud Guru, Azure Fireside Chats.
Source: Azure

How Skype modernized its backend infrastructure using Azure Cosmos DB – Part 3

This is a three-part blog post series about how organizations are using Azure Cosmos DB to meet real world needs, and the difference it’s making to them. In part 1, we explored the challenges Skype faced that led them to take action. In part 2, we examined how Skype implemented Azure Cosmos DB to modernize its backend infrastructure. In this post (part 3 of 3), we cover the outcomes resulting from those efforts.

Note: Comments in italics/parentheses are the author's.

The outcomes

Improved throughput, latency, scalability, and more

Using Azure Cosmos DB, Skype replaced three monolithic, geographically isolated data stores with a single, globally distributed user data service that delivers better throughput, lower latencies, and improved availability. The new PCS service can elastically scale on demand to handle future growth, and gives the Skype team ownership of its data without the burden of maintaining its own infrastructure—all at less than half what it cost to maintain the old PCS system. Development of the solution was fast and straightforward thanks to the extensive functionality provided by Azure Cosmos DB and the fact that it’s a fully-hosted service.

Better throughput and lower latencies

Compared to the old solution, the new PCS service is delivering improved throughput and lower latency—in turn enabling the Skype team to easily meet all its SLAs. “Easy geographic distribution, as enabled by Azure Cosmos DB, was a key enabler in making all this possible,” says Kaduk. “For example, by enabling us to put data closer to where its users are, in Europe, we’ve been able to significantly reduce the time required for the permission service that’s used to set up a call—and meet our overall one-second SLA for that task.”

Higher availability

The new PCS service is supporting its workload without timeouts, deadlocks, or quality-of-service degradation—meaning that users are no longer inconvenienced with bad data or having to wait. And because the service runs on Azure Cosmos DB, the Skype team no longer needs to worry about the availability of the underlying infrastructure upon which its new PCS service runs. 

“Azure Cosmos DB provides a 99.999 percent read availability SLA for all multiregion accounts, with built-in protection against the unlikely event of a regional outage,” says Kaduk. “We can prioritize failover order for our multiregion accounts and can even manually trigger failover to test the end-to-end availability of our app—all with guaranteed zero data-loss.”

Elastic scalability

With Azure Cosmos DB, the Skype team can independently and elastically scale storage and throughput at any time, across the globe. All physical partition management required to scale is fully managed by Azure Cosmos DB and is transparent to the Skype team. Azure Cosmos DB handles the distribution of data across physical and logical partitions and the routing of query requests to the right partition—all without compromising availability, consistency, latency, or throughput. All this enables the team to pay for only the storage and throughput it needs today, and to avoid having to invest any time, energy, or money in spare capacity before it’s needed.

“The ability of Azure Cosmos DB to scale is obvious,” says Kaduk. “We planned for 100 terabytes of data 18 months ago and are already at 140 terabytes, with no major issues handling that growth.”

Full ownership of data – with zero maintenance and administration

Because Azure Cosmos DB is a fully managed Microsoft Azure service, the Skype team doesn’t need to worry about day-to-day administration, deploying and configuring software, or dealing with upgrades. Every database is automatically backed up, protected against regional failures, and encrypted, so the team doesn’t need to worry about those things either—leaving it with more time to focus on delivering new customer value.

“One of the great things about our new PCS service is that we fully own the data store, whereas we didn’t before,” says Kaduk. “In the past, when Skype was first acquired by Microsoft, we had a team that maintained our databases. We didn’t want to continue maintaining them, so we handed them off to a central team. Today, that same user data is back under our full control and we’re still not burdened with day-to-day maintenance—it’s really the best of both worlds.”

Lower costs

Although Kaduk’s team wasn’t paying to maintain the old PCS databases, he knows what that used to cost—and says that the monthly bill for the new solution running on Azure Cosmos DB is much lower. “Our new PCS data store is about 40 percent less expensive than the old one was,” he states. “We pay that cost ourselves today, but, given all the benefits, it’s well worth it.”

Rapid, straightforward implementation

All in all, Kaduk feels the migration to Azure Cosmos DB was “pretty simple and straightforward.” Development began in May 2017, and by October 2017, all development was complete and the team began migrating all 4 billion Skype users to the new solution. The team consisted of eight developers, one program manager, and one manager.

“We had no prior experience with Azure Cosmos DB, but it was pretty easy to come up to speed,” he states. “Even with a few lessons learned, we did it all in six months, which is pretty impressive for a project of this scale. One reason for our rapid success was that we didn’t have to worry about deploying any physical infrastructure. Azure Cosmos DB also gave us a schema-free document database with both SQL syntax and change feed streaming capabilities built-in, all under strict SLAs. This greatly simplified our architecture and enabled us to meet all our requirements in a minimum amount of time.”

Lessons learned

Looking back at the project, Kaduk recalls several “lessons learned.” These include:

Use direct mode for better performance – How a client connects to Azure Cosmos DB has important performance implications, especially with respect to observed client-side latency. The team began with the default Gateway Mode connection policy, but switched to a Direct Mode connection policy because it delivers better performance.
Learn how to write and handle stored procedures – With Azure Cosmos DB, transactions can only be implemented using stored procedures—pieces of application logic, written in JavaScript, that are registered and executed against a collection as a single transaction. (In Azure Cosmos DB, JavaScript is hosted in the same memory space as the database. Hence, requests made within stored procedures execute in the same scope as a database session, which enables Azure Cosmos DB to guarantee ACID properties for all operations that are part of a single stored procedure.)
Pay attention to query design – With Azure Cosmos DB, queries have a large impact on RU consumption. Developers didn’t pay much attention to query design at first, but soon found that RU costs were higher than desired. This led to an increased focus on optimizing query design, such as using point document reads wherever possible and optimizing the query selections per API.
Use the Azure Cosmos DB SDK 2.x to optimize connection usage – Within Azure Cosmos DB, the data stored in each region is distributed across tens of thousands of physical partitions. To serve reads and writes, the Azure Cosmos DB client SDK must establish a connection with the physical node hosting the partition. The team started with the Azure Cosmos DB SDK 1.x, but found that its lack of support for connection multiplexing led to excessive connection establishment and closing rates. Switching to the Azure Cosmos DB SDK 2.x, which supports connection multiplexing, helped solve the problem, and also helped mitigate SNAT port exhaustion issues.

The following diagram shows connection status and time_waits when using SDK 1.x.

And the following shows the same after the move to SDK 2.x.
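The effect of connection multiplexing can be sketched in a few lines of Python. This is a toy model, not the Azure Cosmos DB SDK; the connection class, pool, and endpoint name are all hypothetical:

```python
# A toy model of why connection multiplexing matters: reusing one
# connection per endpoint avoids the open/close churn (and the resulting
# TIME_WAIT buildup and SNAT port pressure) of one connection per request.

class FakeConnection:
    opened = 0  # class-level count of connections ever established

    def __init__(self, endpoint):
        self.endpoint = endpoint
        FakeConnection.opened += 1

class MultiplexingPool:
    """Reuse a single connection per endpoint across many requests."""
    def __init__(self):
        self._pool = {}

    def get(self, endpoint):
        if endpoint not in self._pool:
            self._pool[endpoint] = FakeConnection(endpoint)
        return self._pool[endpoint]

# Without multiplexing: one new connection per request.
FakeConnection.opened = 0
for _ in range(100):
    FakeConnection("partition-node-1")
naive = FakeConnection.opened

# With multiplexing: 100 requests share one connection.
FakeConnection.opened = 0
pool = MultiplexingPool()
for _ in range(100):
    pool.get("partition-node-1")
pooled = FakeConnection.opened
```

One hundred requests cost one connection instead of one hundred, which is the behavior change the team saw when moving from SDK 1.x to SDK 2.x.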

Source: Azure

How Skype modernized its backend infrastructure using Azure Cosmos DB – Part 1

This is a three-part blog post series about how organizations are using Azure Cosmos DB to meet real world needs, and the difference it’s making to them. In this post (part 1 of 3), we explore the challenges Skype faced that led them to take action. In part 2, we’ll examine how Skype implemented Azure Cosmos DB to modernize its backend infrastructure. In part 3, we’ll cover the outcomes resulting from those efforts.

Note: Comments in italics/parentheses are the author's.

Scaling to four billion users isn’t easy

Founded in 2003, Skype has grown to become one of the world’s premier communication services, making it simple to share experiences with others wherever they are. Since its acquisition by Microsoft in 2011, Skype has grown to more than four billion total users, more than 300 million monthly active users, and more than 40 million concurrent users.

People Core Service (PCS), one of the core internal Skype services, is where contacts, groups, and relationships are stored for each Skype user. The service is called when the Skype client launches, is checked for permissions when initiating a conversation, and is updated as the user’s contacts, groups, and relationships are added or otherwise changed. PCS is also used by other, external systems, such as Microsoft Graph, Cortana, bot provisioning, and other third-party services.

Prior to 2017, PCS ran in three datacenters in the United States, with data for one-third of the service’s 4 billion users represented in each datacenter. Each location had a large, monolithic SQL Server relational database. Having been in place for several years, those databases were beginning to show their age. Specific problems and pains included:

Maintainability: The databases had a huge, complex, tightly coupled code base, with long stored procedures that were difficult to modify and debug. There were many interdependencies, as the database was owned by a separate team and contained data for more than just Skype, its largest user. And with user data split across three such systems in three different locations, Skype needed to maintain its own routing logic based on which user’s data it needed to retrieve or update.
Excessive latency: With all PCS data being served from the United States, Skype clients in other geographies, and the local infrastructure that supported them (such as call controllers), experienced unacceptable latency when querying or updating PCS data. For example, Skype has an internal service level agreement (SLA) of less than one second for setting up a call. However, the round-trip times for the permission check performed by a local call controller in Europe, which reads data from PCS to ensure that user A has permission to call user B, made it impossible to set up a call between two users in Europe within the required one-second period.
Reliability and data quality: Database deadlocks were a problem—and were exacerbated because data used by PCS was shared with other systems. Data quality was also an issue, with users complaining about missing contacts, incorrect data for contacts, and so on.

All of these problems became worse as usage grew, to the point that, by 2017, the pain had become unacceptable. Deadlocks were becoming more and more common as database traffic increased, which resulted in service outages, and weekly backups were leaving some data unavailable. “We did the best with what we had, coming up with lots of workarounds to deal with all the deadlocks, such as extra code to throttle database requests,” recalls Frantisek Kaduk, Principal .NET Developer on the Skype team. “As the problems continued to get worse, we realized we had to do something different.”

In addition, the team faced a deadline related to General Data Protection Regulation (GDPR); the system didn’t meet GDPR requirements, so there was a deadline for shutting down the servers.

The team decided that, to deliver an uncompromised user experience, it needed its own data store. Requirements included high throughput, low latency, and high availability, all of which had to be met regardless of where in the world users were.

An event-driven architecture was a natural fit; however, it would need to be more than just a basic implementation that stored current data. “We needed a better audit trail, which meant also storing all the events leading up to a state change,” explains Kaduk. “For example, to handle misbehaving clients, we need to be able to replay that series of events. Similarly, we need event history to handle cross-service/cross-shard transactions and other post-processing tasks. The events capture the originator of a state change, the intention of that change, and the result of it.”

Continue on to part 2, which examines how Skype implemented Azure Cosmos DB to modernize its backend infrastructure.
Source: Azure

How Skype modernized its backend infrastructure using Azure Cosmos DB – Part 2

This is a three-part blog post series about how organizations are using Azure Cosmos DB to meet real world needs, and the difference it’s making to them. In part 1, we explored the challenges Skype faced that led them to take action. In this post (part 2 of 3), we examine how Skype implemented Azure Cosmos DB to modernize its backend infrastructure. In part 3, we’ll cover the outcomes resulting from those efforts.

Note: Comments in italics/parentheses are the author's.

The solution

Putting data closer to users

Skype found the perfect fit in Azure Cosmos DB, the globally distributed NoSQL database service from Microsoft. It gave Skype everything needed for its new People Core Service (PCS), including turnkey global distribution and elastic scaling of throughput and storage, making it an ideal foundation for distributed apps like Skype that require extremely low latency at global scale.

Initial design decisions

Prototyping began in May 2017. Some early choices made by the team included the following:

Geo-replication: The team started by deploying Azure Cosmos DB in one Azure region, then used its pushbutton geo-replication to replicate it to a total of seven Azure regions: three in North America, two in Europe, and two in the Asia Pacific (APAC) region. However, it later turned out that a single presence in each of those three geographies was enough to meet all SLAs.
Consistency level: In setting up geo-replication, the team chose session consistency from among the five consistency levels supported by Azure Cosmos DB. (Session consistency is often ideal for scenarios where a device or user session is involved because it guarantees monotonic reads, monotonic writes, and read-your-own-writes.)
Partitioning: Skype chose UserID as the partition key, thereby ensuring that all data for each user would reside on the same physical partition. A physical partition size of 20GB was used instead of the default 10GB size because the larger number enabled more efficient allocation and usage of request units per second (RU/s)—a measure of pre-allocated, guaranteed database throughput. (With Azure Cosmos DB, each collection must have a partition key, which acts as a logical partition for the data and provides Azure Cosmos DB with a natural boundary for transparently distributing it internally, across physical partitions.)
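The partitioning choice above can be sketched with a toy model. This is not the Cosmos DB service itself—`physical_partition` and `NUM_PHYSICAL_PARTITIONS` are illustrative assumptions—but it shows why hashing on UserID keeps all of one user's documents together:

```python
from hashlib import md5

NUM_PHYSICAL_PARTITIONS = 4  # illustrative; Cosmos DB manages this internally

def physical_partition(partition_key: str) -> int:
    """Map a logical partition key (here, UserID) to a physical partition,
    mimicking how a hash-partitioned store distributes data."""
    digest = md5(partition_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PHYSICAL_PARTITIONS

# All documents sharing a UserID land on the same physical partition,
# so per-user reads and transactional writes stay partition-local.
docs = [
    {"id": "profile-1", "UserID": "user-42"},
    {"id": "contacts-1", "UserID": "user-42"},
]
partitions_used = {physical_partition(d["UserID"]) for d in docs}
assert len(partitions_used) == 1
```

Because the hash is deterministic, every read and write for a given user is routed to one partition, which is what makes per-user operations cheap and transactionally simple.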

Event-driven architecture based on Azure Cosmos DB change feed

In building the new PCS service, Skype developers implemented a micro-services, event-driven architecture based on change feed support in Azure Cosmos DB. Change feed works by “listening” to an Azure Cosmos DB container for any changes and outputting a sorted list of documents that were changed, in the order in which they were modified. The changes are persisted, can be processed asynchronously and incrementally, and the output can be distributed across one or more consumers for parallel processing. (Change Feed in Azure Cosmos DB is enabled by default for all accounts, and it does not incur any additional costs. You can use provisioned RU/s to read from the feed, just like any other operation in Azure Cosmos DB.)

“Generally, an event-driven architecture uses Kafka, Event Hub, or some other event source,” explains Kaduk. “But with Azure Cosmos DB, change feed provided a built-in event source that simplified our overall architecture.”

To meet the solution’s audit history requirements, developers implemented an “event sourcing with captured state” pattern. Instead of storing just the current state of the data in a domain, this pattern uses an append-only store to record the full series of actions taken on the data (the “event sourcing” part of the pattern), along with the mutated state (the “captured state”). The append-only store acts as the system of record and can be used to materialize domain objects. It also provides consistency for transactional data, and maintains full audit trails and history that can enable compensating actions.
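A minimal sketch of that pattern, assuming an in-memory append-only list in place of the database (all names here are illustrative, not Skype's actual code):

```python
# Append-only event store: the system of record. Each entry captures the
# originator, the intention (command), and the resulting (mutated) state.
event_store = []

def apply_command(current_state, originator, command, payload):
    new_state = {**current_state, **payload}
    event_store.append({
        "originator": originator,
        "intention": command,
        "state": new_state,   # "captured state": mutated state stored too
    })
    return new_state

def materialize():
    """Rebuild the domain object from the event history. Because each event
    also captures the mutated state, the latest event is sufficient."""
    return event_store[-1]["state"] if event_store else {}

state = apply_command({}, "user-A", "SetDisplayName", {"name": "Ada"})
state = apply_command(state, "user-A", "AddContact", {"contacts": ["user-B"]})
assert materialize() == {"name": "Ada", "contacts": ["user-B"]}
```

Keeping every event (rather than only the final state) is what enables replaying a misbehaving client's actions and issuing compensating actions later.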

Separate read and write paths and data models for optimal performance

Developers used the Command and Query Responsibility Segregation (CQRS) pattern together with the event sourcing pattern to implement separate write and read paths, interfaces, and data models, each tailored to their relevant tasks. “When CQRS is used with the Event Sourcing pattern, the store of events is the write model, and is the official source of information capturing what has happened or changed, what was the intention, and who was the originator,” explains Kaduk. “All of this is stored in one JSON document for each changed domain aggregate—user, person, and group. The read model provides materialized views that are optimized for querying and are stored in separate, smaller JSON documents. This is all enabled by the Azure Cosmos DB document format and the ability to store different types of documents with different data structures within a single collection.” Find more information on using Event Sourcing together with CQRS.
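The separation Kaduk describes—full event documents as the write model, smaller view documents as the read model, both in one collection—can be sketched as follows (field names and helpers are illustrative assumptions):

```python
# One collection holds documents of different shapes, distinguished by type.
collection = []

def write(event_doc):
    """Write model: append the full event document (the source of truth)."""
    collection.append({"type": "event", **event_doc})
    # Project a smaller read-model document optimized for querying.
    collection.append({
        "type": "view",
        "aggregateId": event_doc["aggregateId"],
        "name": event_doc["state"]["name"],
    })

def query_view(aggregate_id):
    """Read model: serve queries from the materialized views only."""
    return [d for d in collection
            if d["type"] == "view" and d["aggregateId"] == aggregate_id]

write({"aggregateId": "user-1", "originator": "user-1",
       "intention": "Rename", "state": {"name": "Ada", "history": ["..."]}})
assert query_view("user-1")[0]["name"] == "Ada"
```

Queries never touch the bulky event documents, so the read path stays fast while the write path keeps the complete audit record.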

Custom change feed processing

Instead of using Azure Functions to handle change feed processing, the development team chose to implement its own change feed processing using the Azure Cosmos DB change feed processor library—the same code used internally by Azure Functions. This gave developers more granular control over change feed processing, including the ability to implement retrying over queues, dead-letter event support, and deeper monitoring. The custom change feed processors run on Azure Virtual Machines (VMs) under the “PaaS v1” model.

“Using the change feed processor library gave us superior control in ensuring all SLAs were met,” explains Kaduk. “For example, with Azure Functions, a function can either fail or spin-and-wait while it retries. We can’t afford to spin-and-wait, so we used the change feed processor library to implement a queue that retries periodically and, if still unsuccessful after a day or two, sends the request to a ‘dead letter collection’ for review. We also implemented extensive monitoring—such as how fast requests are processed, which nodes are processing them, and estimated work remaining for each partition.” (See Frantisek’s blog article for a deeper dive into how all this works.)
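The retry-then-dead-letter behavior Kaduk describes can be sketched like this. The deadline, queue, and handler names are illustrative assumptions, not the change feed processor library's API:

```python
DEAD_LETTER_AFTER_SECONDS = 2 * 24 * 3600  # give up after ~2 days
retry_queue = []
dead_letter_collection = []

def enqueue_retry(item, now):
    retry_queue.append({"item": item, "first_seen": now})

def process_retries(handler, now):
    """Retry each queued item periodically; dead-letter anything still
    failing after the deadline instead of spinning and waiting."""
    still_pending = []
    for entry in retry_queue:
        try:
            handler(entry["item"])
        except Exception:
            if now - entry["first_seen"] >= DEAD_LETTER_AFTER_SECONDS:
                dead_letter_collection.append(entry["item"])
            else:
                still_pending.append(entry)
    retry_queue[:] = still_pending

def always_fails(item):
    raise RuntimeError("downstream unavailable")

enqueue_retry({"id": "evt-1"}, now=0)
process_retries(always_fails, now=60)                         # retried, kept
process_retries(always_fails, now=DEAD_LETTER_AFTER_SECONDS)  # dead-lettered
assert retry_queue == [] and dead_letter_collection == [{"id": "evt-1"}]
```

The key design point is that a failing item never blocks the feed: it moves to a side queue, and only after the deadline does it land in the dead-letter collection for human review.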

Cross-partition transactions and integration with other services

Change feed also provided a foundation for implementing background post-processing, such as cross-partition transactions that span the data of more than one user. The case of John blocking Sally from sending him messages is a good example. The system accepts the command from user John to block user Sally, upon which the request is validated and dispatched to the appropriate handler, which stores the event history and updates the queryable data for user John. A postprocessor responsible for cross-partition transactions monitors the change feed, copying the information that John blocked Sally into the data for Sally (which likely resides in a different partition) as a reverse block. This information is used for determining the relationship between peers. (More information on this pattern can be found in the article, “Life beyond Distributed Transactions: an Apostate’s Opinion.”)
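The John-blocks-Sally flow can be sketched with in-memory per-user partitions; the handler and postprocessor names are illustrative assumptions:

```python
# Per-user partitions keyed by UserID (simplified in-memory model).
partitions = {"john": [], "sally": []}

def handle_block_command(blocker, blocked):
    """Command handler: record the block in the blocker's own partition."""
    partitions[blocker].append({"kind": "block", "peer": blocked})

def postprocess_change_feed():
    """Background postprocessor: copy each block into the blocked user's
    partition as a reverse block, completing the cross-partition update."""
    for user, docs in partitions.items():
        for doc in list(docs):
            if doc["kind"] == "block":
                reverse = {"kind": "reverse-block", "peer": user}
                if reverse not in partitions[doc["peer"]]:
                    partitions[doc["peer"]].append(reverse)

handle_block_command("john", "sally")
postprocess_change_feed()
assert {"kind": "reverse-block", "peer": "john"} in partitions["sally"]
```

Because the postprocessor is idempotent (it checks before appending), it can safely re-run if the change feed delivers the same event again.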

Similarly, developers used change feed to support integration with other services, such as notification, graph search, and chat. The event is received in the background by all running change feed processors, one of which is responsible for publishing a notification to external event consumers, such as Azure Event Hub, using a public schema.

Migration of user data

To facilitate the migration of user data from SQL Server to Azure Cosmos DB, developers wrote a service that iterated over all the user data in the old PCS service to:

Query the data in SQL Server and transform it into the new data models for Azure Cosmos DB.
Insert the data into Azure Cosmos DB and mark the user’s address book as mastered in the new database.
Update a lookup table for the migration status of each user.

To make the entire process seamless to users, developers also implemented a proxy service that checked the migration status in the lookup table for a user and routed requests to the appropriate data store, old or new. After all users were migrated, the old PCS service, the lookup table, and the temporary proxy service were removed from production.
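The migration-plus-proxy flow above can be sketched as follows, with dictionaries standing in for SQL Server, Azure Cosmos DB, and the lookup table (all names are illustrative assumptions):

```python
MIGRATED, NOT_MIGRATED = "migrated", "not_migrated"

migration_status = {}          # lookup table keyed by UserID
old_store = {"user-1": "old-data", "user-2": "old-data"}
new_store = {}

def migrate(user_id, transform):
    """Move one user: transform the SQL-shaped data into the new data
    model, insert it into the new store, and mark the user as mastered there."""
    new_store[user_id] = transform(old_store[user_id])
    migration_status[user_id] = MIGRATED

def proxy_read(user_id):
    """Proxy service: route each request to the appropriate data store,
    old or new, based on the user's migration status."""
    if migration_status.get(user_id, NOT_MIGRATED) == MIGRATED:
        return new_store[user_id]
    return old_store[user_id]

migrate("user-1", lambda data: {"document": data})
assert proxy_read("user-1") == {"document": "old-data"}  # served from new store
assert proxy_read("user-2") == "old-data"                # still on old store
```

Because routing is per-user, the cutover is invisible to clients, and once every user is marked migrated the proxy and lookup table can simply be removed, as the team did.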

Migration for production users began in October 2017 and took approximately two months. Today, all requests are processed by Azure Cosmos DB, which contains more than 140 terabytes of data in each of the replicated regions. The new PCS service processes up to 15,000 reads and 6,000 writes per second, consuming between 2.5 million and 3 million RUs per second across all replicas. A process monitors RU usage, automatically scaling allocated RUs up or down as needed.

Continue on to part 3, which covers the outcomes resulting from Skype’s implementation of Azure Cosmos DB.
Source: Azure