Top 3 free resources developers need for learning Azure

In this post, I’ll cover three free resources every developer needs for learning Azure. Dan Fernandez leads the team responsible for bringing our technical documentation and learning resources into a more modern experience that supports capabilities that were impossible to deliver via MSDN. Recently, I invited Dan to record a few episodes of Azure Friday with Donovan Brown and spend some time showing off the work his team is doing to provide the best documentation and learning experience.

1. Microsoft Docs

Last December, I wrote 4 tips for learning Azure in the new year, in which I included links to several resources, including the Azure documentation. In that post, I admit that I did a disservice by glossing over the revolution that Microsoft Docs truly represents – both internally and externally. Not only did it radically change how we create documentation, it improved how you can learn and use Azure.

Learning Azure: Part 1—Azure Docs tips and tricks

Did you know that the Azure docs are not only open source but also currently the fastest-growing project on GitHub? In this episode, Dan shows off some cool features, a few tips and tricks, how you can contribute, and a few buried treasures.

Azure documentation
Microsoft Docs contributor guide overview

Learning Azure: Part 2—Architecture and interactive APIs for .NET and REST APIs

Whether you’re trying to wrap your head around architectural concepts, or you need to get down into the nitty-gritty of understanding a particular API, Dan shows how Azure Docs has you covered.

Azure Architecture Center
Interactive code snippets in String.Format Method (System)
Cognitive Toolkit Python API Package Reference
Microsoft Cognitive Toolkit (CNTK) on GitHub
List Resource Groups interactive REST API

Unified Microsoft API references:

.NET API Browser
REST API Browser
Java API Browser
JavaScript API Browser
Python API Browser
PowerShell Module Browser

2. Microsoft Learn

Learning Azure: Part 3—A quick tour of Microsoft Learn

At Microsoft Ignite 2018, the team working on Microsoft Docs delivered Microsoft Learn, a new approach to learning that adds another dimension to what's available for those seeking to learn Azure.

Dan gives a quick tour of Microsoft Learn. With Microsoft Learn, you learn by doing through interactive, step-by-step tutorials that create real resources in Azure. Even better, it’s free, and no credit card is required.

Microsoft Learn
A Tour of Microsoft Learn
Microsoft Learn Azure Content
Azure Fundamentals Learning Path

Microsoft Docs and Microsoft Learn are the bedrock for Azure technical content. Several of our documentation sets are open source, hosted on GitHub. More teams at Microsoft are adopting this model all the time. Even document sets that are not entirely open source have public-facing repos where we invite you to make pull requests. And Microsoft Learn will continue to grow, expanding into new areas and going deeper past the fundamentals to more advanced topics.

3. The Developer’s Guide to Azure

The most recent update to The Developer’s Guide to Azure includes cover-to-cover improvements that you shouldn't miss. Written by developers (Michael Crump at Microsoft and Azure MVP Barry Luijbregts) for developers, this guide will show you how to get started with Azure and which services you can use to run your applications, store your data, incorporate intelligence, build IoT apps, and deploy your solutions more efficiently and securely.

Source: Azure

Virtual Network Service Endpoints for serverless messaging and big data

This blog was co-authored by Sumeet Mittal, Senior Program Manager, Azure Networking.

Earlier this year in July, we announced the public preview for Virtual Network Service Endpoints and Firewall rules for both Azure Event Hubs and Azure Service Bus. Today, we’re excited to announce that we are making these capabilities generally available to our customers.

This feature adds to the security and control Azure customers have over their cloud environments. Now, traffic from your virtual network to your Azure Service Bus Premium namespaces and Standard and Dedicated Azure Event Hubs namespaces can be kept secure from public Internet access and completely private on the Azure backbone network.

Virtual Network Service Endpoints do this by extending your virtual network's private address space and identity to the Azure services. Customers dealing with PII (financial services, insurance, etc.) or looking to further secure access to their cloud-visible resources will benefit the most from this feature. For more details on the finer workings of Virtual Network Service Endpoints, refer to the documentation.

Firewall rules further allow a specific IP address or a specified range of IP addresses to access the resources.
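As a mental model, a firewall rule is just a membership test against a set of allowed IP ranges. The Python sketch below (using only the standard library; the rule values are made up, and this is not the Azure SDK) shows the idea:

```python
import ipaddress

def is_allowed(client_ip: str, allowed_ranges: list[str]) -> bool:
    """Return True if client_ip falls inside any allowed CIDR range."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(r) for r in allowed_ranges)

# Hypothetical firewall rules: a single address and a /24 range.
rules = ["203.0.113.5/32", "198.51.100.0/24"]

print(is_allowed("198.51.100.42", rules))  # True: inside the /24 range
print(is_allowed("192.0.2.1", rules))      # False: not in any rule
```

The real feature evaluates these rules at the namespace level, before traffic ever reaches your Event Hubs or Service Bus entities.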

Virtual Network Service Endpoints and Firewall rules are supported for the tiers listed below for all public regions at no extra cost.

Azure Service Bus: Premium tier
Azure Event Hubs: Standard and Dedicated tiers

Azure Event Hubs, a highly reliable and easily scalable data streaming service, and Azure Service Bus, which provides enterprise messaging, are the newest serverless offerings to join the growing list of Azure services that support Virtual Network Service Endpoints.

To get more details on these features, please visit the documentation links below:

Azure Service Bus Virtual Network Service Endpoints and Firewall rules
Azure Event Hubs Virtual Network Service Endpoints and Firewall rules

For step-by-step guidance on how to integrate Virtual Network Service Endpoints and set up firewalls (IP filtering), check out the tutorial, “Enable Virtual Networks Integration and Firewalls on Event Hubs namespace.”

Azure Backup can automatically protect SQL databases in Azure VM through auto-protect

We are excited to share the auto-protection capability for SQL Server in Azure Virtual Machines (VMs). This is a key addition to the public preview of Azure Backup for SQL Server on Azure VMs, announced earlier this year. Azure Backup for SQL Server is an enterprise-grade, zero-infrastructure, pay-as-you-go (PAYG) service that leverages native SQL backup and restore APIs to provide a comprehensive solution for backing up SQL Server instances running in Azure VMs.

What happens when you add a new database to your protected SQL Server? You need to rediscover the database and then manually configure protection to back up that database. Now imagine if we took that work away from you and automatically detected and protected each new database you add to the instance. Our new auto-protection feature does just that.

Auto-protection is a capability that lets you automatically protect all the databases in a standalone SQL Server instance or a SQL Server Always On availability group. Not only does it enable backups for the existing databases, but it also protects all the databases that you may add in the future.
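At its core, auto-protection can be thought of as a periodic set difference between the databases discovered on the instance and the databases already protected. Here is a minimal Python sketch of that idea (the database names and the auto_protect helper are hypothetical, purely for illustration):

```python
def auto_protect(discovered: set[str], protected: set[str]) -> set[str]:
    """Return the databases that still need protection configured.

    Anything discovered on the instance but not yet protected
    gets the instance's backup policy applied to it.
    """
    return discovered - protected

# Hypothetical instance state: two databases already protected,
# one newly created since the last discovery run.
discovered = {"Sales", "Inventory", "NewAuditDB"}
protected = {"Sales", "Inventory"}

for db in auto_protect(discovered, protected):
    print(f"Enabling backup for {db} with the instance policy")
```

In the real service, this discovery and policy association happens automatically in the background once auto-protection is enabled.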

Getting started

You can enable auto-protection for the desired SQL Server instance or Always On availability group under Configure Backup for SQL Server in Azure VM. When enabled, all the databases for that SQL Server will automatically be selected. You can then define the backup policy for the selected databases. After you associate the policy, you can see the newly protected databases under Backup items.

Thus, if databases are frequently added or removed in your environment, the auto-protection capability will save you time and effort by automatically discovering and protecting new databases.

Related links and additional content

Learn more about auto-protection by referring to our documentation on Backup SQL Server databases to Azure.
Learn more about Azure Backup.
Want more details? Check out the Azure Backup documentation.
Need help? Reach out to the Azure Backup forum for support.
Tell us how we can improve Azure Backup by contributing new ideas and voting up existing ones.
Follow us on Twitter @AzureBackup for the latest news and updates.


Azure PowerShell ‘Az’ Module version 1.0

There is a new Azure PowerShell module, built to harness the power of PowerShell Core and Cloud Shell while maintaining compatibility with Windows PowerShell 5.1. Its name is “Az.” Az ensures that Windows PowerShell and PowerShell Core users can get the latest Azure tooling in every PowerShell edition, on every platform. Az also simplifies and normalizes Azure PowerShell cmdlet and module names. Az ships in Azure Cloud Shell and is available from the PowerShell Gallery.

The Az module version 1.0 was released on December 18, 2018, and will be updated on a two-week cadence in 2019, starting with a January 15, 2019 release.

As with all Azure PowerShell modules, Az uses semantic versioning and implements a strict breaking change policy – all breaking changes require advance customer notice and can only occur during breaking change releases. 

For complete details on the release, timeline, and compatibility features, check out the Az announcement page.

New features

Az runs on Windows PowerShell 5.1 and PowerShell Core (cross-platform)
Az is always up to date with the latest tooling for Azure services
Az ships in Cloud Shell
Az shortens and normalizes cmdlet names – all cmdlets use ‘Az’ as their noun prefix
Az simplifies and normalizes module names – data plane and management plane cmdlets for each service use the same Az module
Az ships with new cmdlets to enable script compatibility with AzureRM (Enable/Disable-AzureRmAlias)
Az enables device code authentication support, allowing login when remoting via a terminal to another computer, VM, or container

You can find complete details on the Az roadmap and features on the Az announcement page.
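The name normalization is mechanical: the AzureRm noun prefix becomes Az. As a rough illustration (in Python rather than PowerShell, and not how Enable-AzureRmAlias is actually implemented), the mapping between old and new cmdlet names looks like this:

```python
import re

def to_az_name(azurerm_cmdlet: str) -> str:
    """Map an AzureRM cmdlet name to its Az equivalent by rewriting
    the noun prefix, e.g. Get-AzureRmVM -> Get-AzVM."""
    return re.sub(r"-AzureRm", "-Az", azurerm_cmdlet, count=1)

print(to_az_name("Get-AzureRmVM"))             # Get-AzVM
print(to_az_name("New-AzureRmResourceGroup"))  # New-AzResourceGroup
```

The Enable-AzureRmAlias cmdlet applies the same mapping in the opposite direction, letting existing AzureRM-style scripts run against the Az module.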

Migrating from AzureRM

Users are not required to migrate from AzureRM, as AzureRM will continue to be supported. However, it is important to note that all new Azure PowerShell features will appear only in Az. Az has new features to ease migration, and these are discussed in depth in the Az 1.0 Migration Guide. If you have scripts written for AzureRM, there are three paths for migration:

Convert existing scripts to Az – For users who mainly use Azure PowerShell interactively and have few scripts, the simplest option is to remove AzureRM when installing Az. You can use the Uninstall-AzureRm cmdlet that is included with the Az module to do this.
Keep Az and AzureRM scripts separate and run in separate sessions – Az and AzureRM cannot be executed in the same PowerShell session. However, if you are careful that your scripts use either Az or AzureRM, you can have both modules installed and run scripts using each module in separate sessions. 
Install Az in PowerShell Core – since PowerShell Core is installed side-by-side with Windows PowerShell 5.1, you can install Az in PowerShell Core without impacting any existing AzureRM scripts running in Windows PowerShell 5.1. See the Installation Guide for PowerShell Core on Windows for details.

Authentication changes

The Az module is cross-platform, and so some authentication mechanisms supported by Connect-AzureRmAccount have been changed or removed in the Az 1.0 version of the Connect-AzAccount cmdlet, as they are not supported on all platforms. Some of these authentication mechanisms will be enabled in future versions:

Support for user login with PSCredential: This capability is currently disabled in Az 1.0, but will be enabled in the January 15, 2019 release for Windows PowerShell 5.1 only. This scenario is discouraged as a mechanism for authenticating scripts, except in limited circumstances. Instead, scripts should use Service Principal authentication as described in “Sign In with Azure PowerShell.”
Support for interactive login using the automatic sign-in dialog: In AzureRM, interactive user sign-in automatically displays a web page where the user enters credentials. In Az 1.0, this is replaced with device code authentication, in which the user opens a browser to the device login page and enters a code before providing credentials, as shown in the graphic below.

Interactive login with automatic web page login display will be enabled for Windows PowerShell 5.1 only in the January 15, 2019 release and will be supported across all platforms in early 2019.

Azure Automation support

The Az module can run in Windows PowerShell 5.1, but requires .NET Framework version 4.7.2 or later. Azure Automation cloud workers currently support .NET Framework version 4.6.1. Azure Automation is updating its cloud workers to support .NET version 4.7.2, and until these new workers are deployed, Az cannot be used in Azure Automation cloud runbooks.

The Azure Automation team plans to deploy cloud workers supporting .NET 4.7.2 to all Azure environments and regions by March 15, 2019. You should expect to see more announcements as this rollout progresses in the new year. You can find more information on the Az announcement page.

Try it out

Az, which is open source, shipped version 1.0 on December 18, 2018. You can install Az from the PowerShell Gallery, and if you want to browse the code, check out the Azure PowerShell GitHub repository.

We invite you to install and try the new, cross-platform Az module, and we look forward to hearing your questions, comments, or issues with Az, either through the built-in Send-Feedback cmdlet or by submitting an issue on GitHub.

Transparent Data Encryption (TDE) with customer managed keys for Managed Instance

We are excited to announce the public preview of Transparent Data Encryption (TDE) with Bring Your Own Key (BYOK) support for Microsoft Azure SQL Database Managed Instance. Azure SQL Database Managed Instance is a new deployment option in SQL Database that combines the best of on-premises SQL Server with the operational and financial benefits of an intelligent, fully managed relational database service.

TDE with BYOK support has been generally available for single databases and elastic pools since April 2018. It is one of the most frequently requested capabilities from enterprise customers who are looking to protect data at rest or meet regulatory and compliance obligations that require implementation of specific key management controls. TDE with BYOK support is offered in addition to TDE with service-managed keys, which is enabled by default on all new Azure SQL databases (single databases, elastic pools, and managed instances).

TDE with BYOK support uses Azure Key Vault, which provides highly available and scalable secure storage for RSA cryptographic keys backed by FIPS 140-2 Level 2 validated hardware security modules (HSMs). Azure Key Vault streamlines the key management process and enables customers to maintain full control of encryption keys, including managing and auditing key access.

Customers can generate and import their RSA key to Azure Key Vault and use it with Azure SQL Database TDE with BYOK support for their managed instances. Azure SQL Database handles the encryption and decryption of data stored in databases, log files, and backups in a fully transparent fashion, using a symmetric Database Encryption Key (DEK), which is in turn protected by the customer-managed key, called the TDE Protector, stored in Azure Key Vault.

Customers can rotate the TDE Protector in Azure Key Vault to meet their specific security requirements or any industry specific compliance obligations. When the TDE Protector is rotated, Azure SQL Database detects the new key version within minutes and re-encrypts the DEK used to encrypt data stored in databases. This does not result in re-encryption of the actual data and there is no other action required from the user.
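The reason rotation is cheap is the envelope model: data is encrypted once with the DEK, and only the small wrapped DEK needs re-encrypting under the new protector. The toy Python sketch below illustrates the shape of that flow; the XOR "cipher" is a stand-in for real cryptography, and nothing here reflects the actual TDE implementation:

```python
import secrets

def xor(data: bytes, key: bytes) -> bytes:
    """Toy XOR 'cipher' -- illustration only, not real cryptography."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Data is encrypted once with a symmetric Database Encryption Key (DEK).
dek = secrets.token_bytes(32)
ciphertext = xor(b"sensitive row data", dek)

# The DEK itself is stored wrapped (encrypted) by the TDE Protector.
protector_v1 = secrets.token_bytes(32)
wrapped_dek = xor(dek, protector_v1)

# Rotating the protector re-wraps only the DEK; the data ciphertext
# produced above is untouched.
protector_v2 = secrets.token_bytes(32)
dek_unwrapped = xor(wrapped_dek, protector_v1)
wrapped_dek = xor(dek_unwrapped, protector_v2)

# Data still decrypts with the same DEK after rotation.
assert xor(ciphertext, xor(wrapped_dek, protector_v2)) == b"sensitive row data"
```

Because only the wrapped DEK changes, rotation completes without touching the stored data, which matches the "no re-encryption of the actual data" behavior described above.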

Customers can also revoke access to encrypted managed instances by revoking access to the managed instance’s TDE Protector stored in Azure Key Vault. There are several ways to revoke access to keys stored in Azure Key Vault. Please refer to the Azure Key Vault PowerShell and Azure Key Vault CLI documentation for more details. Revoking access in Azure Key Vault will effectively block access to all databases when the TDE Protector is inaccessible by the Azure SQL Database managed instance.

Azure SQL Database requires soft delete to be enabled in Azure Key Vault to protect the TDE Protector against accidental deletion.

You can get started today by visiting the Azure portal, reviewing REST API for Managed Instance, and the how-to guide for using PowerShell documentation. To learn more about the feature including best practices and to review our configuration checklist see our documentation “Azure SQL Transparent Data Encryption: Bring Your Own Key support.”

Participate in the 16th Developer Economics Survey

The Developer Economics Q4 2018 survey is here in its 16th edition to shed light on the future of the software industry. Every year more than 40,000 developers around the world participate in this survey, so this is a chance to be part of something big, voice your thoughts, and make your contribution to the developer community. This edition introduces questions about ethics, privacy, security, and project management methodologies in software development.

Is this survey for me?

The Developer Economics Q4 2018 survey is for all developers (professionals, hobbyists, and students) engaging in the following software development areas: web, mobile, desktop, backend services, IoT, AR/VR, machine learning and data science, and gaming.

What questions am I likely to be asked?

The survey asks questions related to developer skills, and experiences with dev tools, platforms, frameworks, resources, and more.

Your background and skills for demographics
What’s going up and what’s going down in the software industry?
Are you working on the projects you would like to work on?
Where do you think development time should be invested?
Which are your favorite tools and platforms?

Also, keep an eye out for some technology trivia interspersed in the survey. You may learn something new.

What’s in it for me?

Here’s what you get for sharing your mind:

Everyone who completes the survey is eligible to win one of the following: a Samsung S9 Plus, $25 Udemy vouchers, a Filco Ninja Majestouch-2 Tenkeyless NKR Tactile Action Keyboard, an Axure RP8 Pro one-year license, a Samsung 970 EVO 500GB V-NAND M.2 PCI Express solid state drive, $200 towards the software subscription of your choice, an Oculus Rift and Touch Virtual Reality System, a mug with your AI character on it, a T-shirt with your AI character on it, or a $100 prepaid virtual Visa card.
A copy of the State of the Developer Nation 16th edition report with the key findings of the survey (when it's published), so you know how your responses match with other developers
Access to Developer Benchmarks, showing you Q4 2018 developer trends in your region

For each completed response to the survey, SlashData will also donate money to the Raspberry Pi Foundation. Complete the survey and help us support a good cause!

What’s in it for Microsoft?

The Developer Economics Q4 2018 survey is an independent survey from SlashData, an analyst firm in the developer economy that tracks global software developer trends. We’re interested in seeing the report that comes from this survey, and we want to ensure the broadest developer audience participates.

Of course, any data collected by this survey is between you and SlashData. You should review their Terms and Conditions page to learn more about the awarding of prizes, their data privacy policy, and how SlashData will handle your data.

Ready to go?

The survey is open until Monday, January 14, 2019.

Take the survey today.

The survey is available in English, Chinese (Simplified and Traditional), Spanish, Portuguese, Vietnamese, Russian, Japanese, and Korean.

Microsoft open sources Trill to deliver insights on a trillion events a day

In today’s high-speed environment, being able to process massive amounts of data each millisecond is becoming a common business requirement. We are excited to announce that Trill, an internal Microsoft project named for its ability to process “a trillion events per day,” is now being open sourced to address this growing trend.

Here are just a few of the reasons why developers love Trill:

As a single-node engine library, any .NET application, service, or platform can easily use Trill and start processing queries.
A temporal query language allows users to express complex queries over real-time and/or offline data sets.
Trill’s high performance across its intended usage scenarios means users get results with incredible speed and low latency. For example, filters operate at memory bandwidth speeds up to several billions of events per second, while grouped aggregates operate at 10 to 100 million events per second.
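To make the temporal model concrete, here is a toy tumbling-window grouped count in Python. Trill itself is a .NET library with a far richer temporal language; this sketch only illustrates the kind of query it expresses:

```python
from collections import defaultdict

def tumbling_count(events, window_ms):
    """Group timestamped events into fixed windows and count per key.

    events: iterable of (timestamp_ms, key) pairs.
    Returns {(window_start_ms, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_ms) * window_ms
        counts[(window_start, key)] += 1
    return dict(counts)

events = [(5, "click"), (12, "click"), (12, "view"), (27, "click")]
print(tumbling_count(events, window_ms=10))
# {(0, 'click'): 1, (10, 'click'): 1, (10, 'view'): 1, (20, 'click'): 1}
```

In Trill, the same logic would be expressed declaratively over an IStreamable, and executed over columnar batches rather than one event at a time.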

A rich history

Trill started as a research project at Microsoft Research in 2012 and has since been extensively described in research venues such as VLDB and the IEEE Data Engineering Bulletin. The roots of Trill’s language lie in Microsoft’s former StreamInsight service, a powerful platform for developing and deploying complex event processing applications. Both systems are based on an extended query and data model that extends the relational model with a time component.

While systems prior to Trill only achieved subsets of these benefits, Trill provides all these advantages in one package. Trill was the first streaming engine to incorporate techniques and algorithms that process events in small batches of data based on the latency tolerated by the user. It was also the first engine to organize those batches in columnar format, enabling queries to execute much more efficiently than before. To users, working with Trill is the same as working with any .NET library, so there is no need to leave the .NET environment. Users can embed Trill within a variety of distributed processing infrastructures such as Orleans and a streaming version of Microsoft’s SCOPE data processing infrastructure.

Trill works equally well over real-time and offline datasets, achieving best of breed performance across the spectrum. This makes it the engine of choice for users who just want one tool for all their analyses. The highly expressive power of Trill’s language allows users to perform advanced time-oriented analytics over a rich range of window specifications, as well as look for complex patterns over streaming datasets.

After its launch and initial deployment across Microsoft, the Trill project moved from Microsoft Research to the Azure Data product team and became a key component of some of the largest mission-critical streaming pipelines within Microsoft.

Powering mission-critical streaming pipelines

Trill powers internal applications and external services, reaching thousands of developers. A number of powerful streaming services are already powered by Trill, including:

Financial Fabric

“Trill enables Financial Fabric to provide real-time portfolio & risk analytics on streaming investment data, fundamentally changing the way financial analytics on high volume and velocity datasets are delivered to fund managers.” – Paul A. Stirpe, Ph.D., Chief Technology Officer, Financial Fabric

Bing Ads

“Trill has enabled us to process large scale data in petabytes, within a few minutes and near real-time compared to traditional processing that would give us results in 24 plus hours. The key capabilities that differentiate Trill in our view are the ability to do complex event processing, clean APIs for tracking and debugging, and the ability to run the stream processing pipeline continuously using temporal semantics. Without Trill, we would have been struggling to get streaming at scale, especially with the additional complex requirements we have for our specific big data processing needs.” – Rajesh Nagpal, Principal Program Manager, Bing

“Trill is the centerpiece of our stream processing system for ads in Bing. We are able to construct and execute complex business scenarios with ease because of its powerful, consistent data model and expressive query language. What’s more is its design for performance, Trill lives up to its namesake of “trillions of events per day” because it can easily process extremely large volumes of data and operate against terabytes of state, even in queries that contain hundreds of operators.” – Daniel Musgrave, Principal Software Engineer, Bing

Azure Stream Analytics

“Azure Stream Analytics went from the first line of code to public preview within 10 months by using Trill as the on-node processing engine. The library form factor conveniently integrates with our distributed processing framework and input/output adaptors. Our SQL compiler simply compiles SQL queries to Trill expressions, which takes care of the intricacies of the temporal semantics. It is a beautiful programming model and high-performance engine to use. In the near future, we are considering exposing Trill’s programming model through our user defined operator model so that all of our customers can take advantage of the expressive power.” – Zhong Chen, Principal Group Engineering Manager, Azure Data.

Halo

“Trill has been intrinsic to our data processing pipeline since the day we introduced it into our services back in 2013. Its impact has been felt by any player who has picked up the sticks to play a game of Halo. Their data dense game telemetry flows through our pipelines and into the Trill engine within our services. From finding anomalous and interesting experiences to providing frontline defense against bad behavior, Trill continues to be a stalwart in our data processing pipeline.” – Mike Malyuk, Senior Software Engineer, Halo

There are many other examples of Trill enabling streaming at scale, including Exchange, Azure Networking, and telemetry analysis in Windows.

Open-sourcing Trill

We believe there is no equivalent to Trill available in the developer community today. In particular, by open-sourcing Trill we want to offer the power of the IStreamable abstraction to all customers the same way that IEnumerable and IObservable are available. We hope that Trill and IStreamable will provide a strong foundation for streaming or temporal processing for current and future open-source offerings.

We also have many opportunities for community involvement in the future development of Trill. First, one of Trill’s extensibility points is that it allows users to write custom aggregates. Trill’s internal aggregates are implemented in the same framework as user-defined ones. Every aggregate uses the same underlying high-performance architecture with no special cases. While Trill has a wide variety of aggregates already, there are countless others that could be added, especially in verticals such as finance.
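User-defined aggregates in engines like Trill typically follow an initialize / accumulate / compute-result pattern. Here is a Python sketch of that pattern (illustrative only; Trill's actual aggregate framework is .NET and differs in detail):

```python
class MeanAggregate:
    """User-defined aggregate in the initialize / accumulate /
    compute-result style used by streaming engines like Trill
    (names here are illustrative, not Trill's actual API)."""

    def initial_state(self):
        return (0, 0.0)  # (count, sum)

    def accumulate(self, state, value):
        count, total = state
        return (count + 1, total + value)

    def compute_result(self, state):
        count, total = state
        return total / count if count else None

agg = MeanAggregate()
state = agg.initial_state()
for v in [2.0, 4.0, 9.0]:
    state = agg.accumulate(state, v)
print(agg.compute_result(state))  # 5.0
```

Because Trill's built-in aggregates use the same framework as user-defined ones, a community-contributed aggregate runs on the same high-performance path as the built-ins.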

There are also several research projects built on top of Trill where the code exists but is not yet in product-ready form. Three projects at the top of our working list include:

Digital signal processing with capability and performance matching or exceeding what is normally seen in R.
An improved ability to handle out-of-order data, allowing users to specify multiple levels of latency.
Allowing operator state to be managed using the recently open-sourced FASTER framework.

Welcome to Trill!

We are incredibly excited to be sharing Trill with all of you! You can look forward to more blog posts about Trill’s API, how Trill is used within Microsoft, and in-depth technical details. In the meantime, please take a look at the query writing guide in our GitHub repository, take Trill for a spin, and tell us what you think! Reach out to us at asktrill@microsoft.com, we’d love to hear from you.

A fintech startup pivots to Azure Cosmos DB

The right technology choices can accelerate success for a cloud-born business. This is true for the fintech startup clearTREND Research. Their solution architecture team knew one of the most important decisions would be the database choice between SQL and NoSQL. After research, experimentation, and many design iterations, the team was thrilled with their decision to deploy on Microsoft Azure Cosmos DB. This blog is about how that decision was made.

Data and AI are driving a surge of cloud business opportunities, and one technology decision that deserves careful evaluation is the choice of a cloud database. Relational databases continue to be popular and drive significant demand for cloud-based solutions, but NoSQL databases are well suited for distributed, global-scale solutions.

For our partner clearTREND, the plan was to commercialize a financial trend engine and provide a subscription investment service to individuals and professionals. The team responsible for clearTREND’s SaaS solution is a veteran group of software developers and architects who have been implementing cloud-based solutions for years. They understood the business opportunity and wanted to better understand the database technology options. Through their due diligence, the architecture morphed as business priorities and data sets were refined. After a lot of research and hands-on experimentation, the architectural team decided Azure Cosmos DB was the best fit for the solution.

Business models are under attack, especially in the financial industry. Cosmos DB is a technology that can adapt, evolve, and allow a business to innovate faster in order to turn opportunities into strategic advantages.

Six reasons to choose Cosmos DB

Below are reasons the team at clearTREND selected Cosmos DB:

Schema design is much easier and more flexible. With an agile development methodology, schemas change frequently, and the ability to quickly and safely implement changes is a big advantage. Cosmos DB is schema-agnostic, so there is massive flexibility in how the data can be consumed.
Database reads and writes are really fast. Cosmos DB provides sub-10-millisecond reads and writes, backed by a service level agreement (SLA).
Queries run lightning fast and autoindexing is a game-changer. Reads and writes based on a primary or partition key are fast, but for many NoSQL implementations, queries executed against non-keyed document attributes may perform poorly. Secondary indexing can be a management and maintenance burden. By default, Cosmos DB automatically indexes all the attributes in a document so query performance is optimized as soon as data is loaded. Another benefit of auto-indexing is that the schema and indexes are fully synchronized so schema changes can be implemented quickly without downtime or management needed for secondary indexes. 
With thoughtful design, Cosmos DB can be very cost-effective. The Cosmos DB cost model depends on how the database is designed: the number of collections, the partitioning key, the index strategy, document size, and the number of documents. Pricing for Cosmos DB is based on resources that have been reserved; these resources, called request units (RUs), are described in the “Request Units in Azure Cosmos DB” documentation. The clearTREND schema design is implemented as a single document collection, and the entire cost of the solution on Azure, including Cosmos DB, comes in at an affordable monthly price. Keep in mind this is a managed database service, so the monthly cost includes support, 99.999 percent high availability, an SLA for read and write performance, automatic partitioning, data encrypted by default, and automatic backups.
Programmatically re-size capacity for workload bursts. The clearTREND workload has a predictable daily burst pattern and RUs can be programmatically adjusted. When additional compute resources are needed for complex processing or to meet higher throughput requirements, RUs can be increased. Once the processing completes, RUs are adjusted back down. This elasticity means Cosmos DB can be re-sized in order to cost-effectively adapt to workload demands.
Push-button globally distributed data. Designing for the future scalability of a solution can be tricky; technology and design choices can become inefficient as a solution grows beyond the initial vision. The advantage of Cosmos DB is that it can become a globally configured, massively scaled-out solution with just a few clicks, with none of the operational complications of setting up and managing a cloud-scale, distributed NoSQL database.

Design and implementation tips for Cosmos DB

If you are new to Cosmos DB, here are some tips from the clearTREND team to consider when designing and implementing a solution:   

Design the schema around query and API optimization. Schema design for a NoSQL database is just as important as it is for a relational database management system (RDBMS) database, but it’s different. While a NoSQL database doesn’t require pre-defined table structures, you do have to be intentional about organizing and defining the document schema while also being aware of where and how relationships will be represented and embedded. To guide the schema design, the clearTREND team tends to group data based on the data elements that are written and retrieved by the solution’s APIs.
Design a flexible partition key. Cosmos DB requires a partition key to be specified when creating a document collection larger than 10 GB. Deciding on a partition key can be tricky because the optimal choice may not be clear at first. Should it be a data category, a geographical region, an ID field, or a time scale like day, week, or month? A poorly designed partition key can create a performance bottleneck called a hot spot, which concentrates read and write activity on a single partition rather than distributing activity evenly across partitions. If a partition key has to be changed, it can impact application availability as the underlying data is copied to the new collection and re-indexed. The clearTREND team uses an approach that affords flexibility in setting a partition key: the partition key is a string called PartitionID, initially set to a value that represents a geography. Later, when they realized a calculated field would be a more efficient key, they programmatically replaced the geography values with the calculated values, avoiding a data copy and re-indexing operation.
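The flexible-key idea can be sketched as follows. PartitionID is the generic string field from the clearTREND design described above; the two strategy functions and field names are illustrative stand-ins, not the team's actual code.

```python
import hashlib

def geography_key(doc: dict) -> str:
    # Initial strategy: partition by a geography attribute.
    return doc["region"]

def calculated_key(doc: dict, buckets: int = 10) -> str:
    # Later strategy: a stable hash bucket (purely illustrative) that
    # spreads writes evenly across logical partitions.
    digest = hashlib.md5(doc["customerId"].encode()).hexdigest()
    return f"bucket-{int(digest, 16) % buckets}"

doc = {"customerId": "c-42", "region": "us-east"}
doc["PartitionID"] = geography_key(doc)    # initial strategy
doc["PartitionID"] = calculated_key(doc)   # swapped in place later
```

Because the field itself is just a string, switching strategies is an in-place update of document values rather than a copy to a new collection.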
Consider a schema design based on a single collection. A common design strategy is to use one document type per collection, but there are benefits to storing multiple document types in a single collection. Collections are the basis for partitioning and indexing, so it may not seem intuitive to store multiple document types in one, but doing so maximizes functionality (no cross-collection operations are needed) and minimizes overall cost, because a single collection is less expensive than multiple collections. The clearTREND solution has seven different document types, all stored in a single collection. The approach is implemented with an enumerated doc type field: every document has a doc type property that corresponds to one of the seven document types.
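A minimal sketch of the single-collection pattern: every document carries a discriminator field (called doc type in the text, spelled docType here), and reads filter on it instead of targeting separate collections. The document shapes below are invented for illustration.

```python
documents = [
    {"id": "1", "docType": "trade", "symbol": "MSFT"},
    {"id": "2", "docType": "trade", "symbol": "AAPL"},
    {"id": "3", "docType": "alert", "message": "threshold crossed"},
]

def of_type(docs, doc_type):
    # Equivalent to: SELECT * FROM c WHERE c.docType = @docType
    return [d for d in docs if d["docType"] == doc_type]
```

All seven clearTREND document types live side by side this way, so a single query scope (and a single collection's pricing) covers the whole schema.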
Tune schema design by understanding the RU costs of complex queries and stored procedure operations. It can be difficult to anticipate the costs for complex queries and stored procedures, especially if you don’t know in advance how many reads or writes Cosmos DB will need to execute the operation. Capture the metrics and costs (RUs) for complex operations and use the information to streamline schema design. One way to capture these metrics is to execute the query or stored procedure from the Cosmos DB dashboard on the Azure portal.

Consider embedding a simple or calculated expression as a document property. If there are requirements to calculate a simple aggregation such as a count, sum, minimum, or maximum, or a need to evaluate a simple Boolean logic expression, it may make sense to define the expression as a property of the base document class. For instance, a logging application likely has logic to evaluate conditions and determine whether an operation was successful. If the logic is a simple Boolean expression like the one below, consider including it in the class definition:

public class LogStatus
{
    // C# example of a Boolean expression embedded in a class definition
    public bool Failed => !((WasReadSuccessful && WasOptimizationSuccessful && StatusMsg == "Success") ||
                            (WasReadSuccessful && !IsDataCurrent));
    public string StatusMsg { get; set; }
    public bool WasReadSuccessful { get; set; }
    public bool WasOptimizationSuccessful { get; set; }
    public bool IsDataCurrent { get; set; }
}

The Failed property is defined as a read-only calculated property. If database usage is primarily read-intensive, this approach has the potential to reduce overall RU cost, because the expression is evaluated and stored when the document is written rather than each time the document is queried.

Remember, referential integrity is implemented in the application layer. Referential integrity ensures that relationships between data elements are preserved, and with an RDBMS referential integrity is enforced through keys. For example, an RDBMS uses primary and foreign keys to ensure a product exists before an order for it can be created. If referential integrity is a requirement and data dependencies need to be monitored and enforced, it needs to be done at the application layer. Be rigorous about testing for referential and data integrity. 
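An illustrative sketch of the point above: with no foreign keys in the database, the application verifies that a referenced document exists before writing the dependent one. The in-memory dicts stand in for collections, and the product/order names are hypothetical.

```python
products = {"p-1": {"id": "p-1", "name": "widget"}}
orders = {}

def create_order(order_id: str, product_id: str) -> dict:
    # The check an RDBMS would enforce via a foreign key must be
    # performed explicitly in the application layer.
    if product_id not in products:
        raise ValueError(f"unknown product: {product_id}")
    order = {"id": order_id, "productId": product_id}
    orders[order_id] = order
    return order
```

This is exactly the kind of path that deserves rigorous integrity testing, since nothing in the database will catch a dangling reference for you.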
Use Application Insights to monitor Cosmos DB activity. Application Insights is a telemetry service; for this solution it was used to collect and report detailed performance, availability, and usage information about Cosmos DB activities. Azure Functions provided the integration between Cosmos DB and Application Insights through the use of Metrics Explorer and the capability to capture custom events using TelemetryClient.GetMetric().

Recommended next steps

NoSQL is a paradigm rapidly shifting the way database solutions are implemented in the cloud. Whether you are a developer or database professional, Cosmos DB is an increasingly important player in the cloud database landscape and can be a game changer for your solution. If you haven’t already, get introduced to the advantages and capabilities of Cosmos DB. Take a look at the documentation, dissect the sample GitHub application, and learn more about design patterns:

Fintech Startup Commercializes Internal tool as a SaaS Product.
Discover clearTREND, the world’s first cloud-based financial trend engine.
Try Cosmos DB for free. You get a limited-time, full-service experience. Try it out, run through a tutorial or demo, and step through a quickstart, with no Azure account or credit card required.
If you are a developer, try out the Cosmos DB emulator. Develop and test an application locally without creating an Azure subscription or incurring costs. Once the application works, switch to using Azure Cosmos DB.

Thank you to our partners clearTREND and Skyline Technologies!

One of the great things about working for Microsoft is the opportunity to work with customers and partners, and to learn through them about their creative approaches to implementing technology. The team that designed and implemented the clearTREND solution consists of architects and developers with Skyline Technologies. Passionate about their business clients and solving complex technical challenges, they were very early cloud adopters. We especially appreciate the team members who gave their time to this effort, including Tim Miller, Greg Levenhagen, and Michael Lauer. It’s been a pleasure working with you.
Source: Azure

Azure.Source – Volume 62

KubeCon North America 2018

KubeCon North America 2018: Serverless Kubernetes and community led innovation!

Brendan Burns, Distinguished Engineer in Microsoft Azure and co-founder of the Kubernetes project, provides a welcome to KubeCon North America 2018, which took place last week in Seattle. In his post, Brendan provides a retrospective on Azure Kubernetes Service (AKS), including how engineers at companies such as Maersk, Siemens, and Bosch benefited from adopting AKS in their solutions. He also provides an overview of the various announcements we made at KubeCon. With Docker, Bitnami, HashiCorp, and others we announced the Cloud Native Application Bundle (CNAB) specification, a new distributed application package format that combines Helm or other configuration tools with Docker images to provide complete, self-installing cloud applications. He also announced that Microsoft is donating the likeness of Phippy, and all of your favorites from the Children’s Illustrated Guide to Kubernetes, to the CNCF, and the release of a special second episode of the guide, Phippy Goes to the Zoo, which covers ingresses, CronJobs, CRDs, and more.

A hybrid approach to Kubernetes

Azure Stack enables you to run your containers on-premises in much the same way as you do in global Azure. Microsoft Azure Stack is a hybrid cloud platform that lets you deliver services from your datacenter. As a service provider, you can offer services to your tenants. The Kubernetes Cluster marketplace item 0.3.0 for Azure Stack is consistent with Azure: because the template is generated by the Azure Container Service Engine, the resulting cluster runs the same containers as in AKS. It also complies with the Cloud Native Computing Foundation. The cluster depends on the Ubuntu server, custom script, and Kubernetes items being present in the Azure Stack Marketplace.

Now in preview

Microsoft previews neural network text-to-speech

Speech Service, part of Azure Cognitive Services, now offers a neural network-powered text-to-speech capability. Neural Text-to-Speech makes the voices of your apps nearly indistinguishable from the voices of people. Use it to make conversations with chatbots and virtual assistants more natural and engaging, to convert digital texts such as e-books into audiobooks, to upgrade in-car navigation systems with natural voice experiences, and more. This release includes significant enhancements since we first revealed Neural Text-to-Speech at Ignite earlier this year, such as enhanced voice quality, accelerated runtime performance, and greater service availability. With these updates, the Speech Service's Neural Text-to-Speech capability offers the most natural-sounding voice experience for your users in comparison to traditional and hybrid system approaches.

Native Python support on Azure App Service on Linux: new public preview!

Built-in Python images for Azure App Service on Linux are now available in public preview. With the choice of Python 3.7, 3.6, and soon 2.7, developers can get started quickly and deploy Python applications to the cloud, including Django and Flask apps, and leverage the full suite of features of Azure App Service on Linux. When you use the official Python images on App Service on Linux, the platform automatically installs the dependencies specified in the requirements.txt file. While the underlying infrastructure of Azure App Service on Linux has been generally available (GA) for over a year, at the moment we’re releasing the runtime for Python in public preview, with GA expected in a few months.

Automatic performance monitoring in Azure SQL Data Warehouse (preview)

Query Store for Azure SQL Data Warehouse is now available in preview for both our Gen1 and Gen2 offers. Query Store is a set of internal stores and dynamic management views (DMVs), comprising three actual stores: a plan store that persists execution plan information, a runtime stats store that persists execution statistics, and a wait stats store that persists wait statistics. These stores are managed automatically by SQL Data Warehouse and retain an unlimited number of queries stored over the last seven days. Query Store is available in all Azure regions at no additional charge.

Also in preview

Connect Cognitive Services subscription to enable unlimited skillset execution
Python images for App Service Linux are now in preview
MongoDB to Azure Cosmos DB migration is in preview

Now generally available

Azure Monitor for containers now generally available

Azure Monitor for containers monitors the health and performance of Kubernetes clusters hosted on Azure Kubernetes Service (AKS). Since the public preview, we have added several capabilities, including multi-cluster view, performance grid view, live debugging, and automated onboarding. Azure Monitor for containers gives you performance visibility by collecting memory and processor metrics from controllers, nodes, and containers that are available in Kubernetes through the Metrics API. After you enable monitoring on Kubernetes clusters, metrics and logs are automatically collected for you through a containerized version of the Log Analytics agent for Linux and stored in your Log Analytics workspace.

Streamlined IoT device certification with Azure IoT certification service

Azure IoT certification service (AICS), a new web-based test automation workflow, is now generally available. AICS will significantly reduce the operational processes and engineering costs for hardware manufacturers to get their devices certified for Azure Certified for IoT program and be showcased on the Azure IoT device catalog. The goals of the certification program are to showcase the right set of IoT devices for industry-specific vertical solutions and to simplify IoT device development. AICS helps achieve these goals by delivering a consistent certification process through automation, additional tests to support validation of device twins and direct methods with IoT Hub primitives, flexibility for customized test cases, and a simple and intuitive user experience.

Static websites on Azure Storage now generally available

Static websites are websites that can be loaded and served statically from a pre-defined set of files. You can now build a static website using HTML, CSS, and JavaScript files that are hosted on Azure Storage. Static websites can be powerful with the use of client-side JavaScript. Azure Storage makes hosting of websites easy and cost-efficient. You can enable static website hosting using the Azure portal, Azure CLI, or Azure PowerShell, which creates a container named ‘$web’. You can then upload your static content to this container for hosting. Your content will be available through a web endpoint. There are no additional charges for enabling static websites on Azure Storage.

Also generally available

Azure Monitor for containers is now available
General availability: Azure Kubernetes Service in East Asia

News and updates

Azure HDInsight integration with Data Lake Storage Gen2 preview – ACL and security update

This integration will enable HDInsight customers to drive analytics from the data stored in Azure Data Lake Storage Gen 2 using popular open source frameworks such as Apache Spark, Hive, MapReduce, Kafka, Storm, and HBase in a secure manner. Azure Data Lake Storage Gen2 unifies the core capabilities from the first generation of Azure Data Lake with a Hadoop compatible file system endpoint now directly integrated into Azure Blob Storage. HDInsight and Azure Data Lake Storage Gen2 integration is based upon user-assigned managed identity. You assign appropriate access to HDInsight with your Azure Data Lake Storage Gen2 accounts. Once configured, your HDInsight cluster is able to use Azure Data Lake Storage Gen2 as its storage.

Azure Backup Server now supports SQL 2017 with new enhancements

You can now install Azure Backup Server on Windows Server 2019 with SQL 2017 as its database. With Azure Backup Server, you can protect application workloads such as Hyper-V VMs, Microsoft SQL Server, SharePoint Server, Microsoft Exchange, and Windows clients from a single console. Azure Backup Server version 3 (MABS V3) is the latest upgrade, and includes critical bug fixes, Windows Server 2019 support, SQL 2017 support and other features and enhancements. MABS V3 is a full release, and can be installed directly on Windows Server 2016, Windows Server 2019, or can be upgraded from MABS V2. Before you upgrade to or install Backup Server V3, read the installation prerequisites.

Azure Functions now supported as a step in Azure Data Factory pipelines

Azure Functions is a serverless compute service that enables you to run code on-demand without having to explicitly provision or manage infrastructure. Using Azure Functions, you can run a script or piece of code in response to a variety of events. Azure Data Factory (ADF) is a managed data integration service in Azure that allows you to iteratively build, orchestrate, and monitor your Extract Transform Load (ETL) workflows. Azure Functions is now integrated with ADF, enabling you to run an Azure function as a step in your data factory pipelines. To run an Azure Function, you need to create a linked service connection and an activity that specifies the Azure Function that you plan to execute.

Automate Always On availability group deployments with SQL Virtual Machine resource provider

High availability architectures are designed to continue to function even when there are database, hardware, or network failures. Azure Virtual Machine instances using Premium Storage for all operating system and data disks offer a 99.9 percent availability SLA. This SLA is affected by three scenarios: unplanned hardware maintenance, unexpected downtime, and planned maintenance. You now have a new, automated method to configure Always On availability groups (AG) for SQL Server on Azure VMs with the SQL VM resource provider (RP), a simple and reliable alternative to manual configuration. The SQL VM resource provider automates Always On AG setup by orchestrating the provisioning of various Azure resources and connecting them to work together.

Additional news and updates

Azure Database for MariaDB name changes
Azure databases for MySQL and PostgreSQL resource GUID changes

Technical content

Power BI and Azure Data Services dismantle data silos and unlock insights

Power BI data flows, the Common Data Model, and Azure Data Services can be used together to break open silos of data in your organization and enable business analysts, data engineers, and data scientists to share data to fuel advanced analytics and unlock new insights that give you a competitive edge. Learn how to connect Power BI and Azure Data Services to share data and unlock new insights with a new tutorial. The tutorial gives you a first look at how to use CDM folders to share data between Power BI and Azure Data Services, and uses sample libraries, code, and Azure resource templates that you can use with CDM folders you create from your own data. By working through the tutorial, you’ll see first-hand how the metadata stored in a CDM folder makes it easier for each service to understand and share data.

Deploying Apache Airflow in Azure to build and run data pipelines

Apache Airflow is an open source platform used to author, schedule, and monitor workflows. Airflow overcomes some of the limitations of the cron utility by providing an extensible framework that includes operators, programmable interface to author jobs, scalable distributed architecture, and rich tracking and monitoring capabilities. We developed an Azure Quickstart template that enables you to deploy and create an Airflow instance in Azure more quickly by using Azure App Service and an instance of Azure Database for PostgreSQL as a metadata store.

How news platforms can improve uptake with Microsoft Azure’s Video AI service

Microsoft News is an app that delivers breaking news and trusted, in-depth reporting from the world's best journalists. Microsoft News created advanced algorithms to analyze their articles and determine how to increase personalization, which ultimately increases consumption, but wanted more insight on their videos. Anna Thomas, an Applied Data Scientist within Microsoft Engineering, set off to determine how to deliver these insights using a combination of Microsoft technologies and custom solutions; however, she discovered that the Video Indexer API held more capabilities than she expected. Check out her post to see what she discovered.

Know exactly how much it will cost for enabling DR to your Azure VMs

Azure offers built-in disaster recovery (DR) solution for Azure Virtual Machines through Azure Site Recovery (ASR). Site Recovery manages and orchestrates disaster recovery of on-premises machines and Azure virtual machines (VMs), including replication, failover, and recovery. A common question we get is about costs associated with configuring DR for Azure virtual machines, so Sujay Talasila explored how to estimate DR costs. Follow his example to explore how much it will cost to support your particular solution. Disaster Recovery between Azure regions is available in all Azure regions where ASR is available.

Taking a closer look at Python support for Azure Functions

As announced at Microsoft Connect(); 2018 earlier this month, you can now develop your Functions using Python 3.6, based on the open-source Functions 2.0 runtime and publish them to a Consumption plan (pay-per-execution model) in Azure. Python is a great fit for data manipulation, machine learning, scripting, and automation scenarios. Building these solutions using serverless Azure Functions can take away the burden of managing the underlying infrastructure, so you can move fast and actually focus on the differentiating business logic of your applications. Read this post for details about the newly announced features and dev experiences for Python Functions.

Additional technical content

Kubernetes Pod Security 101
Flipping the static site switch for Azure Blob Storage programmatically
How to Launch a Dockerized Node.js App Using the Azure Web App for Containers Service

Azure shows

Episode 258 – Live from KubeCon 2018 | The Azure Podcast

We are live at KubeCon + CloudNativeCon in Seattle, where Microsoft, together with the who's-who of the tech world, is talking about Kubernetes. We were very fortunate to get Lachie Evenson, Principal PM on the Azure team; Tommy Falgout, a Cloud Solution Architect; and Daniel Selman, a Kubernetes Consultant, together in a room to discuss the current state of Kubernetes and AKS.


How to get started with Docker and Azure | Azure Tips and Tricks

Learn how you can get started using Docker and Azure. To get started with Docker, make sure you have the Docker desktop application installed on your local dev machine.

How to deploy an image classification model using Azure services

Learn how to deploy an image classification model using Azure Machine Learning service. In this tutorial, you'll use Azure Machine Learning service to set up your testing environment, retrieve the model from your workspace, and test the model locally. You’ll then see how to deploy the model to Azure Container Instances (ACI) and to Azure Kubernetes Service (AKS), and test the deployed model.

Decentralized Identity and Blockchain | Block Talk

This video introduces the concept of decentralized identity and how blockchain enables hosting these identities in a decentralized fashion. The demo provides a walkthrough of a decentralized identity that is anchored on Ethereum blockchain and is consumed using uPort application.

Running AI on IoT microcontroller devices with ELL | The IoT Show

How about designing and deploying intelligent machine-learned models onto resource constrained platforms and small single-board computers, like Raspberry Pi, Arduino, and micro:bit? How interesting would that be? This is exactly what the open source Embedded Learning Library (ELL) project is about. The deployed models run locally, without requiring a network connection and without relying on servers in the cloud. ELL is an early preview of the embedded AI and machine learning technologies developed at Microsoft Research. Chris Lovett from Microsoft Research gives us a fantastic demo of the project in this episode of the IoT Show.

AzureIoT TypeEdge: a strongly-typed development experience for Azure IoT Edge | The IoT Show

Are you excited about Azure IoT Edge? Then you are going to love TypeEdge because it simplifies the IoT Edge development down to a simple F5 experience. Watch how you can now create a complete Azure IoT Edge application from scratch in your favorite development environment, in just a few minutes.

LearnAI: Adding Bing Search to Bots | AI Show

The LearnAI team has updated the Azure Cognitive Services Bootcamp! Tune in to get an overview of the changes and a walk through of how you can add Bing Search, LUIS, and Azure Search to bots via the Bot Framework SDK V4.

LearnAI: LUIS – Notes from the Field | AI Show

Anna Thomas has been collecting notes for the past two years from field members (internal and external) who have developed complex LUIS models. In this video, we'll explore some of the limitations or challenges that are faced when you try to deploy enterprise-ready LUIS models at scale, and how they can be addressed.

Jeremy Epling on Azure Pipelines – Episode 014 | The Azure DevOps Podcast

Jeffrey Palermo is joined by Jeremy Epling, Head of Product for Azure Pipelines and a Principal Group Program Manager at Microsoft. He has been a leader at Microsoft for over 15 years in various roles. There’s a lot going on in the DevOps space with Azure right now — and in particular, with Azure Pipelines. Jeremy is incredibly passionate about the current progress being made and is excited to discuss all the new features coming to Pipelines in today’s episode!


Customers, partners, and industries

Cloud Commercial Communities webinar and podcast update

Check out the Cloud Commercial Communities monthly webinar and podcast update, which provides a comprehensive list of forthcoming (three scheduled for today) and on-demand content. Each month the Industry Experiences team focuses on core programs, updates, trends, and technologies that Microsoft partners and customers need to know to increase success using Azure and Dynamics.


An Azure Function orchestrates a real-time, serverless, big data pipeline

Although it’s not a typical use case for Azure Functions, a single Azure function is all it took to fully implement an end-to-end, real-time, mission-critical data pipeline for a fraud detection scenario. The solution was built on an architectural pattern common for big data analytic pipelines, with massive volumes of real-time data ingested into a cloud service where a series of data transformation activities provided input for a machine learning model to deliver predictions. Kate Baroni, Software Architect at Microsoft Azure, provides an overview of the solution, which is covered in the Mobile Bank Fraud Solution Guide with details on the architecture and implementation.

Extracting insights from IoT data using the warm path data flow

If you are responsible for the machines on a factory floor, you are already aware that the Internet of Things (IoT) is the next step in improving your processes and results. Having sensors on machines, or on the factory floor, is the first step. The next step is to use the data. In this post, Ercenk Keresteci, Principal Solutions Architect, Industry Experiences, highlights another scenario from the Extracting Insights from IoT solution guide, which provides a technical overview of the components needed to extract actionable insights from IoT data analytics. This post covers the speed layer (warm path), which analyzes data in real time. This layer is designed for low latency, at the expense of accuracy. It is a faster-processing pipeline that archives and displays incoming messages and analyzes these records, generating short-term critical information and actions such as alarms.

Extracting insights from IoT data using the cold path data flow

In a further exploration of the guide described above, this post covers the batch and serving layers (cold path), which stores all incoming data in its raw form and performs batch processing on the data. The result of this processing is stored as a batch view. It is a slow-processing pipeline, executing complex analysis, combining data from multiple sources over a longer period (such as hours or days), and generating new information such as reports and machine learning models.

How smart buildings can help combat climate change

Fast-paced urbanization offers an exciting opportunity to immediately reduce climate impacts. Because buildings—office complexes, multifamily housing, hotels, stores, schools, hospitals, and malls, among others—comprise a big part of city infrastructure, making them smarter can dramatically lower the energy and carbon footprint of a city. Read this post to learn how connected building technology can manage lighting, heating, and cooling, reducing unnecessary use while maximizing usability and comfort. In addition, you will learn how smart building software can schedule preventive maintenance, automatically identify and prioritize issues for resolution by cost and impact, and continually optimize buildings for comfort and energy efficiency.

Creating a smart grid with technology and people

Utilities and their partners are searching for new solutions that can meet 21st-century energy challenges: surging demand for electricity, two-way energy flow, increased use of clean energy sources, and stairstep approaches to creating a smart grid to tackle the thorniest challenges first. This post provides a look at the digital transformation of the power and utilities industry that is picking up steam. In the very near future, power generation companies will have greater options in how they run their businesses, using IoT-enabled insights to strategically stairstep their way to creating a smart grid and ensure business continuity.

Azure Marketplace new offers – Volume 26

The Azure Marketplace is the premier destination for all your software needs – certified and optimized to run on Azure. Find, try, purchase, and provision applications & services from hundreds of leading software providers. You can also connect with Gold and Silver Microsoft Cloud Competency partners to help your adoption of Azure. During September and October, 149 new consulting offers successfully met the onboarding criteria and went live.

Azure Marketplace new offers – Volume 27

The Azure Marketplace is the premier destination for all your software needs – certified and optimized to run on Azure. Find, try, purchase, and provision applications & services from hundreds of leading software providers. You can also connect with Gold and Silver Microsoft Cloud Competency partners to help your adoption of Azure. From November 1 to November 16, 2018, 61 new offers successfully met the onboarding criteria and went live.

A Cloud Guru's Azure This Week – 14 December 2018

This time on Azure This Week, Lars talks about Azure Machine Learning service now in general availability, Business Critical service tier in Azure SQL Database Managed Instance in general availability, Azure Cosmos DB .NET SDK V3.0 in public preview and a new Azure API Management tier for serverless architectures.


Fine-tune natural language processing models using Azure Machine Learning service

In the natural language processing (NLP) domain, pre-trained language representations have traditionally been a key topic for a few important use cases, such as named entity recognition (Sang and Meulder, 2003), question answering (Rajpurkar et al., 2016), and syntactic parsing (McClosky et al., 2010).

The intuition for utilizing a pre-trained model is simple: a deep neural network that is trained on a large corpus, say all of the Wikipedia data, should have enough knowledge about the underlying relationships between different words and sentences. It should also be easily adaptable to a different domain, such as the medical or financial domain, with better performance than training from scratch.

Recently, Devlin et al. published the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," which achieves new state-of-the-art results on 11 NLP tasks using the pre-trained approach mentioned above. In this technical blog post, we want to show how customers can efficiently and easily fine-tune BERT for their custom applications using the Azure Machine Learning service. We have open-sourced the code on GitHub.

Intuition behind BERT

The intuition behind the new language model, BERT, is simple yet powerful: a large enough deep neural network, trained on a large enough corpus, can capture the relationships within that corpus. In the NLP domain it is hard to get a large annotated corpus, so researchers used a novel technique to generate a lot of training data. Instead of having humans label the corpus and feed it into neural networks, researchers use large corpora already available on the Internet: BookCorpus (Zhu, Kiros et al.) and English Wikipedia (800M and 2,500M words, respectively). Two approaches, each targeting a different language task, are used to generate the labels for the language model.

Masked language model: for understanding the relationships between words. The key idea is to mask some of the words in a sentence (around 15 percent) and use the masked words as labels, forcing the model to learn the relationships between words. For example, the original sentence would be:

The man went to the store. He bought a gallon of milk.

And the input/label pair to the language model is:

Input: The man went to the [MASK1]. He bought a [MASK2] of milk.
Labels: [MASK1] = store; [MASK2] = gallon
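As an illustrative sketch of this step (a hypothetical helper, not the actual BERT preprocessing code, which also sometimes keeps or randomly swaps a chosen token instead of masking it), generating such input/label pairs could look like this:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Sketch of masked-LM data generation: hide ~15 percent of the
    tokens behind [MASK] and keep the originals as labels, keyed by
    position. (Simplified relative to the real BERT pipeline.)"""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked, labels = list(tokens), {}
    for i in positions:
        labels[i] = tokens[i]       # the hidden word becomes the label
        masked[i] = "[MASK]"
    return masked, labels

tokens = "The man went to the store . He bought a gallon of milk .".split()
masked, labels = mask_tokens(tokens)  # two of the 14 tokens get masked
```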

Sentence prediction task: for understanding the relationships between sentences. This task asks the model to predict whether sentence B is likely to be the next sentence following a given sentence A. Using the same example as above, we can generate training data like:

Sentence A: The man went to the store.
Sentence B: He bought a gallon of milk.
Label: IsNextSentence
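A minimal sketch of how such sentence-pair training data could be generated (again a hypothetical helper, not the actual BERT preprocessing code):

```python
import random

def make_nsp_examples(sentences, seed=0):
    """Sketch of next-sentence-prediction data generation: pair each
    sentence with its true successor half the time (IsNextSentence)
    and with a randomly chosen sentence otherwise (NotNextSentence)."""
    rng = random.Random(seed)
    examples = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            examples.append((sentences[i], sentences[i + 1], "IsNextSentence"))
        else:
            examples.append((sentences[i], rng.choice(sentences), "NotNextSentence"))
    return examples

corpus = ["The man went to the store.",
          "He bought a gallon of milk.",
          "It started to rain outside."]
pairs = make_nsp_examples(corpus)  # one labeled pair per adjacent sentence
```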

Applying BERT to customized dataset

After BERT is trained on a large corpus (say, all the available English Wikipedia) using the above steps, the assumption is that, because the dataset is huge, the model has absorbed a lot of knowledge about the English language. The next step is to fine-tune the model on different tasks, so that it can adapt to a new domain more quickly. The key idea is to take the large pre-trained BERT model and add different input/output layers for different types of tasks. For example, you might want to do sentiment analysis for a customer support department. This is a classification problem, so you would add an output classification layer (shown on the left in the figure below) and structure your input accordingly. For a different task, say question answering, you would use a different input/output layer, where the input is the question and its corresponding paragraph, and the output is the start/end answer span for the question (see the figure on the right). BERT is designed so that data scientists can plug in different layers easily, letting it be adapted to many tasks.

Figure 1. Adapting BERT for different tasks (Source)
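To make the classification case concrete, here is a NumPy sketch (shapes and values are purely illustrative stand-ins, not BERT's actual weights) of such a task-specific head: a single linear layer plus softmax applied to the pooled representation:

```python
import numpy as np

def classification_head(pooled_output, weights, bias):
    """Task-specific output layer for classification: a linear map
    from the pooled [CLS] representation to per-label logits,
    followed by a numerically stable softmax."""
    logits = pooled_output @ weights.T + bias             # [batch, num_labels]
    shifted = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return shifted / shifted.sum(axis=-1, keepdims=True)  # probabilities

hidden_size, num_labels = 768, 2
rng = np.random.default_rng(0)
pooled = rng.normal(size=(4, hidden_size))                # stand-in for BERT output
weights = rng.normal(size=(num_labels, hidden_size)) * 0.02
bias = np.zeros(num_labels)
probs = classification_head(pooled, weights, bias)        # each row sums to 1
```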

The image below shows the result on one of the most popular datasets in the NLP field, the Stanford Question Answering Dataset (SQuAD).

Figure 2. Reported BERT performance on SQuAD 1.1 dataset (Source).

Depending on the specific task types, you might need to add very different input/output layer combinations. In the GitHub repository, we demonstrated two tasks, General Language Understanding Evaluation (GLUE) (Wang et al., 2018) and Stanford Question Answering Dataset (SQuAD) (Rajpurkar and Jia et al., 2018).

Using the Azure Machine Learning Service

We are going to demonstrate different experiments on different datasets. In addition to tuning different hyperparameters for various use cases, the Azure Machine Learning service can be used to manage the entire lifecycle of the experiments. It provides an end-to-end cloud-based machine learning environment, so customers can develop, train, test, deploy, manage, and track machine learning models, as shown below. It also has full support for open-source technologies such as PyTorch and TensorFlow, which we will be using later.

Figure 3. Azure Machine Learning Service Overview

What is in the notebook

Defining the right model for specific task

To fine-tune the BERT model, the first step is to define the right input and output layers. In the GLUE example, the task is defined as classification, and the code snippet below shows how to create a language classification model using the BERT pre-trained model:

model = modeling.BertModel(
    config=bert_config,
    is_training=is_training,
    input_ids=input_ids,
    input_mask=input_mask,
    token_type_ids=segment_ids,
    use_one_hot_embeddings=use_one_hot_embeddings)

# The pooled [CLS] representation feeds the classification layer.
# output_weights ([num_labels, hidden_size]) and output_bias
# ([num_labels]) are trainable variables created alongside the model
# (see run_classifier.py in the BERT repository).
output_layer = model.get_pooled_output()

logits = tf.matmul(output_layer, output_weights, transpose_b=True)
logits = tf.nn.bias_add(logits, output_bias)
probabilities = tf.nn.softmax(logits, axis=-1)
log_probs = tf.nn.log_softmax(logits, axis=-1)

# Cross-entropy loss against the one-hot labels
one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)
per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
loss = tf.reduce_mean(per_example_loss)

Set up training environment using Azure Machine Learning service

Depending on the size of the dataset, training the model on the actual dataset might be time-consuming. Azure Machine Learning Compute provides access to GPUs, on a single node or across multiple nodes, to accelerate the training process. Creating a cluster with one or more nodes on Azure Machine Learning Compute is straightforward:

from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.train.dnn import PyTorch

compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC24s_v3',
                                                       min_nodes=0,
                                                       max_nodes=8)

# create the cluster
gpu_compute_target = ComputeTarget.create(ws, gpu_cluster_name, compute_config)
gpu_compute_target.wait_for_completion(show_output=True)

estimator = PyTorch(source_directory=project_folder,
                    compute_target=gpu_compute_target,
                    script_params={…},
                    entry_script='run_squad.azureml.py',
                    conda_packages=['tensorflow', 'boto3', 'tqdm'],
                    node_count=node_count,
                    process_count_per_node=process_count_per_node,
                    distributed_backend='mpi',
                    use_gpu=True)

Azure Machine Learning greatly simplifies the work involved in setting up and running a distributed training job. As you can see, scaling the job to multiple workers is done just by changing the number of nodes in the configuration and providing a distributed backend. For distributed backends, Azure Machine Learning supports popular frameworks such as the TensorFlow parameter server as well as MPI with Horovod, and it ties in with Azure hardware such as InfiniBand to connect the worker nodes for optimal performance. We will have a follow-up blog post on how to use the distributed training capability of the Azure Machine Learning service to fine-tune NLP models.

For more information on how to create and set up compute targets for model training, please visit our documentation.

Hyperparameter Tuning

For a given customer’s specific use case, model performance depends heavily on the hyperparameter values selected. Hyperparameters can span a big search space, and exploring each option can be very expensive. The Azure Machine Learning service provides hyperparameter tuning capabilities that search across various hyperparameter configurations to find the one that yields the best performance.

In the provided example, random sampling is used, in which case hyperparameter values are randomly selected from the defined search space. In the example below, we explore learning rates between 1e-6 and 1e-4 in a log-uniform manner, so each order of magnitude in that range is sampled roughly equally often (with six samples, roughly two values each near 1e-4, 1e-5, and 1e-6).
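Plain Python can illustrate what log-uniform sampling does (a sketch of the idea, not Azure ML's implementation): draw uniformly in log space, then exponentiate.

```python
import math
import random

def loguniform_sample(low, high, n, seed=0):
    """Draw n values log-uniformly between low and high: uniform in
    log space, so each decade is hit roughly equally often."""
    rng = random.Random(seed)
    lo, hi = math.log(low), math.log(high)
    return [math.exp(rng.uniform(lo, hi)) for _ in range(n)]

samples = loguniform_sample(1e-6, 1e-4, 6)  # all within [1e-6, 1e-4]
```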

Customers can also select which metric to optimize. Validation loss, accuracy score, and F1 score are some popular metrics that could be selected for optimization.

import math
from azureml.train.hyperdrive import (HyperDriveRunConfig, PrimaryMetricGoal,
                                      RandomParameterSampling, loguniform)

param_sampling = RandomParameterSampling({
    'learning_rate': loguniform(math.log(1e-4), math.log(1e-6)),
})

hyperdrive_run_config = HyperDriveRunConfig(
    estimator=estimator,
    hyperparameter_sampling=param_sampling,
    primary_metric_name='f1',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=16,
    max_concurrent_runs=4)

For each experiment, customers can watch the progress of the different hyperparameter combinations. For example, the picture below shows the mean loss over time for different hyperparameter combinations. Some experiments can be terminated early if the training loss doesn’t meet expectations (like the top red curve).

Figure 4. Mean loss for training data for different runs, as well as early termination

For more information on how to use the Azure ML’s automated hyperparameter tuning feature, please visit our documentation on tuning hyperparameters. And for how to track all the experiments, please visit the documentation on how to track experiments and metrics.
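The core idea behind such early termination is simple and can be sketched in plain Python (a bandit-style check with hypothetical names, not the Azure ML policy implementation): stop any run whose metric trails the best run so far by more than an allowed slack.

```python
def should_terminate(run_loss, best_loss, slack_factor=0.1):
    """Bandit-style early-termination check: terminate a run whose
    current loss is worse than the best run so far by more than the
    allowed slack factor."""
    return run_loss > best_loss * (1 + slack_factor)

# A run 50 percent worse than the best gets cut;
# one within 10 percent of the best survives.
assert should_terminate(1.5, 1.0)
assert not should_terminate(1.05, 1.0)
```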

Visualizing the result

Using the Azure Machine Learning service, customers can achieve 85 percent evaluation accuracy when fine-tuning on the MRPC task in the GLUE dataset (3 epochs with the BERT base model), which is close to the state-of-the-art result. Using multiple GPUs shortens training time, and more powerful GPUs (say, V100s) shorten it further. The details for one specific experiment are as follows:

 

GPU#                 1              2              4
K80 (NC Family)      191 s/epoch    105 s/epoch    60 s/epoch
V100 (NCv3 Family)   36 s/epoch     22 s/epoch     13 s/epoch

Table 1. Training time per epoch for MRPC in the GLUE dataset

For SQuAD 1.1, customers can achieve around an 88.3 F1 score and an 81.2 Exact Match (EM) score. This requires 2 epochs with the BERT base model, and the time per epoch is shown below:

 

GPU#                 1                2                4
K80 (NC Family)      16,020 s/epoch   8,820 s/epoch    4,020 s/epoch
V100 (NCv3 Family)   2,940 s/epoch    1,393 s/epoch    735 s/epoch

Table 2. Training time per epoch for the SQuAD dataset
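As a quick sanity check on the SQuAD timings above, the parallel speedups they imply can be computed directly (illustrative arithmetic only):

```python
# Per-epoch training times in seconds, taken from Table 2 (SQuAD)
k80_times  = {1: 16020, 2: 8820, 4: 4020}
v100_times = {1: 2940, 2: 1393, 4: 735}

def speedups(times):
    """Speedup of each configuration relative to the single-GPU time."""
    base = times[1]
    return {gpus: round(base / t, 2) for gpus, t in times.items()}

k80_speedup = speedups(k80_times)    # 4 K80s: just under 4x
v100_speedup = speedups(v100_times)  # 4 V100s: exactly 4x (2940 / 735)
```

Both GPU families scale nearly linearly to 4 GPUs on this workload.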

After all the experiments are done, the Azure Machine Learning service SDK also provides a summary visualization of the selected metrics and the corresponding hyperparameter(s). Below is an example of how the learning rate affects validation loss. Across the experiments, the learning rate varied from around 7e-6 (far left) to around 1e-3 (far right), and the best learning rate, with the lowest validation loss, is around 3.1e-4. This chart can also be used to evaluate other metrics that customers want to optimize.

Figure 5. Learning rate versus validation loss

Summary

In this blog post, we showed how customers can fine-tune BERT easily using the Azure Machine Learning service, covering topics such as distributed training and tuning hyperparameters for the corresponding dataset. We also showed some preliminary results demonstrating how to use the Azure Machine Learning service to fine-tune NLP models. All the code is available in the GitHub repository. Please let us know if you have any questions or comments by raising an issue in the GitHub repo.

References

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding and its GitHub site.

Visit the Azure Machine Learning service homepage today to get started with your free-trial.
Learn more about Azure Machine Learning service.

Source: Azure