Extracting insights from IoT data using the cold path data flow

This blog continues our coverage of the solution guide published by Microsoft’s Industry Experiences team. The guide covers the following components:

Ingesting data
Hot path processing
Cold path processing
Analytics clients

We already covered the recommendation for processing data for an IoT application in the solution guide and suggested using a Lambda architecture for the data flow. To reiterate the data paths:

A batch layer (cold path) stores all incoming data in its raw form and performs batch processing on the data. The result of this processing is stored as a batch view. It is a slow-processing pipeline, executing complex analysis, combining data from multiple sources over a longer period (such as hours or days), and generating new information such as reports and machine learning models.
A speed layer and a serving layer (warm path) analyze data in real time. This layer is designed for low latency, at the expense of accuracy. It is a faster-processing pipeline that archives and displays incoming messages and analyzes these records, generating short-term critical information and actions such as alarms.

This blog post covers the cold path processing components of the solution guide.

We have covered time series analysis with Azure Time Series Insights (TSI) in detail in the solution guide. TSI is an analytics, storage, and visualization service for time series data. Please read the relevant section for the use of TSI.

As you may remember from previous blog posts, we are using the sample data published by the NIST SMS Test Bed endpoint. Our previous posts ended with the data pushed to separate Azure Event Hubs for “events” and “samples” data records.

Before we begin the rest of the discussion, we would like to emphasize that the solution of an “analytics” problem is dependent on each plant, line, machine, and so on. The data must be available and be what the business needs. We will cover two different approaches for organizing the data, but they are not exhaustive, and are meant as examples only.

Storing the raw data

Our sample implementation has a basic set of Azure Stream Analytics (ASA) queries that take the incoming data stream from the Event Hubs the raw data is posted to and copy it into Azure Storage blobs and tables. As an example, the queries look like the following:

SELECT
*
INTO
[samplesTable]
FROM
[EventHubIn]

One table is for samples and another is for events. Because we flatten the incoming data in the custom component, we added a property for the hour window each incoming record falls in, using the following C# code snippet; this helps us organize the data more easily in the processing pipelines:

HourWindow = new DateTime(
    sample.timestamp.Year,
    sample.timestamp.Month,
    sample.timestamp.Day,
    sample.timestamp.Hour,
    0,
    0),

This data record field is especially useful for organizing the records in the Azure Storage table, simply by using it as the partition key. We use the sequence number of the incoming record as the row key. The object model for the storage tables is covered in the documentation, “Understanding the Table Service Data Model.” Please also see the documentation, “Designing a Scalable Partitioning Strategy for Azure Table Storage,” for recommendations on storage table design.
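As a sketch of how those two keys might be derived from a record's timestamp and sequence number (field names are illustrative, not the actual component code; the zero-padding keeps row keys in lexical order):

```python
from datetime import datetime

def table_keys(timestamp, sequence):
    """Derive storage-table keys: the hour window as PartitionKey and the
    zero-padded sequence number as RowKey (names are illustrative)."""
    hour_window = timestamp.replace(minute=0, second=0, microsecond=0)
    return hour_window.isoformat(), "{:012d}".format(sequence)

print(table_keys(datetime(2018, 11, 5, 14, 37, 22), 42))
# → ('2018-11-05T14:00:00', '000000000042')
```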

The Azure Blob Storage blobs generated by the ASA job are organized into a container per hour, with a single blob holding that hour’s data in comma-separated values (CSV) format. We will be using these in the future for artificial intelligence (AI) needs.

Loading data into Azure SQL Database

We will be covering a basic way to incrementally load the records into an Azure SQL Database and later discuss potential ways to process them further to create new aggregations and summary data.

Our goal is to provide a barebones approach to show how data can flow into data stores and demonstrate the technologies useful for this. Any analytics solution depends heavily on the context and requirements, but we will attempt to provide basic mechanisms to demonstrate the related Azure services.

Azure Data Factory (ADF) is a cloud integration service to compose data storage, movement, and processing services in automated data pipelines. We have a simple ADF pipeline that demonstrates the incremental loading of a table using a storage table as the source.

The pipeline has a lookup activity that performs the following query on the SQL Database:

SELECT
    CONVERT(
        char(30),
        CASE WHEN MAX(SampleTimestamp) IS NULL THEN '1/1/2010 12:00:00 AM'
             ELSE MAX(SampleTimestamp)
        END,
        126) AS LastLoad
FROM [Samples]

The style used in the CONVERT function, 126, formats the timestamp as “yyyy-mm-ddThh:mi:ss.mmm,” which matches the string representation of the partition key value in the storage table. The query returns the timestamp of the last record that was transferred to the SQL database. We can then pass that value to the next activity to query the table storage for the new records.

Next is a “Copy Data” activity, which takes the “LastLoad” value returned by the lookup activity and uses it in the following table query for the source. Please refer to Querying Tables and Entities for details on querying storage tables.

SampleTimestamp gt datetime'@{formatDateTime(activity('LookupSamples').output.FirstRow.LastLoad, 'yyyy-MM-ddTHH:mm:ss.fffZ')}'

Later, this activity maps the storage table columns (properties) to SQL Database table columns. This pipeline is scheduled to run every 15 minutes, thus incrementally loading the destination SQL Database table.
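The watermark pattern that drives this incremental load can be sketched in plain Python (the in-memory "tables" and field names below are hypothetical stand-ins for the storage table source and SQL sink):

```python
from datetime import datetime

# Hypothetical in-memory stand-in for the storage table (source).
source_rows = [
    {"SampleTimestamp": datetime(2018, 1, 1, 10, 0), "SampleValue": 1.0},
    {"SampleTimestamp": datetime(2018, 1, 1, 10, 15), "SampleValue": 2.0},
    {"SampleTimestamp": datetime(2018, 1, 1, 10, 30), "SampleValue": 3.0},
]

def incremental_load(sink, last_load):
    """Append only rows newer than the watermark, mirroring the
    lookup ("LastLoad") + copy activities of the pipeline."""
    new_rows = [r for r in source_rows if r["SampleTimestamp"] > last_load]
    sink.extend(new_rows)
    return len(new_rows)

sql_table = [source_rows[0]]                       # loaded on an earlier run
copied = incremental_load(sql_table, sql_table[-1]["SampleTimestamp"])
print(copied)  # → 2
```

Each scheduled run repeats the same two steps: read the max timestamp already in the sink, then copy only what is newer.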

Processing examples

Further processing the raw data depends on the actual requirements. This section covers two potential approaches for processing and organizing the data to demonstrate the capabilities.

Let’s first start looking at the data we collect to discover the details. Notice that the raw data on the samples table is in the form of name/value pairs. The first query will give us the different sample types recorded by each machine.

SELECT DeviceName, ComponentName, SampleName, COUNT(SampleSequence) AS SampleCount
FROM Samples
GROUP BY DeviceName, ComponentName, SampleName
ORDER BY DeviceName ASC, ComponentName ASC, SampleName ASC, SampleCount DESC

We observe there are eight machines, and each one is sending different sets of sample types. Following is the partial result of the preceding query. We analyzed the result a bit further in Microsoft Excel to give an idea of the relative counts of the samples:

We may conclude that the best way to aggregate and summarize the results is first to organize the results by machine — for example, a raw data table per machine.

We will go step by step to demonstrate the concepts here. Some readers will surely find more optimized ways to implement some queries, but our goal here is to provide clear examples that demonstrate the concepts.

We may wish to process the data further by first transposing the raw data, which is in name/value pairs, as follows:

We can use the following query to create a new table and transpose whole rows. This query assumes that we do not differentiate any of the components and see the machine as a whole:

; WITH Machine08SamplesTransposed AS
(
SELECT * FROM
(
SELECT SampleTimestamp, sampleName, CAST(sampleValue AS NUMERIC(20,3)) AS sampleValueNumeric
FROM Samples
WHERE
DeviceName = 'Machine08' and ISNUMERIC(sampleValue) != 0
) AS S

PIVOT(
MAX(sampleValueNumeric)
FOR SampleName IN ([S2temp],
[Stemp],
[Zabs],
[Zfrt],
[S2load],
[Cfrt],
[total_time],
[Xabs],
[Xload],
[Fact],
[Cload],
[cut_time],
[Zload],
[S2rpm],
[Srpm],
[auto_time],
[Cdeg],
[Xfrt],
[S1load])
) AS PivotTable
)

SELECT * INTO Machine08Samples
FROM Machine08SamplesTransposed

We can bring this query into the ADF pipeline by moving it into a stored procedure with a parameter to query the raw table so that only the latest loaded rows are brought in, and by modifying “SELECT * INTO …” to an “INSERT INTO … SELECT” statement. We recommend relying on stored procedures as much as possible to use SQL database resources efficiently.
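The transposition the PIVOT performs can be sketched in plain Python (timestamps and sample names below are made up for illustration):

```python
def pivot(rows):
    """Turn (timestamp, sample_name, value) rows into one record per
    timestamp with a column per sample name — samples not received at a
    given timestamp simply stay absent (null), as in the PIVOT output."""
    records = {}
    for ts, name, value in rows:
        records.setdefault(ts, {})[name] = value
    return records

print(pivot([("t1", "Stemp", 21.5), ("t1", "Zabs", 0.12), ("t2", "Stemp", 22.0)]))
# → {'t1': {'Stemp': 21.5, 'Zabs': 0.12}, 't2': {'Stemp': 22.0}}
```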

The resulting table looks like the following (some columns removed for brevity).

One way to process this interim data set is to fill in the null values of samples from the last received value, as shown below.

We should emphasize that we are not recommending this solution for every business case and for every sample value. This approach makes sense for the values that are meaningful together. For example, in a certain case, grouping Fact (actual path feed-rate) and Zfrt (Z axis feed-rate) may make sense. However, for another case Xabs (absolute position on X axis) and Zfrt on one record, grouped this way, may not make sense. Grouping of the sample values must be done on a case-by-case basis, depending on the business need.

Another way is to put the individual records into time buckets and apply an aggregate function within each bucket:

Let’s walk through a small example of the first option. In the preceding example, we received V1.1 at t1 and V2.2 at t2. We want to fill in the Sample1 value for t2 with t1’s value, V1.1.

;WITH NonNullRank AS
(
SELECT SampleTimestamp, S2temp, cnt = COUNT(s2temp) OVER (ORDER BY SampleTimestamp)
FROM Machine08Samples
),

WindowsWithNoValues AS
(
SELECT SampleTimestamp, S2temp,
r = ROW_NUMBER() OVER (PARTITION BY cnt ORDER BY SampleTimestamp ASC) - 1
FROM NonNullRank
)

SELECT SampleTimestamp, S2temp,
S2tempWithValues = ISNULL(S2temp, LAG(S2temp, r) OVER (ORDER BY SampleTimestamp ASC))
FROM WindowsWithNoValues

When we dissect the preceding queries, the first common table expression (CTE), NonNullRank, gives us the rank of the non-null values of S2temp sample values among the received data records.

The second CTE, WindowsWithNoValues, gives us windows of samples with the received value at the top, and the order of null values within the windows (column r).

The concluding query fills in the null values using the LAG analytic function by bringing in the received value from the top of the window to the current row.
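The gap-filling logic itself (last observation carried forward) is easy to see in a short Python sketch of the same idea:

```python
def fill_forward(values):
    """Replace each None with the last received value — the same
    gap-filling the LAG-based query performs for S2temp."""
    last, out = None, []
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out

print(fill_forward([21.5, None, None, 22.0, None]))
# → [21.5, 21.5, 21.5, 22.0, 22.0]
```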

The second option we mentioned previously is to group the received values and apply an aggregate function within the group.

;WITH With30SecondBuckets AS
(
SELECT *,
    DATEADD(second, (DATEDIFF(second, '2010-1-1', [SampleTimestamp]) / 30) * 30, '2010-1-1')
        AS [SampleTimestamp30Seconds]
FROM Machine08Samples
)

SELECT SampleTimestamp30Seconds, AVG(S2temp) AS S2tempAverage
FROM With30SecondBuckets
GROUP BY SampleTimestamp30Seconds
ORDER BY SampleTimestamp30Seconds

We can put these queries in a stored procedure to generate new aggregate and summary tables as necessary to be used by the analytics solution.
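The DATEDIFF/DATEADD bucketing arithmetic can also be sketched in plain Python, which makes the epoch-anchored flooring easier to follow (sample readings below are made up):

```python
from collections import defaultdict
from datetime import datetime, timedelta

def bucket_avg(samples, seconds=30):
    """Assign (timestamp, value) pairs to fixed-width buckets anchored at
    an epoch, then average each bucket — the same arithmetic the
    DATEDIFF/DATEADD expression performs in T-SQL."""
    epoch = datetime(2010, 1, 1)
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor the offset from the epoch to a multiple of the bucket width.
        offset = int((ts - epoch).total_seconds()) // seconds * seconds
        buckets[epoch + timedelta(seconds=offset)].append(value)
    return {ts: sum(vs) / len(vs) for ts, vs in sorted(buckets.items())}

readings = [(datetime(2018, 1, 1, 0, 0, 5), 10.0),
            (datetime(2018, 1, 1, 0, 0, 25), 20.0),
            (datetime(2018, 1, 1, 0, 0, 35), 30.0)]
print(bucket_avg(readings))  # buckets at 00:00:00 -> 15.0 and 00:00:30 -> 30.0
```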

We would like to repeat our opening point once more: the solution to an analytics problem depends on the available data and on what the business needs. There may not be one single solution, but Azure provides many technology options for implementing a given solution.

Next steps

Complete the ADF tutorial for transforming the data in the cloud by using a Spark activity and an on-demand Azure HDInsight linked service for a different example.
Get the larger picture for extracting insights from IoT data from the solution guide.

Source: Azure

An Azure Function orchestrates a real-time, serverless, big data pipeline

Although it’s not a typical use case for Azure Functions, a single Azure function is all it took to fully implement an end-to-end, real-time, mission-critical data pipeline for a fraud detection scenario. And it was done with a serverless architecture. Two blogs recently described this use case, “Considering Azure Functions for a serverless data streaming scenario,” and “A fast, serverless, big data pipeline powered by a single Azure Function.”

Pipeline requirements

A large bank wanted to build a solution to detect fraudulent transactions. The solution was built on an architectural pattern common for big data analytic pipelines, with massive volumes of real-time data ingested into a cloud service where a series of data transformation activities provided input for a machine learning model to deliver predictions. Latency and response times are critical in a fraud detection solution, so the pipeline had to be very fast and scalable. End-to-end evaluation of each transaction had to complete and provide a fraud assessment in less than two seconds.

Requirements for the pipeline included the following:

Ability to scale and efficiently process bursts of event activity totaling 8+ million transactions daily.
Daily parsing and processing of 4 million complex JSON files.
Events and transactions had to be processed in sequential order with assurances that duplicates would not be processed.
Reference data and business rules could change dynamically and the pipeline needed to accommodate these updates.
A deployed architecture that could easily integrate with a CI/CD and DevOps process.

Pipeline solution

The pipeline starts and ends with an Azure Function. A single function orchestrates and manages the entire pipeline of activities, including the following:

Consuming, validating, and parsing massive numbers of JSON files.
Invoking a SQL stored procedure to extract data elements from JSON files, with data used to build real-time behavioral profiles for bank accounts and customers, and to generate an analytics feature set.
Invoking a machine learning model to evaluate and score each individual transaction.
Posting the fraud score back to an on-premises API for integration to a case management solution (a separate solution that lets users examine and unblock transactions).

Recommended next steps

If you are designing a real-time, serverless data pipeline and seek the flexibility of coding your own methods for integration with other services, or to deploy through continuous integration, consider using Azure Functions to orchestrate and manage the pipeline.

Read the “Mobile Bank Fraud Solution Guide” to learn details about the architecture and implementation. Read more about the pipeline technology decision and implementation in these two blogs, “Considering Azure Functions for a serverless data streaming scenario,” and “A fast, serverless, big data pipeline powered by a single Azure Function.” We hope you find this helpful and we welcome your feedback.

Deploying Apache Airflow in Azure to build and run data pipelines

Apache Airflow is an open source platform used to author, schedule, and monitor workflows. Airflow overcomes some of the limitations of the cron utility by providing an extensible framework that includes operators, a programmable interface to author jobs, a scalable distributed architecture, and rich tracking and monitoring capabilities. Since its addition to the Apache Software Foundation in 2015, Airflow has seen great adoption by the community for designing and orchestrating ETL pipelines and ML workflows. In Airflow, a workflow is defined as a Directed Acyclic Graph (DAG), which ensures that tasks are executed in an order that respects the dependencies between them.
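That ordering guarantee can be illustrated with a small, plain-Python sketch (this is not the Airflow API; the task names are made up):

```python
def topological_order(dag):
    """Return an execution order in which every task runs only after all
    of its upstream dependencies. `dag` maps task -> upstream tasks."""
    order, done = [], set()

    def visit(task):
        if task in done:
            return
        for upstream in dag.get(task, []):
            visit(upstream)          # run dependencies first
        done.add(task)
        order.append(task)

    for task in dag:
        visit(task)
    return order

# extract -> transform -> load, plus an independent audit task
print(topological_order({"load": ["transform"], "transform": ["extract"],
                         "extract": [], "audit": []}))
# → ['extract', 'transform', 'load', 'audit']
```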

A simplified version of the Airflow architecture is shown below. It consists of a web server that provides the UI, a relational metadata store that can be a MySQL/PostgreSQL database, a persistent volume that stores the DAG files, a scheduler, and worker processes.

The above architecture can be implemented to run in four execution modes, including:

Sequential Executor – This mode is useful for dev/test or demo purposes. It serializes operations and allows only a single task to be executed at a time.
Local Executor – This mode supports parallelization and is suitable for small to medium-sized workloads. It doesn’t support scaling out.
Celery Executor – This is the preferred mode for production deployments and is one of the ways to scale out the number of workers. For this to work, an additional Celery backend (a RabbitMQ or Redis broker) is required for coordination.
Dask Executor – This mode also allows scaling out by leveraging the Dask.distributed library, allowing users to run tasks in a distributed cluster.

The above architecture can be implemented in Azure VMs or by using the managed services in Azure as shown below. For production deployments, we recommend leveraging managed services with built-in high availability and elastic scaling capabilities.

Puckel's Airflow Docker image contains the latest build of Apache Airflow, with automated build and release to the public Docker Hub registry. Azure App Service for Linux is integrated with the public Docker Hub registry and allows you to run the Airflow web app on Linux containers with continuous deployment. Azure App Service also allows multi-container deployments with Docker Compose and Kubernetes, which is useful for the Celery execution mode.

We have developed the Azure QuickStart template, which allows you to quickly deploy and create an Airflow instance in Azure by using Azure App Service and an instance of Azure Database for PostgreSQL as a metadata store.

The QuickStart template automatically downloads and deploys the latest Docker container image from puckel/docker-airflow and initializes the database in Azure Database for PostgreSQL server as shown in the following graphic:

The environment variables for the Airflow docker image can be set using application settings in Azure App Service as shown in the following graphic:

The environment variables used in the deployment are:

AIRFLOW__CORE__SQL_ALCHEMY_CONN – Sets the connection string for the web app to connect to Azure Database for PostgreSQL.
AIRFLOW__CORE__LOAD_EXAMPLES – Set to true to load DAG examples during deployment.

The application setting WEBSITES_ENABLE_APP_SERVICE_STORAGE is set to true, so App Service storage can be used as persistent storage for DAG files, accessible to the scheduler and worker container images.

After it is deployed, you can browse the web server UI on port 8080 to see and monitor the DAG examples as shown in the following graphic:

Next steps

You are now ready to orchestrate and design data pipelines for ETL and machine learning workflows by leveraging the Airflow operators. You can also leverage Airflow for scheduling and monitoring jobs across a fleet of managed databases in Azure by defining the connections as shown below.

If you are looking for an exciting challenge, you can deploy the kube-airflow image with the Celery executor on Azure Kubernetes Service using Helm charts, Azure Database for PostgreSQL, and RabbitMQ. Let us know if you have built it, and we would be happy to link to it from this blog.

Acknowledgements

Special thanks to Mark Bolz and Jim Toland for their contributions to the postings.

Azure.Source – Volume 61

Microsoft Connect(); 2018

On Tuesday, December 4th, Microsoft Connect(); 2018 provided a full day of developer-focused content—including updates on Azure and Visual Studio, keynotes, demos, and real-time coding with experts. Scott Guthrie’s keynote provided all the Azure and Visual Studio news on how you can stay productive and focus on what matters to you.

Guthrie’s keynote was followed by a demo-packed session hosted by Scott Hanselman and friends that provided a full tour on how you can build your apps on Azure using best-in-class developer tools.

The day continued with coding sessions hosted on Twitch and Mixer, where developers joined interactive Q&As to share what they’ve been building with Azure. An additional collection of announcement and programming language-focused videos provided a deeper dive into the topics that were covered. You can access all of this content for on-demand viewing on the Microsoft Connect(); 2018 site.

AI/ML

Azure AI – accelerating the pace of AI adoption for organizations

Eric Boyd, Corporate Vice President, Azure AI, shares several new announcements we made at Microsoft Connect(); to enable organizations to easily apply AI to transform their businesses, which are covered in further detail in the next four blog posts below.

Announcing general availability of Azure Machine Learning service: A look under the hood

Azure Machine Learning service, which contains many advanced capabilities designed to simplify and accelerate the process of building, training, and deploying machine learning models, is now generally available. Automated machine learning enables data scientists of all skill levels to identify suitable algorithms and hyperparameters faster. Support for popular open-source frameworks such as PyTorch, TensorFlow, and scikit-learn allow data scientists to use the tools of their choice. DevOps capabilities for machine learning further improve productivity by enabling experiment tracking and management of models deployed in the cloud and on the edge. All these capabilities can be accessed from any Python environment running anywhere, including data scientists’ workstations.

ONNX Runtime is now open source

The Open Neural Network Exchange (ONNX) Runtime is now open source. ONNX is an open format to represent machine learning models that enables data scientists and developers to use the frameworks and tools that work best for them, including PyTorch, TensorFlow, scikit-learn, and more. ONNX Runtime is the first inference engine that fully supports the ONNX specification and delivers an average of 2x in performance gains. You can integrate ONNX Runtime into your code directly from source or from precompiled binaries, but an easy way to operationalize it is to use Azure Machine Learning to deploy a service for your application to call.  

Getting started with Cognitive Services Language Understanding container

Language Understanding is now available as part of the preview of Azure Cognitive Services containers. With container support for Language Understanding, you can easily and quickly add cognitive capabilities — such as object detection, vision recognition, and language understanding — into your apps without having deep data science skills. Cognitive Services containers enable customers to build one application architecture that is optimized to take advantage of both robust cloud capabilities and edge locality.

New automated machine learning capabilities in Azure Machine Learning service

As part of Azure Machine Learning service general availability, we are excited to announce the new automated machine learning (automated ML) capabilities. Automated ML enables you to automate model selection and hyperparameter tuning, reducing the time it takes to build machine learning models from weeks or months to days, freeing up more time to focus on business problems. This blog post highlights the new automated ML capabilities that are available today as part of Azure Machine Learning service and provides a walkthrough of the motivation, underlying technology, and principles behind automated ML.

IoT

New Azure IoT offerings simplify IoT application development

In this roll-up post, Sam George, Director, Azure Internet of Things, announces additional capabilities to bring rich data streams into IoT applications from cloud to edge, as well as solutions to accelerate the IoT development process, which are covered in further detail in the next five blog posts below.

Announcing key industrial IoT capabilities in Azure Time Series Insights

Azure Time Series Insights is a fully managed end-to-end IoT analytics solution that empowers customers to quickly glean insights from billions of time series measurements generated by IoT devices. To help maximize the value of this data and thereby drive operational intelligence, Microsoft is updating the Time Series Insights offering to support a broad range of industrial IoT data analytics scenarios.

Location intelligence for the enterprise: New pricing tier and SDK updates

Azure Maps is a portfolio of mapping, navigation, and traffic services which uses simple and secure location APIs to add geospatial context to data, and geospatial intelligence to IoT devices. To give customers an enhanced service option, Microsoft is introducing a new S1 pricing tier for production-scale deployments of applications using Azure Maps, to complement the Standard S0 offering.

Azure Stream Analytics on IoT Edge now generally available

Azure Stream Analytics (ASA) brings powerful stream processing capabilities to Azure IoT Edge, enabling developers to process data and respond to conditions locally. ASA on Azure IoT Edge makes it easy to move analytics between edge and cloud, which can be critical in scenarios where customers need low-latency command and control, have limited connectivity to the cloud, have limited bandwidth, or have certain compliance needs.

Azure IoT Remote Monitoring extends operator capabilities to the edge

Continued updates to this solution accelerator make it easier than ever to empower operators managing all facets of their IoT solutions, including managing and deploying Azure IoT Edge components from Azure IoT Remote Monitoring and automatic device management.

Accelerating IoT solution development and testing with Azure IoT Device Simulation

Microsoft is enabling developers to create advanced device simulations to test their IoT solutions in development, simplifying the development of robust and scalable solutions. Watch this episode of The Internet of Things Show, Device Simulation with Azure IoT, to learn about the capabilities of Azure IoT Device Simulation, an open source solution that you can deploy directly to your Azure subscription. And discover how Device simulation helps you build simulated devices that look and behave like the real thing.

Data

Migrate mission critical SQL workloads to the most economical cloud destination

Announces the general availability (GA) of the Business Critical service tier in Azure SQL Database Managed Instance, which is a new deployment option in SQL Database that streamlines the migration of SQL Server workloads from on-premises to the cloud and combines the best of SQL Server with all the benefits of a fully managed database service. In addition, instance-level collations and Azure Active Directory (AAD) logins are now available in preview to simplify SQL Server migrations. We have also extended more of our SQL Database platform features to Managed Instance to ensure a smooth production experience once workloads have been migrated. Transparent Data Encryption (TDE) for Managed Instance now supports Bring Your Own Key (BYOK) scenarios through the integration with Azure Key Vault, Azure’s cloud-based external key management system. Networking security is now significantly strengthened with the introduction of a built-in firewall to provide full inbound protection.

Announcing the general availability of Azure Database for MariaDB

The Azure Database for MariaDB service, which is now generally available, offers an enterprise-ready, fully managed community version of MariaDB with easy lift-and-shift capability. It features open-source compatibility, built-in high availability with a 99.99 percent SLA, elastic scaling for storage and compute resources, and flexible pricing. As a fully integrated service, Azure Database for MariaDB also plugs you into the Microsoft global network of datacenters with unparalleled security and around-the-clock monitoring, including Azure IP Advantage.

Azure Cosmos DB Developer experience updates December 2018

The Azure Cosmos DB team announced new features and improvements for developers, including: .NET SDK Version 3.0 Preview, 25x Lower Entry Point for Database-level throughput, CORS support, Change Feed support for JavaScript SDK, and Cassandra API support in Emulator.

Azure Cosmos DB .NET SDK Version 3.0 now in public preview

Azure Cosmos DB .NET SDK Version 3.0 is now in public preview and open sourced on GitHub. Version 3.0 of the SDK targets .NET Standard 2.0 and features numerous usability and performance improvements, including a new, more intuitive object model and support for streams. The SDK also joins our Java, JavaScript, and Python SDKs in being open source on GitHub. To get started with the new SDK, add our new NuGet package to your project, but you should continue to use Azure Cosmos DB .NET SDK Version 2.x for production workloads.

Azure Cosmos DB now supports Cross-Origin Resource Sharing (CORS)

Azure Cosmos DB now supports the Cross-Origin Resource Sharing (CORS) “allowedOrigins” header for the core SQL API. You can configure this setting via the portal or ARM templates. With CORS support, you can make your web app talk directly to Cosmos DB from the browser using the @azure/cosmos JavaScript library, and get a more responsive, snappy end-user experience by avoiding an extra hop through a middle-tier layer.

Serverless

Azure Functions gets better for Python and JavaScript developers

Python support in Azure Functions is available in public preview and you can publish Python function apps to our new Linux Consumption plan for a serverless, pay-per-execution hosting option. This Linux Consumption hosting option is available for JavaScript function apps as well. To further enable JavaScript developers with the ability to orchestrate serverless workflows programmatically, we are also announcing the general availability of the JavaScript Durable Functions extension to the Azure Functions runtime, now ready to be used on production workloads.

Bringing serverless to Azure Kubernetes Service

Sean McKenna, Principal Program Manager, Azure Cloud Native Services, outlines a series of new capabilities and investments that further demonstrate our commitment to making Kubernetes easier. In addition, Sean covers the donation of the Virtual Kubelet to the Cloud Native Computing Foundation (CNCF) as a sandbox project. We are also adding GPU support to ACI, which will enable a whole new class of compute-intensive applications through AKS virtual nodes.

Announcing Azure API Management for serverless architectures

Azure API Management is a fully managed service that enables customers to publish, secure, transform, maintain, and monitor APIs. API Management handles all the tasks involved in mediating API calls, including request authentication and authorization, rate limit and quota enforcement, request and response transformation, logging and tracing, and API version management. Azure customers can now choose the new Consumption tier when creating a new API Management instance, which is designed and implemented around serverless principles. The Consumption tier helps round out the Azure serverless offering with API Management for serverless applications.

In this episode of Azure Friday, Azure API Management for serverless applications, Mike Budzynski joins Scott Hanselman to explain the serverless properties of the new tier and demonstrate how to build a highly scalable serverless application with Function Apps and expose it with API Management.

DevOps

Linking your GitHub commits with Azure Boards

Aaron Bjork, Principal Group Program Manager, Azure DevOps, explains a new integration between Azure Boards and GitHub. Development teams using GitHub can now take advantage of the rich project management capabilities offered by Azure Boards, including Kanban boards, backlogs, sprint planning tools, queries, and multiple work item types.

New Azure Pipelines announcements – VS Code extension, GitHub Releases, and more

Jeremy Epling, Principal Group Program Manager, Azure DevOps, covers several new features and integrations with Azure Pipelines, two of which were directly inspired by the open source projects using Azure Pipelines to build and release their applications more easily. The Azure Pipelines extension for Visual Studio Code brings syntax highlighting and autocompletion for Azure Pipelines YAML to Visual Studio Code. You can now seamlessly manage GitHub Releases using Azure Pipelines by creating new releases, modifying drafts, or discarding older ones. We also added support in the Azure portal for Azure IoT Edge in the Azure DevOps project workflow to make it easy to get started in those scenarios. Lastly, the ServiceNow Change Management extension enables integration of ServiceNow Change Management with Azure Pipelines.

Cincinnati Children’s has a great app development experience with Azure services

Cincinnati Children’s Hospital Medical Center is well known for the excellence of its care and its commitment to the well-being of the families it serves. To reduce strain for parents and make their lives easier, the hospital created the Caren mobile app using Microsoft Azure and Azure services. Parents turn to Caren for help navigating the hospital, to get answers to their questions, and to help them entertain their children. Using the app lets parents focus more on their families and less on stressful details. Learn more about this family-friendly app and how the team built it.

Now in preview

Azure Data Lake Storage Gen2 preview – More features, more performance, better availability

Announced in limited public preview in June, Azure Data Lake Storage Gen2 is now fully open to all customers of Azure in all public and sovereign Azure regions. In addition, in response to customer feedback and requests, several new features are now available (for example, Azure Storage Explorer support and the ability to mount an ADLS Gen2 filesystem into the Databricks File System).

Also available in preview

API Management bring your own cache is in preview
Azure Container Instances (ACI) GPU support public preview
Public preview: Virtual network service endpoints for Azure Database for MariaDB 
Serverless Community Library
Public preview: JavaScript and page contracts in Azure Active Directory B2C policies
Paginated reports in Power BI is in preview
API Management flexible subscriptions is in preview

Now generally available

General availability of VNet Service Endpoints for Azure SQL Data Warehouse

Virtual Network (VNet) Service Endpoints for Azure SQL Data Warehouse is now generally available in all Azure regions. Azure SQL Data Warehouse is a fast, flexible, and secure cloud data warehouse tuned for running complex queries fast and across petabytes of data. VNet Service Endpoints enable you to isolate connectivity to your logical server from a given subnet or set of subnets within your virtual network.

Disaster recovery of zone pinned Azure Virtual Machines to another region

Azure Virtual Machines now support disaster recovery of virtual machines deployed in Availability Zones to another region using Azure Site Recovery (ASR). You can now replicate and failover zone pinned virtual machines to other regions within a geographic cluster using Azure Site Recovery. This new capability is generally available in all regions supporting Availability Zones. Along with Availability Sets and Availability Zones, Azure Site Recovery completes the resiliency continuum for applications running on Azure Virtual Machines. Azure Availability Zones are unique fault-isolated physical locations, within an Azure region, with independent power, network, and cooling. Each Availability Zone is comprised of one or more datacenters and houses infrastructure to support highly available, mission critical applications with fault tolerance to datacenter failures.

Also generally available

Azure Cosmos DB new shared database offer now available
Custom Translation capability of Text Translator
JavaScript support for Durable Functions is now available
Now available: Azure Service Fabric Mesh fall refresh
Now available: Azure Service Fabric runtime version 6.4 & SDK updates
S1 Tier Now Available 
General availability: Increased compute in Azure Database for MySQL servers
General availability: Increased compute in Azure Database for PostgreSQL servers
Azure SQL Database Managed Instance—Business-critical service tier now available

News and updates

Microsoft Azure portal December 2018 update

This month brings updates to help improve the usability of creating and managing virtual machines in Azure, updates to Security Center, and an improved experience to control access to resources.

SONiC: Global support and updates

Microsoft and the community are developing, refining, and making SONiC (Software for Open Networking in the Cloud), our open switch OS, freely available to anyone running global-scale or cloud-type networks, or to anyone with a healthy interest in advanced networking. This post takes a look at several enhancements and updates that demonstrate SONiC’s powerful ability to run diagnostics, prevent network failures, and provide fast and flexible telemetry.

Azure SQL Data Warehouse Gen2 now supports lower compute tiers

Now you can experience Azure SQL Data Warehouse’s leading performance, flexibility, and security features starting with 100 cDWU (Data Warehouse Units) and scale to 30,000 cDWU in minutes. Starting mid-December 2018, customers can benefit from Gen2 performance and flexibility with lower compute tiers in 15 regions, with remaining regions available during 2019.

An easy way to bring back your Azure VM with In-Place restore

In-place restore of disks in IaaS VMs, along with simplified restore improvements, is now available in Azure Backup. This feature helps roll back or fix corrupted virtual machines through in-place restore, without the need to spin up a new VM. With the introduction of this feature, customers have multiple choices for IaaS VM restore: create a new VM, restore disks, or replace disks.

Azure IoT Java SDK provides improved Android support

Transforming mobile devices into Internet of Things (IoT) devices has been gaining traction in the IoT space. Mobile devices have a wide range of sensors and a big screen, but most importantly, they are ubiquitous and they are getting cheaper. After releasing native iOS support in April 2018, we have been improving our support for Android. The Azure IoT Java SDK has improved testing on the Android platform, as well as new samples and quickstarts.

Additional updates

Deploy Service Fabric Ubuntu clusters on-premise using Bosh
Azure Container Service Will Retire on January 31, 2020
Azure Backup simplifies data protection and protects against ransomware.
Application Insights is available in UK South


Technical content

Modernize your Java Spring Boot application with Azure Database for MySQL

Learn about a Spring Boot application that demonstrates how to build and deploy a purchase order microservice as a containerized application on Azure Kubernetes Service (AKS). The Spring Boot framework with a MySQL database backend is one of the established patterns for meeting the online transactional processing needs of business applications. Often, such applications are built and deployed on cloud-native microservice platforms, such as Kubernetes. Since databases have more stateful requirements around atomicity, consistency, durability, resiliency, and zero data loss across failures, you should run them outside of the Kubernetes environment on managed database services that meet these requirements, such as the Azure Database for MySQL service.

Apply the latest Azure Site Recovery capabilities with automatic agent update

Azure Site Recovery has a monthly release cadence in which it enhances existing features or adds new capabilities. When you enable disaster recovery of Azure virtual machines, a component known as the Mobility Service agent gets installed on the VM. Once you enable automatic agent update, the Mobility Service extension gets updated automatically with every new release. You no longer need to plan for deploying new versions with every release, and automatic update doesn't require a reboot of your Azure Virtual Machines, nor does it affect ongoing replication.

Get up to speed with Azure HDInsight: The comprehensive guide

Azure HDInsight is an easy, cost-effective, enterprise-grade service for open source analytics. With HDInsight, you get managed clusters for various Apache big data technologies, such as Spark, MapReduce, Kafka, Hive, HBase, Storm and ML Services backed by a 99.9% SLA. In addition, you can take advantage of HDInsight’s rich ISV application ecosystem to tailor the solution for your specific scenario. This guide covers the resources you need to learn about big data technologies and get started with HDInsight.

Best practices for naming your Microsoft Azure resources

Naming standards form one small part of Azure governance, enabling you to structure and manage your cloud resources in alignment with your organizational requirements. If set up correctly, resource creation, management, monitoring, and cost analysis are made much easier. In this blog post, Sonia Cuff takes you through some of the challenges around naming resources and shares tips for improving your naming strategy.

List of Recommended Resources for Getting Started with Machine Learning on Azure

If you’re curious about using machine learning on Azure but don’t know where to start, this post is for you. Ari Bornstein rounds up a list of invaluable resources—from Azure Free Trial, to a list of docs, open source projects, videos, and tutorials—to help you explore machine learning on Azure.

Azure shows

Episode 257 – Azure Firewall | The Azure Podcast

Yair Tor, a Principal PM on the Azure Networking team, gives us the scoop on the newly released Azure Firewall service. He talks to us about the use cases where it makes sense to use the service, pricing, and future plans.


Azure Container Registry updates for security and reliability | Azure Friday

Curious about what's coming for container lifecycle management in Azure? To help you plan your roadmap, Steve Lasker and Scott Hanselman talk about the Azure Container Registry (ACR) roadmap, including updates for production security, reliability features, and developer security.

Pros and cons of stable and unique tags in Docker image tagging | Azure Friday

Steve Lasker joins Scott Hanselman to talk about best practices for Docker image tagging, and the tradeoffs between stable and unique tagging techniques.

Azure Time Series Insights – end-to-end solution for industrial IoT analytics | Internet of Things Show

To help maximize the value of time series data and drive operational intelligence, Azure Time Series Insights introduces support for a broad range of industrial IoT analytics capabilities. Check out Chandrika Shankarnarayan, Principal PM Manager for Azure Time Series Insights, showing the new capabilities in this space.


Customizing the Web UI has never been easier with our recently released in-code walkthroughs. Follow along as we customize an out-of-the-box solution and make it our own.

An Introduction to the Intelligent Kiosk | AI Show

How to use this sample application to learn cognitive services and test applications that you are infusing with AI.

How to Build a Home Automation Auto-Away Assist with Azure IoT Hub | Azure Makers Series

Get more out of your home automation setup with Azure IoT Hub and Azure Functions. See how you can let your smart thermostat know when you’re in another room (not truly away) using motion sensors, Particle.io, and Azure.

How to work with Azure Logic App using Visual Studio 2017 | Azure Tips and Tricks

Learn how to work with Azure Logic App using Visual Studio 2017. Inside Visual Studio, you can search for the Logic App Visual Studio extension and create a new Logic App.

Jamie Cool on What’s Going On in Azure DevOps – Episode 013 | Azure DevOps Podcast

Jeffrey Palermo is joined by Jamie Cool to discuss what’s going on in Azure DevOps, with a deep dive into what the internal roadmap looks like for Microsoft’s DevOps transformation and a discussion of some of the big shifts Jamie is currently working on. Jamie also talks about what is happening around the GitHub acquisition, where he sees DevOps headed in the future, and gives his advice on what you should keep an eye out for as a Visual Studio developer.


Events

IoT in Action: 4 innovations that are revolutionizing IoT

Artificial intelligence and cognitive abilities, real-time analytics at the intelligent edge, more secure IoT devices, and provisioning IoT quickly at scale are the four innovations that are revolutionizing IoT. To learn more about how you can take advantage of these innovations, be sure to register for an IoT in Action Virtual Bootcamp. Whether you are an engineer, software architect, or practice owner, this virtual bootcamp (taking place in early 2019 around the world) will give you a clear understanding of IoT from device to cloud and accelerate the development of an IoT solution for your business.

Join the Twitter AMA with Azure Integration Services

Azure Integration Services hosted a joint Twitter Ask Me Anything (AMA) session (in practice, an “Ask Us Anything”) for API Management, Logic Apps, Service Bus, and Event Grid on Thursday, December 6, 2018. You can find the questions and replies on Twitter by searching for the hashtag #IntegrationAMA.

Customers, partners and industries

Johnson Controls tackles a $15b building industry problem with Azure Cosmos DB

Modern buildings include multiple systems that handle everything from building management to HVAC to security. To help building operators gather and understand data about their buildings, operations, and occupants, Johnson Controls created Digital Vault to integrate internal and external data sources and present a harmonized view of energy usage, security breaches, fire alarm status, temperature controls, and other building management systems.

How pharma sales operations benefit from centralizing data and process integration

Pharmaceutical companies need to meet demanding sales goals, manage intricate regulatory compliance, and maintain a competitive hold on the market. However, current sales force automation (SFA) solutions for the life sciences industry are focused primarily on sales reps, which leaves a large capability gap for sales operations departments and inhibits their ability to support the sales process. Prescriber360 is a Microsoft Gold Partner with a comprehensive Pharma SalesOps solution designed specifically for the life sciences industry that can reduce, and even close, the capability gap.

Azure obtains automotive industry’s TISAX compliance

Microsoft data centers and operations centers handling Microsoft Azure, Office 365, and Dynamics 365 have been evaluated by independent auditors as meeting the strong security requirements of the Trusted Information Security Assessment Exchange (TISAX). TISAX is used by European automotive companies to provide a common information security assessment for internal assessments, the evaluation of suppliers, and as an information exchange mechanism. See this post to learn which Azure datacenters were assessed at level 2 and 3 of the standard.

A Cloud Guru's Azure This Week – 7 December 2018

This time on Azure This Week, Lars talks about new security features for Azure Functions and Azure Apps, a new Premium Block Blob Storage offering, and a look at how the Azure teams stay on top of security vulnerabilities across the entire platform.

Quelle: Azure

Power BI and Azure Data Services dismantle data silos and unlock insights

Learn how to connect Power BI and Azure Data Services to share data and unlock new insights with a new tutorial. Business analysts who use Power BI dataflows can now share data with data engineers and data scientists, who can leverage the power of Azure Data Services, including Azure Databricks, Azure Machine Learning, Azure SQL Data Warehouse, and Azure Data Factory for advanced analytics and AI.

With the recently announced preview of Power BI dataflows, Power BI has enabled self-service data prep for business analysts. Power BI dataflows can ingest data from a large array of transactional and observational data sources, and cleanse, transform, enrich, schematize, and store the result. Dataflows are reusable and can be refreshed automatically and daisy-chained to create powerful data preparation pipelines. Power BI is now making available support for storing dataflows in Azure Data Lake Storage (ADLS) Gen2, including both the data and dataflow definition. By storing dataflows in Azure Data Lake Storage Gen2, business analysts using Power BI can now collaborate with data engineers and data scientists using Azure Data Services.

Data silos inhibit data sharing

The ability for organizations to extract intelligence from business data provides a key competitive advantage; however, attempting this today can be time consuming and costly. To extract intelligence and create value from data, an application must be able to access the data and understand its structure and meaning. Data often resides in silos that are application- or platform-specific, creating a major data integration and data preparation challenge.

Consistent data and metadata formats enable collaboration

By adopting a consistent way to store and describe data based on the Common Data Model (CDM), Power BI, Azure Data Services and other applications can share and interoperate over data more effectively. Power BI dataflows are stored in ADLS Gen2 as CDM folders. A CDM folder contains a metadata file that describes the entities in the folder, with their attributes and datatypes, and lists the data files for each entity. CDM also defines a set of standard business entities that define additional rich semantics. Mapping the data in a CDM folder to standard CDM entities further facilitates interoperability and data sharing. Microsoft has joined with SAP and Adobe to form an Open Data Initiative to encourage the definition and adoption of standard entities across a range of domains to make it easier for applications and tools to share data through an enterprise Data Lake. 
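To make the idea of a CDM folder concrete, here is a minimal sketch of the kind of metadata file that might describe a single entity and its data files. The entity, attribute, and file names are invented for illustration, and the field layout only approximates the general shape of the CDM folder metadata format, not its authoritative schema.

```python
import json

# Hypothetical minimal metadata for one entity in a CDM folder.
# Names and structure are illustrative, not the official schema.
model = {
    "name": "SalesDataflow",
    "version": "1.0",
    "entities": [
        {
            "$type": "LocalEntity",
            "name": "Customer",
            "attributes": [
                {"name": "customerId", "dataType": "string"},
                {"name": "region", "dataType": "string"},
                {"name": "lifetimeValue", "dataType": "double"},
            ],
            # Each partition points at a data file for the entity.
            "partitions": [{"location": "Customer/part-0000.csv"}],
        }
    ],
}

metadata = json.dumps(model, indent=2)

# A consumer can discover entities and their attributes from the
# metadata alone, without opening any data files.
parsed = json.loads(metadata)
entity_names = [e["name"] for e in parsed["entities"]]
print(entity_names)  # ['Customer']
```

This is what makes the sharing story work: any tool that understands the metadata convention can enumerate entities, attributes, and data files without prior knowledge of the producer.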

By adopting these data storage conventions, data ingested by Power BI, with its already powerful and easy to use data prep features, can now be further enriched and leveraged in Azure. Similarly, data in Azure can be exported into CDM folders and shared with Power BI. 

Azure Data Services enable advanced analytics on shared data

Azure Data Services enable advanced analytics that let you maximize the business value of data stored in CDM folders in the data lake. Data engineers and data scientists can use Azure Databricks and Azure Data Factory dataflows to cleanse and reshape data, ensuring it is accurate and complete. Data from different sources and in different formats can be normalized, reformatted, and merged to optimize the data for analytics processing. Data scientists can use Azure Machine Learning to define and train machine learning models on the data, enabling predictions and recommendations that can be incorporated into BI dashboards and reports, and used in production applications. Data engineers can use Azure Data Factory to combine data from CDM folders with data from across the enterprise to create an historically accurate, curated enterprise-wide view of data in Azure SQL Data Warehouse. At any point, data processed by any Azure Data Service can be written back to new CDM folders, to make the insights created in Azure accessible to Power BI and other CDM-enabled apps or tools.

New tutorial explores data sharing between Power BI and Azure

A tutorial is now available to help you understand how sharing data between Power BI and Azure using CDM folders can break down data silos and unlock new insights. The tutorial, with sample code, shows how to integrate data from Power BI into a modern data warehousing scenario in Azure, and lets you explore each of these data flows hands-on.

In the tutorial, Power BI dataflows are used to ingest key analytics data from the Wide World Importers operational database and store the extracted data with its schema in a CDM folder in ADLS Gen2. You then connect to the CDM folder and process the data using Azure Databricks, formatting and preparing it for later steps, then writing it back to the lake in a new CDM folder. This prepared CDM folder is used by Azure Machine Learning to train and publish an ML model that can be accessed from Power BI or other applications to make real-time predictions. The prepared data is also loaded into staging tables in an Azure SQL Data Warehouse, where it is transformed into a dimensional model. 
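The consuming side of this pipeline can be sketched as follows: a downstream service reads an entity's rows by following the partition list in the CDM folder metadata, applying the schema from the metadata because the partition files carry no header row. The file contents and entity definition here are made up, and the partition "files" are simulated in memory rather than read from ADLS Gen2.

```python
import csv
import io

# Simulated partition files; in the real scenario these live in ADLS Gen2.
files = {
    "Customer/part-0000.csv": "C001,West,1200.50\nC002,East,880.00\n",
}

# Entity definition as it would appear in the CDM folder metadata.
entity = {
    "name": "Customer",
    "attributes": [
        {"name": "customerId"},
        {"name": "region"},
        {"name": "lifetimeValue"},
    ],
    "partitions": [{"location": "Customer/part-0000.csv"}],
}

# The column order comes from the metadata, not from the data files.
columns = [a["name"] for a in entity["attributes"]]

rows = []
for part in entity["partitions"]:
    reader = csv.reader(io.StringIO(files[part["location"]]))
    for record in reader:
        rows.append(dict(zip(columns, record)))

print(rows[0]["region"])  # West
```

The same pattern applies whether the consumer is a Databricks notebook, a Data Factory pipeline, or a custom application: metadata first, then data.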

Azure Data Factory is used to orchestrate the flow of data between the services, as well as to manage and monitor the processing at runtime. By working through the tutorial, you’ll see first-hand how the metadata stored in a CDM folder makes it easier for each service to understand and share data.

Sample code accelerates your data integration projects

The tutorial includes sample code and instructions for the whole scenario. The samples include reusable libraries and code in C#, Python, and Scala, as well as reusable Azure Data Factory pipeline templates, that you can use to integrate CDM folders into your own Azure Data Services projects. 

Summary

Power BI dataflows, the Common Data Model, and Azure Data Services can be used together to break open silos of data in your organization and enable business analysts, data engineers, and data scientists to share data to fuel advanced analytics and unlock new insights that give you a competitive edge.

Next steps

Download the tutorial today and explore how you can share data between Power BI and Azure Data Services.

To learn more about Power BI dataflows, CDM, and Azure Data Services, follow the links below:

Power BI Dataflows
Common Data Model
CDM folders
Azure Data Factory
Azure Databricks
Azure SQL Data Warehouse
Azure Machine Learning
Azure Data Lake Storage Gen 2


Azure HDInsight integration with Data Lake Storage Gen2 preview – ACL and security update

Today we are sharing an update to the Azure HDInsight integration with Azure Data Lake Storage Gen2. This integration enables HDInsight customers to drive analytics from the data stored in Azure Data Lake Storage Gen2, using popular open source frameworks such as Apache Spark, Hive, MapReduce, Kafka, Storm, and HBase, in a secure manner.

Azure Data Lake Storage Gen2

Azure Data Lake Storage Gen2 is the only data lake designed specifically for enterprises to run large scale analytics workloads in the cloud. It unifies the core capabilities from the first generation of Azure Data Lake with a Hadoop compatible file system endpoint now directly integrated into Azure Blob Storage. This enhancement combines the scale and cost benefits of object storage with the reliability and performance typically associated only with on-premises file systems. This new file system includes a full hierarchical namespace that makes files and folders first class citizens, translating to faster, more reliable analytics job execution.

Azure Data Lake Storage Gen2 also offers limitless storage, ensuring capacity to meet the needs of even the largest, most complex workloads. In addition, Azure Data Lake Storage Gen2 delivers native integration with Azure Active Directory and supports POSIX-compliant ACLs to enable granular permission assignments on files and folders.

Key benefits

Hadoop compatible access

Azure Data Lake Storage Gen2 allows you to manage and access data just as you would with a Hadoop Distributed File System (HDFS). The ABFS driver is available within all Apache Hadoop environments. File systems are well understood by developers and users alike. There is no need to learn a new storage paradigm when you move to the cloud as the file system interface exposed by Azure Data Lake Storage Gen2 is the same paradigm used by computers, large and small.
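As a small illustration of the filesystem-style addressing, the ABFS driver refers to paths with URIs of the form `abfss://<filesystem>@<account>.dfs.core.windows.net/<path>`. The helper below just assembles that documented URI shape; the account and filesystem names are invented for the example.

```python
# Sketch: build ABFS URIs of the form used by the ADLS Gen2 driver.
# Account and filesystem names below are hypothetical.
def abfs_uri(filesystem: str, account: str, path: str, secure: bool = True) -> str:
    """Assemble an abfs(s):// URI for an ADLS Gen2 path."""
    scheme = "abfss" if secure else "abfs"
    return f"{scheme}://{filesystem}@{account}.dfs.core.windows.net/{path.lstrip('/')}"

uri = abfs_uri("datalake", "contosoadls", "/raw/events/2018/12/07")
print(uri)
# abfss://datalake@contosoadls.dfs.core.windows.net/raw/events/2018/12/07
```

A Spark or Hive job running on HDInsight can then read and write such paths exactly as it would HDFS paths, with no change to the application logic.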

Role based access control

The security model for Azure Data Lake Storage Gen2 supports ACL and POSIX permissions.

These storage ACL capabilities, along with fine-grained access control via Apache Ranger in HDInsight for applications such as Spark, Kafka, Hive, and HBase, make it very convenient to open up your data lake to the entire organization, with appropriate security controls and auditing in place.

SSL only access

With this update, ADLS Gen2 accounts can be accessed only via the HTTPS protocol, ensuring that only encrypted communication is possible between HDInsight and storage.

Global availability

Azure Data Lake Storage Gen 2 and HDInsight are available across the globe, offering the scale needed to bring big data applications closer to users around the world, preserving data residency, and offering comprehensive compliance and resiliency options for customers.

Atomic directory manipulation

Object stores approximate a directory hierarchy by adopting a convention of embedding slashes (/) in the object name to denote path segments. While this convention works for organizing objects, the convention provides no assistance for actions like moving, renaming, or deleting directories. Without real directories, applications must process potentially millions of individual blobs to achieve directory-level tasks. By contrast, the hierarchical namespace processes these tasks by updating a single entry (the parent directory).

This dramatic optimization is especially significant for many big data analytics frameworks. Tools like Hive and Spark often write output to temporary locations and then rename the location at the conclusion of the job. Without the hierarchical namespace, this rename can often take longer than the analytics process itself. Lower job latency equals lower total cost of ownership (TCO) for analytics workloads.
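The cost difference described above can be sketched with a toy model: on a flat object store, renaming a "directory" means renaming every blob that shares the prefix, while a hierarchical namespace updates a single directory entry. The operation counts here are illustrative, not measurements of either system.

```python
# Toy comparison: directory rename on a flat object store vs. a
# hierarchical namespace. Operation counts are illustrative only.

def rename_flat(store: dict, src: str, dst: str) -> int:
    """Flat namespace: every blob under the prefix is renamed individually."""
    ops = 0
    for name in [n for n in store if n.startswith(src + "/")]:
        store[dst + name[len(src):]] = store.pop(name)
        ops += 1
    return ops

def rename_hierarchical(dirs: dict, src: str, dst: str) -> int:
    """Hierarchical namespace: one update to the directory entry suffices."""
    dirs[dst] = dirs.pop(src)
    return 1

# A Hive/Spark-style job output with 1,000 part files.
blobs = {f"tmp/out/part-{i}": b"" for i in range(1000)}
flat_ops = rename_flat(blobs, "tmp/out", "final/out")

tree = {"tmp/out": [f"part-{i}" for i in range(1000)]}
hns_ops = rename_hierarchical(tree, "tmp/out", "final/out")

print(flat_ops, hns_ops)  # 1000 1
```

At real scale the flat-namespace cost grows with the number of objects, which is why the commit-by-rename pattern used by Hive and Spark benefits so much from a hierarchical namespace.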

Scale

HDInsight and Azure Data Lake Storage Gen2 bring new levels of scale for big data workloads. Customers can run workloads that scale to hundreds of gigabits per second of throughput and petabytes of storage, without needing to shard the data across multiple storage accounts.

Encryption at rest

Encryption in Azure Data Lake Storage Gen2 helps you protect your data, implement enterprise security policies, and meet regulatory compliance requirements. Azure Data Lake Storage Gen 2 supports encryption of data both at rest and in transit.

Network firewall

Integrated network firewall capabilities allow you to define rules restricting access only to requests originating from specified networks or HDInsight clusters in a specific VNET.

How does the integration work?

HDInsight and Azure Data Lake Storage Gen2 integration is based upon user-assigned managed identity. You assign appropriate access to HDInsight with your Azure Data Lake Storage Gen2 accounts. Once configured, your HDInsight cluster is able to use Azure Data Lake Storage Gen2 as its storage.

Getting started

Start using Azure Data Lake Storage Gen2 with Azure HDInsight today.

Feedback

We look forward to your comments and feedback. If there are any feature requests, customer asks, or suggestions, please contact us at askhdinsight@microsoft.com.

Additional resources

Azure Data Lake Storage Gen2 introduction
Hierarchical Namespace concept
Create HDInsight cluster with ADLS Gen2
Learn more about Azure HDInsight.
Read the Open Source component guide on HDInsight.
Review the HDInsight release notes.
Ask HDInsight questions on MSDN forums.
Ask HDInsight questions on StackOverflow.


Azure Data Lake Storage Gen2 preview – More features, more performance, better availability

Since we announced the limited public preview of Azure Data Lake Storage (ADLS) Gen2 in June, the response has been resounding. Customers participating in the ADLS Gen2 preview have directly benefitted from the scale, performance, security, manageability, and cost-effectiveness inherent in the ADLS Gen2 offering. Today, we are very pleased to announce significant updates to the preview that will allow an even greater experience for customers.

Today’s announcements include additional features that preview customers have been asking for:

Enterprise-class security features integrated into Azure Databricks and Azure HDInsight (available shortly)
Azure Storage Explorer support to view and manage data in ADLS Gen2 accounts, including data exploration and access control management
Support for connecting external tables in SQL Data Warehouse, including when Storage Firewalls are active on the account
Power BI and SQL Data Warehouse supporting the Common Data Model for entities stored in ADLS Gen2
Storage Firewall and Virtual Network rules integration for all analytics services
Encryption of data at rest using either Microsoft or customer supplied keys as well as encryption in transit via TLS 1.2
Ability to mount an ADLS Gen2 filesystem into the Databricks File System (DBFS)

Additionally, as of today, the ADLS Gen2 public preview is fully open to all customers of Azure in all public and sovereign Azure regions. Customers may take advantage of the ABFS driver in HDInsight, Databricks, or SQL Data Warehouse, with the Hierarchical Namespace enabled on all new accounts, without any requirement to sign up or be whitelisted.

Providing enterprise-class security for your Data Lake 

As customers create vast enterprise-wide repositories of data for analytics, not only do they need a storage solution that is capable of scaling and performing to meet their increasing demands, they MUST be able to secure this data. There are multiple aspects to securing the rich assets in an enterprise data lake:

Apply permissions so that only authorized users or groups may have access to read or write the data
Encrypt the data at rest (using own or system keys) and in transit to eliminate any possibility of loss of data
Provide transport-layer protections so even in the event of user credentials being compromised, the physical network layer provides protection

In the same manner as ADLS Gen1, ADLS Gen2 now provides both Role Based Access Control (RBAC) and POSIX-compliant Access Control Lists (ACLs) that restrict access to only authorized users, groups, or service principals in a flexible, fine-grained, and manageable manner. Authentication is via Azure Active Directory OAuth 2.0 bearer tokens, which allow for flexible authentication schemes, including federation with AAD Connect and multi-factor authentication, for stronger protection than passwords alone. More significantly, these authentication schemes are now integrated into the main analytics services, including Azure Databricks, HDInsight, and SQL Data Warehouse, as well as management tools such as Azure Storage Explorer. Once authenticated, permissions are applied at the finest granularity to ensure the right level of authorization for protecting an enterprise’s big data assets.
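The POSIX-style ACL model mentioned above can be sketched as a lookup from principal to read/write/execute bits on a file or folder. This is only an illustration of how such evaluation works in principle; the principal names are invented and this is not the actual ADLS Gen2 authorization code.

```python
# Hypothetical POSIX-style ACL for one file: named users and groups get
# explicit rwx bits, with an "other" entry as the fallback.
ACL = {
    ("user", "owner"): "rwx",
    ("user", "analyst1"): "r--",
    ("group", "data-engineers"): "rw-",
    ("other", None): "---",
}

def allowed(principal_type: str, principal, perm: str, acl=ACL) -> bool:
    """Check whether a principal holds a permission bit ('r', 'w', or 'x')."""
    bits = acl.get((principal_type, principal), acl[("other", None)])
    return perm in bits

print(allowed("user", "analyst1", "r"))         # True: analyst1 may read
print(allowed("user", "analyst1", "w"))         # False: no write bit
print(allowed("group", "data-engineers", "w"))  # True: group may write
```

Real POSIX ACL evaluation also involves owner/group precedence and mask entries, but the core idea is the same: fine-grained, per-principal permission bits on each file and folder.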

End-to-end encryption of data and transport layer protections complete the security shield for an enterprise data lake. Given that ADLS Gen2 is built on top of the Azure Blobs service, these existing capabilities that are already trusted by Blobs customers automatically apply to ADLS Gen2 data. The same set of analytics engines and tools are capable of taking advantage of these additional layers of protection, resulting in complete end to end protection of your analytics pipelines.

Your Data Lake is powered by performance

As we’ve discussed many times, the performance of the storage layer has an outsized impact on the total cost of ownership (TCO) for your complete analytics pipeline. This is due to the fact that every percentage point improvement in storage performance results in that same percentage reduction in the requirement for the very expensive compute layer. Given that the disaggregated storage model allows us to scale compute and storage independently, that percentage reduction in compute requirement results in almost the same (compute typically equates to 90 percent of the TCO) reduction in TCO.

So, when I say that ADLS Gen2 provides performance improvements ranging from 10 to 50 percent over existing storage solutions, depending on the nature of the workload, this equates to very significant reductions in the monthly analytics spend. It also has the added benefit of providing your insights sooner!

ADLS Gen2 is priced equivalent to general-purpose object (Blob) storage. However, all of the above performance and security features are now included at that price. This makes ADLS Gen2 the ideal environment to create or migrate your enterprise data lake as you get all of this dedicated functionality at commodity object storage prices.

Data Lakes everywhere

Because ADLS Gen2 is a feature of the Azure Blobs service, it is available in all Azure regions. This is significant for enterprises that want to run their data lakes close to where their employees can benefit from them, without the latency of traveling halfway around the world. Many countries, and therefore enterprises, stipulate sovereignty requirements for where data may reside. Azure already has the largest footprint of any public cloud provider, with regions around the world, and with ADLS Gen2 available in all of those regions, customers can build their data lakes wherever they desire.

Start using Azure Data Lake Storage Gen2

To find out more, you can:

Learn more about Azure Data Lake Storage Gen2.
Watch this video to learn how to get started with Azure Data Lake Storage.
Start using Azure Data Lake Storage for free.

Source: Azure

Disaster recovery of zone-pinned Azure Virtual Machines to another region

Azure Availability Zones are unique fault-isolated physical locations within an Azure region, with independent power, network, and cooling. Each Availability Zone comprises one or more datacenters and houses infrastructure to support highly available, mission-critical applications with fault tolerance to datacenter failures. For IaaS applications running on Azure Virtual Machines, you can build high availability into your business continuity strategy by deploying multiple virtual machines (VMs) across multiple zones within a region. Doing so physically separates the VMs across zones, and a virtual network is created using load balancers at each site. These locations are close enough for high-availability replication, so your applications stay running despite any issues at the physical locations.

On rare occasions, an entire region could become unavailable due to major incidents such as natural disasters. Non-transient, large-scale failures may exceed the ability of high availability (HA) features and require full-fledged disaster recovery (DR). Today, we are announcing support for disaster recovery of virtual machines deployed in Availability Zones to another region using Azure Site Recovery (ASR). You can now replicate and fail over zone-pinned virtual machines to other regions within a geographic cluster using Azure Site Recovery. This new capability is generally available in all regions supporting Availability Zones. Along with Availability Sets and Availability Zones, Azure Site Recovery completes the resiliency continuum for applications running on Azure Virtual Machines.

Configuring DR for your zone-pinned VMs is simple. To get started, browse to your VM, select Disaster Recovery, select the target region of your choice, review the settings, and click Enable replication. If the target region supports availability zones, you can configure your target VM to be deployed as a zone-pinned VM. If not, you can configure the target VM to be deployed as a single instance or in an availability set. Once you fail over to the target region and then fail back to the source region, the original zone configuration in the source region stays intact. This ensures that you can seamlessly fail over to any supported target region for DR, whether or not it supports availability zones, and fail back without any impact on the source region configuration.

This feature is available in all Azure regions where ASR is available. Get started with Azure Site Recovery today.

Related links and additional content

Get started by configuring disaster recovery for Azure VMs.
Learn more about the supported configurations for replicating Azure VMs.
Need help? Reach out to Azure Site Recovery forum for support.
Tell us how we can improve Azure Site Recovery by contributing new ideas and voting up existing ones.

Source: Azure

Get up to speed with Azure HDInsight: The comprehensive guide

Azure HDInsight is an easy, cost-effective, enterprise-grade service for open-source analytics. With HDInsight, you get managed clusters for various Apache big data technologies, such as Spark, MapReduce, Kafka, Hive, HBase, Storm, and ML Services, backed by a 99.9% SLA. In addition, you can take advantage of HDInsight’s rich ISV application ecosystem to tailor the solution to your specific scenario.

HDInsight covers a wide variety of big data technologies, and we have received many requests for a detailed guide. Whether you want to just get started with HDInsight, or become a Big Data expert, this post has you covered with all the latest resources.

Latest content

The HDInsight team has been working hard releasing new features, including the launch of HDInsight 4.0. We make major product announcements on the Azure HDInsight and Big Data blogs. Here is a selection of the most recent updates:

Launch of HDInsight 4.0 at Microsoft Ignite 2018 (Session Video)
Azure HDInsight brings next generation Apache Hadoop 3.0 and enterprise security to the cloud
Deep dive into Azure HDInsight 4.0
HDInsight Enterprise Security Package now generally available
Exciting new capabilities on Azure HDInsight
6-part best practice guide for on premises Hadoop to cloud migration
Azure Toolkit for IntelliJ – Spark Interactive Console
Secure incoming traffic to HDInsight clusters in a virtual network with private endpoint
Apache Spark jobs gain up to 9x speed up with HDInsight IO Cache
Bring Your Own Keys for Apache Kafka on HDInsight
New Azure HDInsight management SDK now in public preview

HDInsight Developer Guide

The HDInsight Developer Guide covers both basic as well as advanced scenarios for developers, data scientists, or data engineers getting started or learning more with Azure HDInsight. This step-by-step guide starts with a basic overview and use-cases, followed by best practices on how to configure clusters, plan capacity, and develop applications for different workloads such as Hive, Spark, HBase and others. Finally, the guide concludes with advanced use-cases and scenarios along with samples.

HDInsight training resources

In addition to the guide, we would also like to highlight other resources for learning more about HDInsight, including self-paced training, documentation, videos, and more.

Self-paced online trainings

edX is an online learning destination offering high-quality self-paced courses from the world’s best universities and institutions. These courses are available for free as part of the Microsoft Professional Program for Big Data, or you can add a verified certificate for a fee. The courses have been updated, and below are the three specific to HDInsight.

Processing Big Data in Azure HDInsight: This course teaches you how to use the Hadoop technologies in Microsoft Azure HDInsight to build batch processing solutions that cleanse and reshape data for analysis.
Implementing Real Time Analytics in Azure HDInsight: In this course, you’ll learn how to implement low-latency and streaming big data solutions using Hadoop technologies like HBase, Storm, and Spark on Microsoft Azure HDInsight.
Implementing Predictive Analytics in Azure HDInsight: In this course, learn how to implement predictive analytics solutions for big data using Apache Spark in Microsoft Azure HDInsight.

Also see the self-paced online training on Microsoft Virtual Academy (MVA), which provides free online training by world-class experts to help you build your technical skills and advance your career. Ready to continue your big data deep dive? Below are the in-depth courses on Hadoop and Spark on HDInsight, which are a key part of the analytics portion of the MVA Data Series.

Hadoop on HDInsight
Spark on HDInsight

Self-serve documentation

HDInsight Documentation: This is the landing page for HDInsight documentation, useful to any developer, data scientist, or big data administrator. It covers everything from getting started to specific scenarios and use-cases with HDInsight. You can download the complete documentation using the “Download as PDF” option at the bottom left of the page, or search for specific topics using the search box at the top left.

HDInsight Troubleshooting Guide: We are constantly updating the troubleshooting guide so that you can easily debug or troubleshoot issues.

Instructor led training

Whether you’re looking to enhance your proficiency in specific technologies like Azure Machine Learning Studio or in overall architecture of Big Data and Analytics, we’ve likely got a course that can get you on your way. The instructor-led and self-paced video courses span from short webinars, to multi-day workshops, to longer-term deep dives on demand. Check back frequently because new offerings are regularly added by Microsoft and our training partners.

Videos

HDInsight videos: Apart from the above resources, you can also search for specific topics from getting started to advanced topics on Channel 9 or YouTube.

The following videos are great to learn about the scope and features in HDInsight.

Deep Dive on Apache Spark Performance Tuning on HDInsight: Part 1, Part 2, Part 3, and Part 4
New Spark UI extensions for better job performance analysis
Optimizing HBase Performance in HDInsight
Introduction to Apache Kafka on Azure HDInsight
Fine-grained security with Apache Ranger on HDInsight Kafka
Bring your own keys on Apache Kafka with Azure HDInsight
HDInsight: Fast Interactive Queries with Hive on LLAP
Introducing ML Services 9.3 in Azure HDInsight
Compliance Standards on HDInsight
Big Data Partner Program
How to use Machine Learning on Azure Government with HDInsight
StreamSets on Azure HDInsight

2017-18 conference recordings

Ignite 2018

Gaining deeper insights from big data using open source analytics on Azure HDInsight
Five essential new enhancements in Azure HDInsight

DataWorks Summit 2018

Building a Modern Data Warehouse on Microsoft Azure with Azure HDInsight and Azure Databricks
Zero ETL analytics with LLAP in Azure HDInsight

//build

Ingestion in data pipelines with Managed Kafka Clusters in Azure HDInsight
ISV Showcase: End-to-end Machine Learning using H2O on Azure
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Real-time data streams with Apache Kafka and Spark

Connect()

Breakpoint debugging of Spark jobs in Azure HDInsight

Hands on labs

Data science lab: This lab specifically focuses on the Spark ML component of Spark and highlights its value proposition in the Apache Spark Big Data processing framework.
Hive lab: This lab focuses on how customers can leverage HDInsight Hive to analyze big data stored in Azure Blob Storage.

Get Microsoft certified on HDInsight

Perform Data Engineering on Microsoft Azure HDInsight
Designing and Implementing Big Data Analytics Solutions

Other Resources

Training to build expertise in Azure

We hope that you will find the developer guide and all the other resources helpful. If you have any feedback or questions, feel free to send us an email at AskHDInsight@microsoft.com. We’d love to hear from you. You can also stay up-to-date on the latest Azure HDInsight news and features by following us on Twitter #HDInsight and @AzureHDInsight.
Source: Azure

IoT in Action: 4 innovations that are revolutionizing IoT

The Internet of Things (IoT) is reshaping every industry from manufacturing to medicine, and opportunities to transform business are nearly limitless. And while IoT is a complicated endeavor requiring multiple partners, skillsets, and technologies, new innovations are making projects easier to deploy, more secure, and more intelligent than ever.

Below I’ve called out four innovations that are revolutionizing the IoT industry. To learn more about how to take advantage of these innovations, be sure to register for our upcoming IoT in Action Virtual Bootcamp.

1. Artificial intelligence (AI) and cognitive capabilities

Cognitive services and AI used to come with a high price tag. But times have changed, and these capabilities are becoming increasingly accessible.

IoT Hub and Cognitive Services enable you to tailor IoT solutions with advanced intelligence without a team of data scientists. Not only do AI and Cognitive Services make it easier to infuse IoT solutions with capabilities such as image recognition, speech analytics, and intelligent recommendations, but they also help companies act on the data being gathered and realize the true value of IoT. The scenarios are virtually limitless. Companies like Uber are using visual identity verification to increase platform security, and Spektacom is making cricket better with an AI-infused sticker for cricket bats that can deliver insights into batting style.

2. Real-time analytics at the intelligent edge

You need data analytics to make your IoT solution complete, but the data you need is not where you want it to be: it's at the edge. One solution is to reproduce a cloud environment locally, but this can be costly, and you may end up having to support two solutions instead of one.

Now you can extend cloud intelligence and analytics to the edge. Azure IoT Edge optimizes performance between the edge and cloud, reducing latency, so you get real-time data. This secure solution enables edge devices to operate reliably even when they have intermittent cloud connectivity, while also ensuring that only the data you need gets sent to the cloud. And by combining data from the cloud and data from the edge, you get the best of both worlds.
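
The "only the data you need gets sent to the cloud" idea boils down to filtering telemetry at the edge before upload. Here is a sketch of that logic; the threshold, message shape, and names are assumptions for illustration, not the Azure IoT Edge module API:

```python
# Illustrative edge-side filtering: only readings that cross a threshold are
# forwarded to the cloud; everything else is handled locally. (Conceptual
# sketch only, not the actual Azure IoT Edge module API.)

THRESHOLD = 75.0  # hypothetical alarm threshold, e.g. degrees Celsius

def filter_for_cloud(readings, threshold=THRESHOLD):
    """Return only the readings worth sending upstream."""
    return [r for r in readings if r["value"] >= threshold]

readings = [
    {"device": "sensor-1", "value": 72.4},
    {"device": "sensor-1", "value": 78.9},
    {"device": "sensor-2", "value": 75.0},
]
to_cloud = filter_for_cloud(readings)
print(len(to_cloud))  # 2 of the 3 readings are sent upstream
```

In a real IoT Edge deployment, logic like this would run inside an edge module, with the filtered messages routed on to IoT Hub.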

3. More secure IoT devices

IoT security continues to evolve, which means it's never been easier to lock down your IoT solutions. At Microsoft, we continue to build uncompromising security into every product we make. We recently released Azure Sphere, an end-to-end solution for creating highly secure, connected devices using a new class of microcontrollers (MCUs). Azure Sphere powers edge devices and combines three key components: Azure Sphere certified MCUs, the Azure Sphere OS, and the Azure Sphere Security Service.

4. Provisioning IoT quickly at scale

Provisioning IoT manually is time-intensive and can quickly become a showstopper, especially when you’ve got hundreds, thousands, or even millions of devices to configure. Even if manual provisioning is possible now, building in the capability to quickly and securely provision future devices is critical.

Azure IoT Hub features a Device Provisioning Service (DPS) that enables remote provisioning without human intervention. Azure DPS provides the infrastructure needed to provision millions of devices in a secure and scalable way. DPS extends trust from the silicon to the cloud, where it creates registries that enable managed identity services such as location, mapping, aging, and retirement. It works in a variety of scenarios, from automatic configuration based on solution-specific needs, to load balancing across multiple hubs, to connecting devices based on geo-location.
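
As a loose illustration of provisioning-time load balancing, the sketch below hashes a device's registration ID to deterministically pick one of several hubs. The hub names are hypothetical, and the real DPS applies its own, richer allocation policies (evenly weighted distribution, lowest latency, custom):

```python
import hashlib

# Conceptual sketch of provisioning-time hub assignment: a device's
# registration ID is hashed to pick one of several IoT hubs, so assignment
# is deterministic and spreads a fleet across hubs. (Illustrative only;
# the real DPS supports its own allocation policies.)

HUBS = ["hub-east.azure-devices.net", "hub-west.azure-devices.net"]  # hypothetical names

def assign_hub(registration_id, hubs=HUBS):
    """Map a registration ID to a hub; the same device always gets the same hub."""
    digest = hashlib.sha256(registration_id.encode()).hexdigest()
    return hubs[int(digest, 16) % len(hubs)]

print(assign_hub("device-0001") == assign_hub("device-0001"))  # True: deterministic
```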

Register for the IoT in Action Virtual Bootcamp

To learn more about how you can take advantage of these innovations, be sure to register for an IoT in Action Virtual Bootcamp. Whether you are an engineer, software architect, or practice owner, this virtual bootcamp will give you a clear understanding of IoT from device to cloud and accelerate the development of an IoT solution for your business.

This event will help you get hands on with the latest in IoT devices and cloud services including secure MCUs, IoT OSes, and advanced application services. You will also receive trusted guidance and a singular ecosystem view, supporting you in the design of secure IoT solutions that add real-world business value and create exciting new customer experiences. Join us to establish a leadership position in the IoT ecosystem by creating new experiences and revenue streams while optimizing bottom-line performance.

Register for an IoT in Action Virtual Bootcamp in your time zone:

Asia: January 28-29, 2019
Europe, Middle East, and Africa: January 30-31, 2019
North and South America: February 4-5, 2019

Interested in attending one of our in-person IoT in Action events? Register for a free event coming to a city near you.
Source: Azure