Get up to speed with Azure HDInsight: The comprehensive guide

Azure HDInsight is an easy, cost-effective, enterprise-grade service for open source analytics. With HDInsight, you get managed clusters for a range of big data technologies, such as Apache Spark, MapReduce, Kafka, Hive, HBase, and Storm, as well as ML Services, all backed by a 99.9% SLA. You can also take advantage of HDInsight’s rich ISV application ecosystem to tailor the solution to your specific scenario.

HDInsight covers a wide variety of big data technologies, and we have received many requests for a detailed guide. Whether you are just getting started with HDInsight or want to become a big data expert, this post collects the latest resources in one place.

Latest content

The HDInsight team has been working hard to release new features, including the launch of HDInsight 4.0. We make major product announcements on the Azure HDInsight and Big Data blogs. Here is a selection of the most recent updates:

Launch of HDInsight 4.0 at Microsoft Ignite 2018 (Session Video)
Azure HDInsight brings next generation Apache Hadoop 3.0 and enterprise security to the cloud
Deep dive into Azure HDInsight 4.0
HDInsight Enterprise Security Package now generally available
Exciting new capabilities on Azure HDInsight
6-part best practices guide for migrating on-premises Hadoop to the cloud
Azure Toolkit for IntelliJ – Spark Interactive Console
Secure incoming traffic to HDInsight clusters in a virtual network with private endpoint
Apache Spark jobs gain up to 9x speed up with HDInsight IO Cache
Bring Your Own Keys for Apache Kafka on HDInsight
New Azure HDInsight management SDK now in public preview

HDInsight Developer Guide

The HDInsight Developer Guide covers both basic and advanced scenarios for developers, data scientists, and data engineers who are getting started with, or want to learn more about, Azure HDInsight. This step-by-step guide starts with an overview and common use cases, followed by best practices for configuring clusters, planning capacity, and developing applications for workloads such as Hive, Spark, and HBase. It concludes with advanced use cases and scenarios, along with samples.

HDInsight training resources

In addition to the guide, we would like to highlight the other resources available for learning more about HDInsight. Below are the different learning resources available, including self-paced training, documentation, videos, and more.

Self-paced online trainings

edX, an online learning destination, offers high-quality, self-paced courses from the world’s best universities and institutions to learners everywhere. These courses are available for free as part of the Microsoft Professional Program for Big Data, or you can add a verified certificate for a fee. The courses have recently been updated, and the three HDInsight-specific courses are listed below.

Processing Big Data in Azure HDInsight: This course teaches you how to use the Hadoop technologies in Microsoft Azure HDInsight to build batch processing solutions that cleanse and reshape data for analysis.
Implementing Real Time Analytics in Azure HDInsight: In this course, you’ll learn how to implement low-latency and streaming big data solutions using Hadoop technologies like HBase, Storm, and Spark on Microsoft Azure HDInsight.
Implementing Predictive Analytics in Azure HDInsight: In this course, learn how to implement predictive analytics solutions for big data using Apache Spark in Microsoft Azure HDInsight.

Also see the self-paced online training on Microsoft Virtual Academy (MVA), which provides free training by world-class experts to help you build your technical skills and advance your career. Ready to continue your big data deep dive? Below are the in-depth courses on Hadoop and Spark on HDInsight, which are a key part of the analytics portion of the MVA Data Series.

Hadoop on HDInsight
Spark on HDInsight

Self-serve documentation

HDInsight Documentation: This is the landing page for HDInsight documentation and is useful to any developer, data scientist, or big data administrator. The documentation covers everything from getting started to specific scenarios and use cases. You can download the complete documentation using the “Download as PDF” option at the bottom left of the page, or search for specific topics using the search box at the top left.

HDInsight Troubleshooting Guide: We are constantly updating the troubleshooting guide to help you debug and resolve issues quickly.

Instructor led training

Whether you’re looking to build proficiency in a specific technology such as Azure Machine Learning Studio or in the overall architecture of big data and analytics solutions, we likely have a course that can get you on your way. The instructor-led and self-paced video courses range from short webinars to multi-day workshops to longer on-demand deep dives. Check back frequently, because Microsoft and our training partners add new offerings regularly.

Videos

HDInsight videos: In addition to the resources above, you can search Channel 9 or YouTube for topics ranging from getting started to advanced scenarios.

The following videos are a great way to learn about the scope and features of HDInsight.

Deep Dive on Apache Spark Performance Tuning on HDInsight: Part 1, Part 2, Part 3, and Part 4
New Spark UI extensions for better job performance analysis
Optimizing HBase Performance in HDInsight
Introduction to Apache Kafka on Azure HDInsight
Fine-grained security with Apache Ranger on HDInsight Kafka
Bring your own keys on Apache Kafka with Azure HDInsight
HDInsight: Fast Interactive Queries with Hive on LLAP
Introducing ML Services 9.3 in Azure HDInsight
Compliance Standards on HDInsight
Big Data Partner Program
How to use Machine Learning on Azure Government with HDInsight
StreamSets on Azure HDInsight

2017-18 conference recordings

Ignite 2018

Gaining deeper insights from big data using open source analytics on Azure HDInsight
Five essential new enhancements in Azure HDInsight

DataWorks Summit 2018

Building a Modern Data Warehouse on Microsoft Azure with Azure HDInsight and Azure Databricks
Zero ETL analytics with LLAP in Azure HDInsight

//build

Ingestion in data pipelines with Managed Kafka Clusters in Azure HDInsight
ISV Showcase: End-to-end Machine Learning using H2O on Azure
Interactive ad-hoc analysis at petabyte scale with HDInsight Interactive Query
Real-time data streams with Apache Kafka and Spark

Connect()

Breakpoint debugging of Spark jobs in Azure HDInsight

Hands on labs

Data science lab: This lab focuses on Spark ML and highlights its value within the Apache Spark big data processing framework.
Hive lab: This lab shows how customers can use Hive on HDInsight to analyze big data stored in Azure Blob Storage.

Get Microsoft certified on HDInsight

Perform Data Engineering on Microsoft Azure HDInsight
Designing and Implementing Big Data Analytics Solutions

Other Resources

Training to build expertise in Azure

We hope you find the developer guide and the other resources helpful. If you have any feedback or questions, feel free to email us at AskHDInsight@microsoft.com. We’d love to hear from you. You can also stay up to date on the latest Azure HDInsight news and features by following @AzureHDInsight and #HDInsight on Twitter.
Source: Azure
