Silo busting 2.0—Multi-protocol access for Azure Data Lake Storage

Cloud data lakes solve a foundational problem for big data analytics—providing secure, scalable storage for data that traditionally lives in separate data silos. Data lakes were designed from the start to break down data barriers and jump start big data analytics efforts. However, a final “silo busting” frontier remained, enabling multiple data access methods for all data—structured, semi-structured, and unstructured—that lives in the data lake.

Providing multiple data access points to shared data sets allow tools and data applications to interact with the data in their most natural way. Additionally, this allows your data lake to benefit from the tools and frameworks built for a wide variety of ecosystems. For example, you may ingest your data via an object storage API, process the data using the Hadoop Distributed File System (HDFS) API, and then ingest the transformed data using an object storage API into a data warehouse.

Single storage solution for every scenario

We are very excited to announce the preview of multi-protocol access for Azure Data Lake Storage! Azure Data Lake Storage is a unique cloud storage solution for analytics that offers multi-protocol access to the same data. Multi-protocol access to the same data, via Azure Blob storage API and Azure Data Lake Storage API, allows you to leverage existing object storage capabilities on Data Lake Storage accounts, which are hierarchical namespace-enabled storage accounts built on top of Blob storage. This gives you the flexibility to put all your different types of data in your cloud data lake knowing that you can make the best use of your data as your use case evolves.

Single storage solution

Expanded feature set, ecosystem, and applications

Existing blob features such as access tiers and lifecycle management policies are now unlocked for your Data Lake Storage accounts. This is paradigm-shifting because your blob data can now be used for analytics. Additionally, services such as Azure Stream Analytics, IoT Hub, Azure Event Hubs capture, Azure Data Box, Azure Search, and many others integrate seamlessly with Data Lake Storage. Important scenarios like on-premises migration to the cloud can now easily move PB-sized datasets to Data Lake Storage using Data Box.

Multi-protocol access for Data Lake Storage also enables the partner ecosystem to use their existing Blob storage connector with Data Lake Storage.  Here is what our ecosystem partners are saying:

“Multi-protocol access for Azure Data Lake Storage is a game changer for our customers. Informatica is committed to Azure Data Lake Storage native support, and Multi-protocol access will help customers accelerate their analytics and data lake modernization initiatives with a minimum of disruption.”

– Ronen Schwartz, Senior Vice President and General Manager of Data Integration, Big Data, and Cloud, Informatica

You will not need to update existing applications to gain access to your data stored in Data Lake Storage. Furthermore, you can leverage the power of both your analytics and object storage applications to use your data most effectively.

Multi-protocol access enables features and ecosystem

Multiple API endpoints—Same data, shared features

This capability is unprecedented for cloud analytics services because not only does this support multiple protocols, this supports multiple storage paradigms. We now bring you this powerful capability to your storage in the cloud. Existing tools and applications that use the Blob storage API gain these benefits without any modification. Directory and file-level access control lists (ACL) are consistently enforced regardless of whether an Azure Data Lake Storage API or Blob storage API is used to access the data.  

Multi-protocol access on Azure Data Lake Storage

Features and expanded ecosystem now available on Data Lake Storage

Multi-protocol access for Data Lake Storage brings together the best features of Data Lake Storage and Blob storage into one holistic package. It enables many Blob storage features and ecosystem support for your data lake storage.

Features
More information

Access tiers
Cool and Archive tiers are now available for Data Lake Storage. To learn more, see the documentation “Azure Blob storage: hot, cool, and archive access tiers.”

Lifecycle management policies
You can now set policies to a tier or delete data in Data Lake Storage. To learn more, see the documentation “Manage the Azure Blob storage lifecycle.”

Diagnostics logs
Logs for the Blob storage API and Azure Data Lake Storage API are now available in v1.0 and v2.0 formats. To learn more, see the documentation "Azure Storage analytics logging."

SDKs
Existing blob SDKs can now be used with Data Lake Storage. To learn more, see the below documentation:

Azure Blob storage client library for .NET
Azure Blob storage client library for Java
Azure Blob storage client library for Python

PowerShell
PowerShell for data plane operations is now available for Data Lake Storage. To learn more, see the Azure PowerShell quickstart.

CLI
Azure CLI for data plane operations is now available for Data Lake Storage. To learn more, see the Azure CLI quickstart.

Notifications via Azure Event Grid
You can now get Blob notifications through Event Grid. To learn more, see the documentation “Reacting to Blob storage events.” Azure Data Lake Storage Gen2 notifications are currently available.

 

Ecosystem partner
More information

Azure Stream Analytics
Azure Stream Analytics now writes to, as well as reads from, Data Lake Storage.

Azure Event Hubs capture
The capture feature within Azure Event Hubs now lets you pick Data Lake Storage as one of its destinations.

IoT Hub
IoT Hub message routing now allows routing to Azure Data Lake Storage Gen 2.

Azure Search
You can now index and apply machine learning models to your Data Lake Storage content using Azure Search.

Azure Data Box
You can now ingest huge amounts of data from on-premises to Data Lake Storage using Data Box.

Please stay tuned as we enable more Blob storage features using this amazing capability.

Next steps

All these new capabilities are available today in West US 2 and West Central US. Sign up for the preview today. For more information, please see our documentation on multi-protocol access for Azure Data Lake Storage.
Quelle: Azure

Published by