Announcing Azure Files share snapshots public preview

Azure Files offers fully managed cloud file shares and extends organizations' ability to share files across on-premises environments and the cloud. With support for the industry-standard SMB protocol, the service is truly cross-platform: a share can be mounted from any client that implements SMB 3.0 with encryption, including Windows, macOS, and Linux. In addition to native mounting, it exposes REST APIs for programmability. With Azure Files, organizations get the added benefit of a storage infrastructure that is highly secure, massively scalable, and globally available. But even with all of these capabilities, what do you do if a user or application accidentally deletes or corrupts files or folders stored in an Azure Files share?

Today, we are very excited to introduce the public preview of Azure Files share snapshots. Share snapshots allow you to periodically capture read-only versions of your file shares. They also allow you to retrieve an older version of your content, from anywhere, for further modification and use.

When a share snapshot is created, the contents of the file share and the share snapshot are exactly the same; from then on, only incremental changes are written to storage. This makes snapshot creation fast, space-efficient, and cost-effective.
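The space savings can be pictured with a small copy-on-write sketch. This is purely illustrative — it is not how Azure Files implements snapshots internally — but it shows the key property: each snapshot stores only the files that changed since the previous one, yet still presents a complete read-only view of the share.

```python
# Illustrative model of incremental snapshots (NOT the Azure Files internals):
# each snapshot stores only the delta since the previous snapshot, but a full
# read-only view can be reconstructed by layering the deltas.

class Share:
    def __init__(self):
        self.files = {}          # live contents: path -> data
        self.snapshots = []      # each entry stores only (changed files, deleted paths)

    def snapshot(self):
        """Record only what changed since the last snapshot; return its id."""
        base = self._view(len(self.snapshots) - 1) if self.snapshots else {}
        delta = {p: d for p, d in self.files.items() if base.get(p) != d}
        deleted = {p for p in base if p not in self.files}
        self.snapshots.append((delta, deleted))
        return len(self.snapshots) - 1

    def _view(self, snap_id):
        """Reconstruct the full read-only view at a given snapshot."""
        view = {}
        for delta, deleted in self.snapshots[:snap_id + 1]:
            view.update(delta)
            for p in deleted:
                view.pop(p, None)
        return view

    def restore(self, snap_id, path):
        """Copy an older version of a file back to its original location."""
        self.files[path] = self._view(snap_id)[path]
```

Because a snapshot stores only the changed files, ten snapshots of a mostly-static share cost little more than one.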

On Windows, you can use the familiar Previous Versions functionality, as shown below in Figure 1, where sharesnapshotdefs is a mounted Azure file share and each entry in the Previous Versions tab is a share snapshot. You can browse the contents of a snapshot directly in File Explorer by selecting “Open”, or copy the contents of that share snapshot back to its original location by selecting “Restore”. The same Previous Versions experience is available for individual directories and files. This means that while snapshots are taken at the share level, data retrieval can be done at the share, directory, or individual file level.

Figure 1: Azure Files share snapshot experience on Windows – Integrated with “Previous Versions”

On Linux, you can use Azure CLI 2.0 for Azure Files, as shown below in Figure 2. All the same capabilities, including creation of snapshots, are available in Azure CLI 2.0.

Figure 2: Azure Files share snapshot experience on Azure CLI – List Snapshots

In addition to Azure CLI 2.0, snapshots are fully supported for programmatic access through REST and client libraries such as .NET and Python. PowerShell support is coming soon. To get started quickly, you can go to the Azure portal today and start creating snapshots.

Figure 3: Azure Files share snapshot experience on Azure Portal

What's more: during the public preview, capacity consumed by snapshots will not be charged!

Azure Files share snapshots will be a key addition to your cloud storage management toolkit. To learn more about snapshots, please visit our documentation.

If you have any questions about Azure Files, please leave a comment below. In addition, if you have any feature request, we are always listening to your feedback on our User Voice.
Source: Azure

How Azure Security Center uses machine learning to enable adaptive application control

While the threat landscape has changed dramatically over the last several years, malware detection continues to be one of the biggest issues. There is an endless race – attackers develop new malware, security vendors create new signatures to detect it, a new variant of the malware is created to avoid detection, and the cycle continues. Malware is not the only type of application that can expose a server to risk. Unauthorized software can introduce vulnerabilities that are exploited by attackers. Most organizations lack the necessary application tracking and controls, leaving them blind to these risks.

Application controls, such as whitelisting, can help limit exposure to malicious and vulnerable applications. Instead of trying to keep pace with rapidly evolving malware and new exploits, application whitelisting simply blocks all but known good applications. For purpose-built servers that typically run a fixed set of applications, whitelisting can offer significant added protection. You can use application controls to:

Block new and unknown malware.
Comply with your organization's security policy that dictates the use of only licensed software.
Avoid old and unsupported applications.
Prevent specific software tools that are not allowed in your organization.

While the concept of application whitelisting has existed for some time, it has not been widely adopted, due to the complexity of creating and applying accurate whitelisting policies per server or group of servers, and of managing these policies at scale in large environments.

Azure Security Center recently released adaptive application controls, which use an innovative approach to application whitelisting, enabling you to realize the security benefits without the management overhead. Machine learning is used to analyze the behavior of your Azure VMs, create a baseline of applications, group the VMs and decide whether they are good candidates for application whitelisting, and recommend and automatically apply the appropriate whitelisting rules. In addition, Security Center surfaces applications that can be exploited to bypass an application whitelisting solution, and provides full management and monitoring capabilities through which you can change an existing whitelist (e.g., add or remove applications from the coverage) and be alerted on violations of the whitelists.

Adaptive application controls are currently available for Windows machines running in Azure (all versions, classic or Azure Resource Manager). To get started, open Security Center and select the application whitelisting tile as shown below.

Select a resource group to view and apply the recommended application whitelisting rules.

New rules are always set to audit mode, which alerts you to applications that violate the rules. You can edit the policy to block these applications instead by switching to enforce mode, and you can edit the details of the rules at any time.
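The difference between the two modes can be sketched in a few lines. This is an illustrative model only — not Security Center's actual rule engine — and the process names are made up:

```python
# Illustrative sketch of whitelisting with audit vs. enforce modes
# (NOT Security Center's implementation; process names are hypothetical).

def build_baseline(observed_processes):
    """Derive a whitelist from the applications a VM has been seen running."""
    return set(observed_processes)

def evaluate(process, whitelist, mode="audit"):
    """Return (allowed, alert).

    Audit mode: everything runs, but violations raise an alert.
    Enforce mode: anything outside the whitelist is blocked AND alerted.
    """
    violation = process not in whitelist
    if mode == "audit":
        return True, violation
    return not violation, violation

whitelist = build_baseline(["sqlservr.exe", "w3wp.exe"])
assert evaluate("w3wp.exe", whitelist) == (True, False)             # known app: runs quietly
assert evaluate("mimikatz.exe", whitelist, "audit") == (True, True)  # runs, but alerts
assert evaluate("mimikatz.exe", whitelist, "enforce") == (False, True)  # blocked and alerted
```

Starting in audit mode lets you confirm the learned baseline is accurate before any legitimate application risks being blocked.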

Adaptive application controls are available to Azure Security Center Standard customers as a limited public preview. Please send an email with your subscription IDs to ASC_appcontrol@microsoft.com to join the preview.

To learn more about the capabilities and the different options that you have as a security administrator, please visit our documentation page.
Source: Azure

Point-to-Site (P2S) connection using OpenVPN infrastructure

A question we are frequently asked is how to establish a Point-to-Site (P2S) connection between a client computer running a Unix-like operating system (such as Linux or FreeBSD) and a Virtual Network (VNet) on Azure, with the goal of managing virtual machines deployed in the cloud. In this article we illustrate a custom solution for the Azure Resource Manager (ARM) deployment model that addresses this request, along with sample code that automates the setup of the architecture on Azure.

The proposed infrastructure is not based on an Azure virtual network gateway; it is a mix of Azure IaaS resources and open source community software. It's a solution for anyone who wants:

Complete control of the VPN server
Very low cost with no performance concerns
Automatic P2S setup

It is important to remember that, as with all IaaS approaches, the customer is responsible for managing the infrastructure and its security.

Target Architecture

The following diagram shows the target architecture we want to build:

The architecture has the following components:

Target VNet with:

One OpenVPN subnet
One Network Security Group (NSG) for OpenVPN subnet
An undefined number of subnets that contain the Virtual Machines to manage

OpenVPN Server – CentOS based Virtual Machine
Client desktop with Internet connection – contains an OpenVPN client

The CentOS based Virtual Machine will contain an OpenVPN server that will be used to create a VPN based on SSLv3/TLSv1 protocol.

We selected this product for its ability to work through proxy servers. Many client networks require Internet access to be routed through a proxy server. For this reason, the OpenVPN server will listen on port 443, the default HTTPS port, which is normally not filtered by proxy servers. This allows a management VPN to be established between client and server even on networks with limited Internet access.

Furthermore, OpenVPN is very well known in the open source community, and there are many clients available for different operating systems.

Note: This implementation is based on a single OpenVPN server. It is not recommended to use it for a production environment where high availability is a requirement, or your VPN connection will be used as part of services delivery.

How the architecture works

The OpenVPN server sits in an Azure subnet and receives all incoming VPN connections. The most important task the server performs is IP masquerade.

IP masquerade is a form of network address translation (NAT) through which communications arriving over the VPN are rewritten so that they appear to originate from the VPN server itself.
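As a rough illustration of what masquerading does, here is a toy model of the address rewriting. This is not the kernel's actual NAT implementation — real masquerading is configured with iptables' MASQUERADE target — and the IP addresses used are assumptions:

```python
# Toy model of IP masquerade (source NAT). Real masquerading happens in the
# kernel (e.g. iptables MASQUERADE), not in application code; addresses here
# are hypothetical.

VPN_SERVER_IP = "10.0.1.4"   # assumed private IP of the OpenVPN server's NIC
nat_table = {}               # (dst, dst_port, src_port) -> original client src

def masquerade_out(packet):
    """Rewrite an outbound packet so it appears to come from the VPN server."""
    key = (packet["dst"], packet["dst_port"], packet["src_port"])
    nat_table[key] = packet["src"]          # remember who really sent it
    return {**packet, "src": VPN_SERVER_IP}

def masquerade_in(packet):
    """Translate a reply back to the original VPN client."""
    # A reply's (src, src_port, dst_port) mirrors the outbound packet's
    # (dst, dst_port, src_port), so the same key recovers the client.
    key = (packet["src"], packet["src_port"], packet["dst_port"])
    return {**packet, "dst": nat_table[key]}
```

Because the target VMs only ever see the VPN server's IP as the source, their replies naturally route back to the server, which undoes the translation.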

This way, packets can be routed freely on the VNet and reach their destination: the VMs to manage.

Note that no specific route rules need to be added. Because the packets appear to originate from the OpenVPN server, they are routed across the various subnets by the built-in “local VNet” system route.

Note: IP forwarding is not needed either, since there are no route rules that send traffic to a NIC that is not the real destination of the traffic. All incoming traffic is wrapped in the VPN tunnel and, on the OpenVPN server, unwrapped and sent on to its destination without any action by the Azure infrastructure.

Sample Code

The proposed code for automation has two main components:

PowerShell script used to prepare Azure infrastructure
Bash script used to execute CentOS server auto-configuration

The PowerShell script executes the steps needed to create the proposed infrastructure and, as a last step, sets up a virtual machine extension that runs the provided Bash script on the machine at the end of provisioning.

The Bash script handles the necessary customization of the CentOS image. It also uploads, to a new private Azure blob, the credentials that the OpenVPN client needs to establish a connection with the server.

A detailed guide of the code can be found in the GitHub project.

Conclusion

In this article, we described an architecture based on a mix of Azure IaaS resources and open source community software that addresses a particular requirement of some users, and we released sample code.

This is further proof of how easily various situations can be addressed by combining the resources and services available on Azure.
Source: Azure

Java: Manage availability zones and more

We have released version 1.3 of the Azure Management Libraries for Java. This release adds support for availability zones (in preview).

https://github.com/Azure/azure-sdk-for-java

Getting started

Add the following dependency fragment to your Maven POM file to use version 1.3 of the libraries:

<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure</artifactId>
<version>1.3.0</version>
</dependency>

Create virtual machine in availability zone

You can create a virtual machine instance in an availability zone by using a define() … create() method chain.

VirtualMachine virtualMachine = azure.virtualMachines()
.define("Linux-VM-in-Availability-Zone")
.withRegion(Region.US_EAST)
.withNewResourceGroup(rgName)
.withNewPrimaryNetwork("10.0.0.0/28")
.withPrimaryPrivateIPAddressDynamic()
.withNewPrimaryPublicIPAddress("linux-vm-with-high-availability")
.withPopularLinuxImage(KnownLinuxVirtualMachineImage.UBUNTU_SERVER_16_04_LTS)
.withRootUsername("tirekicker")
.withSsh(sshKey)
// Optional
.withAvailabilityZone(AvailabilityZoneId.ZONE_1)
.withSize(VirtualMachineSizeTypes.STANDARD_D3_V2)
// Create VM
.create();

Create virtual machine scale set in availability zone

You can create a virtual machine scale set instance in multiple availability zones by using another define() … create() method chain.

VirtualMachineScaleSet virtualMachineScaleSet1 = azure.virtualMachineScaleSets()
.define("VM-Scale-Set-in-Availability-Zone")
.withRegion(Region.US_EAST)
.withExistingResourceGroup(rgName)
.withSku(VirtualMachineScaleSetSkuTypes.STANDARD_D3_V2)
.withExistingPrimaryNetworkSubnet(network, "front-end")
.withExistingPrimaryInternetFacingLoadBalancer(loadBalancer)
.withPrimaryInternetFacingLoadBalancerBackends(backends.get(0))
.withPrimaryInternetFacingLoadBalancerInboundNatPools(natpools.get(0))
.withoutPrimaryInternalLoadBalancer()
.withPopularLinuxImage(KnownLinuxVirtualMachineImage.UBUNTU_SERVER_16_04_LTS)
.withRootUsername("tirekicker")
.withSsh(sshKey)
// Optional
.withAvailabilityZone(AvailabilityZoneId.ZONE_1)
// Create VM
.create();

Support for more Azure services

1.3 also adds support for Network Peering, Virtual Network Gateway and Azure Container Instances.

Peer two virtual networks

You can peer two virtual networks using another define() … create() method chain.

NetworkPeering peeringAB = networkA.peerings()
.define(peeringABName)
.withRemoteNetwork(networkB)
.create();

You can use network watcher to check connectivity between virtual machines in peered virtual networks.

Create Virtual Private Network (VPN) using site-to-site connection

You can create a virtual private network site-to-site connection using another define() … create() method chain.

virtualNetworkGateway.connections()
.define("My-Connection")
.withSiteToSite()
.withLocalNetworkGateway(localNetworkGateway)
.withSharedKey("MySecretKey")
.create();

You can create a virtual network to virtual network VPN connection.

Create container group in Azure Container Instances

You can create a container group with multiple container instances using another define() … create() method chain.

ContainerGroup containerGroup = azure.containerGroups().define("azureContainerInstance")
.withRegion(Region.US_WEST)
.withNewResourceGroup(rgName)
.withLinux()
.withPublicImageRegistryOnly()
.withoutVolume()
.defineContainerInstance(aciName + "-1")
.withImage(containerImageName1)
.withExternalTcpPort(80)
.withCpuCoreCount(.5)
.withMemorySizeInGB(.75)
.attach()
.defineContainerInstance(aciName + "-2")
.withImage(containerImageName2)
.withoutPorts()
.withCpuCoreCount(.5)
.withMemorySizeInGB(.75)
.attach()
.create();

Similarly, you can:

Create a container group with a new Azure File Share
Create a container group with an existing Azure File Share
Create a container group and scale up containers using Kubernetes in Azure Container Service

Try it

You can get more samples from our GitHub repo. Give it a try and let us know what you think (via e-mail or comments below).
 
You can find plenty of additional info about Java on Azure at http://azure.com/java.
Source: Azure

Qubole customers now gain insights from Azure Data Lake

Azure Data Lake provides one of the easiest ways to build an enterprise data lake. With Qubole’s latest release of Qubole Data Service (QDS), their customers can now land data of any size, structured or unstructured, in the data lake for analytics. Access to a data lake is essential in gaining insights on enterprise data and driving new business opportunities. Life just got easier for enterprises transitioning from legacy on-premises data warehouses to Azure Data Lake using Qubole.

Try it out today!

You can easily deploy the latest release of QDS from the Azure Marketplace. The Azure Quick Start Guide provides step-by-step guidance for configuring QDS to interoperate with Azure Data Lake Store.  

In QDS, open the Control Panel and go to Account Settings. Under Account Details, choose Storage Service and select Data Lake in the drop-down.

Qubole Data Service (QDS) delivers fully-automated and cloud-optimized data engines (including Spark, Hive, and Hadoop MapReduce) on Azure, performing at scale with extensive automation management tools. Along with its existing support for Zeppelin notebooks,​ Qubole also recently announced support for Jupyter notebooks enabling data scientists to connect to a Qubole Spark cluster from within their IDE and execute distributed computing on large datasets that reside in Azure Data Lake.

Azure Data Lake provides a high-throughput, cloud-scale filesystem optimized for analytics workloads. Multiple clusters can share the same Azure Data Lake account, enabling customers to add and remove compute resources to meet workload needs. Through role-based access control (RBAC) and POSIX-style ACLs at the file and folder level, customers can either partition the data lake among separate users or share it among multiple users concurrently. Encryption at rest and integration with Azure Key Vault let customers secure their data lake assets with either Azure-generated keys or keys they create themselves. With today's announcement, analytics workloads on Qubole clusters can easily access data in Azure Data Lake and take advantage of these capabilities, bringing Qubole's intelligence, automation, and data platform capabilities to Azure Data Lake.

Source: Azure

Get alerts faster with near real-time alerting for Azure platform metrics

At Ignite 2017, we announced the public preview of near real-time metric alerts in Azure. This new kind of alert provides the following improvements over the current metric alerts.

Improved Latency – You can create near real-time metric alerts that monitor metric values as frequently as 1 minute.
More control over metric conditions – You can create near real-time metric alert rules that can monitor minimum, maximum, average, and total of the metric over the evaluation period.
Combined monitoring of multiple metrics – You can create a single near real-time metric alert rule that can monitor multiple metrics (currently two) at the same time.
Modular notification system –  You can use action groups with near real-time metric alerts. Action groups provide a reusable set of actions that can be used with multiple alerts. By using action groups with near real-time metric alerts, you can send SMS, email, or call web hook when an alert gets triggered.
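The value of action groups is reuse: one bundle of notification actions can back any number of alert rules. A toy sketch of the idea (illustrative only — not the Azure action groups API; the names and contact details are made up):

```python
# Illustrative model of action groups: a reusable set of notification actions
# shared by multiple alert rules (NOT the Azure API; details are hypothetical).

class ActionGroup:
    def __init__(self, name, actions):
        self.name = name
        self.actions = actions  # e.g. [("email", "ops@contoso.com"), ("sms", "+15550100")]

    def fire(self, alert_name):
        """Run every action in the group for a triggered alert."""
        return [f"{kind} to {target}: alert '{alert_name}' triggered"
                for kind, target in self.actions]

# One action group reused by two independent alert rules.
oncall = ActionGroup("oncall", [("email", "ops@contoso.com"), ("sms", "+15550100")])
cpu_alert_notifications = oncall.fire("High CPU")
mem_alert_notifications = oncall.fire("Low Memory")
```

Updating the group (say, adding a webhook) then changes the behavior of every alert that references it, with no per-alert edits.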

Using near real-time metric alerts

Let’s look at how to create a near real-time metric alert.

1. In the Azure portal, locate the resource you are interested in monitoring and select it. You can also do the same for all supported resource types centrally from Monitor>Alerts.

2. Select Alerts or Alert rules under the Monitoring section. The text and icon may vary slightly for different resources.

3. Click the Add near real time metrics alert (preview) command. If the command is grayed out, ensure the resource is selected in the filter.

4. Name your alert rule and provide a Description, which is also shown in notification emails.

5. Select the Metric you want to monitor, then choose a Condition, Time Aggregation, and Threshold value for the metric. Optionally, select another Metric you want to monitor, then choose a Condition, Time Aggregation, and Threshold value for the second metric.

6. Choose the Evaluation Period and Evaluation Frequency.

7. Specify if you want to use a New or Existing Action Group.

8. If you choose to create a New Action Group, give the action group a name and a short name, specify actions (SMS, Email, or Web hook), and fill in the respective details.

9.  Select OK when done to create the alert.

How does a near real-time alert work?

Let’s look at this with the help of an example. Say you have configured your rule as follows:

Starting approximately when the alert rule is created, the alert looks at the past 5 minutes of data for the %CPU Usage and NetworkIn metrics, calculates the average %CPU Usage and total NetworkIn values, and checks whether each is above its respective threshold. The alert fires only if both thresholds are breached. The conditions are evaluated again after 1 minute. The alert is resolved when either metric goes back below its threshold.
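A minimal sketch of that evaluation logic (the threshold values below are made up for illustration):

```python
# Illustrative evaluation of a two-metric near real-time alert rule:
# a 5-minute lookback window, one aggregation per metric, AND-ed conditions.
# Thresholds are hypothetical.

CPU_THRESHOLD = 80.0        # fire when average %CPU over the window exceeds this
NETWORK_IN_THRESHOLD = 1e9  # fire when total NetworkIn (bytes) over the window exceeds this

def evaluate(cpu_samples, network_in_samples):
    """Each argument is the last five one-minute data points for its metric."""
    avg_cpu = sum(cpu_samples) / len(cpu_samples)      # Time Aggregation: average
    total_in = sum(network_in_samples)                 # Time Aggregation: total
    return avg_cpu > CPU_THRESHOLD and total_in > NETWORK_IN_THRESHOLD
```

Re-running this function once a minute over a sliding window mirrors the 1-minute evaluation frequency described above.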

Supported resources

These are some of the resource types that are supported today by near real-time metric alerts:

Microsoft.Compute/virtualMachines
Microsoft.Compute/virtualMachineScaleSets
Microsoft.DBforMySQL/servers
Microsoft.DBforPostgreSQL/servers
Microsoft.Cache/Redis
Microsoft.ServiceBus/namespaces

For a full list of supported resource types, see the documentation. Support for more resources is coming soon.

For more information on near real-time metric alerts, see the documentation. We would love to hear your feedback. Send us any questions or feedback to azurealertsfeedback@microsoft.com.
Source: Azure

Learn the latest on Azure Analysis Services

Last week, members from the Azure Analysis Services team met with customers at the Microsoft Ignite conference in Orlando. The videos below give details about bringing Azure Analysis Services into your BI solutions, and also announce various new features that will be coming soon.

Delivering enterprise BI with Azure Analysis Services

Learn how to deliver analytics at the speed of thought with Azure Analysis Services on top of a petabyte-scale SQL Data Warehouse and Azure HDInsight Spark implementation. This session covers best practices for managing, processing, and query accelerating at scale, implementing change management for data governance, and designing for performance and security. These advanced techniques are demonstrated through an actual implementation including architecture, code, data flows, and tips and tricks.

 

Creating enterprise-grade BI models with Azure Analysis Services

Microsoft Analysis Services enables you to build comprehensive, enterprise-scale analytic solutions that deliver actionable insights through familiar data visualization tools such as Microsoft Power BI and Microsoft Excel. Analysis Services enables consistent data across reports and users of Power BI. This session covers new features such as improved Power BI Desktop feature integration, Power Query connectivity, and techniques for modeling and data loading which enable the best reporting experiences. Various modeling enhancements are included, such as Detail Rows allowing users to easily see transactional records, and deployment and application-lifecycle management (ALM) features to bridge the gap between self-service and corporate BI.

 

Architect your big data solutions with SQL Data Warehouse and Azure Analysis Services

Have you ever wondered what’s the secret sauce that allows a company to use their data effectively? How do they ingest all their data, analyze it, and then make it available to thousands of end users? What happens if you need to scale the solution? Come find out how some of the top companies in the world are building big data solutions with Azure Data Lake, Azure HDInsight, Azure SQL Data Warehouse, and Azure Analysis Services. We cover some of the reference architectures of these companies, best practices, and sample some of the new features that enable insight at the speed of thought.

 

Learn more about Azure Analysis Services and the Azure Analysis Services web designer.
Source: Azure

Announcing Azure Building Blocks

I am pleased to announce the release of Azure Building Blocks, a set of tools and Azure Resource Manager templates designed to simplify the deployment of Azure resources. Azure Resource Manager templates are very powerful, yet can be very complicated to author and maintain. With Azure Building Blocks, you don't need to author any templates: you only specify the parameter settings for the resources you want to deploy. These parameter settings are then merged with best-practice defaults and deployed using the azbb command-line tool.
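Conceptually, the merge works like overlaying your settings on a dictionary of defaults. The sketch below is illustrative only — it is not the actual azbb merge code, and the default values shown are assumptions:

```python
# Illustrative sketch of merging user parameters over best-practice defaults
# (NOT the actual azbb implementation; the default values are hypothetical).

def merge(defaults, overrides):
    """Recursively overlay user-supplied settings on top of defaults."""
    result = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)   # merge nested settings
        else:
            result[key] = value                       # user value wins
    return result

defaults = {"osType": "windows", "diagnostics": {"enabled": True}, "vmCount": 1}
user = {"vmCount": 3, "namePrefix": "test"}
settings = merge(defaults, user)
```

The user only wrote two settings, yet the deployed configuration still carries every best-practice default they did not override.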

Azure Building Blocks currently support the following resources:

Virtual Networks (including gateways and peering)
Network Security Groups
User Defined Routes
Windows and Linux VMs
VM Extensions
Internal and External Load Balancers
Application Gateway

Infrastructure as Code

Azure Building Blocks enables you to manage your deployment in a simple, declarative way. Your parameter files can sit side-by-side with your application's code. The Azure Building Block command line tool can be integrated into your deployment pipeline.

Simplifying Resource Deployment

For example, the parameters to deploy three identical VMs: 

"type": "VirtualMachine",

"settings": {

    "vmCount": 3,

    "osType": "windows",

    "namePrefix": "test",

    "adminPassword": "<your password>",

    "nics": [{"subnetName": "web"}],

    "virtualNetwork": {"name": "ra-vnet"}

}

The Azure Building Blocks command line tool merges the following best practice defaults when it deploys the VMs to Azure:

Enables diagnostics on all VMs.
Deploys the VMs in an availability set.
All VM disks are managed.
The operating system is the latest Windows Server 2016 image (or latest Ubuntu 16.04 image for Linux).
A public IP is created for each VM.

Placing the VMs in a scale set is as easy as specifying one more parameter:

"type": "VirtualMachine",

"settings": {

    "vmCount": 3,

    "osType": "windows",

    "namePrefix": "test",

    "adminPassword": "<your password>",

    "nics": [{"subnetName": "web"}],

    "virtualNetwork": {"name": "ra-vnet"},

    "scaleSetSettings": {}

}

Getting Started

To get started, clone the Azure Building Blocks repository on GitHub and install the Azure Building Blocks command-line tool. You can even install it in Azure Cloud Shell. Learn how to create your own parameter file, then learn how to create larger and more complex architectures by going through our tutorials. The tutorials are still being written, so keep checking back as more are added.

Now go deploy some stuff to Azure!
Source: Azure

Announcing the preview of Java support for Azure Functions

Serverless provides a great model for accelerating app development, but developers want to do it using the programming languages and development tools of their choice. Ever since we first released Azure Functions, support for Java has been a top request. Today, at JavaOne in San Francisco, we’re announcing the public preview of Java support in Azure Functions.

With the recently announced capability to run the open source Azure Functions runtime on cross-platform .NET Core, we've architected our runtime to broaden support for different programming languages. Java is the first new language we are introducing in this public preview. The new Java runtime shares all the differentiated features provided by Azure Functions, such as the wide range of triggering options and data bindings, the serverless execution model with auto-scale, and pay-per-execution pricing.

As a Java developer, you don't need any new tools to develop with Azure Functions. In fact, with our newly released Maven plugin, you can create, build, and deploy Azure Functions from your existing Maven-enabled projects. The new Azure Functions Core Tools let you run and debug your Java Functions code locally on any platform.

Figure 1: Azure Functions project in Java created using Maven Archetype

What is even more exciting is that popular IDEs and editors like Eclipse, IntelliJ, and VS Code can be used to develop and debug Azure Functions locally.

Figure 2: A serverless function in Java debugged using Visual Studio Code

To get started, look at the Azure Functions Java tutorial to create your first Java function and deploy it to Azure using Maven and Jenkins today. Also, if you’re attending JavaOne, join our sessions and swing by the Azure booth to learn more about building serverless apps in Azure with Java!

Next steps

Build your first serverless Java function using the instructions in this tutorial.
Watch Java and Maven in action for Azure Functions.
Find the complete Azure Functions Java developer reference.
Learn how to debug serverless Java functions locally using VS Code.
Follow our progress on our GitHub repository.
Learn about all the great things you can do with Java on Azure.

Source: Azure

Provision on-demand Spark clusters on Docker using Azure Batch's infrastructure

Since its release 3 years ago, Apache Spark has soared in popularity amongst Big Data users, but is also increasingly common in the HPC space. However, spinning up a Spark cluster, on-demand, can often be complicated and slow. Instead, Spark developers often share pre-existing clusters managed by their company’s IT team. In these scenarios, Spark developers run their Spark applications on static clusters that are in constant flux between under-utilization and insufficient capacity. You’re either out of capacity, or you’re burning dollars on idle nodes.

I’m excited to announce our beta release of the Azure Distributed Data Engineering Toolkit – an open source python CLI tool that allows you to provision on-demand Spark clusters and submit Spark jobs directly from your CLI.

After closely studying how Spark users interact with their clusters, we designed the Azure Distributed Data Engineering Toolkit to extend the native Spark experience, allowing you to provision clusters, and giving you the full end-to-end experience of running Spark at scale. Furthermore, this toolkit inherits Azure Batch’s fast provision time, taking only 3-5 minutes to provision your Spark cluster. With a Spark native experience and fast spin-up time, this toolkit allows you to easily run your Spark experiments, enabling you to do more, easily and in less time.

For those of you who need specific software pre-installed for your Spark application, this toolkit also gives you the ability to bring your own Docker image, making setup simple and reproducible.

The Azure Distributed Data Engineering Toolkit is free to use – you only pay for the cores you consume. Because it is built on Azure Batch, it has the ability to provision low-priority VMs, letting you run your Spark jobs at an 80% discount, making it a great tool for experimentation, testing, and other low priority work.

Today, this toolkit only supports Spark, however, we plan to support other distributed data engineering frameworks in a similar vein.

Create your Spark cluster

Once you have the Azure Distributed Data Engineering Toolkit installed, you can start by creating a Spark cluster with this simple CLI command:

$ aztk spark cluster create \
    --id <my_spark_cluster_id> \
    --size <number_of_nodes> \
    --vm-size <vm_size>

Instead of using --size, you can optionally use --size-low-priority to provision low-priority VMs.

You can also add the --username and --ssh-key parameters to create a user for the cluster. This can optionally be done in a separate aztk spark cluster add-user command.

Submit a job

Once your Spark cluster is ready, submit jobs against it with the aztk spark cluster submit command, which follows the standard spark-submit syntax.

$ aztk spark cluster submit \
    --id <my_spark_cluster> \
    --name <my_spark_job_name> \
    [options] \
    <app jar | python file> \
    [app parameters]

And just like with the standard spark-submit command, the output will be streamed to the console.

Quick start demo

Here’s an example of how you can get a 2TB Spark cluster with low-priority VMs (80% discounted) in about 5 minutes:

$ aztk spark cluster create --id my_cluster --size-low-pri 16 --vm-size Standard_E16_v3

# Grab some coffee…

# BOOM! Your 2TB Spark cluster is up. Start submitting jobs!
$ aztk spark cluster submit --id my_cluster --name my_massive_job my_pyspark_app.py

Interactive mode

To get started, most users will want to work interactively with their Spark clusters. The Azure Distributed Data Engineering Toolkit supports working interactively with the aztk spark cluster ssh command that helps you ssh into the cluster’s master node, but also helps you port-forward your Spark Web UI and Spark Jobs UI to your local machine:

$ aztk spark cluster ssh --id <my_spark_cluster_id>

By default, we port forward the Spark Web UI to localhost:8080, Spark Jobs UI to localhost:4040, and Jupyter to localhost:8888. These defaults can be configured in the .thunderbolt/ssh.yaml file, if needed.

Once you run the command, you should be able to interact with your Spark Web UI by going to localhost:8080 on your local machine:

Managing your Spark cluster(s)

Check out some of the other commands you can use to manage your Spark cluster(s):

# Get a summary of all the Spark clusters you have created with Azure Thunderbolt
$ aztk spark cluster list

# Get a summary on a specific Spark cluster
$ aztk spark cluster get --id <my_spark_cluster_id>

# Delete a specific Spark cluster
$ aztk spark cluster delete --id <my_spark_cluster_id>

We look forward to you using these capabilities and hearing your feedback. Please contact us at askaztk@microsoft.com for feedback or feel free to contribute to our Github repository.

Additional information

Download and get started with the Azure Distributed Data Engineering Toolkit
Please feel free to submit issues via Github

Additional resources

See Azure Batch, the underlying Azure service used by the Azure Distributed Data Engineering Toolkit
More general purpose HPC on Azure

Source: Azure