Run massively parallel R jobs in Azure, now at a fraction of the price

We continue to add new capabilities to our lightweight R package, doAzureParallel, built on top of Azure Batch, which allows you to easily use Azure's flexible compute resources right from your R session. Combined with the recently announced low-priority VMs on Azure Batch, you can now run your parallel R jobs at a fraction of the price. We have also included other commonly requested capabilities to help you do more, and do it more easily, with doAzureParallel.

Using R with low priority VMs to reduce cost

Our second major release comes with full support for low-priority VMs, letting R users run their jobs on Azure’s surplus compute capacity at up to an 80% discount.

For data scientists, low-priority VMs are a great way to save costs when experimenting and testing algorithms, such as parameter tuning (parameter sweeps) or comparing different models entirely. Batch takes care of any preempted low-priority nodes by automatically rescheduling their work on other nodes.

You can also mix on-demand (dedicated) nodes and low-priority nodes. Supplementing your dedicated nodes with low-priority nodes gives you a guaranteed baseline capacity plus extra compute power to finish your jobs faster. You can also use autoscale to spin up dedicated nodes that replace any preempted low-priority nodes, maintaining your capacity and ensuring that your job completes when you need it.
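As an illustration, a cluster mixing dedicated and low-priority nodes can be described in the cluster config file. This is a hypothetical sketch of such a config; the exact field names and schema depend on your version of doAzureParallel, so use the file produced by generateClusterConfig as your starting point:

```json
{
  "name": "my-lowpri-pool",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": { "min": 2, "max": 2 },
    "lowPriorityNodes": { "min": 0, "max": 10 }
  }
}
```

Here the two dedicated nodes provide the guaranteed baseline, while up to ten low-priority nodes add discounted burst capacity.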

Other new features

Aside from the scenarios that low-priority VMs enable, this release includes additional tools and commonly requested features to help you do the following:

Parameter tuning and cross-validation with caret
Job management and monitoring to make it easier to run long-running R jobs
Resource files to preload data onto your cluster
Additional utilities to read from and write to Azure Blob storage
ETL and data prep with Hadley Wickham's plyr
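For example, caret can fan resampling and parameter-sweep work out to whatever foreach backend is registered. The following is a minimal sketch, assuming a doAzureParallel cluster has already been created and registered (as shown in the getting-started steps below); the model and tuning grid are illustrative, not from the original post:

```r
library(caret)

# caret distributes the cross-validation folds and tuning-grid points
# across the registered parallel backend (allowParallel is TRUE by default).
model <- train(
  Species ~ ., data = iris,
  method = "rf",                                      # random forest
  trControl = trainControl(method = "cv", number = 10),
  tuneGrid  = expand.grid(mtry = 1:4)                 # parameter sweep
)
```

With an Azure backend registered, each fold/parameter combination can run on a different node in the cluster.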

Getting started with doAzureParallel

doAzureParallel is extremely easy to use. With just a few lines of code, you can register Azure as your parallel backend, which can then be used by foreach, caret, plyr, and many other popular open source packages.

Once you install the package, getting started is as simple as a few lines of code:

# 1. Generate your credentials config and fill it out with your Azure information
generateCredentialsConfig("credentials.json")

# 2. Set your credentials
setCredentials("credentials.json")

# 3. Generate your cluster config to customize your cluster
generateClusterConfig("cluster.json")

# 4. Create your cluster in Azure, passing it your cluster config file
cluster <- makeCluster("cluster.json")

# 5. Register the cluster as your parallel backend
registerDoAzureParallel(cluster)

# Now you are ready to use Azure as your parallel backend for foreach, caret, plyr, and many more
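Once the backend is registered, a foreach loop with %dopar% runs its iterations on the Azure cluster instead of your local machine. This is a minimal sketch (the loop body is illustrative); it assumes the cluster object from the steps above:

```r
library(foreach)
library(doAzureParallel)

# Each iteration of the loop is dispatched to a node in the Azure cluster.
results <- foreach(i = 1:10, .combine = c) %dopar% {
  mean(rnorm(1000, mean = i))
}

# Shut the cluster down when you are finished to stop incurring charges.
stopCluster(cluster)
```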

For more information, visit the doAzureParallel GitHub page for a full getting started guide, samples, and documentation.

We look forward to you using these capabilities and hearing your feedback. Please contact us at razurebatch@microsoft.com, or feel free to contribute to our GitHub repository.

Additional information:

Download and get started with doAzureParallel
For questions related to using the doAzureParallel package, please see our docs, or feel free to reach out to razurebatch@microsoft.com
Please submit issues via GitHub

Additional resources:

See Azure Batch, the underlying Azure service used by the doAzureParallel package
More general purpose HPC on Azure
Learn more about low-priority VMs
Visit our previous blog post on doAzureParallel

Source: Azure