New Quick Start deploys the IRAP PROTECTED compliance reference architecture on the AWS Cloud

This Quick Start automatically deploys the Information Security Registered Assessors Program (IRAP) PROTECTED reference architecture on the AWS Cloud in about an hour. It is intended for users who want to build cloud-based workloads with AWS controls that meet the requirements of the Australian Cyber Security Centre (ACSC) Information Security Manual (ISM) for handling sensitive government data at the PROTECTED classification level.
Source: aws.amazon.com

Amazon EC2 Auto Scaling now supports enabling and disabling scaling policies

With Amazon EC2 Auto Scaling, you can now enable and disable target tracking, step scaling, and simple scaling policies. Individual scaling policies can be temporarily disabled during events such as maintenance windows. Once you are ready, you can simply re-enable the scaling policies, so they no longer need to be deleted and recreated from scratch.
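As an illustration of how this might be scripted, here is a minimal Python sketch using boto3; the Auto Scaling group name, policy name, and target value are hypothetical, and the call simply re-submits the policy with the new Enabled flag:

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Temporarily disable a target tracking policy, e.g. for a maintenance window.
    # Calling put_scaling_policy again with Enabled=True re-activates it later,
    # so the policy never has to be deleted and recreated from scratch.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="my-asg",            # hypothetical group name
        PolicyName="cpu-target-tracking",         # hypothetical policy name
        PolicyType="TargetTrackingScaling",
        TargetTrackingConfiguration={
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": 50.0,
        },
        Enabled=False,  # set back to True after the maintenance window
    )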
Source: aws.amazon.com

Amazon RDS for SQL Server now supports Z1d instances

Starting today, you can launch Z1d instance types when using Amazon RDS for SQL Server in the following AWS Regions:
North America

US East (N. Virginia), US East (Ohio), US West (Oregon), and US West (N. California).

Europe, Middle East, and Africa

EU (Ireland), EU (Frankfurt), and EU (London).

Asia Pacific

Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Tokyo), Asia Pacific (Singapore), and Asia Pacific (Sydney).

Source: aws.amazon.com

OKD4 Update and Roadmap with Christian Glombek

In case you were wondering what’s going on with OKD, Red Hat’s Christian Glombek delivered an update on the project, its current state and its roadmap. Here’s the full talk in video form.
Source: OpenShift

Your ML workloads cheaper and faster with the latest GPUs

Running ML workloads more cost effectively

Google Cloud wants to help you run your ML workloads as efficiently as possible. To do this, we offer many options for accelerating ML training and prediction, including many types of NVIDIA GPUs. This flexibility is designed to let you get the right tradeoff between cost and throughput during training, or cost and latency for prediction.

We recently reduced the price of NVIDIA T4 GPUs, making AI acceleration even more affordable. In this post, we'll revisit some of the features of recent-generation GPUs, like the NVIDIA T4, V100, and P100. We'll also touch on native 16-bit (half-precision) arithmetic and Tensor Cores, both of which provide significant performance boosts and cost savings. We'll show you how to use these features, and how the performance benefit of using 16-bit and automatic mixed precision for training often outweighs the higher list price of NVIDIA's newer GPUs.

Half-precision (16-bit float)

The half-precision floating-point format (FP16) uses 16 bits, compared to 32 bits for single precision (FP32). Storing FP16 data reduces a neural network's memory usage, which allows for training and deployment of larger networks, and faster data transfers than FP32 and FP64.

(Figure: 32-bit float structure. Source: Wikipedia)
(Figure: 16-bit float structure. Source: Wikipedia)

Execution time of ML workloads can be sensitive to memory and/or arithmetic bandwidth. Half-precision halves the number of bytes accessed, reducing the time spent in memory-limited layers. Lowering the required memory lets you train larger models or train with larger mini-batches.

The FP16 format is not new to GPUs. In fact, it has been supported as a storage format for many years on NVIDIA GPUs, and high-performance FP16 is supported at full speed on NVIDIA T4, V100, and P100 GPUs. 16-bit precision is a great option for running inference applications; however, if you train a neural network entirely at this precision, the network may not converge to the required accuracy levels without higher-precision result accumulation.

Automatic mixed precision mode in TensorFlow

Mixed precision uses both FP16 and FP32 data types when training a model. Mixed-precision training offers a significant computational speedup by performing operations in half-precision format whenever it's safe to do so, while storing minimal information in single precision to retain as much information as possible in critical parts of the network. Mixed-precision training usually achieves the same accuracy as single-precision training using the same hyperparameters.

NVIDIA T4 and V100 GPUs incorporate Tensor Cores, which accelerate certain types of FP16 matrix math, enabling faster and easier mixed-precision computation. NVIDIA has also added automatic mixed-precision capabilities to TensorFlow.

To use Tensor Cores, FP32 models need to be converted to use a mix of FP32 and FP16. Performing arithmetic operations in FP16 takes advantage of the performance gains of lower-precision hardware (such as Tensor Cores). Due to the smaller representable range of float16, though, performing the entire training run with FP16 tensors can result in gradient underflow and overflow errors. However, performing only certain arithmetic operations in FP16 yields performance gains on compatible hardware accelerators, decreasing training time and reducing memory usage, typically without sacrificing model performance.

TensorFlow supports FP16 storage and Tensor Core math. Models that contain convolutions or matrix multiplications using the tf.float16 data type will automatically take advantage of Tensor Core hardware whenever possible. This process can be configured automatically using automatic mixed precision (AMP). The feature is available on V100 and T4 GPUs, and TensorFlow 1.14 and newer supports AMP natively. Let's see how to enable it.

Manually: Enable automatic mixed precision via the TensorFlow API

Wrap your tf.train or tf.keras.optimizers optimizer as shown in the sketch below. This change applies automatic loss scaling to your model and enables automatic casting to half precision. (Note: To enable mixed precision in TensorFlow 2 Keras, you can use tf.keras.mixed_precision.Policy.)
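The code snippet from the original post did not survive extraction; as a minimal sketch (assuming the TensorFlow 1.14+ graph-rewrite API, tf.train.experimental.enable_mixed_precision_graph_rewrite, and a placeholder Adam optimizer), the wrapping looks roughly like this:

    import tensorflow as tf

    # Build an optimizer as usual; tf.train and tf.keras.optimizers optimizers both work.
    opt = tf.train.AdamOptimizer(learning_rate=1e-3)  # placeholder optimizer and learning rate

    # The rewrite casts eligible ops to FP16 (so they can run on Tensor Cores)
    # and applies dynamic loss scaling to guard against gradient underflow/overflow.
    opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)

    # Use `opt` exactly as before, e.g. train_op = opt.minimize(loss)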
Automatically: Enable automatic mixed precision via an environment variable

When using the NVIDIA NGC TensorFlow Docker image, you simply set one environment variable when launching the container. As an alternative, the environment variable can be set inside the TensorFlow Python script itself, as sketched below.
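The original environment-variable snippets were also stripped; the following is a minimal sketch, assuming the variable name TF_ENABLE_AUTO_MIXED_PRECISION documented by NVIDIA for its NGC TensorFlow containers:

    import os

    # Option 1: pass the variable when starting the NGC container, e.g.
    #   docker run -e TF_ENABLE_AUTO_MIXED_PRECISION=1 ...
    # Option 2: set it at the top of the training script, before TensorFlow
    # builds or executes any graph:
    os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"

    import tensorflow as tf  # import after the variable is set so the rewrite is picked up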
(Note: For a complete AMP example showing the speed-up on training an image classification task on CIFAR-10, check out this notebook.) Also take a look at the list of models that have been tested successfully with mixed precision.

Configure AI Platform to use accelerators

If you want to start taking advantage of the newer NVIDIA GPUs such as the T4, V100, or P100, you need to use the customization options: define a config.yaml file that describes the GPU options you want. The structure of the YAML file represents the Job resource. For example, a config.yaml for a training job can specify Compute Engine machine types with a T4 GPU attached. (Note: For a P100 or V100 GPU the configuration is similar; just replace the type with the correct GPU type, NVIDIA_TESLA_P100 or NVIDIA_TESLA_V100.)

Use the gcloud command to submit the job, including a --config argument pointing to your config.yaml file. This assumes you've set up environment variables, indicated by a $ sign followed by capital letters, for the values of some arguments. You can also submit a job with a similar configuration (Compute Engine machine types with GPUs attached) without using a config.yaml file. (Note: Please verify you are running the latest Google Cloud SDK to get access to the different machine types.)

Hidden cost of low-priced instances

The conventional practice most organizations follow is to select lower-priced cloud instances to save on per-hour compute cost. However, the performance improvements of newer GPUs can significantly reduce the cost of running compute-intensive workloads like AI.

To validate the idea that modern GPUs reduce the total cost of some common training workloads, we trained Google's Neural Machine Translation (GNMT) model, which is used for applications like real-time language translation, on several GPUs. In this particular example we tested the GNMTv2 model using AI Platform Training with custom containers. Simply by using modern hardware like a T4, we were able to train the model at 7% of the cost while obtaining the result eight times faster.

(For details about the setup, please take a look at the NVIDIA site.) Each GPU model was tested using three different runs, and the reported numbers are the averages. Additional costs for storing data (GNMT input data was stored on GCS) are not included, since they are the same for all tests.

A quick note: when calculating the cost of a training job from consumed ML units, use the following formula. The cost of a training job in all available Americas regions is $0.49 per hour per consumed ML unit; in all available Europe and Asia Pacific regions it is $0.54 per hour per consumed ML unit. In this case, to calculate the cost of running the job on the K80, use the formula Consumed ML units * $0.49: 465 * $0.49 = $227.85. The consumed ML units can be found on your Job details page and are equivalent to training units with the duration of the job factored in.
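As a small worked example of that formula in code (the per-unit rates and the 465 consumed ML units are the figures quoted above, not current pricing):

    # Training job cost on AI Platform: consumed ML units x per-unit rate for the region.
    RATE_AMERICAS = 0.49       # USD per consumed ML unit (Americas regions)
    RATE_EUROPE_APAC = 0.54    # USD per consumed ML unit (Europe and Asia Pacific regions)

    def training_cost(consumed_ml_units, rate):
        return consumed_ml_units * rate

    print(training_cost(465, RATE_AMERICAS))  # the K80 run quoted above: 227.85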
Looking at the specific NVIDIA GPUs, we can get more granular on the price-performance proposition.

NVIDIA T4 is well known for its low power consumption and strong inference performance for image and video recognition, natural language processing, and recommendation engines, to name a few use cases. It supports half-precision (16-bit float) and automatic mixed precision for model training, and gives an 8.1x speed boost over the K80 at only 7% of the original cost.

NVIDIA P100 introduced half-precision (16-bit float) arithmetic. Using it gives a 7.6x performance boost over the K80, at 27% of the original cost.

NVIDIA V100 introduced Tensor Cores, which accelerate half-precision and automatic mixed-precision computation. It provides an 18.7x speed boost over the K80 at only 15% of the original cost. In terms of time savings, the time to solution (TTS) was reduced from 244 hours (about 10 days) to just 13 hours (an overnight run).

What about model prediction?

GPUs can also drastically lower latency for online prediction (inference). However, the high-availability demands of online prediction often require keeping machines alive 24/7 and provisioning sufficient capacity in case of failures or traffic spikes. This can make low-latency online prediction expensive.

The latest price cuts to T4s, however, make low-latency, high-availability serving more affordable on the Google Cloud AI Platform. You can deploy your model on a T4 for about the same price as eight vCPUs, but with the low latency and high throughput of a GPU. You can, for example, deploy a TensorFlow model for prediction backed by a single NVIDIA T4 GPU.

Conclusion

Model training and serving on GPUs has never been more affordable. Price reductions, mixed precision, and Tensor Cores accelerate AI performance for training and prediction when compared to older GPUs such as the K80. As a result, you can complete your workloads much faster, saving both time and money. To leverage these capabilities and reduce your costs, we recommend the following rules of thumb:

If your training job is short-lived (under 20 minutes), use a T4, since it is the cheapest per hour.
If your model is relatively simple (fewer layers, a smaller number of parameters, and so on), use a T4, since it is the cheapest per hour.
If you want the fastest possible runtime and have enough work to keep the GPU busy, use a V100.
To take full advantage of the newer NVIDIA GPUs, use 16-bit precision on the P100 and enable mixed precision mode when using a T4 or V100.

If you haven't explored GPUs for model prediction or inference, take a look at our GPUs on Compute Engine page for more details. For more information on getting started, check out our blog post on the topic.

References
Cheaper Cloud AI deployments with NVIDIA T4 GPU price cut
Efficiently scale ML and other compute workloads on NVIDIA's T4 GPU, now generally available

Acknowledgements: Special thanks to the following people who contributed to this post. NVIDIA: Alexander Tsado, Cloud Product Marketing Manager. Google: Henry Tappen, Product Manager; Robbie Haertel, Software Engineer; Viesturs Zarins, Software Engineer.

1. Price is calculated as described here: Consumed ML units * unit cost (which differs per region).
Source: Google Cloud Platform

Preview of Active Directory authentication support on Azure Files

We are excited to announce the preview of Azure Files Active Directory (AD) authentication. You can now mount your Azure file shares using AD credentials with the exact same access control experience as on-premises. You can use an Active Directory domain service hosted either on-premises or on Azure to authenticate user access to Azure Files, for both the premium and standard tiers. Managing file permissions is also simple. As long as your Active Directory identities are synced to Azure AD, you can continue to manage share-level permissions through standard role-based access control (RBAC). For directory- and file-level permissions, you simply configure Windows ACLs (NTFS DACLs) using Windows File Explorer, just like on any regular file share. Many of you have already synced your on-premises Active Directory to Azure AD as part of Office 365 or Azure adoption and are ready to take advantage of this new capability today.

When considering a migration of file servers to the cloud, many organizations decide to keep their existing Active Directory infrastructure and move the data first. With this preview release, we have made it seamless for Azure Files to work with your existing Active Directory, with no change to the client environment. You can log into an Active Directory domain-joined machine and access an Azure file share with a single sign-on experience. In addition, you can carry over all existing NTFS DACLs that have been configured on your directories and files over the years and have them continue to be enforced as before. Simply migrate your files with their ACLs using common tools like Robust File Copy (robocopy), or orchestrate tiering from on-premises Windows file servers to Azure Files with Azure File Sync.

With AD authentication, Azure Files can better serve as the storage solution for Virtual Desktop Infrastructure (VDI) user profiles. Most commonly, you have set up the VDI environment with Windows Virtual Desktop as an extension of your on-premises workspace while continuing to use Active Directory to manage the hosting environment. By using Azure Files as the user profile storage, when a user logs into a virtual session, only the profile of the authenticated user is loaded from Azure Files. You don't need to set up a separate domain service to manage storage access control for your VDI environment. Azure Files provides a scalable, cost-efficient, serverless file storage solution for hosting user profile data. To learn more about using Azure Files for Windows Virtual Desktop scenarios, refer to this article.

What’s new?

Below is a summary of the key capabilities introduced in the preview:

Enable Active Directory Domain Services (AD DS) authentication for server message block (SMB) access. You can mount Azure file shares from Active Directory domain-joined machines, either on-premises or on Azure, using Active Directory credentials. Azure Files supports using Active Directory as the directory service for identity-based access control on both the premium and standard tiers. You can enable Active Directory authentication on self-managed or Azure File Sync-managed file shares.
Enforce share-level and directory- or file-level permissions. The existing access control experience continues to be enforced for file shares with Active Directory authentication enabled. You can use RBAC for share-level permission management, then persist or configure directory- or file-level NTFS DACLs using Windows File Explorer and the icacls tool.
Support file migration from on-premises with ACL persistence over Azure File Sync. Azure File Sync now supports persisting ACLs on Azure Files in the native NTFS DACL format. You can use Azure File Sync for seamless migration from on-premises Windows file servers to Azure Files. Existing files and directories tiered to Azure Files through Azure File Sync have their ACLs persisted in the native format.

Get started and share your experiences

You can create a file share in the preview-supported regions and enable authentication with your Active Directory environment running on-premises or on Azure. Here are the documentation links to the detailed guidance on the feature capabilities and step-by-step enablement.

As always, you can share your feedback and experience over email at azurefiles@microsoft.com. Post your ideas and suggestions about Azure Storage on our feedback forum.
Source: Azure