PyTorch on Google Cloud: How To train PyTorch models on AI Platform

PyTorch is an open source machine learning and deep learning library, primarily developed by Facebook, used in a widening range of use cases for automating machine learning tasks at scale, such as image recognition, natural language processing, translation, and recommender systems. PyTorch has been used predominantly in research, and in recent years it has gained tremendous traction in industry as well, thanks to its ease of use and deployment. Google Cloud AI Platform is a fully managed, end-to-end platform for data science and machine learning on Google Cloud. Leveraging Google's expertise in AI, AI Platform offers a flexible, scalable and reliable platform to run your machine learning workloads. AI Platform has built-in support for PyTorch through Deep Learning Containers that are performance optimized, compatibility tested and ready to deploy. In this new series of blog posts, PyTorch on Google Cloud, we aim to share how to build, train and deploy PyTorch models at scale, and how to create reproducible machine learning pipelines on Google Cloud.

Why PyTorch on Google Cloud AI Platform?

Cloud AI Platform provides flexible, scalable hardware and secured infrastructure to train and deploy PyTorch-based deep learning models.

- Flexibility: AI Platform Notebooks and AI Platform Training give you the flexibility to design compute resources that match any workload, while the platform manages the bulk of the dependencies, networking and monitoring under the hood. Spend your time building models, not worrying about infrastructure.
- Scalability: Run your experiments with AI Platform Notebooks using pre-built PyTorch containers or custom containers, and scale your code with high availability using AI Platform Training by training models on GPUs or TPUs.
- Security: AI Platform leverages the same global-scale technical infrastructure designed to provide security through the entire information processing lifecycle at Google.
- Support: AI Platform collaborates closely with PyTorch and NVIDIA to ensure top-notch compatibility between AI Platform and NVIDIA GPUs, including PyTorch framework support.

[Figure: a quick reference of support for PyTorch on Google Cloud]

In this post, we will cover:

- Setting up a PyTorch development environment on JupyterLab notebooks with AI Platform Notebooks
- Building a sentiment classification model using PyTorch and training it on AI Platform Training

You can find the accompanying code for this blog post in the GitHub repository and the Jupyter notebook. Let's get started!

Use case and dataset

In this article we fine-tune a transformer model (BERT-base) from the Hugging Face Transformers library for a sentiment analysis task using PyTorch. BERT (Bidirectional Encoder Representations from Transformers) is a Transformer model pre-trained on a large corpus of unlabeled text in a self-supervised fashion. We begin experimentation with the IMDB sentiment classification dataset on AI Platform Notebooks. We recommend using an AI Platform Notebooks instance with limited compute for development and experimentation purposes. Once we are satisfied with the local experiment in the notebook, we show how you can submit the same Jupyter notebook to the AI Platform Training service to scale the training with bigger GPU shapes. The AI Platform Training service optimizes the training pipeline by spinning up infrastructure for the training job and spinning it down after the training is complete, without you having to manage the infrastructure. In upcoming posts, we will show how you can deploy and serve these PyTorch models on the AI Platform Prediction service.

Creating a development environment on AI Platform Notebooks

We will be working with JupyterLab notebooks as our development environment on AI Platform Notebooks. Before you begin, you must set up a project on Google Cloud Platform with the AI Platform Notebooks API enabled. Note that you will be charged when you create an AI Platform Notebooks instance; you pay only for the time your notebook instance is up and running. You can choose to stop the instance, which saves your work and charges only for the boot disk storage until you restart it. Delete the instance once you are done.

You can create an AI Platform Notebooks instance in one of two ways:

- using the pre-built PyTorch image from the AI Platform Deep Learning VM (DLVM) Image, or
- using a custom container with your own packages.

Creating a Notebook instance with the pre-built PyTorch DLVM image

AI Platform Notebooks instances are AI Platform Deep Learning VM Image instances with JupyterLab notebook environments enabled and ready for use. AI Platform Notebooks offers a PyTorch image family supporting multiple PyTorch versions. You can create a new notebook instance from the Google Cloud Console or the command line interface (CLI). We will use the gcloud CLI, from Cloud Shell or any terminal where the Cloud SDK is installed, to create a Notebook instance backed by an NVIDIA Tesla T4 GPU.
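The exact command was not preserved in this copy of the post; the following is a minimal sketch of a gcloud invocation that creates a notebook instance from the PyTorch DLVM image family with a single T4 GPU. The instance name, machine type and zone are illustrative.

# Sketch: create an AI Platform Notebooks instance from the PyTorch DLVM
# image family with one NVIDIA Tesla T4 GPU attached.
gcloud notebooks instances create pytorch-nb-instance \
  --vm-image-project=deeplearning-platform-release \
  --vm-image-family=pytorch-latest-gpu \
  --machine-type=n1-standard-8 \
  --accelerator-type=NVIDIA_TESLA_T4 \
  --accelerator-core-count=1 \
  --install-gpu-driver \
  --location=us-central1-a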
To interact with the new notebook instance, go to the AI Platform Notebooks page in the Google Cloud Console and click the "OPEN JUPYTERLAB" link next to the new instance, which becomes active when the instance is ready to use.

Most of the libraries needed for experimenting with PyTorch are already installed on the new instance via the pre-built PyTorch DLVM image. To install additional dependencies, run %pip install <package-name> from the notebook cells. For the sentiment classification use case, we will install additional packages such as the Hugging Face transformers and datasets libraries.

Notebook instance with custom container

An alternative to installing dependencies with pip in the Notebook instance is to package the dependencies inside a Docker container image derived from the AI Platform Deep Learning Container images, creating a custom container. You can use this custom container to create AI Platform Notebooks instances or AI Platform Training jobs. Here is how to create a Notebook instance using a custom container (a combined sketch of all three steps follows the list):

1. Create a Dockerfile with one of the AI Platform Deep Learning Container images as the base image (here we are using the PyTorch 1.7 GPU image) and run/install the packages or frameworks you need. For the sentiment classification use case, include the transformers and datasets libraries.
2. Build an image from the Dockerfile using Cloud Build from a terminal or Cloud Shell, and note the image location, gcr.io/{project_id}/{image_name}.
3. Create a notebook instance with the custom image created in step 2 using the command line.
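The original commands are not reproduced here; this is a minimal sketch of the three steps, assuming the PyTorch 1.7 GPU Deep Learning Container as the base image. The image name, instance name and zone are illustrative.

# Step 1: Dockerfile deriving a custom image from a Deep Learning Container
FROM gcr.io/deeplearning-platform-release/pytorch-gpu.1-7
RUN pip install transformers datasets

# Step 2: build and push the image with Cloud Build (run from the Dockerfile directory)
gcloud builds submit --tag gcr.io/${PROJECT_ID}/pytorch-custom-container .

# Step 3: create the notebook instance from the custom container
gcloud notebooks instances create pytorch-custom-nb \
  --container-repository=gcr.io/${PROJECT_ID}/pytorch-custom-container \
  --container-tag=latest \
  --machine-type=n1-standard-8 \
  --location=us-central1-a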
Training a PyTorch model on AI Platform Training

After creating the AI Platform Notebooks instance, you can start with your experiments. Let's look at the model specifics for the use case.

The model specifics

For analyzing sentiments of the movie reviews in the IMDB dataset, we will fine-tune a pre-trained BERT model from Hugging Face. Fine-tuning involves taking a model that has already been trained for a given task and tweaking it for another, similar task. Specifically, the tweaking involves replicating all the layers in the pre-trained model, including their weights and parameters, except the output layer, and then adding a new output classifier layer that predicts labels for the current task. The final step is to train the output layer from scratch while the parameters of all layers from the pre-trained model are frozen. This allows learning from the pre-trained representations and "fine-tuning" the higher-order feature representations that are more relevant to the concrete task, such as analyzing sentiments in this case.

For the scenario at hand, analyzing sentiments, the pre-trained BERT model already encodes a lot of information about the language, since the model was trained on a large corpus of English text in a self-supervised fashion. We only need to tune it slightly, using its outputs as features for the sentiment classification task. This means quicker development iteration on a much smaller dataset, instead of training a specific Natural Language Processing (NLP) model on a larger training dataset.

[Figure: pre-trained model with a classification layer. The blue box indicates the pre-trained BERT encoder module; the output of the encoder is pooled into a linear layer with the number of outputs equal to the number of target labels (classes).]

For training the sentiment classification model, we will:

- preprocess and transform (tokenize) the reviews data,
- load the pre-trained BERT model and add a sequence classification head for sentiment analysis, and
- fine-tune the BERT model for sentence classification.

The following is a snippet of code to preprocess the data and fine-tune a pre-trained BERT model. Please refer to the Jupyter notebook for the complete code and a detailed explanation of these tasks.
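The snippet itself did not survive in this copy of the post; below is a minimal sketch of the flow using the Hugging Face datasets and transformers APIs, consistent with the details discussed next (a 2e-5 learning rate set in TrainingArguments and an accuracy-only compute_metrics). The checkpoint name, epoch count and batch size are illustrative.

import numpy as np
from datasets import load_dataset, load_metric
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load the IMDB reviews dataset and tokenize the review text
dataset = load_dataset('imdb')
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')

def tokenize(batch):
    return tokenizer(batch['text'], padding='max_length', truncation=True)

encoded = dataset.map(tokenize, batched=True)

# Pre-trained BERT encoder with a new sequence classification head (2 labels)
model = AutoModelForSequenceClassification.from_pretrained('bert-base-cased', num_labels=2)

# Only accuracy is reported; extend compute_metrics to report other metrics
metric = load_metric('accuracy')
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return metric.compute(predictions=np.argmax(logits, axis=-1), references=labels)

# A very small learning rate (2e-5) because the encoder weights are not frozen
training_args = TrainingArguments(output_dir='./results', learning_rate=2e-5,
                                  num_train_epochs=2, per_device_train_batch_size=16)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=encoded['train'], eval_dataset=encoded['test'],
                  compute_metrics=compute_metrics)
trainer.train()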
In the snippet above, notice that the encoder (also referred to as the base model) weights are not frozen. This is why a very small learning rate (2e-5) is chosen: to avoid loss of the pre-trained representations. The learning rate and other hyperparameters are captured under the TrainingArguments object. During training we capture only accuracy metrics; you can modify the compute_metrics function to capture and report other metrics. We will explore integration with the Cloud AI Platform Hyperparameter Tuning service in the next post of this series.

Training the model on Cloud AI Platform

While you can do local experimentation on your AI Platform Notebooks instance, larger datasets or models often require a vertically scaled compute resource or horizontally distributed training. The most effective way to perform such a task is with the AI Platform Training service. AI Platform Training takes care of creating the designated compute resources required for the task, performs the training task, and ensures deletion of the compute resources once the training job is finished.

Before running the training application with AI Platform Training, the training application code and its required dependencies must be packaged and uploaded to a Google Cloud Storage bucket that your Google Cloud project can access. There are two ways to package the application and run it on AI Platform Training:

- package the application and its Python dependencies manually using Python setup tools, or
- use custom containers to package dependencies with Docker.

You can structure your training code in any way you prefer. Please refer to the GitHub repository or the Jupyter notebook for our recommended approach to structuring training code.

Using Python packaging to build manually

For this sentiment classification task, we package the training code together with its standard Python dependencies (transformers, datasets and tqdm) via the setup.py file. The find_packages() call inside setup.py includes the training code in the package.
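A minimal sketch of such a setup.py, assuming the training code lives in a trainer package; the actual names and pinned versions in the post's repository may differ.

from setuptools import find_packages, setup

setup(
    name='pytorch-sentiment-trainer',  # illustrative package name
    version='0.1',
    packages=find_packages(),          # picks up the training code package
    install_requires=[
        'transformers',
        'datasets',
        'tqdm',
    ],
    description='Fine-tunes BERT for sentiment classification on AI Platform Training.',
)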
Now you can submit the training job to Cloud AI Platform Training using the gcloud command from Cloud Shell or a terminal with the gcloud SDK installed. The gcloud ai-platform jobs submit training command stages the training application in a GCS bucket and submits the training job. We attach two NVIDIA Tesla T4 GPUs to the training job to accelerate the training.
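The command itself was dropped from this copy of the post; here is a sketch of a plausible invocation. The job name, bucket, region, machine shapes, runtime version and module name are illustrative and would need to match your actual package.

JOB_NAME=pytorch_sentiment_$(date +%Y%m%d_%H%M%S)

# Sketch: custom machine with two T4 GPUs; the trainer package is staged to GCS
gcloud ai-platform jobs submit training ${JOB_NAME} \
  --region=us-central1 \
  --scale-tier=CUSTOM \
  --master-machine-type=n1-standard-8 \
  --master-accelerator=type=nvidia-tesla-t4,count=2 \
  --job-dir=gs://your-bucket/${JOB_NAME} \
  --package-path=./trainer \
  --module-name=trainer.task \
  --runtime-version=2.4 \
  --python-version=3.7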
Training with custom containers

To create a training job with a custom container, you define a Dockerfile that installs the dependencies required for the training job, then build and test the Docker image locally to verify it before using it with AI Platform Training. Before submitting the training job, you push the image to the Google Cloud Container Registry, then submit the job to Cloud AI Platform Training with the same gcloud ai-platform jobs submit training command, pointing it at your custom image.

Once the job is submitted, you can monitor its status and progress either in the Google Cloud Console or with gcloud commands: gcloud ai-platform jobs describe <job-name> reports the job status, and gcloud ai-platform jobs stream-logs <job-name> streams the job logs. You can also monitor the job status and view the job logs from the Google AI Platform Jobs console.

Let's run prediction calls on the trained model locally with a few examples (refer to the notebook for the complete code). The next post in this series will show you how to deploy this model on the AI Platform Prediction service.

Cleaning up the Notebook environment

After you are done experimenting, you can either stop or delete the AI Platform Notebooks instance. Delete the instance to prevent any further charges; if you want to save your work, stop the instance instead.

What's next?

In this article, we explored Cloud AI Platform Notebooks as a fully customizable IDE for PyTorch model development. We then trained the model on the Cloud AI Platform Training service, a fully managed service for training machine learning models at scale.

References

- Introduction to AI Platform Notebooks
- Getting started with PyTorch | AI Platform Training
- Configuring distributed training for PyTorch | AI Platform Training
- GitHub repository with code and accompanying notebook

In the next installments of this series, we will examine hyperparameter tuning on Cloud AI Platform and deploying PyTorch models on the AI Platform Prediction service. We encourage you to explore the Cloud AI Platform features we have examined. Stay tuned, and thank you for reading! Have a question or want to chat? Find the authors here: Rajesh [Twitter | LinkedIn] and Vaibhav [LinkedIn].

Thanks to Amy Unruh and Karl Weinmeister for helping with and reviewing this post.

Source: Google Cloud Platform

Video: How to Dockerize a Python App with FastAPI

Join host Peter McKee and Python wizard Michael Kennedy for a warts-and-all demo of how to Dockerize a Python app using FastAPI, a popular Python framework. Kennedy is a developer and entrepreneur, and the founder and host of two successful Python podcasts — Talk Python To Me and Python Bytes. He’s also a Python Software Foundation Fellow.

With some skillful back-seat driving by McKee, Kennedy shows how to build a bare-bones web API — in this case one that allows you to ask questions and get answers about movies (director, release date, etc.) — by mashing together a movie service and FastAPI. Next, he shows how to put it into a Docker container, create an app and run it, finally sharing the image on GitHub.
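For a taste of what gets built, here is a minimal, illustrative sketch (not the exact code from the video) of a bare-bones FastAPI endpoint, with the movie lookup stubbed out:

# main.py: a bare-bones FastAPI app; the real demo queries a movie service
from fastapi import FastAPI

app = FastAPI()

@app.get("/movies/{title}")
def movie_info(title: str):
    # Stub: the demo returns real details (director, release date, etc.)
    return {"title": title, "director": "unknown", "release_date": "unknown"}

And a Dockerfile to containerize it, with uvicorn as the server:

FROM python:3.9-slim
WORKDIR /app
COPY . .
RUN pip install fastapi uvicorn
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

From there, docker build -t movie-api . and docker run -p 8000:8000 movie-api get you a running container.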

If you’re looking for a scripted, flawless, pre-recorded demo, this is not the one for you! McKee and Kennedy iterate and troubleshoot their way through the process — which makes this a great place to start if you’re new to Dockerizing Python apps. Install scripts, libraries, automation, security, best practices, and a pinch of Python zen — it’s all here. (Duration 1 hour, 10 mins.)

Join Us for DockerCon LIVE 2021  

Join us for DockerCon LIVE 2021 on Thursday, May 27. DockerCon LIVE is a free, one-day virtual event offering a unique experience for developers and development teams who are building the next generation of modern applications. If you want to learn how to go from code to cloud fast and how to solve your development challenges, DockerCon LIVE 2021 offers engaging live content to help you build, share and run your applications. Register today at https://dockr.ly/2PSJ7vn
Source: https://blog.docker.com/feed/

How to accurately match OVAL security data to installed RPMs

Red Hat publishes security data using the Open Vulnerability and Assessment Language (OVAL). Which streams you need depends on what you have installed: according to the Red Hat and OVAL compatibility FAQ, you need to scan the streams for all products installed on your system. This post aims to answer the question of how to determine which stream to use when scanning a system. We'll use an operating system and a container image as target systems to explore the topic.
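As a concrete illustration (not taken from the post itself), once you have downloaded the OVAL stream for a given product, you can evaluate it against the RPMs installed on the system with the oscap tool from the openscap-scanner package; the stream filename below is a placeholder for whichever stream matches your product.

# Evaluate an OVAL definitions stream against this system's installed packages
oscap oval eval --results results.xml --report report.html rhel-8.oval.xml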
Source: CloudForms

Scheduling Cloud Bigtable Backups

Cloud Bigtable backups let you save a copy of a table's schema and data, then restore from the backup to a new table at a later time. In this tutorial, you'll learn how to create backups at regularly scheduled intervals (such as daily or weekly) using the Cloud Bigtable Scheduled Backups example. The example uses Cloud Scheduler to periodically send backup creation requests as Pub/Sub messages. The Pub/Sub messages trigger a Cloud Function, which initiates a backup using the Cloud Bigtable Java client library; the function could be adapted to any of the clients that are supported in Cloud Functions.

This solution uses the following Google Cloud services:

- Cloud Scheduler, to trigger tasks on a cron-based schedule
- Cloud Pub/Sub, to pass the message request from Cloud Scheduler to Cloud Functions
- Cloud Functions, to initiate an operation for creating a Cloud Bigtable backup
- Cloud Logging, to create logs-based metrics
- Cloud Monitoring, to create alerts based on conditions of the logs-based metrics

System Architecture

[Figure: system architecture diagram]

Costs

This tutorial uses billable components of Google Cloud, including the following:

- Cloud Scheduler
- Cloud Functions
- Cloud Bigtable backups

Use the pricing calculator to generate a cost estimate based on your projected usage.

Before you begin

Before proceeding with the tutorial, ensure the following:

- A Cloud Bigtable table exists in the same Google Cloud project. Check the Cloud Bigtable documentation if needed.
- The Google Cloud SDK is installed.

APIs and IAM Roles Setup

The diagram below focuses on the flow of actions between human roles and APIs.

[Figure: flow of actions between administrator roles and APIs]

IAM Roles for Administrators

The administrator should be granted specific roles to deploy the services needed for the solution:

Role | Purpose
roles/bigtable.admin | Administer Cloud Bigtable
roles/cloudfunctions.admin | Deploy and manage Cloud Functions
roles/deploymentmanager.editor | Deploy monitoring metrics
roles/pubsub.editor | Create and manage Pub/Sub topics
roles/cloudscheduler.admin | Set up a schedule in Cloud Scheduler
roles/appengine.appAdmin | Use Cloud Scheduler to deploy a cron service
roles/monitoring.admin | Set up alerting policies for failure notifications
roles/logging.admin | Add logs-based user metrics to track failures

The administrator also needs to be assigned a custom role that has the following permissions:

- appengine.applications.create, for Cloud Scheduler to create an App Engine app
- serviceusage.services.use, for Cloud Scheduler to use the App Engine app

Service Account for Cloud Functions

The Cloud Function calls the Cloud Bigtable API to create a backup, and is triggered when a message arrives on the Pub/Sub topic. For successful execution, the Cloud Function must be able to consume messages from the Pub/Sub topic and must have permission to create Cloud Bigtable backups. To accomplish this, perform the following steps (a sketch of equivalent gcloud commands follows the list):

1. Create a service account (e.g. cbt-scheduled-backups@<project-id>.iam.gserviceaccount.com).
2. Create a custom role with the following permissions:
   - bigtable.backups.create
   - bigtable.backups.delete
   - bigtable.backups.get
   - bigtable.backups.list
   - bigtable.backups.restore
   - bigtable.backups.update
   - bigtable.instances.get
   - bigtable.tables.create
   - bigtable.tables.readRows
3. Assign the custom role and roles/pubsub.subscriber to the service account. This allows the Cloud Function to read messages from the Pub/Sub topic and initiate a create-backup request.
4. Add the administrator as a member of the service account with the role roles/iam.serviceAccountUser. This allows the administrator to deploy Cloud Functions.
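These steps can be performed in the Cloud Console; as an illustration (not from the original post), roughly equivalent gcloud commands look like the following, with <project-id> and the custom role ID as placeholders:

# 1. Create the service account
gcloud iam service-accounts create cbt-scheduled-backups \
  --display-name="Cloud Bigtable scheduled backups"

# 2. Create a custom role carrying the backup permissions
gcloud iam roles create cbtScheduledBackups --project=<project-id> \
  --permissions=bigtable.backups.create,bigtable.backups.delete,bigtable.backups.get,bigtable.backups.list,bigtable.backups.restore,bigtable.backups.update,bigtable.instances.get,bigtable.tables.create,bigtable.tables.readRows

# 3. Bind the custom role and the Pub/Sub subscriber role to the service account
gcloud projects add-iam-policy-binding <project-id> \
  --member="serviceAccount:cbt-scheduled-backups@<project-id>.iam.gserviceaccount.com" \
  --role="projects/<project-id>/roles/cbtScheduledBackups"
gcloud projects add-iam-policy-binding <project-id> \
  --member="serviceAccount:cbt-scheduled-backups@<project-id>.iam.gserviceaccount.com" \
  --role="roles/pubsub.subscriber"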
Creating Scheduled Backups

Create a Pub/Sub topic

Create a Cloud Pub/Sub topic, cloud-bigtable-scheduled-backups, that serves as the target of the Cloud Scheduler job and triggers the Cloud Function. For example:

gcloud pubsub topics create cloud-bigtable-scheduled-backups --project <project-id>

Then go to the Pub/Sub UI and verify that you can see the newly created topic.

Deploy a function to Cloud Functions

Create and deploy a Cloud Function, cbt-create-backup-function, which is called whenever a Pub/Sub message arrives on the cloud-bigtable-scheduled-backups topic. The deploy-backup-function command in the script scheduled_backups.sh wraps the gcloud invocation that does this:

./scripts/scheduled_backups.sh deploy-backup-function

Go to the Cloud Functions UI to view the function. The function subscribes to the Pub/Sub topic cloud-bigtable-scheduled-backups.
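The example repository implements the function with the Cloud Bigtable Java client. As a rough illustration of the core of such a function, here is a sketch in Python using the google-cloud-bigtable client; the message format, naming scheme and seven-day expiry are assumptions, not the example's actual code.

import base64
import datetime
import json

from google.cloud import bigtable

def create_backup(event, context):
    """Triggered by a Pub/Sub message; starts a Cloud Bigtable backup."""
    # Assumed message format:
    # {"projectId": ..., "instanceId": ..., "tableId": ..., "clusterId": ...}
    msg = json.loads(base64.b64decode(event['data']).decode('utf-8'))

    client = bigtable.Client(project=msg['projectId'], admin=True)
    table = client.instance(msg['instanceId']).table(msg['tableId'])

    # Name the backup after the table plus a timestamp; expire it in 7 days
    backup_id = '{}-backup-{}'.format(
        msg['tableId'], datetime.datetime.utcnow().strftime('%Y%m%d%H%M%S'))
    expire = datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=7)

    backup = table.backup(backup_id, cluster_id=msg['clusterId'], expire_time=expire)
    backup.create()  # returns a long-running operation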
Deploying scheduled jobs using Cloud Scheduler

Note: To use Cloud Scheduler, you must create an App Engine app. This can be done explicitly before the next step, or implicitly while running it.

Now we need to deploy the scheduled backup configuration to Cloud Scheduler. The configuration includes the time schedule of the cron job as well as the Pub/Sub topic name and message. This is also wrapped in a command in the script, and the settings can be specified in the properties file:

./scripts/scheduled_backups.sh create-schedule

The job is now visible in the Cloud Scheduler UI.

Email notification of backup failures

To get email notifications on backup creation failures, follow these steps:

1. Follow this guide to add your email address as a notification channel.

2. Create and deploy a custom metrics configuration file to filter the logs generated by Cloud Functions, Cloud Scheduler, and Cloud Bigtable. We use Deployment Manager to create the custom metrics; the example file can be found in ./config/metrics.yaml. Deploy the custom metrics to Cloud Logging:

./scripts/scheduled_backups.sh add-metrics

After this, you should see two user-defined metrics under Logs-based Metrics in Cloud Logging.

3. From there, choose an aggregator, such as sum or mean, for the target metric, then define a condition that triggers an alert. For example, you can choose the following:

- Condition triggers if: Any time series violates
- Condition: is above
- Threshold: 0
- For: 1 minute

4. Add the notification channels you just created to the alerting policies. Whenever the condition is violated, you will receive an email notification.

Considerations

To use Cloud Scheduler, you must create an App Engine app. Once you set a region for the App Engine app, you cannot change it, and your Cloud Scheduler job must run in the same region as your App Engine app.

Learn More

- To get started, create a Cloud Bigtable instance or try it out with the Cloud Bigtable Qwiklab.
- Check out this GitHub sample for details about this scheduled backup tool.
- Learn more about the managed backups feature of Cloud Bigtable.

Source: Google Cloud Platform