Optimize Dataproc costs using VM machine type

Dataproc is a fast, easy-to-use, fully managed cloud service for running managed open source, such as Apache Spark, Presto, and Apache Hadoop clusters, in a simpler, more cost-efficient way. We hear that enterprises are migrating their big data workloads to the cloud to gain cost advantages with per-second pricing, idle cluster deletion, autoscaling, and more. However, compute, storage, and network costs all add to your overall workload processing costs, so picking the right cloud virtual machine (VM) for compute is a critical piece of the puzzle to make sure you’re using your budget wisely. Moving these big data clusters to the cloud can bring many cost benefits on its own, but the choices you make when deploying these clusters on Google Cloud can add to the savings. As part of configuring a Dataproc cluster, you choose from the available Google Cloud VMs; the workload and its execution times are key factors to consider when selecting VM types. Here’s a quick overview of the VM families you can choose from in Google Cloud:

General-purpose machine types (N1) offer the best price-performance ratio for a variety of workloads. If you are not sure which machine type is best for your workload, a general-purpose machine type is a good place to start.

General-purpose efficient machine types (E2) are ideal for small-to-medium workloads that require at most 16 vCPUs but do not require local SSDs or GPUs. E2 machine types do not offer sustained-use discounts; however, they do provide consistently low on-demand and committed-use pricing.

Memory-optimized machine types (M2) are ideal for tasks that require intensive use of memory with higher memory-to-vCPU ratios, such as in-memory databases and in-memory analytics (for example, SAP HANA and business warehousing (BW) workloads), genomics analysis, SQL analysis services, and so on.

Compute-optimized machine types (C2) are ideal for compute-intensive workloads, such as machine learning algorithms, and offer the highest performance per core on Compute Engine.

For the right Dataproc workloads, E2 VMs have emerged as a way to continue to bend the cost curve down. E2 VMs have similar processing characteristics to, and are up to 31% cheaper than, equivalent N1 VMs, offering the lowest total cost of ownership of any VM in Google Cloud, with reliable and sustained performance along with flexible configurations. Note that sustained-use discounts do not apply to E2 VMs, and E2 VMs do not support local SSDs, sole-tenant nodes, nested virtualization, or processor type selection.

Comparing the standard-4 configurations of the Dataproc-supported VM types available at the time of publishing this article, the e2-standard-4 is about 30% less expensive than the n1-standard-4 configuration; check out the up-to-date VM pricing information for current numbers. E2 VMs reduce the compute costs of a Dataproc cluster, thereby reducing the overall TCO, and committed use discounts add to the realized savings. The Dataproc TCO savings from using E2 VMs will depend on your workload and cluster usage patterns. Along with VM choice, there are some other ways to keep your big data costs down.

Tips to optimize cluster costs

There are some commonly used techniques and features you can use to manage cluster operations costs, including:

VM pricing
Committed use discounts: Predictable workloads can benefit from committed use pricing discounts.

Compute spend
Autoscaling: You can configure Dataproc clusters to scale up and down based on utilization demand.
Job-scoped clusters: A Dataproc cluster can be started in 90 seconds, so workflow- or job-specific clusters can be created for non-SLA-critical workloads, limiting costs to only the time the cluster is running.
Preemptible VMs: Non-SLA-critical workloads can be run on preemptible VMs, further increasing the cost savings.

Compute time
Data reorganization: Understanding query access patterns and optimizing the data store for those patterns results in faster query times and reduced compute costs.
Aggregates vs. raw data: Intermediate aggregates for frequently accessed data also reduce the amount of data scanned, improving query performance and compute costs.

How to select E2 VMs for Dataproc clusters

To change your Dataproc cluster configuration, set the master machine type and worker machine type to the desired E2 machine name at cluster creation time. You can use any of the Dataproc management interfaces to select the E2 VM machine type: the gcloud command, the REST API, the Google Cloud Console, or the language-specific APIs (Go, Java, Node.js, and Python). A minimal gcloud sketch appears at the end of this post.

Your mileage will vary, but customers using E2 VMs can expect savings as long as the VM limitations are acceptable. We highly recommend verifying the workload on a Dataproc cluster with E2 VMs to better understand its performance and cost characteristics prior to making a change in production. We are always looking for ways to deliver the highest cost performance for our customers and continue to add features for you to customize and optimize your Dataproc clusters. Check out our Dataproc page to learn more about what’s new with Dataproc.
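For reference, here is a minimal sketch of creating a cluster with E2 machine types using the gcloud CLI. The cluster name, region, worker counts, and machine sizes are illustrative placeholders, not recommendations:

    # Create a Dataproc cluster whose master and workers run on E2 machine types.
    # All names and sizes below are placeholders; adjust them to your workload.
    # The preemptible workers are optional and suit non-SLA-critical jobs.
    gcloud dataproc clusters create my-e2-cluster \
        --region=us-central1 \
        --master-machine-type=e2-standard-4 \
        --worker-machine-type=e2-standard-4 \
        --num-workers=2 \
        --num-preemptible-workers=2

The same machine types can be selected from the Cloud Console, the REST API, or the client libraries when you create the cluster.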
Source: Google Cloud Platform

Viva Las Vegas, with our latest Google Cloud Platform region

If you’re running in the cloud, more regions mean being able to build reliable, fast applications that can serve your local customers. Today, we’re launching our newest Google Cloud Platform region in Las Vegas, bringing a fourth region to the western United States, the seventh nationally, and our global total to 23.

Now open to Google Cloud customers, the Las Vegas region provides you with the speed and availability you need to innovate faster and build high-performing applications that cater to the needs of nearby end users. Additionally, the region gives you added capacity and the flexibility to distribute your workloads across the western U.S., including our existing cloud regions in Los Angeles, Salt Lake City, and Oregon.

New capacity for local users

Aristocrat offers users a range of digital products, from casual social gaming to casino games. Speed to market, and the ability to predict and scale infrastructure over the life of a game, are key requirements for their business. Aristocrat uses Google Cloud to deliver gaming experiences to millions of users every day. “Cloud technologies enable two important outcomes for us,” said James Alvarez, CIO of Aristocrat. “First the ability to securely, consistently and immediately enable and disable game development platforms; and second, our ability to expand and contract our infrastructure based on demand. Both of these capabilities allow us to flex our technology to fully support the demands of our customers and our business. The Las Vegas region gives us the opportunity to more directly engage Google Cloud services and take advantage of an entry point into the network.”

We’ve heard from many of you that the availability of your workloads and business continuity are increasingly top priorities. Let’s use the new region in Las Vegas to explain a bit more about how regions, and the choice of where to run your applications, may affect your reliability.

Zones, regions and multi-regions

Like all regions across Google Cloud’s fleet, Las Vegas offers immediate access to three zones. Each zone is composed of separate software, power, cooling, network, and security infrastructure, and includes a large pool of compute and storage resources; think of it as the fundamental building block of your Google Cloud infrastructure. Perhaps the simplest analogy for a zone is a single data center. But like a single data center, there are limits to how much reliability a single zone can offer. If you need to architect fault-tolerant and highly available applications, you’ll want to use multiple zones, and for even greater resiliency, multiple regions. Google Cloud offers regional services that automatically distribute data, processing, and user traffic across multiple zones in the same region, and multi-regional services that automatically replicate data and processing across multiple regions.

Like all regions, the zones in Las Vegas are connected by low-latency, high-bandwidth networking, and our private backbone provides a secure connection through our global network between Las Vegas and the other 22 GCP regions around the world.

Getting started

If you’re new to Google Cloud, check out some of our resources to get started. You can also integrate your on-premises workloads with our new region using Cloud Interconnect or explore multi-cloud options with Anthos.
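For example, once your project is set up, a first VM in the new region can be created with the gcloud CLI. This is a minimal sketch: the instance name and machine type are placeholders, and us-west4 is the region identifier for Las Vegas:

    # Create a small VM in the Las Vegas region (us-west4). The instance name
    # and machine type below are illustrative placeholders.
    gcloud compute instances create demo-vm \
        --zone=us-west4-a \
        --machine-type=e2-standard-2

    # Optionally make us-west4 the default region for later gcloud commands.
    gcloud config set compute/region us-west4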
You’ll have access to our standard set of products in Las Vegas, including Compute Engine, Cloud Storage, Google Kubernetes Engine, Bigtable, Spanner, and BigQuery. Visit our cloud locations page for a complete list of services available in the Las Vegas region. We are excited to welcome you to our new cloud region in Las Vegas and can’t wait to see what you build with our platform. Stay tuned for more region announcements and launches. For more information, contact sales to get started with Google Cloud today.
Source: Google Cloud Platform

Explore Anthos with a sample deployment

Last week, we announced important updates to Anthos, our multi-cloud application management platform. Many of you reached out to find out how you could try Anthos, so today we’re sharing the Anthos Sample Deployment on Google Cloud, a simple way to explore the platform and its components. Available from the Google Cloud Marketplace, the sample deployment lets you launch a test environment with multiple clusters and various Anthos components. It also includes a sample microservices application that runs in a multi-cluster Kubernetes environment, so you can explore all of Anthos’ advanced capabilities.

Launch Anthos Sample Deployment from the Google Cloud Marketplace

The Anthos Sample Deployment launches in a new Google Cloud project under your account and includes a tutorial that features the new Anthos UI. With this sample deployment, you can learn about the core Anthos components, including:

Certified Kubernetes that provides production-grade orchestration of your containerized applications.
Configuration management tools that use a modern GitOps approach to keep your production environment synchronized with your desired state, as stored under version control (a minimal configuration sketch appears at the end of this post).
A service mesh to manage and secure internal service-to-service network communications.
A set of integrated dashboard tools, including service level objective (SLO) monitoring and alerting, that provide rich insight into application performance.

The sample deployment is currently available on Google Cloud, but the full Anthos offering lets you build and manage your modern application workloads running wherever your enterprise needs them: in Google Cloud, in other clouds, on-premises, and in various hybrid combinations!

Now is the ideal time to explore Anthos on Google Cloud, as it’s free of charge for up to 100 vCPUs per month through June 30, 2020 for non-production use. You are only responsible for your own infrastructure costs, such as Compute Engine, Cloud Logging, and Cloud Monitoring. We invite you to watch the following short video and then head over to the tutorial. Launch a sample deployment today and explore Anthos for yourself.
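If you want to poke at the GitOps-style configuration management component outside the guided tutorial, the following is a minimal, hypothetical sketch of a ConfigManagement manifest that points a cluster at a Git repository. It assumes the Config Management operator is already installed (as it is in the sample deployment); the cluster name, repository URL, and branch are placeholders, and you would apply the file with kubectl apply -f config-management.yaml:

    # config-management.yaml: sync the cluster's configuration from a Git
    # repository so the live state follows what is stored under version
    # control. All names and the repository URL below are placeholders.
    apiVersion: configmanagement.gke.io/v1
    kind: ConfigManagement
    metadata:
      name: config-management
    spec:
      clusterName: my-anthos-cluster
      git:
        syncRepo: https://github.com/example-org/anthos-config-repo
        syncBranch: master
        secretType: none
        policyDir: "."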
Source: Google Cloud Platform

Azure Files share snapshot management by Azure Backup is now generally available

Microsoft Azure Files offers fully managed file shares in the cloud that are accessible via the industry-standard Server Message Block (SMB) protocol. For users of Azure Files, share snapshots have offered a read-only version of file shares from a previous point in time. Share snapshots are also incremental in nature, making their storage usage efficient. Although customers can simply use these share snapshots to go back in time, managing snapshots with scripts or automation is a labor-intensive process. Microsoft Azure Backup offers a simple and reliable way to back up and protect Azure Files using share snapshots.

Today, we are announcing the general availability of snapshot management for Azure Files by Azure Backup. Apart from being available natively in the cloud, Azure Backup offers significant benefits when protecting file shares with a Recovery Services vault.

Key benefits

Simple configuration: You can use the +Backup option from the Recovery Services vault to discover all unprotected file shares in storage accounts, select multiple file shares if necessary, choose a policy, and configure backup for all of them at once. Once configured, you can manage your backups directly from the Azure Files portal. Backup can also be enabled from the command line, as in the sketch after the next item.

Zero infrastructure solution: Being an Azure native solution, using Azure Backup means that you don’t need to run any additional compute. This saves you from setting up infrastructure to schedule snapshots, or maintain or modify them periodically.
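If you prefer scripting, enabling backup for a file share can also be done with the Azure CLI. This is a rough sketch using the az backup commands for Azure Files; the vault, resource group, storage account, file share, and policy names are placeholders:

    # Enable Azure Backup for a file share using an existing backup policy.
    # All resource names below are placeholders.
    az backup protection enable-for-azurefileshare \
        --vault-name MyVault \
        --resource-group MyResourceGroup \
        --storage-account mystorageaccount \
        --azure-file-share myfileshare \
        --policy-name DailyPolicy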

Azure File Sync users do not need to back up their data from on-premises servers, as all of the data is available in the cloud. You can enable cloud tiering on your on-premises servers or machines and continue to use Azure Backup to protect the cloud data.

Flexible backup policy: Azure Backup provides you with the ability to create and modify policies of choice to define the schedule for snapshots.

You’re already used to creating daily snapshots as part of the Azure Backup policy. As part of the general availability release, we have also introduced the ability to create weekly, monthly, and yearly snapshots, with retention for these snapshots of up to 10 years. The backup policy automatically takes care of pruning expired snapshots, helping you stay within the 200-snapshot limit per file share.
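Policies can also be inspected and updated from the command line. The following is a hedged sketch using the Azure CLI: it exports an existing policy definition (the vault, resource group, and policy names are placeholders), assumes you edit the schedule and retention settings in the exported JSON, and then pushes the update back:

    # Export the current policy definition, edit its schedule and retention
    # settings locally, then update the policy. All names are placeholders.
    az backup policy show \
        --vault-name MyVault \
        --resource-group MyResourceGroup \
        --name DailyPolicy > policy.json

    # ...edit policy.json (for example, to add weekly/monthly/yearly retention)...

    az backup policy set \
        --vault-name MyVault \
        --resource-group MyResourceGroup \
        --policy "$(cat policy.json)"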

Comprehensive restore capabilities: Azure Backup offers a variety of options to restore your file share data. You can choose to restore the entire file share or individual files and folders. Restores can also be done to the original location or to alternate file shares in the same or different storage accounts. Azure Backup also preserves and restores all access control lists (ACLs) of files and folders.

Apart from the options above, Azure Backup ensures that the restores are sync-aware. We coordinate with the Azure File Sync service to trigger a resync back to on-premises servers once we complete restores to the associated file shares in the cloud.
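As a rough, scripted counterpart to the portal experience, a full-share restore to the original location might look like the following Azure CLI sketch. The container, item, and recovery point names are placeholders that you would look up first with az backup item list and az backup recoverypoint list:

    # Restore the entire file share to its original location from a chosen
    # recovery point, overwriting files that conflict. Names are placeholders.
    az backup restore restore-azurefileshare \
        --vault-name MyVault \
        --resource-group MyResourceGroup \
        --container-name mystorageaccount \
        --item-name myfileshare \
        --rp-name RECOVERY_POINT_NAME \
        --restore-mode OriginalLocation \
        --resolve-conflict Overwrite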

Protection against accidental deletion: Accidental deletion can happen at multiple levels.

Individual files and folders: The lowest level is a file or folder. This is also the most common scenario. Using scheduled snapshots and being able to restore individual files and folders addresses this issue.
Snapshot: Azure Backup initiates the snapshots it takes according to the backup policy. However, administrators can still delete specific snapshots from their file shares. These deletions are not recommended, because the corresponding restore points become invalid. We’re actively working on a mechanism that will allow Azure Backup to prevent accidental snapshot deletions.
File share: You could delete your file share and end up wiping out all snapshots taken for the file share. Azure Backup is currently working on protecting against accidental deletion of your file shares and the solution should be available in the first few regions soon.
Storage account: Deleting storage accounts can wipe out all file shares inside the storage account along with its snapshots. Customer conversations indicate that, although this is a less common scenario, there needs to be protection against it. Hence, Azure Backup takes a delete lock on a storage account as soon as the first file share in the storage account is configured for backup.

On-demand snapshots: Apart from the backup policy option to schedule snapshots, you can also create up to four on-demand backups every day. Taking multiple on-demand backups in a day reduces the recovery point objective (RPO). Although Azure Backup purges these snapshots based on the retention set at backup time, make sure you do not exceed the 200-snapshot limit per file share while using this capability (a command-line sketch follows after the next item).

Alerts and reports: Integration with Azure Backup alerts enables you to configure email notifications for critical failures. Once the general availability release is available across all regions, you will start seeing backup-related data in Azure Backup reports.
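An on-demand backup, as mentioned above, can be triggered from the Azure CLI as well; this sketch uses placeholder names and an example retention date in dd-mm-yyyy format:

    # Trigger an on-demand backup of a protected file share, retained until the
    # given date. All resource names below are placeholders.
    az backup protection backup-now \
        --vault-name MyVault \
        --resource-group MyResourceGroup \
        --container-name mystorageaccount \
        --item-name myfileshare \
        --backup-management-type AzureStorage \
        --retain-until 20-06-2020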

What’s next?

Based on our conversations with customers, we’re working to deliver functionality above and beyond snapshot management with Azure Backup, including the ability to copy file share data to the Recovery Services vault. We welcome all feedback that helps us align our work with the features you value most. You can help us by filling out this survey.

Getting started

Start protecting your file shares by using the Recovery Services vaults in your region. For the list of supported regions, please refer to the support matrix. The backup goal option in the vault overview will let you choose Azure File shares to back up from storage accounts in your region. You can refer to our documentation for more details.
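If you do not yet have a vault, one way to create a Recovery Services vault is with the Azure CLI; the resource group, vault name, and region below are placeholders:

    # Create a Recovery Services vault to hold the file share backups.
    # Resource group, vault name, and location are placeholders.
    az backup vault create \
        --resource-group MyResourceGroup \
        --name MyVault \
        --location westus2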

Pricing

For pricing details, please follow the Azure Backup pricing page for updates, as we are currently rolling out the regional prices. Snapshot management using Azure Backup is free of charge until July 1, 2020, so all users can try the feature without added cost through June 2020. You can write to AskAzureBackupTeam@microsoft.com with any feedback and queries.

Related links and additional content

Support matrix for Azure Files snapshot management using Azure Backup.
If you are new to Azure Backup, start configuring the backup on the Azure portal.
Want more details? Check out Azure Backup documentation.
Need help? Reach out to the Azure Backup forum for support.
Tell us how we can improve Azure Backup by contributing new ideas and voting up existing ones.
Follow us on Twitter @AzureBackup for the latest news and updates.

Source: Azure