Where is your Cloud Bigtable cluster spending its CPU?

CPU utilization is a key performance indicator for Cloud Bigtable. Understanding CPU spend is essential for optimizing Bigtable performance and cost. We have significantly improved Bigtable’s observability by allowing you to visualize your Bigtable cluster’s CPU utilization in more detail. We now provide you with the ability to break the utilization down by various dimensions like app profile, method and table. This finer grained reporting can help you make more informed application design choices and help with diagnosing performance related incidents.In this post, we present how this visibility may be used in the real world, through example persona-based user journeys.User Journey: Investigate an incident with high latencyTarget Persona: Site Reliability Engineer (SRE)ABC Corp runs Cloud Bigtable in a multi-tenant environment. Multiple teams at ABC Corp use the same Bigtable instance.Alice is an SRE at ABC Corp. Alice gets paged because the tail latency of a cluster exceeded the acceptable performance threshold. Alice looks at the cluster level CPU utilization chart and sees that the CPU usage spiked during the incident window.P99 latency for app profile personalization-reader spikesCPU utilization for the cluster spikesAlice wants to drill down further to get more details about this spike. The primary question she wants to answer is “Which team should I be reaching out to?” Fortunately, teams at ABC Corp follow the best practice of tagging the usage of each team with an app profile in the following format: <teamname>-<workload-type>The bigtable instance has the following app profiles:revenue-updaterinfo-updaterpersonalization-readerpersonalization-batch-updaterThe instance’s data is stored in the following tables:revenueclient-infopersonalizationShe uses the CPU per app profile chart to determine that the personalization-batch-updater app profile utilized the most CPU during the time of the incident and also saw a spike that corresponded with the spike in latency of the serving path traffic under the personalization-reader app profile.At this point, Alice knows that the personalization-batch-updater traffic is adversely impacting the personalization-reader traffic. She further digs into the dashboards in Metrics Explorer to figure out the problematic method and table.CPU usage breakdown by app profile, table and methodAlice has now identified the personalization-batch-updater app profile, the personalization table and the MutateRows method as the reason for the increase in CPU utilization that is causing high tail latency of the serving path traffic.With this information, she reaches out to the personalization team to provision the cluster correctly before the batch job starts so that the performance of other tenants is not affected. The following options can be considered in this scenario:Run the batch job on a replicated instance with multiple clusters. Provision a dedicated cluster for the batch job and use single cluster routing to completely isolate the serving path traffic from the batch updatesProvision more nodes for the cluster before the batch job starts and for the duration of the batch job. This option is less preferred than option 1, since serving path traffic may still be impacted. However, this option is more cost effective.User Journey: Schema and cost optimizationTarget Persona: DeveloperBob is a developer who is onboarding a new workload on Bigtable. He completes the development of his feature and moves on to the performance benchmarking phase before releasing to production. He notices that both the throughput and latency of his queries are lower than what he expected and begins debugging the issue. His first step is to look at the CPU utilization of the cluster, which is higher than expected and is hovering around the recommended max.CPU utilization by clusterTo debug further, he looks at the CPU utilization by app profile and the CPU utilization by table charts. He determines that the majority of the CPU is consumed by the product-reader app profile and the product_info table.CPU utilization by app profileCPU utilization by tableHe inspects the application code and notices that the query includes a value range filter. He realizes that value filters are expensive, so he moves the filtering to the application. This leads to substantial decrease in Bigtable cluster CPU utilization. Consequently, not only does he improve performance, but he can also lower costs for the Bigtable cluster.CPU utilization by cluster after removing value range filterCPU utilization by app profile after removing value range filterCPU utilization by table after removing value range filterWe hope that this blog helps you to understand why and when you might want to use our new observability metric – CPU per app profile, method and table. Accessing the metricsThese metrics can be accessed on the Bigtable Monitoring UI under the Tables and Application Profiles tabs. To see the method breakdown, view the metric in Metrics Explorer, which you can also navigate to from Cloud Monitoring UI.
Quelle: Google Cloud Platform

How Bayer Crop Science uses BigQuery and geobeam to improve soil health

Bayer Crop Science uses Google Cloud to analyze billions of acres of land to better understand the characteristics of the soil that produces our food crops. Bayer’s teams of data scientists are leveraging services from across  Google Cloud to load, store, analyze, and visualize geospatial data to develop unique business insights. And because much of this important work is done using publicly-available data, you can too!Agencies such as the United States Geological Survey (USGS), National Oceanic and Atmospheric Administration (NOAA), and the National Weather Service (NWS) perform measurements of the earth’s surface and atmosphere on a vast scale, and make this data available to the public. But it is up to the public to turn this data into insights and information. In this post, we’ll walk you through some ways that Google Cloud services such as BigQuery and Dataflow make it easy for anyone to analyze earth observation data at scale. Bringing data togetherFirst, let’s look at some of the datasets we have available. For this project, the Bayer team was very interested in one dataset in particular from ISRIC, a custodian of global soil information. ISRIC maps the spatial distribution of soil properties across the globe, and collects soil measurements such as pH, organic matter content, nitrogen levels, and much more. These measurements are encoded into “raster” files, which are large images where each pixel represents a location on the earth, and the “color” of the pixel represents the measured value at that location. You can think of each raster as a layer, which typically corresponds to a table in a database. Many earth observation datasets are made available as rasters, and they are excellent for storage of gridded data such as point measurements, but it can be difficult to understand spatial relationships between different areas of a raster, and between multiple raster tiles and layers.Processing data into insightsTo help with this, Bayer used Dataflow with geobeam to do the heavy-lifting of converting the rasters into vector data by turning them into polygons, reprojecting them to the WGS 84 coordinate system used by BigQuery, and generating h3 indexes to help us connect the dots — literally. Polygonization in particular is a very complex operation and its difficulty scales exponentially with file size, but Dataflow is able to divide and conquer by splitting large raster files into smaller blocks and processing them in parallel at massive scale. You can process any amount of data this way, at a scale and speed that is not possible on any single machine using traditional GIS tools. What’s best is that this is all done on the fly with minimal custom programming. Once the raster data is polygonized, reprojected, and fully discombobulated, the vector data is written directly to BigQuery tables from Dataflow.Once the data is loaded into BigQuery, Bayer uses BigQuery GIS and the h3 indexes computed by geobeam to join the data across multiple tables and create a single view of all of their soil layers. From this single view, Bayer can analyze the combined data, visualize all the layers at once using BigQuery GeoViz, and apply machine learning models to look for patterns that humans might not seeScreenshot of Bayer’s soil analysis in GeoVizUsing geospatial insights to improve the businessThe soil grid data is essential to help characterize the soil characteristics of the crop growth environments experienced by Bayer’s customers. Bayer can compute soil environmental scenarios for global crop lands to better understand what their customers experience in order to aid in testing network optimization, product characterization, and precision product design. It also impacts Bayer’s real-world objectives by enabling them to characterize the soil properties of their internal testing network fields to help establish a global testing network and enable environmental similarity calculations and historical modeling.It’s easy to see why developing spatial insights for planting crops is game-changing for Bayer Crop Sciences, and these same strategies and tools can be used across a variety of industries and businesses.Google’s mission is to organize the world’s information and make it universally accessible and useful, and we’re excited to work with customers like Bayer Crop Sciences who want to harness their data to build products that are beneficial to their customers and the environment. To get started building amazing geospatial applications for your business, check out our reference guide to learn more about geospatial capabilities in Google Cloud, and open BigQuery in the Google Cloud console to get started using BigQuery and geobeam for your geospatial workloads.
Quelle: Google Cloud Platform