GKE best practices: Day 2 operations for business continuity

So, you followed our advice and built a highly available Google Kubernetes Engine (GKE) cluster based on our day 0 guidance. But day 2 is where the rubber meets the road: your GKE cluster is up and running, serving traffic to your app, and can’t really afford to go down. The day 0 steps you took should help prevent that, but in production, ensuring business continuity isn’t just about the high availability of the workloads. It’s also about gracefully handling disruptions, and applying the latest security patches and bug fixes non-disruptively. In this blog post, we’ll discuss recommendations and best practices to help the applications running on your GKE cluster stay happy and healthy.

Manage disruption

As with any platform’s lifecycle, there will come a time when your GKE cluster experiences an interruption, needs to be updated, or needs to shut down. You can limit the interference by proactively setting up the right number of replicas, setting a Pod Disruption Budget, and specifying your shutdown grace period.

Make sure you have replicas

You may be familiar with the concept of Kubernetes replicas. Replicas ensure the redundancy of your workloads for better performance and responsiveness, and to avoid a single point of failure. When configured, replicas govern the number of pod replicas running at any given time.

Set your tolerance for disruption

However, during maintenance, Kubernetes sometimes removes an underlying node VM, which can reduce the number of replicas you have. How much disruption is too much? What’s the minimum number of replicas you need to continuously operate your workloads while your GKE cluster is undergoing maintenance? You can specify this using the Kubernetes Pod Disruption Budget, or PDB.

Setting a PodDisruptionBudget ensures that your workloads keep a sufficient number of replicas, even during maintenance. Using the PDB, you can define a number (or percentage) of pods that can be terminated, even if terminating them brings the current replica count below the desired value. With a PDB configured, Kubernetes drains a node while honoring the budget, and new pods are deployed on other available nodes. This approach ensures Kubernetes schedules workloads in an optimal way while controlling the disruption based on the PDB configuration.

Once the PDB is set, GKE won’t evict pods from your application if doing so would leave fewer available pods than the configured limit. GKE respects a PDB for up to 60 minutes. Note that the PDB only protects against voluntary disruptions—upgrades, for example. It offers no protection against involuntary disruptions (e.g., a hardware failure).

Terminate gracefully

Sometimes, applications need to terminate unexpectedly. By default, Kubernetes sets the termination grace period to 30 seconds. This should be sufficient for most lightweight, cloud-native applications. However, the default setting might be too low for heavyweight applications or applications that have long shutdown processes. The recommended best practice is to evaluate your existing grace periods and tune them based on the specific needs of your architecture and application. You can change the termination grace period by altering terminationGracePeriodSeconds, as in the sketch below.
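To make these two settings concrete, here is a minimal sketch. The names, replica count, and grace period are illustrative placeholders, not values from GKE’s guidance; tune them to your own application.

```yaml
# A PodDisruptionBudget that keeps at least 2 replicas of "my-app"
# available during voluntary disruptions such as node upgrades.
apiVersion: policy/v1beta1  # policy/v1 on Kubernetes 1.21 and later
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2  # can also be a percentage such as "80%", or use maxUnavailable
  selector:
    matchLabels:
      app: my-app
---
# A Deployment that runs 3 replicas and gives each pod 120 seconds
# (instead of the default 30) to shut down gracefully.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      terminationGracePeriodSeconds: 120
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app:latest
```

With a budget like this in place, a node drain during maintenance evicts pods only as long as at least two replicas remain available.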
Schedule updates and patches

Keeping your cluster up to date with security patches and bug fixes is one of the most important things you can do to ensure the vitality of the cluster and business continuity. Regular updates protect your workloads from vulnerabilities and failures. However, timing plays a major role in performing these updates. Especially now, when many teams are working from home or at reduced capacity, you want to increase the predictability of these upgrades, and perhaps avoid changes during regular business hours. You can do that by setting up maintenance windows, sequencing roll-outs, and setting up maintenance exclusions.

Set your maintenance windows

Setting up a maintenance window lets you control automatic upgrades to both the cluster control plane and its nodes. GKE respects maintenance windows: if the upgrade process runs beyond the defined maintenance window, GKE will attempt to pause the operation and resume it during the next maintenance window. You can also use maintenance windows in a multi-cluster environment to control and sequence disruption in different clusters. For example, you may want to control when to perform maintenance on clusters in different regions by setting different maintenance windows for each cluster.

Practice regular updates

New GKE releases are rolled out on a regular basis as patches become available in the fleet. The rollout of these updates is gradual, and some version upgrades may take several weeks to roll out completely across the entire GKE fleet. Nonetheless, in times of uncertainty, you can specify the day and time of the week when maintenance can occur by setting your maintenance windows, to better plan for and anticipate maintenance to your clusters.

Please do not disturb

There are times when you may want to avoid maintenance entirely (e.g., holidays, high season, company events), to ensure your clusters are available to receive traffic. With maintenance exclusions, you can prevent automatic maintenance from occurring during a specific time period. Maintenance exclusions can be set on new or existing clusters, and exclusion windows can be used in conjunction with an upgrade strategy. For example, you may want to postpone an upgrade to a production cluster if a testing/staging environment fails because of an upgrade.

Upgrade node pool versions without disruption

Upgrading a GKE node pool can be a particularly disruptive process, as it involves recreating every VM in the node pool. The upgrade replaces nodes in a rolling fashion: for each node, a new VM with the upgraded image is created, all the pods running on the old node are shut down, and the workloads shift to the new node. By following the recommendations above, your workloads can run with sufficient redundancy (replicas) to minimize disruption, and Kubernetes will move and restart pods as needed. However, a temporarily reduced number of replicas can still be disruptive to your business, and may slow down workload performance until Kubernetes is able to meet the desired state again (i.e., meet the minimum number of needed replicas). To eliminate this disruption entirely, you can use the GKE node surge upgrade feature. Once configured, surge upgrade secures the resources (machines) needed for the upgrade by first creating a new node, then draining the old node, and finally shutting it down. This way, the expected capacity remains intact throughout the upgrade process. The commands below sketch how these maintenance and upgrade settings fit together.
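As a rough sketch of the gcloud commands involved (cluster and node pool names are placeholders, and flag spellings can vary by gcloud version, so verify against `gcloud container clusters update --help` before relying on them):

```bash
# Recurring maintenance window: allow automatic maintenance daily, 02:00-06:00 UTC.
gcloud container clusters update my-cluster \
  --maintenance-window-start "2020-09-01T02:00:00Z" \
  --maintenance-window-end "2020-09-01T06:00:00Z" \
  --maintenance-window-recurrence "FREQ=DAILY"

# Maintenance exclusion: block automatic maintenance over a peak-traffic week.
gcloud container clusters update my-cluster \
  --add-maintenance-exclusion-name holiday-freeze \
  --add-maintenance-exclusion-start "2020-11-23T00:00:00Z" \
  --add-maintenance-exclusion-end "2020-11-30T00:00:00Z"

# Surge upgrade: bring up one new node before each old node is drained,
# so capacity never drops during a node pool upgrade.
gcloud container node-pools update my-pool \
  --cluster my-cluster \
  --max-surge-upgrade 1 \
  --max-unavailable-upgrade 0
```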
Speed up upgrades for large clusters

Large clusters mean larger node pools, which can take a long time to upgrade if you’re updating one node at a time—especially if you’ve set a maintenance window. In this case, an upgrade starts at the beginning of the maintenance window and lasts for the duration of the maintenance window (four hours). If GKE can’t complete upgrading all the nodes within the allotted maintenance window, it pauses the upgrade and resumes it in the next maintenance window. You can accelerate your upgrade completion time by concurrently upgrading multiple nodes with the surge upgrade feature. For example, if you set maxSurge=20 and maxUnavailable=0, GKE will upgrade 20 nodes at a time, without using any existing capacity.

Bringing it all together

Containerized applications are portable and easy to deploy and scale. GKE makes it even easier to run your workloads hassle-free with a wide range of cluster management capabilities. You know your application best; by following the recommendations above, you can drastically improve the availability and vitality of your clusters.

To learn more, register for the Google Cloud Next ‘20: OnAir session, Ensuring Business Continuity at Times of Uncertainty and Digital-only Business with GKE, which goes live on August 25, 2020.

COVID-19 public datasets: supporting organizations in their pandemic response

Editor’s note: This is part two of a series on the COVID-19 public datasets. Check out part one to learn more about recently onboarded datasets and new program expansion.

Back in March, we launched new COVID-19 public datasets into our Google Cloud Public Datasets program to make critical COVID-19 datasets available to the public and free to analyze using BigQuery. At launch, we aimed to get high-quality data into the hands of users as quickly as possible to support their efforts to monitor and understand the emergent pandemic. A few months in, we have expanded our original goals to include supporting public and private sector users with the data that they need to make informed decisions. Today, we’ll highlight how research organizations, governments, and partners have used these datasets to power their decisions, contribute to the growing body of research on the virus and its societal impacts, and create tools to support response efforts.

Helping communities respond to COVID-19

Reliable data is now more important than ever as leaders in healthcare, government, and private industry are challenged to make decisions in response to COVID-19. To equip organizations in charting the safest path forward, Google Cloud collaborated with Google Cloud partner SADA to build the National Response Portal. The portal is an open data platform that combines many relevant datasets for an on-the-ground view of the pandemic. “The National Response Portal takes full advantage of the Google Cloud Public Datasets program, giving us direct and easy access to the COVID-19 datasets that power our visualizations,” says Michael Ames, senior director of healthcare and life sciences at SADA. Via the portal, users can explore trends on COVID-19 cases and deaths, view forecasts anticipating future hotspots, and examine the impact of policy decisions and social mobility. Healthcare providers have begun contributing data as part of a growing effort to share data insights among the health community to empower better awareness and decision-making. To find out more and view the portal, check it out here.

Equipping the public sector to monitor COVID-19

When looking for a technical solution for monitoring COVID-19 cases and updating residents, the Oklahoma State Department of Health and the governor’s office turned to Google Cloud. The state needed a public-facing platform that would display real-time data on the pandemic. Using the COVID-19 public datasets along with Looker, Google Cloud’s business intelligence and analytics platform, the State of Oklahoma built a dashboard of Oklahoma COVID-19 statistics, located on the state’s public health website. Since the dashboard launched, it has been viewed tens of thousands of times each day. Department of Health staff and Oklahoma citizens are able to access and interact with consolidated information served by Looker dashboards for actionable insights. “The partnership with Google Cloud has enabled the OK Department of Health to be extremely agile in keeping the citizens of Oklahoma informed as to the impact of COVID-19 across the state,” says State of Oklahoma Digital Transformation Secretary David Ostrowe. The dashboard has decreased manual processing needs, and it has been easy to update and deploy changes over Google Cloud.
The State of Oklahoma also received an A+ COVID-19 data quality rating from the COVID Tracking Project.

Supporting research on COVID-19

In the early days of the pandemic, Northeastern University used Google Cloud to model COVID-19 and forecast the impact that interventions like stay-at-home orders would have on the spread of the virus. Northeastern University researchers used several Google Cloud products, including BigQuery, to analyze various datasets and inform their global metapopulation disease transmission model. The team relied on the U.S. Census Data and OpenStreetMap public datasets and BigQuery GIS capabilities to project the impact of different interventions on the global spread of the COVID-19 pandemic. “Our team models and forecasts the spatial spread of infectious diseases by quickly analyzing hundreds of terabytes of simulation data,” says Dr. Matteo Chinazzi, associate research scientist at Northeastern University. “With the help of BigQuery, we are able to accelerate insights from our epidemic models and better study evolution of an ongoing outbreak.” Dr. Chinazzi’s team has provided valuable insights on the effects of different containment and mitigation strategies. The team’s findings were published in Science in April. You can check them out through The Global Epidemic and Mobility (GLEAM) Project interactive dashboards.

Visualizing the pandemic

CARTO, a location intelligence platform integrated with BigQuery, used its mapping expertise to build an important COVID-19 dashboard using Google Cloud public datasets. CARTO combined census data with COVID-19 case data and social determinants of health datasets in this real-time dashboard to support organizations in monitoring and responding to the pandemic. “We built our COVID-19 dashboard to anticipate viewers looking for fast answers,” says Stephanie Schober, CARTO solution engineer. “As COVID-19 continues to spread, Google Cloud’s BigQuery content has enabled our dashboard to use real-time and reliable data.” “Location data has been extremely relevant through this pandemic to ensure both private and public sector organizations can respond fast enough,” says Florence Broderick, VP of marketing at CARTO. “Geospatial analysis through CARTO and BigQuery has enabled a wide range of use cases, including PPE distribution, mobility analysis, and workplace-return planning.” If you’re interested in developing similar visualizations, check out more details from CARTO and tune into Data vs. COVID-19: How Public Data is Helping Flatten the Curve.

Analyzing the global COVID-19 news narrative from web to television

To support researchers in analyzing global media coverage of COVID-19 and comparing it with outbreaks of the past decade, we have partnered with the GDELT Project to host several multimodal datasets. These datasets include media coverage across 152 languages and span more than a decade, totaling more than 3 trillion data points, all of them available as public datasets in BigQuery. “Google Cloud’s AI offerings make it possible to transform text, speech, imagery and video into rich annotations sharing a common taxonomy,” says GDELT Founder Dr. Kalev Leetaru. “BigQuery is the lens through which trillions of data points become actionable insights that can help guide our understanding of the global COVID-19 media narrative.” Data insights on COVID-19 media portrayal, such as trend analysis on mask use worldwide, and sample queries can be found on the GDELT Project Blog, or you can explore the data directly in BigQuery.
A Google Cloud COVID-19 research grant is also supporting additional data annotation on the COVID-19 pandemic and other major disease outbreaks. The project is using Cloud Speech-to-Text to compare COVID-19 radio coverage on 10 major U.S. stations. When completed, this dataset will make it possible for researchers to understand how television and radio coverage of the pandemic compares with online coverage.

Helping companies manage operations throughout the pandemic

In the private sector, organizations have leveraged the COVID-19 datasets to support decision making in responding to the pandemic. Rolls-Royce joined with Google Cloud and other industry partners to form the Emergent Alliance. This data analytics coalition plans to leverage Google Cloud’s datasets in finding ways to support the global response to the pandemic, model economic recovery, and support return-to-work initiatives.

When we launched the COVID-19 public datasets, we set out on a mission to partner with data owners and make critical datasets easily accessible and free of analysis costs. We are inspired by the many organizations across healthcare, government, academia, and private industry that have led the way in applying this data in innovative ways, supporting global response efforts. As communities continue to navigate the challenging path forward, we hope to play a small part in empowering them with data insights to prepare for what comes next.

Data analytics for all — What happened at Week 5, Google Cloud Next ‘20: OnAir

Data analytics technologies are becoming a must-have for businesses looking to stay competitive in a changing environment. And if there’s one lesson from this unpredictable year, it’s that we always need to be prepared for anything. We spent this week at Google Cloud Next ’20: OnAir exploring Google Cloud’s data analytics technologies and hearing how customers across all industries are using BigQuery, Dataflow, Dataproc, Looker and more to drive real-time data insights and power new data-driven applications.

Key data analytics announcements

We kicked off Next OnAir this year with the launch of BigQuery Omni, a multi-cloud analytics solution that lets you query data stored across Google Cloud, AWS and Azure (coming soon). Data QnA, a natural language interface for analytics, also launched at the beginning of Next OnAir, allowing a business user to simply ask a question of their company’s dataset and get results back the same way. BigQuery Omni was designed to meet the needs of a multi-cloud computing future. So is Looker, acquired by Google Cloud earlier this year, which powers data experiences that deliver actionable business insights at the point of decision to help meet different types of data users where they are. Check out the latest announcements from Looker, such as new multi-cloud hosting options and new UI components, all designed to optimize costs and use data at greater scale. You can also find a technical deep-dive session on Looker’s technology.

Also new this week: BigQuery now offers a 100-slot purchase option, so that SMBs and digital-native businesses can get started more easily, with predictable pricing options. In addition, BigQuery now offers 99.99% availability with guaranteed SLAs, providing peace of mind that the platform will be available to handle all your mission-critical needs. Finally, we launched detailed and prescriptive design patterns for building real-time AI solutions like anomaly detection, pattern recognition, and predictive forecasting that can be used across multiple industries. These help you quickly get started with your organization’s real-time needs.

How customers are blazing new trails with analytics

Just this week, we heard how organizations using Google Cloud and data analytics are transforming digitally and improving customer and user experiences. Procter & Gamble shared how their cloud data analytics journey lets them personalize products for consumers. Major League Baseball (MLB) migrated to BigQuery to centralize their enterprise data warehouse (EDW) and bring better decision-making and tailored fan communications. And lifecycle pricing platform provider Revionics chose BigQuery to stay ahead of their application development needs, forecast growth, and give customers up-to-the-minute information at scale.

Explore this fleet management demo

A conference isn’t complete without demos, and Next OnAir brings them to you for easy exploration. Check out this interactive demo to see how you might increase vehicle safety and health using streaming and predictive analytics, and business intelligence within Google Cloud’s smart analytics platform. And this blog post gives you the backstory on how the demo was developed to create a live simulated world of 7,500 trucks generating approximately 25 million trip events per day.

Go deep with data

There are plenty more sessions and topics to explore, from building a data lake to implementing real-time AI.
If you’re curious about streaming analytics, check out this session on creating and managing real-time experiences. Wherever you are in your modernization journey, you can find tips and how-tos, like this post for DBAs on how to easily adapt to cloud data warehouses.

Looking ahead: Data management

Looking forward to more Next OnAir? Next week is all about data management. On Tuesday, August 18, Penny Avril, director of product management for databases at Google Cloud, will talk with chat app ShareChat about how they’ve modernized their database infrastructure to stay ahead of user demand, plus dive into product features. Next OnAir runs through September 8, and you can find live technical talks and learning opportunities aligned with each week’s content. Click “Learn” on the Explore page to find each week’s schedule. Haven’t yet registered for Google Cloud Next ’20: OnAir? Get started at g.co/cloudnext.

How fleet management gets easier with smart analytics on Google Cloud

Editor’s note: This blog provides a deeper, under-the-hood view of this smart analytics platform demo that you can explore now, featured during Google Cloud Next ‘20: OnAir.

The ongoing COVID-19 pandemic has changed consumer purchasing behavior and, consequently, how companies need to manage their transportation logistics to meet new expectations. With large fleets of trucks to manage, how can fleet managers use modern technology to optimize their businesses? We explored the answers to this question using simulated sensor data and Google Cloud’s smart analytics platform to create a demo.

Geotab, a Canada-based company, provides data-driven insights on commercial fleet vehicles on every continent. From engine speeds to ambient air temperatures to driving and weather conditions, Geotab processes a wealth of data from more than 2 million vehicles around the world. These vehicles are equipped with Geotab’s telematic solutions and a range of integrated sensors and apps. With all of this data streaming in real time, fleet truck managers use data to predict vehicle health, monitor driver safety, and track all of this information in real time. Real-time data lets users proactively respond to things as they happen, or before they happen, rather than being reactive.

To demonstrate how to solve these business challenges using smart analytics on Google Cloud, we created a live simulated world of 7,500 trucks generating approximately 25 million trip events per day. Built as a demo, the generated data simulates some of the data that might be processed as if it were integrated with actual sensors and apps from Geotab (Geotab does run a scaled environment using Google Cloud, but the data and environment here are simulated). Here’s a look at the dashboard we created for the demo, using Pub/Sub, Dataflow, BigQuery, and Looker on Google Cloud. It shows the simulated trucks making deliveries between various points.

To create this, we first generated a simulated fleet of 7,500 trucks across 150 regions in the United States, each equipped with a simulated sensor that would emit data every few seconds. The data includes vehicle telematics data such as the GPS location, current speed, and current battery health. The data also contains information about each region, trip, and driver.

[Image: BigQuery table showing more than 2 billion rows of telematics data collected over the past 78 days]

To ingest this constant flow of data into Google Cloud, we used Pub/Sub and Dataflow, automatically parsing all incoming sensor data into BigQuery (assuming one sensor per truck). With over 1 million trip data points stored in BigQuery per day, analysts could use SQL to quickly analyze massive amounts of data in seconds.

[Image: In 5.7 seconds, BigQuery queried 103.2 GB of data to calculate the highest recorded driving speed for each of the 1.3 million trips on any particular day]

Predictive analytics was the next topic of interest. With data streaming into BigQuery, we could write algorithms to help anticipate any issues with driver safety or vehicle health. Using scheduled queries, we established a safety score for each driver based on how they drove in recent trips, with data points such as the level of acceleration or steering around corners when driving. If the result was “poor,” the fleet manager could address the driver ahead of the next trip. In addition, vehicle health was updated on every trip to ensure the vehicle was in working condition, based on engine sensor data (i.e., battery health). This is computed at the beginning of every trip, too, to contribute toward improved predictive maintenance, one of the challenges in fleet management. BigQuery comes with machine learning capabilities via BigQuery ML, so it’s possible to create predictive models in SQL to forecast demand or predict unsafe driving conditions. The sketch below illustrates what queries along these lines might look like.
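These examples are a hedged sketch, not the demo’s actual code: the table and column names (fleet_demo.trip_events, speed_mph, is_unsafe, and so on) are hypothetical stand-ins for a telematics schema like the one described above.

```sql
-- Highest recorded driving speed for each trip on a given day
-- (hypothetical schema: trip_events(trip_id, event_ts, speed_mph, ...)).
SELECT
  trip_id,
  MAX(speed_mph) AS max_speed_mph
FROM
  `fleet_demo.trip_events`
WHERE
  DATE(event_ts) = '2020-08-01'
GROUP BY
  trip_id;

-- A BigQuery ML sketch: train a logistic regression model to flag
-- unsafe driving from per-trip telematics features.
CREATE OR REPLACE MODEL `fleet_demo.unsafe_driving`
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['is_unsafe']
) AS
SELECT
  avg_acceleration,
  harsh_cornering_count,
  avg_speed_mph,
  is_unsafe
FROM
  `fleet_demo.trip_features`;
```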
Visualizing fleet management data

With this data now available, how can it best communicate what actions a fleet operator could take? Since BigQuery natively integrates with business intelligence tools like Looker for out-of-the-box visual analysis, we used Looker to create dashboards to display some of the descriptive as well as actionable insights.

With Looker, analysts can create a single-source-of-truth data model where they define metrics using SQL. With a shared data model in place, business users can then use Looker to ask new questions of the data and build reports—all without needing a technical resource and still ensuring consistent metrics. For example, in the context of fleet management, an analyst might define the Harsh Acceleration Score in LookML, ensuring consistency and accuracy across the organization. In addition to dashboards and data exploration, Looker also provides a platform that enables end users to drive operational workflows through data actions (i.e., sending a text message directly to a driver concerning acceleration patterns).

Google Cloud’s smart analytics platform provides an end-to-end solution for your fleet data, so you can spend less time worrying about scale, speed and infrastructure, and more time delivering value to your customers. See the full interactive smart analytics platform demo.

Special thanks to Zack Akil, Matt Olivo, and Leigha Jarett for their technical contributions to the demo.

Accelerating Mayo Clinic’s data platform with BigQuery and Variant Transforms

Genomic data is some of the most complex and vital data that our customers and strategic partners like Mayo Clinic work with. Many of them want to work with genomic variant data, which is the set of differences between a given sample and a reference genome, in order to diagnose patients and discover new treatments. Each sample’s variants are usually stored as a Variant Call Format file, or VCF, but files aren’t a great way to do analytics and machine learning on these data. In 2018, we introduced Variant Transforms, an easy way to load VCF data into BigQuery to enable these use cases. Since then, we’ve been hard at work improving Variant Transforms, adding features such as the ability to get VCFs back out of BigQuery for customers who want to use file-based tools.

We’re now announcing a new schema for Variant Transforms that uses new BigQuery features to significantly reduce the cost of running queries:

- Sharding: Variant Transforms shards its output into multiple tables, each containing variants of a specific region of a genome—in this case, a whole chromosome. By storing each chromosome’s variants in its own table, you don’t pay to query every chromosome if you’re only analyzing a small genomic region on one of the chromosomes. In addition to making each table smaller and more manageable, sharding is a prerequisite for integer-range partitioning.

- Integer-range partitioning: A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage and query your data. By dividing a large table into smaller partitions, you can improve query performance and control costs by reducing the number of bytes read by a query. Variant Transforms uses integer-partitioned tables to read only the necessary variants.

- Clustering: Clustering is a technique for automatically organizing a table based on the contents of one or more columns in the table’s schema. Clustered columns are used to colocate related data: BigQuery sorts the data based on the values of the clustered columns and organizes it into multiple blocks in BigQuery storage. Using clustering in addition to partitioning is very effective for large variant tables, because it efficiently organizes and sorts the variants inside each partition.

The result? Queries to find all variants of a typical gene are now five to 40 times cheaper than before. For queries that span a short genomic region, the associated cost can be up to 200 times lower.1 In practice, a gene-level lookup against the new schema might look something like the sketch below.
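This is an illustrative sketch only: the table name is a made-up example (actual names depend on how you configure the Variant Transforms output), although the columns follow the standard Variant Transforms schema.

```sql
-- Find variants overlapping the BRCA1 region (GRCh38 coordinates) in a
-- hypothetical per-chromosome table for chromosome 17. Partitioning and
-- clustering on start_position mean BigQuery scans only the relevant blocks.
SELECT
  reference_name,
  start_position,
  reference_bases,
  names
FROM
  `my-project.genomics.variants__chr17`
WHERE
  start_position BETWEEN 43044295 AND 43125483;
```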
How Mayo Clinic is using Variant Transforms

This kind of performance is empowering partners like Mayo Clinic. In 2019, Mayo Clinic built a new Omics Data Platform (ODP) on Google Cloud. They use Variant Transforms along with the Google Cloud Healthcare API and the Google Cloud Life Sciences API to deliver next-generation patient insights for Mayo Clinic patients. Mayo Clinic has integrated ODP with the pipelines that produce their variant data, and is also loading all of their historical data.

“Mayo Clinic is interested in finding the meaning in all of the variants. Is the variant pathogenic or benign? Is the variant potentially high impact? Does the variant affect how a patient metabolizes a drug?” says Mike Mundy, IT technical specialist from Mayo Clinic, who’s leading the implementation of ODP. “There are many sources of variant annotations and they are constantly being updated with new knowledge.”

Mayo Clinic’s ODP also supports managing annotation sources, and annotation data is loaded into BigQuery. Variant data and annotation data are kept separate, which allows for dynamic variant annotation with the latest knowledge. ODP is implemented using a microservices architecture built on Google Cloud, using Cloud IAM for managing accounts and permissions, Cloud Storage for the data lake, and BigQuery for the data warehouse. The ODP security model keeps research subject data and clinical patient data separate. Each research study or clinical sequencing project gets its own Google Cloud project for fine-grained access control. ODP also includes an Omics Data Console that uses ODP’s microservices to provide a user interface for querying variant data and running dynamic annotations.

How much better is Variant Transforms?

To test the performance of the new schema, we wrote a query to find all variants in a genomic region of fixed length and examined several region lengths, from 1,000 base pairs to 1,000,000. We ran each query 10 times from random starting points and recorded the median cost (bytes billed). We repeated this process on two chromosomes, a large one and a small one, to examine the effect of chromosome size on the efficiency of our new schema.

[Chart: new-schema query cost on the 1000 Genomes dataset, in MB processed (lower is better)]

[Chart: new-schema query cost on the gnomADv3 dataset, in MB processed (lower is better)]

As the approximate length of a typical gene is less than 50,000 base pairs, running a query to find all variants of a gene in gnomADv3 processes about 25 MB. This means users can run around 42,000 similar queries each month at no cost, assuming use of BigQuery’s 1 TB free tier. As Mayo Clinic scales out sequencing to hundreds of thousands of patients, they estimate saving $1.5 million over three years by using Google Cloud and Variant Transforms instead of their existing solution. The new version of Variant Transforms is accelerating the team’s ability to deliver on the promise of precision medicine.

Learn more about Mayo Clinic and about Variant Transforms.

1. Based on tests described in the “How much better is Variant Transforms” section of this post. Your own experience with the performance improvement of Variant Transforms may vary. Performance depends on many factors, including the size of the sharded output table, the density of variants, and the presence of repeated calls (in the case of joint genotyped inputs).

Helping retailers prepare for the 2020 holiday season

Although summer is a time when many of us are thinking about sunshine and weekend trips to the beach, for retailers, summer means important preparation for the holiday season, the most important sales period of the year. The typical holiday season presents retailers with familiar challenges: increased product demand, planning in-store seasonal staffing, supply chain pressures, and spikes in web traffic—to name a few.

But 2020 is going to be different. The long-tail impacts of the COVID-19 pandemic will continue to be felt in the retail industry through this year and into the next. Many retailers have been forced to reduce their physical footprints or change the way they operate, in addition to exploring new ways of delivering their products into the hands of customers, with more changes likely to come. Several major U.S. retailers have already announced that they will be closed on Thanksgiving, further changing the digital and physical dynamics of the season. So how will this year’s holiday season play out?

Connecting people with information is what Google does best

We know the months ahead are paved with uncertainty, from fluctuating physical distancing restrictions, to questions about the strength of consumer spending given the pandemic, to the emergence of new consumer channels and buying paradigms. Retailers can prepare by understanding how shopper behaviors have changed, and by following the latest Google Cloud guidance in our ebook, “A retailer’s guide to 2020 holiday season readiness: Five keys to success.”

Shape your strategy by understanding consumer behavior

Looking ahead to the 2020 holiday season, third-party market trends and our Google-led research indicate three overarching consumer behaviors that have the power to affect how retailers perform. As your team prepares in the months ahead, keep in mind these changes to purchasing habits, loyalty, and sentiment.

1. More consumers are shopping online for the first time, and for products they would normally buy in-store.

Perhaps unsurprisingly, the shift to ecommerce accelerated at its fastest-ever rate in 2020. By May, it was up 70% year-over-year and had reached $82.5 billion in the United States. This shift affected retail segments that had previously lagged behind the shift online, like grocery. In the first week of March, only 11% of US adults said they shopped online for groceries. By the end of the month, that figure had jumped to 37%. We saw the abandonment of long-established buying habits as home-bound customers began to experiment with new approaches to purchasing everyday items. In addition to the steep rise in online grocery shopping, 1 in 4 surveyed shoppers went online during the lockdown to purchase something they would normally buy in-store.1

While stores are now reopening in many regions, new limits on physical interactions, reluctance on the part of consumers to shop in-store due to fears around COVID-19 exposure, and the possibility of further store closures (due to future waves of the virus) mean that a question mark continues to hang over the viability of a predominantly bricks-and-mortar retail strategy. Having a flexible and scalable ecommerce channel is becoming table stakes as more consumers are willing to go online to find and purchase a wider variety of products. Retailers looking to prioritize their holiday season readiness efforts should begin by doubling down on their omnichannel strategy.
2. Shoppers expect new contactless ways to make every type of purchase.

Before the pandemic, many shoppers found a visit to a store to be the fastest and simplest way to get what they needed. That changed when lockdown and shelter-in-place orders were issued and 53% of shoppers reported trying a new shopping service for the first time.2 This included grocery delivery, as previously mentioned, but also checking inventory online before heading to the store, as well as trying curbside pickup. And for those who haven’t yet tried new services, the intent is there. Searches for “curbside pickup” rose 100% in March, while searches for “home delivery” grew by 70%—and more than 50% of consumers in a Google survey believe that curbside pickup will still be relevant as stores reopen.

What people are buying is also changing. Google Search data shows a drastic increase in searches for items to complement spending more time in the home. For example: ergonomic chairs to improve the home office, or home improvement items to enhance the quality of living spaces. Retailers need to understand and be able to act on emerging consumer trends to deliver new offerings and support changing buying preferences. We recommend leveraging tools like Google Trends and Rising Retail Categories to spot fast-rising retail categories, while investing in supply chain improvements to increase flexibility and resilience.

3. With consumer sentiment low, shoppers are more value-conscious than ever before.

Widespread disruption to lives and livelihoods has contributed to a drop in consumer sentiment in many markets around the world. A McKinsey study found that the vast majority of consumers in the 45 countries surveyed expect COVID-19 to impact their finances and personal routines for more than two months. In the United States, almost a third of consumers are already switching to less expensive products to save money. Looking at the retail industry as a whole, overall forecasts have decreased by more than 10% since the beginning of the year. Retailers that are successful this holiday season will have a strong, targeted strategy for reaching consumers with the products that are most relevant to them, at the right price. Reaching shoppers with the right message at the right time relies on having access to advanced analytics and machine learning technologies. With industry solutions from Google Cloud like the recently launched Recommendations AI, retailers can drive digital acceleration, increase consumer satisfaction and improve operational efficiency across the value chain.

Here to help you prepare for the unpredictable

This year has been one of frantic and unexpected change for many of us. We’re committed to bringing forward the technologies the retail industry needs and to partnering with our customers to help them face any surprises this holiday season. This is why commerce companies such as Etsy and Shopify, and retailers like The Home Depot and Urban Outfitters, choose Google Cloud to support them during the busy shopping season. In particular, they leverage our Black Friday and Cyber Monday (BFCM) white-glove services, where we work side-by-side with their IT and engineering teams from early capacity planning, to reliability tests, to operational war rooms. Our teams are here with the solutions and the expertise to help you succeed through this holiday season and into the next. Start with our practical, actionable guide to help you prepare for a successful holiday season and beyond.
From tips on cloud migration for cost management, to empowering your service centers with the support of AI, we outline the areas that will make the biggest difference to the customer experience and to your bottom line. Download the ebook today and join us for one of our Retail OnAir events, where we’ll share learnings from our work with leading retailers.

1. Google/Ipsos, U.S. Shopping Tracker, March 2020
2. Google/Ipsos, Shopping Tracker, Jan, Feb, Mar, April 2020

Understanding IP address management in GKE

When it comes to giving out IP addresses, Kubernetes has a supply and demand problem. On the supply side, organizations are running low on IP addresses, because of large on-premises networks and multi-cloud deployments that use RFC 1918 addresses (address allocation for private internets). On the demand side, Kubernetes resources such as pods, nodes and services each require an IP address. This supply and demand challenge has led to concerns about IP address exhaustion when deploying Kubernetes. Additionally, managing these IP addresses involves a lot of overhead, especially when the team managing cloud architecture is different from the team managing the on-prem network. In that case, the cloud team often has to negotiate with the on-prem team to secure unused IP blocks.

There’s no question that managing IP addresses in a Kubernetes environment can be challenging. While there’s no silver bullet for solving IP exhaustion, Google Kubernetes Engine (GKE) offers ways to solve or work around this problem. For example, Google Cloud partner NetApp relies heavily on GKE and its IP address management capabilities for users of its Cloud Volumes Service file service. “NetApp’s Cloud Volumes Service is a flexible, scalable, cloud-native file service for our customers,” said Rajesh Rajaraman, Senior Technical Director at NetApp. “GKE gives us the flexibility to take advantage of non-RFC IP addresses and we can provide scalable services seamlessly without asking our customers for additional IPs. Google Cloud and GKE enable us to create a secure SaaS offering and scale alongside our customers.”

Since IP addressing in itself is a rather complex topic and the subject of many books and web articles, this blog assumes you are familiar with the basics of IP addressing. So without further ado, let’s take a look at how IP addressing works in GKE, some common IP addressing problems, and GKE features to help you solve them. The approach you take will depend on your organization, your use cases, applications, skill sets, and whether or not there’s an IP Address Management (IPAM) solution in place.

IP address management in GKE

GKE leverages the underlying GCP architecture for IP address management, creating clusters within a VPC subnet and creating secondary ranges for Pods (the pod range) and Services (the service range) within that subnet. The user can provide the ranges to GKE while creating the cluster or let GKE create them automatically. IP addresses for the nodes come from the IP CIDR assigned to the subnet associated with the cluster. The pod range allocated to a cluster is split up into multiple sub-ranges—one for each node. When a new node is added to the cluster, GCP automatically picks a sub-range from the pod range and assigns it to the node. When new pods are launched on this node, Kubernetes selects a pod IP from the sub-range allocated to the node.

Provisioning flexibility

In GKE, you can obtain this IP CIDR in one of two ways: by defining a subnet and then mapping it to the GKE cluster, or by auto-mode, where you let GKE pick a block automatically from the specific region. If you’re just starting out, run exclusively on Google Cloud, and would just like Google Cloud to do IP address management on your behalf, we recommend auto-mode. On the other hand, if you have a multi-environment deployment, have multiple VPCs, and would like control over IP management in GKE, we recommend using custom-mode, where you can manually define the CIDRs that GKE clusters use. For instance, a custom-mode cluster might be created along the lines of the sketch below.
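This is a minimal sketch; the network, subnet, and range names are placeholders, and the named secondary ranges are assumed to already exist on the subnet.

```bash
# Custom-mode: create a VPC-native cluster that uses an existing subnet,
# with user-defined secondary ranges for Pods and Services.
gcloud container clusters create my-cluster \
  --region us-central1 \
  --enable-ip-alias \
  --network my-vpc \
  --subnetwork my-subnet \
  --cluster-secondary-range-name pods-range \
  --services-secondary-range-name services-range
```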
Flexible Pod CIDR functionality

Next, let’s look at IP address allocation for Pods. By default, Kubernetes assigns a /24 subnet mask on a per-node basis for Pod IP assignment. However, more than 95% of GKE clusters are created with no more than 30 Pods per node. Given this low Pod density, allocating a /24 CIDR block to every node wastes IP addresses, and for a large cluster with many nodes, this waste is compounded across all the nodes, greatly inflating IP address consumption. With the Flexible Pod CIDR functionality, you can define the Pod density per node and thereby use a smaller IP block per node. This setting is available on a per-node-pool basis, so if the Pod density changes tomorrow, you can simply create a new node pool and define a higher Pod density. This can either help you fit more nodes into a given Pod CIDR range, or allocate a smaller CIDR range for the same number of nodes, optimizing IP address space utilization across the overall network for GKE clusters.

The Flexible Pod CIDR feature helps make GKE cluster sizing more fungible, and is frequently used in three situations (see the sketch after this list):

- For hybrid Kubernetes deployments: you can avoid assigning a large CIDR block to a cluster, since that increases the likelihood of overlap with your on-prem IP address management. The default sizing can also cause IP exhaustion.

- To mitigate IP exhaustion: if you have a small cluster, you can use this feature to map your cluster size to the scale of your Pods and therefore preserve IPs.

- For flexibility in controlling cluster sizes: you can tune the cluster size of your deployments by using a combination of container address range and flexible CIDR blocks. Flexible CIDR blocks give you two parameters to control cluster size: you can continue to use your container address range space, thus preserving your IPs, while at the same time increasing your cluster size. Alternatively, you can reduce the container address range (use a smaller range) and still keep the cluster size the same.
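As a sketch of how this looks in practice (the pool and cluster names are placeholders): capping the Pod density at 32 Pods per node lets GKE assign each node a /26 Pod range instead of the default /24, a quarter of the address space per node.

```bash
# Node pool sized for low Pod density: capping Pods per node lets GKE
# allocate a smaller per-node Pod CIDR (here a /26 instead of a /24).
gcloud container node-pools create low-density-pool \
  --cluster my-cluster \
  --max-pods-per-node 32
```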
Replenishing IP inventory

Another way to solve IP exhaustion issues is to replenish the IP inventory. Customers who are running out of RFC 1918 addresses can now use two new types of IP blocks:

- Reserved addresses that are not part of RFC 1918
- Privately used public IPs (PUPIs), currently in beta

Let’s take a closer look.

Non-RFC 1918 reserved addresses

For customers who have an IP shortage, GCP added support for additional reserved CIDR ranges that are outside the RFC 1918 range. From a functionality perspective, these are treated similarly to RFC 1918 addresses and are exchanged by default over peering. You can deploy these in both private and public clusters. Since these addresses are reserved, they are not advertised over the internet, and when you use such an address, the traffic stays within your cluster and VPC network. The largest block available is a /4, which is a very large block.

Privately used public IPs (PUPIs)

Similar to non-RFC 1918 reserved addresses, PUPIs let you use any public IP on GKE, except Google-owned public IPs. These IPs are not advertised to the internet. To take an example, imagine you need more IP addresses and you privately use the IP range A.B.C.0/24. If this range is owned by a service, MiscellaneousPublicAPIservice.com, devices in your routing domain will no longer be able to reach MiscellaneousPublicAPIservice.com; they will instead be routed to your private services that are using those IP addresses. This is why there are some general guidelines for using PUPIs. PUPIs take priority over the real IPs on the internet because they belong within the customer’s VPC, so their traffic doesn’t leave the VPC. When using PUPIs, it’s therefore best to select IP ranges that you are sure will not be accessed by any internal services.

PUPIs also have a special property: they can be selectively exported and imported over VPC Peering. With this function, a user can have a deployment with multiple clusters in different VPCs and reuse the same PUPIs for Pod IPs. If the clusters need to communicate with each other, you can create a Service of type LoadBalancer with the internal load balancer annotation. Only these Services’ VIPs are then advertised to the peer, allowing you to reuse PUPIs across clusters while ensuring connectivity between the clusters.

The above works whether you are running purely on GCP or in a hybrid environment. If you are running a hybrid environment, there are other solutions where you can create islands of clusters in different environments by using overlapping IPs, and then use a NAT or proxy solution to connect the different environments.

The IP addresses you need

IP address exhaustion is a hard problem with no easy fixes. But by allowing you to flexibly assign CIDR blocks and replenish your IP inventory, GKE helps ensure that you have the resources you need to run. For more, check out the documentation on how to create a VPC-native cluster with alias IP ranges, and this solution on GKE address management.

COVID-19 public datasets: our continued commitment to open, accessible data

Back in March, we announced that new COVID-19 public datasets would be joining our Google Cloud Public Datasets program to increase access to critical datasets and support the global response to the novel coronavirus. While the program initially focused on COVID-19 case data, we’ve since expanded our datasets offering to provide additional value to members of the research community and public decision makers. In addition, we’re extending our initial offering of free querying of COVID-19 public datasets for an additional year, through September 15, 2021.

These expanded datasets would not have been possible without numerous partnerships with data providers working closely with Google Cloud to onboard their data to BigQuery. By onboarding public data to BigQuery, these data providers remove barriers and increase the velocity with which users can access and query these large data files. With the COVID-19 public datasets and BigQuery, everything is easily found in one place.

As we strive to continue supporting our users, we want to help ensure that a lack of resources is not a contributing factor in one’s ability to make sense of this data. That’s why we’re expanding datasets access, and we hope that this will expand the pool of contributors who are finding solutions to this pandemic, whether that’s students and faculty querying these datasets through distance learning in the fall or public decision makers gauging when their communities can safely reopen. We hope that these datasets continue to provide universally accessible and useful information in the fight against COVID-19.

How Google has worked with organizations to make COVID-19 datasets available

Since the beginning of the pandemic, The New York Times has tracked and visualized cases across the United States. They have publicly shared aggregated case data at the county and state level, allowing researchers to track, model, and visualize the spread of the virus. These rich datasets provide U.S. national-level, state-level, and county-level cases and deaths, beginning with the first reported coronavirus case in Washington State on January 21, 2020. As deaths began to increase across the United States and abroad, The New York Times published the data behind their excess deaths tracker to provide researchers and the public with a better record of the true toll of the pandemic globally. We worked with them to make this data accessible on BigQuery (see the sample query below). The New York Times also estimated the prevalence of mask-wearing in counties in the United States and made that data available to provide researchers a way to better understand the role of mask-wearing in the course of the pandemic.
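As an illustration, a query against the New York Times dataset might look like the following sketch. The dataset and column names reflect the published BigQuery schema at the time of writing; check the dataset listing in the Cloud Console for the current schema.

```sql
-- Most recent seven days of cumulative reported cases and deaths
-- for Washington State, from The New York Times dataset.
SELECT
  date,
  confirmed_cases,
  deaths
FROM
  `bigquery-public-data.covid19_nyt.us_states`
WHERE
  state_name = 'Washington'
ORDER BY
  date DESC
LIMIT 7;
```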
To complement this and many other efforts to better understand the impact of policy actions, Google also released the COVID-19 Community Mobility Reports, which provide data on community movement trends, and made the data available on BigQuery. We also recently announced our COVID-19 Public Forecasts to help first responders and other healthcare and public sector-impacted organizations project metrics, such as case counts and deaths, into the future. This data is also available on BigQuery.

Next, we prioritized data that could help in understanding the varying effects of COVID-19 on our communities and healthcare systems by publishing datasets relating to social determinants of health. To expand the scope of COVID-19-related queries that qualify for free querying, we included existing datasets like the American Community Survey from the U.S. Census Bureau and OpenStreetMap. We also worked with organizations like BroadStreet to make datasets like the U.S. Area Deprivation Index available on BigQuery. This dataset provides a measure of community vulnerability to public health issues at a highly granular level. Finally, we are publishing aggregated hospital capacity data from the American Hospital Association to help decision makers better understand their community’s ability to handle a surge in hospitalizations.

We also recognize that the scientific community’s response to COVID-19 often depends on the availability and accessibility of high-quality scientific data. We’ve worked to include the Immune Epitope Database (Vita et al, Nucleic Acid Research, 2018) on BigQuery as a resource for researchers investigating the immune response to the SARS-CoV-2 virus. We have also published a series of articles to show how researchers can explore and build predictive models from this dataset using Google Cloud AI Platform. As an additional resource to the scientific community, we’ve created the COVID-19 Open Data dataset, which combines numerous publicly available COVID-19 and related datasets at a fine geographic level and makes them available both in BigQuery and in CSV and JSON formats. The code used to create this dataset is open source and available on GitHub.

As we continue to expand the list of COVID-19 public datasets, we will release new datasets aligned with these four established focus areas:

- Epidemiology and health response, such as case and testing statistics and hospital data
- Government policy response and effects, such as mobility and mask compliance
- Social determinants of health and community response
- Biomedical and other research data

For those attending Google Cloud Next ‘20: OnAir, be sure to check out Data vs. COVID-19: How Public Data is Helping Flatten the Curve. This session will highlight how public data and the Google Cloud COVID-19 public datasets are helping combat the pandemic, informing individual decision-making about the spread and risks of the virus, and explaining how cooperative efforts could help flatten the curve.

The best of Google Cloud Next ’20: OnAir’s Data Analytics Week for technical practitioners

Calling all data practitioners: It’s week 5 of Google Cloud Next ‘20: OnAir, and this week we’re covering all things data analytics. This covers the full spectrum of data workflows in Google Cloud, from data ingestion using Dataflow and Pub/Sub, to BigQuery’s machine learning and geospatial capabilities, to data visualization with Looker. And, of course, this week includes a session about BigQuery Omni, which lets you use BigQuery’s capabilities across Google Cloud, Amazon Web Services, and Azure (soon).

After checking out the sessions below, if you have questions, join me this Friday, August 14 at 9 AM PST for a developer- and operator-focused live recap and Q&A session. Our APAC team is also hosting a recap Friday, August 14 at 11 AM SGT. Several colleagues will join me to discuss their sessions, data analytics news, and future events to look forward to. Join us live to ask our experts questions, or watch it on-demand at any time after it airs. Hope to see you then.

We have a lot of great content to share this week, so let’s dig into a few highlights:

- Analytics in a multi-cloud world with BigQuery Omni—BigQuery Product Manager Emily Rapp brings BigQuery Omni to life with a demonstration of how it can help multi-cloud users better understand their analytics insights with their data stored across multiple clouds.

- MLB’s data warehouse modernization—Google Cloud data engineer Ryan McDowell and Robert Goretsky, Major League Baseball’s VP of data engineering, detail the challenges Major League Baseball experienced in their traditional data warehouse, their migration to BigQuery, and the benefits they saw.

- Building a streaming anomaly detection solution at TELUS using Pub/Sub, Dataflow, and BigQuery ML—Google Cloud solutions architect Masud Hasan and TELUS Lead Architect for Cybersecurity Analytics and Enterprise Data Lake Abdul Rahman Sattar break down the Google Cloud-native architecture TELUS uses to stream and analyze real-time events to detect anomalies that might be a security threat to their customers.

Also, this week’s Cloud Study Jam features opportunities for hands-on cloud experience with workshops on BigQuery. These workshops are led by Google Cloud’s experts, and feature opportunities to learn more about BigQuery, as well as a chance to chat with some of our training teams. Be sure to check out the entire session catalog for this week for a wide variety of content that drills down into Looker, developing data lakes in Google Cloud, building data pipelines, real-time AI, and more.

Google Cloud Next ‘20: OnAir is running through the week of September 8, so be sure to check out the full session catalog and register now.

21 new ways we're improving observability with Cloud Ops

We’ve heard from customers how important it is to be able to reliably operate the applications and infrastructure they run on Google Cloud. In particular, observability is critical to reliable operations. To help you quickly gain insight into your Google Cloud environment, we’ve added 21 new features to Cloud Operations, the observability suite we launched earlier this year, which gives you access to all of our operations capabilities directly from the Google Cloud Console. The new features we’re discussing today make it easier to get the observability you need from your environment, whether it’s in Google Cloud, other clouds, on-premises, or a mix.

Perhaps more importantly, Cloud Operations is built on top of infrastructure with breathtaking scale, and we pass the resulting performance on to you. Two of Cloud Operations’ central services, Cloud Monitoring and Cloud Logging, are built on core observability platforms used by all of Google, which handle over 16 million metric queries per second, 2.5 exabytes of logs per month, and over 14 quadrillion metric points on disk. That’s a lot of data! To get to this scale, we developed the culture and practices that go into building, launching, and running production applications with high velocity and reliability. The practice of Site Reliability Engineering (SRE) is core to this, and an integral part of product planning. In addition to offering the power of this massive platform to Google Cloud customers, we’ve been picking relevant capabilities from the SRE approach that we believe will simplify customer experiences, and building them into Cloud Ops products. Below, the new features are broken down according to the five steps of adding observability to your environment: plan, collect, store, configure, and troubleshoot. Let’s take a closer look.

Plan

Before you start building your operational workflow, it’s a best practice to outline your services and how you want each to perform, defining service level objectives (SLOs). That leads us to our first new feature:

1. SLO Monitoring (Generally available) – Focusing on your SLOs is now easier than ever. SLO Monitoring gives you the ability to focus on the signals that matter and improve the signal-to-noise ratio. This, along with out-of-the-box alerts, reduces the level of expertise required to monitor production environments and makes it easier to identify and remediate issues before they impact critical business metrics.

In the two weeks since making SLO Monitoring generally available, we’ve seen hundreds of new users and gotten some great feedback about how it can simplify monitoring practices.

“SLOs measure how the user feels about your product—that is what truly matters,” said Vipul Makar, SRE Lead & Enterprise Architect at Equifax. “With SLO Monitoring we make data-driven decisions and build more reliable products. Once we learned how to use SLOs, we never looked back!”

To learn more about SLOs and using custom metrics to create SLOs, join us at these Google Cloud Next ’20: OnAir sessions: OPS200 – Kubernetes Engine Infrastructure and Service Monitoring with Equifax and OPS102 – Best Practices for Custom Metric Telemetry on Google Cloud.

Collect

Collecting logs and metrics is easy when you leverage the out-of-the-box observability for Google Cloud system logs and metrics. You can add application and third-party logs and metrics from wherever they are generated: instrumented with OpenTelemetry or OpenCensus, captured by the Google Cloud monitoring and logging agents, or submitted directly through the Cloud Operations APIs.
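To make “submitted directly through the Cloud Operations APIs” concrete, here’s a minimal sketch that writes a single point of a custom metric with the Cloud Monitoring Python client. It assumes a recent google-cloud-monitoring client library (2.x); the project ID and metric type are hypothetical.

```python
# A minimal sketch: write one point of a custom metric via the
# Cloud Monitoring API (assumes google-cloud-monitoring 2.x).
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"  # hypothetical project ID

series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/checkout/queue_depth"  # hypothetical metric
series.resource.type = "global"

now = time.time()
seconds = int(now)
nanos = int((now - seconds) * 10**9)
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": seconds, "nanos": nanos}}
)
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 42}})
series.points = [point]

# One write call per sample; with the new 10-second resolution (item 11
# below), you can write a given time series as often as every 10 seconds.
client.create_time_series(name=project_name, time_series=[series])
```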
Today, we’re expanding the types of logs that you can use in Cloud Logging in two important ways:

2. G Suite audit logs – The integration between G Suite audit logs and Cloud Logging is now generally available, adding to the dozens of Google Cloud services that already provide audit logs out of the box.

3. Multi-cloud and on-premises logs – We’ve partnered with Blue Medora to provide agents for collecting logs and metrics from anywhere, now generally available at no additional cost.

We’ve also made it even easier to capture metrics and logs from your Compute Engine VMs:

4. You can now install, run, and manage the Cloud Logging and Monitoring agents across groups of Compute Engine instances, or your entire fleet, with a single command.

To learn more about collecting logs and metrics, join us at OPS102 – Best Practices for Custom Metric Telemetry on Google Cloud and OPS203 – OpenTelemetry and Observability at Shopify, Splunk, and Google.

Store

Being able to store and protect your data in Cloud Logging and Monitoring is critical to your observability strategy. That’s why we’ve been working hard to launch new features to help you meet your security and privacy requirements.

Cloud Logging makes it easy to search and analyze logs, and provides a central, secure, compliant, and scalable log storage solution. Today we’re announcing a number of improvements to log storage and management, building on several recent improvements for exploring and analyzing logs. Here’s a selection of what’s new:

5. Logs buckets (Beta) – Centralize or subdivide your logs based on your needs for ownership, retention, and region.

6. Logs views (Preview) – Gain better control over who has access to your logs data.

7. Regionalized log storage (Preview) – Configure your log buckets in five separate cloud regions, with more to come.

8. Improved log routing (Preview) – Route logs from one project to another, or use an aggregated log sink at the folder or organization level to centralize logs into a logs bucket (a minimal sketch of scripting such a sink appears after this list).

9. Customizable retention (Generally available) – Custom retention, which lets you retain your logs data for anywhere from one day to 10 years, is now generally available, and you can use it at no additional cost through the end of March 2021. This means you can try out our log management capabilities for your long-term compliance and analytics needs without a cost commitment.

Regionalized logs buckets and logs views are now in private preview, coming to beta in September 2020. Examples of new functionality that preview users have enjoyed include centralizing all audit logs from across an organization, splitting out logs from a multitenant GKE cluster into multiple projects, and setting up regionalized log storage. Sign up for access or to be notified of future releases of the preview of logs views and regionalized storage.

Then there’s your ability to write and retain custom and Prometheus metrics, which can be critical to the observability of your applications and services. To help ensure that you have these metrics when you need them:

10. Extended retention for custom and Prometheus metrics – These metrics are now retained for 24 months rather than six weeks, at no additional cost.

11. 10-second resolution for agent, custom, and Prometheus metrics – You can now write agent, custom, and Prometheus metrics at 10-second granularity, and use these higher-resolution metrics to track rapidly changing environments, applications, services, and infrastructure.
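As promised in item 8, here’s a minimal sketch of scripting a log sink with the Cloud Logging Python client. It creates a project-level sink that routes matching logs into a logs bucket in another project. The project, bucket, and sink names are hypothetical; aggregated folder- and organization-level sinks go through the Logging API or gcloud rather than this helper, and the destination bucket must already exist with the sink’s writer identity granted access to it.

```python
# A minimal sketch: route WARNING-and-above logs from one project into a
# logs bucket in another project (names are hypothetical).
from google.cloud import logging

client = logging.Client(project="source-project")

sink = client.sink(
    "route-warnings-to-central-bucket",
    filter_="severity>=WARNING",
    destination=(
        "logging.googleapis.com/projects/central-project"
        "/locations/global/buckets/central-bucket"
    ),
)

# Create the sink only if it doesn't already exist.
if not sink.exists():
    sink.create()
    print("Created sink", sink.name)
```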
To learn more about our new log management capabilities, join us at OPS100 – Designing for Observability on Google Cloud.

Configure

Our Cloud Operations suite offers several ways for you to customize your environment to meet your business and reliability goals: dashboards, alerting policies, logs-based metrics, uptime checks, and SLOs. We’ve got a range of new improvements to help you automate your configuration and get started quickly with new out-of-the-box dashboards.

12. Monitoring Dashboards API – Building out your dashboards at scale is easier than ever with our new Dashboards API, which allows you to manage your monitoring as code (a minimal sketch appears near the end of this post).

13. Out-of-the-box dashboards – The only thing better than easy-to-build dashboards are dashboards that are already built for you. We’ve added a range of new out-of-the-box dashboards, including a Cloud Logging dashboard and a newly refreshed Compute Engine dashboard that shows cross-fleet metrics.

14. Pub/Sub alerting notifications – In addition to visualizing your system, you’ll want alerting for reliability; the new Pub/Sub integration for alerting in Cloud Monitoring lets you route notifications into automation and reduce toil.

15. Monitoring Query Language (Generally available) – The new Monitoring Query Language lets you manipulate time series to create useful charts: for example, plotting ratios between different metrics or current vs. past values, defining arithmetic expressions over time-series values, or creating new aggregations.

To learn more about alerting and dashboarding with Cloud Operations, join us at OPS208 – Alerting Best Practices for Google Cloud Monitoring and OPS302 – Monitoring as Code.

Troubleshoot

Now that you’ve got everything set up, you’re ready to troubleshoot issues in production. We’ve added six new features to Cloud Logging to help you find issues fast.

16. Our new logs viewer is now generally available, with a variety of new features for analyzing logs data, and it now supports viewing your logs at the folder or organization level in your GCP organization.

17. We’ve added histograms to the new logs viewer to help you spot patterns in your logs over time.

18. We’ve added the logs field explorer to the new logs viewer, which helps you rapidly refine queries and spot interesting distributions across your data.

19. Saved and recent searches in the new logs viewer help you get to your most valuable logs more quickly.

20. Integration with traces now provides in-context insight about latency and makes it easy to find all logs that include a specific trace.

21. Our logging query language also got a major upgrade with support for regular expressions.

To learn more about troubleshooting with Cloud Operations, join us at OPS201 – Creating a Better Developer Experience with Google Cloud’s Operations Suite of Products and OPS301 – Analyzing Distributed Traces to Find Performance Bottlenecks.

For an overview of the new functionality and how it can be used with GKE, check out this short video.

Get started with Cloud Operations

With a Google-scale foundation and an aggressive roadmap of new features and functionality, you can rely on the observability tools in Cloud Operations to help you manage, monitor, and troubleshoot your most mission-critical applications.
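And to show what “monitoring as code” from item 12 can look like, here’s a minimal sketch using the Dashboards API Python client (the google-cloud-monitoring-dashboards package). The project ID and dashboard contents are hypothetical, and a real dashboard would typically use chart widgets rather than the placeholder text widget shown here.

```python
# A minimal sketch: create a dashboard programmatically via the
# Cloud Monitoring Dashboards API (project and names are hypothetical).
from google.cloud import monitoring_dashboard_v1

client = monitoring_dashboard_v1.DashboardsServiceClient()

dashboard = monitoring_dashboard_v1.Dashboard(
    display_name="Checkout service overview",
    grid_layout=monitoring_dashboard_v1.GridLayout(
        columns=2,
        widgets=[
            monitoring_dashboard_v1.Widget(
                title="Notes",
                # Placeholder widget; swap in xy_chart widgets with real
                # metric queries for an actual service dashboard.
                text=monitoring_dashboard_v1.Text(
                    content="CPU, latency, and error-rate charts go here.",
                ),
            ),
        ],
    ),
)

created = client.create_dashboard(
    request={"parent": "projects/my-project", "dashboard": dashboard}
)
print("Created:", created.name)
```

Because the dashboard definition is just data, you can keep it in version control and apply it across projects, which is the heart of the monitoring-as-code workflow.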
To learn more about Cloud Operations, register for and join us at these Next ‘20: OnAir sessions:

- OPS100 – Designing for Observability on Google Cloud
- OPS200 – Kubernetes Engine Infrastructure and Service Monitoring with Equifax
- OPS213 – Cloud Operations Spotlight