Google Kubernetes Engine: 7 years and 7 amazing benefits

Today, as we celebrate seven years of general availability of the most automated and scalable managed Kubernetes, Google Kubernetes Engine (GKE), we present seven of the common ways that GKE helps customers do amazing things.
Accelerates productivity of developers
Developer time is at a premium. GKE provides a rich set of integrated tools to help you ship faster and more often. The practice of continuous integration (CI) allows developers to frequently integrate all their code changes back into a main branch, exposing failures faster by revealing issues as early as possible in the process. A CI pipeline typically produces an artifact that you can deploy in later stages of the deployment process with continuous delivery (CD). CD lets you release code at any time. The ecosystem of developer tools for GKE spans both CI and CD:
Developers write, deploy, and debug code faster with Cloud Code and Cloud Shell
Continuously integrate and deliver updates with Cloud Build
Continuous delivery to GKE is made easier, faster, and more reliable with Cloud Deploy
Debug and troubleshoot with Google Cloud’s operations suite
You can use your favorite partner solutions out of the box
Moreover, GKE Autopilot clusters accelerate app deployment by reducing configuration time and simplifying the ongoing management of dev/test clusters. You can read more on how to get started with GKE Autopilot.
“Google Kubernetes Engine is easy to configure, and scales really well. That means the developers don’t need to think about managing it in production, they can simply set the parameters and be confident it will work.”—Vincent Oliveira, CTO, Lucky Cart
Bolsters security across the software supply chain
Security remains top of mind for all organizations. Kubernetes clusters created in Autopilot mode implement many GKE hardening features by default. Furthermore, GKE Autopilot improves cluster security by restricting access to the Kubernetes API, preventing node mutation, and enforcing a robust security posture, and it lets you implement additional guidance to harden your clusters further. Binary Authorization is a deploy-time security control that ensures only trusted container images are deployed on GKE. With Binary Authorization, you can gain tighter control over your container environment by ensuring only verified images are integrated into the build-and-release process. You can read more on how to build security into your software supply chain.
“We needed to be HIPAA compliant, which was going to be painful on AWS, and we wanted to get away from managing and operating our own Kubernetes clusters,” recalled Astorino. “We had heard good things about GKE (Google Kubernetes Engine). And, particularly valuable for us, many technical requirements you need for HIPAA compliance are configured by default on Google Cloud.”—Troy Astorino, Co-Founder & CTO of PicnicHealth
Creates new opportunities with a platform approach
Modern application platforms spur creativity and drive quick responses to customer demands. GKE customers use Kubernetes to build a modern, enterprise-grade application platform for their organization. With the ability to achieve improved speed and performance for a variety of workloads through Tau VM, GPU, TPU, and Local SSD support, GKE helps them support a wide variety of containerized applications, including stateful and stateless, AI and ML, Linux and Windows.
Only GKE can run 15,000-node clusters, outscaling other cloud providers by up to 10x and letting you run applications effectively and reliably at scale.
“Google Cloud-managed services are playing a major role in enabling Noon.com customers to get their shopping done whenever they need it, without experiencing any delays or glitches, and without us having to lose sleep at night to ensure our platform is functioning as it should.”—Alex Nadalin, SVP of Engineering, Noon.com
Delivers always-on experiences for customers
Consumers today demand 24×7 digital experiences. GKE provides granular controls to deliver always-on, highly available, and reliable apps and services. With node auto-upgrade, we automatically upgrade and patch your cluster nodes, while the control plane is always patched and upgraded by Google. You can also subscribe to a release channel – rapid, regular or stable – based on your needs and constraints. For enterprises, release channels provide the level of predictability needed for advance planning, and the flexibility to orchestrate custom workflows automatically when a change is scheduled. You can learn more about release channels here, and about maintenance windows here.
“To bring E.ON Optimum to market, we needed to transform in-house software into a highly scalable, reliable cloud-based solution. We were specifically looking for a cloud partner capable of running Kubernetes pods at scale and 100% of the time, and that led us to Google Cloud.”—Dennis Nobel, Digital Delivery Manager, E.ON
Enables cost optimization and savings for organizations
In the current macroeconomic environment, you often need to do more with fewer resources. GKE Autopilot dynamically adjusts compute resources, so there’s no need to figure out what size and shape of nodes you should configure for your workloads. With GKE Autopilot, you pay only for the pods you use, and you’re billed per second for vCPU, memory, and disk resource requests. Moreover, GKE cost optimization insights help you discover optimization opportunities at scale, across your GKE clusters and workloads, automatically and with minimal friction.
“Since migrating to GKE, we’ve halved the costs of running our nodes, reduced our maintenance work, and gained the ability to scale up and down effortlessly and automatically according to demand. All our customer production loads and development environment run on GKE, and we’ve never faced a critical incident since.”—Helge Rennicke, Director of Software Development, Market Logic Software
Fuels growth with a focus on business innovation
IT divisions are moving from cost centers to value centers by using managed cloud services. You can benefit from no-stress management and focus on business innovation using GKE Autopilot, which provides hands-off, SLA-backed cluster management and eliminates most day-2 cluster operations. GKE delivers automation across many dimensions to efficiently and easily operate your applications. With fully managed GKE Autopilot, combined with multi-dimensional auto-scaling capabilities, you can get started with a production-ready, secured cluster in minutes and still have complete control over configuration and maintenance, as the sketch below shows.
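As a hedged illustration (not an official walkthrough; the project ID, region, and cluster name are placeholders), creating an Autopilot cluster enrolled in a release channel takes a single gcloud command:

# Illustrative sketch only: create a GKE Autopilot cluster on the "regular" release channel.
# Replace the project ID, region, and cluster name with your own values.
gcloud container clusters create-auto demo-autopilot-cluster \
    --project=my-project-id \
    --region=us-central1 \
    --release-channel=regular

Google then manages node provisioning, upgrades, and patching for the cluster, which is the hands-off model described above.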
“The automated features of Google Kubernetes Engine enable us to manage app traffic and develop games at an amazingly high level of efficiency. Currently, we only need two engineers to monitor traffic volume and all the environments of our three games, which frees up more workforce for development and innovation work.”—Aries Wang, Research and Development Deputy Manager, Yile Technology
Gives IT freedom from proprietary tools
Multi-cloud is a reality. Proprietary tools often require specialized skills and lock you into huge licensing fees. With conformant Kubernetes supported across multiple environments, including all major cloud providers, you can minimize vendor lock-in and be well placed to maximize the benefits of a multi-cloud strategy. Kubernetes’ workload portability gives you the flexibility to move your apps around without constraints.
“MeilleursAgents is a product-oriented company and our goal is to deliver new services as fast as we can, in order to get market feedback, and improve them once they’re in production. Google Kubernetes Engine helps us do that by delivering flexibility and easy scaling, which is why we decided to make the switch.”—Thibault Lanternier, Head of Web Engineering, MeilleursAgents
Join us at Building for the future with Kubernetes to kickstart or accelerate your Kubernetes journey. You’ll get access to technical demos that go deep into our Kubernetes services, developer tools, operations suite, and security solutions. We look forward to partnering with you on your Kubernetes journey!
Related Article: Why automation and scalability are the most important traits of your Kubernetes platform
The recipe for long-term success with Kubernetes: automation that matters and scalability that saves money.
Read Article
Quelle: Google Cloud Platform

Why all retailers should consider Google Cloud Retail Search

Cloud Retail Search, part of the Discovery Solutions for Retail portfolio, helps retailers significantly improve the shopping experience on their digital platforms with ‘Google-quality’ search. Cloud Retail Search offers advanced search capabilities, such as better understanding of user intent and self-learning ranking models, that help retailers unlock the full potential of their online experience. Google Cloud’s Discovery Solutions for Retail are a set of services that can help retailers improve their digital engagement and are offered as part of our industry solutions.
Executive Summary
Retailers are always working to keep up with ever-changing consumer expectations and to forecast the next trend that could impact sales and revenue. The pandemic brought its own (and largely new) set of challenges, which further complicated the issue over the last two years. Retailers were forced to adapt to new, low-physical-touch consumer behavior in which browsing and product research were largely digital (the endless aisle), and which accelerated other trends such as buy online, pick up in store (BOPIS), curbside pickup, and pickup lockers. According to a McKinsey Global Survey from early last year, the pandemic has accelerated the pace of digital transformation by several years. The National Retail Federation (NRF) estimates that retail sales are expected to grow between 6% and 8% in 2022 (a slower growth rate than in 2021), as consumers spend more on services instead of goods and deal with inflation and higher food and gas prices driven by geopolitical disruptions around the world. And the competition continues to be as fierce as ever: Amazon continues its dominance in the U.S. retail world, and new PYMNTS data shows that Amazon’s share of U.S. ecommerce sales hit an all-time high of 56.7% in 2021.
Customers now have more choices than ever in how they want to engage with retailers, where they want to spend their money, and how they make their purchases. They also have higher expectations of retailers around providing a high-quality product discovery experience, which is forcing retailers to invest heavily in improving customer engagement on their digital platforms to boost conversion rates and overall customer loyalty.
This is where Retail Search can help, by providing an enhanced search experience that uses Google-quality search models to understand customer intent and that takes into account the retailer’s first-party data (such as promotions, available inventory, and price) when ranking results.
How is Google Cloud Retail Search different
The ecommerce on-site search use case is not new, and retailers have been trying to solve it effectively for the last two decades. Most retailers recognize that search is a critical service on the platform and have spent countless resources to improve and fine-tune it over the years. Yet the challenge remains: according to a Baymard Institute study as recently as late 2019, 61% of sites still required their users to search by the exact product-type jargon the site uses. However, users now expect the same robust and intuitive search features as are offered by Google.com and other popular web platforms, which seem to have the uncanny ability to intelligently interpret complex search queries and yield relevant results. Google’s decades of experience and research in search technology benefit the Cloud Retail Search solution, and that is what differentiates it from the competition.
Advanced Query Understanding: Retail Search can provide more relevant results for the same query thanks to better query understanding and knowing when to broaden or narrow the query results. While most search engines still rely largely on keyword-based or token-matching results, Retail Search has the advantage of being able to leverage Google search algorithms to return highly relevant results for product listings and category pages.
Semantic Search: Intent recognition is a key requirement for semantic search, and identifying what customers mean when they enter a query is a key strength of Retail Search. This is critical for retailers since it has a direct impact on clickthrough rate, conversion rate, and bounce rate.
Personalized Search Results: Another key differentiator for Retail Search is its ability to leverage user interaction data and ranking models to provide hyper-personalized search results. Retailers are able to optimize search performance to deliver desired outcomes: better engagement, revenue, or conversions.
Self-Learning and Self-Managed Solution: Retail Search models get better over time because of the self-learning capabilities built into the solution. In addition, the service is fully managed, which saves the precious resources otherwise needed to keep it running and manage its setup.
Strong Security Controls: The service runs on Google Cloud and follows security best practices to keep our customers’ data secure. Google never shares model weights or customer data across customers using the Retail API or other Discovery Solutions products. For more details, see the description of Retail API data use.
High-Level Conceptual View
Here is a simplified, high-level view of the Retail Search API. Retailers call the API with a given search query and get back results, which can then be displayed on their digital properties. The returned results contain two types of information:
Search results: Query results, including product listings and category pages, based on advanced query understanding and semantic search.
Dynamic faceted search attributes: Faceted search allows further refinement of the search by providing ways to apply additional filters while returning results.
Retail Search needs the following datasets as input to train its machine learning models for search:
Product Catalog: Information about the available products, including product categories, product descriptions, in-stock availability, and pricing.
User Events: The clickstream data that contains user interaction information such as clicks and purchases.
Inventory / Pricing Updates: Incremental updates to in-stock availability and pricing as that information changes.
(Keeping the product catalog up to date and recording user events successfully is crucial for getting high-quality results. Set up Cloud Monitoring alerts so you can take prompt action if any issues arise.)
Retailers also have the ability to set up business/config rules to customize their search results and optimize for business revenue goals such as clickthrough rate, conversion rate, and average order size.
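To make the request/response flow concrete, here is a minimal sketch of querying Retail Search with the Python client library (google-cloud-retail); the project, catalog, placement, and visitor ID below are placeholder values, not a prescribed configuration:

# Illustrative sketch only: issue a query against a Retail Search serving config.
from google.cloud import retail_v2

client = retail_v2.SearchServiceClient()
request = retail_v2.SearchRequest(
    placement=(
        "projects/my-project/locations/global/catalogs/default_catalog"
        "/placements/default_search"
    ),
    query="red running shoes",
    visitor_id="visitor-123",  # links the query to user events for personalization
    page_size=10,
)
for result in client.search(request=request):
    print(result.id)  # product IDs to render on the results page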
How to get started
Retail Search is generally available now, and anyone with a Google Cloud account can access it. If you don’t already have an account, you can start with a free trial account here.
Establish success criteria: It’s important to establish success criteria for measuring the effectiveness of Retail Search. Get consensus on which factor(s) you want to include in scope for measuring effectiveness. This could include one or two of the following: search conversion rate, search average order value, search revenue per visit, and null search rate (no results found).
Initial setup: Create a Google Cloud project and set up the Retail API. When you set up the Retail API for a new project, the Google Cloud Console displays the following three panels to help you configure your Retail API project:
Catalog: displays the product catalog and a link to import a catalog.
Event: displays user events and a link to import historical user events.
Serving configs: contains details on serving configurations and a link to create a new serving configuration.
Measuring performance: Retail dashboards provide metrics to help you determine how incorporating the Retail API is affecting your results. You can view summary metrics for your project on the Analytics tab of the Monitoring & Analytics page in the Cloud Console.
Set up A/B experiments: To compare the performance of Retail Search with another search solution, you can set up A/B tests using a third-party experimentation platform such as Google Optimize.
Summary
As retailers try to navigate the post-pandemic world, where supply chain failures and the acceleration of digital transformation are major focus areas, they now also have to keep a close eye on recent geopolitical challenges that are driving up inflation and costs. While we can all agree that in-store shopping will continue to be a major source of revenue, it is also important for retailers to adapt the in-store experience for the digital world. Trends such as buy online, pick up in store (BOPIS), curbside pickup, and pickup lockers are here to stay.
Given all of the above, consumer engagement and the digital experience are more important now than ever before. The cost of search abandonment is far too high and has both short- and long-term impact. Retail Search is a great solution to help reduce churn and improve conversion and retention. It provides Google-quality search models to help understand customer intent, and retailers have the ability to set up business/config rules to optimize search results for business revenue goals such as clickthrough rate, conversion rate, and average order size.
Related Article: Shoppers know what they want. Does your site’s search understand?
With Retail Search, retailers can now build Google-quality search into their sites and apps to drive sales and conversions.
Read Article
Quelle: Google Cloud Platform

Announcing Virtual Machine Threat Detection now generally available to Cloud customers

Today we are delighted to announce that our unique, first-to-market detection capability, Virtual Machine Threat Detection (VMTD) in Security Command Center, is now generally available for all Google Cloud customers. We launched this service six months ago in public preview and have seen a lot of enthusiasm from our customers, with adoption from users around the world and in every industry. For years, we have said security must be engineered in, not bolted on. By baking this capability into our virtualization stack, we are living up to our promise of delivering invisible security. Our team has been busy scaling the service, refining our detection capabilities, and preparing our next major feature set.
VMTD in general availability has been scaled to support significantly more frequent scanning across a tremendously large number of instances. Scaling the scanning of memory across the Google Compute Engine (GCE) fleet has posed unique challenges, and we’ve invested in caching scan results to enable more frequent scans of smaller – but more important – sections of memory.
For customers, enabling VMTD is as easy as checking a box in their Security Command Center Premium settings. They consistently report that this is a game-changer compared to the challenges associated with third-party agent deployment. Because VMTD is deployed from the hypervisor, rather than inside the instance, our instrumentation is not as exposed to adversaries as traditional endpoint detection and response (EDR) agents are. This is an invisible-to-adversaries approach: they cannot detect when we scan. By enabling VMTD, our customers have activated protection for millions of compute instances with a few simple checkboxes, and without incurring compute overhead.
Deploying VMTD to Cloud customers has uncovered multiple attacks, and a few surprising false positives. In one example, our YARA rules detected a customer’s antivirus agent as mining cryptocurrency, because that agent, which shall remain nameless, happened to resolve a lot of mining pool domains as part of its protection scheme. Our YARA rules are developed in collaboration with the best of Google’s threat intelligence community, drawing on expertise from Google’s Threat Analysis Group as well as Google Cloud Threat Intelligence. As a cloud-native managed service, we’re always on the lookout for these cases and work hard to improve our service for all customers.
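To give a flavor of the kind of signature involved (an illustrative sketch only, not one of Google’s actual VMTD rules), the following Python snippet uses the yara-python package to compile and run a rule that flags memory containing strings commonly associated with cryptomining software:

# Illustrative sketch only; the rule below is an example, not a production VMTD detection.
import yara

rule_source = r'''
rule illustrative_cryptominer_strings
{
    strings:
        $s1 = "stratum+tcp://" ascii
        $s2 = "cryptonight" ascii nocase
        $s3 = "randomx" ascii nocase
    condition:
        2 of ($s*)
}
'''

rules = yara.compile(source=rule_source)
memory_snapshot = open("guest_memory.bin", "rb").read()  # placeholder memory dump
for match in rules.match(data=memory_snapshot):
    print(match.rule)  # e.g., "illustrative_cryptominer_strings"

The false positive described above shows why such string-based indicators are combined with curated threat intelligence and continuous tuning.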
With this release, we are thrilled to announce that in addition to detecting mining on a particular instance, we can now, in the vast majority of cases, identify and report on the specific process that is engaged in mining activity. VMTD is now able to deeply understand the Linux kernel and report specific details about the threat execution we’ve observed, without an agent inside the instance. This enables easier triage, investigation, and response to our detections. In the academic literature on virtual machine introspection, deriving the meaning of kernel data structures from raw hardware contents is known as “bridging the semantic gap.” With this release, VMTD can bridge this gap and prove that insights useful to security teams can be pulled from the contents of memory for any given virtual machine.
In the future, we plan on further improving VMTD’s understanding of Linux kernels to detect additional advanced attacks and report live telemetry to our customers. With its unique position as an outside-the-instance observer, VMTD can detect rootkits and bootkits: attacks that tamper with kernel integrity and otherwise blind the kernel and EDR tools to their presence. We are delighted with the progress we’ve made on detecting the most common threats in cloud computing environments. If you’re interested in trying out our service, navigate to the settings page of Security Command Center Premium and turn on Virtual Machine Threat Detection. You can read more about Security Command Center Premium and Virtual Machine Threat Detection on our docs pages.
Related Article: Protecting customers against cryptomining threats with VM Threat Detection in Security Command Center
Extending threat detection in Security Command Center with Virtual Machine Threat Detection.
Read Article
Quelle: Google Cloud Platform

How Danone created new eCommerce strategies to connect directly with customers using commercetools on Google Cloud

Editor’s note: Today’s blog comes from Luke O’Connell, Head of Data and Technology at Danone. He shares the ways in which using commercetools deployed on Google Cloud is helping Danone to connect more directly with consumers and better understand their needs.
With the rise of large eCommerce players and rapid home delivery, here at Danone we’re seeing big changes in the way that consumers need our products to reach them. They want to be able to connect with brands and purchase products more directly, especially when traditional commerce channels become unavailable. Using commercetools deployed on Google Cloud, we’re making the most of this opportunity to develop and deploy new digital and hybrid commercial models, whether that be digital sampling services for healthcare products, subscription services for water, or the delivery of goods directly for our infant nutrition brands.
Solving consumer goods shortages by connecting directly
In 2013, we experienced how important being able to connect directly with consumers can be, when increased demand for our products in China led to shortages of our baby formula brands across Europe. Accessing food for their babies was an emotive issue for parents, and we needed to find a solution quickly to enable them to buy direct. We chose commercetools as the right solution for Danone because of its flexibility and its MACH-based, cloud-native approach. This enabled us to get eCommerce solutions ready to deploy across multiple markets within weeks.
When the COVID-19 pandemic arrived in 2020, consumers were once again faced with empty shelves in-store as their shopping habits were disrupted on a much larger scale. As we were already working with commercetools, we could easily deploy direct-to-consumer solutions by simply adding a cart function to our existing brand websites. And, because commercetools on Google Cloud is able to powerfully scale to meet demand, we didn’t have to worry about spikes in traffic causing sites to crash.
Leveraging data to better understand consumer needs
One of the key reasons we chose to work with commercetools on Google Cloud was that the commercetools team understood we needed more than just a shopping cart. While there are many eCommerce vendors focused only on increasing sales, we wanted a flexible solution that helps us add value and learn more about our customers.
In the UK, Danone’s healthcare division is deploying commercetools to enable its B2B consumers to order medical samples directly. Not only does that make the process easier for consumers, it’s also helping Danone to gather important data on who its consumers are and how they use its products.
Connecting commercetools with Google Cloud enables us to collect and analyze this sampling data, as well as stream it into other software to help our reps in the field tailor their conversations with healthcare professionals. When it comes to data and the value it can unlock, we have been able to fully understand our consumers’ journeys, which has previously been a challenge for us as a Consumer Packaged Goods company; we would normally rely on third-party data such as that acquired through POS or loyalty cards.
Creating a tailored solution
Our choice of commercetools deployed on Google Cloud is part of a significant shift towards modular technologies that are enabling us to pick and choose the most effective components to suit our requirements.
We’ve also recently created local digital transformation teams to integrate technology and data fully alongside business strategy. This is helping us to really understand the specific features we need to connect directly with our consumers, building and extending platforms that reach them quickly and easily.
commercetools is a next-generation headless enterprise commerce platform that enables customers to easily integrate game-changing features and functionality into their commerce sites. It’s now available via Google Cloud Marketplace, offering brands and retailers deeper product integration, unified billing, and an expanded go-to-market relationship, making it easier for customers to select Google Cloud as the cloud platform to drive their business growth with commercetools.
Quelle: Google Cloud Platform

Building a sustainable agricultural supply chain on Google Cloud

Working to put food on all of our tables, today’s farmers are facing greater instability, from input supply chain issues to weather patterns. Adding to this challenge are the problems farmers face when trying to correctly time grain purchases, sales, and transport. Farmers have always been stewards of the land, but now the demand for sustainable products means they need to better prove their regenerative practices.
At Bushel, we understand these problems can’t be solved overnight or by a single company. We focus on empowering agribusinesses and farmers to work even more closely together to build a more sustainable agricultural supply chain by rapidly responding to market changes. With Bushel, farmers can track market prices in real time, instantly buy and sell grain, analyze inventory and transactions, and securely share verified information with grain operators and other producers. We provide the digital tools to streamline how farmers buy and sell commodities throughout the agricultural industry’s supply chain, to help address market inefficiencies that can lead to waste, and to give farmers the information and resources to flex and adapt as complexity increases in farming operations.
Approximately 40% to 50% of all U.S. grain transactions now pass through the Bushel platform. As we continue to grow, Bushel continues to focus on which digital tools can support each point in the supply chain. Many companies focus on the first mile at the farm or the last mile at the store, but Bushel is focused on modernizing the middle, where grain purchasing and processing sit. We aim to help local grain industries and stabilize regional agricultural supply chains.
Starting with a simple mobile app; now scaling into an agricultural ecosystem
Bushel began its journey in 2017 as a small-scale platform for farmers that delivered grain contracts, cash bids, and receipts. As Bushel evolved into a comprehensive agricultural ecosystem, we realized we needed knowledgeable technology partners to help us rapidly scale while saving time and administrative costs. That’s why we started partnering with the Google for Startups Cloud Program to get support from Google, and why we work with Google Cloud Managed Services partner DoiT International to support our use of GKE, create a multi-regional deployment, migrate our committed use discounts (CUDs) to new Compute Engine families, and continue to optimize our footprint. We’ll also use DoiT’s Flexsave technology to reduce the management overhead of CUDs in the future.
In just one year, we expanded to over 1,200 live grain receiving locations and quickly grew our services portfolio with electronic signature capabilities, commodity balances, and web development. Because the relationship between farmer and agribusiness is so important, we provide more than 200 grain companies with white-labeled digital experiences, so each farmer sees the local grain facility they do business with on both desktop and mobile.
To further our extension into the digital infrastructure of agriculture, we subsequently acquired GrainBridge and FarmLogs to help farmers handle specific jobs and tasks, and to provide the insights needed to improve their business operations. Over 2,000 grain receiving locations across the United States and Canada now use Bushel products. We accomplished all this on Google Cloud. We leverage the secure-by-design infrastructure of Google Cloud to protect millions of financial transactions and keep sensitive customer data safe.
Our data is processed and stored in Google’s secure data centers, which maintain adherence to a number of compliance frameworks. We use Google Kubernetes Engine extensively, as it reduces operational overhead and offers autoscaling up to 15,000 nodes. Database provisioning, storage capacity management, and other time-consuming tasks are automated with our Cloud SQL usage. Query Insights for Cloud SQL streamlines database observability and seamlessly integrates with existing apps and Google Cloud services such as GKE and BigQuery.
Empowering farmers and agribusinesses in North America
The Google Cloud account team has been instrumental in helping Bushel build an expansive agricultural platform that powers APIs, apps, websites, and digital solutions. Google’s startup experts are incredibly responsive, with deep technical knowledge that can’t be found elsewhere. Google Cloud has also provided us credits to explore new ways of analyzing the vast amounts of data we generate, verify, and transfer with solutions such as BigQuery and Pub/Sub. With BigQuery, we can run analytics at scale with 26%–34% lower three-year TCO than cloud data warehouse alternatives. BigQuery delivers actionable insights on a highly secure and scalable platform, includes built-in machine learning capabilities, and integrates with Pub/Sub to ingest and stream analytic events via Dataflow.
With Bushel, farmers across North America are rapidly responding to sudden market changes by tracking grain prices in real time and instantly buying and selling crops. We see a future where this business information becomes insight – where a farmer can know not just where to sell their grain, but when to sell. The burden of engaging with carbon markets right now is high, full of paper-based binders and verification forms. We see a world where farming practices recorded digitally can be permissioned along the supply chain for a better picture of how our food is grown. With the Bushel platform, millions of farmers around the world will have the digital tools to modernize local grain industries, build more sustainable agricultural supply chains, and help address global food inequity.
If you want to learn more about how Google Cloud can help your startup, visit our page here to get more information about our program, and sign up for our communications to get a look at our community activities, digital events, special offers, and more.
Related Article: Founders and tech leaders share their experiences in “Startup Stories” podcast
Founders and tech leaders share their experiences in Google Cloud’s “Startup Stories” podcast.
Read Article
Quelle: Google Cloud Platform

Vertex AI Example-based Explanations improve ML via explainability

Artificial intelligence (AI) can automatically learn patterns that humans can’t detect, making it a powerful tool for getting more value out of data. A high-performing model starts with high-quality data, but in many cases datasets have issues, such as incorrect labels or unclear examples, that contribute to poor model performance. Data quality is a constant challenge for enterprises—even some datasets used as machine learning (ML) benchmarks suffer from label errors. ML models are thus often notoriously difficult to debug and troubleshoot. Without special tools, it’s difficult to connect model failures to root causes, and even harder to know the next step to resolve the problem.
Today, we’re thrilled to announce the public preview of Vertex AI Example-based Explanations, a novel feature that provides actionable explanations to mitigate data challenges such as mislabeled examples. With Vertex AI Example-based Explanations, data scientists can quickly identify misclassified data, improve datasets, and more efficiently involve stakeholders in the decisions and progress. This new feature takes the guessing games out of model refinement, enabling you to identify problems faster and speed up time to value.
How Example-based Explanations create better models
Vertex AI Example-based Explanations can be used in numerous ways, from supporting users in building better models to closing the loop with stakeholders. Below, we describe some notable capabilities of the feature.
Figure 1. Use case overview for Example-based Explanations
To illustrate the misclassification analysis use case, we trained an image classification model on a subset of the STL-10 dataset, using only images of birds and planes. We noted some images of birds being misclassified as planes. For one such image, we used Example-based Explanations to retrieve other images in the training data that appeared most similar to this misclassified bird image in the latent space. Examining those, we identified that both the misclassified bird image and the similar images were dark silhouettes. To take a closer look, we expanded the similar-example search to show us the 20 nearest neighbors. From this, we identified that 15 examples were images of planes, and only five were images of birds. This signaled a lack of images of birds with dark silhouettes in the training data, as only one of the training data bird images was a dark-silhouetted one. The immediate actionable insight was to improve the model by gathering more images of silhouetted birds.
Figure 2. Using Example-based Explanations for misclassification analysis
Beyond misclassification analysis, Example-based Explanations can enable active learning, so that data can be selectively labeled when its Example-based Explanations come from confounding classes. For instance, if out of 10 total explanations for an image, five are from class “bird” and five are from class “plane,” the image can be a candidate for human annotation, further enriching the data.
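Conceptually, these explanations are nearest-neighbor lookups in the model’s embedding space. The following is a purely illustrative sketch of that idea using scikit-learn, not the Vertex AI API; embed(), train_examples, train_labels, and misclassified_example are placeholder names:

# Conceptual sketch only: find training examples nearest to a query in embedding space.
# embed() stands in for whatever maps an input to its latent representation.
import numpy as np
from sklearn.neighbors import NearestNeighbors

train_embeddings = np.array([embed(x) for x in train_examples])  # shape (N, D)
index = NearestNeighbors(n_neighbors=10, metric="cosine").fit(train_embeddings)

distances, neighbor_ids = index.kneighbors([embed(misclassified_example)])
for i in neighbor_ids[0]:
    print(train_labels[i])  # a skewed label mix here hints at confounding or missing data

A skewed mix of neighbor labels, as in the bird-versus-plane example above, is exactly the signal that points to gaps or errors in the training data.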
Example-based Explanations are not limited to images; they can generate embeddings for multiple types of data: image, text, and tabular. Let’s look at an illustration of how to use Example-based Explanations with tabular data. Suppose we have a trained model that predicts the duration of a bike ride. When examining the model’s projected duration for a bike ride, Example-based Explanations can help us identify issues with the underlying data points. Looking at row #5 in the below image, the duration seems too long when compared with the distance covered. This bike ride is also very similar to the query ride, which is expected, since Example-based Explanations are supposed to find similar examples. Given that the distance, time of day, temperature, etc. are all very similar between the query ride and the ride in row #5, the duration label seems suspicious. The immediate next step is to examine this data point more closely and either remove it from the dataset or try to understand whether there might be some missing features (say, whether the biker took a snack break) contributing to the difference in durations.
Figure 3. Using Example-based Explanations for tabular data
Getting started with Example-based Explanations in Vertex AI
It takes only three steps to set up Example-based Explanations. First, upload your model and dataset. The service will represent the entire dataset in a latent space (as embeddings). As a concrete example, let’s examine words in a latent space: the below visualizations show such word embeddings, where the position in the vector space encodes meaningful semantics of each word, such as the relation between verbs or between a country and its capital. Next, deploy your index and model, after which the Example-based API will be ready to query. Then, you can query for similar data points; you only need to repeat steps 1 and 2 when you retrain the model or change your dataset.
Figure 4. Embeddings can capture meaningful semantic information
Under the hood, the Example-based Explanations API builds on cutting-edge technology developed by Google research organizations, described in this blog post and used at scale across a wide range of Google applications, such as Search, YouTube, and the Play Store. This technology, ScaNN, enables querying for similar examples significantly faster and with better recall compared to other vector similarity search techniques.
Learn how to use Example-based Explanations by following the instructions available in this configuration documentation. To learn more about Vertex AI, visit our product page or explore this summary of tutorials and resources.
Related Article: Vertex Matching Engine: Blazing fast and massively scalable nearest neighbor search
Some of the handiest tools in an ML engineer’s toolbelt are vector embeddings, a way of representing data in a dense vector space. An ear…
Read Article
Quelle: Google Cloud Platform

Announcing public availability of Google Cloud Certificate Manager

Today we are pleased to announce that Cloud Certificate Manager is now generally available. Cloud Certificate Manager enables you to acquire, manage, and deploy public Transport Layer Security (TLS) certificates at scale for use with your Google Cloud workloads. TLS certificates are required to secure browser connections and transactions. Cloud Certificate Manager supports both self-managed and Google-managed certificates, as well as wildcard certificates, and has monitoring capabilities to alert you about expiring certificates.
Scale to support as many domains as you need
Since our public preview announcement supporting SaaS use cases, we have scaled the solution to serve millions of managed domains. Alon Kochba, head of web performance at Wix, shared how Certificate Manager’s scale and performance helped them lighten their workload.
“As a SaaS product, we need to terminate SSL for millions of custom domains and certificates. Google Cloud’s Certificate Manager and External HTTPS Load Balancing lets us do this at the edge, close to the clients, without having to deploy our own custom solution for terminating SSL,” Kochba said.
Streamline your migrations
You can now deploy a new certificate globally in minutes and greatly simplify and accelerate the deployment of TLS for SaaS offerings. Coupled with support for DNS authorizations, you can now streamline your workload migrations without major disruptions. James Hartig, co-founder of GetAdmiral.com, shared this with Google after the migration experience:
“I just wanted to say thank you so much for the release of Certificate Manager and its support for SaaS use cases. We just completed our migration to using Google to terminate TLS and everything went really smoothly and we couldn’t be happier.”
Automate with Kubernetes & self-service ACME certificate enrollment
We have further introduced a number of automation and observability features, including:
Kubernetes integration with Cloud Certificate Manager, in public preview
Self-service ACME certificate enrollment, now in public preview
The ability to track Certificate Manager usage in the billing dashboard
We have also started work on incorporating Terraform automation with Cloud Certificate Manager, which will simplify your workload automation.
During the Certificate Manager private preview of the ACME certificate enrollment capability, our users acquired millions of certificates for their self-managed TLS deployments. Each of these certificates comes from Google Trust Services, which means our users get the same TLS device compatibility and scalability we demand for our own services. Our Cloud users get this benefit even when they manage the certificate and private key themselves – all for free. We look forward to you using Certificate Manager and these new capabilities to improve the reliability of your services and to help encourage further adoption of TLS.
Related Article: How Google Cloud blocked the largest Layer 7 DDoS attack at 46 million rps
By anticipating a DDOS attack, a Google Cloud customer was able to stop it before it took down their site. They just weren’t expecting it…
Read Article
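For readers who want to try the DNS authorization flow described above, here is a minimal, hedged sketch using the gcloud CLI; the certificate, authorization, and domain names are placeholders:

# Illustrative sketch only: authorize a domain via DNS, then request a managed certificate.
gcloud certificate-manager dns-authorizations create example-dns-auth \
    --domain="example.com"

gcloud certificate-manager certificates create example-managed-cert \
    --domains="example.com,*.example.com" \
    --dns-authorizations=example-dns-auth

The DNS authorization provides a CNAME record to add to your DNS zone, after which Certificate Manager can issue and renew the certificate automatically.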
Quelle: Google Cloud Platform

Jupiter evolving: Reflecting on Google’s data center network transformation

Data center networks form the basis for modern warehouse-scale and cloud computing. The underlying guarantee of uniform, arbitrary communication among tens of thousands of servers at hundreds of Gb/s of bandwidth with sub-100us latency has transformed computing and storage. The primary benefit of this model is simple yet profound: adding an incremental server or storage device to a higher-level service delivers a proportional increase in service capacity and capability. At Google, our Jupiter data center network technology supports this kind of scale-out capability for foundational services for our users, such as Search, YouTube, and Gmail, and for Cloud services such as AI and machine learning, Compute Engine, BigQuery analytics, Spanner databases, and dozens more.
We have spent the last eight years deeply integrating optical circuit switching (OCS) and wave division multiplexing (WDM) into Jupiter. While decades of conventional wisdom suggested that doing so was impractical, the combination of OCS with our Software-Defined Networking (SDN) architecture has enabled new capabilities: support for incremental network builds with heterogeneous technologies; higher performance and lower latency, cost, and power consumption; real-time adaptation to application priority and communication patterns; and zero-downtime upgrades. Jupiter does all this while reducing flow completion time by 10%, improving throughput by 30%, using 40% less power, incurring 30% less cost, and delivering 50x less downtime than the best known alternatives. You can read more about how we did this in the paper we presented at SIGCOMM 2022 today, Jupiter Evolving: Transforming Google’s Datacenter Network via Optical Circuit Switches and Software-Defined Networking. Here is an overview of this project.
Evolving Jupiter data center networks
In 2015, we showed how our Jupiter data center networks scaled to more than 30,000 servers with uniform 40Gb/s per-server connectivity, supporting more than 1Pb/sec of aggregate bandwidth. Today, Jupiter supports more than 6Pb/sec of datacenter bandwidth. We delivered this never-before-seen level of performance and scale by leveraging three ideas:
Software-Defined Networking (SDN) – a logically centralized and hierarchical control plane to program and manage the thousands of switching chips in the data center network.
Clos topology – a non-blocking, multistage switching topology, built out of smaller-radix switch chips, that can scale to arbitrarily large networks.
Merchant switch silicon – cost-effective, commodity, general-purpose Ethernet switching components for a converged storage and data network.
By building on these three pillars, Jupiter’s architectural approach supported a sea change in distributed systems architecture and set the path for how the industry as a whole builds and manages data center networks.
However, two primary challenges for hyperscale data centers remained. First, data center networks need to be deployed at the scale of an entire building — perhaps 40MW or more of infrastructure. Further, the servers and storage devices deployed in the building are always evolving, for example moving from 40Gb/s to 100Gb/s to 200Gb/s and, today, 400Gb/s native network interconnects. Therefore, the data center network needs to evolve dynamically to keep pace with the new elements connecting to it. Unfortunately, as shown below, Clos topologies require a spine layer with uniform support for the fastest devices that might connect to it.
Deploying a building-scale, Clos-based data center network meant pre-deploying a very large spine layer that ran at a fixed speed of the latest generation of the day. This is because Clos topologies inherently require all-to-all fanout from aggregation blocks[1] to the spine; adding to the spine incrementally would require rewiring the entire data center. One way to support new devices running at faster line rates would be to replace the entire spine layer to support the newer speed, but this would be impractical given the hundreds of individual racks housing the switches and the tens of thousands of fiber pairs running across the building.
Fig. A new aggregation block (green) with 200Gb/s port speed is connected to 3 old spine blocks (blue) with 100Gb/s port speed and one new spine block (green) with 200Gb/s port speed. In this model, only 25% of the links from the new aggregation block and the new spine block run at 200Gb/s.
Ideally, the data center network would support heterogeneous network elements in a “pay as you grow” model, adding network elements only when needed and supporting the latest generation of technology incrementally. The network would support the same idealized scale-out model it enables for servers and storage, allowing the incremental addition of network capacity — even if of a different technology than previously deployed — to deliver a proportional capacity increase and native interoperability for the entire building of devices.
Second, while uniform building-scale bandwidth is a strength, it becomes limiting when you consider that data center networks are inherently multi-tenant and continuously subject to maintenance and localized failures. A single data center network hosts hundreds of individual services with varying levels of priority and sensitivity to bandwidth and latency variation. For example, serving web search results might require real-time latency guarantees and bandwidth allocation, while a multi-hour batch analytics job may have more flexible bandwidth requirements for short periods of time. Given this, the data center network should allocate bandwidth and pathing for services based on real-time communication patterns and application-aware optimization of the network. Ideally, if 10% of network capacity needs to be temporarily taken down for an upgrade, then that 10% should not be uniformly distributed across all tenants, but apportioned based on individual application requirements and priority.
Addressing these remaining challenges seemed impossible at first. Data center networks were built around hierarchical topologies at massive physical scale, such that supporting incremental heterogeneity and dynamic application adaptation could not be incorporated into the design. We broke this impasse by developing and introducing Optical Circuit Switching (OCS) into the Jupiter architecture. An optical circuit switch (depicted below) maps an optical fiber input port to an output port dynamically through two sets of micro-electromechanical systems (MEMS) mirrors that can be rotated in two dimensions to create arbitrary port-to-port mappings.
Fig. The operation of a single OCS device mapping N input to N output fibers via MEMS mirrors.
We had the insight that we could create arbitrary logical topologies for data center networks by introducing an OCS intermediation layer between data center packet switches, as shown below.
Fig. Aggregation blocks physically connected via fibers to OCS switches. A logical topology can be realized by configuring each OCS switch to connect a permutation of input to output fibers.
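To make the permutation idea concrete, here is a toy sketch (invented port names and data, not Google’s production software) of deriving a logical block-to-block topology from per-OCS port mappings:

# Toy illustration only: each OCS maps an (aggregation block, uplink) to another one.
# Changing these permutations changes the logical topology with no physical rewiring.
def logical_links(ocs_configs):
    links = []
    for config in ocs_configs:
        for (src_block, _), (dst_block, _) in config.items():
            links.append((src_block, dst_block))
    return links

ocs1 = {("agg1", "u0"): ("agg2", "u0"), ("agg3", "u0"): ("agg4", "u0")}
ocs2 = {("agg1", "u1"): ("agg3", "u1"), ("agg2", "u1"): ("agg4", "u1")}
print(logical_links([ocs1, ocs2]))  # direct mesh links between aggregation blocks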
Doing so required us to build OCS and native WDM transceivers with levels of scale, manufacturability, programmability, and reliability never achieved before. While academic research had investigated the benefits of optical switches, conventional wisdom suggested that OCS technology was not commercially viable. Over multiple years, we designed and built the Apollo OCS that now forms the basis for the vast majority of our data center networks.
One salient benefit of OCS is that no packet routing or header parsing is involved in its operation. OCS simply reflects light from an input port to an output port with incredible precision and little loss. The light is generated through electro-optical conversion at WDM transceivers already required to transmit data reliably and efficiently across data center buildings. Hence, OCS becomes part of the building infrastructure, is data-rate and wavelength agnostic, and does not require upgrades even as the electrical infrastructure moves from transmission and encoding rates of 40Gb/s to 100Gb/s to 200Gb/s — and beyond.
With an OCS layer, we eliminated the spine layer from our data center networks, instead connecting heterogeneous aggregation blocks in a direct mesh, for the first time moving beyond Clos topologies in the data center. We created dynamic logical topologies that reflect both physical capacity and application communication patterns. Reconfiguring the logical connectivity seen by switches in our network is now standard operating procedure, dynamically evolving the topology from one pattern to another with no application-visible impact. We did this by coordinating link drains with routing software and OCS reconfiguration, relying on our Orion Software-Defined Networking control plane to seamlessly orchestrate thousands of dependent and independent operations.
Fig. Multiple OCS devices achieving topology engineering
A particularly interesting challenge was that, for the first time, shortest-path routing over mesh topologies could no longer provide the performance and robustness required by our data center. A side effect of typically deployed Clos topologies is that while many paths are available through the network, all of them have the same length and link capacity, such that oblivious packet distribution, or Valiant Load Balancing, provides sufficient performance. In Jupiter, we leverage our SDN control plane to introduce dynamic traffic engineering, adopting techniques pioneered for Google’s B4 WAN: we split traffic among multiple shortest and non-shortest paths while observing link capacity, real-time communication patterns, and individual application priority (red arrows in the figure below).
Fig. Flow tables on switches perform traffic engineering
Taken together, we have iteratively and completely re-architected the Jupiter data center networks that power Google’s warehouse-scale computers, introducing a number of industry firsts along the way:
Optical Circuit Switches as the interoperability point for building-scale networks, seamlessly supporting heterogeneous technologies, upgrades, and service requirements.
Direct mesh-based network topologies for higher performance, lower latency, lower cost, and lower power consumption.
Real-time topology and traffic engineering to simultaneously adapt network connectivity and pathing to match application priority and communication patterns, all while observing real-time maintenance and failures.
Hitless network upgrades with localized add/remove of capacity, eliminating the need for expensive and toilsome “all services out” style upgrades that previously required hundreds of individual customers and services to move their services for extended building downtime.
While the underlying technology is impressive, the end goal of our work is to continue to deliver performance, efficiency, and reliability that together provide transformative capabilities for the most demanding distributed services powering Google and Google Cloud. As mentioned above, our Jupiter network consumes 40% less power, incurs 30% less cost, and delivers 50x less downtime than the best alternatives we are aware of, all while reducing flow completion time by 10% and improving throughput by 30%. We are proud to share details of this technological feat at SIGCOMM today, and look forward to discussing our findings with the community.
Congratulations and thank you to the countless Googlers who work on Jupiter every day, and to the authors of this latest research: Leon Poutievski, Omid Mashayekhi, Joon Ong, Arjun Singh, Mukarram Tariq, Rui Wang, Jianan Zhang, Virginia Beauregard, Patrick Conner, Steve Gribble, Rishi Kapoor, Stephen Kratzer, Nanfang Li, Hong Liu, Karthik Nagaraj, Jason Ornstein, Samir Sawhney, Ryohei Urata, Lorenzo Vicisano, Kevin Yasumura, Shidong Zhang, Junlan Zhou, and Amin Vahdat.
[1] An aggregation block comprises a set of machine (compute/storage/accelerator) racks, including Top-of-Rack (ToR) switches, connected by a layer of typically co-located switches.
Quelle: Google Cloud Platform

How CISOs need to adapt their mental models for cloud security

Many security leaders head into the cloud armed mostly with tools, practices, skills, and ultimately mental models for how security works that were developed on-premises. This leads to cost and efficiency problems that can be solved by mapping their existing mental models to those of the cloud.
When it comes to understanding the differences between on-premises cybersecurity mental models and their cloud cybersecurity counterparts, a helpful place to start is by looking at the kinds of threats each one is attempting to block, detect, or investigate. Traditional on-premises threats focused on stealing data from databases, file storage, and other corporate resources. The most common defenses of these resources rely on layers of network, endpoint, and sometimes application security controls. The proverbial “crown jewels” of corporate data were not made accessible with an API to the outside world or stored in publicly accessible storage buckets. Other threats aimed to disrupt operations or deploy malware for various purposes, ranging from outright data theft to holding data for ransom.
There are some threats that are specifically aimed at the cloud. Bad actors are always trying to take advantage of the ubiquitous nature of the cloud. One common cloud-centered attack vector that they pursue is constantly scanning IP address space for open storage buckets or internet-exposed compute resources. As Gartner points out, securing the cloud requires significant changes in strategy from the approach we take to protect on-prem data centers. Processes, tools, and architectures need to be designed using cloud-native approaches to protect critical cloud deployments. And when you are in the early stages of cloud adoption, it’s critical to be aware of the division of security responsibilities between your cloud service provider and your organization, to make sure you are less vulnerable to attacks targeting cloud resources.
Successful cloud security transformations can help better prepare CISOs for threats today, tomorrow, and beyond, but they require more than just a blueprint and a set of projects. CISOs and cybersecurity team leaders need to envision a new set of mental models for thinking about security, one that requires mapping your current security knowledge to cloud realities.
To set the groundwork for this discussion, the cloud security transformation can start with a meaningful definition of what “cloud native” means. Cloud native is really an architecture that takes full advantage of the distributed, scalable, and flexible nature of the public cloud. (To be fair, the term implies that you need to be born in the cloud to be a native, but we’re not trying to be elitist about it. Perhaps a better term would be “cloud-focused,” or doing security “the cloudy way.”)
However we define it, adopting cloud is a way to maximize your focus on writing code, creating business value, and keeping your customers happy, while taking advantage of the cloud’s inherent properties—including security. One sure way to import legacy mistakes, some predating cloud by decades, into the future would be to merely lift and shift your current security tools and practices into the public cloud environment.
Going cloud-native means abstracting away many layers of infrastructure, whether network servers, security appliances, or operating systems. It’s about using modern tools built for the cloud and built in the cloud.
Beyond the differences between threats that target on-premises environments and threats that target cloud infrastructure, here are other vital areas to re-evaluate in terms of a cloud security mental model.

Network security

Some organizations practice network security in the cloud as if it were a rented data center. Many traditional practices that worked reasonably well on premises for decades, along with many traditional network architectures, are either not applicable in the cloud or not optimal for cloud computing. However, some concepts, like the demilitarized zone (DMZ), can be adapted to today’s cloud environments: a more modern approach to a DMZ uses microsegmentation and governs access by identity in context. Making sure that the right identity, in the right context, has access to the right resource gives you strong control, and even if you get it wrong, microsegmentation limits the blast radius of a breach. Becoming cloud native also drives the adoption of new approaches to enterprise network security, such as BeyondProd. It benefits organizations because it moves them away from traditional network perimeter security toward a focus on who and what can access your services, rather than where requests for access originated. Although the network security changes driven by cloud adoption can be enormous and transformational, not all areas shift in the same way.

Endpoint security

In the cloud, the concept of a security endpoint changes. Think of it this way: a virtual server is a server. But what about a container? What about microservices and SaaS? With the software-as-a-service model, there is no real endpoint at all. All along your cloud security path, what matters is knowing what happens where. A helpful mental-model translation is that an API can be seen as a kind of endpoint. Some of the security thinking developed for endpoints applies to cloud APIs as well: securing access, permissions, and privileged use carries over, but the concept of endpoint operating system maintenance does not. Even with automation of service agents on virtual machines in the cloud, insecure agents may increase risk because they operate at scale. Case in point: a major Microsoft Azure cross-tenant vulnerability highlighted a new type of risk that was not even on the radar of many of its customers. In light of this, across the spectrum of endpoint security approaches, some disappear (such as patching operating systems for SaaS and PaaS), some survive (such as the need to secure privileged access), and others are transformed.
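Since the API effectively becomes the endpoint, API access logs become primary security telemetry. As a hedged sketch — the project ID, the filter, and the field handling are illustrative assumptions, not prescriptions from the article — this is one way to pull recent IAM-change events from Cloud Audit Logs with the google-cloud-logging client:

```python
# Illustrative sketch: treat cloud API activity (not OS events) as the telemetry to watch.
# Lists recent Admin Activity audit log entries where an IAM policy was changed.
# Project ID and filter are placeholders; assumes application-default credentials.
from google.cloud import logging

PROJECT_ID = "example-project-id"  # placeholder

client = logging.Client(project=PROJECT_ID)
log_filter = (
    f'logName="projects/{PROJECT_ID}/logs/cloudaudit.googleapis.com%2Factivity" '
    'AND protoPayload.methodName:"SetIamPolicy"'
)

for i, entry in enumerate(client.list_entries(filter_=log_filter,
                                              order_by=logging.DESCENDING,
                                              page_size=20)):
    if i >= 20:
        break
    # The audit payload records the calling identity (principalEmail),
    # the method invoked, and the resource that was touched.
    payload = entry.to_api_repr().get("protoPayload", {})
    print(entry.timestamp,
          payload.get("authenticationInfo", {}).get("principalEmail", "unknown"),
          payload.get("resourceName", ""))
```

The specifics matter less than the shift: detection pivots from host agents toward the provider’s API audit trail, which leads directly into the next area.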
Detection and response

With a move to the cloud come changes to the threats you will face, and changes to how you detect and respond to them. Using on-premises detection technology and approaches as a foundation for future development may therefore not work well: copying all your on-premises detection tools and their threat detection content won’t reduce risk in the way most cloud-first organizations need. Moving to the cloud is instead an opportunity to rethink how you achieve your security goals of confidentiality, integrity, availability, and reliability using the new possibilities created by cloud processes and technology. Cloud is distributed, often immutable, API-driven, automatically scalable, centered on the identity layer, and often built from ephemeral workloads created for a particular task. All of these properties affect how you handle threat detection in a cloud environment and call for new detection methods and mechanisms. There are six key domains where threats in the cloud can best be detected: identity, API, managed services, network, compute, and Kubernetes. Together these provide the coverage needed for network, identity, compute, and container infrastructure, along with specific detection mechanisms for API access logs and network traffic captures. As with endpoint security, some approaches become less important (such as network IDS on encrypted links), others grow in importance (such as detecting access anomalies), and others transform (such as detecting threats from the provider backplane).

Data security

The cloud is changing data security in significant ways, including new ways of looking at data loss prevention, data encryption, data governance, and data access. Cloud adoption sets you on a path to what we at Google call “autonomic data security”: security that is integrated throughout the data lifecycle and improves over time. At the same time, it makes things easier for users, freeing them from having to define and redefine myriad rules about who can do what, when, and with which data. It lets you keep pace with constantly evolving cyberthreats and business changes, so you can keep your IT assets more secure and make business decisions faster. As in the other categories, some data security approaches wane in importance or disappear (such as manual data classification at cloud scale), some carry over from on-premises to cloud unchanged, and others transform (such as pervasive encryption with effective and secure key management).

Identity and access management

The context for identity and access management (IAM) in the cloud is obviously different from your on-premises data center. In the cloud, every person and service has its own identity, and you want to be able to control access. Cloud IAM gives you fine-grained access control and visibility for managing cloud resources centrally: your administrators authorize who can act on which resources, and if you have complex organizational structures, hundreds of workgroups, and a multitude of projects, IAM gives you a unified view into security policy across your entire organization. With IAM tools you can grant access to cloud resources at fine-grained levels, well beyond project-level access, and create granular access control policies based on attributes such as device security status, IP address, resource type, and date and time. These policies help ensure that the appropriate security controls are in place when granting access to cloud resources.
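As one hedged example of such attribute-based, fine-grained access, a time-bound grant on a Cloud Storage bucket can be expressed as an IAM condition. The bucket name, principal, and expiry timestamp below are placeholders invented for illustration.

```python
# Sketch: grant read access to a bucket that automatically expires at a fixed date/time.
# Bucket name, member, and expiry timestamp are illustrative placeholders.
from google.cloud import storage

client = storage.Client(project="example-project-id")  # placeholder project
bucket = client.bucket("example-sensitive-bucket")     # placeholder bucket

policy = bucket.get_iam_policy(requested_policy_version=3)
policy.version = 3  # version 3 is required for conditional bindings
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {"user:analyst@example.com"},  # placeholder principal
        "condition": {
            "title": "temporary-audit-access",
            "description": "Read access that expires automatically",
            "expression": 'request.time < timestamp("2023-01-01T00:00:00Z")',
        },
    }
)
bucket.set_iam_policy(policy)
```

Conditions like this replace standing access grants with policy that is evaluated in context at request time.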
The concept of Zero Trust is strongly in play here. It is the idea that implicit trust in any single component of a complex, interconnected system creates significant security risk; instead, trust needs to be established through multiple mechanisms and continuously verified. To protect a cloud-native environment, a zero trust security framework requires all users to be authenticated, authorized, and validated for security configuration and posture before being granted, or allowed to keep, access to cloud-based applications and data. IAM mental models from on-premises security therefore mostly survive, but much of the underlying technology changes dramatically, and the importance of IAM in security grows significantly as well.

Shared fate for greater trust in cloud security

Clearly, cloud is much more than “someone else’s computer.” That is why trust is such a critical component of your relationship with your chosen cloud service providers. Many cloud service providers acknowledge shared responsibility, meaning that they supply the underlying infrastructure but leave you responsible for many security tasks that can seem inscrutable. With Google Cloud, we operate in a shared fate model, managing risk in conjunction with our customers. We believe it is our responsibility to be active partners as our customers deploy securely on our platform, not delineators of where our responsibility ends. We stand with you from day one, helping you implement best practices for safely migrating to and operating in a trusted cloud.

Get ready to go cloud native

We offer several resources to help you prepare for cloud migration and to guide you as you review your current security approaches for signs of on-premises thinking. Listen to our podcast series, in which Phil Venables, Vice President and CISO at Google Cloud, and Nick Godfrey, Director, Financial Services Security & Compliance and member of the Office of the CISO at Google Cloud, join me to discuss preparing for cloud migration (Podcast 1, Podcast 2). You can also deepen your cloud-native skills by earning a Professional Cloud Security Engineer certification from Google.
Source: Google Cloud Platform

How Google Cloud blocked the largest Layer 7 DDoS attack at 46 million rps

Over the past few years, Google has observed that distributed denial-of-service (DDoS) attacks are increasing in frequency and growing in size exponentially. Today’s internet-facing workloads are at constant risk of attack, with impacts ranging from degraded performance and user experience for legitimate users, to increased operating and hosting costs, to full unavailability of mission-critical workloads. Google Cloud customers can use Cloud Armor to leverage the global scale and capacity of Google’s network edge to protect their environment from some of the largest DDoS attacks ever seen.

On June 1, a Google Cloud Armor customer was targeted with a series of HTTPS DDoS attacks which peaked at 46 million requests per second. This is the largest Layer 7 DDoS attack reported to date, at least 76% larger than the previously reported record. To give a sense of the scale, that is like receiving all of the daily requests to Wikipedia (one of the top 10 most trafficked websites in the world) in just 10 seconds.

Cloud Armor Adaptive Protection was able to detect and analyze the traffic early in the attack lifecycle. Cloud Armor alerted the customer with a recommended protective rule, which was then deployed before the attack ramped up to its full magnitude. Cloud Armor blocked the attack, ensuring the customer’s service stayed online and continued serving end users.

Figure 1: DDoS attack graph peaking at 46M requests per second.

What happened: Attack analysis and timeline

Starting around 9:45 a.m. PT on June 1, 2022, an attack of more than 10,000 requests per second (rps) began targeting our customer’s HTTP/S Load Balancer. Eight minutes later, the attack grew to 100,000 rps. Cloud Armor Adaptive Protection detected the attack and generated an alert containing the attack signature by assessing the traffic across several dozen features and attributes. The alert included a recommended rule to block the malicious signature. The alert shown below captures details of the attack before it ramped to its peak.

Figure 2: Cloud Armor Adaptive Protection alert listing the top region codes detected as part of the attack.

Our customer’s network security team deployed the Cloud Armor-recommended rule into their security policy, and it immediately started blocking the attack traffic. In the two minutes that followed, the attack grew from 100,000 rps to a peak of 46 million rps. Because Cloud Armor was already blocking the attack traffic, the target workload continued to operate normally. Over the next few minutes the attack decreased in size, ultimately ending 69 minutes later at 10:54 a.m. Presumably the attacker determined they were not having the desired impact while incurring significant expense to execute the attack.

Analyzing the attack

In addition to its unexpectedly high volume of traffic, the attack had other noteworthy characteristics. There were 5,256 source IPs from 132 countries contributing to the attack. As Figure 2 above shows, the top four countries contributed approximately 31% of the total attack traffic. The attack leveraged encrypted requests (HTTPS), which would have taken added computing resources to generate. Although terminating the encryption was necessary to inspect the traffic and effectively mitigate the attack, the use of HTTP pipelining meant Google had to complete relatively few TLS handshakes. Approximately 22% (1,169) of the source IPs corresponded to Tor exit nodes, although the request volume coming from those nodes represented just 3% of the attack traffic. While we believe Tor participation in the attack was incidental due to the nature of the vulnerable services, even at 3% of the peak (more than 1.3 million rps) our analysis shows that Tor exit nodes can send a significant amount of unwelcome traffic to web applications and services. The geographic distribution and the types of unsecured services leveraged to generate the attack match the Mēris family of attacks. Known for its massive attacks that have broken DDoS records, the Mēris method abuses unsecured proxies to obfuscate the true origin of the attacks.
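To make these proportions concrete, the figures below are simple back-of-the-envelope calculations derived only from the numbers reported above; the per-source average is our own derivation, not a figure from the post.

```python
# Back-of-the-envelope figures derived from the numbers reported above.
peak_rps = 46_000_000   # peak requests per second
source_ips = 5_256      # attacking source IPs
tor_share = 0.03        # share of attack traffic from Tor exit nodes

print(f"Tor traffic at peak:            ~{peak_rps * tor_share:,.0f} rps")   # ~1,380,000
print(f"Average load per source IP:     ~{peak_rps / source_ips:,.0f} rps")  # ~8,752
print(f"Requests in 10 seconds at peak: ~{peak_rps * 10:,.0f}")              # ~460,000,000
```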
How we stopped the attack

The attack was stopped at the edge of Google’s network, with the malicious requests blocked upstream from the customer’s application. Before the attack started, the customer had already configured Adaptive Protection in the relevant Cloud Armor security policy to learn and establish a baseline model of the normal traffic patterns for their service. As a result, Adaptive Protection was able to detect the DDoS attack early in its lifecycle, analyze its incoming traffic, and generate an alert with a recommended protective rule, all before the attack ramped up. The customer acted on the alert by deploying the recommended rule, leveraging Cloud Armor’s recently launched rate-limiting capability to throttle the attack traffic. They chose the ‘throttle’ action over a ‘deny’ action to reduce the chance of impact on legitimate traffic while still severely limiting the attack by dropping most of its volume at Google’s network edge. Before deploying the rule in enforcement mode, they first deployed it in preview mode, which let them validate that only the unwelcome traffic would be denied while legitimate users could continue accessing the service. As the attack ramped up to its 46 million rps peak, the Cloud Armor-suggested rule was already in place to block the bulk of the attack and ensure the targeted applications and services remained available.

Protecting your applications in the cloud

Attack sizes will continue to grow and tactics will continue to evolve. To be prepared, Google recommends a defense-in-depth strategy: deploy defenses and controls at multiple layers of your environment and of your infrastructure provider’s network to protect your web applications and services from targeted web attacks. This strategy includes performing threat modeling to understand your applications’ attack surfaces, developing proactive and reactive strategies to protect them, and architecting your applications with sufficient capacity to manage unanticipated increases in traffic volume. With Google Cloud Armor, you can protect your internet-facing applications at the edge of Google’s network and absorb unwelcome traffic far upstream from your applications.
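To make the throttle-versus-deny choice described in the mitigation section more concrete, here is a deliberately simplified, self-contained sketch of per-client throttling using a basic token bucket. It is a conceptual illustration, not Cloud Armor’s implementation: a flooding client has most of its excess traffic dropped, while clients within their budget are unaffected and no address is hard-blocked.

```python
# Toy illustration of throttling vs. outright denial: each client gets a small
# per-second request budget (token bucket). A flooding client loses most of its
# excess traffic, while clients within the budget are unaffected.
# Conceptual sketch only; not Cloud Armor's implementation.
import time
from collections import defaultdict

RATE = 100    # tokens (requests) replenished per second, per client
BURST = 200   # maximum bucket size per client

_buckets = defaultdict(lambda: {"tokens": float(BURST), "last": time.monotonic()})

def allow(client_ip: str) -> bool:
    """Return True if this request fits within the client's rate budget."""
    bucket = _buckets[client_ip]
    now = time.monotonic()
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False  # throttled: drop or answer HTTP 429, but the client is not blocked outright

if __name__ == "__main__":
    # A flooding client sends 10,000 requests at once; only about BURST get through.
    passed = sum(allow("203.0.113.7") for _ in range(10_000))
    print(f"flooding client: {passed} of 10000 requests allowed")
    print(f"normal client:   {'allowed' if allow('198.51.100.1') else 'throttled'}")
```

In a managed edge service the same idea is expressed declaratively as a rate-based rule, which is why throttling can cap an attacker’s useful volume while sparing addresses that may also carry legitimate traffic.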
Source: Google Cloud Platform