Cloud against the storm: Clemson's 2.1 million vCPU experiment

While the phenomena of climate change are complex, it's an empirical truth that the number of North Atlantic storms big enough to merit a name each year has increased dramatically in the past 20 years, and is projected to increase further, arguably the most obvious impact on the United States of a shifting global climate. When these storms come ashore, destruction follows, sometimes even death. Authorities face a series of dilemmas as they try to manage the risks to the populations of low-lying coastal areas. When is shelter-in-place the best option, and when must people evacuate? How can evacuations be managed safely and efficiently? When public safety officials get the answers to these questions wrong, tragedy can ensue.

A team at Clemson University has been exploring the use of Google Cloud to improve disaster planning and management in ways that are both scientifically interesting and instructive for users of Google Cloud computing services. In doing so, they set a new record for the scale of computation on a single problem in a public cloud: 2.1 million cores. It was an adventure that I think is worth sharing.

The problem: Managing evacuation routes

Evacuations from threatened coastal areas are almost entirely done by private automobiles on public roads. The evacuation routes tend to be limited in number and capacity by the terrain and the weather: bridges and causeways are bottlenecks, and alternate routes across low-lying areas may already be flooded. Optimizing vehicular "throughput" over the available channels is incredibly important—and quite difficult.

Brandon Posey of Clemson, under the supervision of Professor Amy Apon, has been working on building traffic flow models based on streaming data from existing street-level traffic cameras. Machine vision software capable of anonymously distinguishing vehicles and vehicular motion exists, but processing and correlating the feeds from multiple cameras at the necessary scale is a herculean computational task. A single evacuation zone can have thousands of these cameras, and the full cycle of an evacuation, cleanup, and return can take days, even weeks. The Clemson team assumed a 10-day cycle for the experiment, and chose an evacuation area with 8,500 available camera feeds, which over 10 days generate 2 million hours of video—about 210TB. TrafficVision, a commercial company that participated in the experiment, provided the software for analysis of the video segments, so that vehicle, incident, and anomaly detection on an individual video stream could be treated as a "solved problem" for the purposes of the experiment. The problem itself is "embarrassingly parallel," in that the interpretations of individual video segments are completely independent of one another, and can be done in any order or all at once.

The challenge was in provisioning and operating the infrastructure to run the software at the necessary scale, with the scant advance notice one might have in a real climate emergency. It is simply not practical to have millions of cores "on deck" waiting for a hurricane. Public cloud providers, however, have fungible resources that can be procured at a range of service levels. Non-critical resources can be preempted and redeployed for emergency management services.
The question becomes whether the computation can be provisioned and executed fast enough, at the necessary scale.

Architecting the solution

One common design pattern in high performance computing on Google Cloud is a "node-out" approach, where the first step is to understand and optimize application performance on a single virtual machine instance, or "node." Every application has a "sweet spot" in the space of ratios of CPU cores to memory and other storage, GPUs, and other accelerators. When these can't be computed from first principles, they are generally straightforward to determine experimentally. Once the sweet spot has been determined, Google Cloud Platform provides the capability to define an optimal custom node type that can be carved out of the Google infrastructure, as shown in the diagram below. The customer pays only for the resources actually deployed.

Once the virtual node has been defined, it's easy to make virtual clusters of them within Google Cloud. These can be persistent or ephemeral, spun up by a job orchestration or batch management system long enough to run the specific work for which they are adapted, at the level of parallel scaling that provides the most cost-effective results, as shown in the following diagram.

The Clemson team started with this node-out approach. The processing of individual video segments by the TrafficVision software runs most cost-effectively on small virtual nodes, each with 16 vCPU "cores" and 16GB of memory. This memory-to-CPU ratio is significantly lower than the default for compute VM instances, so "custom-16-16384" became the custom instance type for the worker nodes. The fact that the processing of the video segments can be done in any order suggested that the team could also take advantage of Google Cloud's preemptible VMs. These are offered at a discount of up to 80% in return for reduced availability: an instance can be shut down at any moment, and cannot run for more than 24 hours. Neither of those constraints was problematic for the experiment, since processing time on an individual video is on the order of an hour, not a day, and the small number of jobs that fail because their resources were preempted can simply be restarted.
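To make that concrete, here is a minimal sketch (not the Clemson team's actual tooling) of what creating one such preemptible custom-16-16384 worker looks like with the Compute Engine API's Python client; the project, zone, instance name, and boot image are placeholders:

```python
# Minimal sketch: create one preemptible worker with the custom-16-16384
# machine type via the Compute Engine API. Names below are placeholders.
import googleapiclient.discovery

compute = googleapiclient.discovery.build("compute", "v1")

project = "my-evacuation-project"  # hypothetical project ID
zone = "us-east1-b"

config = {
    "name": "traffic-worker-0001",
    # 16 vCPUs and 16384 MB of memory, below the default memory/CPU ratio.
    "machineType": f"zones/{zone}/machineTypes/custom-16-16384",
    # Preemptible: heavily discounted, but may be reclaimed at any time.
    "scheduling": {"preemptible": True},
    "disks": [{
        "boot": True,
        "autoDelete": True,
        "initializeParams": {
            "sourceImage": "projects/debian-cloud/global/images/family/debian-9"
        },
    }],
    "networkInterfaces": [{"network": "global/networks/default"}],
}

# Each insert() call counts against the per-project API request quota,
# which matters enormously at the scale described below.
operation = compute.instances().insert(
    project=project, zone=zone, body=config).execute()
print("Instance creation started:", operation["name"])
```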
A tale of scale

But while the nodes of the virtual supercomputer are small, and potentially ephemeral, they are very numerous—over 130,000 of them at peak, hosting more than 2 million cores (now, that's a tale of scale). To meet this challenge, the Clemson team used their Provisioning And Workflow management tool, PAW (Go Tigers!). PAW is provider- and workflow-agnostic, allowing the team to use GCP infrastructure with CloudyCluster workflow tooling, which had already been demonstrated to work at a million-core scale.

Public cloud services like Google Cloud manage very large pools of resources for large pools of clients. Resource management must necessarily be parallel and distributed to operate at scale. This distribution and parallelism is normally transparent to users, but at the scale of this experiment, it was necessary to understand and adapt to the underlying design.

Fairness is an important property of a multi-user operating system, and Google Cloud enforces it in various ways. One of those is limiting the rate at which one user can make resource requests of the shared infrastructure. With no limit, a malicious (or, more likely, simply erroneous) script could saturate the resource management system, acting as a de facto denial-of-service (DoS) attack. Google Cloud solves this problem by putting quotas on the number of GCP REST API requests that can be made per 100 seconds; the default is 2,000. At a minimum, an API request is made to create each VM instance, and another is required to delete it when it is no longer needed.

2,000 API calls per 100 seconds, or 20 per second, is more than enough for most applications, but at the scale of Clemson's experiment, 93,750 virtual machine instances, it would take more than an hour just to provision or free all the worker nodes. The system supports a tripling of the quota to 6,000 per 100 seconds, but even at this level, spinning down after a run would take 26 minutes, during most of which tens of thousands of instances are sitting idle—but nevertheless billable.

The key to solving this problem lies in the fact that these quotas are defined per GCP project, not per customer. By using CloudyCluster to federate multiple GCP projects, each associated with a specific Google Cloud region (which generally corresponds to a data center), the quotas can be combined. Not only can the provisioning of projects happen in parallel, but the per-region provisioning, having higher locality, is generally faster. Within each regional project, CloudyCluster instantiated a SLURM batch scheduler per zone, in a further level of parallel distribution and federation.

But while some management processes, like provisioning, would take too long at this scale if one followed the usual cloud playbook, another was so unexpectedly fast on Google Cloud Platform as to create a problem. Getting past 20 instance creations per second was critical to ramping rapidly enough to the required scale, but each instance needs to create an SSH network connection to a CloudyCluster master node. At the full scale of the experiment, with provisioning running open-loop in parallel in multiple zones, the incoming connection rate exceeded the capacity of that node to accept incoming SSH connections, causing some to fail. What was needed was a means to pace instance creation so as not to overwhelm SSH connection services, without increasing the deployment time. Fortunately, Google Cloud provides such a mechanism.

To regulate the flow of creation requests, the Clemson team used batching of GCP API requests. Instead of instance creation running open-loop at the aggregate maximum rate, instances were created in bursts of no more than 5,000, with randomized pauses between batches. This allowed administrative connection setup to make enough forward progress to avoid timeouts.
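The pacing idea is simple to express in code. Here's an illustrative sketch, assuming a create_instance() helper wrapping the per-instance API call shown earlier; the pause bounds are invented for illustration, not taken from the experiment:

```python
# Illustrative sketch of the pacing strategy: create instances in bursts of
# at most 5,000, with a randomized pause between batches so that SSH
# connection setup on the master node can keep up.
import random
import time

BATCH_SIZE = 5000

def create_in_batches(instance_names, create_instance):
    for start in range(0, len(instance_names), BATCH_SIZE):
        batch = instance_names[start:start + BATCH_SIZE]
        for name in batch:
            create_instance(name)  # asynchronous insert; returns immediately
        # Randomized pause lets in-flight SSH connections make progress
        # before the next burst arrives. Bounds here are hypothetical.
        time.sleep(random.uniform(30, 90))
```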
Clemson's final operational model was something like the diagram below. They spread the work across six Google Cloud geographical regions (and the administrative zones within the regions) to parallelize the provisioning process and get beyond the API rate limitations. This also allowed them to cast a wide net when looking for available preemptible cores.

This tuned, hierarchical scheme allowed the Clemson team to successfully scale up to 2.14 million virtual CPUs executing in parallel to analyze traffic video. The chart below shows the ramp-up and ramp-down of active vCPUs running in the experiment: 967,000 were running at the end of the first hour, 1.5 million at 1.5 hours, and 2.13 million after 3 hours, with a peak of 133,573 concurrent VM instances. The peak Google Cloud Storage I/O throughput was 128 GB/s.

While some ingenuity was required, the experiment was a success. Using Google Cloud, Posey and the Clemson team showed it is possible to draw on worldwide spare capacity to deploy very large scale emergency computations in the interest of public safety. Most hurricane evacuations won't require the full 2-million-plus virtual CPUs used here to process evacuation data in real time, but it's reassuring to know that it's possible. And it's a source of some pride to me that it's possible on Google Cloud.

Kudos to Brandon Posey and Amy Apon of Clemson University, Boyd Wilson and Neeraj Kanhere of TrafficVision, Dan Speck of the Burwood Group, and my Google colleagues Vanessa July, Wyatt Gorman, and Adam Deer.
Source: Google Cloud Platform

Leave manual cluster resizing behind with Cloud Dataproc’s autoscaling

Building real-time, interactive data products with open source data and analytics processing technology is not a trivial task. It involves constantly balancing cluster costs with service-level agreements (SLAs). Whether you are using Apache Hadoop and Spark to build a customer-facing web application or a real-time interactive dashboard for your product team, it's extremely difficult to handle heavy spikes in traffic from a data and analytics perspective.

We're pleased to announce Cloud Dataproc's new autoscaling capabilities, now generally available, which remove the need for the kind of complex capacity planning that so often results in either missed SLAs or resources sitting idle.

How can autoscaling help your team?

These new capabilities can help a range of teams, whether data engineers building complex ETL pipelines, data analysts running ad hoc SQL queries, or data scientists training a new model. Cloud Dataproc's autoscaling capabilities allow cluster admins to build ephemeral or long-standing clusters in 90 seconds and apply an autoscaling policy to the cluster to minimize costs and maximize the user experience without manual intervention. Whether you're part of the team at a technology company building a SaaS application, a telecommunications company analyzing network traffic, or a retailer monitoring clickstream data during the holidays, you no longer have to worry about right-sizing clusters.

Core Cloud Dataproc autoscaling capabilities include:

- Right-sizing your cluster: Estimating the "right" number of cluster workers (nodes) for a workload is difficult, and a single cluster size for an entire pipeline is often not ideal. With autoscaling, you no longer have to right-size your cluster manually.
- One autoscaling policy, multiple clusters: An autoscaling policy is a reusable configuration that describes how clusters using it should scale. It defines scaling boundaries, frequency, and aggressiveness to provide fine-grained control over cluster resources throughout the cluster lifetime (see the sketch after this list).
- Budget optimization: Scale clusters in and out while setting limits in the autoscaling policy to make sure you don't exceed your budget.
- YARN integration: Autoscaling policies integrate with YARN automatically to trigger VM scaling when needed, so you have one central resource management system for all of your Cloud Dataproc jobs.
- Monitoring of autoscaling jobs: Integrate with Stackdriver Monitoring to view metrics from autoscaling clusters, view the number of Node Managers in your cluster, and understand why autoscaling did or did not scale your cluster. Use Stackdriver Logging to view autoscaler decisions.
- Multi-region support: Deploy autoscaling clusters in any region where Cloud Dataproc clusters are running.
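As a rough sketch of what a reusable policy looks like in practice, here's how one might create an autoscaling policy with the google-cloud-dataproc Python client library; the policy ID, project, bounds, and tuning values are illustrative assumptions, not recommendations:

```python
# Illustrative sketch: create a reusable Dataproc autoscaling policy.
# Requires the google-cloud-dataproc client library; values are examples.
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.AutoscalingPolicyServiceClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

policy = dataproc_v1.AutoscalingPolicy(
    id="etl-pipeline-policy",  # hypothetical policy name
    worker_config=dataproc_v1.InstanceGroupAutoscalingPolicyConfig(
        min_instances=2,
        max_instances=100,  # scaling boundary: cap on primary workers
    ),
    basic_algorithm=dataproc_v1.BasicAutoscalingAlgorithm(
        cooldown_period={"seconds": 120},  # how often scaling is evaluated
        yarn_config=dataproc_v1.BasicYarnAutoscalingConfig(
            scale_up_factor=0.5,    # aggressiveness when YARN memory is pending
            scale_down_factor=1.0,  # aggressiveness when memory sits idle
            graceful_decommission_timeout={"seconds": 3600},
        ),
    ),
)

parent = f"projects/my-project/regions/{region}"  # hypothetical project
created = client.create_autoscaling_policy(parent=parent, policy=policy)
print("Created policy:", created.name)
```

The same policy can then be attached to any number of clusters, which is what makes "one autoscaling policy, multiple clusters" practical.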
Check out our documentation to access everything you need to get started with Cloud Dataproc autoscaling. Autoscaling is supported through the v1 API on cluster image versions 1.0.99+, 1.1.90+, 1.2.22+, 1.3.0+, and 1.4.0+.
Source: Google Cloud Platform

Introducing Batch on GKE—modernizing HPC with Kubernetes in the cloud

One of our most important goals at Google Cloud is to make cloud computing easier, so that you can focus on answering the questions that matter most to you, your business, and your users—not on managing infrastructure. Today, we are excited to announce the preview of Batch on Google Kubernetes Engine (GKE), a cloud-native solution for running batch workloads at scale in an optimized manner. Batch on GKE brings the functionality and familiarity of a traditional batch job scheduler into a cloud-first world. It frees your applications from the limitations of fixed-size compute clusters by dynamically allocating resources to meet the needs of your application.

Google Cloud is the home of Kubernetes—originally developed here and released as open source in 2014. GKE is a managed, production-ready environment for deploying containerized applications that relieves your teams from the operational toil of Kubernetes cluster management, allowing you to focus on your business needs. We heard you wanted to bring the benefits of GKE to batch workloads such as media rendering, genomics sequencing, silicon design verification, and financial portfolio risk analysis, so we built Batch on GKE to bring what you love about GKE to these real-world batch workloads.

The preview release of Batch on GKE comes with the following capabilities:

- Autoscaling and just-in-time provisioning to ensure you pay for just what you need
- Rightsizing of virtual machines to tailor-fit CPU and memory for the job at hand
- Smart reuse of virtual machines and smart packing of jobs to reduce waste and the time jobs spend waiting in a queue
- Resource budgets to allocate the maximum spend per team
- Graphics Processing Unit (GPU) support
- A job submission tool that high performance computing practitioners will find familiar

Batch on GKE's easy-to-use, familiar interface enables you to deliver business results for your batch computing use cases. The following diagram shows how users can leverage Batch on GKE and other Google Cloud services to build a genomics processing and analysis system.
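Because Batch on GKE schedules work onto ordinary GKE clusters, the underlying unit of execution is familiar. As a rough illustration of that model (this is a plain Kubernetes Job submitted with the official Python client, not the Batch on GKE job submission tool itself), here's a sketch with placeholder image and names:

```python
# Illustrative only: submit a plain Kubernetes Job with the official Python
# client (pip install kubernetes). Batch on GKE layers scheduling, quotas,
# and just-in-time provisioning on top of this basic model.
from kubernetes import client, config

config.load_kube_config()  # uses your local kubeconfig credentials

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="render-frame-0001"),  # hypothetical
    spec=client.V1JobSpec(
        completions=1,
        backoff_limit=3,  # retry a failed pod up to three times
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="worker",
                    image="gcr.io/my-project/renderer:latest",  # placeholder
                    args=["--frame", "0001"],
                    resources=client.V1ResourceRequirements(
                        requests={"cpu": "4", "memory": "8Gi"},
                    ),
                )],
            ),
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```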
Meeting you where you are with partners

Batch on GKE is a great solution for modernizing your batch workloads. We're also committed to helping you migrate your existing systems as-is to Google Cloud, or to augment your on-premises setup by connecting it to Google Cloud. We partner with SchedMD, Altair, and Univa to integrate their market-leading schedulers with our platform and meet you where you are.

Engineering simulation made easy

We're also working closely with Rescale to enable their full-service HPC platform to run engineering simulations on Google Cloud and leverage on-demand GCP clusters and virtually unlimited cores. With this integration, you can run and manage simulations from a vast application library, including ANSYS Fluent, LS-DYNA, Star-CCM+, and more. Click here for a list of all the software that Rescale supports on Google Cloud.

If you're attending SuperComputing 2019, be sure to visit the Google Cloud booth (#1363). You can find more information about our SC19 presence here. You can also learn more about how Google Cloud's flexible infrastructure can accelerate your HPC workloads.

Special thanks to Senanu Aggor, Product Marketing, and Annie Ma-Weaver, Partner Manager, for making this blog post possible.
Source: Google Cloud Platform

Shifting gears: How the cloud drives digital transformation in the automotive industry

It's undeniable that technology has improved many facets of modern life. Transportation and mobility, however, continue to be an area where more can be done. For an example, look no further than your daily commute: research by INRIX shows that time spent in traffic has more than doubled in many major cities around the world since 2015.

But five megatrends hold the promise of bringing modern technology's full benefits to the mobility sector, to improve the commuter experience and more: autonomous driving, shared mobility, deeper customer insights, digital manufacturing, and connected cars. These trends have the potential to fundamentally change the way the automotive industry works, and cloud technology will play a major role in getting there. Let's look at each of these trends and see how we're helping drive them.

Autonomous driving

Given its impact on both technology and fundamental business models, autonomous driving is perhaps one of the most dramatic revolutions in mobility. Building the infrastructure and providing the tools for companies to build autonomous driving solutions is an area where the cloud can provide great value. At the same time, building an artificial intelligence (AI) system that can be a safe driver is one of the most demanding machine learning (ML) problems to solve at industrial scale.

To validate autonomous driving models, companies need to test cars both on the road and in complex digital simulations. These simulations involve massive computational demands and volumes of data, and are often best served with a combination of GPUs and Cloud TPUs (Tensor Processing Units). We custom-built Cloud TPU for AI workloads, and its high-speed network offers over 100 petaflops of performance in a single pod—essentially making it an on-demand supercomputer. So, no matter what your workload is, we have the world-class AI infrastructure to run it as efficiently and inexpensively as possible.

Shared mobility

Almost any vision of improved transportation involves optimizing ridesharing services. Google Maps Platform helps ridesharing and delivery companies improve driver navigation and the overall efficiency of their fleets in a number of ways. At the driver level, developers can embed a Google Maps-powered turn-by-turn navigation experience into their applications. This means drivers don't have to switch between apps to get directions or information about their next job. It also lets companies retrieve data about a driver's journey. With this programmatic control and insight into drivers' behavior, organizations can better allocate drivers, ultimately decreasing drivers' idle time and customer wait times. One early user of our ridesharing solution reported a 4% reduction in drive times and up to a 48% increase in the accuracy of estimated times of arrival.

Gaining deeper customer insights

While automotive manufacturers create some of the most iconic consumer products, gaining customer insights can be very difficult. That's because customer-related data is often fragmented; some resides at dealerships, while some is with the manufacturer itself, potentially distributed over many non-connected systems. Siloed data can lead to a number of inefficiencies, especially around automotive incentive spending. According to McKinsey, these incentives are among the largest expenses that car companies have, but are the least understood and managed.
Additionally, discounts and rebates introduce enormous variation and complexity for dealers and original equipment manufacturers (OEMs): customer money, dealer money, lease discounts, bonuses for members of the National Funeral Directors Association, and rebates for realtors, to name a few examples.

This complexity can result in real pricing confusion. In a recent study, Cox Automotive's rates and incentives unit compared interest rates, cash, and incentives through dealer service provider tools and found pricing fluctuations of up to $6,750. And one thing consumers don't like is uncertainty. J.D. Power's 2018 U.S. Sales Satisfaction Index shows that 14% of customers who shopped but didn't buy a vehicle at a dealership said they rejected the store because they had difficulty getting a straight answer on price.

With Google Cloud, automotive incentives can be optimized in a way that helps OEMs and dealers control critical costs and eliminate confusion. Using BigQuery, they can ingest, store, and analyze data that connects the dots between the OEM, dealer, and customer, enabling incentives to be tracked accurately and consistently. In addition, Google Cloud's advanced AI/ML tools can investigate patterns in past rebates, so OEMs and dealers can learn whether they're applying rebates effectively, and even provide automatic triggers to point out potential overspending due to factors like overlapping or duplicative incentives and conflicting rebates. To pull it all together, Google Cloud APIs let dealers connect to their dealer management systems (DMS) for "one source of truth" across their systems of record. Finally, Google Cloud has a network of systems integrators and other consulting partners in the automotive industry who can help implement these incentive optimization solutions.
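As a simple illustration of the kind of analysis this enables, the following sketch uses the BigQuery Python client to flag sales where more than one incentive program was applied; the project, dataset, and table names are entirely hypothetical:

```python
# Hypothetical example: flag vehicle sales with overlapping incentives.
# Requires the google-cloud-bigquery client library; table names are invented.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT
      sale_id,
      vin,
      COUNT(DISTINCT incentive_id) AS incentives_applied,
      SUM(incentive_amount) AS total_incentive_spend
    FROM `my-project.sales.applied_incentives`  -- hypothetical table
    GROUP BY sale_id, vin
    HAVING incentives_applied > 1               -- possible overlapping rebates
    ORDER BY total_incentive_spend DESC
"""

for row in client.query(query).result():
    print(row.sale_id, row.incentives_applied, row.total_incentive_spend)
```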
Digital manufacturing

Despite improvements in robotics, AI, and other digital technologies, the automotive manufacturing shop floor remains mired in decades-old systems and siloed data. IDC notes that 77% of manufacturers view digital transformation as an opportunity, making it one of their top priorities. And McKinsey says the new era of automated production and data exchange opens a broad range of use cases that can cut costs, increase yields, and support new manufacturing methods.

Automotive companies, however, often want to run manufacturing workloads on-premises. This could be for a number of reasons, such as latency, data residency requirements in some countries, and customer preferences to keep data local. Anthos, our managed, cloud-native platform, addresses these challenges while still allowing customers to develop and operate their systems as if they were in the cloud. As a software-only stack, Anthos runs on customers' choice of hardware. Moreover, Anthos workloads can seamlessly move to Google Cloud, or other cloud vendors, at any time.

Quality inspection is another area where our customers are seeing tangible operational improvements with the help of cloud and machine learning technologies. Edge TPU—our ASIC (application-specific integrated circuit) designed to run at the edge with a small power and physical footprint—lets manufacturers run inspections on the shop floor. And outside the auto industry, LG CNS is already using our technology to detect defects in LCD panel production.

Connected cars

Today's modern automobile is a supercomputer that generates enormous amounts of data—data that's captured from as many as 60-100 sensors, often in real time. Unfortunately, much of this data is unstructured, siloed, and vulnerable to hacks. At the same time, the electronics and software inside the vehicle are getting exponentially more complex. Between 2010 and 2016, the lines of code required for an average vehicle increased 15-fold, while the complexity of suppliers and processes has also multiplied.

We want to make automotive software simpler and help companies draw insights from the data their fleets generate. We've already taken a big step with Android, which has seen significant adoption among automotive OEMs. With the cloud, we can offer a fully integrated approach to telemetry, making it much easier to extract and run analytics on vehicle data. Of course, whenever data is moving between organizations, trust and security are of utmost importance. That's why, with our method, all businesses, including OEMs, retain full control and ownership of their data. We also build our own chips to ensure the integrity of our data centers and your data.

Transportation and mobility is a complex field, and its digital transformation has the potential to touch almost every one of us in some way. Our solutions aim to help automotive companies continue to make that transition, and we look forward to seeing the innovative ways they use cloud technology to reach their goals.
Source: Google Cloud Platform

Cloud Run, a managed Knative service, is GA

We want to empower developers no matter where their businesses are in their cloud journey, whether that's on-premises, operating in a managed Kubernetes environment, or running on a fully managed serverless computing platform. Today, we're announcing that Cloud Run is generally available, helping developers focus on writing high-value code, regardless of where their organizations are on the path to the cloud. Specifically, we're announcing:

- Cloud Run, a fully managed serverless execution environment that lets you run stateless HTTP-driven containers, without worrying about the infrastructure.
- Cloud Run for Anthos, which lets you deploy Cloud Run applications into an Anthos GKE cluster running on-premises or in Google Cloud.
- Our commitment to Knative, the open API and runtime environment on which Cloud Run is based, bringing workload portability and the serverless developer experience to your Kubernetes clusters, wherever they may be.

Cloud Run fully managed

Cloud Run brings the best of both serverless and containers together. It lets you write code in any language you choose, using any binary, without having to worry about managing the underlying infrastructure. Cloud Run offers a natively serverless experience that takes you from container to URL within seconds, for terrific developer velocity. You're charged only for the resources used, billed to the nearest 100 milliseconds. And Cloud Run workloads are totally portable: you can run them fully managed on Google Cloud, on Anthos running on-premises or on Google Cloud, or on a third-party cloud platform that supports Knative.
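To make "stateless HTTP-driven container" concrete, here's a minimal sketch of a Cloud Run-ready service in Python (assuming Flask is installed in the container image); the key contract is simply that the process serve HTTP on the port given in the PORT environment variable:

```python
# Minimal, illustrative Cloud Run service: a stateless HTTP app that
# listens on the port supplied in the PORT environment variable.
import os

from flask import Flask  # assumes Flask is in the container image

app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello from Cloud Run!"

if __name__ == "__main__":
    # Cloud Run injects PORT; default to 8080 for local runs.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```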
Customers such as Narvar, a post-purchase platform for retailers, are already leveraging the benefits of fully managed Cloud Run:

"Google Cloud Run accelerated our developer velocity by 3X by enabling our developers to create new services at scale with ZERO dev-ops involvement. Just as important, Cloud Run helped us create a net-new product and business model, previously not possible because of the constraints of other solutions." – Ram Ravichandran, CTO, Narvar

As a managed compute platform, Cloud Run is production-ready out of the box, with monitoring and logging powered by Stackdriver, all integrated into Cloud Console. Cloud Run also has first-class integration with gcloud, GCP's CLI experience, giving you a consistent interface for deploying and managing Cloud Run services across these platforms.

Cloud Run for Anthos

If your organization wants to leverage Kubernetes to modernize and optimize its existing resources, the combination of Anthos and Cloud Run for Anthos can help smooth that transition, so you can modernize in place. With Cloud Run for Anthos, developers can easily write serverless applications and deploy them to an Anthos cluster without having to learn Kubernetes concepts first. Cloud Run for Anthos takes care of scaling your application instances up or down, even down to zero, depending on traffic.

Cloud Run for Anthos also lets you future-proof your applications. If you want to modernize an on-premises legacy application in place or extend it to the cloud, Cloud Run for Anthos can simply spin up a microservice that talks privately to existing services, all running on the same Anthos cluster. And if you later decide you want to move your Cloud Run for Anthos apps to the cloud, it's a simple redeploy to the fully managed Cloud Run. You can even choose to move your workloads back to your own datacenter, or to another third-party cloud running a Knative-compatible environment.

Here is what one of our customers, QI Tech, had to say:

"Cloud Run for Anthos brings additional features like application-level autoscaling to our Kubernetes clusters, making it very simple to manage the cluster to scale when needed. It takes various actions that follow Kubernetes best practices and abstracts them, allowing us to focus more on our service and less on its infrastructure." – Danilo Porto, Lead DevOps Engineer, QI Tech

We've also been collaborating with popular ISV partners to extend the capabilities of Cloud Run for Anthos across key areas like CI/CD, application performance monitoring (APM), and security. We're excited to announce support from CircleCI, CloudBees, Datadog, Dynatrace, GitLab, JFrog, New Relic, Octarine, Palo Alto Networks, PureSec, Sumo Logic, and Sysdig. These partnerships mean organizations can easily integrate Cloud Run for Anthos using popular ISV solutions and still take advantage of existing tooling investments.

Cloud Run for Anthos running on Google Cloud is available as a free trial until May 14, 2020, with no additional charge beyond the cost of GKE.

Knative: the basis of Cloud Run

Cloud Run is possible, in part, through our longstanding commitment to open source. We started Knative more than a year ago to help developers easily write serverless applications on top of Kubernetes. Working alongside industry leaders such as IBM, Pivotal, Red Hat, SAP, and TriggerMesh, we built Knative to provide the essential components you need to build, deploy, and manage modern serverless workloads anywhere you choose.

The Knative project is supported by more than 100 organizations and approximately 450 individual contributors. Already, organizations such as T-Mobile are building innovative applications for production use cases with Knative. Releases over the last year have focused on codifying best practices from successful real-world Kubernetes frameworks, and on collecting feedback to ensure Knative delivers on its promise of portability and ease of use. By bringing these learnings to Cloud Run, we aim to bring the serverless developer experience to your Kubernetes cluster anywhere.

Cloud Run, don't walk, to serverless containers

Here at Google, we believe in the power of serverless and the benefits of containers, and Cloud Run gives you the best of both. We also believe that the tools to enable serverless containers should be built in the open, by the community. Check out our serverless quest to give Cloud Run a try. If you're already using Cloud Run for Anthos, please upgrade your GKE cluster. It's your journey—we're excited to be there alongside you.
Source: Google Cloud Platform

Deliver an exceptional customer experience with Contact Center AI, now GA

Contact centers are one of the most important ways businesses respond to and meet customer needs. However, the demand for quality customer service comes with its own set of operational challenges. As a result, many businesses are embracing cloud technologies and AI to help provide better, more helpful customer experiences, more efficiently. Contact Center AI, which we announced last July, lets you provide personalized, intuitive customer care from the first "Hello." Now, we're pleased to announce that businesses can easily start implementing Contact Center AI today to drive results that matter to your customers and your business.

Contact Center AI is now GA, and ready to transform contact centers

Virtual Agent and Agent Assist, two features of Contact Center AI that are now both GA, work together to achieve a common goal: improve the customer experience while increasing operational efficiency. Here's how they do it:

- Virtual Agent: Customers now have 24/7 access to immediate and personalized conversational self-service, as Virtual Agent automates basic interactions and provides seamless handoffs to human agents for more complex issues through real-time call transcription.
- Agent Assist: Live agents are empowered with continuous support during their calls, as Agent Assist transcribes calls in real time, identifies customer intent, provides real-time, step-by-step assistance (recommended articles, workflows, etc.), and automates call dispositions.

You can now integrate Contact Center AI with your existing workflows and start seeing results within three to six months, thanks to integrations with partners such as Avaya and Mitel, which are GA today, as well as 8x8, Cisco, Five9, Genesys, Salesforce, Twilio, and Vonage.

Create an even better conversational experience with key Dialogflow updates

At the heart of Contact Center AI is Dialogflow, our core technology for building interfaces such as chatbots and interactive voice response (IVR) that enable natural and rich conversational experiences (a basic request sketch follows the list below). Today, Dialogflow has reached over 1 million developers, supports HIPAA and PCI compliance, and has enhanced support for virtual agents:

- Self-service Dialogflow telephony integrations: Set up your virtual agent faster, as we've expanded Dialogflow Telephony Gateway to include IVR partners. Through partner instructions, you can use existing phone numbers to access your Dialogflow virtual agent and power your IVR systems in no time.
- Agent validation: Helps agent designers create high-quality virtual agents by providing a list of errors and warnings that should be fixed to improve the quality and performance of the virtual agent.
- Entity enhancements: Through these features, we have improved our natural language processing to provide an even more accurate and natural conversational experience.
  - System entity extension: Offers greater accuracy and recognition by letting you add additional values to your Dialogflow application to extend system entities.
  - Regular expression entities: Improves matching by letting entities match patterns instead of specific terms (examples include national identification numbers, IDs, and license plates).
  - Fuzzy matching: Provides better entity matching, regardless of word ordering in a value or synonym, making it easier for developers to create entities.
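For developers, the basic Dialogflow building block is the detect-intent request. Here's a minimal sketch using the google-cloud-dialogflow Python client; the project and session IDs are placeholders:

```python
# Minimal sketch: send one user utterance to a Dialogflow agent and print
# the matched intent and response. Project/session IDs are placeholders.
from google.cloud import dialogflow

session_client = dialogflow.SessionsClient()
session = session_client.session_path("my-gcp-project", "session-123")

text_input = dialogflow.TextInput(
    text="I'd like to check my order status", language_code="en-US")
query_input = dialogflow.QueryInput(text=text_input)

response = session_client.detect_intent(
    request={"session": session, "query_input": query_input}
)

result = response.query_result
print("Matched intent:", result.intent.display_name)
print("Agent reply:", result.fulfillment_text)
```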
Start seeing results in your contact center today

Customers have started seeing immediate results from Contact Center AI integrations. Here's what they have to say:

"To deliver on a best-in-class viewer experience, we need to respond effectively to our viewers whenever they're reaching out to Hulu for help," says Matt Kravitz, Director of Viewer Experience at Hulu. "Because of this, we're always searching for innovative technology that will help our customer support advocates respond and react quickly. Contact Center AI from Google Cloud and Salesforce was the best solution for our needs as it enables recommended responses and next best actions. With these tools, our team can focus more on engaging with our viewers."

"GoDaddy uses Dialogflow, a core component of Contact Center AI, to power our conversational self-help experience. It furthers our ability to provide exceptional care and guidance to our customers, both digitally and while interacting with our human Guides," says Mira Lynn, Head of Conversational AI at GoDaddy. "We chose Dialogflow because of the promise of flexibility the platform provides. It has enabled us to develop a unique approach that maintains conversation designers' independence from engineering development cycles, allowing us to move quickly and flexibly to iterate and respond to customer needs. Our agility has allowed us to provide a customer- and Guide-first experience, giving our customers the right answers in the mode that they choose, while also enabling our Guides to better serve them."

To find out how Contact Center AI can increase CSAT, deflection rates, and operational efficiency, visit our site, contact your Google sales representative, or request to be contacted.
Source: Google Cloud Platform

Announcing Network Intelligence Center—towards proactive network operations

We are very excited to introduce Network Intelligence Center, Google Cloud's comprehensive network monitoring, verification, and optimization platform spanning the cloud and on-prem data centers, along with an initial set of modules.

Organizations are increasingly adopting multi-cloud implementations and hybrid deployments as part of their cloud strategy, to deliver the best experiences for their customers around the globe. Networks are at the foundation of this digital transformation, and are becoming increasingly complex due to the growing heterogeneity of customer environments. This puts pressure on network operations teams, who live under the presumption that "the network is guilty until proven innocent." Unfortunately, today's fragmented networking tools don't make their lives any easier, leading to a lack of visibility, long troubleshooting times for connectivity and performance issues, configuration errors discovered too late in production, and no centralized way to holistically understand and remedy network health. [1, 2, 3]

The vision for intelligent and predictive network operations

Adoption of hybrid and multi-cloud is absolutely critical for organizations to remain agile. However, this underscores the need for intelligent and continuous network operations—the promise that the network is doing what it needs to do, in line with business intent. For example, if you have global operations, you may want to ensure that users in a specific geographical region are being served out of the nearest data center at all times. To realize this vision, you need proactive network operations that can predict and heal network failures, driven by AI/ML recommendations and remediation. These systems also need to maintain an equilibrium between automation and control, to support customers who are at different points in their journey from reactive toward proactive network operations.

In the first phase of fulfilling this vision, Network Intelligence Center offers four modules: Connectivity Tests and Network Topology, both in beta, and Performance Dashboard and Firewall Metrics & Insights, in alpha, with several other modules to follow.

Network Topology: Visualize complex networks with insights

Google Cloud Platform (GCP) is the first cloud provider to offer comprehensive visibility into your global GCP deployment (not just a single VPC) and its interaction with the public internet, including an organization-wide view of the topology and associated network performance metrics, with Network Topology.

Visibility is especially important in the cloud, where customers do not own the infrastructure, for building trust and confidence. Visualizing cloud topologies is also complicated by the fact that a cloud network is really a mesh, making it hard to understand and navigate the differences between on-premises and cloud environments. Network Topology lets you visualize and navigate your network with structure and insights, simplifying network monitoring and troubleshooting, and helping you fulfill your compliance requirements.

Further, Network Topology provides useful performance metrics for various nodes and edges, enabling metrics-driven troubleshooting, policy checks, and architecture and capacity optimization. For instance, you can easily visualize how your users are being served worldwide, and whether they are being served optimally out of their nearest geographical regions.
You can quickly search through complex networks and drill down into exactly what matters to you, shrinking troubleshooting time. In the event of a network issue, you can go back in time up to six weeks to track the evolution of the network's topology, to diagnose issues quickly.

"Network Intelligence Center has been transformational for us in optimizing our network operations. Using Network Intelligence Center, we discovered that data transferred to a particular GCE region was significantly higher than expected. By looking at Network Topology, we were able to diagnose and fix this issue and save significant costs," said Rob Lyon, Enterprise Architect at Kochava, a mobile app analytics company.

"Network Intelligence Center helps us quickly identify and troubleshoot network problems. We've gained full situational awareness of our network topology, saving us time on monitoring network health," said Jason Cradit, Senior Director of Technology at Pivvot, a real-time location data and analytics company.

[Figure: Visualize your network with insights. Metrics are shown for nodes and edges; the egress traffic breakdown from one region (europe-west1 here) to other regions can be used for optimization.]

Connectivity Tests: Diagnose connectivity issues and predict the impact of configuration changes

GCP is the first cloud provider to offer a solution based on formal verification techniques that empowers you to diagnose connectivity issues, and to proactively verify the impact of configuration changes to prevent outages.

75% of network outages and performance issues result from misconfiguration, and more often than not, these misconfigurations are discovered in production. Not knowing the impact of a change to firewall rules or routing rules makes network monitoring reactive rather than proactive, introducing risk and extending mean time to resolution. Network Intelligence Center helps you quickly diagnose connectivity issues and prevent outages by letting teams perform on-demand tests with the Connectivity Tests module.

Connectivity Tests enables you to self-diagnose connectivity issues within GCP, or from GCP to an external IP address that could be on-prem or in another cloud, helping to isolate whether or not the issue is in GCP. You can create, save, and run tests to verify the impact of configuration changes and ensure that the network intent captured by these tests is not violated, proactively preventing network outages. These tests also help assure network security and compliance. Connectivity Tests has been used internally by Google Cloud's support team to resolve customer issues.

"We're excited that with the help of Network Intelligence Center, we'll be able to verify that our network connectivity matches intent and quickly troubleshoot network configuration issues," said David Breise, Cloud & Network Engineer at HIPAAVault, a secure HIPAA-compliant file-sharing provider.

[Figure: Create and save connectivity tests that reflect your intent. Any time you make a configuration change, run them to verify its impact.]
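Connectivity tests can also be managed programmatically through the Network Management API. Here's a rough sketch using the Google API Python client; the resource layout follows the public API, but treat the identifiers and field details as illustrative assumptions:

```python
# Rough sketch (treat details as assumptions): create a reusable
# connectivity test that checks reachability from a VM to an on-prem IP.
import googleapiclient.discovery

nm = googleapiclient.discovery.build("networkmanagement", "v1")

parent = "projects/my-project/locations/global"  # hypothetical project
test_body = {
    "source": {
        "instance": "projects/my-project/zones/us-east1-b/instances/web-1",
    },
    "destination": {
        "ipAddress": "203.0.113.10",  # e.g., a hypothetical on-prem endpoint
        "port": 443,
    },
    "protocol": "TCP",
}

# Returns a long-running operation; the finished test carries the
# reachability verdict and a trace of the evaluated configuration path.
operation = (
    nm.projects().locations().global_().connectivityTests()
    .create(parent=parent, testId="web-to-onprem", body=test_body)
    .execute()
)
print("Started:", operation["name"])
```

Re-running a saved test like this after every firewall or route change is what turns the verification from a one-off debugging step into a standing check of network intent.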
Speed up cloud migrations

In addition to monitoring, verifying, and optimizing the network between on-premises and cloud, Network Intelligence Center can also help accelerate your migration to the cloud. With a holistic view into network architecture, health, and performance before and after migration, it allows you to plan your cloud migrations efficiently and optimize your architecture. It also helps you identify and debug network issues between on-prem and cloud, easily track topology changes during migration, and view actual traffic flows and performance metrics before and during the migration.

Additional modules: Performance Dashboard and Firewall Metrics & Insights

Network Intelligence Center includes two additional modules in alpha: Performance Dashboard and Firewall Metrics & Insights.

Performance Dashboard provides real-time network performance visibility into packet loss and latency at a per-project level, a compelling capability for network health monitoring that reflects a user's network experience and is not offered by any other cloud provider today. These metrics provide insights into your network performance, showing VM-to-VM packet loss and latency data aggregated across zones. In the case of an application performance issue, Performance Dashboard helps the network team quickly determine whether the issue is with the network or with the application. Fill out this form to sign up as an alpha customer for Performance Dashboard.

Firewall Metrics & Insights provides visibility into firewall rule usage, helping you optimize firewall rules by revealing insights into shadowed firewall rules. You can access this capability today from the firewall rules page in alpha, and it will be integrated into Network Intelligence Center when it reaches beta. Fill out this form to sign up as an alpha customer for Firewall Metrics & Insights.

Moving from reactive to proactive network operations

As you build more and more applications on a Google Cloud foundation, you need a network operations solution that lets you easily visualize complex, cloud-based topologies, helps you proactively identify and troubleshoot problems, and assists you in making good decisions when running at global scale. We're excited to see you use Network Intelligence Center to adopt a more proactive approach to monitoring network health and to predict, diagnose, and verify your network quickly. Stay tuned for new Network Intelligence Center modules, as well as blogs that dive deep into Connectivity Tests and Network Topology. Meanwhile, give Connectivity Tests and Network Topology a try in the Google Cloud Console. Also, visit our website to learn more about networking and take a tour of Cloud City. And as always, reach out to the Google Cloud Networking team with your questions and feedback.

1, 2. https://www.channelpartnersonline.com/2019/02/27/multicloud-hybrid-cloud-adoption-growing/
3. https://blog.ipswitch.com/best-practices-in-network-configuration-and-change-management
Source: Google Cloud Platform

Monitor cloud costs and create budgets at scale

Monitoring and controlling costs effectively is important, especially as you scale your business in the cloud. As the number of teams, applications, and environments in your cloud deployment expands, it becomes crucial to provide granular transparency, accountability, and controls to the various parts of your organization that incur cloud costs. When you're using Google Cloud, our budgets and alerts features are a powerful part of your cost monitoring toolkit, letting you set target spending and notify key stakeholders if you're getting off track. We're dedicated to making it even easier for you to monitor and control your costs at scale so that you can achieve greater predictability, and we've recently released several enhancements that help you do just that.

With the beta release of the Budget API, you can view, create, and manage budgets programmatically at scale. This is especially important if you're creating a large number of budgets across your organization. For example, you can integrate the Budget API into your deployment manager, like Terraform, to programmatically create a budget for each new project as it's created in your organization.

We have also added more granular budget filters to make it easier for you to monitor specific slices of your costs across groups of projects and services. For example, an analytics manager can now create a budget to monitor BigQuery spending for the three projects her team uses for business intelligence and building predictive models.

Once you've created a budget, you can also set up cost forecast alerts to proactively receive notifications when you're projected to exceed your budget for the month and may need to take action. For example, if you set a cost forecast alert threshold to 120% of your monthly budget, you will receive a notification when you are expected to exceed your budget by more than 20%.

How customers are monitoring costs at scale

Customers are already seeing the benefits of these new features, like global water, waste, and energy company Veolia. "At Veolia, we automatically create budgets for new projects using the Google Cloud Budget API as a step in our cloud provisioning workflow in ServiceNow," says Antoine Castex, product manager at Veolia. "By creating a budget for each project, we are able to closely monitor our costs and take action when needed to prevent budget overruns. This empowers application developers at Veolia to focus on what they do best—build amazing products that help make the world more sustainable."

Get started with new cost management features

With the new Budget API, granular budget filters, and cost forecast alerts, you can proactively monitor costs across your cloud environment, however you organize your business. Get started by visiting the Budgets and Alerts page in the Cloud Console or by trying out the new Budget API. See the Budget API user guide for more details.
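As a sketch of what programmatic budget creation looks like, here's an example using the google-cloud-billing-budgets Python client; the billing account, project, service ID, amount, and threshold are placeholders, and details may differ by client library version:

```python
# Illustrative sketch: create a budget with a 120% forecasted-spend alert
# threshold via the Budget API. All IDs and amounts are placeholders.
from google.cloud.billing import budgets_v1
from google.type import money_pb2

client = budgets_v1.BudgetServiceClient()

budget = budgets_v1.Budget(
    display_name="bi-team-bigquery-budget",  # hypothetical name
    budget_filter=budgets_v1.Filter(
        projects=["projects/111111111111"],    # numeric project IDs to watch
        services=["services/XXXX-XXXX-XXXX"],  # placeholder; look up the real
                                               # service ID in the pricing catalog
    ),
    amount=budgets_v1.BudgetAmount(
        specified_amount=money_pb2.Money(currency_code="USD", units=5000),
    ),
    threshold_rules=[
        budgets_v1.ThresholdRule(
            threshold_percent=1.2,  # alert at 120% of the budget,
            spend_basis=budgets_v1.ThresholdRule.Basis.FORECASTED_SPEND,
        ),
    ],
)

created = client.create_budget(
    parent="billingAccounts/000000-AAAAAA-BBBBBB",  # placeholder account
    budget=budget,
)
print("Created budget:", created.name)
```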
To learn more about budget alerts and Google Cloud cost management tools, check out the following resources:

- Documentation: Budget API
- Documentation: Set budget alerts
- Hands-on training: Understanding your GCP costs
- Hands-on training: Optimizing your GCP costs
- What's new: Billing release notes

Source: Google Cloud Platform

4 steps to a successful cloud migration

Digital transformation and migration to the cloud are top priorities for many enterprises. At Google Cloud, we're working hard to make this journey easier. For example, we recently launched Migrate for Compute Engine and Migrate for Anthos to simplify cloud migration and modernization. These services have helped customers like Cardinal Health perform successful, large-scale migrations to GCP. But we understand that the migration journey can be daunting. To make things easier, we developed a whitepaper on application migration featuring investigative processes and advice to help you design an effective migration and modernization strategy. This guide outlines the four basic steps you need to follow to migrate successfully and efficiently:

1. Build an inventory of your applications and infrastructure: Understanding how many items, such as applications and hardware appliances, exist in your current environment is an important first step.
2. Categorize your applications: Analyze the characteristics of all of your applications and evaluate them across two dimensions: migration to cloud, and modernization.
3. Decide whether or not to migrate an application to the cloud: Not all applications should move to the cloud quite yet. The whitepaper lists the questions to ask to determine whether or not to migrate a given application.
4. Pick your migration strategy: For the applications you decide to migrate, choose your ideal strategy—pure lift and shift, containers, cloud managed services, or a combination thereof.

There's a lot to consider when you start thinking about digital transformation, and every cloud modernization project has its nuances and unique considerations. The secret to success is understanding the advantages and disadvantages of the options at your disposal, and weighing them against what you want to transform and why. To learn how to migrate and modernize your applications with Google Cloud, download the whitepaper.
Source: Google Cloud Platform

An update on Chronicle: Continuing to give good the advantage

In June, we announced that Chronicle was coming to Google Cloud. Since then, we've been hard at work, and we wanted to give you an update on our progress as well as a sneak peek of what's to come.

Chronicle's mission has always been to give good the advantage. We've done this with our products VirusTotal and Backstory. The same mission underpins how we build and deliver Google Cloud security solutions that work on our platform and across multi-cloud deployments. As a result, we've consolidated our security solutions into a single group, enabling teams working on cloud-native controls, security integrations with partners, and security analytics (i.e., Backstory) to collaboratively deliver next-generation enterprise security services.

In one week we'll take the stage at Next UK, where we'll share how we are connecting Chronicle and Google Cloud to create a comprehensive security business that will benefit all of our customers. During the opening keynote at Next UK, we'll announce multiple new native capabilities and demonstrate upcoming features in Chronicle's Backstory product, Google Cloud's flagship offering for hybrid security analytics. Tune in to watch live, or catch a replay after the event.

NCR is an example of a customer already seeing benefits with Chronicle. "We recently became a Backstory customer, and so far, we have been impressed with the innovation we are seeing from Chronicle," says Bob Varnadoe, CISO of NCR. "With Backstory, we can do in minutes what used to take days with our previous SIEM."

In addition to customer momentum, partners who have been previewing our new capabilities are excited about the investments in the product they've seen since Chronicle joined Google Cloud:

"As an early strategic partner of Chronicle, we work very closely with the Backstory team, and I can tell you first-hand it has been full steam ahead. We're excited by the technology and the new capabilities we are building together with Chronicle and Google," says Gary Fish, CEO of Fishtech Group.

Security is one of the key solutions we are focused on delivering at Google Cloud, and with Chronicle, we'll reimagine enterprise security services to keep customers safe on GCP, on premises, or on other clouds.

See you in London!
Source: Google Cloud Platform