Now you can train TensorFlow machine learning models faster and at lower cost on Cloud TPU Pods

Data scientists, machine learning (ML) engineers, and ML researchers can’t be maximally productive if they have to wait days or weeks for their model training runs to complete. The MLPerf benchmarking effort measures the time it takes for common ML problems to train to a given accuracy, and we contributed both GPU and TPU submissions to MLPerf 0.5, the initial version of this industry-wide benchmark contest. However, real-world ML challenges are much larger than the current MLPerf training tasks, so MLPerf does not yet reflect the systems’ performance on very large problems.

As a case study that we believe is more representative of ML training performance in larger-scale settings, we train ResNet-50 v1.5, one of the MLPerf models, on the well-known ImageNet image classification dataset, but we focus on the steady-state training performance observed when processing a much larger collection of images. We have seen broadly similar scaling performance in other ML application domains, including machine translation, speech recognition, language modeling, GAN training, and reinforcement learning.

Performance comparison

GCP provides a full spectrum of machine learning accelerators, and here we focus on Cloud TPU Pods and on Google Cloud VMs with NVIDIA Tesla V100 GPUs attached. Cloud TPU Pods, now available in alpha, are tightly coupled supercomputers built with hundreds of Google’s custom Tensor Processing Unit (TPU) chips and dozens of host machines, all linked via an ultrafast custom interconnect. To ensure that anyone can fully reproduce the results described below, we use well-optimized, open-source TensorFlow 1.12 implementations of ResNet-50 v1.5 (GPU version, TPU version).
To simulate a larger-scale ML training scenario (imagine eBay training on tens or hundreds of millions of product images), we run a warm-up epoch on both the GPU and TPU systems before starting our measurements; this excludes one-time setup costs and evaluation costs and ensures that caches are fully filled. All of the systems shown train ResNet-50 to the same quality score of 76% top-1 accuracy. Further details are available on our methodology page.

Our results demonstrate that Cloud TPU Pods deliver near-linear speedups for this large-scale training task; the largest Cloud TPU Pod configuration tested (256 chips) delivers a 200X speedup over an individual V100 GPU (see chart below). So instead of waiting more than 26 hours on a single state-of-the-art GPU, you can get the same result from a full Cloud TPU v2 Pod in 7.9 minutes of training time.

Cost comparison

Reducing the cost of ML training enables a broader group of ML practitioners to explore a wider range of model architectures and hyperparameters and to train cutting-edge ML models on larger, more representative datasets. As shown below, training ResNet-50 on a full Cloud TPU v2 Pod costs almost 40% less than training the same model to the same accuracy on an n1-standard-64 Google Cloud VM with eight V100 GPUs attached. On top of those savings, the full Cloud TPU Pod completes the training task 27 times faster.

Qualitative takeaways

Cloud TPUs were designed from the ground up for machine learning. They excel at synchronous training, which makes it much easier to reproduce results during model development, and the Cloud TPU software stack transparently distributes ML models across multiple TPU devices in a Cloud TPU Pod to help you scale. Furthermore, all Cloud TPUs are integrated with Google Cloud’s high-speed storage systems to ensure that data input pipelines can keep up with the TPUs.
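As a back-of-the-envelope check, the speedup figures quoted above are self-consistent (a simple sketch using only the numbers stated in this post, not re-measured results):

```python
# Sanity-check the headline speedup numbers quoted in the post.
gpu_minutes = 26 * 60        # single V100 GPU: just over 26 hours
pod_minutes = 7.9            # full Cloud TPU v2 Pod (256 chips)

speedup = gpu_minutes / pod_minutes
print(f"Full-pod speedup over one V100: {speedup:.0f}x")   # roughly 200X

# "Near-linear" scaling: perfect linear scaling over 256 chips would
# predict a 256x speedup, so ~197x is a high scaling efficiency.
efficiency = speedup / 256
print(f"Scaling efficiency vs. perfect linear: {efficiency:.0%}")
```

Because the GPU baseline is stated only as "more than 26 hours," the computed ratio is a lower bound on the actual speedup.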
There’s no need to manage parameter servers, deal with complicated custom networking configurations, or set up exotic storage systems to achieve unparalleled training performance in the cloud. The same code that runs on a single Cloud TPU can run on a full pod or on mid-sized “slices” in between.

Cloud TPU Pods take us one step closer to our goal of making high-performance machine learning more interactive and accessible for everyone. You can iterate in minutes and train ultralarge production models in hours or days instead of weeks. As a result, Cloud TPUs make it easier, faster, and more cost-effective to develop and deploy cutting-edge machine learning models.

Getting started

You can get started with Cloud TPUs by following our quickstart guide, and you can also request access to the Cloud TPU Pods that are available in alpha today. For more guidance on determining whether to try out an individual Cloud TPU or an entire Cloud TPU Pod, check out our documentation here.
Quelle: Google Cloud Platform

MLPerf benchmark establishes that Google Cloud offers the most accessible scale for machine learning training

Today marks the debut of the MLPerf 0.5 benchmark results. These tests have been designed, adopted, and promoted by many industry leaders, and the results show Google Cloud’s TPUs (Tensor Processing Units) and TPU Pods as leading systems for training machine learning models at scale, based on competitive performance across several MLPerf tests. Google Cloud customers can easily use Cloud TPUs at accessible prices today.

MLPerf benchmarks measure performance for training workloads across cloud providers and on-premise hardware platforms. MLPerf is designed to establish metrics that help you make informed decisions on how to choose the right infrastructure for your machine learning workloads. Google is a core contributor to the MLPerf benchmark suite, along with many other companies and academic institutions. Each organization conducts its own testing and submits its own results for publication, contributing to a broad survey of machine learning infrastructure available today.

For data scientists, ML practitioners, and researchers, building on-premise GPU clusters for training is capital-intensive and time-consuming—it’s much simpler to access both GPU and TPU infrastructure on Google Cloud. We’re pleased to see that the MLPerf benchmark results provide evidence that GCP offers the ideal platform to train machine learning models at any scale.

Understanding the time-to-accuracy results

In our MLPerf submission, we benchmarked accelerators available on our Google Cloud infrastructure, with a focus on our latest Cloud TPUs (versions 2 and 3, both on GCP), and also on our state-of-the-art TPU v3 Pods.
We submitted results for ResNet-50, an industry-standard image classification network; NMT, a neural machine translation model; and SSD, a single-shot object detector [1–12].

The graphic below shows absolute training times, comparing NVIDIA’s best submitted results on a DGX-2 machine (containing 16 V100 GPUs) with results using 1/64th of a TPU v3 Pod (16 TPU v3 chips used for training). The comparison ranges across image classification (ResNet-50), object detection (SSD), and neural machine translation (NMT).

Training time comparison between 1/64th of a TPU v3 Pod (16 TPU v3 chips used for training, plus four separate Cloud TPU v2 chips used for evaluation) [9,10,11] and an NVIDIA DGX-2 (16 V100 GPUs) [13,14,15]

In summary, Google’s Cloud TPUs and TPU Pods deliver always-available, high-performance training across multiple workloads, ranging from image understanding to language translation. For example, it’s possible to achieve a 19% speed-up with a TPU v3 Pod on a chip-to-chip basis versus the current best-in-class on-premise system when tested on ResNet-50.¹

Making high performance compute accessible to everyone

eBay has been using Cloud TPU Pods for months and has seen a massive reduction in training time:

“An important ML task that took more than 40 days to run on our in-house systems completed in just four days on a fraction of a TPUv2 Pod, a 10X reduction in training time,” explains Shuai Zheng, eBay Research Scientist. “This is a game changer—the dramatic increase in training speed not only allows us to iterate faster but also allows us to avoid large up-front capital expenditures.”

A full Google Cloud TPU v2 Pod is able to achieve similar training times to NVIDIA’s largest-submitted-scale on-premise system consisting of 80 DGX-1s (11.3 min [9] vs.
6.3 min [16], respectively) on the ResNet-50 v1.5 image classification task, using fewer than half the ML accelerator chips (256 versus 640).

Conclusion

Cloud TPUs and TPU Pods excel at many machine learning training workloads, from image classification to language translation. As machine learning continues to become more central to their businesses, enterprises are turning to the cloud for the high performance and low cost of training ML models. The MLPerf results reveal a 19% TPU performance advantage on a chip-to-chip basis, and even greater speedups and cost savings are possible when working with more realistic ML production workloads. For a detailed analysis of performance and cost in a training scenario with much larger inputs than MLPerf uses, see our companion blog post.

Head over to MLPerf.org for the full set of benchmark results. To find out more about Cloud TPUs, read our documentation. You can learn how to get started with individual Cloud TPUs (you can decide between v2 and v3 here), learn how to use Cloud TPUs via Cloud ML Engine here, or try out Cloud TPUs for free right in your browser using a Colab notebook here. Cloud TPU v2 Pods are currently available in alpha. If you’re interested in using Cloud TPU Pods, you can request access here.

1. 19% speedup with a 16-chip TPU v3 system compared to the 16-chip DGX-2 on-premise system from NVIDIA.

[*] All results herein are for MLPerf Training v0.5 Closed Division. All results were retrieved from https://mlperf.org/results on 12/12/2018. MLPerf is a trademark.
[1] ResNet-50 v1.5 result by Google on TPUv2.8 (4 chips) using TF 1.12. Result id: 0.5.2.1.
[2] SSD result by Google on TPUv2.8 (4 chips) using TF 1.12. Result id: 0.5.2.2.
[3] NMT result by Google on TPUv2.8 (4 chips) using TF 1.12. Result id: 0.5.2.4.
[4] ResNet-50 v1.5 result by Google on TPUv2.512 + TPUv2.8 (260 chips) using TF 1.12. Result id: 0.5.3.1.
[5] ResNet-50 v1.5 result by Google on TPUv3.8 (4 chips) using TF 1.12.
Result id: 0.5.4.1.
[6] SSD result by Google on TPUv3.8 (4 chips) using TF 1.12. Result id: 0.5.4.2.
[7] NMT result by Google on TPUv3.8 (4 chips) using TF 1.12. Result id: 0.5.4.4.
[8] 8x Volta V100 result by Google on 8x Volta V100 (8 chips) using TF 1.12 and cuDNN 7.4. Result id: 0.5.5.1.
[9] ResNet-50 v1.5 result by Google on TPUv3.32 + TPUv2.8 (20 chips) using TF 1.12. Result id: 0.5.26.1.
[10] SSD result by Google on TPUv3.32 + TPUv2.8 (20 chips) using TF 1.12. Result id: 0.5.26.2.
[11] NMT result by Google on TPUv3.32 + TPUv2.8 (20 chips) using TF 1.12. Result id: 0.5.26.4.
[12] ResNet-50 v1.5 result by Google on TPUv3.512 + TPUv2.8 (260 chips) using TF 1.12. Result id: 0.5.27.1.
[13] ResNet-50 v1.5 result by NVIDIA on DGX-2 (16 chips) using ngc18.11_MXNet and cuDNN 7.4. Result id: 0.5.18.1.
[14] SSD result by NVIDIA on DGX-2 (16 chips) using ngc18.11_pyTorch and cuDNN 7.4. Result id: 0.5.19.2.
[15] NMT result by NVIDIA on DGX-2 (16 chips) using ngc18.11_pyTorch and cuDNN 7.4. Result id: 0.5.19.4.
[16] ResNet-50 v1.5 result by NVIDIA on 80x DGX-1 (640 chips) using ngc18.11_MXNet and cuDNN 7.4. Result id: 0.5.17.1.
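The chip counts and training times cited above can also be combined into a rough aggregate-efficiency comparison (a simple sketch using only the figures quoted in this post; it ignores per-chip cost and power differences):

```python
# Chip-time comparison of the ResNet-50 results cited above
# (numbers from the MLPerf 0.5 submissions referenced in the text).
tpu_chips, tpu_minutes = 256, 11.3   # full Cloud TPU v2 Pod [9 in the text: 11.3 min]
gpu_chips, gpu_minutes = 640, 6.3    # 80x DGX-1 [16 in the text: 6.3 min]

tpu_chip_minutes = tpu_chips * tpu_minutes   # total accelerator time consumed
gpu_chip_minutes = gpu_chips * gpu_minutes

print(f"TPU v2 Pod: {tpu_chip_minutes:.0f} chip-minutes")
print(f"80x DGX-1:  {gpu_chip_minutes:.0f} chip-minutes")
print(f"TPU pod uses {1 - tpu_chip_minutes / gpu_chip_minutes:.0%} less chip-time")
```

By this crude measure, the TPU v2 Pod result consumes roughly 28% less total accelerator time than the 640-GPU result, despite the GPU cluster finishing sooner in wall-clock terms.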
Quelle: Google Cloud Platform

Mark your calendar: Google Cloud Next 2019

Google Cloud Next 2019 is coming soon! Save the date: April 9-11, 2019 at Moscone Center in San Francisco.

Google Cloud Next is where customers, partners, developers, influencers and the greater global cloud community come to get inspired and learn about Google Cloud Platform, G Suite, Maps, Devices and more great technology and solutions from across Google. Learn from customers and partners, and dive deep through hundreds of breakout sessions, code labs, demos and hands-on training.

Unlock new opportunities for your business by interacting with the entire Google Cloud team, including our engineers, product leadership, developer advocates and more. Plus, you’ll meet lots of other cloud developers and practitioners to share stories and ideas.

We had more than 23,000 people learning at Next ‘18. Take a look at last year’s recap for a peek at what to expect in April 2019.

Next ‘19 will offer more than 500 breakout sessions, led by more than 1,000 Google, customer, and partner speakers. This year, we’ll have even more technical content and learning opportunities. The conference will feature tracks encompassing all aspects of the Google Cloud portfolio: application development, architecture, collaboration and productivity, cost management, data analytics, databases, hybrid cloud, ML and AI, mobility and devices, security, serverless and much more.

We’ll also share exciting product news and announcements from across the business, so you’ll be the first to know about what is launching and coming soon. Check out the 100+ announcements we made at Next ‘18.

If you are interested in being a sponsor for Next ‘19, send us an email.

We’re excited to create a space of inclusivity and learning, foster innovation, and inspire people of all backgrounds to come together at Next ‘19. We’d love to have you join us: registration opens today.
Quelle: Google Cloud Platform

Announcing Cloud DNS forwarding: Unifying hybrid cloud naming

A key part of a successful hybrid cloud strategy is making sure your resources can find each other via DNS, whether they are in the cloud or on-prem. Rather than create separate islands of DNS namespaces, we’ve added new forwarding capability to Cloud DNS, our managed DNS service, letting you easily link your cloud and on-prem environments, so you can use the same DNS service for all your workloads and resources.

Built with Cloud DNS’ new network policy capability, DNS forwarding allows you to create bi-directional forwarding zones between your on-prem name servers and Google Cloud Platform’s internal name servers. Currently in beta, DNS forwarding provides the following features and benefits:

- Outbound forwarding lets your GCP resources use your existing on-prem authoritative DNS servers, including BIND, Active Directory, and others.
- Inbound forwarding allows on-prem (or other cloud) resources to resolve names via Cloud DNS.
- Intelligent Google caching improves the performance of your queries; cached queries do not travel over your connectivity links.
- DNS forwarding is a fully managed service—no need to use additional software or your own compute and support resources.

In a nutshell, DNS forwarding provides a first-class GCP managed service to connect your cloud and on-prem DNS environments, providing unified naming for your workloads and resources. Further, you can use DNS forwarding for inbound traffic, outbound traffic, or both, to support existing or future network architecture needs.

DNS is a critical component of tying hybrid cloud architectures together. DNS forwarding in combination with GCP connectivity solutions such as Cloud Interconnect and Cloud VPN creates a seamless and secure network environment between your GCP cloud and on-prem data centers. To learn more, check out the DNS forwarding documentation and get started using DNS forwarding today.
Quelle: Google Cloud Platform

Readers’ choice: Top Google Cloud Platform stories of 2018

We’re wrapping up a busy year here at Google Cloud. As you head into a new year, take a minute to catch up on what happened in 2018—and get some ideas about what you might do in 2019. Here’s what was most popular this year on the Google Cloud Platform (GCP) blog, based on readership, and organized generally by key areas of cloud.

Building the right cloud infrastructure for your business

The many ways to build a cloud infrastructure keep expanding. Container tools like Kubernetes continued to grow in popularity, and we started to learn more about serverless computing possibilities.

On the container front, this year brought news of the gVisor sandbox for secure container isolation, so you can run a bigger variety of workloads. Plus, Jib came out this year: it’s an open-source Java containerizer, so you can build containers using familiar Java tools.

And at Next ‘18 we announced the Cloud Services Platform, a consistent development framework for your IT resources that gathers together cloud services to automate away tasks across on-prem and cloud infrastructure. The beta release of GPUs attached to preemptible VMs also came this year, making it more affordable to run large-scale ML workloads. And Cloud TPU hardware accelerators arrived (and continued to mature) to speed up and scale ML workloads programmed with TensorFlow.

Developing cloud apps on that infrastructure

Along with solid cloud foundations, cloud app development made strides in 2018.

News of support for headless Chrome for Google Cloud Functions and Cloud Functions for Firebase got attention.
And the newly revamped Cloud Source Repositories made a splash—it’s powered by the same underlying code search infrastructure that Google engineers use every day.

Now that you’ve found cloud, what are you gonna do with it?

Cloud technology infrastructure really started to mature this year, especially for emerging use cases like machine learning (ML) that need powerful back-end tools.

News of the Ethereum cryptocurrency dataset on BigQuery was a hit; it’s publicly available to use for analysis. A partnership with NASA’s Frontier Development Lab brought in Google Cloud to work on simulating and classifying the possible atmospheres of exoplanets.

Also popular on the blog this year: we added a PyTorch 1.0 Preview VM image to GCP so you can easily conduct deep learning experimentation with the newest PyTorch framework. Cloud Text-to-Speech made Google’s internal technology, powered by DeepMind, available for uses like call center responses, IoT device speech, and converting text into audio format.

And don’t forget the fun that’s powered by cloud, too. A post on the new open-source Agones project got a lot of attention; Agones uses Kubernetes to host and scale dedicated game servers. Open Match arrived this year too—this open-source project lets game developers bring their own logic to a common matchmaking framework when building multiplayer games.

Building the future cloud IT team

Cloud technology hasn’t just changed IT infrastructure; it’s changed IT teams and processes as well. Concepts like site reliability engineering (SRE) bring some new ways of thinking about structuring these processes.

This popular SRE vs. DevOps blog post laid out the details of how SRE is similar to and different from DevOps, and describes its availability targets, risk and error budgets, toil budgets and more.
Then, there was the Accelerate: State of DevOps 2018 research report, with lots of takeaways based on survey results from DevOps professionals.

Managing the modern cloud

Some essential cloud management basics also stuck out among all the future-oriented, big-idea projects that got attention this year.

The guide to best practices for user account authorization was a useful read for anyone creating, handling and authenticating GCP user accounts. Choosing strong database consistency also struck a chord, with details on why and how it’s important, with a particular focus on Cloud Spanner. Titan Security Keys became available in the Google Store this year. These FIDO security keys include a hardware chip with Google-engineered firmware for strong two-factor authentication.

That’s a wrap for 2018! We’re looking forward to seeing what you build (and read) next.
Quelle: Google Cloud Platform

Accelerate your app delivery with Kubernetes and Istio on GKE

It’s no wonder so many organizations have moved all or part of their IT to the cloud; it offers a range of powerful benefits. However, making the jump is often easier said than done. Many organizations have a significant on-premises IT footprint, aren’t quite cloud-ready, and are constrained by regulations or by the lack of a consistent security and operating model across on-premises environments and the cloud.

We are dedicated to helping you modernize your existing on-premises IT and move to the cloud at a pace that works for you. To do that, we are leading the charge on a number of open-source technologies for containers and microservices-based architectures. Let’s take a look at some of these and how they can help your organization prepare for a successful journey to the cloud.

Toward an open cloud stack

At Google Cloud Next ‘18, we announced Cloud Services Platform, a fully managed solution based on Google open-source technologies. With Cloud Services Platform, you have the tools to transform your IT operations and build applications for today and the future, using containerized infrastructure and a microservices-based application architecture.

Cloud Services Platform combines Kubernetes for container orchestration with Istio, the service management platform, helping you implement infrastructure, security, and operations best practices. The goal is to bring you increased velocity and reliability, as well as to help manage governance at the scale you need. Today, we are taking another step towards this vision with Istio on GKE.

Think services first with Istio

We truly believe that Istio will play a key role in helping you make the most of your microservices. One way Istio does this is to provide improved visibility and security, making working with containerized workloads easier.
With Istio on GKE, we are the first major cloud provider to offer direct integration to a Kubernetes service and simplified lifecycle management for your containers.

Istio is a service mesh that lets you manage and visualize your applications as services, rather than individual infrastructure components. It collects logs, traces, and telemetry, which you can use to set and enforce policies on your services. Istio also lets you add security by encrypting network traffic, all while layering transparently onto any existing distributed application—you don’t need to embed any client libraries in your code.

Istio securely authenticates and connects your services to one another. By transparently adding mTLS to your service communication, all information is encrypted in transit. Istio provides a service identity for each service, allowing you to create service-level policies that are enforced for each individual application transaction, while providing non-replayable identity protection.

Out of the gate, you can also benefit from Istio’s visibility features thanks to its integration with Stackdriver, GCP’s native monitoring and logging suite. This integration sends service metrics, logs, and traces to Stackdriver, letting you monitor your golden signals (traffic, error rates, and latencies) for every service running in GKE.

Istio 1.0 was a key step toward helping you manage your services in a hybrid world, where multiple workloads run in different environments—clouds and on-premises, in containerized microservices or monolithic virtual machines.
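A mesh-wide mTLS policy of the kind described above can be sketched as follows; this is a minimal sketch using the Istio 1.0-era authentication API, so treat the exact schema as an assumption to verify against the Istio release you run:

```yaml
# Sketch: require mTLS for all service-to-service traffic in the mesh
# (Istio 1.0-era authentication API; verify against your installed version).
apiVersion: authentication.istio.io/v1alpha1
kind: MeshPolicy
metadata:
  name: default        # the mesh-wide policy is conventionally named "default"
spec:
  peers:
    - mtls: {}         # peers must present a valid workload certificate
```

Applying a policy like this is what turns on the encrypted-in-transit behavior transparently, without any changes to application code.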
With Istio on GKE, you get granular visibility, security, and resilience for your containerized applications, with a dead-simple add-on that works out of the box with all your existing applications.

Using Istio on GKE

The service-level view and security that Istio delivers are especially important for distributed applications deployed as containerized microservices, and Istio on GKE lets you deploy Istio to your Kubernetes clusters with the click of a button.

Istio on GKE works with both new and existing container deployments. It lets you incrementally roll out features, such as Istio security, bringing the benefits of Istio to your existing deployments. It also simplifies Istio lifecycle management by automatically upgrading your Istio deployments when newer versions become available.

Today’s beta availability of Istio on GKE is just the latest of many advancements we have made to make GKE the ideal choice for enterprises. Try Istio on GKE today by visiting the Google Cloud Platform console. To learn more, please visit cloud.google.com/istio or the Istio on GKE documentation.

Enhancing GKE networking

Earlier this year we announced many new networking features for GKE, including VPC-native clusters, Shared VPC, container-native load balancing, and container-native network services for applications running on GKE and self-managed Kubernetes in Google Cloud. With VPC-native clusters, GKE natively supports many VPC features such as scale enhancements, IP management, security checks, and hybrid connectivity. Shared VPC lets you delegate administrative responsibilities to cluster admins while ensuring your critical network resources are managed by network admins. Container-native load balancing lets you program load balancers with containers as endpoints directly, for more optimal load balancing.
Network services let you use Cloud Armor, Cloud CDN and Identity-Aware Proxy natively with your container workloads.

We also announced new features to help simplify the configuration of containerized deployments, with some backend and frontend config enhancements. These improvements make everything easier, from identity and access management for network resources to better controls for CDN, Cloud Armor, and load balancing for easier application delivery.

Improving GKE security

GCP helps you secure your container environment at each stage of the build-and-deploy lifecycle with software supply chain and runtime security tools. These include integrations to tools from multiple security partners, all on top of Google’s security-focused infrastructure and security best practices. New features like node auto-upgrade and private clusters increase the security options available to GKE users. You can read more about new security features in GKE in “Exploring container security: This year it’s about security.”

Delivering Kubernetes apps via GCP Marketplace

Enterprises usually work with a number of partners within their IT environments, whether it’s in the cloud or on-premises. Six months ago, we introduced Kubernetes applications delivered through GCP Marketplace. Kubernetes apps offer more than just a container image; they are production-ready solutions that are integrated with GKE for simple click-to-deploy launches. Once deployed to GKE, Kubernetes apps are managed as full applications, simplifying resource management. You can also deploy Kubernetes apps to non-GKE Kubernetes clusters, whether they’re on-premises or in the cloud, for quick deployment that’s billed alongside other GCP spend.

With Kubernetes, your cloud, your way

If you use containers and Kubernetes, you already know how they can optimize infrastructure resources, reduce operational overhead, and improve application portability.
But by standardizing on Kubernetes, you’ve also laid the foundation for improved service management and security, as well as simplified application procurement and deployment, across clouds and on-prem. Stay tuned in the coming months for more about Kubernetes, microservices, and Cloud Services Platform.
Quelle: Google Cloud Platform

Nurture what you create: How Google Cloud supports Kubernetes and the cloud-native ecosystem

At Google Cloud, we talk a lot about our belief in open source and open cloud. But what does that actually mean?

Usually, when you’re a leader in an open-source community like Kubernetes and there’s a big event (like this week’s KubeCon North America), that means launching a brand new project. Launches are exciting, but maintaining a successful project like Kubernetes requires sustained investment and maintenance. We find that what really distinguishes a successful open-source project is the day-in, day-out nurturing that happens behind the scenes. And it’s more than coding—it’s things like keeping the project safe and inclusive, writing documentation, managing test infrastructure, responding to issues, working in project governance, creating mentoring programs, reviewing pull requests, and participating in release teams. So today, we thought we’d take this opportunity not to announce a project, but rather reflect on some examples of what it means to us to be a part of the open-source cloud-native community.

“Open-source software is not free like sunshine, it’s free like a puppy.” – Sarah Novotny, Head of Open Source Strategy for GCP

Supporting communities and thinking differently

First and foremost, with Kubernetes, we fully support the core values of the project, as well as provide technical and non-technical contributions in ways that reinforce positive results for the entire community. Since its inception, we’ve remained the top contributor to the project. This is something we’re incredibly proud of, and we hope that our work helps make the entire cloud-native landscape richer.

Our commitment to open source also extends to making events more impactful. For example, this year, rather than produce new KubeCon conference swag, we donated diversity scholarships for 2019 to the CNCF instead. This aligns with our desire for inclusivity, and helps cultivate a stronger community.
We also co-organized the Kubernetes Contributor Summit, so our community can have critical in-person interactions ahead of the full event.

Supporting the existing cloud-native ecosystem: etcd

Another example of our commitment to open source is supporting the etcd distributed key-value store, which has now joined the roster of CNCF projects. As the Kubernetes ecosystem matured, we saw the need for more support in this critical component. We dedicated full-time engineers to the project, including an etcd maintainer, and two of the top five code committers in 2018. We led improvements to the etcd release process, expanding release branch support from just the latest minor version to the latest three minor versions. We also dedicated staff to patch management duties and automating the release workflow, and actively helped stabilize etcd, hunting down and fixing issues including a critical boltdb data corruption issue. More recently, we contributed to the rewrite of the etcd client-side load balancer and led efforts to expand the metrics exposed by etcd for monitoring system health and performance.

We’re committed to the quality and production readiness of etcd. Our plans include making upgrades safer by adding zero-downtime downgrade support, and expanding test coverage over more version pairings of etcd with Kubernetes. Finally, we’re continually making coordinated improvements to both etcd and the Kubernetes storage layer that interfaces with it to optimize scalability, performance, and ease of operability.

Enriching the cloud-native landscape

Our commitment to open source isn’t just limited to supporting communities and existing projects. We also hope to share many of the valuable lessons we have learned while building scalable, secure, and reliable systems, Kubernetes being a prime example.

A recent example is gVisor, based on technology Google uses to isolate and secure containerized workloads.
As organizations run more heterogeneous and less trusted workloads, there’s new interest in containers that provide a secure isolation boundary, and we wanted to share with the community how we’ve been tackling the problem internally. This in turn opened up broader discussions about the security challenges inherent in cloud-native architecture.

In an effort to make gVisor more accessible, we integrated it with Minikube, so you can try out gVisor locally, in a VM on your laptop. We’re also actively working to open more of the project’s support infrastructure, plans, and processes, starting with a substantial system call compatibility test suite with more than 1,500 tests.

Releasing gVisor as an open-source project underscores the many different ways communities can form and contribute across the cloud-native landscape. Sometimes those contributions aren’t explicitly code, but instead feedback or ways to do things better. Being open helps build communities of practice across all technology groups and stakeholders.

Improving the cloud-native developer experience

We understand that the day-to-day life of an application developer can be challenging in the cloud-native world, due to the multiple points of divergence between how you run your application locally and in a production Kubernetes cluster. Our goal is to reduce these differences so all developers can have a positive experience in the Kubernetes ecosystem.

In March we released an important open-source tool for cloud-native development called Skaffold, which lets you define the build, test and deployment phases of your Kubernetes application with a single YAML file. With the skaffold dev command, this local pipeline is combined with an automated file watcher based on the build definition, creating a fast feedback loop—you can see your source file changes in your deployed app in seconds.
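A minimal skaffold.yaml along these lines might look like the sketch below; the image name and manifest path are placeholder assumptions, and the schema version should match the Skaffold release you install:

```yaml
# Hypothetical minimal Skaffold config; image and manifest names are placeholders.
apiVersion: skaffold/v1beta1
kind: Config
build:
  artifacts:
    - image: gcr.io/my-project/my-app   # rebuilt and pushed on each source change
deploy:
  kubectl:
    manifests:
      - k8s/deployment.yaml             # redeployed whenever the image rebuilds
```

Running skaffold dev against a config like this watches the source tree, rebuilds the image, and redeploys the manifest on every change, which is the fast feedback loop described above.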
This works both locally and in Google Kubernetes Engine (GKE), helping to provide a cohesive workflow.

Learn and share: How we cross-pollinate communities

Another effort within Google open source is to create templates and other starter materials that emerging projects can use for things like governance and contributions. Our hope is to eventually provide everything necessary to bootstrap a successful open-source project, as well as to offer guidance at key inflection points in the project lifecycle. These materials are distilled from our experience working on projects like Kubernetes, Istio, Knative, and TensorFlow. To improve them further, we regularly bring community managers together across projects to discuss shared struggles, opportunities, and lessons learned, so that antipatterns aren’t repeated from project to project. Scaling open-source contributions is important, especially if the goal is to ensure consistently positive and inclusive interactions across every project we support.

So, as we all celebrate the continued success of Kubernetes, remember to take the time to thank someone you see helping make the community better. It’s up to all of us to foster a cloud-native ecosystem that prizes the efforts of everyone who helps maintain and nurture the work we do together. To stay up to date on what’s going on in the cloud-native community, both from Google and beyond, we urge you to subscribe to the Kubernetes Podcast. And if you’re interested in getting involved, please visit the links below.

Kubernetes for container scheduling and management [ Google Cloud | GitHub ]
Istio to connect, monitor, and secure microservices [ Google Cloud | GitHub ]
Knative to build, deploy, and manage modern serverless workloads [ Google Cloud | GitHub ]
Container tools to support the entire lifecycle of containerized applications [ Google Cloud | GitHub ]
Kubeflow Pipelines to compose, deploy, and manage end-to-end machine learning workflows [ Google Cloud | GitHub ]
Source: Google Cloud Platform

Knative: bringing serverless to Kubernetes everywhere

Knative, the open-source framework that provides serverless building blocks for Kubernetes, is on a roll, and the GKE serverless add-on, the first commercial Knative offering, which we announced this summer, is enjoying strong uptake with our customers. Today, we are announcing that we’ve updated the GKE serverless add-on to support Knative 0.2. In addition, today at KubeCon, Red Hat, IBM, and SAP announced their own commercial offerings based on Knative. We are excited about this growing ecosystem of products based on Knative.

Knative lets developers easily leverage the power of Kubernetes, the de facto cross-cloud container orchestrator. Although Kubernetes provides a rich toolkit for empowering the application operator, it offers less built-in convenience for application developers. Knative solves this by integrating automated container builds, fast serving, autoscaling, and eventing capabilities on top of Kubernetes, so you get the benefits of serverless on the extensible Kubernetes platform. In addition, Knative applications are fully portable, enabling hybrid applications that run both on-premises and in the public cloud.

Knative and Kubernetes together form a general-purpose platform with the unique ability to run serverless, stateful, batch, and machine learning (ML) workloads alongside one another. That means developers can use existing Kubernetes capabilities for monitoring, logging, authentication, identity, security, and more, across all their modern applications. This consistency saves time and effort, reduces errors and fragmentation, and improves your time to market. As a user you get the ease of use of Knative where you want it, with the power of Kubernetes when you need it.

Knative rising

In the four months since we announced Knative, an active and diverse community of companies has contributed to the project.
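To give a flavor of the developer experience, here is a sketch of a Knative Service manifest in the 0.2-era v1alpha1 API. The service name and sample image are illustrative, and the API was still evolving at the time, so the exact schema should be checked against the Knative docs:

```yaml
# Hypothetical minimal Knative Service (v1alpha1-era schema).
apiVersion: serving.knative.dev/v1alpha1
kind: Service
metadata:
  name: helloworld
spec:
  runLatest:                # always route traffic to the latest ready revision
    configuration:
      revisionTemplate:
        spec:
          container:
            image: gcr.io/knative-samples/helloworld-go
            env:
              - name: TARGET
                value: "Knative"
```

Applying this single manifest gets you routing, revision management, and scale-to-zero autoscaling without hand-writing Deployment, Service, and autoscaler objects yourself.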
Google Kubernetes Engine (GKE) users have been actively using the GKE serverless add-on since its launch in July and have provided valuable feedback leading to many of the improvements in Knative 0.2. In addition to Google, multiple partners are now delivering commercial offerings based on Knative. Red Hat announced that you can now start trying Knative as part of its OpenShift container application platform. IBM has committed to supporting Knative on its IBM Cloud Kubernetes Service. SAP is using Knative as part of its SAP Cloud Platform and open-source Kyma project.

A consistent experience, with the flexibility to run where you want, resonates with many enterprises and startups. We are pleased that Red Hat, IBM, and SAP are embracing Knative as a powerful, open, industry-wide approach to serverless. Here’s what Knative brings to each of the new commercial offerings:

“The serverless paradigm has already demonstrated that it can accelerate developer productivity and significantly optimize compute resources utilization. However, serverless offerings have also historically come with deep vendor lock-in. Red Hat believes that Knative, with its availability on Red Hat OpenShift, and collaboration within the open source community behind the project, will enable enterprises to benefit from the advantages of serverless while also minimizing lock-in, both from a perspective of application portability, as well as that of day-2 operations management.” – Reza Shafii, VP of product, platform services, at Red Hat

“IBM believes open standards are key to success as enterprises are shifting to the era of hybrid multi-cloud, where portability and no vendor lock-in are crucial. We think Knative is a key technology that enables the community to unify containers, apps, and functions deployment on Kubernetes.” – Jason McGee, IBM Fellow, VP and CTO, Cloud Platform

“SAP’s focus has always been centered around simplifying and facilitating end-to-end business processes.
SAP Cloud Platform Extension Factory is addressing the need to integrate and extend business solutions by providing a central point of control, allowing developers to react to business events and orchestrate complex workflows across all connected systems. Under the hood, we are leveraging cloud-native technologies such as Knative, Kubernetes, Istio and Kyma. Knative tremendously simplifies the overall architecture of SAP Cloud Platform Extension Factory and we will continue to collaborate and actively contribute to the Knative codebase together with Google and other industry leaders.” – Michael Wintergerst, SVP, SAP Cloud Platform

We’re excited to deliver enterprise-grade Knative functionality as part of Google Kubernetes Engine, and we’re excited by Knative’s momentum in the industry. To get started, take part in the GKE serverless add-on alpha. To learn more about the Knative ecosystem, check out our post on the Google Open Source blog.
Source: Google Cloud Platform

Expanding our partnership with Palo Alto Networks to simplify cloud security and accelerate cloud adoption

Security remains a top concern and challenge for enterprises, and Google Cloud provides a strong and flexible toolkit to help make a smooth transition to the cloud. We partnered with Palo Alto Networks in 2017 because we shared a belief that moving to the cloud can help enterprises simplify security, and that improved security will drive cloud adoption. Today we’re expanding that partnership to help more enterprises increase control of their own security in the cloud.

As part of our partnership, Palo Alto Networks will run its Application Framework on Google Cloud to take advantage of Google Cloud Platform’s secure, durable cloud storage and highly scalable AI and analytics tools. Services such as BigQuery will help Application Framework customers accelerate time-to-insight as they work to detect and respond to security threats. Palo Alto Networks will also run its GlobalProtect cloud service on Google Cloud Platform. Google Cloud’s reliable, performant, and secure global-scale network and infrastructure offer many advantages for a service that helps protect branch offices and mobile workforces.

“This partnership makes us a Google Cloud customer, allowing us to run important cloud-delivered security services at scale and with the benefits of Google’s AI and analytics expertise,” said Varun Badhwar, SVP Products & Engineering for Public Cloud Security at Palo Alto Networks. “We’ll also be working with Google Cloud to offer organizations moving to Google Cloud the additional visibility, compliance and security capabilities they need to prevent cyberattacks.”

New solutions to help customers automate compliance audits and reporting

RedLock from Palo Alto Networks helps organizations manage security risks and achieve and maintain compliance. By monitoring the use of GCP APIs, RedLock delivers real-time visibility across GCP resources, including containerized workloads in Google Kubernetes Engine.
This enables continuous compliance monitoring and auto-generated reports for common regulations and standards such as GDPR, HIPAA, PCI DSS, and NIST, eliminating the need for lengthy manual audits. A new integration with GCP’s Security Baseline API (Alpha) means that customers can combine a view of their own security and compliance posture with data from GCP’s infrastructure, a capability not available on any other public cloud.

Solutions that will help increase visibility and enhance security analytics

Deep integration of Palo Alto Networks products with Google’s Cloud Security Command Center helps centralize visibility into security and compliance risks on GCP. Palo Alto Networks integrations send alerts from the VM-Series next-generation firewalls, Traps endpoint protection, and RedLock to help provide centralized visibility into security and compliance risks in a Google Cloud environment.

Findings from Palo Alto Networks products in the Cloud Security Command Center Dashboard

This new functionality complements the already extensive set of joint capabilities that help Google Cloud customers define, enforce, monitor, and maintain consistent security policies across on-premises, public cloud, and hybrid environments. For example:

The entire line of Palo Alto Networks next-generation firewalls, both physical and virtualized, supports standards-based IPsec VPN connectivity to ensure a secure connection from on-premises environments to Google Cloud. In addition, GlobalProtect cloud service provides secure connectivity to GCP as a service, removing some of the operational burden associated with firewall deployments.

Palo Alto Networks VM-Series virtualized firewalls protect and segment cloud workloads in GCP to safeguard against internal and external threats, and can be deployed directly from the GCP Marketplace.

Panorama network security management provides unified management of both physical and VM-Series firewalls deployed on-premises and on GCP.
Customers can create policies once and enforce them everywhere.

Traps helps secure the operating system and applications within workloads on GCP. A lightweight host agent deployed within the cloud instance detects zero-day exploits and ensures the integrity of the operating system and applications. As attackers uncover vulnerabilities, the agent-based approach can provide protection until organizations are able to patch their cloud workloads.

Through in-line protection provided by Palo Alto Networks firewall appliances or GlobalProtect cloud service, organizations can understand SaaS usage and build policies to help control risk exposure. They can complement the robust security capabilities in G Suite with the Aperture SaaS security service, which offers additional options for protecting data at rest as well as ongoing monitoring of user activity and administrative configurations.

Through our extended partnership, enterprises using Palo Alto Networks offerings on-premises will have an easier path to the cloud while leveraging their existing security investments. Organizations that run on Google Cloud will have easy access to security functionality from Palo Alto Networks, with enhanced capabilities available only on Google Cloud.

“We are pleased to see Google Cloud and Palo Alto Networks strengthening their partnership. Security is a top priority for Broadcom, and we depend on both organizations to help protect our networks, infrastructure, data, and applications,” notes Andy Nallappan, Vice President and Chief Information Officer, Global Information Technology for Broadcom. “We look forward to increased collaboration that will provide us with new capabilities to enhance our security posture and further simplify deployment and operations across our data centers and the cloud.”

Start today for free

Organizations can take a free, two-week VM-Series Test Drive and learn how the VM-Series can be deployed on GCP to prevent data loss and potential business disruption.
Organizations can also sign up for a free, two-week RedLock trial to continuously monitor and secure their Google Cloud environment and identify vulnerable resources and potential points of exposure. To learn more about our partnership with Palo Alto Networks, please visit this site and read their announcement.
Source: Google Cloud Platform

Exploring container security: This year, it’s all about security. Again.

Earlier this year at KubeCon in Copenhagen, the message from the community was resoundingly clear: “this year, it’s about security.” If Kubernetes was to move into the enterprise, there were real security challenges that needed to be addressed. Six months later, at this week’s KubeCon in Seattle, we’re happy to report that the community has largely answered that call. In general, Kubernetes has made huge security strides this year, and giant strides on Google Cloud. Let’s take a look at what changed this year for Kubernetes security.

Kubernetes attacks in the wild

Where developers go, hackers follow. This year, Kubernetes graduated from the CNCF, and it also earned another badge of honor: weathering its first real security attacks. Earlier this year, several unsecured Kubernetes dashboards made the news for leaking cloud credentials. At the time, Lacework estimated that of over 20,000 public dashboards, 300 were open without requiring any access credentials. (Note that Google Kubernetes Engine no longer deploys this dashboard by default.) Elsewhere, attackers added cryptocurrency-mining binaries to images on Docker Hub, which were then downloaded an estimated five million times and deployed to production clusters.

The majority of attacks against containers, however, remain “drive-by” attacks, where an attacker is only interested in finding unpatched vulnerabilities to exploit. This means that the best thing you can do to protect your containers is to patch: your base image, your packages, your application code, everything.
We expect attackers to start targeting containers more, but since containers make it easier to patch your environment, hopefully they’ll have less success. Luckily, we also saw the community responding to security threats by donating multiple security-related projects to the CNCF, including SPIFFE, OPA, and Project Harbor.

Developing container isolation, together

Isolation was a hot topic for the container community this year, even though there still haven’t been any reports of container escapes in the wild, where an attacker gains control of a container and uses it to gain control of other containers on the same host. The Kata Containers project kicked things off in December 2017, and other sandboxing technologies quickly followed in 2018, including gVisor and Nabla containers. While different in implementation, the goal of each of these technologies is to create a second layer of isolation for containerized workloads and bring defense-in-depth principles to containers, without compromising performance.

Container isolation is frequently misunderstood (after all, containers don’t contain), and lack of isolation has been a primary argument against adopting them. Unlike virtual machines, containers don’t provide a strong isolation boundary on par with a hypervisor. That makes some users hesitant to run multi-tenant environments, deploying containers for two different workloads on the same VM, because they worry that the workload in one container could affect the other. To address this, Kubernetes 1.12 added RuntimeClass, which lets you use the new sandboxing technologies to isolate individual pods. RuntimeClass gives you the ability to select which runtime to use with each pod, so you can choose a hardened runtime like gVisor or Kata depending on how much you trust the workload.
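As a sketch, selecting a sandboxed runtime per pod looks roughly like this. The handler and pod names are illustrative, and in Kubernetes 1.12 the RuntimeClass API was still alpha, so the exact schema differs from later releases:

```yaml
# Register a runtime handler (here, gVisor's runsc) as a RuntimeClass.
# node.k8s.io/v1alpha1 is the 1.12-era alpha API; later versions changed the schema.
apiVersion: node.k8s.io/v1alpha1
kind: RuntimeClass
metadata:
  name: gvisor
spec:
  runtimeHandler: runsc   # must match a handler configured in the node's CRI runtime
---
# A less-trusted pod opts into the sandboxed runtime.
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload
spec:
  runtimeClassName: gvisor
  containers:
    - name: app
      image: nginx        # placeholder image
```

Pods that omit runtimeClassName keep the cluster’s default runtime, so trusted and sandboxed workloads can share the same nodes.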
With this tooling, what was once the primary argument against containers is becoming one of their greatest strengths.

Protecting the software supply chain

At Google Cloud, we focused our efforts on securing the software supply chain: protecting your container from the base image, to code, to an application image, to what you deploy in production. Recently we released two new products in this space: Container Registry Vulnerability Scanning, which scans your images for known vulnerabilities, and Binary Authorization, which lets you enforce your policy requirements at deployment time. Both of these products are currently in beta. Since a container is meant to be immutable, you’re constantly redeploying, and constantly pushing things down your supply chain; Binary Authorization gives you a single enforcement point where you can dictate what runs in your environment. In addition to the GCP-hosted product, we also published Kritis, an open-source reference implementation, to ensure that your containers are scanned and patched for known vulnerabilities before you let them into your environment.

Hardening GKE and its network

We keep GKE up to date with Kubernetes open-source releases, but we also introduce new features and new defaults to help you better protect your clusters. We made huge headway in network security recently, namely with the general availability of Private Clusters and Master Authorized Networks. Together, these help you limit access to your cluster by malicious attackers who scan IP addresses for vulnerabilities: you can restrict access to your cluster’s master to a set of whitelisted IP addresses, and you can further ensure that your cluster’s nodes have only private IP addresses. And since GKE now works with Shared Virtual Private Cloud, your network team can manage this environment directly.
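The supply-chain controls above are driven by declarative policy. As a hedged sketch of a Binary Authorization policy (the project, registry path, and attestor name are placeholders, and the field names follow the beta-era format, so consult the current reference before relying on them):

```yaml
# Hypothetical Binary Authorization policy: block unattested images by default,
# but always allow images from the project's own registry.
admissionWhitelistPatterns:
  - namePattern: gcr.io/my-project/*
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
    - projects/my-project/attestors/built-by-ci
```

A policy like this admits only images carrying an attestation from the named attestor, for example one created by a CI pipeline after a vulnerability scan passes.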
To learn more about GKE networking and network security, see the GKE network overview.

Then, in the small-but-mighty category, we turned node auto-upgrade on by default in the GCP Console. Unpatched environments are an easy target for attackers, and it only takes one missed security notice or delayed patch to become suddenly vulnerable. Node auto-upgrade delivers security patches automatically to keep your nodes up to date. (Note that on GKE, Google manages and patches the control plane for you.) While you probably didn’t notice it, our team has been very busy patching GCP and GKE for Linux and Kubernetes vulnerabilities this year, most notably last week!

In addition to new network security features, we are always striving to improve GKE’s default security settings, so you can implement security best practices without having to be a security expert. We’ve consolidated our hardening advice into a single, easy-to-follow guide, and noted where we’ve changed defaults; it’s also an easy link to share with auditors.

There’s so much more we want to do, and we’re going to keep on keeping on, so that 2019 can be all about security too. If you’re at KubeCon this year, check out some of our container security talks:

How Symlinks Pwned Kubernetes (And How We Fixed It) – Tues Dec 11th, 10:50-11:25
Recent Advancements in Container Isolation – Tues Dec 11th, 1:45-2:20
This Year, It’s About Security – Tues Dec 11th, 4:30-5:05
So You Want to Run Vault in Kubernetes? – Wed Dec 12th, 11:40-12:15
Navigating Workload Identity in Kubernetes – Wed Dec 12th, 4:30-5:05
Shopify’s $25k Bug Report, and the Cluster Takeover That Didn’t Happen – Thurs Dec 13th, 4:30-5:05

Hope to see you there!
Source: Google Cloud Platform