3 steps to gain business value from AI

Many customers have asked us this profound question: how do we realize business value from artificial intelligence (AI) initiatives after a proof of concept (POC)? Enterprises are excited about the potential of AI, and some even create a POC as a first step. However, some are stymied by a lack of clarity on the business value or return on investment. We have also heard the same question from data science teams that have created machine learning (ML) models that are under-utilized by their organizations.
At Google Cloud, we're committed to helping organizations of all sizes transform themselves with AI, and we have worked with many of our customers to help them derive value from their AI investments. AI is a team sport that requires strong collaboration between business analysts, data engineers, data scientists, and machine learning engineers. We therefore recommend discussing the following three steps with your team to realize the most business value from your AI projects:
Step 1: Align AI projects with business priorities and find a good sponsor.
Step 2: Plan for explainable ML in models, dashboards and displays.
Step 3: Broaden expertise within the organization on data analytics and data engineering.
Step 1: Align AI projects with business priorities and find a good sponsor
The first step to realizing value from AI is to identify the right business problem and a sponsor committed to using AI to solve that problem. Teams often get excited by the prospect of applying AI to a problem without thinking deeply about how that problem contributes to overall business value. For example, using AI to better classify objects might be less valuable to the bottom line than, say, a great chatbot. Yet many businesses skip the critical step of aligning the AI project with the business challenges that matter most.
Identify the right business problem. To ensure alignment, start with your organization's business strategy and key priorities, and identify the business priorities that can gain the most from AI. The person doing this assessment needs a good understanding of the most common use cases for AI and ML; it could be a data science director, or a team of business analysts and data scientists. Keep a shortlist of the business priorities that can truly benefit from AI or ML. During implementation, work through this list starting with the most feasible. By taking this approach, you're more likely to generate significant business value as you build a set of ML models that address specific business priorities. Conversely, if a data science or machine learning team builds great solutions for problems that are not aligned with business priorities, the models they build are unlikely to be used at scale.
Find a business sponsor. We've also found that AI projects are more likely to be successful when they have a senior executive sponsor who will champion them with other leaders in your organization. Don't start an AI project without completing this critical step. Once you identify the right business priority, find the senior executive who owns it, and work with their team to get their buy-in and sponsorship. The more senior and committed the sponsor, the better. If your CEO cares about AI, you can bet most of your employees will.
Step 2: Plan for explainable ML in models, dashboards and displays
An important requirement from many business users is to have explanations from ML models. In many cases, it is not enough for an ML model to provide an outcome; it's also important to understand why.
Explanations help to build trust in the model's predictions and offer useful factors on which business users can take action. In regulated industries such as financial services and healthcare, there are regulations that require explanations of decisions. In the United States, for example, the Equal Credit Opportunity Act (ECOA), enforced by the Federal Trade Commission (FTC), gives consumers the right to know why their loan applications were rejected; lenders have to tell the consumer the specific reasons for the rejection. Regulators have been seeking more transparency around how ML predictions are made.
Choose new techniques for building explainable ML models. Until recently, most leading ML models offered little or no explanation for their predictions. However, recent advances are emerging to provide explanations even for the most complex ML algorithms such as deep learning. These include Local Interpretable Model-Agnostic Explanations (LIME), Anchor, Integrated Gradients, and Shapley values. These techniques offer a unique opportunity to meet the needs of business users, even in regulated industries, with powerful ML models.
Use the right technique to meet your users' needs for model explanation. When you build ML models, be prepared to provide explanations both globally and locally. Global explanations identify the model's key drivers, the strongest predictors in the overall model. For example, the global explanation from a credit default prediction model will likely show that the top predictors of default include variables such as the number of previous defaults, number of missed payments, employment status, length of time with your bank, and length of time at your address. In contrast, local explanations provide the reasons why a specific customer is predicted to default, and the specific reasons will vary from one customer to another. As you develop your ML models, build time into your plan to provide global and local explanations.
We also recommend gathering user needs to help you choose the right technique for model explanation. For example, many financial regulators do not allow the use of surrogate models for explanations, which rules out techniques like LIME. In that case, the Integrated Gradients technique would be better suited to the use case.
Also, be prepared to share the model's explanations wherever you show the model's results, whether on analytics dashboards, in embedded apps, or in other displays. This will help to build confidence in your ML models. Business users are more likely to trust your ML model if it provides intuitive explanations for its predictions, and they are more likely to take action on the predictions if they trust the model. Similarly, with these explanations, your models are more likely to be accepted by regulators.
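As an illustration of the difference between local and global explanations (this example is not from the post), the open-source shap library implements Shapley values and can produce both kinds of explanation for a tree-based model; the data here is synthetic and the model is only a stand-in for a real credit-default model.

```python
# Hedged sketch: local and global explanations via Shapley values (shap library).
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import make_classification

# Synthetic stand-in for a credit-default training set.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
model = xgb.XGBClassifier().fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Local explanation: per-feature contributions to one customer's prediction.
print("local:", shap_values[0])
# Global explanation: mean absolute contribution of each feature across all customers.
print("global:", np.abs(shap_values).mean(axis=0))
```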
Step 3: Broaden expertise in data analytics and data engineering within your organization
To realize the full potential of AI, you need good people with the right skills. This is a big challenge for many organizations given the acute shortage of ML engineers; many organizations really struggle to hire them. You can address this skills shortage by upskilling your existing employees and taking advantage of a new generation of products that simplify AI model development.
Upskill your existing employees. You don't always need PhD ML engineers to be successful with ML. PhD ML engineers are great if your applications need research and development, for example if you are building driverless cars. But most typical applications of AI or ML do not require PhD experts. What you need instead are people who can apply existing algorithms, or even pre-trained ML models, to solve real-world problems. For example, there are powerful ML models for image recognition, such as ResNet50 or Inception V3, that are freely available in the open source community; you don't need an expert in computer vision to use them. Instead of searching for unicorns, start by upskilling your existing data engineers and business analysts, and make sure they understand the basics of data science and statistics so they can use powerful ML algorithms correctly.
At Google we provide a wealth of ML training, from Qwiklabs to Coursera courses (e.g. Machine Learning with TensorFlow on Google Cloud Platform Specialization or Machine Learning for Business Professionals). We also offer immersive training such as instructor-led courses and a four-week intensive machine learning training program at the Advanced Solutions Lab. These courses offer great avenues to train your business analysts, data engineers, and developers on machine learning.
Take advantage of products that simplify AI model development. Until recently, you needed sophisticated data scientists and machine learning engineers to build even the simplest of ML models, with deep knowledge of core ML algorithms in order to choose the right one for each problem. However, that is quickly changing. Powerful but simple ML products such as Cloud AutoML from Google Cloud make it possible for developers with limited knowledge of machine learning to train high-quality models specific to their business needs. Similarly, BigQuery ML enables data analysts to build and operationalize machine learning models in minutes in BigQuery using simple SQL queries. With these two products, business analysts, data analysts, and data engineers can be trained to build powerful machine learning models with very little ML expertise.
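To give a flavor of how simple that can be, here is a hedged sketch (the dataset, table, and column names are hypothetical) of training a BigQuery ML model from Python with a single SQL statement.

```python
# Hedged sketch: training a BigQuery ML model with one SQL statement.
# The dataset, table, and column names below are illustrative.
from google.cloud import bigquery

client = bigquery.Client()
client.query("""
    CREATE OR REPLACE MODEL `my_dataset.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM `my_dataset.customers`
""").result()  # blocks until the training job finishes
```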
Make AI a team sport. Machine learning teams should not exist in silos; they must be connected to analytics and data engineering teams. This facilitates operationalization of models. Close collaboration between ML engineers and business analysts helps the ML team tie their models to important business priorities through the right KPIs, and it allows business analysts to run experiments to demonstrate the business value of each ML model. Close collaboration between ML and data engineering teams also helps speed up data preparation and model deployment in production. The results of ML models need to be displayed in applications or in analytics and operational dashboards, and data engineers are critical in developing the data pipelines needed to operationalize models and integrate them into business workflows for the right end users.
It is very tempting to think that you have to hire a large team of ML engineers to be successful. In our experience, this is not always necessary or scalable. A more pragmatic approach to scale is to use the right combination of business analysts working closely with ML engineers and data engineers. A good rule of thumb is to have six business analysts and three data engineers for each ML engineer. More details on the recommended team structure are available in our Coursera course, Machine Learning for Business Professionals.
Conclusion
As many organizations start to explore AI and machine learning, they are confronted with the question of how to realize the business potential of these powerful technologies. Based on our experience working with customers across industries, we recommend the three steps in this blog post to realize business value from AI.
To learn more about AI and machine learning on Google Cloud, visit our Cloud AI page.
Source: Google Cloud Platform

Leroy Merlin: Transforming the Russian home improvement market with APIs

Editor's note: Today we hear from Sergei Lega, enterprise architect at Leroy Merlin Russia, a retail chain specializing in the sale of products for construction, decoration, and home furnishing. Read on to learn how Leroy Merlin is using APIs and API management to simplify how partners integrate with its services.
Leroy Merlin is expanding our network of retail stores rapidly in Russia, and as part of this expansion we are undertaking a digital transformation. Not only has this process tested our technological capabilities, it also presents us with the challenge of transforming our mindset. To offer expanded services to our customers, we rely on a rich set of APIs and microservices created and managed in Google Cloud's Apigee API Platform.
Leroy Merlin Russia sells products for construction, home decoration, and furnishing. As a DIY-focused retailer, we see a great opportunity to differentiate ourselves in the marketplace by expanding the types of services we can offer our customers beyond the sale of our products. We currently have more than 70 partners around Russia focused on three use cases: window installation, kitchen installation, and professional building materials. These partners offer customers and building professionals access to services that enhance their Leroy Merlin customer journey.
But we wanted to make it even simpler and more seamless for customers to access these services, and that required a clearly defined API strategy. We now offer a set of endpoints, built from microservices and exposed as APIs, that allow us to securely share pricing, inventory, and product information, along with payment services. These services let us connect our platform and services with all the third-party merchants in our ecosystem; they can easily get onboarded, and then upload and synchronize their product databases to the Leroy Merlin Marketplace quickly, in a scalable environment.
Now, when a customer purchases windows online or from one of our stores, they can continue their journey by acquiring the necessary measurement and installation services at the same time, even though these services might be provided by one of our partners. The same goes for kitchen installation, which typically requires a complex set of services like plumbing and electrical work that the customer would normally need to source independently.
When Apigee announced its Istio integration in 2018, we knew that we could simplify and manage our exposure of microservices from an Istio mesh by adding API management capabilities via Istio's native configuration mechanism. At the moment, we're using Istio in a few Kubernetes instances, which makes sharing these services inside our development team, and our ability to consume them, much simpler.
Apigee's API management policies and reporting can be applied to any service, so management policies such as API key validation, quota enforcement, and JSON web token validation can be easily controlled from the Apigee UI. In the future, we plan to extend Istio company-wide as a cornerstone of our microservices management, which will provide us with very granular control over traffic flows and access policies. It will also give us 360-degree monitoring and security capabilities, along with service discovery in a multi-cluster environment.
Many of our roughly 100 APIs are exposed to third-party developers, but some are exposed internally as well; we are working to make Apigee the focal point for integrations and new service development inside the company.
As we continue to develop microservices and attract new developers to our marketplace, we are keeping a mindset of APIs as products, which reflects our customer journey-focused strategy. By the end of 2019, we expect to finish our developer platform and achieve full usability, and at that point we will really begin to scale our ecosystem and start to see concrete benefits for Leroy Merlin Russia, our customers, and our partners.
Our API journey is all about maximizing connectivity and agility with an API-first architecture in seamless partnership with our partners nationwide. So far, Apigee has been a great partner on this journey.
Learn more about API management on Google Cloud by visiting our Apigee page.
Source: Google Cloud Platform

Building hybrid blockchain/cloud applications with Ethereum and Google Cloud

Adoption of blockchain protocols and technologies can be accelerated by integrating with modern internet resources and public cloud services. In this blog post, we describe a few applications of making internet-hosted data available inside an immutable public blockchain: making BigQuery data available on-chain using a Chainlink oracle smart contract. Possible applications are innumerable, but we've focused this post on a few that we think are of high and immediate utility: prediction marketplaces, futures contracts, and transaction privacy.
Hybrid cloud-blockchain applications
Blockchains focus on mathematical effort to create a shared consensus. Ideas quickly sprang up to extend this model to allow party-to-party agreements, i.e. contracts. This concept of smart contracts was first described in a 1997 article by computer scientist Nick Szabo. An early example of inscribing agreements into blocks was popularized by efforts such as Colored Coins on the Bitcoin blockchain.
Smart contracts are embedded into the source of truth of the blockchain, and are therefore effectively immutable once they're a few blocks deep. This provides a mechanism that allows participants to commit crypto-economic resources to an agreement with a counterparty, and to trust that contract terms will be enforced automatically, without requiring third-party execution or arbitration, if desired.
But none of this addresses a fundamental issue: where to get the variables with which the contract is evaluated. If the data are not derived from recently added on-chain data, a trusted source of external data is required. Such a source is called an oracle.
In previous work, we made public blockchain data freely available in BigQuery through the Google Cloud Public Datasets Program for eight different cryptocurrencies. In this article, we'll refer to that work as Google's crypto public datasets. You can find more details and samples of these datasets in the GCP Marketplace. This dataset resource has led a number of GCP customers to develop business processes based on automated analysis of the indexed blockchain data, such as SaaS profit sharing, mitigating service abuse by characterizing network participants, and using static analysis techniques to detect software vulnerabilities and malware. However, these applications share a common attribute: they all use the crypto public datasets as an input to an off-chain business process.
In contrast, a business process implemented as a smart contract is performed on-chain, and that is of limited utility without access to off-chain inputs. To close the loop and allow bidirectional interoperation, we need to make not only blockchain data programmatically available to cloud services, but also cloud services programmatically available on-chain to smart contracts.
Below, we'll demonstrate how a specific smart contract platform (Ethereum) can interoperate with our enterprise cloud data warehouse (BigQuery) via oracle middleware (Chainlink). This assembly of components allows a smart contract to take action based on data retrieved from an on-chain query to the internet-hosted data warehouse. Our examples generalize to a pattern of hybrid cloud-blockchain applications in which smart contracts can efficiently delegate to cloud resources to perform complex operations. We will explore other examples of this pattern in future blog posts.
How we built it
At a high level, Ethereum Dapps (i.e. smart contract applications) request data from Chainlink, which in turn retrieves data from a web service built with Google App Engine and BigQuery.
To retrieve data from BigQuery, a Dapp invokes the Chainlink oracle contract and includes payment for the parameterized request to be serviced (e.g. the gas price at a specified point in time). One or more Chainlink nodes are listening for these calls, and when one observes a call, it executes the requested job. External adapters are service-oriented modules that extend the capability of the Chainlink node to authenticated APIs, payment gateways, and external blockchains. In this case, the Chainlink node interacts with a purpose-built App Engine web service.
On GCP, we implemented a web service using the App Engine Standard Environment. We chose App Engine for its low cost, high scalability, and serverless deployment model. App Engine retrieves data from BigQuery, which hosts the public cryptocurrency datasets. The data we've made available come from canned queries; that is, we aren't allowing arbitrary data to be requested from BigQuery, only the results of parameterized queries. Specifically, an application can request the average gas price for either (A) a particular Ethereum block number, or (B) a particular calendar date.
After a successful response from the web service, the Chainlink node invokes the Chainlink oracle contract with the returned data, which in turn invokes the Dapp contract and thus triggers execution of downstream Dapp-specific business logic. This is depicted in the figure below. For details on integrating your Dapp, please see our documentation for requesting data from BigQuery via Chainlink. Illustrative queries to BigQuery can be seen for gas price by date and by block number.
How to use the BigQuery Chainlink oracle
In this section we'll describe how useful applications can be built using Google Cloud and Chainlink.
Use case 1: Prediction marketplaces
Participants in prediction marketplaces allocate capital to speculate on future events. One area of intense interest is which smart contract platform will predominate because, being network ecosystems, their value will follow a power law (i.e. winner-take-all) distribution. There are many differing opinions about which platform will succeed, as well as how success can be quantified.
By using the crypto public datasets, it's possible for even complex predictions, like the recent $500,000 bet about Ethereum's future state, to be settled successfully on-chain. We've also documented how the variety, volume, recency, and frequency of Dapp utilization can be measured by retrieving 1-, 7-, and 30-day activity for a specific Dapp. These metrics are known as daily-, weekly-, and monthly-active users, and are frequently used by web and mobile app analytics professionals to assess website and app success.
Use case 2: Hedging against blockchain platform risk
The decentralized finance movement is rapidly gaining adoption due to its successful reinvention of the existing financial system in blockchain environments which, on a technical basis, are more trustworthy and transparent than current systems.
Financial contracts like futures and options were originally developed to enable enterprises to reduce or hedge their risk related to resources critical to their operation. Similarly, data about on-chain activity, such as average gas prices, can be used to create simple financial instruments that provide payouts to their holders in cases where gas prices rise too high.
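To give a concrete sense of the data behind such an instrument, here is a hedged, standalone sketch of an average-gas-price query against the public Ethereum dataset; the oracle's actual canned queries are linked above, so treat this only as an illustration of their shape.

```python
# Illustrative query: average gas price (in wei) on a given day, computed from
# the public Ethereum dataset in BigQuery. Not the oracle's exact canned query.
import datetime
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT AVG(gas_price) AS avg_gas_price_wei
    FROM `bigquery-public-data.crypto_ethereum.transactions`
    WHERE DATE(block_timestamp) = @day
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("day", "DATE", datetime.date(2019, 6, 1))
    ]
)
row = list(client.query(query, job_config=job_config))[0]
print(row.avg_gas_price_wei)
```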
Other qualities of a blockchain network, e.g. block times and/or miner centralization, create risks that Dapp developers want to protect themselves against. By bringing high-quality data from the crypto public datasets to financial smart contracts, Dapp developers' risk exposure can be reduced. The net result is more innovation and accelerated blockchain adoption.
We've documented how an Ethereum smart contract can interact with the BigQuery oracle to retrieve gas price data at a particular point in time. We've also implemented a stub of a smart contract option showing how the oracle can be used to implement a collateralized contract on future gas prices, a critical input for a Dapp to function.
Use case 3: Enabling commit/reveals across Ethereum using submarine sends
One commonly mentioned limitation of Ethereum itself is a lack of transaction privacy, which creates the ability for adversaries to take advantage of on-chain data leakage to exploit users of commonly used smart contracts. This can take the form of front-running transactions involving decentralized exchange (DEx) addresses. As described in To Sink Frontrunners, Send in the Submarines, the problem of front-running plagues all current DExs and slows down the decentralized finance movement's progress, as exchanges are a key component of many DeFi products and applications.
By using the submarine sends approach, smart contract users can increase the privacy of their transactions, successfully avoiding adversaries that want to front-run them and making DExs more immediately useful. Though this approach is uniquely useful in stopping malicious behavior like front-running, it also has its own limitations if done without an oracle.
Implementing submarine sends without an oracle produces blockchain bloat. Specifically, the Ethereum virtual machine allows a contract to see at most 256 blocks upstream in the chain, or approximately one hour. This maximum scope limits the practical usefulness of submarine sends because it creates unnecessary denormalization when rebroadcasting of data is required. In contrast, by implementing submarine sends with an oracle, the bloat is eliminated because the operating scope is increased to include all historical chain data.
Conclusion
We've demonstrated how to use Chainlink services to provide data from the BigQuery crypto public datasets on-chain. This technique can be used to reduce inefficiencies (the submarine sends use case) and in some cases add entirely new capabilities (the hedging use case) to Ethereum smart contracts, enabling new on-chain business models to emerge (the prediction markets use case).
The essence of our approach is to trade a small amount of latency and transaction overhead for a potentially large amount of economic utility. As a concrete example, ordinary submarine sends require on-chain storage that scales O(n) with blocks added to the blockchain, but this can be reduced to O(1) if the calling contract waits an extra two blocks to call the BigQuery oracle.
We anticipate that this interoperability technique will lead developers to create hybrid applications that take the best of what smart contract platforms and cloud platforms have to offer. We're particularly interested in bringing Google Cloud Platform's ML services (e.g. AutoML and Inference APIs) on-chain.
By allowing reference to on-chain data that is out of scope, we improve the operational efficiency of the smart contract platform.
In the case of submarine sends, storage consumption that scales O(n) with block height is reduced to O(1), at the trade-off cost of additional transactional latency to interact with an oracle contract.
Source: Google Cloud Platform

Sharing enthusiasm from the cloud community for our Looker acquisition

In the week since we announced our intent to acquire Looker, a unified platform for business intelligence, data applications and embedded analytics, we've heard from many customers, partners, and industry analysts that are enthusiastic about our decision to provide customers with a comprehensive analytics solution. By combining Looker's robust business intelligence and analytics platform with BigQuery, our enterprise data warehouse, customers can solve more business challenges, faster, all while remaining in complete control of their data.
Here are a few of the many responses we've heard about the addition of the Looker analytics platform to our portfolio:
"As we serve the evolving needs of our customers, it's critical for us to empower our teams with information," said Barbara Sanders, VP and Chief Architect at The Home Depot. "BigQuery and Looker quickly provides our engineering teams with operational data and visualizations to help identify application or infrastructure issues that could impact the customer experience."
"Our data platform provides a fully managed service that makes it easy and cost-effective to create, manage and scale advanced analytics capabilities for advertisers and marketers," said Iain Niven-Bowling, EVP, 2Sixty and Essence/WPP. "The combination of BigQuery and Looker provide the underlying technology and Google's acquisition only strengthens this and enables us to continue to build highly valued products on top of this robust end to end solution."
"The data analytics market is rapidly evolving and changing, and 451 Research has identified that businesses who successfully turn data into insight via analytics have a competitive advantage," said Matt Aslett, Research Vice President, 451 Research. "The acquisition of Looker is a key move for Google Cloud that will increase the value it can provide to customers across its six key industries."
"The acquisition of Looker makes sense for Google Cloud and for their customers," said Anil Chakravarthy, CEO at Informatica. "Looker's rich analytics platform complements Google Cloud's high-scale infrastructure and digital transformation capabilities. As the leader in enterprise cloud data management, we talk to a lot of customers who want to get more value from their data, and we believe every business interested in cloud analytics should be excited about this acquisition."
"Looker provides Google Cloud the ability to provide advanced analytics, visualization, and insight generation to all parts of an organization," said Tom Galizia, Principal, Deloitte Consulting LLP. "It shows continued commitment and alignment to Google Cloud's vision of democratizing and supercharging their customer's data and information. Google organized the world's information and now they want to do the same for the enterprise."
"Google Cloud is building an end-to-end platform for enterprise transformation and its acquisition of Looker will help bring more complete business intelligence, analytics and visualization capabilities to its customers," said Sanjeev Vohra, group technology officer and data business group lead at Accenture. "Looker's support for multiple public clouds and databases aligns well with Google's multi-cloud approach and we look forward to working with our enterprise clients to implement these capabilities at scale."
I'm personally very excited about all the ways bringing Google Cloud and Looker together can help our customers.
We share a common philosophy around solving business problems for customers across all industries while also supporting our customers where they are, be it on Google Cloud, in other public clouds, or on premises. I look forward to sharing more once the deal closes.
Source: Google Cloud Platform

Google Cloud networking in-depth: How Andromeda 2.2 enables high-throughput VMs

Here at Google Cloud, we've always aimed to provide great network bandwidth for Compute Engine VMs, thanks in large part to our custom Jupiter network fabric and Andromeda virtual network stack. During Google Cloud Next '19, we improved that bandwidth even further by doubling the maximum network egress data rate to 32 Gbps for common VM types. We also announced VMs with up to 100 Gbps bandwidth on V100 and T4 GPU accelerator platforms, all without raising prices or requiring you to use premium VMs.
Specifically, for any Skylake or newer VM with at least 16 vCPUs, we raised the egress bandwidth cap to 32 Gbps for same-zone VM-to-VM traffic; this capability is now generally available. This includes n1-ultramem VMs, which provide more compute resources and memory than any other Compute Engine VM instance type. There is no additional configuration needed to get that 32 Gbps throughput.
Meanwhile, 100 Gbps accelerator VMs are in alpha, and will soon be in beta. Any VM with eight V100 or four T4 GPUs attached will have its bandwidth cap raised to 100 Gbps.
These high-throughput VMs are ideal for running compute-intensive workloads that also need a lot of networking bandwidth. Some key applications and workloads that can leverage these high-throughput VMs are:
High-performance computing applications, batch processing, scientific modeling
High-performance web servers
Virtual network appliances (firewalls, load balancers)
Highly scalable multiplayer gaming
Video encoding services
Distributed analytics
Machine learning and deep learning
In addition, services built on top of Compute Engine like Cloud SQL, Cloud Filestore, and some partner solutions can already leverage 32 Gbps throughput.
One use case that is particularly network- and compute-intensive is distributed machine learning (ML). To train large datasets or models, ML workloads use a distributed ML framework, e.g., TensorFlow. The dataset is divided and trained by separate workers, which exchange model parameters with each other. These ML jobs consume substantial network bandwidth due to large model sizes and frequent data exchanges among workers. Likewise, the compute instances that run the worker nodes create high throughput requirements for VMs and the fabric serving the VMs. One customer, a large chip manufacturer, leverages 100 Gbps GPU-based VMs to run these massively parallel ML jobs, while another customer uses our 100 Gbps GPU machines to test a massively parallel seismic analysis application.
Making it all possible: Jupiter and Andromeda
Our highly scalable Jupiter network fabric and high-performance, flexible Andromeda virtual network stack are the same technologies that power Google's internal infrastructure and services.
Jupiter provides Google with tremendous bandwidth and scale. For example, Jupiter fabrics can deliver more than 1 Petabit/sec of total bisection bandwidth. To put this in perspective, this is enough capacity for 100,000 servers to exchange information at a rate of 10 Gbps each, or enough to read the entire scanned contents of the Library of Congress in less than 1/10th of a second.
Andromeda, meanwhile, is a Software Defined Networking (SDN) substrate for our network virtualization platform, acting as the orchestration point for provisioning, configuring, and managing virtual networks and in-network packet processing. Andromeda lets us share Jupiter networks for many different uses, including Compute Engine and bandwidth-intensive products like BigQuery and Cloud Bigtable. Since we last blogged about Andromeda, we launched Andromeda 2.2.
Among other infrastructure improvements, Andromeda 2.2 features increased performance and improved performance isolation through the use of hardware offloads, enabling you to achieve the network performance you want, even in a multi-tenant environment.
Increasing performance with offload engines
In particular, Andromeda now takes full advantage of the Intel QuickData DMA Engines to offload payload copies of larger packets. Driving the DMA hardware directly from our OS-bypassed Andromeda SDN enables the SDN to spend more time processing packets rather than moving data around. We employ the processor's IOMMU to provide security and safety isolation for DMA Engine copies.
In Google Cloud Platform (GCP), we encrypt all network traffic in transit that leaves a physical boundary not controlled by Google or on behalf of Google. Andromeda 2.2 now utilizes special-purpose network hardware in the Network Interface Card (NIC) to offload that encryption, freeing the host machine's CPUs to run guest vCPUs more efficiently.
Furthermore, Andromeda's unique architecture allows us to offload other virtual network processing to hardware opportunistically, improving performance and efficiency under the hood without requiring the use of SR-IOV or other specifications that tie a VM to a physical machine for its lifetime. This architecture also enables us to perform a "hitless upgrade" of the Andromeda SDN as needed to improve performance, add features, or fix bugs.
Combined, these capabilities have allowed us to seamlessly upgrade our network infrastructure across five generations of virtual networking, increasing VM-to-VM bandwidth by nearly 18X (and more than 50X for certain accelerator VMs) and reducing latency by 8X, all without introducing downtime for our customers.
Performance isolation
All that performance is meaningless if your VM is scheduled on a host with other VMs that are overloading or abusing the network and preventing your VM from achieving the performance you expect. Within Andromeda 2.2, we've made several improvements to provide isolation, ensuring that each VM receives its expected share of bandwidth. Then, for the rare cases when too many VMs are trying to push massive amounts of network traffic simultaneously, we reengineered the algorithm to optimize for fairness.
For VM egress traffic, we schedule the act of looking for work on each VM's transmit queues such that each VM gets its fair share of bandwidth. If we need to throttle a VM because it has reached its network throughput limits, we provide momentary back-pressure to the VM, which causes a well-behaved guest TCP stack to reduce its offered load slightly without causing packet loss.
For VM ingress traffic, we use offloads in the NIC to steer packets into per-VM NIC receive queues. Then, similarly to egress, we look for work on each of those queues in proportion to each VM's fair share of network bandwidth. In the rare event that a VM is receiving an excessive amount of traffic, its per-VM queue fills up and eventually starts dropping packets. Those drops will again cause a well-behaved TCP connection, originating perhaps from another VM or the internet, to back off slightly, preserving performance for that connection. A VM with a badly behaved connection might not back off, possibly due to bugs in a customer's workload, or even malicious intent.
Either way, per-VM receive queues mean we don't need to drop packets for other VMs on the host, protecting those VMs from the performance pathologies of a bad actor.
You can never have too good a network
At Google we're constantly working to improve the performance and reliability of our network infrastructure. Stay tuned for new advances from Google Cloud, including low-latency products focused on HPC use cases, and even higher bandwidth VMs. We'd love to hear your feedback and what else you'd like to see in networking. You can reach us at gcp-networking@google.com.
Source: Google Cloud Platform

Jupyter Notebook Manifesto: Best practices that can improve the life of any developer using Jupyter notebooks

Many data science teams, both inside and outside of Google, find that it's easiest to build accurate models when teammates can collaborate and suggest new hyperparameters, layers, and other optimizations. And notebooks are quickly becoming the common platform for the data science community, whether in the form of AI Platform Notebooks, Kaggle Kernels, Colab, or the notebook that started it all, Jupyter.
A Jupyter Notebook is an open-source web application that helps you create and share documents that contain live code, equations, visualizations, and narrative text. Because Jupyter Notebooks are a relatively recently developed tool, they don't (yet) follow or encourage consensus-based software development best practices.
Data scientists, typically collaborating on a small project that involves experimentation, often feel they don't need to adhere to any engineering best practices. For example, your team may have the odd Python or shell script that has neither test coverage nor any CI/CD integration. However, if you're using Jupyter Notebooks in a larger project that involves many engineers, you may soon find it challenging to scale your environment or deploy to production.
To set up a more robust environment, we established a manifesto that incorporates best practices that can help simplify and improve the life of any developer who uses Jupyter tools.
It's often possible to share best practices across multiple industries, since the fundamentals remain the same. Logically, data scientists, ML researchers, and developers using Jupyter Notebooks should carry over the best practices already established by the older fields of computer science and scientific research. Here is a list of best practices adopted by those communities, with a focus on those that still apply today.
Our Jupyter Notebooks development manifesto
0. There should be an easy way to use Jupyter Notebooks in your organization, where you can "just write code" within seconds.
1. Follow established software development best practices: OOP, style guides, documentation.
2. You should institute version control for your Notebooks.
3. Reproducible Notebooks.
4. Continuous Integration (CI).
5. Parameterized Notebooks.
6. Continuous Deployment (CD).
7. Log all experiments automatically.
By following the guidelines in this manifesto, we want to help you achieve this outcome.
Note: Security is a critical part of software development practices. In a future blog post we will cover best practices for secure software development with Jupyter Notebooks; this topic is not covered here, but it is something critical you must consider.
Principles
Easy access to Jupyter Notebooks
Creating and using a new Jupyter Notebook instance should be very easy. On Google Cloud Platform (GCP), we just launched a new service called AI Platform Notebooks. AI Platform Notebooks is a managed service that offers an integrated JupyterLab environment, in which you can create instances running JupyterLab that come pre-installed with the latest data science and machine learning frameworks in a single click.
Follow established software development best practices
This is essential. Jupyter Notebook is just a new development environment for writing code, so all the best practices of software development should still apply:
Version control and code review systems (e.g. git, mercurial).
Separate environments: split production and development artifacts.
A comprehensive test suite (e.g. unit tests, doctests) for your Jupyter Notebooks.
Continuous integration (CI) for faster development: automate the compilation and testing of Jupyter notebooks every time a team member commits changes to version control.
Just as an Android developer would need to follow the above best practices to build a scalable and successful mobile app, a Jupyter Notebook focused on sustainable data science should follow them, too.
Using a version control system with your Jupyter Notebooks
Version control systems record changes to your code over time, so that you can revisit specific versions later. They also let you develop separate branches in parallel, perform code reviews, and use the revision history to know who is the expert in certain code areas.
In order to unblock effective use of a version control system like git, there should be a tool well integrated into the Jupyter UI that allows every data scientist on your team to effectively resolve conflicts for the notebook, view the history for each cell, and commit and push particular parts of the notebook to your notebook's repository right from the cell.
Don't worry, though: if you perform a diff operation in git and suddenly see that multiple lines have changed instead of one, this is the intended behavior, as of today. With Jupyter notebooks, there is a lot of metadata that can change with a simple one-line edit, including kernel spec, execution info, and visualization parameters. To apply the principles and corresponding workflows of traditional version control to Jupyter notebooks, you need the help of two additional tools:
nbdime: a tool for diffing and merging Jupyter Notebooks
jupyterlab-git: a JupyterLab extension for version control using git
In this demo, we clone a GitHub repository and then modify some minor parts of the code. If you execute a diff command, you would normally expect git to show only the lines that changed, but as we explained above, this is not true for Jupyter notebooks. nbdime lets you perform a diff from the Jupyter notebook and also from the CLI, without the distraction of extraneous JSON output.
Reproducible notebooks
You and your team should write notebooks in such a way that anyone can rerun them on the same inputs and produce the same outputs. Your notebook should be executable from top to bottom and should contain the information required to set up the correct, consistent environment.
How do you do it? If you are using AI Platform Notebooks, for example on the TensorFlow M22 image, this platform information should be embedded in your notebook's metadata for future use. Let's say you create a notebook and install TensorFlow's nightly version. If you execute the same notebook on a different Compute Engine instance, you need to make sure that this dependency is already installed. A notebook should have a notion of dependencies, appropriately tracked, whether in the environment or in the notebook metadata.
In summary, a notebook is reproducible if it meets the following requirements:
The Compute Engine image and underlying hardware used for creating the notebook should be embedded in the notebook itself.
All dependencies should be installed by the notebook itself.
A notebook should be executable from top to bottom without any errors.
In this demo we clone a GitHub repository that contains a few notebooks, and then activate the new Nova plugin, which allows you to execute notebooks directly from your Jupyter UI.
Nova and its corresponding compute workload run on a separate Compute Engine instance, using Nteract papermill. AI Platform Notebooks supports this plugin by default; to enable it, run the enable_notebook_submission.sh script.
Nova plugin
Continuous integration
Continuous integration is a software development practice that requires developers to integrate code into a shared repository. Each check-in is verified by an automated build system, allowing teams to detect problems at early stages. Each change to a Jupyter notebook should be validated by a continuous integration system before being checked in; this can be done using different setups (a non-master remote branch, remote execution in a local branch, etc.).
In this demo, we modify a notebook so that it contains invalid Python code, and then we commit the results to git. This particular git repository is connected to Cloud Build. The notebook executes and the commit step fails as the engine finds an invalid cell at runtime. Cloud Build creates a new notebook to help you troubleshoot your mistake. Once you correct the code, you'll find that your notebook runs successfully, and Cloud Build can then integrate your code.
Parameterized Notebooks
Reusability of code is another software development best practice. You can think of a production-grade notebook as a function or a job specification: a notebook takes a series of inputs, processes them, and generates some outputs, consistently. If you're a data scientist, you might start running grid search to find your model's optimal hyperparameters for training, stepping through different parameters such as learning rate, num_steps, or batch_size.
During notebook execution, you can pass different parameters to your models, and once results are generated, pick the best options using the same notebook. For these execution steps, consider using Papermill and its ability to configure different parameters; these parameters will be used by the notebook during execution. This means you can override the default source of data for training, or submit the same notebook with different inputs (for example, a different learning rate, number of epochs, etc.).
In this demo, we execute a notebook while passing in different extra parameters. Here we're using information about bike rentals in San Francisco, with the bike rental data stored in BigQuery. The notebook queries the data and generates a top-ten list and a station map of the most popular bike rental stations, using start and end date as parameters. By tagging the relevant cells with a "parameters" tag so Papermill can use these options, you can reuse your notebook without making any updates to it, yet still generate a different dashboard.
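As a rough illustration of that execution step (the notebook path and parameter names here are hypothetical, not taken from the demo), Papermill can be driven from Python like this:

```python
# Hedged sketch: executing a parameterized notebook with Papermill.
import papermill as pm

pm.execute_notebook(
    "bike_rentals.ipynb",                 # input notebook with cells tagged "parameters"
    "bike_rentals_2019-06_output.ipynb",  # fully executed copy, one per run
    parameters={"start_date": "2019-06-01", "end_date": "2019-06-30"},
)
```

Each run produces its own executed output notebook, which also serves as a record of exactly which parameters generated which dashboard.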
Continuous deployment
Each version of a Jupyter Notebook that has passed all the tests should be used to automatically generate a new artifact and deploy it to staging and production environments. In this demo, we show you how to perform continuous deployment on GCP, incorporating Cloud Functions, Cloud Pub/Sub, and Cloud Scheduler.
Now that you've established a CI system that generates a tested, reproducible, and parameterized notebook, let's automate the generation of artifacts for a continuous deployment system. Building on the previous CI system, there is an additional CI step that uploads a payload to Cloud Functions when tests are successful. When triggered, this payload sends the same artifact build request, with parameters, to Cloud Build, spinning up the instance and storing the results.
To add the automation, we orchestrate using Cloud Pub/Sub (message passing) and Cloud Scheduler (cron). The first time the Cloud Function is deployed, it creates a new Pub/Sub topic and subscribes to it; afterwards, any published message will start the Cloud Function. This notification is published using Cloud Scheduler, which sends messages based on time. Cloud Scheduler can use different interfaces, for example new data arriving in Cloud Storage or a manual job request.
Log all experiments
Every time you try to train a model, metadata about the training session should be automatically logged. You'll want to keep track of things like the code you ran, hyperparameters, data sources, results, and training time. This way, you remember past results and won't find yourself wondering if you already tried running that experiment.
Conclusion
By following the guidelines defined above, you can make your Jupyter notebook deployments more efficient. To learn more, read our AI Platform Notebooks overview.
Acknowledgements: Gonzalo Gasca Meza, Developer Programs Engineer, and Karthik Ramachandran, Product Manager, contributed to this post.
Source: Google Cloud Platform

Are you up for the challenge? Get Google Cloud Certified in 3 months

There's no doubt that cloud skills are in demand. Google Cloud skills are especially in high demand, with a 66.74% increase over the past year in job listings, which is why we rolled out four new certifications at the beginning of this year. So today we're excited to announce that we are reaffirming our commitment to prepare millions of workers to thrive in a cloud-first world by launching the Google Cloud certification challenge, available in 25 countries (details at the bottom).
By signing up for the certification challenge, you'll get access to a series of free learning resources on Qwiklabs and Coursera to sharpen your cloud architecture knowledge. You'll also receive additional tips and resources to help prepare you for the Google Cloud Certified Associate Cloud Engineer or Professional Cloud Architect exam. If you successfully certify within 12 weeks of starting the certification challenge, we'll send you a $100 Google Store voucher to redeem toward the product of your choice.
Why get Google Cloud certified?
Cloud certifications are a great way for you to demonstrate your skills to the larger IT market. Not only does certification validate your cloud skills and experience to recruiters, it demonstrates your value to your current employer. Getting certified can open up opportunities to progress within your company and could help in the next review of your compensation package. For example, the Google Cloud Professional Cloud Architect certification debuted at number one on the top-paying certifications list in the 2019 Global Knowledge survey.
Hear from the Google Cloud certified community
Here's what a few community members had to say about the certification.
Sign up for the certification challenge today
Visit our certification challenge site to sign up, and start thinking about how you'll spend that $100! We'll be cheering you on.
Qualifying countries for the certification challenge are: U.S., Canada, Puerto Rico, Australia, Hong Kong, Japan, New Zealand, Singapore, Taiwan, Austria, Belgium, Denmark, Finland, France, Germany, Ireland, Italy, South Korea, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, and the U.K.
Source: Google Cloud Platform

How to efficiently process both real-time and aggregate data with Dataflow

Pub/Sub, Dataflow, and BigQuery are a common stack that enables analytic workloads over event data streams. However, when choosing the right implementation, many businesses need to consider both real-time constraints and historical analysis over the whole dataset, which usually results in trade-offs. It doesn't have to be this way.
Imagine that we face a scenario where data can be conveniently divided into two categories: (1) actionable events that need to be delivered with stringent latency requirements, and (2) not-so-urgent data that can tolerate some delay. Should we opt for streaming inserts or go with load jobs? Is there a better solution? Spoiler alert: with a clever and simple pipeline design that combines these two worlds, we can meet all our requirements and realize significant cost savings.
Where can this be applied?
Before we continue, let's examine some of the business use cases that can benefit from our approach.
Fraud detection: Potential fraudulent activity can be flagged immediately, while all other transactions are logged to be used later to derive insights or train ML models.
Monitoring systems: Anomalies can be detected and instantly alerted on, while allowing for delay in data under normal conditions. Applications can range from earthquake detection to SRE dashboards.
Customer service ticketing systems: Critical issues filed by customers can be prioritized, while non-critical issues (like feature requests) can be delayed without impacting the customer experience.
Online gaming health checks: By using a representative fraction of the incoming data for quick analysis, we can check that everything is in order while preserving the rest of the data for future deeper analysis or ML projects.
In three of the scenarios above, incoming data is classified as either urgent (when there is a need for low-latency data) or non-urgent. But this approach can also be applied in other ways. For example, let's say you need early speculative results (as in the online gaming health check use case described above). By sampling all incoming events, we can get an early analysis while preserving the complete data set for deeper future analysis. In other words, this approach can easily be adapted to stream a representative sample of the data while the rest is completed afterwards with load jobs.
Architecture concepts
Within our Pub/Sub, Dataflow, and BigQuery stack, Dataflow provides simple ways to connect to Pub/Sub and BigQuery via the built-in IO connectors in the Apache Beam SDK for Java.
In our pipeline, we will read the real-time events generated by a Pub/Sub topic with the PubsubIO connector. Once data has been processed, we will insert it into the BigQuery destination tables. The BigQueryIO connector provides two ways to insert our data: load jobs or streaming inserts.
With load jobs, elements are buffered in Cloud Storage and each batch is written to BigQuery in a single atomic update. With streaming inserts, on the other hand, each record is immediately appended to the BigQuery table and is available to be queried within seconds.
Choosing the right implementation
We could favor a play-it-safe design in which we stream all data directly into BigQuery. Streaming insert quotas are generous and it's easy to stay within them, but we would be paying for each inserted row, regardless of its urgency. In some of the previous examples, the fraction of high-priority events can be very low.
Also, operations such as DML updates are disallowed (on a partition level) when a streaming buffer is attached to the table.
Instead, we could leverage load jobs, which are free. However, to satisfy the real-time view of the data, we would need to write data very frequently, which could exhaust the daily load jobs per table quota and hinder query performance by fragmenting the table into an excessive number of files.
An interesting solution is to combine both: use streaming inserts to send urgent events right away, and use load jobs for everything else. Here we develop and (briefly) explain this design choice.
We read JSON-formatted messages from Cloud Pub/Sub with an attribute that indicates the event urgency. Events with an urgency factor equal to or above the threshold will be stream-ingested into a BigQuery table using a side output. Depending on the urgency category of the event, it will be emitted to a different table. In the event that we need to query data from both tables, a simple UNION statement will suffice.
We add a timestamp field to all elements when the row is created, so we can retrieve the actual processing time even if two events belong to the same batch and were inserted simultaneously.
We redirect each output to the corresponding table according to its tag. The changes are straightforward. Note that if we don't specify the write method, it defaults to streaming inserts. For load jobs, we add Method.FILE_LOADS, and the triggering frequency can be adjusted at will to better suit our use case.
In the alternative case where there is no explicit priority field, we can modify the example to sample the data and send some immediate representative results while the rest is completed afterwards. By using a random function instead of an urgency value, we can get a desired percentage of our data for real-time analysis. There may be cases where another sampling strategy is preferred, and for those you would need to implement your own logic.
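The post's pipeline uses the Beam Java SDK; purely to illustrate the same split-by-urgency pattern, here is a rough sketch with the Beam Python SDK. The topic, table names, and urgency threshold are placeholders, and details such as table schemas and file sharding are omitted.

```python
# Hedged sketch of the split-by-urgency pattern (the post itself uses Java).
import json

import apache_beam as beam
from apache_beam import pvalue
from apache_beam.options.pipeline_options import PipelineOptions

URGENCY_THRESHOLD = 8  # assumed scale; events at or above it are "urgent"


class TagByUrgency(beam.DoFn):
    """Parses each Pub/Sub message and tags it as urgent or regular."""

    def process(self, message):
        row = json.loads(message.data.decode("utf-8"))
        if int(message.attributes.get("urgency", "0")) >= URGENCY_THRESHOLD:
            yield pvalue.TaggedOutput("urgent", row)
        else:
            yield row  # main output: everything else


options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    tagged = (
        p
        | "ReadPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/events", with_attributes=True)
        | "TagByUrgency" >> beam.ParDo(TagByUrgency()).with_outputs(
            "urgent", main="regular"))

    # Urgent events: streaming inserts, queryable within seconds (billed per row).
    tagged.urgent | "StreamUrgent" >> beam.io.WriteToBigQuery(
        "my-project:events.urgent_events",
        method=beam.io.WriteToBigQuery.Method.STREAMING_INSERTS,
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)

    # Everything else: free batch load jobs, flushed roughly every 10 minutes.
    tagged.regular | "LoadRegular" >> beam.io.WriteToBigQuery(
        "my-project:events.regular_events",
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        triggering_frequency=600,
        create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
```

Querying the complete dataset is then a simple UNION ALL over the urgent and regular tables, as noted above.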
One more note on clustering: we can force better collocation of the data with a daily query that overwrites each "closed" partition (one for which we no longer expect new data to arrive), as explained in this documentation.

Wrapping up

In this post, we explored a mixed streaming and batch approach in Dataflow to get the best performance out of a data pipeline, taking into account project needs and the latest BigQuery features. We weighed factors such as data availability requirements, ease of implementation, scalability and cost, and arrived at the architecture that best fits our use case. To learn more about Dataflow and data analytics on Google Cloud, visit our website.

Acknowledgements: Guillem Xercavins and Alvaro Gomez, Big Data SMEs, and Berta Izquierdo, Big Data Team Lead, contributed to this post.
Source: Google Cloud Platform

Google Cloud named a leader in the Forrester Wave: Data Security Portfolio Vendors, Q2 2019 report

Today, we're honored to share that Google Cloud was named a Leader in The Forrester Wave™: Data Security Portfolio Vendors, Q2 2019 report. The report evaluates a vendor's portfolio of offerings specific to data security and includes both cloud and on-premises offerings. Of the 13 vendors evaluated, Google Cloud scored highest in the Strategy category.

Making data security easier and scalable for enterprises

The report notes that Google Cloud customers appreciate our ease of deployment and the scalability of our capabilities. This includes services like Cloud Data Loss Prevention (DLP) that help you discover, classify, and redact sensitive data across your organization. We also continue to work on easy-to-adopt ways for our customers to increase visibility into data use, sharing, and protection in their cloud environments. This includes Cloud Security Command Center for Google Cloud Platform (GCP) and Security Center for G Suite, products that help surface actionable security insights.

Security at the heart of Google Cloud

The report recognizes that we put security at the center of our strategy at Google Cloud. We've written at length in the past about our belief that if you put security first, all else will follow. And we are explicit in our commitment to our Cloud customers: you own your data, and we put you in control.

The report also recognizes Google's strengths in access control granularity when it comes to supporting a "Zero Trust" approach via our BeyondCorp model and Context-Aware Access solutions.

To learn more about how Forrester evaluates Google Cloud's data security portfolio, you can download a complimentary copy of the report here. Google Cloud is rated a Leader by industry analyst firms in many areas; learn more at our analyst reports page.
Source: Google Cloud Platform

On a quest: Learn GKE security and monitoring best practices

Whether you're running Kubernetes yourself, using our Google Kubernetes Engine (GKE) managed service, or using Anthos, you need visibility into your environment, and you need to know how to secure it. To help you on your way, there are two new educational resources that teach application observability and security best practices for using Kubernetes at scale. Fashioned as self-paced labs, this learning content guides you through the most common activities associated with monitoring and securing Kubernetes, organized into complementary hands-on exercises that we call quests.

Quest for migration and observability best practices

For migration and observability best practices, enroll in the Cloud Kubernetes Best Practice quest, which includes the following labs:

- GKE Migrating to Containers demonstrates containers' central premises of isolation, resource restriction and portability.
- Monitoring with Stackdriver on Kubernetes Engine explores how to obtain useful deployment information from code by using Stackdriver's extensive real-time tooling.
- Tracing with Stackdriver on Kubernetes Engine explores how to follow application trace events to find potential algorithm improvements.
- Logging with Stackdriver on Kubernetes Engine presents common techniques for resource identification and export sinks, including an overview of the powerful resource filter.
- Connect to Cloud SQL from an Application in Kubernetes Engine helps bridge the divide between containers and non-containers, leveraging design patterns such as the sidecar or ambassador to connect to external resources via the Kubernetes API.

On a quest for secure Kubernetes applications

Similarly, the Google Kubernetes Engine Security Best Practice quest provides actionable guidance on how to approach Kubernetes security, and includes the following labs:

- How to Use a Network Policy on GKE discusses the "principle of least privilege" as applied to Kubernetes network policy, illustrating how to achieve granular control over intra-cluster communication.
- Using Role-based Access Control in Kubernetes Engine shows you how to use RBAC to restrict actions such as cluster state changes.
- Google Kubernetes Engine Security: Binary Authorization highlights a new GKE feature that helps determine and enforce the provenance of container images.
- Securing Applications on Kubernetes Engine – Three Examples demonstrates how to use AppArmor to secure an Nginx web server, how to apply policies to unspecified resources using a Kubernetes DaemonSet, and how to update pod metadata associated with a deployment using the Kubernetes API's ServiceAccount, Role, and RoleBinding features.
- Kubernetes Engine Communication Through VPC Peering walks through the process of exposing services between distinct clusters using VPC peering.
- Hardening Default GKE Cluster Configurations explores mitigating security issues that can arise from running a cluster with default settings.

When working with infrastructure and application environments, sophisticated observability tools like Stackdriver provide a unified approach to monitoring, tracing and logging. Likewise, securing an environment is an ongoing challenge, but Google Cloud Platform offers a number of tools that help reduce the complexity and ensure that deployments follow generally accepted best practices.

Ready to begin? Get started with the Cloud Kubernetes Best Practice and GKE Security Best Practice quests.
On completion of each quest, you'll be presented with a Qwiklabs digital badge that you can share on social media.
Source: Google Cloud Platform