Poolia – IT is a driving force, but it can also create resistance if you don’t work with it in the right way

He began his IT career more than 20 years ago, installing modems as a support technician at an internet service provider. Today Nima Ashaier is CIO at Poolia and Uniflex, where his day-to-day role is to act as a link between the company's technical department and its management team: "I gain a different kind of credibility because I understand and can speak both languages," he says.
Source: CloudForms

HSBC deploys Dialogflow, easing call burden on policy experts

Banks are among the most highly regulated businesses on the planet, subject to regulations based on geography as well as their own internal policies and procedures. HSBC, a global giant in commercial and personal banking, has experts all over the world who support thousands of their risk management colleagues each time they need an internal policy question answered. The sheer volume of HSBC's business means that making everyday risk management decisions can result in tens of thousands of calls to internal policy experts every year. Because questions come from all over the world, there can naturally be delays due to time zone differences and individual workloads, and depending on the day, people's responses can vary. Steve Suarez, HSBC's Global Head of Innovation, Finance & Risk, and Gareth Butler, Head of Risk Transformation and Innovation Lead for Asia Pacific, thought the bank could use artificial intelligence (AI) to take a fresh approach to operational risk and resilience. The goals were to answer questions faster and to improve the overall consistency and quality of policy responses.

Building with Contact Center AI to answer questions quickly and consistently

HSBC wanted to use AI and machine learning (ML) to reduce the time employees were spending on manually intensive queries, improve the consistency of policy responses, and understand what kinds of questions were being asked. The bank envisioned building a solution that would support its large-scale, global environment. After evaluating the main AI solutions on the market, HSBC selected Google Cloud for the project, leveraging their existing strategic relationship. The bank worked with the customer engineering team at Google Cloud, which connected it with partner KPMG's Innovation Division. Together they formed a project team to architect and deliver the Operational Resilience and Risk Application (ORRA), the first of what HSBC hopes will be many FAQ and document-search-enabled chatbots.

ORRA performs dynamic document search and powers natural conversations with Google Cloud Dialogflow, a core component of Google Cloud Contact Center AI. Easily accessible to all employees from the HSBC intranet, ORRA answers queries on internal policy and framework areas applicable across the bank. With Dialogflow, HSBC was able to build a conversational platform that quickly and accurately addressed user needs at scale. HSBC chose Dialogflow as a cost-efficient, feature-rich solution for large-scale conversational FAQ flows. The team created the initial knowledge base consisting of intents, utterances, and answers, and used the bulk upload function to load it into the Dialogflow console. They used the wide range of native features to refine intents, train responses, and create synonyms and entities, and they added small talk to humanize the bot's responses and give it a "real-world personality." Within the Google Cloud chatbot architecture, the team implemented an in-house document search capability that returns search results through the same friendly user interface (UI). Dialogflow coupled with this search capability enabled a groundbreaking solution for HSBC employees: users now get direct answers to questions, with the flexibility of an app that uses Natural Language Processing (NLP) to interrogate large documents for supplementary answers in milliseconds, all through the same UI.
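To give a feel for the kind of integration this involves, here is a minimal sketch (not HSBC's actual code) of querying a Dialogflow ES agent from Python with the google-cloud-dialogflow client library; the project ID, session ID, and question are hypothetical placeholders.

from google.cloud import dialogflow

def ask_policy_bot(project_id: str, session_id: str, question: str) -> str:
    # Each user conversation maps to a Dialogflow session.
    session_client = dialogflow.SessionsClient()
    session = session_client.session_path(project_id, session_id)

    # Wrap the employee's question as a text query.
    text_input = dialogflow.TextInput(text=question, language_code="en")
    query_input = dialogflow.QueryInput(text=text_input)

    # Dialogflow matches the question against the trained intents and
    # returns the fulfillment text of the best match.
    response = session_client.detect_intent(
        request={"session": session, "query_input": query_input}
    )
    return response.query_result.fulfillment_text

# Hypothetical usage:
print(ask_policy_bot("my-gcp-project", "session-123", "Who approves a new vendor?"))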
Dialogflow's native machine learning and NLP technologies improve the user experience and reduce the setup required to develop complex conversation architectures.

Using machine learning to inform decision making

"The knowledge gained from analyzing the type, frequency, and source of queries is in itself valuable business intelligence on internal demands for information," said Suarez. "ORRA learns from every conversation, and at the most basic level, the more questions employees have about a specific policy, the more likely that policy may be due for simplification or revision."

"In addition to providing rapid access to information, ORRA also brings important benefits for learning and development and the embedding of policies and procedures," said Butler. "As query flow increases, the architecture uses machine learning and user feedback to determine the best response to give."

"It's about the speed of getting your information," said Suarez. "Before the FAQ chatbot, somebody would ask you a question, and if you didn't know the answer or you gave an inconsistent answer, you'd have to do a bit of research and then come back to them. People use Google to answer questions every day and receive instant, precise responses. Similarly, we're now getting information to users in a way that feels familiar to them, without having to read through an entire policy document." He said this gives risk managers access to immediate, accurate policy information and frees up time for subject matter experts to focus on adding value in less routine areas of their jobs.

Creating conversation architecture that scales across the organization

Future versions of ORRA will include guidance on judgment calls. "As we move forward, we'll be adding in more around risk acceptance, risk issues, and risk relevance," said Chris Wilson, Head of Architecture, Policy & Regulatory Mapping at HSBC. The next evolution of HSBC's FAQ chatbot will involve scaling the architecture to capture other policies and procedural data. "This is only the beginning of our conversational AI journey," adds Butler. The solution can accommodate more policies and documents; it can also be enhanced to support multiple languages, mobile interactions, and speech, and it can search numeric as well as text-based data stores. "This opens up more possibilities in this space as more information is hosted on the cloud," Butler said.

The bank is focused on how the AI/ML solution can be extended to other areas of the organization. Having invested the resources to develop and implement ORRA, they believe future expansion will be simple and cost-effective. Given the pervasiveness of chatbots across all areas of our lives, Suarez sees this as an imperative for the bank. "In the future, we are going to be interacting with chatbots frequently for routine transactions," he said. "As a leader in the financial services industry and a technology innovator, HSBC is taking the first steps in using cloud-based chatbot technology to get fast, accurate answers to our customers."
Source: Google Cloud Platform

Transform data to secure it: Use Cloud DLP

When you want to protect data in motion, at rest, or in use, you usually think about data discovery and data loss detection and prevention. Few would immediately consider transforming or modifying data in order to protect it. But doing so can be a powerful and relatively easy tactic to prevent data loss. Our data security vision includes transforming data to secure it, and that's why our DLP product includes powerful data transformation capabilities. So what are some data modification techniques that you can use to protect your data, and what are the use cases for them?

Delete sensitive elements

Let's start with a simple example: one of the best ways to protect payment card data and comply with PCI DSS is to simply delete it. Deleting sensitive data as soon as it's collected (or better yet, never collecting it in the first place) saves resources on encryption and data access control, and removes, rather than merely reduces, the risk of data exposure or theft. More generally, deleting data is one way to practice data minimization. Holding less data that attracts attackers is both a security best practice (one of the few that is as true in the 2020s as it was in the 1980s) and a compliance requirement (for example, it is one of the core principles of GDPR).

Naturally, there are plenty of types of sensitive data that you can't simply delete, and for which this strategy will not work, like business secrets or patient information at a hospital. But in many cases, transforming data to protect it satisfies the triad of security, compliance, and privacy use cases. Data often retains its full value even when sensitive or regulated elements are removed. Customer support chat logs work just as well after an accidentally shared payment card number is removed. A doctor can make a diagnosis without seeing a Social Security number (SSN) or medical record number (MRN). Transaction trend analysis works just as well when bank account numbers are not included. In many contexts, the sensitive, personal, or regulated parts don't matter at all. This also works well when a communication's purpose is satisfied even with data removed: for example, a support rep can help a customer use an app without knowing that customer's first and last name. As another example, our DLP system can clean up the datasets used to train an AI, so that AI systems can learn without being exposed to any personal or sensitive data. Even first and last names can be automatically removed from a stream of data before it's used to train an AI. Does your DLP do that?

In practice, this tactic can be applied to both structured (databases) and unstructured (email, chats, image captures, voice recordings) data. Removing "toxic" elements that are a target for attackers or subject to regulations reduces the risk and preserves the business value of a dataset. Transforming data as part of DLP goes beyond just deleting it. Various forms of data masking (both static and dynamic) are key to this approach. DLP can mean simply removing sensitive data from view, like obscuring what is shown to a call center employee. Notably, Cloud DLP works on stored or streamed data, including unstructured text, structured tabular data, and even images. Paired with services like Speech-to-Text, Cloud DLP can even be used to redact audio data or transcripts. Ultimately, the goal of any DLP strategy is to reduce the risk of sensitive data falling into the wrong hands; this is subtly different from, and broader than, merely securing the data. If we can reduce the risk of holding the data, we in turn reduce the risk of losing it.
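As a rough sketch of what this looks like in practice, the following uses the google-cloud-dlp Python client to find sensitive elements in a snippet of text and replace each finding with its info type name; the project ID and sample text are hypothetical.

import google.cloud.dlp_v2

dlp = google.cloud.dlp_v2.DlpServiceClient()
parent = "projects/my-gcp-project"

# Sample text containing a payment card number and a person's name.
item = {"value": "Card 4111 1111 1111 1111 was charged for Jane Doe."}

# Detect payment card numbers and person names...
inspect_config = {
    "info_types": [{"name": "CREDIT_CARD_NUMBER"}, {"name": "PERSON_NAME"}]
}

# ...and replace each finding with its info type name, removing the
# sensitive value while keeping the surrounding text readable.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {"primitive_transformation": {"replace_with_info_type_config": {}}}
        ]
    }
}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "deidentify_config": deidentify_config,
        "item": item,
    }
)
print(response.item.value)
# Card [CREDIT_CARD_NUMBER] was charged for [PERSON_NAME].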
Replace sensitive elements with safe equivalents

Sometimes we can't remove even small parts of sensitive data, but we can replace them with safer elements through tokenization, which is also a feature of Google Cloud DLP. One of the advantages of tokenization is that it can be reversible. Tokenization both reduces risk and helps ensure compliance with PCI DSS or other regulations, depending on the data being replaced. We can tokenize sensitive elements of data in storage or during display in order to reduce risk. An insurance company may collect and use customer driver's license numbers for record validation, and replace those numbers with a token when they are displayed elsewhere.

Tokenization is also particularly helpful when two datasets need to be joined for analysis and the best key to join them on is a sensitive piece of data like an SSN. When a patient records database needs to be joined to a lab results database, or loan applications to financial records, we can tokenize the sensitive columns of both datasets using the same algorithm and parameters, and then join them without exposing any sensitive data. Take fraud analysis as another example: our case study shows that DLP can be used to remove international mobile subscriber identity (IMSI) numbers from data stored in BigQuery. The data can be restored later, such as when fraud is confirmed and an investigation is under way. Note the staggering volumes of data being processed. Now, some readers may point out that tokenization and DLP are traditionally considered different technologies. Cloud DLP is a broader system that covers both of these functions, as well as several others, in one scalable, cloud-native solution. This allows us to solve for the greater goal of reducing risk while retaining the business value of a dataset.
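A minimal sketch of this kind of tokenization with the google-cloud-dlp Python client is shown below; the project ID and key are hypothetical placeholders, and a production setup would use a key wrapped with Cloud KMS and reuse the same key across datasets so that tokens match for joins.

import os
import google.cloud.dlp_v2

dlp = google.cloud.dlp_v2.DlpServiceClient()
parent = "projects/my-gcp-project"

# Deterministic encryption: the same input and key always produce the
# same token, which is what makes joins on tokenized columns possible.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                "info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}],
                "primitive_transformation": {
                    "crypto_deterministic_config": {
                        # A raw key for illustration only; use a single
                        # KMS-wrapped key in real pipelines.
                        "crypto_key": {"unwrapped": {"key": os.urandom(32)}},
                        "surrogate_info_type": {"name": "SSN_TOKEN"},
                    }
                },
            }
        ]
    }
}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "inspect_config": {"info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}]},
        "deidentify_config": deidentify_config,
        "item": {"value": "Patient SSN: 123-45-6789"},
    }
)
print(response.item.value)  # e.g., Patient SSN: SSN_TOKEN(52):AbC...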
Transform personal data

The risk of losing data is not only that criminals may steal and use it to defraud your company. There is also the risk of privacy violations, off-policy use, and other situations that come from the exposure of personal data. The loss of personal data is thus a twofold risk: one of security and one of privacy. This makes transformation of data for DLP a worthwhile tactic for both privacy and security purposes. For example, an organization may be sharing data with a partner to run a trend analysis of their mutual customers. Generalizing demographics such as age, zip code, and job title can help prevent these partial identifiers from linking to a specific individual. This is useful for citizen data collected by government agencies and for healthcare research done by universities, for example. Similarly, a user may share transactional data that includes dates someone could use to triangulate their location, like travel dates, purchase dates, or calendar information. Cloud DLP can prevent this misuse with a date shifting technique that shifts dates per "customer," so that behavioral analysis can still be done while the actual dates are blurred. Again, this is not a feature of any traditional DLP system. Note that many of these methods are not reversible, and irrevocably change or destroy elements of a dataset. Yet they preserve the value of that dataset for specific business use cases, while reducing the risk inherent in the data.

This makes using DLP a worthy consideration for teams looking to reduce both security and privacy risk while retaining the ability to derive value from a dataset, without having to spend compute resources on encryption and granular access control. The constant balancing act between risk and utility becomes significantly easier with this approach. Google Cloud DLP can help you employ all of these strategies. Read more about the future of DLP in "Not just compliance: reimagining DLP for today's cloud-centric world." If you are a Google Cloud customer, go here to get started with DLP.
Source: Google Cloud Platform

Open innovations, scaling data science, and amazing data analytics stories from industry leaders

February might be the shortest month of the year, but it was certainly one of our busiest for data analytics at Google! From our partnership announcement with Databricks to the launch of Dataproc Hub and BigQuery BI Engine, and the incredible journeys of Twitter, Verizon Media, and J.B. Hunt, this month was full of great activities for our customers, partners, and the community at large.

Our commitment to an open approach for data analytics

Much has been written about our launches over the past month, and while it would be too much to list all the great reviews and articles, I thought I'd direct you to SiliconANGLE reporter Maria Deutscher's story from last week on our commitment to an open data analytics approach. Her piece, covering last week's BI Engine and Materialized Views launches, does a great job highlighting how data analytics, and BigQuery in particular, play a key role in our overall strategy. The average organization has tens (sometimes hundreds) of BI tools. These tools might be ours, our partners', or custom applications customers have built using packaged and open-source software. We're delighted by the amazing support this effort has gathered from our partners: from Microsoft to Tableau, Qlik, ThoughtSpot, Superset, and many more.

Getting started with BI Engine Preview

We are committed to creating the best analytics experience for all users by meeting them in the tools they already know and love. That's why BI Engine works seamlessly with BI tools without requiring any additional changes from end users. We can't wait to tell you how customers are adopting this new offering. Join our webinar "Delivering fast and fresh data experiences" by registering here.

Running data science securely at scale

Running data science at scale has been a challenge for many organizations. Data scientists want the freedom to use the tools they need, while IT leaders need to set frameworks to govern that work. Dataproc Hub is the solution that provides freedom within a governed framework. This new functionality lets data scientists easily scale their work with templated and reusable configurations and ready-to-use big data frameworks. At the same time, it provides administrators with integrated security controls and the ability to set autoscaling policies, auto-deletions, and timeouts to ensure that permissions are always in sync and that the right data is available to the right people. Dataproc Hub is both integrated and open. AI Platform Notebooks customers who want to use BigQuery or Cloud Storage data for model training, feature engineering, and preprocessing benefit greatly from this new functionality. With Dataproc Hub, data scientists can leverage APIs like PySpark and Dask without much setup and configuration work, as well as accelerate their Spark XGBoost pipelines with NVIDIA GPUs to process their data 44x faster at a 14x reduction in cost versus CPUs. You'll find more information about our Dataproc Hub launch here, and if you'd like to dive into model training with RAPIDS, Dask, and NVIDIA GPUs on AI Platform, this blog is a great place to start. As Scott McClellan, Senior Director, Data Science Product Group at NVIDIA, wrote this past week, it's time to make "data science at scale more accessible." We're proud to count NVIDIA as a partner in this journey!

Dataproc in a minute

As I wrote in my post last month, our goal is to democratize access to data science and machine learning for everyone. You don't have to be a data scientist to take advantage of Google's data analytics machine learning capabilities.
Any Google Workspace user can use machine learning right from Connected Sheets. To get started, check out this blog: How to use a machine learning model from a Google Sheet using BigQuery ML. That's right, you can tap into the power of machine learning right from Google Sheets, our spreadsheet application which today counts over 2 billion users. So don't be shy, start using data at scale and make an impact!

Building the future together is better

This past month, we were particularly inspired by a guest post from Nikhil Mishra, Sr. Director of Engineering at Verizon Media, about Verizon Media's migration journey to the cloud. Mishra dives deep into the process that led to their final decision, from identifying problems to solution requirements to the entire proof of concept used to select BigQuery and Google's Looker. This is a must-read for those looking for practical guidance on modernizing and optimizing for scale, performance, and cost.

Employing the right cloud strategy is critical to our customers' transformation journeys, and if you're looking for straightforward guidance, another great customer example to follow is Twitter. In his interview with VentureBeat, Twitter platform leader Nick Torno explains how the company leverages Google BigQuery, Dataflow, and machine learning to improve the experience of people using Twitter. The piece concludes with guidance for breaking down silos and future-proofing your data analytics environment while delivering value quickly through business use cases.

We were also delighted to support J.B. Hunt, one of the largest transportation logistics companies in North America, in their goal to develop new services that digitally transform the shipping and logistics experience for shippers, carriers, and service providers. Real-time data is a cornerstone of the $1 trillion logistics industry, and today's carriers rely on a patchwork of IT systems across supply chain, capacity utilization, pricing, and transportation execution. J.B. Hunt's 360 platform aims to centralize data from across these different systems, helping to reduce waste, friction, and inefficiencies.

You might also find inspiration in hearing how Google Cloud is helping Ford transform their automotive technologies and enabling BNY Mellon to better predict billions of dollars in daily settlement failures. We also recently agreed to extend our partnership with the U.S. National Oceanic and Atmospheric Administration (NOAA), empowering them to continue sharing their data more broadly than ever, with some pretty cool results.

Feature highlights you might have missed

At Google Cloud, the aim is always to continuously improve and introduce new features and functionality that make a difference for our customers. Last month, we announced the public preview launch of the replication application in Data Fusion to enable low-latency, real-time data replication from transactional and operational databases such as SQL Server and MySQL directly into BigQuery. Data Fusion's simple, wizard-driven interface lets citizen developers set up replication easily. It comes with an assessment tool that not only identifies schema incompatibilities, connectivity issues, and missing features prior to starting replication, but also provides corrective actions.
Replication in Data Fusion means that you'll benefit from end-to-end visibility: real-time operational dashboards to monitor throughput, latency, and errors in replication jobs; zero-downtime snapshot replication into BigQuery; and support for CDC streams, so users have access to the latest data in BigQuery for analysis and action. Cloud Data Fusion's integration within the Google Cloud platform ensures that the highest levels of enterprise security and privacy are observed while making the latest data available in your data warehouse for analytics. This launch includes support for customer-managed encryption keys (CMEK) and VPC Service Controls (VPC-SC). If you're new to Data Fusion, I suggest you check out Chapter 1 of our blog series on data lake solution architecture with Data Fusion and Cloud Composer.

Speaking of fast-moving and ever-changing data, you might want to check out the latest best practices for continuous model evaluation with BigQuery ML by Developer Advocates Polong Lin and Sara Robinson. Their post takes us through a full model life cycle: creating a model with BigQuery ML, evaluating data with ML.EVALUATE, creating a stored procedure to assess incoming data, and using it to insert evaluation metrics into a table. The post shows the power of an integrated platform built with BigQuery and Cloud Scheduler and what you can achieve, from using Cloud Functions to visualizing model metrics in Data Studio. It has fantastic guidance that I hope you'll enjoy!

Finally, we also covered data traceability this past month with a post on how to architect a data lineage system using BigQuery, Data Catalog, Pub/Sub, and Dataflow. Data lineage is critical for performing data forensics, identifying data dependencies, and above all, securing business data. Data Catalog provides a powerful interface that allows you to sync and tag business metadata across Google Cloud services as well as your own on-premises data centers and databases. Read this great article for insights on our recommended architecture for the most common user journeys, and start here to build a data lineage system using BigQuery streaming, Pub/Sub, ZetaSQL, Dataflow, and Cloud Storage. And see how BlackRock uses Data Catalog: data discovery and metadata management in action!

That's it for February! I can't wait to hear what you think, and I'm looking forward to sharing everything we've got coming up in March.
Source: Google Cloud Platform

Build your future with GKE

American poet Maya Angelou said, "If you don't know where you've come from, you don't know where you're going." We agree. Today, as we kick off the Build with Google Kubernetes Engine event, and fresh off our GKE Autopilot launch, we wanted to take a step back and reflect on just how far GKE has come. In just six short years, GKE has become one of the most widely adopted services for running modern cloud-native applications, used by startups and Fortune 500 companies alike. This enthusiasm inspires us to push the limits of what's possible with Kubernetes, making it easier for you to focus on creating great services for your users while we take care of your Kubernetes clusters. So let's take a look at where we've been with Kubernetes and where we are today, so we can build the future together.

Sustained innovation

A lot has changed in the container orchestration space since we created Kubernetes and opened it up to the world more than six years ago. It's a little hard to remember, but back when we first designed Kubernetes, there was no industry standard for managing fleets of containerized applications at scale. Because we had already developed so many technical innovations for containers (e.g., a container-optimized OS), it was only natural for us to propose a new approach to managing containers, one based on our experience at the time launching billions of containers every week for our internal needs. In 2015, we co-founded the Cloud Native Computing Foundation (CNCF) as a vendor-neutral home for the Kubernetes project. Since then, a diverse, global community of developers has contributed to, and benefitted from, the project. Last year alone, developers from 500+ companies contributed to Kubernetes, and all the major cloud providers have followed in our footsteps in offering a managed Kubernetes service. This broad industry support for the technology we developed helps us deliver on our vision: giving customers the choice to run their workloads where and when they want, without being stuck on a legacy cloud provider with proprietary APIs.

Community leadership

Since its inception as an internal Google project, we've only continued to invest in Kubernetes. Under the auspices of the CNCF, we've made over 680,000 additional contributions to the project, including over 123,000 contributions in 2020. That's more than all the other cloud providers combined. When you truly want to take advantage of Kubernetes, there's no match for Google's expertise, or for GKE. We also actively support CNCF with credits to host Kubernetes on Google Cloud, enabling 100 million container downloads every day and over 400,000 integration tests per month, totaling over 300,000 core hours on GKE and Google Cloud. (Yes, you read that right: the Kubernetes project itself is built and served from GKE and Google Cloud.)

Customer outcomes

As the creators of Kubernetes, and with all this continued investment, it's not surprising that we have a great managed Kubernetes service; in fact, I think we can credibly claim it's the best one on the market. Enterprises flock to GKE to solve for speed, scale, security, and availability. Among the Fortune 500, five of the top 10 telecommunications, media, and gaming companies, six of the top 10 healthcare and life sciences companies, and seven of the top 10 retail and consumer packaged goods companies all use GKE.
Leading technology companies are also embracing GKE; for example, Databricks is enabling customers to leverage a Google Kubernetes Engine-based Databricks service on Google Cloud. When it comes to scale, GKE is second to none. After all, Google itself operates numerous globally available services like YouTube, Gmail, and Drive, so we know a thing or two about deploying workloads at scale. We bring this expertise to Kubernetes in a way that only Google can. For example, Bayer Crop Science used GKE to seamlessly scale their research workloads over 200x with 15,000-node clusters. GKE offers native security capabilities such as network policy logging, a hardened sandbox environment, vulnerability scanning, shielded nodes (which use a cryptographically verifiable check), and confidential nodes, all designed to simplify implementing a defense-in-depth approach to security so you can operate safely at scale. Customers like Shopify trust GKE to help them handle terrific scale with no interruptions: over the most recent Black Friday/Cyber Monday period, Shopify processed over $5B in transactions!

GKE also offers a series of industry-first capabilities such as release channels, multi-cluster support, and four-way autoscaling, including node auto-repair to help improve availability. And that's just its feature set: GKE also helps optimize costs with efficient bin packing and autoscaling. Customers like OpenX are saving up to 45% using GKE.

GKE Autopilot momentum

This leads us to GKE Autopilot, a new mode of operation for GKE that helps reduce the operational cost of managing clusters, optimizes your clusters for production, and yields higher workload availability. Since its launch last month, customers like Strabag and Via Transportation report seeing dramatic improvements in the performance, security, and resilience of their Kubernetes environments, all while spending less time managing their clusters. In short, we've worked hard to deliver the most configurable, secure, scalable, and automated Kubernetes service on the market today. And we're just getting started. With five-plus years of ongoing investment in Kubernetes, you can be confident that GKE will be there to support your success and growth, today and into the future.

Interested in showing just how much you love GKE? Join the Google Cloud {Code_Love_Hack} hackathon to show us how you use GKE, containers, and Cloud Code to spread the love of coding! Registration is open and we're excited for all the great projects you'll make using GKE!
Source: Google Cloud Platform

Introducing #AskGoogleCloud: A community-driven YouTube live series

We're excited to introduce a new series on the Google Cloud Tech YouTube channel connecting the cloud community directly with our Google Cloud product experts. We'll select questions that use #AskGoogleCloud on Twitter and YouTube and have experts in each area answer them in a featured premiere every quarter. Each premiere will be paired with a live chat, so you can ask your questions live and get them answered by the featured speakers.

Our first segment will air on March 12, 2021 on the topic of serverless architectures, featuring Developer Advocates Stephanie Wong, Martin Omander, and James Ward. They'll be addressing questions on the best workloads for serverless, the differences between "serverless" and "cloud native," how to accurately estimate costs for using Cloud Run, and much more.

Serverless content on demand

Serverless Expeditions

We have tons of serverless content on demand on the Google Cloud Tech YouTube channel. Be sure to check out Serverless Expeditions, a series that covers all things serverless, from using Python on Google Cloud with Cloud Run, to Cloud Functions vs. Cloud Run, to securing a REST API with JWT.

Serverless containers with Cloud Run

Check out this video to learn how to deploy serverless containers in three environments using Cloud Run and Knative. In this demo, we deploy a serverless microservice that transforms Word documents to PDFs.

Building APIs for serverless workloads with Google Cloud

This video demonstrates Google Cloud's API Gateway, a tool that helps you create, secure, and monitor APIs for many Google Cloud serverless backends such as Cloud Functions, Cloud Run, and more. Learn how you can provide secure access to your backend services through a well-defined REST API, providing consistency across all of your services regardless of the service implementation.

6 strategies for scaling your serverless applications

This video walks you through a few tips that can help you scale serverless workloads while protecting underlying resources. Learn how to configure instance scaling limits, use Cloud Tasks to limit the rate of work done, utilize stateful storage, and more. This is a great episode for understanding how to improve performance and scalability in your serverless applications.

Stay updated with the latest videos

The Google Cloud Tech YouTube channel has new daily videos to help you build what's next with secure infrastructure, developer tools, APIs, data analytics, and machine learning. Subscribe to get notified of our newest content!
Source: Google Cloud Platform

Guest Post: Calling the Docker CLI from Python with Python-on-whales

Image: Alice Lang, alicelang-creations@outlook.fr

At Docker, we are incredibly proud of our vibrant, diverse and creative community. From time to time, we feature cool contributions from the community on our blog to highlight some of the great work our community does. The following is a guest post by Docker community member Gabriel de Marmiesse. Are you working on something awesome with Docker? Send your contributions to William Quiviger (@william) on the Docker Community Slack and we might feature your work!   

The most common way to call and control Docker is by using the command line.

With the increased usage of Docker, users want to call Docker from programming languages other than shell. One popular way to use Docker from Python has been to use docker-py. This library has had so much success that even docker-compose is written in Python, and leverages docker-py.
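For comparison, here is a minimal docker-py call; note that it talks to the Docker Engine API through the local socket rather than driving the CLI (the image and command are arbitrary examples).

import docker

# Connect to the Docker Engine using the standard environment settings
# (DOCKER_HOST or the default local socket).
client = docker.from_env()

# Run a throwaway container and capture its output.
output = client.containers.run("ubuntu", ["echo", "hello"], remove=True)
print(output)  # b'hello\n'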

The goal of docker-py, though, is not to replicate the Docker client (written in Golang), but to talk to the Docker Engine HTTP API. The Docker client is extremely complex and hard to duplicate in another language. Because of this, a lot of features that were in the Docker client could not be made available in docker-py. Users would sometimes get frustrated because docker-py did not behave exactly like the CLI.

Today, we’re presenting a new project built by Gabriel de Marmiesse from the Docker community: Python-on-whales. The goal of this project is to have a 1-to-1 mapping between the Docker CLI and the Python library. We do this by communicating with the Docker CLI instead of calling the Docker Engine HTTP API directly.

If you need to call the Docker command line, use Python-on-whales. And if you need to call the Docker engine directly, use docker-py.

In this post, we’ll take a look at some of the features that are not available in docker-py but are available in Python-on-whales:

Building with Docker buildx
Deploying to Swarm with docker stack
Deploying to the local Engine with Compose

Start by downloading Python-on-whales with 

pip install python-on-whales

and you’re ready to rock!

Docker Buildx

Here we build a Docker image. Python-on-whales uses buildx by default and gives you the output in real time.

>>> from python_on_whales import docker
>>> my_image = docker.build(".", tags="some_name")
[+] Building 1.6s (17/17) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 32B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/python:3.6 1.4s
=> [python_dependencies 1/5] FROM docker.io/library/python:3.6@sha256:293 0.0s
=> [internal] load build context 0.1s
=> => transferring context: 72.86kB 0.0s
=> CACHED [python_dependencies 2/5] RUN pip install typeguard pydantic re 0.0s
=> CACHED [python_dependencies 3/5] COPY tests/test-requirements.txt /tmp 0.0s
=> CACHED [python_dependencies 4/5] COPY requirements.txt /tmp/ 0.0s
=> CACHED [python_dependencies 5/5] RUN pip install -r /tmp/test-requirem 0.0s
=> CACHED [tests_ubuntu_install_without_buildx 1/7] RUN apt-get update && 0.0s
=> CACHED [tests_ubuntu_install_without_buildx 2/7] RUN curl -fsSL https: 0.0s
=> CACHED [tests_ubuntu_install_without_buildx 3/7] RUN add-apt-repositor 0.0s
=> CACHED [tests_ubuntu_install_without_buildx 4/7] RUN apt-get update & 0.0s
=> CACHED [tests_ubuntu_install_without_buildx 5/7] WORKDIR /python-on-wh 0.0s
=> CACHED [tests_ubuntu_install_without_buildx 6/7] COPY . . 0.0s
=> CACHED [tests_ubuntu_install_without_buildx 7/7] RUN pip install -e . 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.0s
=> => writing image sha256:e1c2382d515b097ebdac4ed189012ca3b34ab6be65ba0c 0.0s
=> => naming to docker.io/library/some_image_name

Docker Stacks

Here we deploy a simple Swarmpit stack on a local Swarm. You get a Stack object that has several methods: remove(), services(), ps().

>>> from python_on_whales import docker
>>> docker.swarm.init()
>>> swarmpit_stack = docker.stack.deploy("swarmpit", compose_files=["./docker-compose.yml"])
Creating network swarmpit_net
Creating service swarmpit_influxdb
Creating service swarmpit_agent
Creating service swarmpit_app
Creating service swarmpit_db
>>> swarmpit_stack.services()
[<python_on_whales.components.service.Service object at 0x7f9be5058d60>,
<python_on_whales.components.service.Service object at 0x7f9be506d0d0>,
<python_on_whales.components.service.Service object at 0x7f9be506d400>,
<python_on_whales.components.service.Service object at 0x7f9be506d730>]
>>> swarmpit_stack.remove()

Docker Compose

Here we show how we can run a Docker Compose application with Python-on-whales. Note that, behind the scenes, it uses the new version of Compose written in Golang. This version of Compose is still experimental. Take appropriate precautions.

$ git clone https://github.com/dockersamples/example-voting-app.git
$ cd example-voting-app
$ python
>>> from python_on_whales import docker
>>> docker.compose.up(detach=True)
Network "example-voting-app_back-tier" Creating
Network "example-voting-app_back-tier" Created
Network "example-voting-app_front-tier" Creating
Network "example-voting-app_front-tier" Created
example-voting-app_redis_1 Creating
example-voting-app_db_1 Creating
example-voting-app_db_1 Created
example-voting-app_result_1 Creating
example-voting-app_redis_1 Created
example-voting-app_worker_1 Creating
example-voting-app_vote_1 Creating
example-voting-app_worker_1 Created
example-voting-app_result_1 Created
example-voting-app_vote_1 Created
>>> for container in docker.compose.ps():
...     print(container.name, container.state.status)
example-voting-app_vote_1 running
example-voting-app_worker_1 running
example-voting-app_result_1 running
example-voting-app_redis_1 running
example-voting-app_db_1 running
>>> docker.compose.down()
>>> print(docker.compose.ps())
[]

Bonus section: Docker object attributes as Python attributes

All information that you can access with docker inspect is available as Python attributes:

>>> from python_on_whales import docker
>>> my_container = docker.run("ubuntu", ["sleep", "infinity"], detach=True)
>>> my_container.state.started_at
datetime.datetime(2021, 2, 18, 13, 55, 44, 358235, tzinfo=datetime.timezone.utc)
>>> my_container.state.running
True
>>> my_container.kill()
>>> my_container.remove()

>>> my_image = docker.image.inspect(“ubuntu”)
>>> print(my_image.config.cmd)
['/bin/bash']

What’s next for Python-on-whales?

We’re currently improving the integration of Python-on-whales with the new Compose in the Docker CLI (currently beta).

You can consider Python-on-whales to be in beta. Some small API changes are still possible.

We encourage the community to try it out and give feedback in the issues!

To learn more about Python-on-whales:

Documentation
GitHub repository
Source: https://blog.docker.com/feed/

AWS Glue DataBrew enhances its data quality dashboard with a visual comparison matrix

When you generate data quality profiles for your datasets, DataBrew now publishes a visual dashboard with over 40 statistics and visualizations in tabular format in the AWS Glue DataBrew console, making comparisons easy. Understanding data quality is key to the success of your analytics and machine learning projects. With this new feature in DataBrew, it is easy to detect anomalies and outliers in data distributions and to understand skew in datasets ranging from a few thousand to several million rows, across a variety of file formats.
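As an illustrative sketch (the job name, dataset name, bucket, and role below are hypothetical placeholders), a profile job whose results feed this dashboard can be created and started with boto3:

import boto3

databrew = boto3.client("databrew")

# Create a profile job for an existing DataBrew dataset; the profile
# results land in S3 and populate the data quality dashboard.
databrew.create_profile_job(
    Name="orders-profile-job",
    DatasetName="orders-dataset",
    OutputLocation={"Bucket": "my-databrew-results"},
    RoleArn="arn:aws:iam::123456789012:role/DataBrewServiceRole",
)

# Kick off a run of the job.
databrew.start_job_run(Name="orders-profile-job")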
Source: aws.amazon.com