Actuate your data in real time with new Bigtable change streams

Cloud Bigtable is a highly scalable, fully managed NoSQL database service that offers single-digit millisecond latency and an availability SLA of up to 99.999%. It is a good choice for applications that require high throughput and low latency, such as real-time analytics, gaming, and telecommunications.

Cloud Bigtable change streams is a feature that lets you track changes to your Bigtable data and easily access and integrate that data with other systems. With change streams, you can replicate changes from Bigtable to BigQuery for real-time analytics, trigger downstream application behavior using Pub/Sub (for event-based data pipelines), or capture database changes for multi-cloud scenarios and migrations to Bigtable. Cloud Bigtable change streams is a powerful tool that can help you unlock new value from your data.

NBCUniversal’s streaming service Peacock uses Bigtable for identity management across their platform. The Bigtable change streams feature helped them simplify and optimize their data pipeline. “Bigtable change streams was simple to integrate into our existing data pipeline, leveraging the Dataflow Beam connector to alert on changes for downstream processing. This update saved us significant time and processing in our data normalization objectives.” – Baihe Liu, Peacock

Actuating your data changes

Enabling a change stream on your table can easily be done through the Google Cloud console, or via the API, client libraries, or declarative infrastructure tools like Terraform. Once enabled on a table, all data changes to that table are captured and stored for up to seven days. This is useful for tracking changes to data over time, or for auditing purposes. The retention period can be customized to meet your specific needs. You can build custom processing pipelines using the Bigtable connector for Dataflow. This allows you to process data in Bigtable in a variety of ways, including batch processing, stream processing, and machine learning. 
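As a minimal sketch of enabling a change stream from the command line (the instance ID, table name, and column family below are hypothetical placeholders; flag names reflect the gcloud CLI at the time of writing):

```shell
# Create a table with a change stream enabled, using the maximum
# seven-day retention period (INSTANCE_ID and my-table are placeholders).
gcloud bigtable instances tables create my-table \
    --instance=INSTANCE_ID \
    --column-families=cf1 \
    --change-stream-retention-period=7d

# Or enable a change stream on an existing table.
gcloud bigtable instances tables update my-table \
    --instance=INSTANCE_ID \
    --change-stream-retention-period=7d
```

The retention period can be set anywhere up to seven days; shorter values reduce the storage consumed by retained change records.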
Or, you can have even more flexibility and control by integrating with the Bigtable API directly.

Cloud Bigtable change streams use cases

Change streams can be leveraged for a variety of use cases and business-critical workloads.

Analytics and ML

Collect event data and analyze it in real time. This can be used to track customer behavior to update feature store embeddings for personalization, monitor system performance in IoT services for fault detection or to identify security threats, or monitor events to detect fraud. In the context of BigQuery, change streams can be used to track changes to data over time, identify trends, and generate reports. There are two main ways to send change records to BigQuery: as a set of change logs, or by mirroring your data in BigQuery for large-scale analytics.

Event-based applications

Leverage change streams to trigger downstream processing of certain events: for example, in gaming, to keep track of player actions in real time. This can be used to update game state, provide feedback to players, or detect cheating. Retail customers leverage change streams to monitor catalog changes like pricing or availability to trigger updates and alert customers.

Migration and multi-cloud

Capture Bigtable changes for multi-cloud or hybrid cloud scenarios. For example, leverage Bigtable HBase replication tooling and change streams to keep your data replicated across clouds or on-premises databases. This topology can also be leveraged for online migrations to Bigtable without disruption to serving activity.

Compliance

Compliance often refers to meeting the requirements of specific regulations, such as HIPAA or PCI DSS. Retaining the change log can help you demonstrate compliance by providing a record of all changes that have been made to your data. 
This can be helpful in the event of an audit or if you need to investigate a security incident.

Learn more

Change streams is a powerful feature providing additional capability to actuate your data on Bigtable to meet your business requirements and optimize your data pipelines. To get started, check out our documentation for more details on Bigtable change streams, along with these additional resources:
- Expanding your Bigtable architecture with change streams
- Process a Bigtable change stream (tutorial)
- Create a change stream-enabled table and capture changes (quickstart)
- Bigtable change streams code samples
Source: Google Cloud Platform

Fine tune autoscaling for your Dataflow Streaming pipelines

Stream processing helps users get timely insights and act on data as it is generated. It is used for applications such as fraud detection, recommendation systems, IoT, and others. However, scaling live streaming pipelines as input load changes is a complex task, especially if you need to provide low-latency guarantees and keep costs in check. That’s why Dataflow has invested heavily in improving its autoscaling capabilities over the years, automatically adjusting compute capacity for the job. These capabilities include:
- Horizontal autoscaling: This lets the Dataflow service automatically choose the appropriate number of worker instances required for your job.
- Streaming Engine: This provides smoother horizontal autoscaling in response to variations in incoming data volume.
- Vertical autoscaling (in Dataflow Prime): This dynamically adjusts the compute capacity allocated to each worker based on utilization.

Sometimes customers want to customize the autoscaling algorithm parameters. In particular, we see three common use cases where customers want to update the min/max number of workers for a running streaming job:
- Save cost when latency spikes: Latency spikes may cause excessive upscaling to handle the input load, which increases cost. In this case, customers may want to apply smaller worker limits to reduce costs.
- Keep latency low during an expected increase in traffic: For example, a customer may have a stream that is known to have spikes in traffic every hour. It can take minutes for the autoscaler to respond to those spikes. Instead, users can have the number of workers increased proactively ahead of the top of the hour.
- Keep latency low during traffic churn: It can be hard for the default autoscaling algorithm to select the optimal number of workers during bursty traffic. This can lead to higher latency. 
Customers may want to apply a narrower range of min/max workers to make autoscaling less sensitive during these periods.

Introducing in-flight streaming job updates for user-calibrated autoscaling

Dataflow already offers a way to update autoscaling parameters for long-running streaming jobs by doing a job update. However, this update operation causes a pause in data processing, which can last minutes and doesn’t work well for pipelines with strict latency guarantees. This is why we are happy to announce the in-flight job option update feature. This feature allows Streaming Engine users to adjust the min/max number of workers at runtime. If the current number of workers is within the new minimum and maximum boundaries, this update will not cause any processing delays. Otherwise, the pipeline will start scaling up or down within a short period of time.

It is available to users through the gcloud command-line tool:

    gcloud dataflow jobs update-options \
        --region=REGION \
        --min-num-workers=MINIMUM_WORKERS \
        --max-num-workers=MAXIMUM_WORKERS \
        JOB_ID

or the Dataflow Update API:

    PUT https://dataflow.googleapis.com/v1b3/projects//locations//jobs/?updateMask=runtime_updatable_params.max_num_workers,runtime_updatable_params.min_num_workers
    {
      "runtime_updatable_params": {
        "min_num_workers": ,
        "max_num_workers": 
      }
    }

Please note that the in-flight job updates feature is only available to pipelines using Streaming Engine. 
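For the hourly-spike use case described earlier, the new command can be scripted. The following is an illustrative sketch (the region, job ID, and worker counts are placeholders, not values from this post) that a scheduler such as cron could run shortly before and after the top of each hour:

```shell
#!/usr/bin/env bash
# Illustrative pre-scaling script: run with "prescale" a few minutes before
# an expected traffic spike to raise the autoscaling floor, and with "relax"
# afterwards to restore the normal range. REGION, JOB_ID, and the worker
# counts below are placeholders.
set -euo pipefail

REGION="us-central1"
JOB_ID="YOUR_STREAMING_JOB_ID"

case "${1:-}" in
  prescale)
    # Raise the minimum so workers are provisioned ahead of the spike
    # instead of waiting minutes for the autoscaler to react.
    gcloud dataflow jobs update-options "${JOB_ID}" \
        --region="${REGION}" \
        --min-num-workers=20 \
        --max-num-workers=50
    ;;
  relax)
    # Return to the normal autoscaling range once the spike has passed.
    gcloud dataflow jobs update-options "${JOB_ID}" \
        --region="${REGION}" \
        --min-num-workers=2 \
        --max-num-workers=50
    ;;
  *)
    echo "usage: $0 {prescale|relax}" >&2
    exit 1
    ;;
esac
```

For example, cron entries like `55 * * * * /path/to/scale.sh prescale` and `15 * * * * /path/to/scale.sh relax` would pre-provision workers five minutes before each hourly spike.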
Once the update is applied, users can see the effects in the Autoscaling monitoring UI: the “Pipeline options” section in the “Job info” panel will display the new values of “minNumberOfWorkers” and “maxNumberOfWorkers”.

Case Study: How Yahoo used this feature

Yahoo needs to frequently update their streaming pipelines that process Google Pub/Sub messages. This customer also has a very tight end-to-end processing SLA, so they can’t afford to wait for the pipeline to be drained and replaced. If they were to follow the typical process, they would start missing their SLA. With the new in-flight update option, we proposed an alternative approach. Before the current pipeline’s drain is initiated, its maximum number of workers is set to the current number of workers using the new API. Then a replacement pipeline is launched with its maximum number of workers also equal to the current number of workers of the existing pipeline. This new pipeline is launched on the same Pub/Sub subscription as the existing one. (Note: in general, using the same subscription for multiple pipelines is not recommended, because there is no deduplication across separate pipelines and duplicates can occur. It works only when duplicates during the update are acceptable.) Once the new pipeline starts processing the messages, the existing pipeline is drained. Finally, the new production pipeline is updated with the desired minimum and maximum number of workers. 

Typically, we don’t recommend running more than one Dataflow pipeline on the same Pub/Sub subscription. It’s hard to predict how many Pub/Sub messages will be in the pipeline, so the pipelines might scale up too much. The new API lets you disable autoscaling during replacement, which has been shown to work well for this customer and helped them maintain their latency SLA. “With Yahoo mail moving to the Google Cloud Platform we are taking full advantage of the scale and power of Google’s data and analytics services. 
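The replacement sequence described above can be sketched as a script. This is a hedged outline only: the job IDs, template path, and worker counts are hypothetical placeholders, and the real orchestration scripts live in the GitHub repository referenced in this post.

```shell
#!/usr/bin/env bash
# Sketch of a no-pause pipeline replacement using in-flight updates.
# All IDs, the template path, and worker counts are hypothetical.
set -euo pipefail

REGION="us-central1"
OLD_JOB_ID="OLD_JOB_ID"
CURRENT_WORKERS=25   # observed worker count of the running job

# 1. Pin the running job to its current size so it cannot scale
#    during the handover (min == max effectively disables autoscaling).
gcloud dataflow jobs update-options "${OLD_JOB_ID}" \
    --region="${REGION}" \
    --min-num-workers="${CURRENT_WORKERS}" \
    --max-num-workers="${CURRENT_WORKERS}"

# 2. Launch the replacement on the same Pub/Sub subscription,
#    pinned to the same worker count.
NEW_JOB_ID=$(gcloud dataflow jobs run my-pipeline-v2 \
    --region="${REGION}" \
    --gcs-location=gs://my-bucket/templates/my-template \
    --num-workers="${CURRENT_WORKERS}" \
    --max-workers="${CURRENT_WORKERS}" \
    --format="value(id)")

# 3. Once the new job is processing messages, drain the old one.
gcloud dataflow jobs drain "${OLD_JOB_ID}" --region="${REGION}"

# 4. Restore the desired autoscaling range on the new production job.
gcloud dataflow jobs update-options "${NEW_JOB_ID}" \
    --region="${REGION}" \
    --min-num-workers=5 \
    --max-num-workers=100
```

Step 3 in practice should wait for the new job to report progress before draining; the sketch omits that polling for brevity.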
Streaming data analytics real time across hundreds of millions of mailboxes is key for Yahoo and we are using the simplicity and performance of Google’s Dataflow to make that a reality.” – Aaron Lake, SVP & CIO, Yahoo

You can see the source code of sample scripts to orchestrate a no-latency pipeline replacement, along with a simple test pipeline, in this GitHub repository. 

What’s next

Autoscaling live streaming pipelines is important for achieving low-latency guarantees and meeting cost requirements. Doing it right can be challenging. That’s where the Dataflow Streaming Engine comes in. Many autoscaling features are now available to all Streaming Engine users. With in-flight job updates, our users get an additional tool to fine-tune autoscaling for their requirements. Stay tuned for future updates, and learn more by contacting the Google Cloud Sales team.
Source: Google Cloud Platform

Accelerate your cloud transformation with Delivery Navigator

Our goal at Google Cloud Consulting is to make it easy for our partners and customers to transform, create, and innovate on Google Cloud. Too often we find that cloud migrations and transformations aren’t as efficient as they could be because the latest and greatest tools, techniques, and ways of working aren’t known or easily accessible. This takes away from our partners’ and customers’ ability to focus on what’s important to their businesses. Now imagine a world where that cloud transformation expertise, and other leading technical practices, is in a single place, at your fingertips, whenever you need it. That’s why today we’re announcing that we’re opening up our internal, integrated platform for delivering cloud projects, called Delivery Navigator, to our partners. You can learn more about the product in our 90-second overview video here.

Created by uniting Google technology and methodologies

We started building Delivery Navigator almost two years ago as a way for our practitioners to create consistent, repeatable, agile, high-quality experiences for our customers. By uniting our technology and our implementation methodologies based on thousands of projects, we’re providing our partners with the same methods and assets, so they can accelerate delivery readiness with customers. Specifically, we’re bringing together a library of transformation methods with project-management tool integration and telemetry, all supported by helpful features that leverage our in-house generative AI technology. Like many Google products, we believe that if we focus on the user, everything else will follow. This includes our ecosystem of delivery partners who deliver value to our customers every day. We know first-hand how difficult it can be to build momentum for your transformation when valuable time is being spent looking for the right template or tracking project hygiene. 
We also understand that each of our customers may experience these transformations differently, with different industry standards leading to variations in delivery approaches, nomenclature, and scoping estimates. Delivery Navigator aims to keep practitioners focused on driving creative solutions, innovation, and other value-added customer outcomes by:
- Compiling standards, technical knowledge, and leading delivery practices: We want to save your teams time by making it easy to find standard, reusable delivery methods and code snippets for everything from establishing Cloud Foundations to building a Data Science Development Platform, reducing variability in scoping estimates and the need to start from scratch. We also think it’s important to establish a common vocabulary when talking about scope and deliverables with our partners and customers.
- Providing helpful project telemetry, so you can keep your team on track: We want to help you mitigate delivery risk by enabling timely project visibility across key health performance indicators, while reducing the toil of generating a regular project status by using our standardized metrics and reports.
- Integrating with your project management tools: Day-to-day, we want you to be able to manage your project, your way. Standard Delivery Navigator methods are designed to connect to popular project management tools such as Asana, Jira, and Smartsheet, along with project health and status integration. We recognize every customer has their own tooling for project management, so we have built the solution to allow that to continue.

Help us build a new cloud methodology community

We believe what’s good for cloud adoption, and the ecosystem at large, is good for us all. Our goal is to co-create value and great experiences for our customers, faster. Our vision is to ensure Delivery Navigator becomes a vibrant cloud delivery methodology community that includes our partners, and eventually our customers, too. 
We see the platform as a differentiated opportunity for Google, our partners, and our customers to come together, collaborate, share ideas, and drive continuous improvement into the cloud ecosystem. While the platform will initially contain a portion of our delivery knowledge, if you believe you have more to offer, we’d love to talk to you about contributing to the breadth and depth of the content. Delivery Navigator will first open to partners through our public preview launch, scheduled for early Q4. You can learn more about Delivery Navigator and subsequent product launch phases on our Partner Advantage portal, or join our broader Partner Advantage program as a new user here.
Source: Google Cloud Platform

Welcome to Google Cloud Next ’23

Editor’s note: Content updated at 9am PT to reflect announcements made on stage in the opening keynote at Google Cloud Next ’23.

This week, Google Cloud will welcome thousands of people to San Francisco for our first in-person Google Cloud Next event since 2019. I am incredibly excited to bring so many of our customers and partners together to showcase the amazing innovations we have been working on across our entire portfolio of Infrastructure, Data and AI, Workspace Collaboration, and Cybersecurity solutions. 

It’s been an exciting year so far for Google Cloud. We’ve achieved some noteworthy milestones, including, in Q2 2023, reaching a $32B annual revenue run rate and seeing our second quarter of profitability, which is all based on the success of our customers across every industry. This year, we have shared some incredible stories about how we are working with leading organizations like Culture Amp, Deutsche Börse, eDreams ODIGEO, HSBC, IHOP, IPG Mediabrands, John Lewis Partnership, The Knot Worldwide, Macquarie Bank, Mayo Clinic, Priceline, Shopify, the Singapore Government, U.S. Steel, and Wendy’s. Today, we are announcing new or expanded relationships with The Estée Lauder Companies, FOX Sports, GE Appliances, General Motors, HCA Healthcare, and more. I’d like to thank all of these customers and the millions of others around the world for trusting us as they progress on their digital transformation journeys.

Today at Google Cloud Next ’23, we’re proud to announce new ways we’re helping every business, government, and user benefit from generative AI and leading cloud technologies, including: 
- AI-optimized Infrastructure: The most advanced AI-optimized infrastructure for companies to train and serve models. We offer this infrastructure in our cloud regions, to run in your data centers with Google Distributed Cloud, and on the edge. 
- Vertex AI: Developer tools to build models and AI-powered applications, with major advancements to Vertex AI for creating custom models and building custom Search and Conversation apps with enterprise data.
- Duet AI: Duet AI is an always-on AI collaborator that is deeply integrated in Google Workspace and Google Cloud. Duet AI in Workspace gives every user a writing helper, a spreadsheet expert, a project manager, a note taker for meetings, and a creative visual designer, and is now generally available. Duet AI in Google Cloud collaborates like an expert coder, a software reliability engineer, a database pro, an expert data analyst, and a cybersecurity adviser; it is expanding its preview and will be generally available later this year.
- Many more significant announcements across Developer Tools, Data, Security, Sustainability, and our fast-growing cloud ecosystem.

New infrastructure and tools to help customers

The advanced capabilities and broad applications that make gen AI so revolutionary demand the most sophisticated and capable infrastructure. We have been investing in our data centers and network for 25 years, and now have a global network of 38 cloud regions, with a goal to operate entirely on carbon-free energy 24/7 by 2030. Our AI-optimized infrastructure is a leading choice for training and serving gen AI models. In fact, more than 70% of gen AI unicorns are Google Cloud customers, including AI21, Anthropic, Cohere, Jasper, MosaicML, Replit, Runway, and Typeface; and more than half of all funded gen AI startups are Google Cloud customers, including companies like Copy.ai, CoRover, Elemental Cognition, Fiddler AI, Fireworks.ai, PromptlyAI, Quora, Synthesized, Writer, and many others.

Today we are announcing key infrastructure advancements to help customers, including:
- Cloud TPU v5e: Our most cost-efficient, versatile, and scalable purpose-built AI accelerator to date. 
Now, customers can use a single Cloud TPU platform to run both large-scale AI training and inference. Cloud TPU v5e scales to tens of thousands of chips and is optimized for efficiency. Compared to Cloud TPU v4, it provides up to a 2x improvement in training performance per dollar and up to a 2.5x improvement in inference performance per dollar.
- A3 VMs with NVIDIA H100 GPUs: Our A3 VMs, powered by NVIDIA’s H100 GPU, will be generally available next month. The A3 is purpose-built with high-performance networking and other advances to enable today’s most demanding gen AI and large language model (LLM) innovations, allowing organizations to achieve three times better training performance over the prior-generation A2.
- GKE Enterprise: This enables multi-cluster horizontal scaling, required for the most demanding, mission-critical AI/ML workloads. Customers are already seeing productivity gains of 45%, while decreasing software deployment times by more than 70%. Starting today, the benefits that come with GKE, including autoscaling, workload orchestration, and automatic upgrades, are now available with Cloud TPU v5e.
- Cross-Cloud Network: A global networking platform that helps customers connect and secure applications across clouds. It is open, workload-optimized, and offers ML-powered security to deliver zero trust. Designed to enable customers to gain access to Google services more easily from any cloud, Cross-Cloud Network reduces network latency by up to 35%.
- Google Distributed Cloud: Designed to meet the unique demands of organizations that want to run workloads at the edge or in their data center. In addition to next-generation hardware and new security capabilities, we’re also enhancing the GDC portfolio to bring AI to the edge, with Vertex AI integrations and a new managed offering of AlloyDB Omni on GDC Hosted.  
Our Vertex AI platform gets even better

On top of our world-class infrastructure, we deliver what we believe is the most comprehensive AI platform — Vertex AI — which enables customers to build, deploy, and scale machine learning (ML) models. We have seen tremendous usage, with the number of gen AI customer projects growing more than 150 times from April to July this year.

Customers have access to more than 100 foundation models, including third-party and popular open-source versions, in our Model Garden. They are all optimized for different tasks and different sizes, including text, chat, images, speech, software code, and more. We also offer industry-specific models like Sec-PaLM 2 for cybersecurity, to empower global security providers like Broadcom and Tenable; and Med-PaLM 2 to assist leading healthcare and life sciences companies including Bayer Pharmaceuticals, HCA Healthcare, and Meditech.

Vertex AI Search and Conversation are now generally available, enabling organizations to create Search and Chat applications using their data in just minutes, with minimal coding and enterprise-grade management and security built in. In addition, Vertex AI Generative AI Studio provides user-friendly tools to tune and customize models, all with enterprise-grade controls for data security. These include developer tools like the Text Embeddings API, which lets developers build sophisticated applications based on semantic understanding of text or images, and Reinforcement Learning from Human Feedback (RLHF), which incorporates human feedback to deeply customize and improve model performance.

Today, we're excited to announce several new models and tooling in the Vertex AI platform:

PaLM 2, Imagen and Codey Upgrades: We're updating PaLM 2 to 32k context windows so enterprises can easily process longer-form documents like research papers and books.
We're also improving Imagen's visual appeal and extending support for new languages in Codey.

Tools for tuning: For PaLM 2 and Codey, we're making adapter tuning generally available and available in preview, respectively, which can help improve LLM performance with as few as 100 examples. We're also introducing a new method of tuning for Imagen, called Style Tuning, so enterprises can create images aligned to their specific brand guidelines or other creative needs with a small number of reference images.

New models: We're announcing availability of Llama 2 and Code Llama from Meta, and the Technology Innovation Institute's Falcon LLM, a popular open-source model, as well as pre-announcing Claude 2 from Anthropic. In the case of Llama 2, we will be the only cloud provider offering both adapter tuning and RLHF.

Vertex AI extensions: Developers can access, build, and manage extensions that deliver real-time information, incorporate company data, and take action on the user's behalf. This opens up endless new possibilities for gen AI applications that can operate as an extension of your enterprise, enabled by the ability to access proprietary information and take action on third-party platforms like your CRM system or email.

Grounding: We are announcing an enterprise grounding service that works across Vertex AI foundation models, Search, and Conversation, giving customers the ability to ground responses in their own enterprise data to deliver more accurate answers. We are also working with a few early customers to test grounding with the technology that powers Google Search.

Digital Watermarking on Vertex AI: Powered by Google DeepMind SynthID, this offers state-of-the-art technology that embeds a watermark directly into the image pixels, making it invisible to the human eye and difficult to tamper with. Digital watermarking provides customers with a scalable approach to creating and identifying AI-generated images responsibly.
We are the first hyperscale cloud provider to offer this technology for AI-generated images.

Colab Enterprise: This managed service combines the ease of use of Google's Colab notebooks with enterprise-level security and compliance capabilities. Data scientists can use Colab Enterprise to collaboratively accelerate AI workflows with access to the full range of Vertex AI platform capabilities, integration with BigQuery, and even code completion and generation.

Equally important to discovering and training the right model is controlling your data. From the beginning, we designed Vertex AI to give you full control and segregation of your data, code, and IP, with zero data leakage. When you customize and train your model with Vertex AI — with private documents and data from your SaaS applications, databases, or other proprietary sources — you are not exposing that data to the foundation model. We take a snapshot of the model, allowing you to train and encapsulate it together in a private configuration, giving you complete control over your data. Your prompts and data, as well as user inputs at inference time, are not used to improve our models and are not accessible to other customers.

Duet AI in Workspace and Google Cloud

We unveiled Duet AI at I/O in May, introducing powerful new features across Workspace and showcasing developer features such as code and chat assistance in Google Cloud. Since then, trusted testers around the world have experienced the power of Duet AI while we worked on expanding capabilities and integrating it across a wide range of products and services throughout Workspace and Google Cloud.

Let's start with Workspace, the world's most popular productivity tool, with more than 3 billion users and more than 10 million paying customers who rely on it every day to get things done.
With the introduction of Duet AI just a few months ago, we delivered a number of features to make your teams more productive, like helping you write and refine content in Gmail and Google Docs, create original images in Google Slides, turn ideas into action and data into insights with Google Sheets, foster more meaningful connections in Google Meet, and more. Since then, thousands of companies and more than a million trusted testers have used Duet AI as a powerful collaboration partner — a coach, source of inspiration, and productivity booster — all while helping to ensure every user and organization has control over their data. Today, we are introducing a number of new enhancements:

Duet AI in Google Meet: Duet AI will take notes during video calls, send meeting summaries, and even automatically translate captions in 18 languages. In addition, to ensure every meeting participant is clearly seen, heard, and understood, Duet AI in Meet now offers studio look, studio lighting, and studio sound.

Duet AI in Google Chat: You'll be able to chat directly with Duet AI to ask questions about your content, get a summary of documents shared in a space, and catch up on missed conversations. We've also delivered a refreshed user interface, new shortcuts, and enhanced search to help you stay on top of conversations, as well as huddles in Chat, which let teams start meetings from the place where they are already collaborating.

Workspace customers of all sizes and from all industries are using Duet AI and seeing improvements in customer experience, productivity, and efficiency. Instacart is creating enhanced customer service workflows, and industrial technology company Trimble can now deliver solutions faster to its clients. Adore Me, Uniformed Services University, and Thoughtworks are increasing productivity by using Duet AI to quickly write content such as emails, campaign briefs, and project plans with just a simple prompt.
Today, we are making Duet AI in Google Workspace generally available, while expanding the preview capabilities of Duet AI in Google Cloud, with general availability coming later this year. Beyond Workspace, Duet AI can now provide AI assistance across a wide range of Google Cloud products and services — as a coding assistant to help developers code faster, as an expert adviser to help operators quickly troubleshoot application and infrastructure issues, as a data analyst to provide quick and better insights, and as a security adviser to recommend best practices to help prevent cyber threats.

Customers are already realizing value from Duet AI in Google Cloud: L'Oréal is able to achieve better and faster business decisions from their data, and Turing, in early testing, is reporting engineering productivity gains of one-third.

Our Duet AI in Google Cloud announcements include advancements for:

Software development: Duet AI provides expert assistance across your entire software development lifecycle, enabling developers to stay in a flow state longer by minimizing context switching to help them be more productive. In addition to code completion and code generation, it can help you modernize applications faster by assisting you with code refactoring; and by using Duet AI in Apigee, any developer can now easily build APIs and integrations using simple natural language prompts.

Application and infrastructure operations: Operators can chat with Duet AI in natural language across a number of services directly in the Google Cloud Console to quickly retrieve "how to" information about infrastructure configuration, deployment best practices, and expert recommendations on cost and performance optimization.
Data Analytics: Duet AI in BigQuery provides contextual assistance for writing SQL queries as well as Python code, generates full functions and code blocks, auto-suggests code completions, explains SQL statements in natural language, and can generate recommendations based on your schema and metadata. These capabilities allow data teams to focus more on outcomes for the business.

Accelerating and modernizing databases: Duet AI in Cloud Spanner, AlloyDB, and Cloud SQL helps generate code to structure, modify, or query data using natural language. We're also bringing the power of Duet AI to Database Migration Service (DMS), helping automate the conversion of database code, such as stored procedures, functions, triggers, and packages, that could not be converted with traditional translation technologies.

Security Operations: We are bringing Duet AI to our security products, including Chronicle Security Operations, Mandiant Threat Intelligence, and Security Command Center, which can empower security professionals to more efficiently prevent threats, reduce toil in security workflows, and uplevel security talent.

Duet AI delivers contextual recommendations from PaLM 2 LLM models and expert guidance, trained and tuned with Google Cloud-specific content, such as documentation, sample code, and Google Cloud best practices. In addition, Duet AI was designed using Google's comprehensive approach to help protect customers' security and privacy, as well as our AI principles. With Duet AI, your data is your data. Your code, your inputs to Duet AI, and the recommendations generated by Duet AI will not be used to train any shared models nor used to develop any products.

Simplify analytics at scale with a unified data and AI foundation

Data sits at the center of gen AI, which is why we are bringing new capabilities to Google's Data and AI Cloud that will help unlock new insights and boost productivity for data teams.
In addition to the launch of Duet AI, which assists data engineers and data analysts across BigQuery, Looker, Spanner, Dataplex, and our database migration tools, we have several other important announcements today in data and analytics:

BigQuery Studio: A single interface for data engineering, analytics, and predictive analysis, BigQuery Studio helps increase efficiency for data teams. In addition, with new integrations to Vertex AI foundation models, we are helping organizations AI-enable their data lakehouse with innovations for cross-cloud analytics, governance, and secure data sharing.

AlloyDB AI: Today we're introducing AlloyDB AI, an integral part of AlloyDB, our PostgreSQL-compatible database service. AlloyDB AI offers an integrated set of capabilities for easily building gen AI apps, including high-performance vector queries that are up to 10x faster than standard PostgreSQL. In addition, with AlloyDB Omni, you can run AlloyDB virtually everywhere: on-premises, on Google Cloud, AWS, or Azure, or through Google Distributed Cloud.

Data Cloud Partners: Our open data ecosystem is an asset for customers' gen AI strategies, and we're continuing to expand the breadth of partner solutions and datasets available on Google Cloud. Our partners, like Confluent, DataRobot, Dataiku, Datastax, Elastic, MongoDB, Neo4j, Redis, SingleStore, and Starburst, are all launching new capabilities to help customers accelerate and enhance gen AI development with data. Our partners are also adding more datasets to Analytics Hub, which customers can use to build and train gen AI models. This includes trusted data from Acxiom, Bloomberg, TransUnion, ZoomInfo, and more.

These innovations help organizations harness the full potential of data and AI through a unified data foundation. With Google Cloud, companies can now run their data anywhere and bring AI and machine learning tools directly to their data, which can lower the risk and cost of data movement.
Addressing top security challenges

Google Cloud is the only leading security provider that brings together the essential combination of frontline intelligence and expertise, a modern SecOps platform, and a trusted cloud foundation, all infused with the power of gen AI, to help drive the security outcomes you're looking to achieve. Earlier this year, we introduced Security AI Workbench, an industry-first extensible platform powered by our next-generation security LLM, Sec-PaLM 2, which incorporates Google's unique visibility into the evolving threat landscape and is fine-tuned for cybersecurity operations. And just a few weeks ago, we announced Chronicle CyberShield, a security operations solution that allows governments to break down information silos, centralize security data to help strengthen national situational awareness, and initiate a united response.

In addition to the Duet AI innovations mentioned earlier, today we are also announcing:

Mandiant Hunt for Chronicle: This service integrates the latest insights into attacker behavior from Mandiant's frontline experts with Chronicle Security Operations' ability to quickly analyze and search security data, helping customers gain elite-level support without the burden of hiring, tooling, and training.

Agentless vulnerability scanning: These posture management capabilities in Security Command Center detect operating system, software, and network vulnerabilities on Compute Engine virtual machines.
Network security advancements: Cloud Firewall Plus adds advanced threat protection and next-generation firewall (NGFW) capabilities to our distributed firewall service, powered by Palo Alto Networks; and Network Service Integration Manager allows network admins to easily integrate trusted third-party NGFW virtual appliances for traffic inspection.

Assured Workloads Japan Regions: Customers can have controlled environments that enforce data residency in our Japanese regions, options for local control of encryption keys, and administrative access transparency. We also continue to grow our Regulated and Sovereignty solutions partner initiative to bring innovative third-party solutions to customers' regulated cloud environments.

Expanding our ecosystem

Our ecosystem is already delivering real-world value for businesses with gen AI, and bringing new capabilities, powered by Google Cloud, to millions of users worldwide. Partners are also using Vertex AI to build their own features for customers, including Box, Canva, Salesforce, UKG, and many others. Today at Next '23, we're announcing:

DocuSign is working with Google to pilot how Vertex AI could be used to help generate smart contract assistants that can summarize, explain, and answer questions about what's in complex contracts and other documents.

SAP is working with us to build new solutions utilizing SAP data and Vertex AI that will help enterprises apply gen AI to important business use cases, like streamlining automotive manufacturing or improving sustainability.

Workday's applications for Finance and HR are now live on Google Cloud, and the company is working with us to develop new gen AI capabilities within the flow of Workday, as part of its multicloud strategy.
This includes the ability to generate high-quality job descriptions and to bring Google Cloud gen AI to app developers via the skills API in Workday Extend, while helping to ensure the highest levels of data security and governance for customers' most sensitive information.

In addition, many of the world's largest consulting firms, including Accenture, Capgemini, Deloitte, and Wipro, have collectively planned to train more than 150,000 experts to help customers implement Google Cloud gen AI.

We are in an entirely new era of digital transformation, fueled by gen AI. This technology is already improving how businesses operate and how humans interact with one another. It's changing the way doctors care for patients, the way people communicate, and even the way workers are kept safe on the job. And this is just the beginning.

Together, we are creating a new way to cloud. We are grateful for the opportunity to be on this journey with our customers. Thank you for your partnership, and have a wonderful Google Cloud Next '23.
Source: Google Cloud Platform

Celebrating the winners of the 2023 Google Cloud Customer Awards

It's that time once again, when we announce the winners of our Google Cloud Customer Awards. These awards celebrate organizations around the world that are turning inspiring ideas into exciting realities. Whether it's social enterprise experts like Singapore's FairPrice Group, transformational talent backers like Ford Motor Company, environmental leaders like SAP, or diversity, equity, and inclusion game-changers like COTA, we are honored to celebrate innovators who are building new ways forward with AI, data, infrastructure, collaboration, and security technologies in the cloud.

This year, AI has demonstrated significant potential to help companies innovate and become more efficient. Google Cloud is supporting its customers in this ambition, like one of this year's industry winners, Carrefour Belgium, which is using Google Cloud AI tools to extract value from its operational data to accelerate insights. AI research and development firm Kakao Brain in South Korea, meanwhile, is using Google Cloud's AI/ML infrastructure to underpin the generative AI services it provides to its customers.

Recognizing innovative thinking

Just like last year, we received a tremendous number of entries for the awards, which a panel of senior Google Cloud executives independently assessed according to select criteria. Specifically, judges looked for real-world metrics, examples of innovative thinking, and outstanding business transformation results. Regardless of who won, every organization that submitted an entry should be proud of what they have achieved with cloud technologies.

Google Cloud Customer Awards are given to companies from around the globe and across a number of industries, such as healthcare and life sciences, financial services, and government, that are using Google Cloud technologies to improve their operations and their environmental, social, and governance (ESG) measures.
Congratulations to all the winners!

Technology for Good Awards

Sustainability

Environmental impact is a key priority for our global customers. We are excited to see growing momentum around implementing sustainability-focused solutions, which we introduced last year as a category in our Technology for Good Awards. Our Sustainability Customer Awards recognize customers with new and innovative solutions to accelerate sustainability within their own organizations and drive meaningful climate action. The winning team from the New York State Department of Environmental Conservation, who used Google Cloud tools like BigQuery to implement the mobile monitoring of air quality and greenhouse gas emissions, shows one inspiring way this can be done.

Diversity, Equity, and Inclusion

In an era when technology and data are reshaping the world, the customers who won our DEI Customer Awards distinguished themselves by their commitment to using cloud tools to promote economic mobility and representation for historically underrepresented communities. These organizations and their partners are leveraging the power of data and AI to transform and strengthen representation, progression, retention, and the inclusion of underserved or underrepresented groups in their organizations. By making a difference to their communities, they're also leading the way for other organizations to drive toward a more equitable world.

Social Impact

The Social Impact Customer Award winners made a positive impact with technology projects that cultivated inclusion, openness, and community support. In a time of economic and climate uncertainty, these customers used Google Cloud solutions to create positive change at the scale the world critically needs.
From government agencies encouraging public input on transportation planning, to supermarkets partnering with food banks, we applaud them for the work they are doing to improve their communities.

Talent Transformation

When it comes to fostering digital skills for all employees, some of the world's most recognizable brands are leading the way. This includes our Talent Transformation Customer Award winners like General Motors, DataLab, and EFX, who are empowering their workforces with hands-on learning opportunities to boost their technology skills. With many countries facing a critical gap in technological capabilities, this kind of work is important not only to drive long-term business success, but also to improve the lives and careers of employees.

Industry Customer Awards

Communications and Service Providers (CSP)

With our CSP Customer Awards, we are proud to recognize leading companies in the telecommunications sector who are finding new ways to improve customer experience. Whether it's leveraging Google Cloud tools like Dataflow to process millions of real-time records every hour, or using BigQuery to assess performance and optimize data, these winners are getting creative with the cloud to scale and meet the needs of their customers.

Cross-Industry

Customers in the Cross-Industry category demonstrated innovation across multiple verticals. One of this year's winners, cybersecurity firm Palo Alto Networks, built its cloud-first security platform, ADEM (Autonomous Digital Experience Management), on Google Cloud. ADEM is a digital experience management platform that helps Palo Alto Networks customers proactively monitor and manage infrastructure, system, and application issues. ADEM has increased security visibility across networks, applications, and devices, ultimately reducing ticket escalations by 46%.

Education

With schools racing to adapt to new ways of learning, winners of our Education Customer Awards are using cloud technologies to make education accessible.
This year, educational institutions like FMU in Brazil are using Google Cloud to exponentially increase the number of students they are able to reach, while the Salk Institute for Biological Studies in San Diego unlocked entirely new areas of scientific enquiry by mining its data more efficiently. Both institutions demonstrated what a dramatic impact cloud technology can have on the world of learning.

Financial Services

We received hundreds of entries from every geography around the world for the Financial Services Award category, reflecting the high standard of business excellence in this industry. Financial services firms who won these awards undertook a number of successful projects, ranging from launching new apps and features to take customers' experiences to the next level, to leading complex migrations and business transformations, to using automation to strengthen security.

Government

Government organizations often ask themselves the same question: How can we better serve our citizens? It's this people-centric mindset, combined with data-driven solutions and secure cloud platforms, that enabled these Government Award winners to accomplish their missions this year. More than ever before, governments are turning to cloud technologies to collaborate internally and with their citizens to support the people they serve in a more agile and helpful way.

Healthcare & Life Sciences

Healthcare is a sector that creates extraordinary levels of innovation, and our Healthcare & Life Sciences Customer Award winner showed how far it can push scientific boundaries. COTA is working with Google Cloud to transform the raw, unstructured data in electronic health records into a usable format that is driving a new era of data-driven cancer care. Google Cloud is proud to partner with COTA, which is saving lives by speeding up medical breakthroughs.

Manufacturing

Manufacturers, particularly in the automotive vertical, are undergoing a sea change toward more climate-focused solutions.
A good example is our Manufacturing Customer Award winner, Jaguar Land Rover (JLR), which is digitally transforming by investing in vehicle electrification and advanced autonomous driving. JLR has used Google Cloud solutions to help it understand and manage supply chain shortages in critical EV components, such as semiconductor chips, so that it can continue to deliver electric vehicles to a growing list of customers.

Media & Entertainment

By using cloud technologies, including AI/ML, data analytics, and more, our Media & Entertainment Award winners are modernizing content production and reinventing audience experiences with engaging and personalized insights. Combined with AI infrastructure like TPUs, customers like Kakao Mobility can greatly accelerate ML insights at lower cost. Creating a more meaningful connection with viewers is one of this industry's fundamental goals, and these customers are achieving it.

Retail

Our Retail Customer Award winners are facing a shopping world that has shifted from ecommerce (during the pandemic) to today's omnichannel reality. By using the cloud to enable enhanced and seamless services, businesses like Carrefour Belgium, FairPrice Group, and Schnucks Markets are delighting their customers with highly personalized online and in-person shopping experiences made possible by cloud AI.

Supply Chain & Logistics

Innovation is critical in the supply chain and logistics sector. One of our Customer Award winners, the Finnish accounting software firm Snowfox, is taking advantage of the serverless nature of Google Cloud to automate the processing of its clients' invoices. Snowfox has gone even further by setting up Carbonfox, which uses AI to calculate its customers' carbon emissions, proving that supply chain and sustainability can go hand in hand.

What connects these Google Cloud customers? They're all building new ways forward with the cloud, whether it's improving access to education, personalizing their customers' experiences, or saving lives.
We’re proud to serve customers in more than 200 countries and territories, and we’ll continue to help them forge new ways with ground-breaking technology, industry expertise, and relentless optimism. Discover today how customers are transforming their business through Google Cloud.
Source: Google Cloud Platform

Introducing new SQL functions to manipulate your JSON data in BigQuery

Enterprises are generating data at an exponential rate, spanning traditional structured transactional data, semi-structured data like JSON, and unstructured data like images and audio. Beyond the scale of the data, these divergent types present processing challenges for developers, at times requiring separate processing flows for each. With its initial release, BigQuery's support for semi-structured JSON eliminated the need for such complex preprocessing, providing schema flexibility, intuitive querying, and the scalability benefits afforded to structured data.

Today, we are excited to announce the release of new SQL functions for JSON in BigQuery, extending the power and flexibility of our core JSON support. These functions make it even easier to extract and construct JSON data and perform complex data analysis. With these new query functions, you can:

Convert JSON values into primitive types (INT64, FLOAT64, BOOL, and STRING) in an easier and more flexible way with the new lax conversion functions.

Easily update and modify an existing JSON value in BigQuery with the new JSON mutator functions.

Construct JSON objects and JSON arrays with SQL in BigQuery with the new JSON constructor functions.

Let's review these new features and some examples of how to use them.
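To build intuition for how lax conversion differs from strict conversion before we get to the SQL, here is a small Python sketch of the idea. The function names mirror BigQuery's, but the conversion rules shown are a simplified illustration, not BigQuery's exact semantics:

```python
import json

def strict_int64(value):
    # Strict semantics: the JSON value must already be an integer.
    # (bool is excluded explicitly because bool is a subclass of int in Python.)
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("The provided JSON input is not an integer")
    return value

def lax_int64(value):
    # Lax semantics (simplified sketch): also accept bools, round floats,
    # parse numeric strings, and map JSON null or junk to None (SQL NULL).
    if value is None:
        return None
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return round(value)
    if isinstance(value, str):
        try:
            return round(float(value))
        except ValueError:
            return None
    return None

user = json.loads('{"name": "Bob", "age": "40"}')
print(lax_int64(user["age"]))  # prints 40
```

The key design point is that the lax variant never raises on a value of the "wrong" type; it either coerces or yields NULL, which is what makes it safe over heterogeneous JSON columns like the one in the demo table below.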
First, we will create a table for demonstration.

```sql
CREATE TABLE dataset_name.users_sample AS (
  SELECT JSON '{"name": "Alice", "age": 28, "address": {"country": "USA", "city": "SF", "zipcode": 94102}}' AS user UNION ALL
  SELECT JSON '{"name": "Bob", "age": "40", "address": {"country": "Germany"}}' UNION ALL
  SELECT JSON '{"name": "Charlie", "age": null, "address": {"zipcode": 12356, "country": null}}'
);

-- Table contents
SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);
```

Output:

```
+-----------------------------------------------------------------------------------+
| user                                                                              |
+-----------------------------------------------------------------------------------+
| {"address":{"city":"SF","country":"USA","zipcode":94102},"age":28,"name":"Alice"} |
| {"address":{"country":"Germany"},"age":"40","name":"Bob"}                         |
| {"address":{"country":null,"zipcode":12356},"age":null,"name":"Charlie"}          |
+-----------------------------------------------------------------------------------+
```

Great! Let's say we want to get a list of all users over 30. Looking at the table, you will see that user.age contains a JSON number in the first record, a JSON string in the second, and a JSON null in the third. With the powerful new LAX_INT64 function, all three types are automatically inferred and processed correctly.

```sql
SELECT user.name FROM dataset_name.users_sample
WHERE LAX_INT64(user.age) > 30;
```

Output:

```
+-------+
| name  |
+-------+
| "Bob" |
+-------+
```

Unlike the "strict" conversion functions, which require that the JSON type match the primitive type exactly, the "lax" conversion functions will also handle conversions between mismatched data types.
For example, the strict conversion function below returns an error:

```sql
SELECT INT64(JSON '"10"') AS strict_int64
```

Output:

```
Error: The provided JSON input is not an integer
```

However, the lax conversion function below returns the desired result:

```sql
SELECT LAX_INT64(JSON '"10"') AS lax_int64
```

Output:

```
+-----------+
| lax_int64 |
+-----------+
| 10        |
+-----------+
```

Furthermore, you can quickly and easily remove a field from the JSON data by using the JSON_REMOVE function.

```sql
UPDATE dataset_name.users_sample
SET user = JSON_REMOVE(user, "$.address.zipcode")
WHERE true
```

After this query, running SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name); returns:

```
+-------------------------------------------------------------------+
| user                                                              |
+-------------------------------------------------------------------+
| {"address":{"city":"SF","country":"USA"},"age":28,"name":"Alice"} |
| {"address":{"country":"Germany"},"age":"40","name":"Bob"}         |
| {"address":{"country":null},"age":null,"name":"Charlie"}          |
+-------------------------------------------------------------------+
```

JSON_STRIP_NULLS compresses the data by removing JSON nulls.
Although BigQuery null values impact neither performance nor storage cost, stripping them can be helpful for reducing data size during exports.

```sql
UPDATE dataset_name.users_sample
SET user = JSON_STRIP_NULLS(user, remove_empty=>true)
WHERE true
```

If you then run "SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);", you will receive the following output:

```
+-------------------------------------------------------------------+
| user                                                              |
+-------------------------------------------------------------------+
| {"address":{"city":"SF","country":"USA"},"age":28,"name":"Alice"} |
| {"address":{"country":"Germany"},"age":"40","name":"Bob"}         |
| {"name":"Charlie"}                                                |
+-------------------------------------------------------------------+
```

Now, what if we want to modify or add a field in the JSON data? You can update the data with the new JSON_SET function, and you can mix and match JSON functions together to achieve the desired result. For example, the query below adds a new field, "region_code", to the table. The value of the field will be "America" if the value of the "country" field is "USA", and "Other" otherwise.

```sql
-- Updating/adding a field is easy to do as well.
-- The structure will be created automatically (see the "Charlie" row).
UPDATE dataset_name.users_sample
SET user = JSON_SET(
  user,
  "$.address.region_code",
  IF(LAX_STRING(user.address.country) = "USA", "America", "Other"))
WHERE true
```

If you then run "SELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);", you will receive the following output:

```
+-------------------------------------------------------------------------------------------+
| user                                                                                      |
+-------------------------------------------------------------------------------------------+
| {"address":{"city":"SF","country":"USA","region_code":"America"},"age":28,"name":"Alice"} |
| {"address":{"country":"Germany","region_code":"Other"},"age":"40","name":"Bob"}           |
| {"address":{"region_code":"Other"},"name":"Charlie"}                                      |
+-------------------------------------------------------------------------------------------+
```

Last but not least, let's say you have a table of property/value pairs that you want to convert to a JSON object. With the new JSON_OBJECT constructor function, you can effortlessly create the new JSON object.

Query:

```sql
WITH Fruits AS (
  SELECT 0 AS id, 'color' AS k, 'Red' AS v UNION ALL
  SELECT 0, 'fruit', 'apple' UNION ALL
  SELECT 1, 'fruit', 'banana' UNION ALL
  SELECT 1, 'ripe', 'true'
)
SELECT JSON_OBJECT(ARRAY_AGG(k), ARRAY_AGG(v)) AS json_data
FROM Fruits
GROUP BY id
```

Output:

```
+----------------------------------+
| json_data                        |
+----------------------------------+
| {"color":"Red","fruit":"apple"}  |
| {"fruit":"banana","ripe":"true"} |
+----------------------------------+
```

Complete list of functions

Lax conversion functions: LAX_BOOL, LAX_INT64, LAX_FLOAT64, LAX_STRING

JSON constructor functions: JSON_ARRAY, JSON_OBJECT

JSON mutator functions: JSON_ARRAY_APPEND, JSON_ARRAY_INSERT, JSON_REMOVE, JSON_SET, JSON_STRIP_NULLS

Try it out!

Google BigQuery is constantly adding new features to make it easier and more powerful to analyze your data.
We encourage you to check them out and provide your feedback as we continue to develop additional features and capabilities to make working with JSON easier and faster over time.
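For intuition, the lax-conversion and null-stripping behaviors shown above can be approximated in ordinary Python. This is an illustrative sketch of the semantics, not BigQuery's actual implementation, and edge cases (rounding mode, overflow) may differ:

```python
import json
import math

def lax_int64(value):
    # Approximate LAX_INT64: JSON numbers and numeric strings convert to an
    # integer; JSON null, objects, arrays, and non-numeric strings yield None
    # (SQL NULL). BigQuery's exact rounding and overflow rules may differ.
    if isinstance(value, bool):
        return int(value)
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return round(value) if math.isfinite(value) else None
    if isinstance(value, str):
        try:
            return round(float(value))
        except ValueError:
            return None
    return None

def strip_nulls(value, remove_empty=False):
    # Approximate JSON_STRIP_NULLS: recursively drop JSON nulls and, when
    # remove_empty is set, drop objects/arrays that become empty as a result.
    if isinstance(value, dict):
        out = {k: strip_nulls(v, remove_empty) for k, v in value.items() if v is not None}
        return {k: v for k, v in out.items() if not (remove_empty and v in ({}, []))}
    if isinstance(value, list):
        out = [strip_nulls(v, remove_empty) for v in value if v is not None]
        return [v for v in out if not (remove_empty and v in ({}, []))]
    return value

users = [
    json.loads('{"name": "Alice", "age": 28, "address": {"country": "USA"}}'),
    json.loads('{"name": "Bob", "age": "40", "address": {"country": "Germany"}}'),
    json.loads('{"name": "Charlie", "age": null, "address": {"country": null}}'),
]

over_30 = [u["name"] for u in users if (lax_int64(u["age"]) or 0) > 30]
print(over_30)                                   # ['Bob']
print(strip_nulls(users[2], remove_empty=True))  # {'name': 'Charlie'}
```

The last two lines reproduce the "users over 30" and JSON_STRIP_NULLS results from the sample table above.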
Quelle: Google Cloud Platform

Building internet-scale event-driven applications with Cloud Spanner change streams

Since its launch, Cloud Spanner change streams has seen broad adoption by Spanner customers in healthcare, retail, financial services, and other industries. This blog post provides an overview of the latest updates to Cloud Spanner change streams and how they can be used to build event-driven applications.

A change stream watches for changes to your Spanner database (inserts, updates, and deletes) and streams out these changes in near real-time. One of the most common uses of change streams is replicating Spanner data to BigQuery for analytics. With change streams, it's as easy as writing data definition language (DDL) to create a change stream on the desired tables and configuring Dataflow to replicate these changes to BigQuery so that you can take advantage of BigQuery's advanced analytic capabilities.

Yet analytics is just the start of what change streams can enable. Pub/Sub and Apache Kafka are asynchronous and scalable messaging services that decouple the services that produce messages from the services that process those messages. With support for Pub/Sub and Apache Kafka, Spanner change streams now lets you use Spanner transactional data to build event-driven applications.

An example of an event-driven architecture is an order system that triggers inventory updates to an inventory management system whenever orders are placed. In this example, orders are saved in a table called order_items. Consequently, changes on this table will trigger events in the inventory system.
To create a change stream that tracks all changes made to order_items, run the following DDL statement:

```sql
CREATE CHANGE STREAM order_items_changes FOR order_items
```

Once the order_items_changes change stream is created, you can create event streaming pipelines to Pub/Sub and Kafka.

Creating an event streaming pipeline to Pub/Sub

The change streams Pub/Sub Dataflow template lets you create Dataflow jobs that send change events from Spanner to Pub/Sub and build these kinds of event streaming pipelines.

Once the Dataflow job is running, we can simulate inventory changes by inserting and updating order items in the Spanner database:

```sql
INSERT INTO order_items (order_item_id, order_id, article_id, quantity)
VALUES (
  '5fb2dcaa-2513-1337-9b50-cc4c56a06fda',
  'b79a2147-bf9a-4b66-9c7f-ab8bc6c38953',
  'f1d7f2f4-1337-4d08-a65e-525ec79a1417',
  5
);
```

```sql
UPDATE order_items
SET quantity = 10
WHERE order_item_id = '5fb2dcaa-2513-1337-9b50-cc4c56a06fda';
```

This causes two change records to be streamed out through Dataflow and published as messages to the given Pub/Sub topic: the first message contains the inventory insert, and the second contains the inventory update. From there, the data can be consumed using any of the many integration options Pub/Sub offers.

Creating an event streaming pipeline to Apache Kafka

In many event-driven architectures, Apache Kafka is the central event store and stream-processing platform. With our newly added Debezium-based Kafka connector, you can build event streaming pipelines with Spanner change streams and Apache Kafka.
The Kafka connector produces a change event for every insert, update, and delete. It groups the change event records for each Spanner table into a separate Kafka topic. Client applications then read the Kafka topics that correspond to the database tables of interest, and can react to every row-level event they receive from those topics.

The connector has built-in fault tolerance. As the connector reads changes and produces events, it records the last commit timestamp processed for each change stream partition. If the connector stops for any reason (e.g., communication failures, network problems, or crashes), it simply continues streaming records where it last left off once it restarts.

To learn more about the change streams connector for Kafka, see Build change streams connections to Kafka. You can download the change streams connector for Kafka from Debezium.

Fine-tuning your event messages with new value capture types

In the example above, the stream order_items_changes uses the default value capture type, OLD_AND_NEW_VALUES. This means that the change record includes both the old and new values of a row's modified columns, along with the primary key of the row. Sometimes, however, you don't need to capture all that change data. For this reason, we added two new value capture types: NEW_VALUES, which captures only the new values of a row's modified columns, and NEW_ROW, which captures the full new row.

To continue with our existing example, let's create another change stream that captures only the new values of changed columns. This is the value capture type with the lowest memory and storage footprint.

```sql
CREATE CHANGE STREAM order_items_changed_values
FOR order_items
WITH ( value_capture_type = 'NEW_VALUES' )
```

The DDL above creates a change stream using the PostgreSQL interface syntax.
Read Create and manage change streams to learn more about the DDL for creating change streams for both PostgreSQL and GoogleSQL Spanner databases.

Summary

With change streams, your Spanner data follows you wherever you need it, whether that's for analytics with BigQuery, for triggering events in downstream applications, or for compliance and archiving. And because change streams are built into Spanner, there's no software to install, and you get external consistency, high scale, and up to 99.999% availability.

With support for Pub/Sub and Kafka, Spanner change streams makes it easier than ever to build event-driven pipelines with whatever flexibility you need for your business.

- To get started with Spanner, create an instance, try it out for free, or take a Spanner Qwiklab
- To learn more about Spanner change streams, check out About change streams
- To learn more about the change streams Dataflow template for Pub/Sub, go to Cloud Spanner change streams to Pub/Sub template
- To learn more about the change streams connector for Kafka, go to Build change streams connections to Kafka
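To make the event-driven pattern concrete, here is a minimal sketch in Python of a handler that a Pub/Sub subscriber might invoke for each change record. The field names (tableName, modType, mods, keysJson, newValuesJson) are assumptions modeled on the Spanner data change record format, and the inventory "actions" are purely hypothetical; verify the exact schema against the output of the change streams Dataflow template before relying on it:

```python
import json

def handle_change_record(message_data: bytes) -> list:
    # Translate one change record (as it might arrive in a Pub/Sub message)
    # into inventory actions. Field names are assumptions based on the data
    # change record format; the action strings are illustrative only.
    record = json.loads(message_data)
    if record.get("tableName") != "order_items":
        return []
    actions = []
    for mod in record.get("mods", []):
        keys = json.loads(mod["keysJson"])
        values = json.loads(mod["newValuesJson"])
        if record["modType"] == "INSERT":
            actions.append(f"reserve {values['quantity']} of article {values['article_id']}")
        elif record["modType"] == "UPDATE":
            actions.append(f"adjust reservation {keys['order_item_id']} to {values['quantity']}")
        elif record["modType"] == "DELETE":
            actions.append(f"release reservation {keys['order_item_id']}")
    return actions

# Hypothetical message mirroring the INSERT from the example above:
insert_msg = json.dumps({
    "tableName": "order_items",
    "modType": "INSERT",
    "mods": [{
        "keysJson": json.dumps({"order_item_id": "5fb2dcaa-2513-1337-9b50-cc4c56a06fda"}),
        "newValuesJson": json.dumps({"article_id": "f1d7f2f4-1337-4d08-a65e-525ec79a1417",
                                     "quantity": 5}),
    }],
}).encode()

print(handle_change_record(insert_msg))
```

In a real pipeline, this function would be registered as the callback of a Pub/Sub subscriber (or a Kafka consumer loop) rather than called on a hand-built message.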
Quelle: Google Cloud Platform

Unlock insights faster from your MySQL data in BigQuery

Data practitioners know that relational databases are not designed for analytical queries. Data-driven organizations that connect their relational database infrastructure to their data warehouse get the best of both worlds: a production database unburdened by a barrage of analytical queries, and a data warehouse that is free to mine for insights without the fear of bringing down production applications. The remaining question is how to create a connection between two disparate systems with as little operational overhead as possible.

Dataflow Templates make connecting your MySQL database with BigQuery as simple as filling out a web form: no custom code to write, no infrastructure to manage. Dataflow is Google Cloud's serverless data processing service for batch and streaming workloads that makes data processing fast, autotuned, and cost-effective. Dataflow Templates are reusable snippets of code that define data pipelines; by using templates, a user doesn't have to worry about writing a custom Dataflow application. Google provides a catalog of templates that help automate common workflows and ETL use cases. This post will dive into how to schedule a recurring batch pipeline for replicating data from MySQL to BigQuery.

Launching a MySQL-to-BigQuery Dataflow Data Pipeline

For our pipeline, we will launch a Dataflow Data Pipeline. Data Pipelines let you schedule recurring batch jobs¹ and feature a suite of lifecycle management features for streaming jobs, which makes them an excellent starting point. We'll click on the "Create Data Pipeline" button at the top. We will select the MySQL to BigQuery pipeline.
As you can see, if your relational database is Postgres or SQL Server, we have templates for those systems as well.

The form will now expand to provide a list of parameters that will help execute the pipeline:

Required parameters

- Schedule: the recurring schedule for your pipeline (you can schedule hourly, daily, or weekly jobs, or define your own schedule with unix cron)
- Source: the URL connection string to connect to the JDBC source. If your database requires SSL certificates, you can append query strings that enable SSL mode and point to the GCS locations of the certificates. These can be encoded using Google Cloud Key Management Service.
- Target: the BigQuery output table
- Temp Bucket: the GCS bucket for staging files

Optional parameters

- JDBC source SQL query, if you want to replicate only a portion of the database
- Username & password, if your database requires authentication. You can also pass in an encoded string from Google Cloud KMS, if you desire.
- Partitioning parameters
- Dataflow-related parameters, including options to modify autoscaling, the number of workers, and other configurations related to the worker environment. If you require an SSL certificate and you have truststore and certificate files, use the "extra files to stage" parameter to pass in their respective locations.

Once you've entered your configurations, you are ready to hit the Create Pipeline button. Creating the pipeline will take you to the Pipeline Info screen, which shows a history of executions of the pipeline. This is a helpful view if you are looking for jobs that ran long, or identifying patterns that recur across multiple executions. You'll find a list of jobs related to the pipeline in a table view near the bottom of the page.
Clicking on one of those job IDs will allow you to inspect a specific execution in more detail.

The Dataflow monitoring experience features a job graph showing a visual representation of the pipeline you launched, and includes a logging panel at the bottom that displays logs collected from the job and workers. You will find information associated with the job in the right-hand panel, as well as several other tabs that let you understand your job's optimized execution, performance metrics, and cost.

Finally, you can go to the BigQuery SQL workspace to see your table written to its final destination. If you prefer a video walkthrough of this tutorial, you can find it here. You're all set to unlock value from your relational database, and it didn't take an entire team to set it up!

What's next

If your use case involves reading and writing changes in continuous mode, we recommend checking out our Datastream product, which serves change-data-capture and real-time replication use cases. If you prefer a solution based on open-source technology, you can also explore our Change Data Capture Dataflow template, which uses a Debezium connector to publish messages to Pub/Sub and then writes to BigQuery.

Happy Dataflowing!

1. If you do not need to run your job on a scheduled basis, we recommend using the "Create Job from Template" workflow, found on the "Jobs" page
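The Source parameter described above is a JDBC connection string, optionally extended with SSL-related query strings. Here is a small sketch of assembling one; the host, database name, and SSL option names are hypothetical examples, not values from this post, so check the MySQL Connector/J and Dataflow template documentation for the exact property names your driver expects:

```python
def jdbc_mysql_url(host: str, port: int, database: str, options: dict) -> str:
    # Build a JDBC connection string for the template's Source parameter.
    # The option names passed in are illustrative assumptions, not a
    # definitive list of supported driver properties.
    query = "&".join(f"{key}={value}" for key, value in options.items())
    return f"jdbc:mysql://{host}:{port}/{database}" + (f"?{query}" if query else "")

url = jdbc_mysql_url(
    "10.2.0.5", 3306, "sales",
    {"useSSL": "true", "trustCertificateKeyStoreUrl": "gs://my-bucket/truststore.jks"},
)
print(url)
```

The resulting string is what you would paste into the Source field of the Create Data Pipeline form.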
Quelle: Google Cloud Platform