Break down data silos with the new cross-cloud transfer feature of BigQuery Omni

To help customers break down data silos, we launched BigQuery Omni in 2021. Organizations globally are using BigQuery Omni to analyze data across cloud environments. Now, we are excited to launch the next big evolution for multi cloud analytics: cross-cloud analytics. Cross-cloud analytics tools help analysts and data scientists easily, securely, and cost effectively distribute data between clouds to leverage the analytics tools they need. In April 2022, we previewed a SQL supported LOAD statement that allowed AWS/Azure blob data to be brought into BigQuery as a managed table for advanced analysis. We’ve learned a lot in this preview period. A few learnings stand out:Cross-cloud operations need to meet analysts where they are. In order for analysts to work with distributed data, workspaces should not be siloed. As soon as analysts are asked to leave their SQL workspaces to copy data, set up permissions, or grant permission, workflows break down and insights are lost. Same SQL can be used to periodically copy data using BigQuery scheduled queries. The more of the workflow that can be managed by SQL, the better. Networking is an implementation detail, latency should be too. The longer an analyst needs to wait for an operation to complete, the less likely a complete workflow is to be completed end-to-end. BigQuery users expect high performance for a single operation, even if those operations are managed across multiple data centers.Democratizing data shouldn’t come at the cost of security. In order for data admins to empower data analysts and engineers, they need to be assured there isn’t additional risk in doing so. By default, data admins and security teams are increasingly looking for solutions that don’t persist user credentials between cloud boundaries. Cost control comes with cost transparency. Data transfer costs can get costly, and we hear frequently this is the number 1 concern for multi-cloud data organizations. Providing transparency into single operations and invoices in a consolidated way is critical to driving success for cross-cloud operations. Allowing administrators to cap costs for budgeting is a must.This feedback is why we’ve spent much of this year improving our cross-cloud transfer product to optimize releases around these core tenants: Usability: The LOAD SQL experience allows for data filtering and loading within the same editor across clouds. LOAD SQL supports data formats like JSON, CSV, AVRO, ORC and PARQUET. With semantics for both appending and truncating tables, LOAD supports both periodic syncs and refreshing the complete table semantics. We’ve also added SQL support for data lake standards like Hive partitioning and JSON data type.  Security: With a federated identity model, users don’t have to share or store credentials between cloud providers to access and copy their data. We also now support CMEK support for the destination table to help secure data as it’s written in BigQuery and VPC-SC boundaries to mitigate data exfiltration risks. Latency: With data movement managed by BigQuery Write API, users can effortlessly move just the relevant data without having to wait for complex pipes. We’ve improved job latency significantly for the most common load jobs and are seeing performance improvements with each passing day. Cost auditability: From one invoice, you can see all your compute and transfer costs for LOADs across clouds. Each job comes with statistics to help admins manage budgets.During our preview period, we saw good proof points on how cross-cloud transfer can be used to accelerate time to insight and deliver value to data teams. Getting started with a cross-cloud architecture can be daunting, but cross-cloud transfer has been used to help customers jumpstart proof of concepts because it enables the migration of subsets of data without committing to a full migration. Kargo used cross-cloud transfer to accelerate a performance test of BigQuery. “We tested Cross-Cloud Transfer to assist with a proof of concept on BigQuery earlier this year.  We found the usability and performance useful during the POC,” said Dinesh Anchan, Manager of Engineering at Kargo. We also saw this product being used to combine key datasets across clouds. A common challenge for customers is to manage cross-cloud billing data. CCT is being used to tie files together which have evolving schema on delivery for blob storage. “We liked the experience of using Cross-Cloud transfer to help consolidate our billing files across GCP, AWS, and Azure.  CCT was a nice solution because we could use SQL statements to load our billing files into BigQuery,” said the engineering lead of a large research institution. We’re excited to release the first of many cross-cloud features through BigQuery Omni. Check out the Google Cloud Next session to learn about more upcoming launches in the multicloud analytics space including support for Omni tables and local transformations to help supercharge these experiences for analysts and data scientists. We’re investing in cross-cloud because cloud boundaries shouldn’t slow innovation. Watch this space.Availability and pricingCross-Cloud Transfer is now available in all BigQuery Omni regions. Check the BigQuery Omni pricing page for data transfer costs.Getting StartedIt has never been easier for analysts to move data between clouds. Check out our getting started (AWS/Azure) page to try out this SQL experience. For a limited trial, BigQuery customers can explore BigQuery Omni at no charge using on-demand byte scans from September 15, 2022 to March 31, 2023 (the “trial period”) for data scans on AWS/Azure. Note: data transfer fees for Cross-Cloud Transfer will still apply.
Quelle: Google Cloud Platform

Published by