Announcing the GA of BigQuery multi-statement transactions

Transactions are mission critical for modern enterprises supporting payments, logistics, and a multitude of business operations. And in today’s modern analytics-first and data-driven era, the need for the reliable processing of complex transactions extends beyond just the traditional OLTP database; today businesses also have to trust that their analytics environments are processing transactional data in an atomic, consistent, isolated, and durable (ACID) manner. So BigQuery set out to support DML statements spanning large numbers of tables in a single transaction and commit the associated changes atomically (all at once) if successful or rollback atomically upon failure. And today, we’d like to highlight the recent general availability launch of multi-statement transactions within BigQuery and the new business capabilities it unlocks. While in preview, BigQuery multi-statement transactions were tremendously effective for customer use cases, such as keeping BigQuery synchronized with data stored in OLTP environments, the complex post processing of events pre-ingested into BigQuery, complying with GDPR’s right to be forgotten, etc. One of our customers, PLAID, leverages these multi-statement transactions within their customer experience platform KARTE to analyze the behavior and emotions of website visitors and application users, enabling businesses to deliver relevant communications in real time and further PLAID’s mission to Maximize the Value of People with the Power of Data.“We see multi-statement transactions as a valuable feature for achieving expressive and fast analytics capabilities. For developers, it keeps queries simple and less hassle in error handling, and for users, it always gives reliable results.”—Takuya Ogawa, Lead Product EngineerThe general availability of multi-statement transactions not only provides customers with a production ready means of handling their business critical transactions in a comprehensive manner within a single transaction, but now also provides customers with far greater scalability compared to what was offered during the preview. At GA, multi-statement transactions increase support for mutating up to 100,000 table partitions and modifying up to 100 tables per transaction. This 10x scale in the number of table partitions and 2x scale in the number of tables was made possible by a careful re-design of our transaction commit protocol which optimizes the size of the transactionally committed metadata. The GA of multi-statement transactions also introduces full compatibility with BigQuery sessions and procedural language scripting. Sessions are useful because they store state and enable the use of temporary tables and variables, which then can be run across multiple queries when combined with multi-statement transactions. Procedural language scripting provides users the ability to run multiple statements in a sequence with shared state and with complex logic using programming constructs such as IF … THEN and WHILE loops.For instance, let’s say we wanted to enhance the current multi-statement transaction example, which uses transactions to atomically manage the existing inventory and supply of new arrivals of a retail company. Since we’re a retailer monitoring our current inventory on hand, we would now also like to add functionality to automatically suggest to our Sales team which items we should promote with sales offers when our inventory becomes too large. To do this, it would be useful to include a simple procedural IF statement, which monitors the current inventory and supply of new arrivals and modifies a new PromotionalSales table based on total inventory levels. And let’s validate the results ourselves before committing them as one single transaction to our sales team by using sessions. Let’s see how we’d do this via SQL.First, we’ll create our tables using DDL statements:code_block[StructValue([(u’code’, u’CREATE OR REPLACE TABLE my_dataset.Inventoryrn(product string,rnquantity int64,rnsupply_constrained bool);rn rnCREATE OR REPLACE TABLE my_dataset.NewArrivalsrn(product string,rnquantity int64,rnwarehouse string);rn rnCREATE OR REPLACE TABLE my_dataset.PromotionalSalesrn(product string,rninventory_on_hand int64,rnexcess_inventory int64);’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4c43fb6cd0>)])]Then, we’ll insert some values into our Inventory and NewArrivals tables:code_block[StructValue([(u’code’, u”INSERT my_dataset.Inventory (product, quantity)rnVALUES(‘top load washer’, 10),rn (‘front load washer’, 20),rn (‘dryer’, 30),rn (‘refrigerator’, 10),rn (‘microwave’, 20),rn (‘dishwasher’, 30);rn rnINSERT my_dataset.NewArrivals (product, quantity, warehouse)rnVALUES(‘top load washer’, 100, ‘warehouse #1′),rn (‘dryer’, 200, ‘warehouse #2′),rn (‘oven’, 300, ‘warehouse #1′);”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4c40dced50>)])]Now, we’ll use a multi-statement transaction and procedural language scripting to atomically merge our NewArrivals table with the Inventory table while taking excess inventory into account to build out our PromotionalSales table. We’ll also create this within a session, which will allow us to validate the tables ourselves before committing the statement to everyone else.code_block[StructValue([(u’code’, u”DECLARE average_product_quantity FLOAT64;rn rnBEGIN TRANSACTION;rn rnCREATE TEMP TABLE tmp AS SELECT * FROM my_dataset.NewArrivals WHERE warehouse = ‘warehouse #1′;rnDELETE my_dataset.NewArrivals WHERE warehouse = ‘warehouse #1′;rn rn#Calculates the average of all product inventories.rnset average_product_quantity = (SELECT AVG(quantity) FROM my_dataset.Inventory);rn rnMERGE my_dataset.Inventory IrnUSING tmp TrnON I.product = T.productrnWHEN NOT MATCHED THENrnINSERT(product, quantity, supply_constrained)rnVALUES(product, quantity, false)rnWHEN MATCHED THENrnUPDATE SET quantity = I.quantity + T.quantity;rn rn#The below procedural script uses a very simple approach to determine excess_inventory based on current inventory being 120% of the average inventory across all products.rnIF EXISTS(SELECT * FROM my_dataset.Inventoryrn WHERE quantity > (1.2 * average_product_quantity)) THENrn INSERT my_dataset.PromotionalSales (product, inventory_on_hand, excess_inventory)rn SELECTrn product,rn quantity as inventory_on_hand,rn quantity – CAST(ROUND((1.2 * average_product_quantity),0) AS INT64) as excess_inventoryrn FROM my_dataset.Inventoryrn WHERE quantity > (1.2 * average_product_quantity);rnEND IF;rn rnSELECT * FROM my_dataset.NewArrivals;rnSELECT * FROM my_dataset.Inventory ORDER BY product;rnSELECT * FROM my_dataset.PromotionalSales ORDER BY excess_inventory DESC;rn#Note the multi-statement SQL temporarily stops here within the session. This runs successfully if you’ve set your SQL to run within a session.”), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4c4268fad0>)])]From the results of the SELECT statements, we can see the warehouse #1 arrivals were successfully added to our inventory and the PromotionalSales table correctly reflects what excess inventory we have. Looks like these transactions are ready to be committed.However, just in case there were some issues with our expected results, if others were to query the tables outside the session we created, the changes wouldn’t have taken effect. Thus, we have the ability to validate our results and could roll them back if needed without impacting others.code_block[StructValue([(u’code’, u’#Run in a different tab outside the current session. Results displayed will be consistent with the tables before running the multi-statement transaction.rnSELECT * FROM my_dataset.NewArrivals;rnSELECT * FROM my_dataset.Inventory ORDER BY product;rnSELECT * FROM my_dataset.PromotionalSales ORDER BY excess_inventory DESC;’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4c4268f650>)])]Going back to our configured session, since we’ve validated our Inventory, NewArrivals, and PromotionalSales tables are correct, we can go ahead and commit the multi-statement transaction within the session, which will propagate the changes outside the session too.code_block[StructValue([(u’code’, u’#Now commit the transaction within the same session configured earlier. Be sure to delete or comment out the rest of the SQL text run earlier.rnCOMMIT TRANSACTION;’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4c42de2590>)])]And now that the PromotionalSales table has been updated for all users, our sales team has some ideas of what products they should promote due to our excess inventory.code_block[StructValue([(u’code’, u’#Results now propagated for all users.rnSELECT * FROM my_dataset.PromotionalSales ORDER BY excess_inventory DESC;’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3e4c42de2c10>)])]As you can tell, using multi-statement transactions is simple, scalable, and quite powerful, especially combined with other BigQuery Features. Give them a try yourself and see what’s possible.
Quelle: Google Cloud Platform

Published by