Introducing new SQL functions to manipulate your JSON data in BigQuery

Enterprises are generating data at an exponential rate, spanning traditional structured transactional data, semi-structured like JSON and unstructured data like images and audio. Beyond the scale of the data, these divergent types present processing challenges for developers, at times requiring separate processing flows for each. With its initial release BigQuery’s support for semi-structured JSON eliminated the need for such complex preprocessing and providing schema flexibility, intuitive querying and the scalability benefits afforded to structured data. Today, we are excited to announce the release of new SQL functions for BigQuery JSON, extending the power and flexibility of our core JSON support. These functions make it even easier to extract and construct JSON data and perform complex data analysis.With these new query functions, you can:Convert JSON values into primitive types (INT64, FLOAT64, BOOL and STRING) in an easier and more flexible way with the new JSON LAX functionsEasily update and modify an existing JSON value in BigQuery  with the new JSON  mutator functions.Construct JSON object and JSON array with SQL in BigQuery with the new JSON constructor functions.Let’s review these new features and some examples of how to use them. First, we will create a table for demonstration.code_block[StructValue([(u’code’, u’CREATE TABLE dataset_name.users_sample AS (rn SELECT JSON ‘{“name”: “Alice”, “age”: 28, “address”: {“country”: “USA”, “city”: “SF”, “zipcode”: 94102}}’ AS user UNION ALLrn SELECT JSON ‘{“name”: “Bob”, “age”: “40”, “address”: {“country”: “Germany”}}’ UNION ALLrn SELECT JSON ‘{“name”: “Charlie”, “age”: null, “address”: {“zipcode”: 12356, “country”: null}}’rn)’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb666f43750>)])]code_block[StructValue([(u’code’, u’Query:rnrn– Table contentsrnSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);rnOutput:rn+———————————————————————————–+rn| user |rn+———————————————————————————–+rn| {“address”:{“city”:”SF”,”country”:”USA”,”zipcode”:94102},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null,”zipcode”:12356},”age”:null,”name”:”Charlie”} |rn+———————————————————————————–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb666f43790>)])]Great! Let’s say we want to get a list of all users over 30. Looking at the table, you will see that user.age contains a JSON number in the first record, a JSON string in the second, and a JSON null in the third. With the new powerful LAX function, LAX_INT64, all types are automatically inferred and processed correctly.code_block[StructValue([(u’code’, u’Query:rnrnrnSELECT user.name FROM dataset_name.users_samplernWHERE LAX_INT64(user.age) > 30rnOutput:rn+——-+rn| name |rn+——-+rn| “Bob” |rn+——-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb665254e50>)])]Unlike the “strict” conversion functions, which require that the JSON type matches the primitive type exactly, the “lax” conversion functions will also handle conversions between mismatched data types. For example, the strict conversion function below would return an error:code_block[StructValue([(u’code’, u’Query:rnrnSELECT INT64(JSON ‘”10″‘) AS strict_int64rnOutput:rnError: The provided JSON input is not an integer’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb6642ce210>)])]However, the lax conversion function below would return the desired result:code_block[StructValue([(u’code’, u’Query:rnrnSELECT LAX_INT64(JSON ‘”10″‘) AS lax_int64rnOutput:rn+———–+rn| lax_int64 |rn+———–+rn| 10 |rn+———–+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb6671f4090>)])]Furthermore, you can quickly and easily remove a field in the JSON data by using JSON_REMOVE functions.code_block[StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_REMOVE(user, “$.address.zipcode”)rnWHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“country”:null},”age”:null,”name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb665b19190>)])]JSON_STRIP_NULLS compresses the data by removing JSON nulls. Although BigQuery null values neither impact performance nor storage cost, it can be helpful for reducing data size during exports.code_block[StructValue([(u’code’, u’Query:rnrnrnUPDATE dataset_name.users_sample SET user = JSON_STRIP_NULLS(user, remove_empty=>true) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrnrn+——————————————————————-+rn| user |rn+——————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”},”age”:”40″,”name”:”Bob”} |rn| {“name”:”Charlie”} |rn+——————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb66723f6d0>)])]Now, what if we want to modify or add a field to the JSON data? You can now update the data with the new JSON_SET function. And you have the ability to mix and match JSON functions together to achieve desired results. For example, the query below adds a new field, “region_code”, to the table. The value of the field will be “America” if the value of the “country” field is “USA”, and “Other” if it is not.code_block[StructValue([(u’code’, u’– Updating/Adding a field is easy to do as well. The structure will be automatically created (see “Charlie” row)rnrnQuery:rnrnUPDATE dataset_name.users_sample SET user = JSON_SET(user, “$.address.region_code”, IF(LAX_STRING(user.address.country) = “USA”, “America”, “Other”)) WHERE truernAfter the query above; if you run the query u201cSELECT * FROM dataset_name.users_sample ORDER BY STRING(user.name);u201d, you will receive the following Output:rnrn+——————————————————————————————-+rn| user |rn+——————————————————————————————-+rn| {“address”:{“city”:”SF”,”country”:”USA”,”region_code”:”America”},”age”:28,”name”:”Alice”} |rn| {“address”:{“country”:”Germany”,”region_code”:”Other”},”age”:”40″,”name”:”Bob”} |rn| {“address”:{“region_code”:”Other”},”name”:”Charlie”} |rn+——————————————————————————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb66723f110>)])]Last but not least, let’s say you have a table of property/value pairs you want to convert to a JSON object. With the new JSON_OBJECT constructor function, you can effortlessly create the new JSON object.code_block[StructValue([(u’code’, u’Query:rnrnWITH Fruits AS (rnSELECT 0 AS id, ‘color’ AS k, ‘Red’ AS v UNION ALLrnSELECT 0, ‘fruit’, ‘apple’ UNION ALLrnSELECT 1, ‘fruit’,’banana’ UNION ALLrnSELECT 1, ‘ripe’, ‘true’rn)rnSELECT JSON_OBJECT(ARRAY_AGG(k), ARRAY_AGG(v)) AS json_datarnFROM FruitsrnGROUP BY idrnOutput:rn+———————————-+rn| json_data |rn+———————————-+rn| {“color”:”Red”,”fruit”:”apple”} |rn| {“fruit”:”banana”,”ripe”:”true”} |rn+———————————-+’), (u’language’, u”), (u’caption’, <wagtail.wagtailcore.rich_text.RichText object at 0x3eb665591cd0>)])]Complete list of functionsLax conversion functions:LAX_BOOLLAX_INT64LAX_FLOAT64LAX_STRINGJSON constructor functions:JSON_ARRAYJSON_OBJECTJSON mutator functions:JSON_ARRAY_APPENDJSON_ARRAY_INSERTJSON_REMOVEJSON_SETJSON_STRIP_NULLSTry it out!Google BigQuery is constantly adding new features to make it easier and more powerful to analyze your data. We encourage you to check them out and provide your feedback to us as we continue to develop additional features and capabilities to make working JSON easier and faster over time.
Quelle: Google Cloud Platform

Published by