Optimizing Waze ad delivery using TensorFlow over Vertex AI

Waze Ads

Waze is the world's largest community-based traffic and navigation app. As part of its offering, it lets advertisers put their businesses on the Waze map. By doing so, ads on Waze reach consumers at key moments of their journey. Goals for advertising on Waze include getting customers to business locations, building brand awareness, and connecting with nearby customers at the right moments. Waze uses several ad formats, the most prominent of which is called a "Pin". Like a store sign, Pins inform and remind customers that a business is on or near their route.

Ad serving @ Waze

Waze Ads is a reservation platform, which means we commit to a fixed number of ad impressions in advance and then attempt to meet the expected delivery based on the actual drives that occur. It is important to note that Waze only shows ads to users within a certain proximity of the advertised business location. Our ads inventory is thus highly correlated with traffic patterns, i.e. where and when people drive with Waze. After we set up an ads campaign, we choose the right time and place so we deliver on our commitment to the advertisers. We also have a planning tool to predict the quantity of sellable ads inventory based on traffic patterns and campaign setup, but that's a topic for a different blog post.

Once an advertising campaign is locked and launched, "the life of a Waze ad" looks something like this:

1. The mobile client connects to the server and asks for pins to show (every few minutes, to save battery; this is important for what comes next).
2. The ad server gets the request and scans for a list of candidate pins that advertise businesses within a certain proximity of the user's location.
3. The ad server ranks (and logs) all candidates according to internal logic (e.g. distance).
4. The mobile client gets the ranked list and saves it for later use.
5. Over the next few minutes, the map is shown on screen and the client logic has the opportunity to show a pin ad.
6. The mobile client scans the ranked list and displays a suitable number of pins that fit the map on the user's screen and are appropriate for its zoom level.
7. The mobile client logs successfully displayed ads.

Did you catch the issue in step 6? Waze is a navigation app, meaning the user is driving! The user's visible map on screen constantly changes based on their destination, speed, traffic patterns, etc. These screen changes and alignments are important for providing the best user experience while navigating.

A funnel-like drop analysis showed that step 6, although optimized for distance from the user (step 2), is a place where we lose ads in the funnel. Moreover, how effective the mobile client is at finding pins to display (step 6) is a direct result of the ads we choose to send to it (step 3). By making ad ranking (step 3) smarter, we can seamlessly unlock additional pin ads inventory and better uphold Waze's delivery commitments. What would that include, though? Predicting where the user is going? Predicting where they'll be in the next few minutes?

Unlocking lost inventory using ML

Google's CEO, Sundar Pichai, once said: "Machine learning is a core, transformative way by which we're rethinking how we're doing everything." As you can imagine, we naturally approached this problem with ML. The problem can easily be formulated as a learning-to-rank ML problem in which we rank candidate ads to maximize the likelihood that they will be displayed in the mobile client. We can debate the exact optimization goal, but ultimately, when we create a list that should serve the mobile client for the next few minutes, we want to meet expected ad delivery (given a candidate list of the same size) in that time window.

Maximizing display probability

By matching the ad server's logged candidates with the mobile client's successfully displayed ads, we can create a labeled dataset for supervised learning. As mentioned before, a successful display depends on whether the user's screen in the next few minutes (after getting the candidate list) will include candidate locations. To optimize for that, we need to know the user's current location, destination, current route (the one Waze suggests they follow), and the locations of all candidate pins. We translate this information into several features used by a supervised model.

The trained model assigns, in real time, probabilities for pins to be displayed, and these probabilities are taken into account in ranking. Note that they are not the sole contributor to ad ranking; we still have multiple goals in choosing the right ad to show (e.g. user relevance).

We chose TensorFlow to power this model. We were motivated by our need to perform complex feature engineering on numeric (mostly distance-based) features and by our extreme scale requirements: a real-time ad serving use case with millions of predictions per second and a strict requirement of under 70 ms end-to-end latency. As avid GCP users, we used the Vertex AI suite to train and deploy this TF model and to integrate easily with the rest of our data stack. The resulting architecture, with its clean separation of concerns (based on the FCDS philosophy), took a few iterations to achieve. We first started with an offline model deployed to Vertex AI Models and rigorous A/B testing to demonstrate value before moving to full productionization and automation of this flow using TensorFlow Extended (TFX) over Vertex Pipelines.

Results

We launched our integration with Vertex AI to power our display probability model in late 2020. With the display probability score incorporated into ad ranking, we observed a lift of up to 19% in pins displayed per session in large markets including the US, Brazil, and France. Vertex AI delivered low-latency predictions within our performance parameters, and CPU-based autoscaling ensured smooth scaling of additional resources as ads traffic changed throughout the day.

Summary

By using ML to rank the display probability of candidate ads, we were able to increase the number of reserved impressions delivered per session, helping us keep our delivery commitments to advertisers. There were many complexities involved in running ML at this scale at Waze, but thanks to Vertex AI we didn't have to worry much about scale, latency, or devops and could focus on the ranking side.
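To make the display-probability model described above a little more concrete, here is a minimal TensorFlow/Keras sketch. The feature names (distances between the candidate pin, the user's current location, and the suggested route) are illustrative assumptions rather than Waze's actual feature set, and the architecture is deliberately simplified.

```python
import tensorflow as tf

# Illustrative numeric features; Waze's real feature engineering is not public.
FEATURES = [
    "pin_to_user_distance_m",     # distance from the candidate pin to the user's current location
    "pin_to_route_distance_m",    # distance from the candidate pin to the suggested route
    "seconds_until_pin_reached",  # estimated time until the pin could enter the visible map
]

# One scalar input per feature; in practice these would be normalized first.
inputs = {name: tf.keras.Input(shape=(1,), name=name) for name in FEATURES}
x = tf.keras.layers.Concatenate()(list(inputs.values()))
x = tf.keras.layers.Dense(32, activation="relu")(x)
x = tf.keras.layers.Dense(16, activation="relu")(x)

# Output: probability that this candidate pin will actually be displayed on screen.
display_prob = tf.keras.layers.Dense(1, activation="sigmoid", name="display_probability")(x)

model = tf.keras.Model(inputs=inputs, outputs=display_prob)
model.compile(
    optimizer="adam",
    loss="binary_crossentropy",        # label: was the pin displayed (1) or not (0)
    metrics=[tf.keras.metrics.AUC()],
)
model.summary()
```

In production, a score like this would be only one signal in the final ranking, alongside relevance and the other business goals mentioned above.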
This was the first integration of its scale at Waze, and it paved the way for many more use cases in ads, ETA modeling, drive suggestions, and more. It also allowed Waze to justify going all in on TFX in Vertex AI.
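On the serving side, a low-latency online prediction against a deployed Vertex AI endpoint looks roughly like the sketch below; the project, region, endpoint ID, and instance fields are placeholders rather than Waze's actual setup.

```python
from google.cloud import aiplatform

# Placeholders: substitute your own project, region, and endpoint ID.
aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint("projects/my-project/locations/us-central1/endpoints/1234567890")

# One instance per candidate pin; keys must match the deployed model's input signature.
instances = [
    {
        "pin_to_user_distance_m": [850.0],
        "pin_to_route_distance_m": [120.0],
        "seconds_until_pin_reached": [95.0],
    },
]

response = endpoint.predict(instances=instances)
for prediction in response.predictions:
    print("display probability:", prediction)
```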
Source: Google Cloud Platform

What type of data processing organization are you?

Every organization has its own unique data culture and capabilities, yet each is expected to use technology trends and solutions in the same way as everyone else. Your organization may be built on years of legacy applications and may have developed a considerable amount of expertise and knowledge, yet you may be asked to adopt a new approach based on a technology trend. Or you may be on the other side of the spectrum: a digitally native organization built on engineering principles from scratch, without legacy systems, but expected to follow the same principles as process-driven, established organizations. The question is, should we treat these organizations the same way when it comes to data processing?

In this series of blogs and papers, that is what we explore: how to set up an organization from first principles, from the data analyst, data engineering, and data science points of view. In reality, no organization is driven solely by one of these; it is likely to be a combination of multiple types. What type of organization you become is then driven by how much you are influenced by each of these principles.

When you are considering what data processing technology encompasses, take a step back and make a strategic decision based on your key goals. You might optimize for performance, cost, reduced operational overhead, increased operational excellence, or the integration of new analytical and machine learning approaches. Or perhaps you're looking to leverage existing employees' skills while meeting all your data governance and regulatory requirements. We will be exploring these different themes and focusing on how they guide your decision-making process. You may be coming from technologies that solved problems of the past, and some of the terminology may feel more familiar, but those technologies don't scale your capabilities. There is also the opportunity cost of prioritizing legacy and new issues that arise from a transformation effort; as a result, your new initiative can set you further behind on your core business while you play catch-up with an ever-changing technology landscape.

Data value chain

The key for any ingestion and transformation tool is to extract data from a source and start acting on it. The ultimate goal is to reduce the complexity and increase the timeliness of the data. Without data, it is impossible to create a data-driven organization and act on insights. Data needs to be transformed, enriched, joined with other data sources, and aggregated to make better decisions. In other words, insights on good, timely data mean good decisions.

When deciding on the data ingestion pipeline, one of the best approaches is to look at the volume, velocity, and type of data that is arriving. Other considerations include the number of different data sources you are managing, whether you need to scale to thousands of sources using generic pipelines, and whether you want to create one generic pipeline and then apply data quality rules and governance. ETL tools are ideal for this use case, as generic pipelines can be written and then parameterized. On the other hand, consider the data source: can the data be ingested directly, without transforming and formatting it? If the data does not need to be transformed, it can be ingested directly into the data warehouse as a managed solution. This not only reduces operational costs but also allows for more timely data delivery.
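As a rough sketch of this "load first, transform in the warehouse" (ELT) pattern, the snippet below loads raw CSV files from a storage bucket straight into BigQuery and then runs the transformation as SQL inside the warehouse. The dataset, table, and bucket names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses your default GCP project and credentials

# 1. Load raw CSV files from Cloud Storage directly into a staging table (the "EL" part).
load_job = client.load_table_from_uri(
    "gs://my-landing-bucket/sales/2023-*.csv",   # placeholder bucket and path
    "staging.raw_sales",                         # placeholder staging dataset.table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,                         # let BigQuery infer the schema
        write_disposition="WRITE_TRUNCATE",
    ),
)
load_job.result()  # wait for the load to finish

# 2. Transform, enrich, and aggregate inside the warehouse with SQL (the "T" part).
client.query("""
    CREATE OR REPLACE TABLE analytics.daily_sales AS
    SELECT store_id, DATE(sold_at) AS sale_date, SUM(amount) AS revenue
    FROM staging.raw_sales
    GROUP BY store_id, sale_date
""").result()
```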
If the data is coming in a non-tabular format such as XML, or in a legacy encoding such as EBCDIC, and needs to be transformed and formatted, then a tool with ETL capabilities can be used, depending on the speed of data arrival. It is also important to understand the speed and time of arrival of the data: think about your SLAs and the time durations and windows that are relevant for your data ingestion plans. This not only drives the ingestion profiles but also dictates which framework to use. As discussed above, velocity requirements drive the decision-making process.

Type of Organization

Different organizations can be successful by employing different strategies based on the talent that they have. Just like in sports, each team plays with a different strategy with the ultimate goal of winning. Organizations often need to decide on the best strategy for data ingestion and processing: whether to hire an expensive group of data engineers, to rely on their data wizards and analysts to enrich and transform data that can be acted on, or whether it is more realistic to train the current workforce to do more functional, high-value work rather than focus on building generally understood and available foundational pieces.

On the other hand, the transformation part of ETL pipelines, as we know it, dictates where the processing load will be. All of this becomes a reality in the cloud-native world, where data can be enriched, aggregated, and joined. Loading data into a powerful, modern data warehouse means that you can join and enrich the data there, using ELT. Consequently, ETL in its strict sense isn't really needed anymore if the data can be loaded directly into the data warehouse.

None of this was possible in traditional, siloed, and static data warehouses and data ecosystems, where systems would not talk to each other and there were capacity constraints on both storing and processing data in an expensive data warehouse. This is no longer the case in the BigQuery world, where storage is cheap and transformations are far more capable, without the constraints of virtual appliances. If your organization is already heavily invested in an ETL tool, one option is to use it to load BigQuery and initially transform the data within the ETL tool. Once the as-is and to-be outputs are verified to match, then, with the improved knowledge and expertise, you can start moving workloads into BigQuery SQL and effectively do ELT. Furthermore, if your organization is coming from a more traditional data warehouse that relies extensively on stored procedures and scripting, the question to ask is: do I continue leveraging these skills and use the equivalent capabilities that BigQuery also provides? ELT with BigQuery is natural and similar to what is already done in Teradata BTEQ or Oracle PL/SQL, but migrating from ETL to ELT requires changes. This change then enables streaming use cases, such as real-time use cases in retail, because there is no preceding step before data is loaded and made available.

Organizations can be broadly classified into three types: Data Analyst driven, Data Engineering driven, and Blended. We will cover Data Science driven organizations within the Blended category.

Data Analyst Driven

Analysts understand the business and are used to working with SQL and spreadsheets. Allowing them to do advanced analytics through interfaces they are accustomed to enables scaling.
As a result, easy-to-use ETL tooling that brings data quickly into the target system becomes a key driver. Ingesting data directly from a source or staging area also becomes critical, as it allows analysts to exploit their key skills using ELT and increases the timeliness of the data. This is commonplace with traditional EDWs, realized through extended capabilities such as stored procedures and scripting. Data is enriched, transformed, and cleansed using SQL, and ETL tools act as the orchestration layer. The separation of data and computation brought by cloud computing changes the face of the EDW as well. Rather than creating complex ingestion pipelines, the role of ingestion becomes bringing data close to the cloud, staging it in a storage bucket or on a messaging system before it is ingested into the cloud EDW. This frees data analysts to focus on extracting insights using the tools and interfaces they are accustomed to.

Data Engineering / Data Science Driven

Building complex data engineering pipelines is expensive but enables increased capabilities. It allows creating repeatable processes and scaling the number of sources, and once complemented with the cloud, it enables agile data processing methodologies. Data science organizations, on the other hand, are good at carrying out experiments and producing applications that work for specific use cases but are often not productionised or generalized. Real-time analytics enables immediate responses, and there are specific use cases where low-latency anomaly detection applications need to run; in other words, the business requires data to be acted upon on the fly, as it arrives. Processing this type of data or application requires transformation done outside of the target.

All of the above usually requires custom applications or state-of-the-art tooling, which is achieved by organizations that excel in their engineering capabilities. In reality, very few organizations can be truly engineering organizations. Many fall into what we call here the blended organization.

Blended org

The above classification can be used for tool selection on each project. For example, rather than choosing a single tool, choose the right tool for the right workload; this reduces operational and license costs and uses the best of the tools available. Let the deciding factor be driven by business requirements: each business unit or team knows the applications they need to connect with to get valuable business insights. This, coupled with the data maturity of the organization, is the key to making sure the chosen data processing tool is the right fit.

In reality, you are likely to be somewhere on a spectrum. Digitally native organizations are likely to be closer to engineering driven, due to their culture and the business they are in, while brick-and-mortar organizations tend to be closer to analyst driven due to the significant number of legacy systems and processes they possess. These organizations are often considering or working toward digital transformation, with the aspiration of having a data engineering / software engineering culture like Google's. A blended organization with strong data engineering skills will have built the platform and frameworks to increase reusable patterns, which increases productivity and reduces costs. Data engineers focus on running Spark on Kubernetes, whereas infrastructure engineers focus on the container work.
This in turn provides unparalleled capabilities: application developers focus on the data pipelines, and even if the underlying technologies or platforms change, the code stays the same. As a result, security issues, latency requirements, cost demands, and portability are addressed at multiple layers.

Conclusion: What type of organization are you?

Often an organization's infrastructure is not flexible enough to react to a fast-changing technological landscape. Whether your organization is engineering driven or analyst driven, organizations frequently look only at the technical requirements that inform which architecture to implement. But a key, and frequently overlooked, component of truly becoming a data-driven organization is the impact of the architecture on your data users. When you take into account the responsibilities, skill sets, and trust of your data users, you can create the right data platform to meet the needs of your IT department as well as your business.

To become a truly data-driven organization, the first step is to design and implement an analytics data platform that meets your technical and business needs. The reality is that each organization is different, with a different culture, different skills, and different capabilities. The key is to leverage your strengths to stay competitive while adopting new technologies when needed and as they fit your organization. To learn more about how to build an analytics data platform for the kind of organization you are, read our paper here.
Source: Google Cloud Platform

Recommendations AI data ingestion

In our previous post, we presented a high-level picture of Recommendations AI, showing how the product is typically used. In this post, we'll take a deep dive into the first step of getting started: data ingestion. This post will answer your questions about getting your data into Recommendations AI so you can train models and get recommendations.

Recommendations AI uses your product catalog and user events to create machine learning models and deliver personalized product recommendations to your customers. Essentially, Recommendations AI takes a list of items available to be recommended (the product catalog) and users' interactions with those products (events), and lets you create various types of models (algorithms specifically designed for your data) to generate predictions based on business objectives (conversion rate, click-through rate, revenue). Recommendations AI is now part of the Retail API, which uses the same product catalog and event data for several Google Retail AI products, such as Retail Search.

Catalog data

To get started with Recommendations AI, you will first need to upload your data, starting with your complete product catalog. The Retail API catalog is made up of product entries; take a look at the full Retail Product schema to see what can be included in a product. The schema is shared between all Retail Product Discovery products, so once you upload a catalog it can be used for both Recommendations AI and Retail Search. While there are a lot of fields available in the schema, you can start with a small amount of data per product: the minimal required fields are id, title, and categories. We recommend also submitting description and price, as well as any custom attributes.

Catalog levels

Before uploading any products, you may also need to determine which product level to use. By default, all products are "primary", but if you have variants in your catalog you may need to change the default ingestion behavior. If your catalog has multiple levels (variants), you need to decide whether you want recommendations returned at the primary (group) level or at the variant (SKU) level, and also whether events are sent using the primary ID or the variant IDs. If you're using Google Merchant Center, you can easily import your catalog directly (see below); in Merchant Center, item grouping is done using item_group_id. If you have variants and you're not ingesting the catalog from Merchant Center, make sure your primaryProductId is set appropriately and set ingestionProductType as needed before doing your initial catalog import.

1. Catalog import

There are several ways to import catalog data into the Retail API:

a. Merchant Center sync

Many retailers use Google Merchant Center to upload their product catalogs in the form of product feeds. These products can then be used for various types of Google Ads and for other services like Google Shopping and Buy on Google. Another nice feature of Merchant Center is the ability to export your products for use with other services, BigQuery for example. The Merchant Center product schema is similar to the Retail product schema, so the minimum requirements are met if you want to use Merchant Center to feed your Retail API product catalog.

The easiest way to import your catalog from Merchant Center is to set up a Merchant Center sync in the Retail Admin Console: go to the Data tab and select Import at the top of the screen, then as the source of data select Merchant Center Sync.
Add your Merchant Center account number and select a branch to sync to. While this method is easy, there are some limitations. For example, if your Merchant Center catalog is not complete, you won't be able to add more products directly to the Recommendations catalog; you would need to add them to the Merchant Center feed, and they would then get synced to your Recommendations catalog. This may still be easier than maintaining a separate feed for Recommendations, however, as you can easily add products to your Merchant Center feed and simply leave them out of your Ads destinations if you don't want to use them for Ads & Shopping. Another limitation of using Merchant Center data is that you may not have all of the attributes you need for Recommendations AI; size, brand, and color are often submitted to Merchant Center, but you may have other data you want to use in your Recommendations models. Also, you can only enable a sync to a catalog branch that has no items, so if you have existing items in the catalog, you would need to delete them all first.

b. Merchant Center import via BigQuery

Another option that provides a bit more flexibility is to export your Merchant Center catalog to BigQuery using the BigQuery Data Transfer Service. You can then bulk import that data from BigQuery directly into the Retail API catalog. You are still somewhat limited by the Merchant Center schema, but it is possible to add additional products from other sources to your catalog (unlike Merchant Center sync, which doesn't allow updating the branch outside of the sync). The direct Merchant Center sync in (a) is usually the simplest option, but if you already have a BigQuery Data Transfer Service job or want to control exactly when items are imported, this method may be a good choice. You also have the flexibility to use a BigQuery view, so you could limit the import to a subset of the Merchant Center data if necessary, for example a single language or variant to avoid duplicate items. Likewise, you could use unions or multiple tables to import from different sources as necessary.

c. Google Cloud Storage import

If your catalog resides in a database, or if you need to pull product details from multiple sources, an import from GCS may be your easiest option. For this option, you simply create a text file with one product per line (typically referred to as NDJSON format) in the Retail JSON product schema. There are a lot of fields in the schema, but you can usually start with the basics; a very basic file importing two items from GCS is sketched a little further below.

d. BigQuery import

Just as you can import products from BigQuery in the Merchant Center schema, you can also create a BigQuery table using the Retail product schema. The product schema definition for BigQuery is available here. The Merchant Center BigQuery schema can be used whether or not you transfer the data from Merchant Center, but it is not the full Retail schema; it doesn't include custom attributes, for example. Using the Retail schema allows you to import all possible fields. Importing from BigQuery is useful if your product catalog is already in BigQuery. You can also create a view that matches the Retail schema and import from the view, pulling data from existing tables as necessary.

For Merchant Center, Cloud Storage, and BigQuery imports, the import itself can be triggered through the Admin Console UI or via the import API call.
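Returning to the Cloud Storage option above, here is that sketch: it writes two minimal products as NDJSON, one JSON object per line, with only the required id, title, and categories fields plus a price. The product IDs and values are made up for illustration; the resulting file would be uploaded to a GCS bucket (for example gs://my-bucket/products.json) before triggering the import.

```python
import json

# Two minimal products in the Retail product schema (made-up IDs and values).
products = [
    {
        "id": "sku-12345",
        "title": "Classic denim jacket",
        "categories": ["Apparel > Outerwear"],
        "priceInfo": {"currencyCode": "USD", "price": 89.0},
    },
    {
        "id": "sku-67890",
        "title": "Canvas tote bag",
        "categories": ["Apparel > Accessories"],
        "priceInfo": {"currencyCode": "USD", "price": 24.0},
    },
]

# NDJSON: one product per line.
with open("products.json", "w") as f:
    for product in products:
        f.write(json.dumps(product) + "\n")
```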
When using the API, the schema needs to be specified with the dataSchema attribute, set to product or product_merchant_center accordingly.

e. API import & product management

You can also import and modify catalog items directly via the API. This is useful for making changes to products in real time, for example, or if you want to integrate with an existing catalog management system. The inline import method is very similar to GCS import: you simply construct a list of products in the Retail schema format and call the products.import API method to submit them. As with GCS, existing products are overwritten and new products are created. Currently the import method can import up to 100 products per call. There is also the option to manage products individually with the API, using the get, create, patch, and delete methods. All of the API calls can be made using HTTP/REST or gRPC, but using the Retail client libraries for the language of your choice may be the easiest option. The documentation currently has many examples using curl with the REST API, but the client libraries are usually preferred for production use.

2. Live user events

Once your catalog is imported, you'll need to start sending user events to the Retail API. Since recommendations are personalized in real time based on recent activity, user events should be sent in real time as they occur. Typically, you'll want to start sending live, real-time events and then optionally backfill historical events before training any models. There are currently four event types used by the Recommendations AI models:

- detail-page-view
- add-to-cart
- purchase-complete
- home-page-view

Not all models require all of these events, but it is recommended to send all of them if possible. Note the "minimum required" fields for each event: as with the product schema, the user event schema has many fields, but only a few are required (a typical event payload is sketched below). There are three ways you can send live events to Recommendations:

a. Google Tag Manager

If you are already using Google Tag Manager and are integrated with Google Analytics with Enhanced Ecommerce, this will usually be the easiest way to get real-time events into the Retail API. We have provided a Cloud Retail tag in Google Tag Manager that can easily be configured to use the Enhanced Ecommerce data layer, but you can also populate the Cloud Retail data layer and use your own variables in GTM to populate the necessary fields. Detailed instructions for setting up the Cloud Retail tag can be found here. Setup differs slightly depending on whether you are using GA360 or regular Google Analytics, but essentially you just need to provide your Retail API key and project number, and then set up a few variable overrides to get visitorId, userId, and any other fields that aren't provided via Enhanced Ecommerce. The Cloud Retail tag doesn't require Google Analytics with Enhanced Ecommerce, but you will need to populate a data layer with the required fields, or be able to get the required data from GTM variables or existing data layer variables.

b. JavaScript pixel

If you're not currently using Google Tag Manager, an easy alternative is to add our JavaScript pixel to the relevant pages on your site, usually the home page, product details pages, and cart pages. Configuring this will usually require adding the JavaScript code, along with the correct data, to a page template.
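Whichever delivery mechanism you choose, the underlying payload is the same. As a hedged sketch of the "typical event" mentioned above, here is a minimal detail-page-view event written server-side with the Python client library; the project ID, visitor ID, and product ID are placeholders, and this is the same write path as the API method described next.

```python
from google.cloud import retail_v2
from google.protobuf import timestamp_pb2

client = retail_v2.UserEventServiceClient()
parent = "projects/my-project/locations/global/catalogs/default_catalog"  # placeholder project

event_time = timestamp_pb2.Timestamp()
event_time.GetCurrentTime()

# Minimal detail-page-view: event type, visitor ID, timestamp, and the product viewed.
user_event = retail_v2.UserEvent(
    event_type="detail-page-view",
    visitor_id="visitor-abc-123",   # anonymous per-browser/session identifier
    event_time=event_time,
    product_details=[
        retail_v2.ProductDetail(product=retail_v2.Product(id="sku-12345"))
    ],
)

client.write_user_event(
    request=retail_v2.WriteUserEventRequest(parent=parent, user_event=user_event)
)
```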
The JavaScript pixel approach may also require some server-side code changes, depending on your environment.

c. API write method

As an alternative to GTM or the tracking pixel, which send events directly from the user's browser to the Retail API, you can also opt to send events server-side using the userEvents.write API method. This is usually done by service providers that already have an event handling infrastructure in their platform.

3. Historical events

AI models tend to work best with large amounts of data. There are minimum event requirements for training Recommendations models, but it is usually advisable to submit a year's worth of historical data if available; this is especially useful for retailers with high seasonality. For a high-traffic site, you may gather enough live events in a few days to start training a model, but even so it's usually a good idea to submit more historical data: you'll get higher-quality results without having to wait for events to stream in over weeks or months. Just like the catalog data, historical event data can be imported in several ways:

a. GA360 import

If you are using GA360 with Enhanced Ecommerce tracking, you can easily export historical data into BigQuery and then import it directly into the Retail API. Regular Google Analytics does not have this export functionality, but GA360 does.

b. Google Cloud Storage import

If you have historical events in a database or in logs, you can write them out to files in NDJSON format and import those files from Cloud Storage. This is usually the easiest method for importing large numbers of events, since you simply write JSON to text files, which can then be imported directly from Google Cloud Storage. Just as with the catalog import, each line in a file needs to be in the correct JSON format, in this case the JSON event format. The import can be done with the API or in the Cloud Console UI by entering the GCS bucket path for your file.

c. BigQuery import

Events can be read directly from BigQuery in the Retail event schema or in the GA360 event schema. This method is useful if you already have events in BigQuery or prefer to use BigQuery instead of GCS for storage. Since each event type is slightly different, it may be easiest to create a separate table for each event type. As with the GCS import, the BigQuery import can be done using the API or in the Cloud Console UI by entering the BigQuery table name.

d. API import & write

The userEvents.write method used for real-time event ingestion via the API can also be used to write historical events, but for importing large batches of events the userEvents.import method is usually a better choice, since it requires fewer API calls. The import method should not be used for real-time event ingestion, as it may add additional processing latency. Keep in mind that you should only have to import historical events once, so the events in BigQuery or Cloud Storage can usually be deleted after importing. The Retail API will de-duplicate events that are exactly the same if you do accidentally import the same events twice.

4. Catalog & event data quality

All of the methods above will return errors if there are issues with the products or events in the request. For the inline and write methods, errors are returned immediately in the API response. For the BigQuery, Merchant Center, and Cloud Storage imports, error logs can be written to a GCS bucket, and some details are shown in the Admin Console UI.
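As a concrete sketch of the bulk import path above, a GCS-based historical event import with the Python client library could look like the following; the bucket paths are placeholders, and errors_config is where the per-record error logs mentioned above are written.

```python
from google.cloud import retail_v2

client = retail_v2.UserEventServiceClient()

request = retail_v2.ImportUserEventsRequest(
    parent="projects/my-project/locations/global/catalogs/default_catalog",  # placeholder
    input_config=retail_v2.UserEventInputConfig(
        gcs_source=retail_v2.GcsSource(
            input_uris=["gs://my-bucket/events/2022-*.json"],  # NDJSON event files
            data_schema="user_event",                          # Retail event schema
        )
    ),
    # Per-record errors are written here and surfaced in the Admin Console.
    errors_config=retail_v2.ImportErrorsConfig(gcs_prefix="gs://my-bucket/import-errors/"),
)

operation = client.import_user_events(request=request)
print(operation.result())  # block until the long-running import completes
```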
If you look at the Data section in the Retail Admin Console UI, there are a number of places to see details about the catalog data. The main Catalog tab shows the overall catalog status, and if you click the VIEW link for Data quality, you will see more detailed metrics around key catalog fields. You can also click the Import Activity or Merchant Center links at the top of the page to view the status of past imports or to change your Merchant Center linking (if necessary).

Commonly seen errors

By far the most important metric is the "unjoined rate". An "unjoined" event is one in which we received an item ID that was not in the catalog. This can be caused by numerous factors: an outdated catalog, errors in the event ingestion implementation, events sent with variant IDs while the catalog only has primary IDs, and so on. To view the event metrics, click on the Data > Event tab, where you can see the various errors over time. Clicking on an error takes you into Cloud Logging, where you can see the full error response and determine exactly why a specific error occurred.

Training models

Once your catalog and events are imported, you should be ready to train your first model. Check the Data > Catalog and Data > Event tabs as shown above: if your catalog item count matches the number of in-stock items in your inventory, and the total number of events ingested, the unjoined rate, and the days with joined events are all sufficient, you should be ready to train a model. Tune in to our next post for more details!
Source: Google Cloud Platform

Deployment models for the Cloud Spanner emulator

This is the first of a three-part series of blog posts, which together form a solution guide for developers using the Cloud Spanner emulator. In this series, after a quick introduction to the Cloud Spanner emulator, we will explore using the emulator with a Cloud Spanner sample application called OmegaTrade. We will manually deploy the OmegaTrade application's backend service against a Cloud Spanner emulator backend instead of a Cloud Spanner instance, and compare the pros and cons of running the emulator locally vs. on a remote GCE instance vs. on Cloud Run. But first, let's talk about what the Cloud Spanner emulator is and how it can simplify your development process.

Overview

Cloud Spanner is a fully managed, distributed relational database offered on Google Cloud. Since its launch in 2017, Cloud Spanner has seen great interest and adoption, with customers from industries like gaming, retail, social media, and financial services running production workloads on it. In addition to the recently announced support for more granular instance sizing, which can be very handy for small or non-production workloads, Cloud Spanner offers a no-cost option that supports lightweight, offline environments such as Continuous Integration and Continuous Delivery/Deployment (CI/CD). This option is the Cloud Spanner emulator, which enables application developers to emulate an instance of Cloud Spanner locally for development and testing.

Introduction to the Cloud Spanner emulator

The Cloud Spanner emulator enables a no-cost experience for developers to quickly build and test applications with Cloud Spanner without the need for a GCP project, a billing account, or even an Internet connection. The emulator provides the same APIs as the Cloud Spanner production service, with some limitations. An important thing to note is that the emulator is in-memory and does not persist data across restarts: all of the configuration, schema, and data are lost upon a restart. The emulator is intended for local development and testing use cases, e.g., stand it up for a test suite with a known state, run the tests, verify the new state, then shut it down. It is also possible to deploy the emulator on a remote GCE instance and use it as a database for learning, development, and testing in a shared environment. The Cloud Spanner emulator should not be used for performance testing, and while it can be used in development and test environments, it is still a good idea to verify your application against the Cloud Spanner service itself before deploying to production. For a complete list of supported features and limitations of the emulator, take a look at the README file in GitHub. For a more detailed introduction to the Cloud Spanner emulator, read this post.

Running the Cloud Spanner emulator

There are multiple options for running the emulator: a pre-built Docker image, pre-built Linux binaries, Bazel, or a custom Docker image. Below is a comparison of the ways to provision and start the emulator, their respective use cases, and their cost.

Option 1: Run the emulator locally (Docker image, gcloud commands, Linux binaries, or Bazel)
- Use case: for most development and testing use cases, this is the quickest way to get started. All configuration, schema, and data are lost upon a restart of the emulator process(es).
- Cost: free.

Option 2: Deploy the emulator on a remote GCE instance (manual/gcloud deployment, or provisioned via Terraform)
- Use case: provides a free (in terms of Spanner cost) Cloud Spanner experience for multiple team members. Although the configuration, schema, and data are lost on a restart of the GCE VM or the emulator services, a shared remote instance allows multiple developers to troubleshoot a specific problem collaboratively. In organizations that segregate Dev, Test, and Prod GCP projects, a remote deployment on a GCE instance within the VPC can serve as a Cloud Spanner dev or test environment, which can result in cost savings. Continuous unit and integration tests can be run against the emulator, which also saves costs (though, again, the emulator is not intended for performance tests). In a GitOps-driven environment, provisioning the emulator with a Terraform template can make it easier to set up CI pipelines.
- Cost: minimal (the emulator itself is free; the cost depends on the GCE instance type chosen).

Option 3: Deploy the emulator on Cloud Run
- Use case: since the emulator is available as a pre-built image on GCR, it can be deployed on Cloud Run as a service. Note that you can only bind one port on Cloud Run, and the emulator exposes two services: a REST server on the default port 9020 and a gRPC server on port 9010. Cloud Run can therefore be a good choice if you want to use either the REST gateway or gRPC, but not both. All client libraries are built on gRPC, and Cloud Run supports gRPC (after enabling HTTP/2); if your application uses the client libraries or the RPC API, the emulator can be deployed to accept connections on port 9010. If you would like to use the REST interface alone, you can configure Cloud Run to send requests to port 9020.
- Cost: minimal (the emulator itself is free, so the only cost is that of running the service on Cloud Run).

Sample Cloud Spanner application: OmegaTrade

Throughout this series, we will use an application called OmegaTrade to demonstrate configuring the Cloud Spanner emulator both locally and on a remote GCE instance or Cloud Run. OmegaTrade is a stock chart visualization tool built in Node.js with a Cloud Spanner database backend. For this series of blogs, we will use the backend service of the OmegaTrade app along with the Cloud Spanner emulator. You can find the sample app repository here. To learn more about the app and its features relevant to Cloud Spanner, see this blog post.

Coming soon

In the next part, we will cover running the Cloud Spanner emulator locally, then containerizing the sample app and deploying it against that local emulator. Stay tuned!
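To make the local option above concrete, here is a minimal sketch of pointing the Python client library at a locally running emulator (started separately, for example with `gcloud emulators spanner start`). The project, instance, and database names are arbitrary, since the emulator accepts any of them.

```python
import os
from google.cloud import spanner

# Point the client library at the emulator instead of the real service.
# The emulator's gRPC endpoint listens on port 9010 by default.
os.environ["SPANNER_EMULATOR_HOST"] = "localhost:9010"

client = spanner.Client(project="test-project")  # any project ID works against the emulator

# Create an instance and a database in the emulator (nothing survives a restart).
config_name = f"projects/{client.project}/instanceConfigs/emulator-config"
instance = client.instance("test-instance", configuration_name=config_name, node_count=1)
instance.create().result()

database = instance.database(
    "test-db",
    ddl_statements=[
        "CREATE TABLE Singers (SingerId INT64, Name STRING(MAX)) PRIMARY KEY (SingerId)"
    ],
)
database.create().result()

with database.snapshot() as snapshot:
    print(list(snapshot.execute_sql("SELECT 1")))
```

The same code runs unchanged against the real Cloud Spanner service once the environment variable is removed, which is the main appeal of the emulator for local development and CI.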
Source: Google Cloud Platform