All you need to know about Firestore: A cheatsheet

Your product teams might ask, "Why does it take so long to build a feature or application?" Building applications is a heavy lift due to technical complexity, which includes the complexity of the backend services used to manage and store data. Every moment focused on this technical complexity is a distraction from delivering on core business value. Firestore alters this by having Google Cloud manage your backend complexity through a complete backend-as-a-service. Firestore is a serverless, NoSQL document database that unlocks application innovation with simplicity, speed, and confidence. It acts as the glue that intelligently brings together the complete Google Cloud backend ecosystem, in-app services from Firebase, and core UI frameworks and operating systems from Google.

What is Firestore?

Firestore is a serverless, fully managed NoSQL document database that scales from zero to global scale without configuration or downtime. Here's what makes Firestore unique:

- Ideal for rapid, flexible, and scalable web and mobile development with direct connectivity to the database.
- Supports effortless real-time data synchronization, pushing changes in your database as they happen.
- Robust support for offline mode, so your users can keep interacting with your app even when the internet isn't available or is unreliable.
- Fully customizable security and data validation rules to ensure the data is always protected.
- Built-in strong consistency, elastic scaling, high performance, and best-in-class 99.999% availability.
- Integration with Firebase and Google Cloud services like Cloud Functions and BigQuery, the serverless data warehouse.

In addition to a rich set of Google Cloud service integrations, Firestore also offers deep one-click integrations with a growing set of 3rd party partners via Firebase Extensions to help you build applications even more rapidly.

Document-model database

Firestore is a document-model database. All of your data is stored in "documents" and then "collections". You can think of a document as a JSON object: a dictionary with a set of key-value mappings, where the values can be several different supported data types, including strings, numbers, or binary values.

These documents are stored in collections. Documents can't directly contain other documents, but they can point to subcollections that contain other documents, which can point to subcollections, and so on. This structure brings with it a number of advantages. For starters, all queries that you make are shallow, meaning that you can grab a document without worrying about grabbing all the data underneath it. This means you can structure your data hierarchically in a way that makes sense to you logically, without having to worry about grabbing tons of unnecessary data. (A minimal client-library sketch appears at the end of this post.)

How to use Firestore?

Firestore can be used in two modes:

- Firestore in Native Mode: This mode is differentiated by its ability to directly connect your web and mobile apps to Firestore. Native Mode supports up to 10K writes per second and over a million connections.
- Firestore in Datastore Mode: This mode supports only server-side usage of Firestore, but supports unlimited scaling, including writes.

Conclusion

Whatever your application use case may be, if you want to build a feature or an application quickly, consider Firestore as your backend-as-a-service. For a more in-depth look into Firestore, check out the documentation. For more #GCPSketchnote, follow the GitHub repo.
For similar cloud content follow me on Twitter @pvergadia and keep an eye out on thecloudgirl.dev.
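To make the document-and-collection model concrete, here is a minimal sketch using the google-cloud-firestore Python client; the collection names and data are hypothetical:

```python
# A sketch of Firestore's document/collection model with the Python client.
# Collection names and field values are hypothetical.
from google.cloud import firestore

db = firestore.Client()

# A document is a dictionary of typed key-value pairs inside a collection.
user_ref = db.collection("users").document("alice")
user_ref.set({"name": "Alice", "level": 42})

# Documents point to subcollections; queries stay shallow, so reading the
# user above did not pull any of these orders.
for order in user_ref.collection("orders").stream():
    print(order.id, order.to_dict())

# Real-time sync: the callback fires whenever the document changes.
def on_change(snapshots, changes, read_time):
    for snap in snapshots:
        print("Updated:", snap.to_dict())

watch = user_ref.on_snapshot(on_change)
```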
Source: Google Cloud Platform

Let Google Cloud’s predictive services autoscale your infrastructure

At Google Cloud, we believe you get the most benefit from the cloud when you scale infrastructure based on changing demand. Compute Engine allows you to configure autoscaling to save costs during periods of low demand, and add capacity to support peak loads. When you use a managed instance group (MIG), you can have an autoscaler automatically create or delete virtual machine (VM) instances based on increases or decreases in load. However, if your application takes several minutes to initialize, creating VMs in response to growing load might not increase your application's capacity quickly enough. For example, if there's a large increase in load (like when users first wake up in the morning), some users might experience delays while your application is initializing on new instances.

A good way to solve this problem is to create VMs ahead of demand so that your application has enough time to initialize beforehand. This requires knowing upcoming demand. If only we could predict the future… Well, now we can!

Introducing predictive autoscaling

Predictive autoscaling uses Google Cloud's machine learning capabilities to forecast capacity needs. It creates VMs ahead of growing demand, allowing enough time for your application to initialize.

Figure 1. Autoscaling creates VMs as demand grows, leaving no buffer for the application to initialize. Predictive autoscaling creates VMs ahead of demand, allowing enough time for your application to initialize and start serving new load.

How does it work?

Predictive autoscaling uses your instance group's CPU history to forecast future load and calculate how many VMs are needed to meet your target CPU utilization. Our machine learning adjusts the forecast based on recurring load patterns for each MIG. You can specify how far in advance you want the autoscaler to create new VMs by configuring the application initialization period. For example, if your app takes 5 minutes to initialize, the autoscaler will create new instances 5 minutes ahead of the anticipated load increase. This allows you to keep your CPU utilization within the target and keep your application responsive even when there's high growth in demand.

Many of our customers have different capacity needs during different times of the day or different days of the week. Our forecasting model understands weekly and daily patterns to cover these differences. For example, if your app usually needs less capacity on the weekend, our forecast will capture that. Or, if you have higher capacity needs during working hours, we also have you covered.

Why should you try it?

Predictive autoscaling continuously adapts forecasted capacity to best match upcoming demand. The autoscaler checks the forecast several times per minute and creates or deletes VMs to match its prediction. The forecast itself is updated every few minutes to match recent load trends, so if your growth rate is higher or lower than usual, we will adjust the forecast accordingly. This gives you the capacity needed to cover peak load while saving on cost when demand goes down.

You can start using predictive autoscaling without worry, as it's fully compatible with the current autoscaler. The autoscaler will calculate enough VMs to cover forecasted as well as real-time CPU load, whichever is higher. This works with other autoscaling features as well: you can scale based on a schedule, your load balancer request target, or Cloud Monitoring metrics.
The autoscaler provides enough capacity for all of your configurations by taking the highest number of VMs needed to meet all your targets.

Getting started

You can enable predictive autoscaling in the Google Cloud Console. Select an autoscaled MIG from the instance groups page and click Edit group. Change the predictive autoscaling configuration from Off to Optimize for availability. (You can also enable it from the command line; a hedged sketch appears at the end of this post.)

To better understand whether predictive autoscaling is good for your application, click the link See if predictive autoscaling can optimize your availability. This will show you a comparison of the last seven days with your current autoscaling configuration vs. with predictive autoscaling enabled.

In the above chart, Average VM minutes overloaded per day shows how often your VMs exceed your CPU utilization target. This happens when demand is higher than available capacity. Predictive autoscaling can reduce this by starting VMs ahead of anticipated load. Average VMs per day is a proxy for cost. This shows how much additional VM capacity you need to keep your CPU utilization within the target you have set. You can optimize your cost by adjusting Minimum instances and CPU utilization as explained below.

Optimizing your configuration

Make sure your Cool down period reflects how long it takes for your application to initialize, from VM boot time until it's ready to serve the load. Predictive autoscaling uses this value to start VMs ahead of forecasted load. If you set it to 10 minutes (600 seconds), your VMs will start 10 minutes before the load is expected to increase.

Review your autoscaling CPU utilization target and Minimum number of instances. With predictive autoscaling you no longer need a buffer to compensate for the time it takes for a VM to start. If your application works best at 70% CPU utilization, you don't need to set the target to a much lower value, as predictive autoscaling will start VMs ahead of the usual load. A higher CPU utilization target and a lower Minimum number of instances allow you to reduce cost, as you don't need to pay for additional capacity to prepare for growing demand.

Try predictive autoscaling today

Predictive autoscaling is generally available across all Google Cloud regions. For more information on how to configure, simulate, and monitor predictive autoscaling, consult the documentation.
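For the command-line sketch referenced above: predictive autoscaling can be enabled on an existing MIG with gcloud. The group name, zone, and threshold values below are hypothetical:

```
gcloud compute instance-groups managed set-autoscaling my-mig \
    --zone=us-central1-a \
    --max-num-replicas=20 \
    --target-cpu-utilization=0.70 \
    --cool-down-period=600 \
    --cpu-utilization-predictive-method=optimize-availability
```

Here the cool-down period doubles as the application initialization window that predictive autoscaling uses to start VMs ahead of forecasted load.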
Source: Google Cloud Platform

Kinguin helps shoppers find products faster with Recommendations AI

Over 2.14 billion people worldwide are expected to buy online this year, according to Statista, and online retail sales will account for 22% of all purchases by 2023. But in a competitive retail landscape, positive interactions can mean the difference between a sale and an abandoned shopping cart.

One of the leading global marketplaces, Kinguin.net, is a haven for gamers. Their bustling ecommerce business conducts over 500,000 new transactions monthly, and users encounter over 50,000 unique digital products, from video games, gift cards, and in-game items to computer software and services. With over 10 million registered users, Kinguin set out to improve the experience by helping users find items quickly and delivering service at scale.

Helping customers find what they want, fast

Because of Kinguin's high volume of users (both buyers and sellers) and breadth of digital products, browsing and shopping can be challenging. "Customers shop online for choice and convenience, but it can sometimes be overwhelming. We want anyone who shops at Kinguin to find what they are looking for quickly and easily," says Viktor Romaniuk Wanli, Kinguin CEO and founder.

Today's retailers know that creating personalized shopping experiences is crucial for establishing and maintaining customer loyalty. Kinguin discovered their users were getting a rather standard retail experience and wondered how they could offer a more tailored, personalized one. They knew product recommendations were a great way to personalize experiences, because they help customers discover products that match their tastes and preferences. But it's not that easy to recommend products. Various shifting factors make recommendations much more complex:

- Customer behavior. Understanding customers is tough. How do you recommend something to a cold-start user who's never been to your site before? What happens when their behavior changes?
- Omnichannel context. According to Harvard Business Review, 73% of all customers use many channels when they buy. What happens when they go from desktop to mobile, or from social media shopping to a proprietary app?
- Product data challenges. How do you recommend new products within a large catalog of items? What if your product data has sparse labeling or unstructured metadata?

Data wasn't a problem for Kinguin. They had order data, history, and wishlists, and could collect events based on their platform interactions. It was machine learning model expertise they lacked. So rather than building their own solution, they determined it was more cost-effective to find a reliable partner. It was also essential that the solution integrated easily with Kubernetes, which enabled their global network.

With these considerations in mind, they applied for the Google Recommendations AI beta program. Kinguin became the first gaming e-commerce platform in Europe to use Recommendations AI when it launched in 2020.

Pro gamer move: using a fully managed AI service

Google Recommendations AI uses algorithms to deliver highly personalized suggestions tailored to a customer's preferences. Google Cloud based these algorithms on the same research that powers models for YouTube search and Google Shopping. The algorithms are always being tuned and adjusted to focus on individuals themselves, not just items. Many shopping AIs rely on manually provisioning infrastructure and training machine learning models. Instead, Recommendations AI's deep learning models use item and user metadata to gain insights.
It processes Kinguin's thousands of products at scale, iterating in real time. First, Kinguin pieces together a customer's history and shopping journey. Then, using Recommendations AI, they can serve up personalized products, even for long-tail products and cold-start users. By leveraging internal tools, Kinguin didn't need to start implementation from scratch; after a few trial sessions with Google Cloud engineers, they got started right away.

Due to the fast-paced nature of a marketplace (price changes, out-of-stock items, and so on), Kinguin needed their recommendations to be as close to real time as possible. They used internal event buses to stream events and their product catalog directly to the recommendations API (a hedged sketch of what such an event write can look like appears at the end of this post).

Kinguin rolled out in high-traffic areas, including their home page, product pages, and category pages. They analyzed heat maps and scroll maps to figure out where to test placements. They also experimented with different recommendation models such as "recently bought together" and "you may like." Engineers also factored in where they were implementing the models. For example, the "others you might like" model fits best on the homepage, while "frequently bought together" makes sense at checkout.

Understanding how product recommendations influence financials is critical for demonstrating the impact of personalization. Using BigQuery, Kinguin could analyze different cost projection models. BigQuery helped them dig into specific financial data to understand their margins and revenue gains.

Playing to win: enhanced customer experience

Since adopting Recommendations AI, Kinguin has improved both customer experience and satisfaction. Search times have shortened by 20 seconds, and average cart value has increased by 5 EUR. Conversion rates have quadrupled since the outset. Click-through rates have doubled overall, increasing 2.16 times on product pages and 2.8 times on recommendation pages.

"Google Recommendations AI has helped us evolve our service, increase customer loyalty and satisfaction. It has also contributed to a significant rise in sales," says Wanli. Kinguin is already thinking about other ways of enhancing user experiences with recommendations, including the checkout process, other landing pages, and email marketing.

Kinguin's journey with Google Cloud shows how companies can leverage AI to optimize sales and deliver high-performing, low-latency recommendations at any customer touchpoint. Learn more about Recommendations AI and Google Cloud AI and machine learning solutions.
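Here is the hedged sketch referenced above: writing a single user event with the Retail API Python client. This is not Kinguin's actual integration; the visitor ID, product ID, and project path are made up for illustration:

```python
# A sketch of streaming one user event to Recommendations AI (Retail API).
# Not Kinguin's actual pipeline; IDs and the catalog path are illustrative.
from google.cloud import retail_v2

client = retail_v2.UserEventServiceClient()

event = retail_v2.UserEvent(
    event_type="detail-page-view",   # the kind of interaction being recorded
    visitor_id="visitor-123",        # anonymous or logged-in user identifier
    product_details=[
        retail_v2.ProductDetail(product=retail_v2.Product(id="game-456"))
    ],
)

request = retail_v2.WriteUserEventRequest(
    parent="projects/PROJECT_ID/locations/global/catalogs/default_catalog",
    user_event=event,
)
client.write_user_event(request=request)
```

In a production pipeline, an event bus consumer would batch calls like this rather than writing one event per request.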
Source: Google Cloud Platform

BigQuery admin reference guide: Tables & routines

Last week in our BigQuery Reference Guide series, we spoke about the BigQuery resource hierarchy, specifically digging into project and dataset structures. This week, we're going one level deeper and talking through some of the resources within datasets. In this post, we'll talk through the different types of tables available inside of BigQuery, and how to leverage routines for data transformation. Like last time, we'll link out to the documentation so you can learn more about using these resources in practice.

What is a table?

A BigQuery table is a resource that lives inside a dataset. It contains individual records organized into rows, with each record composed of columns (also called fields) where a specified data type is enforced. BigQuery supports numerous different data types, including GEOGRAPHY for geospatial data, STRUCT and ARRAY for more complex data, and new parameterized data types that add specific constraints like the number of characters in a string. Data access can also be controlled at the table, row, and column levels; more details on data governance will be covered later in the series. Metadata, such as descriptions and labels, can be used for surfacing information to end users and as tags for monitoring. You can create and manage a table directly in the UI, through the API / client SDKs, or in a SQL query using a DDL statement.

Managed and external tables

Managed tables are tables that are backed by native BigQuery storage, which has many benefits that improve query performance, including support for partitions and clusters. We'll cover more details on BigQuery storage later in this series. Another advantage of using a managed table is that BigQuery allows you to use time travel to access data from any point within the last seven days and query data that was updated, expired, or deleted. And now you can even create a snapshot of your table to preserve its contents at a given time.

While managed tables store data inside BigQuery storage, external tables are backed by storage external to BigQuery. BigQuery currently supports creating an external table from Cloud Storage, Cloud Bigtable, and Google Drive. Besides an external table, you can create a connection to Cloud SQL, which is somewhat analogous to an external dataset. Here, you can leverage federated queries to send a query that executes in Cloud SQL but returns the results to be used within BigQuery.

Using external tables or federated queries may result in queries that aren't as fast as if the data had been stored in BigQuery itself. However, they can be useful for some data transformation patterns. For example, you may want to schedule a DDL/DML query that hydrates a managed table using a federated query, which selects and transforms data from Cloud SQL. An external table might also be useful for multi-consumer workflows where BigQuery storage isn't the source of truth, such as when a Dataproc cluster accesses data in a Cloud Storage bucket that you're not quite ready to port into BigQuery (although I do recommend taking a look at our connector if you need some convincing). You can learn more about querying external data in this video.

Logical and materialized views

In BigQuery, you can create a virtual table with a logical view or a materialized view. With logical views, BigQuery executes the SQL statement that defines the view at run time; it does not save the result anywhere. Additionally, you can grant users access to an authorized view to share query results without giving them access to the underlying tables.
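As a minimal sketch (the dataset, table, and column names are hypothetical), a logical view is just a saved query:

```sql
-- A logical view: the SELECT runs at query time; nothing is materialized.
CREATE VIEW mydataset.active_users AS
SELECT user_id, last_seen
FROM mydataset.users
WHERE last_seen >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY);
```

Authorizing a view like this lets analysts query it without holding any permissions on the underlying mydataset.users table.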
Materialized views, on the other hand, are recomputed in the background when the base data changes. No user action is required; they are always fresh! Better yet, if a query, or part of a query, against the source table can be resolved by querying the materialized view, BigQuery will reroute it for improved performance. However, materialized views use a restricted SQL syntax and a limited set of aggregation functions. You can find details on the limitations here.

Temporary and cached results tables

Aside from the tables we've mentioned so far, you can also create a temporary managed table using the TEMP or TEMPORARY keyword. This table is saved in BigQuery storage and can be referenced for the duration of the script. Temporary tables can be a good alternative to WITH clauses, because the defining query is only executed once, as opposed to being inlined every place the alias is referenced.

Original code:

```sql
with a as (
  select …
),
b as (
  select … from a …
),
c as (
  select … from a …
)
select
  b.dim1, c.dim2
from
  b, c;
```

Optimized:

```sql
create temp table a as
select …;

with b as (
  select … from a …
),
c as (
  select … from a …
)
select
  b.dim1, c.dim2
from
  b, c;
```

It's also important to mention that BigQuery writes all query results to a table, either one explicitly identified by the user or a cached results table. Temporary, cached results tables are maintained per-user, per-project, and there are no storage costs for temporary tables.

User defined functions & procedures

In BigQuery, a routine is either a user defined function (UDF) or a procedure. Routines allow you to reuse logic and handle your data in a unique way. A UDF is a function created using either SQL or JavaScript; it takes arguments as input and returns a single value as output. UDFs are often used for cleaning or reformatting data, for example extracting parameters from a URL string, restructuring nested data, or cleaning up strings (a hedged sketch appears at the end of this post). We even have a community-driven open-source repository of BigQuery UDFs! Just like logical views, you can create an authorized UDF that protects aspects of the underlying data. For more details on UDFs, check out our video here. You might also want to take a look at table functions, a preview feature where you can create a SQL UDF that returns a table instead of a scalar value.

Procedures, on the other hand, are blocks of SQL statements that can be called from other queries. Unlike UDFs, stored procedures can return multiple values or no values, which means you can run them to create or modify tables. In BigQuery, you can also leverage scripting capabilities within procedures to control execution flow with IF and WHILE statements. Plus, you can call your UDFs within your procedure! These aspects make procedures great for extract-load-transform (ELT) driven workflows.

To ensure consistent analytics across your organization, I recommend that you create a library dataset to house UDFs and procedures. You can easily grant everyone in your organization the BigQuery Data Viewer role on the library dataset so that all analysts use consistent and up-to-date logic in their queries.

Stay tuned!

We hope this gave you an understanding of how to leverage some of the different resources inside of a BigQuery dataset, and helps you make decisions like using native versus external storage, logical versus materialized views, and user defined functions versus procedures. Next up, we'll be talking about workload management in BigQuery by taking a look at jobs and the reservation model.
Be sure to keep an eye out for more in this series by following me on LinkedIn and Twitter, and subscribing to our YouTube channel.
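Here is the UDF sketch referenced above: a SQL function that extracts a query parameter from a URL string. The library dataset and function names are hypothetical:

```sql
-- A SQL UDF that pulls the utm_source parameter out of a URL string.
CREATE OR REPLACE FUNCTION mylibrary.get_utm_source(url STRING)
RETURNS STRING AS (
  REGEXP_EXTRACT(url, r'[?&]utm_source=([^&]+)')
);

-- Usage:
SELECT mylibrary.get_utm_source('https://example.com/?utm_source=newsletter');
```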
Source: Google Cloud Platform

The ultimate App Engine cheat sheet

App Engine is a fully managed serverless compute option in Google Cloud that you can use to build and deploy low-latency, highly scalable applications. App Engine makes it easy to host and run your applications, scaling them from zero to planet scale without you having to manage infrastructure. App Engine is recommended for a wide variety of applications, including web traffic that requires low-latency responses, and web frameworks that support routes, HTTP methods, and APIs.

Environments

App Engine offers two environments; here's how to choose one for your application:

- App Engine Standard supports specific runtime environments where applications run in a sandbox. It is ideal for apps with sudden and extreme traffic spikes because it can scale from zero to many requests as needed, and applications deploy in a matter of seconds. If your required runtime is supported and it's an HTTP application, then App Engine Standard is the way to go.
- App Engine Flexible is open and flexible and supports custom runtimes, because the application instances run within Docker containers on Compute Engine. It is ideal for apps with consistent traffic and regular fluctuations because instances scale from one to many. Along with HTTP applications, it also supports applications requiring WebSockets. The max request timeout is 60 minutes.

How does it work?

No matter which App Engine environment you choose, the app creation and deployment process is the same. First write your code, next specify the app.yaml file with the runtime configuration, and finally deploy the app on App Engine using a single command: gcloud app deploy. (A hedged app.yaml sketch appears at the end of this post.)

Notable features

- Developer friendly: A fully managed environment lets you focus on code while App Engine manages infrastructure.
- Fast responses: App Engine integrates seamlessly with Memorystore for Redis, enabling a distributed in-memory data cache for your apps.
- Powerful application diagnostics: Cloud Monitoring and Cloud Logging help monitor the health and performance of your app, and Cloud Debugger and Error Reporting help diagnose and fix bugs quickly.
- Application versioning: Easily host different versions of your app, and easily create development, test, staging, and production environments.
- Traffic splitting: Route incoming requests to different app versions for A/B tests, incremental feature rollouts, and similar use cases.
- Application security: Safeguard your application by defining access rules with the App Engine firewall, and leverage managed SSL/TLS certificates by default on your custom domain at no additional cost.

Conclusion

Whether you need to build a modern web application or a scalable mobile backend, App Engine has you covered. For a more in-depth look, check out the documentation. Click here for demos on how to use serverless technology and free hands-on training, or watch App Engine in a minute. For more #GCPSketchnote, follow the GitHub repo. For similar cloud content follow me on Twitter @pvergadia and keep an eye out on thecloudgirl.dev.
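As promised above, here is a hedged sketch of an app.yaml for the standard environment; the runtime choice and scaling values are assumptions for illustration, not a recommendation:

```yaml
# app.yaml for App Engine Standard (hypothetical Python app in main.py).
runtime: python39          # a supported standard-environment runtime

handlers:
- url: /.*                 # send every request to the app
  script: auto             # required value for Python 3 runtimes

automatic_scaling:
  max_instances: 10        # cap scale-out for cost control
```

With this file next to your code, running gcloud app deploy builds and deploys a new version of the app.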
Source: Google Cloud Platform

New research: Enterprises more confident than ever in cloud security

Cloud-based solutions were a technology life raft for organizations during the COVID-19 pandemic as employees took to the virtual office and companies scrambled to adjust to a distributed, remote reality. However, these rapid and substantial changes in the role of cloud technologies came with an increased focus on security. The accelerated move to the cloud also meant companies needed to rapidly evolve existing security practices to protect everything that matters at the core of business, from their people and their operational and transactional data to customers and their most sensitive personal information. Suddenly, enterprises were keenly aware of where business practices, employee training, and security policies were falling short.

A recent Google-commissioned study by IDG explored the details behind the heightened focus on security solutions since the start of the pandemic, while highlighting the role cloud-based security solutions are playing in helping keep customers safe. The survey of 2,000 global IT leaders illustrates that in this new and unfamiliar world, enterprises are more ready than ever to embrace cloud security.

Security is an even higher priority post-pandemic

In the wake of the pandemic, many organizations are facing a broader attack surface than ever before, as employees moved to temporarily working from remote home offices (and in some cases were encouraged to stay there for the foreseeable future). With fewer inherent security protections on personal internet connections and more work meetings happening via video conferencing, attackers have launched a cyber pandemic of their own, designed to take advantage of and exploit new weaknesses. However, even as businesses amp up security initiatives and preventative measures, the growing wave of threats continues to keep security top of mind for IT leaders. Security risks and concerns remain one of the top pain points impeding innovation, according to the IDG study respondents, surpassed only by insufficient IT and developer skills.

Enterprises looking to cloud providers for help with security

As a result, addressing security risks is a leading area where IT leaders turn to cloud providers for support. For these organizations, the ability to control access to data while using cloud services was the most required infrastructure security and compliance feature from a cloud provider.

Cloud security is more trusted than ever

A deeper look into the results also revealed a shift in perspective about whether cloud security is really up to the task of protecting enterprises against modern attacks. Despite skepticism in the past, the majority of IT leaders are now comfortable using cloud-based security solutions. Confidence in the security of cloud infrastructure is extremely high, with 85% of respondents stating they feel as secure (or more secure) than with on-premises infrastructure, compared to just 15% who believe on-premises is still safer. This is a clear indication that there are fewer reservations around the efficacy of cloud-based security solutions, signaling an increase in trust as organizations invest in cloud-based infrastructure and solutions.

We are committed to safe, secure solutions

Google Cloud protects your data, applications, and infrastructure, as well as your customers, from fraudulent activity, spam, and other types of online abuse.
We protect you against a growing list of cybersecurity threats using the same infrastructure foundation and security services that we use for our own operations, so you never have to compromise between ease of use and advanced security. To learn more about the IDG findings and how IT leaders are addressing security concerns post-COVID, download the full report.

Interested in how Google Cloud's commitment to providing safe, secure solutions helps you address your security needs? Our networking, data storage, and compute services encrypt data at rest and in transit to ensure the integrity, authenticity, and privacy of your data and your customers' data. We also offer the ability to encrypt data in use, while it's being processed in VM and container workloads, and our advanced security tools support compliance and data confidentiality with minimal operational overhead.
Source: Google Cloud Platform

Choosing the right machine learning approach for your application

Many of our customers want to know how to choose a technology stack for solving problems with machine learning (ML). There are many choices for these solutions, some that you can build and some that you can buy. We'll be focusing on the build side here, exploring the various options and the problems they solve, along with our recommendations.

The best ML applications are trained with the largest amount of data

But first, keep in mind an important concept: the quality of your ML model improves with the size of your data. Dramatic gains in ML performance and accuracy are driven by improvements in data size, as shown in the graph below. This is a text model, but the same principles hold for all kinds of ML models.

(Figure: error rate versus training-set size; see "The Unreasonable Effectiveness of Data" and "Deep Learning Scaling is Predictable, Empirically.")

The x-axis represents the size of the data set and the y-axis is the error rate. As the size of the data set increases, the error rate drops. But notice something critical about the x-axis: it reads 2^20, 2^21, 2^22, and so on. In other words, each new tick is a doubling of the data set size. To get a linear decrease in your error rate, you need to exponentially increase the size of your data set. The blue curve in the graph represents a slightly more sophisticated ML model than the orange curve. Suppose you are deciding between two choices: create a better model or double the data set size. Assuming that these two choices cost the same, it's better to keep gathering more data. Only when improvements due to data size increases start to plateau does it become necessary to build a better model.

Secondly, ML systems need to be retrained for new situations. For example, if you have a recommendation system in YouTube and you want to provide recommendations in Google Now, you can't use the same recommendations model. You have to train it in the second instance on the recommendations you want to make in Google Now. So even though the model, the code, and the principles are the same, you have to retrain the model with new data for new situations.

Now, let's combine these two concepts: you get a better ML model when you have more data, and an ML model typically needs to be retrained for a new situation. You have a choice of either spending your time building an ML model or buying a vendor's off-the-shelf model. To answer the question of whether to buy or build, first determine whether the buyable model solves the same problem you want to solve. Has it been trained on the same input and on similar labels? Let's say you're trying to do a product search, and the model has been trained on catalog images as inputs. But you want to do a product search based on users' mobile phone photographs of the products. The model that was trained on catalog images won't work on your mobile phone photographs, and you'd have to build a new model. But let's say you're considering a vendor's translation model that's been trained on speeches in the European Parliament. If you want to translate similar speeches, the model works well, as it uses the same kind of data.

The next question to ask: does the vendor have more data than you do? If the vendor has trained their model on speeches in the European Parliament but you have access to more speech data than they have, you should build. If they have more data, then we recommend buying their model. Bottom line: buy the vendor's solution if it's trained on the same problem and has access to more data than you do.
Technology stack for common ML use cases

If you need to build, what is the technology stack you need? What are the skills your people need to develop? This depends on the type of problem you are solving. There are four broad categories of ML applications: predictive analytics, unstructured data, automation, and personalization. The recommended technology stack for each is slightly different.

Predictive analytics

Predictive analytics includes detecting fraud, predicting click-through rates, and forecasting demand.

Step one: build an enterprise data warehouse. Here, your data set is primarily structured data, so our recommended first step is to store your data in an enterprise data warehouse (EDW). Your EDW is a source of training examples and product histories tracked over time, and can break down silos and gather data from throughout your organization.

Step two: get good at data analytics. Next, build a data culture, get skilled at data analytics, start to build dashboards, and enable data-driven decisions. At this point, you have all of the data and you know which pieces are trustworthy.

Step three: build ML. From your EDW, you can build your models using SQL pipelines. We recommend using BigQuery ML when doing ML with the data in your EDW (a hedged sketch appears at the end of this post). If you want to build a more sophisticated model, you can train TensorFlow/Keras models on BigQuery data. A third option is AutoML Tables, for state-of-the-art accuracy and for building online microservices.

Unstructured data

Examples of how our customers use ML to gain insights from unstructured data include annotating videos, identifying eye diseases, and triaging emails. Unstructured data can include videos, images, natural language, and text. Deep learning has revolutionized the way we do ML on unstructured data, whether you're looking at language understanding, image classification, or speech-to-text, so for unstructured data the models you use will employ deep learning. Here, the ROI heavily favors using AutoML. The amount of time that you'd spend trying to create a new ML model from scratch is almost never worth it; you can spend your money more effectively collecting more data than trying to get a slightly better model. Regardless of the type of unstructured data, our recommendation is to use AutoML for small and medium data sizes.

But AutoML has a limit to scale. At some point, the size of your data set is going to be so large that architecture search gets really expensive. At that point, you may want to go to a best-of-breed model with custom retraining, from TensorFlow Hub for example. If you have data sets in the millions of examples, you can build your own custom neural network (NN) architectures. But first determine whether your data set size has started to plateau, by plotting a graph similar to the one at the top of this post. Build a custom NN architecture only after you've plateaued, where increasing amounts of data won't give you a better model.

Automation

Some examples of how customers are using ML for automation include scheduling maintenance, counting retail footfall, and scanning medical forms. The key thing to keep in mind as you pick a technology stack for these problems is that you're not building just one ML model. If you want to schedule maintenance or reject transactions, for example, you'll need to train multiple linked models. Instead of individual models, think in terms of ML pipelines, which you can orchestrate using all of the technologies already mentioned.
Then you have three choices for operationalizing, with three levels of sophistication:

- Vertex AI has turnkey serverless training and batch/online predictions. This is what we recommend for a team of data scientists.
- Deep Learning VM Images, Cloud Run, Cloud Functions, or Dataflow offer customized training and batch/online predictions. This is what we recommend if the team consists of data engineers and data scientists.
- Vertex AI Pipelines are fully customizable, and recommended for organizations with separate ML engineering and data science teams.

When doing automation, the individual models that you chain together into a pipeline will be a mix: some prebuilt, some customized, and others built from scratch. Vertex AI, by providing a unified interface for all these model types, simplifies their operationalization.

Personalization

ML application examples of personalization include customer segmentation, customer targeting, and product recommendations. For personalization, we again recommend using an EDW, because customer segmentation uses structured marketing data. For product recommendations, you will similarly have prior purchases and web logs in your EDW. You can power clustering applications or recommendation systems like matrix factorization, and create embeddings directly from your EDW for sophisticated recommendation systems.

For specific use cases, choose the technology stack based on your data size and scope. Start with BigQuery ML for its quick, easy matrix factorization approach. Once your application proves viable and you want slightly better accuracy, try AutoML recommendations. But once your data set grows beyond the capabilities of AutoML recommendations, consider training your own custom TensorFlow and Keras models.

To summarize, successful ML starts with the question, "Do I build or do I buy?" If an off-the-shelf solution exists that was trained with similar data and with access to more data than you have, then buy it. Otherwise build it, using the technology stack recommended above for the four categories of ML applications. Learn more about our artificial intelligence (AI) and ML solutions and check out sessions from our Applied ML Summit on-demand.
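Here is the BigQuery ML sketch referenced in step three; the dataset, table, and column names are hypothetical:

```sql
-- Train a simple classifier directly in the EDW (hypothetical names).
CREATE OR REPLACE MODEL mydataset.churn_model
OPTIONS (
  model_type = 'logistic_reg',
  input_label_cols = ['churned']
) AS
SELECT plan_type, tenure_days, support_tickets, churned
FROM mydataset.customer_features;

-- Score new rows with the trained model.
SELECT *
FROM ML.PREDICT(MODEL mydataset.churn_model,
                (SELECT plan_type, tenure_days, support_tickets
                 FROM mydataset.new_customers));
```

The appeal of this pattern is that training and prediction are just SQL statements over tables already in your warehouse; no data movement or separate serving infrastructure is required.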
Source: Google Cloud Platform

Creating a unified analytics platform for digital natives

Digital native companies have no shortage of data, which is often spread across different platforms and software-as-a-service (SaaS) tools. As an increasing amount of data about the business is collected, democratizing access to this information becomes all the more important. While many tools offer in-application statistics and visualizations, centralizing data sources for cross-platform analytics allows everyone at the organization to get an accurate picture of the entire business. With Firebase, BigQuery, and Looker, digital platforms can easily integrate disparate data sources and infuse data into operational workflows, leading to better product development and increased customer happiness.

How it works

In this architecture, BigQuery becomes the single source of truth for analytics, receiving data from various sources on a regular basis. Here, we can take advantage of the broad Google ecosystem to directly import data from Firebase Crashlytics, Google Analytics, and Cloud Firestore, and query data within Google Sheets. Additionally, third-party datasets can be easily pushed into BigQuery with data integration tools like Fivetran. Within Looker, data analysts can leverage pre-built dashboards and data models, or LookML, through source-specific Looker Blocks. By combining these accelerators with custom, first-party LookML models, analysts can join across the data sources for more meaningful analytics. Using Looker Actions, data consumers can leverage insights to automate workflows and improve overall application health.

The architecture's components are described below:

| Data source | Name | Description |
|---|---|---|
| Google data sources | Google Analytics 4 | Tracks customer interactions in your application |
| Google data sources | Firebase Crashlytics | Collects and organizes Firebase application crash information |
| Google data sources | Cloud Firestore | Backend database for your Firebase application |
| Google data sources | Google Sheets | Spreadsheet service that can be used to collect manually entered, first-party data |
| Third-party data sources | Customer relationship management (CRM) platform | Manages customer data. (While we use Salesforce as a reference, the same ideas can be applied to other tools) |
| Third-party data sources | Issue tracking or project management software | Helps product and engineering teams track bug fixes and new feature development in applications. (While we use JIRA as a reference, the same ideas can be applied to other tools) |
| Third-party data sources | Customer support software or help desk | Organizes customer communications to help businesses respond to customers more quickly and effectively. (While we use Zendesk as a reference, the same ideas can be applied to other tools) |

Cross-functional analytics

With the various data sources centralized into BigQuery, members across different teams can use the data to make informed decisions. Executives may want to combine business goals from a Google Sheet with CRM data to understand how the organization is tracking towards revenue goals. In preparation for board or team meetings, business leaders can use Looker's integrations with Google Workspace to send query results into Google Sheets and populate a chart inside a Google Slides deck. Technical program managers and site reliability engineers may want to combine Crashlytics, CRM, and customer support data to prioritize bugs in the application that are affecting the highest-value customers, or are often brought up inside support tickets.
Not only can these users easily link back to the Crashlytics console for deeper investigation into an error, they can also use Looker's JIRA action to automatically create JIRA issues based on thresholds across multiple data sources (a hedged query sketch appears at the end of this post). Account and customer success managers (CSMs) can use a central dashboard to track the health of their customers using inputs like usage trends in the application, customer satisfaction scores, and crash reports. With Looker alerts, CSMs can be immediately notified of problems with an account and proactively reach out to customer contacts.

Getting started

To get started creating a unified application analytics platform, be sure to check out our technical reference guide. If you're new to Firebase, you can learn more here. To get started with BigQuery, check out the BigQuery Sandbox and these guides. For more information on Looker, sign up for a free trial here.
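Here is the query sketch referenced above. It illustrates the kind of cross-source analysis described, joining a Crashlytics BigQuery export to a CRM export; every project, dataset, table, and column name is illustrative rather than an actual schema:

```sql
-- Rank crash issues by the contract value of affected accounts
-- (hypothetical schema; adjust to your actual export tables).
SELECT
  c.issue_id,
  COUNT(*) AS crash_count,
  SUM(a.annual_contract_value) AS revenue_at_risk
FROM `my-project.firebase_crashlytics.com_example_app_ANDROID` AS c
JOIN `my-project.crm.accounts` AS a
  ON c.app_user_id = a.app_user_id
GROUP BY c.issue_id
ORDER BY revenue_at_risk DESC
LIMIT 20;
```

A result like this could feed a Looker dashboard, or trigger the JIRA action when revenue_at_risk crosses a threshold.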
Source: Google Cloud Platform

Rubin Observatory offers first astronomy research platform in the cloud

This week, the Vera C. Rubin Observatory is launching the first preview of its new Rubin Science Platform (RSP) for an initial cohort of astronomers. The observatory, which is located in Chile but managed by the U.S. National Science Foundation's NOIRLab in Tucson, AZ, and SLAC in California, is jointly funded by the NSF and the U.S. Department of Energy. The platform provides an easy-to-use interface to store and analyze the massive datasets of the Legacy Survey of Space and Time (LSST), which will survey a third of the sky each night for ten years, detecting billions of stars and galaxies, and millions of supernovae, variable stars, and small bodies in our Solar System.

The LSST datasets are unprecedented in size and complexity, and will be far too large for scientists to download to their personal computers for analysis. Instead, scientists will use the RSP to process, query, visualize, and analyze the LSST data archives through a mixture of web portal, notebook, and other virtual data analysis services. An initial launch with simulated data, called Data Preview 0, builds on the Rubin Observatory's three-year partnership with Google to develop an Interim Data Facility (IDF) on Google Cloud to prototype hosting of the massive LSST dataset. This agreement marks the first time a cloud-based data facility has been used for an astronomy application of this magnitude.

Bringing the stars to the cloud

For Data Preview 0, the IDF leverages Cloud Storage, Google Kubernetes Engine (GKE), and Compute Engine to provide the Rubin Observatory user community access to simulated LSST data in an early version of the RSP. The simulated data were developed over several years by the LSST Dark Energy Science Collaboration to imitate five years of an LSST-like survey over 300 square degrees of the sky (about 1,500 times the area of the moon). The resulting images are very realistic: they have the same instrumental characteristics, such as pixel size and sensitivity to photons, that are expected from the Rubin Observatory's LSST Camera, and they were processed with an early version of the LSST Science Pipelines that will eventually be used to process LSST data.

"This will be the first time that these workloads have ever been hosted in a cloud environment. Researchers will have an opportunity to explore an early version of this platform," says Ranpal Gill, senior manager and head of communications at the Rubin Observatory.

Broadening access for more researchers

Over 200 scientists and students with Rubin Observatory data rights were selected to participate in Data Preview 0 from a pool of applicants that represents a wide range of demographic criteria, regions, and experience levels. Participants will be supported with resources such as tutorials, seminars, communication channels, and networking opportunities, and they will be free to pursue their own science at their own pace using the data in the RSP. "The revolutionary nature of the future LSST dataset requires a commensurately innovative system for data access and analysis, paired with robust support for scientists," says Melissa Graham, lead community scientist for the Rubin Observatory and research scientist in the astronomy department at the University of Washington.
"I'm personally excited to enhance my own skills by using the RSP's tools for big data analysis, while also helping others to learn and to pursue their LSST-related science goals during Data Preview 0." At the same time, because the RSP is hosted in the cloud, researchers at smaller institutions get access to state-of-the-art astronomy infrastructure comparable to that of the largest national research centers.

The launch benefits the observatory too: the development team can learn what researchers are interested in while also testing and debugging the platform. Graham says that "the platform is still in active development so researchers using it will be able to follow along in the progress, and provide feedback on ways that we can optimize the development of the tools."

Next steps

The LSST aims to begin the ten-year survey in 2023-24 and expects it to include 500 petabytes of data. Through the cloud, Google aims to help make this extraordinary project scalable and accessible to researchers everywhere. To learn more about Data Preview 0, watch this video.

Want to ramp up your own research in the cloud? We offer research credits to academics using Google Cloud for qualifying projects in eligible countries. You can find our application form on Google Cloud's website or contact our sales team.
Source: Google Cloud Platform

New in Google Cloud VMware Engine: autoscaling, Mumbai expansion, and more

We've made several updates to Google Cloud VMware Engine in recent weeks; today's post provides a recap of our latest milestones. Google Cloud VMware Engine delivers an enterprise-grade VMware stack running natively in Google Cloud. This cloud service is one of the fastest paths to the cloud for VMware workloads, without making changes to existing applications or operating models, across a variety of use cases: rapid data center exit, application lift and shift, disaster recovery, virtual desktop infrastructure, or modernization at your own pace.

In fact, Mitel, a global provider of unified communications-as-a-service to 70 million business users across 100 countries, migrated 1,000 VMware instances to Google Cloud VMware Engine in less than 90 days and improved its monthly operational output fourfold. In our last update, we focused on several innovative capabilities around networking, reach, and scale. Let's take a look at the highlights we've released since then.

Fast provisioning of a dedicated, intrinsically secure VMware private cloud

With Google Cloud VMware Engine, you can spin up a VMware private cloud in about 30 minutes. You can also scale your VMware-based infrastructure on demand with dedicated hosts located in secure Google data centers. Let's look at what's new:

Autoscale: The ability to elastically and programmatically manage infrastructure resources to align with business needs, or "right-sizing", is a core capability of an IaaS platform. With autoscale, Google Cloud VMware Engine users can leverage policy-driven automation to scale the nodes needed to meet the compute demands of the VMware infrastructure. Autoscale:

- Addresses seasonal spikes in demand, gradual increases in utilization, or new projects being onboarded or expanded due to disaster recovery events.
- Analyzes CPU, memory, and storage utilization to give you the controls to scale Google Cloud VMware Engine nodes up or down.
- Ensures that storage consumption does not exceed the recommended limits for maintaining the Google Cloud VMware Engine service-level agreement.
- Reduces overhead on IT teams by automating capacity monitoring and enabling sufficient availability of resources based on thresholds.

Note that safeguards for maintaining minimum and maximum capacity can be configured to ensure there are boundaries to the automation. Learn how to set up autoscale.

Mumbai region availability

Google Cloud VMware Engine is now available in the Mumbai region. This brings the availability of the service to 12 regions globally, enabling our multi-national and regional customers to leverage a VMware-compatible infrastructure-as-a-service platform on Google Cloud. For more details, please read the press release.

Enterprise-grade infrastructure

With 99.99% availability for a cluster in a single zone, fully dedicated 100 Gbps east-west networking with no oversubscription, and all nonvolatile memory express (NVMe) storage, Google Cloud VMware Engine provides the high performance required for the most demanding workloads. Let's look at what's new:

Google Cloud KMS integration (Preview): You already have the ability to bring your own keys to encrypt your vSAN datastores. With this new capability, organizations that want to eliminate the overhead of managing external key providers can leverage a Google-managed key provider, using Cloud KMS. This brings increased flexibility in securing workloads and data by enabling vSAN encryption by default for newly instantiated VMware private clouds.
This feature is currently in Preview.

HIPAA compliance: Since April, Google Cloud VMware Engine has been Health Insurance Portability and Accountability Act (HIPAA) compliant. This opens the service up to healthcare organizations, which can now migrate and run their HIPAA-compliant VMware workloads in a fully compatible, VMware Cloud Verified stack running natively in Google Cloud, without changes or re-architecture to tools, processes, or applications. Read more in this blog.

NSX-T support for Active Directory: With NSX-T support for Active Directory, you can now leverage your on-premises Active Directory as one of the lightweight directory access protocol (LDAP) identity sources for user authentication into NSX-T Manager. This extends the theme of being able to leverage your on-premises tools with Google Cloud VMware Engine. For more information, read the documentation on how to set up identity sources.

vSAN TRIM/UNMAP support: For space efficiency, vSAN allows creating thin-provisioned disks that grow gradually as they are filled with data. However, files that are deleted within the guest operating system (OS) do not result in vSAN freeing up the allocated space. To increase space efficiency, guest OS file systems can reclaim capacity that is no longer used with TRIM/UNMAP commands. vSAN is fully aware of these commands sent from the guest OS and reclaims previously allocated storage as free space. We have enabled TRIM/UNMAP for vSAN by default in Google Cloud VMware Engine.

Simplicity in experience and operations

With Google Cloud VMware Engine, you only need to worry about your workloads, not patching, upgrading, and updating the solution layer, resulting in fewer interoperability issues and less infrastructure maintenance. In addition, we have pre-built service accounts that enable your third-party, VMware-supported tools and solutions to work seamlessly in VMware Engine. Access to Google services privately over local connections is also natively supported, enabling enrichment of existing applications and modernization over time. Finally, this service brings the power of Google Cloud Virtual Private Cloud (VPC) design by natively providing multi-VPC, multi-region networking that's unique. Let's look at what's new:

Dashboards for Day 2 operations: To speed up cloud transformation and enable efficiency, Google Cloud VMware Engine administrators can take advantage of Cloud Operations dashboards for the solution. In addition, administrators can create custom policies through cloud alerting and enable notifications via channels of their choice (SMS, email, Slack, and more). For more details, please refer to Setting up Cloud Monitoring.

For the latest updates, bookmark the Google Cloud VMware Engine release notes.

Thanks to Manish Lohani, Product Management, Google Cloud; Nargis Sakhibova, Product Management, Google Cloud; and Wade Holmes, Solutions Management, Google Cloud, for their contributions to this blog post.
Source: Google Cloud Platform