Think big: Why Ricardo chose Bigtable to complement BigQuery

With over 3.7 million members, Ricardo is the most trusted, convenient, and largest online marketplace in Switzerland. We successfully migrated from on-premises infrastructure to Google Cloud in 2019, a move that also surfaced some new use cases we were keen to solve. With our on-premises data center closing, we were under deadline to find a solution for these use cases, starting with our data stream processing. We found a solution using both Cloud Bigtable and Dataflow from Google Cloud. Here, we take a look at how we decided upon and implemented that solution, as well as the future use cases on our roadmap.

Exploring our data use cases

For analytics, we had originally used a Microsoft SQL data warehouse, and had decided to switch to BigQuery, Google Cloud’s enterprise data warehouse. That meant all of our workloads had to move there as well, so we chose to run the imports and batch loads from Kafka into BigQuery through Apache Beam. We also wanted to give internal teams the ability to perform fraud detection work through our customer information portal, to help protect our customers from the sale of fraudulent goods or from actors using stolen identities.

Our engineers also had to work quickly to address how to move our two main streams of data, which had been stored in separate systems. One is for articles—essentially, the items for sale posted to our platform. The other is for assets, which contain the various descriptions of those articles. Previously, we’d insert both streams into BigQuery and then do a JOIN. One of the challenges is that Ricardo has been around for quite some time, so an article may have been around since 2006, or may get re-listed, and can therefore be missing some information in the asset stream.

One problem, which solution?

While researching how to solve our data stream problem, I came across a Google Cloud blog post that provided a guide to common use patterns for Dataflow (Google Cloud’s unified stream and batch processing service), with a section on streaming-mode large lookup tables. In addition to our article stream, we have a large lookup table of about 400 GB containing our assets, and we needed to be able to look up the asset for an article. The guide suggested that a column-oriented system could answer this kind of query in milliseconds, and could be used in a Dataflow pipeline to both perform the lookup and update the table.

So we explored two options to solve the use case. We first prototyped with Apache Cassandra, the open-source, wide-column NoSQL database management system, which we could preload with the historical data from BigQuery using Apache Beam. We built a new Cassandra cluster on Google Kubernetes Engine (GKE) using the Cass Operator, released by DataStax as open source. We created an index structure, optimized the whole thing, ran some benchmarks, and happily found that everything worked. So we had the new Cassandra cluster, the pipeline was consuming assets and articles, and the assets were looked up from the Cassandra store where they were also stored.

But what about the day-to-day tasks and hassles of operations? Our Data Intelligence (DI) team needs to be completely self-sufficient. We’re a small company, so we need to move fast, and we don’t want to build a system that quickly becomes legacy. We were already using and liking the managed services of BigQuery. So using Bigtable, a fully managed, low-latency, wide-column NoSQL database service, seemed like a great option.
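To make that lookup pattern concrete, here is a minimal, hypothetical Apache Beam (Python) sketch in the spirit of what is described above, not Ricardo's actual pipeline. The project, topic, table, and column-family names are placeholder assumptions.

```python
# Hypothetical sketch: enrich a stream of article events with the matching
# asset row from a Bigtable lookup table. Names are placeholder assumptions.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class LookupAssetFn(beam.DoFn):
    """Looks up the asset row for each article in a Bigtable lookup table."""

    def __init__(self, project_id, instance_id, table_id):
        self.project_id = project_id
        self.instance_id = instance_id
        self.table_id = table_id

    def setup(self):
        # One client per worker, created once and reused for every element.
        from google.cloud import bigtable

        client = bigtable.Client(project=self.project_id)
        self.table = client.instance(self.instance_id).table(self.table_id)

    def process(self, article):
        row = self.table.read_row(article["asset_id"].encode("utf-8"))
        if row is not None:
            cell = row.cells["asset"][b"description"][0]
            article["asset_description"] = cell.value.decode("utf-8")
        yield article


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadArticles" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/articles")
            | "Parse" >> beam.Map(json.loads)
            | "LookupAsset" >> beam.ParDo(
                LookupAssetFn("my-project", "my-instance", "assets"))
            | "Serialize" >> beam.Map(lambda a: json.dumps(a).encode("utf-8"))
            | "Write" >> beam.io.WriteToPubSub(
                topic="projects/my-project/topics/enriched-articles")
        )


if __name__ == "__main__":
    run()
```

The same DoFn could also write updated assets back to the table, which is what makes a single low-latency, wide-column store attractive for both sides of the lookup.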
A 13 percent net cost savings with Bigtable

In comparison to Bigtable, Cassandra had a strike against it in the area of budgeting. We found that Cassandra needed three nodes to secure its availability guarantees. With Bigtable, we could have a fault-tolerant Apache Beam data pipeline running on Apache Flink, and fault tolerance even with reduced capacity, so we didn’t need to run three nodes. We were able to schedule 18 Bigtable nodes while ingesting the history from BigQuery to build the lookup table, but as soon as the lookup table was loaded, we could scale down to two nodes or even one, because a single node can handle 10,000 requests per second, guaranteed. Bigtable takes care of availability and durability behind the scenes, so it supplies those guarantees even with one node.

With this realization, it became quite clear that the Bigtable solution was easier to manage than Cassandra, and it was also more cost-effective. As a small team, when we factored in the ops learning costs, the downtime, and the tech support needed for the Cassandra-on-GKE solution, it was already more affordable to start out with one TB in a Bigtable instance than to run the Cassandra-on-GKE solution on a three-node E2 cluster, which is pretty small, with 8-CPU VMs. Bigtable was the easier, faster, and less expensive answer. By moving such lookup queries to Bigtable, we ultimately saved 13 percent in BigQuery costs. (Keep in mind that these are net savings, so the additional cost of running Bigtable is already factored in.)

As soon as this new solution lifted off, we moved another workload to Bigtable, where we integrated data from Zendesk tickets for our customer care team. We worked on integrating the customer information, making it available in Bigtable so that the product key lookup is linked with the Zendesk data and this information can be presented to our customer care agents instantly.

Benefiting from the tight integration of Google Cloud tools

If you’re a small company like ours, building out a data infrastructure where the data is highly accessible is a high priority. For us, Bigtable is the store where processed data is available to be used by services. The integration between Bigtable, BigQuery, and Dataflow makes it easy for us to make this data available. One of the other reasons we found the platform on Google Cloud to be superior is that with Dataflow and BigQuery, we can make quick adjustments. For example, one morning, thinking about an ongoing project, I realized we should have reversed the article ID—it should be a reversed string instead of a normal string to prevent hotspotting. To do that, we could quickly scale up to 20 Bigtable nodes and 50 Dataflow workers. The batch jobs then read from BigQuery and wrote to the newly created schema in Bigtable, and it was all done in 25 minutes. Before Bigtable, this kind of adjustment would have taken days to complete.

Bigtable’s Key Visualizer opens up opportunities

The idea to reverse the article ID came to me as I thought about the Key Visualizer from Bigtable, which is so nicely done and easy to use compared to our previous setup. It’s tightly integrated, but easy to explain to others. We use SSD nodes, and the only configuration we need to worry about is the number of nodes and whether we want replication or not. It’s like a volume knob on a stereo—and that was really mind-blowing, too.
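As a concrete illustration of that row-key reversal, here is a minimal sketch with assumed table and column names, not Ricardo's actual schema, showing how reversing a monotonically increasing article ID spreads writes across tablets instead of hotspotting one of them.

```python
# Minimal sketch (assumed names): write rows keyed by the reversed article ID.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("articles")


def write_article(article_id: str, payload: bytes) -> None:
    # "123456789" becomes "987654321", so sequential IDs no longer pile up
    # on the same tablet.
    row_key = article_id[::-1].encode("utf-8")
    row = table.direct_row(row_key)
    row.set_cell("article", b"payload", payload)
    row.commit()


write_article("123456789", b'{"title": "Vintage stereo"}')
```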
The speed of scaling up and down is really fast, and with Dataflow, nothing gets dropped, you don’t have to pre-warm anything, and you can just schedule a job and scale it while it’s running. We haven’t seen ease of scaling like this before.

Considering future use cases for Bigtable

Looking ahead, we’re working on improvements to our fraud detection project involving machine learning (ML) that we hope to move to Bigtable. Currently we have a process, triggered every hour by Airflow in Cloud Composer, that takes the last hour of data from BigQuery and runs over it, with a Python container executing the loaded model against that data as input. If the algorithm is 100 percent sure an article is fraudulent, it blocks the product, which then requires a manual request from customer care to unblock. If the algorithm is less certain, the article goes into a customer care inbox and gets flagged, where the agents check it.

What’s currently missing in the process is an automated feedback loop: a learning adjustment when a customer care agent replies, “This is not fraud.” We could write some code to perform the action, but we need a faster solution. It would make more sense to source this in the pipeline directly from Bigtable for the learning models. In the future, we’d also like the Dataflow pipeline to write to BigQuery and Bigtable at the same time for all of the important topics. Then, we could source these kinds of use cases and serve them directly from Bigtable instead of BigQuery, making them soft “real time.”

With the 13 percent savings in BigQuery costs, and the tight integration of all the Google Cloud managed services like Bigtable, our small (but tenacious) DI team is free from the hassles of operations work on our data platform. We can devote that time to developing solutions for these future use cases and more. See what’s selling on Ricardo.ch. Then, check out our site for more information about the cloud-native key-value store Bigtable.
Source: Google Cloud Platform

Top 5 trends for API-powered digital transformation in 2021

Our “State of the API Economy 2021” report confirms that though digital transformation has been among enterprises’ top business imperatives for years, the COVID-19 pandemic and changing market conditions have increased this urgency. Organizations across the world weathered the pandemic by compressing years of digital transformation into just a few months. Our research reflects this urgency: in response to the pandemic and its rippling effect on business in 2020, nearly three in four organizations continued their digital transformation investments. Two-thirds of those companies are increasing investments or completely evolving their strategies to become digital-first companies.

Digital transformation relies on an organization’s ability to package its services, competencies, and assets into modular pieces of software that can be repeatedly leveraged. Every company in the world already has valuable data and functionality housed within its systems. Capitalizing on this value, however, means liberating it from silos and making it interoperable and reusable in different contexts—including by combining it with valuable assets from partners and other third parties.

APIs enable these synergies by letting developers easily access and combine digital assets in different systems, even if those systems were never intended to interoperate. In their most basic form, APIs are how software talks to software, but if the APIs are designed with the developer experience in mind, rather than just as bespoke integration projects, they become extremely powerful, enabling developers to repeatedly leverage data and functionality for new apps and automations.

As part of creating the “State of API Economy 2021” research report, we surveyed over 700 IT executives around the globe to identify key trends about how they are responding to the pandemic and its rippling effects on business. Our findings identified five key trends in 2021 for API-first digital transformation:

1. Increasing SaaS and Hybrid Cloud-based API Deployments

When asked about future areas of technology focus and investment, one in two respondents reported increasing SaaS use to administer workloads, as well as an increase in hybrid cloud adoption—both areas in which APIs are crucial tools. APIs serve a variety of use cases, from connecting internal applications to enabling digital ecosystem strategies, so it’s no surprise that many organizations are choosing to leverage a mix of on-premises and cloud infrastructure to host those APIs.

2. Analytics Expand Competitive Advantage

While deploying APIs helps companies develop their digital presence, measuring API performance is key to optimizing their use and illuminating further routes to innovation. When our survey respondents were asked how APIs at their company are currently measured, top responses included metrics focused on API performance, metrics focused on traditional IT-centric numbers, and metrics focused on consumption of APIs. But when asked about their preferred API measurements, business impact—including Net Promoter Score (NPS) and speed-to-market—tops the list.

Leading businesses use API analytics not only to inform new strategies but also to align leadership goals and outcomes. Because executive sponsors tend to support tangible results (like an API that’s attracting substantial developer attention or accelerating delivery of new products), teams can use API metrics to effectively unite leaders around digital strategies and justify continued platform-level funding for the API program.

3. AI- and ML-Powered API Management Is Gaining Traction

While some aspects of API security and management are as straightforward as applying authentication mechanisms to control access or applying rate limits when API calls exceed a certain threshold (such as during a DDoS attack), artificial intelligence (AI) and machine learning (ML) are emerging as important ways for organizations to bolster their API management and security capabilities.

It’s no wonder that new AI- and ML-powered API security and monitoring solutions are gaining widespread adoption to help companies detect and block malicious attacks. In fact, usage for anomaly detection, bot protection, and security analytics grew 230% year over year among Apigee customers between September 2019 and September 2020.

4. API Ecosystems Are Innovation Drivers

APIs are the backbone of digital business ecosystems that encompass networks of partners, developers, and customers. These ecosystems may be composed entirely of internal parties (i.e., developers within an organization) or may include external individuals and organizations, such as suppliers, third-party providers, contractors, customers, developers, regulators, or even competitors.

Our research found that while companies of all API maturity levels are likely to be focused on speeding up development of new applications and connecting internal applications, high-maturity organizations are significantly more likely to focus on developing a developer ecosystem or B2B partner ecosystem around their APIs.

5. API Security and Governance More Important Than Ever

In 2020, virtually all industries, from retail and manufacturing to finance and hospitality, shifted how they do business, with a focus on digital maturity coming into sharp relief in the wake of COVID-19. Customer-, employee-, and partner-facing operations all moved to digital mediums. While this created new opportunities for innovation, it also exacerbated the difficulty of keeping up with growing security threats by opening up more avenues for hackers to access sensitive data.

When designed and managed properly, APIs provide businesses the optionality to control access to digital assets, to combine old systems with new technologies, and to empower developers to experiment, innovate, and react to changing customer needs. But APIs exposed without the proper controls, security protections, developer considerations, and visibility mechanisms can become a liability that puts corporate and customer data at risk. Our research demonstrates that increased investment in security and governance remains top of mind to enable enterprises to better leverage and protect their digital assets.

Even as 2021 begins with carryover from many of 2020’s challenges, looking ahead, enterprises should start thinking beyond digital transformation for the coming decade and strive to achieve digital excellence. Businesses will need to leverage advanced cloud capabilities around security, global reach, access management, and artificial intelligence to support growing global digital ecosystems. With well-designed and managed APIs, enterprises can help ensure that they’re able to adapt their business from one disruption to the next. And, at Google, we are here to partner with you to help achieve your digital excellence.

Want to learn more? The “State of API Economy 2021” report describes how digital transformation initiatives evolved throughout 2020, as well as where they’re headed in the years to come.
This report is based on Google Cloud’s Apigee API Management Platform usage data, Apigee customer case studies, and analysis of several third-party surveys conducted with technology leaders from enterprises with 1,500 or more employees, across the United States, United Kingdom, Germany, France, South Korea, Indonesia, Australia, and New Zealand. Read the full report
Source: Google Cloud Platform

Loading complex CSV files into BigQuery using Google Sheets

Building an ELT pipeline using Google Sheets as an intermediary

BigQuery offers the ability to quickly import a CSV file, both from the web user interface and from the command line.

Limitations of autodetect and import

This works for your plain-vanilla CSV files, but it can fail on complex CSV files. As an example of a file it fails on, let’s take a dataset of New York City Airbnb rentals data from Kaggle. This dataset has 16 columns, but one of the columns consists of pretty much free-form text. This means it can contain emojis, newline characters, and so on. Indeed, if you try to open this file with BigQuery, you get errors complaining that a quoted string is never closed. This is because a row is spread across multiple lines, and so the starting quote on one line is never closed. This is not an easy problem to solve—lots of tools struggle with CSV files that have newlines inside cells.

Sheets to the rescue

Google Sheets, on the other hand, has a much better CSV import mechanism. Open up a Google Sheet, import the CSV file, and voila. The cool thing is that by using a Google Sheet, you can do interactive data preparation in the Sheet before loading it into BigQuery. First, delete the first row (the header) from the sheet. We don’t want that in our data.

ELT from a Google Sheet

Once it is in Google Sheets, we can use a handy little trick: BigQuery can directly query Google Sheets! To do that, we define the Google Sheet as a table in BigQuery.

Steps from the BigQuery UI:

1. Select a dataset and click on Create Table.
2. Select Drive as the source and specify the Drive URL to the Google Sheet.
3. Set Google Sheet as the file format.
4. Give the table a name. I named it airbnb_raw_googlesheet.
5. Specify the schema.

This table does not copy the data from the sheet—it queries the sheet live. So, we can then copy the data as-is into a native BigQuery table (of course, we could do some transformation here as well).

How to automate

You can automate these steps:

- Here’s an article on how to read a CSV file into Sheets using Python.
- From then on, use dataform.co or BigQuery scripts to define the BigQuery table and do the ELT.

To import complex CSV files into BigQuery, build an ELT pipeline using Google Sheets as an intermediary. This allows you to handle CSV files with newlines and other special characters in the columns. Enjoy!
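For those who prefer to script the two BigQuery steps (defining the Sheets-backed table, then copying it into a native table), here is a rough sketch using the Python client library. The project, dataset, and column names are placeholder assumptions, and querying a Sheets-backed table requires credentials that include a Drive scope.

```python
# Rough sketch (assumed project/dataset/column names) of the ELT steps above,
# using the google-cloud-bigquery client library.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

# 1. Define a table backed by the Google Sheet. No data is copied; BigQuery
#    queries the sheet live, so the credentials need a Drive scope.
external_config = bigquery.ExternalConfig("GOOGLE_SHEETS")
external_config.source_uris = ["https://docs.google.com/spreadsheets/d/SHEET_ID"]
external_config.schema = [
    bigquery.SchemaField("id", "STRING"),
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("price", "NUMERIC"),
    # ...remaining columns of the Airbnb dataset omitted for brevity.
]
table = bigquery.Table("my-project.airbnb.airbnb_raw_googlesheet")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# 2. ELT: copy the rows as-is into a native BigQuery table.
client.query(
    """
    CREATE OR REPLACE TABLE airbnb.airbnb_nyc AS
    SELECT * FROM airbnb.airbnb_raw_googlesheet
    """
).result()
```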
Source: Google Cloud Platform

How Cloud Operations helps users of Wix’s Velo development platform provide a better customer experience

With more and more businesses moving online, and homegrown entrepreneurs spinning up new online apps, they’re increasingly looking for an online development platform to help them easily build and deploy their sites. Many choose Velo by Wix because it’s an open web development platform with an intuitive visual builder that accelerates front-end development and comes with a number of benefits, including a robust serverless architecture, integrated database management, and access to a host of built-in Wix business solutions.

But building a great app is only part of the job; you also need to ensure that it runs smoothly and provides the best user experience possible. To make this happen, we’ve collaborated with Wix to bring the Google Cloud operations suite—formerly known as Stackdriver—to Velo to monitor, troubleshoot, and improve the performance of applications built in their online environments.

How customers are using Wix’s Velo and Cloud Operations

A number of online businesses are already using services from the Cloud operations suite to help ensure a consistent online experience for their apps. Here are a few examples.

PostSomeJoy

Created in the UK during the pandemic in April 2020, PostSomeJoy makes it easy for users to send unique postcards to their loved ones. They built their site on Wix’s Velo and use the Cloud operations integration to support a better customer experience and postcard delivery. The site provides a wide variety of photos and imagery for users to choose from when they send their postcards. To do this, they use Cloudinary as their media management tool, accessed from their dashboard through an API call executed by Velo. Cloud Logging helps them troubleshoot the root cause of any issues importing their images, so they can continuously provide a great user experience.

Local Hoops

Local Hoops is a Seattle-based elite basketball training academy for children ages 6-18. In March 2020, the organization used Velo to create a full virtual academy with a members login to allow athletes to continue their training at home during the pandemic. Local Hoops created this virtual academy with a range of membership levels, and is using the app to deliver personalized workout plans and videos to its users. But sometimes things go wrong, and Cloud Operations logs these issues by error type, such as whether the user was not a member or was at the wrong membership level to access certain content, so the team can understand how best to resolve them. By quickly reviewing and acting on error logs, Local Hoops can resolve customer issues faster, resulting in a better user experience.

“Thanks to Velo, we were able to move our training from the court to online and keep these young athletes active and motivated. Google Cloud Operations was monumental in the smooth operation of this.” – Kelly Edwards, Founder, Local Hoops

Nspect.io

Founded in the Czech Republic, Nspect.io is a cybersecurity software and services company that provides penetration testing services to ensure that IP addresses open to the internet are more secure and less vulnerable to cyber attacks. They’ve built their platform with Wix’s Velo, and the Cloud operations suite has been instrumental in helping them improve their customer experience. When their customers order software or security testing, a custom dashboard created in the Cloud operations suite helps them show customers the status of their order—from the start of an order, to approval and provisioning, and finally through to billing and completion.
When Nspect.io sends emails, Cloud Logging helps them log any modifications to email sends and record success or failure. This is important because logging is the only way Nspect.io developers can debug and understand failures. With the help of the Cloud operations suite, Nspect.io has been able to scale at pace.

Learn more

Using Velo by Wix with the Cloud operations suite helps businesses get online faster and respond to issues quickly, setting them up for success. Learn more about using the Cloud operations suite with Wix’s Velo.
Source: Google Cloud Platform

2021 resolutions: Kick off the new year with free Google Cloud training

Tackle your New Year’s resolutions with our new skills challenge, which provides no-cost training to build cloud knowledge and an opportunity to earn Google Cloud skill badges to showcase your cloud competencies. There are four initial tracks in the skills challenge: Getting Started; Data Analytics; Hybrid and Multi-cloud; and Machine Learning (ML) and Artificial Intelligence (AI).

To begin, sign up for the skills challenge you’re most interested in to receive 30 days of free access to Google Cloud labs. Each track gives you a chance to earn different skill badges, such as the Foundational Infrastructure skill badge or the Foundational Data, ML, and AI skill badge, which you can share with your network. To earn a skill badge, complete a series of hands-on labs on Google Cloud labs to learn new cloud skills, then take a final assessment challenge lab to test your skills. Read on to find out which track in the skills challenge is best for you.

Getting Started track

New to Google Cloud? Select the Getting Started track and use your 30 days of access to Google Cloud labs to demonstrate your core infrastructure skills. You’ll learn how to write Cloud Shell commands, deploy your first virtual machine, and run applications on Kubernetes. It’s a great place to start for cloud engineers, cloud architects, IT practitioners, or anyone with some foundational cloud computing knowledge.

Data Analytics track

This track is for data analysts ready to expand their skills into AI and machine learning. You will have a chance to demonstrate your understanding of BigQuery. You’ll learn how to do everything from writing and troubleshooting SQL queries and using Apps Script, to building classification and forecasting models.

Hybrid and Multi-cloud track

This track is for hybrid and multi-cloud architects ready to showcase their skills in managing containers with Google Kubernetes Engine and Anthos. You will also test your security skills when deploying and managing production environments with Google Kubernetes Engine.

ML and AI track

This track is for data scientists and machine learning engineers ready to prove their skills with Google Cloud tools like BigQuery, Cloud Speech API, AI Platform, and Cloud Vision API.

Register for our January 22 webinar for an introduction to Google Cloud, including a walk-through of the Google Cloud Console and a tour of the labs included in the Getting Started track of the skills challenge. Ready to jump into the skills challenge? Sign up here.
Source: Google Cloud Platform

Implementing leader election on Google Cloud Storage

Leader election is a commonly applied pattern for implementing distributed systems. For example, replicated relational databases such as MySQL, or distributed key-value stores such as Apache ZooKeeper, choose a leader (sometimes referred to as a master) among the replicas. All write operations go through the leader, so only a single node is writing to the system at any time. This is done to ensure no writes are lost and the database is not corrupted.

It can be challenging to choose a leader among the nodes of a distributed system due to the nature of networked systems and time synchronization. In this article, we’ll discuss why you need leader election (or more generally, “distributed locks”), explain why they are difficult to implement, and provide an example implementation that uses a strongly consistent storage system, in this case Google Cloud Storage.

Why do we need distributed locks?

Imagine a multithreaded program where each thread is interacting with a shared variable or data structure. To prevent data loss or corruption of the data structure, multiple threads should block and wait on each other while modifying the state. We ensure this with mutexes in a single-process application. Distributed locks are no different in this regard than mutexes in single-process systems.

A distributed system working on shared data still needs a locking mechanism to safely take turns while modifying shared data. However, we no longer have the notion of mutexes while working in a distributed environment. This is where distributed locks and leader elections come into the picture.

Use cases for leader election

Typically, leader election is used to ensure exclusive access by a single node to shared data, or to ensure that a single node coordinates the work in a system. For replicated database systems such as MySQL, Apache ZooKeeper, or Cassandra, we need to make sure only one “leader” exists at any given time. All writes go through this leader to ensure writes happen in one place, while reads can be served from the follower nodes.

Here’s another example. You have three nodes for an application that consumes messages from a message queue; however, only one of these nodes is to process messages at any time. By choosing a leader, you can appoint a node to fulfill that responsibility. If the leader becomes unavailable, other nodes can take over and continue the work. In this case, a leader election is needed to coordinate the work.

Many distributed systems take advantage of leader election or distributed lock patterns. However, choosing a leader is a nontrivial problem.

Why is distributed locking difficult?

Distributed systems are like threads of a single-process program, except they run on different machines and talk to each other over the network (which can be unreliable). As a result, they cannot rely on mutexes or similar locking mechanisms that use atomic CPU instructions and shared memory to implement the lock.

The distributed locking problem requires the participants to agree on who is holding the lock. We also expect a leader to be elected while some nodes in the system are unavailable. This may sound simple, but implementing such a system correctly can be quite difficult, in part due to the many edge cases. This is where distributed consensus algorithms come into the picture.

To implement distributed locking, you need a strongly consistent system to decide which node holds the lock. Because this must be an atomic operation, it requires consensus protocols such as Paxos, Raft, or the two-phase commit protocol.
However, implementing these algorithms correctly is quite difficult, as the implementations must be extensively tested and formally proven. Furthermore, the theoretical properties of these algorithms often fail to withstand real-world conditions, which has led to more advanced research on the topic.

At Google, we achieve distributed locking using a service called Chubby. Across our stack, Chubby helps many teams at Google make use of distributed consensus without having to worry about implementing a locking service from scratch (and doing so correctly).

Cheating a bit: Leveraging other storage primitives

Instead of implementing your own consensus protocol, you can easily take advantage of a strongly consistent storage system that provides the same guarantees through a single key or record. By delegating the responsibility for atomicity to an external storage system, we no longer need the participating nodes to form a quorum and vote on a new leader.

For example, a distributed database record (or file) can be used to name the current leader, and to record when the leader last renewed its leadership lock. If there’s no leader in the record, or the leader has not renewed its lock, other nodes can run for election by attempting to write their name to the record. The first one to do so wins, because the record or file allows atomic writes.

Such atomic writes on files or database records are typically implemented using optimistic concurrency control, which lets you atomically update the record by providing its version number (if the record has changed since then, the write is rejected). Similarly, the writes become immediately visible to any readers. Using these two primitives (atomic updates and consistent reads), we can implement a leader election on top of any storage system.

In fact, many Google Cloud storage products, such as Cloud Storage and Cloud Spanner, can be used to implement such a distributed lock. Similarly, open source storage systems like ZooKeeper (Paxos), etcd (Raft), and Consul (Raft), or even properly configured RDBMS systems like MySQL or PostgreSQL, can provide the needed primitives.

Example: Leader election with Cloud Storage

We can implement leader election using a single object (file) on Cloud Storage that contains the leader data, and require each node to read that file, or run for election based on it. In this setup, the leader must renew its leadership by updating this file with its heartbeat.

My colleague Seth Vargo published such a leader election implementation – written in Go and using Cloud Storage – as a package within the HashiCorp Vault project. (Vault also has leader election on top of other storage backends.) To implement leader election among the distributed nodes of our application in Go, we can write a program that makes use of this package in just 50 lines of code.

That example program creates a lock using a file in Cloud Storage, and continually runs for election. The Lock() call blocks until the calling program becomes the leader (or the context is cancelled). This call may block indefinitely, since there might be another leader in the system. If a process is elected as the leader, the library periodically sends heartbeats to keep the lock active. The leader then must finish its work and give up the lock by calling the Unlock() method.
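The original post shows that Go program inline; since it is not reproduced here, the following is a rough, unofficial Python sketch of the underlying primitive described above (optimistic concurrency control on a single Cloud Storage object using generation preconditions), rather than the Vault package itself. The bucket, object, and metadata layout are assumptions.

```python
# Rough sketch: a compare-and-set "election" on one Cloud Storage object.
# Cloud Storage rejects a write whose generation precondition no longer
# matches, so only one contender can create or take over the lock object.
import json
import time

from google.api_core import exceptions
from google.cloud import storage

LOCK_TTL_SECONDS = 30


def try_acquire(bucket: storage.Bucket, node_id: str) -> bool:
    """Attempt to become leader by writing the lock object atomically."""
    blob = bucket.blob("leader-lock")
    payload = json.dumps({"leader": node_id, "heartbeat": time.time()})
    try:
        blob.reload()
        current = json.loads(blob.download_as_bytes())
        if time.time() - current["heartbeat"] < LOCK_TTL_SECONDS:
            return current["leader"] == node_id  # A live leader already exists.
        # Lock is stale: overwrite it only if nobody else did so first.
        blob.upload_from_string(payload, if_generation_match=blob.generation)
    except exceptions.NotFound:
        # No lock object yet: create it only if it still does not exist.
        try:
            blob.upload_from_string(payload, if_generation_match=0)
        except exceptions.PreconditionFailed:
            return False
    except exceptions.PreconditionFailed:
        return False
    return True


client = storage.Client(project="my-project")
bucket = client.bucket("my-election-bucket")
while not try_acquire(bucket, node_id="node-1"):
    time.sleep(5)
print("node-1 is the leader; a real implementation keeps renewing the heartbeat")
```

A real implementation also renews the heartbeat periodically and watches for losing the lock, which is exactly what the Go library handles for you.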
In the Go example, if the leader loses the leadership, the doneCh channel receives a message, and the process can tell that it has lost the lock and that there might be a new leader.

Fortunately for us, the library implements a heartbeat mechanism to ensure the elected leader remains available and active. If the elected leader fails abruptly without giving up the lock, the remaining nodes select a new leader once the TTL (time-to-live) on the lock expires, ensuring the overall system’s availability. The library also takes care of the details around sending these periodic heartbeats, and around how frequently the followers should check whether the leader has died and whether they should run for election. Similarly, the library employs various optimizations, such as storing the leadership data in object metadata instead of object contents, which is costlier to read frequently.

If you need coordination between your nodes, using leader election in your distributed systems can help you safely ensure that at most one node holds that responsibility at any time. Using Cloud Storage or other strongly consistent systems, you can implement your own leader election. However, make sure you are aware of all the corner cases before implementing such a library yourself.

Further reading:

- Implementing leader election using the Kubernetes API
- Leader election in distributed systems – AWS Builders Library
- Leader election – Azure Design Patterns Library

Thanks to Seth Vargo for reading drafts of this article. You can follow me on Twitter.
Source: Google Cloud Platform

Using machine learning to improve road maintenance

There’s a new way to look out for potholes in the road, and it doesn’t involve better eyeglasses or dispatching costly repair crews. Bus-mounted cameras and machine learning can do it for you, as the City of Memphis discovered. Staying on top of deteriorating roads when you can’t add more personnel is a never-ending cycle of patching holes, as increasing traffic only worsens the problem.

Google Cloud partner SpringML worked with the City of Memphis to tackle this problem, assisting in repairing 63,000 potholes in one year, a massive improvement in pothole detection over previous manual efforts. Advances in analytics and machine learning are making it possible for authorities to not only fix roads faster but actually prevent damage from occurring in the first place.

[Image: Memphis Area Transit Authority bus]

Using machine learning for road maintenance

The City of Memphis struggled with a problem many cities have to face: the continuous degradation of paved roads and the formation, through usage and weather, of potholes. These gaps in the road not only frustrate drivers, they slow down traffic, delaying commutes and mass transit, and they lead to greater wear and tear on vehicles. They’re just no good. Potholes are inevitable, so the challenge for Memphis, and other cities, becomes how to keep up, putting repair resources where they can be most helpful. With limited hardware and staff, they can’t tackle every report from citizens. And those public reports don’t always present a full picture of the problem either.

Enter SpringML, who partners with public sector customers to solve problems with technology in creative ways. As the SpringML team joined with Memphis to figure this out, they first looked at what sorts of data they could get access to. And voila: bus cameras!

“Look for data you already have that can fuel your decision making, before you go out and try to acquire new data sets.” – Eric Clark, AI Practice, SpringML

The city buses in Memphis all have front-mounted cameras, gathering data the entire time the bus is running, mostly for traffic purposes. Every bus in the city was watching the roads every day! Immediately the team had a treasure trove of data: every road covered by the mass transit system has daily recordings being captured. The bus routes are well defined, and each bus has GPS to help correlate the footage with precise locations. The team set to work.

At the end of each day, they retrieved videos from each bus and uploaded them to on-prem storage—a fairly manual process.

[Image: Downloading the video data manually from drives that were on the buses]
[Image: Bus system IT rack, tracking location and camera data as it travels its route]

Then a script checked for new files in the video directory nightly, and uploaded the new videos to Google Cloud Storage to begin processing. From there, the Google Cloud Video Intelligence API could start to work, running its detection model on the new videos to look for possible pothole images. To build the initial pothole detection AI model, the SpringML team took existing images and manually picked out potholes. They also used data from higher-quality cameras to improve the detection accuracy of the model, and continued to feed new data from the bus routes to improve the model over time.
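As a rough illustration (not SpringML's actual custom model or pipeline), annotating a newly uploaded video in Cloud Storage with the Video Intelligence API looks roughly like the sketch below; the bucket path and the chosen feature are placeholder assumptions.

```python
# Rough sketch: annotate an uploaded bus video with the Video Intelligence API.
# SpringML's production system used its own custom-trained pothole model; this
# only shows the general shape of an annotation call on a Cloud Storage video.
from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bus-footage/route-42/2021-01-14.mp4",
        "features": [videointelligence.Feature.OBJECT_TRACKING],
    }
)
result = operation.result(timeout=600)

# Each detected object could then be loaded into BigQuery alongside the GPS
# data for that stretch of the route.
for annotation in result.annotation_results[0].object_annotations:
    print(annotation.entity.description, annotation.confidence)
```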
Results from the Video Intelligence model inference were sent to BigQuery, where the images, annotations, file metadata, location, and scoring were kept and could easily be sorted or queried.

[Image: Some of the data from the BigQuery model, as it outputs pothole location and severity]
[Image: Application used by public works employees to evaluate potholes]

A custom web app presented possible potholes to public works employees, who could help correct the model when it made mistakes (frequently caused by stains, shadows, or animals), or confirm a pothole and then trigger the next automated flow. Once a pothole is detected and confirmed, the team needs a work ticket to track the actual repair. So the web app submits information about confirmed potholes to the city’s 311 information system, which generates a ticket that dispatches a work crew and repair vehicle to actually repair the road.

[Image: The full process of pothole data collection and detection]

A smooth road ahead

As well as detecting and fixing potholes faster, this effort has paved the way for future projects that can improve public infrastructure, as more data gets gathered and applied to decision making. Want to learn more? Read the Video Intelligence API quickstart to try it out, listen to the interview with SpringML’s Eric Clark on the GCP Podcast, and check out more machine learning tools in our AI Platform.
Source: Google Cloud Platform

Compute Engine explained: Scheduling the OS patch management service

Last year, we introduced the OS patch management service to protect your running Compute Engine VMs against defects and vulnerabilities. The service makes patching Linux and Windows VMs with the latest OS upgrades simple, scalable, and effective. In this blog, we share a step-by-step guide on how to set up a project with a schedule that automatically patches filtered VM instances, how to resolve issues if an agent is not detected, and how to view an overview of patch compliance across your VM fleet.

Getting started

Imagine an example project with several VM instances hosting a mythical web service. You want to automatically keep the instances updated with the latest critical fixes and security updates against malicious software, and you have a production fleet and a development fleet of machines that you want to patch on different schedules.

First, enable the service by navigating to Compute Engine > OS Patch Management in the Google Cloud Console. Alternatively, you can enable the Cloud OS Config API and Container Analysis API through the Google Cloud Marketplace or with gcloud (rough command-line equivalents appear at the end of this post). Note that the OS Config agent is most likely already installed on the VM instances and just needs to be enabled via project metadata keys.

After the agent collects data across the VM fleet, this data is displayed on the patch compliance dashboard, which shows the state across all your VMs and operating systems and gives you a bird’s-eye view of your patch compliance. From there you may spot VM instances that you’d like to patch more frequently, for example the CentOS and Red Hat Enterprise Linux (RHEL) fleet.

Creating a patch deployment

Click New Patch Deployment at the top of the screen and walk through the steps to create a patch deployment for the target VMs, each with specific patch configurations and scheduling options.

In the Target VMs section, you can use VM instance name prefixes and labels to target only the VM instances whose names start with a certain prefix. More instance filtering options are available, including zones and combinations of label groups.

In the Patch config section, you can choose to patch RHEL, CentOS, and Windows with critical and security patches, or specify exact Microsoft Knowledge Base (KB) numbers and packages to install. You can also exclude specific packages from being installed in the Exclude fields.

Finally, you can schedule the patch job. For example, you can run the job every second Tuesday of the month with a maximum duration of three hours (the maintenance window), from 11 AM to 2 PM. After the patch job runs, you can see the result of the installed patches. This information is reported on the compliance dashboard and the VM instances tab.

Patch your Compute Engine VMs today

To learn more about the OS patch management service on Compute Engine, including automating patch deployment, visit our documentation page.
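For reference, the console steps above have command-line equivalents. The original post shows them as screenshots; the commands below are an approximation worth double-checking against the current documentation.

```bash
# Enable the APIs used by the OS patch management service.
gcloud services enable osconfig.googleapis.com containeranalysis.googleapis.com

# Enable the OS Config agent fleet-wide via project metadata keys.
gcloud compute project-info add-metadata \
    --metadata=enable-osconfig=TRUE,enable-guest-attributes=TRUE
```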
Source: Google Cloud Platform

Introducing Ruby on Google Cloud Functions

Cloud Functions, Google Cloud’s Function as a Service (FaaS) offering, is a lightweight compute platform for creating single-purpose, stand-alone functions that respond to events, without having to manage a server or runtime environment. Cloud Functions is a great fit for serverless application, mobile, or IoT backends, real-time data processing systems, video, image, and sentiment analysis, and even things like chatbots or virtual assistants.

Today we’re bringing support for Ruby, a popular, general-purpose programming language, to Cloud Functions. With the Functions Framework for Ruby, you can write idiomatic Ruby functions to build business-critical applications and integration layers. And with Cloud Functions for Ruby, now in Preview, you can deploy functions in a fully managed Ruby 2.6 or Ruby 2.7 environment, complete with access to resources in a private VPC network. Ruby functions scale automatically based on your load. You can write HTTP functions to respond to HTTP events, and CloudEvent functions to process events sourced from various cloud and Google Cloud services, including Pub/Sub, Cloud Storage, and Firestore.

You can develop functions using the Functions Framework for Ruby, an open source functions-as-a-service framework for writing portable Ruby functions. With the Functions Framework you develop, test, and run your functions locally, then deploy them to Cloud Functions, or to another Ruby environment.

Writing Ruby functions

The Functions Framework for Ruby supports HTTP functions and CloudEvent functions. An HTTP cloud function is very easy to write in idiomatic Ruby and covers webhook/HTTP use cases. CloudEvent functions on the Ruby runtime can also respond to industry-standard CNCF CloudEvents sourced from various Google Cloud services, such as Pub/Sub, Cloud Storage, and Firestore. Sketches of a simple HTTP function and a CloudEvent function working with Pub/Sub appear at the end of this post.

The Ruby Functions Framework fits comfortably with popular Ruby development processes and tools. In addition to writing functions, you can test functions in isolation using Ruby test frameworks such as Minitest and RSpec, without needing to spin up or mock a web server; a simple RSpec example is also sketched at the end of this post.

Try Cloud Functions for Ruby today

Cloud Functions for Ruby is ready for you to try today. Read the Quickstart guide, learn how to write your first functions, and try it out with a Google Cloud free trial. If you want to dive a little deeper into the technical aspects, you can also read our Ruby Functions Framework documentation. If you’re interested in the open-source Functions Framework for Ruby, please don’t hesitate to have a look at the project and potentially even contribute. We’re looking forward to seeing all the Ruby functions you write!
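The original post includes inline code samples that are not reproduced above. The sketches below are approximations written against the open-source Functions Framework for Ruby, with placeholder function and file names; check them against the framework's documentation before relying on them.

```ruby
# app.rb - approximate examples using the Functions Framework for Ruby.
require "functions_framework"
require "base64"

# A simple HTTP function for webhook/HTTP use cases.
FunctionsFramework.http "hello_http" do |request|
  name = request.params["name"] || "world"
  "Hello, #{name}!\n"
end

# A CloudEvent function handling a Pub/Sub-triggered event. Pub/Sub delivers
# the message payload base64-encoded inside the event data.
FunctionsFramework.cloud_event "hello_pubsub" do |event|
  payload = Base64.decode64(event.data["message"]["data"].to_s)
  FunctionsFramework.logger.info "Received Pub/Sub message: #{payload}"
end
```

And a matching RSpec test that exercises the HTTP function in isolation, without spinning up a web server:

```ruby
# spec/app_spec.rb - testing the HTTP function in isolation.
require "rspec"
require "functions_framework/testing"

describe "hello_http" do
  include FunctionsFramework::Testing

  it "greets the caller by name" do
    load_temporary "app.rb" do
      request = make_get_request "http://example.com/?name=Ruby"
      response = call_http "hello_http", request
      expect(response.status).to eq 200
      expect(response.body.join).to include "Hello, Ruby!"
    end
  end
end
```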
Source: Google Cloud Platform

Migrating data, technology and people to Google Cloud

Editor’s note: Bukalapak, an ecommerce company based in Jakarta, is one of Indonesia’s largest businesses. As their platform grew to serve over 100 million customers and 12 million merchants, they needed a solution that would reliably and securely scale to handle millions of transactions a day. Here, they discuss their migration to Google Cloud and the value added by its managed services.

Similar to many other enterprises, Bukalapak’s ecommerce platform did not originate in the cloud. It was initially built on on-premises technologies that worked quite well at the beginning. However, as our business grew—processing over 2 million transactions per day and supporting 100 million customers—it became challenging to keep up with the necessary scale and availability needs. It wasn’t uncommon to see traffic spikes following promotional events, which were frequent. Our infrastructure and overall architecture, however, just wasn’t designed to handle this scale of demand. It was clear we needed a new way to support the success of the business, one that would allow us to scale to meet fast-growing demand while providing the best experience to our customers, all without overburdening our team. This led us to implement significant architectural changes, and to consider a migration to the cloud.

Choosing Google Cloud

Given that this migration would be a large and complex endeavor, we wanted a partner in this journey, not just a vendor. We started by evaluating the product and services portfolio of potential providers, along with their ability to innovate and solve cutting-edge problems. With our very limited experience in the cloud, it was critical to have an experienced professional services team that could effectively guide and support us throughout the migration journey. We also evaluated the overall cost and the availability of data centers in Indonesia that would allow us to comply with government requirements for financial products. Finally, we needed to plan for how we would attract and retain talent, so we looked at the degree of adoption across providers in Southeast Asia, and specifically Indonesia. After careful consideration across these areas, Google Cloud was the right choice for us.

Embarking on the cloud migration

Our on-premises deployment included over 160 relational and NoSQL databases. We also maintained a Kubernetes cluster of over 1,000 nodes and over 30,000 cores, running 550 production microservices and one large monolith application. To address the large amount of technical debt our platform had accumulated, we decided against a lift-and-shift approach. Instead, we spent a good deal of time refactoring our services, particularly our monolith application (a.k.a. the mothership), and partitioning our databases. Enhancing our monitoring and alerting, deployment tooling, and testing frameworks was critical to improving the quality of our software, our development and release processes, and our performance and incident management. We also invested heavily in automation, moving away from manual testing to integration testing, API testing, and front-end testing. Adopting the tooling and best practices of DevOps, MLOps, and ChatOps increased our engineering velocity and improved the quality of our products and services.

For a team that had very limited cloud experience, it was clear early on that this was not just a technology migration.
It involved a cultural migration as well, and we wanted to ensure our team could perform the migration while gaining the skill set and experience needed to maintain and develop cloud-based applications. We started by training a smaller team, which took on the task of migrating our first services. Incrementally, we expanded the training and looped more and more engineers into the migration efforts. As more engineering teams got involved, we paired them with one of the engineers who had joined the migration early on and could act as a coach. This approach allowed us to transfer knowledge and roll out best practices, incrementally but surely, across the entire organization.

We took a multi-step approach to the migration. We started by focusing on the cloud foundation work, introducing automation and new technologies like Ansible and Terraform. We also invested heavily in establishing a strong security foundation, onboarding WAF and anti-DDoS, domain threat detection, network scanning, and image hardening tools, to name a few. From there, we started to migrate the smaller, simpler services and worked our way up to the more complex ones. That helped the team gain experience over time while managing risk appropriately. In the end, we successfully completed the migration in just 18 months, with very minimal downtime.

Managed services for greater peace of mind

Our team selected Cloud SQL early on as the fully managed service for most of our MySQL and PostgreSQL databases. We appreciated how easy Cloud SQL made it to manage and maintain our databases. With just a few simple API calls, we could quickly set up a new instance or read replica. Auto-failover and automatically increasing disk size ensured we could run reliably without a heavy operational burden. In addition to Cloud SQL, we’ve now been able to integrate across the other Google Cloud data services, including BigQuery, Data Studio, Pub/Sub, and Dataflow. These services have been instrumental in helping us process, store, and gain insights from a massive amount of data. That in turn has allowed us to better understand our customers and consistently find new opportunities to make improvements on their behalf.

Google Cloud’s managed services give us greater peace of mind. Our team spends less time on maintenance and operations; instead, we have more time and resources to focus on building products and solving problems related to our core business. Our engineering velocity has increased, and our team has access to Google’s cutting-edge technology, enabling us to solve problems more efficiently. In addition, our platform now has higher uptime and can scale with ease to keep up with unpredictable and growing demand. We were also able to improve the overall security of our platform, and we now have a standardized security model that can easily be applied to new applications. The larger impact has been on what our lean infrastructure team is now able to accomplish.

Migrating to Google Cloud gave us the strategic and competitive advantages we were looking for. Both throughout the migration and now that we’re running in production, Google Cloud has been a great partner to us. The Google Cloud team put a lot of effort into understanding what we needed to be successful and advocating for our needs, often connecting us to product teams or experts from elsewhere in the organization.
Their desire to go the extra mile on behalf of their customers made our experience positive and ultimately made our cloud migration successful.

Learn more about Bukalapak and how you can migrate to Google Cloud managed databases.
Source: Google Cloud Platform