Shazam: Why cloud GPUs finally make sense

By Ben Belchak, Head of Site Reliability Engineering, Shazam

At Shazam, we’ve been heavy users of graphics processing units (GPUs) for our recognition services since 2012, starting with the NVIDIA Tesla M2090 and working our way up to the K80 today. We’ve traditionally used bare metal servers because GPUs in the cloud were either unavailable or, when they were available, far too expensive and not performant enough for our needs. Only recently have the economics of GPUs in the cloud really made sense for our business. This is what kicked off our journey to Google Cloud Platform (GCP).

For certain tasks, GPUs are a cost-effective and high-performance alternative to traditional CPUs. They work great with Shazam’s core music recognition workload, in which we match snippets of user-recorded audio fingerprints against our catalog of over 40 million songs. We do that by taking the audio signatures of each and every song, compiling them into a custom database format and loading them into GPU memory. Whenever a user Shazams a song, our algorithm uses GPUs to search that database until it finds a match. This happens successfully over 20 million times per day.

To meet that demand, we’ve been maintaining a fleet of GPUs on dedicated bare metal servers that we lease from a managed services provider. Because of the time it takes to source and provision a new physical server, we provision enough to meet peak demand and then run that capacity 24/7, 365 days a year. We kept costs under control by improving our algorithms and by taking advantage of ever-evolving GPU architectures and the performance improvements they brought. About six months ago, though, we began experimenting with GPUs running on Compute Engine. Because we can dial new instances up and down so quickly there, we can size our GPU infrastructure for average use instead of keeping full capacity for our maximum peak load running at all times. Thus far, we’ve migrated about one-third of our infrastructure into Google Cloud.

In order to efficiently search our massive catalog of music, we maintain multiple levels of GPU server clusters that we call “tiers.” A first tier searches against a database of the most popular songs’ audio signatures, while subsequent tiers search longer samples against progressively more and more obscure music databases. In this way, Shazam identifies, say, “Thinking Out Loud” by Ed Sheeran in a single pass from a short sample, but might need several passes and a much longer sample to identify a 1950s recording of a Lithuanian polka group (being able to match really obscure music in addition to popular music is what makes using Shazam such a magical experience for our users).
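The tiered lookup described above can be sketched in a few lines of Python. This is only an illustration of the cascade idea, not Shazam's actual implementation: the `Tier` class, the `identify` function, the fingerprint strings and the second song title are all hypothetical, and a plain dictionary stands in for a database held in GPU memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Tier:
    """One search tier. The dict is a toy stand-in for a GPU-resident database."""
    name: str
    index: Dict[str, str]  # hypothetical fingerprint -> song title

    def search(self, fingerprint: str) -> Optional[str]:
        return self.index.get(fingerprint)


def identify(fingerprint: str, tiers: List[Tier]) -> Optional[Tuple[str, str]]:
    """Try each tier in order and return (tier name, title) for the first match."""
    for tier in tiers:
        title = tier.search(fingerprint)
        if title is not None:
            return tier.name, title
    return None  # no tier recognized the sample


# Tiers are ordered from most popular to most obscure music.
tiers = [
    Tier("popular", {"fp-001": "Thinking Out Loud"}),
    Tier("long-tail", {"fp-900": "Obscure 1950s polka"}),
]

print(identify("fp-001", tiers))  # matched by the first tier
print(identify("fp-900", tiers))  # falls through to the second tier
print(identify("fp-404", tiers))  # None: not in any tier
```

Ordering the tiers by popularity means most requests are answered by the first, smallest database, which is why keeping that first index current matters so much.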

Increasing the hit rate on the first line of servers depends on keeping the index files up to date with the latest popular music. That’s hard to do given how quickly music falls in and out of favor. Some surges in traffic we can plan and pre-scale for, such as the Super Bowl, the Grammys, or even our first branded game show, “BEAT SHAZAM.” Other surges we cannot predict: say, a local radio station in a large market reviving an old R&B hit, or a track that was never popular suddenly being featured in a TV advertisement. And that’s not counting new music, which we add to our catalog every day through submissions from labels as well as from in-house music experts who are constantly searching for new music.

Of course, running on bare metal servers, we also need to provision extra capacity for the inevitable failure scenarios we all experience when operating services at scale. One of the amazing benefits of running on Google Cloud is that we can now replace a failed node in just minutes with a brand new one “off the shelf” instead of keeping a pool of nodes around just waiting for failures. With our managed services provider, we had to provision GPUs in groups of four cards per machine, with two dies per card. That meant that we could lose up to eight shards of our database when a node failed. Now, on Google Cloud, we provision one VM per shard, which localizes the impact of a node failure to a single shard instead of eight.
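The arithmetic behind that comparison is simple enough to write down. The sketch below assumes one database shard per GPU die, which is what the figures above imply; the function name and the total of 64 shards are hypothetical and used only to turn the counts into a fraction.

```python
def shards_lost_per_node_failure(cards_per_node: int, dies_per_card: int) -> int:
    """Shards affected when one node fails, assuming one shard per GPU die."""
    return cards_per_node * dies_per_card


bare_metal_loss = shards_lost_per_node_failure(cards_per_node=4, dies_per_card=2)
cloud_vm_loss = 1  # one VM per shard, so one failed VM affects one shard

TOTAL_SHARDS = 64  # hypothetical database size, for illustration only

print(f"bare metal: {bare_metal_loss} shards, {bare_metal_loss / TOTAL_SHARDS:.1%} of the database")
print(f"cloud VM:   {cloud_vm_loss} shard, {cloud_vm_loss / TOTAL_SHARDS:.1%} of the database")
```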

An unexpected benefit of using Google Cloud GPUs is that we can increase how often we recalculate and update our audio signature database, which is quite computationally intensive. On dedicated infrastructure, we update the index of popular songs daily. On Google Cloud, we can recompile the index and reimage the GPU instance in well under an hour, so the index files are always up to date.

This flexibility allows us to begin considering dynamic cluster configurations. For instance, because of the way our algorithm works, it’s much easier for us to identify songs that were Shazamed in a car, which is a relatively quiet environment, than it is to identify songs Shazamed in a restaurant, where talking and clanging obscure the song’s audio signature. Cloud-based GPUs give us many more options for configuring our tiers to match the specific demands that our users throw at us. For example, we may be able to reconfigure our clusters according to time of day, such as morning drive time versus happy hour at the bar.

It’s exciting to think about the possibilities that using GPUs in Google Cloud opens up, and we look forward to working with Google Cloud as it adds new GPU offerings to its lineup.

You can find out more details about our move to Google Cloud Platform here: https://blog.shazam.com/moving-gpus-to-google-cloud-36edb4983ce5
Source: Google Cloud Platform

Google Cloud Natural Language API launches new features and Cloud Spanner graduating to GA

By Apoorv Saxena and Dominic Preuss, Product Managers

Today at Google Cloud Next London we’re excited to announce product news that will help customers innovate and transform their businesses faster via the cloud: first, that Google Cloud Natural Language API is adding support for new languages and entity sentiment analysis, and second, that Google Cloud Spanner is graduating to general availability (GA).

Cloud Natural Language API beta

Since we launched Cloud Natural Language API, a fully managed service for extracting meaning from text via machine learning, we’ve seen customers such as Evernote and Ocado enhance their businesses in fascinating ways. For example, they use Cloud Natural Language API to analyze customer feedback and sentiment, extract key entities and metadata from unstructured text such as emails or web articles, and enable novel features (such as deriving action items from meeting notes).

These use cases, among many others, highlighted the need to expand language support and add improvements in the quality of our base NLU technology. We’ve incorporated this feedback into the product and are pleased to announce the following new capabilities under beta:

Expanded language support for entity, document sentiment and syntax analysis for the following languages: Chinese (Simplified and Traditional), French, German, Italian, Korean and Portuguese. This is in addition to existing support for English, Spanish and Japanese.
Understand sentiment for specific entities and not just the whole document or sentence: We’re introducing a new method that identifies entities in a block of text and also determines sentiment for those entities. Entity sentiment analysis is currently available only for English. For more information, see Analyzing Entity Sentiment.
Improved quality for sentiment and entity analysis: As part of the continuous effort to improve quality of our base models, we’re also launching improved models for sentiment and entity analysis as part of this release.

Early access users of this new functionality such as Wootric are already using the expanded language support and new entity sentiment analysis feature to better understand customer sentiment around brands and products. For example, for customer feedback such as “the phone is expensive but has great battery life,” users can now parse that the sentiment for phone is negative while the sentiment for battery life is positive.
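As a rough sketch of how that example might look in code, the snippet below builds a JSON request body of the kind the REST method `documents:analyzeEntitySentiment` accepts and then labels the entities in a response. It makes no network call. The `build_request` and `label_entities` helpers, the 0.25 threshold and the scores in `sample_response` are assumptions made up for illustration; they are not real API output.

```python
from typing import Dict


def build_request(text: str) -> Dict:
    """Build the JSON body for an entity sentiment request on plain text."""
    return {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF8",
    }


def label_entities(response: Dict, threshold: float = 0.25) -> Dict[str, str]:
    """Map each entity name to positive, negative or neutral.

    Sentiment scores range from -1.0 to 1.0. The threshold is a hypothetical
    cutoff chosen for this sketch, not something the API prescribes.
    """
    labels = {}
    for entity in response.get("entities", []):
        score = entity.get("sentiment", {}).get("score", 0.0)
        if score >= threshold:
            labels[entity["name"]] = "positive"
        elif score <= -threshold:
            labels[entity["name"]] = "negative"
        else:
            labels[entity["name"]] = "neutral"
    return labels


request_body = build_request("the phone is expensive but has great battery life")

# Hand-written response with made-up scores, shaped like the API's JSON output.
sample_response = {
    "entities": [
        {"name": "phone", "sentiment": {"score": -0.6, "magnitude": 0.6}},
        {"name": "battery life", "sentiment": {"score": 0.8, "magnitude": 0.8}},
    ],
    "language": "en",
}

print(label_entities(sample_response))
# {'phone': 'negative', 'battery life': 'positive'}
```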

As the API becomes more widely adopted, we’re looking forward to seeing more interesting and useful applications of it.

Cloud Spanner enters GA
Announced in March at Google Cloud Next ’17, Cloud Spanner is the world’s first fully managed, horizontally scalable relational database service for mission-critical online transaction processing (OLTP) applications. Cloud Spanner is specifically designed to meet customer requirements in this area for strong consistency, high availability and global scale, a combination of qualities that makes it unique as a service.

During the beta period, we were thrilled to see customers unlock new use cases in the cloud with Cloud Spanner, including:

Powering mission-critical applications like customer authentication and provisioning for multi-national businesses
Building consistent systems for business transactions and inventory management in the financial services and retail industries
Supporting incredibly high-volume systems that need low latency and high throughput in the advertising and media industries

As with all our other services, GCP handles all the performance, scalability and availability needs automatically in a pay-as-you-go way.

On May 16, Cloud Spanner will reach a further milestone by becoming generally available for the first time. Currently we’re offering regional instances, with multi-regional instances coming later this year. We’ve been Spanner users ourselves for more than five years to support a variety of mission-critical global apps, and we can’t wait to see what new workloads you bring to the cloud, and which new ones you build next!
Source: Google Cloud Platform