Friday Five — March 19, 2021

The Friday Five is a weekly Red Hat® blog post with 5 of the week’s top news items and ideas from or about Red Hat and the technology industry. Consider it your weekly digest of things that caught our eye.

Source: CloudForms

Davie Street Enterprises embraces GitOps with GitLab

Andres Martinez, Principal Developer, has been having a rough time lately. While the root cause analysis following the now-infamous two-day outage acquitted his team’s code of wrongdoing, Martinez has spent enough time around the codebase that he knows a major event caused by his team’s code is not a matter of “If,” but “When.” 
Source: CloudForms

Analyzing Python package downloads in BigQuery

The Google Cloud Public Datasets program recently published the Python Package Index (PyPI) dataset into the marketplace. PyPI is the standard repository for Python packages. If you’ve written code in Python before, you’ve probably downloaded packages from PyPI using pip or pipenv. This dataset provides statistics for all package downloads, along with metadata for each distribution. You can learn more about the underlying data and table schemas here. Below, I’ll walk through a few examples of how you can leverage this data.

As a Python enthusiast who has helped build out various Looker packages, I was particularly interested in jumping into this dataset to learn more about how the libraries are being used. First, I looked at the number of installations each day for the past 12 months, for packages whose name contains "looker".

In the Looker platform, I can visualize this query and clearly see that looker-sdk, our official API wrapper, is the leader in terms of downloads. Additionally, I noticed a big jump at the end of August and the beginning of December, which may be the result of different marketing efforts.

One other potential application of this data is to bring in competitor packages and visualize market share over time. As a software organization, trends in downloads can help us measure the effectiveness of different developer marketing strategies and make decisions on new programs.

Next, I focused on just the looker-sdk package and looked at the number of installations by Python version. Knowing that most of our users are on Python 3.6 means that we might want to prioritize features that are compatible with that version. This knowledge can also be helpful when messaging users about updating their environments for the best experience. I can easily save the results of the query to Google Sheets and share them with our marketing team.
Alternatively, in a tool like Looker, I can schedule the report to be emailed on a monthly basis so our team stays up to date on user trends. As a next step, I might join this data onto the GitHub dataset, also available in the marketplace, to see if there is a relationship between git activity and package installations.

Interested in learning more about BigQuery? To get started with querying this dataset, or the many other public datasets hosted on BigQuery, check out the free BigQuery sandbox and our quick start guides. You can also follow me on Twitter @leighajarett or connect with me on LinkedIn at linkedin.com/in/leighajarett to stay informed on BigQuery news.
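As a concrete starting point, the daily-downloads query described earlier might look roughly like this from Node.js. This is a sketch: the table and column names follow the public PyPI dataset (bigquery-public-data.pypi.file_downloads) but should be verified against the current schema, and actually running the query requires the @google-cloud/bigquery client and credentials.

```javascript
// Daily download counts over the past 12 months for PyPI packages whose
// names contain "looker". Table and column names are assumptions based on
// the public dataset schema; verify before relying on them.
const query = `
  SELECT
    DATE(timestamp) AS day,
    file.project AS package,
    COUNT(*) AS downloads
  FROM \`bigquery-public-data.pypi.file_downloads\`
  WHERE file.project LIKE '%looker%'
    AND timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 365 DAY)
  GROUP BY day, package
  ORDER BY day`;

// Running the query needs the @google-cloud/bigquery client and credentials;
// this helper is defined here but not invoked.
async function dailyLookerDownloads() {
  const {BigQuery} = require('@google-cloud/bigquery');
  const [rows] = await new BigQuery().query({query});
  return rows; // array of {day, package, downloads} row objects
}
```

The resulting rows can be charted directly, or saved out to Sheets as described above.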
Source: Google Cloud Platform

A2 VMs now GA—the largest GPU cloud instances with NVIDIA A100 GPUs

Today, we are excited to announce the general availability of A2 VMs based on the NVIDIA Ampere A100 Tensor Core GPUs in Compute Engine, enabling customers around the world to run their NVIDIA CUDA-enabled machine learning (ML) and high performance computing (HPC) scale-out and scale-up workloads more efficiently and at a lower cost. Our A2 VMs stand apart by providing 16 NVIDIA A100 GPUs in a single VM—the largest single-node GPU instance from any major cloud provider on the market today. The A2 VM also lets you choose smaller GPU configurations (1, 2, 4 and 8 GPUs per VM), providing the flexibility and choice you need to scale your workloads.

A2 VM shapes on Compute Engine

The new A2-MegaGPU VM: 16 A100 GPUs with up to 9.6 TB/s of NVIDIA NVLink bandwidth

At-scale performance

A single A2 VM supports up to 16 NVIDIA A100 GPUs, making it easy for researchers, data scientists, and developers to achieve dramatically better performance for their scalable CUDA compute workloads such as machine learning (ML) training, inference, and HPC. The A2 VM family on Google Cloud Platform is designed to meet today’s most demanding HPC applications, such as CFD simulations with Altair ultraFluidX. For customers seeking ultra-large GPU clusters, Google Cloud supports clusters of thousands of GPUs for distributed ML training, along with optimized NCCL libraries for scale-out performance. The single VM shape with 16 A100 GPUs tied together with NVIDIA’s NVLink fabric is unique to Google Cloud and is not offered by any other cloud provider. Thus, if you need to scale up large and demanding workloads, you can start with one A100 GPU and go all the way up to 16 GPUs without having to configure multiple VMs for single-node ML training. A2 VMs are also available in smaller configurations, offering the flexibility to match differing application needs, along with up to 3 TB of Local SSD for faster data feeds into the GPUs.
As a result, running the A100 on Google Cloud delivers more than a 10x performance improvement on BERT Large pre-training compared to the previous-generation NVIDIA V100, all while achieving linear scaling going from 8-GPU to 16-GPU shapes. In addition, developers can leverage containerized, pre-configured software available from NVIDIA’s NGC repository to get up and running quickly on Compute Engine A100 instances.

What customers are saying

We first made A2 VMs with A100 GPUs available to early access customers in July, and since then have worked with a number of organizations pushing the limits of machine learning, rendering, and HPC. Here’s what they had to say:

Dessa, an artificial intelligence (AI) research firm recently acquired by Square, was an early user of the A2 VMs. Through Dessa’s experimentation and innovation, Cash App and Square are furthering efforts to create more personalized services and smart tools that allow the general population to make better financial decisions through AI.

“Google Cloud gave us critical control over our processes,” said Kyle De Freitas, a senior software engineer at Dessa. “We recognized that Compute Engine A2 VMs, powered by the NVIDIA A100 Tensor Core GPUs, could dramatically reduce processing times and allow us to experiment much faster. Running NVIDIA A100 GPUs on Google Cloud’s AI Platform gives us the foundation we need to continue innovating and turning ideas into impactful realities for our customers.”

HyperConnect is a global video technology company in video communication (WebRTC) and AI. With a mission of connecting people around the world to create social and cultural value, Hyperconnect creates services based on various video and artificial intelligence technologies that connect the world.

“A2 instances with the new NVIDIA A100 GPUs on Google Cloud provided a whole new level of experience for training deep learning models, with a simple and seamless transition from the previous-generation V100 GPU.
Not only did it accelerate the computation speed of the training procedure by more than 2x compared to the V100, but it also enabled us to scale up our large-scale neural network workloads on Google Cloud seamlessly with the A2 megagpu VM shape. These breakthroughs will help us build better models for enhancing the user experience on Hyperconnect’s services.” – Beomsoo Kim, ML Researcher, Hyperconnect

DeepMind (an Alphabet subsidiary) is a team of scientists, engineers, machine learning experts and more, working together to advance the state of the art in AI.

“At DeepMind, our mission is to solve intelligence, and our researchers are working on advances to a variety of artificial intelligence challenges with help from hardware accelerators that power many of our experiments. By partnering with Google Cloud, we are able to access the latest generation of NVIDIA GPUs, and the a2-megagpu-16g machine type helps us train our GPU experiments faster than ever before. We’re excited to continue working with Google Cloud to develop and build future ML and AI infrastructure.” – Koray Kavukcuoglu, VP of Research, DeepMind

AI2 is a non-profit research institute founded with the mission of conducting high-impact AI research and engineering in service of the common good.

“Our primary mission is to push the boundaries of what computers can do, which poses two big challenges: modern AI algorithms require massive computing power, and hardware and software in the field change quickly; you have to keep up all the time. The A100 on GCP runs 4x faster than our existing systems, and does not involve major code changes. It’s pretty much plug and play.
At the end of the day, the A100 on Google Cloud gives us the ability to do drastically more calculations per dollar, which means we can do more experiments and make use of more data.” – Dirk Groeneveld, Senior Engineer, Allen Institute for Artificial Intelligence

OTOY is a cloud graphics company, pioneering technology that is redefining content creation and delivery for media and entertainment organizations around the world.

“For nearly a decade we have been pushing the boundary of GPU rendering and cloud computing to get to the point where there are no longer constraints on artistic creativity. With Google Cloud’s NVIDIA A100 instances featuring massive VRAM and the highest OctaneBench ever recorded, we have reached a first for GPU rendering, where artists no longer have to worry about scene complexity when realizing their creative visions. OctaneRender GPU-accelerated rendering democratized visual effects, enabling anyone with an NVIDIA GPU to create high-end visual effects on par with a Hollywood studio. Google Cloud’s NVIDIA A100 instances are a major step in further democratizing advanced visual effects, giving any OctaneRender and RNDR user on-demand access to state-of-the-art NVIDIA GPUs previously only available in the biggest Hollywood studios.” – Jules Urbach, Founder and CEO, OTOY

GPU pricing and availability

NVIDIA A100 GPU instances are now available in the following regions: us-central1, asia-southeast1, and europe-west4, with additional regions slated to come online throughout 2021. A2 Compute Engine VMs are available via on-demand, preemptible, and committed use discounts, and are also fully supported on Google Kubernetes Engine (GKE), Cloud AI Platform, and other Google Cloud services. A100 GPUs are available for as little as $0.87 per hour per GPU on our preemptible A2 VMs. You can find full pricing details here.
Getting started

You can get up and running quickly, start training ML models, and serve inference workloads on NVIDIA A100 GPUs with our Deep Learning VM images in any of our available regions. These images include all the software you’ll need: drivers, NVIDIA CUDA-X AI libraries, and popular AI frameworks like TensorFlow and PyTorch. Our pre-built and optimized TensorFlow Enterprise images also support A100 optimizations for current and older versions of TensorFlow (1.15, 2.1, and 2.3). We handle all software updates, compatibility, and performance optimizations, so you don’t have to think about it. Check out our GPU page to learn more about the wide selection of GPUs available on Google Cloud.
Source: Google Cloud Platform

Building real-time market data front-ends with websockets and Google Cloud

For those in the financial industry, the core applications for electronic trading of financial instruments have stringent tolerances around latency, throughput, and jitter. But there are also peripheral use cases that would benefit from real-time market data streams and that are more tolerant of performance variability, such as data visualization and monitoring applications. Until recently, the high onboarding, licensing, and overall maintenance costs of real-time market data made it difficult for many firms to adopt. However, cloud-based delivery has now made real-time market data accessible to a wider array of applications.

As firms reimagine their technology strategy through a cloud-first lens, they have an opportunity to apply real-time data (and its attendant benefits) to these peripheral use cases and accelerate their time to insight. In this blog post, we’ll share a cloud-first architectural pattern that provides developers a low-friction means of accessing real-time market data streams, with a focus on delivery to application front-ends. This pattern is based on an analytical market data app that we built at Google Cloud last year for our Next OnAir ‘20 conference.

Simplifying access to real-time market data

The traditional method of accessing real-time market data requires firms to co-locate in data centers, purchase and maintain physical hardware, and manage connectivity between the providers and their own data centers. Larger capital markets participants can have entire global teams responsible for these activities. By streaming market data in the cloud, the barrier to entry for consumers becomes lower.
Application types such as retail screen trading, mark-to-market exposure monitoring, index publishing, now-casting, fan charting, and model-driven prediction stand to benefit from this expanded access to real-time data sources—without the infrastructure and maintenance costs that, as a practical matter, have limited such access to institutions with the requisite budget and organizational structure.

One relatively young asset class, however, has been cloud-first for just about its entire existence. Cryptocurrency trading applications often have real-time market data streamed to trader front-ends via websockets. We applied this model of real-time dissemination to the markets for commodity and financial futures when we built the Next OnAir ‘20 app. Here’s more detail on that implementation.

Examining the architecture for real-time visualization

The real-time data source we used was Smart Stream, a service available on Google Cloud from CME Group. The data originates at the CME Globex trading platform as a User Datagram Protocol (UDP) stream running over multicast. Instrument pricing data is forwarded over an interconnect to different Pub/Sub topics, each corresponding to a single product, like silver or butter futures.

As soon as a message is published to its corresponding Pub/Sub topic, it is available globally to subscribers. The delivery latency of the message to the subscriber depends on the client’s proximity to the publishing region. Typical average packet latencies on consumer broadband connections, for example, are on the order of 10s to 100s of milliseconds, making it a good fit for live web front-end visualizations, given that the threshold of human perception hovers at around 80ms.

Figure 1: Multicast adaptation to Pub/Sub

Pub/Sub is a great core distribution mechanism for applications running globally in Google Cloud. However, there are some scenarios where applications sitting outside Google Cloud may need access to a Pub/Sub topic’s messages.
One example would be a public website that displays topic messages to anonymous consumers. To help address this and similar needs, we’ve open-sourced a package that we’ve nicknamed Autosocket to distribute Pub/Sub messages over standard websocket connections. Autosocket serves as an adapter that receives Pub/Sub messages published to a single topic and delivers them to front-end clients that connect to the cluster over the websockets protocol, on a load-balanced IP address. It is a containerized application that uses Cloud Run and is configured with the Pub/Sub topic name to be mirrored. The deployed cluster is stateless, diskless, and elastic, and features global load balancing. Additionally, upon connecting to the cluster, websocket clients are streamed a cache of the last 10 messages that were published to the topic, which can facilitate a better user experience during periods of low message traffic.

Figure 2: Websocket endpoint client connectivity

Getting started with implementation

There are two main steps to implementing a similar architecture in your own environment:

1. Deploy a Cloud Run instance that provides the bridge between a Pub/Sub topic and the websocket data that is sent to front-end web applications.
2. Code the front-end application to manage the websocket connections and the continual refresh of visualizations based on the inbound data stream.

Connection management

Autosocket is an open source package that automatically mirrors a Pub/Sub topic and handles connections between a Cloud Run instance and front-end web clients. We maintain a separate endpoint for each trading symbol that the application visualizes. When a user clicks the tab for a symbol, the current symbol’s endpoint connection is disconnected and another connection is established for the newly selected symbol. Connection management needs to be simple, since build-up and tear-down happen frequently.
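In outline, the manager keeps at most one live socket, closing the previous symbol’s connection before opening the next. Here is a minimal sketch of that pattern; the endpoint URL scheme, the injectable socket factory, and the message shape are illustrative assumptions, not the project’s actual code:

```javascript
// Minimal connection manager: one live websocket at a time, swapped per symbol.
class SocketManager {
  constructor(baseUrl, onMessage, socketFactory) {
    this.baseUrl = baseUrl;
    this.onMessage = onMessage;                           // called with each parsed message
    this.socketFactory = socketFactory || (url => new WebSocket(url));
    this.socket = null;
    this.symbol = null;
  }

  // Tear down the current connection (if any), then connect to the new symbol.
  select(symbol) {
    if (this.socket) this.socket.close();
    this.symbol = symbol;
    this.socket = this.socketFactory(`${this.baseUrl}/${symbol}`);
    this.socket.onmessage = evt => this.onMessage(JSON.parse(evt.data));
  }

  close() {
    if (this.socket) this.socket.close();
    this.socket = null;
    this.symbol = null;
  }
}
```

Injecting the socket factory keeps the class testable outside a browser; in production the default `url => new WebSocket(url)` path would be used.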
The code in Figure 3 illustrates one straightforward way to manage the websocket connections.

Figure 3: Websocket connection management in JavaScript

Visualization at scale

One of the visualizations in the application is a live forward curve chart, which reads the top-of-book data stream from the exchange. The prices plotted on most forward curve charts represent single settlement or last-sale prices. Since we have the live prices of a two-sided market available via Smart Stream, we plotted those instead. This gives visibility into the relative bid-ask spreads across a given futures contract’s term structure. Figure 4 is a snapshot of the forward curve visualization from the application:

Figure 4: Forward curve visualization

A summary of the coding approach to visualizing the chart is listed below, followed by a sample of the JavaScript implementation in Figure 5. The visualizations were rendered using Google Charts:

1. Connect to a product code’s websocket endpoint using JavaScript’s WebSocket API.
2. Key a JavaScript map by the expiry months of the futures contracts, and maintain the most recent price as the corresponding value of each entry.
3. Update this map in the processMessage() routine that is called by SocketManager with each new message. This could add a new expiration month to the map, or update the latest price for a month already in the map.
4. Sort the map’s keys, obtained via Object.keys(), to represent the chronological range of expiration months.
5. Update the visualization with new data in your drawChart() routine, and use drawChart() as the callback for setInterval(x, i) to render the chart with current values from the map on a continual basis.

Figure 5: Refreshing a chart with live price data in JavaScript

Live market data feeds can be combined with data from order management systems or exchange drop copies to display a trader’s real-time mark-to-market exposure.
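The map-and-refresh steps outlined above can be sketched as follows; the message fields (`expiry`, `price`) are assumed names, and the chart wiring is reduced to a comment:

```javascript
// Latest price per expiry month, fed by inbound top-of-book messages.
// Expiry keys are assumed to sort chronologically as strings (e.g. "2021-03").
const latestByExpiry = {};

// Called for each new message: adds a new expiration month to the map,
// or updates the latest price for a month already in it.
function processMessage(msg) {
  latestByExpiry[msg.expiry] = msg.price;
}

// Rows for the chart, in chronological expiry order.
function chartRows() {
  return Object.keys(latestByExpiry)
    .sort()
    .map(expiry => [expiry, latestByExpiry[expiry]]);
}

// drawChart() would hand chartRows() to the charting library; refreshing on
// an interval keeps the curve current, e.g. setInterval(drawChart, 1000);
```

Because the map only ever holds the most recent price per month, redraw cost stays constant regardless of message rate.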
The unrealized profit-and-loss (P&L) visualization uses two websocket connections in parallel: one for the trader’s entered position and the other for the symbol’s real-time price. Each originates from a separate Pub/Sub topic. The P&L amount is derived at runtime from the difference between the trader’s position entry price and the live market price, with an arrow preceding the expiration month indicating the direction (long or short) of the trade.

Figure 6: Real-time trader unrealized profit and loss visualization

In Figure 6, the leftmost box highlights the data that arrived from the order stream, which contains the trade direction, instrument, entry price, and trader name. The rightmost box highlights the live market price of the instrument, which arrives via the price stream, allowing the P&L column to be reevaluated continually as the price changes.

How it looks in practice

The Market data in the cloud site implements this architectural pattern concretely using real-time data from CME Group. Behind the scenes, we can quickly spin up and interconnect independent pillars of the app using Google Cloud Build. And of course Terraform and Google Cloud team up to enable declarative, repeatable deployments that guard against configuration drift.

Learn more about Google Cloud for financial services.
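The runtime P&L derivation described above reduces to a small pure function. The sketch below assumes illustrative position fields (direction, entry price, quantity, and contract multiplier), not the app’s actual message schema:

```javascript
// Unrealized P&L for one position, given the live market price.
function unrealizedPnl(position, livePrice) {
  const diff = livePrice - position.entryPrice;
  // A long position profits when the price rises; a short when it falls.
  const signed = position.direction === 'long' ? diff : -diff;
  return signed * position.quantity * position.multiplier;
}
```

For example, a trader long 2 contracts from an entry price of 25, marked at a live price of 26 with a 5,000-unit contract multiplier, shows an unrealized gain of 10,000. Re-running this function on every price message is what lets the P&L column update continually.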
Source: Google Cloud Platform