A Cloud developer advocate’s top infrastructure sessions at Next OnAir

It’s week 3 of Google Cloud Next ’20: OnAir, and this week is all about infrastructure and operations. This is an exciting space where we have both mature services and rapid improvements. We have a bunch of great talks this week, and I hope you enjoy them and learn a lot!

After checking out the talks below, if you have questions, I’ll be hosting a developer- and operator-focused recap and Q&A session as part of our weekly Talks by DevRel series this Friday at 9 AM PST. Our APAC team will also host a recap Friday at 11 AM SGT. Hope to see you then!

Here are a few talks that I think are particularly useful:

Google Compute Engine: Portfolio Overview and What’s New—GCE Senior PMs Aaron Blasius and Director Krish Sivakumar give you a rundown of announcements and updates for virtual machines and Compute Engine.
Where to Store Your Stuff: A Storage Overview—Director of Product Management Dave Nettleton describes each of the main storage options, explains why you would choose one over another, and covers what’s new and what’s coming.
Achieving Resiliency on Google Cloud—Ben Treynor Sloss, the founder of Google’s Site Reliability Engineering team, talks about both Google’s approach to building and running reliable services and strategies for you to build and evolve applications without compromising on reliability.

Additionally, this week’s Cloud Study Jam gives you the chance to get hands-on cloud experience through our workshops on infrastructure. Google Cloud experts will guide you through labs on cloud monitoring, orchestrating in the cloud with Kubernetes, and more.

Be sure to take a look at the whole session catalog for this week—these sessions go deep in a wide variety of areas, including specific workloads you might have, optimization, logging and monitoring, multi-cloud/hybrid, and hopefully anything else you’re thinking about.
Source: Google Cloud Platform

Google breaks AI performance records in MLPerf with world's fastest training supercomputer

Fast training of machine learning (ML) models is critical for research and engineering teams that deliver new products, services, and research breakthroughs that were previously out of reach. Here at Google, recent ML-enabled advances have included more helpful search results and a single ML model that can translate 100 different languages.

The latest results from the industry-standard MLPerf benchmark competition demonstrate that Google has built the world’s fastest ML training supercomputer. Using this supercomputer, as well as our latest Tensor Processing Unit (TPU) chip, Google set performance records in six out of eight MLPerf benchmarks.

Figure 1: Speedup of Google’s best MLPerf Training v0.7 Research submission over the fastest non-Google submission in any availability category. Comparisons are normalized by overall training time regardless of system size, which ranges from 8 to 4096 chips. Taller bars are better.1

We achieved these results with ML model implementations in TensorFlow, JAX, and Lingvo. Four of the eight models were trained from scratch in under 30 seconds. To put that in perspective, consider that in 2015 it took more than three weeks to train one of these models on the most advanced hardware accelerator available. Just five years later, Google’s latest TPU supercomputer can train the same model almost five orders of magnitude faster.

In this blog post we’ll look at some of the details of the competition, how our submissions achieve such high performance, and what it all means for your model training speed.

MLPerf models at-a-glance

MLPerf models are chosen to be representative of cutting-edge machine learning workloads that are common throughout industry and academia. Here’s a little more detail on each MLPerf model in the figure above:

DLRM represents ranking and recommendation models that are core to online businesses, from media to travel to e-commerce.
Transformer is the foundation of a wave of recent advances in natural language processing, including BERT.
BERT enabled Google Search’s “biggest leap forward in the past five years.”
ResNet-50 is a widely used model for image classification.
SSD is an object detection model that’s lightweight enough to run on mobile devices.
Mask R-CNN is a widely used image segmentation model that can be used in autonomous navigation, medical imaging, and other domains (you can experiment with it in Colab).

In addition to the industry-leading results at maximum scale above, Google also provided MLPerf submissions using TensorFlow on Google Cloud Platform that are ready for enterprises to use today. You can read more about those submissions in this accompanying blog post.

The world’s fastest ML training supercomputer

The supercomputer Google used for this MLPerf Training round is four times larger than the Cloud TPU v3 Pod that set three records in the previous competition. The system includes 4096 TPU v3 chips and hundreds of CPU host machines, all connected via an ultra-fast, ultra-large-scale custom interconnect. In total, this system delivers over 430 PFLOPs of peak performance.

Table 1: All of these MLPerf submissions trained from scratch in 33 seconds or faster on Google’s new ML supercomputer.2

Training at scale with TensorFlow, JAX, Lingvo, and XLA

Training complex ML models using thousands of TPU chips required a combination of algorithmic techniques and optimizations in TensorFlow, JAX, Lingvo, and XLA. To provide some background: XLA is the underlying compiler technology that powers all of Google’s MLPerf submissions, TensorFlow is Google’s end-to-end open-source machine learning framework, Lingvo is a high-level framework for sequence models built using TensorFlow, and JAX is a new research-focused framework based on composable function transformations. The record-setting scale above relied on model parallelism, scaled batch normalization, efficient computational graph launches, and tree-based weight initialization. All of the TensorFlow, JAX, and Lingvo submissions in the table above (implementations of ResNet-50, BERT, SSD, and Transformer) trained on 2048 or 4096 TPU chips in under 33 seconds each.

TPU v4: Google’s fourth-generation Tensor Processing Unit chip

Google’s fourth-generation TPU ASIC offers more than double the matrix multiplication TFLOPs of TPU v3, a significant boost in memory bandwidth, and advances in interconnect technology. Google’s TPU v4 MLPerf submissions take advantage of these new hardware features with complementary compiler and modeling advances. The results demonstrate an average improvement of 2.7 times over TPU v3 performance at a similar scale in the last MLPerf Training competition. Stay tuned, more information on TPU v4 is coming soon.

Figure 2: TPU v4 results in Google’s MLPerf Training v0.7 Research submission show an average improvement of 2.7 times over comparable TPU v3 results from Google’s MLPerf Training v0.6 Available submission at the identical scale of 64 chips. Improvements are due to hardware innovations in TPU v4 as well as software improvements.3

Rapid, ongoing progress

Google’s MLPerf Training v0.7 submissions demonstrate our commitment to advancing machine learning research and engineering at scale and delivering those advances to users through open-source software, Google’s products, and Google Cloud.

You can use Google’s second-generation and third-generation TPU supercomputers in Google Cloud today. Please visit the Cloud TPU homepage and documentation to learn more. Cloud TPUs support TensorFlow and PyTorch, and a JAX Cloud TPU Preview is also available.
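To make that last point a little more concrete, here is a minimal sketch of what a single Cloud TPU training run can look like with TensorFlow’s TPUStrategy (TensorFlow 2.3 or later). It is not taken from Google’s MLPerf submissions; the TPU name, the ResNet-50 model, and the dataset are placeholders you would swap for your own.

```python
import tensorflow as tf

# "my-tpu" is a placeholder Cloud TPU name; replace it with your own TPU node.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy replicates the model across all TPU cores and aggregates
# gradients for you during training.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # ResNet-50 stands in for whichever model you actually train.
    model = tf.keras.applications.ResNet50(weights=None, classes=1000)
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )

# train_ds is assumed to be a tf.data.Dataset of (image, label) batches,
# for example ImageNet read from Cloud Storage.
# model.fit(train_ds, epochs=90)
```

The MLPerf-scale runs described above additionally lean on XLA compiler optimizations, model parallelism, and the Lingvo and JAX stacks, which this sketch doesn’t attempt to reproduce.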
1. All results retrieved from www.mlperf.org on July 29, 2020. MLPerf name and logo are trademarks. See www.mlperf.org for more information. Chart compares results: 0.7-70 v. 0.7-17, 0.7-66 v. 0.7-31, 0.7-68 v. 0.7-39, 0.7-68 v. 0.7-34, 0.7-66 v. 0.7-38, 0.7-67 v. 0.7-29.
2. All results retrieved from www.mlperf.org on July 29, 2020. MLPerf name and logo are trademarks. See www.mlperf.org for more information. Table shows results: 0.7-68, 0.7-66, 0.7-68, 0.7-66, 0.7-68, 0.7-65, 0.7-68, 0.7-66.
3. All results retrieved from www.mlperf.org on July 29, 2020. MLPerf name and logo are trademarks. See www.mlperf.org for more information. Figure compares results 0.7-70 v. 0.6-2.
Source: Google Cloud Platform

Azure Cost Management + Billing updates – July 2020

Whether you're a new student, thriving startup, or the largest enterprise, you have financial constraints, and you need to know what you're spending, where, and how to plan for the future. Nobody wants a surprise when it comes to the bill, and this is where Azure Cost Management + Billing comes in.

We're always looking for ways to learn more about your challenges and how Azure Cost Management + Billing can help you better understand where you're accruing costs in the cloud, identify and prevent bad spending patterns, and optimize costs to empower you to do more with less. Here are a few of the latest improvements and updates based on your feedback:

Drilling into empty fields and untagged resources in cost analysis.
What's new in Cost Management Labs.
New ways to save money with Azure.
New videos and learning opportunities.
Documentation updates.

Let's dig into the details.

 

Drilling into empty fields and untagged resources in cost analysis

Azure Cost Management + Billing includes all usage, purchases, and refunds for your billing account. Seeing every line item in the full usage and charges file allows you to reconcile your bill at the lowest level, but since each record can represent different charge types, which may have different properties, aggregating them within cost analysis can result in groups of empty results. This is when you see groups like "no value," "other purchases," or "untagged". Now you can filter down to these empty values and group by other attributes to drill in and understand your costs.

You can drill into data in cost analysis by either adding an explicit filter using the filter pills at the top or by clicking any grouped segment in the charts. When you add a filter using the filter pills, you'll see a new "No value" option. This accounts for any and all scenarios where that property might be empty. Here are a few examples:

Other subscription resources: Services that aren't deployed to resource groups do not have a resource group name.
Untagged resources: There are three categories of costs that don't have tags: Resources that simply don't have tags applied (Untagged), resources with tags that aren't included in usage data (Tags not available), and charges that cannot be tagged at all (Tags not supported).
Purchases: Since purchases aren't associated with an Azure resource, you might see placeholders for Azure or Marketplace purchases. Azure purchases cover Microsoft offers, like reservations and Azure Active Directory. Marketplace purchases cover any third-party offers available from the Azure Marketplace.

After filtering down to "No value," group data by different properties to get a clearer picture of what that represents. As an example, group by publisher type or charge type to identify Marketplace costs or purchases, respectively, when you see meter and service properties are empty.

You can also click a chart segment to drill into these costs. Clicking any of the placeholders will automatically apply the "No value" filter pill for that property.

Use this new filtering capability to drill into and understand your costs, and let us know what you'd like to see next.
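If you'd rather pull the same breakdown programmatically, the sketch below queries month-to-date costs with the Cost Management Query REST API and groups them by resource group and charge type; rows that come back with an empty resource group name are the same records the portal buckets under "No value." This is a rough sketch rather than a full sample: the subscription ID is a placeholder, and the api-version and dimension names shown are assumptions you may need to adjust for your environment.

```python
from azure.identity import DefaultAzureCredential
import requests

# Placeholder scope: a subscription, resource group, or billing account all work.
scope = "subscriptions/00000000-0000-0000-0000-000000000000"

credential = DefaultAzureCredential()
token = credential.get_token("https://management.azure.com/.default").token

# Month-to-date cost, grouped by resource group and charge type. Rows with an
# empty ResourceGroupName correspond to the portal's "No value" placeholder;
# ChargeType separates usage from purchases and refunds.
query = {
    "type": "ActualCost",
    "timeframe": "MonthToDate",
    "dataset": {
        "granularity": "None",
        "aggregation": {"totalCost": {"name": "Cost", "function": "Sum"}},
        "grouping": [
            {"type": "Dimension", "name": "ResourceGroupName"},
            {"type": "Dimension", "name": "ChargeType"},
        ],
    },
}

resp = requests.post(
    f"https://management.azure.com/{scope}/providers/Microsoft.CostManagement/query"
    "?api-version=2019-11-01",
    headers={"Authorization": f"Bearer {token}"},
    json=query,
)
resp.raise_for_status()

# Each row is [cost, resource group name, charge type, currency].
for row in resp.json()["properties"]["rows"]:
    print(row)
```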

 

What's new in Cost Management Labs

With Cost Management Labs, you get a sneak peek at what's coming in Azure Cost Management and can engage directly with us to share feedback and help us better understand how you use the service, so we can deliver more tuned and optimized experiences. Here are a few features you can see in Cost Management Labs:

Show billing menu items on the Cost Management menu – Now available in the portal.
See all Cost Management + Billing menu items together in one place with quick navigation between scopes.

Of course, that's not all. Every change in Azure Cost Management is available in Cost Management Labs a week before it's in the full Azure portal. We're eager to hear your thoughts and understand what you'd like to see next. What are you waiting for? Try Cost Management Labs today.

 

New ways to save money with Azure

We're always looking for ways to help you optimize costs. Here's what's new this month:

Save even more on VMs with five-year Hybrid Benefit reservations.
Support for Azure Hybrid Benefit v2 VMs in Japan East.
Reduce your Data Lake storage costs with the new, ultra low-cost Archive tier (see the sketch after this list).
More flexible options with ephemeral OS disks, enabling you to save on storage costs.
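On the Archive tier item above, here's a minimal sketch of moving a single blob to the Archive tier with the azure-storage-blob Python SDK (v12). The storage account, container, and blob names are placeholders, and it assumes your account (Blob storage or Data Lake Storage Gen2) supports the archive tier.

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

# Placeholder account, container, and blob names.
service = BlobServiceClient(
    account_url="https://mystorageaccount.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
blob = service.get_blob_client(container="raw-data", blob="logs/2020/07/events.json")

# Move the blob to the Archive tier; it stays at the lowest storage price
# until you rehydrate it back to Hot or Cool.
blob.set_standard_blob_tier("Archive")
```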

 

New videos and learning opportunities

For those visual learners out there, here are a few new videos you might be interested in:

Azure Cosmos DB: A cost-effective database for cloud native applications (part one) (12 minutes).
Azure Cosmos DB: A cost-effective database for cloud native applications (part two) (11 minutes).
How to optimize costs with Azure Kubernetes Service (AKS) and PostgreSQL (10 minutes).
Cost optimization with Windows containers (6 minutes).

Follow the Azure Cost Management + Billing YouTube channel to stay in the loop with new videos as they're released and let us know what you'd like to see next.

Want a more guided experience? Start with Control Azure spending and manage bills with Azure Cost Management + Billing.

 

Documentation updates

Here are a couple of documentation updates you might be interested in:

Noted that early termination fees are not being charged for reservation refunds.
Documented support for budget alert thresholds above 100 percent.
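To illustrate that last item, here's a rough sketch of creating a budget with an alert threshold above 100 percent using the Microsoft.Consumption budgets REST API. The scope, budget name, amount, dates, and email address are all placeholders, and the api-version shown is an assumption you may need to adjust.

```python
from azure.identity import DefaultAzureCredential
import requests

# Placeholder subscription scope and budget name.
scope = "subscriptions/00000000-0000-0000-0000-000000000000"
budget_name = "monthly-budget"

token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

budget = {
    "properties": {
        "category": "Cost",
        "amount": 1000,
        "timeGrain": "Monthly",
        "timePeriod": {"startDate": "2020-08-01T00:00:00Z"},
        "notifications": {
            # A 120 percent threshold alerts only after actual cost exceeds the budget.
            "actual-over-120-percent": {
                "enabled": True,
                "operator": "GreaterThan",
                "threshold": 120,
                "contactEmails": ["finance@example.com"],
            }
        },
    }
}

resp = requests.put(
    f"https://management.azure.com/{scope}/providers/Microsoft.Consumption/budgets/"
    f"{budget_name}?api-version=2019-10-01",
    headers={"Authorization": f"Bearer {token}"},
    json=budget,
)
resp.raise_for_status()
print(resp.json()["properties"]["amount"])
```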

Want to keep an eye on all of the documentation updates? Check out the Cost Management + Billing doc change history in the azure-docs repository on GitHub. If you see something missing, select Edit at the top of the document and submit a quick pull request.

 

What's next?

These are just a few of the big updates from last month. Don't forget to check out the previous Azure Cost Management + Billing updates. We're always listening and making constant improvements based on your feedback, so please keep the feedback coming.

Follow @AzureCostMgmt on Twitter and subscribe to the YouTube channel for updates, tips, and tricks. And, as always, share your ideas and vote up others in the Cost Management feedback forum.

We know these are trying times for everyone. Best wishes from the Azure Cost Management team. Stay safe, and stay healthy!
Source: Azure