Azure Scales 530B Parameter GPT-3 Model with NVIDIA NeMo Megatron

This post was co-authored by Hugo Affaticati, Technical Program Manager, Microsoft Azure HPC + AI, and Jon Shelley, Principal TPM Manager, Microsoft Azure HPC + AI.

Natural language processing (NLP), automated speech recognition (ASR), and text-to-speech (TTS) applications are becoming increasingly common in today’s world. Most companies have leveraged these technologies to create chatbots that manage customer questions and complaints, streamline operations, and remove some of the heavy cost burden that comes with headcount. What you may not realize is that they are also being used internally to reduce risk, identify fraudulent behavior, reduce customer complaints, increase automation, and analyze customer sentiment. These technologies are prevalent in most places, but especially in industries such as healthcare, finance, retail, and telecommunications.

NVIDIA recently released the latest version of the NVIDIA NeMo Megatron framework, which is now in open beta. This framework can be used to build and deploy large language models (LLMs) with natural language understanding (NLU).

Combining NVIDIA NeMo Megatron with our Azure AI infrastructure offers a powerful platform that anyone can spin up in minutes without having to incur the costs and burden of managing their own on-premises infrastructure. And of course, we have taken our benchmarking of the new framework to a new level, to truly show the power of the Azure infrastructure.

Reaching new milestones with 530B parameters

We used Azure NDm A100 v4-series virtual machines to run the GPT-3 model with the new NVIDIA NeMo Megatron framework and test the limits of this series. NDm A100 v4 virtual machines are Azure’s flagship GPU offerings for AI and deep learning, powered by NVIDIA A100 80GB Tensor Core GPUs. These instances have the most GPU memory capacity and bandwidth, backed by NVIDIA InfiniBand HDR connections to support scaling up and out. Ultimately, we ran a 530B-parameter benchmark on 175 virtual machines, resulting in a training time per step of as low as 55.7 seconds (Figure 1). This benchmark measures compute efficiency and how it scales by timing each training step after steady state is reached, with a mini-batch size of one. This speed would not have been possible without InfiniBand HDR providing excellent communication between nodes without increased latency.

Figure 1: Training time per step on the 530B-parameter benchmark from 105 to 175 virtual machines.

These results show an almost linear speedup as virtual machines are added, which matters most for heavy or time-sensitive workloads. As shown by these runs with billions of parameters, customers can rest assured that Azure’s infrastructure can handle even the most difficult and complex workloads, on demand.
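One way to put a number on "almost linear" is to divide the measured speedup by the ideal speedup implied by the node count. The sketch below assumes you have a time per step at two cluster sizes and that the work per step is the same at both. The function name and the 105-VM timing are hypothetical placeholders, because the post only reports the 55.7-second figure at 175 virtual machines.

```python
def scaling_efficiency(base_nodes: int, base_step_s: float,
                       nodes: int, step_s: float) -> float:
    """Return measured speedup divided by ideal (linear) speedup.

    1.0 means perfectly linear scaling; lower values mean that adding
    nodes gives less than a proportional reduction in time per step.
    """
    if min(base_nodes, nodes) <= 0 or min(base_step_s, step_s) <= 0:
        raise ValueError("node counts and step times must be positive")
    measured_speedup = base_step_s / step_s
    ideal_speedup = nodes / base_nodes
    return measured_speedup / ideal_speedup


# ASSUMPTION: 90.0 s is a made-up placeholder for the 105-VM run, which the
# post does not report. Only the 55.7 s value at 175 VMs comes from the post.
HYPOTHETICAL_STEP_S_AT_105_VMS = 90.0
efficiency = scaling_efficiency(105, HYPOTHETICAL_STEP_S_AT_105_VMS, 175, 55.7)
print(f"scaling efficiency: {efficiency:.1%}")
```

Replace the placeholder with your own measurement at the smaller cluster size to get a meaningful figure.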

“Speed and scale are both key to developing large language models, and the latest release of the NVIDIA NeMo Megatron framework introduces new techniques to deliver 30 percent faster training for LLMs,” said Paresh Kharya, senior director of accelerated computing at NVIDIA. “Microsoft’s testing with NeMo Megatron 530B also shows that Azure NDm A100 v4 instances powered by NVIDIA A100 Tensor Core GPUs and NVIDIA InfiniBand networking provide a compelling option for achieving linear training speedups at massive scale.”

Showcasing Azure AI capabilities—now and in the future

Azure’s commitment is to make AI and HPC accessible to everyone. It includes, but is not limited to, providing the best AI infrastructure that scales from the smallest use cases to the heaviest workloads. As we continue to innovate to build the best platform for your AI workloads, our promise to you is to use the latest benchmarks to test our AI capabilities. These results help drive our own innovation and showcase that there is no limit to what you can do. For all your AI computing needs, Azure has you covered.

Learn more

To learn more about the results or how to recreate them, please see the following links.

A quick start guide to benchmarking LLM models in Azure: NVIDIA NeMo Megatron—Results.
A quick start guide to benchmarking LLM models in Azure: NVIDIA NeMo Megatron—Steps.

Source: Azure

Introducing the Docker+Wasm Technical Preview

The Technical Preview of Docker+Wasm is now available! Wasm has been producing a lot of buzz recently, and this feature will make it easier for you to quickly build applications targeting Wasm runtimes.

As part of this release, we’re also happy to announce that Docker will be joining the Bytecode Alliance as a voting member. The Bytecode Alliance is a nonprofit organization dedicated to creating secure new software foundations, building on standards such as WebAssembly and WebAssembly System Interface (WASI).

What is Wasm?

WebAssembly, often shortened to Wasm, is a relatively new technology that allows you to compile application code written in more than 40 languages (including Rust, C, C++, JavaScript, and Golang) and run it inside sandboxed environments.

The original use cases were focused on running native code in web browsers, for applications such as Figma, AutoCAD, and Photoshop. In fact, fastq.bio saw a 20x speed improvement when converting their web-based DNA sequence quality analyzer to Wasm. And Disney built their Disney+ Application Development Kit on top of Wasm! The benefits in the browser are easy to see.

But Wasm is quickly spreading beyond the browser thanks to the WebAssembly System Interface (WASI). Companies like Vercel, Fastly, Shopify, and Cloudflare support using Wasm for running code at the edge, and Fermyon is building a platform to run Wasm microservices in the cloud.

Why Docker?

At Docker, our goal is to help developers bring their ideas to life by conquering the complexity of app development. We strive to make it easy to build, share, and run your application, regardless of the underlying technologies. By making containers accessible to all, we proved our ability to make the lives of developers easier and were recognized as the #1 most-loved developer tool.

We see Wasm as a complementary technology to Linux containers where developers can choose which technology they use (or both!) depending on the use case. And as the community explores what’s possible with Wasm, we want to help make Wasm applications easier to develop, build, and run using the experience and tools you know and love.

How do I get the technical preview?

Ready to dive in and try it for yourself? Great! But before you do, here are a couple of quick notes to keep in mind as you start exploring:

Important note #1: This is a technical preview build of Docker Desktop, and things might not work as expected. Be sure to back up your containers and images before proceeding.
Important note #2: This preview has the containerd image store enabled and cannot be disabled. If you’re not currently using the containerd image store, then pre-existing images and containers will be inaccessible.

You can download the technical preview build of Docker Desktop here:

macOS Apple Silicon
macOS Intel
Windows AMD64
Linux Arm64 (deb)
Linux AMD64 (deb, rpm, tar)

Are there any known limitations?

Yes! This is an early technical preview and we’re still working on making the experience as smooth as possible. But here are a few things you should be aware of:

Docker Compose may not exit cleanly when interrupted. Workaround: Clean up docker-compose processes by sending them a SIGKILL (killall -9 docker-compose).
Pushes to Hub might give an error stating server message: insufficient_scope: authorization failed, even after logging in using Docker Desktop. Workaround: Run docker login in the CLI.

Okay, so how does the Wasm integration actually work?

We’re glad you asked! First off, we need to remind you that since this is a technical preview, things may change quite rapidly. But here’s how it currently works.

We’re leveraging our recent work to migrate image management to containerd, as it provides the ability to use both OCI-compatible artifacts and containerd shims.
We collaborated with WasmEdge to create a containerd shim. This shim extracts the Wasm module from the OCI artifact and runs it using the WasmEdge runtime.
We added support to declare the Wasm runtime, which will enable the use of this new shim.

Let’s look at an example!

After installing the preview, we can run the following command to start an example Wasm application:

docker run -dp 8080:8080 --name=wasm-example --runtime=io.containerd.wasmedge.v1 --platform=wasi/wasm32 michaelirwin244/wasm-example

Since a few of the flags might be unfamiliar, let’s explain what they’re doing:

--runtime=io.containerd.wasmedge.v1: This informs the Docker engine that we want to use the Wasm containerd shim instead of the standard Linux container runtime.
--platform=wasi/wasm32: This specifies the architecture of the image we want to use. By leveraging a Wasm architecture, we don’t need to build separate images for the different architectures. The Wasm runtime will do the final step of converting the Wasm binary to machine instructions.

After the image is pulled, the runtime reads the ENTRYPOINT of the image to locate and extract the Wasm module. The module is then loaded into the Wasm runtime and started, and networking is configured. We now have a Wasm app running on our machine!
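If you want to script the launch rather than type it, the flags above can be assembled into an argument list. This is only a sketch: the helper name wasm_run_command is hypothetical, and it uses nothing beyond the flags and image name shown in the command above (with -dp written out as -d and -p).

```python
import shlex


def wasm_run_command(image: str, name: str, port: int) -> list[str]:
    """Build the `docker run` argument list for a Wasm workload."""
    return [
        "docker", "run", "-d",
        "-p", f"{port}:{port}",                 # publish the same port on the host
        f"--name={name}",
        "--runtime=io.containerd.wasmedge.v1",  # select the Wasm containerd shim
        "--platform=wasi/wasm32",               # select the Wasm image architecture
        image,
    ]


cmd = wasm_run_command("michaelirwin244/wasm-example", "wasm-example", 8080)
print(shlex.join(cmd))  # show the command without executing it
```

On a machine with the technical preview installed, the list can be passed to subprocess.run(cmd, check=True) to start the container.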

This particular application is a simple web server that says “Hello world!” and echoes data back to us. To verify it’s working, let’s first view the logs.

docker logs wasm-example
Server is now running

We can get the “Hello world” message by either opening http://localhost:8080 in a browser or using curl.

curl localhost:8080

And our response will give us a Hello world message:

Hello world from Rust running with Wasm! Send POST data to /echo to have it echoed back to you

To send data to the echo endpoint, we can use curl:

curl localhost:8080/echo -d '{"message":"Hi there"}' -H "Content-type: application/json"

And we’ll see the data sent back to us in the response:

{"message":"Hi there"}
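The same POST can be made from a program instead of curl. The following is a minimal sketch using only the Python standard library; the helper name build_echo_request is hypothetical, and it assumes the example container is listening on localhost:8080 as in the commands above.

```python
import json
import urllib.request


def build_echo_request(payload: dict,
                       base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request carrying `payload` as JSON to the /echo endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/echo",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_echo_request({"message": "Hi there"})
print(request.get_method(), request.full_url)

# With the example container running, send the request like this:
#   with urllib.request.urlopen(request, timeout=5) as response:
#       print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the example runnable even when the container is not started.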

To remove the application, you can remove it as you would any other Docker container:

docker rm -f wasm-example

The new integration means you can run a Wasm application alongside your Linux containers (even with Compose). To learn more, check out the docs!

What’s next for Wasm and Docker?

Another great question! Wasm is rapidly growing and evolving, with ongoing exploration of how to support multi-threading, garbage collection, and more. There are also many challenges still to tackle, including shortening the developer feedback loop and possible paths to production.

So try it out yourself and then let us know your thoughts or feedback on the public roadmap. We’d love to hear from you!
Source: https://blog.docker.com/feed/

Monitor Amazon EMR Serverless jobs in real time with native Spark and Hive Tez UIs

We are pleased to announce that you can now monitor and debug jobs in EMR Serverless using native Apache Spark and Hive Tez UIs. The Apache Spark and Hive Tez UIs are visual interfaces that provide detailed information about your running and completed jobs. You can view job-specific metrics as well as information about event timelines, stages, tasks, and executors for each job.
Source: aws.amazon.com

AWS Managed Microsoft AD is now available on Windows Server 2019

Starting today, all new AWS Directory Service for Microsoft AD (AWS Managed Microsoft AD) directories run on Windows Server 2019. Customers with existing directories can perform the upgrade with a few clicks or programmatically through the API. This lets you initiate upgrades of existing directories when it is most convenient, for example outside peak business hours. In addition, starting in March 2023, AWS will begin automatically upgrading all AWS Managed Microsoft AD directories to Windows Server 2019.
Source: aws.amazon.com

Amazon EC2 adds Service Quotas for Amazon Machine Images (AMIs)

Starting today, we are adding Service Quotas for Amazon Machine Images (AMIs). You will now see three new quotas in the EC2 section of the Service Quotas console page. The first quota covers the total number of AMIs in your AWS account, the second covers the total number of public AMIs in your AWS account, and the third covers the number of shares you can have for each AMI. These quotas are added to all AWS accounts by default and require no action on your part. Each quota applies per AWS account and per AWS Region.
Source: aws.amazon.com