AWS Lambda now supports Availability Zone metadata

AWS Lambda now provides Availability Zone (AZ) metadata through a new metadata endpoint in the Lambda execution environment. With this capability, developers can determine the AZ ID (e.g., use1-az1) of the AZ their Lambda function is running in, enabling them to build functions that make AZ-aware routing decisions, such as preferring same-AZ endpoints for downstream services to reduce cross-AZ latency. This capability also enables operators to implement AZ-aware resilience patterns like AZ-specific fault injection testing.
Lambda automatically provisions and maintains execution environments ready to serve function invocations across multiple AZs within an AWS Region to provide high availability and fault tolerance without any additional configuration or management overhead for customers. As development teams scale their serverless applications, their functions often need to interact with other AWS services like Amazon ElastiCache and Amazon RDS that provide endpoints specific to each AZ. Until now, Lambda did not provide a way for functions to determine which AZ they were running in. With the new metadata endpoint, functions can now retrieve their AZ ID with a simple HTTP request, making it easy to implement AZ-aware logic without building and maintaining custom solutions.
To get started, use the Powertools for AWS Lambda metadata utility or call the metadata endpoint directly using the environment variables that Lambda automatically sets in the execution environment. This capability is supported for all Lambda runtimes, including custom runtimes and functions packaged as container images, and integrates seamlessly with Lambda capabilities like SnapStart and provisioned concurrency, regardless of whether your functions are VPC-enabled. 
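As a rough illustration, the Python handler below sketches what an AZ-aware lookup could look like. The environment variable name AWS_LAMBDA_METADATA_API, the /metadata path, and the availabilityZoneId field are placeholders for illustration only; consult the Lambda documentation for the actual names Lambda sets in the execution environment.

```python
import json
import os
import urllib.request

def get_az_id() -> str:
    # AWS_LAMBDA_METADATA_API and the /metadata path are hypothetical
    # placeholders; check the Lambda docs for the real endpoint details.
    host = os.environ["AWS_LAMBDA_METADATA_API"]
    with urllib.request.urlopen(f"http://{host}/metadata") as resp:
        metadata = json.load(resp)
    return metadata["availabilityZoneId"]  # hypothetical field name

def handler(event, context):
    az_id = get_az_id()  # e.g., "use1-az1"
    # AZ-aware routing: prefer a same-AZ endpoint for the downstream service.
    endpoints_by_az = event.get("endpoints_by_az", {})
    target = endpoints_by_az.get(az_id, event.get("default_endpoint"))
    return {"az_id": az_id, "target": target}
```

Resolving the AZ ID once per execution environment (rather than per invocation) keeps the lookup off the hot path, since the AZ does not change for the lifetime of the environment.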
AZ metadata support is available at no additional cost in all commercial AWS Regions where Lambda is available. To learn more, visit the Lambda documentation.
Source: aws.amazon.com

Amazon EC2 C8gn instances are now available in additional regions

Starting today, Amazon Elastic Compute Cloud (Amazon EC2) C8gn instances, powered by the latest-generation AWS Graviton4 processors, are available in the Asia Pacific (Jakarta), Asia Pacific (Hyderabad), Asia Pacific (Tokyo), South America (Sao Paulo), and Europe (Zurich) AWS Regions. The new instances provide up to 30% better compute performance than Graviton3-based Amazon EC2 C7gn instances. Amazon EC2 C8gn instances feature the latest 6th generation AWS Nitro Cards and offer up to 600 Gbps network bandwidth, the highest among network-optimized EC2 instances.
Take advantage of the enhanced networking capabilities of C8gn to scale performance and throughput while optimizing the cost of running network-intensive workloads such as network virtual appliances, data analytics, and CPU-based artificial intelligence and machine learning (AI/ML) inference.

For increased scalability, C8gn instances offer instance sizes up to 48xlarge, up to 384 GiB of memory, and up to 60 Gbps of bandwidth to Amazon Elastic Block Store (EBS). C8gn instances support Elastic Fabric Adapter (EFA) networking on the 16xlarge, 24xlarge, 48xlarge, metal-24xl, and metal-48xl sizes, which enables lower latency and improved cluster performance for workloads deployed on tightly coupled clusters.

C8gn instances are available in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon, N. California), Europe (Frankfurt, Stockholm, Ireland, London, Spain, Zurich), Asia Pacific (Singapore, Malaysia, Sydney, Thailand, Mumbai, Seoul, Melbourne, Jakarta, Hyderabad, Tokyo), Middle East (UAE), Africa (Cape Town), Canada (Central), Canada West (Calgary), South America (Sao Paulo), and AWS GovCloud (US-East, US-West).

To learn more, see Amazon EC2 C8gn Instances. To begin your Graviton journey, visit the Level up your compute with AWS Graviton page. To get started, see the AWS Management Console, AWS Command Line Interface (AWS CLI), and AWS SDKs.
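For example, before adopting C8gn in a newly supported Region, you can verify which Availability Zones offer a given size with a short boto3 check (the Region and instance size below are illustrative):

```python
import boto3

# List the Availability Zones in a Region that offer a C8gn size.
# Region and size are illustrative; any size from the announcement works.
ec2 = boto3.client("ec2", region_name="ap-southeast-3")  # Asia Pacific (Jakarta)

offerings = ec2.describe_instance_type_offerings(
    LocationType="availability-zone",
    Filters=[{"Name": "instance-type", "Values": ["c8gn.48xlarge"]}],
)
for offering in offerings["InstanceTypeOfferings"]:
    print(offering["InstanceType"], "available in", offering["Location"])
```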
Source: aws.amazon.com

AWS adds support for NIXL with EFA to accelerate LLM inference at scale

AWS announces support for the NVIDIA Inference Xfer Library (NIXL) with Elastic Fabric Adapter (EFA) to accelerate disaggregated large language model (LLM) inference on Amazon EC2. This integration enhances disaggregated inference serving through three key improvements: increased KV-cache throughput, reduced inter-token latency, and optimized KV-cache memory utilization. NIXL with EFA enables high-throughput, low-latency KV-cache transfer between prefill and decode nodes, as well as efficient KV-cache movement between storage layers. NIXL is interoperable with all EFA-enabled EC2 instances and integrates natively with frameworks including NVIDIA Dynamo, SGLang, and vLLM. Together, NIXL with EFA enables flexible integration with your EC2 instance and framework of choice, providing performant disaggregated inference at scale. AWS supports NIXL version 1.0.0 or higher with EFA installer version 1.47.0 or higher on all EFA-enabled EC2 instance types in all AWS Regions at no additional cost. For more information, visit the EFA documentation.
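To make the disaggregated pattern concrete, the sketch below models the prefill/decode split in plain Python. It is not the NIXL API: prefill, transfer_kv_cache, and decode are hypothetical stand-ins, with transfer_kv_cache marking the point where a NIXL-over-EFA transfer would move the cache between nodes.

```python
import numpy as np

def prefill(prompt_tokens, n_layers=2, d=8):
    """Prefill node: one pass over the prompt produces per-layer K/V tensors."""
    rng = np.random.default_rng(0)
    return [
        {"k": rng.standard_normal((len(prompt_tokens), d)),
         "v": rng.standard_normal((len(prompt_tokens), d))}
        for _ in range(n_layers)
    ]

def transfer_kv_cache(kv_cache):
    """Stand-in for the NIXL-over-EFA hop that ships the cache to a decoder."""
    return kv_cache  # a real transfer crosses the network between nodes

def decode(kv_cache, max_new_tokens=4):
    """Decode node: generate tokens attending over the received cache (dummy)."""
    seq_len = kv_cache[0]["k"].shape[0]
    # A real decoder would attend over kv_cache and append new K/V rows;
    # here we only report how much context the decoder received.
    return [seq_len + i for i in range(max_new_tokens)]

kv = prefill([101, 2023, 2003, 102])    # prompt of 4 tokens
tokens = decode(transfer_kv_cache(kv))  # decode continues from the cache
print(f"decoder received {len(kv)} layers of KV cache, emitted {tokens}")
```

Splitting the two phases this way lets prefill-heavy and decode-heavy fleets scale independently; the cost that NIXL with EFA targets is the cache handoff between them.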
Source: aws.amazon.com

NVIDIA Nemotron 3 Super now available on Amazon Bedrock

Amazon Bedrock now supports NVIDIA Nemotron 3 Super, an open hybrid Mixture-of-Experts (MoE) model designed for complex multi-agent applications. Built for agentic workloads, Nemotron 3 Super delivers fast, cost-efficient inference, enabling AI agents to maintain focus and accuracy across long, multi-step tasks without losing context. Fully open with weights, datasets, and recipes, the model supports easy customization and secure deployment, making it well suited for enterprises, startups, and individual developers building multi-agent workflows and advanced reasoning applications.
Amazon Bedrock gives customers access to Nemotron 3 Super through a single, fully managed API, with no infrastructure to provision or models to host. Bedrock’s serverless inference, built-in security controls, and compatibility with OpenAI API specifications make it easy to integrate Nemotron 3 Super into existing workflows and deploy at production scale with confidence.
NVIDIA Nemotron 3 Super is now available in Amazon Bedrock in select AWS Regions. For the full list of available AWS Regions, refer to the documentation. To learn more and get started, visit the Amazon Bedrock console or the service documentation. To get started with Amazon Bedrock’s OpenAI-compatible service endpoints, see the documentation.
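As a minimal sketch, the model can be invoked through the Bedrock Converse API with boto3. The modelId and Region below are placeholders; look up the actual Nemotron 3 Super identifier and its supported Regions in the Bedrock model catalog.

```python
import boto3

# Minimal Converse API call; the modelId below is a placeholder, look up
# the actual NVIDIA Nemotron 3 Super identifier in the Bedrock console.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="nvidia.nemotron-3-super",  # hypothetical ID
    messages=[{
        "role": "user",
        "content": [{"text": "Plan the steps for a three-agent research workflow."}],
    }],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)
print(response["output"]["message"]["content"][0]["text"])
```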
Source: aws.amazon.com

MiniMax M2.5 and GLM 5 models now available on Amazon Bedrock

Amazon Bedrock expands model selection for customers by adding support for GLM 5 and MiniMax M2.5. GLM 5 is a frontier-class, general-purpose large language model optimized for complex systems engineering and long-horizon agentic tasks. It builds on the GLM 4.5 agent-centric lineage and is designed to support multi-step reasoning, math (including AIME-style benchmarks), advanced coding, and tool-augmented workflows, with long-context support suitable for sophisticated agents and enterprise applications. MiniMax M2.5 is an agent-native frontier model trained explicitly to reason efficiently, decompose tasks optimally, and complete complex workflows under real-world time and cost constraints. It achieves task completion speeds comparable to or faster than leading proprietary frontier models by combining high inference throughput with reinforcement learning focused on token-efficient reasoning and better decision-making in agentic scaffolds.
MiniMax M2.5 and GLM 5 are now available in Amazon Bedrock across select AWS Regions. For the full list of available AWS Regions, refer to the documentation.
Source: aws.amazon.com