SageMaker JumpStart now offers optimized deployments, enabling customers to deploy foundation models with pre-configured settings tailored to specific use cases and performance constraints. SageMaker JumpStart optimized deployments simplify model deployment by offering task-aware configurations that optimize for cost, throughput, or latency based on your workload requirements – whether content generation, summarization, or Q&A. This launch includes support for 30+ popular models from Meta, Microsoft, Mistral AI, Qwen, Google, and TII, with visibility into key performance metrics like P50 latency, time-to-first token (TTFT), and throughput before deployment.
With SageMaker JumpStart optimized deployments, customers can select from use case-specific configurations (such as generative writing or chat-style interactions) and choose optimization targets including cost-optimized, throughput-optimized, latency-optimized, or balanced performance. Models deploy to SageMaker AI Managed Inference endpoints or SageMaker HyperPod clusters with pre-set configurations that eliminate guesswork while maintaining full visibility into deployment details. Available models include Meta Llama 3.1 and 3.2 variants, Microsoft Phi-3, Mistral AI models including the new Mistral-Small-24B-Instruct-2501, Qwen 2 and 3 series including multimodal Qwen2-VL, Google Gemma, and TII Falcon3. All deployments leverage SageMaker’s VPC deployment capabilities, ensuring data control and production-ready infrastructure with enterprise-grade security. The feature is available in all AWS regions where SageMaker JumpStart is curretly supported.
To get started with optimized deployments, navigate to Models in SageMaker Studio, select your desired foundation model in the JumpStart Models tab, choose “Deploy,” and select your use case and performance optimization target. For details, visit the SageMaker JumpStart documentation. AWS is actively expanding support to include additional models.
Quelle: aws.amazon.com
Published by