Amazon SageMaker HyperPod now supports data capture for inference workloads, enabling customers to record inference request and response payloads for model monitoring, compliance, debugging, and offline analysis. Organizations deploying generative AI and machine learning models on HyperPod need systematic visibility into the inputs their models receive and the outputs returned to clients in order to detect model drift, satisfy regulatory audit requirements, debug production issues, and build ground-truth datasets for fine-tuning. Previously, customers had to either accept limited operational visibility into their inference workloads or build expensive custom logging pipelines outside the HyperPod Inference Operator.

With data capture, you can record inference traffic at the SageMaker endpoint, at the load balancer, or at the model pod, depending on the level of visibility you need, and you can combine these options for layered observability. Captured data is delivered asynchronously to your Amazon S3 bucket. Data capture supports configurable sampling, so you can balance coverage against cost, and encryption with customer-managed AWS KMS keys, so sensitive data stays protected. Data capture is designed never to block inference, so production availability is preserved.

You can enable data capture by configuring it on your inference endpoint when deploying models through the HyperPod Inference Operator or with SageMaker JumpStart. This feature is available for SageMaker HyperPod clusters using the EKS orchestrator in all AWS Regions where Amazon SageMaker HyperPod is supported. To learn more, see Data capture for inference on HyperPod.
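The combination of sampling, asynchronous delivery, and a "never block inference" guarantee can be illustrated with a small Python sketch. This is not HyperPod's implementation or API: the class `SampledCaptureBuffer`, its `sampling_percentage` and `max_pending` parameters, and the drop-when-full policy are hypothetical, chosen only to show how a capture path might sample traffic and hand records to a background uploader without making the inference request wait.

```python
import queue
import random
from typing import Optional


class SampledCaptureBuffer:
    """Hypothetical sketch (not the HyperPod API): sample inference payloads
    and queue them for a background uploader without blocking the caller."""

    def __init__(self, sampling_percentage: float, max_pending: int = 1000,
                 seed: Optional[int] = None):
        if not 0 <= sampling_percentage <= 100:
            raise ValueError("sampling_percentage must be between 0 and 100")
        self.sampling_percentage = sampling_percentage
        # Bounded queue: memory use stays capped even if uploads fall behind.
        self._pending = queue.Queue(maxsize=max_pending)
        self._rng = random.Random(seed)
        self.dropped = 0  # records discarded because the queue was full

    def capture(self, request: bytes, response: bytes) -> bool:
        """Called on the inference path; returns True if the record was queued."""
        # random() is in [0, 1), so 100% always samples and 0% never does.
        if self._rng.random() * 100 >= self.sampling_percentage:
            return False  # not selected by sampling
        try:
            # put_nowait never waits: if the queue is full, drop the record
            # instead of delaying the inference response.
            self._pending.put_nowait({"request": request, "response": response})
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def drain(self, batch_size: int = 100) -> list:
        """Called by a background worker that would write each batch to S3."""
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(self._pending.get_nowait())
            except queue.Empty:
                break
        return batch


buf = SampledCaptureBuffer(sampling_percentage=100, max_pending=2, seed=0)
buf.capture(b'{"prompt": "hi"}', b'{"text": "hello"}')
print(len(buf.drain()))
```

In this sketch a full queue causes the record to be dropped and counted rather than holding up the response, which is one simple way to keep capture off the critical path. The actual S3 upload (and any KMS encryption) would happen in a separate worker that calls `drain()`; that step is omitted here.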
Source: aws.amazon.com