Amazon SageMaker HyperPod now supports AMI-based node lifecycle configuration for Slurm clusters

Amazon SageMaker HyperPod now supports AMI-based configuration that provisions Slurm cluster nodes with the software and configurations needed for a production-ready environment to run AI/ML training workloads. This removes the need to download, configure, or upload lifecycle configuration scripts to Amazon S3. With fewer operational steps to prepare a cluster and no lifecycle configuration scripts executing during node provisioning, cluster creation time is significantly reduced, so you can start running jobs sooner.
AMI-based configuration includes required software such as Docker, Enroot, and Pyxis, and configurations such as Slurm accounting, SSH key generation, Slurm log rotation and user home directory setup. To enable AMI-based configuration, omit the LifeCycleConfig block from the instance group configuration when creating clusters using the CreateCluster API, or when using the SageMaker AI console, select “None” under Lifecycle scripts in Custom setup. For additional customization on top of the AMI-based configuration baseline, an extension script can be provided, allowing you to focus only on what capabilities and software to add, such as user configuration, observability, or LDAP integration.
Extension scripts can be configured when creating clusters through both the API and the SageMaker AI console. Using the CreateCluster API, specify the new OnInitComplete parameter and SourceS3Uri in the LifeCycleConfig block. Via the console, provide the S3 URI to the extension script in the “Extension script file in S3″ field in Custom setup. For advanced use cases that require full control over provisioning, custom lifecycle configuration scripts remain fully supported through both the API and the SageMaker AI console.
This feature is available in all AWS Regions where SageMaker HyperPod is available. To get started with creating HyperPod Slurm clusters with AMI-based node lifecycle configuration, see Getting started with SageMaker HyperPod using the AWS CLI or Getting started with SageMaker HyperPod using the SageMaker AI console in the SageMaker AI developer guide.
Quelle: aws.amazon.com

Published by