Apache Spark lineage now available in Amazon SageMaker Unified Studio for IDC-based domains
Amazon SageMaker announces the general availability of Data Lineage for Apache Spark jobs run on Amazon EMR and AWS Glue in SageMaker Unified Studio for IDC-based domains. Data Lineage gives you the information you need to identify the root cause of complex issues and understand the impact of changes. The feature captures lineage for the schema and transformations of data assets and columns from Spark executions on EMR on EC2, EMR Serverless, EMR on EKS, and AWS Glue. You can explore this lineage visually as a graph in SageMaker Unified Studio or query it using APIs. You can also use lineage to compare transformations across a Spark job's history.

Spark lineage is available in all existing SageMaker Unified Studio Regions. For detailed information on how to get started with lineage using these new features, refer to the documentation.
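As a rough illustration of querying lineage through APIs, the sketch below walks the upstream lineage of one node. It assumes the lineage graph is reachable through the Amazon DataZone `GetLineageNode` operation (boto3: `get_lineage_node`), which the announcement itself does not name. The helper name `collect_upstream`, the domain ID, and the node identifiers are hypothetical placeholders.

```python
from collections import deque


def collect_upstream(client, domain_id, start_node_id, max_nodes=100):
    """Breadth-first walk over upstream lineage nodes.

    `client` is assumed to behave like boto3.client("datazone"): it must expose
    get_lineage_node(domainIdentifier=..., identifier=...) and return a dict
    with an optional "name" and an optional "upstreamNodes" list of {"id": ...}.
    Returns a mapping of node id -> node name.
    """
    seen = {}
    queue = deque([start_node_id])
    while queue and len(seen) < max_nodes:
        node_id = queue.popleft()
        if node_id in seen:
            continue  # already visited via another path
        node = client.get_lineage_node(
            domainIdentifier=domain_id,
            identifier=node_id,
        )
        seen[node_id] = node.get("name", node_id)
        for ref in node.get("upstreamNodes", []):
            if ref["id"] not in seen:
                queue.append(ref["id"])
    return seen


# Hypothetical usage (requires AWS credentials and real identifiers):
#   import boto3
#   client = boto3.client("datazone")
#   print(collect_upstream(client, "dzd_example", "example-node-id"))
```

The `max_nodes` cap is an illustrative safeguard against very large graphs, not something the service requires.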
Source: aws.amazon.com