A retail analytics company stores historical data: AWS video

Duration: 1 h 46 min 27 s  ·  Language: EN


Full Certification Question

A retail analytics company stores historical data in .csv files in Amazon S3. The data is partially populated, lacks column labels, and contains missing values. The company needs to prepare and structure this data so it can be used effectively for training ML models. Given this context, consider the following five steps:

- Use AWS Glue crawlers to infer the schemas and available columns.
- Use Amazon EMR with Apache Spark for data cleaning and feature engineering.
- Store the resulting data back in Amazon S3.
- Use Amazon Redshift Spectrum to infer the schemas and available columns.
- Use AWS Glue DataBrew for data cleaning and feature engineering.

What is the correct order in which three of these steps should be selected to achieve this task efficiently?
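To make the scenario concrete, here is a minimal local sketch of the kind of work the question describes: inferring a schema for a header-less .csv file, cleaning missing values, and writing structured output. It uses only the Python standard library and does not call any AWS service. The sample data, the placeholder column names (`col1`, `col2`, ...), the helper functions, and the mean-imputation rule are all assumptions made for illustration, not part of the question.

```python
import csv
import io
from statistics import mean

# Hypothetical sample: no header row, some fields left empty.
RAW_CSV = """1001,2023-01-05,19.99,3
1002,2023-01-06,,1
1003,,5.50,
1004,2023-01-08,12.00,2
"""


def infer_type(values):
    """Return 'int', 'float', or 'string' based on a column's non-empty values."""
    present = [v for v in values if v != ""]
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in present:
                cast(v)
            return name
        except ValueError:
            continue
    return "string"


def infer_schema(rows):
    """Assign placeholder labels (col1, col2, ...) and a type to each column."""
    columns = list(zip(*rows))
    return [(f"col{i}", infer_type(col)) for i, col in enumerate(columns, start=1)]


def clean(rows, schema):
    """Cast numeric columns and fill missing numeric values with the column mean.

    Missing string values become None. Mean imputation is an assumed rule.
    """
    cleaned = [dict() for _ in rows]
    for idx, (name, kind) in enumerate(schema):
        raw = [row[idx] for row in rows]
        if kind == "string":
            filled = [v if v != "" else None for v in raw]
        else:
            cast = int if kind == "int" else float
            present = [cast(v) for v in raw if v != ""]
            fill = mean(present) if present else None
            filled = [cast(v) if v != "" else fill for v in raw]
        for record, value in zip(cleaned, filled):
            record[name] = value
    return cleaned


rows = list(csv.reader(io.StringIO(RAW_CSV)))
schema = infer_schema(rows)
records = clean(rows, schema)

# Stand-in for storing the result: write a labeled CSV to an in-memory buffer.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=[name for name, _ in schema])
writer.writeheader()
writer.writerows(records)
```

In this sketch the three stages are kept as separate functions so that each one maps onto a single concern from the question: discovering columns and types, cleaning values, and persisting the structured result. A managed service would replace each stage, and the imputation rule would depend on the features being engineered.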