A healthcare company is developing a machine learning model using the Amazon SageMaker XGBoost algorithm to classify patients as either high-risk or low-risk for a specific disease. The model performs exceptionally well on the training dataset but poorly on new patient data. The ML engineer suspects that noise in the dataset is causing performance issues and wants to optimize the model to improve its generalization on unseen data. What is the best recommendation to address this problem?