A healthcare company is building a predictive model to identify patients at high risk of hospital readmission. The dataset consists of patient records that include demographic information, past diagnoses, and admission history. The data is stored in Amazon S3 and in a relational database on an on-premises PostgreSQL server. The dataset is imbalanced: very few patients are flagged as high-risk, which degrades the model's performance. The dataset also contains both categorical features (e.g., "diagnosis type") and numerical features (e.g., "days in hospital"). An ML engineer must preprocess the data to resolve the class imbalance and prepare the dataset for training, using a solution that requires minimal operational effort. Which solution will meet these requirements?