You are training a regression model with SageMaker's Linear Learner to predict individual income from age and years in school. The training data contains several distinct groups. Which TWO pre-processing steps should you take to get the best results from the model? (SELECT TWO)