Your organization runs machine learning training jobs on a Dataproc cluster. Each training task takes approximately 30 minutes to complete. You observe that aggressively shutting down idle worker nodes has increased the overall job completion time. You want to reduce costs while ensuring that the training jobs still complete on time. What should you do?