You're part of a rapidly growing social media company, where your team builds TensorFlow recommender models on an on-premises CPU cluster. With billions of historical user events and 100,000 categorical features in the data, you've observed increasing model training times as the data grows. Now, you're planning to migrate the models to Google Cloud and seek the most scalable approach that minimizes training time. What should you do?