You are deploying a machine learning model on Amazon SageMaker to support a real-time fraud detection system. The application requires both low latency and high throughput to handle peak traffic. The model is currently deployed on a single ml.m5.large instance, but you are experiencing performance issues during traffic spikes. What is the BEST approach to improve both latency and throughput without significant downtime?