You are a machine learning engineer at a healthcare startup that uses an Amazon SageMaker endpoint for real-time diagnostics based on patient data. The model is experiencing increased latency and request timeouts during periods of high traffic, and the startup is operating on a tight budget. Which approach is the MOST EFFECTIVE for troubleshooting and resolving these capacity concerns while balancing cost and performance?