You are a machine learning engineer at a healthcare startup that uses an Amazon SageMaker endpoint to deliver real-time diagnostics based on patient data. The model must handle a high volume of requests with low latency to ensure timely results. The startup has recently grown rapidly, leading to occasional periods of high traffic during which users experience increased latency and, in some cases, request timeouts. You also need to be mindful of cost, as the startup operates on a tight budget. Which approach is the MOST EFFECTIVE for troubleshooting and resolving these capacity issues while balancing cost and performance?