You recently deployed a scikit-learn model to a Vertex AI endpoint and are now testing it with live production traffic. While monitoring the endpoint, you notice that the number of requests per hour is consistently twice as high as initially expected throughout the day. You want to ensure that the endpoint can scale efficiently to meet increased demand in the future, so that users do not experience high latency. What should you do?
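For context, the scaling behavior this question probes is governed by the replica bounds and autoscaling target set when the model is deployed to the endpoint (`min_replica_count`, `max_replica_count`, and `autoscaling_target_cpu_utilization` in the `google-cloud-aiplatform` SDK). The sketch below is a hedged illustration, not an official sizing formula: the helper function, machine type, and all numeric values are assumptions chosen to match the "2x traffic" scenario.

```python
def autoscaling_deploy_kwargs(traffic_multiplier: float,
                              base_max_replicas: int = 5) -> dict:
    """Size replica bounds for observed traffic growth.

    Illustrative heuristic only -- not an official Vertex AI recommendation.
    """
    return {
        "machine_type": "n1-standard-4",
        "min_replica_count": 2,   # warm capacity avoids cold-start latency spikes
        # Grow the scale-out ceiling with observed traffic (2x in this scenario).
        "max_replica_count": int(base_max_replicas * traffic_multiplier),
        "autoscaling_target_cpu_utilization": 60,  # scale out before CPU saturates
    }

# Usage sketch (requires GCP credentials; project/model IDs are placeholders):
#   from google.cloud import aiplatform
#   aiplatform.init(project="my-project", location="us-central1")
#   model = aiplatform.Model("projects/my-project/locations/us-central1/models/123")
#   endpoint = model.deploy(**autoscaling_deploy_kwargs(traffic_multiplier=2.0))
```

The key point is that Vertex AI scales deployed replicas between the configured minimum and maximum based on the autoscaling target, so raising `max_replica_count` (and keeping a nonzero minimum for warm capacity) is what lets the endpoint absorb sustained traffic above the original estimate.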