You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server. You enabled autoscaling; however, the deployed model fails to scale beyond one replica, leading to dropped requests. You notice that CPU utilization remains low even during periods of high load. What should you do?
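For context on the scenario, below is a minimal sketch of how such a deployment with autoscaling might be configured using the google-cloud-aiplatform Python SDK. The project, region, model resource name, machine type, and utilization target are hypothetical placeholders, not values from the question.

# Minimal sketch: deploying a custom-container model to a Vertex AI
# endpoint with autoscaling (google-cloud-aiplatform SDK).
# All IDs and values below are hypothetical placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Look up the previously uploaded custom-container model (placeholder name).
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890"
)

# Deploy with autoscaling bounds. Endpoint autoscaling is driven by
# resource utilization targets, so a model server that never pushes CPU
# usage past the target will stay at the minimum replica count.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=5,
    autoscaling_target_cpu_utilization=60,  # add replicas when CPU > 60%
)

This is the configuration the scenario describes: autoscaling is enabled with a CPU utilization target, which is why low reported CPU usage under load is the key symptom to reason about.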