You've developed an ML model using AI Platform and are now moving it into production. The model currently serves a few thousand queries per second but is experiencing latency issues. Requests are handled by a load balancer, which distributes them across multiple CPU-only Kubeflow pods on Google Kubernetes Engine (GKE). What steps should you take to improve serving latency without changing the underlying infrastructure?