You plan to build a machine learning model using BigQuery ML and then deploy it as an endpoint on Vertex AI. The goal is to support near real-time processing of continuous streaming data arriving from multiple vendors. Because the incoming data may contain invalid or malformed values, it must be sanitized before it reaches the model. What is the best approach to build this pipeline?
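
For context, a minimal sketch of what such a pipeline could look like follows, assuming Pub/Sub for streaming ingestion and a Dataflow (Apache Beam) job for sanitization before invoking the Vertex AI endpoint. Neither service is named in the question, and the project, topic, endpoint ID, and field names below are hypothetical placeholders.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud import aiplatform


def sanitize(raw: bytes):
    """Parse a raw Pub/Sub message and reject records with malformed payloads or fields."""
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # malformed payload, discard
    # Hypothetical required numeric feature; adjust to the real schema.
    if not isinstance(record.get("amount"), (int, float)):
        return None
    return record


class PredictFn(beam.DoFn):
    """Send sanitized records to the Vertex AI endpoint hosting the BigQuery ML model."""

    def __init__(self, endpoint_name: str):
        self._endpoint_name = endpoint_name

    def setup(self):
        # Create the endpoint client once per worker.
        self._endpoint = aiplatform.Endpoint(self._endpoint_name)

    def process(self, record):
        response = self._endpoint.predict(instances=[record])
        yield {"input": record, "prediction": response.predictions[0]}


def run():
    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadVendorStream" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/vendor-events")
            | "Sanitize" >> beam.Map(sanitize)
            | "DropInvalid" >> beam.Filter(lambda r: r is not None)
            | "Predict" >> beam.ParDo(PredictFn(
                "projects/my-project/locations/us-central1/endpoints/1234567890"))
            | "LogResult" >> beam.Map(print)
        )


if __name__ == "__main__":
    run()
```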