You are rebuilding a batch data processing pipeline on Google Cloud (video)

Duration: 1:46:27  ·  English


Full Certification Question

You are rebuilding a batch data processing pipeline on Google Cloud to handle structured data. Currently, you use PySpark for large-scale transformations, but the pipeline takes more than 12 hours to complete. To accelerate both development time and execution speed, you prefer a serverless solution that supports SQL-based transformations. Your raw data already resides in Cloud Storage. How should you design this new pipeline to meet your performance and scalability goals?