Duration: 1:46:27 · Language: EN

You are rebuilding a batch data processing pipeline (GCP video)

A data-engineer-pro certification video about rebuilding a batch data processing pipeline on Google Cloud for structured data.


Full Certification Question

You are rebuilding a batch data processing pipeline on Google Cloud to handle structured data. Currently, you use PySpark for large-scale transformations, but the pipeline takes more than 12 hours to complete. To accelerate both development time and execution speed, you prefer a serverless solution that supports SQL-based transformations. Your raw data already resides in Cloud Storage. How should you design this new pipeline to meet your performance and scalability goals?
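The page does not state the answer. A design often suggested for this scenario is ELT in BigQuery, and that choice is an assumption here, not the video's confirmed answer. In that design, BigQuery acts as the serverless engine: the raw Cloud Storage files are loaded into native BigQuery tables, and the PySpark logic is rewritten as SQL. The Python sketch below only builds the SQL strings, so it runs without credentials. All project, dataset, table, bucket, and column names are hypothetical. To submit the statements, you could use the `google-cloud-bigquery` client, for example `bigquery.Client().query(sql).result()`, but that step is left out.

```python
# Minimal sketch: generate BigQuery SQL for a serverless ELT pipeline.
# Assumption: BigQuery is the serverless SQL engine. Every project, dataset,
# table, bucket, and column name used here is hypothetical.
from dataclasses import dataclass
from typing import List

SUPPORTED_FORMATS = {"PARQUET", "AVRO", "ORC", "CSV", "NEWLINE_DELIMITED_JSON"}


@dataclass(frozen=True)
class PipelineConfig:
    project: str                 # hypothetical GCP project ID
    dataset: str                 # hypothetical BigQuery dataset
    source_uri: str              # hypothetical gs:// wildcard for the raw files
    file_format: str = "PARQUET"

    def table(self, name: str) -> str:
        """Fully qualified, backtick-quoted BigQuery table name."""
        return f"`{self.project}.{self.dataset}.{name}`"


def load_sql(cfg: PipelineConfig, target: str) -> str:
    """Load the raw Cloud Storage files into a native BigQuery table."""
    if cfg.file_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {cfg.file_format}")
    return (
        f"LOAD DATA OVERWRITE {cfg.table(target)}\n"
        f"FROM FILES(format = '{cfg.file_format}', uris = ['{cfg.source_uri}'])"
    )


def transform_sql(cfg: PipelineConfig, source: str, target: str) -> str:
    """SQL equivalent of a typical PySpark groupBy/agg step (CREATE TABLE AS SELECT)."""
    # Hypothetical columns: customer_id, order_ts (TIMESTAMP), amount.
    return (
        f"CREATE OR REPLACE TABLE {cfg.table(target)} AS\n"
        "SELECT customer_id, DATE(order_ts) AS order_date, "
        "SUM(amount) AS total_amount\n"
        f"FROM {cfg.table(source)}\n"
        "GROUP BY customer_id, order_date"
    )


def build_pipeline(cfg: PipelineConfig,
                   raw: str = "raw_orders",
                   out: str = "daily_totals") -> List[str]:
    """Statements in execution order: load first, then transform."""
    return [load_sql(cfg, raw), transform_sql(cfg, raw, out)]


if __name__ == "__main__":
    cfg = PipelineConfig("my-project", "sales", "gs://my-bucket/raw/*.parquet")
    for stmt in build_pipeline(cfg):
        print(stmt + ";\n")
```

Running the script prints the two statements in order. The load step and the transformation step are kept separate so that each can be rerun or scheduled on its own. Loading into native tables rather than querying the files in place is also a judgment call made for query performance.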