Video upload date:  · Duration: PT1H46M27S  · Language: EN

You’ve migrated a complex Spark-based analytical gcp video

data-engineer-pro video for you’ve migrated a complex Spark-based analytical workload from an on-premises Hadoop cluster to Dataproc using Cloud Storage (GCS)

This is a dedicated watch page for a single video.

Full Certification Question

You’ve migrated a complex Spark-based analytical workload from an on-premises Hadoop cluster to Dataproc using Cloud Storage (GCS) as your data layer. The workload involves significant shuffling , and the input data is stored in Parquet format, with individual files ranging from 200–400 MB . After migration, you're observing performance degradation . To minimize costs, your current Dataproc setup uses mostly preemptible VMs , with only two non-preemptible workers to maintain cluster stability. Given the complexity of the workload and cost constraints, how should you optimize the Spark job for better performance?