GCP data-engineer-pro video: designing a data pipeline where new files are uploaded to Cloud Storage

Full Certification Question

You are designing a data pipeline where new files are asynchronously uploaded to a Cloud Storage bucket by an upstream system. Upon file arrival, a Dataproc job must be triggered to transform the data and load it into BigQuery. After this, further BigQuery transformation queries—unique to each target table—must be executed. These transformation jobs are long-running and may take several hours. Your goal is to build an efficient, maintainable, and scalable workflow to handle processing for hundreds of tables and ensure data freshness for consumers. What is the best approach?
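To make the shape of the requirement concrete, here is a minimal sketch of the per-table flow the question describes: a file arrives, it is matched to its target table, and the Dataproc step is followed by that table's own BigQuery transformations. It only plans the steps as strings and does not call any Google Cloud API. The table names, object prefixes, and SQL statements are hypothetical placeholders, and the sketch does not take a position on which orchestration service should run the steps.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TablePipeline:
    """Hypothetical per-table configuration (all values are illustrative)."""
    table: str                          # target BigQuery table
    prefix: str                         # assumed object-name prefix used by the upstream system
    transform_queries: Tuple[str, ...]  # table-specific BigQuery SQL, run after the load


# With hundreds of tables this list would come from configuration, not code.
PIPELINES = [
    TablePipeline("orders", "incoming/orders/", ("CALL transform_orders()",)),
    TablePipeline("customers", "incoming/customers/", ("CALL transform_customers()",)),
]


def match_pipeline(object_name: str) -> Optional[TablePipeline]:
    """Return the pipeline whose prefix matches the uploaded object, if any."""
    for pipeline in PIPELINES:
        if object_name.startswith(pipeline.prefix):
            return pipeline
    return None


def plan_steps(object_name: str) -> List[str]:
    """List the ordered steps for one newly arrived file.

    Only the matching table's pipeline is planned, so an upload for one
    table never triggers work for the others.
    """
    pipeline = match_pipeline(object_name)
    if pipeline is None:
        return []  # unknown file: nothing to run
    steps = [f"dataproc: transform {object_name} and load into {pipeline.table}"]
    steps += [f"bigquery: {query}" for query in pipeline.transform_queries]
    return steps


if __name__ == "__main__":
    for step in plan_steps("incoming/orders/2024-01-01.csv"):
        print(step)
```

Keeping one small pipeline definition per table, triggered by the file's arrival rather than by a fixed schedule, is what lets each table be processed as soon as its data lands and keeps a long-running transformation for one table from delaying the others.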