A team is designing a pipeline to ingest large amounts of raw data from multiple external sources into BigQuery for further analysis. They plan to perform basic data transformations, such as removing duplicates and standardizing date formats, before loading the data. Which data manipulation methodology is most suitable for this scenario?
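
For concreteness, the two pre-load transformations named in the question (removing duplicates and standardizing date formats) might look roughly like the sketch below. It is only an illustration: the field name `event_date`, the list of accepted input formats, and the choice to treat slash-separated dates as day/month/year are all assumptions, and the actual load into BigQuery is left out.

```python
from datetime import datetime

# Hypothetical set of input formats; real sources may use others.
# Assumption: slash-separated dates are day/month/year.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def standardize_date(raw: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format in order."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {raw!r}")


def clean_records(records):
    """Standardize dates first, then drop exact duplicates, keeping first occurrences."""
    seen = set()
    cleaned = []
    for rec in records:
        # "event_date" is an assumed field name used for illustration.
        row = {**rec, "event_date": standardize_date(rec["event_date"])}
        # Rows are compared on all fields; values must be hashable.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned
```

Dates are standardized before deduplication so that rows differing only in date formatting are recognized as the same record.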