A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because the sink is Parquet rather than Delta Lake, built-in file-sizing features such as Auto Optimize and Auto Compaction are unavailable. Which strategy will yield the best performance without shuffling data?
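A minimal sketch of the no-shuffle approach this question points toward: cap the size of each input split at read time via `spark.sql.files.maxPartitionBytes`, so a 1 TB source is read into roughly 512 MB partitions and each partition is written as one Parquet part file, with no `repartition()` or `coalesce()` (and therefore no shuffle). The paths below are placeholders, and actual output sizes will drift from the target because Parquet compresses the JSON input.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

# Cap each input split at 512 MB so the reader creates ~2048 partitions
# for a 1 TB JSON source; each partition then becomes one Parquet part file.
spark.conf.set("spark.sql.files.maxPartitionBytes", str(512 * 1024 * 1024))

# Hypothetical paths for illustration only.
df = spark.read.json("/mnt/raw/events_json/")

# No repartition()/coalesce() call, so the write incurs no shuffle.
df.write.mode("overwrite").parquet("/mnt/curated/events_parquet/")
```

By contrast, `repartition(n)` would hit the target file size more precisely but triggers a full shuffle, which the question explicitly rules out.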