A junior data engineer is using the following code to de-duplicate raw streaming data and insert it into a target Delta table:

    spark.readStream
        .table("orders_raw")
        .dropDuplicates(["order_id", "order_timestamp"])
        .writeStream
        .option("checkpointLocation", "dbfs:/checkpoints")
        .table("orders_unique")

A senior data engineer pointed out that this approach is not sufficient to guarantee distinct records in the target table when late-arriving, duplicate records occur. Which of the following could explain the senior data engineer's remark?
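By way of illustration (this sketch is not part of the question, and it does not reveal which answer option is correct): `dropDuplicates` keeps its de-duplication state inside the stream, so it only removes duplicates among records the stream itself has seen; a late-arriving duplicate of a record that is already in the target table, but no longer in stream state, can still be appended. One commonly used remedy is to de-duplicate each micro-batch against the target table as well, e.g. with an insert-only merge inside `foreachBatch`. A minimal pure-Python toy model of that merge step, with rows as plain dicts and all names hypothetical:

```python
# Toy model of an insert-only merge: before appending a micro-batch, drop any
# row whose key already exists in the target table. In Structured Streaming
# this is typically done with foreachBatch and a Delta MERGE; everything here
# is an illustrative stand-in, not code from the question.

def insert_only_merge(target, batch, key_cols=("order_id", "order_timestamp")):
    """Append only those batch rows whose key is not already in target."""
    existing = {tuple(row[c] for c in key_cols) for row in target}
    for row in batch:
        key = tuple(row[c] for c in key_cols)
        if key not in existing:  # skip duplicates of previously inserted rows
            target.append(row)
            existing.add(key)
    return target

# The target already holds order 1; a late duplicate of it arrives in a batch.
target = [{"order_id": 1, "order_timestamp": "2024-01-01T00:00:00"}]
late_batch = [
    {"order_id": 1, "order_timestamp": "2024-01-01T00:00:00"},  # late duplicate
    {"order_id": 2, "order_timestamp": "2024-01-01T00:05:00"},  # genuinely new
]
insert_only_merge(target, late_batch)
print(len(target))  # 2 -- the late duplicate was not inserted again
```

The point of the sketch is only the shape of the check: uniqueness is enforced against everything already written, not just against what the stream currently remembers.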