A company is transitioning a legacy application to a data lake on Amazon S3. During a review of the data, a data engineer discovered duplicate records in the legacy data. What is the most efficient way for the data engineer to eliminate these duplicates from the legacy application data with minimal operational effort?
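
For context only, the sketch below shows the basic mechanics of deduplicating records during a copy into the data lake, assuming a PySpark job and hypothetical bucket names and paths. It illustrates what "eliminating duplicates" involves rather than prescribing the minimal-operational-effort answer the question asks for.

```python
# Minimal PySpark sketch (hypothetical bucket/prefix names) showing one way to
# drop exact-duplicate records while copying legacy data into the data lake.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("legacy-dedup").getOrCreate()

# Hypothetical source and target locations -- replace with real paths.
SOURCE_PATH = "s3://example-legacy-bucket/legacy-app/"
TARGET_PATH = "s3://example-data-lake-bucket/curated/legacy-app/"

# Read the legacy records (Parquet is an assumption; the question does not
# specify a file format).
df = spark.read.parquet(SOURCE_PATH)

# dropDuplicates() removes rows that are identical across all columns;
# pass a column list (e.g. a business key) to dedupe on specific fields instead.
deduped = df.dropDuplicates()

# Write the cleaned data back to the lake in a columnar format.
deduped.write.mode("overwrite").parquet(TARGET_PATH)
```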