Scenario: You need ads data for AI models and historical data for analytics. Identifying long-tail and outlier data points is crucial. The data must be cleansed in near real time before being used in AI models. Question: What steps should be taken to cleanse the data before running it through AI models?
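
To make the requirement concrete, here is a minimal sketch of what cleansing a stream record by record while flagging outliers could look like. It is an illustration under stated assumptions, not the expected answer or any particular product's API: the `cost` field, the `cleanse` function, the z-score threshold of 3.0, and the warm-up length are all hypothetical. It drops malformed records and tags outliers instead of discarding them, because the scenario says identifying them matters.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online mean/variance, updated one value at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def cleanse(events, z_threshold=3.0, warmup=5):
    """Yield (event, is_outlier) for valid events; skip malformed ones.

    `events` is any iterable of dicts with a hypothetical numeric "cost" field.
    """
    stats = RunningStats()
    for event in events:
        value = event.get("cost")
        # Drop records whose value is missing, non-numeric, NaN, or negative.
        if (
            isinstance(value, bool)
            or not isinstance(value, (int, float))
            or math.isnan(value)
            or value < 0
        ):
            continue
        is_outlier = False
        # Only score once enough history exists to estimate a spread.
        if stats.n >= warmup and stats.std > 0:
            is_outlier = abs(value - stats.mean) / stats.std > z_threshold
        # Flagged values are kept out of the baseline so they do not skew it.
        if not is_outlier:
            stats.update(value)
        yield event, is_outlier
```

Because the function is a generator, it can sit between a stream consumer and the model input without buffering the whole dataset. One trade-off in this sketch: excluding flagged values from the baseline means a genuine, lasting shift in the data would keep being flagged, so a production pipeline would likely use a sliding or decaying window instead.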