A company is launching a generative AI initiative to create personalized content for its global customer base. Their existing customer data is stored across multiple legacy systems, is often inconsistent, and has many missing fields. Before they can effectively use this data to train personalization models, what is the most critical initial challenge related to their data that they must address?