A machine learning engineer is training a model to automatically categorize news articles into predefined topics such as "Sports," "Politics," and "Technology." The training dataset consists of articles that have each already been manually assigned one of these topic categories by a human annotator. What type of data is the engineer primarily using for this training process?