A Machine Learning Specialist will be using different sets of training data to evaluate the various machine learning classification models against each other. She wants to rank each generated model by its ability to predict true positives. Which performance metric is the MOST appropriate for the problem?