Full AWS Practitioner Certification Question

A logistics company is evaluating a vision foundation model that classifies package damage levels. To ensure the model meets industry standards, the company wants to assess its performance accurately using appropriate validation methods. Which of the following approaches is most suitable for evaluating the model's classification performance?