You are employed by a hotel and have a dataset containing customer comments, which are scanned from paper-based feedback forms stored as PDF files. These forms all have a consistent layout. Your task is to quickly predict an overall satisfaction score based on the customer comments found on each form. How should you best achieve this objective?