A government agency is building a policy summarization tool using Amazon Bedrock. They need to verify that the generated summaries are coherent, fact-based, and aligned with original documents. Which strategies support effective model evaluation on Amazon Bedrock? (Select TWO)