A Generative AI Engineer is building a news-summarization RAG pipeline. The source documents are long, and users tend to ask highly detailed queries; at the same time, inference cost is a major concern. What should the engineer prioritize when selecting the embedding model?
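The trade-off the question points at is between a context window large enough to embed long document chunks and the per-token cost of embedding them. The sketch below illustrates one way to reason about it: filter candidate models by context window, then rank the survivors by projected embedding cost. The model names, context windows, dimensions, and prices are illustrative assumptions, not real vendor specifications.

```python
# Hypothetical candidate embedding models: names, context windows,
# dimensions, and per-1K-token prices are made-up illustration values.
candidates = [
    {"name": "embed-small", "max_tokens": 512,  "dim": 384,  "price_per_1k_tokens": 0.00002},
    {"name": "embed-base",  "max_tokens": 2048, "dim": 768,  "price_per_1k_tokens": 0.00010},
    {"name": "embed-large", "max_tokens": 8192, "dim": 3072, "price_per_1k_tokens": 0.00013},
]

def rank(models, min_context, doc_tokens_per_month):
    """Keep models whose context window fits the chunk size, then sort
    by projected monthly embedding cost (tokens / 1000 * unit price)."""
    viable = [dict(m) for m in models if m["max_tokens"] >= min_context]
    for m in viable:
        m["monthly_cost"] = doc_tokens_per_month / 1000 * m["price_per_1k_tokens"]
    return sorted(viable, key=lambda m: m["monthly_cost"])

# Long source documents -> chunks of ~1,500 tokens; 50M tokens embedded per month.
for m in rank(candidates, min_context=1500, doc_tokens_per_month=50_000_000):
    print(f"{m['name']}: ${m['monthly_cost']:.2f}/month, dim={m['dim']}")
```

With these assumed numbers, the smallest model is eliminated because its 512-token window cannot hold a 1,500-token chunk, and among the remaining models the cheaper one wins; a smaller embedding dimension also reduces vector-store and retrieval costs downstream.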