A startup is developing an application in which users type a short story plot and the application generates a unique, coherent, and contextually relevant image, rendered in the style of an oil painting, that visualizes a key scene from that plot. Which specific type of generative AI model is most likely at the core of the image-creation capability?