I would love for multimodal models to learn the generative art process, e.g. in Processing or Houdini. Being able to map programs in those languages to how their output looks visually would be a great multiplier for generative artists. You could then explore the latent space through text.