Google Launches Muse, A New Text-to-Image Transformer Model

Since the beginning of 2021, the development of numerous deep-learning text-to-image models, including Midjourney, Stable Diffusion, and DALL-E 2, has completely changed the landscape of AI research. Google's Muse, a text-to-image Transformer model that aims for state-of-the-art image generation performance, is another name to add to the list.

Google Muse is a new text-to-image generation model created by Google Research. It is designed to produce images comparable to those from current models while being faster and more efficient. It is trained on a large text-image dataset and operates in a compressed, discrete latent space. It is intended to offer image synthesis capabilities for a variety of purposes, from rendering complex concepts to generating images from plain text descriptions.
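To make the idea of a compressed, discrete latent space concrete, the toy PyTorch sketch below shows how an image can be reduced to a short grid of integer token ids via a learned codebook. Everything here (the module name, the codebook size, the layer shapes) is illustrative and not Muse's actual VQ tokenizer.

```python
import torch
import torch.nn as nn

class ToyVQTokenizer(nn.Module):
    """Illustrative VQ-style tokenizer, not Muse's real one."""

    def __init__(self, codebook_size=1024, dim=64):
        super().__init__()
        # Two strided convolutions downsample a 256x256 RGB image
        # to a 16x16 grid of latent vectors.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=4, stride=4),
            nn.ReLU(),
            nn.Conv2d(dim, dim, kernel_size=4, stride=4),
        )
        # Learned codebook: each latent vector snaps to its nearest entry.
        self.codebook = nn.Embedding(codebook_size, dim)

    def forward(self, images):
        z = self.encoder(images)                   # (B, dim, 16, 16)
        b, d, h, w = z.shape
        z = z.permute(0, 2, 3, 1).reshape(-1, d)   # one vector per grid cell
        # Nearest-neighbour lookup turns each vector into an integer token id.
        dists = torch.cdist(z, self.codebook.weight)
        tokens = dists.argmin(dim=-1)
        return tokens.view(b, h * w)               # (B, 256) discrete tokens

tokenizer = ToyVQTokenizer()
print(tokenizer(torch.randn(2, 3, 256, 256)).shape)  # torch.Size([2, 256])
```

An entire 256x256 image is thus represented by just 256 integers, which is what makes masked-token training and fast parallel decoding possible.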

Muse is trained on a masked modelling task in discrete token space: given the text embedding extracted from a frozen, pre-trained large language model (LLM), Muse learns to predict randomly masked image tokens. Because it uses discrete tokens and requires fewer sampling iterations, Muse is claimed to be more efficient than pixel-space diffusion models such as Imagen and DALL-E 2. The model also gets zero-shot, mask-free editing for free by iteratively resampling image tokens conditioned on a text prompt.
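The sketch below illustrates that masked-token training objective. The model interface is a hypothetical transformer that attends to frozen text embeddings; this is a simplified illustration under those assumptions, not Muse's actual training code.

```python
import torch
import torch.nn.functional as F

def masked_token_loss(model, image_tokens, text_embeddings, mask_id):
    """image_tokens: (B, N) discrete ids from the image tokenizer.
    text_embeddings: (B, T, D) from a frozen, pre-trained LLM."""
    B, N = image_tokens.shape
    # Draw a random masking ratio per example, then hide that fraction of tokens.
    ratios = torch.rand(B, 1)
    mask = torch.rand(B, N) < ratios            # True where a token is hidden
    inputs = image_tokens.masked_fill(mask, mask_id)
    # The transformer predicts a codebook distribution at every position,
    # conditioned on the text; the loss is taken only at the masked positions.
    logits = model(inputs, text_embeddings)     # (B, N, codebook_size)
    return F.cross_entropy(logits[mask], image_tokens[mask])
```

Training across a range of masking ratios, rather than a single fixed one, is what lets the same model fill in anything from a few missing tokens to an entirely blank canvas at inference time.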

Model Architecture:

Muse's pipeline combines a frozen, pre-trained T5-XXL text encoder, VQGAN tokenizer models that map images to and from grids of discrete tokens, a base Transformer that generates 256x256-resolution image tokens conditioned on the text embeddings, and a super-resolution Transformer that upsamples the result to 512x512. A diagram of the full architecture is available on the project page linked below.
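At inference time, models trained this way can generate an image in a handful of parallel decoding steps rather than hundreds of diffusion steps: start from an all-masked canvas, predict every token at once, keep the most confident predictions, and resample the rest. The loop below is a minimal sketch of that procedure; the model interface, mask id, greedy sampling, and cosine schedule are assumptions chosen for illustration.

```python
import math
import torch

@torch.no_grad()
def iterative_decode(model, text_embeddings, num_tokens=256, steps=12, mask_id=1024):
    B = text_embeddings.shape[0]
    # Start from a fully masked canvas of image tokens.
    tokens = torch.full((B, num_tokens), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens, text_embeddings)   # (B, N, codebook_size)
        confidence, sampled = logits.softmax(dim=-1).max(dim=-1)  # greedy for brevity
        # Cosine schedule: the fraction of tokens left masked shrinks each step.
        num_masked = int(math.cos(math.pi / 2 * (step + 1) / steps) * num_tokens)
        tokens = sampled
        if num_masked > 0:
            # Re-mask the least confident positions and resample them next pass.
            worst = confidence.argsort(dim=-1)[:, :num_masked]
            tokens.scatter_(1, worst, mask_id)
    return tokens  # final token ids, decoded back to pixels by the VQ decoder
```

This confidence-based parallel decoding is why such models need far fewer network evaluations per image than a pixel-space diffusion sampler.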
More info: https://muse-model.github.io