Generates high-res images from prompts.
DeepFloyd IF is a modular, cascaded text-to-image model. It combines several neural modules, each handling a specific task, within a single architecture. A base model produces low-resolution samples, which a series of upscaling models then enlarge into high-resolution images. Both the base and super-resolution models are diffusion models: a Markov chain of steps gradually adds random noise to the data, and the process is then reversed to generate new samples from noise. Unlike latent diffusion models, DeepFloyd IF operates directly in pixel space.

It reports a state-of-the-art zero-shot FID score and strong text understanding, which it gets by using the large language model T5-XXL as its text encoder. Different texts, styles, textures, and spatial relations can be combined in a single image.

Image-to-image translation works by resizing the original image to 64 pixels, adding some level of noise via forward diffusion, and then denoising the image with a new prompt during the backward diffusion process. This makes it possible to change the style, patterns, and details of the output while preserving the essence of the source image.

DeepFloyd IF is also good at rendering text inside generated images: it can embroider text on fabric, insert it into a stained-glass window, include it in a collage, or light it up on a neon sign. These capabilities suit a range of creative use cases.
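The first two steps of the image-to-image procedure described above (resize to 64 pixels, then add noise via forward diffusion) can be illustrated with a small, dependency-free sketch. It is not DeepFloyd IF's actual code: the function names, the linear noise schedule, the 1000-step chain, and the choice of step 400 are all assumptions made for illustration, and the prompt-guided denoising step, which needs the trained model, is left out.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule.

    The schedule and its endpoints are illustrative assumptions.
    Requires num_steps >= 2.
    """
    alpha_bars = []
    running = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def resize_nearest(image, size):
    """Nearest-neighbour resize of a 2D list of floats to size x size."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def forward_diffuse(image, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [
        [signal_scale * px + noise_scale * rng.gauss(0.0, 1.0) for px in row]
        for row in image
    ]


rng = random.Random(0)

# Synthetic 128x128 greyscale gradient with values in [-1, 1], standing in
# for a real source image.
source = [[(r + c) / 254.0 * 2.0 - 1.0 for c in range(128)] for r in range(128)]

alpha_bars = linear_alpha_bar(1000)
small = resize_nearest(source, 64)  # match the 64-pixel base resolution
noisy = forward_diffuse(small, t=400, alpha_bars=alpha_bars, rng=rng)
```

A larger `t` keeps less of the source image and gives the new prompt more influence, while a smaller `t` stays closer to the original. In a real pipeline, `noisy` would be handed to the base model, which denoises it from step `t` conditioned on the new prompt.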