DALL·E 2: Hierarchical Text-Conditional Image Generation with CLIP Latents

Modern AI systems can create realistic images and art from descriptions in natural language.

Two lines of work have previously been applied to text-conditional image generation: contrastive models such as CLIP, and diffusion models. OpenAI has now proposed a new method for the task, DALL·E 2.

Example of a generated image. Credit: DALL·E 2

The new method produces more realistic and accurate images at 4x the resolution of its predecessor, DALL·E. The system combines the two earlier approaches: a prior maps a text caption to a CLIP image embedding, and a diffusion decoder, trained to invert the CLIP image encoder, generates an image from that embedding.
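The data flow behind this design can be sketched as follows. This is a toy illustration, not OpenAI's implementation: the functions `encode_text`, `prior`, `decode`, and `generate` are hypothetical stand-ins that use simple arithmetic in place of trained networks, and the sizes are arbitrary. It only shows the two stages described in the DALL·E 2 paper, where a caption is first mapped to a CLIP-style image embedding and that embedding is then decoded into an image by iterative denoising.

```python
import hashlib
import random
from typing import List

EMBED_DIM = 8    # arbitrary toy size, not the real CLIP embedding dimension
IMAGE_SIDE = 4   # the toy "image" is a 4x4 grid of grayscale values


def encode_text(caption: str) -> List[float]:
    """Hypothetical stand-in for a CLIP text encoder: caption -> embedding."""
    digest = hashlib.sha256(caption.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def prior(text_embedding: List[float]) -> List[float]:
    """Hypothetical stand-in for the prior: text embedding -> image embedding.

    A real prior is a learned generative model; this one only rescales.
    """
    return [0.5 * x for x in text_embedding]


def decode(image_embedding: List[float], steps: int = 10, seed: int = 0) -> List[List[float]]:
    """Hypothetical stand-in for the diffusion decoder.

    Starts from random noise and moves it step by step toward a target
    derived from the embedding, loosely mimicking iterative denoising.
    """
    rng = random.Random(seed)
    n = IMAGE_SIDE * IMAGE_SIDE
    target = [image_embedding[i % EMBED_DIM] for i in range(n)]
    pixels = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(steps):
        # Each step removes half of the remaining distance to the target.
        pixels = [p + 0.5 * (t - p) for p, t in zip(pixels, target)]
    return [pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)]


def generate(caption: str, seed: int = 0) -> List[List[float]]:
    """Two-stage pipeline: caption -> image embedding (prior) -> image (decoder)."""
    return decode(prior(encode_text(caption)), seed=seed)


if __name__ == "__main__":
    for row in generate("a corgi playing a trumpet"):
        print(" ".join(f"{value:+.2f}" for value in row))
```

Changing `seed` changes the starting noise, so the same embedding yields slightly different outputs. In this toy the differences are tiny, but decoding one embedding from different noise is, roughly, how a diffusion decoder can produce several variations of an image.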

In addition to creating original, realistic images and art from text descriptions, DALL·E 2 can make realistic edits to existing images, such as adding or removing elements. It can also take an image as input and create variations inspired by the original. Beyond empowering people to express themselves creatively, the research also helps humans better understand how advanced AI systems see and understand our world.

Link: https://openai.com/dall-e-2/

