DALL-E 3: A Year of Evolution Powered by GPT-4

Oct 4, 2023

In the ever-evolving realm of artificial intelligence, OpenAI's latest image model, DALL-E 3, is taking center stage. A year has passed since the launch of the previous iteration, and this new model is poised to reshape the landscape of image generation. What sets it apart? The answer lies in the engine that fuels its creative prowess: OpenAI's most capable language model, GPT-4.

Join us on this journey as we dive into the intriguing world of DALL-E 3, exploring how it has grown and flourished in the year following its predecessor's debut, all while harnessing the incredible capabilities of GPT-4. In this blog, we'll unravel the intricacies of this remarkable AI innovation and unveil the fascinating possibilities it brings to the table.

What is OpenAI and Why Should We Care about DALL-E?

OpenAI is an AI research lab founded in 2015 as a non-profit with the stated mission of ensuring that AI benefits humanity; since 2019 it has operated through a capped-profit company controlled by that non-profit. OpenAI earned its reputation through its strong team and its research and engineering work. Its GPT models, most recently GPT-4, are widely regarded as some of the most influential language models to date.

DALL-E 2 and Other Predecessors

The first iteration of DALL-E is a 12-billion-parameter version of the GPT-3 architecture, adapted to generate images from text, with roots in Image GPT. Much like a language transformer model, DALL-E treats an image as a sequence of tokens and models text tokens and image tokens together, so that a sentence-level prompt can drive the generation of the image tokens that follow it.
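
To make the "image as tokens" idea concrete, here is a minimal toy sketch in Python. It is not OpenAI's implementation: the published DALL-E model learned its image codebook with a discrete VAE, whereas this example assumes a tiny hand-written codebook of grayscale intensities, and the function names and the vocabulary offset are hypothetical choices made only for illustration.

```python
from typing import List


def image_to_tokens(image: List[List[float]], codebook: List[float], patch: int = 2) -> List[int]:
    """Turn a grayscale image into one token per patch (toy quantizer, not a dVAE)."""
    tokens = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            # Collect the pixels of this patch and summarize them by their mean.
            block = [image[i][j] for i in range(r, r + patch) for j in range(c, c + patch)]
            mean = sum(block) / len(block)
            # The token is the index of the closest codebook entry.
            nearest = min(range(len(codebook)), key=lambda k: abs(codebook[k] - mean))
            tokens.append(nearest)
    return tokens


def build_sequence(text_tokens: List[int], image_tokens: List[int], text_vocab_size: int) -> List[int]:
    """Concatenate text and image tokens into one stream.

    Image tokens are shifted by text_vocab_size so the two vocabularies
    do not collide (an assumption of this sketch).
    """
    return list(text_tokens) + [text_vocab_size + t for t in image_tokens]


# A 4x4 "image" split into four 2x2 patches.
image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.5, 0.2, 0.2],
    [0.5, 0.5, 0.2, 0.2],
]
codebook = [0.0, 0.25, 0.5, 1.0]   # hypothetical 4-entry codebook
text_tokens = [7, 3, 9]            # stand-in for a tokenized caption

image_tokens = image_to_tokens(image, codebook)
sequence = build_sequence(text_tokens, image_tokens, text_vocab_size=16)
print(image_tokens)  # [0, 3, 2, 1]
print(sequence)      # [7, 3, 9, 16, 19, 18, 17]
```

A transformer trained on streams like `sequence` can then predict the image tokens one at a time, conditioned on the caption tokens that come first.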

DALL-E 2, however, first announced in April 2022 and opened to the public without a waitlist in September 2022 (head over to our breakdown of this model to know more), could also take an image as input. That not only made the earlier text-to-image functionality smoother and more efficient but also enabled tasks such as image inpainting, advanced image augmentation, and scene touch-ups and generation.

DALL-E 2, the first model of its kind to be deployed at such a scale, kicked off the conversations around ethics in AI development and paved the way for companies like Stability AI, with models such as Stable Diffusion and SDXL (psst… hang around our website to try it out for yourself).

DALL-E 3 and How It Works

DALL-E gave us images from text, and DALL-E 2 gave us images from images; how do you upgrade from that? How about integrating ChatGPT (powered by GPT-4, by the way) to write the text that will generate the images? Context is something that even the most experienced designers sometimes miss, whether because the designer did not gather enough of it or because the client could not provide it. DALL-E 3 comes with OpenAI's most powerful context processor in the form of GPT-4, which, behind its proprietary walls, is one of the best LLMs on the market right now!
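
The two-stage idea described here, where a language model first rewrites the user's request and an image model then renders the rewritten prompt, can be sketched roughly as follows. This is only an illustration of the flow under stated assumptions: `generate_with_rewrite`, `toy_rewrite`, and `toy_render` are hypothetical stand-ins, not OpenAI's API, and the "image" returned is just placeholder bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GenerationResult:
    user_prompt: str       # what the user typed
    expanded_prompt: str   # what the language model turned it into
    image: bytes           # whatever the image model returned


def generate_with_rewrite(
    user_prompt: str,
    rewrite: Callable[[str], str],
    render: Callable[[str], bytes],
) -> GenerationResult:
    """Stage 1: expand the prompt with a language model. Stage 2: render it."""
    expanded = rewrite(user_prompt)
    return GenerationResult(user_prompt, expanded, render(expanded))


def toy_rewrite(prompt: str) -> str:
    # Hypothetical stand-in for the language model: appends fixed detail.
    return f"{prompt}, detailed illustration, soft lighting, wide shot"


def toy_render(prompt: str) -> bytes:
    # Hypothetical stand-in for the image model: returns placeholder bytes.
    return prompt.encode("utf-8")


result = generate_with_rewrite("a fox reading a newspaper", toy_rewrite, toy_render)
print(result.expanded_prompt)
```

Passing the two stages in as plain callables keeps the sketch independent of any particular service; in a real setup each stub would be replaced by a call to an actual language model and an actual image model.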

In an attempt to fight the ethical backlash the prior model received, OpenAI representatives have announced that DALL-E 3 has been trained not to generate images resembling the style of living artists, in contrast to DALL-E 2, which can somewhat emulate the artistic styles of certain artists upon request.

OpenAI is taking this step as a preventive measure against potential legal issues. Additionally, the company allows artists to exclude their artwork from future iterations of its text-to-image models: artists can submit images they own the rights to and request their removal through a form on the OpenAI website, and a future release of DALL-E can then block results resembling the artist's images and style. This move comes in response to lawsuits filed by artists against DALL-E competitors Stability AI and Midjourney, as well as the art platform DeviantArt, alleging the unauthorized use of their copyrighted work for training text-to-image models.

Conclusion

DALL-E 3, powered by GPT-4, exemplifies the astounding progress the AI field has made in image generation over the past year. Its creative abilities and its grasp of context inspire awe and anticipation for the future.

However, as we journey into the future, it is essential to tread carefully. The world of proprietary AI models, like DALL-E 3, holds a dazzling allure, but it is not without its caveats. The proprietary nature of such models can limit accessibility, innovation, and, in some cases, even transparency. The proprietary wall can impede researchers, educators, and small-scale creators from harnessing the full potential of these technologies.

In the spirit of a more equitable AI landscape, the importance of open-source initiatives and transparency cannot be overstated. By fostering collaborative environments, sharing knowledge, and making AI tools more accessible, we can collectively steer the AI ship toward a future that benefits humanity. OpenAI has recognized part of this need, opening avenues for artists and creators to protect their works and promoting ethical AI usage.

Written By

Aryan Kargwal

Data Evangelist