
ML News

Stable Diffusion AI: What it is and How it Works

Oct 7, 2022


In the last few months, with OpenAI's DALL-E 2 and DeepMind's Gato, the world has had a taste of the sheer generality and potential of artificial neural networks in the creative fields. Joining this line of complex, closed-off models comes Stable Diffusion, one of the most capable open-source AI artists out there.

With openly released weights and compute requirements friendly to consumer hardware, it may just be the eye-catcher for your next personal project! So let us dive deeper into how Stable Diffusion works and how you can use it to pump up the creative side of your pipeline or startup.


What is Stable Diffusion AI?

Ever wondered what “Mario eating a croissant in front of the Taj Mahal” may look like? Me neither 🙄…

Stable Diffusion is a machine-learning-based text-to-image model capable of generating graphics from a text prompt. Until now, models of this caliber have been kept under the control of big organizations like OpenAI and Google (with its model Imagen). This is primarily to avoid unethical use of the models, but it also means genuinely curious people have had only limited access.

However, this status quo has been challenged by Stability AI, which publicly released the model, complete with weights and API compatibility, in collaboration with Hugging Face on the 22nd of August. Since then, we have seen an outpouring of AI-generated art on big platforms like Instagram and Twitter.
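If you want a quick feel for how accessible the release is, here is a minimal sketch using Hugging Face's diffusers library. The model id CompVis/stable-diffusion-v1-4 and the defaults shown are assumptions based on the public release, so double-check the model card for the exact usage on your setup.

```python
# Minimal text-to-image sketch with Hugging Face diffusers
# (pip install diffusers transformers torch).
# The model id and defaults are assumptions based on the public v1-4 release.
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU is available

prompt = "Mario eating a croissant in front of the Taj Mahal"
image = pipe(prompt).images[0]  # runs the full text-to-image pipeline
image.save("mario_croissant.png")
```

That is the whole journey from prompt to picture; the rest of this article is about what happens inside that single pipe(prompt) call.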


Stable Diffusion Image Generator - How does it work?

The origins of the network and its API applications can be traced back to the research paper High-Resolution Image Synthesis with Latent Diffusion Models, written by the CompVis group at LMU Munich together with Runway ML. The network is a latent text-to-image diffusion model.

The model has been trained on image-text pairs from the LAION dataset. More specifically, it was trained, at considerable compute cost, on 512x512 images drawn from subsets of the larger LAION-5B dataset.

The model architecture has its roots in the original diffusion models from 2015 and introduces a key variation in the form of latent diffusion. Rather than running the noising and denoising process directly on the pixels of the image, the model first compresses the image into a lower-dimensional latent space. Once this latent representation is available, the familiar process of noising and denoising is applied there, guided by the text prompt, before the result is decoded back into pixel space.

This final decoding step maps the denoised latent representation back to a full-resolution image, producing the artistic marvels we have witnessed over the past few weeks.
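To make those stages concrete, here is a schematic sketch using the individual components shipped with the diffusers library. It loosely follows Hugging Face's public walkthrough rather than the pipeline's exact implementation: classifier-free guidance and several scaling details are omitted for brevity, and the 0.18215 latent scaling factor is the value used in the v1 release, so treat the specifics as assumptions that may vary between library versions.

```python
# Schematic latent diffusion inference loop: encode text, denoise in latent space, decode with the VAE.
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "CompVis/stable-diffusion-v1-4"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
scheduler = PNDMScheduler.from_pretrained(model_id, subfolder="scheduler")

prompt = ["Mario eating a croissant in front of the Taj Mahal"]

# 1. Turn the prompt into text embeddings that will condition the denoiser.
tokens = tokenizer(prompt, padding="max_length",
                   max_length=tokenizer.model_max_length, return_tensors="pt")
text_embeddings = text_encoder(tokens.input_ids)[0]

# 2. Start from pure noise in the low-dimensional latent space (not pixel space).
latents = torch.randn(1, unet.config.in_channels, 64, 64)  # 64x64 latents decode to 512x512 pixels

# 3. Iteratively denoise the latents, guided by the text embeddings.
#    (Classifier-free guidance is omitted here to keep the sketch short.)
scheduler.set_timesteps(50)
for t in scheduler.timesteps:
    with torch.no_grad():
        noise_pred = unet(latents, t, encoder_hidden_states=text_embeddings).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# 4. Decode the denoised latents back into pixel space with the VAE decoder.
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample  # 0.18215 is the v1 latent scaling factor
```

The important design choice is that the expensive diffusion loop runs on small 64x64 latents instead of full 512x512 images, which is what makes the model cheap enough to run on consumer GPUs.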


Stable Diffusion AI vs. DALL-E 2

Before discussing the differences between the two, let us take a quick look at DALL-E 2. DALL-E 2 is the second generation of OpenAI's text-to-image generative models, smaller yet arguably better than its predecessor. Check out our DALL-E 2 article to learn more about the network.

Now let us look at the main differences between the two:

  • Access: Stable Diffusion's weights are openly released and the model can run on a single consumer GPU, while DALL-E 2 is available only through OpenAI's hosted service.

  • Size: Stable Diffusion stands at roughly 890 million parameters, compared with over 3.5 billion for DALL-E 2 (more on this in the FAQs below).


FAQs on Stable Diffusion AI


  • Can I run Stable Diffusion on my system?

The core idea behind open-sourcing the model and network was to enable users to run it on their own systems. There is, however, a hardware requirement, which can go up to GPUs with at least 6GB of VRAM to generate heavier, more intricate images (a low-memory setup is sketched after these FAQs).

  • What if I don’t have a GPU?

The model can also be run easily on hosted notebook services like Google Colab, so a local GPU is not strictly required.

  • Is Stable Diffusion really the cheapest way to generate images right now?

Well, the model that set the precedent before Stable Diffusion was DALL-E 2, which comes with over 3.5 billion trainable parameters. Stable Diffusion, by contrast, stands at a comparatively modest 890 million parameters, making it far cheaper to run.

  • Where can I try out Stable Diffusion without setting it up on my system?

Stability AI has built a platform where you can easily try out the model and tweak the results without touching any code. You can check out DreamStudio for the same.
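And for readers with one of the smaller GPUs mentioned in the first FAQ, here is a minimal low-memory setup. This is again an assumption-laden sketch rather than official guidance: half-precision weights plus attention slicing in diffusers usually keep usage within the roughly 6GB range mentioned above.

```python
# Low-VRAM variant: half precision plus attention slicing.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,   # store weights and activations in half precision
).to("cuda")
pipe.enable_attention_slicing()  # compute attention in slices, trading a little speed for memory

image = pipe("Mario eating a croissant in front of the Taj Mahal",
             height=512, width=512, num_inference_steps=30).images[0]
image.save("low_vram_sample.png")
```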


Conclusion

As at the end of the DALL-E 2 hype, we again find ourselves at a crossroads, pondering the same question about the ethical implications of enabling a machine to be creative and where it drives the future of art and artists.

That question aside, the model is letting even non-artists discover their artistic side, and may well become an inspiration for many more.

Written By

Aryan Kargwal

Data Evangelist
