We are back with yet another addition to the Image Generation Cinematic Universe, this time a new iteration of a network that has been the talk of the town since its launch in August 2022. The buzz inspired a number of competitor labs to release their own image generation models powered by various techniques. So which one is it? (Hint: we have covered it before in our publication.)
Stable Diffusion, the model behind a wave of Instagram and TikTok trends, was the first to bring non-technical people to the wonders of AI on a broader scale, thanks to its open-source release from the get-go. The milestone network also adopted a DALL·E 2 feature in September 2022.
The model returns with a new iteration that aims at more user-oriented image generation, building on what Stability AI and its partners learned from the prior model. So let us dive deeper into Stable Diffusion v2 and why it exists in this blog post.
The new Stable Diffusion model offers a number of new features inspired by the other networks that have emerged since the first iteration. Among them:
The model can now generate images at a 768x768 resolution, packing more detail into each generated image.
The model also ships with a new upscaler diffusion model that can enlarge generated images by up to 4 times their original resolution.
The model comes with a new depth-guided architecture that infers the depth map of an input image and preserves it in an img2img setting. This structure preservation helps generate images that keep the composition of the original while looking entirely different.
The model comes with an updated inpainting module built upon the new base model. This text-guided inpainting makes swapping out parts of an image easier than before.
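To make the feature list concrete, here is a minimal sketch of how these capabilities are exposed through Hugging Face's diffusers library. This is our own illustration, not something from the announcement: the repo IDs are the public Stability AI checkpoints on the Hugging Face Hub, and the mapping dictionary is a hypothetical helper we introduce for clarity.

```python
# Hypothetical mapping from the features above to their public v2 checkpoints
# on the Hugging Face Hub (an assumption, not part of the announcement).
V2_CHECKPOINTS = {
    "text2img": "stabilityai/stable-diffusion-2",            # native 768x768 generation
    "upscale": "stabilityai/stable-diffusion-x4-upscaler",   # 4x upscaler
    "depth2img": "stabilityai/stable-diffusion-2-depth",     # depth-guided img2img
    "inpaint": "stabilityai/stable-diffusion-2-inpainting",  # text-guided inpainting
}


def main():
    # Imported lazily: loading any of these pipelines downloads multi-GB weights,
    # so this demo only runs when the script is executed directly on a GPU machine.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        V2_CHECKPOINTS["text2img"], torch_dtype=torch.float16
    ).to("cuda")
    image = pipe("a misty forest at dawn", height=768, width=768).images[0]
    image.save("forest.png")


if __name__ == "__main__":
    main()
```

The other checkpoints load the same way through their dedicated pipeline classes in diffusers (upscaling, depth-to-image, and inpainting pipelines), each taking an input image alongside the text prompt.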
The model was also designed to run efficiently on a single GPU, a goal shaped by the many iterations, suggestions, and issues users raised after the first version was made publicly available on GitHub. Let us take a comprehensive look at how the model differs from the first Stable Diffusion model.
One of the most significant advantages Stable Diffusion has over other diffusion and generative models is that the first version was open-source, lightweight, and readily available to the public through the collaboration with Hugging Face, making it easy to integrate into machine learning pipelines. With that openness came a solid feedback loop that greatly helped the development of Stable Diffusion v2.
Some of the direct differences between the first and second iterations of the models are:
The model comes with a new, more robust text encoder, OpenCLIP, developed by LAION with support from Stability AI; it significantly improves the quality of the generated images over the V1 versions. In addition, the text-to-image models in this release generate images at default resolutions of 512x512 and 768x768 pixels, respectively.
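The two default resolutions correspond to the two public text-to-image checkpoints. A tiny helper makes the pairing explicit; the repo IDs are the public Hugging Face names, while the function itself is our own illustration:

```python
# Default output size per v2 text-to-image checkpoint (repo IDs are the
# public Hugging Face names; the lookup itself is a hypothetical helper).
DEFAULT_RESOLUTION = {
    "stabilityai/stable-diffusion-2-base": 512,  # trained at 512x512
    "stabilityai/stable-diffusion-2": 768,       # the 768-v checkpoint
}


def default_size(model_id: str) -> tuple:
    """Return (width, height) for a checkpoint, falling back to 512x512."""
    side = DEFAULT_RESOLUTION.get(model_id, 512)
    return (side, side)
```

A request at the checkpoint's native size generally produces the most coherent results; both models can render other sizes, but quality tends to degrade away from the training resolution.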
The DeepFloyd team at Stability AI curated an aesthetically pleasing subset of the LAION-5B dataset for model training and then used LAION's NSFW filter to eliminate any potentially offensive material.
A new feature named Depth-to-Image preserves the depth map of the reference image: even though the art style and design may vary, the output keeps the general essence of the original.
With its higher resolutions, better context preservation, better text understanding, and features like Depth-to-Image, this new model opens up many creative opportunities for users and raises the bar for labs working on similar models.
Stable Diffusion v2 is poised to bring a new level of engagement to artistic creation and may change how AI-generated images are made. With enough time, who knows what will happen? We may yet see a new Renaissance brought about by AI, who knows…