AI Art Using Stable Diffusion
Stable Diffusion – A promising newcomer in AI art generation
AI art generators are programs that use AI to create images based on textual descriptions. In earlier features, you can read about Midjourney and DALL-E 2, applications that have both become very popular on the internet in recent times. Some claim that the newly released Stable Diffusion is even more sophisticated. Stable Diffusion is a machine learning model developed by Stability AI that, like the other two, generates digital images from natural language descriptions. The model can be used for different tasks, such as image-to-image translation guided by a text prompt, as well as upscaling images.
How does Stable Diffusion differ from DALL·E and Midjourney?
Stable Diffusion: diffusion in a single direction
Stable Diffusion is, unlike DALL·E 2, open source and was released to the public in late August this year. A Figma plugin is already underway for designers to use in their workflows. Like DALL·E 2, it breaks image generation down into a process of “diffusion”: you start with pure noise and then refine the image over time, making it incrementally closer to a given text description until there’s no noise left at all. To enable training on limited resources while retaining quality and flexibility, the creators of Stable Diffusion applied the diffusion process over a lower-dimensional latent space instead of the actual pixel space.
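The iterative refinement idea can be sketched in a few lines. This is a toy illustration only, not the real algorithm: an actual diffusion model uses a trained neural network, conditioned on the text prompt, to predict and subtract the noise at each step, whereas here we cheat and use a known target so that only the loop structure is visible.

```python
# Toy sketch of iterative denoising, in the spirit of diffusion models.
# In a real model, the "move toward the target" step is replaced by a
# neural network's noise prediction conditioned on a text prompt.
import random

random.seed(0)

target = [0.2, 0.8, 0.5, 0.1]                  # the "image" we want to reach
image = [random.gauss(0, 1) for _ in target]   # start from pure noise

steps = 50
for t in range(steps):
    # Each step removes a fraction of the remaining "noise",
    # nudging the sample incrementally closer to the target.
    image = [x + 0.1 * (y - x) for x, y in zip(image, target)]

# After enough steps, almost no noise is left.
error = max(abs(x - y) for x, y in zip(image, target))
```

Stable Diffusion runs this kind of loop not on pixels directly but on a compressed latent representation, which is what makes training and inference affordable on modest hardware.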
DALL·E: diffusion at two stages, early and late
The DALL·E 2 researchers tried two options for the prior: an autoregressive prior and a diffusion prior. Both choices resulted in similar performance, but the diffusion model was more computationally efficient. Therefore, it was selected as the prior of choice for DALL·E 2.
Midjourney: diffusion at both stages
Users of Midjourney can over time somewhat recognize the style of its imagery and the “folds” and “smudges” it often reproduces. It’s impressively good when working with imaginary prompts and often generates a very specific, slightly grotesque style.
Each model with its own traits and flaws
If you’ve tried out the same prompt in all three applications, it’s obvious each model has its own “style”, strengths, and flaws. As mentioned, Midjourney works best with fantasy-like imagery, whereas DALL·E 2 can produce images impressively close to real-life photography. For realistic faces, Midjourney seems like the less suitable choice, as the output often looks synthetic or cartoonish. You can spend hours iterating with each application before you fully grasp its capabilities and flaws, yet the consumer-facing versions available to the public today probably show only a fraction of their full capacity.
Images generated by Stable Diffusion
How do I get access to Stable Diffusion?
You can access Stable Diffusion for free through DreamStudio; however, a paid subscription is required beyond a limited set of free credits. 100 credits (1 credit per image) will cost you approximately $1, compared to $15 for the same amount with DALL·E 2. You can also run Stable Diffusion for free on your own GPU, as shown in the video.
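As a back-of-the-envelope comparison using the figures above (prices are approximate and subject to change):

```python
# Rough cost comparison based on the figures quoted in the text.
credits = 100
stable_diffusion_cost = 1.00   # ~$1 for 100 credits on DreamStudio
dalle2_cost = 15.00            # ~$15 for the same amount with DALL·E 2

cost_per_image_sd = stable_diffusion_cost / credits    # 1 credit per image
cost_per_image_dalle2 = dalle2_cost / credits

ratio = dalle2_cost / stable_diffusion_cost  # DALL·E 2 is ~15x the price
```

At roughly a cent per image, experimenting freely with many prompt variations is far cheaper on DreamStudio, and running the model on your own GPU removes the per-image cost entirely.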
Video by Polyfjord