Create AI art with Stable Diffusion

Exponential growth is nearly impossible to comprehend. Imagine a lake that starts with a single lily pad, but every day the number doubles until the entire lake is covered with lily pads on day 30… How much of the lake’s surface would be covered after 25 days? Only 3%. The lake would be almost entirely clear, and then just 5 days later you wouldn’t see any water at all. That’s exponential growth. Nothing seems to change and then suddenly everything does.

Artificial intelligence (AI) is on an exponential growth curve. And just like those lily pads, it’s hard to comprehend how quickly it can change. One day AI is the butt of jokes for creating people with two heads, and then suddenly it’s so good you can’t tell if you’re looking at AI art or a real photograph. Where are we on that growth curve? I’m not quite sure, but an AI-generated image just took first place at the Colorado State Fair. Things are improving quickly these days and I think it pays to have at least some basic understanding of AI and what it might mean for your art – even if you don’t care about it yet.

There are three interesting new AI platforms which have recently launched and allow you to simply type some words and have an AI generate an image for you: Stable Diffusion, MidJourney, and DALL-E. Each of them has its own merits (which I’ll discuss further below), but I’m going to focus this tutorial on Stable Diffusion because you can use it right inside Photoshop.

 

To set up Stable Diffusion for Photoshop:

Stable Diffusion is open-source software released by Stability AI Ltd. They also created DreamStudio to provide a web interface and API key for Stable Diffusion. I don’t understand why they didn’t name the web interface the same as the underlying software – you just need to know that DreamStudio gives you access to Stable Diffusion.

  1. Sign up for DreamStudio. You can use it for free on their website, but I think it’s worth starting with $10 to explore in depth and get an API key for the PS plugin.
  2. Install Christian Cantrell’s Stable Diffusion plugin for PS.
  3. Go to your member page on DreamStudio and click on the API Key tab, copy your API key, and paste it into the PS plugin.
  4. You can always check your balance and add funds as needed on your member page (the PS plugin will give you a warning if you need to add funds).
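
The same key also works outside of Photoshop. As an optional illustration (nothing the plugin itself requires), here is a minimal Python sketch of authenticating with that key through Stability’s stability-sdk package; the package and parameter names are taken from that SDK and may change as it evolves:

    import os

    from stability_sdk import client

    # Connect to DreamStudio's API with the same key you pasted into the PS plugin.
    # Keeping the key in an environment variable avoids hard-coding it in scripts.
    api = client.StabilityInference(
        key=os.environ["STABILITY_KEY"],
        verbose=True,  # print connection details as requests are made
    )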

 

How to create images with the plugin:

There are settings which impact the image content in significant ways, settings which affect the quality and cost, and some which may impact both. The best strategy for quickly getting to good results is to use low-quality options for speed while you refine your prompt and settings, and then increase the quality to create the final results.

You will get different results if you change any of the following: prompt, prompt strength, width, height, seed (leaving the seed blank uses a random one), and any input image you provide.

You will get similar or identical results when changing the following: steps, number of images, and (in most cases) the sampler.

To start exploring an idea:

  • Fix your width and height if you need a specific output size or ratio. Changing the image dimensions will change the results, so there is no point exploring at a lower resolution first; lock in the dimensions if you know what you need (such as 1024 wide x 576 tall for a 16:9 ratio). However, some aspect ratios work better for some images due to AI bias, so don’t be afraid to experiment if you’re open to different aspect ratios for the final image.
  • Extremely low or high prompt strengths seem to produce poor results. Try staying between 5 and 10.
  • In the advanced options, set steps to 20. This will improve speed and reduce cost while iterating, without causing significant changes when you increase it later for quality.
  • Leave the sampler at the default “k_lms”. This seems to generate the best results most of the time, and you could burn a lot of time and money iterating on this setting looking for small differences.
  • Set number of images to 2-8. This will give you a good sample of different results for the current prompt.
  • Click “Dream” to generate images.
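
If you prefer scripting to the plugin, roughly the same exploration settings can be expressed through Stability’s stability-sdk Python package. Treat this as a sketch only: the prompt is illustrative, and the parameter names (cfg_scale for prompt strength, samples for the number of images, and so on) are the SDK’s rather than the plugin’s:

    import io
    import os

    from PIL import Image
    from stability_sdk import client
    import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

    api = client.StabilityInference(key=os.environ["STABILITY_KEY"], verbose=True)

    # Exploration pass: dimensions locked, moderate prompt strength, low steps,
    # several samples, the default k_lms sampler, and no seed (so it is random).
    answers = api.generate(
        prompt="a foggy mountain lake at sunrise, dramatic light",  # illustrative prompt
        width=1024,                        # lock dimensions up front (16:9 here)
        height=576,
        cfg_scale=7.0,                     # roughly the plugin's prompt strength of 5-10
        steps=20,                          # keep steps low while iterating
        samples=4,                         # a handful of variations per run
        sampler=generation.SAMPLER_K_LMS,  # the plugin's default sampler
    )

    # Save every returned image, named by its seed so a favorite can be locked in later.
    for resp in answers:
        for artifact in resp.artifacts:
            if artifact.type == generation.ARTIFACT_IMAGE:
                Image.open(io.BytesIO(artifact.binary)).save(f"{artifact.seed}.png")

None of this is required for the plugin workflow; it is just one way to see how the settings fit together.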

The thumbnails in the plugin can be hard to evaluate. I like to work with a 1024×1024 image open, so that I can click the “layer” link under any of the thumbnails to see a much larger version. If you are using a source image, be sure your original source is visible before clicking “Dream” again, or you’ll be creating a derivative from Stable Diffusion’s output instead of your source. This can produce interesting results, but probably isn’t what you want to do.

Once you’ve found a version you like and want to finalize your work, use the following to refine and narrow down the image:

  • Click “seed” by that image to copy it to the seed field above and lock in on that image.
  • Set number of images to 1 (so you don’t pay for images you don’t need).
  • Increase the steps to 50-100. I don’t generally see much improvement beyond 50 and the cost increases for larger values.
  • If the final result changes in unexpected ways, review any changes you made. Increasing steps from a very low value can result in big changes. Otherwise, changes probably come from some other unintended change (or a failure to set the seed).

Because the output size is limited to low resolutions, upscaling can be extremely helpful. I recommend Topaz Gigapixel (and you can get it for 15% off with discount code gbenz15) for the best results; if you use it, be sure to rasterize your layer first (smart objects are not supported) and try the “low resolution” model. Alternatively, PS’s Image Size command works well with the “preserve details” method (v1, not v2).

 

How to use a source image:

You can provide your own source image either to refine it, or to help guide your prompt. Use the following workflow:

  • Check “Use Document Image”. This tells the plugin to work from the current image as you see it at the moment you click the “Dream” button.
  • Try varying the image strength between 25 and 50. I generally like around 25-35 for using the image as general inspiration. Values around 50 are much more literal.
  • Note that the quality of the source image matters, and I recommend using something with at least as much resolution as your intended output. It does not have to match the output aspect ratio (it will effectively use a cropped version of the source, and you may wish to crop the image yourself to better control which portion of the source is used).
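
For reference, a source image can be passed the same way when scripting. The sketch below again assumes the stability-sdk package; the init_image and start_schedule names come from that SDK, and the exact mapping to the plugin’s image strength slider is my assumption (roughly, lower start_schedule values keep more of the source):

    import io
    import os

    from PIL import Image
    from stability_sdk import client
    import stability_sdk.interfaces.gooseai.generation.generation_pb2 as generation

    api = client.StabilityInference(key=os.environ["STABILITY_KEY"], verbose=True)

    # Load the source image; ideally at least as large as the intended output.
    source = Image.open("source.jpg")

    answers = api.generate(
        prompt="impressionist painting of a harbor at dusk",  # illustrative prompt
        init_image=source,       # the source/document image
        start_schedule=0.7,      # lower values preserve more of the source image
        cfg_scale=7.0,
        steps=20,
        samples=4,
    )

    for resp in answers:
        for artifact in resp.artifacts:
            if artifact.type == generation.ARTIFACT_IMAGE:
                Image.open(io.BytesIO(artifact.binary)).save(f"{artifact.seed}.png")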

 

Other tips for working with Stable Diffusion:

  • As with any text-to-image AI, your prompt matters significantly.
  • Many people add references to famous artists as a quick shortcut to achieve specific looks easily, but I recommend avoiding this creative trap. Imitating others limits your potential in the long run. Try spending more time experimenting with different language and more detailed prompts.
  • Some prompts just seem to get ignored. Try requesting an image with two different people and you’ll probably just see the first person in your prompt. Try asking for a single car and you may still see several. It’s not perfect, just keep experimenting.
  • Stable Diffusion was trained on 512 x 512 images. This sometimes seems to provoke strange results with larger output sizes for portraits. You may find better results limiting the output to smaller sizes. I expect these sorts of quirks will go away as the AI is improved or retrained with larger, more detailed images.
  • When you supply a reference image, larger output sizes can be used much more reliably.
  • Some portraits seem to show as blurry. This may be a bug, or some mechanism meant to obscure results with potential copyright issues. As with any bad version, just try again.
  • Try seeing what prompts are working for others. Lexica is a helpful site to see a wide range of examples.

 

How does Stable Diffusion compare to other options?

I haven’t personally tried DALL-E because I find the degree of personal data they require for sign-up intrusive and unnecessary. However, the images I’ve seen others share include excellent images of people, and I get the sense that it’s well ahead of the others in this category. Many people rave about it.

Comparing Stable Diffusion (SD) and MidJourney (MJ):

  • It’s very easy and useful to use a source image with SD. You can specify an image for MJ via URL, but that’s cumbersome because you need to upload images and generate links before you can use them.
  • I generally find MidJourney does a better job interpreting prompts. If you have a very specific idea in mind, I’d recommend MJ.
  • The SD plugin is very handy and simplifies the learning curve by removing the need to specify options with strange text prompts like “--quality 4” or “--ar 16:9”.
  • The SD plugin doesn’t currently lend itself well to working on several ideas simultaneously. With MJ’s Discord interface, you can work on numerous ideas at the same time. However, it gets messy and potentially confusing as everything shows up in one long thread.
  • MidJourney offers higher resolution on paper, but I find that it often has small artifacts, and using the beta upsizing to avoid them ultimately generates results which I believe are comparable to what you can upscale from Stable Diffusion.

Ultimately, each of these platforms currently suffers from low resolution, artifacts, and other limitations. You might love or hate them right now. What I find most interesting about them is how quickly they’ve gotten to a point where some people take them very seriously. Just like the lily pads, things are going to change very quickly in the coming years. What feels like a joke now will be replaced with something truly amazing in a few years.

 
