Jenxi Seow

How to use Stable Diffusion (Part 1)

Jenxi Seow — Fri, 19 Jan 2024 14:34:56 +0000

To get started with Stable Diffusion, you’ll first need a user interface that lets you operate it without having to key commands into the terminal. In this guide, you’ll learn how to use Stable Diffusion with the Stable Diffusion Web UI by Automatic1111. This is the easiest tool available out there and, I feel, is perfect for beginners to get started.

Once you have understood the process of creating images using the Web UI, you can then explore other tools like ComfyUI that are designed for power users.

ComfyUI workflow

This article will guide you through the process of generating your first image in Stable Diffusion. I started with the goal of helping you understand the basics in a single post, but I realised that there is a lot to digest if I do a deep dive and try to cover each parameter.

I’ve streamlined this guide so you understand enough to create your first AI-generated image. Once you have a grasp of how to use Stable Diffusion, I suggest reading the individual articles for the different steps and concepts to better understand what each step does, why I suggest doing it in a certain way, and how you can experiment on your own.

Install Stable Diffusion Web UI
Download a Stable Diffusion checkpoint
Generate image with txt2img
Next steps

Install Stable Diffusion Web UI

If you haven’t already, install Stable Diffusion Web UI.

I have written up a guide on how to install Stable Diffusion Web UI. I try to keep my installation guide updated, but things move fast in the Stable Diffusion world, so don’t be surprised if some things don’t work.

Download a Stable Diffusion checkpoint

When you first launch Stable Diffusion, the first option in the top left is the Stable Diffusion checkpoint option. This dropdown option lets you select the checkpoint you want to use to generate your image.

Check out my lists of the top Stable Diffusion checkpoints to browse the popular checkpoints. Sign up for my newsletter to get the free Top 10 Realistic Checkpoints Database.

The icon beside the dropdown lets you refresh the list if you moved a new checkpoint into the checkpoints folder. Click on the icon to update the list without having to restart the Web UI.

X/Y plot to compare DreamShaper v8 output for different VAE and Clip Skip values.

If you have already installed Stable Diffusion Web UI using my guide, you would already have the DreamShaper checkpoint. You can skip to the next step.

For beginners, I recommend using DreamShaper, a good general purpose checkpoint.

Download DreamShaper v8 from Civitai or Hugging Face.
Move the checkpoint file into the checkpoints folder \models\Stable-diffusion\.

Click on the icon to update your list of checkpoints and select DreamShaper v8.
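If you prefer to script the move instead of dragging the file, here is a minimal Python sketch. It assumes the Web UI keeps its checkpoints in models/Stable-diffusion under the install folder, as described above. The function name, the download location and the file name in the usage comment are hypothetical, so adjust them to your setup.

```python
from pathlib import Path
import shutil


def install_checkpoint(downloaded_file: Path, webui_root: Path) -> Path:
    """Move a downloaded checkpoint file into the Web UI checkpoints folder."""
    # Folder layout taken from the step above: <webui_root>/models/Stable-diffusion
    target_dir = webui_root / "models" / "Stable-diffusion"
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / downloaded_file.name
    shutil.move(str(downloaded_file), str(destination))
    return destination


# Example usage (hypothetical paths and file name):
# install_checkpoint(Path.home() / "Downloads" / "dreamshaper_8.safetensors",
#                    Path.home() / "stable-diffusion-webui")
```

After the file is in place, refresh the checkpoint list in the Web UI as described above.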

Generate image with txt2img

There are several tabs in your Web UI screen. My tabs might differ from yours since you are most likely on a default installation. My Web UI is very customised, so it will be different from what you see.

The txt2img tab is where you start your image generation if you want to create an image from a prompt.

Prompt structure

I won’t go into the details of prompt engineering in this article, so I’ll just touch briefly on the way I write prompts.

There are several ways of writing prompts for Stable Diffusion. You can go with natural language and describe the scene you want to see. This method is commonly used in Midjourney and DALL-E3. You then let the AI interpret your description to create the image.

I personally prefer to use Danbooru tags in my prompts because that gives me more control over the elements I want to appear. For example, you can use 1girl or 1boy to specify that there is just a girl or a boy in the image. If you use them both together, you will get a girl and a boy. If you go with 2girls or 2boys, you will get two girls or two boys. Use both and you get four people in the image.

Of course, you can also do a mix of the natural language and booru tags in your prompt. Stable Diffusion will have no problems interpreting it.

The rule of thumb for prompting with Danbooru tags is to tag what you see, not what you know. For example, don’t tag footwear if you won’t see the feet in the image.

However, I have a general structure for creating my prompts:

Quality – These are tags that help to determine the quality and style, such as photorealistic, painting, anime, or illustration. For example, masterpiece, best quality, photorealistic.
Composition – These are tags that describe the camera angle and lighting, such as a close-up or wide shot. You can refer to the list of camera angles and lighting types. For example, cowboy shot, cinematic lighting.
Subject – These are the tags that describe the subject. For example, 1girl, long hair, blue eyes.
Supplementary – These are tags that add on to the subject, such as the outfit or pose. For example, red dress, standing, looking at viewer.
Background – These are the tags that describe the background in the image. For example, Chinese village.
Miscellaneous – Any other tags that I want to add on would be added at this part. I prefer to place my camera lens tags here if I use them. For example, depth of field, canon 85mm f1.2.

Based on the examples above, the final prompt would be:
masterpiece, best quality, photorealistic, cowboy shot, cinematic lighting, 1girl, long hair, blue eyes, red dress, standing, looking at viewer, Chinese village, depth of field, canon 85mm f1.2
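The structure above can be sketched in a few lines of Python. This is only an illustration of the ordering idea, not something the Web UI requires. The dictionary and function names are my own, and the tags are the examples from the list above.

```python
# Sections in the order they should appear in the prompt.
# Python dictionaries keep insertion order, so the order here is the prompt order.
PROMPT_SECTIONS = {
    "quality": ["masterpiece", "best quality", "photorealistic"],
    "composition": ["cowboy shot", "cinematic lighting"],
    "subject": ["1girl", "long hair", "blue eyes"],
    "supplementary": ["red dress", "standing", "looking at viewer"],
    "background": ["Chinese village"],
    "miscellaneous": ["depth of field", "canon 85mm f1.2"],
}


def build_prompt(sections: dict) -> str:
    """Flatten the sections into one comma-separated prompt string."""
    tags = [tag for group in sections.values() for tag in group]
    return ", ".join(tags)


print(build_prompt(PROMPT_SECTIONS))
```

Keeping the sections separate makes it easy to swap out just the background or the composition while leaving the rest of the prompt untouched.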

This is just a general approach to creating a prompt. Some checkpoints have a recommended way of prompting and will give better results if you follow it.

Prompt length

Stable Diffusion interprets prompts in chunks of 75 tokens. Once a prompt goes beyond that, it is split into two chunks. The prompt fields will show 0/75 when empty. Once you go past 75 tokens, the counter will show a total of 150 instead, to indicate how many tokens are left before you use up the second chunk.

47 out of 75 tokens in 1 chunk

96 out of 150 tokens in 2 chunks
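The counter arithmetic can be sketched as follows. This is my own illustration of the 75-tokens-per-chunk rule described above, not the Web UI’s code. It takes a token count as input, since the actual count comes from the tokenizer and is not the same as the number of words or tags.

```python
import math

CHUNK_SIZE = 75  # tokens per chunk, as described above


def token_counter(token_count: int) -> str:
    """Return a counter string in the used/capacity style shown in the prompt field."""
    # An empty prompt still shows one chunk of capacity (0/75).
    chunks = max(1, math.ceil(token_count / CHUNK_SIZE))
    return f"{token_count}/{chunks * CHUNK_SIZE}"


print(token_counter(47))  # one chunk
print(token_counter(96))  # two chunks
```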

Token weightage

The tags in your prompts are weighted based on how far in front they are in your prompts. Tags with stronger weights are given more priority. If your output is not giving you the results you want, you might need to move your tags around to tweak their weightage.

Repeating tags

If you look at other people’s prompts, which I highly recommend doing to study how others build theirs, you might see them repeating the same tags or similar ones. For example, 1girl, solo. Both tags have the same effect, but repeating similar concepts adds emphasis and tells the AI that the concept is important.

Some people repeat the same tags, for example, having blue eyes repeated again near the end of the earlier prompt that we had.

masterpiece, best quality, photorealistic, cowboy shot, cinematic lighting, 1girl, long hair, blue eyes, red dress, standing, looking at viewer, Chinese village, blue eyes, depth of field, canon 85mm f1.2

Moving tokens

You can move your prompt elements around in Stable Diffusion Web UI using the keyboard shortcut ⌥ Option/ALT + left/right arrow keys to move them forward or backwards.

Note that this moves comma-separated elements. This means the text between two commas (or between a comma and the start or end of the prompt) is moved as a single unit.
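To make the idea of moving comma-separated elements concrete, here is a small Python sketch of the same operation done on a prompt string. It is an illustration only and not how the Web UI implements the shortcut. The function name and the offset parameter are my own.

```python
def move_tag(prompt: str, tag: str, offset: int) -> str:
    """Move one comma-separated element left (negative offset) or right (positive)."""
    tags = [part.strip() for part in prompt.split(",")]
    index = tags.index(tag)  # raises ValueError if the tag is not in the prompt
    # Clamp so the element stops at the start or end of the prompt.
    new_index = min(max(index + offset, 0), len(tags) - 1)
    tags.insert(new_index, tags.pop(index))
    return ", ".join(tags)


prompt = "1girl, long hair, blue eyes, red dress"
print(move_tag(prompt, "blue eyes", -2))
```

Note that "blue eyes" moves as one element even though it is two words, because only the commas mark the boundaries.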

Adjusting weights

You can also increase a tag’s weight by adding brackets and the weight using the syntax (tag:weight). This allows you to adjust the weight of a tag without having to move it around. Weightage can apply not just to tags, but also to other prompt elements such as LoRAs and embeddings.

You can adjust the weight of a token in Stable Diffusion Web UI using the keyboard shortcut ⌘ CMD/CTRL + up/down arrow keys.

By default, each tag has a weight of 1 at its position in the prompt. You can increase (above 1.0) or decrease (below 1.0) the weightage to adjust its strength in your prompt. You can actually give negative values to the tags, though at this point you might want to just place the token in the negative prompt.
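Here is a short Python sketch of the (tag:weight) syntax described above. The helper name is my own, and it only formats text. How strongly a given weight affects the image depends on the checkpoint and the rest of the prompt.

```python
def weighted(tag: str, weight: float = 1.0) -> str:
    """Format a tag using the (tag:weight) syntax; the default weight of 1 needs no brackets."""
    if weight == 1.0:
        return tag
    return f"({tag}:{weight:g})"


tags = [
    weighted("1girl"),
    weighted("blue eyes", 1.2),      # emphasised
    weighted("Chinese village", 0.8),  # de-emphasised
]
print(", ".join(tags))
```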

Negative prompt

Those of you who have tried other AI image generators like Midjourney, Mage Space or Leonardo AI would be familiar with negative prompts. A negative prompt is basically what you want the AI to avoid in the image. Common negative tags are worst quality, low quality, deformed hands, bad anatomy.

You can also place embeddings and LoRAs in the negative prompt to reverse their effect. Bear in mind that this would only work if they support negative weights.

For example, placing a LoRA that adds detail in the negative prompt will actually remove details instead of adding details.

Negative embeddings

Instead of adding too many negative tags by hand, you can use negative embeddings, which make it easier to insert common negative tags. I won’t go into the details of negative embeddings in this article since that is a topic on its own.

DeepNegative is one example. When you use the NG_DeepNegative_V1_75T embedding, which includes 75 tokens, it actually applies 75 tokens to your negative prompt. Notice that the negative prompt field will show 75/75 tokens used when you input just NG_DeepNegative_V1_75T in the negative prompt.

Other kinds of negative embeddings are textual inversions or LoRAs trained on concepts you don’t want, such as bad drawings or anatomy. Applying these in the negative prompt will make the AI avoid these concepts.

A fun, but potentially scary/scarring, experiment to try is to place a negative embedding in the positive prompt.

Next steps

Now that we have our prompt ready, we move on to generating the image.

Don’t forget to sign up for my newsletter to get the free Top 10 Realistic Checkpoints Database. You can also browse my lists of top Stable Diffusion checkpoints on the blog.

10+ Best Stable Diffusion checkpoints (SD 1.5)

Jenxi Seow — Fri, 19 Jan 2024 14:32:26 +0000

Now that you know what Stable Diffusion is, how to install it, and the basic usage of Stable Diffusion, you must be wondering what Stable Diffusion checkpoints to use. I’ve compiled the top 10 best Stable Diffusion checkpoints for SD 1.5.

You’ve probably downloaded Stability AI’s official Stable Diffusion 1.5 model (SD 1.5) during the installation process. While it is a big improvement over the Stable Diffusion 1.4 model (SD 1.4), it is still a base model for general use with the main purpose of showcasing what Stable Diffusion 1.5 is capable of.

If you followed my installation guide, you would have installed the DreamShaper v8 checkpoint. It’s the model I recommend to people who are new to Stable Diffusion.

To bring your Stable Diffusion to the next level, you need to get a custom checkpoint like DreamShaper. Before we dive into the top checkpoints, let’s have a brief look at how the best Stable Diffusion checkpoints are ranked.

The best Stable Diffusion checkpoints ranked
Top 10 Stable Diffusion checkpoints
Round-up
Free Notion resource

The best Stable Diffusion checkpoints ranked

I have listed the top 10 best Stable Diffusion checkpoints based on their popularity, ranking them based on the total number of downloads they have on Civitai.

Top 10 Stable Diffusion checkpoints

These checkpoints are ranked by popularity as of writing. Note that some of these checkpoints differ by a very small number of downloads, so expect the rankings to fluctuate.

Regardless of their standings, these checkpoints are very established and popular amongst Stable Diffusion users and are great starting points for your Stable Diffusion journey.

1. Realistic Vision

Type: Merge
Usage: Photograph
Download: Civitai

It is a close fight between Realistic Vision and ChilloutMix, but Realistic Vision edges it out slightly, probably because it can generate a wider range of face types. It is merged from a long list of realistic checkpoints to squeeze the most realism out of them.

While I don’t use Realistic Vision that much, I use the inpainting checkpoint all the time. It is great for fixing photorealistic images.

2. ChilloutMix

Type: Merge
Usage: Photograph, digital painting
Download: Civitai

ChilloutMix is so good at creating realistic images that it stirred up a lot of controversy around the generation of images with a real person’s face. The issue snowballed to a point where the creator had to transfer ownership of the checkpoint to Civitai and go into hiding to avoid legal repercussions. I take this as proof of the checkpoint’s capability and popularity.

Besides being the most popular realistic checkpoint for Asian faces, ChilloutMix is also used to train many LoRAs and checkpoints, even non-realistic ones.

ChilloutMix is released under the rather restrictive Dreamlike License because of a checkpoint used in its merge. Model creators seem to have bypassed this by training models using generated images instead of just merging.

3. DreamShaper

Type: Trained
Usage: Photograph, digital painting, anime
Download: Civitai, Hugging Face

DreamShaper by Lykon is the checkpoint I recommend to all Stable Diffusion beginners. If you’ve followed my installation and getting started guides, you would already have DreamShaper installed.

It is a very flexible checkpoint and can generate a wide range of styles and realism levels.

4. MajicMix Realistic

Type: Merge
Usage: Photograph, digital painting

MajicMix Realistic has become the standard for photorealism in the latter half of 2023. It is one of my most used checkpoints, though nowadays I tend to switch between the MajicMix Realistic derivatives out there for a more specific look.

How popular is it? MajicMix Realistic has a recognisable face, especially before v7. It’s so easy to spot the face and I see it a lot on Taobao. Yes, Taobao sellers were already using AI-generated images for product images in mid-2023.

5. Uber Realistic Porn Merge (URPM)

Type: Merge
Usage: Photograph, digital painting
Download: Civitai (NSFW)

You can tell from the name that URPM is created to generate realistic NSFW images. That explains its popularity. It is very good with anatomy, naturally, and hence is used in many checkpoint merges for realistic and non-realistic models. Realistic Vision contains URPM.

Realistic Vision delivers better skin and hair textures due to the other checkpoints in the merge. But if you want a high level of NSFW concepts in your output, you might want to consider URPM.

6. epiCRealism

Type: Merge
Usage: Photograph
Download: Civitai

epiCRealism by epinikion takes realistic output to the next level with the amazing skin and hair texture it generates. On top of realism, it can create images with a photographic look if you prompt for it. I love the cinematic lighting it’s capable of. This special look that epiCRealism delivers has led to many checkpoints being merged or trained with it.

Realistic Vision contains epiCRealism, but I feel that it lost some of the latter’s magic touch in the merge. When I want my images to look like photographs, epiCRealism is the first checkpoint I would use. Its derivatives are also useful if you want to go for a particular look.

7. ReV Animated

Type: Merge
Usage: Digital painting
Download: Civitai

If you’re looking to create artwork with intricate details, ReV Animated is the checkpoint. It is so good at generating complex details and delivers a stunning digital painting style with the 2.5D to semi-realistic look it produces.

Unfortunately, the creator is no longer maintaining the model. I used to start off almost all my pieces with a ReV Animated draft, but its age is showing and there are other checkpoints out there that handle hands and complex poses better.

8. Perfect World

Type: Merge
Usage: Render, digital painting
Download: Civitai

Perfect World specialises in a semi-realistic look inspired by artwork from the game of the same name. It exaggerates body proportions, a goal that it proudly strives for. If that’s your thing, then this checkpoint is perfect for render-like and digital painting outputs.

9. MeinaMix

Type: Merge
Usage: Anime
Download: Civitai

MeinaMix is hands down my favourite anime checkpoint for the typical anime look. It strikes a good balance between the character and background.

10. Beautiful Realistic Asians

Type: Merge
Usage: Photograph
Download: Civitai

Beautiful Realistic Asians (BRA) by pleasebankai is capable of generating very realistic photographs of Asian subjects. However, I find that it takes good prompt engineering to coax the best out of it, and this kind of explains why it is less popular than ChilloutMix. When used right, BRA generates better realism than ChilloutMix.

The photographs BRA generates have a cinematic aesthetic to them like epiCRealism, but it specialises in Asian features. Why the need for Asian-specific checkpoints? Well, most of the general checkpoints give Asian faces that lean towards Western stereotypes or preferences. These Asian-specific checkpoints deliver output with Asian aesthetics.

Many realistic checkpoints use BRA in their merge partly because of its quality, and partly to avoid the licensing issue with ChilloutMix.

11. CyberRealistic (Bonus)

Type: Merge
Usage: Photograph
Download: Civitai, Hugging Face

To me, CyberRealistic sits between Realistic Vision and epiCRealism. I feel that it delivers output closer to what Realistic Vision is trying to achieve while retaining a more photographic aesthetic. It also works very well with textual inversions and LoRAs, though I’ll need to do some tests to determine if it is more versatile than the other two checkpoints.

Realistic Vision contains CyberRealistic. I suggest trying all three to see which you prefer, or just switch between them. All three are great for photorealistic output, especially if you want to avoid Asian faces.

12. Counterfeit (Bonus)

Type: Merge
Usage: Anime
Download: Civitai, Hugging Face (v2.0, v2.5, v3.0)

I had to add this as another bonus to the Top 10 because the majority of anime checkpoints out there can be traced back to Counterfeit. They either use Counterfeit or a derivative in their merges, or are trained with Counterfeit-generated data sets.

Before Counterfeit and Abyss Orange Mix came about, anime checkpoints were mostly trained on the controversial Anything V3. Anything V3 is suspected to be the leaked NovelAI checkpoint, and hence has a lot of question marks hanging over it regarding copyright infringement.

Round-up

Top 10 best Stable Diffusion checkpoints of SD 1.5

These top ten or twelve best Stable Diffusion checkpoints are ranked based on their total number of downloads on Civitai at the time of publishing this article. This might not be a truly accurate reflection of their popularity, because some of them have multiple versions and downloads of all the versions add up to the total downloads.

A checkpoint with ten versions can have someone downloading eight of those ten versions, whereas another with one version gets only one download from the same person. Still, it is a good list to get you started if you are new to Stable Diffusion and you’re looking for checkpoints to play with.
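To see how the number of versions can skew a ranking, here is a small Python sketch. The checkpoint names and download counts are made up purely for illustration and are not real Civitai figures.

```python
# Hypothetical download counts per version, for illustration only.
downloads_by_version = {
    "checkpoint_a": [400, 350, 300, 250],  # four versions
    "checkpoint_b": [900],                 # a single version
}


def rank_by(data: dict, score) -> list:
    """Rank checkpoint names from highest to lowest using the given scoring function."""
    return sorted(data, key=lambda name: score(data[name]), reverse=True)


by_total = rank_by(downloads_by_version, sum)  # total across all versions
by_best = rank_by(downloads_by_version, max)   # most downloaded single version
print(by_total)
print(by_best)
```

With these made-up numbers, the multi-version checkpoint leads on total downloads, while the single-version checkpoint leads when you compare the most downloaded version only.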

Having more versions means more effort has been put in to keep improving the checkpoints. If you’re interested in ranking the checkpoints based on the actual number of downloads, check out my Ultimate Stable Diffusion Checkpoint Database for a comprehensive breakdown by version.

Free Notion resource

I’m giving away my personal top 10 Stable Diffusion photorealistic checkpoint list for free when you join my newsletter. While it is less comprehensive than the paid Ultimate Database, this free list is a good starting point if you are new to the world of Stable Diffusion.

I’ve split the list so that you can view the top 10 trained checkpoints, top 10 merged checkpoints, or the overall top 10 checkpoints.

What about anime or painterly checkpoints? You can find them in the Ultimate Database. There’s just an overwhelming number of anime checkpoints out there. Besides, they differ a lot by art style and that is subject to personal taste, so it doesn’t make sense for me to rank them.

You get the full-sized comparison grids to study the effect of different variables and parameters on the checkpoints’ output. I create these for my own research and write down my notes in the Ultimate Database, and I’m sharing them with you.

Feel free to reach out if you have checkpoints to suggest or if you want me to do a more in-depth review.

Asuka Langley Soryu AI art

Jenxi Seow — Thu, 18 Jan 2024 14:00:23 +0000

Ever since I got into anime in the late 90s, both Rei Ayanami and Asuka Langley Soryu have been two of the most prominent heroines. This was due to the sheer popularity of Neon Genesis Evangelion and their character designs. Their posters and figurines filled the otaku shops that I frequented.

So it’s inevitable that I would work on Rei and Asuka pieces once I started creating realistic artwork of my favourite anime characters.

Creation

Anime checkpoints are able to get her interface headset right, but with realistic checkpoints it starts turning into hair bands or accessories.

Asuka is a US citizen of Japanese and German descent. Most depictions show her with more Caucasian features. I decided to try to play around with her facial features while trying to retain her red hair and blue eyes. However, the hair and eye colours were weird in some of the styles so I tweaked them to go with a more cosplay-like approach rather than a character recreation.

Tools used

Stable Diffusion Web UI
Adobe Photoshop (Beta)
Huion Kamvas Pro 16

Downloads

The HD versions of these images are available for download on Afdian. See the links below each image.

The 4K versions will be available to members on Patreon and DeviantArt. Subscribe to my newsletter to be notified when these and new content become available.

Asuka Langley Soryu hello there

Asuka greets her new pilot.

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt.

Asuka Langley Soryu dash

Asuka running through the battlefield to get to her mecha.

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt

Chun-Li AI art

Jenxi Seow — Mon, 08 Jan 2024 12:19:29 +0000

Chun-Li is one of the first female fighters in gaming history, and has become one of the most iconic game characters. It was a pretty easy decision to pick up Chun-Li as my third fighting game character art after Mai Shiranui and Sakura Kasugano.

Creation
Tools used
Downloads
- Chun-Li challenges you

Creation

The challenge with Chun-Li is getting her hair buns and spiked bracelets right. The spikes were so troublesome that I went with cloth bracelets for the first piece instead. I like how that turned out in the image.

Chun-Li is known for her large, muscular thighs, and that was what I depicted. However, the most feedback I’ve received so far is about how the thighs are too big to be realistic. That’s the character design, and the fun of portraying a character in photorealistic style. You get to have fun and be creative instead of just trying to make it look real.

Tools used

Stable Diffusion Web UI
Adobe Photoshop (Beta)
Huion Kamvas Pro 16

Downloads

The HD versions of these images are available for download on Afdian. See the links below each image.

The 4K versions will be available to members on Patreon and Pixiv. Subscribe to my newsletter to be notified when these and new content become available.

Chun-Li challenges you

Are you up for a fight against Chun-Li?

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt and Pixiv

What is a Stable Diffusion checkpoint

Jenxi Seow — Sat, 25 Nov 2023 16:10:20 +0000

You might have heard of checkpoints in the context of machine learning, especially in generative AI image creation. What is a Stable Diffusion checkpoint? Is it a model or is it something different?

What are checkpoints

Checkpoints and models are fundamental concepts in machine learning that are related but distinct. It can get a bit confusing when the terms are commonly used interchangeably.

A model is a complex algorithm trained to make predictions based on input data. The process of model training is where the model learns patterns and information from a given training dataset.

A checkpoint is a snapshot taken during training that captures the state of a model at a specific stage in the training process. In other words, checkpoints are a type of AI model. There are other types of Stable Diffusion models like LoRAs, LoCONs, LoHAs, LECOs and so on, but we will only be looking at checkpoints today.

Think of checkpoints as save points in a video game, allowing you to capture the state of your model at specific intervals during training. When you use a checkpoint, you are able to generate images using the concepts and knowledge it has learnt up to the checkpoint.
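The save point idea can be sketched with a toy training loop in Python. This is a deliberately simplified stand-in. The "model" here is just a dictionary with one number in it, and the update rule is made up, whereas a real Stable Diffusion checkpoint stores a very large set of learned parameters.

```python
import copy


def train_with_checkpoints(steps: int, save_every: int) -> dict:
    """Run a toy training loop and snapshot the model state at regular intervals."""
    state = {"step": 0, "weight": 0.0}  # stand-in for real model parameters
    checkpoints = {}
    for step in range(1, steps + 1):
        state["step"] = step
        state["weight"] += 0.1  # stand-in for a real training update
        if step % save_every == 0:
            # Deep copy so later training does not change the saved snapshot.
            checkpoints[step] = copy.deepcopy(state)
    return checkpoints


saved = train_with_checkpoints(steps=10, save_every=5)
print(sorted(saved))  # the steps at which a snapshot was taken
```

Each saved snapshot reflects only what the model had learnt up to that step, which is why two checkpoints from the same training run can behave differently.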

Types of checkpoints

If you know me well enough, you’re most likely aware of how fussy I am about organising things. I group my Stable Diffusion checkpoints based on the output they are able to produce. There are several ways to group checkpoints, including trained knowledge, image output, and realism.

I don’t group my checkpoints based on trained knowledge, but it is useful to know how they are trained to understand what they are capable of. If you are wary of using checkpoints trained on copyrighted material, knowing how they are created would be key.

You can also group checkpoints by the level of realism they can achieve. The realism here generally refers to the proportions. I find this a good starting point to identify the best checkpoint to use. Of course, some checkpoints are capable of multiple levels of realism.

However, I don’t group my checkpoints by realism. I prefer to sort my checkpoints based on their image output capability. There are checkpoints that can achieve different types of output, but I find that the better checkpoints are generally specialised for a particular type of usage instead of being able to produce different looks.

Note that other than the categorisation by type, the other groupings can be subjective and are just a general way to group the checkpoints to make it easier to organise them. You’ll find that many Stable Diffusion checkpoints fall under more than one category in actual usage.

All these different categorisations can be a little confusing. That’s why I created my Stable Diffusion checkpoint databases to help me track what the checkpoints are capable of.

Checkpoint types – trained knowledge

One way of grouping Stable Diffusion checkpoints is based on how they are trained.

Trained checkpoints

Models like the SD 1.4 or SD 1.5 models are models trained by Stability AI on a large dataset. Model creators can create similar base models by training a new model with their own dataset. These are referred to as trained checkpoints.

You can also fine-tune a model by using a base model as a starting point and training it on your dataset. This base model can be the SD 1.4 or SD 1.5 checkpoints, or another checkpoint. Fine-tuning is done to adapt an existing model for a specific task or dataset, such as a particular art style, person or character.

Both base models and fine-tuned models are referred to as trained checkpoints.

Merged checkpoints

Checkpoints can also be combined to blend the trained knowledge together, either to improve the quality or to mix different art styles together. These are called merged checkpoints, often denoted with a “Mix” in the checkpoint’s name.
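One common way to merge two checkpoints is a weighted sum of their parameters. The Python sketch below illustrates that idea with plain numbers standing in for the real tensors. The parameter names, the values and the function name are made up, and this is not a claim about how any particular merged checkpoint was produced.

```python
def merge_weighted_sum(model_a: dict, model_b: dict, alpha: float) -> dict:
    """Blend two sets of parameters: alpha=0 keeps model A, alpha=1 keeps model B."""
    if model_a.keys() != model_b.keys():
        raise ValueError("Both models must have the same parameter names")
    return {
        name: (1 - alpha) * model_a[name] + alpha * model_b[name]
        for name in model_a
    }


# Toy parameters for illustration only.
style_a = {"layer1": 1.0, "layer2": 0.0}
style_b = {"layer1": 3.0, "layer2": 2.0}
print(merge_weighted_sum(style_a, style_b, alpha=0.5))
```

Changing alpha shifts the blend towards one checkpoint or the other, which is why merges of the same two checkpoints can still look quite different.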

Checkpoint types – image output

The main way I group my Stable Diffusion checkpoints is by the type of output they are able to generate.

So, let’s look at the types of images you can generate. These are some of the broad looks people create:

Photorealistic – hyperrealistic images that resemble photographs
Digital painting – concept or fantasy art images that mimic realism with artistic expression
Render – 3D-rendered image style
Anime – anime style with exaggerated proportions
Illustration – distinct brush strokes, including line art and sketches

Photorealistic checkpoints

Photorealism is an art style that tries to mimic realism in paintings. Photorealistic checkpoints are capable of generating hyperrealistic images that look like photographs. Do not confuse the photorealistic style with the amount of realism it generates.

Get my Top 10 Most Popular Realistic Checkpoints database when you sign up for my newsletter.

Digital painting checkpoints

Digital painting checkpoints generate images with a realistic look, but the texture is less realistic than that of photorealistic checkpoints. They balance detail with artistic interpretation, allowing for greater stylistic flexibility, such as visible brush strokes or a more painterly quality, depending on the training data and model design.

The images they create are reminiscent of digital and traditional artwork. I use these checkpoints if I want a concept art or digital art look.

Render checkpoints

Render checkpoints are often trained with 3D-rendered images and mimic rendering styles, such as Disney’s Pixar style. These checkpoints produce images with render-like qualities. The images created have realistic lighting, but often with texture and details of 3D models.

A popular look is the 3D Niji style from Midjourney. You can find Stable Diffusion checkpoints trained on 3D Niji images.

Anime checkpoints

Anime checkpoints generate images with the distinctive anime style, including exaggerated proportions, expressions, and hair colours and styles. I generally group checkpoints for manga and anime fan art here, unless the lines are so loose that they fall under illustration checkpoints instead.

The use of generative AI to create anime-style images is immensely popular and a major driving force in the development of AI image generation. Thus, you’ll find many anime checkpoints covering different anime styles.

I prefer to group comic checkpoints here as well, unless they have such a high level of realism that warrants their grouping under digital painting checkpoints.

Illustration checkpoints

Illustration checkpoints produce images with distinctive brush strokes. These could range from wet to dry media, including oil painting, water colour, line art, and sketches. The checkpoints are trained to mimic the brush strokes of the particular medium.

General purpose checkpoints

Some checkpoints are trained to be able to produce different image styles. These are referred to as general purpose checkpoints. They are the Swiss Army knives of checkpoints that let you create a variety of styles without having to swap checkpoints.

Checkpoint types – realism

When I look at realism, I consider both the human proportions and how three-dimensional the images look. This is more subjective than the image output because you can often alter the level of realism through prompting.

Nevertheless, I prefer to also group the realism to help me track what the checkpoints can achieve with these categories:

Realistic – realistic proportions
Semi-realistic – 3D look with almost realistic proportions
2.8D – between 2.5D and 3D look
2.5D – non-flat shading
2D – flat-shading

Realistic checkpoints

Realistic checkpoints generate people with life-like proportions and details. These include both photorealistic and digital painting checkpoints, which aim to replicate the look of the real world or of high-fidelity art.

Semi-realistic checkpoints

Semi-realistic checkpoints create characters with a three-dimensional look but the proportions are not quite life-like. These are often render checkpoints, or checkpoints with an anime or comic style look and some level of fantastical proportions.

2.8D checkpoints

2.8D checkpoints straddle the 2.5D and 3D looks, with more realism than 2.5D but not quite a 3D level of realism. 2.8D is not an actual technical style, and I did not use this category initially. However, the number of checkpoints targeting this specific look has led to me adding it as a distinct category on its own.

These checkpoints are often anime or digital painting checkpoints with a very stylised look.

2.5D checkpoints

2.5D checkpoints have more realistic shading to give the subjects more depth and definition compared to the 2D look. Like 2.8D checkpoints, these are commonly anime or digital painting checkpoints with a stylised look.

2D checkpoints

2D checkpoints have the flat-shaded look of traditional anime. Most anime checkpoints can produce the 2D look. However, this art style extends beyond anime to include any sort of two-dimensional artistic style.

Other types of categories

I focus mainly on portraits, hence I only look at these few features in the checkpoints when grouping them. There are other checkpoints that specialise in generating environments, icons, logos, or backgrounds.

Since I rarely generate these kinds of images, I won’t talk much about them for now.

Choosing Stable Diffusion checkpoints

There are hundreds of Stable Diffusion checkpoints out there to choose from. You can find checkpoints on sites like Civitai, Hugging Face, and LibLib AI, to name a few resources.

How do you know which one is the best one? It depends on the type of images you are looking to generate and your preferred workflow.

Stay tuned for guides on choosing checkpoints and my review of my favourite checkpoints.

How to improve the performance of Stable Diffusion Web UI

Jenxi Seow — Sat, 18 Nov 2023 17:19:06 +0000

When you are generating images in large sizes and batches, knowing how to improve the performance of Stable Diffusion Web UI can mean a significant reduction in generation time.

The minimum requirement for Stable Diffusion Web UI is 2GB VRAM, but generation will be slow and you will run out of memory once you try to create images larger than 512 x 512. Fortunately, there are several ways to optimise Stable Diffusion Web UI to speed up the image generation process.

From my experience, the best setup for Stable Diffusion is a Windows machine with an Nvidia GPU that meets the recommended 6GB of VRAM.

Bear in mind that many variables affect the optimisation options, so it is best to test the different combinations to find what gives you the best performance. Test with different settings using the same checkpoint to generate 512 x 512 images with 20 steps using the Euler sampling method. Compare how long it takes Web UI to generate an image.
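One way to keep such comparisons fair is to time a few runs per setting and compare the medians. Here is a minimal sketch in Python. The `generate` callable is hypothetical: it stands for whatever triggers one generation with your fixed test settings, and the helper name is my own rather than anything from Web UI.

```python
import statistics
import time


def time_generation(generate, runs=3):
    """Return the median wall-clock seconds over `runs` calls to `generate`.

    `generate` is a hypothetical zero-argument callable that produces one
    image with fixed test settings (same checkpoint, 512 x 512, 20 steps,
    Euler sampling).
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without Web UI installed.
    median_seconds = time_generation(lambda: time.sleep(0.01))
    print(f"median: {median_seconds:.3f}s")
```

The median is used instead of the average so that one slow run, such as the first generation after loading a checkpoint, does not skew the comparison.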

Cross-attention optimisation
Token merging
- Setting token merging
Negative guidance minimum sigma
- Setting negative guidance minimum sigma
Command line arguments
- Optimisation method arguments
- Performance options arguments
See also

Cross-attention optimisation

One of the critical operations Stable Diffusion uses is the cross-attention calculation. It involves the interaction between two sets of data or vectors: the query and the key. Cross-attention can consume a significant amount of memory and time.

Imagine you have a box of building blocks, and you want to build a tall tower. Some blocks are important for making it tall and stable, while others are not so important. You have a pair of special glasses that make the important blocks glow when you look at them through the glasses.

Cross-attention is like using the special glasses to allow the model to focus on the different parts of the input data on what’s important to generate the image.
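To make the glasses analogy a little more concrete, here is a toy sketch of the calculation in plain Python. It is only an illustration under my own assumptions, not how Stable Diffusion implements it. Besides the query and the key, it uses values, a third set of vectors that standard attention also works with, and all the numbers are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention of each query over all keys and values."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        # How strongly this query matches each key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        # The weights play the role of the "glow" each block gets.
        weights = softmax(scores)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs


if __name__ == "__main__":
    queries = [[1.0, 0.0]]              # one made-up image-side vector
    keys = [[10.0, 0.0], [0.0, 10.0]]   # two made-up prompt-side vectors
    values = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_attention(queries, keys, values))
```

Every query is compared against every key, which is why the cost grows quickly as images get larger.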

Setting cross-attention optimisation

Due to the impact of the cross-attention calculation, optimising it is the key to speeding up Stable Diffusion. You can set the cross-attention optimisation method in the Stable Diffusion Web UI.

Launch Stable Diffusion Web UI.
Go to the Settings tab and select Optimization in the sidebar.
Choose your preferred cross-attention optimisation from the dropdown menu. The default is set to Automatic.
Click Apply Settings to save the settings.

Doggettx

This is a historical improvement to cross-attention operations that offers a decent performance boost, but has been surpassed by newer options. Doggettx submitted the improvements to the original implementation in Stable Diffusion.

xFormers

The Meta AI team developed xFormers, pronounced transformers. It is a transformer library that increases the attention operation’s speed while reducing memory usage through memory-efficient attention and Flash Attention techniques.

Transformers are a type of neural network architecture that uses self-attention to determine the importance of different parts of the input data. xFormers integrates with PyTorch and CUDA libraries. CUDA is limited to Nvidia hardware, and hence xFormers is only available if you are using an Nvidia GPU.

Memory-efficient attention uses an algorithm that needs fewer steps and less memory to compute the attention operation, making it more efficient for large models and inputs.

Flash Attention uses tiling to compute attention one small piece at a time, reducing memory usage and speeding up calculations.
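The tiling idea can be sketched in plain Python for a single query. This is not the xFormers or Flash Attention code, which runs as fused GPU kernels. It is a simplified illustration with the usual scaling factor left out, and the function names are my own. The tiled version looks at only a few keys at a time while keeping a running maximum and running sum, and it ends up with the same answer as the version that scores every key at once.

```python
import math


def naive_attention(q, keys, values):
    """Attention for one query, scoring every key at once."""
    scores = [sum(a * b for a, b in zip(q, k)) for k in keys]
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def tiled_attention(q, keys, values, tile_size=2):
    """Same result, but only `tile_size` keys are scored at a time."""
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim
    for start in range(0, len(keys), tile_size):
        tile_keys = keys[start:start + tile_size]
        tile_values = values[start:start + tile_size]
        scores = [sum(a * b for a, b in zip(q, k)) for k in tile_keys]
        new_max = max(running_max, max(scores))
        # Rescale what was accumulated so far to the new reference maximum.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, tile_values):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * x for a, x in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


if __name__ == "__main__":
    q = [0.5, -1.0]
    keys = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0], [-2.0, 0.5], [1.5, 1.5]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
    print(naive_attention(q, keys, values))
    print(tiled_attention(q, keys, values))
```

Because only one tile of scores exists at any moment, the full table of scores never has to sit in memory, which is where the savings come from.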

Scaled-Dot-Product (sdp) Attention

SDP attention is an alternative implementation of memory-efficient attention and Flash Attention native to PyTorch that is available in PyTorch 2 and newer. Depending on your hardware setup, you might get better performance with SDP attention than xFormers. Note that it uses more VRAM than xFormers, so your hardware might run into issues with it.

SDP attention gives non-deterministic output, meaning that the results are not exactly reproducible. This is a problem if you want to be able to reproduce the same image when you use the same parameters.

If you are using Stable Diffusion to create art or images for general use, you generally won’t need deterministic output in your workflow. It is only crucial in research.

SDP Attention without Memory-Efficient Attention (SDP-no-mem)

SDP-no-mem is an implementation of SDP attention without the memory-efficient attention technique. This makes it produce deterministic output, and hence allows you to reproduce the results with the same parameters.

The drawback of using SDP-no-mem is sacrificing the memory-efficient optimisations in exchange for deterministic output.

Sub-Quadratic (sub-quad) Attention

Sub-quad attention is another implementation of memory-efficient attention. It significantly reduces the required memory, but this comes at a cost of speed.

This is useful if you’re unable to run xFormers or SDP. Sub-quad attention allows you to generate larger image sizes if you are on macOS.

Split-Attention v1

Split-attention v1 is an older implementation of memory-efficient attention that has been surpassed by the other techniques like xFormers or SDP that use memory-efficient attention.

You should be using xFormers or SDP where possible. Split-attention v1 uses less VRAM, so it might be a useful option if your hardware has limited memory. However, it is more limiting on the maximum image size it can generate.

Invoke AI

InvokeAI is an alternative GUI. Its cross-attention optimisation is useful for macOS machines without Nvidia GPUs.

Token merging

Token merging (ToMe) is a newer technique that accelerates Stable Diffusion by reducing the number of tokens that need processing. It does this by identifying and combining redundant tokens inside the model. Merging tokens changes what the model processes, and hence changes the image output. This could be an issue if you are trying to reproduce the same image with the same parameters.

I personally find it a better habit to practise good prompt engineering and optimise your prompt length. Be mindful when creating prompts and avoid using redundant prompts.

You’ll find that many prompts out there are very badly structured. Instead of just copying prompts, take the time to remove redundancies. If you have a sample image to refer to, remove the tokens that don’t appear in the output that you want to generate.

With fewer tokens to process, the generation is naturally faster. However, it doesn’t seem to deliver that much improvement compared to cross-attention optimisations. I would avoid using this unless you are getting very long generation times with your setup.
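To illustrate the idea of combining redundant tokens described above, here is a deliberately simplified sketch in Python. It greedily averages vectors that point in nearly the same direction. This is my own stand-in for illustration, not the matching algorithm ToMe actually uses. The similarity threshold is a hypothetical parameter and is not the same thing as the ratio slider in Web UI.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two non-zero vectors, 1.0 meaning same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def merge_similar_tokens(tokens, threshold=0.95):
    """Greedily average each token into the first group it resembles."""
    groups = []  # each entry is [vector_sum, count]
    for token in tokens:
        for group in groups:
            centre = [s / group[1] for s in group[0]]
            if cosine_similarity(token, centre) >= threshold:
                group[0] = [s + t for s, t in zip(group[0], token)]
                group[1] += 1
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([list(token), 1])
    return [[s / count for s in total] for total, count in groups]


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
    print(merge_similar_tokens(tokens))
```

The first two vectors are nearly identical and collapse into one, so there is less to process. The merged vector is an average rather than either original, which is why the image output changes.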

Setting token merging

Launch Stable Diffusion Web UI.
Go to the Settings tab and select Optimization in the sidebar.
Choose your preferred token merging ratio by dragging the slider or keying in the ratio value.
Click Apply Settings to save the settings.

Negative guidance minimum sigma

Negative guidance minimum sigma is an optimisation that adjusts the sigma, a parameter that represents randomness in the generation process. By increasing the minimum sigma value, you are increasing the chances of the generation process skipping the negative prompt for some steps when the image is almost ready.

Increasing the sigma value reduces the generation time, though I find the performance boost on par with token merging. Negative guidance minimum sigma alters the image output, but to a lesser extent than token merging. If you had to choose between the two, I would suggest going with negative guidance minimum sigma.

Again, I would avoid using this unless you are getting very slow performance with your setup.
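Here is a rough sketch of why a higher minimum sigma saves time. It rests on a simplifying assumption of mine: once the noise level (sigma) of a step drops below the minimum, the negative prompt is skipped on every other step. The function name and the sigma values are made up for illustration.

```python
def steps_using_negative_prompt(sigmas, min_sigma):
    """Return the indexes of steps that still evaluate the negative prompt.

    Assumption for illustration: below `min_sigma`, the negative prompt is
    skipped on every other step. A `min_sigma` of 0 disables skipping.
    """
    kept = []
    for step, sigma in enumerate(sigmas):
        skip = sigma < min_sigma and step % 2 == 1
        if not skip:
            kept.append(step)
    return kept


if __name__ == "__main__":
    # Hypothetical noise levels, falling as the image gets closer to ready.
    sigmas = [14.6, 7.0, 3.0, 1.5, 0.8, 0.3]
    print(steps_using_negative_prompt(sigmas, min_sigma=0.0))
    print(steps_using_negative_prompt(sigmas, min_sigma=2.0))
```

Each skipped step avoids one pass for the negative prompt, so raising the value trades a little of the negative prompt's influence for speed.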

Setting negative guidance minimum sigma

Launch Stable Diffusion Web UI.
Go to the Settings tab and select Optimization in the sidebar.
Choose your preferred negative guidance minimum sigma by dragging the slider or keying in the sigma value.
Click Apply Settings to save the settings.

Command line arguments

Since Stable Diffusion Web UI is a command-line application, you can provide command-line arguments to configure it when launching Web UI. Some of these arguments can be used in combination to improve the performance of Stable Diffusion Web UI.

If you launch Web UI from the terminal, you can add the arguments to the command. If you launch Web UI by double-clicking the webui-user.bat or run.bat files, you can edit webui-user.bat (Windows) or webui-user.sh (Mac or Linux) in a text editor and add the arguments there.

In webui-user.bat, add the arguments to the line set COMMANDLINE_ARGS=.
In webui-user.sh, add the arguments to the line export COMMANDLINE_ARGS=.

For example, set COMMANDLINE_ARGS=--skip-torch-cuda-test --no-half-vae --api --opt-sdp-attention

There is a full list of command line arguments you can use with Stable Diffusion Web UI on GitHub.
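If you switch between setups often, a small helper can build the line for you. This is just a sketch: the function name is my own, and the check for two leading hyphens is there because pasted text sometimes turns the two hyphens into a single long dash that Web UI won't recognise.

```python
def commandline_args_line(args, platform="windows"):
    """Build the COMMANDLINE_ARGS line for webui-user.bat or webui-user.sh.

    A flag that takes a value can be passed together with its value
    as one string.
    """
    for arg in args:
        if not arg.startswith("--"):
            raise ValueError(f"argument should start with two hyphens: {arg!r}")
    joined = " ".join(args)
    if platform == "windows":
        return f"set COMMANDLINE_ARGS={joined}"
    # In a shell script the value is quoted so the spaces stay in one variable.
    return f'export COMMANDLINE_ARGS="{joined}"'


if __name__ == "__main__":
    flags = ["--no-half-vae", "--api", "--opt-sdp-attention"]
    print(commandline_args_line(flags))
    print(commandline_args_line(flags, platform="unix"))
```

Copy the printed line into webui-user.bat or webui-user.sh, replacing the existing COMMANDLINE_ARGS line.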

Optimisation method arguments

These are the arguments that enable the optimisations mentioned in this article:

--opt-sdp-attention – Enables SDP attention optimisation
--opt-sdp-no-mem-attention – Enables SDP-no-mem
--xformers – Enables xFormers
--force-enable-xformers – Enables xFormers regardless of whether the program thinks you can run it
--opt-split-attention – Enables cross-attention layer optimisation; enabled by default for torch.cuda for both Nvidia and AMD cards
--disable-opt-split-attention – Disables the cross-attention optimisation
--opt-sub-quad-attention – Enables sub-quad attention optimisation
--opt-split-attention-v1 – Enables split attention v1

Performance options arguments

You can also add other arguments to improve the performance of Stable Diffusion Web UI:

--medvram – Splits the Stable Diffusion model into three parts and keeps only one in VRAM at any time, keeping the others in CPU RAM. It slows down generation but lets you generate images with less VRAM.
--medvram-sdxl – Enables --medvram only for SDXL models
--lowvram – An even more thorough optimisation that splits the third part, the unet, into many modules and keeps only one module in VRAM. Very, very slow generation.
--lowram – Load Stable Diffusion checkpoint weights to VRAM instead of RAM for machines that have limited RAM
--upcast-sampling – Improves generation speed for machines that need to run with --no-half. Better performance and VRAM usage than --no-half.

How to install Stable Diffusion Web UI

Jenxi Seow — Sat, 04 Nov 2023 04:26:40 +0000

Some of you mentioned that the official guide is a bit too technical, so I have written a simplified guide on how to install Stable Diffusion Web UI.

I will try to keep my installation guide updated, but things move fast in the Stable Diffusion world, so don’t be surprised if things don’t work.

Check the official GitHub installation guide for the latest information. The official guide is a bit more technical, so my guide makes it a bit easier for beginners to understand.

The process is a bit different depending on the device you’re installing Stable Diffusion on. The official guide covers installing on Nvidia GPUs, AMD GPUs, Apple Silicon, and Intel Silicon.

I’ve only installed Stable Diffusion Web UI on Windows 10 with an Nvidia GPU and on macOS with an M1 Max, so I’ll only be sharing the guides for these two setups.

Downloading DreamShaper
Installing on Windows with Nvidia GPU
- Using the Web UI Windows installer
- Manual Web UI Windows installation
Installing on macOS with Apple Silicon
Updating Stable Diffusion Web UI
- Updating on Windows with Nvidia GPU
- Updating on macOS with Apple Silicon
See also

Downloading DreamShaper

Before we start, I recommend that you download your first Stable Diffusion checkpoint. You can learn more about what Stable Diffusion checkpoints are and the top checkpoints available out there in my other articles.

Download DreamShaper v8 from Civitai or Hugging Face. You don’t need to do anything with it now.

Why download DreamShaper? It is a good starting checkpoint for beginners compared to the default Stable Diffusion 1.5 model. Downloading the model now also makes it easier for you to track the download.

If the installer doesn’t detect an existing model, it will automatically download the large 4 GB file. If you are on a slow connection, you might think that the process is stuck.

Installing on Windows with Nvidia GPU

There are two ways to install Stable Diffusion Web UI on Windows. The easy way is to use the installer package. The second method pulls the source from GitHub and requires you to have some technical knowledge to operate git.

The difference between the methods is that it’s easier to get started with the installer, whereas using git allows you to switch between different commits. This is useful when you encounter bugs with a certain version or an extension.

Using the Web UI Windows installer

Download sd.webui.zip from this GitHub release page.
Extract the zip file to where you want to install Web UI. Bear in mind that the models and extensions you install will take up space, so I would recommend choosing the drive that has the most space available.
Double click to run update.bat. This will update Web UI to the latest version. Wait until the update completes, then close the window.
If you have downloaded DreamShaper, move it into the sd.webui\webui\models\Stable-diffusion\ folder.
Double click run.bat to launch Web UI. It will download all the required files during first launch. There are many files so it might take a while depending on your Internet connection speed.
When everything has been downloaded and installed successfully, you will see the message “Running on local URL: http://127.0.0.1:7860”.
Copy and paste the URL http://127.0.0.1:7860 in your preferred browser, or click on this link to go to the Web UI.

Manual Web UI Windows installation

Download and install Python 3.10.6. Select the “Add to PATH” option when installing. Skip this step if you already have Python 3.10 installed.
Download and install git. Skip this step if you already have git installed.
Launch Command Prompt. Navigate to the folder or drive you want to install the Web UI, and then run git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.
If you have downloaded DreamShaper, move it into the stable-diffusion-webui\models\Stable-diffusion\ folder.
Double click webui-user.bat to launch Web UI. It will download a large amount of the dependencies during first launch, so it might take a while depending on your Internet connection speed.
When everything has been downloaded and installed successfully, you will see the message “Running on local URL: http://127.0.0.1:7860”.
Copy and paste the URL http://127.0.0.1:7860 in your preferred browser, or click on this link to go to the Web UI.

Installing on macOS with Apple Silicon

The installation process for macOS is similar to the manual installation for Windows. The only difference is that you need to install Homebrew, if you’ve never installed it before.

If you haven’t installed Homebrew, follow the installation instructions at https://brew.sh. You can either install Homebrew using the script or use the .pkg installer. Keep the terminal window open after Homebrew finishes installing.
Follow the instructions under “Next steps” to add Homebrew to your PATH.
Open a new terminal window and run brew install cmake protobuf rust python@3.10 git wget. This will install the main dependencies and might take some time depending on your internet connection speed.
Clone the Web UI’s GitHub repository by running git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
If you have downloaded DreamShaper, move it into the stable-diffusion-webui/models/Stable-diffusion/ folder.
Run cd stable-diffusion-webui and then ./webui.sh to launch the Web UI. A Python virtual environment will be created and activated using venv. It will automatically download and install any missing dependencies. Once again, it might take a while depending on your internet connection speed.
When everything has been downloaded and installed successfully, you will see the message “Running on local URL: http://127.0.0.1:7860”.
Copy and paste the URL http://127.0.0.1:7860 in your preferred browser, or click on this link to go to the Web UI.
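If you prefer to script the step of moving the checkpoint, here is a minimal Python sketch. It assumes the git clone layout used in the steps above, where models/Stable-diffusion sits directly inside the stable-diffusion-webui folder. The function name and the checkpoint file name in the usage comment are hypothetical, so substitute your own.

```python
import shutil
from pathlib import Path


def install_checkpoint(checkpoint_file, webui_root):
    """Move a downloaded checkpoint into <webui_root>/models/Stable-diffusion."""
    target_dir = Path(webui_root) / "models" / "Stable-diffusion"
    # Create the folder if it is not there yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(checkpoint_file).name
    shutil.move(str(checkpoint_file), str(destination))
    return destination


# Example call with hypothetical paths:
# install_checkpoint(Path.home() / "Downloads" / "dreamshaper_8.safetensors",
#                    Path.home() / "stable-diffusion-webui")
```

Restart Web UI or refresh the checkpoint dropdown afterwards so the new file shows up.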

Updating Stable Diffusion Web UI

As with all things in the world of Stable Diffusion, things move fast and break fast with the Stable Diffusion Web UI. While I generally recommend keeping your software up to date, I don’t suggest rushing to do so with the Web UI.

Updating the Web UI might break compatibility with a certain extension that is critical to your image generation workflow. If things are working well enough, don’t update immediately. Give it some time for bugs and issues to be reported.

I’ve learnt the hard way, but I also install via git so it is easy for me to switch back to an earlier commit that works.

Updating on Windows with Nvidia GPU

If you installed using the Web UI Windows installer

Double click update.bat to update Web UI and wait for the process to finish before closing the window.
Once this is done, double click run.bat to launch Web UI.

If you installed manually

Launch Command Prompt and navigate to the stable-diffusion-webui folder.
Run git pull and wait for it to pull the latest files from GitHub.
Once the process is complete, double click webui-user.bat to launch Web UI or run .\webui-user.bat in the command prompt.

Updating on macOS with Apple Silicon

Open Terminal, run cd stable-diffusion-webui and then git pull and wait for it to update to the latest files.
When the process is done, run ./webui.sh to launch Web UI.

Sakura Kasugano AI art

Jenxi Seow — Tue, 10 Oct 2023 07:09:26 +0000

I started working on Sakura Kasugano AI art as a tribute to my favourite Street Fighter character. When Street Fighter II was released and became the rage back in the 90s, I was partial to Ryu and Ken but I also enjoyed using other characters because their unique moves made them interesting to play.

Then, Sakura burst onto the scene when Street Fighter Alpha 2 was released, and boy was I smitten. She’s been stealing hearts ever since and is one of the undisputed fan favourites alongside Cammy and Chun-Li. In fact, she was voted the third most popular character in Capcom’s 2002 15th Street Fighter anniversary poll, after Chun-Li and Cammy.

I had just gotten into my serafuku phase, so the outfit was on point. Her character backstory also appealed to Ryu fans. She is a fan of Ryu and travelled to find her idol so that she could ask him to train her as a fighter. That iconic white headband was given to her by Ryu, and she later switched to a red headband after he started wearing the red one Ken gave to him.

Now, what sets Sakura apart from the other characters is that she’s entirely self-taught. She picked up moves from watching Ryu in action and even developed her own unique style. The character story resonates with Ryu fans. Her moves are also similar to Ryu and Ken’s so that adds to her appeal.

Background
Creation
Tools used
Downloads

Background

Due to their immense popularity, Marvel and DC character art was flooding the AI art scene. I didn’t want to work on yet another Supergirl or Wonder Woman. Being an otaku, the natural source for me to find inspiration was anime and video games. I had already created commissioned art of Lucy Kushinada from Cyberpunk: Edgerunners, so I skipped this obvious choice.

As much as I love the Gundam series, the characters are relatively more niche. I mean, how can any of the characters top Rei Ayanami or Asuka Langley Soryu? But I believe they also lose out to Motoko Kusanagi in global appeal. Despite all the controversies, people outside the otaku scene would still remember Scarlett Johansson’s portrayal of Motoko in the live action movie, or perhaps even because of the backlash.

I love the Final Fantasy franchise as much as Gundam. But everyone was doing Tifa Lockhart and Aerith Gainsborough, so I avoided them for a while. If you follow my Instagram, or other platforms where I post my art, you’d have seen that I’ve succumbed and worked on Tifa and Aerith, and the aforementioned Motoko.

Studio Ghibli characters are popular worldwide. I jumped on my favourite Miyazaki character, Princess Mononoke, or, as I prefer to call her, Mononoke-hime.

Then, I looked at fighting games, and Street Fighter and The King of Fighters were at the top of my list. I wrote previously about my creation process for Mai Shiranui from The King of Fighters.

Creation

The challenging part of creating Sakura art is the gloves. Hands are notoriously difficult to produce accurately with AI. Hands wearing gloves? The AI starts getting extremely creative in how it generates the hands.

That said, these were done when I had just started learning Stable Diffusion. I’ve since picked up new techniques and tricks to optimise my workflow, allowing me to create better hands with less effort. Though, fingernails still remain hit-or-miss. Luckily, we just need fists with Sakura, right?

Tools used

Stable Diffusion Web UI
Adobe Photoshop (Beta)
Huion Kamvas Pro 16

Downloads

The HD versions of these images are available for download on Afdian. See the links below each image.

The 4K versions will be available to members on Patreon and Pixiv. Subscribe to my newsletter to be notified when these and new content become available.

Sakura Kasugano sakura season

Sakura Kasugano walking along a street filled with sakura trees.

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt and Pixiv

Sakura Kasugano twist

Sakura Kasugano turning to Shoryuken… or would it be a Tatsumaki?

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt and Pixiv

Sakura Kasugano headband

Sakura Kasugano getting ready to kick your butt.

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt and Pixiv

Sakura Kasugano stance

Sakura Kasugano dropping into fighting stance.

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt and Pixiv

Sakura Kasugano park

Sakura Kasugano training in the park.

Get it on Ko-fi
Get it on Patreon
Get it using Alipay or WeChat Pay on Afdian 爱发电
Collect it on DeviantArt and Pixiv

What is Stable Diffusion

Jenxi Seow — Mon, 25 Sep 2023 06:20:45 +0000

When I first started posting my AI art, many people reached out to me asking how I made the art pieces because they were different from those circulating on the internet. After I posted about my AI art journey, the questions became “What is Stable Diffusion?” and, for those who have tried Stable Diffusion, “How do you get such results?”

Compared to images you see on social media and on the web, the key difference is due to my insistence on generating realistic images with a lot of details that are not easy to achieve with just prompting alone.

Cyber wuxia Yan by Jenxi

This is partly due to my personal goal of trying to see how close generative AI can get to realism, and also because I wanted to see how AI would impact my business since visual content generation is a big part of it.

I’m aware of this obsession with realism, and the need to drop this fixation to spend more time on improving my composition, but I digress.

Keep your feedback coming as your voice is invaluable in shaping the content I put out. My goal is to share what I’ve learnt so you can skip past the hundreds of hours I poured into research and trial-and-error. If you find this content helpful in any way, consider buying me a coffee on Ko-fi or join my Patreon to get my art in glorious high resolution.

Unraveling the Mystery: What is Stable Diffusion?
The Mechanics of Stable Diffusion
The Potential of Stable Diffusion
Using Stable Diffusion: Tools and Resources
Start your Stable Diffusion journey

Let’s dive into the captivating world of Stable Diffusion, where imagination and algorithms join forces to create art like never before.

Unraveling the Mystery: What is Stable Diffusion?

Stable Diffusion, in simple terms, is a remarkable technique that uses the power of artificial intelligence to create stunning and mesmerizing images. It simulates the process of colors or patterns spreading and blending harmoniously, resulting in visually captivating transformations.

What’s in a name? Well, it can aptly describe what something is but confuse beginners. We learnt about diffusion in chemistry class, so how is it related to artificial intelligence and machine learning? And why is it stable? What happens if it’s unstable? (You don’t want to know what Unstable Diffusion is. Yes, it’s a thing and it’s NSFW.)

The term “Stable” in the name comes from Stability AI, the startup behind the Stable Diffusion model. “Diffusion” refers to it being a latent diffusion model.

Before I start, I’d like to make it clear that I’m not a machine learning expert. What I share here is my understanding of Stable Diffusion. While I do my best to provide accurate information and explanations, I’m also well aware that I might be completely wrong. Please feel free to correct me where I’m wrong. This is what learning in public is all about.

Water mage by Jenxi

The Mechanics of Stable Diffusion

So, let’s get down to demystifying Stable Diffusion. You probably remember diffusion from chemistry class. If you don’t or aren’t familiar with it, I’ll try to explain it with an analogy.

Core Concept: Diffusion

Imagine you’re in a room filled with coloured smoke. At first, the colours might be clustered, forming pockets of intensity. Diffusion occurs when these clusters gradually spread and blend until the entire room is a harmonious mix of hues.

Stable Diffusion operates on a similar principle, but with data and features instead of colours. It’s like a digital artist’s brushstroke, smoothly transitioning and merging details to create a seamless and realistic image. This process ensures that every element in the artwork harmonises, resulting in a refined and captivating visual.

In machine learning, diffusion models learn the latent structure of a dataset by modelling how data points diffuse through the latent space, where items that resemble each other are positioned close to each other.

In computer vision, the neural network is trained to denoise images that have been corrupted with Gaussian noise by learning to reverse the diffusion process.

Stable Diffusion denoising process. Source: Wikipedia
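To give a feel for the noising half of that process, here is a small Python sketch. It uses the standard blending formula from the diffusion literature on a handful of made-up pixel values. It is not Stable Diffusion's actual code, which works on latents with its own noise schedule, and the function and parameter names are my own.

```python
import math
import random


def add_noise(pixels, signal_fraction, rng):
    """Blend clean values with Gaussian noise.

    `signal_fraction` is how much of the original signal survives:
    1.0 leaves the values untouched and 0.0 gives pure noise.
    """
    keep = math.sqrt(signal_fraction)
    noise_scale = math.sqrt(1.0 - signal_fraction)
    return [keep * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


if __name__ == "__main__":
    rng = random.Random(0)
    pixels = [0.2, 0.8, -0.5, 0.1]  # made-up pixel values
    for fraction in (1.0, 0.5, 0.0):
        print(fraction, add_noise(pixels, fraction, rng))
```

Training shows the network many such noisy versions and asks it to predict the noise that was added, which is what lets it run the process in reverse when generating an image.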

The Foundation: Generative Adversarial Networks (GANs)

To understand where Stable Diffusion sits in AI image generation, it helps to get acquainted with Generative Adversarial Networks, or GANs for short, an earlier family of image generators.

Imagine there are two forces in the neural network – a generator and a discriminator. The generator’s job is to conjure up images with the aim of creating something that resembles reality. The discriminator then decides if the generated image is real or a computer-generated imitation.

This is where things get interesting. The generator learns from the discriminator’s feedback and is constantly improving its ability to create increasingly convincing images. Likewise, the discriminator is also getting better at determining whether the images are real. It is a never-ending dance to keep pushing towards increasingly higher performance to produce images that are very close to reality.

This is what makes GANs so powerful in AI image generation.

The Algorithmic Dance: GANs in Stable Diffusion

How does all this tie into Stable Diffusion? Stable Diffusion pursues the same goal as a GAN, images that are hard to tell apart from real ones, but it is a latent diffusion model rather than a GAN, so there is no generator and discriminator competing inside it. It operates on the principle of probability by leveraging the inherent uncertainty and randomness in the generation process: it starts from random noise and removes that noise step by step until a highly detailed and realistic image emerges.

Through meticulous adjustments to the training process, we are able to train the algorithm to generate results with increasingly higher quality and more refined outputs.

The Potential of Stable Diffusion

Stable Diffusion is a transformative technique that represents a significant leap forward in generative art that has taken the world by storm.

Elevating Artistic Expression with AI

This technique opens up a world of possibilities for artists, photographers, and creatives. Stable Diffusion acts as a catalyst, enhancing the artist’s ability to express themselves through the synergy of human creativity and artificial intelligence.

It’s a tool that empowers us to explore uncharted territories, unlocking styles and concepts that were once beyond imagination. With Stable Diffusion, artists transcend the limitations of conventional art creation, venturing into a realm where imagination knows no bounds.

Beyond Basics: Exploring Concepts

What does that mean? You’ve probably seen some AI-generated images and know that AI can generate images of a person or an object. It is able to imitate an artist’s style, painting medium, or painting style. It can also reproduce photographs based on the focal length, lighting, or a specific type of look or a photographer’s style. It can even understand composition and placement of subjects. But there’s more to generative AI.

We can go beyond just a look or style. You can train concepts such as a pose or pattern. It is able to learn what different clothing looks like. You can even train textures or materials. A stormtrooper from Star Wars wearing armour made of blue and white porcelain. Or a mage conjuring water or magic.

Porcelain stormtrooper by Jenxi blends blue and white designs on the ceramic armour

You can even create what we call world morphs. These are concepts that influence and transform everything that appears in the world, hence the name. From cyberpunk to steampunk that we are familiar with, all the way to fantasy worlds where candy, bones or mushrooms dominate the world.

Whatever concept you can come up with, you can attempt to train with Stable Diffusion. And this is just scratching the surface. You can merge concepts to create images generated based on concepts that are out of this world. The limit is your imagination.

Stable Diffusion is an open invitation to artists to break free from the confines of established styles and concepts. It’s a gateway to uncharted territories, where the exploration of diverse artistic expressions becomes not just possible, but exhilarating. Whether it’s blending genres, experimenting with new techniques, or venturing into unexplored thematic realms, Stable Diffusion empowers artists to bring their imaginations to life.

Bridging the Gap: The Human-AI Collaboration

If you’re an artist who is still adamant about boycotting generative AI for ethical reasons, I urge you to give it a try. You can maintain your stance while experimenting in private for research purposes. Stable Diffusion heralds a new era in artistic methodologies and I believe every artist should try to understand what a powerful tool it can be.

It redefines how artists approach their craft, introducing innovative techniques that fuse human intuition with the capabilities of generative AI. The result is a dynamic interplay that pushes the boundaries of what’s achievable, paving the way for novel creative processes and groundbreaking artistic endeavours, not dissimilar to how Photoshop, digital painting, and digital photography have transformed the art world.

AI self portrait of Jenxi

Stable Diffusion is not just a tool. It’s a creative partner that harmonises the human touch with the precision of AI. The results are collaborative works that transcend individual capabilities. The artist and the algorithm engage in a symbiotic dance, each contributing their strengths to craft art that is a testament to the potential of human-AI collaboration. Together, they bridge the gap between traditional artistic methods and the cutting-edge world of AI-generated art.

Tradition meets innovation as Stable Diffusion blurs the lines between conventional and digital art forms. It challenges preconceived notions about the boundaries of artistic expression, proving that the digital realm is a canvas as versatile and expressive as any traditional medium. This paradigm shift invites artists and audiences alike to embrace the limitless possibilities offered by technology in the pursuit of creative excellence.

It is at this intersection that innovation flourishes, birthing AI-generated art, a new breed of art that resonates with the digital age.

Comparing Stable Diffusion with Midjourney and DALL-E

Stable Diffusion, Midjourney, and DALL-E represent the vanguard of AI art generation, each with its distinctive approach and its own set of strengths and limitations. I’ve listed the key pros and cons that a beginner should consider.

DALL-E

Pros:

Easy to use. DALL-E has a simple interface with minimal learning curve, making it easy for beginners to learn.
Hosted resources. It’s an online service, so you leverage OpenAI’s servers to do the processing.

Cons:

Limited creative options. DALL-E’s simplicity means you rely on text prompting to generate images.
Pay to use. You need to buy credits to generate images. Each prompt generates four images and uses one credit. It costs USD 15 for 115 credits, which works out to around USD 0.13 per prompt, or about USD 0.033 per image. It used to give out free monthly credits, but that option is gone.
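
The per-prompt and per-image costs follow from simple division. Here is a minimal Python sketch using only the figures quoted above (USD 15, 115 credits, four images per prompt). The function names are my own, and the prices may have changed since this was written. Note that dividing the unrounded per-prompt cost gives a per-image figure marginally different from dividing the rounded USD 0.13.

```python
# Figures quoted in the article; treat them as a snapshot, not current pricing.
PACK_PRICE_USD = 15.00
CREDITS_PER_PACK = 115
IMAGES_PER_PROMPT = 4  # one credit pays for one prompt, which yields four images


def cost_per_prompt(pack_price: float, credits: int) -> float:
    """Cost of a single prompt (one credit) in USD."""
    return pack_price / credits


def cost_per_image(pack_price: float, credits: int, images_per_prompt: int) -> float:
    """Cost of a single generated image in USD."""
    return cost_per_prompt(pack_price, credits) / images_per_prompt


if __name__ == "__main__":
    per_prompt = cost_per_prompt(PACK_PRICE_USD, CREDITS_PER_PACK)
    per_image = cost_per_image(PACK_PRICE_USD, CREDITS_PER_PACK, IMAGES_PER_PROMPT)
    print(f"Per prompt: USD {per_prompt:.4f}")
    print(f"Per image:  USD {per_image:.4f}")
```

You can plug in the numbers from any other credit pack to compare plans the same way.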

Midjourney

Pros:

Amazing output. Midjourney is probably the most well-known among the three, or even all computer vision tools, for the highly artistic images it can generate with simple prompts.

Many resources available. There are many Midjourney prompts out there for you to refer to and use to generate images.

Cons:

Pay to use. While DALL-E uses a credit system, Midjourney charges a monthly subscription that limits the number of generations or the amount of generation time. For example, the basic plan is limited to 200 generations per month, while the standard and pro plans get unlimited relaxed generations and 15 hours and 30 hours of fast generations respectively.

Account required. You need to register for a Discord account to join Midjourney’s server to use it. Not a hassle if you already have an account. There are many Discord servers for AI communities, so it’s a good idea to get an account to access them.

Not as easy as DALL-E. There is a slight learning curve in picking up the Discord bot commands used to generate images.

Stable Diffusion

Pros:

Most powerful out of the three. With Stable Diffusion, you get better control over the output and there are tools to train custom models. Midjourney generates output that is arguably better, depending on your tastes, but this point becomes moot once you discover that you can train Stable Diffusion models to imitate the Midjourney style.

You can run it locally. This lets you customise how you run it, and if you have a decent GPU, you can generate outputs faster and at higher resolutions compared to the other two.

Free! It’s free to use and open source.

Large amount of resources. There are a large number of custom models trained by others, ready for you to use. I currently have over 2,000 models.

Cons:

Most difficult out of the three. Stable Diffusion has a steeper learning curve since there are more options and tools available.

Work needed to get it running. The installation process might be a hurdle, especially if you’re not used to working with Python. There are one-click installers, but things move fast in the world of Stable Diffusion, so expect breakage and problems.

Hardware requirements. You need a decent setup, including at least 6 to 8 GB of VRAM and enough storage space. How much space do you need? Models range from several MB to 10 GB each. My 2,000-plus models take up almost 2 TB of space.
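
If you want to see how much space your own collection takes up, a short script can add it up. This is a minimal Python sketch, not part of any Stable Diffusion tool. The folder path and the list of file extensions are assumptions for illustration, so adjust them to match your setup.

```python
from pathlib import Path

# Assumption: model files use these common extensions.
MODEL_EXTENSIONS = {".safetensors", ".ckpt", ".pt"}


def total_model_size_bytes(folder: Path) -> int:
    """Sum the sizes of all model files under `folder`, including subfolders."""
    return sum(
        path.stat().st_size
        for path in folder.rglob("*")
        if path.is_file() and path.suffix.lower() in MODEL_EXTENSIONS
    )


def format_size(num_bytes: int) -> str:
    """Convert a byte count to a human-readable string (1 KB = 1024 B)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


if __name__ == "__main__":
    # Hypothetical location; point this at wherever you keep your models.
    models_folder = Path("models/Stable-diffusion")
    if models_folder.is_dir():
        print(format_size(total_model_size_bytes(models_folder)))
    else:
        print(f"Folder not found: {models_folder}")
```

Run it occasionally as your collection grows so you know when it is time to prune or buy another drive.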

Using Stable Diffusion: Tools and Resources

There are several ways to use Stable Diffusion. You can use image generation sites that run on Stable Diffusion, run Stable Diffusion on a cloud service, or install it locally.

Image generation sites

Use an online service if you prefer not to go through the hassle of setting up a local installation of Stable Diffusion, or if your machine is unable to run Stable Diffusion. The more popular ones are RunDiffusion, Mage Space and PixAI. I started out using NightCafe. And there’s also DreamStudio from Stability AI themselves.

Like Midjourney and DALL-E, these sites have to pay for the site development, hosting, and maintenance costs on top of the GPU processing cost to generate images for you. So they all require you to either pay a subscription or buy credits.

These sites ensure that the models and different extensions work well so you can focus on your AI art generation without worrying about the technicalities.

Run an online instance

If you don’t want or are unable to run Stable Diffusion locally, there’s another option available for you. You can run Stable Diffusion online by using Google Colaboratory, or Colab for short. Google Colab allows you to run Python code on Google’s server using your Google Drive to store the models and images generated.

People were using Google Colab to run Stable Diffusion for free but Google has since changed their policies to require a Colab Pro subscription of USD 9.99 per month to run Stable Diffusion on Colab.

You can easily get started using the Fast Stable Diffusion Colab notebook shared by TheLastBen. The instructions are in the notebook and you can get your Stable Diffusion up and running pretty quickly.

Alternatives include Hugging Face spaces and Runpod.

When you use an online instance, you pay based on your GPU usage. The advantage of this over image generation sites is greater control over the Stable Diffusion instance. You can run Stable Diffusion wherever you are since it is a cloud instance.

Local installation

You can install Stable Diffusion locally on your computer. You need a GPU with at least 6 GB of VRAM to run Stable Diffusion 1.5 and 2.1, and at least 8 GB of VRAM to run Stable Diffusion XL.
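
To make the rule of thumb concrete, here is a minimal Python sketch that maps a VRAM figure to the versions it should handle. It only encodes the thresholds stated above. The names are my own, and real-world requirements vary with image size, settings, and optimisations.

```python
# Minimum VRAM in GB per version, taken from the guideline in the text.
MIN_VRAM_GB = {
    "Stable Diffusion 1.5": 6,
    "Stable Diffusion 2.1": 6,
    "Stable Diffusion XL": 8,
}


def runnable_versions(vram_gb: float) -> list[str]:
    """Return the versions whose minimum VRAM guideline is met."""
    return [name for name, needed in MIN_VRAM_GB.items() if vram_gb >= needed]


if __name__ == "__main__":
    for vram in (4, 6, 8):
        versions = runnable_versions(vram)
        print(f"{vram} GB VRAM: {', '.join(versions) if versions else 'below the guideline'}")
```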

Stable Diffusion is available on GitHub. However, you will need to run it using a graphical user interface if you don’t want to operate it via a command prompt.

The most popular GUI is Stable Diffusion Web UI by AUTOMATIC1111. It is often referred to as the Automatic1111 Web UI or A1111. This is what I use for my AI image generation. It is very well-supported and I’ve witnessed it growing by leaps and bounds from version 1.4 to the current 1.6.

Stable Diffusion Web UI

There is a popular fork by vladmandic called SD.Next that started out adding improvements to the Automatic1111 WebUI but has since diverged so much that it is considered a standalone GUI for Stable Diffusion. It is sometimes referred to as Vlad’s Automatic.

Some, especially power users, swear by ComfyUI. It is another GUI that takes a modular approach to operating Stable Diffusion, allowing you to create advanced pipelines for your workflows. The complex nature of ComfyUI means that it has a steeper learning curve compared to A1111, but once you get the hang of it, it becomes a very powerful tool.

ComfyUI

Training tool

You can train models with the Automatic1111 Web UI. For better control over the training setup and parameters, there are Kohya’s Stable Diffusion training scripts. As with Stable Diffusion itself, a GUI makes them easier to operate. The GUI for Kohya’s scripts is Kohya’s GUI by bmaltais, more commonly referred to as kohya_ss.

Resources

The two largest repositories for Stable Diffusion models are Civitai and Hugging Face.

Besides hosting models, Civitai also showcases user generated output from the models and has a discussion and review system to help the community gauge the quality of the models.

Hugging Face is the GitHub of machine learning. You’ll find more than just Stable Diffusion models on the site. There are also resources for computer vision, natural language processing, audio, and more.

Civitai is dedicated to Stable Diffusion and has better features for the community. Its popularity drove such rapid growth that the site was very unstable for a while, though it has since improved vastly. As a result, many creators also upload their files to Hugging Face as a backup.

Start your Stable Diffusion journey

Stable Diffusion is a powerful tool. It is the most powerful out of the options available, as I mentioned above. Of course, there are prolific AI artists who combine Midjourney and Stable Diffusion to produce amazing artwork. However, I would suggest focusing on mastering one tool first if you’re just getting started in AI art.

Mononoke Hime by Jenxi

I hope this overview gives a good introduction to Stable Diffusion and demystifies AI image generation. Share this article the next time someone asks, “What is Stable Diffusion?”

I’ll be sharing guides on how I use Stable Diffusion to generate art and answering some of the common questions I get, to help you get started in computer vision. Patreon members get a peek at the behind-the-scenes of how certain pieces are made.

Ready to stay updated with the latest developments in Stable Diffusion, AI image generation, and explore exciting computer vision techniques? Subscribe to my newsletter where I share my journey and learnings.

Remember, with Stable Diffusion, the canvas becomes your playground, and the possibilities are limitless!

How I got started in generative AI art

Jenxi Seow — Fri, 15 Sep 2023 09:39:32 +0000

If you follow my Instagram and Facebook accounts, you’d have seen my generative AI art pieces.

Many people have been asking how I create my AI art, so I decided to share my journey in AI art and what I learnt along the way.

How it all started, and stopped
ChatGPT, Midjourney, Stable Diffusion
AI Controversies
- Abuse and misuse
- Plagiarism
Model training
- Ethically-trained models
- Train your own style
My generative AI art

How it all started, and stopped

My dad is an oil painter, and I grew up exposed to art at a young age. I started drawing at two and I’ve always had an interest in creating art. I generated my first computational graphics artwork using Apophysis and Ultra Fractal back in 2004, before the current crop of generative AI tools came into existence.

I first dabbled in generative AI art in June 2022 when I tried out NovelAI. As someone who dabbles in creative writing and text role playing, I was intrigued by a text-generation platform that allowed AI-assisted storytelling. However, I thought it was just a novelty and was turned off by the subscription cost and lost interest in it.

Despite that, NovelAI was still on my radar and when it launched the text-to-image generation feature in October 2022, I gave it a try. To have a better idea of the development of AI image generation, I also experimented with OpenAI’s DALL-E 2 that had just gotten rid of its waitlist, the Midjourney beta that had just launched, and NightCafe that ran Stability AI’s Stable Diffusion.

Back then, I had no idea what prompt engineering was and the images I generated were so horrible that I didn’t save a copy of them. I wish I did so I could show them here. If you saw those outputs, you would understand why I concluded that AI image generation was not ready for the mainstream.

ChatGPT, Midjourney, Stable Diffusion

Then, OpenAI launched ChatGPT at the end of November 2022, and it blew up in December and took the world by storm in January.

ChatGPT

Being a tech geek working on content creation, it was inevitable that I jumped on the ChatGPT bandwagon early on. I won’t go into much detail on ChatGPT since it’s a separate topic. In short, besides using ChatGPT to help generate content, I was using it to brainstorm ideas, structure strategies and plans, and even write a couple of WordPress plugins.

All of this was done by giving the right instructions through prompt engineering, the art of structuring instructions to get the generative AI model to perform tasks as intended. It was frustrating initially, having to fight ChatGPT to get the desired outcome, but very rewarding once I got the hang of it.

It was like having an AI assistant you could rely on, when it didn’t hallucinate.

Midjourney

While ChatGPT was the most talked-about thing in December 2022, on its way to becoming the fastest-growing consumer software in history at the time by reaching an estimated 100 million users within about two months of launch, another piece of software was also taking the creative industry by storm: Midjourney.

Visual artists and content creators were creating artwork with Midjourney. Images flooded all my social media feeds. People were gushing over what Midjourney was able to generate. On the other end of the spectrum, people were protesting just as loudly about the ethical issues, which I’ll briefly touch on in a bit.

I gave Midjourney another go, tapping into my prompt engineering abilities. This gave results that were a lot better than what I had generated half a year earlier.

Stable Diffusion

This rekindled my interest in generative AI art. I went around trying the different cloud platforms before I decided to give running a Stable Diffusion instance a go. First, I tried the Stable Diffusion macOS apps Draw Things and DiffusionBee, but I found them lacking in a lot of ways, especially after I studied what was possible with Stable Diffusion.

I managed to install Stable Diffusion Web UI on my M1 Max MacBook, and that started me down my generative AI art journey as I discovered tricks to constantly improve my image generation output. However, the Web UI isn’t optimised for the Mac and generation was very slow. I got around 20 seconds per iteration for a simple 512×512 image using the Euler sampler, compared to around 5 iterations per second on my PC.
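
The two speeds are quoted in different units (seconds per iteration versus iterations per second), so the gap is easy to underestimate. Here is a minimal Python sketch that converts both to seconds per image. The 20 sampling steps are an assumption for illustration, and the function names are my own.

```python
def seconds_per_image_from_s_per_it(seconds_per_iteration: float, steps: int) -> float:
    """Time for one image when speed is given in seconds per iteration."""
    return seconds_per_iteration * steps


def seconds_per_image_from_it_per_s(iterations_per_second: float, steps: int) -> float:
    """Time for one image when speed is given in iterations per second."""
    return steps / iterations_per_second


if __name__ == "__main__":
    STEPS = 20  # assumed number of sampling steps per image

    mac_seconds = seconds_per_image_from_s_per_it(20.0, STEPS)  # ~20 s/it on the MacBook
    pc_seconds = seconds_per_image_from_it_per_s(5.0, STEPS)    # ~5 it/s on the PC

    print(f"MacBook: {mac_seconds:.0f} s per image")
    print(f"PC:      {pc_seconds:.0f} s per image")
    print(f"The PC is about {mac_seconds / pc_seconds:.0f}x faster")
```

At these speeds the difference is roughly a hundredfold regardless of the step count, since the steps cancel out in the ratio.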

There’s the Stable Diffusion optimisation for Core ML on macOS that leverages the Apple Neural Engine, but it doesn’t perform as well as a PC with a decent Nvidia GPU. It also requires converting the Stable Diffusion models from PyTorch to Core ML. That is quite a pain, especially when I have over a thousand models.

To speed up my generation workflow, I built a cheap PC to run Stable Diffusion Web UI. There are many of these on Taobao targeting the AIGC (AI-Generated Content) market. AIGC is huge in China and continues to grow rapidly. I run the Web UI on my local network so I can work on it from a browser on my MacBook.

First and second pass compared.

There are many techniques that I learnt to help me improve my generative AI art. With my prior knowledge in photography, Photoshop image manipulation, art direction, and Python, I was able to grasp the nuances of Stable Diffusion quickly and find ways to hack my workflows.

I plan to share what I learn as I grow, and to develop a Learn In Public series. If you’re interested in following my journey, sign up for my newsletter!

Many people have also asked about workshops and lessons. I hear you. Watch this space to be the first to know if that happens.

AI Controversies

There are many concerns over generative AI images. The two major ones I come across most often are training of AI models without artists’ consent, and the potential for abuse and misuse.

I held off from diving into creating generative AI art because I wanted to learn more about the ethical issues. As always, I have strong opinions, weakly held. My views are constantly changing as I gain more knowledge to make a better judgement.

Abuse and misuse

The latter is the lowering of the bar to abuse by bad actors. This includes creation of graphic and sensitive content, and spreading misinformation through fake images.

This is not something new that generative AI introduced. Photo manipulation has been around since the 19th century, and deepfakes have been around for a few years. Yet, little has been done to deal with such deception and hoaxes.

Generative AI makes it a lot easier to produce a convincing fake image, and you can batch produce images at scale. Something needs to be done to let viewers know that an image is AI-generated and not real. While I don’t think there is any way to stop criminals from creating harmful content, there should be at least some form of safeguard.

Plagiarism

The other major source of outrage from those who oppose AI image generation is the unauthorised use of artists’ work to train the AI models. As an artist myself, I stand against any attempt to infringe upon the copyright of a creator.

However, once I understood how the diffusion models work, I realised that the choice to plagiarise artwork lies in the hands of the person creating the image, not the AI model.

Can you choose to take an exact copy of a photo that another photographer took? You can, but it makes you look bad. Can you imitate your favourite photographer’s style in the process of finding your own style? You can, and many of us amateur photographers do that while we attempt to discover our own style. Likewise for painting.

Just because these occur doesn’t mean that we should have an outright ban or boycott of AI image generation. People take images with their smartphones. And then there are those who engage in illegal photography such as up-skirt and other nonconsensual images. Should they be allowed to take such photos? No. Do these warrant a ban on smartphone cameras? No, because it won’t solve the problem and a ban would deprive people of the ability to take legit photos.

Model training

If your concern is infringement of copyright, I’m sure you won’t create an image that replicates another artist’s style even if the model allows it. You can take this one step further by choosing to use models that are trained ethically, meaning they use a training set sourced from images whose creators have consented to their use for AI training, or from images in the public domain.

Ethically-trained models

A good model creator documents how they trained their models. This includes how they sourced their training data or which models they used to create merged models. Keeping this transparent allows others who iterate on these models to train or fine-tune new models to make an informed decision.

Of course, there will be those who choose to train their models using unethical or even illegal training data sources. I think that these will remain as prevalent as the piracy of software, films, and books.

Train your own style

Instead of viewing AI as a threat, I believe it is important to learn how to use it properly to empower yourself.

While the debate over the ethical issues of generative AI art continues, some artists have already jumped on the technical advantages of generative AI and started training models based on their own photography or art style. By doing so, they are then able to generate images with their signature style using AI and experiment with concepts.

My generative AI art

To see more of my AI art, check out the overview page and follow me on the various platforms.

Check out the Mai Shiranui series.

Jenxi Seow

How to use Stable Diffusion (Part 1)

Contents

Install Stable Diffusion Web UI

Download a Stable Diffusion checkpoint

Generate image with txt2img

Prompt structure

Prompt length

Token weightage

Repeating tags

Moving tokens

Adjusting weights

Negative prompt

Negative embeddings

Next steps

10+ Best Stable Diffusion checkpoints (SD 1.5)

Contents

The best Stable Diffusion checkpoints ranked

Top 10 Stable Diffusion checkpoints

1. Realistic Vision

2. ChilloutMix

3. DreamShaper

4. MajicMix Realistic

5. Uber Realistic Porn Merge (URPM)

6. epiCRealism

7. ReV Animated

8. Perfect World

9. MeinaMix

10. Beautiful Realistic Asians

11. CyberRealistic (Bonus)

12. Counterfeit (Bonus)

Round-up

Free Notion resource

Asuka Langley Soryu AI art

Creation

Tools used

Downloads

Asuka Langley Soryu hello there

Asuka Langley Soryu dash

Chun-Li AI art

Contents

Creation

Tools used

Downloads

Chun-Li challenges you

What is a Stable Diffusion checkpoint

Contents

What are checkpoints

Types of checkpoints

Checkpoint types – trained knowledge

Trained checkpoints

Merged checkpoints

Checkpoint types – image output

Photorealistic checkpoints

Digital painting checkpoints

Render checkpoints

Anime checkpoints

Illustration checkpoints

General purpose checkpoints

Checkpoint types – realism

Realistic checkpoints

Semi-realistic checkpoints

2.8D checkpoints

2.5D checkpoints

2D checkpoints

Other types of categories

Choosing Stable Diffusion checkpoints

How to improve the performance of Stable Diffusion Web UI

Contents

Cross-attention optimisation

Setting cross-attention optimisation

Doggettx

xFormers

Scaled-Dot-Product (sdp) Attention

SDP Attention without Memory-Efficient Attention (SDP-no-mem)

Sub-Quadratic (sub-quad) Attention

Split-Attention v1

Invoke AI

Token merging

Setting token merging