Project 5: Stable Diffusion 1
Posted: Sat Oct 21, 2023 10:35 am
Access to Stable Diffusion: http://vislab4.mat.ucsb.edu:7860/
For this assignment, present a minimum of 5 images realized in Stable Diffusion, providing all metadata as shown by Weihao and giving an analysis of each image's results. Final resolution to aim for: 1920 x 1080 (HD). For testing you can use 960 x 540 pixels and then upscale the selected images for final presentation.
Weihao Instructions at https://safe-beryl-797.notion.site/Stab ... 413af23cef
# Stable Diffusion
**Use Stable Diffusion with WebUI**: http://vislab4.mat.ucsb.edu:7860/
### Steps to generate an image
1. Select model (default is v1-5….)
2. Type text prompt & negative text prompt
3. Set parameters
4. Download image
   - Click the button
   - Wait for the “number changed” before downloading
   - Download the image
[img-to-img translation](https://www.notion.so/img-to-img-transl ... 485?pvs=21)
[The effects of different samplers](https://www.notion.so/The-effects-of-di ... ff5?pvs=21)
[Instructions on using the lab pc](https://www.notion.so/Instructions-on-u ... f14?pvs=21)
---
### Parameter Guidance:
**Text-to-image:**
- Steps: 20 - 50
- Width & Height:
   - for SD 1.5: optimal at 512; 720 is fine; repetition appears beyond that
   - for SDXL 1.0: optimal at 1024
- Sampling Method: the exact differences are unclear, but each sampler adds a different flavor to the images
- CFG Scale: around 7.0, up to 14; lower → more abstract, broken images | higher → more “finished” images
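The text-to-image guidance above can be written down as a small checker. This is only a sketch: `Txt2ImgSettings` and `check_settings` are hypothetical names invented for illustration and are not part of the WebUI, and the thresholds are simply the ranges from these notes.

```python
from dataclasses import dataclass


@dataclass
class Txt2ImgSettings:
    """Hypothetical container for the settings listed in the guidance."""
    prompt: str
    negative_prompt: str = ""
    steps: int = 30
    width: int = 512
    height: int = 512
    cfg_scale: float = 7.0


def check_settings(s: Txt2ImgSettings, base_model: str = "SD 1.5") -> list[str]:
    """Return warnings where settings leave the ranges suggested in the notes."""
    warnings = []
    if not 20 <= s.steps <= 50:
        warnings.append(f"steps={s.steps} is outside the suggested 20-50 range")
    # The notes give an upper limit only for SD 1.5 (720 is fine, repetition after).
    if base_model == "SD 1.5" and max(s.width, s.height) > 720:
        warnings.append("a side longer than 720 may show repetition on SD 1.5")
    if s.cfg_scale > 14:
        warnings.append(f"cfg_scale={s.cfg_scale} is above the suggested maximum of 14")
    return warnings


if __name__ == "__main__":
    test_render = Txt2ImgSettings(prompt="a foggy harbor at dawn", width=960, height=540)
    for message in check_settings(test_render):
        print(message)
```

Running it on the 960 x 540 test size prints the SD 1.5 repetition warning, which is worth keeping in mind when analyzing test renders.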
**Img-to-img:**
- Denoising Strength: defines how far the result can change from the original image. It also affects the total generation time, because the number of steps actually run is the configured step count times the denoising strength. For example, if you set 50 steps with a denoising strength of 0.7, the image is actually produced in 50 x 0.7 = 35 steps.
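The step arithmetic for img-to-img can be checked with a few lines of Python. This is a sketch of the rule as stated in these notes; `effective_steps` is a hypothetical helper, and rounding to the nearest integer is an assumption (the WebUI's exact rounding may differ).

```python
def effective_steps(steps: int, denoising_strength: float) -> int:
    """Approximate number of steps actually run in img-to-img.

    Follows the rule from the notes: configured steps times denoising strength.
    Rounding to the nearest integer is an assumption made for this sketch.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return round(steps * denoising_strength)


if __name__ == "__main__":
    print(effective_steps(50, 0.7))  # the example from the notes: 50 x 0.7 = 35
```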
**Scripts:**
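X/Y/Z plot: draws a grid of images, each generated with a different parameter value (for example, different samplers, step counts, or CFG scales)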
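Conceptually, the X/Y/Z plot script renders one image per combination of the chosen axis values. The sketch below only enumerates those combinations and does not call the WebUI. `build_grid` and the parameter names are hypothetical, chosen for illustration.

```python
from itertools import product


def build_grid(base: dict, axes: dict) -> list[dict]:
    """Return one settings dict per combination of axis values.

    base: settings shared by every image in the grid.
    axes: maps a parameter name to the list of values to try on that axis.
    """
    names = list(axes)
    grid = []
    for combo in product(*axes.values()):
        settings = dict(base)              # copy the shared settings
        settings.update(zip(names, combo))  # override with this cell's values
        grid.append(settings)
    return grid


if __name__ == "__main__":
    cells = build_grid(
        {"prompt": "a foggy harbor at dawn", "steps": 30},
        {"cfg_scale": [5, 7, 9], "steps": [20, 50]},
    )
    for cell in cells:
        print(cell)
```

With three CFG values and two step counts this yields six cells. Comparing them side by side is a convenient way to gather material for the per-image analysis.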


### **Terms:**
**Textual Inversion** → a fine-tuning method that links a particular style/subject to a keyword (similar to Dreambooth but shallower). Images change only when the keyword appears in the prompt.
**Hypernetworks** → a fine-tuning method that changes the overall flavor of all produced images (adds a network before the cross-attention module of Stable Diffusion).
**LoRA** → a fine-tuning method (the most common one) that changes the overall flavor of the produced images (its layers are embedded in the cross-attention module of Stable Diffusion).
### Advanced (requires access to the lab PC):
- [ControlNet](https://github.com/lllyasviel/ControlNet)
- Adding a new LoRA or using a different model: https://civitai.com/
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI)