Project 5: Stable Diffusion 1

glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

Project 5: Stable Diffusion 1

Post by glegrady » Sat Oct 21, 2023 10:35 am

Project 5: Stable Diffusion

Access to Stable Diffusion: http://vislab4.mat.ucsb.edu:7860/

For this assignment, present a minimum of 5 images realized in Stable Diffusion, providing all metadata as shown by Weihao, and giving an analysis of each image's results. Final resolution to aim for: 1920 x 1080 (HD), but for testing you can use 960 x 540 pixels and then upscale the selected images for final presentation.
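The test-then-upscale workflow above (960 x 540 drafts, 1920 x 1080 finals) can be scripted in a few lines. This is a minimal sketch assuming Pillow is installed; the filenames are placeholders, not files from the assignment.

```python
# Minimal upscale sketch for the 960x540 -> 1920x1080 workflow described
# above. Assumes Pillow is installed; file paths are placeholders.
from PIL import Image

def upscale(src: str, dst: str, size=(1920, 1080)) -> None:
    """Resize a test render up to full HD for final presentation."""
    img = Image.open(src)
    # Lanczos resampling preserves detail better than bilinear when enlarging.
    img.resize(size, Image.LANCZOS).save(dst)
```

Note that upscaling in an image editor only interpolates pixels; for sharper results the WebUI's own "Hires. fix" upscalers (used in some of the posts below) re-generate detail instead.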

Weihao Instructions at https://safe-beryl-797.notion.site/Stab ... 413af23cef

# Stable Diffusion

**Use Stable Diffusion with WebUI**: http://vislab4.mat.ucsb.edu:7860/

### Steps to generate an image

1. Select a model (default is v1-5….)
2. Type the text prompt & negative text prompt
3. Set parameters
4. Download the image
- Click the button
- Wait for the number to change before downloading
- Download the image
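The click-through steps above can also be scripted. This is a sketch, assuming the lab server is launched with the `--api` flag and that the `/sdapi/v1/txt2img` route follows the AUTOMATIC1111 WebUI convention; treat the endpoint and field names as assumptions about that setup.

```python
# Sketch of scripting the four steps above through the WebUI's HTTP API.
# Assumptions: the server runs with --api, and the /sdapi/v1/txt2img route
# follows the AUTOMATIC1111 WebUI convention.
import base64
import json
from urllib import request

URL = "http://vislab4.mat.ucsb.edu:7860"

def build_payload(prompt, negative="", steps=20, cfg=7.0,
                  width=512, height=512):
    """Steps 2-3: prompt, negative prompt, and parameters."""
    return {"prompt": prompt, "negative_prompt": negative, "steps": steps,
            "cfg_scale": cfg, "width": width, "height": height}

def txt2img(payload, url=URL):
    """Step 4: request the render and decode the base64-encoded PNGs."""
    req = request.Request(url + "/sdapi/v1/txt2img",
                          data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return [base64.b64decode(i) for i in json.load(resp)["images"]]
```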

[img-to-img translation](https://www.notion.so/img-to-img-transl ... 485?pvs=21)

[The effects of different samplers](https://www.notion.so/The-effects-of-di ... ff5?pvs=21)

[Instructions on using the lab pc](https://www.notion.so/Instructions-on-u ... f14?pvs=21)

---

### Parameter Guidance:

**Text-to-image:**

- Steps: 20 - 50
- Width & Height:
    - for SD 1.5: optimal at 512; 720 is fine; repetition artifacts appear beyond that
    - for SDXL 1.0: optimal at 1024
- Sampling Method: the exact differences are unclear, but each sampler adds a different flavor to the images
- CFG Scale: around 7.0, up to 14; lower → more abstract, broken images | higher → more “finished” images
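The CFG Scale behavior noted above (lower → abstract, higher → “finished”) follows from the classifier-free guidance rule diffusion samplers commonly apply. A toy sketch, not the WebUI's exact internals:

```python
# Toy illustration of classifier-free guidance: the CFG scale w blends the
# model's unconditioned and prompt-conditioned noise predictions each step.
# (A sketch of the general rule, not the WebUI's exact internals.)
def guided(eps_uncond: float, eps_cond: float, w: float) -> float:
    # w = 1 keeps the plain conditional prediction; larger w amplifies the
    # prompt direction, giving the more "finished" look noted above.
    return eps_uncond + w * (eps_cond - eps_uncond)

print(guided(0.0, 1.0, 7.0))  # → 7.0
```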

**Img-to-img:**

- Denoising Strength: defines how far the result can depart from the original image. It also affects total generation time, because the actual number of steps used is the configured step count multiplied by the denoising strength. For example, 50 steps with a denoising strength of 0.7 gives 50 x 0.7 = 35 actual steps.
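The step arithmetic above is just a multiply-and-round; a one-line sketch (the WebUI's exact rounding is an internal detail):

```python
# Rule of thumb from above: actual img2img denoising steps are roughly the
# configured steps times the denoising strength (exact rounding is the
# WebUI's internal detail).
def actual_steps(steps: int, denoising_strength: float) -> int:
    return round(steps * denoising_strength)

print(actual_steps(50, 0.7))  # → 35
```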

**Scripts:**

X/Y/Z plot: draws a matrix of outputs across different parameter values (e.g. seed on one axis, CFG scale on another)

![Screenshot 2023-10-31 at 12.35.05 PM.png](https://prod-files-secure.s3.us-west-2. ... .05_PM.png)

![00002-2383148329.png](https://prod-files-secure.s3.us-west-2. ... 148329.png)
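The X/Y/Z plot script effectively enumerates the Cartesian product of the axis values, one generation per grid cell. A sketch of that bookkeeping, using axis values that mirror the metadata strings in the posts below:

```python
# Sketch of what the X/Y/Z plot script enumerates: every combination of the
# axis values, one generation per grid cell. The values mirror the
# "X Values"/"Y Values" metadata strings used in this thread.
from itertools import product

seeds = [1, 2, 5, 10, 100]                 # X axis: Seed
cfg_scales = [1.0, 3.0, 5.0, 10.0, 20.0]   # Y axis: CFG Scale

grid = [{"seed": s, "cfg_scale": c} for s, c in product(seeds, cfg_scales)]
print(len(grid))  # → 25 (a 5x5 grid, so 25 images per run)
```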

### **Terms:**

**Textual Inversion** → a fine-tuning method that links a particular style/subject to a keyword (similar to DreamBooth but shallower). Images are changed only when the keyword appears.

**Hypernetworks** → a fine-tuning method that changes the overall flavor of all produced images (adds a network before the cross-attention module of Stable Diffusion).

**LoRA** → the most common fine-tuning method. Changes the overall flavor of the produced images (its layers are embedded in the cross-attention module of Stable Diffusion).

### Advanced (require access to the lab PC):

- [ControlNet](https://github.com/lllyasviel/ControlNet)
- Adding new LoRA or using different model: https://civitai.com/
- [ComfyUI](https://github.com/comfyanonymous/ComfyUI)
George Legrady
legrady@mat.ucsb.edu

autumnsmith
Posts: 10
Joined: Tue Oct 03, 2023 1:08 pm

Re: Project 5: Stable Diffusion 1

Post by autumnsmith » Thu Nov 02, 2023 10:23 am

Through Stable Diffusion, I used concepts similar to the ones explored within Midjourney. My focus for this project was to see how these images contrasted with Midjourney's output, to understand Stable Diffusion's limitations, and to test various visual inputs to assess how closely I could get the imagery to mimic some of my other works or ideas. The projects below are broken into three sets. Each set follows a different concept and comprises three images: the first iteration, the midpoint, and the final stage, where I felt the program had taken me as far as it could within that prompt.

Image/Set #1
The first set of images is a scene in Paris. This was a detailed exploration to see how specifically I could create a scene within a prompt. My prompts, however, ran into several issues, including human forms glitching and becoming uncanny. This process also served as a study of the stepping process and control variations within the program. With this prompt, I found it difficult to depict the city and would often hit a glitching “wall,” if you will, that almost appears like a 0.5-shot panorama image.

00029-1.png
Rainy day in Paris, man standing on corner holding a violin and a bottle of wine, with a dalmatian by his side
Steps: 34, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "1, 2, 5, 10, 100 ", Fixed X Values: "1, 2, 5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0, 20.0", Version: v1.6.0
00029-1.png

00048-5.png

Rainy day in Paris, man standing on corner holding a violin and a bottle of wine, with a dalmatian by his side, at night outside of a well lit club with flashing lights. Hyperrealistic, close up scene, background blurring out. Foreground and background cohesively integrated into the same style.

Negative prompt: frontal view of humans
Steps: 48, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 5, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "5, 10, 100 ", Fixed X Values: "5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0", Version: v1.6.0
00048-5.png

00061-3012681544.png
Rainy day in french countryside, man standing on corner holding a violin and a bottle of wine, with a dalmatian by his side, at night outside of a well lit club with flashing lights. Hyperrealistic, close up scene, background blurring out. Foreground and background cohesively integrated into the same style. Cool colored lighting only

Negative prompt: frontal view of humans, no grey, red, warm colors, silhouettes
Steps: 5, Sampler: DPM++ 2M Karras, CFG scale: 3.0, Seed: 3012681544, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Steps, X Values: "5, 10", Y Type: CFG Scale, Y Values: "3.0, 120", Version: v1.6.0
00061-3012681544.png

Image/Set #2
In previous weeks, my focus has included a still-life mock-up based on my painting and sculpture work, which has also been translated into Blender files and other installation works. The exploration of this prompt and its negatives was to see if I could get a composition with the right colors and forms: how close to my paintings, sculptures, or Blender files could I get without direct visual image inputs? I continued using the stepping method as a way to see where the most interesting or successful combinations of commands lay. One issue I found during this process was that a sort of fracturing began that I had a hard time fighting; I tried many variations of negative prompts and removed steps, but none were successful. In general, however, I found this program much easier to control and manipulate, with significantly fewer visual stylizations than Midjourney had.

00093-483270006.png
3D blender file of four blue spheres and eight torus' in a 3D room

Negative prompt: frontal view of humans, grey
Steps: 5, Sampler: DPM++ 2M Karras, CFG scale: 3.0, Seed: 483270006, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Steps, X Values: "5, 10", Y Type: CFG Scale, Y Values: "3.0, 120", Version: v1.6.0
00093-483270006.png


00097-376395918.png
3D blender file of four blue spheres and eight warm toned torus' in a 3D room, reflective light, spheres recedeing into space at different levels, the torus' varying in size and distance, dramatic spotlights, sharp, chunky graphics, smooth surfaced objects

Negative prompt: humans, grey, discoball, fracturing, white
Steps: 5, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 376395918, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Steps, X Values: "5, 10, 20", Y Type: CFG Scale, Y Values: "1.0, 3.0, 15.0, 120", Version: v1.6.0
00097-376395918.png



00102-2846200529.png

3D realistic acrylic painting on canvas of four blue spheres and eight toned torus' in red, yellow, orange floating. Reflective light, spheres recedeing into space at different levels, the torus' varying in size and distance, dramatic spotlights, sharp, smooth surfaced objects, high saturation, high contrast lighting, chiaroscuro, all unique objects, hyperrealistic, hue only, simple. Lower shot looking up at the objects

Negative prompt: grey, discoball, fracturing, white, greyscale, neutral tones, purple, green, corners, ground
Steps: 5, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 2846200529, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Steps, X Values: "5, 10, 20, 50", Y Type: CFG Scale, Y Values: "1.0, 3.0, 15.0, 120", Version: v1.6.0
00102-2846200529.png

Image/Set #3
Set #3 I found to be extremely exciting directionally, because with minimal direction as a starting point it could get significantly closer to what I was hoping for: in this case, a cartoon comic strip with a scene unfolding. This was something Midjourney not only really struggled with but was never able to get as close to visually. Within only a few prompt adjustments, I felt much closer than I had previously to creating a visual product with the help of AI. That said, the final images are still very far from perfect or exact in the way I had wanted, as is apparent through the glitching and uncanny nature of the figures represented in the scenes. There appears to be a figure who is pantsless and merging into a dog, the balloon storyline is evaded entirely, and the thought bubbles and text are unreadable, nonsensical at best. The letters even flow between abstractions of what appear to be various languages, all equally vague. This leads me to question the database of images and whether Stable Diffusion is pulling from more internationally focused cartoons, which would explain the unclear text.

00105-1157974087.png
comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors, story unfolding through multiple scenes about a six year old girl who has a dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint

Negative prompt: abstract lines
Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1157974087, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Steps, X Values: "10, 20, 50", Y Type: CFG Scale, Y Values: "1.0, 3.0, 7.0, 15.0", Version: v1.6.0
00105-1157974087.png


00108-100943486.png
comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters

Negative prompt: abstract lines, no adults
Steps: 50, Sampler: DPM++ 2M Karras, CFG scale: 3.0, Seed: 100943486, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Steps, X Values: 50, Y Type: CFG Scale, Y Values: "3.0, 7.0, 15.0", Version: v1.6.0
00107-100943486.png


00114-4087678422.png (23 and 25)
comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters

Negative prompt: abstract lines, no adults
Steps: 50, Sampler: DPM++ 2M Karras, CFG scale: 15, Seed: 4087678425, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0
00116-4087678423.png
00118-4087678425.png

colindunne
Posts: 7
Joined: Tue Oct 03, 2023 1:09 pm

Re: Project 5: Stable Diffusion 1

Post by colindunne » Thu Nov 02, 2023 11:26 am

Hairy Tea Party
v1
Image
grizzly bear, tea party, human little girl, sitting at table, pink
Steps: 20, Sampler: Euler, CFG scale: 7, Seed: 2499024078, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

v2
Image
grizzly bear, tea party, human little girl, sitting at table, pink
Steps: 20, Sampler: Euler, CFG scale: 7, Seed: 3821478148, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

v3
Image
grizzly bear, tea party, human little girl, sitting at table, pink
Steps: 20, Sampler: DPM++ 2S a, CFG scale: 7, Seed: 3821478148, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

v4
Image
grizzly bear, tea party, human little girl, sitting at table, pink
Steps: 20, Sampler: DPM++ 2M SDE Heun Exponential, CFG scale: 7, Seed: 3821478148, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

v5
Image
grizzly bear, tea party, human little girl, sitting at table, pink
Steps: 20, Sampler: DPM++ 2M SDE Heun Exponential, CFG scale: 7, Seed: 3821478148, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 1.45, Hires upscaler: Latent (antialiased), Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.8, Version: v1.6.0


US soldier(s) with Nerf guns
v1
Image
us soldier with toy nerf gun, close
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1084451030, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v2
Image
us soldier with toy nerf gun, close
Steps: 20, Sampler: PLMS, CFG scale: 7, Seed: 1084451030, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v3
Image
us soldier with toy nerf gun, close
Steps: 20, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 1084451030, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v4
Image
us soldier with toy nerf gun, close
Steps: 20, Sampler: DPM++ 2M SDE Exponential, CFG scale: 7, Seed: 1084451030, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0


Paper airplane traffic
v1
Image
paper airplane traffic, sky
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 965382064, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v2
Image
paper airplane traffic, sky, traffic, intersection
Negative prompt: cars, street, ground
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 725240586, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v3
Image
paper airplane traffic, sky, traffic, intersection, blue, clouds
Negative prompt: cars, street, ground
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1430695884, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0


Smiley
v1
Image
one smiley face, yellow, black outline, cartoon, center
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2576662448, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v2
Image
one smiley face, yellow, black outline, cartoon, center
Steps: 20, Sampler: DPM2 a, CFG scale: 7, Seed: 2431356438, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v3
Image
one smiley face, yellow, black outline, cartoon, center
Steps: 20, Sampler: LMS Karras, CFG scale: 7, Seed: 3216706954, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v4
Image
one smiley face, yellow, black outline, cartoon, center
Steps: 20, Sampler: DPM++ 2S a Karras, CFG scale: 7, Seed: 4244067560, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v5
Image
one smiley face, yellow, black outline, cartoon, center
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 657201603, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

v6
Image
one smiley face, yellow, black outline, cartoon, center
Steps: 20, Sampler: Euler, CFG scale: 7, Seed: 3867064235, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0


Reflection
For this project I explored multiple aspects of Stable Diffusion. However, I continuously found myself drawn primarily to experimenting with the different sampling methods. Beginning with the DPM++ 2M Karras sampler, my process was to create a prompt and adjust the parameters and text repeatedly until I got a result somewhat akin to what I was looking for. The sampler parameter appealed to me because, with the same prompt and seed, it would produce very similar-looking images that could be a more refined version of the original. In the same manner, it would also at times produce completely different results, with complete changes in style, composition, and elements. The images above are displayed with their parameters to show this and the patterns that may emerge. One example I noticed was that some samplers appeared to produce better results for people, where they aren't broken and don't have too many limbs. I look forward to exploring this further and producing more extensive documentation of it.

Other things I noticed when exploring Stable Diffusion relate to Weihao's point about the AI's inclination to repeat elements when it is overloaded or struggling. This was extremely recurrent and the main thing I struggled with when trying to get results that weren't mashed up and incomprehensible. At the same time, though it took longer and felt more difficult given the number of modifiable parameters, I was still left with the impression that I had more control over Stable Diffusion than over Midjourney.
Last edited by colindunne on Thu Nov 02, 2023 12:51 pm, edited 1 time in total.

luischavezcarrillo
Posts: 8
Joined: Thu Oct 05, 2023 2:48 pm

Re: Project 5: Stable Diffusion 1

Post by luischavezcarrillo » Thu Nov 02, 2023 11:47 am

Prompt 1:
comic style, dragon, cartoony, volcano background, professional camera, anatomically accurate
Negative: ugly, blurry, low quality, distorted
00119-1.png
Overall, the samples here were very far from a traditional comic style, despite it being the first filter; it may be that the cartoony filter overrode it. The rest of the filters were applied correctly, except anatomically accurate, and perhaps professional camera given the cartoony style. A dragon was shown; we know it's a dragon despite it being warped. The images generated from seed 1 seem to have originated not from a dragon picture but from a volcano picture, given the CFG 10 sample's lack of a dragon and pure volcano, and the CFG 5 and 3 samples used the volcano filter as a literal picture rather than a background.

Prompt 2:
outrun aesthetic, sunset, minimalist, speeding car, anime style
Negative: ugly, blurry, low quality, distorted
00120-1.png
I made an assumption here that perhaps Stable Diffusion struggles a lot more with organic shapes than Midjourney does, so I shifted my focus to inanimate objects. It appears that for seed 1, the minimalist aesthetic combined with the anime style (which would lead the bot toward Japanese styles) and the sunset filters to create images reminiscent of old Japanese paintings. The outrun aesthetic presented itself mostly as color, barring a few exceptions where it was much more apparent, and the speeding cars likely blended with the anime style to produce speed lines, a technique commonly used in anime. Overall, only seed 10's cars were anywhere near acceptable in shape, but they were a lot closer. I should have made the car the first filter, as it was supposed to be the focus, but instead I chose to move on to series 3.


Prompt 3:
1973 Dodge Charger, black and white, realistic, professional camera
Negative: ugly, blurry, low quality, distorted, humans, creatures
00122-1.png
This series basically confirmed my suspicion that organic things are difficult for the bot to render. Even at CFG 1, the bot made the cars very recognizable, even if they are still distorted to a degree. CFG scale 3 was almost indistinguishable from a normal car if not for very minor details. It may also be that the older car model, being more blocky, was easier to render properly than a newer, more curvy car.


Prompt 4:
helicopter, comic book style, fiery skies, guns blazing
Negative: ugly, blurry, low quality, distorted
00124-3.png
00125-1669598898.png

The bot appears to have interpreted the query as a war scene, likely due to the fiery skies and guns blazing filters. Seed 3 seems to have been based on a firefighting helicopter, yet despite that, as the CFG was lowered, all seeds showed more and more fighting and destruction. The additional render below the main one shows an image series very close to what I originally wanted; even at CFG 10 it showed a battle-esque scene. This makes me wonder whether using a seed modifier bases the entire image on one base image rather than generating it randomly.

Prompt 5:
minecraft world, cyberpunk aesthetic, blocky, night time, battle scene
Negative: ugly, blurry, low quality, distorted
00126-23.png
00127-23.png
I've been playing Minecraft lately, so I figured I'd see how the bot does with it as a prompt. Being very blocky, it may be a simple render, but the results were rather interesting. The blocky aesthetic of Minecraft was preserved until the CFG was low enough for chaos to ensue. The cyberpunk element showed only in seed 23 in the first attachment, as color; the other images didn't have that feel. They did look worn from battle, but night time was never really depicted. The second attachment tested how lower CFG levels would perform, as that is when the bot seems freer to create. The cyberpunk aesthetic became a lot more apparent, with color and buildings to support it, and the bot even went as far as making an image very similar to a convenience store. However, it is also very easy for the bot to stray off the query's parameters at such low CFG levels. For images intended for purposes other than experimentation and testing the bot's capabilities, such as display at art shows, I believe CFG scales 2 through 5 work best, at the user's discretion.

Side Note: The bot repeated the exact same image with CFG scale 3 and seed 23 across both different sample sets.

Prompt 6:
cyberpunk aesthetic, new york city, professional camera, purple and neon highlights
Negative: ugly, blurry, low quality, distorted
00128-23.png
To see if the cyberpunk aesthetic could be brought out more, I ran an extra series and specified more purple and neon coloring. After the Minecraft series and this one, it's safe to say that CFG 0.5 may be too low for anything other than abstract art. The rest of the images were up to standard. I might have been better off combining cyberpunk with a different large city, as New York is already known for its massive screens and lit-up buildings. Still, the results are satisfactory.

gracefeng
Posts: 8
Joined: Tue Oct 03, 2023 1:12 pm

Re: Project 5: Stable Diffusion 1

Post by gracefeng » Sun Nov 05, 2023 6:12 pm

Prompt 1: Fisheye lens, aliens, band poster, shoegaze, dream pop, strobe lights, fuzzy lighting, soft edges, vintage camera
Steps: 34, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "1, 2, 5, 10, 100 ", Fixed X Values: "1, 2, 5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0, 20.0", Version: v1.6.0
image (10).png
Prompt 2: Fisheye lens, aliens, fleshy, non-human, dreamy, android camera, aquatic, strobe lights, fuzzy lighting, soft; no humans
Steps: 34, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "1, 2, 5, 10, 100 ", Fixed X Values: "1, 2, 5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0, 20.0", Version: v1.6.0
image (11).png
Prompt 3: Fisheye lens, underwater alien lair, aliens, science fiction, high-tech, guillermo del toro, dreamy, aquatic monsters
Steps: 43, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "1, 2, 5, 10, 100 ", Fixed X Values: "1, 2, 5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0", Version: v1.6.0
image (12).png
Prompt 4: Fisheye lens, underwater alien lab, aliens, fantasy, science fiction, minimalist, guillermo del toro, dreamy, aquatic creatures
Steps: 43, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "1, 2, 5, 10, 100 ", Fixed X Values: "1, 2, 5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0", Version: v1.6.0
image (13).png
Prompt 5: underwater cave with glowing eyes, deep sea eyeless creatures, found footage, grainy quality, dark, red backlight
Steps: 43, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "1, 2, 5, 10, 100 ", Fixed X Values: "1, 2, 5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0", Version: v1.6.0
image (14).png
Prompt 6: deep sea eyeless creature, film camera
Steps: 43, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "1, 2, 5, 10, 100 ", Fixed X Values: "1, 2, 5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0", Version: v1.6.0
image (15).png

pratyush
Posts: 9
Joined: Wed Oct 04, 2023 9:27 am

Re: Project 5: Stable Diffusion 1

Post by pratyush » Mon Nov 06, 2023 5:56 pm

My sincerest apologies for the delay in submitting this. I struggled with attaching all the images directly through the attachment option below, due to enormous file sizes. Towards the end, I had to resort to uploading images via our discord server.

-----

For this week’s project, I tried out Stable Diffusion v1.6 for the first time. There are significant differences between the functional processes of Midjourney and Stable Diffusion, which is why the switch was a bit unexpected and the learning curve was quite steep. As a matter of fact, I am still getting used to all the various manipulable parameters in Stable Diffusion.

To begin with, Stable Diffusion, as this exercise would eventually reveal, is more consistent with its results than Midjourney is. It is worth being precise about the underlying technology here. Generative Adversarial Networks, or GANs, are AI systems with two main parts: a generator that tries to make fake data, and a discriminator that tries to tell whether the data is real or fake. These two parts play a game where they try to outsmart each other, and eventually the generator becomes so good that its output is almost indistinguishable from the real thing. Stable Diffusion, however, is not a GAN: it is a diffusion model, trained to reverse a gradual noising process and reconstruct an image from noise under the guidance of a text prompt. Because its WebUI exposes the seed, sampler, step count, and CFG scale directly, the same settings reproduce the same image, which is what makes its results feel consistent. Midjourney, by contrast, is a closed system that layers its own stylistic processing on top of the prompt, so somewhere down the line it seems to intervene with human expectations and produce images that sometimes stray very far from the given prompt. But to attain stability and consistency, one must make sure the parameters used to generate those images are precisely measured.
It required a lot of trial and error (as well as watching YouTube tutorials) to attain precision in the results.

My aim was to play with the noise patterns and use them as aesthetic parameters for the images produced. I wanted to see how the diffusion process regenerates the noise into fully formed image signals that are distinguishable from total noise, while still keeping some aspects of the noise intact. The idea was to see if noise could still be visibly present in the images, perhaps as brush-stroke patterns, and yet not appear unpleasant. Diffusion is a crucial stage in the development of text-to-image generators. In this procedure, incremental doses of Gaussian noise (often referred to as "random" visual noise) are introduced to an image. Simultaneously, the AI undergoes training with each iteration of the image, which becomes progressively more distorted or "noisy." The process is then reversed, and the AI is tasked with generating an image that bears a visual resemblance to the original training image, starting from entirely random pixels. In Stable Diffusion v1.6, the parameters help stabilize this diffusion step so that it produces consistent image outputs in line with the natural-language prompts.
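The incremental-noising process described above has a standard closed form: the image at noise level t is a mix sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps of the original and Gaussian noise. A toy numpy sketch; the linear beta schedule here is illustrative, not Stable Diffusion's actual schedule:

```python
# Toy sketch of the forward diffusion ("noising") process described above:
# x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with Gaussian eps.
# The linear beta schedule is illustrative, not Stable Diffusion's own.
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)
abar = np.cumprod(1.0 - betas)   # cumulative signal-retention factor

def noisy_image(x0, t):
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

x0 = np.ones((8, 8))             # stand-in "image"
early = noisy_image(x0, 10)      # mostly image, a little noise
late = noisy_image(x0, 999)      # essentially pure noise
```

At small t the noise shows up as the faint grain the post hopes to keep as "brush strokes"; by the final t the signal is gone, which is the state the trained model learns to reverse.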

On week 2 of our course, I had used this prompt on Midjourney:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic --ar 16:9 --c 25 --style raw --s 250 - @MAT 255 (fast)


In order to create the following images:

Image


I tried the same prompt in Stable Diffusion, without mentioning the aspect ratio, chaos, style, and seed factors in the prompt, since Stable Diffusion lets the user manipulate these parameters separately below the text prompt box. Even though there is a designated area for negative prompts, for my first trial I refrained from using any. This is what I put into the SDXL text prompt box:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic


SDXL Prompt 1:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic
Steps: 60, Sampler: LMS Karras, CFG scale: 7, Seed: 250, Size: 720x720, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.6, Version: v1.6.0
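
As an aside, the parameter line the WebUI attaches to each image (like the one above) can be kept track of programmatically. Here is a small hypothetical helper, assuming only the `key: value, key: value` format shown in this post; it is not an official API and would need more care for values that themselves contain commas:

```python
def parse_sd_metadata(line: str) -> dict:
    """Split a WebUI parameter line ('Steps: 60, Sampler: LMS Karras, ...')
    into a {key: value} dict; purely digit values become ints."""
    params = {}
    for chunk in line.split(","):
        key, _, value = chunk.partition(":")
        key, value = key.strip(), value.strip()
        if key and value:
            params[key] = int(value) if value.isdigit() else value
    return params

meta = parse_sd_metadata(
    "Steps: 60, Sampler: LMS Karras, CFG scale: 7, Seed: 250, Size: 720x720"
)
# meta["Steps"] -> 60; meta["Sampler"] -> "LMS Karras"; meta["Size"] -> "720x720"
```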

Image 1:


00131-250.png


Upscaled:


00132-250.png
00133-251.png
00134-252.png
00135-253.png


Georges Seurat (1859-1891) was a French painter who played a significant role in the development of the painting technique known as "pointillism." Pointillism involves creating pictures from tiny dots of colour placed closely together, which, when viewed from a distance, blend to form a complete and detailed image. My idea was to use the noise patterns as the Pointillist brush strokes made famous by Seurat. In this first attempt, however, the noise patterns turned out more prominent than I had expected, so the pointillist strokes seem overdone. Even so, they lend an overall dreamlike atmosphere to the picture.

The difference between these images and the ones created in Midjourney lies in how closely they follow Seurat's Pointillist brush strokes rather than emulating Vincent van Gogh's 'The Starry Night' (1889) (see, for instance, the V4 of the Midjourney image posted above). Even though the Midjourney images looked a bit more refined, they feel like photographs or digital drawings manipulated in Photoshop to achieve a pointillist effect, and in some instances it is hard to ascertain whether they were really following Seurat's style. The SDXL images, by contrast, seem to have the enigmatic quality and iconic ambiguity typical of a pointillist painting, in which pictures are built not from regular brushstrokes but from tiny, closely knit dots of paint placed right next to each other. Objects and figures are less clearly legible in pointillist works, and one needs to interpret them by observing closely. Even though the rendition was a bit too coarse and required refinement, I was happy that it was going in the direction I had hoped for. Furthermore, as with Midjourney's images, it remains uncertain whether the scene is underwater or whether the human figures can be read as students.
The sampling steps were set at 60, with the refiner checkpoint v1-5-pruned-emaonly.safetensors [6ce0161689] enabled and set to switch at 0.6.

Below is a screenshot of my SDXL parameters:

Prompt1.png

SDXL Prompt 2:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic
Steps: 60, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 250, Size: 720x720, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.6, Version: v1.6.0

Image 2:

00141-250.png

Upscaled:

00142-250.png
00143-251.png
00144-252.png
00145-253.png


The sampling steps and checkpoint were kept intact, but I changed the sampler to DPM++ 2M Karras. Quite strangely, the outcomes closely mirrored the results from prompt 1. This is likely a result of using the same seed.
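
This is consistent with how seeding works in general: the seed fixes the initial noise tensor, and the sampler then denoises that same starting point, so runs that share a seed begin from identical noise. A minimal sketch of the principle with NumPy's generator (the WebUI actually seeds a PyTorch RNG, but the behaviour is analogous):

```python
import numpy as np

def initial_latents(seed: int, shape=(4, 64, 64)) -> np.ndarray:
    """Draw the starting Gaussian noise for a generation from a fixed seed."""
    return np.random.default_rng(seed).standard_normal(shape)

a = initial_latents(250)   # run with sampler A
b = initial_latents(250)   # run with sampler B, same seed
c = initial_latents(251)   # neighbouring seed

same = np.array_equal(a, b)       # identical starting noise
different = np.array_equal(a, c)  # a new seed gives unrelated noise
```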


I have attached the screenshot of these parameters below:

Prompt 2.png


SDXL Prompt 3:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic
Steps: 100, Sampler: LMS, CFG scale: 7, Seed: 250, Size: 720x720, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires steps: 2, Hires upscaler: Latent, Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.5, Version: v1.6.0

image 3:


00146-250.png


Upscaled:


00147-250.png
00148-251.png
00149-252.png
00150-253.png

I increased the sampling steps to 100, changed the sampling method to LMS, and set the refiner to switch at 0.5. An upscaler was also added: I used the Latent hires upscaler with the hires steps set to 2 and the denoising strength at 0.7.
The resulting image was completely scrambled and over five times the size of the last one. I do not know if this would qualify as "total noise," but I definitely look forward to hearing how the rest of the class interprets this anomaly. Below is the screenshot, again, of the parameters used:
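
One possible reading of the anomaly, offered tentatively: Weihao's notes state that the actual number of denoising steps in an img2img-style pass is the configured steps times the denoising strength, and the hires pass here was given only 2 steps. Assuming that formula also applies to the hires pass, it would get roughly one usable step, far too few to clean up a latent-space upscale:

```python
def effective_steps(steps: int, denoising_strength: float) -> int:
    """Actual denoising steps = configured steps x denoising strength,
    per the course notes' description of img2img."""
    return int(steps * denoising_strength)

base = effective_steps(100, 0.7)   # the base pass: 70 effective steps
hires = effective_steps(2, 0.7)    # the hires pass: only 1 effective step
```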


Prompt 3.png


SDXL prompt 4:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic
Steps: 95, Sampler: DPM++ 2S a Karras, CFG scale: 7, Seed: 250, Size: 720x720, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.5, Hires upscale: 2, Hires steps: 2, Hires upscaler: LDSR, Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.5, Version: v1.6.0



image 4:

Image



Upscaled:

Image

Image

Image

Image


The sampling steps were decreased to 95, the sampling method changed to DPM++ 2S a Karras, and the refiner set to switch at 0.5. I used the LDSR upscaler this time; the hires steps were kept at 2 and the denoising strength maintained at 0.5.

This resulted in a more refined image, where the characters are clearly visible and, in some variations of the image, could be interpreted as "students." The pointillist style is consistent here and further refined, but one still cannot tell whether the scene is taking place underwater. Below is the screenshot of the parameters used:

Image

SDXL prompt 5:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic
Steps: 110, Sampler: DPM++ 2S a Karras, CFG scale: 7, Seed: 250, Size: 720x720, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.5, Hires upscale: 2, Hires steps: 2, Hires upscaler: LDSR, Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.5, Version: v1.6.0

image 5:


Image


Upscaled:

Image

Image

Image

Image


I only increased the sampling steps to 110; everything else was kept unchanged from the last attempt. We can see a clear resemblance to the pointillist style as the images get more refined. V4 looks like a group of young women, potentially students, in a swimming pool (although still not underwater). Below is the screenshot of the SDXL parameters:

Image


SDXL prompt 6:

A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic
Steps: 110, Sampler: Euler a, CFG scale: 7, Seed: 250, Size: 720x720, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.5, Hires upscale: 2, Hires steps: 2, Hires upscaler: LDSR, Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.5, Version: v1.6.0

image 6:

Image


Upscaled:


Image

Image

Image

Image


I changed the sampling method to Euler a, but every other parameter was left unchanged. The images look almost the same as the results I got from prompt 5. Even though the pointillist style remains intact, a noticeable degree of blur has been introduced: the dots of noise appear somewhat hazy, and the brush-stroke contours are smoother. Below is the screenshot of my SDXL parameters:
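
Several of the samplers tried above carry a "Karras" suffix, which refers to the noise-level (sigma) schedule from Karras et al. (2022) rather than to the solver itself. A minimal sketch of that schedule, assuming the standard rho = 7 formulation and illustrative sigma bounds:

```python
import numpy as np

def karras_sigmas(n: int, sigma_min=0.1, sigma_max=10.0, rho=7.0) -> np.ndarray:
    """Noise levels spaced so that sigma^(1/rho) is linear in the step index,
    which concentrates steps at low noise, where fine detail is resolved."""
    ramp = np.linspace(0.0, 1.0, n)
    inv_max = sigma_max ** (1.0 / rho)
    inv_min = sigma_min ** (1.0 / rho)
    return (inv_max + ramp * (inv_min - inv_max)) ** rho

sigmas = karras_sigmas(10)
# first value ~ sigma_max, last ~ sigma_min, strictly decreasing in between
```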

Image

While I haven't achieved results that precisely align with the text prompts, it's worth noting that when it comes to consistency and reducing randomness in the output, SDXL surpasses Midjourney considerably. Notably, I've observed that SDXL-produced images vary as parameters are adjusted, affecting not only resolution and clarity but also the pictorial elements (objects, figures) in the depicted image. For instance, because both "students" and "underwater" were part of my text prompt, some images started featuring younger characters near swimming pools. What's more significant here is that SDXL seems to take the human role in the creative process more seriously than Midjourney does. By enabling human creators to control almost every aspect of image creation, not just the text prompt, it allows the human agent to exert greater control over the generative AI processes, overcoming Midjourney's limitations. In Midjourney, result accuracy largely hinges on the precision of the text prompt, with no guarantee of consistent results when using the same text prompt with the same seed parameters each time. The CFG scale parameter in SDXL allows for more accurate image results by making the AI follow the text prompt as closely as possible (the higher the number on the scale, the closer it adheres). But since I got very consistent results that followed the prompt closely enough with the CFG value kept at 7 throughout, I did not change it this time; with more complex prompts in the near future, I definitely intend to experiment with it thoroughly. Although I haven't extensively explored SDXL's image-to-image capabilities, I am genuinely pleased with the increased consistency I've achieved using SDXL's text-to-image options and look forward to experimenting further with it.
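
For reference, the mechanism behind the CFG scale is simple to state: at every denoising step the model predicts the noise twice, once with the prompt and once without, and the scale extrapolates from the unconditional prediction toward the prompt-conditioned one. A minimal sketch of that classifier-free guidance combination (illustrative toy vectors, not real model outputs):

```python
import numpy as np

def cfg_combine(uncond, cond, scale: float):
    """Classifier-free guidance: move the noise prediction away from the
    unconditional output, in the direction of the prompt-conditioned one."""
    return uncond + scale * (cond - uncond)

uncond = np.zeros(4)   # toy stand-in for the unconditional noise prediction
cond = np.ones(4)      # toy stand-in for the prompt-conditioned prediction

neutral = cfg_combine(uncond, cond, 1.0)   # scale 1: just the conditional
default = cfg_combine(uncond, cond, 7.0)   # scale 7: the value I kept throughout
```

Higher scales push further along the prompt direction, which matches the guidance above that high CFG gives more "finished," prompt-faithful images while very low values drift toward abstraction.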
Attachments
00154-252.png
00155-253.png

bsierra
Posts: 8
Joined: Tue Oct 03, 2023 3:08 pm

Re: Project 5: Stable Diffusion 1

Post by bsierra » Tue Nov 07, 2023 1:46 am

Series 1

00121-820435165.png
grassy hills, blue cloudy sky, bright direct flash digital camera photography, fisheye perspective, model in foreground wearing rick owens balenciaga, motion blur, clipping glitch contortion, lsd dream emulator

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 820435165, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

00015-2718471316.png
grassy hills, blue cloudy sky, bright direct flash digital camera photography, fisheye perspective, model in foreground wearing rick owens balenciaga, motion blur, clipping glitch contortion, lsd dream emulator

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2718471316, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

00064-3932953839.png
grassy hills, blue cloudy sky, bright direct flash digital camera photography, fisheye perspective, model in foreground wearing rick owens balenciaga, motion blur, clipping glitch contortion, lsd dream emulator

Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3932953839, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

I used the same prompt from my first Midjourney series to create these images, now using Stable Diffusion. What I like about Stable Diffusion is the amount of chaos that can be created in the images you generate. I feel as though Stable Diffusion has fewer boundaries when it comes to contextualizing prompts, and the AI will attempt to amalgamate all of the text you throw at it. In doing so you can get crazy results, which I think lends itself well to the style of images I like to create. These images are a lot more coherent than the other series', as the backgrounds in this specific series are the focus of the images. As the backgrounds become simpler in my subsequent series', the subjects become the main interest. With this series, I'm trying to capture a liminal space as well as a vaporwave aesthetic, which comes through in the composition of the background.

Series 2

00109-2519971638.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2519971638, Size: 920x680, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.8, Version: v1.6.0

00115-3183714495.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3183714495, Size: 920x680, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Version: v1.6.0

00111-837905424.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 837905424, Size: 920x680, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.75, Version: v1.6.0

Image sample

IMG_0398.JPG

Img2Img is really fun to experiment with, both in Midjourney and Stable Diffusion. I enjoy the mixing of multiple images in Midjourney, but I also enjoy the stylistic output of the text prompts in Stable Diffusion. With this series, I used a photograph I took during the summer at an Airbnb in Lake Tahoe. With the photo I was attempting to capture a liminal space; however, the images generated using Stable Diffusion created lots of chaos. I'm unsure why so many subjects were generated in the image, but I love the sense of crowding and movement from the "motion blur" prompt. It's also really interesting to see how the AI was turning the image into a new one, as the composition of the original can still be made out in the generation. Changing the denoising setting was interesting as well: I tried to keep it at a middle ground so that the room remained somewhat present in my generations. I also really love the aspect ratio of these images; I feel as though a 1:1 ratio makes the image feel a bit suffocated. Another notable aspect is the composition of the people in these images. I like to think that Stable Diffusion creates images that look and feel like a cut-and-paste collage, and in this series that effect is definitely applied to the subjects.

Series 3

00132-3644795927.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3644795927, Size: 512x680, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.71, Version: v1.6.0

00170-1920692051.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized, empty room
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1920692051, Size: 936x680, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Version: v1.6.0

Image sample

IMG_0427.JPG
Screenshot 2023-11-07 014346.png

Again I utilize Img2Img, using another photo I took during the summer and an art piece a friend of mine painted last year. These images are a lot less chaotic; however, that stitched-together, cut-and-paste look of the subjects still remains. I love the contrast between a simple, empty background and the concentrated chaos of the subject. I feel like there is an interesting balance of digitization and the human form in the subjects, which is a relationship I wish to continue to explore.

Series 4

00211-687475643.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized, empty room
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 15.5, Seed: 687475643, Size: 936x792, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.5, Version: v1.6.0

00005-216998799.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized, empty room, rasquachismo art, virgin mary
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 15.5, Seed: 216998799, Size: 936x792, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.5, Version: v1.6.0

00032-3544260117.png
2010s tumblr bloghouse indie sleaze, 2013 instagram, intensely bright direct flash photography, model in foreground wearing rick owens balenciaga maison margiela, intense motion blur, fisheye perspective, crystal castles, colorized, empty room, rasquachismo art, virgin mary, angel saint, catholic
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 15.5, Seed: 3544260117, Size: 936x792, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.5, Version: v1.6.0

Image sample

Screenshot 2023-06-14 065042.png

Again another Img2Img series, using a photo of an old transistor from a previous class project. I enjoyed the 2D nature that the sample image was giving me, and wanted to explore that further mixing in the same prompts I had been using in the previous series'. I added prompts like "rasquachismo art" and "catholic" to try to capture a Mexican art form that utilizes readily available resources to create pieces of art. In addition to being religious objects, the pieces used in this type of art also represent Mexican culture in a way that illustrates resilience and resourcefulness. I feel I could have done a better job in visualizing this, and I will definitely keep trying to do so. "Rasquachismo art" is not a prompt Stable Diffusion can really replicate, so I'll have to create a workaround using other text prompts.

Post Reply