Project 6: Stable Diffusion 2: Comparison

glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

Project 6: Stable Diffusion 2: Comparison

Post by glegrady » Sat Oct 21, 2023 10:34 am

Second project working with Stable Diffusion. This time we want to identify similarities and differences by comparing results from MidJourney and Stable Diffusion using the same prompts. Ideally you will have 5 different comparisons to share. Your approach can be to use the same prompt or different prompts, but the important thing is to get results that will inform you (and us) as to how the two programs perform.

Here is the introduction to read: https://stability.ai/blog/stable-diffusion-v2-release

Further reading:

https://github.com/Stability-AI/StableDiffusion
https://techcrunch.com/2023/10/27/a-gro ... ecting-ai/
George Legrady
legrady@mat.ucsb.edu

pratyush
Posts: 9
Joined: Wed Oct 04, 2023 9:27 am

Re: Project 6: Stable Diffusion 2: Comparison

Post by pratyush » Wed Nov 08, 2023 11:24 pm

For this week's assignment, I opted to conduct a comparative analysis between two prompts employed on both Midjourney and Stable Diffusion XL v.1.6. My focus was particularly on exploring the image-to-image option of SDXL, aiming to discern how the same parameter within the program compares to that of Midjourney. Despite maintaining the resolution at 960x540, as suggested in the previous week, the resulting image files remained unexpectedly large, so I had to compress them in Photoshop before they could be uploaded here.

The primary objective for this week was to experiment with the "Scripts" option on SDXL, aiming to generate a series of matrices plotted along the X/Y/Z axes. The goal was to comprehensively explore and compare variations in the scales of specific parameters (such as sampling steps, seed, CFG scale, Denoiser, etc.) as the AI created images. The intention was to observe how these variations influenced the image output with each incremental or decremental change to the settings, employing different configurations on each occasion.
The series of six images presented below illustrate the X/Y/Z plotting of the same parameters for two distinct images and prompts. For images 1, 2, and 3, I utilized a screening poster or flyer that I designed for a recent documentary film as the image prompt. For the remaining three images, I employed a selfie as the prompt. Given Midjourney's focus on text prompts, the prompts used for that platform naturally varied, incorporating aspect ratios, chaos values, and seed values. The details of my study are outlined below.
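To make the size of such a matrix concrete, here is a minimal Python sketch of how a three-axis plot enumerates its cells. It is an illustration only, not the web UI's own script: the axis values are the ones listed for Prompt 1 below, while the helper name `grid_cells` and the loop order (one full X/Y sheet per Z value) are assumptions made for the example.

```python
from itertools import product

# Axis values copied from the X/Y/Z plot settings listed for Prompt 1.
x_steps = [10, 20, 40, 60, 80]
y_cfg = [2.0, 7.0, 10.0, 20.0, 30.0]
z_denoise = [0.35, 0.75, 0.80, 0.95, 1.0]

def grid_cells(xs, ys, zs):
    """Return every (x, y, z) combination; one image is rendered per cell."""
    # z is the outermost loop here, so each z value yields one full X/Y sheet.
    return [(x, y, z) for z, y, x in product(zs, ys, xs)]

cells = grid_cells(x_steps, y_cfg, z_denoise)
print(len(cells))   # 125
print(cells[0])     # (10, 2.0, 0.35)
```

With five values on each axis, a single run produces 125 images, which may go some way toward explaining the large file sizes mentioned above.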


Case Study 1:

Image Prompt:
Screenshot 2023-11-09 at 13.08.13.png
On Midjourney:

Prompt A:

https://s.mj.run/VAeQcfNrR_w as cinematic movie poster, use dramatic grunge style, propaganda, abstract, --ar 9:16 --c 10 --no change in text --style raw --s 250 - @MAT 255 (fast)

Image A:
Screenshot 2023-11-09 at 13.17.05.png
On SDXL:

Prompt 1:

as cinematic movie poster, use dramatic grunge style, propaganda, abstract
Negative prompt: change in text
Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 2.0, Seed: 250, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.35, Script: X/Y/Z plot, X Type: Steps, X Values: "10,20,40,60,80", Y Type: CFG Scale, Y Values: "2.0,7.0,10.0,20.0,30.0", Z Type: Denoising, Z Values: "0.35,0.75,0.80,0.95,1", Version: v1.6.0

Saved: 00178-250.png

Image 1:

A00186-250@0.33x.png
Prompt 2:

as cinematic movie poster, use dramatic grunge style, propaganda, abstract
Negative prompt: change in text
Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 2.0, Seed: 250, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Script: X/Y/Z plot, X Type: Steps, X Values: "10,20,40,60,80", Y Type: CFG Scale, Y Values: "2.0,7.0,10.0,20.0,30.0", Z Type: Sampler, Z Values: "DPM++ 2M Karras,DPM++ 2M SDE Karras,Euler a,DPM++ 3M SDE Karras,LMS Karras", Version: v1.6.0

Saved: 00180-250.png


Image 2:

A00178-250@0.33x.png


Prompt 3:

as cinematic movie poster, use dramatic grunge style, propaganda, abstract
Negative prompt: change in text
Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 2.0, Seed: 3313367218, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Script: X/Y/Z plot, X Type: Steps, X Values: "10,20,40,60,80", Y Type: CFG Scale, Y Values: "2.0,7.0,10.0,20.0,30.0", Z Type: Seed, Z Values: "-1,10,50,100,250,500", Fixed Z Values: "3313367218, 10, 50, 100, 250, 500", Version: v1.6.0

Saved: 00181-1428166616.png


Image 3:

A00181-1428166616@0.33x.png


Case Study 2:



Image Prompt:

Screenshot 2023-11-09 at 13.25.47.png


On Midjourney:



Prompt B:

https://s.mj.run/NR3HEpg14cw as Dali's clockwork --ar 16:9 --c 10 --s 250 --no clocks --no watches --style raw - @MAT 255 (fast)


Image B:
Screenshot 2023-11-09 at 13.22.26.png

On SDXL:

Prompt 4:



as Dali's clockwork
Negative prompt: clocks, watches
Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 2.0, Seed: 1655254073, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Script: X/Y/Z plot, X Type: Steps, X Values: "10,20,40,60,80", Y Type: CFG Scale, Y Values: "2.0,7.0,10.0,20.0,30.0", Z Type: Seed, Z Values: "-1,10,50,100,250,500", Fixed Z Values: "1655254073, 10, 50, 100, 250, 500", Version: v1.6.0

Saved: 00183-1655254073.png


Image 4:

A00183-1655254073@0.33x.png

Prompt 5:

as Dali's clockwork
Negative prompt: clocks, watches
Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 2.0, Seed: 250, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.35, Script: X/Y/Z plot, X Type: Steps, X Values: "10,20,40,60,80", Y Type: CFG Scale, Y Values: "2.0,7.0,10.0,20.0,30.0", Z Type: Denoising, Z Values: "0.35,0.75,0.80,0.95,1", Version: v1.6.0

Saved: 00185-250.png


Image 5:


A00183-1655254073@0.33x.png


Prompt 6:

as Dali's clockwork
Negative prompt: clocks, watches
Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 2.0, Seed: 250, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Refiner: v1-5-pruned-emaonly [6ce0161689], Refiner switch at: 0.8, Script: X/Y/Z plot, X Type: Steps, X Values: "10,20,40,60,80", Y Type: CFG Scale, Y Values: "2.0,7.0,10.0,20.0,30.0", Z Type: Sampler, Z Values: "DPM++ 2M Karras,DPM++ 2M SDE Karras,Euler a,DPM++ 3M SDE Karras,LMS Karras", Version: v1.6.0

Saved: 00186-250.png


Image 6:


A00185-250@0.33x.png
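The parameter lines recorded with each SDXL prompt above share a "Key: value" layout, with quoted values where a value itself contains commas. For comparing runs side by side, a line like that can be split into a dictionary. The following is a rough sketch under that assumption; `parse_params` and the regular expression are hypothetical helpers written for this example, not part of the web UI, and the sample line is shortened from Prompt 1.

```python
import re

# One "Key: value" pair; a value may be double-quoted so it can contain commas.
PAIR = re.compile(r'\s*(\w[\w \-/]*):\s*("(?:\\.|[^\\"])*"|[^,]*)(?:,|$)')

def parse_params(line):
    """Split a parameter line into a dict of strings (hypothetical helper)."""
    params = {}
    for key, value in PAIR.findall(line):
        # Drop surrounding whitespace and the optional double quotes.
        params[key] = value.strip().strip('"')
    return params

line = ('Steps: 10, Sampler: DPM++ 2M Karras, CFG scale: 2.0, Seed: 250, '
        'Size: 960x540, Denoising strength: 0.35, Script: X/Y/Z plot, '
        'X Type: Steps, X Values: "10,20,40,60,80", Version: v1.6.0')

params = parse_params(line)
print(params["Sampler"])                # DPM++ 2M Karras
print(params["X Values"].split(","))    # ['10', '20', '40', '60', '80']
```

Once two parameter lines are parsed this way, the keys whose values differ can be listed directly, which makes it easier to see which setting changed between two matrices.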

Interestingly enough, even though it is notably more challenging to exert control over the image outcomes on Midjourney, my encounter with SDXL this time around was considerably more peculiar when comparing the performances of the two. In the case of Case Study 1, the results on Midjourney, while quite distant from my expectations, were not entirely unforeseeable. Conversely, on SDXL, the same image and text prompt exhibited consistency initially, adhering to Prompt 1, but eventually devolved into complete randomness as the parameters on the X/Y/Z plots underwent continuous changes.
Understandably, this deviation may be attributed to my use of the seed value as a variable type factor on the Z axis for two specific prompts (prompts 3 and 4). However, regardless of the prompt, randomness emerged as an issue that couldn't be entirely circumvented for SDXL. Only in the case of Prompt 1 did the AI maintain consistency in emulating the original image produced, although this wasn't precisely my goal, as I intended for the AI to enhance the image. Intriguingly, Prompt 5 mimicked my selfie almost exactly as it is, continually aging it with each step on the matrix.
For both of these prompts, the Z type was set to Denoising, and it appeared that at the lowest value on the Denoising scale (0.35), the emulation remained more or less consistent throughout the matrix. Randomness infiltrated the results at the very next value on the Denoising scale (0.75). Surprisingly, it seemed that the Denoising value played a more decisive role in maintaining image consistency with the prompt than the CFG scale value. This observation appeared peculiar, as according to the article I consulted below, it is the CFG scale that determines how closely the results align with the image or text prompt. Further experiments will be conducted, exploring both values in greater detail in my future tests.
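One possible reading of this result, offered as an assumption rather than a description of the web UI's internals: in image-to-image mode, the denoising strength sets how much of the source image is replaced by noise before sampling starts, while the CFG scale only weights the text prompt during sampling. The sketch below uses the convention found in common open-source img2img implementations, where roughly steps times strength sampling steps are actually run, together with the standard classifier-free guidance formula. Both function names are hypothetical.

```python
def effective_steps(steps, denoising_strength):
    """Sampling steps actually run on the noised source image (assumed rule)."""
    return min(int(steps * denoising_strength), steps)

def cfg_combine(uncond, cond, cfg_scale):
    """Classifier-free guidance: push the prediction toward the text prompt."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]

# Low strength: few steps run, so most of the source image survives.
print(effective_steps(10, 0.35))   # 3
# Higher strength: far more of the image is regenerated from noise.
print(effective_steps(80, 0.75))   # 60

# CFG scales the text-conditioned direction; it does not involve the source image.
print(cfg_combine([0.0, 1.0], [1.0, 1.0], 7.0))   # [7.0, 1.0]
```

Under these assumptions, it would be the denoising strength rather than the CFG scale that decides how much of the poster or selfie remains recognizable, which is consistent with what the matrices showed.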

Article on SDXL's CFG Scale:

https://decentralizedcreator.com/cfg-sc ... to-use-it/
Attachments
poster.png
A00180-250@0.33x.png
Last edited by pratyush on Thu Nov 09, 2023 1:28 pm, edited 2 times in total.

gracefeng
Posts: 8
Joined: Tue Oct 03, 2023 1:12 pm

Re: Project 6: Stable Diffusion 2: Comparison

Post by gracefeng » Thu Nov 09, 2023 11:37 am

Prompt: frutiger metro, collage, mixed media, maximalism, dreamscape, fuzzy camera, realism, digitalism, surrealism, randomness
img2img, Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7.0, Seed: -1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: None
Reference Image:
Pic2.png
Generated Image:
image (16).png
Prompt: frutiger metro, collage, mixed media, maximalism, dreamscape, fuzzy camera, realism, digitalism, surrealism, randomness
img2img, Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7.0, Seed: -1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: None
Reference Image:
Pic7.png
Generated Image:
image (17).png
Prompt: frutiger metro, mixed media, focus on the cat, clear composition, maximalism, dreamscape, fuzzy camera, realism, digitalism, surrealism
img2img, Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 7.0, Seed: -1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: None
Reference Image:
Pic7.png
Generated Image:
image (18).png
Prompt: frutiger metro, keep original elements, focus, mixed media, maximalism, dreamscape, fuzzy camera, realism, digitalism, surrealism
img2img, Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 7.0, Seed: -1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: None
Reference Image:
Pic2.png
Generated Image:
image (20).png
Prompt: frutiger metro, eyeball, mixed media, maximalism, dreamscape, fuzzy camera, realism, digitalism, surrealism
img2img, Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 7.0, Seed: -1, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: None
Reference Image:
Pic2.png
Generated Image:
image (19).png

luischavezcarrillo
Posts: 8
Joined: Thu Oct 05, 2023 2:48 pm

Re: Project 6: Stable Diffusion 2: Comparison

Post by luischavezcarrillo » Thu Nov 09, 2023 12:49 pm

In each of the following series, the images are ordered by how much I intervened: first default settings (pure text prompt, no modifications), then attempts at more varied results, and finally drastic attempts at different results. Each image is paired with the corresponding image from the other bot: minimal-influence (default settings) images are put together, mild attempts at variation are put together, and drastic attempts at variation are put together, for easy comparison of what each bot's parameters can do.
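The three-tier structure described above can be sketched in a few lines of Python. This is only an illustration of how the Midjourney prompts in this post are built up; the flags are copied from the prompts listed in the series, while `TIER_FLAGS` and `midjourney_tiers` are hypothetical names chosen for the example.

```python
# Flags copied from the Midjourney prompts in this post; each tier adds one.
TIER_FLAGS = [
    "--style raw --s 250",                       # minimal user influence
    "--style raw --s 250 --c 100",               # mild attempt at variation
    "--style raw --s 250 --c 100 --weird 3000",  # drastic attempt at variation
]

def midjourney_tiers(base_prompt):
    """Return the three Midjourney prompts for one series (hypothetical helper)."""
    return [f"{base_prompt} {flags}" for flags in TIER_FLAGS]

base = "1973 Dodge Challenger speeding down a highway being chased by a donut policeman"
for prompt in midjourney_tiers(base):
    print(prompt)
```

Keeping the base text fixed and changing only the flags means that any difference between tiers can be attributed to the chaos and weird parameters rather than to the wording.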



Series 1/3

SD: "Soulless Corporate Worker Staring at the camera, professional camera, bleak background, sad mood"
Default SD settings with Batch Size 4 (used for all non XYZ plot images)
MJ: "Soulless Corporate Worker Staring at the camera, professional camera, bleak background, sad mood --style raw --s 250"
p6SD1.png
p6MJ1.png
SD: "Soulless Corporate Worker Staring at the camera, professional camera, bleak background, sad mood"
Refiner Checkpoints used (for all XYZ plots)
MJ: "Soulless Corporate Worker Staring at the camera, professional camera, bleak background, sad mood --style raw --s 250 --c 100"
p6SD2.png
p6MJ2.png
SD: "Soulless Corporate Worker Staring at the camera, professional camera, bleak background, sad mood"
MJ: "Soulless Corporate Worker Staring at the camera, professional camera, bleak background, sad mood --style raw --s 250 --c 100 --weird 3000"
p6SD3.png
p6MJ3.png
Series 2/3

SD: "1973 Dodge Challenger speeding down a highway being chased by a donut policeman"
MJ: "1973 Dodge Challenger speeding down a highway being chased by a donut policeman --style raw --s 250"
p6SD4.png
p6MJ4.png
SD: "1973 Dodge Challenger speeding down a highway being chased by a donut policeman"
MJ: "1973 Dodge Challenger speeding down a highway being chased by a donut policeman --style raw --s 250 --c 100"
p6SD5.png
p6MJ5.png
SD: "1973 Dodge Challenger speeding down a highway being chased by a donut policeman"
MJ: "1973 Dodge Challenger speeding down a highway being chased by a donut policeman --style raw --s 250 --c 100 --weird 3000"
p6SD6.png
p6MJ6.png
Series 3/3

SD: "cyberpunk city in a pouring acid rain, red skies, cloudy, evil, dreadful atmosphere, people running in fear"
MJ: "cyberpunk city in a pouring acid rain, red skies, cloudy, evil, dreadful atmosphere, people running in fear --style raw --s 250"
p6sd7.png
p6MJ7.png
SD: "cyberpunk city in a pouring acid rain, red skies, cloudy, evil, dreadful atmosphere, people running in fear"
MJ: "cyberpunk city in a pouring acid rain, red skies, cloudy, evil, dreadful atmosphere, people running in fear --style raw --s 250 --c 100"
p6SD8.png
p6MJ8.png
SD: "cyberpunk city in a pouring acid rain, red skies, cloudy, evil, dreadful atmosphere, people running in fear"
MJ: "cyberpunk city in a pouring acid rain, red skies, cloudy, evil, dreadful atmosphere, people running in fear --style raw --s 250 --c 100 --weird 3000"
p6SD9.png
p6MJ9.png
Observations: Refiner Checkpoint is a very good parameter for SD for making images, especially faces, less distorted, as we can see in Series 1. It also makes CFG 0.5 more coherent, while preserving its ability to deviate heavily from the original prompt. Midjourney also produces more wild results when using both weird and chaos at the same time. However, it still produces images similar to things that seem "correct" (versus SD's images that require a bit more thought and time to fully interpret and decide what they show).

colindunne
Posts: 7
Joined: Tue Oct 03, 2023 1:09 pm

Re: Project 6: Stable Diffusion 2: Comparison

Post by colindunne » Mon Nov 13, 2023 6:38 pm

Bear Tea Party

Image
Midjourney
rough sketch of bears drinking tea, tea party, black and white, graphite --ar 16:9 --style raw --s 250

Image
Stable Diffusion
rough sketch of bears drinking tea, tea party, black and white, graphite
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3960230014, Size: 970x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0

Continuing off of the earlier project, in this comparison test I used the same text prompts for both and didn't adjust extra parameters. With just a handful of regenerations, I picked out the result that looked closest to the general style and concept I was imagining of a traditional graphite drawing style of bears drinking tea at a table. The notable differences I found were that Midjourney's extreme emphasis and push of its own style consistently shine through, oftentimes producing results that ignore numerous areas of the prompt. This is seen in the example above where even though it is the closest to what I was envisioning, it fails to hold the style I was looking for and uses a completely different medium than a graphite sketch. Meanwhile, Stable Diffusion produced results very early on that appeared much more similar to what I was looking for, but at the cost of a less polished result. The image above adheres to the prompt well, but the reoccurring visual errors are seen in the repetition of elements such as texture and the bear's head. I believe that the contrast in the way the systems both handle the text prompt differently in terms of weight and the results they produce is heavily related to us as course participants finding a much stronger feeling of control with Stable Diffusion over Midjourney.

Image
Mixing Results - Stable Diffusion using img2img with Midjourney image above
rough sketch of bears drinking tea, tea party, black and white, graphite
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3839878858, Size: 1456x816, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Version: v1.6.0

After reflecting on the strengths and weaknesses of both systems, I was curious about using them as references and tools for each other. I used the more visually polished and compositionally sound Midjourney result as an image reference for Stable Diffusion's "img2img" feature while still using the same text prompt. At first I was getting unsatisfactory results where Midjourney was heavily repeating elements, more than before. This is where I began to realize and better understand what was mentioned in class about the strong influence of the size parameter. After adjusting it appropriately, I began to receive much more satisfactory results, and what is seen above is my favorite. I believe working between both Midjourney and Stable Diffusion can not only produce interesting results but can also be used more confidently as a tool to create a visual much closer to what is desired or imagined.
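A small sketch of one check that can help with the size parameter, offered as an assumption about what adjusting the size might involve and not as a record of what was done here: compare the aspect ratio of the source image with the target size before running img2img. The two sizes are ones that appear in the parameter lines of this post; `aspect_ratio` is a hypothetical helper.

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce a pixel size to its simplest width:height ratio."""
    d = gcd(width, height)
    return (width // d, height // d)

print(aspect_ratio(960, 540))    # (16, 9)
print(aspect_ratio(1456, 816))   # (91, 51)
```

The 1456x816 size reduces to 91:51, which is close to but not exactly 16:9, so generating at 960x540 from that source would involve a slight rescale. Whether that difference matters in practice is not something this sketch settles.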


1971 Chevy Impala Coast Driving

Image
Midjourney
1971 baby blue chevy impala driving on coast, pastel colors, Low-Angle Shot, US-101, palm trees --ar 16:9 --style raw --s 250

Image
Stable Diffusion
1971 baby blue chevy impala driving on coast, pastel colors, Low-Angle Shot, US-101, palm trees
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 2395947976, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0

Image
For presentation, here's a photo of what a 1971 Chevy Impala looks like.
https://www.google.com/url?sa=i&url=htt ... AdAAAAABAE

For this comparison I continued using the same prompt for both systems, but this time adjusted Stable Diffusion's sampling method after getting an initial result that looked more similar to what I wanted. Midjourney produced clean results with little to no errors in the structure of the car or other elements. This appears to be largely due to the imposed style Midjourney has, which also proved to be the likely reason it frequently failed to apply the requested baby blue color to the car. On the other hand, Stable Diffusion often produced broken renderings of the car, giving it seven wheels or merging multiple angles of the same car into one. However, it did not once fail to take in all requested elements of the prompt such as the car, color, and location. It struggled with the camera angle, but this portion of the prompt was based on Midjourney and what I read here, and I was curious whether Stable Diffusion would respond to the same wording:
https://www.reddit.com/r/midjourney/com ... ra_angles/

I wound up changing one parameter for Stable Diffusion (the sampling method) because of its continued broken results of the car. The result then became very quickly exactly what I was looking for, which is why I did not employ the image-to-image process here like I did above in my first comparison.


Stylized UCSB Storke Tower Sunset

Image
Stable Diffusion
balcony view of ucsb storke tower, sunset, oil painting
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3634639295, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0

Image
Stable Diffusion
balcony view of ucsb storke tower, sunset, oil painting
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 550334784, Size: 960x540, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0

Image
Midjourney
balcony view of ucsb storke tower, sunset, oil painting --ar 16:9 --style raw --s 250

As I write this I was fortunate enough to see the sunset with a view of Storke Tower! With this inspiration, I was curious how both systems would handle a more niche prompt and how they would once again handle style. Here, the observations from the last two comparisons were repeated and incredibly apparent, with Midjourney's imposed style completely diverging from the requested prompt and Stable Diffusion adhering to the prompt much more strongly. One large difference from the previous comparisons was that Stable Diffusion didn't struggle at all with producing clear and unmorphed elements. Neither system produced results that looked like Storke Tower, but Stable Diffusion was consistently somewhat closer than Midjourney.


US Nerf Soldier

Image
Stable Diffusion
us soldier with toy nerf gun, close
Steps: 20, Sampler: PLMS, CFG scale: 7, Seed: 1084451030, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

Image
Midjourney
us soldier with toy nerf gun, close --ar 16:9 --style raw --s 250

Image
Midjourney
us soldier with toy nerf gun, close --ar 16:9 --style raw --s 250

This comparison used a Stable Diffusion prompt I made in a previous project. The prompt is intended as a satire in which the Nerf brand of kids' toy guns is put in place of the actual weapons someone like a US soldier would use. The complete difference in results was shocking. Stable Diffusion interpreted the prompt as actual adult US soldiers, while Midjourney showed results of smiling child soldiers holding toy guns. The toy guns Midjourney produces are significantly more accurate and more like toy Nerf guns than Stable Diffusion's results, which makes me infer that the intent to commercialize this technology has something to do with this extreme difference in prompt interpretation and result.


Paper Airplane Traffic

Image
Stable Diffusion
paper airplane traffic, sky, traffic, intersection, blue, clouds
Negative prompt: cars, street, ground
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1430695884, Size: 960x540, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Version: v1.6.0

Image
Midjourney
paper airplane traffic, sky, traffic, intersection, blue, clouds --ar 16:9 --style raw --s 250

Image
Midjourney
paper airplane traffic, sky, traffic, intersection, blue, clouds --ar 16:9 --no cars, street, ground --style raw --s 250

Image
Midjourney
paper airplane traffic, sky, traffic, intersection, blue, clouds --ar 16:9 --no cars, street, ground, airplane --style raw --s 250

Image
Midjourney
paper airplane traffic, sky, traffic, intersection, blue, clouds --ar 16:9 --no cars, street, ground, airplane --chaos 50 --weird 10 --style raw --s 250

Note: in this comparison, all the Midjourney prompts bold the change or difference from the previous Midjourney prompt above it.

In this fifth and final comparison, I once again sourced a Stable Diffusion result from an earlier project to see how Midjourney would handle the same request. I chose this particular image because I believed that Midjourney's imposed style could be more beneficial in this instance to get a stronger result. I first tested this with the same text prompt but without the "negative" prompt elements to remove from the results. Interestingly, Midjourney also produced a result that included a road, which was a recurring element in my Stable Diffusion results that eventually led to me including the negative prompt. I then adjusted the Midjourney prompt to include the same negative prompt with the "--no" parameter. Because the results produced a bunch of airplanes, I then included "airplane" in the negative parameter, which began to distance the prompt from the original Stable Diffusion prompt. My final Midjourney prompt test included the "weird" and "chaos" parameters, as I found them useful in Midjourney but hadn't used them yet in my other comparisons. As I was hoping, I believe the last two prompt results ultimately showed the potential strength that Midjourney's imposed style can give in offering structure where Stable Diffusion may struggle or fall short.
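The translation of a Stable Diffusion negative prompt into Midjourney's "--no" parameter, as described above, can be sketched as a small helper. This is a minimal illustration built only from the flags that appear in the prompts of this comparison; `to_midjourney` and its default arguments are hypothetical and not part of any Midjourney tooling.

```python
def to_midjourney(prompt, negative=None, ar="16:9", tail="--style raw --s 250"):
    """Build a Midjourney prompt string from a text prompt and optional negatives."""
    parts = [prompt, f"--ar {ar}"]
    if negative:
        # Excluded terms go after --no as a comma-separated list.
        parts.append("--no " + ", ".join(negative))
    parts.append(tail)
    return " ".join(parts)

prompt = "paper airplane traffic, sky, traffic, intersection, blue, clouds"
print(to_midjourney(prompt))
print(to_midjourney(prompt, negative=["cars", "street", "ground"]))
```

The two printed strings correspond to the first and second Midjourney prompts in this comparison; adding "airplane" to the `negative` list gives the third.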

autumnsmith
Posts: 10
Joined: Tue Oct 03, 2023 1:08 pm

Re: Project 6: Stable Diffusion 2: Comparison

Post by autumnsmith » Mon Nov 13, 2023 11:36 pm

1A. Stable Diffusion
00097-376395918.png
3D blender file of four blue spheres and eight warm toned torus' in a 3D room, reflective light, spheres recedeing into space at different levels, the torus' varying in size and distance, dramatic spotlights, sharp, chunky graphics, smooth surfaced objects

Negative prompt: humans, grey, discoball, fracturing, white
Steps: 5, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 376395918, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Steps, X Values: "5, 10, 20", Y Type: CFG Scale, Y Values: "1.0, 3.0, 15.0, 120", Version: v1.6.0
1a.png

1B. Midjourney
3D blender file of four blue spheres and eight warm toned torus' in a 3D room, reflective light, spheres recedeing into space at different levels, the torus' varying in size and distance, dramatic spotlights, sharp, chunky graphics, smooth surfaced objects --ar 16:9 --style raw --s 250 -
1B.png

For the initial comparison, I wanted to use the same prompt within Stable Diffusion and Midjourney to see how the programs differed in a side-by-side comparison. I took the same prompt from a previous Stable Diffusion set of images and applied as much of the original text as possible in Midjourney. We can see apparent stylistic and tonal differences between the two images, as well as differences in how each AI understood the prompt. On one hand, we see that Stable Diffusion is better at abstracting the image and creating a larger set of approaches or variations for a given prompt. Midjourney stayed within a similar visual range for this prompt; even when I tried to add more chaos to the prompt, the iterations were difficult to push.



2A. Midjourney
cute claymation cartoon clay sculpture of girl becoming her enviornment in a rugged, metal, industrial junk yard, futurism, weird lighting, mist --ar 16:9 --stylize 50 --style raw - Image #4
2a.png

2B. Stable Diffusion
cute claymation cartoon clay sculpture of a girl becoming her environment in a rugged, metal, industrial junkyard, futurism, weird lighting, mist
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 15, Seed: 2174031945, Size: 800x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0
Saved: 00526-2174031945.png
2B.png

2C. Stable Diffusion
cute claymation cartoon clay sculpture of a girl becoming her environment in a rugged, metal, industrial junkyard, futurism, weird lighting, mist, whimsical, post-apocalyptic urban utopia, inventor
Negative prompt: scary, blue, cool colors, fear, scared expression
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 15, Seed: 130743858, Size: 800x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0
Saved: 00531-130743858.png
2C.png

This second set of three images began with a Midjourney prompt and its generated images. Then I took the successful Midjourney prompt and applied it to Stable Diffusion without adding any negatives. From there, I made adjustments to the prompt to correct some of the visual components that Stable Diffusion came up with and to bring it more in line with what I was searching for. Overall, based on the set of three images, even after making adjustments and adding negatives to the prompt, I felt that Stable Diffusion was not able to get nearly as close as Midjourney was, given the same set of information. One of the things I was most intrigued by during this stepping process was the way the different AIs approached the prompts. For the animated figure I was requesting, for example, Stable Diffusion's initial set of four images took a slightly horrific or scary turn, whereas Midjourney created a warmer, more welcoming environment. In the 2C set of images from this prompt, the environment almost looks like a miniature, or as if the figure is placed on something like a soundboard. This differs from the Midjourney result, which did a better job of placing the character within the environment. The other thing I would like to note is that the figure appears to be almost afraid for its life in 2C, in images 1 and 4.



3A. Stable Diffusion
comic book scene strip based on 1950s style hand drawn illustrations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters
Negative prompt: abstract lines, no adults
Steps: 50, Sampler: DPM++ 2M Karras, CFG scale: 15, Seed: 4087678425, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0
3A.png

3B. Midjourney
comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters Negativeabstract lines, no adults Steps--ar 16:9 : 50, Sampler: DPM++ 2M Karras, CFG scale: 15, Seed: 4087678425, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0 --style raw --s 250 -

3B.png

3C. Midjourney
comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters Negativeabstract lines, no adults Steps --ar 16:9 --style raw --s 250 -

3C.png

3D. Midjourney
https://cdn.discordapp.com/attachments/ ... 681f7a9e02& :: https://cdn.discordapp.com/attachments/ ... 655cd92fe6& :: comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters Negative abstract lines, no adults Steps --ar 16:9
https://cdn.discordapp.com/attachments/ ... 90f196& -

https://cdn.discordapp.com/attachments/ ... 6270ce29&
3D.png

In this third set of images, I kept pushing this idea of comic book style, narration, and sequencing of events. I took the initial image from Stable Diffusion that we looked at last week as my starting point. From there, I took the same prompt and applied it to Midjourney. Upon doing this, I felt that this was the closest I had seen either AI program get to creating a narrative or sequence that makes sense. After this step, I tried to refine the prompt and the visual proportions. From there, I took the original Stable Diffusion file and the Midjourney file that I felt were the closest, and put them into Midjourney with the prompt to try to create a storytelling sequence. I felt that this was relatively successful in terms of the elements that I requested, the style, and the text. Although it doesn’t include all of the components that I was looking for, like the balloon, it includes a girl and a dog with a story that moves or unfolds and a style close to what I was hoping for.



4A. Stable Diffusion
Rainy day in Paris, man standing on corner holding a violin and a bottle of wine, with a dalmatian by his side, at night outside of a well lit club with flashing lights. Hyperrealistic, close up scene, background blurring out. Foreground and background cohesively integrated into the same style.

Negative prompt: frontal view of humans
Steps: 48, Sampler: DPM++ 2M Karras, CFG scale: 1.0, Seed: 5, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Script: X/Y/Z plot, X Type: Seed, X Values: "5, 10, 100 ", Fixed X Values: "5, 10, 100", Y Type: CFG Scale, Y Values: "1.0, 3.0, 5.0, 10.0", Version: v1.6.0
4A.png
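
The X/Y/Z plot settings in the parameter line above sweep three seed values against four CFG scale values, which works out to 3 x 4 = 12 images in the grid. A minimal Python sketch of that enumeration follows. The function name `xyz_grid` and the layout (X values across columns, Y values down rows) are assumptions made for illustration and are not taken from the web UI's own code.

```python
from itertools import product

# Values copied from the parameter line above.
seeds = [5, 10, 100]                 # X Type: Seed
cfg_scales = [1.0, 3.0, 5.0, 10.0]   # Y Type: CFG Scale


def xyz_grid(x_values, y_values):
    """Return one (row, col, x, y) tuple per grid cell.

    Assumed layout: Y values run down the rows, X values run across the columns.
    """
    cells = []
    for (row, y), (col, x) in product(enumerate(y_values), enumerate(x_values)):
        cells.append((row, col, x, y))
    return cells


grid = xyz_grid(seeds, cfg_scales)
for row, col, seed, cfg in grid:
    print(f"row {row}, col {col}: seed={seed}, cfg_scale={cfg}")
```

Reading the plot this way makes it easier to tell whether a change between two cells comes from the seed (moving sideways) or from the CFG scale (moving up or down).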

4B. Midjourney
Rainy day in Paris, man standing on corner holding a violin and a bottle of wine, with a dalmatian by his side, at night outside of a well lit club with flashing lights. Hyperrealistic, close up scene, background blurring out. Foreground and background cohesively integrated into the same style. Negativefrontal view of humans --ar 16:9 --style raw --s 250 - https://cdn.discordapp.com/attachments/ ... d4d24e9c&
4B.png

Of all the prompts, this one produced the most realistically rendered set of images. Again, I took a successful visualization from Stable Diffusion and put its prompt into Midjourney. As a starting point, Midjourney was much closer to what I was looking for and better at rendering the characters requested. Stable Diffusion continues to struggle with figures, which tend to come out with an uncanny visual twist. Midjourney did a relatively good job of combining all the components I asked for in the prompt, but image #2 has a strange element: the dog's paw becomes a hand with human-like fingers and takes over playing the instrument.
Last edited by autumnsmith on Tue Nov 14, 2023 12:44 pm, edited 4 times in total.

autumnsmith
Posts: 10
Joined: Tue Oct 03, 2023 1:08 pm

Re: Project 6: Stable Diffusion 2: Comparison

Post by autumnsmith » Mon Nov 13, 2023 11:49 pm

5A. Stable Diffusion
comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters

Negative prompt: abstract lines, no adults
Steps: 50, Sampler: DPM++ 2M Karras, CFG scale: 15, Seed: 4087678425, Size: 800x510, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0
5A.png

5B. Midjourney
https://s.mj.run/byClk27osCI :: https://s.mj.run/uc2JroyeiEk :: comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters Negativeabstract lines, no adults Steps --ar 16:9 --style raw --s 250 - Image #4
https://cdn.discordapp.com/attachments/ ... 122e6ea2&
5B.png

5C. Midjourney
https://s.mj.run/byClk27osCI :: https://s.mj.run/uc2JroyeiEk :: comic book scene strip based on 1950s style hand drawn illustruations, simple colors, heavy black outlines. Sepia colors. Story unfolding through multiple scenes about a six year old girl who has a pet dog and loses a yellow balloon in some trees. Looks like it is within a paper print newsprint. Playful and fun setting in a neighborhood, simple cartoon human characters Negativeabstract lines, no adults Steps --ar 16:9 --style raw --s 250 - Variations (Strong) by @MAT 255 (fast)
https://cdn.discordapp.com/attachments/ ... 92586c7d&
5C .png

In the last set of images (#5), I took a similar approach to the third set. This included reusing one of the same Midjourney files as one of the two image inputs to inform the AI of the visual direction to go in. All of these files came from the same set of four images based on the same prompt, generated in either Stable Diffusion or Midjourney. The difference in this last set is that I used another comic book image generated by Stable Diffusion in a previous sequence. This one was significantly more nonsensical and had a large amount of text. One of the things I found most interesting was how the text was reduced as the iterations unfolded, going from a lot of text to none in the final panel. I was curious to see how successful Midjourney would be in approaching this. I felt that it did a good job overall, but the very last set of images (5C) portrays a narrative without the comic strip style. The images still have a comic book nature, but they don’t resemble a sequence of events within the same image. Overall, there are some successful moments within the series of four generated images, for example the balloon, the dog, and the girl. However, it becomes a little uncanny in the third image, where the child is half-dog. This was a glitch from the Stable Diffusion software, and I wanted to see if Midjourney could correct it. The previous set of comic-based images (series 3) was the most successful of the five series that I explored during this project.
Last edited by autumnsmith on Tue Nov 14, 2023 12:44 pm, edited 2 times in total.

bsierra
Posts: 8
Joined: Tue Oct 03, 2023 3:08 pm

Re: Project 6: Stable Diffusion 2: Comparison

Post by bsierra » Tue Nov 14, 2023 3:25 am

Series 1

Stable Diffusion
00096-2855084593.png
overcrowded bus, people commuting, intense motion blur, dark, bus interior, CCTV footage, people standing shoulder to shoulder, no room street photography, mirrors edge
Negative prompt: cartoon, drawing, colorful, graphic
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 14, Seed: 2855084593, Size: 808x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0

00162-2988795630.png
00195-1271328216.png
overcrowded, house party, commuters, intense motion blur, underground subway interior, CCTV footage, people standing shoulder to shoulder, no room, street photography, mirrors edge, ergo proxy, fisheye perspective
Negative prompt: cartoon, drawing, colorful, graphic, digital art
Steps: 20, Sampler: Euler a, CFG scale: 14, Seed: 2988795630, Size: 808x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0

Midjourney
ucsbmat255_overcrowded_bus_people_commuting_intense_motion_blur_367f5b6e-2ac6-463d-94e6-676fa5a53a70.png
overcrowded bus, people commuting, intense motion blur, dark, bus interior, CCTV footage, people standing shoulder to shoulder, no room street photography, mirrors edge --no cartoon, drawing, colorful, graphic --ar 4:3 --style raw --s 250
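
When comparing the two outputs, it may help to check whether both tools were asked for the same frame shape. The sketch below is a minimal Python example, assuming the `Size` value from the Stable Diffusion parameter line and the `--ar` value from the Midjourney prompt above. The helper names `size_to_ratio` and `ar_to_ratio` are hypothetical.

```python
from fractions import Fraction


def size_to_ratio(size):
    """Turn a 'WIDTHxHEIGHT' string such as '808x512' into an exact Fraction."""
    width, height = (int(part) for part in size.lower().split("x"))
    return Fraction(width, height)


def ar_to_ratio(ar):
    """Turn a 'W:H' string such as '4:3' into an exact Fraction."""
    width, height = (int(part) for part in ar.split(":"))
    return Fraction(width, height)


sd_ratio = size_to_ratio("808x512")  # Size from the Stable Diffusion parameter line
mj_ratio = ar_to_ratio("4:3")        # --ar from the Midjourney prompt

print(f"Stable Diffusion: {float(sd_ratio):.3f}")
print(f"Midjourney:       {float(mj_ratio):.3f}")
print("same framing" if sd_ratio == mj_ratio else "different framing")
```

If the two ratios differ, some of the compositional differences between the images may come from the frame shape rather than from the models themselves.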


Series 2

Stable Diffusion
00223-2336870463.png
00224-2336870464.png
mirrors edge, underground atrium, tree, overgrown foliage,
Negative prompt: cartoon, drawing, colorful, graphic, digital art
Steps: 20, Sampler: Euler a, CFG scale: 14, Seed: 2336870463, Size: 808x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.

00471-2860590446.png
mirrors edge, ergo proxy, underground atrium, tree, overgrown foliage, volumetric lighting, low-quality digital camera photography, subject in foreground, searching for connection, spiritual connection to the internet, human nature technology, ethereal, symmetrical
Negative prompt: cartoon, drawing, colorful, graphic, digital art, visible face, looking at camera,
Steps: 20, Sampler: Euler a, CFG scale: 14, Seed: 2860590446, Size: 808x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

Midjourney
ucsbmat255_mirrors_edge_ergo_proxy_underground_atrium_tree_over_037846a9-1a31-4fce-9226-1bba217f5658.png
mirrors edge, ergo proxy, underground atrium, tree, overgrown foliage, volumetric lighting, low-quality digital camera photography, subject in foreground, searching for connection, spiritual connection to the internet, human nature technology, ethereal, symmetrical --no cartoon, drawing, colorful, graphic, digital art, visible face, looking at camera --ar 4:3 --style raw --s 250
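
The Midjourney prompt above carries the Stable Diffusion negative prompt over as a `--no` parameter and appends `--ar`, `--style`, and `--s`. A minimal Python sketch of that conversion is shown below. The helper `to_midjourney` is hypothetical and only assembles a string in the same shape as the prompts in this post; it does not call Midjourney.

```python
def to_midjourney(prompt, negative="", ar=None, style=None, stylize=None):
    """Build a Midjourney-style prompt string from Stable Diffusion-style fields."""
    # Trim whitespace and any trailing comma left over from the original prompt.
    parts = [prompt.strip().rstrip(",")]

    # Split the negative prompt into items, dropping empty ones and stray commas.
    excluded = [item.strip() for item in negative.split(",") if item.strip()]
    if excluded:
        parts.append("--no " + ", ".join(excluded))

    if ar:
        parts.append(f"--ar {ar}")
    if style:
        parts.append(f"--style {style}")
    if stylize is not None:
        parts.append(f"--s {stylize}")
    return " ".join(parts)


mj_prompt = to_midjourney(
    "mirrors edge, underground atrium, tree, overgrown foliage,",
    negative="cartoon, drawing, colorful, graphic, digital art",
    ar="4:3",
    style="raw",
    stylize=250,
)
print(mj_prompt)
```

Keeping the negative terms behind `--no`, rather than pasting the words "Negative prompt" into the prompt text, keeps the two tools' inputs as comparable as possible.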


Series 3

Stable Diffusion
grid-0049.png
police mugshot
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1173423829, Size: 512x744, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, Version: v1.6.0

grid-0051.png
police mugshot
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3796989430, Size: 512x744, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

grid-0055.png

portrait photography
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 4041707412, Size: 512x744, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

grid-0060.png
portrait photography
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1300809016, Size: 512x744, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0

Midjourney
ucsbmat255_police_mugshot_d00c1aac-df14-4bbf-af93-7764e26ed2a4.png
police mugshot --ar 3:4 --style raw --s 250

ucsbmat255_portrait_photography_579cc2c6-82cc-45dd-b607-b9c74e1104b2.png
portrait photography --ar 3:4 --style raw --s 250


Series 4

Stable Diffusion
00350-1258614224.png
the internet as a natural data resource, extraction, human connectivity, spirituality, nature, foggy overgrown forest, symmetrical composition
Negative prompt: clip art, drawing, scary, dark, cartoon,
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 1258614224, Size: 896x704, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0

00375-204081976.png
the internet as a natural data resource, extraction, human connectivity, spirituality, nature, foggy overgrown forest, symmetrical composition, data points, wireframe, ergo proxy, mirrors edge, serial experiments lain
Negative prompt: clip art, drawing, scary, dark, cartoon,
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 204081976, Size: 896x704, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0

00499-1699801894.png
00466-4227007039.png
the internet as a natural data resource, resource extraction, human connectivity, spirituality, nature, foggy overgrown forest, data points, wireframe, ergo proxy, mirrors edge, serial experiments lain, 2010s tumblr flicker indie sleaze bloghouse digital camera photography, volumetric lighting, obelisk, clipping glitch artifact
Negative prompt: clip art, drawing, scary, dark, cartoon, video game
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 4227007039, Size: 896x704, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

00515-3954399544.png
the internet as a natural data resource, resource extraction, human connectivity, spirituality, nature technology juxtaposition, foggy overgrown forest, data points, wireframe, ergo proxy, mirrors edge, serial experiments lain, 2010s tumblr flicker indie sleaze bloghouse digital camera photography, volumetric lighting, obelisk, clipping glitch artifact, subject model in foreground, symmetry
Negative prompt: clip art, drawing, scary, dark, cartoon, video game
Steps: 20, Sampler: Euler a, CFG scale: 12, Seed: 3954399544, Size: 896x704, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0

Midjourney
ucsbmat255_the_internet_as_a_natural_data_resource_resource_ext_f0ba12bb-5eba-4ee8-81ba-ef7087dce949.png
the internet as a natural data resource, resource extraction, human connectivity, spirituality, nature technology juxtaposition, foggy overgrown forest, data points, wireframe, ergo proxy, mirrors edge, serial experiments lain, 2010s tumblr flicker indie sleaze bloghouse digital camera photography, volumetric lighting, obelisk, clipping glitch artifact, subject model in foreground, symmetry --no clip art, drawing, scary, dark, cartoon, video game --ar 4:3 --style raw --s 250
