wk06 - Stable Diffusion 1 - Nov 3, 2022

Post Reply
glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

wk06 - Stable Diffusion 1 - Nov 3, 2022

Post by glegrady » Fri Sep 16, 2022 8:06 am

wk06 - Stable Diffusion 1

We are looking at your first results realized in Stable Diffusion. Please post here so that we can discuss them on Thursday, November 3, 2022.

Add any technical, conceptual and aesthetic discussion you may want to address as we move over to this other software.
George Legrady
legrady@mat.ucsb.edu

jkilgore
Posts: 7
Joined: Tue Mar 29, 2022 3:34 pm

Re: wk6 - Stable Diffusion 1 - Nov 3, 2022

Post by jkilgore » Wed Nov 02, 2022 9:10 pm

First Try With Stable Diffusion: Hitting Too Close

In this first run-through of Stable Diffusion I use a couple of prompts I explored in Midjourney. The results illustrate Stable Diffusion's stronger tendency toward literal images: phone photos, stock images.

Prompt: children drawings of god; cave paintings
1667334631_seed_6859068116077931351_upscaled_2_img2imgenhanced.png
If you look back at my previous post, I used the exact same prompt in Midjourney. Midjourney focused on the idea of the child's drawing itself, presenting flat images of drawings on stone/paper that resemble something drawn by a kid. Stable Diffusion built an entire scene of a (distorted) child drawing in a cave. So Midjourney focused on the artifact, while Stable Diffusion focused on the act that would lead to such an artifact. This perhaps illustrates Stable Diffusion's tendency to focus on the human, while Midjourney focuses on the abstract.

Prompt: first person shooter in a catholic church; 4k; ultrarealistic; wildfire
1667336431_seed_209319902677503235_upscaled_2_sharpened_1.png
1667335897_seed_2083_upscaled_2_sharpened_1.png
Again, I am using a prompt similar to a common motif found in all of my Midjourney explorations. These studies come out much more collage-y, with a huge focus, again, on people. The images don't look video-game-like, but more like an uncanny-valley version of a photograph taken on a phone.

Prompt: first person shooter in a catholic church; 4k; ultrarealistic; protestors in a strip mall
1667338458_seed_478434652693290005_upscaled_2_sharpened_2.png
1667337245_seed_3363458392645591595_upscaled_2_sharpened_2.png
1667336594_seed_8952119884884555126_upscaled_2_sharpened_1.png
1667336992_seed_3900772748414195987_upscaled_2_sharpened_2.png
These come out much more disturbing in Stable Diffusion... The word "protestors" adds quite a bit of realism, and it mixes in a (tasteless?) way with the first-person-shooter prompt. But they are also interesting in capturing some essence of American political life. HMMMMM. Also note the neat three-panel layout of one of the images.

Here are some more images following the same line of reasoning (I forgot the prompt, but you can guess some of the keywords):
IMG_0536.JPG
IMG_0506.JPG
IMG_0504.JPG
IMG_0513.JPG
IMG_0520.JPG
IMG_0523.JPG

lu_yang
Posts: 9
Joined: Mon Sep 26, 2022 10:23 am

Re: wk6 - Stable Diffusion 1 - Nov 3, 2022

Post by lu_yang » Wed Nov 02, 2022 10:46 pm

Stable Diffusion produces very consistent results for the same prompt and seed, which makes this experiment a straightforward way to see how individual factors affect the image.

Images are produced at https://lexica.art

The sequence of the prompts matters: each subsequent prompt builds its influence on top of the previous ones, and the image is initiated by the first prompt. So it is ideal to put the description of the image at the beginning, then add style and effect terms, keeping the more heavily weighted ones toward the front.

All images use Seed 2089620160, Guidance scale 7, Dimensions 800 × 512.
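
For anyone who wants to rerun this locally, the same fixed-seed comparison can be sketched with the Hugging Face diffusers library. A minimal sketch, assuming the v1-4 checkpoint and a CUDA GPU (lexica.art's exact backend and sampler may differ):

[code]
# Minimal sketch of the fixed-seed prompt-order experiment with diffusers.
# Assumptions: CompVis/stable-diffusion-v1-4 and a CUDA GPU; Lexica may differ.
import itertools
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

artists = ["josan gonzalez", "victo ngai", "kilian eng"]
for order in itertools.permutations(artists):
    prompt = "Long lines, ink pen by " + ", ".join(order)
    # Re-seed each run so the prompt order is the only variable.
    generator = torch.Generator("cuda").manual_seed(2089620160)
    image = pipe(prompt, height=512, width=800, guidance_scale=7,
                 generator=generator).images[0]
    image.save(f"{prompt}.png")
[/code]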

Prompts: Long lines, ink pen by josan gonzalez
Long lines, ink pen by josan gonzalez.png
Prompts: Long lines, ink pen by josan gonzalez, victo ngai
Long lines, ink pen by josan gonzalez, victo ngai.png
Prompts: Long lines, ink pen by josan gonzalez, victo ngai, kilian eng
Long lines, ink pen by josan gonzalez, victo ngai, kilian eng.png
Prompts: Long lines, ink pen by josan gonzalez, kilian eng
Long lines, ink pen by josan gonzalez, kilian eng.png
Prompts: Long lines, ink pen by josan gonzalez, , kilian eng, victo ngai
Long lines, ink pen by josan gonzalez, , kilian eng, victo ngai.png
______________________________________________________

Prompts: Long lines, ink pen by kilian eng
Long lines, ink pen by kilian eng.png
Prompts: Long lines, ink pen by kilian eng, victo ngai
Long lines, ink pen by kilian eng, victo ngai.png
Prompts: Long lines, ink pen by kilian eng, victo ngai, josan gonzalez
Long lines, ink pen by kilian eng, victo ngai, josan gonzalez.png
Prompts: Long lines, ink pen by kilian eng, josan gonzalez
Long lines, ink pen by kilian eng, josan gonzalez.png
Prompts: Long lines, ink pen by kilian eng, josan gonzalez, victo ngai
Long lines, ink pen by kilian eng, josan gonzalez, victo ngai.png
______________________________________________________

Prompts: Long lines, ink pen by victo ngai
Long lines, ink pen by victo ngai.png
Prompts: Long lines, ink pen by victo ngai, kilian eng
Long lines, ink pen by victo ngai, kilian eng.png
Prompts: Long lines, ink pen by victo ngai, kilian eng, josan gonzalez
Long lines, ink pen by victo ngai, kilian eng, josan gonzalez.png
Prompts: Long lines, ink pen by victo ngai, josan gonzalez
Long lines, ink pen by victo ngai, josan gonzalez.png
Prompts: Long lines, ink pen by victo ngai, josan gonzalez, kilian eng
Long lines, ink pen by victo ngai, josan gonzalez, kilian eng.png
A different sequence of the same prompt terms leads to different results, and the additional effect added by later prompts can be easily identified in this process.

jiarui_zhu
Posts: 7
Joined: Mon Sep 26, 2022 10:25 am

Re: wk6 - Stable Diffusion 1 - Nov 3, 2022

Post by jiarui_zhu » Thu Nov 03, 2022 10:43 am

I am in awe of Stable Diffusion after my first exploration. I found it easier to generate highly detailed, photorealistic images with Stable Diffusion than with MidJourney using similar prompts. In addition, I found "Explore this style" a very useful feature for exploring images with the same style and learning how others phrase their prompts to generate beautiful images.

All the images below are generated with Lexica.

My first exploration with Stable Diffusion was to use the same prompt I had used in MidJourney in previous weeks and compare the results.

Prompt: A photorealistic tree house on a mountain, a crooked path leading towards it
Snip20221103_1.png
Snip20221103_2.png
Snip20221103_3.png
Snip20221103_4.png
Snip20221103_5.png
The tree house generations with MidJourney are here: viewtopic.php?f=86&t=365

It is clear that the images generated with MidJourney are not photorealistic and only represent a vibe. With Stable Diffusion, however, I get much more detail in the image. I am amazed by the texture of the wood as well as the overall light of the images. They look very real to me, and I have to convince myself that they are not real photos.

Then I used "Explore this style" to see some other generations of tree houses with similar prompts. I was very surprised by what other users came up with using slightly different prompts. Here are some of the images.
Snip20221103_6.png
Snip20221103_8.png
Snip20221103_9.png

Then I went on to generate images with abstract prompts.

Prompt: everlasting, hyperdetailed, time and space distorted by huge gravitational field in the universe
Snip20221103_10.png
Snip20221103_11.png
Snip20221103_12.png
My previous exploration of abstract prompts with MidJourney is here: viewtopic.php?f=86&t=364&sid=5710bd6d20 ... be878d6c25

In terms of abstract image generation, Stable Diffusion and MidJourney are very similar. They both make sense of the abstract context and use shape and color to give a representation of it. However, I found that Stable Diffusion's generations are more color-rich, and there are more variations as well.

In the end, I went on to generate ancient Chinese drawings.
Prompt: Summer manor with peony flowers and lake, chinese landscape painting
Snip20221103_13.png
Snip20221103_14.png
Prompt: extremely intricate qing dynasty artwork, royal palace wall artwork, beautiful historical art, emperor's palace
Snip20221103_15.png
It is very interesting that some of the results came out NOT in the default dimensions. I am very satisfied with the ancient Chinese drawings generated by Stable Diffusion. They are very realistic, and I can't tell whether they are authentic without professional knowledge.
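
A side note on dimensions: Stable Diffusion v1 defaults to 512 × 512, but other sizes can be requested explicitly as long as they are multiples of 8. A minimal sketch with the diffusers library (the checkpoint and the 768 × 512 size are my own illustrative choices, not what Lexica used):

[code]
# Sketch: requesting non-default output dimensions (SD v1 defaults to 512x512).
# Height and width must be multiples of 8; far from 512, composition can degrade.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

prompt = "Summer manor with peony flowers and lake, chinese landscape painting"
image = pipe(prompt, height=512, width=768, guidance_scale=7.5).images[0]
image.save("summer_manor_768x512.png")
[/code]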

tinghaozhou
Posts: 6
Joined: Mon Sep 26, 2022 10:24 am

Re: wk6 - Stable Diffusion 1 - Nov 3, 2022

Post by tinghaozhou » Thu Nov 03, 2022 1:16 pm

For this week's project, I experimented with both the Colab version of Stable Diffusion and Lexica. My first interactions with these two platforms, and with Stable Diffusion in general, were pretty messy. Compared to Midjourney, Stable Diffusion took longer to generate images, required re-logging in after a period of time, and my lack of relevant coding knowledge and skills made debugging difficult; it is obviously not the easiest platform to interact with (even though our dear TA Yixuan has already made Stable Diffusion way, way more accessible to us, thank you Yixuan!). Nonetheless, the Stable Diffusion Colab I am practicing with is still very interesting to look at, especially in comparison with Midjourney. Three questions emerged during the process:
1. How does the system understand/process our prompts (i.e., how should we modify our prompt composition in order to get the picture we want)?
2. How should we understand the aesthetics question here (i.e., what are our metrics for evaluating "aesthetics" across different AIs: styles or details)?
3. How "intelligent" are they in comparison to each other, and in what dimensions are they "intelligent"?

I started with a bunch of generations using the old prompts I had used for Midjourney. Of course, the results (the styles, the compositions of the images, the details) were significantly different from what I got from Midjourney. For example, I started with this as my first prompt:
"Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, a psychedelic drug–induced hallucination."
Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, a psychedelic drug–induced hallucination_512_1024_None_7.5_0.jpg
The image I got is extremely fascinating:
1. The scale of the image is large: a bird's-eye view of a city's landscape with LOTS OF Nigerian-style architecture. Without detailed instructions in my prompt, the AI understood that the prompt might indicate a landscape, supposedly serving as an "establishing shot" of the imaginary world. Stable Diffusion, it seems to me, tried to present as much information from the prompt as possible;
2. However, the resolution is not very high; as we can see here, many details are blurry even though the AI tried to give out more information;
3. The most interesting part is that the AI presented a cityscape image that has depth and width. Not only did it understand the difference between the foreground and the background, it also understood perspective in image composition, because what we see here is essentially a three-dimensional image (like the satellite mode of Google Maps).

The same thing happened when I modified the prompt a little bit:
"Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s."
Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s_512_1024_None_7.5_0.jpg
Apparently, it shared a very similar composition/arrangement/perspective with the last one. They also shared a similar level of blurriness and information. I'm not quite sure how it picked up the animation part of the prompt, though. The style was a little different, but not significantly so.

In comparison to Midjourney, I think Stable Diffusion weights the prompt internally in a rather fixed manner (without the many variations Midjourney has presented us with). When I reran this prompt:
"Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s."
It came up with something really similar, though this time it was less blurry and the resolution a little higher. I didn't put in any words describing the framing of the image, but the scale of the image is a little smaller (though it's still a cityscape image).
Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s._512_1024_None_7.5_0.jpg
I was fascinated by the fact that Stable Diffusion could possibly understand perspective in image-making, so I tried to play with it and confirm this assumption a little.
"Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s, close up"
Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s, close up_512_1024_None_7.5_0.jpg
Here, I couldn't tell whether Stable Diffusion actually picked up the words "close up," because all it showed was a smaller-scale version of the satellite-view cityscape. Interestingly, this might also indicate that a close-up of a cityscape prompt is an oxymoron for the AI. Does that mean the AI understands the relational logic between words? What is an oxymoron for an AI?

The next one is better at showing that Stable Diffusion understands perspective and modifies its generation based on different perspectival relations. I used the same prompt this time, adding "view through a window":
"Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s, view through a window."
Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, ancient animations from the early 1900s, view through a window_512_1024_None_7.5_0.jpg
Apparently, the perspective changed from a bird's-eye/satellite view to a view from the ground, as if we were actually looking out at a street scene through a window. More interestingly, the image contains lots of human-like figures! This not only suggests that Stable Diffusion understands perspective and the logical, cultural context that a perspective might fashion, but it also shows the very anthropocentric bias of the AI, as Jack similarly pointed out.
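
One way to make this framing test more systematic is to hold the prompt, seed, and parameters fixed and vary only the trailing phrase. A minimal sketch with the diffusers library (the checkpoint and seed are my own assumptions; the 512 × 1024 size and 7.5 guidance are read off the image filenames above):

[code]
# Sketch: ablate only the framing suffix to isolate its effect on perspective.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

base = ("Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with "
        "greenest grass and smiley-faced bopping periwinkle flowers, "
        "ancient animations from the early 1900s")
for suffix in ["", ", close up", ", view through a window"]:
    # Fixed seed per run, so the suffix is the only variable.
    generator = torch.Generator("cuda").manual_seed(0)
    image = pipe(base + suffix, height=512, width=1024, guidance_scale=7.5,
                 generator=generator).images[0]
    image.save(f"nigeria{suffix.replace(', ', '_').replace(' ', '-')}.png")
[/code]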

merttoka
Posts: 21
Joined: Wed Jan 11, 2017 10:42 am

Re: wk06 - Stable Diffusion 1 - Nov 3, 2022

Post by merttoka » Mon Dec 05, 2022 11:03 pm

This week, I started exploring Stable Diffusion (SD) running locally on my desktop. Setting up such a complicated system on a Windows machine is usually a pain, but the conda environments and Hugging Face login integration made it easy.
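
For reference, once the conda environment exists, the whole setup reduces to a few lines. A minimal sketch (the v1-4 checkpoint is my own choice, and the gated model requires accepting its license on the Hugging Face hub first):

[code]
# Minimal local SD setup with diffusers; assumes the model license was
# accepted on the Hugging Face hub and a CUDA GPU is available.
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub import login

login()  # paste a Hugging Face access token when prompted

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")  # fp16 halves GPU memory; drop torch_dtype on CPU

image = pipe("making and breaking symmetry").images[0]
image.save("making_and_breaking_symmetry.jpg")
[/code]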

The first thing I noticed in SD compared to Midjourney is how realistic the output looks. Midjourney's custom house style has no equivalent in Stable Diffusion's default output. Running my initial Midjourney test prompt through SD gives this realistic picture frame with a sketch inside:
Text prompt: making and breaking symmetry
making and breaking symmetry, framed_512-912_4776_7.5_0.jpg

I wanted to see how the prompt guidance scale affects the output:
Text prompt: making and breaking symmetry
Guidance scales in order: 1, 2, 3, 4, 5, 6, 7.5, 9, 10
making and breaking symmetry_512-912_4776_1_0.jpg
making and breaking symmetry_512-912_4776_2_0.jpg
making and breaking symmetry_512-912_4776_3_0.jpg
making and breaking symmetry_512-912_4776_4_0.jpg
making and breaking symmetry_512-912_4776_5_0_1667538560.jpg
making and breaking symmetry_512-912_4776_6_0_1667538599.jpg
making and breaking symmetry_512-912_4776_7.5_0.jpg
making and breaking symmetry_512-912_4776_9_0_1667538656.jpg
making and breaking symmetry_512-912_4776_10_1667582736.jpg
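
The sweep itself is just a loop that re-seeds the generator so guidance_scale is the only variable. A sketch reusing the pipe from the setup above (seed 4776 and the 512 × 912 size are read off the filenames; the height/width order is my assumption):

[code]
# Sketch: guidance-scale sweep with the initial noise held fixed.
import torch

for cfg in [1, 2, 3, 4, 5, 6, 7.5, 9, 10]:
    # Re-seeding per iteration keeps the starting latent identical across runs.
    generator = torch.Generator("cuda").manual_seed(4776)
    image = pipe("making and breaking symmetry", height=512, width=912,
                 guidance_scale=cfg, generator=generator).images[0]
    image.save(f"making and breaking symmetry_512-912_4776_{cfg}.jpg")
[/code]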

It is also quite interesting how SD interprets prompts compared to Midjourney. The following results look much more "literal" than the ones in my week 2 posts:
Text prompt: upon an infinite archive that is at once mnemonic, ephemeral, digital and physical
1667801550_upon an infinite archive that is at once mnemonic, ephemeral, digital and physical_512-912_4448032080710605_7.5_0.jpg
Text prompt: mnemonic, ephemeral, digital and physical
1667801613_mnemonic, ephemeral, digital and physical_512-912_2657002112777012_7.5_0.jpg
Even though nothing changed in the text prompt, the following two results are more collage-like than the previous one. They were generated in the same batch, which might somehow have affected this.
1667801771_mnemonic, ephemeral, digital and physical_512-512_953745653733051_7.5_0.jpg
1667801771_mnemonic, ephemeral, digital and physical_512-512_4295573045971797_7.5_1.jpg

wqiu
Posts: 14
Joined: Sun Oct 04, 2020 12:15 pm

Re: wk06 - Stable Diffusion 1 - Nov 3, 2022

Post by wqiu » Tue Dec 06, 2022 9:40 am

Image-to-image translation with Stable Diffusion


I have been experimenting with the Stable Diffusion model since week #1. Here is the link to its Colab:
https://colab.research.google.com/githu ... sers.ipynb


My main goal is to generate fencing photos given an arbitrary human pose as input. I explored two ways of doing this: a patch-translation method and an entire-image-translation method. I also experimented with how the two parameters, guidance_scale and strength, affect the final images by changing them independently.

#1 Patch-translation
Unknown-5.png
The input is a square image with a stick figure in it, to give the diffuser an idea of what the pose should look like. Three parameters can be used to change the final look: the prompt, the guidance_scale, and the strength. After trying a few different text prompts, I picked one and kept it unchanged. Then I changed the values of the two parameters to see how they affect the result.
Two Parameters in Stable Diffusion.jpg
Observations:
- Low strength -> sticks to the original pose too much; limited variation; tries to fit a body to the stick figure
- High strength -> drifts away from the original pose
- Low guidance_scale -> sticks to the original style
- High guidance_scale -> matches the style described in the text prompt better, such as the keywords “fencing”, “sports magazine”, etc.

There is a trade-off between pose fidelity and style fidelity. I picked strength = 0.82 and guidance_scale = 7.5, in which case the image has the correct pose and a proper level of photorealistic detail.
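
For reference, such a strength × guidance_scale grid can be sketched with the diffusers img2img pipeline (the checkpoint, the grid values, and the patch prompt are my own illustrative choices; the Colab linked above may differ in detail):

[code]
# Sketch: img2img over a stick-figure patch, sweeping strength and guidance_scale.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

init = Image.open("stick_figure_patch.png").convert("RGB").resize((512, 512))
prompt = "a fencer lunging, pure white fencer suit, sports magazine photo"

for strength in [0.6, 0.7, 0.82, 0.9]:
    for cfg in [5, 7.5, 10]:
        generator = torch.Generator("cuda").manual_seed(0)  # fixed noise
        out = pipe(prompt=prompt, image=init, strength=strength,
                   guidance_scale=cfg, generator=generator).images[0]
        out.save(f"patch_s{strength}_g{cfg}.png")
[/code]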


#2 Entire-image-translation
Screenshot 2022-12-06 at 9.46.29 AM.png
In this experiment, instead of synthesizing an image patch for each pose and projecting the patches back onto the original image, I synthesize the entire image containing the two poses in one step.

After tweaking the text prompt and the two numeric parameters, the chosen prompt was “Two people in pure white fencer suit, Fencing Games Scene, sports magazine”, along with strength = 0.7 and guidance_scale = 8. I used these parameters to convert a sequence of input images into a sequence of two fencers fighting. The sequence was then blended together with a multiple-exposure effect to simulate the chronophotography process.
Screenshot 2022-11-30 at 4.42.00 PM.png
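
The final multiple-exposure blend can be done with a simple per-pixel combine of the generated frames. A sketch of one common approach, a "lighten" (per-pixel maximum) blend; the post doesn't specify the exact compositing method, and averaging the frames is another option:

[code]
# Sketch: blend generated frames into one chronophotography-style image
# by keeping the brighter pixel from each exposure (a "lighten" blend).
import glob
import numpy as np
from PIL import Image

frames = [np.asarray(Image.open(p).convert("RGB"), dtype=np.uint8)
          for p in sorted(glob.glob("fencing_frames/*.png"))]
blend = frames[0]
for frame in frames[1:]:
    blend = np.maximum(blend, frame)
Image.fromarray(blend).save("chronophotograph.png")
[/code]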

Post Reply