Project 4: Stable Diffusion / Flux1.dev
This assignment is to try out Stable Diffusion or Flux1.dev to get a sense of the difference between it and MidJourney. It is an open-ended assignment. Any strategies or methods are applicable. The important components are that the images be produced with Stable Diffusion, that you provide the conceptual description of what you were after in your approach, and that you include the technical information at the bottom of the generated image.
SCHEDULE
Nov 14 - Maybe there will be projects to review, otherwise just lab
Nov 19 - Projects to review or else individual meetings
Nov 21 - Individual meetings
Nov 26 - Research
Nov 28 - Thanksgiving
Dec 3 - Final Presentations
Dec 5 - Final Presentations
George Legrady
legrady@mat.ucsb.edu
Re: Project 4: Stable Diffusion / Flux1.dev
For this project, I wanted to leverage learnings from previous projects (incorporating cultural specificity and context into images, constructing complex descriptive text prompts, and tweaking prompt and image parameters to get closer to desired results) while being more intentional with compositional and aesthetic choices. Based on the feedback that Stable Diffusion is the tool of choice for professional artists engaging with Gen AI imagery, I was interested in seeing how the platform would perform differently from the more consumer/lay user geared Midjourney platform for prompts that were largely similar to earlier projects.
Prompt 1: indian fakir sitting on a raised platform under a banyan tree hugging an iMac idol Cinemascope aspect ratio mid shot rule of thirds composition photojournalism look grainy 800 ASA film Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 484941044, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
Prompt 2: indian sadhu sitting on a raised platform under a banyan tree hugging a 27 inch iMac idol Cinemascope aspect ratio mid shot rule of thirds composition photojournalism look grainy 800 ASA film Steps: 55, Sampler: DPM++ 2M Karras, CFG scale: 15, Seed: 50, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.7, Hires upscale: 2, Hires upscaler: Latent, Refiner: sd_xl_refiner_1.0 [7440042bbd], Refiner switch at: 0.8, Script: X/Y/Z plot, Version: v1.6.0
Prompt 3: indian sadhu sitting on top of 27 inch iMac laughing 16:9 aspect ratio low angle closeup rule of thirds composition photojournalism style grainy 800 ASA film
Negative prompt: no photorealistic style, buildings, cars Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 30, Size: 1920x1080, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
Prompt 4: indian sadhu sitting smearing ash on Apple MacBook under giant tree surrounded by forests sundown twilight background 16:9 aspect ratio closeup photojournalism style grainy 1600 ASA film
Negative prompt: no photorealistic style, buildings, cars Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 30, Size: 1920x1080, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
Prompt 5: An Indian monk praying in front of giant iMac under tree surrounded by forests late evening blue light 16:9 aspect ratio 50mm lens shallow focus 2.8mm aperture closeup photojournalism style grainy 1600 ASA film
Negative prompt: no photorealistic style, buildings, cars Steps: 55, Sampler: DPM++ 3M SDE Karras, CFG scale: 7, Seed: 45, Size: 1920x1080, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, ControlNet 0: "Module: none, Model: None, Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0
Prompt 6: Giant laptop with one Indian sadhu crouched in front foreground giant tree in background surrounded by forests early morning light square aspect ratio 50mm lens portrait shallow focus 2.8mm aperture closeup photojournalism style grainy 1600 ASA film
Negative prompt: photorealistic style, buildings, cars, multiple humans Steps: 55, Sampler: DPM++ SDE Karras, CFG scale: 7, Seed: 45, Size: 1920x1080, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, ControlNet 0: "Module: none, Model: None, Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Version: v1.6.0
Overall, I thought the image quality and texture in Stable Diffusion were superior and more appealing to my taste than anything I ever managed to get from Midjourney. However, for some reason, with the later prompts I kept getting grotesque-looking figures, which always came in groups of 3 despite that never being in the prompt anywhere. I couldn't get the platform to drop the extra figures, even with a negative prompt for this (multiple humans).
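The settings lines under each prompt follow a "Key: value, Key: value" layout, so runs can be compared programmatically rather than by eye. The following is a minimal sketch, not part of any tool used in this thread: the function names are hypothetical, and it assumes the settings begin at "Steps:" and that only double-quoted values (such as the ControlNet entry) contain commas.

```python
import re

# One 'Key: value' field. A value is either a double-quoted string (which may
# contain commas) or a run of characters up to the next comma.
_FIELD = re.compile(r'\s*([\w ./]+?):\s*("(?:\\.|[^"\\])*"|[^,]*)(?:,|$)')


def parse_generation_info(settings: str) -> dict:
    """Turn 'Steps: 20, Sampler: DPM++ 2M Karras, ...' into a dict of strings."""
    return {key.strip(): value.strip().strip('"')
            for key, value in _FIELD.findall(settings)}


def split_prompt_and_settings(text: str):
    """Split a forum line into (prompt, settings dict).

    Assumption: the prompt itself never contains the literal text 'Steps:'.
    """
    prompt, marker, rest = text.partition("Steps:")
    return prompt.strip(), parse_generation_info(marker + rest)


if __name__ == "__main__":
    line = ("indian sadhu sitting on top of 27 inch iMac laughing "
            "Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 30, "
            "Size: 1920x1080, Model: sd_xl_base_1.0")
    prompt, settings = split_prompt_and_settings(line)
    print(prompt)
    for key, value in settings.items():
        print(f"{key} = {value}")
```

All values are kept as strings on purpose, since fields such as "Size" and "Model hash" are not plain numbers.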
Re: Project 4: Stable Diffusion / Flux1.dev
Assignment 4
Stable Diffusion imagining Cat, Cyber Truck, and Mars
Yuehao Gao
When playing with this open-ended AI picture generation model, the first question I had was how it uses its trained model to understand something more abstract, or something very unlikely to exist in the real world. To test Stable Diffusion's ability to imagine such a scene, I asked it to imagine the following:
"A lovely cat on the top of a Cyber Truck parked on the surface of Mars, in a realistic picture style."
I also told it not to include "Earth, Elon Musk, Robot", because these elements are commonly correlated with Cyber Truck or Mars and would be distracting if they appeared.
The first image was generated with the "sd_xl_base_1.0" checkpoint and the "Automatic" SD_VAE setting, so that the model decides everything from the prompt alone. All parameters were left at their defaults: 20 sampling steps, the "DPM++ 2M Karras" sampling method, and a size of 512*512. It resulted in something like this: It is highly cartoon-styled in general. The positions of the cat and the Cyber Truck are correct. While the cat looks plausible, the truck is a regular truck with a load of wood on it, not Tesla's Cyber Truck. This is one misunderstanding. But the surface of Mars looks totally fine.
So I changed the sampling steps from 20 all the way to 100 and got this: It is funny to see how the truck turned into a Jeep, with multiple other cars in the background. The pose of the cat changed from lying on the car to a "standing-lying" pose like a human being. While it is still highly cartoon-styled, it does seem more realistic than the previous one.
So I changed the sampling steps to the maximum possible value to try to get the best output. But something happened: Stable Diffusion froze again, getting completely stuck at around 53% with only a blurry outline, as if a heavy Gaussian blur had been applied, and never progressed further. This is a common problem for people using Stable Diffusion. This is how it looked: Stable Diffusion also tends to crash from running out of memory, even after clearing the Python process in the backend.
So I compared it with Midjourney using the same prompt and got the following pictures: Now I learned why neither model was generating the correct picture: I need to specify "Tesla Cyber Truck". So I updated the prompt and got these in Midjourney instead: Boom, Midjourney got it. Now let's go back to Stable Diffusion and try again: Ahh... okay. I will stick with Midjourney for my Final project instead...
Re: Project 4: Stable Diffusion / Flux1.dev
I've been really interested in the concept of "degradation" in image-generation software that was first introduced to me in the study we read for the first week of class. I wanted this project to, in some way, mirror the "Disintegration Loops" (https://www.youtube.com/watch?v=mjnAE5go9dI) made by William Basinski by playing cassette tape on a loop until it physically degrades. So I decided to input an image into the img2img part of Stable Diffusion and then use "interrogate clip" to get a textual description of that image. (I tried using the DeepBooru version as well, but it gave descriptions in the form of a list rather than a sentence, and I liked the "sentence" aspect since I think it allows for a lot more variation and includes superfluous words like conjunctions, prepositions, etc. rather than pure "matter.") I then generated an image from that description, screenshotted the new, generated image (so as to wipe the metadata), ran it through the image-to-text program, generated a new image from that text, etc. etc. etc.
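The loop described above can be written down as a short sketch. This is only an illustration of the procedure, not the way the images here were made (those steps were done by hand in the web UI). The `describe` and `generate` arguments are hypothetical stand-ins for the image-to-text and text-to-image steps, and the early stop on a repeated caption is an added assumption about what a "dead end" might look like.

```python
from typing import Callable, List, Tuple


def caption_image_loop(
    start_image: object,
    describe: Callable[[object], str],
    generate: Callable[[str], object],
    rounds: int,
) -> List[Tuple[str, object]]:
    """Alternate image-to-text and text-to-image, keeping every step."""
    history: List[Tuple[str, object]] = []
    image = start_image
    for _ in range(rounds):
        prompt = describe(image)  # stand-in for "interrogate clip"
        if history and prompt == history[-1][0]:
            break  # same caption twice in a row: the loop has stalled
        image = generate(prompt)  # stand-in for generating from the caption
        history.append((prompt, image))
    return history


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs without any model: an "image" is just
    # a string, and each "description" loses its final word.
    def toy_describe(image: object) -> str:
        return " ".join(str(image).split()[:-1])

    def toy_generate(prompt: str) -> object:
        return prompt

    steps = caption_image_loop(
        "a woman in a library holding a camera", toy_describe, toy_generate, rounds=3
    )
    for prompt, _ in steps:
        print(prompt)
```

Passing the two steps in as functions keeps the loop independent of whichever captioning and generation tools are actually used.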
First, I began with a self-portrait I had taken in the Troy Public Library.
a woman in a library holding a camera up to her face and looking at the camera with a surprised look on her face, promotional image, a character portrait, transgressive art, Cindy Sherman
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2384183260, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a woman with a camera taking a picture of herself in a library with bookshelves behind her and a red lip, portrait photography, a character portrait, new objectivity, Anka Zhuravleva
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3724694681, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a woman holding a camera in a library with bookshelves behind her and a camera in front of her, portrait photography, a character portrait, art photography, Anka Zhuravleva
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3433578283, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a woman holding a camera in front of a bookcase with books on it and a camera in her hands, portrait photography, a character portrait, art photography, Anka Zhuravleva
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2765732771, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
I seemed to be stuck at a dead-end (though I was very flattered that my photograph read as Cindy Sherman-esque), so I decided to switch gears and instead work from an AI generated image. I needed a base image, so I input a quote from JG Ballard's Crash:
For him these wounds were the keys to a new sexuality born from a perverse technology. The images of these wounds hung in the gallery of his mind like exhibits in the museum of a slaughterhouse.
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 16, Seed: 2201383423, Size: 512x512, Model hash: 6ce0161689, Model: v1-5-pruned-emaonly, Version: v1.6.0
I found these images really compelling; they reminded me a bit of Francis Bacon and I found the compositional aspects of the bottom two to feel quite unique for AI images with the panels within the frames. The bottom-right image looked almost like celluloid film strips. I decided to put the screenshot of all four images into the image-to-text program, because I felt like having a very busy and chaotic composition might allow for more complex prompts and lengthen the lifespan of this loop. The description I got was:
a collage of pictures of a man with a knife and a woman with a knife and blood on his body, anatomical, a jigsaw puzzle, neoplasticism, Adolf Born
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3766466781, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a man and a woman with red paint on their faces and body, both of them are half - painted, behance hd, an ambient occlusion render, photorealism, Dirk Crabeth
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 114898119, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
two nude men with red marks on their bodies are standing next to each other, facing opposite directions, with a gray background, behance hd, an ambient occlusion render, neo-figurative, Beeple
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1515535195, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
(I had to look up what "Beeple" was. I found out that it was really stupid!)
two male mannequins standing in a dark room with a spotlight on them, one of them is red, physically based rendering, an ambient occlusion render, figurativism, Évariste Vital Luminais
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1068555549, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
two red men standing next to each other in a dark room with a spotlight on them and a black background, physically based rendering, a computer rendering, photorealism, Andries Both
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3983265064, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a couple of red men standing next to each other in front of a spotlight light in a dark room, dynamic lighting, a raytraced image, sots art, Bourgeois
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3235525563, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a man standing in front of a light beam in the dark with a red suit on and a red tie on, spotlight, a hologram, holography, Dirk Crabeth
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 799836148, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a man in a red suit and tie standing in front of a blue background with a spotlight behind him, promotional image, a character portrait, arbeitsrat für kunst, Dirck van der Lisse
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1339353598, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
a man in a red suit and tie looking up at the sky with a bright light behind him and a spotlight behind him, character portrait, a character portrait, sots art, Arie Smit
Steps: 44, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3165065920, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
I found it interesting how immediately... bland? the images got, starting from the first image I generated after the one based on the actual Ballard text. And of course, there is an obvious commonality between both final results: all roads lead to a headshot of a white person centered in the frame and illuminated by soft, diffused light. I don't think I'll be able to discuss every image in depth, but I found the "two male mannequins" image to be particularly compelling. Two images below that, something I found noteworthy about the silhouette/clipart-y figure in front of the beam of light is the fact that it is just slightly off-center, and cuts off at the shins. I also found the text descriptions themselves to be quite valuable--the artists that it suggests as reference points are usually very far off from the visuals, in both the image-to-text and text-to-image iterations. The only exception might be Louise Bourgeois. Other references were certainly really interesting: sots art and Arbeitsrat für Kunst in particular seem like strange choices. Of course, in terms of bias, most (though not all) of these artists are European men. Interestingly enough, for the loop I made using my own photograph, it gave me two female artists.
Re: Project 4: Stable Diffusion / Flux1.dev
My exploration is an attempt to achieve what scholars are calling Afro-now-ism, where near futures are imagined within a Black cultural context. Inspired by Curry Hackett's AI imagery, I'm testing how well Stable Diffusion can achieve these results. My plan was to start by achieving something similar to the inspiration image below and then build from there. However, as is often the case with this technology, phase 1 of achieving an unbiased, satisfactory baseline image took more time than expected. As a result, I did not progress beyond this.
Inspiration image:
Black quilters of Gee’s Bend in Alabama building a skyscraper with a quilted façade. Created by Curry Hackett using Midjourney
Prompt 1: Black quilters of Gee’s Bend in Alabama building a skyscraper with a quilted façade surrealism
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3551628543, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
There are no people actively creating the building like I originally intended. However, it was able to capture the desired effect of having quilted walls.
Prompt 2: Black women quilters and architects in Alabama designing a quilted building surrealism
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3413652898, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
Still no people in the image. The model seems to be clinging too strongly to the word 'building'.
Prompt 3: Black women quilters and architects in Alabama sewing a quilted building surrealism
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3853714752, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
I swapped the word 'designing' for 'sewing', which shifted the focus from the buildings to the quilts. This prompt also managed to include abstract people, which I appreciate.
Prompt 4: Black women quilters standing in Alabama hand sewing a building surrealism
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 3.5, Seed: 2977550761, Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
This result also includes abstract people in the foreground, which is closer to my goal. However, I would like to move beyond people and buildings being embedded within a quilt, shifting instead to photorealistic people creating a quilted building. I included the word 'standing' to achieve this, but it was unsuccessful.
Prompt 5: Black women quilters standing in Alabama hand sewing a building surrealism
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 2, Seed: 154977898, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
Here, we're finally beginning to achieve semi-realistic people. However, the strange bodies are particularly disturbing for me as they are reminiscent of racist imagery prominent in the early to mid-20th century. It was here that I decided it was time to try out image-to-image generation.
Note that I've been playing with the CFG value in each of these images, attempting to get closer to more lifelike imagery. I've also produced many more images than are shown here. The ones featured here are my most successful results.
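As a rough illustration of what the CFG value controls, here is a minimal sketch of classifier-free guidance as it is commonly described: the sampler blends a prompt-free noise prediction with a prompt-conditioned one, and the scale sets how far the result is pushed toward the prompt. The function name and the toy numbers are assumptions for illustration only, not code from the WebUI.

```python
def apply_cfg(eps_uncond, eps_cond, cfg_scale):
    """Blend unconditional and prompt-conditioned predictions.

    cfg_scale = 1 returns the conditioned prediction unchanged;
    larger values push further in the direction of the prompt.
    """
    return [u + cfg_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy stand-ins for the two noise predictions (hypothetical values).
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 1.0]

for scale in (1.0, 2.0, 7.0):
    print(scale, apply_cfg(eps_uncond, eps_cond, scale))
```

With these toy values, only the component where the two predictions disagree is amplified as the scale grows, which is one way to read why very low values drift away from the prompt and very high values can over-commit to it.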
Prompt 6: (referencing inspiration image) Black women quilters standing in Alabama hand sewing a building surrealism
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 4166668743, Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Version: v1.6.0
I like this image's use of colors and patterns. I also like that there are clearly Black women featured. I still feel like I'm not achieving the aesthetic properties of my inspiration image, though.
Prompt 7: (referencing inspiration image) black women quilters
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 2, Seed: 1531868076, Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Version: v1.6.0
I limited my prompt, hoping to focus my results. This was clearly unsuccessful.
From here, Stable Diffusion continued to devolve into shapes and figures largely unrecognizable. I played around with the CFG and prompts, but I remained unable to get back on track.
Sanity Check 1: people making a quilt
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 4.5, Seed: 2813239669, Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
Here, I removed any reference to my inspiration image as well as any mention of Black people or women. This result was the closest to my intended goal.
Sanity Check 2: people making a building made of quilts
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 762389455, Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
I continued to play with the prompt, adding and removing words associated with blackness and women. Prompts that included the words 'Black' or 'African American' strayed much farther from the prompt and from realism. Even when bumping up the CFG on these prompts, I continued to receive nonsensical results.
Prompt 8: (referencing Sanity Check 1) Black women quilters standing hand sewing a building
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 3, Seed: 1068477856, Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Version: v1.6.0
Although I was referencing the Sanity Check image, I could not achieve a similar result with Black women. Instead, each output featured dilapidated buildings with few signs of quilts.
Prompt 9: (referencing Sanity Check 2) women quilters standing hand sewing a building
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 12, Seed: 2250608360, Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Version: v1.6.0
After I deleted the word 'black' and kept all other parameters the same, quilted rather than decaying buildings returned.
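When comparing two runs, it can help to check which recorded settings actually changed. The following is a minimal sketch, assuming each parameter line is a flat list of "Key: value" pairs separated by ", " and that no value itself contains ", ". The helper names are hypothetical; the two strings are the parameter lines recorded for Prompt 8 and Prompt 9 above.

```python
def parse_parameters(line):
    """Turn 'Key: value, Key: value' into a dict (assumes no ', ' inside values)."""
    fields = {}
    for part in line.split(", "):
        key, _, value = part.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def diff_parameters(first, second):
    """Return {key: (value_in_first, value_in_second)} for settings that differ."""
    a, b = parse_parameters(first), parse_parameters(second)
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


prompt_8 = (
    "Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 3, Seed: 1068477856, "
    "Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, "
    "VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, "
    "Denoising strength: 0.75, Version: v1.6.0"
)
prompt_9 = (
    "Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 12, Seed: 2250608360, "
    "Size: 880x495, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, "
    "VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, "
    "Denoising strength: 0.75, Version: v1.6.0"
)

changes = diff_parameters(prompt_8, prompt_9)
print(changes)
```

For the two lines recorded here, the parsed settings differ in CFG scale and seed; everything else in the parameter line matches.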
Final thoughts: AI is lauded as a time saver, handling simple, mundane tasks so people can spend that excess time enacting larger, more exciting jobs. AI imagery, in particular, is often used as a source of inspiration or a quick way to visualize larger ideas. Yet, due to the biases within these systems, I spend most of my time getting these models to accurately represent Black people, inhibiting me from taking that next step. It fails to produce quick and inspiring imagery, requiring excessive hand-holding, only to arrive five blocks from my destination. I end up much more limited in what I can create than if I had picked up an old-school digital or analog tool. My feelings are echoed by Toni Morrison:
"The function, the very serious function of racism is distraction. It keeps you from doing your work. It keeps you explaining, over and over again, your reason for being. Somebody says you have no language and you spend twenty years proving that you do. Somebody says your head isn’t shaped properly so you have scientists working on the fact that it is. Somebody says you have no art, so you dredge that up. Somebody says you have no kingdoms, so you dredge that up. None of this is necessary. There will always be one more thing."
These models produced images portraying Black children as poor and unclothed, so I spent two weeks trying to work around that. Stable Diffusion struggled to depict Black female quilters, so I spent multiple days fighting it. These distractions, as Morrison calls them, have been a defining feature of my experience using AI.
Re: Project 4: Stable Diffusion / Flux1.dev
My analysis in this project focuses on fidelity, creative reinterpretation, and the balance between cultural memory preservation and modern influence.
A scene featuring artifacts from the Terracotta Army, with soldiers, horses, and chariots partially unearthed and showing the wear of centuries, set within an ancient excavation site, close-up
Negative prompt: polished and modernized appearances
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3562864257, Size: 512x392, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
The generated images showed the expected figures of the Terracotta Warriors but struggled to integrate realistic aging features. Despite the negative prompt, the surfaces of the figures appeared unnaturally smooth, lacking the dirt, cracks, and wear that would be expected from artifacts buried for centuries.
So I used specifics like "weathered surfaces, mottled marks, and century-old dirt accumulation" in my next attempt:
Terracotta warriors partially unearthed in an ancient excavation site, showing weathered surfaces, mottled marks, and century-old dirt accumulation under natural sunlight, Close-up of the terracotta warrior's face.
Negative prompt: polished and modernized appearances
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 278847009, Size: 688x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
CFG Scale lower:
CFG Scale higher (16):
Sampling steps higher:
A close-up view of the intricate details on the Bronze Chariot and Horses
Negative prompt: ornate, complex
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1392847653, Size: 688x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
Dunhuang murals of Flying Apsaras
Steps: 15, Sampler: DPM++ 2M Karras, CFG scale: 3.0, Seed: 1919447904, Size: 512x512, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Script: X/Y/Z plot, X Type: CFG Scale, X Values: "3,5,7,9,11", Y Type: Steps, Y Values: "15,20,25,30,35", Version: v1.6.0
Size: 616x472
More specific value use and prompt use:
A traditional Dunhuang mural of Flying Apsaras, with intricate flowing garments, delicate brushwork, and vivid, historically accurate colors.
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 3, Seed: 2506153082, Size: 616x472, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Version: v1.6.0
Img to img:
A traditional mural of Flying Apsaras from Dunhuang, with flowing garments, delicate brushwork, and vivid, historically accurate colors.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1761100857, Size: 1200x592, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Version: v1.6.0
Original IMG:
Generated IMG:
Steps: 35, CFG scale: 3
A traditional mural of Flying Apsaras from Dunhuang, with flowing garments, delicate brushwork, and vivid, historically accurate colors.
Steps: 15, Sampler: DPM++ 2M Karras, CFG scale: 3.0, Seed: 1818130502, Size: 1200x592, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Script: X/Y/Z plot, X Type: CFG Scale, X Values: "3,5,7,9,11", Y Type: Steps, Y Values: "15,20,25,30,35", Version: v1.6.0
Encourage Creativity:
A reinterpretation of Flying Apsaras from Dunhuang, blending traditional Chinese mural art with modern abstract influences
Negative prompt: traditional
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 10, Seed: 3122917326, Size: 1200x592, Model hash: 7440042bbd, Model: sd_xl_refiner_1.0, VAE hash: 63aeecb90f, VAE: sdxl_vae.safetensors, Denoising strength: 0.75, Version: v1.6.0
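The X/Y/Z plot settings recorded in the parameter lines above (X Values "3,5,7,9,11" for CFG Scale, Y Values "15,20,25,30,35" for Steps) describe a sweep over every combination of the two settings. A minimal sketch of that sweep follows; the variable names and dictionary layout are assumptions for illustration and do not reproduce the WebUI script itself.

```python
from itertools import product

# Values copied from the recorded X/Y/Z plot settings.
cfg_values = [3, 5, 7, 9, 11]        # X axis: CFG Scale
step_values = [15, 20, 25, 30, 35]   # Y axis: Steps

# One entry per grid cell, row by row (each row shares a step count).
grid = [
    {"steps": steps, "cfg_scale": cfg}
    for steps, cfg in product(step_values, cfg_values)
]

print(len(grid))
print(grid[0])
print(grid[-1])
```

Laying the runs out this way makes it easier to read a grid image as a table: moving right raises the CFG scale, moving down raises the number of sampling steps.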
-
- Posts: 7
- Joined: Thu Sep 26, 2024 2:13 pm
Re: Project 4: Stable Diffusion / Flux1.dev
I chose this project to explore how Stable Diffusion can generate accurate representations of Paris Metro stations using text-only prompts. I wanted to see whether the AI could reimagine these spaces from purely descriptive text and reproduce their architecture and atmosphere. By iterating on the descriptions, I aimed to test how well the AI could refine its interpretation of these iconic spaces and to probe its spatial and cultural understanding.
Prompt 1:
Concorde Metro Station
actual image
1. paris metro station concorde (angle facing the wall)
Steps: 3, Sampler: DPM++ 2M Karras, CFG scale: 1, Seed: 3439858516, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
2. Concorde Métro station in Paris, with white ceramic tiled walls, geometric mosaics, and vintage-style wrought iron railings. The platform features arched ceilings with soft, warm lighting, and classic art deco accents. Traditional signage and elegant, timeless design create a refined yet functional atmosphere.
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 6, Seed: 3439858516, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
3. A close-up shot of the wall at Concorde Métro station in Paris, featuring white ceramic tiles arranged in a large grid, resembling a word search puzzle. The tiles display scrambled letters that form the Declaration of the Rights of Man, with no punctuation, creating a subtle, intricate design. The letters are carefully placed in a structured pattern, evoking the significance of the text. Soft lighting casts a warm glow over the tiles, highlighting the historical and artistic elements of the wall, commemorating the 200th anniversary of the French Revolution.
Steps: 129, Sampler: DPM++ 2M Karras, CFG scale: 21.5, Seed: 3439858516, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
a.~
Steps: 103, Sampler: DPM++ 2M Karras, CFG scale: 1.5, Seed: 3250962192, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
b~
Steps: 103, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3250962192, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
c~
Steps: 103, Sampler: DPM++ 2M Karras, CFG scale: 29, Seed: 3250962192, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
After all of this, I wasn't particularly content with the output images, so I used the image-to-image section of Stable Diffusion, added the original image, and then used the following prompts.
4. Concorde Métro station in Paris
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 4150237821, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Denoising strength: 0.75, Version: v1.6.0
After adding this image, the output did not resemble the original at all and was animated.
4b. Concorde Métro station in Paris photorealistic
All of the images that referenced the original were entirely animated.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 24.5, Seed: 3266760655, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Denoising strength: 0.77, Version: v1.6.0
/////////
Prompt 2: Arts et Métiers
actual images
1. 2.
1. Arts et Métiers paris metro station
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2807480768, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
2. shot of Paris Metro station 'Arts et Métiers, featuring intricate copper riveted walls with porthole-like peepholes. The design has a steampunk, industrial feel, with a focus on the textured surface of the copper and the circular openings. The atmosphere is futuristic yet vintage, with subtle reflections of light on the metal, evoking a sense of innovation and subterranean exploration.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1398495929, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
3~~~~ Steps: 31, Sampler: DPM++ 2M Karras, CFG scale: 6, Seed: 3519034721, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
4. Wide-angle view entering Paris Metro station Arts et Métiers, with a focus on the copper riveted walls and porthole-like peepholes. The perspective captures the station from an entrance angle, revealing the full steampunk-inspired design. The curved, industrial architecture stretches ahead, with the gleaming copper surfaces reflecting ambient light.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 1286061463, Size: 512x512, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
Overall, I felt that this last metro station was much easier to replicate thanks to its distinctive copper background, unlike the more complex pattern of the Concorde station, which consists of individualized text. Nonetheless, I was surprised by Stable Diffusion's inability to accurately illustrate these iconic metro stations.
After all of this, I wanted to briefly see how Midjourney would do, and it quickly produced closer images.
These are the Midjourney outputs for:
1.
A close-up shot of the wall at Concorde Métro station in Paris, featuring white ceramic tiles arranged in a large grid, resembling a word search puzzle. The tiles display scrambled letters that form the Declaration of the Rights of Man, with no punctuation, creating a subtle, intricate design. The letters are carefully placed in a structured pattern, evoking the significance of the text. Soft lighting casts a warm glow over the tiles, highlighting the historical and artistic elements of the wall, commemorating the 200th anniversary of the French Revolution.
2.
shot of Paris Metro station 'Arts et Métiers, featuring intricate copper riveted walls with porthole-like peepholes. The design has a steampunk, industrial feel, with a focus on the textured surface of the copper and the circular openings. The atmosphere is futuristic yet vintage, with subtle reflections of light on the metal, evoking a sense of innovation and subterranean exploration.
-
- Posts: 6
- Joined: Fri Sep 27, 2024 2:41 pm
Re: Project 4: Stable Diffusion / Flux1.dev
DreamBooth represents a subject-driven fine-tuning approach that enables personalization of text-to-image diffusion models through minimal exemplar learning. By conditioning the model on a small set of subject-specific images (typically 3-5) paired with a unique identifier token, it creates a robust binding between the semantic concept and the visual features of the subject, allowing for consistent subject regeneration while maintaining the model's broader compositional capabilities.
LoRA implements efficiency-optimized model adaptation through low-rank decomposition of weight updates, allowing for specialized training while maintaining a minimal parameter footprint. This approach enables the model to learn new domains or styles through compact transformations of the original weight space, facilitating both resource-efficient training and the ability to combine multiple specialized adaptations.
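To make the low-rank idea concrete, here is a minimal pure-Python sketch of a LoRA-style update, W' = W + (alpha / r) · B · A. It is an illustration under assumptions, not the notebook's code: the layer sizes, `alpha`, and the helper names are hypothetical, and a real implementation would use a tensor library.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, b, a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A); W itself stays frozen."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def param_counts(d_out, d_in, rank):
    """Trainable parameters: full fine-tuning vs. the two LoRA factors."""
    return d_out * d_in, rank * (d_out + d_in)


random.seed(0)
d_out, d_in, rank, alpha = 6, 8, 2, 4  # toy sizes, chosen for illustration
w = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
b = [[0.0] * rank for _ in range(d_out)]  # B starts at zero
a = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]

# With B = 0 the adapted layer is identical to the original one.
unchanged = lora_effective_weight(w, b, a, alpha, rank) == w
print(unchanged)

# Hypothetical 1280 x 1280 projection with rank 8.
full, lora = param_counts(1280, 1280, 8)
print(full, lora)  # 1638400 20480
```

Because each adaptation is only the pair (B, A), several of them can be stored separately and added to the same frozen weights, which is what makes combining adaptations possible.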
jupyter notebook
This code implements a powerful fine-tuning system that combines three major techniques: DreamBooth, which allows the model to learn new concepts from just a few images by binding them to special tokens; LoRA, which makes training efficient by adding small trainable matrices instead of modifying the whole model; and specific optimizations for Stable Diffusion XL's architecture with its dual text encoders. It's like teaching an artist to paint in a new style or recognize a specific subject, but doing it in a way that's memory-efficient and preserves the artist's general knowledge. The implementation includes sophisticated features like prior preservation (to prevent the model from "forgetting" what it knows), advanced scheduling, and comprehensive training management, all while being practical enough for production use.
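The prior-preservation idea mentioned above can be sketched as a loss with two terms: one on the few subject images and one on generic images of the same class, weighted by a factor. This is a simplified illustration with plain lists standing in for noise predictions; the function names and the `prior_weight` parameter are assumptions and are not taken from the notebook.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(instance_pred, instance_noise, class_pred, class_noise,
                    prior_weight=1.0):
    """Subject loss plus a weighted prior-preservation loss."""
    instance_loss = mse(instance_pred, instance_noise)  # learn the new subject
    prior_loss = mse(class_pred, class_noise)           # keep the general class
    return instance_loss + prior_weight * prior_loss


# Toy numbers: the model is slightly off on the subject and further off on the class.
with_prior = dreambooth_loss([0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0], prior_weight=1.0)
without_prior = dreambooth_loss([0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0], prior_weight=0.0)
print(with_prior, without_prior)  # 1.5 0.5
```

Setting the weight to zero removes the penalty for drifting on the general class, which is the "forgetting" that prior preservation is meant to prevent.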
The images below were generated with both my fine-tuned model and the base Stable Diffusion model.
univeristy california media arts and technology department making art with artificial intelligence
humanoid venus fly trap woman wearing luxurious clothes sipping tea in the louvre arkansas grassland bones and all final scene manga style Citroën grace jones audi quattro mars red car in distant background gritty 1980s style
"A Subaru WRX towing the Titanic Ship behind it using a thick iron anchor chain, on the surface of the ocean, in huge waves and stormy rain. The wheels of the WRX are splashing waters behind it. The picture should be in an artistic brush-painting style.
Operator Algebra, Holographic Duality, Quantum Field Theory, Emergence of Spacetime, in the style of futurism
Last edited by emma_brown on Tue Nov 19, 2024 3:40 pm, edited 2 times in total.
Re: Project 4: Stable Diffusion / Flux1.dev
Labyrinth of Infinity and Complexity
Borges, Escher and Piranesi
I am using AI to create various types of labyrinths, as an extension of my research from Project 3, where I explored themes of time and mazes inspired by Jorge Luis Borges's texts. I also found parallel concepts of labyrinth, infinity, and complexity between the labyrinthine imagery in Borges's descriptions and the works of Escher and Piranesi, so I combined Piranesi and Escher with prompts drawn from Borges’s works. I’m curious to see how Stable Diffusion and Midjourney translate descriptions of time into space visually.
Test 1 Time to Space: The Garden of Forking Paths
“Differing from Newton and Schopenhauer, your ancestor did not think of time as absolute and uniform. He believed in an infinite series of times, in a dizzily growing, ever spreading network of diverging, converging and parallel times. This web of time - the strands of which approach one another, bifurcate, intersect or ignore each other through the centuries - embraces every possibility.”
SD: This web of time - the strands of which approach one another, bifurcate, intersect or ignore each other through the centuries - embraces every possibility.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2074106242, Size: 1920x1080, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
Then I added “Maurits Cornelis Escher” to the prompt. The image is still more or less a web, but in the visual style of Escher's black-and-white drawings. SD: The web of time. an infinite series of times, in a dizzily growing, ever spreading network of diverging, converging and parallel times. Maurits Cornelis Escher.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2345582116, Size: 1920x1080, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
Then I deleted “The web of time”, which was too literal, to see how SD interprets the concept of an infinite series of times. The result is quite interesting, with bird's-eye views of mazes and infinite patterns. SD: an infinite series of times, in a dizzily growing, ever spreading network of diverging, converging and parallel times. architecture. Maurits Cornelis Escher
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3460897779, Size: 1920x1080, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
Then I tried the prompts in Midjourney. Although the phrase “web of time” is kept here, Midjourney generates infinite stairs instead of a web. Midjourney: The web of time. an infinite series of times, in a dizzily growing, ever spreading network of diverging, converging and parallel times. architecture. Maurits Cornelis Escher
Then I deleted “The web of time”. Midjourney: an infinite series of times, in a dizzily growing, ever spreading network of diverging, converging and parallel times. architecture. Maurits Cornelis Escher --ar 16:9 --style raw --weird 3000 --v 6.1
Another attempt, using only “The Library of Babel, Maurits Cornelis Escher”. Midjourney: The Library of Babel, Maurits Cornelis Escher --ar 16:9 --style raw --weird 3000 --v 6.1
SD: Giovanni Battista Piranesi. This web of time - the strands of which approach one another, bifurcate, intersect or ignore each other through the centuries - embraces every possibility.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 175364102, Size: 1920x1080, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
Some thoughts and comparisons:
SD tends to create abstract, intricate, and geometric patterns when responding to prompts about labyrinths and the concept of infinite time. SD captures Escher’s style in a black-and-white, high-contrast aesthetic, but it remains grounded in abstraction. It feels less "constructed" and more like a conceptual map or flow of time and space.
MJ generates more realistic and immersive visuals, especially when architectural elements (like Escher-inspired "infinite stairs") are included in the prompt. For example, with "an infinite series of times," MJ renders visualizations like sprawling staircases or physically plausible labyrinthine spaces, giving a sense of scale and materiality. MJ often translates abstract concepts (e.g., "web of time") into more literal architectural metaphors, like infinite staircases or interconnected structures, making it feel grounded in physical space while still surreal.
Test 2 Space: The Immortal
“The impression of great antiquity was joined by others: the impression of endlessness, the sensation of oppressiveness and horror, the sensation of complex irrationality. I had made my way through a dark maze, but it was the bright City of the Immortals that terrified and repelled me. A maze is a house built purposely to confuse men; its architecture, prodigal in symmetries, is made to serve that purpose. In the palace that I imperfectly explored, the architecture had no purpose. There were corridors that led nowhere, unreachably high windows, grandly dramatic doors that opened onto monklike cells or empty shafts, incredible upside-down staircases with upside-down treads and balustrades. Other staircases, clinging airily to the side of a monumental wall, petered out after two or three landings, in the high gloom of the cupolas, arriving nowhere. ” ————Immortal
The images generated by SD look very much like Giovanni Battista Piranesi's drawings. SD: The impression of great antiquity, the impression of endlessness, the sensation of oppressiveness and horror, the sensation of complex irrationality. A maze is a house built purposely to confuse men; its architecture, prodigal in symmetries, is made to serve that purpose. In the palace that I imperfectly explored, the architecture had no purpose. There were corridors that led nowhere, unreachably high windows, grandly dramatic doors that opened onto monklike cells or empty shafts, incredible upside-down staircases with upside-down treads and balustrades. Other staircases, clinging airily to the side of a monumental wall, petered out after two or three landings, in the high gloom of the cupolas, arriving nowhere.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3933986634, Size: 1920x1080, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
SD: There were corridors that led nowhere, unreachably high windows, grandly dramatic doors that opened onto monklike cells or empty shafts, incredible upside-down staircases with upside-down treads and balustrades. Other staircases, clinging airily to the side of a monumental wall, petered out after two or three landings, in the high gloom of the cupolas, arriving nowhere.
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3352221127, Size: 1920x1080, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Version: v1.6.0
Then I tried MJ with "Giovanni Battista Piranesi" added to the prompts.
Midjourney: Giovanni Battista Piranesi. There were corridors that led nowhere, unreachably high windows, grandly dramatic doors that opened onto monklike cells or empty shafts, incredible upside-down staircases with upside-down treads and balustrades. Other staircases, clinging airily to the side of a monumental wall, petered out after two or three landings, in the high gloom of the cupolas, arriving nowhere. --ar 16:9 --style raw --weird 3000 --v 6.1
Midjourney: Giovanni Battista Piranesi. There were corridors that led nowhere, unreachably high windows, grandly dramatic doors that opened onto monklike cells or empty shafts, incredible upside-down staircases with upside-down treads and balustrades. Other staircases, clinging airily to the side of a monumental wall, petered out after two or three landings, in the high gloom of the cupolas, arriving nowhere. --ar 16:9 --v 6.1
Midjourney: Giovanni Battista Piranesi. There were corridors that led nowhere, unreachably high windows, grandly dramatic doors that opened onto monklike cells or empty shafts, incredible upside-down staircases with upside-down treads and balustrades. Other staircases, clinging airily to the side of a monumental wall, petered out after two or three landings, in the high gloom of the cupolas, arriving nowhere. --ar 16:9 --style raw --v 6.1
Midjourney: Giovanni Battista Piranesi. The impression of great antiquity, the impression of endlessness, the sensation of oppressiveness and horror, the sensation of complex irrationality. A maze is a house built purposely to confuse men; its architecture, prodigal in symmetries, is made to serve that purpose. In the palace that I imperfectly explored, the architecture had no purpose. There were corridors that led nowhere, unreachably high windows, grandly dramatic doors that opened onto monklike cells or empty shafts, incredible upside-down staircases with upside-down treads and balustrades. Other staircases, clinging airily to the side of a monumental wall, petered out after two or three landings, in the high gloom of the cupolas, arriving nowhere. --ar 16:9 --style raw --weird 3000 --v 6.1
Midjourney: Giovanni Battista Piranesi. This web of time - the strands of which approach one another, bifurcate, intersect or ignore each other through the centuries - embraces every possibility. --ar 16:9 --style raw --weird 3000 --v 6.1
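Several of the Midjourney runs above differ only in their trailing parameters (--style raw, --weird 3000). A small sketch for separating the prompt text from those parameters and comparing runs follows. It assumes parameters always start with " --" and take a single value, as they do in this post; `split_midjourney_prompt` is a hypothetical helper, and the prompt text is abbreviated to the artist name for brevity.

```python
def split_midjourney_prompt(prompt):
    """Split 'text --ar 16:9 --v 6.1' into (text, {'ar': '16:9', 'v': '6.1'})."""
    text, *flag_parts = prompt.split(" --")
    flags = {}
    for part in flag_parts:
        name, _, value = part.strip().partition(" ")
        flags[name] = value
    return text.strip(), flags


# The three parameter variants used with the same Piranesi prompt.
variants = [
    "Giovanni Battista Piranesi. --ar 16:9 --style raw --weird 3000 --v 6.1",
    "Giovanni Battista Piranesi. --ar 16:9 --v 6.1",
    "Giovanni Battista Piranesi. --ar 16:9 --style raw --v 6.1",
]
parsed = [split_midjourney_prompt(v) for v in variants]

# Parameters shared by every run, and the extra ones each run adds.
shared = set.intersection(*(set(flags.items()) for _, flags in parsed))
for text, flags in parsed:
    extra = dict(set(flags.items()) - shared)
    print(text, "| extra parameters:", extra)
```

Seen this way, the second run is the baseline, the third adds only --style raw, and the first adds both --style raw and --weird 3000, so the three images can be read as a step-by-step comparison.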
Last edited by borouyu on Tue Nov 19, 2024 4:00 pm, edited 7 times in total.
Re: Project 4: Stable Diffusion / Flux1.dev
Starting with these original pieces, I ran Interrogate CLIP and Interrogate DeepBooru on them to generate prompts, and then used the same images as img2img starting images.
building, chain-link fence, city, cityscape, fence, halftone, pixel art, rooftop, skyscraper Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 8, Seed: 1237, Size: 1280x720, Model hash: 31e35c80fc, Model: sdxlbase1.0, VAE hash: 63aeecb90f, VAE: sdxlvae.safetensors, Denoising strength: 0.8, Version: v1.6.0
building, city, cityscape, fence, pokemon (creature), rooftop, skyscraper Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 8, Seed: 1234, Size: 1280x720, Model hash: 31e35c80fc, Model: sdxlbase1.0, VAE hash: 63aeecb90f, VAE: sdxlvae.safetensors, Denoising strength: 0.8, Version: v1.6.0
building, chain-link fence, city, english text, eyelashes, fence, hat, multiple boys, open mouth, pixel art, poke ball (basic), pokemon (creature), skyscraper, web address Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 8, Seed: 1234, Size: 1280x720, Model hash: 31e35c80fc, Model: sdxlbase1.0, VAE hash: 63aeecb90f, VAE: sdxlvae.safetensors, Denoising strength: 0.8, Version: v1.6.0
a colorful abstract image of a building with many windows and balconies on it's sides and a green background, triadic color scheme, a computer rendering, generative art, Benoit B. Mandelbrot Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 8, Seed: 1234, Size: 1280x720, Model hash: 31e35c80fc, Model: sdxlbase1.0, VAE hash: 63aeecb90f, VAE: sdxlvae.safetensors, Denoising strength: 0.8, Version: v1.6.0
a computer generated image of a building with a lot of windows and doors in it's center, with a yellow and pink background, triadic color scheme, computer graphics, generative art, Benoit B. Mandelbrot Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 8, Seed: 1234, Size: 1280x720, Model hash: 31e35c80fc, Model: sdxlbase1.0, VAE hash: 63aeecb90f, VAE: sdxlvae.safetensors, Denoising strength: 0.8, Version: v1.6.0
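Since all of these runs use a starting image with "Denoising strength: 0.8", it may help to see how that value interacts with the step count. The sketch below follows the convention used by the Hugging Face diffusers img2img pipelines, where only a fraction of the schedule is actually run. Whether the web UI used for these images counts steps the same way depends on its settings, so treat this as an assumption; `img2img_schedule` is a hypothetical helper.

```python
def img2img_schedule(steps, denoising_strength):
    """Return (steps_skipped, steps_run) for an img2img run.

    The starting image is noised to an intermediate point of the schedule,
    and only the remaining steps are denoised.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    steps_run = min(int(steps * denoising_strength), steps)
    steps_skipped = steps - steps_run
    return steps_skipped, steps_run


print(img2img_schedule(40, 0.8))  # (8, 32)
print(img2img_schedule(40, 1.0))  # (0, 40): the starting image is fully replaced
print(img2img_schedule(40, 0.0))  # (40, 0): the starting image is returned unchanged
```

At 0.8 most of the schedule is still run, so the output keeps the broad layout and colors of the original piece while most of the detail is regenerated from the interrogated prompt.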