Project 1: Studies in AI generated images

Posted: Sat Sep 21, 2024 3:27 pm
by glegrady
This is the workspace area for the M255 course: A project-based course for creative exploration, followed by critical analysis of text-to-image and image-to-image production techniques in MidJourney and Stable Diffusion software.

Course syllabus: https://www.mat.ucsb.edu/~g.legrady/aca ... 4f255.html

We are posting our weekly work here, consisting of both AI generated images and concepts.

2023 Fall work: viewforum.php?f=90
2022 Spring work: viewforum.php?f=85
2022 Fall work: viewforum.php?f=86
2021 Spring work: viewforum.php?f=83
---------
Assignment 1: Post by clicking on "Post Reply" by Thursday, October 3, 2024 for review and discussion

The first project is just to produce some images to start. I am introducing MidJourney, but you can use any AI software you are interested in. For each image or video, provide a response to your result in the following way:

Please post 4 to 6 images here with their text query. Following that, do an analysis of the results. For each image discuss the following:

1. To what degree does your text query influence the generated image?
2. What is the style of the image, and why do you think it has produced that?
3. Any thoughts about how the visual elements in the image are organized
4. How would you change the query?
5. Any other comments?
6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result

Re: Project 1: Studies in AI generated images

Posted: Wed Oct 02, 2024 7:13 pm
by squire
I wanted to take a more conceptual approach to this assignment, beginning with the concept of an AI-generated self portrait. I feel that if all of the terms in the prompt apply to the person I am, that image is at least partly a "self portrait." So I ran two opposing types of prompts--one which focused almost entirely on the visual aspects of my identity, and another that focused almost entirely on my internal identity (knowing I would still need to add at least a few concrete words so that the program would have something to grasp). My goal was two-fold: first, to see how the program would interpret the more abstract, conceptual terms used in the second prompt, and second, to see which images felt closer to an "authentic" self-portrait.

I used some terms which aligned with the aesthetics I'm interested in (film photography, 1970s cinema), and found these aesthetic suggestions either ignored or misinterpreted. I had hoped to capture the look of slide film, which I didn't see in my end results, and I'd attribute this to the multiple meanings of the word "slide." Perhaps I would find better results if I used more technical terms, like E-6 film. I found that the model was unable to replicate the style of Wim Wenders, which was unsurprising.

Something I found particularly interesting was that the "concrete" portraits had their subjects staring directly at the "camera" in 11/12 images generated, while the subjects of the "abstract" portraits looked directly at the "camera" in only 2/8 images.

Of the eight final images I produced, I think that this one from the "abstract" set of images feels the closest to a "self portrait," despite the fact that the figure bears little visual resemblance to me:
Screen Shot 2024-10-02 at 8.10.33 PM.png
"Concrete" Portraits: slide film photograph caucasian female graduate student very pale shoulder-length dirty-blonde wavy hair grey eyes wearing corduroys clogs and green cardigan --style raw --s 250 (RATING: 3.5/5)
Screen Shot 2024-10-02 at 8.09.59 PM.png
"Abstract" Portraits: wim wenders-style 35mm film photograph 24 year old feminine student who is idealistic sarcastic anxious frenetic and creative and loves cats and the smell of gardenia flowers --style raw --s 250 (RATING: 2/5)
Screen Shot 2024-10-02 at 8.10.19 PM.png

Re: Project 1: Studies in AI generated images

Posted: Wed Oct 02, 2024 8:30 pm
by yuehaogao
This post has been deleted. Please scroll down to see my post. Thanks!

Re: Project 1: Studies in AI generated images

Posted: Wed Oct 02, 2024 11:17 pm
by edachiardi
image 1: magnified big pixilated digital art of cats on reed college lawn during renn frayre midnight surprise --style raw --s 250
discord4.jpg

1. To what degree does your text query influence the generated image?
-- i don't think my text query was completely followed in terms of pixelization and magnification. I feel like the text is a loose framework for the image to be generated.

2. What is the style of the image, and why do you think it has produced that?
-- the style of the image is more in line with an oil painting, and I'm not totally sure why it produced that when prompted with specific content. In addition, the other images with this prompt were pixelated to a certain extent.

3. Any thoughts about how the visual elements in the image are organized
-- in terms of organizing the image, Midjourney focused solely on cats, midnight, and lawn and missed the other keywords in my prompt.

4. How would you change the query?
-- i think I would rearrange the words to see if it would generate different content.


5. Any other comments?
-- i did really appreciate the movement and lighting of this image

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
-- 2


image 2: magnified big pixilated digital art of cats on reed college lawn during renn frayre midnight surprise --style raw --s 250
discord2.jpg

1. To what degree does your text query influence the generated image?
-- the text query got closer to the image output I would've liked but still missed the mark. For instance, there was a lack of magnification or large pixels; instead we see small ones across the screen.

2. What is the style of the image, and why do you think it has produced that?

-- I think it doesn't fall under traditional pixel art but definitely does attempt to do so at a much smaller scale.

3. Any thoughts about how the visual elements in the image are organized

-- this image generated a closer idea of a college campus and lawn -- we can kind of make out the lamp posts / building in the background. overall, it does attempt to organize the text query into visual elements productively.

4. How would you change the query?

-- I think I would reword the prompt, remove "magnified," and suggest or even link the version of magnified pixel art that I wanted as an output.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result

-- 4

image 3: pixilated digital art of cats on reed college lawn during renn frayre midnight surprise --style raw --s 250
discord3.jpg

1. To what degree does your text query influence the generated image?

-- this image generated almost all of the content that was in the text query. It doesn't quite register renn frayre or the exact likeness of reed college but does attempt to satisfy all the keywords.

2. What is the style of the image, and why do you think it has produced that?

-- again, it attempts pixel art but in a much more realistic manner leaning away from traditional pixel art.


4. How would you change the query?

-- i think I would like to see the pixels at a greater size so I would add magnification or example images.


6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result

-- 4

image 4: todd hido photographic style magnified big pixilated digital art of cats on reed college lawn during renn frayre midnight surprise --style raw --s 250
discord5.jpg
1. To what degree does your text query influence the generated image?

-- the image was not pixelated at all but does attempt to mimic todd hido's photography style.

2. What is the style of the image, and why do you think it has produced that?

-- the image is in todd hido's style visualized through the blurriness, darkness, and fog. However, the image lacks depth in tonality or pastel colouration.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result

-- 1

Re: Project 1: Studies in AI generated images

Posted: Thu Oct 03, 2024 12:03 pm
by pcroskey
Model: Midjourney v 6.1
My exploration was inspired by this image, which I found on Pinterest: Image

Prompt 1: humanoid venus fly trap woman wearing luxurious clothes sipping tea in the louvre --c 25 --ar 16:9 --weird 300
Image
I anticipated a lot of variation in these results, but none captured my desired aesthetic. I received female aliens, statues and a woman with hair full of plants. But this image was the most surprising:
  • My prompt specified clothes, and yet her breasts are exposed.
  • The parts of her body that are covered are not adorned by "luxurious" clothes, either.
  • I asked for a humanoid plant and received a white woman in two of my results.
  • There isn't a single venus fly trap in the image
I believe the breasts and white woman issue may be influenced by my use of the words "woman" and "humanoid" in the prompt.
Rating: 1

Prompts 2-5 were largely unsuccessful as well. I continued to receive white women with plant hair. None of these prompts resulted in exposed breasts, though, so there's something to celebrate. Here are those prompts:
  • humanoid carnivorous plant sipping tea wearing a ball gown in the louvre --c 25 --ar 16:9 --weird 300
  • (included reference image) face full of venus fly traps sipping tea in a gown at the louvre --c 25 --ar 16:9 --weird 300
  • face full of venus fly traps green skin plant hair sipping tea in a gown at the louvre --c 25 --ar 16:9 --weird 1800 (these were my least weird outputs)
  • venus fly trap wearing a dress with tea poured in plant in the louvre --c 25 --ar 16:9 --weird 3000
Prompt 6: (included reference image) venus fly trap wearing a dress with tea poured in plant in the louvre --c 25 --ar 16:9 --weird 3000
Image
These images were all closer to my original vision. By changing the prompt to "tea poured in the plant," I finally stopped receiving human faces; her body is covered, and her skin is green. I placed the image at the Louvre, but I do find the green back drop interesting. It looks like she's there for a photoshoot. The tea looks crunchy, but I was going for weird, so I'm okay with it.
Rating: 4

Prompts 7 and 8 referenced the above image, but it began to devolve into naked white women again. Here are those prompts:
  • (included above image as reference) venus fly trap covered in plants and greenery in a tea house in the louvre brown and green skin --c 10 --ar 16:9 --weird 3000
  • (included above image as reference) venus fly trap with plants covering entire body in a tea house in the louvre brown skin --c 10 --ar 16:9 --weird 3000
Prompt 9: (included original reference image) venus fly trap in a tea house in the louvre --c 10 --ar 16:9 --weird 3000
Image
In an attempt to avoid the continued objectification of the female body, I refined my prompt. I appreciate this image because it looks like the alien-like plant is growing out of a statue rather than a realistic human body. I also appreciate the framing as similar plants grow behind it. The colors complement each other, and the reddish-orange contrasts nicely against the emerald wall.
Rating: 5

Prompt 10 resulted in beautiful depictions of strange venus fly traps growing in the louvre. However, I still wanted to get back to my original interest in a plant wearing a ball gown.
  • (included above image as reference) venus fly trap in tea house in the louvre plant body --c 10 --ar 16:9 --weird 3000 --no boobs
Prompt 11: (included above image as reference) venus fly trap wearing ball gown in a tea house in the louvre plant skin --no boobs --c 10 --ar 16:9 --weird 3000
Image
This is what I was originally looking for: a venus fly trap growing through a dress. The background evokes the Louvre and I like the way the vines extend out like arms.
Rating: 5

Bonus image:
Image
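
Since the prompts in this post differ mostly in their trailing parameters, a small helper can make the differences between two prompts explicit. This is only a sketch under stated assumptions: `split_prompt` and `changed_params` are hypothetical names, not part of Midjourney, and the parsing rule (everything after a `--name` token, up to the next `--name`, is that parameter's value) is an assumption that ignores image references and `::` weights.

```python
from typing import Optional


def split_prompt(prompt: str) -> tuple[str, dict[str, str]]:
    """Split a Midjourney-style prompt into its text and its --parameters.

    Hypothetical helper: assumes parameters look like `--name value ...`.
    """
    text_words: list[str] = []
    params: dict[str, str] = {}
    current: Optional[str] = None  # parameter currently being filled
    for token in prompt.split():
        if token.startswith("--") and len(token) > 2:
            current = token[2:]
            params[current] = ""
        elif current is None:
            # Still in the descriptive part of the prompt.
            text_words.append(token)
        else:
            # Append this word to the current parameter's value.
            params[current] = (params[current] + " " + token).strip()
    return " ".join(text_words), params


def changed_params(a: str, b: str) -> dict[str, tuple[Optional[str], Optional[str]]]:
    """Return the parameters whose values differ between prompts a and b."""
    _, pa = split_prompt(a)
    _, pb = split_prompt(b)
    keys = sorted(set(pa) | set(pb))
    return {k: (pa.get(k), pb.get(k)) for k in keys if pa.get(k) != pb.get(k)}


# Prompts 9 and 11 from this post (reference-image notes omitted).
p9 = "venus fly trap in a tea house in the louvre --c 10 --ar 16:9 --weird 3000"
p11 = ("venus fly trap wearing ball gown in a tea house in the louvre plant skin "
       "--no boobs --c 10 --ar 16:9 --weird 3000")

text11, params11 = split_prompt(p11)
print(text11)
print(params11)
print(changed_params(p9, p11))
```

For these two prompts the only parameter difference reported is the added `--no` term; the rest of the change is in the descriptive text.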

Re: Project 1: Studies in AI generated images

Posted: Thu Oct 03, 2024 1:11 pm
by jazer
Prompt: a human with ten heads
ucsbmat255_a_human_with_ten_heads_--chaos_60_--ar_169_--stylize_51188af4-d8c4-40bd-952a-5440efc0f097.png
1. To what degree does your text query influence the generated image?
There are humans and 8 heads, but not a single human with ten heads

2. What is the style of the image, and why do you think it has produced that?
Illustration style. Seems like that was the style of the image it was based on, given the URL displayed prominently.

3. Any thoughts about how the visual elements in the image are organized
The layout makes sense but the literal repetition for most of the heads doesn't make that much sense with the attempt to fade into the background

4. How would you change the query?
probably include words about many necks or one body.

5. Any other comments?
I chose to include this result mostly because of the prominent URL.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
1


Prompt: univeristy california media arts and technology department making art with artificial intelligence --ar 16:9 --style raw --stylize 0 --v 6.1
ucsbmat255_univeristy_california_media_arts_and_technology_depa_6cff7c8b-0422-490b-8dcc-be3534af9d98.png
1. To what degree does your text query influence the generated image?
It delivered what was asked in the prompt

2. What is the style of the image, and why do you think it has produced that?
It is a photo. Out of the four results, two were photos of humans interacting with art and two were images of art.

3. Any thoughts about how the visual elements in the image are organized
It looks like a promotional photo

4. How would you change the query?
I might try to get it to make more art instead of the photo shots

5. Any other comments?
Should MAT put this on their website?

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
2

Prompt: bruegel hunters in the snow ::1 hieronymus bosch ::1 --chaos 50 --ar 16:9 --style raw --stylize 0 --weird 900 --v 6.1
ucsbmat255_bruegel_hunters_in_the_snow_1_hieronymus_bosch_1_--c_a07a687b-6e4b-4e8a-82f4-91d52c92f64f.png
1. To what degree does your text query influence the generated image?
It seems to have combined the two artists' styles

2. What is the style of the image, and why do you think it has produced that?
A painting because they were both painters

3. Any thoughts about how the visual elements in the image are organized
It has a similar perspective to Bruegel's painting with more bizarre characters that you might find in Bosch

4. How would you change the query?
Perhaps include more description of the scene or characters

5. Any other comments?
I was curious how well the styles would be recreated.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
2

Prompt: technical line drawing optical illusion escher tessellation --chaos 30 --ar 16:9 --style raw --weird 600 --v 6.1
ucsbmat255_technical_line_drawing_optical_illusion_escher_tesse_65841aaa-9186-4bd8-b332-6817a7af11ea.png
1. To what degree does your text query influence the generated image?
the style was the main result of the query

2. What is the style of the image, and why do you think it has produced that?
technical line drawing, as requested

3. Any thoughts about how the visual elements in the image are organized
There's some sense of depth and shading

4. How would you change the query?
Perhaps include more scene details

5. Any other comments?
There was little in the way of illusion or tessellation

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
2

Re: Project 1: Studies in AI generated images

Posted: Thu Oct 03, 2024 2:20 pm
by figueroasanabria
Prompt 1: “Surreal dreamscape, floating islands with waterfalls, giant flowers, and soft pastel clouds”
ucsbmat255_Surreal_dreamscape_floating_islands_with_waterfall_4612cdb0-2d8f-43cd-9492-c4eb775b376e_3.png
The query influenced the image well. The floating islands, waterfalls, and pastel clouds match the surreal theme. The flowers are present but not as giant as expected.

What is the style of the image, and why do you think it has produced that?
The style is surreal fantasy. The soft lighting, pastel colors, and floating elements fit the “dreamscape” theme perfectly.

Any thoughts about how the visual elements in the image are organized?
The elements are well-organized. The floating islands guide the viewer’s eye, and the tree frames the scene. The background waterfalls balance the composition.

How would you change the query?
I would specify “massive flowers” to make them larger. I could also add more magical elements, like “glowing waterfalls”.

The peaceful, dreamy atmosphere is great. The color palette is beautiful, but the flowers could stand out more.

-Rate:
4/5 almost perfect, just a little more focus on the giant flowers


Prompt 2: “glass and concrete buildings, people interacting with digital kiosks, solar panels on rooftops, and subtle tech elements integrated into the environment”
ucsbmat255_glass_and_concrete_buildings_people_interacting_wi_17f26dac-19d3-4fcf-93cd-e1496d00125d_2.png
To what degree does your text query influence the generated image?
The query strongly influences the image. The glass and concrete building, digital kiosks, and solar panels should be prominently featured, reflecting a modern urban environment that incorporates technology and sustainability.
- The style is contemporary urban realism. This is likely produced by the combination of modern architecture materials with practical technology elements that reflect current design trends.
-To enhance the focus on nature, I might add “green spaces between buildings” or “people enjoying outdoor areas” to emphasize the balance of technology and nature in the urban environment.
-Rate: 4/5


Prompt 3: “Building inspired by butterfly wings, curved glass panels, natural light, green facades, rooftop gardens, organic architecture blending with nature”
ucsbmat255_Building_inspired_by_butterfly_wings_curved_glass__23e466b3-5932-4e1b-9c47-8d049af41ced_3.png
-The query has a strong influence on the image. Elements like “butterfly wings”, “curved glass panels”, and “green facades” should be distinctly represented, showcasing how the architecture is inspired by natural forms.
-The style is biophilic architecture or nature-inspired design. The style emerges from the integration of organic shapes and materials that mimic the fluidity of natural forms, promoting a connection between the building and its environment.
-The visual elements should be organized to highlight the curves of the building, resembling butterfly wings. The natural light should play a role in emphasizing the glass panels, while rooftop gardens and greenery should integrate seamlessly into the overall design.
-Rate: 3/5

Prompt 4: “Architecture mimicking nature, organic forms, flowing lines, green facades, and natural light integration”
ucsbmat255_Architecture_mimicking_nature_organic_forms_flowin_b111320a-19c5-458d-9ad1-cf289b61d77d_0.png
-The query strongly influences the image. The terms “mimicking nature”, “organic forms”, and “flowing lines” should lead to designs that visibly reflect shapes and patterns, emphasizing a connection to the environment.
-The style is likely organic architecture or eco-friendly design. The style arises from a focus on integrating natural elements into the built environment, with an emphasis on soft, flowing forms that promote harmony with nature.
-Rate: 4/5

Prompt 5: “Architecture mimicking nature, curves, organic forms, flowing lines, and natural light integration”
ucsbmat255_Architecture_mimicking_nature_curves_organic_forms_8bcdc19c-18fe-447a-9d88-852b0eb5d0d1_0.png

Re: Project 1: Studies in AI generated images

Posted: Thu Oct 03, 2024 2:25 pm
by emma_brown
I used Midjourney for my exploration, because I had never used it before and broadly I think the aesthetics of the results are a little better out of the box.
-----------------------------------------------------------------
Screenshot 2024-10-03 145021.png

Prompt: Citroën grace jones audi quattro mars red car in distant background gritty 1980s style
1. To what degree does your text query influence the generated image?
I found a huge variation when trying to make specific images of a car -- I originally was trying to generate images of the Audi Quattro, but found that there was very limited variation with what was generated.

I asked for Mars Red because that's what the red color was called in a lot of auctions, but Midjourney took it literally.
To test that theory, I asked for "badly drawn children's drawings".
Screenshot 2024-10-03 144505.png

Prompt: car hand-drawn badly drawn child's drawing
Screenshot 2024-10-03 144619.png

Prompt: audi quattro hand-drawn badly drawn child's drawing
0_2.jpeg
2. What is the style of the image, and why do you think it has produced that?
All of the images of the Audi Quattro involve an angle that a lot of classic cars being photographed are captured from -- sort of head-on and to the side. So despite the style requested, and even some prompts trying to push the car further in the background, the angle of the car mostly does not change and it is always in the foreground.

3. Any thoughts about how the visual elements in the image are organized

Everything is always tidied, the objects in the image tend to be arranged around the subject, so that your eyes are drawn to the center of the image, almost like tunnel vision.

4. How would you change the query?

I think I'd try to get the car more in the background somehow, so that it looked less like a print ad and more like a screenshot from a video commercial.

5. Any other comments?
I think Midjourney struggles with balancing any sci-fi keywords such as "space" with any type of style that isn't sleek -- but I was making some progress by saying "gritty" or "mad max".

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
3 -- it isn't bad but it wasn't what I wanted.

-----------------------------------------------------------------
Screenshot 2024-10-03 145625.png

Prompt: arkansas grassland bones and all final scene manga style
1. To what degree does your text query influence the generated image?
For this, I was specifically looking for the color palette of the final scene in Bones and All where the characters sit on a grassland hill at dusk. It's a nice palette of purple and green. However, either Midjourney doesn't know about that movie or it was overridden by the semantic content of "bones and all" because nothing from that scene came through, only literal bones.

2. What is the style of the image, and why do you think it has produced that?
The style is manga because I requested it, but in terms of the setting nothing is really coming through that feels "Arkansas" -- I was expecting more vastness.

3. Any thoughts about how the visual elements in the image are organized
Again, there seems to be an almost "tunnel" formed where the lines of each part of the image, from the treeline to the grass/path division, points to the background (but a little to the left).

4. How would you change the query?
I would try to manually state the colors and aspects of the scene that I had in mind rather than relying on what I believed would have come up in a google search. I guess the movie is sort of new (2022).

5. Any other comments?
Something about the hand-drawn style masks a lot of artifacts -- however, some of the results had skeletons that looked physically impossible because of the scale, proportions, or placement of the bones.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
4 -- I was happy with this when it came out, it did look like a manga albeit a generic one and with weird coloring (they are normally in black and white)

-----------------------------------------------------------------
Screenshot 2024-10-03 150007.png

Prompt: teresa of the faint smile sony handycam 8mm smoke
1. To what degree does your text query influence the generated image?
The smoke and the 8mm color quality came through, but I was looking for the visual artifacts that come from vhs tape and I was looking specifically for a character that was in a manga, so I was expecting a manga style to come through. I don't even see the person faintly smiling, but the prompt did seem to pick up on the fact that Teresa == woman.

2. What is the style of the image, and why do you think it has produced that?
Hyperrealistic, which I feel like Midjourney defaults to. Again, some of the color quality of a Handycam comes through but not the horizontal lines you normally see from 8mm tape. Maybe it struggles with applying video effects vs photo effects.

3. Any thoughts about how the visual elements in the image are organized
This looks like a picture I would see on Flickr or Tumblr, where it is supposed to look candid but it is in a location with great light and someone brought a smoke machine. The angle is from above, which adds to the "soft girl" Tumblr look. Interesting but not surprising that she defaults to a white woman who is conventionally attractive.

4. How would you change the query?
Since it doesn't seem to know about the character I was thinking of, I would try to change the race of the subject and get more of the VHS tape artifacts to be applied.

5. Any other comments?
This wasn't a bad photograph and I think the smoke helps hide any artifacts that would be there.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
4 -- aesthetically pleasing at least.

-----------------------------------------------------------------
Screenshot 2024-10-03 150154.png

Prompt: glass skin car commercial

1. To what degree does your text query influence the generated image?
This was just a random idea I had and I was hoping that somehow a makeup commercial would be combined with a car commercial. I was expecting soft, pink, feminine aesthetics to come through on their own. Instead, a futuristic sleek conventional car ad came through. It looks like once again, Midjourney focused on the semantic aspects of the individual words "glass" and "skin" applied to a car as a layer rather than the "glass skin" skincare trend.

2. What is the style of the image, and why do you think it has produced that?
Generic futuristic car commercial -- I think I needed to be more specific. I'm not sure why the default car commercial is science fiction rather than scenic vista/outdoorsy, but I think it must be the fact that I said "glass" and "skin" and it thought "force field"/"barrier."

3. Any thoughts about how the visual elements in the image are organized

Boring, generic, that 45 degree angle that's apparently common in car photography. Urban nightscape in the background but there are artifacts where the buildings and their lights are disjointed.

4. How would you change the query?

I would try to force more of a feminine aesthetic and see if I can get actual human skin to come through. Maybe I would try to make a skin covered car that is well-moisturized.

5. Any other comments?

Car commercials are an interesting subject I might try to explore more.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
1 -- super boring and dumb

Re: Project 1: Studies in AI generated images

Posted: Thu Oct 03, 2024 2:32 pm
by borouyu
[1] Midjourney
Operator Algebra, Holographic Duality, Quantum Field Theory, Emergence of Spacetime, in the style of futurism--ar 16:9 --style raw --s 250
Midjourney1.png
1. To what degree does your text query influence the generated image?
The text query has a significant influence on the generated image. It informs the system of key elements, concepts, and the visual style to focus on. The prompt I used mentioned "Operator Algebra, Holographic Duality, Quantum Field Theory, Emergence of Spacetime," as well as the futuristic theme and elements like glowing geometric shapes, grids, and cosmic depth. The AI translates this text into visual concepts based on patterns in its training data, but how it visualizes abstract concepts like quantum physics or spacetime varies a lot. The system interprets these elements loosely, especially with abstract scientific ideas, which are difficult to directly visualize.

2. What is the style of the image, and why do you think it has produced that?
It looks very much like computer graphics with generic particle effects, and not really like the futurist style in my prompt. Probably because of the style prompt.

3. Any thoughts about how the visual elements in the image are organized
Very dynamic and fluid. The face is weird.

4. How would you change the query?
To change the style and see more possibilities.


[2] Midjourney
Operator Algebra, Holographic Duality, Quantum Field Theory, Emergence of Spacetime, in the style of futurism--ar 16:9 --style raw --s 0
Midjourney2.png
1. To what degree does your text query influence the generated image?
This one looks more like a mathematical/physical/scientific expression. I think the algorithm is picking up more of futurism than the terms. The image has a sleek, glowing aesthetic and abstract forms.

2. What is the style of the image, and why do you think it has produced that?
Looks more abstract, low saturation, as I changed the --s 250 to --s 0.

3. Any thoughts about how the visual elements in the image are organized
Looks like forms from mathematical equations.

4. How would you change the query?
To add more details, maybe explanation of the theory.


[3] Stable Diffusion (DreamStudio)
https://beta.dreamstudio.ai/generate
Stable Diffusion.png
Operator Algebra, Holographic Duality, Quantum Field Theory, Emergence of Spacetime, in the style of futurism
ratio 7 : 4, size 1344 × 768, seed 233933, steps 40

1. To what degree does your text query influence the generated image?
The image tries to visualize holographic, field, and emergence.

2. What is the style of the image, and why do you think it has produced that?
This one looks as abstract as the [2] Midjourney s-0, while still not futurism yet. I'd rather call it impressionism.

3. Any thoughts about how the visual elements in the image are organized
The image is very symmetrical, and blurry. I'm even feeling the reluctance.

4. How would you change the query?
To separate the theories and see how they are expressed individually, rather than mixing together.


[4] DALLE3 (ChatGPT)
Operator Algebra, Holographic Duality, Quantum Field Theory, Emergence of Spacetime, in the style of futurism, proportion 16:9
Dalle3.png
1. To what degree does your text query influence the generated image?
It feels like a very direct translation from text to image, with text, equations

2. What is the style of the image, and why do you think it has produced that?
It’s a mix of styles, kind of futurism, emphasizing speed, technology, and modernity, with sharp lines, metallic textures, and neon colors.
And it's even theatrical, according to the spatial orientation.

3. Any thoughts about how the visual elements in the image are organized
The visuals are chaotic, maybe the idea of emergence is highlighted, and the algorithm vaguely connected to other mathematical and physical references.

4. How would you change the query?
To add more style related prompts.

Re: Project 1: Studies in AI generated images

Posted: Thu Oct 03, 2024 2:37 pm
by yuehaogao
Yuehao Gao
Assignment 1
10/01/2024

----------------------------------------------------------------------
----------------------------------------------------------------------

Prompt 1: "A Subaru WRX towing the Titanic Ship behind it using a thick iron anchor chain, on the surface of the ocean, in huge waves and stormy rain. The wheels of the WRX are splashing waters behind it. The picture should be in an artistic brush-painting style."

--------------------------
Image 1A
Model: DALL-E 3
Result:
Prompt_1_DALL_E3.png
1. To what degree does your text query influence the generated image?
On a scale of 1-10, the text query influenced the generated image at an approximate level of 6.
It is undeniable that most of the elements specified in the prompt are "shown" in the image, including the WRX car, the ship, the stormy weather, the waves, and the anchor chain. However, nearly all of the elements are placed in an unreasonable or incorrect arrangement: for instance, the car is zooming towards the ship rather than "towing" it and facing away; the chain is not working as a "tow rope", but is stretched alongside the car and the ship; the Titanic has five chimneys rather than four. At the same time, the style does not adhere to the "brush-painting" style as specified. That is to say, the picture works as an "assembly of the elements": the model tried to understand the prompt but did not create the accurate artwork that was wanted.

2. What is the style of the image, and why do you think it has produced that?
The prompt has specified the style to be "brush-painting." While the clouds in the sky, the chimneys, and the body of the Titanic have a little sense of brush painting, other elements, especially the car and the ocean waves, are closer to a "realism" style.

3. Any thoughts about how the visual elements in the image are organized
DALL-E seems to understand the elements one by one, so it generates the visual elements discretely. That is to say, each visual element makes sense by itself, but its relationships with the other visual elements are not as accurate as the prompt specified. For instance, the WRX is about to run into the Titanic rather than towing it forward.

4. How would you change the query?
I may add more details describing some elements and their positions, like "a thick, rusted iron anchor chain over a raging ocean", "The WRX’s wheels throw up arcs of water, struggling against the powerful current", and "the Titanic looms behind". In particular, I would specify the style for the whole picture with the words "The entire scene is captured in a bold, expressive brush-painting style, with sharp, dynamic strokes giving the stormy sky and the water a sense of chaos and movement".

5. Any other comments?
Generally, I do agree that DALL-E 3 knows what each element should look like, but there may not be enough visual data for the model to imagine what "a race car towing a ship on the ocean surface" should look like, which hinders it from generating a precise image.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
Based on all the analysis given above, I would personally give a 3 for the result.

--------------------------
Image 1B
Model: Midjourney
Result:
Prompt_1_Midjourney_4.png
1. To what degree does your text query influence the generated image?
On a scale of 1-10, the text query influenced the generated image at an approximate level of 8.
Compared to DALL-E 3, it is obvious that Midjourney understands each element in the prompt even better, especially regarding the position of each element, as well as the overall picture style and textures. For instance, it features a clear and precise front face of the WRX, as well as the body styles of the Titanic, together with the waves and the wheel-splashed water stream. While the car and ship are placed in positions that correctly convey a "towing" relationship, the chain on the side still seems to come out of nowhere. Nevertheless, the text query does influence the generated image more in this model.

2. What is the style of the image, and why do you think it has produced that?
The prompt has specified the style to be "brush-painting." Overall, the entire picture slightly features the texture of an oil painting, especially in the ocean surface and the raindrops.

3. Any thoughts about how the visual elements in the image are organized
Potentially, the model understood the prompt as a whole, so it organized the elements in the picture reasonably. Specifically, across all 4 variations it generated, the model seems to have prioritized putting the WRX in the very middle of the picture and everything else behind it, as it is the very first subject that appears in the prompt.

4. How would you change the query?
While everything looks almost perfect in this picture, I would specify more about the chain, like "a rusty, thick anchor chain that is straightened with great tension between the WRX and the front of the ship", as well as more details like "a Rally-blue WRX with strong power feeling" and "a wrecked, giant Titanic ship with a strong vibe of history".

5. Any other comments?
Compared to DALL-E 3, it is obvious that Midjourney understood the logic of the prompt better, perhaps thanks to a dataset collected from Discord user inputs, despite some flies in the ointment like the position of the chain.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
Based on all the analysis given above, I would personally give a 4 for the result.


----------------------------------------------------------------------
----------------------------------------------------------------------

Prompt 2: "A creative stream of flow bursts out of the top of a classical, old, wooden straight piano, while the flow consists of sparkling, high-tech musical notes glowing bright lights. The entire picture should be in the style of contemporary kinetic art."

--------------------------
Image 2A
Model: DALL-E 3
Result:
Prompt_2_DALL_E3.png
1. To what degree does your text query influence the generated image?
On a scale of 1-10, the text query influenced the generated image at an approximate level of 8.
For this prompt, it is obvious that more detailed descriptions like "classical, old, wooden" made the model illustrate the piano object very precisely. Additionally, the notes are aligned with the prompt as they are "sparkling" and "bright" enough. Still, specifying that the stream should come from the top of the piano did not play a great role, as the model understood it as "the top of the piano keyboard". Meanwhile, the word "high-tech" did not make the notes look like digital chips or compartments as expected.

2. What is the style of the image, and why do you think it has produced that?
The prompt has specified the style to be "kinetic art". Indeed, the model captured this and made the stream of notes look dynamic and energetic enough.

3. Any thoughts about how the visual elements in the image are organized
The generated picture prioritized the flow of notes over the piano, since the flow occupies almost 75% of the picture, which is reasonable since the prompt implicitly emphasized it as the major element in the picture.

4. How would you change the query?
I might change the query by adding more specifications to the word "high-tech", like crystal-textured or neon-glowing.

5. Any other comments?
Compared to the first prompt, it is obvious that DALL-E 3 has a much more precise understanding of this one. Still, it seems like the model lacks some sort of "combination imagination", since it did not successfully imagine how musical notes should be "high-tech" or where "the top of the piano" is.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
Based on all the analysis given above, I would personally give a 4 for the result.

--------------------------
Image 2B
Model: Midjourney
Result:
Prompt_2_Midjourney_4.png
1. To what degree does your text query influence the generated image?
On a scale of 1-10, the text query influenced the generated image at an approximate level of 8 as well.
Compared to the DALL-E version of this prompt, there are some aspects that Midjourney did better: the notes shine even brighter, the piano looks "older" thanks to the dark marks on its body, and some of the music notes floating in the air indeed look more "creative" in their shapes and design. Additionally, the flow does seem to be coming from the top of the piano. However, the prompt failed to make Midjourney understand what "a stream of flow" is, as the notes are scattered and appear everywhere in the picture. Like DALL-E, it also fails to imagine how notes should be "high-tech". But still, it has generated a great result overall.

2. What is the style of the image, and why do you think it has produced that?
I would consider Midjourney to be at the same level as DALL-E in interpreting "kinetic art" for this prompt.

3. Any thoughts about how the visual elements in the image are organized
In this picture, the piano is placed on the left side of the canvas, taking up approximately 3/5 of the space. The stream of notes is drawn in a layer above the piano, since the model understands that it is the major element as well.

4. How would you change the query?
Just like how I would change the query for the Dall-E version of this picture, I would specify more about how the musical notes would look more "high-tech", like the possible colors, textures, and technologies being utilized.

5. Any other comments?
Overall, the model captured every detail given in the prompt, despite having some flaws like misunderstanding what a "stream" should look like.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
Based on all the analysis given above, I would personally give a 4 for the result as well.


----------------------------------------------------------------------
----------------------------------------------------------------------

Prompt 3: (Prompt 2 changed from "kinetic art" to "abstract-impressionist art"): "A creative stream of flow bursts out of the top of a classical, old, wooden straight piano, while the flow consists of sparkling, high-tech musical notes glowing bright lights. The entire picture should be in the style of contemporary abstract-impressionist art."

--------------------------
Image 3A
Model: DALL-E 3
Result:
Prompt_3_DALL_E3.png
1. To what degree does your text query influence the generated image?
On a scale of 1-10, the text query influenced the generated image at an approximate level of 4.
This is one experiment I wanted to run on both models: changing the style of the picture to something that human artists might be more adept at, namely abstract art. Although the prompt mentioned the artistic style, DALL-E generated a picture very similar to that of Prompt 2 and showed very little of an "abstract art" style, as everything still seems very realistic.

2. What is the style of the image, and why do you think it has produced that?
Although the style of the image is specified as "abstract-impressionist art", which is supposed to be something like random, distributed lines and shapes on the canvas, the generated piano and the room setting still seem very realistic.

3. Any thoughts about how the visual elements in the image are organized
As in Midjourney's interpretation of the previous picture, the piano is placed on the left side of the canvas, and it occupies about 75% of the space. The notes are in the layer above the piano. For this picture, the camera seems to be zoomed in more to focus on the center part of each element, so we feel closer to the elements shown in the picture.

4. How would you change the query?
As the aim is to generate a piece of abstract-impressionist art, I would specify less about the piano and the stream of notes. Rather, I might add more description about the overall style, like "mainly consisted of random-feeling lines, color shapes on a completely-white canvas as the background."

5. Any other comments?
Generally, it seems like DALL-E doesn't really understand what abstract-impressionist art is, at least judging from the result of this prompt.

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
Based on all the analysis given above, I would personally give a 2 for the result.

--------------------------
Image 3B
Model: Midjourney
Result:
Prompt_3_Midjourney_4.png
1. To what degree does your text query influence the generated image?
On a scale of 1-10, the text query influenced the generated image at an approximate level of 5.
This could be for the same reason as in the DALL-E generation of this picture: a similar prompt was entered right before this request, and the model may have simply used the previous picture as a reference and generated a variation of it. The result feels very similar to the picture it generated for Prompt 2, except that it generated three streams instead of one. Although the original misunderstanding of a "flow" as scattered and distributed notes gives a little sense of "impressionist art", which suggests that the query slightly influenced the generated image, it is obvious that the model still needs a larger sample for the label "abstract-impressionist art".

2. What is the style of the image, and why do you think it has produced that?
I would consider Midjourney to be at the same level as DALL-E in interpreting "abstract-impressionist art" for this prompt.

3. Any thoughts about how the visual elements in the image are organized
The piano is placed in the bottom-middle part of the picture this time, similar to how DALL-E treated Prompt 2. However, what is different about the organization of this picture is that Midjourney placed most of the "creative, high-tech musical notes" above the piano object, with a minimal amount of overlap.

4. How would you change the query?
As with the query for DALL-E, I would shift the focus from the elements themselves to the style and drawing techniques of abstract-impressionist pictures.

5. Any other comments?
On one hand, although Midjourney does not seem to understand what "abstract-impressionist art" should look like either, it does a slightly better job than DALL-E in this task overall. On the other hand, one possibility is that the misreading of "one stream" as scattered and distributed notes has made the picture "hit the mark by a fluke".

6. On a scale of 5 - from 5 being GREAT to 1 being LOW your rating of the result
Based on all the analysis given above, I would personally give a 2.5 for the result, for its slightly closer interpretation of "impressionist art".