Project 1: Studies in AI generated images
Re: Project 1: Studies in AI generated images
1. To what degree does your text query influence the generated image?
Unsatisfactory. The image seems poorly constructed, with no "alter ego" present, only the main figure. The lighting style may be the only part of the prompt that carried over to the final image, although I wouldn't call it high-key lighting exactly.
2. What is the style of the image, and why do you think it has produced that?
Adult fiction book design. It may be drawing on images in the training set tagged "scientific" or "physics", but the results seem quite generic.
3. Any thoughts about how the visual elements in the image are organized?
The elements seem stitched together in a "natural-like" yet unnatural way; the image looks amateurishly constructed or assembled.
4. How would you change the query?
I might try adding more adjectives and specifying individual picture elements ("smoky, wavy background", "glowing lights in the air flying").
5. Any other comments?
It feels like one needs to dumb down prompts significantly just to get the model to construct individual elements satisfactorily, never mind an actual mise en scène.
6. On a scale of 1 to 5 (5 being GREAT, 1 being LOW), your rating of the result
1 (poor) from a prompt-translation perspective; 3 in terms of the quality of the image as an image.
prompt: ludwig wittgenstein sitting in a cafe by the seashore looking agitated waving his fists arguing with friend sunny morning light streaming in --ar 16:9 --style raw --s 250
1. To what degree does your text query influence the generated image?
Barely satisfactory. The character bears no resemblance to the historical person, but the gesture is somewhat close. Some variations include a friend in the frame too (although the friend has essentially the same facial features as the main character).
2. What is the style of the image, and why do you think it has produced that?
A contrasty black-and-white film-stock look, with a commercial, advertisement-style composition. This may be due to the "cafe" part of the prompt, which presumably appears in similar searches and in turn leads the model to associate the word with that type of composition.
3. Any thoughts about how the visual elements in the image are organized?
The elements seem stitched together in a "natural-like" yet unnatural way; the image looks amateurishly constructed or assembled.
4. How would you change the query?
I might try adding more adjectives to arrange the composition closer to what I want, but I doubt the actual facial appearance would change.
5. Any other comments?
The training set doesn't seem to have images of the historical person with which to reconstruct his appearance accurately.
6. On a scale of 1 to 5 (5 being GREAT, 1 being LOW), your rating of the result
2.5/3 (charitable), for the gesture; there is no resemblance, and the friend is not pictured prominently.
prompt: kummatti telephoto longshot in grassy midwest plain late evening dusk backlit kodak ektachrome --ar 16:9 --style raw --s 250
prompt: kummatti g aravindan movie walking the deserted streets of new york with a cloth bag hung across his torso late evening light high contrast saturated --ar 16:9 --style raw --s 250
1. To what degree does your text query influence the generated image?
Poor prompt translation. The model does not seem to recognize the name as the lead character of a critically acclaimed Indian art-house film from the 1970s. No human is pictured at all in the first set, and the second set has individuals with no resemblance whatsoever to the character from the movie. The scenery and lighting are constructed quite well, though, and the picture style (high contrast, saturated) is fairly close to what I expected.
2. What is the style of the image, and why do you think it has produced that?
Glossy landscape or travel-magazine photography. It appears to be constructed from generic landscape or travel stock photos.
3. Any thoughts about how the visual elements in the image are organized?
The landscape itself seems decently constructed, and the backlighting feels well rendered.
4. How would you change the query?
I might try adding more adjectives to describe the main character as closely as possible to work around the model's unfamiliarity with the subject.
5. Any other comments?
The training set doesn't seem to have images of cultural artifacts from outside the US with which to construct the image accurately. The streets in the second set seem well rendered, presumably because such images are better represented in the training set.
6. On a scale of 1 to 5 (5 being GREAT, 1 being LOW), your rating of the result
1: no lead character, just a poorly constructed generic figure.
prompt: guy in a modern office space sitting at his desk looking into a monitor showing image of his face looking back but missing eyes in eye socket soft office lighting bleach bypass low contrast --ar 16:9 --style raw --s 250
1. To what degree does your text query influence the generated image?
The image is oddly constructed. None of the images feature faces with empty eye sockets, except one that is somewhat interesting but not accurate. The specified look isn't represented accurately either (bleach bypass, a photochemical film-processing technique).
2. What is the style of the image, and why do you think it has produced that?
Generic stock photos of offices and sedentary professionals, which might be disproportionately represented in training sets.
3. Any thoughts about how the visual elements in the image are organized?
The elements feel artificially placed, with a weird flatness to the image, like a layering of images shot at different focal lengths and then stitched together.
4. How would you change the query?
I might try being more specific about the facial features and the placement of objects in the frame to get more satisfactory results.
5. Any other comments?
The model seems to churn out decent commercial-use images of generic situations such as office spaces, but it isn't capable of simulating specific looks (bleach bypass).
6. On a scale of 1 to 5 (5 being GREAT, 1 being LOW), your rating of the result
1: not a faithful representation of the prompt, in either subject or look.
Re: Project 1: Studies in AI generated images
I would like to analyze Midjourney's performance in generating realistic, photo-style images compared to abstract-style images.
I first tried to generate some Polaroid double-exposure-style photos to see if they matched real photography.
Prompt1: a vintage Polaroid photo combining the silhouettes of a person walking through a forest with an overlay of a city at sunset --style raw --s 250

1. To what degree does your text query influence the generated image?
The result fulfills my description of the scene very well, with forests, silhouettes, a city, and a sunset, and it also reflects the word "vintage" by showing, around the edges of the photo, some of the stains and damage characteristic of old photographs. However, perhaps because the scene is so harmonious, there is little sense of double exposure, and the characteristics of a Polaroid are hard to see. Real Polaroids are indeed warm in tone, but the resulting image looks too warm and yellowish; even allowing for the influence of the setting sun, the sky is still too yellow. This makes it look less true to the actual film tones of a Polaroid.
2. What is the style of the image, and why do you think it has produced that?
Overall, this image mimics the style of film shot with a real camera, but the photo doesn't look 'raw'; it looks as if a filter or some post-processing has been applied. If I were to photograph a real vintage Polaroid with my phone, the result would likely show a few additional elements, for example the Polaroid paper reflecting some ambient light.
3. Any thoughts about how the visual elements in the image are organized?
From a photographic-composition standpoint, this photo uses the rule of thirds and depth: elements are placed in the foreground, middle ground, and background, giving the photo a three-dimensional feel. These are common photographic techniques.
4. How would you change the query?
To better emphasize the Polaroid style, I added the phrase 'all framed by the nostalgic Polaroid border'. However, probably because of the image size, the border does not come through very clearly.

5. On a scale of 1 to 5 (5 being GREAT, 1 being LOW), your rating of the result
I would give it a 2: the photo's realism and color tones could be better, and the arrangement of the elements makes the whole photo look ordinary, even though all the described scene elements are present.
Prompt2: abstract dreamcore depiction of an uncanny natural scene, flowing water shapes the landscape, dream-like, fluidity amidst eerie, fantastical elements --style raw --s 250

1. To what degree does your text query influence the generated image?
In this attempt to depict a more abstract dream scene, I mentioned the elements 'flowing', 'water', and 'natural scene', and the result is very much in line with them; the use of color is also distinctive. The scene does look abstract, but it lacks some of the haziness and indistinctness of a dreamscape.
2. What is the style of the image, and why do you think it has produced that?
The initial four generated images generally resembled an oil-painting or ink landscape-drawing style, and I picked the one closest to a 3D-modeling style rather than a painting style.

3. Any thoughts about how the visual elements in the image are organized?
It basically follows the common visual organization of landscape painting, with nothing very unusual, except for the first image, where large ocean waves seem to take up most of the frame.
4. How would you change the query?
I replaced the phrase 'flowing water shapes the landscape' with 'starlight twinkling, haziness' to add some dreaminess. Since this was a revision of the originally generated image, the natural style of the flowing-water scene was retained. The starlight and haziness do add some dreaminess, but the color tones tend to be uniform, and Midjourney still tends to generate oil-painting-style images when the prompt does not specify an art style.

5. Any other comments?
To generate images I wouldn't expect, I may try more descriptive vocabulary and descriptions of different art styles.
6. On a scale of 1 to 5 (5 being GREAT, 1 being LOW), your rating of the result
I would give it a 3.5: it accurately generates the scene elements I described, and the blend looks relatively uniform, but I expected to see a more complex distribution of scene elements.
Observation
Comparing images generated by Midjourney from figurative versus abstract descriptive vocabulary, my initial observation is that the degree of abstraction or figurativeness of the vocabulary does not have a very significant effect on the results.