In the past three weeks, I explored MidJourney’s ability to generate three categories of images: realistic and conventional, abstract, and imaginative. I found that MidJourney is not the best tool for generating realistic, conventional images, since many fine details are missing from the results, which makes them look unrealistic. For abstract images, MidJourney came up with a decent “understanding” of the text query and generated images that surprised me. For imaginative images, MidJourney is capable of blending the real and the abstract and produces satisfying results. However, I also found it difficult to control the output precisely with text descriptions.
Here are some of the example text queries I used:
Text query: starry sky viewed on the top of a volcano, photorealistic
Text query: starry night with galaxy that stretches to infinity, lava erupts out of a volcano, photorealistic
Text query: a man in red jacket skiing down the mountain in high speed, there are pine trees, photorealistic
Link to first week's images:
viewtopic.php?f=86&t=363
In the first category, I was exploring MidJourney’s ability to generate natural scenes and humans. Starry nights and volcanoes are common subjects, and everyone knows ahead of time what they should look like. From the results MidJourney generated, we can see that the software is able to make sense of the elements appearing in the text and put them together. However, the resulting images have a high noise level: random bright pixels are scattered across both the natural scenes and the skier image. Moreover, the fine details that would make the images photorealistic are missing. In my exploration, the boundaries between different objects in the same image were very blurry, as evidenced by the sky-volcano boundary and the skier-snow boundary. This gives me the impression that MidJourney first generates each element in the text query and then puts them together to fit the text description.
In short, for realistic and conventional image generation, MidJourney is able to make sense of the text query. However, the resulting images, at least in my exploration, are far from what the keyword “photorealistic” describes. As a CS student and a user of generative AI tools, I think realistic and conventional images are always the hardest for AI to generate, since we, as humans, have strong expectations of what should come out, and any deviation from those expectations makes the images look unrealistic.
For the second category, I went in a completely different direction: abstract. In this exploration, I wanted to see how MidJourney would make sense of an abstract text description and generate images whose appearance we cannot know ahead of time. Here are some of the text queries:
Text query: starry sky viewed on the top of a volcano, photorealistic
Text query: starry night with galaxy that stretches to infinity, lava erupts out of a volcano, photorealistic
Text query: a man in red jacket skiing down the mountain in high speed, there are pine trees, photorealistic
Link to the images:
viewtopic.php?f=86&t=364
The text queries I chose are abstract in nature, and the resulting images are correspondingly abstract. There are no elements that specifically reference forces, gravitational fields, or time and space. Instead, MidJourney uses simple geometries and variations in color to create a “vibe” that fits the text description, which is completely different from the first category. I think this tells us something about how MidJourney goes from understanding the text query to generating the images: when the text query is abstract, it uses abstract representations in the images.
One more thing I noticed when exploring the second category is how much varying the adjectives can affect the generated images. The words that reference the “style” set the tone of the images. It also tells me that using high-level text descriptors is more useful than giving specific, detailed descriptions.
For the next exploration, I did something in between: I tested MidJourney’s ability to generate images that are imaginative but not totally abstract.
Here are some of the text queries I used to generate the images.
Text query: tree house
Text query: tree house, realistic
Text query: realistic, tree house, a crooked path
Text query: photorealistic, sophisticated tree house in warm lighting, sourrounded by flowers
Text query: realistic, tree house, a crooked path, viewed from bottom
Text query: realistic, tree house, a crooked path, birdview
I don’t get to see a lot of tree houses in my daily life, so I was really excited about what MidJourney could come up with. From this set of tree house images, I was amazed by MidJourney’s ability to generate imaginative images. Some of the tree house images it generated show very fine details, much like the real tree house photos I can find on Google. Yet some of the images are not realistic: some tree houses have a very fragile base that could not possibly support their weight. I really like how creative and imaginative those tree house images are. They look like kids’ drawings.
Finally, I explored the effect of viewing angles. By default, the images MidJourney generates have a straight-on angle, as if someone were taking a photo from directly in front of the subject. I tried the keywords “birdview” and “viewed from bottom”. In general, I think those keywords are effective in steering the overall image generation.
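The tree house queries above all follow the same pattern: a style keyword, a subject, some extra details, and a viewing-angle keyword, joined by commas. As a rough sketch only, the small Python helper below builds such query strings so that one part can be varied while the rest stays fixed. The function name `build_query` and its parameters are my own hypothetical names, not part of MidJourney; the output is just text that would still have to be pasted into MidJourney by hand.

```python
from typing import Iterable, Optional


def build_query(subject: str,
                style: Optional[str] = None,
                details: Iterable[str] = (),
                view: Optional[str] = None) -> str:
    """Join the parts of a text query with commas, skipping empty parts.

    Hypothetical helper for illustration; it only formats a string.
    """
    parts = [style, subject, *details, view]
    return ", ".join(p.strip() for p in parts if p and p.strip())


# Same subject and details, two different viewing-angle keywords.
for view in ("viewed from bottom", "birdview"):
    print(build_query("tree house", style="realistic",
                      details=["a crooked path"], view=view))
```

Changing only one argument at a time makes it easier to tell which keyword is responsible for a change in the resulting images.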
In conclusion, I tried three types of image generation with MidJourney. In short, MidJourney is not good at generating photorealistic, conventional images. However, MidJourney excels at generating imaginative and abstract images. In addition, keywords that reference the style or viewing angle have a huge impact. When referencing something in the text query, it’s better to give a description of the object or concept you want rather than directly giving its name. In one sentence: show, don’t tell.