Project 4: MidJourney: Image to Video

glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

Project 4: MidJourney: Image to Video

Post by glegrady » Sat Oct 21, 2023 10:34 am

Project 4: MidJourney: Image to Video

The assignment is to explore generating video segments from text and image prompts.
Wikipedia describes text-to-video models here: https://en.wikipedia.org/wiki/Text-to-video_model

Rumi has done the initial research with links to:
https://youtu.be/cjMo_r7o-5Q and Pika Labs: https://pikalabs.org/

This video describes how to work with three of these tools, LeiaPix, PikaLabs, and Heygen: https://www.youtube.com/watch?v=ZcChO8hBYPQ

Leiapix: https://convert.leiapix.com/ could be interesting
PikaLabs: Rumi demonstrated this one
Heygen: costs money; text to voice, so not sure how useful for us

https://www.makeuseof.com/best-ai-video ... -to-video/ lists nine more tools to consider

Along with your text/image-to-video results, please also add an evaluation or commentary, for instance: to what degree did the text or image prompt have a relevant impact on the resulting video?

-----------------------------------
An argument for the use of the --seed parameter: https://www.youtube.com/watch?v=vpp5NtdrViU
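For example (the seed value here is arbitrary), re-running a prompt with the same --seed should return a closely related image grid, so changing one word while keeping the seed fixed isolates the impact of that word:

/imagine prompt: The Starry Night seen through a car window, vibrant colours --ar 16:9 --seed 777
/imagine prompt: The Starry Night seen through a car window, muted colours --ar 16:9 --seed 777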
George Legrady
legrady@mat.ucsb.edu

pratyush
Posts: 9
Joined: Wed Oct 04, 2023 9:27 am

Re: Project 4: MidJourney: Image to Video

Post by pratyush » Thu Oct 26, 2023 1:57 am

For Project 4, I tried out three of the free AI image-to-video tools available online. Since I had already tested Pika Labs' text-to-video tools in my previous attempts at AI animation, I decided to give this new image-based method a go. While the first two tools focused mainly on spatial/perspectival movements, the third involved generating character animation through facial and lip movements, as well as speech and lip-synchronisation. Creating animation with the third method involved more than one AI tool/platform. The details are as follows:

1. The first attempt mostly involved animating the visual elements inside the picture along the X and Y axes (width and height) of the frame. I used the Pixamotion AI photo-animation app, which can animate or apply motion to selected parts of an image while keeping the other areas static. Pixamotion is a smartphone app that gives the user free access for only 7 days. Alongside animating parts of still images in the image-to-video format, it also lets the user add animated elements such as fire, rain, snowfall, or moving clouds in the sky, and it allows Instagram-type filters to be applied to the animations. The user has to manually draw arrow(s) to designate the lines or paths along which the elements in the picture will move, and paint or pin the areas that will remain static; this is all done by hand. The result could be roughly described as a still image with moving or animated parts. I used an earlier image generated on Midjourney with the prompt: Van Gogh's painting The Starry Night in the style of Studio Ghibli Japanese anime, seen through a car window, with vibrant colours:: --aspect 16:9 --chaos 5 --style raw --s 250 - Image #4 @MAT 255. I then animated the pictorial elements in the sky while keeping the car in the foreground and the mountains and town houses in the back static. The motion arrows as well as the pins/paint tools were all applied by hand. The image below shows the distribution of these motions as well as the painted/pinned areas that mark what stays static:

Image

Image

And here is the final animation: https://discord.com/channels/1023109610 ... 4266486795

The obvious problem here is the way the animation pathway distorts the image in order to render the illusion of movement. The app does not generate additional pixels beyond the given ones to animate, so it bends the “brush strokes” in the sky and stretches the image in ways that look more like a glitch than smooth movement. The result therefore seems like a keyframe animation of bent and distorted parts of the image itself. Finally, I found a way to create a subtle illusion of depth, using foreground elements as well as elements that seem to travel from the background to the foreground. I explored some of the additional features of the app and used the ‘Overlay’ feature to add soft-focus lens bokeh in the foreground. Then I went to the ‘Elements’ feature and added smoke in the form of a meteor-shaped orbiting fume that traverses an imagined Z-axis in the image, slightly off-centre in the frame.


2. My second attempt was on LeiaPix, which is another free AI animation platform. LeiaPix animates still images by rendering depth along the Z-axis. The free version allows tracking along all three axes, as well as a combination of the three, and one can control the movement parameters, such as the duration and amount of the overall movement and the amplitude and phase along each axis individually. In fact, it only allows camera-movement animations: users select preset pathways along which the camera tracking takes place, while the visual elements within the picture frame remain static (which results in something more like a holographic/lenticular effect). So the rationale here was to choose images that depict more space rather than emphasize one main subject/character, in order to see how much the animation affects the spatial dimensions of the given images. I used three images generated earlier in the course on Midjourney. They respectively demonstrate camera tracking on the X, Y and Z axes (see below: a, b, & c). Finally, I used a fourth image generated earlier on Midjourney that incorporates simultaneous tracking movement on all three axes (d). Below are the animated results:

[The image prompts for all 4 on Midjourney were:

(a) Van Gogh's painting The Starry Night in the style of a charcoal sketch on paper, seen through a car window, :: --aspect 16:9 --chaos 5 --no human --style raw --s 250 - Image #1 @MAT 255

(b) A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic --ar 16:9 --c 25 --style raw --s 250 - Image #2 @MAT 255

(c) A science classroom underwater with fatigued students doing math problems on infinite stair cases, painted by Georges Seurat, use Pointillist brush technique, vibrant colours, epic --ar 16:9 --c 25 --style raw --s 250 - Image #1 @MAT 255

(d) https://s.mj.run/3s6dGWdLvvY:: 2::https://cdn.discordapp.com/attachments/ ... d411087c38&:: 0.5::https://cdn.discordapp.com/attachments/ ... 70ba349d17& as a picture-postcard in black and white Kodak TriX 400 ISO, use dramatic lighting and shadows --ar 16:9 --c 10 --s 250 --style raw - Image #3 @MAT 255]

(a) https://discord.com/channels/1023109610 ... 0638376018
(b) https://discord.com/channels/1023109610 ... 6615907430
(c) https://discord.com/channels/1023109610 ... 6537492650
(d) https://discord.com/channels/1023109610 ... 1961544735

What the results again demonstrate are distortions and stretching of pictorial elements, beyond a point where certain elements either seem to break apart (the stairs in b) or the spatial movement seems to stretch the main subject/object in the picture (a). In the movement along the X axis, the main subject appears like a cutout instead of a solid 3-dimensional object (c). The same can be noticed in the last image (d). This is where I felt the use of Z-depth in LeiaPix falls a little short, in the sense that it is still incapable of rendering 3-dimensional solidity to an image for which it has already calculated depth along the Z-axis. Even decreasing the amplitude and phase values of the axes hardly reduced these distortions.

3. My final attempt at AI image-to-video animation incorporated multiple free AI tools: the still-image generation platform Midjourney, along with Eleven Labs (an AI speech generator), an AI depth-pass generator called MiDaS (via Hugging Face), and finally D-ID, the AI video generator that animates still images into talking-head avatars. These are mostly only partially free, with limited features for free users (apart from Midjourney, of course). The Midjourney image that I used for this is a combination of Self-Portrait at 26 by Albrecht Dürer, the renowned painter and printmaker of the German Renaissance of the late 15th and early 16th centuries (the painting is held at the Prado Museum in Madrid, Spain), along with a selfie I took at home. Here is the prompt for the Midjourney image thus generated, where a weight value of 3 was assigned to my selfie: https://s.mj.run/EW2v9YViBi8:: 3::https://cdn.discordapp.com/attachments/ ... bff0ef69df& as a movie poster --ar 16:9 --c 10 --s 250 --style raw - Image #4 @MAT 255. The image looks less like a movie poster and more like a screenshot from a PlayStation sci-fi game (which was not exactly what was mentioned in the prompt). The deepfake image of myself that was thus generated was far from accurate. Below are the image prompts I used as well as the resulting image on Midjourney.

Selfie:

Image

Albrecht Dürer's Self-portrait at 26:

Image


Midjourney image with prompt:

Prompt: https://s.mj.run/EW2v9YViBi8:: 3::https://cdn.discordapp.com/attachments/ ... bff0ef69df& as a movie poster --ar 16:9 --c 10 --s 250 --style raw - @MAT 255 (fast)


Images:

Image


<<Upscale 4>> [Prompt: https://s.mj.run/EW2v9YViBi8:: 3::https://cdn.discordapp.com/attachments/ ... bff0ef69df& as a movie poster --ar 16:9 --c 10 --s 250 --style raw - Image #4 @MAT 255]

Upscaled Image:

Image


Then, using Eleven Labs, I generated the speech for the avatar. Eleven Labs takes written text as its prompt and generates speech from an extensive library of voices with different accents and cultural/ethnic/dispositional particularities or vocational attributes (such as journalists, newscasters, radio hosts, or audiobook voiceover artists). The user types in their speech as the prompt for the AI and then assigns a voice of their choice to read out the text. The user can also change various voice settings to manipulate the selected voice further (such as the stability or clarity of the voice). My prompt was very straightforward: “Hello! My name is Rumi. I am not a real person.” I used the return key to increase the gap between each sentence and increased the clarity of the voice (I used a British newscaster's voice) so that the speech would not seem too rushed. The resulting sound file was then downloaded and saved in MP3 format. (The link to the sound file could not be attached because Eleven Labs does not have a link-sharing option, but I have it downloaded and could play it in class from my computer if need be.)
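For anyone who would rather script this step than click through the web interface, the same request can also be made through Eleven Labs' REST API. The snippet below is only a rough sketch, assuming the v1 text-to-speech endpoint; the API key, voice ID, and output file name are placeholders.

Code:
import requests

API_KEY = "YOUR_ELEVENLABS_API_KEY"   # placeholder
VOICE_ID = "YOUR_VOICE_ID"            # ID of the chosen voice from the voice library

# The endpoint returns MP3 audio bytes for the supplied text.
response = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
    headers={"xi-api-key": API_KEY, "Content-Type": "application/json"},
    json={
        "text": "Hello!\n\nMy name is Rumi.\n\nI am not a real person.",
        # stability/similarity roughly correspond to the web UI's stability and clarity sliders
        "voice_settings": {"stability": 0.5, "similarity_boost": 0.9},
    },
)
response.raise_for_status()
with open("rumi_intro.mp3", "wb") as f:
    f.write(response.content)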

Next, I used MiDaS (via Hugging Face) to generate a depth pass for my Midjourney image, so that it could later be animated as a 3D image (with depth along the Z-axis) using the new AI tools created by Adobe and made available through the latest versions of Photoshop and After Effects. Here is the depth pass that I generated:
image.png
Depth Pass
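The web demo was enough for this, but the same kind of depth pass can also be generated locally. Below is a minimal sketch, assuming the Hugging Face transformers depth-estimation pipeline with the public Intel/dpt-hybrid-midas checkpoint; the file names are placeholders.

Code:
from transformers import pipeline
from PIL import Image

# MiDaS-style monocular depth estimation; the checkpoint is downloaded on first run
depth_estimator = pipeline("depth-estimation", model="Intel/dpt-hybrid-midas")

result = depth_estimator(Image.open("midjourney_portrait.png"))
result["depth"].save("depth_pass.png")   # greyscale depth map, usable as a Z-depth pass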

Since I currently have access to neither of those, I have yet to try this out. I then moved on to D-ID to generate an animated talking head from the Midjourney image. D-ID accepts both audio and text prompts and uses them as the spoken words, which it lip-synchs with the animated talking avatar. The final results, however, show visible stretching and distortion in the lips, teeth, and especially on the cheeks as well as below the chin area. It seems that the depth simulation still needs quite a bit of work here too. Below is the result:


https://discord.com/channels/1023109610 ... 7780005998

gracefeng
Posts: 8
Joined: Tue Oct 03, 2023 1:12 pm

Re: Project 4: MidJourney: Image to Video

Post by gracefeng » Thu Oct 26, 2023 9:31 am

Link to my animations: https://drive.google.com/drive/folders/ ... sp=sharing

Prompt 1: dreamy floaters, bubbles, flowers swaying in wind
My goal was to animate the elements already in the image in a way that would look natural. Since there were already flowers and bubbles, I added keywords like "floaters" and "swaying in the wind" to get a natural, dreamy effect. This was pretty successful. I think the AI tends to animate things as they would move in reality, anyway, so it was not too far "out-there" or imaginative. I think it recognized the elements as flowers and bubbles and assigned motion accordingly.

Prompt 2: eye blinking and looking around, pulsing landscape, psychedelic
For this prompt, I wanted to be a little more creative and unconventional. Using the surrealism of the giant eyeball, I attempted to get it to "look around" the landscape + add pulsing, psychedelic effects to heighten the break from reality. In the video, you can clearly see the texture and pulse rhythm of a human heart, so I think the "pulsing" keyword must've had a lot of weight in the prompt. I was disappointed that the eye stayed mostly stationary.

Prompt 3: Eye ball blinks once -gs 20
I thought it'd be cool to make the eyeball blink, as well. I expected this to be more challenging since there is no eyelid on the eyeball, which means the AI would have to create something out of nothing. I could not get it to blink -- I think the AI struggles to create without proper context. In this case, the necessary context would've been the eyelid.

Prompt 4: Giant eyeball looks around -gs 20
For this prompt, I returned to Prompt 2 to get the eyeball to roll around. I removed "pulsing" from the prompt to prevent that keyword from taking over. Unfortunately, this one was also unsuccessful. It resembles the "depth" feature I observed in video 8, which I generated using LeiaPix.

Prompt 5: Eye ball rolls over mountains -motion 4 -gs 20
With the necessity of context in mind, I had the idea to include the mountains in my prompt to encourage interaction between multiple elements in my image. I thought the blinking prompt didn't work simply because I didn't have an eyelid to begin with. Interestingly, the left eye blinked in this animation even though my prompt omitted anything related to blinking, leading me to think the AI maintains some sort of memory of previous prompts.

Prompt 6: Bubbles, chaos, bursts
Seeing as my previous manipulations of the giant eyeball weren't very successful, I decided to use a different input image and add psychedelic effects, which I thought would work nicely due to how busy the landscape is. I liked the result but it felt too calm for my taste.

Prompt 7: Bubbles, psychedelic, bursts, kaleidoscope -gs 20 -motion 4
To incorporate more chaos into my image, I played with the -motion parameter and set it as high as it would go. I also added "kaleidoscope" into my prompt, which I feel guided the AI to the many kaleidoscope reference images and videos on the internet. This was much more successful, in my opinion. I liked how the structures in the image devolved into pure motion while retaining the image's original colors and overall layout.

Prompt 8: LeiaPix -- no prompt

Prompt 9: Bubbles, psychedelic, bursts, kaleidoscope -gs 20 -motion 4
I liked the previous prompt on my last landscape image so much that I decided to use the same prompt on the eyeball image. I was looking for a blown-out, overexposed effect. This was also very successful. I liked that the AI recognized the eyeball as the focal point of the landscape and framed most of its effects around it. As the image devolves, every structure dissolves into washes of colors except the eyeball, which was a really nice touch, in my opinion.

autumnsmith
Posts: 10
Joined: Tue Oct 03, 2023 1:08 pm

Re: Project 4: MidJourney: Image to Video

Post by autumnsmith » Thu Oct 26, 2023 9:52 am

During this class assignment, I took the opportunity to explore creating motion within conceptual artworks that I have previously worked with. The first and second images were based on an exploration of media which I am currently studying across interdisciplinary fields; I have created several other versions that include 2D, 3D, and new media, like Blender. My goal in exploring Midjourney was to upload images of the Blender files I had already finished and see if I could input more data (i.e. more blue spheres or more colored toruses) into the image to get a finished work as I had imagined it. I used 4-6 images that were then uploaded to Midjourney. I don’t dislike the final product, but it is significantly more futuristic and dimensional than I had anticipated.

(1) Within the initial prompt and animation, the spheres did not interact how I had hoped. There continually seems to be a glitching or dream-like aspect that doesn’t compute correctly: an invented space, motion, and blobbing effect that does not seem natural to how one would imagine this scene. My hope was that the rings would spin at the very least, or that there might be some movement within the spheres. Instead, not only does the center area bulge, but there is a new sense of movement and invention from the Pika program. For example, here we almost see one of these smaller circles become planet-like, as though it transforms and expands into something that resembles Earth. This is particularly interesting to me because this is supposed to be more a rendering of a still life than anything with a scientific aspect.
1. Prompt: https://s.mj.run/hjNN6JdX5YM :: https://s.mj.run/Sb0YkVUls8I :: https://s.mj.run/TgVONKgZXBs :: https://s.mj.run/GdBjJPEat5Y :: https://s.mj.run/uUZuemSpk8c :: https://s.mj.run/123tRfZDCHM :: 3D composition with four blue spheres receding into space, matrix background stacking on top of each other, and eight warm colored torus, with cool lighting, many 3D objects g --ar 16:9 --style raw --s 250 - Image #1 @MAT 255
https://discord.com/channels/1123665496 ... 5645166602
1.png

(2) With this second image, I found it interesting how much closer this animation was to what I had initially hoped to see, given how similar the seed image is to the previous one. Rather than a bulging motion or expansion, we see the center rings moving in a horizontal and circular motion, along with the invention of a swirling, color-changing cloud. I am curious how the program makes or justifies its creative decisions about what within a scene it chooses to animate and how, especially given the difference between these two results and where the lines of the "dreaming" within the programming start to become apparent. I would also be curious to explore further the defaults or color use that Pika gravitates to, considering how the sphere in the front of the image shifts from orange to become a cloud of blue-violet.
2. Prompt: https://s.mj.run/hjNN6JdX5YM :: https://s.mj.run/Sb0YkVUls8I :: https://s.mj.run/TgVONKgZXBs :: https://s.mj.run/GdBjJPEat5Y :: https://s.mj.run/uUZuemSpk8c :: https://s.mj.run/123tRfZDCHM :: 3D composition with four blue spheres receding into space, matrix background stacking on top of each other, and eight warm colored torus, with cool lighting, many 3D objects g --ar 16:9 --style raw --s 250 - Image #4
https://discord.com/channels/1123665496 ... 4164544573
2.png

(3) In the animations labeled 3-7, I used a concept from a claymation that I had previously created. The seed or generative idea was a scene from a claymation movie where the main character, a little girl, is essentially growing to be a part of her environment: she blends into a leftover junkyard by adhering elements of her environment to herself to become one with her surroundings. With image 3, I used Midjourney to try to create a base idea or visual as a jumping-off point. Out of all of the images, I feel this was one of the ones that came conceptually closest to what I was looking for. After inputting it into Pika, I found that this was also one of the more successful animations. There are very subtle movements here, but not necessarily huge uncanny glitches. The hand does begin to blur out at the end, and the fingers become less distinguishable. The hair and slight body movement did feel realistic for how I would’ve expected this character to exist within a world. Overall, I felt this was one of the most successful of all the animations conceptually, aesthetically, and movement-wise.
3. Prompt: cartoon clay sculpture of girl becoming her environment in a rugged, metal, industrial junkyard, futurism, weird lighting, mist --ar 16:9 --style raw --s 250 - Image #2
https://discord.com/channels/1123665496 ... 6673931274
3.png

(4) The remaining animations again follow the same prompt as outlined for animation number 3. From here I was curious to explore how subtle variations in movement shifted even across very similar generative seed images. In this scene, I feel the background was rendered/animated significantly more successfully than the foreground. The movement and flicker of the candles feel like what I would expect to see. The main character in the foreground blurs out and doesn’t give us, as viewers, a chance to understand how this figure would interact within the space. The lack of animation for her companion also seems like a bit of a miss. What I am seeing so far within Pika is that it has a really hard time making sense of how to account for the entirety of a human body; facial close-ups are possibly more sensible in their animation than some of these farther-away stills or shots. The continual blurring out of areas as part of what is considered “animation” of the image is something I am interested in dissecting further.
4. Prompt: claymation cartoon clay sculpture of girl becoming her environment in a rugged, metal, industrial junkyard, futurism, weird lighting, mist --ar 16:9 --style raw --s 250 - Image #3
https://discord.com/channels/1123665496 ... 8935403602
4.png

(5) Of the scenes in this series of trials, I felt this one was a little closer to being a successful animation, despite not being 100% perfect. The background here is what I find most compelling: it is almost as if a dust storm is taking place within the scene, with movement that does not completely override the components of the image or change it entirely. This is the only scene where there seems to be more of a story, in the way all the objects around the main character shift and feel animate. The only thing I don’t feel is as successful in this scene is that the main figure’s head blurs too far out of focus and doesn’t give us a realistic understanding of how she would exist in this world.
5. Prompt: claymation cartoon clay sculpture of girl becoming her environment in a rugged, metal, industrial junkyard, futurism, weird lighting, mist --ar 16:9 --style raw --s 250 - Image #3
https://discord.com/channels/1123665496 ... 4901777439
5.png
(6) For prompt 6, I took a different direction and tried to let the Pika program imagine the same scene and animation together. I used the same prompt that I had previously put into Midjourney to create the generative seed images for 3, 4, 5, and 7. This was to compare and contrast the settings or presets of the two programs. Here, for example, I thought the eye movement was more successful, and I felt the program was probably better able to track/understand the subject since it is an extension of its own generation. I thought that was the only successful thing about it, though. There is not as much movement in this animation as in any of the others. Additionally, I felt the stylization was rather a miss for what I was looking to find: it took the term claymation very literally, which could explain the greyed-out appearance of the entire scene. After this Pika generation, one of the questions I wanted to dive deeper into was how the two AI systems might vary in the images that were initially used to inform them, and whether that is why we’re seeing stylistic or conceptual gaps here.
6. Prompt: claymation cartoon clay sculpture of girl becoming her environment in a rugged, metal, industrial junkyard, futurism, weird lighting, mist --ar 16:9
https://discord.com/channels/1123665496 ... 4651854929 (animation without image attached)

(7) This was my final attempt at animating this same scene. After seeing prompt 6 not quite match what I was hoping for, I went back to Midjourney to create and refine a generative seed image style to depict the scene. I felt that this one went the furthest off course from what I hoped. The scene begins to feel uncanny, and this feeling comes from several components. For example, in the background, the elements appear to be taken over by a sort of sandstorm and then are engulfed and hidden, as if they are disappearing. The most uncanny component is how the main figure transforms: as we watch, the cartoon figure begins turning and being animated into a very realistic-looking small child instead. One of the programming aspects I feel I am fighting against is the program's tendency or default to incorporate realistic elements and, in some cases, override the input styles. Across several animations, I found that once the animation went into effect, it shifted items and transformed them outside of the seed style. Again, this last study felt like one of the least successful of all the animations.
7. Prompt: cute claymation cartoon clay sculpture of girl becoming her environment in a rugged, metal, industrial junkyard, futurism, weird lighting, mist --ar 16:9 --stylize 50 --style raw - Image #4
https://discord.com/channels/1123665496 ... 9676869818
7.png

colindunne
Posts: 7
Joined: Tue Oct 03, 2023 1:09 pm

Re: Project 4: MidJourney: Image to Video

Post by colindunne » Thu Oct 26, 2023 11:41 am

Images used for prompts:
Prompt: person running in red jungle, cardio, workout --chaos 15 --weird 20 --ar 16:9 --style raw --s 250 - Upscaled (4x) by @MAT 255 (fast)
Image

Prompt: hand raised over sun --ar 16:9 --weird 30
Image

Prompt: tired teen studying, 3d cartoon, animation --ar 16:9
Image

Link to videos organized by prompt:
https://discord.com/channels/1023109610 ... 3914672138

For this project, I created a variety of different starting images in Midjourney to see how the video AI would handle different requests and styles. I approached it with a more anatomically complicated image of a person running through a jungle, a more cartoonish one of a tired teen studying, and a seemingly more straightforward one of a hand moving over the center of the sun. Unexpectedly, the last one was the most difficult for the AI to accomplish despite repeated attempts and alterations: the errors produced were either a complete lack of movement or total distortion of scale. The others produced great results that abided closely by their video prompt parameters. One further test I did was giving the Pika video AI a similar prompt but without the source image included. Interestingly, it produced a similar result but with much greater complexity in the motion and animation. This is an area I intend to explore much further.

bsierra
Posts: 8
Joined: Tue Oct 03, 2023 3:08 pm

Re: Project 4: MidJourney: Image to Video

Post by bsierra » Thu Oct 26, 2023 11:42 am

1. digital glitching, browser pop ups. -motion4
https://discord.com/channels/1023109610 ... 0139404370

2. bright flash of light growing in brightness, subject turning around, digital scanning technology -motion4
https://discord.com/channels/1023109610 ... 7285898330

3. digital glitching, mosaic pixel glitch, vhs glitch, angular digital artifacts, static, noise, moving water -neg organic shapes -motion 1
https://discord.com/channels/1023109610 ... 8597438504

4. couple kissing, tv screens flashing, tv channels changing rapidly, tv screen static, tv screen noise, tv screen pixelize glitch -neg room changing colors, glitch overlay, flowers falling, people walking across room
https://discord.com/channels/1023109610 ... 9502931054

PikaLabs was very interesting to work with. I feel like it isn't perfect at following text prompts; however, using it with the intention of making something weird or abstract yields better results, in my opinion. I found it hard to control, specifically in creating realistic movement, and because of that I decided to lean into the chaos I felt the software could create. Going forward in following projects, this is something I want to explore as much as possible: experimenting with the creativity of the AI's mind while taking a backseat to what it's creating, offering just a bit of guidance. The -neg parameter did pretty well in controlling some of what I could in these videos, especially in prompt #4, where I had had extra people entering the scene and flowers coming to life, which I did not want. I feel that the short video length would make these videos appropriate to turn into a video collage, with rapid-fire scenes of short clips one after another. It was really fun trying to think of ways to add some movement to these images, as their static nature gave them the potential to be animated in all sorts of ways, and I thought the glitchy animation effects worked well for what I was trying to accomplish with the pictures initially. Additionally, I found that some of the text prompts failed to work the way I intended, specifically prompts #1 and #2: #1 had no browser pop-ups, and #2 didn't move the subject correctly. I would like to go back to prompt #2 specifically, to attempt to turn the subject the way I intended.
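For reference, the prompts above were run through Pika's Discord bot as image-to-video requests of roughly this form (the flag values are just the ones I used; the exact command syntax may have changed since the beta):

/animate [attach the source image] prompt: digital glitching, mosaic pixel glitch, vhs glitch, static, noise, moving water -neg organic shapes -motion 1

where -motion (0-4) scales the amount of movement and -neg lists things the animation should avoid.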

luischavezcarrillo
Posts: 8
Joined: Thu Oct 05, 2023 2:48 pm

Re: Project 4: MidJourney: Image to Video

Post by luischavezcarrillo » Tue Oct 31, 2023 10:24 am

Series 1: https://discord.com/channels/1023109610 ... 3347006546
First video Prompt: night sky meteor shower
Second Video Prompt: image only

I wasn't sure whether the bot would be capable of adding elements or changing parts of an image, but when prompted to make a night sky and meteor shower, the sky remained the same, and the water was made to move less intensely than in the image-only version. This leads me to believe the bot is very limited in what it can add to an image from words alone.

Series 2: https://discord.com/channels/1023109610 ... 9312197632

This series uses LeiaPix. I didn't know what to expect when first opening the tool, but I was a bit disappointed that it only does depth modifications. From what I did, it appears that a custom drawn-in depth map may work best for keeping the object's position in the image static while simulating the camera moving around it. The bot cannot add much to the image other than slight camera alterations. This led me to instead focus on what Pika can do. I took the original image to Pika, and the prompt "make it shiny and blinding" either confused the bot, or it tried and failed. The trees became animated, but there was no added light or glint. Perhaps it can't add sources of light either?

Series 3:
Prompt: meteor shower in the sky
https://discord.com/channels/1023109610 ... 3612062750
Prompt: meteor shower
https://discord.com/channels/1023109610 ... 9742894155
Prompt: twinkling stars
https://discord.com/channels/1023109610 ... 7205500979

The first image produced zero movement despite being processed with a prompt. The bot definitely appears unable to modify images by adding more elements on its own. The second attempt at making a meteor shower resulted in more of a general star movement across the night sky. Once again there were no meteors, but the bot appears to know that the starry night sky is exactly that. So when running the same image again, but with "twinkling stars", the goal was to create a video where stars glimmer and dim repeatedly; however, the stars must have appeared too similar to snowflakes, as it turned them into a snowfall. It may be that its inability to create light sources, as noted in series 2, is the reason it chose to turn the stars into snow. However, it did maintain the existing glow the stars had despite changing the exact type of object they are (star to snowflake).

Series 4:
https://discord.com/channels/1023109610 ... 2385272974
Dragon series. Prompts with videos

https://discord.com/channels/1123665496 ... 9494656011
failed comic iteration.
The first video appears to have no motion. The previous failed iteration didn't have any prompt attached; it attempted to make the dragon roar but did little with the fire. The "comic book movement" modifier may be causing a severe lack of motion, as comics typically do not move. The first video's fire is spotted and animated well, likely detected through the contrast and shape the orange colors have in relation to the rest of the scene. There are slight changes in the dragon's mouth and nose, but these could be related to shadow casting rather than the bot animating the dragon.
The second video shows the roaring dragon with a very disturbingly animated jaw. The creature looks more like it's panting happily than giving a threatening roar, and the teeth move in an unsettling manner. Maybe the bot couldn't figure out the entire head shape and assumed the similarly colored teeth and tongue were the same thing (the tongue). Overall, the bot was able to detect the human and give it an idle battle-stance animation, and to detect the fire and animate it without being asked, once again likely due to how the orange contrasts with the rest of the darker scene.

Series 5:
Prompt: moving grid and moving sun
https://discord.com/channels/1023109610 ... 0662935722
Prompt: car driving, grid moving backwards towards viewer
https://discord.com/channels/1023109610 ... 5541118102
63edb1109000663.5fc9bdd652a77.png
Prompt: grid moving towards viewer, buildings moving toward viewer
https://discord.com/channels/1023109610 ... 8399860900

I decided to revisit the outrun scene, this time seeing whether specifying objects would make the bot animate them. Once again, almost no movement: it only made the colors of the objects fade in and out. I decided to test a different outrun image, and the bot was able to imply the car was driving, but it didn't move the grid; instead it moved the city and moved a colored shimmer across the grid itself, which gives the grid implied movement. Whether the bot intended to use implied movement, I do not know, but it was able to make the grid move without actually making it move. The bot was also able to completely change the shape of the car; at a glance it seems like the car is being changed into a different model each time, but it maintains the same general modified shape as it moves through different perspective views. It appears that the bot can make changes to shapes and objects within a piece, but only if it understands what the object is in a larger context. Small flakes of white over a night sky could be snow, or stars, when observed at a standstill; cars are well known, so it can modify an existing car to look like a different car. The bot needs to understand objects, which may be why it cannot animate the first image, as it's too abstract for it.

Outrun/Minimalist Test
To test the bot, I used the create command and generated https://discord.com/channels/1023109610 ... 7879845928 in order to test what the bot understands as outrun, as well as whether it could show an abstract image of it. It successfully renders the scenes with the general shapes and colors one expects, but does so realistically, not abstractly. The bot does not like rendering minimalist scenes, likely because fewer objects and shapes mean fewer points of interest to animate from.

Comic Book Test
To test the bot's comic book filter, I created https://discord.com/channels/1023109610 ... 0679455744, which, when compared to the outrun sunsets generated in the previous test, has no movement, nor does it even look like them, save for the palm trees. The static nature of a comic book may in fact be to blame for this.
