Project 3: Create a Time-Based Animation Using AI Software of Your Choice
Due date: March 26, 2026
Project Details: Create a time-based animation using AI Software of Your Choice. Details to be discussed.
George Legrady
legrady@mat.ucsb.edu
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
These video experiments revealed an important computational constraint within the diffusion-based workflow. While the project aimed to explore temporal continuity through iterative adjustments of denoise values and ControlNet influence, the generation process consistently encountered GPU memory saturation during the sampling stage.
Core Parameters Used in the Study
Model & Sampling
Checkpoint: SD1.5 / DreamShaper
Sampler: Euler
Scheduler: Normal
Sampling Steps: 18–25
CFG Scale: 6.5
Batch Size: 1
Latent & Resolution
Resolution: 512×512 (later reduced during optimization)
Denoise Range Tested: 0.35 – 0.50
ControlNet (Canny)
Preprocessor Resolution: 512
Strength Range Tested: 0.8 – 4.7 (later stabilized around 0.8)
Start Percent: 0.0
End Percent: 0.7–1.0
Video Export
Frame Rate: 8 fps
Format: H.264 (yuv420p)
CRF: 19
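As a sketch of the export stage above, this builds an ffmpeg command line matching the listed settings (8 fps, H.264/yuv420p, CRF 19). The frame filename pattern and output name are assumptions, not values taken from the workflow:

```python
# Sketch: build the ffmpeg command for the export settings above.
# "frame_%04d.png" and "animation.mp4" are illustrative assumptions.
def ffmpeg_export_cmd(pattern="frame_%04d.png", out="animation.mp4",
                      fps=8, crf=19):
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),   # input frame rate
        "-i", pattern,            # numbered still frames
        "-c:v", "libx264",        # H.264 encoder
        "-pix_fmt", "yuv420p",    # broad player compatibility
        "-crf", str(crf),         # constant-quality target (lower = better)
        out,
    ]

cmd = ffmpeg_export_cmd()
```

Run it with subprocess.run(cmd, check=True) once the rendered frames exist on disk.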
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
My plan is to continue working with ComfyUI to see what can be created that is time-based.
The plan is to create short videos in which a still photograph gains some motion: for instance, visual elements moved around, or other visual transitions such as image details collapsing, turning around, going inside out, rattling, reconfiguring, squeezing to the edge of the frame, exploding into image data, etc.
I am now realizing that ComfyUI has to produce still frames, which then have to be assembled to simulate the motion.
I am at this time in the preparatory stage, reading online. I came across this info and will dig deeper: https://comfyui.dev/blog/video-generation-in-comfyui/
George Legrady
legrady@mat.ucsb.edu
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
These experiments focused on generating short animated sequences directly within ComfyUI without relying on a pre-existing reference image. The primary objective was to test whether a single diffusion-generated frame could be extended into a temporally coherent micro-animation (2 seconds) using latent batch replication and controlled denoise variation.
Core Parameters Used in Today’s Study
Model & Sampling
Checkpoint: SD1.5 / zAlibiPixelMix_v1.0
Sampler: DPM++ 2M
Scheduler: Karras
Sampling Steps: 20
CFG Scale: 6.0
Control After Generate: Randomize
Latent & Resolution
Resolution: 512×512
Batch Replication: Repeat Latent Batch (amount = 16)
Frame Rate: 8 fps
Total Duration: ~2 seconds
PingPong: True
Denoise Range Tested: 0.35
Result: https://vimeo.com/1168660543?share=copy&fl=sv&fe=ci
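The frame count and frame rate above determine the clip length. A minimal sketch, assuming the PingPong option plays the batch forward then backward without repeating the endpoint frames (the exact behavior depends on the VHS node):

```python
# Sketch: clip duration from the batch settings above. The pingpong
# doubling (forward then backward, endpoints not repeated) is an
# assumption about the export node's behavior.
def clip_seconds(frames, fps, pingpong=False):
    n = frames * 2 - 2 if pingpong else frames
    return n / fps

base = clip_seconds(16, 8)           # 16 frames at 8 fps = 2.0 s
looped = clip_seconds(16, 8, True)   # ~3.75 s with pingpong on
```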
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
I want to use ComfyUI to exert fine control over frames/cues/seeds to generate high-quality keyframes, but due to insufficient server memory I cannot do continuous frame-by-frame rendering of the entire video locally. The solution is to use ComfyUI locally only to generate keyframes (key poses/key shots), then upload those keyframes to an image → video online service (such as Seedance or DeeVid) to do the frame filling, or use the keyframes as reference material to generate a coherent video.
Three practical AI video-generation workflows:
Frame-by-frame generation (prompt jitter/seed shift): each frame is generated independently or with slight changes; the simplest approach, but prone to flickering.
Latent interpolation/transition: interpolation is performed in latent space or noise space; its advantage is good coherence.
Temporal consistency modules (WAN / temporal nodes / optical flow): maintain temporal coherence and reduce jitter. Comfy's WAN-related tutorials cover common practices.
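The latent-interpolation workflow in the list above can be sketched in NumPy: take two endpoint latents and blend the intermediate frames linearly or spherically (for a 512×512 SD1.5 image the latent is 4×64×64). This is an illustrative sketch, not an actual ComfyUI node graph:

```python
import numpy as np

def lerp(a, b, t):
    # Straight-line blend between two latents.
    return (1.0 - t) * a + t * b

def slerp(a, b, t, eps=1e-8):
    # Spherical blend, often smoother for diffusion latents.
    a_n = a / (np.linalg.norm(a) + eps)
    b_n = b / (np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(np.dot(a_n.ravel(), b_n.ravel()), -1.0, 1.0))
    if omega < eps:
        return lerp(a, b, t)
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

rng = np.random.default_rng(0)
z0 = rng.standard_normal((4, 64, 64))   # latent for the first frame
z1 = rng.standard_normal((4, 64, 64))   # latent for the last frame
frames = [slerp(z0, z1, t) for t in np.linspace(0.0, 1.0, 16)]
```

Each interpolated latent would then be decoded by the VAE into one video frame.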
Short comparison of online services:
DeeVid (image→video) — accepts image sequences and returns short videos; useful when you want a fast cloud pipeline and don’t have GPU memory locally. Documentation: https://deevid.ai/image-to-video.
Seedance 2.0 (AI video generator) — a consumer app that advertises quick text/image→video generation; easy to use, but often less transparent about model internals and frame-rate control. Example listing: Seedance 2.0 – AI Video Generator (Play Store).
NanoBanana — community tools and web experiments (the name varies); friendly for testing image-to-video pipelines but may lack enterprise polish.
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
Spike introduced us today to deevid.ai, which was quite inspiring. He asked to test the video generation with my image of a truck in a tree in a lab:
and did a number of variations. I compiled them into a video using the Mac iMovie app, and here it is: https://vimeo.com/1168732637
George Legrady
legrady@mat.ucsb.edu
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
Motivation
Inspired by George and Zixuan's presentation in class today, I wanted to use ComfyUI to create a short, coherent inpaint-driven bonfire animation (inspired by my favorite game, Dark Souls). My goal was to keep the background and camera framing stable while animating only the flame itself, allowing for believable frame-to-frame transitions. Because my server does not have enough memory to run a full video model, I designed a lightweight pipeline that generates a small batch of keyframes (4 frames) and then synthesizes them into animated GIFs/videos. This reduces computational cost while creating visually continuous movement in the inpainted area.
I am very satisfied with the result and hope that this short workflow will help everyone!
I also found that if you want to upload a shorter video you can do it in the form of a GIF, so that everyone can see it :)
Methodology
I built an inpaint-centered ComfyUI workflow that uses the same base image for every frame, a moving/modified mask so only the fire region changes, and a controlled latent/noise strategy to enforce temporal coherence.
Main steps:
1. Load Image for the background (stable camera framing).
• Load Image for a mask (the inpaint area around the fire). The mask is used to confine edits to the bonfire.
2. Create multi-frame batch
• Use RepeatImageBatch (or construct a small image list) to produce 4 frames from a single base image; each copy becomes one animation frame to inpaint separately.
3. Encode to latent
• VAE Encode to get a LATENT representation for editing.
4. Masking & latent noise
• Set Latent Noise Mask (or equivalent) to provide per-frame controlled noise inside the masked area. This is the place to introduce subtle per-frame variation so the fire changes while background stays constant.
5. Conditioning
• CLIP Text Encode (Prompt) to produce the positive conditioning (e.g., flame, burning, glowing sparks), and optionally a negative conditioning prompt (avoid text, watermark, etc.).
6. Sampling
• KSampler (sampler: dpmpp_2m, scheduler: karras) with a controlled denoise level. Use the same seed for stable background structure; optionally vary the seed or latent noise slightly across frames for movement.
7. Decode
• VAE Decode to convert LATENT back to IMAGE.
8. Batch → Frame list
• Convert Image Batch to Image List (Impact-Pack nodes: Image Batch to Image List / Image List to Image Batch depending on your node set) and ensure shapes are correct.
9. Combine frames into animation
• VHS_VideoCombine (or Video Combine node from VideoHelperSuite) to export animated GIF/video (frame rate, filename prefix, format).
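Steps 2–4 above can be sketched numerically: replicate one encoded latent into a 4-frame batch, then add small noise only inside the fire mask so the background stays identical across frames. The shapes, mask region, and 0.15 noise scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 64, 64))   # one VAE-encoded frame
mask = np.zeros((64, 64))
mask[40:60, 20:44] = 1.0                    # fire region (illustrative)

batch = np.stack([latent] * 4)              # RepeatImageBatch, amount = 4
for i in range(1, len(batch)):              # frame 0 stays the base image
    noise = rng.standard_normal(latent.shape) * 0.15
    batch[i] = batch[i] + noise * mask      # perturb only the masked area

# Unmasked (background) pixels are bit-identical in every frame.
```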
JSON example
Code:
{
"id": "0626d39f-2742-40f5-8ce7-c6199fbf9fa8",
"revision": 0,
"last_node_id": 37,
"last_link_id": 66,
"nodes": [
{
"id": 8,
"type": "CLIPTextEncode",
"pos": [
1524.7701416015625,
933.3764038085938
],
"size": [
400,
200
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
54
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"flame,burning"
]
},
{
"id": 5,
"type": "CLIPTextEncode",
"pos": [
1629.2923583984375,
1221.095703125
],
"size": [
400,
200
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 11
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
55
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text,watermark"
]
},
{
"id": 29,
"type": "SetLatentNoiseMask",
"pos": [
1782.48486328125,
791.4531860351562
],
"size": [
264.5999755859375,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 53
},
{
"name": "mask",
"type": "MASK",
"link": 50
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
56
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "SetLatentNoiseMask"
},
"widgets_values": []
},
{
"id": 17,
"type": "CheckpointLoaderSimple",
"pos": [
1024.29736328125,
1278.9427490234375
],
"size": [
315,
98
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
6
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
10,
11
]
},
{
"name": "VAE",
"type": "VAE",
"links": [
27,
57
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"SD1.5/DreamShaper.safetensors"
]
},
{
"id": 30,
"type": "VAEEncode",
"pos": [
1577.2794189453125,
732.0299682617188
],
"size": [
210,
46
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 59
},
{
"name": "vae",
"type": "VAE",
"link": 57
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
53
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 32,
"type": "RepeatImageBatch",
"pos": [
1578.774169921875,
614.90283203125
],
"size": [
315,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 58
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
59
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "RepeatImageBatch"
},
"widgets_values": [
4
]
},
{
"id": 12,
"type": "KSampler",
"pos": [
2140.921142578125,
1304.6380615234375
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 6
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 54
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 56
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
23
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1004264693718144,
"randomize",
20,
7,
"dpmpp_2m",
"karras",
0.6000000000000001
]
},
{
"id": 2,
"type": "LoadImage",
"pos": [
1061.085693359375,
765.2529296875
],
"size": [
315,
314
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
58
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
50
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"clipspace/clipspace-mask-742848.6999999881.png [input]",
"image"
]
},
{
"id": 31,
"type": "LoadImage",
"pos": [
693.9619750976562,
765.2071533203125
],
"size": [
315,
314
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": null
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Bonfire-Image.jpg",
"image"
]
},
{
"id": 21,
"type": "VAEDecode",
"pos": [
2210.954345703125,
717.7660522460938
],
"size": [
210,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 23
},
{
"name": "vae",
"type": "VAE",
"link": 27
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
26,
61
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 35,
"type": "ImpactImageBatchToImageList",
"pos": [
2160.6611328125,
867.7892456054688
],
"size": [
315,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 61
}
],
"outputs": [
{
"name": "IMAGE",
"shape": 6,
"type": "IMAGE",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "727295b52e5f7b5429e81ca2179172865aa83b99",
"Node name for S&R": "ImpactImageBatchToImageList"
}
},
{
"id": 20,
"type": "ImageListToImageBatch",
"pos": [
2176.876953125,
978.3171997070312
],
"size": [
315,
26
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 65
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "727295b52e5f7b5429e81ca2179172865aa83b99",
"Node name for S&R": "ImageListToImageBatch"
},
"widgets_values": []
},
{
"id": 3,
"type": "VHS_VideoCombine",
"pos": [
2557.69287109375,
973.4593505859375
],
"size": [
219.3603515625,
517.8137817382812
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 66
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-videohelpersuite",
"ver": "0376e577442c236fbba6ef410a4e5ec64aed5017",
"Node name for S&R": "VHS_VideoCombine"
},
"widgets_values": {
"frame_rate": 8,
"loop_count": 0,
"filename_prefix": "AnimateDiff",
"format": "image/gif",
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "AnimateDiff_00026.gif",
"subfolder": "",
"type": "output",
"format": "image/gif",
"frame_rate": 8
},
"muted": false
}
}
},
{
"id": 24,
"type": "PreviewImage",
"pos": [
2511.392822265625,
573.453369140625
],
"size": [
210,
246
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 26
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
6,
17,
0,
12,
0,
"MODEL"
],
[
10,
17,
1,
8,
0,
"CLIP"
],
[
11,
17,
1,
5,
0,
"CLIP"
],
[
23,
12,
0,
21,
0,
"LATENT"
],
[
26,
21,
0,
24,
0,
"IMAGE"
],
[
27,
17,
2,
21,
1,
"VAE"
],
[
50,
2,
1,
29,
1,
"MASK"
],
[
53,
30,
0,
29,
0,
"LATENT"
],
[
54,
8,
0,
12,
1,
"CONDITIONING"
],
[
55,
5,
0,
12,
2,
"CONDITIONING"
],
[
56,
29,
0,
12,
3,
"LATENT"
],
[
57,
17,
2,
30,
1,
"VAE"
],
[
58,
2,
0,
32,
0,
"IMAGE"
],
[
59,
32,
0,
30,
0,
"IMAGE"
],
[
61,
21,
0,
35,
0,
"IMAGE"
],
[
65,
35,
0,
20,
0,
"IMAGE"
],
[
66,
20,
0,
3,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6303940863128553,
"offset": [
-614.0582554298979,
-433.7408572534731
]
},
"frontendVersion": "1.16.9"
},
"version": 0.4
}
Evaluation / Analysis
- Visual continuity: By keeping the background latent fixed (same seed + same conditioning) and only perturbing the masked region via latent noise, the resulting 4-frame sequence preserves background details and yields natural-looking, localized fire motion.
- Temporal coherence trade-offs: Too large a denoise or a completely random seed per frame results in flicker and inconsistent backgrounds. Keep denoise moderate (e.g., 0.4–0.7) and either keep the seed constant or make only tiny per-frame variations in the latent noise mask.
- Resource efficiency: Generating 4 high-quality keyframes is far less memory- and compute-intensive than running full video diffusion models; combining frames into a GIF yields a convincing short loop suitable for presentation and prototyping.
- Limitations: Four frames are enough to show convincing flame behavior at low frame rates (6–10 fps), but for smooth slow motion you'll need more frames or an interpolation/post-processing step (optical flow / frame interpolation) at the cost of compute.
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
Motivation
I wanted to push diffusion-based image-to-video workflows toward motion that feels both natural and unforced — motion that looks like wind moving leaves, water following a channel, a hurricane funnel rotating, ocean swell, or a writhing insect swarm. My pipeline: generate high-quality source images in Automatic1111, then use ComfyUI to turn them into short animated sequences. I draw movement regions with the Mask Editor (where to change pixels) and treat the mask as the control region for per-frame latent / inpaint changes.
Methodology
1. Image generation: Automatic1111, varied prompts to get clean, consistent source frames. Save high-res originals and optional alpha/mask guides.
2. Masking: use Mask Editor to paint the motion region (feathered edges). For moving objects I either create a single moving mask per frame (translated/rotated) or a static mask where I only change latent/noise.
3. Batch frames: in ComfyUI use RepeatImageBatch → per-frame latent/noise adjustments → KSampler (or your preferred sampler) → VAE decode → combine. I experiment with two orthogonal controls: seed strategy (random vs incremental) and denoise strength (controls how much the image can change per step).
Observations and scene-by-scene notes
Leaves (tree leaves)
Finding: A precise, descriptive prompt for leaf motion produces far more convincing flutter than a one-word prompt.
Practical: Use a leaf-only motion mask (feather 6–18 px); keep denoise low–medium (0.12–0.22); use incremental seeds per frame (base +1, +2, …) to maintain temporal coherence. Steps 18–28, CFG 6–8.
Generation prompt (By George):
photorealistic tree canopy with dense green leaves, high detail on leaf edges and veins, soft natural lighting at golden hour, shallow depth of field, filmic grain optional, ultra-realistic texture
Dynamic-processing prompt (exact):
photorealistic scene, leaves fluttering in a gentle breeze, subtle motion, slight displacement and curl at leaf edges, soft motion blur, natural lighting, high detail, filmic grain optional.
Recommended settings (short): use fixed latent (VAE encode) + set latent noise mask on leaf-only mask; denoise 0.12–0.20; steps 18–24; CFG 6.5–7.5; seed = base_seed + frame_index; mask feather 8–18 px; negative prompt: text, watermark, artifact, extra limbs, lowres.
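The incremental-seed strategy above (seed = base_seed + frame_index) can be sketched as a small schedule helper; the "random" mode shown for comparison is an assumption about how one might generate per-frame random seeds reproducibly:

```python
import random

# Incremental seeds (base + frame index) keep frames coherent; fully
# random per-frame seeds tend to flicker. "random" mode is a stand-in.
def seed_schedule(base_seed, n_frames, mode="incremental"):
    if mode == "incremental":
        return [base_seed + i for i in range(n_frames)]
    rnd = random.Random(base_seed)
    return [rnd.randrange(2**32) for _ in range(n_frames)]

seeds = seed_schedule(42, 8)   # [42, 43, 44, 45, 46, 47, 48, 49]
```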
⸻
River
Finding: Rivers flow along a clear channel; if you mark that flow path the diffusion model can produce coherent downstream motion even without explicit time-series data.
Practical: Mask the river channel and optionally shift the mask a few pixels each frame to guide flow; use low denoise (0.10–0.22) to preserve texture; optionally add small per-frame seed jitter on the water region to simulate turbulence.
Generation prompt (exact):
photorealistic flowing river through rocks, clear water with visible small ripples and reflections, sunlight glint highlights, natural color, high detail on foam and wet stones, cinematic composition
Dynamic-processing prompt (exact):
river surface with gentle flowing motion, ripples and small eddies moving downstream, preserve shoreline and rock placement, keep overall composition unchanged, realistic water reflections and subtle displacement, avoid sudden topology changes
Recommended settings: micro-motion denoise 0.12–0.22; stronger ripple 0.20–0.32; steps 20–26; CFG ~7.0; use repeated latent + per-frame seed increments or offset latent noise; optional slight per-frame mask shift; negative prompt: blurry, text, watermark, unnatural colors, extra limbs.
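The optional per-frame mask shift above can be sketched with NumPy: translate the river mask a few pixels downstream each frame to guide flow. np.roll is a stand-in; a production mask would be clipped and re-feathered after shifting:

```python
import numpy as np

# Shift the river mask "downstream" by dx pixels per frame.
def shifted_masks(mask, n_frames, dx=3):
    return [np.roll(mask, shift=i * dx, axis=1) for i in range(n_frames)]

mask = np.zeros((64, 64))
mask[30:40, 10:20] = 1.0          # illustrative river-channel region
masks = shifted_masks(mask, 4)    # frame 0 unshifted, then +3 px per frame
```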
⸻
Ocean / Sea
Finding: Open ocean lacks a single rigid trajectory; higher denoise usually produces more natural, varied swell at the cost of larger frame-to-frame changes.
Practical: Use a larger water mask (feather 12–25 px); experiment with denoise 0.18–0.30 first, then test up to 0.4–0.7 for more variety; if frames become jumpy, smooth with optical flow or frame blending (ffmpeg tblend / minterpolate).
Generation prompt (exact):
photorealistic ocean surface, gentle rolling waves, realistic foam and specular highlights, horizon visible, natural sky reflections, high-detail water texture, cinematic lighting
Dynamic-processing prompt (exact):
ocean surface with continuous rolling waves and moving specular highlights, maintain horizon and background sky unchanged, preserve large-scale wave direction, add subtle temporal coherence and motion blur, avoid popping or repeated identical patterns
Recommended settings: coherent waves denoise 0.18–0.30 (or test up to 0.4–0.7 for varied motion); steps 20–30; CFG 6.8–7.2; small seed increments (+1 per frame) or use optical-flow/ControlNet guidance; mask feather 12–25 px; negative prompt: logo, text, unnatural colors, lowres, warped horizon.
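The frame-blending fallback mentioned above (in the spirit of ffmpeg's tblend filter) can be sketched by averaging neighboring frames; real smoothing would more likely use optical-flow interpolation (e.g., ffmpeg minterpolate):

```python
import numpy as np

# Insert an averaged in-between frame between each neighboring pair;
# trades frame-to-frame jumpiness for a little motion blur.
def blend_frames(frames):
    out = [frames[0]]
    for a, b in zip(frames, frames[1:]):
        out.append(0.5 * (a + b))   # blended in-between frame
        out.append(b)
    return out

frames = [np.full((4, 4), float(i)) for i in range(3)]
smoothed = blend_frames(frames)     # 3 frames become 5
```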
⸻
Hurricane (rotating funnel)
Finding: Funnels need a balance between preserving the core funnel geometry and letting debris/particles evolve; mid-range denoise (0.3–0.5) usually works well.
Practical: Prepare a funnel-shaped mask and rotate/translate it slightly per frame; consider a decrement denoise strategy (gradually lower denoise) to stabilize the funnel core while keeping peripheral debris dynamic; use incremental seeds for the funnel and random seeds for dust layers if desired.
Generation prompt (exact):
photorealistic tornado / twister over a flat plain, dramatic lighting, rotating funnel cloud with visible dust and debris at the base, realistic cloud structure and motion, high detail on edges and volumetric lighting, cinematic composition
Dynamic-processing prompt (exact):
tornado funnel rotating and translating slightly, swirling dust and debris at base, preserve horizon and background structures, maintain consistent funnel geometry and lighting, emphasize coherent rotation and smooth temporal evolution, avoid popping artifacts
Recommended settings (short): denoise 0.3–0.5; consider decrement strategy across frames; funnel seed incremental, dust layers optional random; apply small per-frame transforms on the funnel mask.
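The decrement denoise strategy above can be sketched as a linear schedule across the suggested 0.3–0.5 range; the exact endpoints and linearity are assumptions:

```python
# Linear denoise decrement across frames: high early so the funnel can
# form, lower later to lock the core geometry. Endpoints assumed from
# the 0.3-0.5 range suggested above.
def denoise_schedule(n_frames, start=0.5, end=0.3):
    step = (start - end) / max(n_frames - 1, 1)
    return [start - i * step for i in range(n_frames)]

sched = denoise_schedule(5)   # 0.5, 0.45, 0.4, 0.35, 0.3
```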
⸻
Swarm (many small moving elements)
Finding: Denoise and seed strategy together control swarm behavior; increasing denoise increases motion intensity, and moving from random → incremental seed increases temporal coherence.
Practical: For coherent flow use incremental seeds; for stronger, more dramatic reconfiguration raise denoise (up to ~0.6); guide clusters with a larger feather and small per-frame mask translations.
Generation prompt (exact):
dense swarm of small insects writhing and crawling over a surface, photorealistic macro detail, high depth and layered motion, natural lighting, realistic insect shapes and shadows, avoid large singular creatures
Dynamic-processing prompt (exact):
swarm of small insects moving collectively, subtle local rearrangements and flow-like motion, preserve background surface texture and lighting, avoid single large monster-like artifacts, keep insect scale consistent across frames
Recommended settings: denoise 0.25–0.38 (or up to 0.6 for dramatic motion); steps 22–30; CFG 6.5–7.5; repeated latent + per-frame seed increments or vary mask intensity; small per-frame mask translations and higher feather to create flow; negative prompt: giant insect, human limbs, text, watermark, extra limbs.
Evaluation / analysis
Animating with diffusion is a balancing act between preserving identity (so an object reads as the same object across frames) and letting randomness (so motion feels alive). With careful masks, seed planning, and denoise tuning you can steer stochasticity into believable natural motion — from single leaves to tempests and swarms.
I wanted to push diffusion-based image-to-video workflows toward motion that feels both natural and unforced — motion that looks like wind moving leaves, water following a channel, a hurricane funnel rotating, ocean swell, or a writhing insect swarm. My pipeline: generate high-quality source images in Automatic1111, then use ComfyUI to turn them into short animated sequences. I draw movement regions with the Mask Editor (where to change pixels) and treat the mask as the control region for per-frame latent / inpaint changes.
Methodology
1. Image generation: Automatic1111, varied prompts to get clean, consistent source frames. Save high-res originals and optional alpha/mask guides.
2. Masking: use Mask Editor to paint the motion region (feathered edges). For moving objects I either create a single moving mask per frame (translated/rotated) or a static mask where I only change latent/noise.
3. Batch frames: in ComfyUI use RepeatImageBatch → per-frame latent/noise adjustments → KSampler (or your preferred sampler) → VAE decode → combine. I experiment with two orthogonal controls: seed strategy (random vs incremental) and denoise strength (controls how much the image can change per step). Observations and scene-by-scene notes
Leaves (tree leaves)
Finding: A precise, descriptive prompt for leaf motion produces far more convincing flutter than a one-word prompt.
Practical: Use a leaf-only motion mask (feather 6–18 px); keep denoise low–medium (0.12–0.22); use incremental seeds per frame (base +1, +2, …) to maintain temporal coherence. Steps 18–28, CFG 6–8.
Generation prompt (By George):
photorealistic tree canopy with dense green leaves, high detail on leaf edges and veins, soft natural lighting at golden hour, shallow depth of field, filmic grain optional, ultra-realistic texture
Dynamic-processing prompt (exact):
photorealistic scene, leaves fluttering in a gentle breeze, subtle motion, slight displacement and curl at leaf edges, soft motion blur, natural lighting, high detail, filmic grain optional.
Recommended settings (short): use fixed latent (VAE encode) + set latent noise mask on leaf-only mask; denoise 0.12–0.20; steps 18–24; CFG 6.5–7.5; seed = base_seed + frame_index; mask feather 8–18 px; negative prompt: text, watermark, artifact, extra limbs, lowres.
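The seed = base_seed + frame_index strategy can be made concrete with a small planning sketch. Here frame_params is a hypothetical helper for laying out a batch ahead of time, not an actual ComfyUI node:

```python
import random

def frame_params(base_seed, n_frames, seed_mode="incremental", denoise=0.18):
    """Plan per-frame (seed, denoise) pairs for a batch.

    seed_mode 'incremental' -> base_seed + i, for temporal coherence;
    seed_mode 'random'      -> a fresh seed per frame, for wilder motion.
    """
    rng = random.Random(base_seed)  # keeps the "random" mode reproducible
    params = []
    for i in range(n_frames):
        if seed_mode == "incremental":
            seed = base_seed + i
        else:
            seed = rng.randrange(2**31)
        params.append((seed, denoise))
    return params
```

With base seed 1234 and four frames, incremental mode yields seeds 1234, 1235, 1236, 1237 at a constant denoise, which is exactly the per-frame coherence the leaf tests relied on.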
⸻
River
Finding: Rivers flow along a clear channel; if you mark that flow path the diffusion model can produce coherent downstream motion even without explicit time-series data.
Practical: Mask the river channel and optionally shift the mask a few pixels each frame to guide flow; use low denoise (0.10–0.22) to preserve texture; small per-frame seed jitter on the water region can simulate turbulence.
Generation prompt (exact):
photorealistic flowing river through rocks, clear water with visible small ripples and reflections, sunlight glint highlights, natural color, high detail on foam and wet stones, cinematic composition
Dynamic-processing prompt (exact):
river surface with gentle flowing motion, ripples and small eddies moving downstream, preserve shoreline and rock placement, keep overall composition unchanged, realistic water reflections and subtle displacement, avoid sudden topology changes
Recommended settings: micro-motion denoise 0.12–0.22; stronger ripple 0.20–0.32; steps 20–26; CFG ~7.0; use repeated latent + per-frame seed increments or offset latent noise; optional slight per-frame mask shift; negative prompt: blurry, text, watermark, unnatural colors, extra limbs.
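The "shift the mask a few pixels each frame" idea can be sketched in plain Python, treating masks as 2D lists of floats. shift_mask and flow_masks are hypothetical helpers for illustration, not ComfyUI nodes:

```python
def shift_mask(mask, dx):
    """Translate a 2D mask dx pixels along the flow direction,
    padding with 0 and keeping the original width."""
    return [([0.0] * dx + row)[:len(row)] for row in mask]

def flow_masks(mask, n_frames, px_per_frame=2):
    """One progressively shifted copy of the mask per frame."""
    return [shift_mask(mask, i * px_per_frame) for i in range(n_frames)]
```

Feeding one shifted mask per frame into the latent-noise-mask stage nudges the changed region downstream, which is what makes the motion read as flow rather than shimmer.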
⸻
Ocean / Sea
Finding: Open ocean lacks a single rigid trajectory; higher denoise usually produces more natural, varied swell at the cost of larger frame-to-frame changes.
Practical: Use a larger water mask (feather 12–25 px); experiment with denoise 0.18–0.30 first, then test up to 0.4–0.7 for more variety; if frames become jumpy, smooth with optical-flow or frame blending (ffmpeg tblend / minterpolate).
Generation prompt (exact):
photorealistic ocean surface, gentle rolling waves, realistic foam and specular highlights, horizon visible, natural sky reflections, high-detail water texture, cinematic lighting
Dynamic-processing prompt (exact):
ocean surface with continuous rolling waves and moving specular highlights, maintain horizon and background sky unchanged, preserve large-scale wave direction, add subtle temporal coherence and motion blur, avoid popping or repeated identical patterns
Recommended settings: coherent waves denoise 0.18–0.30 (or test up to 0.4–0.7 for varied motion); steps 20–30; CFG 6.8–7.2; small seed increments (+1 per frame) or use optical-flow/ControlNet guidance; mask feather 12–25 px; negative prompt: logo, text, unnatural colors, lowres, warped horizon.
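When frames come out jumpy, a crude stand-in for ffmpeg's tblend averaging can be written directly over decoded frames. This is only a sketch of the blending idea (grayscale frames as 2D lists here), not the ffmpeg filter itself:

```python
def tblend_average(frames):
    """Blend each frame with its predecessor, roughly what
    ffmpeg's tblend=all_mode=average does; the first frame is kept."""
    out = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        out.append([[(a + b) / 2 for a, b in zip(prow, crow)]
                    for prow, crow in zip(prev, cur)])
    return out
```

Averaging adjacent frames halves the amplitude of single-frame popping at the cost of a little motion blur, which suits water better than most subjects.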
⸻
Hurricane (rotating funnel)
Finding: Funnels need a balance between preserving the core funnel geometry and letting debris/particles evolve; mid-range denoise (0.3–0.5) usually works well.
Practical: Prepare a funnel-shaped mask and rotate/translate it slightly per frame; consider a decrement denoise strategy (gradually lower denoise) to stabilize the funnel core while keeping peripheral debris dynamic; use incremental seeds for the funnel and random seeds for dust layers if desired.
Generation prompt (exact):
photorealistic tornado / twister over a flat plain, dramatic lighting, rotating funnel cloud with visible dust and debris at the base, realistic cloud structure and motion, high detail on edges and volumetric lighting, cinematic composition
Dynamic-processing prompt (exact):
tornado funnel rotating and translating slightly, swirling dust and debris at base, preserve horizon and background structures, maintain consistent funnel geometry and lighting, emphasize coherent rotation and smooth temporal evolution, avoid popping artifacts
Recommended settings (short): denoise 0.3–0.5; consider decrement strategy across frames; funnel seed incremental, dust layers optional random; apply small per-frame transforms on the funnel mask.
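The decrement strategy mentioned above might look like a simple linear ramp. decrement_denoise is an assumed planning helper, not part of ComfyUI:

```python
def decrement_denoise(start, end, n_frames):
    """Linearly lower denoise from start to end across the batch:
    early frames stay dynamic, later frames stabilize the funnel core."""
    if n_frames == 1:
        return [start]
    step = (start - end) / (n_frames - 1)
    return [round(start - i * step, 4) for i in range(n_frames)]
```

For the 0.3–0.5 range used here, a five-frame batch would ramp 0.5 → 0.3 in steps of 0.05.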
⸻
Swarm (many small moving elements)
Finding: Denoise and seed strategy together control swarm behavior; increasing denoise increases motion intensity, and moving from random → incremental seed increases temporal coherence.
Practical: For coherent flow use incremental seeds; for stronger, more dramatic reconfiguration raise denoise (up to ~0.6); guide clusters with larger feather and small per-frame mask translations.
Generation prompt (exact):
dense swarm of small insects writhing and crawling over a surface, photorealistic macro detail, high depth and layered motion, natural lighting, realistic insect shapes and shadows, avoid large singular creatures
Dynamic-processing prompt (exact):
swarm of small insects moving collectively, subtle local rearrangements and flow-like motion, preserve background surface texture and lighting, avoid single large monster-like artifacts, keep insect scale consistent across frames
Recommended settings: denoise 0.25–0.38 (or up to 0.6 for dramatic motion); steps 22–30; CFG 6.5–7.5; repeated latent + per-frame seed increments or vary mask intensity; small per-frame mask translations and higher feather to create flow; negative prompt: giant insect, human limbs, text, watermark, extra limbs.
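One way to think about the finding that seed strategy and denoise act together: treat coherence as a single knob that picks both at once. The mapping below is my own assumption, sketched purely for illustration:

```python
def swarm_controls(coherence):
    """Map a 0..1 coherence knob to (seed_mode, denoise):
    high coherence -> incremental seeds, low denoise (calm flow);
    low coherence  -> random seeds, denoise up to ~0.6 (dramatic)."""
    seed_mode = "incremental" if coherence >= 0.5 else "random"
    denoise = round(0.6 - 0.35 * coherence, 3)  # 0.6 down to 0.25
    return seed_mode, denoise
```

The endpoints match the tested range: full coherence lands at incremental seeds with denoise 0.25, zero coherence at random seeds with denoise 0.6.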
Evaluation / analysis
Animating with diffusion is a balancing act between preserving identity (so an object reads as the same object across frames) and allowing enough randomness (so motion feels alive). With careful masks, seed planning, and denoise tuning you can steer stochasticity into believable natural motion, from single leaves to tempests and swarms.
Re: Project 3: I have basic video working in comfyui
Learning from everyone's presentations, I do have a video with the text prompt "100 lines". Attaching the workflow here.
I found an interpolation node called "RIFE VFI", and it's currently working. It works by adding in-between frames; I now have to test the various settings.
George Legrady
legrady@mat.ucsb.edu
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
These experiments focused on generating stylized Chinese folklore imagery and exploring how static AI-generated images can be extended into short time-based sequences. The initial stage involved using ComfyUI to generate illustrative frames inspired by traditional Chinese painting aesthetics. The primary objective was to produce visually coherent images that could later function as keyframes within a temporal narrative.
After generating base images in ComfyUI, a secondary experiment was conducted using a new AI video generation system, Hailuo. This platform was tested for its ability to interpolate between two images and create short animated transitions while preserving the stylistic qualities of Chinese painting. The experiment aimed to evaluate whether this system could effectively support time-based storytelling using AI-generated artwork.
Core Workflow Used in Today’s Study
Image Generation (ComfyUI)
Model & Sampling
Checkpoint: SD1.5 / zAlibiPixelMix_v1.0
Sampler: DPM++ 2M
Scheduler: Karras
Sampling Steps: 20–30
CFG Scale: 6–7
Control After Generate: Randomize
Resolution & Structure
Resolution: 512 × 512 – 768 × 1024
Generation Method: Diffusion-based image synthesis
Visual Style: Chinese folklore illustration / ink-inspired painting
The images generated in this stage emphasized painterly textures, stylized trees, and mythological atmospheres inspired by traditional East Asian visual culture.
Video Generation Experiment (Hailuo)
Following image generation, the experiment moved into a time-based phase using the Hailuo AI video generation system.
Platform: Hailuo 2.0
Output Resolution: 768p
Duration: ~6 seconds
Two keyframes generated from the ComfyUI process were used as the start and end frames. The system was prompted to generate a smooth visual transition between these frames while maintaining the visual language of traditional Chinese illustration.
The goal of this stage was to explore how AI systems can transform static generative artwork into short animated narratives.
Visual Focus of the Experiment
The generated sequence emphasized:
stylized ancient trees
traditional Chinese costume (hanfu)
mythological landscape atmosphere
gradual environmental transformation
These elements were used to simulate the passage of time within a painterly landscape.
Preliminary Observation
Initial tests suggest that combining diffusion-generated images (ComfyUI) with AI interpolation video systems (Hailuo) provides a promising workflow for producing short time-based visual narratives rooted in traditional artistic styles.
This hybrid pipeline allows static generative artwork to evolve into animated sequences while preserving stylistic coherence.
https://vimeo.com/1170833444?fl=ip&fe=ec