Project 3: Create a Time-Based Animation Using AI Software of Your Choice
Due date: March 26, 2026
Project Details: Create a time-based animation using AI Software of Your Choice. Details to be discussed.
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
These video experiments revealed an important computational constraint within the diffusion-based workflow. While the project aimed to explore temporal continuity through iterative adjustments of denoise values and ControlNet influence, the generation process consistently encountered GPU memory saturation during the sampling stage.
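For intuition about where the memory goes, here is a back-of-the-envelope sketch (assuming SD1.5's 8× VAE downsampling, 4 latent channels, and fp16 storage; `latent_vram_mb` is a hypothetical helper, not a profiler). It shows that the latent tensor itself is tiny; the saturation comes from UNet activations and attention buffers, which grow with batch size and resolution:

```python
def latent_vram_mb(width, height, batch, channels=4, bytes_per_elem=2):
    """Rough size of the latent tensor alone: SD1.5's VAE downsamples
    by 8 in each spatial dimension, the latent has 4 channels, and
    fp16 uses 2 bytes per element. Real VRAM usage is far higher once
    UNet activations and attention buffers are counted."""
    return channels * (height // 8) * (width // 8) * batch * bytes_per_elem / 1e6

# 512x512 at a 16-frame batch: the latent itself is only ~0.5 MB
print(latent_vram_mb(512, 512, 16))
```

Since the latent for a 16-frame 512×512 batch is only about half a megabyte, lowering resolution or batch size relieves memory pressure mainly by shrinking the intermediate activations, not the latent itself.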
Core Parameters Used in the Study
Model & Sampling
Checkpoint: SD1.5 / DreamShaper
Sampler: Euler
Scheduler: Normal
Sampling Steps: 18–25
CFG Scale: 6.5
Batch Size: 1
Latent & Resolution
Resolution: 512×512 (later reduced during optimization)
Denoise Range Tested: 0.35 – 0.50
ControlNet (Canny)
Preprocessor Resolution: 512
Strength Range Tested: 0.8 – 4.7 (later stabilized around 0.8)
Start Percent: 0.0
End Percent: 0.7–1.0
Video Export
Frame Rate: 8 fps
Format: H.264 (yuv420p)
CRF: 19
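For reference, the export settings above correspond roughly to the following ffmpeg invocation; the frame pattern and output path are illustrative, and the command is only built here, not executed:

```python
import subprocess

def export_cmd(pattern, out, fps=8, crf=19):
    """Build an ffmpeg command matching the export settings above:
    H.264 with a yuv420p pixel format (for broad player compatibility)
    at the given frame rate and CRF. Paths are illustrative."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", pattern,            # e.g. frames/frame_%04d.png
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",
        "-crf", str(crf),
        out,
    ]

cmd = export_cmd("frames/frame_%04d.png", "animation.mp4")
# subprocess.run(cmd, check=True)  # run only if ffmpeg is installed
```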
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
My plan is to continue working with ComfyUI to see what time-based work can be created.
The plan is to create short videos in which a still photograph takes on some motion: for instance, visual elements moving around, or other visual transitions such as image details collapsing, turning inside out, rattling, reconfiguring, squeezing to the edge of the frame, or exploding.
I am now realizing that ComfyUI has to produce still frames, which then have to be assembled to simulate the motion.
I am currently in the preparatory stage, reading online. I came across this resource and will dig deeper: https://comfyui.dev/blog/video-generation-in-comfyui/
George Legrady
legrady@mat.ucsb.edu
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
These experiments focused on generating short animated sequences directly within ComfyUI without relying on a pre-existing reference image. The primary objective was to test whether a single diffusion-generated frame could be extended into a temporally coherent micro-animation (2 seconds) using latent batch replication and controlled denoise variation.
Core Parameters Used in Today’s Study
Model & Sampling
Checkpoint: SD1.5 / zAlibiPixelMix_v1.0
Sampler: DPM++ 2M
Scheduler: Karras
Sampling Steps: 20
CFG Scale: 6.0
Control After Generate: Randomize
Latent & Resolution
Resolution: 512×512
Batch Replication: Repeat Latent Batch (amount = 16)
Frame Rate: 8 fps
Total Duration: ~2 seconds
PingPong: True
Denoise Range Tested: 0.35
Result: https://vimeo.com/1168660543?share=copy&fl=sv&fe=ci
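The arithmetic behind the ~2-second duration, and the effect of PingPong, can be sketched as follows; exact endpoint handling varies by node, so this assumes the ping-pong pass does not repeat the endpoints:

```python
def pingpong_order(n):
    """Frame indices for a ping-pong loop: forward, then back without
    repeating the two endpoints (one common convention)."""
    return list(range(n)) + list(range(n - 2, 0, -1))

def duration_s(n_frames, fps, pingpong=False):
    """Playback length of one loop in seconds."""
    count = len(pingpong_order(n_frames)) if pingpong else n_frames
    return count / fps

print(duration_s(16, 8))                 # 2.0 s, matching the study
print(duration_s(16, 8, pingpong=True))  # 3.75 s when PingPong is on
```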
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
I want to use ComfyUI to exert fine control over frames, cues, and seeds to generate high-quality keyframes, but due to insufficient server memory I cannot render the entire video frame by frame locally. The solution is to use ComfyUI only to generate keyframes (key poses/key shots) locally, and then upload those keyframes to an image → video online service (such as Seedance or DeeVid) to fill in the in-between frames, or to use the keyframes as reference material for generating a coherent video.
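One small piece of this hybrid pipeline can be sketched directly: choosing which frame indices of the target clip to render locally as keyframes. `keyframe_indices` is a hypothetical helper for evenly spaced picks, endpoints included:

```python
def keyframe_indices(total_frames, n_keys):
    """Pick n_keys indices spread evenly across a target clip,
    endpoints included, so the cloud service can interpolate
    between locally rendered keyframes."""
    if n_keys < 2:
        return [0]
    step = (total_frames - 1) / (n_keys - 1)
    return [round(i * step) for i in range(n_keys)]

print(keyframe_indices(48, 5))  # [0, 12, 24, 35, 47]
```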
Three practical AI video generation workflows:
Frame-by-frame generation (prompt jitter/seed shift): each frame is independent or slightly varied; the simplest approach, but prone to flickering.
Latent interpolation/transition: interpolation is performed in latent space or noise space, which gives good coherence.
Temporal consistency modules (WAN / temporal nodes / optical flow): maintain temporal coherence and reduce jitter. ComfyUI's WAN-related tutorials cover common practices.
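The latent-interpolation approach above is often implemented with spherical interpolation (slerp) rather than a straight linear blend, since slerp preserves the noise magnitude the sampler expects. A minimal sketch on plain Python lists (real latents are tensors):

```python
import math

def slerp(a, b, t):
    """Spherical interpolation between two latent vectors, given here
    as flat lists. Preserves vector magnitude better than a linear
    blend, which matters for diffusion noise."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    omega = math.acos(max(-1.0, min(1.0, dot / (na * nb))))
    if omega < 1e-6:  # nearly parallel: fall back to a linear blend
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    so = math.sin(omega)
    return [
        math.sin((1 - t) * omega) / so * x + math.sin(t * omega) / so * y
        for x, y in zip(a, b)
    ]

# t sweeps 0 -> 1 across the in-between frames
mid = slerp([1.0, 0.0], [0.0, 1.0], 0.5)  # approx [0.707, 0.707]
```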
Short comparison of online services:
DeeVid (image→video) — accepts image sequences and returns short videos; useful when you want a fast cloud pipeline and don’t have GPU memory locally. Documentation: https://deevid.ai/image-to-video.
Seedance 2.0 (AI video generator) — a consumer app that advertises quick text/image→video generation; easy to use, but often less transparent about model internals and frame-rate control. Example listing: Seedance 2.0 – AI Video Generator (Play Store).
NanoBanana — community tools and web experiments (the name varies); friendly for testing image→video pipelines but may lack enterprise polish.
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
Spike introduced us today to deevid.ai, which was quite inspiring. He asked to test the video with my image of a truck in a tree in a lab:
and did a number of variations. I compiled them into a video using iMovie on the Mac, and here it is: https://vimeo.com/1168732637
George Legrady
legrady@mat.ucsb.edu
Re: Project 3: Create a Time-Based Animation Using AI Software of Your Choice
Motivation
Inspired by George and Zixuan's presentation in class today, I wanted to use ComfyUI to create a short, coherent inpaint-driven bonfire animation (inspired by my favorite game, Dark Souls). My goal was to keep the background and camera framing stable while animating only the flame itself, allowing for believable frame-to-frame transitions. Because my server does not have enough memory to run a full video model, I designed a lightweight pipeline that generates a small batch of keyframes (4 frames) and then assembles them into an animated GIF/video. This reduces computational cost while creating visually continuous movement in the inpainted area.
I am very satisfied with the result and hope that this short workflow will help everyone!
I also found that if you want to upload a shorter video, you can do it in the form of a GIF, so that everyone can see it :)
Methodology
I built an inpaint-centered ComfyUI workflow that uses the same base image for every frame, a moving/modified mask so only the fire region changes, and a controlled latent/noise strategy to enforce temporal coherence.
Main steps:
1. Load Image for the background (stable camera framing).
• Load Image for a mask (the inpaint area around the fire). The mask is used to confine edits to the bonfire.
2. Create multi-frame batch
• Use RepeatImageBatch (or construct a small image list) to produce 4 frames from a single base image; each copy becomes one animation frame to inpaint separately.
3. Encode to latent
• VAE Encode to get a LATENT representation for editing.
4. Masking & latent noise
• Set Latent Noise Mask (or equivalent) to provide per-frame controlled noise inside the masked area. This is the place to introduce subtle per-frame variation so the fire changes while background stays constant.
5. Conditioning
• CLIP Text Encode (Prompt) to produce the positive conditioning (e.g., flame, burning, glowing sparks), and optionally a negative conditioning prompt (avoid text, watermark, etc.).
6. Sampling
• KSampler (sampler: dpmpp_2m, scheduler: karras) with a controlled denoise level. Use the same seed for stable background structure; optionally vary the seed or latent noise slightly across frames for movement.
7. Decode
• VAE Decode to convert LATENT back to IMAGE.
8. Batch → Frame list
• Convert Image Batch to Image List (Impact-Pack nodes: Image Batch to Image List / Image List to Image Batch depending on your node set) and ensure shapes are correct.
9. Combine frames into animation
• VHS_VideoCombine (or Video Combine node from VideoHelperSuite) to export animated GIF/video (frame rate, filename prefix, format).
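Steps 4 and 6 above can be sketched outside ComfyUI as seeded base noise plus a small per-frame jitter; `frame_noise` is a hypothetical stand-in for the node behavior, with latents flattened to plain lists:

```python
import random

def frame_noise(base_seed, frame_idx, n, jitter=0.05):
    """One flat noise vector per frame: the same seeded base noise for
    every frame (stable background) plus a small seeded per-frame
    offset (localized motion inside the mask). Shapes are simplified
    to flat lists; a real SD1.5 latent is 4 x H/8 x W/8."""
    rng = random.Random(base_seed)
    base = [rng.gauss(0, 1) for _ in range(n)]
    jrng = random.Random(base_seed + frame_idx + 1)
    return [b + jitter * jrng.gauss(0, 1) for b in base]

# four frames sharing one base, each with its own tiny perturbation
frames = [frame_noise(42, i, 8) for i in range(4)]
```

Keeping the jitter small relative to the base noise is the list-level analogue of the "moderate denoise, near-constant seed" advice in the evaluation below.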
json examples
Code:
{
"id": "0626d39f-2742-40f5-8ce7-c6199fbf9fa8",
"revision": 0,
"last_node_id": 37,
"last_link_id": 66,
"nodes": [
{
"id": 8,
"type": "CLIPTextEncode",
"pos": [
1524.7701416015625,
933.3764038085938
],
"size": [
400,
200
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
54
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"flame,burning"
]
},
{
"id": 5,
"type": "CLIPTextEncode",
"pos": [
1629.2923583984375,
1221.095703125
],
"size": [
400,
200
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 11
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
55
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text,watermark"
]
},
{
"id": 29,
"type": "SetLatentNoiseMask",
"pos": [
1782.48486328125,
791.4531860351562
],
"size": [
264.5999755859375,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 53
},
{
"name": "mask",
"type": "MASK",
"link": 50
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
56
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "SetLatentNoiseMask"
},
"widgets_values": []
},
{
"id": 17,
"type": "CheckpointLoaderSimple",
"pos": [
1024.29736328125,
1278.9427490234375
],
"size": [
315,
98
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
6
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
10,
11
]
},
{
"name": "VAE",
"type": "VAE",
"links": [
27,
57
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"SD1.5/DreamShaper.safetensors"
]
},
{
"id": 30,
"type": "VAEEncode",
"pos": [
1577.2794189453125,
732.0299682617188
],
"size": [
210,
46
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 59
},
{
"name": "vae",
"type": "VAE",
"link": 57
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
53
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 32,
"type": "RepeatImageBatch",
"pos": [
1578.774169921875,
614.90283203125
],
"size": [
315,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 58
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
59
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "RepeatImageBatch"
},
"widgets_values": [
4
]
},
{
"id": 12,
"type": "KSampler",
"pos": [
2140.921142578125,
1304.6380615234375
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 6
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 54
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 56
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
23
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1004264693718144,
"randomize",
20,
7,
"dpmpp_2m",
"karras",
0.6000000000000001
]
},
{
"id": 2,
"type": "LoadImage",
"pos": [
1061.085693359375,
765.2529296875
],
"size": [
315,
314
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
58
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
50
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"clipspace/clipspace-mask-742848.6999999881.png [input]",
"image"
]
},
{
"id": 31,
"type": "LoadImage",
"pos": [
693.9619750976562,
765.2071533203125
],
"size": [
315,
314
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": null
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Bonfire-Image.jpg",
"image"
]
},
{
"id": 21,
"type": "VAEDecode",
"pos": [
2210.954345703125,
717.7660522460938
],
"size": [
210,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 23
},
{
"name": "vae",
"type": "VAE",
"link": 27
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
26,
61
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 35,
"type": "ImpactImageBatchToImageList",
"pos": [
2160.6611328125,
867.7892456054688
],
"size": [
315,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 61
}
],
"outputs": [
{
"name": "IMAGE",
"shape": 6,
"type": "IMAGE",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "727295b52e5f7b5429e81ca2179172865aa83b99",
"Node name for S&R": "ImpactImageBatchToImageList"
}
},
{
"id": 20,
"type": "ImageListToImageBatch",
"pos": [
2176.876953125,
978.3171997070312
],
"size": [
315,
26
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 65
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "727295b52e5f7b5429e81ca2179172865aa83b99",
"Node name for S&R": "ImageListToImageBatch"
},
"widgets_values": []
},
{
"id": 3,
"type": "VHS_VideoCombine",
"pos": [
2557.69287109375,
973.4593505859375
],
"size": [
219.3603515625,
517.8137817382812
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 66
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-videohelpersuite",
"ver": "0376e577442c236fbba6ef410a4e5ec64aed5017",
"Node name for S&R": "VHS_VideoCombine"
},
"widgets_values": {
"frame_rate": 8,
"loop_count": 0,
"filename_prefix": "AnimateDiff",
"format": "image/gif",
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "AnimateDiff_00026.gif",
"subfolder": "",
"type": "output",
"format": "image/gif",
"frame_rate": 8
},
"muted": false
}
}
},
{
"id": 24,
"type": "PreviewImage",
"pos": [
2511.392822265625,
573.453369140625
],
"size": [
210,
246
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 26
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
6,
17,
0,
12,
0,
"MODEL"
],
[
10,
17,
1,
8,
0,
"CLIP"
],
[
11,
17,
1,
5,
0,
"CLIP"
],
[
23,
12,
0,
21,
0,
"LATENT"
],
[
26,
21,
0,
24,
0,
"IMAGE"
],
[
27,
17,
2,
21,
1,
"VAE"
],
[
50,
2,
1,
29,
1,
"MASK"
],
[
53,
30,
0,
29,
0,
"LATENT"
],
[
54,
8,
0,
12,
1,
"CONDITIONING"
],
[
55,
5,
0,
12,
2,
"CONDITIONING"
],
[
56,
29,
0,
12,
3,
"LATENT"
],
[
57,
17,
2,
30,
1,
"VAE"
],
[
58,
2,
0,
32,
0,
"IMAGE"
],
[
59,
32,
0,
30,
0,
"IMAGE"
],
[
61,
21,
0,
35,
0,
"IMAGE"
],
[
65,
35,
0,
20,
0,
"IMAGE"
],
[
66,
20,
0,
3,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6303940863128553,
"offset": [
-614.0582554298979,
-433.7408572534731
]
},
"frontendVersion": "1.16.9"
},
"version": 0.4
}
Evaluation / Analysis
- Visual continuity: By keeping the background latent fixed (same seed + same conditioning) and only perturbing the masked region via latent noise, the resulting 4-frame sequence preserves background details and yields natural-looking, localized fire motion.
- Temporal coherence trade-offs: Too large a denoise or a completely random seed per frame results in flicker and inconsistent backgrounds. Keep denoise moderate (e.g., 0.4–0.7) and either keep the seed constant or make only tiny per-frame variations in the latent noise mask.
- Resource efficiency: Generating 4 high-quality keyframes is far less memory- and compute-intensive than running full video diffusion models; combining the frames into a GIF yields a convincing short loop suitable for presentation and prototyping.
- Limitations: Four frames are enough to show convincing flame behavior at low frame rates (6–10 fps), but for smooth slow motion you’ll need more frames or an interpolation/post-processing step (optical flow / frame interpolation) at the cost of compute.
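The simplest post-process interpolation mentioned above is a linear cross-fade between consecutive keyframes (optical-flow methods look better but cost more compute). A sketch with frames flattened to lists of pixel values:

```python
def crossfade(frames, n_between):
    """Insert n_between linearly blended frames between each pair of
    keyframes. Frames are flat lists of pixel values here; real frames
    are H x W x 3 arrays, but the blending rule is identical."""
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, n_between + 1):
            t = k / (n_between + 1)
            out.append([(1 - t) * x + t * y for x, y in zip(a, b)])
    out.append(frames[-1])
    return out

smooth = crossfade([[0.0], [1.0]], 3)
# [[0.0], [0.25], [0.5], [0.75], [1.0]]
```

Cross-fading 4 keyframes with 3 in-betweens each yields 13 frames, enough to lift an 8 fps loop toward smoother playback without regenerating anything in the sampler.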