Inspired by George and Zixuan's presentation in class today, I wanted to use ComfyUI to create a short, coherent, inpaint-driven bonfire animation (a nod to my favorite game, Dark Souls). My goal was to keep the background and camera framing stable while animating only the flame, so that frame-to-frame transitions stay believable. Because my server does not have enough memory to run a full video model, I designed a lightweight pipeline that generates a small batch of keyframes (4 frames) and then combines them into an animated GIF or video. This reduces computational cost while still producing visually continuous movement in the inpainted area.
I built an inpaint-centered ComfyUI workflow that uses the same base image for every frame, a mask that restricts changes to the fire region, and a controlled latent/noise strategy to enforce temporal coherence. The workflow consists of the following nodes:
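• Load Image for the background (stable camera framing).
• Load Image for a mask (the inpaint area around the fire). The mask confines edits to the bonfire. In the exported workflow below, both the IMAGE and MASK outputs come from a single Load Image node (the clipspace file), and the second Load Image node holding Bonfire-Image.jpg is left unconnected.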
• Use RepeatImageBatch (or construct a small image list) to produce 4 frames from a single base image; each copy becomes one animation frame to inpaint separately.
• VAE Encode to get a LATENT representation for editing.
• Set Latent Noise Mask (or equivalent) to attach the mask to the latent batch, so that the sampler only adds noise and denoises inside the masked area. Because each item in the batch receives different noise, this is where subtle per-frame variation enters: the fire changes while the background stays constant.
• CLIP Text Encode (Prompt) to produce the positive conditioning (e.g., flame, burning, glowing sparks), and optionally a negative conditioning prompt (avoid text, watermark, etc.).
• KSampler (sampler: dpmpp_2m, scheduler: karras) with a controlled denoise level (0.6 in this workflow). Use the same seed for stable background structure; optionally vary the seed or the latent noise slightly across frames for movement.
• VAE Decode to convert LATENT back to IMAGE.
• Convert Image Batch to Image List (Impact-Pack nodes: Image Batch to Image List / Image List to Image Batch, depending on your node set) and ensure shapes are correct. This workflow converts the batch to a list and then back to a batch before export.
• VHS_VideoCombine (the Video Combine node from VideoHelperSuite) to export the animated GIF/video (set the frame rate, filename prefix, and format).
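The masking idea in the steps above can be illustrated with a toy sketch. This is not what KSampler does internally (there is no diffusion model here); it only shows, on a tiny grid of numbers, how a mask keeps the background identical across a batch of 4 frames while the masked cells get different noise in each frame. The function name `masked_frames` and the `strength` parameter (a loose stand-in for denoise) are made up for this example.

```python
import random


def masked_frames(base, mask, n_frames=4, strength=0.6, seed=0):
    """Toy model: perturb only the cells where mask == 1, once per frame."""
    rng = random.Random(seed)  # one seed drives the whole batch
    frames = []
    for _ in range(n_frames):
        frame = []
        for base_row, mask_row in zip(base, mask):
            row = []
            for value, m in zip(base_row, mask_row):
                noise = rng.gauss(0.0, 1.0)  # fresh noise for every cell and frame
                # m == 0 leaves the value untouched; m == 1 mixes in scaled noise.
                row.append(value + m * strength * noise)
            frame.append(row)
        frames.append(frame)
    return frames


# 2x3 "image": the middle column stands in for the fire region.
base = [[0.2, 0.9, 0.2],
        [0.2, 0.8, 0.2]]
mask = [[0, 1, 0],
        [0, 1, 0]]

frames = masked_frames(base, mask)
for i, frame in enumerate(frames):
    print(i, [round(v, 3) for v in frame[0]])
```

In the printed rows, the first and last values stay at 0.2 in every frame, while the middle value changes from frame to frame. That is the behavior the real workflow relies on, with the sampler doing the actual image generation inside the mask.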
Code:
{
"id": "0626d39f-2742-40f5-8ce7-c6199fbf9fa8",
"revision": 0,
"last_node_id": 37,
"last_link_id": 66,
"nodes": [
{
"id": 8,
"type": "CLIPTextEncode",
"pos": [
1524.7701416015625,
933.3764038085938
],
"size": [
400,
200
],
"flags": {},
"order": 3,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 10
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
54
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"flame,burning"
]
},
{
"id": 5,
"type": "CLIPTextEncode",
"pos": [
1629.2923583984375,
1221.095703125
],
"size": [
400,
200
],
"flags": {},
"order": 4,
"mode": 0,
"inputs": [
{
"name": "clip",
"type": "CLIP",
"link": 11
}
],
"outputs": [
{
"name": "CONDITIONING",
"type": "CONDITIONING",
"links": [
55
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CLIPTextEncode"
},
"widgets_values": [
"text,watermark"
]
},
{
"id": 29,
"type": "SetLatentNoiseMask",
"pos": [
1782.48486328125,
791.4531860351562
],
"size": [
264.5999755859375,
46
],
"flags": {},
"order": 7,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 53
},
{
"name": "mask",
"type": "MASK",
"link": 50
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
56
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "SetLatentNoiseMask"
},
"widgets_values": []
},
{
"id": 17,
"type": "CheckpointLoaderSimple",
"pos": [
1024.29736328125,
1278.9427490234375
],
"size": [
315,
98
],
"flags": {},
"order": 0,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "MODEL",
"type": "MODEL",
"links": [
6
]
},
{
"name": "CLIP",
"type": "CLIP",
"links": [
10,
11
]
},
{
"name": "VAE",
"type": "VAE",
"links": [
27,
57
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "CheckpointLoaderSimple"
},
"widgets_values": [
"SD1.5/DreamShaper.safetensors"
]
},
{
"id": 30,
"type": "VAEEncode",
"pos": [
1577.2794189453125,
732.0299682617188
],
"size": [
210,
46
],
"flags": {},
"order": 6,
"mode": 0,
"inputs": [
{
"name": "pixels",
"type": "IMAGE",
"link": 59
},
{
"name": "vae",
"type": "VAE",
"link": 57
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
53
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "VAEEncode"
},
"widgets_values": []
},
{
"id": 32,
"type": "RepeatImageBatch",
"pos": [
1578.774169921875,
614.90283203125
],
"size": [
315,
58
],
"flags": {},
"order": 5,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 58
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
59
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "RepeatImageBatch"
},
"widgets_values": [
4
]
},
{
"id": 12,
"type": "KSampler",
"pos": [
2140.921142578125,
1304.6380615234375
],
"size": [
315,
262
],
"flags": {},
"order": 8,
"mode": 0,
"inputs": [
{
"name": "model",
"type": "MODEL",
"link": 6
},
{
"name": "positive",
"type": "CONDITIONING",
"link": 54
},
{
"name": "negative",
"type": "CONDITIONING",
"link": 55
},
{
"name": "latent_image",
"type": "LATENT",
"link": 56
}
],
"outputs": [
{
"name": "LATENT",
"type": "LATENT",
"links": [
23
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "KSampler"
},
"widgets_values": [
1004264693718144,
"randomize",
20,
7,
"dpmpp_2m",
"karras",
0.6000000000000001
]
},
{
"id": 2,
"type": "LoadImage",
"pos": [
1061.085693359375,
765.2529296875
],
"size": [
315,
314
],
"flags": {},
"order": 1,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
58
]
},
{
"name": "MASK",
"type": "MASK",
"links": [
50
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"clipspace/clipspace-mask-742848.6999999881.png [input]",
"image"
]
},
{
"id": 31,
"type": "LoadImage",
"pos": [
693.9619750976562,
765.2071533203125
],
"size": [
315,
314
],
"flags": {},
"order": 2,
"mode": 0,
"inputs": [],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": null
},
{
"name": "MASK",
"type": "MASK",
"links": null
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "LoadImage"
},
"widgets_values": [
"Bonfire-Image.jpg",
"image"
]
},
{
"id": 21,
"type": "VAEDecode",
"pos": [
2210.954345703125,
717.7660522460938
],
"size": [
210,
46
],
"flags": {},
"order": 9,
"mode": 0,
"inputs": [
{
"name": "samples",
"type": "LATENT",
"link": 23
},
{
"name": "vae",
"type": "VAE",
"link": 27
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
26,
61
]
}
],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "VAEDecode"
},
"widgets_values": []
},
{
"id": 35,
"type": "ImpactImageBatchToImageList",
"pos": [
2160.6611328125,
867.7892456054688
],
"size": [
315,
26
],
"flags": {},
"order": 11,
"mode": 0,
"inputs": [
{
"name": "image",
"type": "IMAGE",
"link": 61
}
],
"outputs": [
{
"name": "IMAGE",
"shape": 6,
"type": "IMAGE",
"links": [
65
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "727295b52e5f7b5429e81ca2179172865aa83b99",
"Node name for S&R": "ImpactImageBatchToImageList"
}
},
{
"id": 20,
"type": "ImageListToImageBatch",
"pos": [
2176.876953125,
978.3171997070312
],
"size": [
315,
26
],
"flags": {},
"order": 12,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 65
}
],
"outputs": [
{
"name": "IMAGE",
"type": "IMAGE",
"links": [
66
]
}
],
"properties": {
"cnr_id": "comfyui-impact-pack",
"ver": "727295b52e5f7b5429e81ca2179172865aa83b99",
"Node name for S&R": "ImageListToImageBatch"
},
"widgets_values": []
},
{
"id": 3,
"type": "VHS_VideoCombine",
"pos": [
2557.69287109375,
973.4593505859375
],
"size": [
219.3603515625,
517.8137817382812
],
"flags": {},
"order": 13,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 66
},
{
"name": "audio",
"shape": 7,
"type": "AUDIO",
"link": null
},
{
"name": "meta_batch",
"shape": 7,
"type": "VHS_BatchManager",
"link": null
},
{
"name": "vae",
"shape": 7,
"type": "VAE",
"link": null
}
],
"outputs": [
{
"name": "Filenames",
"type": "VHS_FILENAMES",
"links": null
}
],
"properties": {
"cnr_id": "comfyui-videohelpersuite",
"ver": "0376e577442c236fbba6ef410a4e5ec64aed5017",
"Node name for S&R": "VHS_VideoCombine"
},
"widgets_values": {
"frame_rate": 8,
"loop_count": 0,
"filename_prefix": "AnimateDiff",
"format": "image/gif",
"pingpong": false,
"save_output": true,
"videopreview": {
"hidden": false,
"paused": false,
"params": {
"filename": "AnimateDiff_00026.gif",
"subfolder": "",
"type": "output",
"format": "image/gif",
"frame_rate": 8
},
"muted": false
}
}
},
{
"id": 24,
"type": "PreviewImage",
"pos": [
2511.392822265625,
573.453369140625
],
"size": [
210,
246
],
"flags": {},
"order": 10,
"mode": 0,
"inputs": [
{
"name": "images",
"type": "IMAGE",
"link": 26
}
],
"outputs": [],
"properties": {
"cnr_id": "comfy-core",
"ver": "0.3.29",
"Node name for S&R": "PreviewImage"
},
"widgets_values": []
}
],
"links": [
[
6,
17,
0,
12,
0,
"MODEL"
],
[
10,
17,
1,
8,
0,
"CLIP"
],
[
11,
17,
1,
5,
0,
"CLIP"
],
[
23,
12,
0,
21,
0,
"LATENT"
],
[
26,
21,
0,
24,
0,
"IMAGE"
],
[
27,
17,
2,
21,
1,
"VAE"
],
[
50,
2,
1,
29,
1,
"MASK"
],
[
53,
30,
0,
29,
0,
"LATENT"
],
[
54,
8,
0,
12,
1,
"CONDITIONING"
],
[
55,
5,
0,
12,
2,
"CONDITIONING"
],
[
56,
29,
0,
12,
3,
"LATENT"
],
[
57,
17,
2,
30,
1,
"VAE"
],
[
58,
2,
0,
32,
0,
"IMAGE"
],
[
59,
32,
0,
30,
0,
"IMAGE"
],
[
61,
21,
0,
35,
0,
"IMAGE"
],
[
65,
35,
0,
20,
0,
"IMAGE"
],
[
66,
20,
0,
3,
0,
"IMAGE"
]
],
"groups": [],
"config": {},
"extra": {
"ds": {
"scale": 0.6303940863128553,
"offset": [
-614.0582554298979,
-433.7408572534731
]
},
"frontendVersion": "1.16.9"
},
"version": 0.4
}
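Since the export is hard to read by eye, here is a small Python sketch for summarizing it: it lists the node types in execution order (using each node's "order" field), turns the "links" entries into readable edges, and pulls out the KSampler settings. The link field order is taken from the export above; the names in `KSAMPLER_FIELDS` are my assumption about what each widget value means, based on the values in this file. The embedded `sample` is a trimmed stand-in with the same shape, not the full workflow.

```python
import json

# Field order of each entry in "links", as it appears in the export above.
LINK_FIELDS = ("link_id", "src_node", "src_slot", "dst_node", "dst_slot", "type")

# Assumed meaning of the KSampler widget values, in order.
KSAMPLER_FIELDS = ("seed", "control_after_generate", "steps", "cfg",
                   "sampler_name", "scheduler", "denoise")


def describe_workflow(workflow):
    """Return (node types in execution order, readable edges, KSampler settings)."""
    nodes = sorted(workflow["nodes"], key=lambda n: n["order"])
    type_of = {n["id"]: n["type"] for n in nodes}
    order = [n["type"] for n in nodes]
    edges = []
    for link in workflow["links"]:
        fields = dict(zip(LINK_FIELDS, link))
        edges.append(f'{type_of[fields["src_node"]]} -> '
                     f'{type_of[fields["dst_node"]]} ({fields["type"]})')
    sampler = {}
    for n in nodes:
        if n["type"] == "KSampler":
            sampler = dict(zip(KSAMPLER_FIELDS, n["widgets_values"]))
    return order, edges, sampler


# Trimmed stand-in: three nodes and three links copied from the export.
sample = json.loads("""
{
  "nodes": [
    {"id": 21, "type": "VAEDecode", "order": 9, "widgets_values": []},
    {"id": 12, "type": "KSampler", "order": 8,
     "widgets_values": [1004264693718144, "randomize", 20, 7,
                        "dpmpp_2m", "karras", 0.6]},
    {"id": 17, "type": "CheckpointLoaderSimple", "order": 0,
     "widgets_values": ["SD1.5/DreamShaper.safetensors"]}
  ],
  "links": [
    [6, 17, 0, 12, 0, "MODEL"],
    [23, 12, 0, 21, 0, "LATENT"],
    [27, 17, 2, 21, 1, "VAE"]
  ]
}
""")

order, edges, sampler = describe_workflow(sample)
print(order)
print(edges)
print(sampler["sampler_name"], sampler["scheduler"], sampler["denoise"])
```

To run it on the real export, save the JSON to a file of your choice and pass the parsed result of `json.load` to `describe_workflow` instead of `sample`.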
- Visual continuity: Because the noise mask restricts sampling to the fire region and every frame shares the same base image, seed, and conditioning, the resulting 4-frame sequence preserves background details and yields natural-looking, localized fire motion.
- Temporal coherence trade-offs: Too high a denoise value, or a completely random seed per frame, results in flicker and inconsistent backgrounds. Keep denoise moderate (e.g., 0.4–0.7) and either keep the seed constant or introduce only tiny per-frame variations in the latent noise.
- Resource efficiency: Generating 4 high-quality keyframes is far less memory- and compute-intensive than running a full video diffusion model; combining the frames into a GIF yields a convincing short loop suitable for presentation and prototyping.
- Limitations: Four frames are enough to show convincing flame behavior at low frame rates (6–10 fps), but smooth slow motion requires more frames or an interpolation post-process step (optical flow / frame interpolation), at the cost of extra compute.
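To make the frame-count limitation concrete, here is a rough sketch of the arithmetic plus the simplest possible in-between step. Note that this is plain linear cross-fading between neighboring frames, not optical flow: it tends to produce ghosting rather than real motion, so treat it only as a baseline. Frames are represented as flat lists of pixel values, and both function names are made up for this example.

```python
def loop_seconds(n_frames, fps):
    """Duration of one pass through the loop."""
    return n_frames / fps


def crossfade_loop(frames, inbetweens=1):
    """Insert linearly blended frames between neighbors, wrapping last -> first."""
    out = []
    n = len(frames)
    for i in range(n):
        a, b = frames[i], frames[(i + 1) % n]  # wrap around so the loop closes
        out.append(list(a))
        for k in range(1, inbetweens + 1):
            t = k / (inbetweens + 1)  # blend weight, strictly between 0 and 1
            out.append([(1 - t) * pa + t * pb for pa, pb in zip(a, b)])
    return out


# Four one-pixel "frames" standing in for the 4 generated keyframes.
keyframes = [[0.0], [1.0], [0.0], [1.0]]
smooth = crossfade_loop(keyframes, inbetweens=1)

print(loop_seconds(len(keyframes), 8))   # 4 frames at 8 fps
print(len(smooth), loop_seconds(len(smooth), 8))
```

With one in-between per pair, 4 keyframes become 8 frames, so the same loop either runs twice as long at 8 fps or keeps its length at 16 fps.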