Project 2: Images Created Using the ComfyGUI Interface

glegrady
Posts: 232
Joined: Wed Sep 22, 2010 12:26 pm

Re: Style Transfer studies

Post by glegrady » Mon Feb 09, 2026 1:57 pm

MOTIVATION: Testing the variables in the Neural Style Transfer NODE

METHODS: One image controls the style (what the image looks like); the other is connected to the content (form / shape). Each weight can be set between 1000 and 100000. I tried to go beyond, but these seem to be the limits.

Looking at the NeuralStyleTransfer node in the workflow, it has 5 widget values/settings:

Code:

"widgets_values": [
    100000,  // Setting 1
    30000,   // Setting 2
    1,       // Setting 3
    100,     // Setting 4
    1        // Setting 5
]
Based on typical neural style transfer implementations, these settings likely control:

Content Weight (100000): Controls how much the output should preserve the original content image structure. Higher values = more faithful to the original content image.
Style Weight (30000): Controls how strongly the style should be applied. Higher values = more aggressive style transfer from the style image.
Total Variation Weight (1): Smoothness/denoising parameter. Helps reduce noise and create smoother transitions. Higher values = smoother but potentially less detailed output - I am using 0 to keep the full texture of the content image (the black and white branches)
Number of Steps/Iterations (100): How many optimization iterations to run. More steps = better quality but slower processing.
Learning Rate (1): Determines the step size for each iteration as the algorithm blends your content and style images:

Higher learning rate (e.g., 0.1, 1.0):

Faster convergence
Larger changes per iteration
Risk of overshooting and creating unstable/poor results
May miss the optimal solution

Lower learning rate (e.g., 0.001, 0.01):

Slower, more careful optimization
Smaller, incremental changes
More stable results
Takes more iterations to reach good results
Better fine-tuning
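
To make the trade-off concrete, here is a tiny gradient-descent example on f(x) = x², independent of the node itself: too large a step overshoots the minimum and diverges, while a small step converges steadily but needs more iterations.

Code:

# Gradient descent on f(x) = x^2, whose minimum is at x = 0.
# The update is x <- x - lr * f'(x) = x * (1 - 2 * lr).
def run(lr, steps=10, x=1.0):
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(run(1.1))  # step too large: |1 - 2*lr| > 1, so x grows instead of shrinking
print(run(0.1))  # small step: x shrinks by a factor of 0.8 every iteration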

Why it matters: Neural style transfer works by starting with an image (often the content image or random noise) and gradually adjusting it to:

Match the content of your content image
Match the style of your style image
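
Putting the five values together, below is a generic optimization-loop sketch of how a node like this typically uses them, following the reading of the settings above. This is not the NeuralStyleTransfer node's actual source; the tiny random-weight feature extractor is only a self-contained stand-in for the pretrained VGG features a real implementation would use, and the input images are placeholders.

Code:

# Generic neural style transfer loop (sketch, not the node's source code).
import torch
import torch.nn.functional as F

content_weight = 100000   # Setting 1: fidelity to the content image
style_weight   = 30000    # Setting 2: strength of the style transfer
tv_weight      = 1        # Setting 3: smoothness / denoising term
steps          = 100      # Setting 4: optimization iterations
learning_rate  = 1        # Setting 5: step size per iteration

# Stand-in feature extractor (placeholder for pretrained VGG features).
features = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 32, 3, padding=1), torch.nn.ReLU(),
).eval()

def gram(f):
    # Style is compared through Gram matrices of the feature maps.
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def tv(x):
    # Total variation: penalizes abrupt pixel-to-pixel changes.
    return (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
           (x[..., :, 1:] - x[..., :, :-1]).abs().mean()

content = torch.rand(1, 3, 256, 256)   # placeholder content image
style   = torch.rand(1, 3, 256, 256)   # placeholder style image

with torch.no_grad():
    content_feats = features(content)
    style_gram    = gram(features(style))

result = content.clone().requires_grad_(True)
opt = torch.optim.Adam([result], lr=learning_rate)

for _ in range(steps):
    opt.zero_grad()
    feats = features(result)
    loss = (content_weight * F.mse_loss(feats, content_feats)
            + style_weight * F.mse_loss(gram(feats), style_gram)
            + tv_weight * tv(result))
    loss.backward()
    opt.step()
    result.data.clamp_(0, 1)   # keep pixel values in a displayable range
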
StyleTransfer_00019_.png
StyleTransfer_00017_.png
StyleTransfer_00014_.png
StyleTransfer_00013_.png
StyleTransfer_00012_.png
StyleTransfer_00011_.png
StyleTransfer_00010_.png
StyleTransfer_00008_.png
Screenshot 2026-02-09 at 2.35.09 PM.png
Screenshot 2026-02-09 at 2.14.00 PM.png
Screenshot 2026-02-09 at 1.46.01 PM.png
Screenshot 2026-02-09 at 1.38.04 PM.png
George Legrady
legrady@mat.ucsb.edu

y_d_g
Posts: 3
Joined: Wed Jan 14, 2026 10:36 am

Re: Project 2: Images Created Using the ComfyGUI Interface

Post by y_d_g » Tue Feb 10, 2026 11:05 am

In this project, I examined the interaction between individual samplers and different schedulers within a single checkpoint model: SD1.5 / realisticVisionV60B1_v51Vae.safetensors.

My primary interest was an attempt to recreate a specific image I photographed in the winter of 2025: melted metal from a car, deformed by the Los Angeles wildfires. The experiment aimed to achieve a comparable surface quality of asphalt and solidified low-melting metals without using reference images (image load) or ControlNet. All experiments were conducted as part of a broader artistic project addressing the Los Angeles fires and the relocation of plant species, one of the contributing factors to the fires.

For each sampler, I initially tested three schedulers by default: normal, exponential, and linear quadratic. If one of these produced a compelling result, the sampler was subsequently tested across all available schedulers.

In the presented samples, I highlight the most successful outcomes in rendering solidified metal and conveying the texture of asphalt.

The next phase of the research will concentrate on a selective workflow using only the sampler and scheduler combinations that demonstrated the strongest visual and material fidelity. This stage will focus on refining their performance and further exploring their potential for simulating analog-like surface qualities. The next step involves using a second sampler node as a refiner, exploring proper node routing and achieving a balanced interaction within a refiner-based node system.

Default settings:
Checkpoint model: SD1.5 / realisticVisionV60B1_v51Vae.safetensors
KSampler settings: seed: 1; control after generate: fixed; cfg: 6; denoise: 1.00
Steps: 28

Empty Latent Image:
Width: 1920
Height: 1080
Batch size: 1

Positive prompt:
top-down close-up photo of cracked dark and dry asphalt ground, a bright and long silvery molten metal river cooled into irregular puddles and long drips spreads near the crack, this river looks like mercury, high contrast sunlight, realistic texture, sharp focus, documentary style, minimal composition, no people, no cars, no buildings, gritty surface, ultra realistic

Negative prompt:
people, person, hands, shoes, cars, street markings, text, logo, graffiti, buildings, sky, horizon, wide shot, blurry, lowres, cartoon, CGI, illustration, watermark, signature, symmetry
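
To run the sampler/scheduler sweep without editing the KSampler by hand each time, the workflow can be exported with "Save (API Format)" and re-queued from a short script through ComfyUI's HTTP API. The sketch below shows that approach; the file name workflow_api.json, the KSampler node id "3", and the default server address are assumptions that depend on the actual export and setup.

Code:

# Queue one render per sampler/scheduler combination through ComfyUI's
# HTTP API, using a workflow exported with "Save (API Format)".
# Assumed: the export is saved as workflow_api.json, the KSampler node
# has id "3", and the server runs on the default 127.0.0.1:8188.
import copy
import json
import urllib.request

with open("workflow_api.json") as f:
    base = json.load(f)

samplers   = ["dpmpp_2m_sde_gpu", "dpmpp_3m_sde_gpu",
              "dpmpp_2s_ancestral_cfg_pp", "lcm"]
schedulers = ["normal", "exponential", "linear_quadratic"]

for sampler in samplers:
    for scheduler in schedulers:
        wf = copy.deepcopy(base)
        ks = wf["3"]["inputs"]          # KSampler node (id is workflow-specific)
        ks["seed"] = 1                  # fixed seed so only sampler/scheduler vary
        ks["steps"] = 28
        ks["cfg"] = 6
        ks["denoise"] = 1.0
        ks["sampler_name"] = sampler
        ks["scheduler"] = scheduler
        req = urllib.request.Request(
            "http://127.0.0.1:8188/prompt",
            data=json.dumps({"prompt": wf}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)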



IMG_4668.JPG
Key Reference (Photograph taken on iPhone 13 Pro)

Screenshot 2026-02-10 at 10.27.21.png
ComfyUI Structure Screenshot

asphalt_molten_metal_spill_00136_.png
Successful molten metal texture result using:

Sampler: dpmpp_2m_sde_gpu
Scheduler: kl_optimal

asphalt_molten_metal_spill_00156_.png
Successful asphalt/concrete texture result using:

Sampler: dpmpp_3m_sde_gpu
Scheduler: beta

asphalt_molten_metal_spill_00074_.png
Interesting amalgamation of researched textures (this texture might be used later to generate metal) using:

Sampler: dpmpp_2s_ancestral_cfg_pp
Scheduler: normal

asphalt_molten_metal_spill_00178_.png
The worst sampler for the current texture research was:

Sampler: lcm (worst across every scheduler)
Scheduler: beta

zixuan241
Posts: 15
Joined: Wed Oct 01, 2025 2:41 pm

Re: Project 2: Images Created Using the ComfyGUI Interface

Post by zixuan241 » Thu Feb 12, 2026 1:46 pm

These video sequences were developed through iterative adjustments of denoise values, prompt constraints, and ControlNet influence within ComfyUI. Rather than changing the underlying prompt or spatial guidance, the primary variable between the two videos was the denoise parameter, which significantly altered the temporal behavior of the generated frames.

At lower denoise values, the resulting video maintains strong visual continuity across frames. The human figure appears almost static, with only subtle fluctuations in lighting, texture, and edge definition. This produces a restrained, breathing-like motion where the environment feels alive, but the figure remains anchored in place, more like a memory being gently reactivated than a character acting.

At higher denoise values, the same workflow begins to introduce greater frame-to-frame variation. The figure becomes less stable, and motion emerges through shifts in silhouette, posture ambiguity, and fluctuating edge structures. However, this motion is not yet coherent human movement; instead, it reads as a flickering or unstable presence. The figure feels as if it is trying to move, but never fully resolves into a continuous action.

This comparison reveals that denoise functions as a critical threshold between stability and motion. While increasing denoise introduces temporal variation necessary for animation, it simultaneously disrupts identity consistency, causing the figure to fragment rather than animate smoothly. At this stage, denoise alone is insufficient to produce clear, intentional human movement.

ControlNet was intentionally constrained to the early portion of the sampling process to preserve spatial composition without locking the figure into a rigid pose. This allowed the environment and lighting to remain consistent while leaving room for temporal variation. However, without additional motion-aware guidance, the figure’s movement remains implicit rather than explicit.
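
As a rough illustration of what the 0.0 to 0.6 window means in step terms (assuming a simple linear mapping from percent to step index; the real mapping follows the sigma schedule, so this is only approximate):

Code:

# Approximate step range where ControlNet guidance is active.
steps = 25
start_percent, end_percent = 0.0, 0.6

first_step = round(start_percent * steps)   # 0
last_step  = round(end_percent * steps)     # 15

print(f"ControlNet guides roughly steps {first_step}-{last_step} of {steps};")
print(f"the final {steps - last_step} steps run without it.")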

The ongoing challenge in this research is achieving a balance where consecutive frames remain visually similar enough to preserve character identity, while still differing enough to generate readable motion. I am currently exploring strategies to produce near-identical base images across frames—using tighter seeds, reduced noise injection, or alternative conditioning methods—so that motion can emerge through controlled deviation rather than randomness. The goal is to allow the human figure to “fully move” while maintaining the quiet, cinematic atmosphere established in the still images.
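
A reduced stand-in for this frame-to-frame setup, written with the diffusers img2img pipeline rather than the ComfyUI graph and without ControlNet, is sketched below; the strength argument plays the role of denoise, and the model id, prompts, frame count, and starting still are placeholders rather than the workflow's actual values.

Code:

# Img2img frame loop (diffusers): a simplified stand-in for the ComfyUI
# workflow, without ControlNet. "strength" plays the role of denoise.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder SD1.5 checkpoint
    torch_dtype=torch.float16,
).to("cuda")

prompt = "film still, cinematic lighting, a single quiet human figure"   # placeholder
negative = "illustration, cartoon, anime, text, watermark, multiple people"

denoise = 0.40                                       # 0.85 for Video A, 0.40 for Video B
frame = Image.open("base_frame.png").convert("RGB")  # placeholder starting still

for i in range(48):                                  # e.g. 4 seconds at 12 fps
    generator = torch.Generator("cuda").manual_seed(1)   # fixed seed every frame
    frame = pipe(prompt, image=frame, strength=denoise,
                 negative_prompt=negative, num_inference_steps=25,
                 guidance_scale=6.0, generator=generator).images[0]
    frame.save(f"frame_{i:03d}.png")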

Workflow Data Summary

Checkpoint: SD 1.5 / DreamShaper
VAE: Default DreamShaper VAE

Positive Prompt Focus:
Strong sense of silence
Cinematic lighting
Photographic realism
Film still aesthetic
A single, understated human presence
Minimal narrative description to avoid explicit action cues

Negative Prompt Constraints:
Illustration, watercolor, painting
Storytelling or narrative scenes
Multiple people or crowds
Anime, cartoon, stylized aesthetics
Text, watermark, graphic artifacts

Latent Size: 768 × 512

ControlNet Configuration:
Preprocessor: PyraCanny
ControlNet Model: control_v11p_sd15_canny
Strength: 0.6
Start Percent: 0.0
End Percent: 0.6

KSampler:
Sampler: Euler
Scheduler: Normal
Steps: 25
CFG Scale: 6.0
Seed: Fixed

Video-Specific Parameters:
Video A (Higher Denoise): denoise 0.85, frame rate 12 fps
Video B (Lower Denoise): denoise 0.40, frame rate 6 fps
截屏2026-02-12 10.38.30.png
截屏2026-02-12 11.37.40.png
