In this final project, I explored image generation with Stable Diffusion 1.5 and ControlNet models.
- Stable Diffusion 1.5 is known for its ability to generate detailed and coherent visuals based on prompts.
- ControlNet is a neural network structure that allows us to control diffusion models like Stable Diffusion by incorporating additional inputs, such as sketches, depth maps, or segmentation maps. This enables more precise manipulation of the generated images.
In this project, I wanted to see how far I could go in customizing and directing the output of these models to produce images that are not only visually appealing but also reflect specific artistic concepts. By using ControlNet, I aimed to overcome some limitations of traditional text-to-image generation, such as the lack of control over specific elements in the scene.
OpenPose Model:
Function: Used to control the pose and movement of human figures.
Application: By inputting human skeleton images, we can generate images with specific poses, useful for character design, dance motion capture, and more.
A 3D render of a girl sitting comfortably on a chair in a glass made house, in the art style of The Sims 4, highly detailed, smooth textures, vibrant colors, soft lighting, cheerful atmosphere, digital illustration
Negative prompt: blurry, low quality, deformed, disfigured, extra limbs, text, watermark, out of frame, distorted hands
Steps: 20, Sampler: DPM++ 2M, Schedule type: Karras, CFG scale: 7, Seed: 2159576166, Size: 710x1000, Model hash: cc6cb27103, Model: v1-5-pruned-emaonly, Denoising strength: 0.75, Token merging ratio: 0.2, ControlNet 0: "Module: openpose_full, Model: control_v11p_sd15_openpose [cab727d4], Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Noise multiplier: 1.05, Version: v1.9.3
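The settings line above is a comma-separated list of "Key: value" pairs, with the ControlNet block wrapped in double quotes. To compare runs more easily, a line like this can be split into a dictionary. The following is a minimal sketch using only the Python standard library; the function name parse_infotext and the regular expression are my own illustration, not the WebUI's actual parser, and the sample line is shortened from the one above.

```python
import re

# A value is either a double-quoted string (which may contain commas)
# or a run of characters up to the next comma.
PAIR = re.compile(r'\s*([\w ]+?):\s*("(?:[^"\\]|\\.)*"|[^,]*)(?:,|$)')

def parse_infotext(line: str) -> dict:
    """Split 'Key: value, Key: value' pairs, keeping quoted values intact."""
    params = {}
    for key, value in PAIR.findall(line):
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # drop the surrounding quotes
        params[key.strip()] = value
    return params

line = ('Steps: 20, Sampler: DPM++ 2M, CFG scale: 7, Size: 710x1000, '
        'ControlNet 0: "Module: openpose_full, Weight: 1.0, '
        'Control Mode: Balanced", Version: v1.9.3')

params = parse_infotext(line)
print(params["Sampler"])           # DPM++ 2M
# The quoted ControlNet block uses the same format, so it can be parsed again.
controlnet = parse_infotext(params["ControlNet 0"])
print(controlnet["Control Mode"])  # Balanced
```

All values stay as strings here; converting "Steps" or "CFG scale" to numbers is left to the caller.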
A 3D render of a girl sitting comfortably on a chair in a garden, in the art style of The Sims 4, highly detailed, smooth textures, vibrant colors, soft lighting, cheerful atmosphere, digital illustration
Canny Edge Detection Model:
Function: Captures the outlines and edges of images.
Application: By inputting edge maps, we can control the overall shape and structure of the image, retaining the details of the original sketch.
A 3D render of a girl sitting comfortably on a chair in a garden, in the art style of The Sims 4, highly detailed, smooth textures, vibrant colors, soft lighting, cheerful atmosphere, digital illustration
Negative prompt: blurry, low quality, deformed, disfigured, extra limbs, text, watermark, out of frame, distorted hands
Steps: 20, Sampler: DPM++ 2M, Schedule type: Karras, CFG scale: 7, Seed: 3231923029, Size: 710x1000, Model hash: cc6cb27103, Model: v1-5-pruned-emaonly, Denoising strength: 0.75, Token merging ratio: 0.2, ControlNet 0: "Module: canny, Model: control_v11p_sd15_canny [d14c016b], Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 100.0, Threshold B: 200.0, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: My prompt is more important", Noise multiplier: 1.05, Version: v1.9.3
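The Canny settings above list Threshold A: 100.0 and Threshold B: 200.0. Assuming these are the low and high hysteresis thresholds of the Canny detector (the usual convention, as in OpenCV's cv2.Canny), the sketch below shows only that final thresholding step on a toy grid of gradient magnitudes. It is not the preprocessor's actual code: smoothing, gradient computation, and non-maximum suppression are skipped, and the function name and sample values are made up for illustration.

```python
from collections import deque

def hysteresis(magnitude, low=100.0, high=200.0):
    """Keep strong pixels (> high) and weak pixels (> low) that touch a kept pixel."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong pixel.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] > high:
                edges[r][c] = True
                queue.append((r, c))
    # Grow outward through 8-connected weak pixels.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc] and magnitude[nr][nc] > low):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges

# Toy gradient magnitudes: one strong pixel, two weak ones, the rest below "low".
grid = [
    [250, 150, 50, 150],
    [20, 30, 40, 10],
]
edge_map = hysteresis(grid)
print(edge_map[0])  # [True, True, False, False]
```

The weak pixel next to the strong one survives, while the isolated weak pixel at the end of the first row is dropped. Lowering both thresholds would keep more faint lines in the edge map, which in turn gives ControlNet more structure to follow.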
Different Control Mode:
CFG scale: 12
A 3D rendering of a Dunhuang mural with deep depth, volumetric lighting, realistic shadows, highly detailed textures, cinematic atmosphere, rendered in Unreal Engine, ultra high definition
Negative prompt: low quality, blurry, deformed, watermark, text
Steps: 20, Sampler: DPM++ 2M, Schedule type: Karras, CFG scale: 7, Seed: 2609786599, Size: 960x473, Model hash: cc6cb27103, Model: v1-5-pruned-emaonly, Denoising strength: 0.75, Token merging ratio: 0.2, ControlNet 0: "Module: canny, Model: control_v11p_sd15_canny [d14c016b], Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 100.0, Threshold B: 200.0, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Noise multiplier: 1.05, Version: v1.9.3
Generation without ControlNet:
Depth Model:
Function: Adds depth and spatial perception to images.
Application: By inputting depth maps, we can transform flat images into images with three-dimensional effects, enhancing the dimensionality of the scene.
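To make the idea of a depth map concrete: it is a grayscale image in which brightness encodes distance. The sketch below min-max scales a small grid of relative depth values to 0..255. I am assuming the input is an inverse-depth-style estimate where larger values mean nearer surfaces, so nearer pixels come out brighter; the invert flag covers the opposite convention. The function name and sample values are illustrative only and are not taken from the depth preprocessor.

```python
def depth_to_gray(depth, invert=False):
    """Min-max scale a 2D list of relative depth values to integers in 0..255."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A completely flat scene carries no depth contrast.
        return [[0 for _ in row] for row in depth]
    gray = [[round(255 * (v - lo) / span) for v in row] for row in depth]
    if invert:
        # Use this when larger input values mean farther away.
        gray = [[255 - g for g in row] for row in gray]
    return gray

# Assumed inverse-depth-like values: 10.0 is the nearest point, 0.0 the farthest.
sample = [
    [0.0, 10.0],
    [4.0, 2.0],
]
print(depth_to_gray(sample))  # [[0, 255], [102, 51]]
```

Because only relative values survive the scaling, the map constrains the layout of near and far regions rather than real-world distances.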
Original img:
Depth img:
Generation:
A mystical fairytale forest filled with glowing mushrooms and floating fireflies, tall ancient trees with twisted branches, mist rolling over the forest floor, soft ethereal lighting, enchanted atmosphere, ultra-detailed, high-resolution, fantasy art, volumetric lighting, 8K, Unreal Engine render
Steps: 20, Sampler: DPM++ 2M, Schedule type: Karras, CFG scale: 12, Seed: 2759343917, Size: 1080x696, Model hash: cc6cb27103, Model: v1-5-pruned-emaonly, Token merging ratio: 0.2, ControlNet 0: "Module: depth_midas, Model: control_v11f1p_sd15_depth [cfd03158], Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Version: v1.9.3
A mystical fairytale forest filled with glowing mushrooms and floating fireflies, tall ancient trees with twisted branches, mist rolling over the forest floor, soft ethereal lighting, enchanted atmosphere, vibrant and dreamy rainbow colors, colorful flora and plants, magical light effects, ultra-detailed, high-resolution, fantasy art, volumetric lighting, 8K, Unreal Engine render
Negative prompt: low quality, blurry, deformed, disfigured, noisy, overexposed, underexposed, dull colors, mediocre color grading, muted tones, flat lighting, grayscale, monochrome, dark atmosphere, watermark, text, people, animals
Steps: 20, Sampler: DPM++ 2M, Schedule type: Karras, CFG scale: 12, Seed: 1265101828, Size: 1080x696, Model hash: cc6cb27103, Model: v1-5-pruned-emaonly, Token merging ratio: 0.2, ControlNet 0: "Module: depth_midas, Model: control_v11f1p_sd15_depth [cfd03158], Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Version: v1.9.3
Comparison between the ControlNet Canny and Depth models:
Segmentation Model:
Function: Performs semantic segmentation, allowing different content or styles to be applied to different regions.
Application: Enables replacing or applying different artistic styles to regions such as the sky, ground, and buildings within an image, achieving scene replacement and style mixing.
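A segmentation map can be thought of as a grid of class labels painted with one flat color per class, and that color image is what the model is conditioned on. The sketch below shows the idea on a tiny label grid. The palette is a placeholder I made up for illustration; it is not the color assignment used by the actual segmentation preprocessor, and the function names are hypothetical.

```python
# Hypothetical palette: class names and RGB values are placeholders only.
PALETTE = {
    "sky": (0, 0, 255),
    "ground": (0, 255, 0),
    "building": (255, 0, 0),
}

def colorize(labels, palette=PALETTE, unknown=(0, 0, 0)):
    """Turn a 2D grid of class names into a 2D grid of RGB tuples."""
    return [[palette.get(name, unknown) for name in row] for row in labels]

def region_mask(labels, target):
    """Boolean mask marking where one class appears."""
    return [[name == target for name in row] for row in labels]

labels = [
    ["sky", "sky"],
    ["building", "ground"],
]
print(colorize(labels)[1])         # [(255, 0, 0), (0, 255, 0)]
print(region_mask(labels, "sky"))  # [[True, True], [False, False]]
```

Since each region keeps its class, a prompt that mentions the sky, the ground, and the buildings separately has a defined area to apply each description to.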
Original img:
Segmentation img:
Generation prompt and settings:
A breathtaking fantasy landscape with a vibrant sunset sky filled with swirling nebula clouds and sparkling stars, the ground covered in lush, luminescent flora with hues of purple and blue, towering crystal formations replacing buildings, majestic floating islands in the background, rendered in a dreamy and colorful 3D style, ultra-detailed, high resolution, soft lighting, cinematic atmosphere, Octane Render, 8K
Negative prompt: low quality, blurry, dull colors, flat lighting, noise, artifacts, watermark, text, overexposed, underexposed
Steps: 20, Sampler: DPM++ 2M, Schedule type: Karras, CFG scale: 12, Seed: 1265101828, Size: 696x1080, Model hash: cc6cb27103, Model: v1-5-pruned-emaonly, Token merging ratio: 0.2, ControlNet 0: "Module: seg_ofade20k, Model: control_v11p_sd15_seg [e1f51eb9], Weight: 1.0, Resize Mode: Crop and Resize, Processor Res: 512, Threshold A: 0.5, Threshold B: 0.5, Guidance Start: 0.0, Guidance End: 1.0, Pixel Perfect: False, Control Mode: Balanced", Version: v1.9.3
Generated img: