Project 1: Automata1111 Interface
Due date: January 22, 2026
Project Details: Create a series of images that demonstrate an understanding of the options available in the Automatic 1111 interface. The project should also examine the extent to which Stable Diffusion–generated images can produce visual details in relation to Gabriele Peters’s article “Aesthetic Primitives of Images for Visualization”: https://www.mat.ucsb.edu/~g.legrady/aca ... itives.pdf
Specifically: Which aesthetic primitives can Stable Diffusion control reliably?
Which ones emerge implicitly rather than explicitly?
How well do prompt parameters translate into control over primitives like depth, rhythm, or spatial hierarchy?
Describe why the results are interesting or not.
George Legrady
legrady@mat.ucsb.edu
Re: Project 1: Automata1111 Interface
Project 1: From Private Memory to Urban Palimpsest
This project follows the development and progression of memory, beginning with the inner, personal space and continuing outward toward the collective, dense space of the urban environment. Beginning with the isolated, aged figure in the personal, private space, the project continues outward through the windows, the facades, and the residential spaces, until the individual architecture becomes lost in the collective space.
Rather than an architectural object, the project engages the building as an experience. Thus, each spatial layer can be considered as a threshold space, through which the scale of memory changes, from personal memory to shared experience, to architectural repetition, to the city as an archive of superimposed time.
First Set of Images:
This first set of images features a peaceful domestic scene, with the central figure being an elderly person seated or standing in front of or near a window. These images establish the emotional origin of the project: memory as something private, silent, and embodied, existing within the confines of a lived interior.
Prompt: a quiet interior scene seen from inside, an elderly figure standing near a window, seen from behind, small in scale, not centered, partially merging with the interior, the figure feels anonymous, reduced to silhouette and tone,
more like a remembered presence than a person, the interior space is clean but worn, walls lightly textured, edges softened by time,
details simplified rather than sharp, light enters through tall windows, overexposed near the glass, flattening depth, washing the exterior into pale tonal layers,
outside the window:
old residential buildings layered with newer ones, forms overlap rather than separate, architecture reads as accumulated memory,
not a clear city view, the entire image rendered in the visual language of early printmaking, stone lithograph, early photographic plate, vintage architectural print, subtle grain, uneven ink density, slight misregistration, paper texture visible, limited tonal range.
Negative: modern photography, digital sharpness, high resolution realism, architectural visualization, interior design rendering, real estate photo, cinematic lighting, dramatic contrast, spotlight, clean modern materials, new renovation, smooth surfaces.
Second Set of Images:
The second set of images represents the window as threshold, spatial as well as conceptual. The city on the far side of the glass remains visible but vague, flattened, condensed, and slightly out of reach. Architecture appears near, enveloping, but separated by transparency. Again, this threshold represents a move from personal to architectural memory, as experience starts to leave its imprint on architecture.
Prompt: dense old residential buildings filling the entire frame, layered, stacked, compressed urban architecture, aged concrete facades, repetitive small windows, balconies overlapping, buildings pressing into each other, no clear skyline, no depth separation, architecture reads as accumulated memory, the city seen through glass, the presence of a window implied rather than emphasized, slight window frame at the edges only, interior reduced to shadow and trace, exterior dominates the composition, the window functions as a viewing membrane, not a subject, black and white,
monochrome, compressed midtones, chalky whites and soft blacks, early photographic plate, stone lithograph, architectural etching, graphite and charcoal texture, uneven ink density, paper grain visible.
Negative: window frame dominant, interior room, empty room, still life, symmetrical window, centered composition, single vanishing point,
clean perspective, modern city, glass skyscrapers, clean architecture, futuristic buildings, color, warm lighting, cinematic lighting, neon lights, digital sharpness, hyper detailed, ultra realistic, photorealistic.
Third Set of Images:
In this set of works, the focus turns to dense residential buildings. Apartments are repeated in a pattern, with windows and balconies arranging a grid of occupied spaces. No particular interior is readable; it’s an architecture that’s an accumulation of many lives. These images contribute to an understanding of architecture as a record of everyday life, with repetition as the visual discourse.
Prompt: architectural palimpsest, time compressed into interior space, first-person architectural perspective, moving through narrow passages inside residential buildings, architecture layered across multiple decades within the same structure, walls showing overlapping states of repair and decay, balconies, windows, and corridors slightly misaligned by time, repetitive architectural elements drifting out of sync, no single moment dominates the space, earlier and later versions of the building coexisting, black and white image, muted tonal range, non-documentary architectural representation,
soft spatial ambiguity, clear depth without photographic realism, edges slightly unstable, surfaces reading as accumulated memory rather than material fact.
Negative: photorealistic photography, street photography, documentary photo, historical archive photo, sharp realism, high texture detail, realistic concrete texture, hyper-detailed surfaces, single-point perspective realism, perfect symmetry, clean architectural geometry, cinematic lighting,
strong shadows, high contrast, ruins photography, war damage, abandoned building photography, 3D render
Final Set of Images:
The final set extends the frame to the city level. The structures have fragmented, overlapped, and interpenetrated to the extent that a dense cityscape emerges, one that carries connotations of both familiarity and instability. The architectural elements appear misaligned, as if they came from different eras and were superimposed without resolution. The city takes on the character of memory itself: not linear or permanent, but constantly re-created by what is observable and what has receded into memory.
Prompt: architectural palimpsest, the same city seen across multiple eras, architecture repeating, misaligned, and overwritten by time, earlier and later versions occupying the same space, dense urban fabric formed by temporal overlap, architectural frames and openings revealing different moments, the city presented as an archive or exhibition of time, views layered like memory windows, double exposure architectural collage, time compressed into a single irreversible image, layers cannot be separated, history accumulated rather than erased.
Negative: 3D render, unreal engine, single building, clean minimal interior, organic growth, strands, fibers, ruins, decay, destruction,
abstract noise, random texture, conceptual architectural drawing, clearly artificial, non-photographic, poetic, surreal, and speculative.
1. Which aesthetic primitives can Stable Diffusion control reliably?
In this project, SD consistently controls tonal range, texture, and overall composition. It reliably produces flattened grayscale images, discernible textures, and compact architectural repetition. Features such as layered window repetition, compact facade repetition, and layered structures remain consistent across generations, which makes them the most stable and the most easily controlled.
2. Which aesthetic primitives emerge implicitly rather than explicitly?
Memory, time, and emotional presence emerge implicitly rather than through an explicit request. A sense of architectural deterioration, overlapping time, and accumulated layers of history appears in the images even though it was not directly specified. The human presence, particularly the elderly figure, functions as a symbol rather than a concretely defined subject, generating an emotional response through absence rather than presence.
3. How well do prompt parameters translate into control over depth, rhythm, or spatial hierarchy?
The parameters for depth and spatial hierarchy offer limited control, which often leads to flat or unclear spatial interpretations. Rhythm, however, is successfully controlled with the repetition of architectural elements such as windows and corridors. This is a significant component of the visual language.
4. Why are the results interesting or not?
The result is interesting because the limitations of the model support the concept of the project. The flattened depth and the unclear spatiality of the images corroborate the concept of architecture as memory instead of architecture per se. Rather than accurately representing architecture, the images show the extension of individual memory to a social architecture and urban memory.
Last edited by zixuan241 on Thu Jan 22, 2026 5:15 pm, edited 2 times in total.
Re: Project 1: Automata1111 Interface
The project explores the capacity of the sd_xl_base_1.0.safetensors [31e35c80fc], ukj_800.ckpt [940d3aa61e], and realisticVisionV20_v20.safetensors [e6415c4892] models to simulate analog textures and visual artefacts—those residual traces that analog media leave on the image surface.
Base Sampling steps: 30
Base prompt used for all images:
daguerreotype, 8mm film, lo-fi, grainy textures, analog photograph of a silhouette of a person in a forest at night, artefacts, accidental shot, corrosions
Base format:
720x512
The comparison of samplers was curated according to a principle of uniqueness: repetitive or overly similar images were excluded, while those that differed significantly from the others were selected for presentation. An additional selection criterion was the degree of successful approximation of realistic analog textures—images that exhibited visual distortions characteristic of generative processes, or that deviated substantially from the prompt, were not included.
Negative prompt:
cartoonish, digital
During certain phases of the research, the prompt was slightly modified in order to examine variations in the models’ behavior and visual output.
Seed and CFG scale varied from image to image.
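For readers who want to reproduce these base settings outside the browser UI, here is a minimal sketch of a request body for the Automatic1111 web API, assuming the server was started with the `--api` flag. The sampler name, seed, and CFG scale below are placeholders, since the post only says they varied, and `build_payload` is a hypothetical helper rather than part of Automatic1111.

```python
import json

BASE_PROMPT = (
    "daguerreotype, 8mm film, lo-fi, grainy textures, analog photograph of "
    "a silhouette of a person in a forest at night, artefacts, "
    "accidental shot, corrosions"
)
BASE_NEGATIVE = "cartoonish, digital"


def build_payload(sampler_name, seed=-1, cfg_scale=7.0, extra_negative=""):
    """Return a txt2img request body using the base settings from this post."""
    negative = BASE_NEGATIVE
    if extra_negative:
        # Append model-specific additions such as "bokeh, unfocused".
        negative = f"{BASE_NEGATIVE}, {extra_negative}"
    return {
        "prompt": BASE_PROMPT,
        "negative_prompt": negative,
        "steps": 30,    # base sampling steps
        "width": 720,   # base format 720x512
        "height": 512,
        "sampler_name": sampler_name,  # placeholder, varied per image
        "seed": seed,                  # placeholder, -1 means random
        "cfg_scale": cfg_scale,        # placeholder, varied per image
    }


payload = build_payload("Euler a", seed=1, extra_negative="bokeh, unfocused")
print(json.dumps(payload, indent=2))
```

The resulting dictionary could be sent as JSON to the `/sdapi/v1/txt2img` endpoint of a local instance. Keeping the base values in one place makes it easier to change only the sampler or checkpoint between runs.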
1) Checkpoint model: sd_xl_base_1.0.safetensors [31e35c80fc]
2) Checkpoint model: realisticVisionV20_v20.safetensors [e6415c4892]
Additional negative prompt: bokeh, unfocused
Negative prompt reset to the base negative prompt (additional terms removed)
Example of the LMR sampler failing on anthropomorphic shapes
Continuation in the comment to this post ------------>
Re: Project 1: Automata1111 Interface
Continuation:
Same settings as previously except format: 1440x1024
3) Checkpoint model: ukj_800.ckpt [940d3aa61e]
Additional negative prompt: bokeh, unfocused
Unfortunately, it was not possible to fully complete the experiment due to recurring technical errors, most notably repeated GPU memory failures:
OutOfMemoryError: CUDA out of memory. Tried to allocate 10.00 MiB (GPU 0; 23.65 GiB total capacity; 13.13 GiB already allocated; 2.06 MiB free; 13.50 GiB reserved in total by PyTorch)…
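A rough back-of-the-envelope check may help explain why the larger format ran into memory limits. The sketch below assumes the usual 8x downsampling between pixel and latent space in Stable Diffusion, and treats self-attention cost as growing with the square of the number of latent positions. Both are simplifications, not measurements of this setup. It also subtracts the figures in the error message to estimate how much GPU memory was not held by this PyTorch process.

```python
VAE_DOWNSAMPLE = 8  # assumed pixel-to-latent scale factor for Stable Diffusion


def latent_positions(width, height):
    """Number of spatial positions in the latent for a given pixel format."""
    return (width // VAE_DOWNSAMPLE) * (height // VAE_DOWNSAMPLE)


small = latent_positions(720, 512)    # base format
large = latent_positions(1440, 1024)  # format used in this continuation

position_ratio = large / small
attention_ratio = position_ratio ** 2  # naive quadratic attention estimate

total_gib = 23.65     # total capacity, from the error message
reserved_gib = 13.50  # reserved by PyTorch, from the error message
outside_gib = total_gib - reserved_gib  # not accounted for by this process

print(small, large)
print(position_ratio, attention_ratio)
print(round(outside_gib, 2))
```

Doubling both sides gives four times as many latent positions. If roughly 10 GiB really was held elsewhere (another session on a shared GPU is one possibility, though the message does not say), freeing it or launching the web UI with its `--medvram` option may be worth trying before lowering the resolution.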
Re: Project 1: Automata1111 Interface
In-Class Practice:
Prompt: an abstract human face emerging from architectural forms, cubist composition inspired by Picasso,
facial features fragmented and misaligned, soft pastel colors, dreamlike, poetic, architectural anatomy.
Negative: realistic face, photography, symmetrical portrait, anime, hyper realism.
Original Image New Image
Original Image New Image
Re: Project 1: Automata1111 Interface
I conducted an XYZ grid experiment in Automatic1111 using a fixed prompt (“An elderly man sits on a park bench, natural light, documentary style”), testing several combinations of settings across different image formats and random seeds. I reassigned the axis variables between groups to make the comparison more prominent.
Group 1:
X = sampler (DPM++ 2M Karras, Euler a)
Y = seed (1, 2, 5, 10)
Z = CFG (1, 5, 10)
Resolution: 512×512
Steps: ~20
All images are placed side by side to show the interaction of sampler × seed × CFG.
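To make the size of this grid concrete, the sketch below enumerates the Group 1 combinations with `itertools.product`. It only counts and labels the cells. It does not call Automatic1111, and the variable names are illustrative.

```python
from itertools import product

samplers = ["DPM++ 2M Karras", "Euler a"]  # X axis
seeds = [1, 2, 5, 10]                      # Y axis
cfg_scales = [1, 5, 10]                    # Z axis

# One cell per (CFG, seed, sampler) combination; Z is the outermost loop,
# so each CFG value corresponds to one X-by-Y sub-grid.
cells = [
    {"sampler": sampler, "seed": seed, "cfg": cfg}
    for cfg, seed, sampler in product(cfg_scales, seeds, samplers)
]

print(len(cells))  # 2 samplers x 4 seeds x 3 CFG values
for cell in cells[:3]:
    print(cell)
```

Group 1 therefore amounts to 24 images, which is worth knowing in advance when steps or resolution are raised.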
Group 2:
X = CFG (3, 7, 12)
Y = seed (multiple representative values)
Z = sampler (DPM++ SDE Karras, DPM++ 2M Karras, Euler a)
Resolution: 1200×800
Steps: ~20
Attachment: xyz_grid-0004-1.png
All images are placed side by side to show the interaction of CFG × sampler × seed.
Key observations:
Smaller square images (512×512) often show more diverse perspectives, whereas at the larger, wider format (1200×800) the perspective typically becomes fixed to front and rear shots taken from a horizontal angle.
Different samplers markedly alter the visual tone: DPM++ SDE Karras yields stronger contrast and dramatic effects, DPM++ 2M Karras approaches photographic realism, and Euler a tends toward soft diffusion.
Seed alters perspective and local composition without changing overall stylistic tendencies.
CFG controls “text adherence”: higher CFG amplifies color, edges, and details, but also increases susceptibility to local errors in human figures (limb disconnection, misalignment) and repetition artifacts.
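The CFG trade-off is consistent with how classifier-free guidance is commonly formulated: at each step the sampler takes the model's prediction without the prompt and pushes it toward the prompt-conditioned prediction by the guidance scale. A minimal sketch with toy numbers (the vectors are made up for illustration and are not taken from a model):

```python
def guided_prediction(uncond, cond, cfg_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), per element."""
    return [u + cfg_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # toy prediction without the prompt
cond = [1.0, 3.0]    # toy prediction with the prompt

for scale in (1, 5, 10):
    print(scale, guided_prediction(uncond, cond, scale))
```

At scale 1 the result equals the conditioned prediction. Larger values extrapolate past it, which is one plausible reason a high CFG exaggerates both prompt-relevant detail and artifacts.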
With respect to Peters's aesthetic primitives: color (palette) is explicitly controllable (via prompt + CFG); form/contour can be enhanced but is fragile (raising CFG strengthens edges but increases artifacts); spatial organization and depth are partially controllable (prompts can suggest hierarchy, but stable control requires ControlNet/depth); rhythm/repetition is mostly emergent (depending on training data and seed); human details (hands/limbs/face) are the most fragile, often requiring specialized models or post-processing for stability.
Why this is interesting: the experiments show that “more obedient” (high CFG) isn't always better, since it amplifies prompt instructions but also magnifies model flaws. The sampler itself functions like a stylistic filter and can serve as an expressive tool. Many aesthetic primitives aren't controlled by a single parameter but result from the combined effects of prompt, CFG, sampler, seed, and (if needed) structural constraints. This matters for anyone seeking rigorous, controllable design with SD.
Experimental Summary:
For photorealistic figures, use DPM++ 2M with a medium CFG (7–9) and face-fixer/post-processing enabled. For dramatic effects, use SDE; for dreamy aesthetics, use Euler a with a low CFG. If stable depth or composition control is needed, ensure ControlNet (depth/pose/edge) is enabled in the UI and upload the corresponding condition images. Avoid setting CFG too high (>12) to prevent distorted human figures and increased artifacts.
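These recommendations can be collected into a small lookup table. This is only a sketch: the preset names, the CFG value for the dramatic preset (not stated above), and the specific low CFG value for the dreamy preset are assumptions chosen for illustration, and `pick_preset` is a hypothetical helper.

```python
MAX_CFG = 12  # the summary advises against going above this

PRESETS = {
    # Middle of the recommended 7-9 range.
    "photorealistic": {"sampler": "DPM++ 2M Karras", "cfg": 8},
    # CFG is not stated in the summary; 7 is an assumed placeholder.
    "dramatic": {"sampler": "DPM++ SDE Karras", "cfg": 7},
    # "Low CFG" is not quantified in the summary; 4 is an assumption.
    "dreamy": {"sampler": "Euler a", "cfg": 4},
}


def pick_preset(goal, cfg=None):
    """Return sampler and CFG for a goal, capping any override at MAX_CFG."""
    preset = dict(PRESETS[goal])  # copy so the table itself is not modified
    if cfg is not None:
        preset["cfg"] = min(cfg, MAX_CFG)
    return preset


print(pick_preset("photorealistic"))
print(pick_preset("dreamy", cfg=20))
```

The cap reflects only the rule of thumb from these experiments. Other checkpoints may tolerate different ranges.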