Media Arts and Technology

Posted: **Fri Sep 16, 2022 8:07 am**

wk07 - Stable Diffusion - 2nd set, Nov. 10, 2022

Post your 2nd set of images realized in Stable Diffusion here.

Posted: **Thu Nov 10, 2022 2:21 pm**

This week I explored Material Stable Diffusion, a fork of the Stable Diffusion Cog model that outputs tileable images for use in 3D applications such as Monaverse. It is designed to produce finely detailed texture images. Here's the link to Material Stable Diffusion: https://replicate.com/tommoore515/mater ... _diffusion

I did a little research on Monaverse, a platform of social worlds where users can collect and show art and gather with others in the metaverse. It combines innovative metaverse tooling with the latest blockchain technology and features a decentralized, creator-first system. Assets uploaded to Mona are stored on IPFS (InterPlanetary File System). Here's a link to Monaverse: https://monaverse.com/

Material Stable Diffusion was still something of a mystery when I explored it. I did not find detailed documentation on how to use it or on how it differs from Stable Diffusion, and there is no blog post detailing how it works under the hood. However, I did find a very long technical paper, High-Resolution Image Synthesis with Latent Diffusion Models: https://arxiv.org/pdf/2112.10752.pdf

As an ice-cream fan, I tried the ice-cream texture first.

Prompt: strawberry milk ice-cream texture.

Prompt: cookie dough texture.

I am generally satisfied with my ice-cream texture generation. The strawberry ice cream looks very real. The color is a little pinker than I expected, but the texture looks very good under the lighting. The image shows the fine texture of the cone, and I can tell that it is made of corn and is crispy. As for the ice cream itself, I am very happy that it shows a natural crack, which reveals what it looks like inside.

The image of the cookie dough looks a little dark, and the cookie seems to be missing from the generation, but the texture looks very inviting and real. It would be better if the resolution were a little higher. In addition, the lighting is not ambient enough: there are too many shadows.

I then took a look at some of the example generations and their prompts. I found that most of the prompts contain the phrases "trending on artstation, base color, albedo, 4k". Those phrases are style and quality modifiers and do not describe the texture being generated. From the results, it looks like they are important for generating realistic, vivid, detailed texture images.
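The observation about modifier phrases can be turned into a small helper. This is only a sketch: the function name `build_texture_prompt` and the constant `TEXTURE_MODIFIERS` are made up for illustration and are not part of Material Stable Diffusion or Replicate. It simply appends the four phrases quoted above to a subject description.

```python
# Modifier phrases quoted from the example prompts (hypothetical constant name).
TEXTURE_MODIFIERS = ["trending on artstation", "base color", "albedo", "4k"]

def build_texture_prompt(subject, modifiers=TEXTURE_MODIFIERS):
    """Append style/quality modifiers to a subject, skipping any already present."""
    parts = [subject.strip().rstrip(",.")]
    lowered = subject.lower()
    for modifier in modifiers:
        # Avoid repeating a modifier the author already typed into the subject.
        if modifier.lower() not in lowered:
            parts.append(modifier)
    return ", ".join(parts)

print(build_texture_prompt("Lava flowing down a mountain texture"))
```

Keeping the subject and the modifiers separate makes it easy to generate the same texture with and without the modifiers and compare the results side by side.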

More explorations on Material Stable Diffusion.

Prompt: Lava flowing down a mountain texture

Prompt: Lava flowing down a mountain texture, trending on artstation, base color, albedo, 4k

I also used the same prompt on Lexica to see what Stable Diffusion would generate and to compare the results.

Generated with Lexica:
Prompt: Lava flowing down a mountain texture, trending on artstation, base color, albedo, 4k

From the results above, it is clear that Material Stable Diffusion is better at generating surface textures. We can clearly see the characteristic surface texture of the volcanic rock as well as the effect of lava that is about to solidify. I think the detailed surface textures generated by Material Stable Diffusion are extremely useful for texture mapping. Texture mapping is a technique for adding surface detail to otherwise flat and featureless faces without increasing the polygon count. The mapping information is stored in two dimensions and applied to a polygon using various interpolations. It is widely used in game engines and computer graphics to render high-resolution, vivid 3D objects.

Below is a diagram that illustrates the texture mapping.
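To make the idea of "stored in two dimensions and applied using interpolation" concrete, here is a minimal sketch of one common approach: looking up a texture at (u, v) coordinates with bilinear interpolation and wrap-around. The function name `sample_texture` and the tiny 2x2 texture are assumptions for illustration; real engines do this on the GPU. The wrap-around step is where a tileable texture matters, because the lookup repeats the image past its edges.

```python
import math

def sample_texture(texture, u, v):
    """Bilinearly sample a 2D grid of grayscale texels at UV coordinates.

    UV coordinates wrap around, so values outside [0, 1) repeat the texture.
    """
    height, width = len(texture), len(texture[0])
    # Convert UV to continuous texel space; texel centers sit at integer + 0.5.
    x = (u % 1.0) * width - 0.5
    y = (v % 1.0) * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(ix, iy):
        # Modulo indexing wraps lookups across the texture edges.
        return texture[iy % height][ix % width]

    # Blend the four nearest texels, first horizontally and then vertically.
    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

checker = [[0.0, 1.0],
           [1.0, 0.0]]
print(sample_texture(checker, 0.5, 0.25))
```

If the left and right (or top and bottom) edges of a texture do not match, the blend across the wrap produces a visible seam, which is the problem a tileable output avoids.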

Prompt: honey from a bottle texture

Generated with Lexica:
Prompt: honey from a bottle texture

I think Material Stable Diffusion is definitely better at generating realistic textures. It provides fine details that the Lexica results lack. One more thing I found is that Material Stable Diffusion is stricter about the text prompt. Several times it stopped generating images because my prompt was flagged as not safe.

I did a little digging on NSFW filtering. Here's what I found: if you access Stable Diffusion via Hugging Face or the DreamStudio web app, you will not be able to generate NSFW images, as these tools have NSFW filters. Online forums suggest that users can bypass the filters when generating images locally or on Google Colab. However, because installing Material Stable Diffusion on my laptop is complicated, I did not try it.

Posted: **Thu Nov 10, 2022 2:26 pm**

For this week, I experimented with image generation using Text2Light and material_stable_diffusion. Both are easy to use, but material_stable_diffusion is clearly the more time-saving of the two, as it has a more straightforward interface. Looking at the results, I am pretty baffled by Text2Light, as it did not quite succeed in generating images that correlate strongly with the prompts I put in. From what I understand, Text2Light aims at creating a three-dimensional space in which a light source is desired. Essentially, it lets us imagine how a particular architectural or natural space can be lit. This worked well when I put in the prompt: "A theater during rehearsal."

"A theater during rehearsal" implies a particular lighting condition: the whole theater including the audience area might be lit up as it doesn't necessarily require full stage light. So far so good.

However, things got a little messy when I started to put in prompts that do not give enough information about the lighting conditions. For example, something like this: "Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, a psychedelic drug–induced hallucination." This is an old prompt I have been using throughout the past weeks.

Not only did it fail to understand the lighting implications, it also failed to understand almost any of the prompt. It gave out an image that looked almost like the sample it initially used. I wondered whether something had gone wrong because of a mistake of my own, so I tried another prompt in which I set up a lighting condition for the AI: "A cityscape of low skyscrapers in Nigeria during sunset."

Apparently, it was only able to pick up "sunset" from the prompt, as the photo shows nothing but a sunset. Did the "cityscape" in the prompt confuse the AI because of the more expansive space it suggests? To make things clearer, I tried an even more straightforward prompt: "The city of Nigeria in the morning."

What it came up with was a weird space seemingly inside a building, with an office interior and a hallway meshed into each other. I think we need to explore further what the best way to write a prompt for Text2Light is.

In contrast, material_stable_diffusion functions more like the Colab Stable Diffusion that Yixuan provided last week. I put in the prompt: "Post-oil Nigeria, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, a psychedelic drug–induced hallucination."

The first two images are pretty much the same as the results I got last week, except that they look more like cut-outs from a larger landscape picture. However, among the four outputs generated, there is another result that focuses more on the psychedelic portion of the prompt. The psychedelic part of the prompt becomes the overwhelming style of the image, giving it an animation-like look (similar to a function we have in Midjourney).

The dominating animation style appeared when I put in the prompt: "afrofuturism, low skyscrapers, buildings and homes carpeted with greenest grass and smiley-faced bopping periwinkle flowers, animations from the early 1900s."

What's interesting about this platform for me is actually its NSFW policy for prompts and images. When I put in this prompt: "a pregnant woman standing in front of a robot home, pollens."
I got no image results, only an NSFW warning from the system:

What in the prompt violates the NSFW code? Is it "a pregnant woman"?
However, when I "stylized" the prompt and typed in: "Wangechi Mutu, a pregnant woman, a robot home, pollens, animations from the early 1900s."
It successfully generated images without the warning:

The images are indeed disturbing to some, but what I found even more disturbing is the black box in which content regulations are designed and enforced in an AI system. Did it censor not my prompt but the image that it created? Is the prompt more viable through "stylization" because it then reads as an artistic project or a projection onto the future, rather than as a request about the political now? Those are the questions I hope to raise through this one NSFW warning and what it means for an AI system like this.

Posted: **Thu Nov 17, 2022 1:01 pm**

Text2Light Test:
It seems this model does not understand prompts well. Although the images are consistently panoramas, their content does not follow the guidance from the prompts.

Prompt: A stunning detailed shoggoth by zdzisław beksiński, stormy ocean, beautiful lighting, full moon, detailed swirling water tornado, artstation
image generated from Lexica

image generated from Text2Light

Prompt: A stunning, moody photograph from an alien planet. vivid colors
image generated from Lexica

image generated from Text2Light

stablediffusion-infinity Test:
This is by far the closest thing to a tool that a designer could use to assist design production. I uploaded Weihao's image from Week 1 and extended its boundary by adding more prompts. The original goal was to create an interior view in which fencers are standing in front of a window wall. However, when I wrapped the drawing area around the uploaded image, the model gave errors and did not produce any image. So the strategy became drawing one area after another.

Prompt:
Aliens brainsuckers kills sacrifice aztecs, perfect faces, fine details, studio lighting, close view, subtle shadows, art by katsuya terada, Ilford HP5

Prompt:
An evolution of evil, ornate cathedral arch, abstract, decay, giger textures, h. r. giger infinitely complex detail, by antoine wiertz and noah bradley, interstellar civilisation, the underground night life of occult city is booming, visually stunning, Ilford HP5

Another common problem is that scaling the drawing area also scales its content. In other words, the scale of the generated content does not take the scale of the existing image into account. As shown above, the arches on the right and left are of different sizes because the drawing areas were of different sizes.

Prompt:
flying chestburster swarm, h. r. giger, infinitely complex detail, Ilford HP5

Erasing a certain area and recreating it with a new prompt also restores the original image to some degree. The content produced in the blank area is greatly affected by the surrounding composition of the existing content.

Posted: **Thu Nov 17, 2022 3:07 pm**

This week, I did a little training of my own using DreamBooth. DreamBooth extends Stable Diffusion by allowing users to "specialize" the diffusion model for a specific subject. Specialization is performed through a training process that uses a small data set depicting a single subject. It allows us to take our subject, call it x, and use it within a text-to-image prompt to generate images that more consistently show x in various contexts. For example, we might give Stable Diffusion the prompt "x eating a scorpion lollipop". In theory, the DreamBooth-trained model will return results that are more consistent with the specific data in the training set than the standard Stable Diffusion model would for the same prompt.

I have a personally curated data set of teeth covered in that purple stuff the dentist gives you to check for plaque. Here are some example images from the set:


Unlike DreamBooth's examples, this data set does not have a well defined object in frame. It consists mostly of closeups. I imagine DreamBooth is not prepared for cases like this, but let's see how it fails.

Running the trained model with no input:

Running the trained model and recursing with img2img:

Notes: A low CFG gives results that correlate highly with the data set, no matter the prompt. As the CFG gets higher, the results drift towards a typical Stable Diffusion result but correspond better to the prompt. To my personal taste, any CFG above 4 returns poor results.
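For readers unfamiliar with the CFG setting, the following is a minimal sketch of how classifier-free guidance is commonly formulated: at each denoising step the model predicts noise once without the prompt and once with it, and the CFG scale controls how far the result is pushed towards the prompt-conditioned prediction. The function name `apply_cfg` and the toy numbers are assumptions for illustration; whether the DreamBooth setup used here applies the scale in exactly this way is not confirmed by the post.

```python
def apply_cfg(eps_uncond, eps_cond, scale):
    """Blend unconditional and prompt-conditioned noise predictions.

    scale = 0 ignores the prompt, scale = 1 uses the conditioned
    prediction as-is, and scale > 1 extrapolates past it.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]

# Toy two-element "noise predictions" (made-up numbers, not real model output).
eps_uncond = [0.0, 1.0]
eps_cond = [1.0, 3.0]
for scale in (0, 1, 4):
    print(scale, apply_cfg(eps_uncond, eps_cond, scale))
```

Under this formulation, a low scale leaves the output dominated by what the fine-tuned model produces without a prompt, which is consistent with the note above that low CFG values stay close to the training set regardless of the prompt.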

Posted: **Mon Dec 05, 2022 11:46 pm**

This week, I tried three extensions of Stable Diffusion: (1) outpainting in an infinite canvas, (2) video creation using camera movement, and (3) material stable diffusion.

Outpainting / Infinite Canvas:
Similar to the DALL-E 2 outpainting interface, this extension has a friendly user interface that lets users control various image generation tasks. Some important settings are:

Generation frame size: determines the size of the image generated. It can cause video RAM issues if too large.
Sample number: how many images are generated for each image generation.
Guidance: how strictly the model follows the text prompt (see last week's response for the effect of this parameter).

To try out infinite canvas, I created the following image with the prompt "Alien lifeforms tugging each other in a galactic landscape.":

The central image is an uploaded image of mine, and I used it to start the composition. It was interesting to see some elements in this image repeated in other canvas parts.
I don't love the color combination (blue and orange again!!), but I enjoy the variety of creature forms on this distant exoplanet. The life form on the top right resembles my central life forms, giving the overall composition more continuity. This canvas was generated from about 30-40 image frames.

Overall I think this is a great tool to use in the long run because it grants more compositional control than using just one image. I imagine using it to incorporate empty spaces, extend or emphasize certain composition parts, or create higher-resolution images.
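Building a large canvas frame by frame can be pictured as covering it with overlapping generation frames, where each new frame shares some pixels with existing content so the model has context to continue from. The sketch below only plans such a grid; the function name `plan_frames` and the default sizes (512-pixel frames, 128-pixel overlap) are assumptions for illustration and are not settings taken from the tool.

```python
def plan_frames(canvas_w, canvas_h, frame=512, overlap=128):
    """Return top-left corners of overlapping square frames covering a canvas."""
    if not 0 <= overlap < frame:
        raise ValueError("overlap must be non-negative and smaller than the frame size")
    step = frame - overlap

    def starts(total):
        # A canvas no larger than one frame needs a single frame at the origin.
        if total <= frame:
            return [0]
        positions = list(range(0, total - frame, step))
        positions.append(total - frame)  # last frame sits flush with the edge
        return positions

    return [(x, y) for y in starts(canvas_h) for x in starts(canvas_w)]

frames = plan_frames(2560, 2048)
print(len(frames), frames[:3])
```

With these assumed sizes, a 2560x2048 canvas needs 7 x 5 = 35 frames, which is the same order of magnitude as the 30-40 frames mentioned above, although the actual canvas size in the post is not stated.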

Deforum
This tool works like magic by enabling camera movements in an interpolated space of images, creating videos.

It has a ridiculous number of settings, but overall it works with 2D and 3D animation styles. 2D animations cannot zoom in/out of the images and can only move left/right or up/down in the image space.

All animations take multiple prompts, each assigned the frame number at which it comes into effect. For example, the following sequence of prompts results in a 140-frame animation:

"0": "octane render of a solar system, large dust cloud in space, trending on Artstation",
"20": "octane render of a solar system, yellow proto-star in space, trending on Artstation",
"30": "octane render of a solar system, orange star in space with proto-planets orbiting around, trending on Artstation",
"40": "octane render of a solar system, orange sun in space with planets orbiting around, trending on Artstation",
"60": "octane render of a solar system, large orange star in space with planets orbiting around, trending on Artstation",
"70": "octane render of a solar system, gigantic red star in space with touching the planets, trending on Artstation",
"80": "octane render of a solar system, gigantic red star in space burning extremely bright, trending on Artstation",
"90": "octane render of a solar system, supernova in space, trending on Artstation",
"110": "octane render of a solar system, white neutron star in space, trending on Artstation",
"120": "octane render of a solar system, white neutron star in space, emitting strong lights, trending on Artstation",
"130": "octane render of a solar system, blackhole in space, trending on Artstation"

In the animated GIF of this 140-frame animation, the camera moves into the image space by two units in every frame.

click to see animation
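The keyframed prompt schedule above can be read as a lookup from frame number to the prompt in effect. The sketch below assumes that each prompt stays active until the next keyframe is reached; the function name `prompt_for_frame` is hypothetical and is not part of Deforum, and the schedule is shortened to four of the prompts from the post.

```python
import bisect

# A shortened version of the schedule from the post: frame number -> prompt.
animation_prompts = {
    "0": "octane render of a solar system, large dust cloud in space, trending on Artstation",
    "20": "octane render of a solar system, yellow proto-star in space, trending on Artstation",
    "90": "octane render of a solar system, supernova in space, trending on Artstation",
    "130": "octane render of a solar system, blackhole in space, trending on Artstation",
}

def prompt_for_frame(prompts, frame):
    """Return the prompt of the latest keyframe at or before the given frame."""
    keyframes = sorted(int(key) for key in prompts)
    # bisect_right finds the first keyframe after `frame`; step back one.
    index = bisect.bisect_right(keyframes, frame) - 1
    if index < 0:
        raise ValueError("no keyframe at or before frame %d" % frame)
    return prompts[str(keyframes[index])]

for frame in (0, 19, 20, 139):
    print(frame, prompt_for_frame(animation_prompts, frame))
```

Sorting the keys as integers matters here, because the schedule stores frame numbers as strings and "130" would otherwise sort before "20".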

Like the previous animation, the same prompts can generate a sequence where the camera moves away from the scene by two units in every frame. Note that the camera moving away means revealing new areas in the image that need to be inpainted. This inpainting operation sometimes fails, and the image displays split regions of distinct images:

click to see animation

Material Stable Diffusion:
I also tried Material Stable Diffusion. It can generate seamless textures for use with 3D objects. I liked the idea of combining image generation with the 3D asset pipeline, since this pipeline usually has many steps and is quite tedious.

Text prompt: Tree trunk close up, albedo, 4k

Text prompt: Ancient coils, albedo, 4k

I generated these textures because I recently implemented a texture-mapping feature in my clay printing software and wanted to experiment with AI-generated images. Here are some pictures of the very last result of "ancient coils":

Posted: **Tue Dec 06, 2022 9:38 am**

Stable-Diffusion-Infinity

This experiment explores Stable-Diffusion-Infinity, the tool that Mert demonstrated in class. It appeals to me for its ability to generate very large images with fine, step-by-step control. This creation process exposes much more control than one-step text-to-image synthesis, so the results feel more connected to the author.

Link to colab: https://colab.research.google.com/githu ... olab.ipynb

Topic #1 Skatepark image

The skatepark subject matter originates from a photo shoot I did at Venice Beach Skatepark over the weekend. I would like to compare photos I generated with Stable Diffusion and photos I shot with a camera in terms of how connected I feel to them.

This is a photo I took with my camera:

I see a type of aesthetics in skateparks. The form of those curved walls, extending on and on, looks abstract and embodies a minimalistic aesthetic. Along the curvature there is a subtle change in brightness, presenting a smooth tonal transition made up of different shades of grey. These qualities make the construction itself very interesting and remind me of Richard Serra's steel sculptures.

After some initial getting used to Stable-Diffusion-Infinity, I began my creation with two stock images, which are close-ups of skatepark walls. The goal was to use the software to generate a larger skatepark image from the two images of small areas; in other words, to infer the global from the local details. I was hoping the aesthetics retained in the two images could be extended to a larger scene.

Reflections:
Overall, I like the final output image. I had fine control over the composition and could achieve a balanced, interesting result. I could control the layout by adding skaters in desired locations and removing them from unwanted ones. I was able to modify the curves formed by the skatepark landscape to make them more visually appealing. I was also able to replace the texture of the walls, for example by adding graffiti. On the other hand, the photo still looks a bit cartoonish, which makes it recognizable as a synthetic photo in no time. I was not able to get physically correct geometry for the skatepark walls, perhaps because the curved surfaces are too ambiguous for the diffusion model. The generated skaters also retain the look of an animated movie. I think the image could be tuned towards a more realistic style by tweaking the text prompt.

Topic #2 China "White paper" protest

Protesters hold white sheets of paper in Beijing yesterday in a demonstration against COVID restrictions. Photo: Thomas Peter/Reuters
Source: https://www.axios.com/2022/11/28/china- ... xi-jinping

There was a series of protests in many major cities in China against the government's "zero covid" policy. The policy had resulted in many tragedies over the past year, and people were finally fed up with it and took to the streets to protest after the fire in Xinjiang, which resulted in the deaths of 10 people.

What makes the protest more powerful is that people chose a plain sheet of white paper as the picket sign. It served as a response to China's strict control of freedom of speech. Every time a tragedy happened because of the "zero covid" policy, no public discussion on the internet was allowed. The government was afraid of people's criticism of the policy and acted as if nothing had happened. All articles about such bad news were deleted within a short amount of time, leaving a "404" webpage. People might get into trouble if they posted information that was not allowed to spread. Under this heavy pressure from the government, the "white paper" protest unyieldingly expressed complaints about the government through silence. In this way, even a government that allows the least freedom of speech cannot do anything to stop it: there is no information to delete, block, or filter.

The photos of the "white paper" protests all around China were not allowed to spread over the internet and were deleted quite soon. It occurred to me: what if I generated some "white paper" protest photos with Stable Diffusion and posted them on the internet? Would that trigger a "deletion" from the social media platform? Would I get in trouble for creating and posting those images? There is not even a name for such a charge. Also, from the perspective of information communication: how will people respond to my synthetic images? Will the audience be informed and affected in the same way as if they had seen the real photos? This usage puts Stable Diffusion in a role similar to that of the editorial cartoon, which can also criticize the government so implicitly that even a government that allows the least freedom of speech cannot actually accuse the author of criticizing it.

Additionally, I would like to examine whether the "white paper" protest is novel compared to typical protests with picket signs by observing whether the generated protest photos contain pure white paper. If the white paper protest is novel, then the training data should contain few or no photos of protesters holding a blank sheet of white paper, which would make it a challenge for the model to synthesize a photo portraying a "white paper" protest.

Here are some typical examples:

prompt = "Chinese people protesting on the street and holding empty white paper in midnight"

In this image, the picket sign is not pure white.

prompt = "Chinese people protesting on the street and holding empty white paper in midnight in photorealistic journalism style"
Added the style description; the photo has more details.

prompt = "Chinese people protesting against covid-19 and police on the street and holding empty white paper in midnight in photorealistic journalism style"
Added "covid-19"; suddenly all the people were wearing masks.

prompt = "Chinese people protesting against covid-19 and police on the street and holding empty white letter-sized paper in midnight in photorealistic journalism style"
Specified the size of the paper as letter-sized printer paper. This is by far the most successful result.

prompt = "Chinese people protesting against covid-19 and police on the street and holding empty white letter-sized paper in midnight in photorealistic journalism style, clear, f16, small aperture, high-iso"
Tried to match the photo style that is common on social media, where most photos are taken with cell phones. The result still looks like it was taken with a professional camera, with a shallow depth of field.

prompt = "Mass Chinese people protesting covid-19 policy by holding purewhite empty letter-sized printer paper on the street in the midnight in front of a row of police, photorealistic, photojournalism, kodak tri-x 100"
Rephrased the prompt.

------
Another two themes:

1. Policeman vs People:

prompt = "Two Chinese citizen holding printer paper protesting in front of a row of police in the midnight, the police is facing towards the camera, photorealistic journalism style, deep depth of field,high-iso, iphone"

prompt = "Chinese protesting covid-19 policy by holding purewhite empty letter-sized printer paper on the street in the midnight in front of a row of police, photorealistic, photojournalism, kodak tri-x 100"

2. Emphasis on location -- Shanghai:

prompt = "Mass Chinese people protesting covid-19 policy by holding A4 white empty paper on the street in the midnight in Shanghai, photorealistic, photojournalism, the paper has no content on it, the paper is purely white"
The iconic Shanghai landmark Oriental Pearl Tower appears in the photos.

----

Finally, I used DALL-E 2 to outpaint one of the images generated by Stable Diffusion. Text prompt: "Chinese people protesting against covid-19 and police on the street and holding empty white letter-sized paper in midnight in photorealistic journalism style"

I used stable-diffusion-infinity to generate this image from scratch. Text prompt: "Mass Chinese people protesting covid-19 policy by holding A4 white empty paper on the street in the midnight in Shanghai, photorealistic, photojournalism, the paper has no content on it, the paper is purely white"

I used stable-diffusion-infinity to generate this image from scratch.

1. I posted the last two photos on social media (WeChat), and most people treated them as real photos at first glance and reacted genuinely. The "white paper" element is potent enough to trigger people's sympathy if they are aware that this protest is happening.
2. Some people who are familiar with diffusion models could tell the photos were synthetic from the artifacts on the building facades and the human faces.
3. I felt concerned right after I posted these images because I thought the information control would kick in and delete my photos, but they were still up after quite a while. Now I think diffusion models offer an alternative way to convey a controlled message under government information control. In this case, what the diffusion model did was basically create infinite variants of the real photos, so that even though the real photos are automatically blocked on the internet, the variants can still survive and spread from person to person.
4. On the other hand, there are also concerns about potential fake news created by diffusion models. The photos they synthesize are so realistic that people might take them for reality. As the audience, we must interrogate the authenticity of a photo. If it is not authentic, then we must be critical of the author's motivation for creating it.
5. Stable Diffusion had difficulty generating protesters with pure white paper. In most images, the protesters are holding a picket sign with text on it. To some extent, this suggests that the "white paper" protest is a novel form of movement that is unique to today's Chinese society.
