Stable-Diffusion-Infinity
This experiment explores Stable-Diffusion-Infinity, the tool Mert demonstrated in class. It appeals to me because it can generate very large images with fine, step-by-step control. This creation process exposes much more control than one-step text-to-image synthesis, so the results feel more connected to the author.
Link to colab:
https://colab.research.google.com/githu ... olab.ipynb
Topic #1 Skatepark image
The skatepark subject comes from a photo shoot I did at Venice Beach Skatepark over the weekend. I would like to compare the photos I generated with Stable Diffusion against the photos I shot with a camera, in terms of how connected I feel to them.
This is a photo I took with my camera:
I see a particular aesthetic in skateparks. The forms of the curved walls, extending on and on, look abstract and embody a minimalist aesthetic. Along the curvature there is a subtle change in brightness, presenting a smooth tonal transition that consists of different shades of grey. These qualities make the construction itself very interesting and remind me of the steel sculptures by Richard Serra.
After some initial getting used to Stable-Diffusion-Infinity, I began with two stock images, both close-ups of skatepark walls. The goal was to use the software to generate a larger skatepark image from these two images of small areas; in other words, to infer the global from the local details. I was hoping the aesthetics in the two images could be extended to a larger scene.
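To make the "local to global" setup concrete, here is a minimal sketch of how two small known patches sit on a larger canvas, and which pixels an outpainting model would be asked to fill. This is not the tool's internal code; the `Patch` class, the function names, and the canvas and patch sizes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    """A known image region placed on the canvas (pixel coordinates)."""
    left: int
    top: int
    width: int
    height: int


def build_mask(canvas_w, canvas_h, patches):
    """Return a 2D list: 0 where pixels are known, 1 where they must be generated."""
    mask = [[1] * canvas_w for _ in range(canvas_h)]
    for p in patches:
        # Clip each patch to the canvas so out-of-bounds patches do not fail.
        for y in range(max(0, p.top), min(canvas_h, p.top + p.height)):
            for x in range(max(0, p.left), min(canvas_w, p.left + p.width)):
                mask[y][x] = 0
    return mask


def unknown_fraction(mask):
    """Share of the canvas that the model has to invent."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Hypothetical layout: two 4x4 close-ups at the left and right of a 16x8 canvas.
patches = [Patch(0, 2, 4, 4), Patch(12, 2, 4, 4)]
mask = build_mask(16, 8, patches)
print(unknown_fraction(mask))  # 0.75
```

In this toy layout three quarters of the canvas is unknown, which gives a feel for how much of the final scene comes from the model rather than from the stock images.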
Reflections:
Overall, I like the final output image. I had fine control over the composition and could achieve a balanced, interesting result. I could control the layout by adding skaters in desired locations and removing them from unwanted ones. I was able to modify the curves formed by the skatepark landscape to make them more visually appealing. I was also able to replace the texture of the walls, for example by adding graffiti. On the other hand, the photo still looks a bit cartoonish, so it is recognized as synthetic almost immediately. I was not able to get physically correct geometry for the skatepark walls, which might be because the curved surfaces are too ambiguous for the diffusion model. The generated skaters also retain the look of an animated movie. I think the image could be pushed towards a more realistic style by tweaking the text prompt.
Topic #2 China "White paper" protest
Protesters hold white sheets of paper in Beijing yesterday in a demonstration against COVID restrictions. Photo: Thomas Peter/Reuters
Source:
https://www.axios.com/2022/11/28/china- ... xi-jinping
A series of protests took place in many major cities in China against the government's "zero covid" policy. The policy had resulted in many tragedies over the past year, and people were finally fed up with it and took to the streets to protest after the fire in Xinjiang, which resulted in the deaths of 10 people.
What makes the protest more powerful is that people chose a plain sheet of white paper as their picket sign. It served as a response to China's strict control of freedom of speech. Every time a tragedy happened because of the "zero covid" policy, no public discussion on the internet was allowed. The government was afraid of criticism of its "zero covid" policy and acted as if nothing had happened. All articles about the bad news would be deleted within a short time, leaving a "404" webpage. People might get into trouble if they posted information that was not allowed to spread. Under this heavy pressure from the government, the "white paper" protest unyieldingly expressed complaints about the government through silence. In this way, even a government that allows the least freedom of speech cannot do anything to stop it: there is no information to delete, block, or filter.
The photos of the "white paper" protests all around China were not allowed to spread on the internet and were deleted quite soon. It occurred to me: what if I generate some "white paper" protest photos with Stable Diffusion and post them on the internet? Will it trigger a "deletion" by the social media platform? Will I get in trouble for creating and posting those images? There is not even a name for a charge against this behavior. Also, from the perspective of information communication: how will people respond to my synthetic images? Will the audience be informed and affected in the same way as if they had seen the real photos? This usage puts Stable Diffusion in a role similar to that of the editorial cartoon, which can also criticize the government so implicitly that even a government that allows the least freedom of speech cannot actually accuse the author of criticizing it.
Additionally, I would like to examine whether the "white paper" protest is novel compared to typical protests with picket signs, by observing whether the generated protest photos contain pure white paper. If the white paper protest is novel, there should be few or no existing photos of protesters holding a blank sheet of white paper for the model to have learned from, which will make it hard for the model to synthesize a photo portraying a "white paper" protest.
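I judged the "pure white paper" criterion by eye. As a rough illustration of what an automated version of that check could look like, the sketch below measures how much of a manually cropped sign region is near-white. The function name, the threshold of 230, and the tiny synthetic crops are assumptions made up for this example, not part of the experiment.

```python
def near_white_fraction(pixels, threshold=230):
    """Fraction of RGB pixels whose three channels are all >= threshold."""
    flat = [px for row in pixels for px in row]
    if not flat:
        return 0.0
    white = sum(1 for r, g, b in flat if min(r, g, b) >= threshold)
    return white / len(flat)


# Synthetic 4x4 crops standing in for the sign region of a generated image.
WHITE, INK = (250, 250, 250), (20, 20, 20)
blank_sheet = [[WHITE] * 4 for _ in range(4)]
sign_with_text = [[WHITE, INK, INK, WHITE] for _ in range(4)]

print(near_white_fraction(blank_sheet))     # 1.0
print(near_white_fraction(sign_with_text))  # 0.5
```

A blank sheet scores close to 1.0, while a sign with text or drawings on it scores noticeably lower. Lighting at midnight would shift these values in real images, so the threshold would need tuning.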
Here are some typical examples:
prompt = "Chinese people protesting on the street and holding empty white paper in midnight"
In this image, the picket sign is not pure white.
prompt = "Chinese people protesting on the street and holding empty white paper in midnight in photorealistic journalism style"
After adding the style description, the photo has more detail.
prompt = "Chinese people protesting against covid-19 and police on the street and holding empty white paper in midnight in photorealistic journalism style"
Added "covid-19"; suddenly all the people were wearing masks.
prompt = "Chinese people protesting against covid-19 and police on the street and holding empty white letter-sized paper in midnight in photorealistic journalism style"
Specified the paper size as letter-sized printer paper. This is by far the most successful result.
prompt = "Chinese people protesting against covid-19 and police on the street and holding empty white letter-sized paper in midnight in photorealistic journalism style, clear, f16, small aperture, high-iso"
I tried to match the photo style common on social media, where most photos are taken with cell phones. The result still looks like it was taken with a professional camera, with a shallow depth of field.
prompt = "Mass Chinese people protesting covid-19 policy by holding purewhite empty letter-sized printer paper on the street in the midnight in front of a row of police, photorealistic, photojournalism, kodak tri-x 100"
Rephrased the prompt.
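The prompts above were refined by hand, one change at a time. The sketch below shows one way to organize that kind of step-by-step comparison: it builds a "ladder" of prompts that each add one more modifier to a fixed base. The helper names `build_prompt` and `prompt_ladder` are hypothetical; the base text and the camera hints are taken from the prompts above. It only builds strings and does not call any image model.

```python
def build_prompt(subject, modifiers=()):
    """Join a base subject with comma-separated style modifiers."""
    parts = [subject.strip()] + [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


def prompt_ladder(subject, modifiers):
    """Yield prompts that add one modifier at a time, for side-by-side comparison."""
    for i in range(len(modifiers) + 1):
        yield build_prompt(subject, modifiers[:i])


BASE = (
    "Chinese people protesting against covid-19 and police on the street "
    "and holding empty white letter-sized paper in midnight "
    "in photorealistic journalism style"
)
CAMERA_HINTS = ["clear", "f16", "small aperture", "high-iso"]

for prompt in prompt_ladder(BASE, CAMERA_HINTS):
    print(prompt)
```

Generating one image per rung with the same random seed would make it easier to attribute a visual change, such as depth of field, to a single added word.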
------
Two more themes:
1. Policeman vs People:
prompt = "Two Chinese citizen holding printer paper protesting in front of a row of police in the midnight, the police is facing towards the camera, photorealistic journalism style, deep depth of field,high-iso, iphone"
prompt = "Chinese protesting covid-19 policy by holding purewhite empty letter-sized printer paper on the street in the midnight in front of a row of police, photorealistic, photojournalism, kodak tri-x 100"
2. Emphasis on location -- Shanghai:
prompt = "Mass Chinese people protesting covid-19 policy by holding A4 white empty paper on the street in the midnight in Shanghai, photorealistic, photojournalism, the paper has no content on it, the paper is purely white"
The iconic Shanghai landmark Oriental Pearl Tower appears in the photos.
----
Finally, I used DALL-E 2 to outpaint one of the images generated by Stable Diffusion. Text prompt: "Chinese people protesting against covid-19 and police on the street and holding empty white letter-sized paper in midnight in photorealistic journalism style"
I used stable-diffusion-infinity to generate this image from scratch. Text prompt: "Mass Chinese people protesting covid-19 policy by holding A4 white empty paper on the street in the midnight in Shanghai, photorealistic, photojournalism, the paper has no content on it, the paper is purely white"
I used stable-diffusion-infinity to generate this image from scratch.
1. I posted the last two photos on social media (WeChat), and most people treated them as real photos at first glance and reacted genuinely. The "white paper" element is potent enough to trigger people's sympathy if they are aware that this protest is happening.
2. Some people who are familiar with diffusion models can tell the photos are synthetic from the artifacts on the building facades and the human faces.
3. I felt concerned right after I posted these images because I thought the information control would kick in and delete my photos, but they were still there after quite a while. Now I think diffusion models offer an alternative way to convey a controlled message under government information control. In this case, what the diffusion model did was basically create endless variants of the real photos, so that even though the real photos are automatically blocked on the internet, the variants can still survive and spread from person to person.
4. On the other hand, there are also concerns about fake news created with diffusion models. The synthesized photos are so realistic that people might take them for reality. As the audience, we must question the authenticity of a photo. If it is not authentic, we must be critical of the author's motivation for creating it.
5. Stable Diffusion had difficulty generating protesters with pure white paper. In most images, the protesters are holding a picket sign with text on it. To some extent, this supports the idea that the "white paper" protest is a novel form of movement that is unique to today's Chinese society.