Report 6: Vision Science & Machine Learning

Post by glegrady » Tue May 04, 2021 11:44 am

Report 6: Volumetric data, Computational Photography
Post by glegrady » Thu Apr 22, 2021 11:55 am

MAT 255 Techniques, History & Aesthetics of the Computational Photographic Image ... s255b.html

Please provide a response to any of the material covered in this week's two presentations by clicking on "Post Reply". Consider this to be a journal to be viewed by class members. The idea is to share thoughts, other information through links, anything that may be of interest to you and the topic at hand.

Topic: Vision Science & Machine Learning

Report for this topic is due by May 20, 2021 but each of your submissions can be updated throughout the length of the course.
George Legrady

Re: Report 6: Vision Science & Machine Learning

Post by kevinclancy » Tue May 18, 2021 6:29 pm

This was definitely one of the more challenging weeks of content so far, as I’ve felt pretty grounded in all the work up until this point. While I have been generally aware of Machine Learning, I definitely do not understand it in an applied way yet, which I think will require some actual hands-on exploration. The CNN Explainer tool from the Polo Club of Data Science at Georgia Tech was extremely helpful in this regard because it allowed me to deconstruct the CNN process at my own pace, and to really delve into each stage of the process to try to understand it. Weihao Qiu’s presentation was a great primer, and the CNN Explainer allowed me to slow it down and examine all of the components from different perspectives. I tried importing photos of a few of my sculptures, which the classifier most often read as “ladybug”. I also Google image searched “spilled espresso” and found a prank prop of spilled coffee, which when imported into the CNN Explainer, did still read as “espresso” (0.8539). I kept importing images of spilled espresso, sometimes as abstract as a stain in carpet, which the CNN still read as “espresso”, but with less statistical certainty.
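As a rough illustration of where a score like 0.8539 comes from: an image classifier typically ends with a softmax step that turns raw class scores into probabilities summing to 1. The sketch below is a minimal version of that step, not the CNN Explainer's own code. The raw scores are invented for illustration, and the three labels are only a small subset chosen for the example.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three labels; not values from CNN Explainer.
labels = ["espresso", "ladybug", "pizza"]
clear_photo = softmax([4.0, 1.0, 0.5])   # one class clearly dominates
carpet_stain = softmax([2.0, 1.5, 1.0])  # scores are closer together

for name, probs in [("clear photo", clear_photo), ("carpet stain", carpet_stain)]:
    best = max(range(len(labels)), key=lambda i: probs[i])
    print(f"{name}: {labels[best]} ({probs[best]:.4f})")
```

Both inputs still come out as "espresso", but the second does so with a lower probability, which matches the pattern of the same label being read with less certainty on more abstract images.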

I’m definitely interested in the artistic possibilities of machine learning after all the amazing examples we have looked at, but I still feel I will need to do more research to understand the ethical, philosophical, technical, and artistic concerns inherent in the field before I can truly contribute something meaningful. I read chapter 5 of Paul Virilio’s “The Vision Machine” this week, which added a rich philosophical dimension to those concerns, and provides many insights on the foundational question that Professor Legrady posed: “We are in an unresolved situation: a) We believe in photographs, b) Photographs/videos can be faked. The culture has yet to figure out how to make sense of this discrepancy!”

I disagree slightly with the first claim that “a) We believe in photographs”. This may be generational skepticism, but I think I’ve been somewhat aware of the theater and spectacle of the image for most of my adult life, most acutely today. While we haven’t had to deal with forgery at the magnitude of Deep Fakes, we are aware that photography and film have always had a degree of selective framing, staging, obfuscation, surveillance, and a propagandistic function that has been weaponized by state power to manage public perception.

What I liked most about Virilio is that he approaches “The Vision Machine”, and his philosophy more broadly, from a perspective of speed, politics, and power. These elements are critical to our current dilemma of Deep Fakes, and we haven't fully dissected them. Virilio is perhaps best known for his philosophy of speed, coining the term dromology, or the “science (or logic) of speed”. My work is largely about the rapid acceleration of technology and its complex effects on the human body and our social spheres, and I think Virilio’s philosophy will be instrumental as I work toward my thesis. Professor Colin Gardner had suggested I read Virilio, and he also appears in Alan Warburton's new work "RGBFAQ", so I'm happy to get an opportunity to read him. Virilio initially published “The Vision Machine” in French in 1988, the year after my birth, and the move toward simulation, deception, cyber warfare, and mass surveillance he anticipates in “The Vision Machine” has been steadily building my entire life. Virilio describes a shift in militarism from traditional ground warfare to strategies of deterrence in the Cold War to new spheres of cyber and information warfare that feel eerily similar to the reality we are currently living through.

Virilio provides acute premonitions (some of which have since been borne out), historical context, and a grounding in his present moment (the late 1980s):
The age of the image's formal logic was the age of painting, engraving and etching, architecture; it ended with the eighteenth century.

The age of dialectic logic is the age of photography and film or, if you like, the frame of the nineteenth century. The age of paradoxical logic begins with the invention of video recording, holography and computer graphics ... as though, at the close of the twentieth century the end of modernity were itself marked by the end of a logic of public representation.

Now, although we may be comfortable with the reality of the formal logic of traditional pictorial representation and, to a lesser degree, the actuality of the dialectical logic governing photographic and cinematic representation, we still cannot seem to get a grip on the virtualities of the paradoxical logic of the videogram, the hologram or digital imagery. (Virilio, 1994, p.63)
This notion of a new age of paradoxical logic reminds me of an exhibition at the Miller ICA at CMU called PARADOX: THE BODY IN THE AGE OF AI, curated by Elizabeth Chodos. Virilio’s third age of paradoxical logic also reminds me of his contemporary Michel Serres’ postulation that we are in the midst of a third major revolution in communication, as quoted here in a 2014 interview with Hans Ulrich Obrist:
I’ve spoken of three revolutions from a historical perspective. The first was in the first millennium BC, when writing emerged in an oral world. The second was printing in the 15th century, with the advent of Gutenberg and the book. It seems to me that our revolution, the digital one, is the third. It’s a revolution that rests on the medium/message binary, in other words, on hard/soft. At the “oral stage,” the information medium was the human body and the message was oral. The medium later became paper and the message was written, or printed. And today the medium is hardware and the message is electronic – it’s the third revolution.
Virilio also adds a third weapons age to this tripartite view of history:
At this juncture we enter a third weapons age, following the prehistoric age of weapons defined by range, and the historic age of 'functional' weapons. With erratic and random weapons we move into the post-historic age of the arsenal. ERW are discreet weapons whose functioning depends entirely on the definitive split between real and figurative. (Virilio, 1994, p.69)
Virilio also offers prescient observations on mass surveillance, which he was observing through the medium of television at the time, but which has ever-increasing ramifications in the age of the Internet, smartphones, and machine learning:
Since Bentham, the goal has normally been identified with the panoptic, in other words, with a central surveillance system in which prisoners find themselves continually under someone's eye, within the warde[n]'s field of vision.

From now on, inmates can monitor actuality, can observe televised events — unless we turn this around and point out that, as soon as viewers switch on their sets, it is they, prisoners or otherwise, who are in the field of television, a field in which they are obviously powerless to intervene. (Virilio, 1994, p. 65)
This panopticon effect, or increasingly a “reverse panopticon”, is exponentially amplified today as each of our devices tracks our movements, clicks, interactions, and behaviors. We don't merely gaze at the screen; the screen returns our gaze. This reverse panopticon is explored in the “Mirror with a Memory” podcast hosted by artist Martine Syms and Carnegie Museum of Art. I just got the excellent print catalog of the same name, which explores many of the themes we are dealing with in this course.

Some of the most eerie and prescient points in Virilio’s “The Vision Machine” concern the shifting terrain of militarism into a realm of disinformation, distrust, and dissimulation, which intersects perfectly with our current conundrum of Deep Fakes, Fake News, and a deep distrust of institutions. To quote Virilio:
The chief tack of warfare is accordingly not some more or less ingenious stratagem. In the first instance, it involves the elimination of the appearance of the facts, the continuation of what Kipling meant when he said: 'Truth is the first casualty of war'. Here again, it is less a matter of introducing some manoeuvre, an original tactic, than of strategically concealing information by a process of disinformation; and this process is less to do with fake effects - once we accept the lie as given - than with the obliteration of the very principle of truth. (Virilio, 1994, p. 66)
This passage is chilling today, and gets to the core of our struggle with Deep Fakes. There is a political dimension to the Deep Fake that goes beyond our perceptual ability to determine a real or manipulated image or video. Authoritarians do not actually need to create seamless, undetectable, entirely believable Deep Fakes; they merely need to erode the public's trust in institutions, facts, and truth itself to the point where our skepticism, uncertainty, and distrust become paralyzing. In this environment, even "Cheap Fakes" are effective when paired with Virilio’s crucial component of speed. An undermining of truth and facts also relies on the speed of massive continuous flows of information to continually refresh and bury the illusion beyond our rate of retention, analysis, and verification. I believe the acceleration of technology currently outpaces art, philosophy, and ethics, and the politics of the lie, deception, and bad faith are currently outpacing truth, debate, nuance, and good faith, which leads us to our current conundrum. To quote Virilio again:
In the face of the discrete devaluation of territorial space which followed from the conquest of circumterrestrial space, geostrategy and geopolitics come on and do their number together as part of the stage show of a regime of perverted temporality, where TRUE and FALSE are no longer relevant. The actual and the virtual have gradually taken their place, to the great detriment of the international economy, as the Wall Street computer crash of 1987, moreover, clearly demonstrated.

Dissimulating the future in the ultra-short time of an on-line 'compunication' (computer communication), Intensive time will then replace the extensive time in which the future was still laid out in substantial periods of weeks, months, years to come. The age-old duel between arms and armour, offensive and defensive, then becomes irrelevant. Both terms now merge in a new 'high-tech mix', a paradoxical object in which decoys and countermeasures just go on developing, rapidly acquiring a predominantly defensive thrust, the image becoming more effective as ammunition than what it was supposed to represent! (Virilio 1994, p.68)
Paul Virilio “The Vision Machine”: ... achine.pdf
PARADOX: THE BODY IN THE AGE OF AI at Miller ICA at CMU: ... -age-of-ai
Michel Serres Interviewed by Hans Ulrich Obrist:
“Mirror with a Memory” podcast: ... y/podcast/
“Mirror with a Memory” publication: ... a-memory/
Alan Warburton "RGBFAQ": ... emoved.pdf

Re: Report 6: Vision Science & Machine Learning

Post by jungahson » Wed May 19, 2021 6:39 pm

When I read Rudolf Arnheim's Art and Visual Perception and David Marr's Vision, I see a huge difference between them. As Kandinsky mentioned, I think artists seek hidden relationships while scientists try to discover the principles behind how we see and recognize things in the world. Although machine vision has developed a lot due to deep learning, I think it is still at an immature stage compared to human vision, especially artistic vision. Because the image data that deep learning uses (e.g. ImageNet) is built on the WordNet word database, there are some limitations. For instance, as some of the semantic relations are better suited to concrete concepts than to abstract ones, it is difficult to classify emotions like "fear" or "happiness" into deep and well-defined hyponym/hypernym relationships.

Therefore, I think it is important for artists in the field to keep putting effort into creating datasets dedicated to artwork, such as the MART Dataset. In addition, I found it very meaningful to look for features based on art theory, as shown in Machajdik and Hanbury's work: they use low-level features such as color, texture, composition, and content. Zhao et al. argued that Machajdik and Hanbury's work had limitations because these features have a weak link to emotions and are not interpretable by humans. They therefore proposed other features such as balance, emphasis, harmony, variety, gradation, and movement to classify image emotions. ( ... YEGXCem3SQ)
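To make "low-level color features" a bit more concrete, here is a minimal sketch that reduces a set of pixels to two numbers: mean saturation and mean brightness. This is only an assumed, simplified stand-in for the idea of a color feature, not the actual feature set from either paper, and the pixel values are made up for illustration.

```python
import colorsys

def color_features(pixels):
    """Return (mean saturation, mean brightness) for RGB pixels in 0-255."""
    sats, vals = [], []
    for r, g, b in pixels:
        # colorsys expects channel values in the range 0.0-1.0
        _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        sats.append(s)
        vals.append(v)
    n = len(pixels)
    return sum(sats) / n, sum(vals) / n

# Hypothetical tiny "images": a few vivid pixels and a few muted, dark ones.
vivid = [(255, 0, 0), (255, 200, 0), (0, 120, 255)]
muted = [(60, 60, 70), (40, 45, 40), (80, 70, 70)]

print("vivid:", color_features(vivid))
print("muted:", color_features(muted))
```

Numbers like these are easy to compute but say little on their own about what an image expresses, which is one way to read the criticism that such features have only a weak link to emotion.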
Last edited by jungahson on Thu May 20, 2021 10:19 am, edited 28 times in total.

Re: Report 6: Vision Science & Machine Learning

Post by alexiskrasnoff » Thu May 20, 2021 1:49 am

This topic was an interesting one to research. Thinking about/picking apart vision, something that comes so naturally to us that we often take it for granted, in such a concrete, technical way definitely feels odd.
The first thing I'd like to share was a funny coincidence, as this article was just randomly recommended to me after the class in which we talked about vision and the different books, particularly David Marr's Vision, and I've had it bookmarked since, haha. The article discusses vision in the context of cats, particularly how they perceive Kanizsa squares in a similar way to humans. As the article puts it, "The Kanizsa square consists of four objects shaped like Pac-Man, oriented with the "mouth" facing inward to form the four corners of a square," which "visually evoke the sense of an edge in the brain even if there isn't really a line or edge there." This phenomenon is coincidentally shown in the illustration on the cover of Marr's book (albeit in triangle form)! Here is the full article: ... ket-newtab

Another article that I wanted to mention is this paper on machine vision, titled "Introduction: ways of machine seeing", that we looked at in Fabian's seminar. The paper goes into machine vision as well as the relationship between seeing and knowing. It also briefly discusses Marr's Vision near the end. Here is the link to it: ... 20-01124-6

Lastly, thinking about vision/machine vision in an artistic context makes me think of Daniel Rozin's work. I'm sure everyone is familiar with him (I feel like we may have discussed him in this class briefly? Can't remember), but he's most well known for his mechanical mirrors that use cameras and motors to "reflect" images back through the manipulation of different objects. I think he's a really great example of using machine vision to create an innovative interactive work. Here is a link to my favorite one, using little penguin figurines: As well as a link to the rest of his work:
