Kendall Stewart
11-13-12
Computational Camera
CMLab, National Taiwan University
http://graphics.csie.ntu.edu.tw/
http://graphics.csie.ntu.edu.tw/publication.php
This lab is developing a better way to perform matting and compositing for films. Matting and compositing are very common special-effects techniques in which a subject is placed on a background where he/she/it was not originally shot. The process is traditionally achieved with a blue/green screen or through rotoscoping, both of which require carefully controlled lighting and intensive user interaction. The CMLab created an algorithm that can pull alpha mattes of complex shapes from natural images, and then extended that algorithm to video, allowing it to pull complex mattes such as smoke.
The CMLab's model can also capture how a foreground object refracts and reflects light. That foreground object can then be placed on a new background, where it will refract and reflect light from the new scene. Their system also includes new techniques for making realistic shadow composites by estimating the photometric and geometric properties of the background.
(green screen technology used in Avatar)
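As a side note for anyone curious about the math: the compositing step that any pulled matte feeds into is just C = alpha*F + (1 - alpha)*B, where F is the foreground, B is the new background, and alpha is the matte. Here is a minimal sketch of that equation (my own toy example, not the CMLab's algorithm):

```python
import numpy as np

def composite(foreground, background, alpha):
    """Standard alpha-compositing equation: C = alpha*F + (1 - alpha)*B.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1]
    alpha: float array of shape (H, W) in [0, 1], the pulled matte
    """
    alpha = alpha[..., np.newaxis]            # broadcast over color channels
    return alpha * foreground + (1.0 - alpha) * background

# Toy usage: a soft-edged circular matte placed over a new background.
h, w = 240, 320
yy, xx = np.mgrid[0:h, 0:w]
dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
alpha = np.clip((80 - dist) / 20, 0.0, 1.0)   # 1 inside the "subject", soft falloff at the edge
fg = np.ones((h, w, 3)) * [0.8, 0.2, 0.2]     # red "subject"
bg = np.ones((h, w, 3)) * [0.1, 0.4, 0.8]     # blue "scene"
result = composite(fg, bg, alpha)
```

The whole difficulty of matting is estimating a good alpha (and F) from a natural image; once you have it, compositing onto any background is this one line of arithmetic.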
National Taiwan University's research on matting and compositing relates to work that I hope to be a part of in the near future. Though I have taken few film classes while at UCSB, my ultimate goal after completing my art degree is to be involved in the entertainment industry in some way. When filmmakers are first starting out, they rarely have the budget for full green-screen sets. This lab's research would help me and many other first-time filmmakers achieve an expensive look without breaking the budget. And if I end up going into special effects, this advanced matting and compositing algorithm will make my job easier.
Wk7 - Computational Camera
Re: Wk7 - Computational Camera
The topic that I found most interesting (due to the close aesthetic relationship it shares with my own art productions) is Non-Parametric Probabilistic Image Segmentation.
Computational Vision, CALTECH
Director
Research Focus
http://www.vision.caltech.edu/
The topic's name is itself a bit dense. To understand the material being investigated, I'll briefly describe the terms that make up the name and then give a general sense of what this type of imaging accomplishes.
The general use for Non-Parametric Probabilistic Image Segmentation is to observe a scene as it occurs, through the use of sensors, and to transmit and recompose that data on a digital system. One use of such a system is the Xbox Kinect, a device that senses motion, tracks it, and can display it in real time. Yet there are many difficulties in the field, and Non-Parametric Probabilistic Image Segmentation attempts to tackle these problems.
The process of image segmentation is 'decomposing' an image into its rudimentary components - pixels. From there, pixels are 'clumped' together with adjacent pixels to form "super-pixels" (1). Generally speaking, when we decompose an object into its basic forms, we do so to develop an easier way of understanding the object in question (and, from there, displaying it). This holds true for the deconstructive nature intrinsic to image segmentation - the following image shows multiple steps of that deconstructive process:
This image demonstrates the process of image segmentation - the deconstruction of the image into groups of pixel aggregates. (2)
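To make the 'clumping' step concrete, here is a small sketch using the off-the-shelf SLIC super-pixel algorithm from scikit-image (just an illustration of oversegmentation in general; it is not the specific method the Caltech group uses):

```python
import numpy as np
from skimage import data
from skimage.segmentation import slic, mark_boundaries

image = data.astronaut()                               # any RGB image works here
labels = slic(image, n_segments=200, compactness=10)   # clump pixels into ~200 super-pixels
print("number of super-pixels:", len(np.unique(labels)))

# Visualize the clumps by drawing their boundaries on top of the original image.
overlay = mark_boundaries(image, labels)
```

Each super-pixel is a small, roughly uniform patch, which is much easier to reason about probabilistically than millions of individual pixels.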
One problem with image segmentation is the algorithm used in the process. Computers make "hard" decisions about how to analyze and clump an image's pixels according to the image's composition. According to Datong Chen in his Modeling vs. Segmenting Images Using A Probabilistic Approach, image segmentation is the "preprocessing step in image analysis" (3). Yet the "segmentation errors introduced by the 'hard' decisions bring difficulties to higher-level image analysis" (3). With parametric image segmentation, the image sensor must follow algorithms built around a fixed set of parameters, so shapes or colors can be mistaken because of that reliance. When sensing high-information images, "heavy assumptions on the shape of the clusters" can cause problems (4). This is where non-parametric processes come into play.
Parametric processes must relate to a set of parameters. In non-parametric image segmentation, image processors can handle "complex structures" because they do not rely on assumptions about what information the super-pixels convey (4). To put it technically, non-parametric processing is "equivalent to placing a little probability bump around each data point and approximating the cluster distribution as the (normalized) sum of the bumps" (4). Rather than interpreting the information given by the super-pixels according to fixed parameters, the image sensor takes a probabilistic sum over the various "points" throughout the image.
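That "sum of bumps" quote is easier to see with a tiny numerical example: a one-dimensional kernel density estimate that places a Gaussian bump on every data point (my own toy sketch, not the lab's code):

```python
import numpy as np

def kde(samples, xs, bandwidth=0.3):
    """Non-parametric density estimate: place a Gaussian 'bump' on every
    data point and return the normalized sum of the bumps at locations xs."""
    diffs = xs[:, None] - samples[None, :]            # (len(xs), len(samples)) pairwise offsets
    bumps = np.exp(-0.5 * (diffs / bandwidth) ** 2)
    bumps /= bandwidth * np.sqrt(2 * np.pi)           # normalize each Gaussian bump
    return bumps.mean(axis=1)                         # average over data points

# Two clusters of "super-pixel features"; no assumption about their shape is needed.
samples = np.concatenate([np.random.normal(0, 0.5, 100),
                          np.random.normal(3, 0.2, 50)])
xs = np.linspace(-2, 5, 200)
density = kde(samples, xs)    # peaks appear wherever the data actually clusters
```

Notice that nothing in the estimate says "the clusters are ellipses" or "there are exactly two of them" - the shape of the density comes entirely from the data, which is the whole point of the non-parametric approach.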
Thus, Non-Parametric Probabilistic Image Segmentation is a process in which an image is sensed, digitally transmitted, and rendered in real time, with special attention paid to how the image is sensed. The process relies heavily on Gaussian mathematics, which relates inputs and outputs and, from there, describes probabilistic events.
Image to be sensed, rendered (4).
Image sensed, rendered (4).
1 - http://en.wikipedia.org/wiki/Segmentati ... rocessing)
2 - http://spie.org/Images/Graphics/Newsroo ... 6_fig1.jpg
3 - http://www.vision.caltech.edu/marco/archive/ICCV07.pdf
4 - http://www.vision.caltech.edu/marco/arc ... gPage.html
Re: Wk7 - Computational Camera
Shree Nayar served as the chair of the Computer Science Department at Columbia University. He now heads the computer vision laboratory known as CAVE. One of the projects in his lab is active refocusing of both images and video. The system projects a grid of dots onto the scene and uses these as points of reference to aid in edge detection. This information is then used to create a depth map with sharp edges that, when put together with the original image, results in a more focused image.
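As a rough illustration of what the final step might look like (this is just a toy, not Nayar's actual system): once a depth map exists, refocusing can be simulated by splitting the image into depth layers and blurring each layer according to how far it is from the chosen focal depth.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def refocus(image, depth, focus_depth, max_sigma=6.0, n_layers=8):
    """Toy synthetic refocus.

    image: float (H, W, 3) all-in-focus image in [0, 1]
    depth: float (H, W) depth map; focus_depth is in the same units as depth.
    """
    depth_range = depth.max() - depth.min() + 1e-9
    edges = np.linspace(depth.min(), depth.max(), n_layers + 1)
    layer = np.clip(np.digitize(depth, edges) - 1, 0, n_layers - 1)
    out = np.zeros_like(image, dtype=float)
    for k in range(n_layers):
        mask = (layer == k)[..., None].astype(float)
        mid = 0.5 * (edges[k] + edges[k + 1])
        sigma = max_sigma * abs(mid - focus_depth) / depth_range
        # Pixels near the focal depth stay sharp; others get depth-dependent blur.
        blurred = image if sigma < 1e-3 else gaussian_filter(image, sigma=(sigma, sigma, 0))
        out += blurred * mask
    return out
```

The hard part of the CAVE project is estimating the depth map in the first place; this sketch only shows why having one lets you choose the focus after the fact.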
The CAVE lab also works on developing 360-degree cameras.
I have been putting together a series of images from a hiking trail in order to create a virtual map of the trail. Together, these technologies could help by giving my map 360-degree views along the trail and by ensuring my photos come out focused. This would enhance both the experience and the capability of my map while potentially reducing the number of visits to the trail.
Re: Wk7 - Computational Camera
Rob Fergus is a computational graphics researcher who focuses on computer vision, machine learning, and computer graphics. He is interested in building statistical models of images, both at the high level of objects and scenes and at the low level of pixels and edges. These models can then be deployed on a variety of problems; those of particular interest include object recognition, image search, and computational photography.
In order to classify objects and interpret their relations, we must first segment the image into regions that correspond to individual object or surface instances. Starting from an oversegmentation, pairs of regions are iteratively merged based on learned similarities. The key element is a set of classifiers trained to predict whether two regions correspond to the same object instance based on cues from the RGB image, the depth image, and the estimated scene structure.
Source: http://cs.nyu.edu/~silberman/papers/ind ... upport.pdf
Our algorithm flows from left to right. Given an input image with raw and inpainted depth maps, we compute surface normals and align them to the room by finding three dominant orthogonal directions. We then fit planes to the points using RANSAC and segment them based on depth and color gradients. Given the 3D scene structure and initial estimates of physical support, we then create a hierarchical segmentation and infer the support structure. In the surface normal images, the absolute value of the three normal directions is stored in the R, G, and B channels. The 3D planes are indicated by separate colors. Segmentation is indicated by red boundaries. Arrows point from the supported object to the surface that supports it.
Source: http://cs.nyu.edu/~silberman/papers/ind ... upport.pdf
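As an illustration of the RANSAC plane-fitting step mentioned above (a generic sketch, not the authors' code), a plane can be fit to noisy 3D points by repeatedly sampling three points and keeping the plane with the most inliers:

```python
import numpy as np

def ransac_plane(points, n_iters=500, threshold=0.01, rng=None):
    """Fit a plane to 3D points with RANSAC.

    points: (N, 3) array of 3D points (e.g. back-projected from a depth map).
    Returns (normal, d, inlier_mask) for the plane normal . x + d = 0.
    """
    rng = np.random.default_rng(rng)
    best = (None, None, np.zeros(len(points), dtype=bool))
    for _ in range(n_iters):
        # Pick 3 random points and build the plane through them.
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-12:                      # degenerate (collinear) sample, skip it
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)    # point-to-plane distances
        inliers = dist < threshold
        if inliers.sum() > best[2].sum():     # keep the plane that explains the most points
            best = (normal, d, inliers)
    return best
```

Random sampling makes the fit robust to the clutter of an indoor scene: points that belong to other surfaces simply never become inliers of the winning plane.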
Re: Wk7 - Computational Camera
As written in previous posts, the camera obscura serves as the basic model for all later cameras. However, the camera obscura lacked functionality: it was limited in how much of the rays and light field of a particular scene it could sample. Today, scientists and artists are joining forces to create computational camera applications - sensing strategies and algorithmic techniques that enhance the capabilities of digital photography. This innovation in photography requires highly developed software-based methods for processing representations and reproducing them in unique and original ways. In each computational device, transformations apply to both the optics and the decoding software. With respect to computational image sensors, several research teams are also developing detectors that can perform image sensing as well as early visual processing.
Computational cameras serve to motivate new imaging functionalities. These functionalities can come from an enhanced field of view, spectral resolution, dynamic range, or temporal resolution. They may also manifest as flexibility: today, thanks to advancements in computational cameras, more and more people have access to optical settings (focus, depth of field, viewpoint, resolution, lighting, etc.) after they capture an image.
Pupil plane coding, for instance, occurs when optical elements are placed at, or close to, the pupil plane of a traditional camera lens. A great example of pupil plane coding is found in cell phones. More than ever, drastic improvements have been made to the resolution, optical quality, and photographic functionality of the camera phone. Ubiquitous in today's life, photo applications use coded apertures to enhance signal-to-noise ratio, resolution, aperture, and focus. Camera phones utilize programmable apertures for viewpoint control and light-field capture. By implementing computational imaging, these small devices have achieved a higher performance-to-complexity ratio and high image resolution through post-processing. However, not all computational technology can be implemented on portable devices: sometimes the phone camera's sensors and optics can't be adjusted accordingly, the computing resources aren't powerful enough, or the APIs connecting the camera to the computing software are too restrictive. Although interest in computational photography has steadily increased, progress has been hampered by the lack of portable, programmable camera platforms.
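To show why coded apertures help in post-processing, here is a generic sketch (not any phone vendor's actual pipeline) of the deblurring step they are designed to make well-posed: a Wiener filter that divides out the aperture's frequency response.

```python
import numpy as np

def wiener_deblur(blurred, psf, snr=100.0):
    """Recover a sharper image when the blur kernel (here, the defocused image
    of the coded aperture) is known, using a Wiener filter in the Fourier domain.

    blurred: (H, W) grayscale image; psf: small 2D kernel that sums to 1.
    """
    h, w = blurred.shape
    psf_pad = np.zeros((h, w))
    psf_pad[:psf.shape[0], :psf.shape[1]] = psf
    # Center the kernel so the deblurred image is not spatially shifted.
    psf_pad = np.roll(psf_pad, (-(psf.shape[0] // 2), -(psf.shape[1] // 2)), axis=(0, 1))
    H = np.fft.fft2(psf_pad)
    B = np.fft.fft2(blurred)
    # Wiener filter: near-zeros in H (which a well-designed coded aperture avoids)
    # are regularized by the 1/snr term instead of amplifying noise.
    X = np.conj(H) * B / (np.abs(H) ** 2 + 1.0 / snr)
    return np.real(np.fft.ifft2(X))
```

A conventional round aperture has many near-zero frequencies in H, so this division destroys information; a coded aperture is chosen precisely so that H stays invertible.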
As a result, some researchers have looked to physics instead, implementing physics-based methods in computational photography to remove damaging artifacts from digital photographs and videos. In one such project, computational algorithms capture and remove dirty-lens and thin-occluder artifacts. Hypothetically, this is what I mean: imagine a lens on a security camera that accumulates various types of contaminants over time (e.g., fingerprints, dust, dirt). The images recorded by this camera are taken through these thin layers of particles, which obstruct the scene. Algorithms in new computational devices rely on an understanding of the physics of image formation to directly recover the information lost in these photographs. Because the camera defocuses the contaminants, the artifacts appear as low-frequency patterns that act on the image either additively or multiplicatively, so data about the original scene behind them can be recovered.
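A toy version of that additive/multiplicative idea (my own sketch, simplified far beyond the actual paper): if the dirt's attenuation and scattered-light contribution can be calibrated, for example by photographing a uniform bright scene and a uniform dark scene through the dirty lens, the clean image can be recovered by inverting the model observed = a * clean + b.

```python
import numpy as np

def calibrate_dirt(bright_capture, dark_capture, bright_value=1.0, dark_value=0.0):
    """Estimate the per-pixel attenuation a and additive term b of a dirty lens
    from captures of a uniform bright scene and a uniform dark scene.
    Toy model of the physics: observed = a * clean + b."""
    a = (bright_capture - dark_capture) / (bright_value - dark_value)
    b = dark_capture - a * dark_value
    return a, b

def remove_dirt(observed, a, b, eps=1e-6):
    """Invert the model to recover the clean image behind the dirt."""
    return (observed - b) / np.maximum(a, eps)
```

The multiplicative term a captures light blocked by the dirt, and the additive term b captures light scattered by it, which is why both have to be estimated before the inversion works.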
Focal plane coding is a different approach used in computational photography. It happens when an optical element is placed on, or close to, an image detector. This approach allows small physical motions of the sensor to capture and control pixels across multiple exposures. The Focal Sweep Camera is a prime example of how new computational software functions with focal plane coding. With this camera, users can capture (with one click) a stack of images of a scene that correspond to the camera's different focal settings. The Focal Sweep Camera uses a high-speed actuator to translate the image sensor while the lens records the scene. The captured stacks are images of possibly dynamic scenes that the plane of focus sweeps through. The imaging system determines the sensor speed and the shortest exposure duration needed to sweep the desired depth range. Because the focal stack is captured over a finite duration of time, it also includes scene motion.
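One common way to use such a focal stack (my own sketch, not the CAVE implementation) is to pick, at every pixel, the slice where the image is locally sharpest, which gives an all-in-focus composite:

```python
import numpy as np
from scipy.ndimage import laplace, gaussian_filter

def all_in_focus(stack):
    """Merge a focal stack into one image by choosing, per pixel, the slice
    with the strongest local contrast (absolute Laplacian response).

    stack: (N, H, W) array of grayscale slices at different focal settings.
    """
    # Sharpness measure per slice, smoothed so the per-pixel choice is stable.
    sharpness = np.stack([gaussian_filter(np.abs(laplace(s)), 2.0) for s in stack])
    best = np.argmax(sharpness, axis=0)          # (H, W) index of the sharpest slice
    rows, cols = np.indices(best.shape)
    return stack[best, rows, cols]               # gather the chosen pixel from each slice
```

The same per-pixel index map can also be kept around as a rough depth map, since which slice is sharpest tells you roughly how far away that pixel is.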
Now, artists achieve illumination coding by projecting complex light patterns onto a scene. With digital projectors and the controllable flash, technological advances have made image capture far more sophisticated. In this last project, a team of designers uses active illumination to create optical ("virtual") tags in a scene. A key feature of their experiment is that it does not require making physical contact with particular objects in the scene. They use infrared (IR) projectors to cast temporally coded (blinking) dots onto the scene, using the projector-like sources in a powerful way: like camera flashes, they provide full brightness and color control over when the 2D sets of rays are emitted. The camera can then project arbitrarily complex illumination patterns onto the scene, capture images of those same patterns, and compute information about the scene. Although these tags are invisible to the human eye, their time-varying codes are later detected by an IR-sensitive photodetector.
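Here is a hypothetical sketch of the decoding side (the tag names and bit patterns below are made up for illustration; the real system is more involved): each dot's IR intensity over a few frames is thresholded into bits and matched against a table of known blink codes.

```python
import numpy as np

# Hypothetical tag dictionary: each tag blinks a fixed binary pattern over 8 frames.
TAG_CODES = {
    "table": (1, 0, 1, 1, 0, 0, 1, 0),
    "chair": (0, 1, 1, 0, 1, 0, 0, 1),
}

def decode_tag(intensity_over_time):
    """Turn one dot's IR intensity time series into a bit pattern and look it
    up in the (made-up) tag dictionary. Returns the tag name or None."""
    series = np.asarray(intensity_over_time, dtype=float)
    bits = tuple(int(v > series.mean()) for v in series)   # simple per-dot thresholding
    for name, code in TAG_CODES.items():
        if bits == code:
            return name
    return None

# e.g. decode_tag([0.9, 0.1, 0.8, 0.85, 0.2, 0.15, 0.9, 0.1]) -> "table"
```

Because the code lives in time rather than in visible space, the tag stays invisible to a person standing in the scene but is trivial for a synchronized camera to read back.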
http://www1.cs.columbia.edu/CAVE/projects/what_is/
http://www.cs.columbia.edu/CAVE/project ... ep_camera/
http://graphics.stanford.edu/projects/lightfield/
http://www1.cs.columbia.edu/CAVE/projects/photo_tags/
http://www1.cs.columbia.edu/CAVE/projects/cc.php
Re: Wk7 - Computational Camera
Personalization of computational photography - humanization of digital media and imaging
I was very interested in Aaron Koblin's idea of humanizing data. In his work he gives people an image to recreate, such as a hundred-dollar bill, lets them recreate their piece of the data however they please, and then assembles everything people have sent in to form the image of the bill. While the result does not look exactly like the original item, the output is always different depending on how much and what kind of effort the participants put into creating the data for the image.
Digital image created by users for the “Ten Thousand Cents” project.
In a normal digital photo, each dot is just a set of numbers: an xy coordinate and the values that determine its color, with more dots per inch in a higher-quality photograph. With Koblin's method, each dot instead contains whatever information the participant chooses to provide.
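To put that difference in data-structure terms (a hypothetical sketch, not Koblin's actual format): a normal pixel is just a position plus color numbers, while a crowdsourced "dot" can carry an arbitrary payload from its contributor.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

# A plain digital-photo pixel: just a position and the numbers for its color.
Pixel = Tuple[int, int, Tuple[int, int, int]]        # (x, y, (r, g, b))

@dataclass
class CrowdsourcedDot:
    """A hypothetical 'dot' in a Koblin-style crowdsourced image: it still has a
    position and a color, but also whatever the contributor chose to attach."""
    x: int
    y: int
    color: Tuple[int, int, int]
    contributor: str = "anonymous"
    payload: Dict[str, Any] = field(default_factory=dict)   # brushstrokes, notes, time spent...

dot = CrowdsourcedDot(12, 7, (34, 120, 60),
                      contributor="worker_421",
                      payload={"seconds_spent": 48, "comment": "looks like grass"})
```

The image is still assembled dot by dot, but every dot now carries a trace of the person who made it, which is exactly the "humanized data" Koblin is after.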
In most of my work I take sound or video clips from my life and compile them to recreate an experience, such as watching TV while trying to have a conversation. If I could crowdsource the images I show to viewers, along with links to what the viewers looked like and what they were talking about, I could create a unique and somewhat intimate experience for each viewer. Internet participants could add their own images of things they remembered seeing on TV, influenced by the conversations the viewers were having, what the viewer looked like, or the kind of relationship the viewer had with whoever was watching with them. The piece would then move past the individual artist and toward what the project is really about: the interactions we have around TV and how it both hinders and helps conversation.
http://en.wikipedia.org/wiki/Computational_photography
http://en.wikipedia.org/wiki/Aaron_Koblin
http://www.youtube.com/watch?v=4v4XxlfVk3o