Proj 3 - Student Defined Visualization Project

Post by glegrady » Mon Dec 30, 2019 5:54 pm

Final Project Schedule
Feb 18 - Schedule review, examples of previous final projects and introduction to JSON by Weidi
Feb 20 - Presentation of final project ideas by each student
Feb 25 - Lab and individual meetings
Feb 27 - Lab and individual meetings
Mar 03 - Work in Progress presentation
Mar 05 - Explanation of project documentation template for website
Mar 10 - Final Project Class Presentation
Mar 12 - Final Project Class Presentation & completion of documentation to be posted at

Project Criteria & Definitions

For the final project, each student is to create a project of their choice while continuing with the approach of presenting data in an interactive 3D space.

Your first task is to identify and select your data. This can be a continuation of the Seattle library data or data acquired from other sources. Data can also be correlated between multiple sources. The visualization software to be used is Processing, so that we can compare and learn from each other's projects.

The data should be granular, meaning that there should be a significant density of data to be visualized in 3D space. Each data point's x, y, z position should be directly defined by the data's values.

The project should reveal an understanding of how to use spatial relationships, color coding, interaction methods, and all the features of visual language basics covered in the previous demos and projects.

Some previous Final Projects to review
Some references for you to review in case it is of interest to your project

Frequency Pattern Mining Paper ... Mining.pdf
George Legrady

Re: Proj 3 - Student Defined Visualization Project

Post by erinpwoo » Thu Feb 20, 2020 10:51 pm

Final Project Idea

My final project will examine the effects of the current housing crisis in San Francisco. It will consist of a 3D map of San Francisco with lines extending in the z-axis that represent multivariate data on median rent, evictions, and rent control at specific coordinates. Being a Bay Area native, I have seen the landscape change dramatically during my lifetime. I am passionate about the communities and cultures I’ve grown up around and I believe that the accessible, open, and inclusive nature of San Francisco is what makes it such an incredible place. I hope this project brings insight to the areas most affected by the housing crisis and can show the relationship between rising property values and rapid socioeconomic change.

The great thing about this project is that there is a large amount of data related to the San Francisco housing crisis, although it will take some time to figure out which variables and trends I am most interested in following. There are a variety of datasets available to use, many of which are available through DataSF, a government-supported organization that has over 400 public datasets relating to the development and demographics of the city. One dataset I’m particularly interested in has eviction data that can be mapped by coordinates. Additionally, Zillow has a large amount of data on property values and real estate trends.

I would like to spend a lot of time learning about different methods I can use to group and analyze the different columns of data together. My previous project focused mainly on designs and aesthetics, but I would like to spend a good amount of time learning about statistical methods that can help me choose and group the most relevant datasets together.

Since this project requires coordinate mapping on a layout of the city, it will be a bit difficult to find datasets that contain this spatial information. Additionally, I want the user to be able to easily navigate through fine points within the map, which may be difficult to accommodate with PeasyCam. I am also still fleshing out the details of the design and am trying to find the best way to display and group multivariate data. I was thinking about taking these multiple columns of data and using a dimension-reduction method to group them together, although I also want the user to be able to look at trends in these variables individually. Here is what my current idea sketch looks like, with a slider at the bottom that allows the user to see how these trends have evolved over time.

Screen Shot 2020-02-20 at 10.28.48 PM.png

Re: Proj 3 - Student Defined Visualization Project

Post by jingxuan » Sun Feb 23, 2020 8:34 pm

I will use data visualization to show video game sales in the NA, EU, JP, and global markets from 1980 to 2020.

First of all, I am a gamer and have played many different games over the past decade. With a dataset of game sales, I can show the trends of games in different regions from 1980 to 2020. The columns are rank, game name, platform, year, genre, publisher, NA_sales, EU_sales, JP_sales, other_sales, and global_sales. Once the visualization is finished, we can find the trends by genre, company, and game name in each region.

The dataset is from kaggle video game sales.
Screen Shot 2020-03-02 at 22.29.09.png
I used a cube to visualize the data. Four sides of the cube indicate the four markets: the left side is the Japanese market, the top side is the European market, the right side is the North American market, and the bottom side is the global market. I designed a gaming-style button to control whether each side's visualization is shown. The button is located at the bottom right and is marked with the abbreviations of the four markets.

For the data itself, I designed another button to set the year and genre of the game sales so that the user can find trends by year and genre. It is also located at the bottom right. Clicking the button moves the two display bars accordingly. One bar represents the year and is labeled with numbers. The other bar represents genre by color, with each color corresponding to a genre.

Here are some images:
Screen Shot 2020-03-13 at 21.06.16.png
Screen Shot 2020-03-13 at 21.06.38.png
Screen Shot 2020-03-13 at 21.06.55.png
Screen Shot 2020-03-13 at 21.07.04.png
Screen Shot 2020-03-13 at 21.07.45.png
For now, I can see that the sales data is not balanced. Next to a few huge sales numbers, the small numbers are barely visible in the graph. So I applied a logarithm to make the data more balanced.
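A minimal sketch of that kind of log scaling, written in Python rather than Processing. The function name `log_scale`, the `max_height` parameter, and the use of `log1p` (so that zero sales stay at zero) are assumptions for illustration, not taken from the attached code.

```python
import math


def log_scale(sales, max_sales, max_height=100.0):
    """Map a sales figure to a bar height on a logarithmic scale.

    log1p(x) = log(1 + x) is used so that sales of 0 map to height 0
    instead of minus infinity. All names here are hypothetical.
    """
    if max_sales <= 0:
        return 0.0
    return max_height * math.log1p(sales) / math.log1p(max_sales)


# A small title next to a blockbuster: on a linear scale the small bar
# would be under 1% of the tall one; on the log scale it stays visible.
small = log_scale(0.5, 80.0)
large = log_scale(80.0, 80.0)
```

The trade-off is that bar heights no longer compare linearly, so a legend or axis labels should make the log scale explicit.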

Here is the code (attachment, 1.13 MiB).
Last edited by jingxuan on Wed Mar 25, 2020 6:41 pm, edited 3 times in total.

Re: Proj 3 - Student Defined Visualization Project

Post by chuanxiuyue » Mon Feb 24, 2020 12:28 am

I am going to show people's sense of direction in a virtual maze after they learn it by following a fixed route around the maze.

According to the literature on human spatial cognition, there is a spectrum of individual differences in environmental learning ability. That is, after the same exposure to a new environment, some people establish a very good sense of direction in the environment, while others may be totally lost when doing a navigation task, such as pointing to unseen objects or finding a shortcut to a certain object. Visualizing people's pointing and shortcutting performance in the virtual environment can reveal individual differences in learning the environment. Also, using behavioral data, architects can get a good sense of how different maze structures influence people's mental images of the built environment.

55 participants (25 male and 30 female) learned a virtual environment by following a fixed route for 5 laps, as shown in the image below. The yellow line indicates the learned route. Stars indicate the objects in the maze (12 in total). Image source: Marchette et al. (2011).
After learning the virtual maze, they were asked to complete 3 tasks, with offsite pointing done twice:
1. Offsite pointing pre-task: see an image of one object, then imagine standing in front of that object and point to another object in the environment using a pointer (27 trials in total).
2. Shortcutting task: from one object in the environment, take a route to find another object in the environment (20 trials in total).
3. Offsite pointing post-task: do the offsite pointing again after gaining more exposure during the shortcutting task (27 trials).
4. Onsite pointing task: participants were transported to different locations in the maze and pointed to unseen targets by facing in the direction of the target objects (20 trials).

Landscape: Create a 3D heatmap based on people's coordinates during the shortcutting phase. Use buttons to switch between the female landscape and the male landscape (there are gender differences in maze shortcutting behaviors). Use color on the landscape to highlight the learned route.
Put 12 3D objects in the visualization to indicate the 12 objects in the maze (see the stars in the first image).

Use different curves to indicate each individual's pointing performance on each trial of the 3 tasks. In the following image, the red line shows the correct direction for each trial (27 in total for the offsite pointing pre-task). The histogram shows the number of people pointing in each direction.
Pointing Performance.png
I may use different colors and levels of transparency to show people's performance in different tasks or performance in pointing to different targets. I will also try to highlight sex differences or difficulty levels of different trials.
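As a rough illustration of the quantities behind such a plot, the Python sketch below computes a signed pointing error and bins pointing directions into a histogram. The function names and the 30-degree bin width are hypothetical and are not taken from the attached project.

```python
def angular_error(pointed_deg, correct_deg):
    """Signed difference between pointed and correct direction.

    The result is wrapped into [-180, 180), so pointing at 350 degrees
    when the target is at 10 degrees counts as -20, not +340.
    """
    return (pointed_deg - correct_deg + 180.0) % 360.0 - 180.0


def direction_histogram(angles_deg, bin_width=30):
    """Count how many pointing responses fall into each direction bin.

    bin_width is assumed to divide 360 evenly.
    """
    n_bins = 360 // bin_width
    counts = [0] * n_bins
    for angle in angles_deg:
        counts[int((angle % 360.0) // bin_width) % n_bins] += 1
    return counts


# Four hypothetical responses for one trial.
responses = [0, 29, 30, 359]
histogram = direction_histogram(responses)
errors = [angular_error(a, 10) for a in responses]
```

Wrapping the error matters here because directions are circular: without it, responses just to either side of north would look maximally far apart.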
Note Feb 27, 2020 (2).png
Rendering 3D objects may be technically difficult and visually complex.
Choosing color solutions to show landscapes.
Last edited by chuanxiuyue on Mon Mar 09, 2020 8:23 pm, edited 6 times in total.

Re: Proj 3 - Student Defined Visualization Project

Post by boningdong » Mon Feb 24, 2020 7:02 pm

Coronavirus Spreading Analysis
For this project, I am trying to use the novel coronavirus data to visualize its outbreak in China.

I want to work on this project for several reasons. First, I hope this project can help people remember the coronavirus outbreak and serve as a warning for the future. Second, I want people to realize how severe the disease is and how fast it can spread. The virus has reached the pandemic stage in some countries, and I hope people can be alerted and unite to keep it from spreading.

Detailed Description
Before describing my ideas, I attach a sketch of my design here for reference:
Screen Shot 2020-02-24 at 4.46.16 PM.png
Based on my current thinking, I first want to represent the number of cases using animated points with lighting effects, with the locations of these points determined by where the cases are reported. Second, I am thinking of visualizing the increase in cases between days using flow patterns. Specifically, these flow patterns consist of a bundle of lines with a flow animation that indicates the flow direction. The color, thickness, or brightness of the lines will be determined by how much influence one area has on another. This influence will be based on mathematical analysis. For example, assume Beijing and Guangdong both have a similar number of cases. If we consider their influence on Hebei province, the cases in Beijing should have more influence than those in Guangdong, because Beijing is closer. This is just one example; I will combine the number of cases, the distance between locations, and the population density when designing the mathematical model.
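One possible shape for such a model is a gravity-style score, sketched here in Python. The formula, the function name `influence`, and every number in the example are assumptions for illustration only; the post does not specify the actual model, and the distances below are placeholders, not measured values.

```python
def influence(source_cases, distance_km, target_density, eps=1.0):
    """Hypothetical gravity-style influence of one area on another.

    Grows with the number of cases at the source and the population
    density at the target, and falls off with the square of distance.
    eps avoids division by zero when the two locations coincide.
    """
    return source_cases * target_density / (distance_km + eps) ** 2


# Two sources with the same case count; only the distance differs.
# 300 and 1800 are illustrative placeholders, not real distances.
near = influence(source_cases=400, distance_km=300, target_density=500)
far = influence(source_cases=400, distance_km=1800, target_density=500)
# The nearer source gets the larger score, matching the Beijing vs
# Guangdong reasoning in the post.
```

A score like this could then drive line thickness or brightness after being rescaled to the drawing range.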

For the tools, in addition to p5.js or Processing, I am thinking about Unity or other visualization tools that support user-defined shaders easily and can fully use the GPU instead of rendering the objects on the CPU. These are just my initial thoughts, and I am still exploring the possibilities of this project.

After exploring and comparing p5.js and Unity, I eventually built my visualization with p5.js because overall it is easier to use and provides good visual effects for the dataset I found. In the visualization, I show the total cases, reported female cases, and reported male cases separately across the world based on their longitude and latitude. In addition to the total accumulated cases for all genders, I also visualize the daily increments using vertical bars.
There is a slider that allows users to select the date they want to inspect. The dates run from the start of the outbreak until the date of the last reported case. It is worth mentioning that even though each case is associated with a location, mapping the cases directly onto the map still shows discrete peaks. So I wrote a ConvSum function that applies a Gaussian filter to the data mesh so that the result shows an overall trend.
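The smoothing idea can be sketched as follows in Python. This is not the ConvSum function from the attached p5.js code; `gaussian_kernel`, `smooth_grid`, the radius, and sigma are hypothetical names and values chosen for illustration.

```python
import math


def gaussian_kernel(radius, sigma):
    """Build a (2*radius+1) x (2*radius+1) Gaussian kernel that sums to 1."""
    offsets = range(-radius, radius + 1)
    kernel = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
               for dx in offsets] for dy in offsets]
    total = sum(sum(row) for row in kernel)
    return [[value / total for value in row] for row in kernel]


def smooth_grid(grid, radius=2, sigma=1.0):
    """Convolve a 2D grid of case counts with a Gaussian kernel.

    Cells outside the grid are treated as zero, so mass near the
    border is partly lost.
    """
    kernel = gaussian_kernel(radius, sigma)
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    rr, cc = r + dy, c + dx
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += grid[rr][cc] * kernel[dy + radius][dx + radius]
            out[r][c] = acc
    return out


# One isolated peak of 100 cases in the middle of a 7x7 mesh.
mesh = [[0.0] * 7 for _ in range(7)]
mesh[3][3] = 100.0
smoothed = smooth_grid(mesh)
```

After smoothing, the single spike becomes a low hill spread over its neighbors, which is what lets nearby reports merge into a regional trend.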
Here are the screenshots of the visualization:
Screen Shot 2020-03-14 at 2.32.55 PM.png
total cases (include different gender) & daily increment
Screen Shot 2020-03-14 at 2.32.31 PM.png
total cases & total female cases & total male cases
Screen Shot 2020-03-14 at 2.33.18 PM.png
total female cases
Screen Shot 2020-03-14 at 2.33.32 PM.png
total male cases
Updates(Mar 22nd)
To help users understand the meaning of the colors, I added labels to clarify them. Also, to improve the user experience, I added colors, hover effects, and on/off styles to the buttons. A button becomes darker if the corresponding group is not activated.
Here are the results:
corona visual1.png
corona visual2.png
corona visual3.png

Technical Difficulties
Originally I was thinking of visualizing the movement of the virus as it develops, and also the spread of the virus using flow lines to indicate where the emerging cases were infected. However, because the dataset I found doesn't have enough information, such as travel history, I could not get a good result that way. So eventually I decided it was a better idea to show the movement or trend on the map using a slider. By sliding the time slider, the user can see how the virus changes over time.
Last edited by boningdong on Sun Mar 22, 2020 9:50 pm, edited 6 times in total.

Re: Proj 3 - Student Defined Visualization Project

Post by ziyanlin » Thu Feb 27, 2020 12:19 am

Airbnb data visualization for some cities in the US

I was planning a visualization project with climate information for travelers. But I found that there were already tons of examples and ideas for data visualization of climate and weather. So I wonder whether I can use the open-source API provided by Airbnb to visualize its data together with local weather.

These Airbnb data include the name, the name of the host, location information, type of room, price, reviews, and days available. But I do not need all of these data; I only want to use specific fields to express the elements that matter when travelling and living there for a few days. I will link these important data, including the photos of the Airbnbs, to the elements in the visualization project.

I will use the latitudes and longitudes of the Airbnb locations to set their X and Y coordinates. I will try using a real map of the US to place this information, but I do not think it will be intuitive and detailed enough for this project, so I am looking for a new method to arrange the locations. I will draw lines from top to bottom to represent the days that each Airbnb is available, and the weights of these lines will represent their ratings or reviews. I will use the price of each house as the starting point of its line, because price is always one of the most important things people want to know, and it is the most distinct way to show the differences between listings.

Re: Proj 3 - Student Defined Visualization Project

Post by yuleiyuan » Thu Feb 27, 2020 12:12 pm

American Population Status Analysis

Dataset and Motivation
For my project, I want to make a 3D visualization of the status of people who voted in the 2016 presidential election. I got the data from a previous statistics class. In that class, we did intensive research on why Hillary Clinton lost the election despite winning the popular vote, while in this class I can focus on the life status of the voters.

Here is what the dataset looks like:
The data has been modified for the visualization. Below is the R code that was used for the data manipulation. I gathered information at the state level and combined certain columns into new variables that I created. The state names have been recoded into numerical values for mapping purposes.

Code: Select all

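library(dplyr)  # provides %>%, mutate(), group_by(), add_tally(), rename()

# Convert counts to percentages and combine minority groups into one variable.
census.del <- census %>% na.omit() %>%
               mutate(Men = Men/TotalPop*100) %>%
               mutate(Employed = Employed/TotalPop*100) %>%
               mutate(Citizen = Citizen/TotalPop*100) %>%
               mutate(Minority = Hispanic+Black+Native+Asian+Pacific)
# (the step that followed this pipe is missing from the original post)

# Total population per county, attached to every row of that county.
census.subct <- census.del %>% group_by(State,County) %>%
               add_tally(TotalPop) %>%
               rename(CountyTotal=n)
# (the step that followed is missing; the later code uses a column named
#  Weight, which was presumably defined here)

# State-level aggregation. Only the fragment `funs(.*(Weight)))` of this step
# survives in the original post, so the call around it is left as a gap.
census <- census.subct %>% group_by(State)
#                          ... funs(.*(Weight))) %>% ...
write.csv(census, "~/MAT 259/census.csv")

# Recode state names as numbers for mapping.
# census.state is not defined in the surviving code.
census_state_number <- census.state[,"State"] %>% mutate_if(is.factor,as.numeric)
write.csv(census_state_number, "~/MAT 259/census_state_number.csv")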
For a dataset that has multiple pieces of information per observation (here one observation is a state), a tree map with multiple layers is a good way to display the data, thanks to the tree map library by Ben Fry for Processing. The idea behind the concept is like this:
In my data, each tree stands for one status (population count, salary, etc.) across the states. The size of each sub-rectangle is decided by the count of people for that status in a state: a larger count gets a larger area. The order of the sub-rectangles is decided by the algorithm.
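To make the area rule concrete, here is a deliberately simplified Python sketch that splits one layer into side-by-side rectangles whose areas are proportional to the counts. It is a plain one-directional "slice" layout, not the algorithm used by Ben Fry's treemap library, and the function name and the state counts are made up.

```python
def slice_layout(counts, x, y, width, height):
    """Split a rectangle into vertical slices with area proportional to count.

    counts maps a label (for example a state) to a non-negative number.
    Returns label -> (x, y, w, h), ordered from largest to smallest.
    """
    total = sum(counts.values())
    if total <= 0:
        return {}
    rects = {}
    offset = x
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for label, count in ordered:
        slice_width = width * count / total
        rects[label] = (offset, y, slice_width, height)
        offset += slice_width
    return rects


# Hypothetical counts for two states on one layer.
layer = slice_layout({"A": 30, "B": 10}, x=0, y=0, width=400, height=100)
# "A" has three times the count of "B", so it gets three times the area.
```

Real treemap algorithms also alternate split directions or optimize aspect ratios, which is why the ordering in the actual visualization is "decided by the algorithm".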

Here is a sketch of my initial idea:
For the x and y axes, I planned to put population counts on the x axis and divide the y axis by county. Each county would have a surface to display the information within the county. I discarded this initial design because different states contain different counties, and I didn't know how to map counties onto each layer for the corresponding states. I then changed the initial data set and arrived at the following visualization.

Here is an overview of the visualization:
The transparent layers stand for population statuses, and there are 21 layers in total. Each sub-rectangle represents a state. The colorful lines link the sub-rectangles of the same state on different layers. The graph is displayed in presentation mode by default. Clicking the interaction button stops the spinning.
Press the space bar to display the status of each individual state, linked by lines. Each state has a different color. The plot automatically iterates through the states when in presentation mode.
I spent a lot of time organizing the original data set and trying to find the best things to map onto the graph while understanding the algorithm. Originally I wanted to add a bar that displays the description of each layer when you click on it, but I don't think it's a good idea because it would block the graph when open. Adding the status description next to each individual layer is a better design. I will change my design and update later. Thank you all for your feedback.
Last edited by yuleiyuan on Sat Mar 14, 2020 7:46 pm, edited 19 times in total.

Re: Proj 3 - Student Defined Visualization Project

Post by evgenynoi » Tue Mar 03, 2020 3:10 pm

I am using a data set provided by the Human Rights Society «Memorial» (a non-commercial organization studying political repressions in the USSR and in present-day Russia). The data contains more than 3 million records on victims of political terror in the Soviet Union.

I am particularly interested in a specific ethnic group among the victims: Wolgadeutsche (Volga Germans), a group of ethnic Germans who colonized southern Russia and settled along the Volga river under Catherine the Great in the 18th century. In 1942 around 438,000 Germans were arrested, tried, and deported to work in labor camps in Kazakhstan, Siberia, and the Urals. Tens of thousands of them died during the transportation, which could last several months with limited food and water available. After arrival the death toll rose to 30-40% due to labor conditions, severe weather, and inadequate provision of food, clothing, and shelter in the labor camps.

Data import and handling
The data set contains 3M records with varying degrees of detail. First, a dump of the database was downloaded from GitHub and imported onto a MySQL server. Second, an index was generated on one of the tables. Then the following query was run on the database to generate the data for further analysis. The resulting data set consists of 80,000 rows with more than 10 columns.

Code: Select all

-- Index the column used to join crcase below.
CREATE INDEX idx_personid ON memo.crcase (PersonID);

-- The SELECT keywords are missing from the original post and are restored here;
-- the outer SELECT * is an assumption.
SELECT *
INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/wolga66.csv'
FROM (
    SELECT
        t1.PersonID, Surname, FirstName, Patronymic, Gender, BirthYearMin AS birth, t4.Description AS birthplace,
        FirstRepressionYearMin AS repr, t3.Description AS nation, ArrestDay, ArrestMonth, ArrestYear,
        TrialDay, TrialMonth, TrialYear, RepressionYear, Accusation, Sentence, Death, Execution, RehabDay,
        RehabMonth, RehabYear, t22.Description AS live
    FROM memo.person t1
    INNER JOIN memo.persondata t2
    ON t1.PersonID = t2.PersonID
    INNER JOIN memo.nationality t3
    ON t2.NationalityID = t3.NationalityID
    INNER JOIN memo.birthplace t4
    ON t2.BirthPlaceID = t4.BirthPlaceID
    INNER JOIN (
        SELECT
            PersonID, ArrestDay, ArrestMonth, ArrestYear, TrialDay, TrialMonth, TrialYear, RepressionYear, Accusation, Sentence, Death, Execution, RehabDay, RehabMonth, RehabYear, LivingPlaceID
        FROM memo.crcase) c
    ON t1.PersonID = c.PersonID
    INNER JOIN memo.livingplace t22
    ON c.LivingPlaceID = t22.LivingPlaceID
    -- keep only records whose nationality description contains 'нем' (German)
    WHERE lower(t3.Description) LIKE '%нем%') a
LIMIT 100000;

Visualization concept
From the very start I wanted this project to incorporate a sound component. In the end, the design of early desktop mp3 players was an inspiration for the visualization that I created. The origin of the data in the top middle is a metaphor for the Sun. The rays emanating from the Sun represent the length of life before trial, while their endpoints symbolize the beginning of the deportation.

Two modes of visualization exist: POINT and LINE, which can be combined or looked at separately. The color of lines/points signifies four common sentences (capital punishment, labor army, labor camp, and special settlement), while distance from the center represents the age of a person at trial.
Due to the volume of data, it was impossible to visualize all entities within one iteration. Therefore, I randomly pick a starting point and sample from the dataset at a specific interval (n=750) to enable animation. The "location" of a person on the axes remains fixed, so that when a record is resampled, the point/line is added at exactly the same location where it first appeared.
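One reading of this sampling scheme is sketched below in Python: pick a random offset, then take every 750th record, and derive each record's position only from its index and age so that it never moves between passes. The function names, the angle rule, and the pixel scale are assumptions for illustration and may differ from the attached sketch.

```python
import math
import random


def sample_batch(n_records, step=750, seed=None):
    """Return record indices: a random start, then every step-th record."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random offset within the first interval
    return list(range(start, n_records, step))


def record_position(index, n_records, age_at_trial, px_per_year=4.0):
    """Deterministic polar placement for one record.

    The angle depends only on the record index and the distance from the
    center only on age at trial, so resampling the same record always
    lands on the same spot.
    """
    angle = 2.0 * math.pi * index / n_records
    radius = age_at_trial * px_per_year
    return (radius * math.cos(angle), radius * math.sin(angle))


# Each animation pass draws a different thin slice of the 80,000 rows.
batch = sample_batch(80000, step=750, seed=1)
points = [record_position(i, 80000, age_at_trial=30) for i in batch]
```

Because the position is a pure function of the record, repeated passes gradually fill in the picture instead of scattering the same person to new places.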

After the animation has run for at least 10 seconds, one can visualize those who died or were shot during the deportation by blacking out the drawn points on the canvas.

The animation enables sequential visualization by four groups and in two different modes.
POINT mode illustrations are found below. Notice the differences in age distribution for the four groups. For instance, many children and teenagers were sent to special settlements or orphanages.
LINE mode fills up the screen relatively fast compared to POINT.
Finally, combinations of the two modes are possible.
Notice the visualization of the high death rate between the two images.
Technical Issues
The main issue for me was gathering and pre-processing data. Some basic text mining algorithms were used to identify four major categories visualized during the project. I found it hard to incorporate clickable elements of interface due to my limited knowledge on the matter. So this would seem like the natural extension of the project given enough time and opportunity.
Last edited by evgenynoi on Fri Mar 13, 2020 10:46 pm, edited 6 times in total.

Re: Proj 3 - Student Defined Visualization Project

Post by guanyuchen » Thu Mar 05, 2020 8:40 am

1. What is the topic
Investigate the popularity (number of followers) of uploaders on Bilibili (a Chinese counterpart of YouTube) in different areas (technology, life, etc.).

2. What is the motivation for the project
Some uploaders on Bilibili gain a large number of followers along with huge numbers of video views, but others do not. The goal of this project is to visualize how this relationship differs across areas.

For example, some videos uploaded by music uploaders receive large numbers of views, yet these uploaders still do not gain more followers.

3. What is the data and the metadata
The main data used in this project include the number of followers of each uploader and, for each video, the number of views, bullet comments, and shares.

Metadata: All data are provided by LePtC.

4. Please add sketches, or links for data and examples of what the visualization could look like:

The initial idea was inspired by the picture below:
Some final graphs from the project:
Screen Shot 2020-03-12 at 22.09.43.png
Screen Shot 2020-03-12 at 22.11.19.png
Screen Shot 2020-03-12 at 22.10.12.png

5. What can we learn from the project

Explanations of some important variables and features:

I picked the top 10 uploaders from each class on Bilibili. There are 9 classes: Game, Music, Animation, Dance, Technology, Digital, Life, Fashion, and Autotune_Remix.

From graphs in section 4:
- Each point represents a video posted by the uploader.
- Each line represents the videos posted by one uploader from 2018 to 2020.
- Each direction represents the day of the week (Monday to Sunday) on which the video was posted.

Here are some features of my plot:
The height of a plot is the uploader's number of followers.
The radius of each point represents the number of views of that video.
Wider lines mean the videos had more shares.
Lighter colors mean the videos had more bullet comments.

Screen Shot 2020-03-12 at 22.12.03.png
In the graph above, a huge curve is clearly visible in the Digital area. It indicates that one influential video helped its uploader become popular and gain many followers in a short time. Over time, this uploader became the most popular figure in the Digital area.

Also, from my perspective, some uploaders cannot become the most popular “youtubers” because the fields they focus on are too limited. That is, their audience on the website is too narrow.

6. What are challenges
For data, it is hard to collect useful and proper data. Many thanks to LePtC.
For graphs, it is hard to generate random shapes similar to the picture that inspired me at the very beginning. But I am glad that I got a lot of help from Weidi with creative ideas.

Re: Proj 3 - Student Defined Visualization Project

Post by ziyanlin » Tue Mar 24, 2020 6:13 pm

Visualization for Airbnb data

Project description
This project collects Airbnb data for the city of Asheville and turns the detailed price and availability information into a dynamic visual design. It shows the available dates, prices, and customer reviews, which are among the elements customers value most when choosing an Airbnb for a trip or short stay. My project gives customers a convenient way to view the list of Airbnb choices.

These Airbnb data include the name, the name of the host, location information, type of room, price, reviews, and days available. All these data form a sheet with hundreds of columns and more than 10,000 rows. But I do not need all of these data; I only want to use specific fields to express the elements that matter when travelling and living there for a few days. I link these important data, including the price and available days of each Airbnb, to the elements in the visualization project.

There are two control systems in this visualization project. The first is the control panel, which adjusts the color and movement of the visual elements. Users can change the color of the elements and the background. They can also set the movement and speed of the elements.
2.png
The second controls the movement of the camera. Users can use w, a, s, d, z, x to move the camera along the 3 axes of the 3D space, and the up and down arrow keys to control the angle of the camera.

Visualization Design
I use a strip to represent each Airbnb in the 3D space. The color represents the review rating of the Airbnb, the length of the strip (Z-axis value) represents its available days, the X-axis represents its price, and the Y-axis represents the number of reviews (which closely reflects the total number of customers who have stayed there).
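The mapping described above can be sketched with a linear rescaling helper similar in spirit to Processing's map(). The Python below is only an illustration: `remap`, `listing_to_strip`, the value ranges, and the 400-unit box size are all hypothetical and not taken from the project code.

```python
def remap(value, in_min, in_max, out_min, out_max):
    """Linearly rescale value from [in_min, in_max] to [out_min, out_max]."""
    if in_max == in_min:
        return out_min
    t = (value - in_min) / (in_max - in_min)
    return out_min + t * (out_max - out_min)


def listing_to_strip(price, n_reviews, days_available,
                     max_price=500.0, max_reviews=200.0, box=400.0):
    """Place one listing as a strip inside a box-sized 3D space.

    X comes from price, Y from the number of reviews, and the strip
    extends along Z in proportion to the days available per year.
    """
    x = remap(price, 0.0, max_price, 0.0, box)
    y = remap(n_reviews, 0.0, max_reviews, 0.0, box)
    length = remap(days_available, 0.0, 365.0, 0.0, box)
    return {"x": x, "y": y, "z_start": 0.0, "z_end": length}


# A made-up listing: mid-priced, moderately reviewed, available all year.
strip = listing_to_strip(price=250, n_reviews=50, days_available=365)
```

Keeping every axis on the same output range makes strips comparable at a glance, at the cost of hiding the raw units unless axis labels are drawn.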
From the visualization created by this project, we can find several facts that may reveal regular patterns among the Airbnb houses. More expensive Airbnbs tend to receive better reviews. Most of the negative reviews go to Airbnbs with few available days, which suggests that these houses are not dedicated to Airbnb service and so are not professional enough to satisfy customers.

Here is the link to my project
