Proj 3 - Student Defined Visualization Project

Professor George Legrady
glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

Proj 3 - Student Defined Visualization Project

Post by glegrady » Fri Dec 23, 2022 8:05 am

--------------
DETAILS
The final project integrates everything learned in the course, such as asking an interesting MySQL question, working with large data, and visualizing the data through Processing. The big difference is that each student selects their own data and visualization design. Data must be multi-dimensional and the visualization has to be in 3D.

Your first task is to identify and select your data. Some of you may already be working with datasets in your studies. This is the opportunity to try an alternative visualization of that data. Key current research topics of interest include environmental data, political data, news language analysis, biodiversity, etc. One of the challenges for datavis is that it takes time to learn what the data can do, so continuing to work with the Seattle library data is also an option.

Project evaluation criteria: Significant effort in innovative approaches to data content, data sampling, and analysis is key to a successful project. Data can also be correlated between multiple sources. The visualization software environment to be used is Processing, but data pre-processing can be done in other software such as Python or R. In fact, Processing does have Python and R modes, so these are also possibilities.

The data should be relevant and granular, meaning that there should be a significant density of data to be visualized in 3D space. Each data point’s x, y, z position should be directly defined by that point’s values. Preference is for the data to determine the visual form rather than matching data to an existing form; for instance, a geographic map has a pre-determined visual/spatial organization.
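One way to read "position directly defined by the data's values" is a linear rescale of three numeric columns into the drawing volume, similar in spirit to Processing's map() function. Below is a minimal Python sketch for the pre-processing stage. The column names "a", "b", "c" and the 500-unit cube are assumptions for illustration only, not part of the assignment.

```python
def rescale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:  # constant column: park it in the middle of the output range
        return (out_lo + out_hi) / 2.0
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def to_xyz(rows, keys=("a", "b", "c"), size=500.0):
    """Place each row inside a cube of side `size` centred on the origin."""
    half = size / 2.0
    # Per-column (min, max), so every axis uses its own data range.
    ranges = {k: (min(r[k] for r in rows), max(r[k] for r in rows)) for k in keys}
    return [
        tuple(rescale(r[k], *ranges[k], -half, half) for k in keys)
        for r in rows
    ]


# Hypothetical rows; "a", "b", "c" stand in for any three numeric columns.
rows = [
    {"a": 0, "b": 10, "c": 5},
    {"a": 5, "b": 20, "c": 5},
    {"a": 10, "b": 30, "c": 5},
]
points = to_xyz(rows)
```

The resulting tuples can be written to a CSV and read in Processing, so that no coordinate is placed by hand.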

The project should reveal an understanding of how to use spatial relationships, color coding, interaction methods, and all the features of visual language basics covered in the previous demos and projects.

--------------
George Legrady
legrady@mat.ucsb.edu

briannagriffin
Posts: 11
Joined: Fri Sep 23, 2022 10:04 am

Re: Proj 3 - Student Defined Visualization Project

Post by briannagriffin » Thu Mar 02, 2023 2:56 pm

For the final project, I am using a data set on MLB starting pitchers to analyze relationships between several different variables. Specifically, I am interested in the interaction between these variables and ‘winning percentage’. I recently completed a Machine Learning final project in which I built 5 different regression models that predicted ‘winning percentage’ as the outcome variable. I found that 2 of the most prominent variables in predicting ‘winning percentage’ were ‘best’ and ‘team_win_loss_percentage’. I am curious to see if the same results will be displayed in my 3D visualization. Going into the project, I also wanted to use curves in my visualization, with cubes connecting their endpoints. My final idea includes 5 cubes with different x and y axes that correspond to different metrics of the pitchers in the data set. The z-axis is always the winning percentage for each observation. A curve is created for each observation, and it passes through each cube. Hence, the curves are influenced by the 5 differing x, y metrics of each cube. Also, the colors of the curves are based on the age of the pitcher.

Here is a zip file that contains the Processing Code, data, my final presentation, and a code book explaining the variables in my data set:
(On Thu 3/16 I updated the GUI button colors and uploaded a new zip file.)
Brianna_proj3.zip
(1.31 MiB) Downloaded 124 times
Last edited by briannagriffin on Thu Mar 16, 2023 3:57 pm, edited 5 times in total.

yanchenlu
Posts: 3
Joined: Thu Jan 12, 2023 4:20 pm

Re: Proj 3 - Student Defined Visualization Project

Post by yanchenlu » Mon Mar 06, 2023 11:49 pm

# ----------------------------------------------------------------------------------------------------------------
Update - 3/24:
Updated code again:
- fixed Contributor toggle (now the dropdown menu top bar also hides when the toggle is turned off)
- changed fonts to Futura
# ----------------------------------------------------------------------------------------------------------------
Update - 3/21:
I've finished moving all the text display to the GUI using ScrollableList. However, I was not able to show the pattern preview images in a ScrollableList. It doesn't look like it's supported in ControlP5. Therefore, I decided to move pattern display to the left side of the y-axis, and show one preview image per frame, randomly selecting from the pattern collection associated with the point the user's mouse is pointing at in the 3D space.
Here are some screenshots:
menu0.png
menu1.png
contributor_pattern_preview0.png
contributor_pattern_preview1.png
all_pattern_preview0.png
all_pattern_preview1.png
all_pattern_preview2.png
Here is a video (screen recording) of how the program is looking now:
Screen Recording.mov.zip
screen recording of program
(36.71 MiB) Downloaded 137 times
Here is the updated/final version processing code:
processing_code.zip
final version
(8.76 KiB) Downloaded 124 times
Here is the data set (unchanged from 3/7):
patterns.csv
(24.58 MiB) Downloaded 132 times
Github repo for this project: https://github.com/ynchn/data_viz/tree/main/proj3

# ----------------------------------------------------------------------------------------------------------------
Update - 3/7:
I made the size of each point correspond to the # of patterns in that coordinate. Larger point == more patterns @ (cols, num_colors, rows).
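For anyone who wants to do this kind of aggregation in pre-processing, here is a rough sketch of counting patterns per (cols, num_colors, rows) coordinate and turning the count into a point size. This is not the code in the zip. The square-root scaling and the min/max sizes are assumptions picked for illustration.

```python
import math
from collections import Counter


def point_sizes(patterns, min_size=2.0, max_size=20.0):
    """Count patterns per coordinate and derive a display size for each."""
    counts = Counter((p["cols"], p["num_colors"], p["rows"]) for p in patterns)
    biggest = max(counts.values())
    sizes = {}
    for coord, n in counts.items():
        # Square root so that a point's area, not its diameter, tracks the count.
        t = math.sqrt(n / biggest)
        sizes[coord] = min_size + t * (max_size - min_size)
    return counts, sizes


# Tiny made-up sample using the field names from the dataset description.
patterns = [
    {"cols": 8, "num_colors": 3, "rows": 12},
    {"cols": 8, "num_colors": 3, "rows": 12},
    {"cols": 8, "num_colors": 3, "rows": 12},
    {"cols": 8, "num_colors": 3, "rows": 12},
    {"cols": 10, "num_colors": 4, "rows": 16},
]
counts, sizes = point_sizes(patterns)
```

Aggregating first also means the sketch only draws one point per distinct coordinate instead of one per pattern.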
Screen Shot 2023-03-07 at 2.53.42 PM.png
Current code is a bit of a mess, but it's functional to view the data points in 3D. You can navigate with keys: left/right (move along x-axis), up/down (move along y-axis) and w/s (move along z-axis).

# ----------------------------------------------------------------------------------------------------------------
Initial post - 3/6:
I've finished scraping the pattern website for making macrame/friendship bracelets using BeautifulSoup. There are more than 63K instances of data. I've saved them in a JSON file (forum doesn't allow JSON uploads so it's linked here ->https://github.com/ynchn/data_viz/blob/ ... terns.json (~35MB)) as well as in a CSV file. The dataset is quite large, so an efficient way to handle it would be nice. If there isn't one, I can try sampling. Suggestions would be very much appreciated.
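If sampling turns out to be the way to go, one option is reservoir sampling, which picks k rows uniformly at random in a single pass without loading the whole file into memory. A minimal sketch follows. The in-memory fake CSV stands in for the real file, and the column values in it are made up.

```python
import csv
import io
import random


def reservoir_sample(rows, k, seed=0):
    """Return k rows chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row  # replace with probability k / (i + 1)
    return sample


# Stand-in for a real file handle such as open("patterns.csv", newline="").
fake_csv = io.StringIO(
    "id,cols,rows\n"
    + "\n".join(f"{i},{i % 7 + 4},{i % 11 + 6}" for i in range(1000))
)
sample = reservoir_sample(csv.DictReader(fake_csv), k=50)
```

Since csv.DictReader yields rows lazily, memory use stays proportional to k rather than to the file size.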

The dataset has the following information for now:
- id (int)
- pattern_url (string)
- preview_url_list (list of string), [preview_url, preview_urlx2, preview_urlx3]
- dimensions (list of int), [cols, rows]
- num_strings (int)
- num_colors (int)
- contributor_name (string)
- contributor_url (string)
I think I can also look into and collect the user-defined tags on each pattern's page, e.g. "arrow", "zigzag", "sunset". However, since these tags are entered by users, there is a lot of variability. I'm not yet sure how convoluted they will be to deal with or how to simplify them.
Attachments
Yanchen_proj3.zip
final-final 3/24 version
The data folder is included in this zip.
(1.65 MiB) Downloaded 113 times
Last edited by yanchenlu on Fri Mar 24, 2023 11:26 am, edited 10 times in total.

qinghuang
Posts: 5
Joined: Thu Jan 12, 2023 1:09 pm

Re: Proj 3 - Student Defined Visualization Project

Post by qinghuang » Tue Mar 07, 2023 10:02 am

In September 2020, The Social Dilemma caught the public’s attention, marking “the first time a documentary [had] ever been the most popular movie on Netflix” (Forbes, 2020, para. 4). This 90-minute documentary highlights the exploitation exerted by the major technology platforms and interviews with Silicon Valley whistleblowers. It vividly describes how technology companies influence elections, track users’ behavioral data to deliver targeted advertisements, and promote addictive features to keep users online.

This study collected 202,832 tweets using the Twitter application programming interface (API) for academic research (Twitter, 2022), retrieving tweets containing the keywords “the social dilemma” and “@SocialDilemma” posted within one month after the movie’s release on Netflix. After cleaning the data (removing retweets, foreign languages, URLs, and bots), the final dataset contained 64,221 tweets. From these data, this study randomly sampled 3,000 tweets and their authors’ tweet counts to track their behavioral changes over time.

Variables: Sentiment Analysis & Topic Modeling
Using Multilayer Perceptron (Artificial Neural Network Classification; Pedregosa et al., 2011) and different sets of hyperparameters to improve the algorithms, our sentiment analysis model achieved a 90% validation accuracy rate. After supervised machine learning on some pre-labeled Brandwatch (a commercial social media monitoring platform) data, our model detected whether a tweet about The Social Dilemma was positive, negative, or neutral.

Considering the volume of the data, this study used topic modeling to reveal latent thematic structure in online discussions on Twitter and Reddit. This study applied standard Natural Language Processing (NLP) preprocessing steps (Honnibal & Montani, 2017) to a training dataset, such as tokenization (splitting each text into individual word-level tokens). Using a Latent Dirichlet Allocation (LDA) model (Russell, 2013) to identify recurring clusters of co-occurring words (Törnberg & Törnberg, 2016), this study created topic groupings for keywords in tweets.
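To illustrate the preprocessing step: in the NLP sense, tokenization splits text into word-level tokens that a topic model can count. The sketch below uses only the standard library and is not the pipeline used in the study. The regular expressions and the example tweet are assumptions.

```python
import re

# Optional @ or # prefix keeps mentions and hashtags as single tokens.
TOKEN_RE = re.compile(r"[@#]?\w+")
URL_RE = re.compile(r"https?://\S+")


def tokenize(tweet):
    """Lowercase a tweet, drop URLs, and split it into word-level tokens."""
    text = URL_RE.sub(" ", tweet.lower())
    return TOKEN_RE.findall(text)


# Made-up example tweet.
tokens = tokenize(
    "Just watched The Social Dilemma on @netflix... wow! https://t.co/xyz #mindblown"
)
```

Token lists like this are what a bag-of-words model such as LDA takes as input, usually after stop-word removal.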

This is the first model I created for the Social Dilemma Twitter dataset. Three colors represent the three sentiments (neutral, positive, negative). As seen in the screenshots, most lines collapsed at the bottom of the model, which made it difficult to discover patterns. I am planning to use a new 3D model: create a curve for each user instead of a straight line, with the height of the curve depending on the difference between the user's post frequencies in the 30 days before and after. If the frequency increases, the curve will be above a surface, and if it decreases, the curve will be under the surface. More control elements will be added to show the different topic groups.
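A rough sketch of how the planned curve height could be computed from the before/after post counts. The linear normalization and the 200-unit maximum height are assumptions for illustration, and the planned model may well scale things differently.

```python
def signed_heights(users, max_height=200.0):
    """Map (posts_before, posts_after) pairs to signed curve heights.

    Positive heights sit above the surface (frequency increased),
    negative heights sit below it (frequency decreased).
    """
    diffs = [after - before for before, after in users]
    biggest = max((abs(d) for d in diffs), default=0)
    if biggest == 0:  # nobody changed, or no users: everything lies on the surface
        return [0.0 for _ in diffs]
    return [max_height * d / biggest for d in diffs]


# Made-up counts: (posts in the 30 days before, posts in the 30 days after).
users = [(10, 30), (25, 5), (8, 8)]
heights = signed_heights(users)
```

If a few very active users dominate, a linear scale could bunch most curves near the surface again, so a logarithmic scaling of the difference might be worth trying.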
Attachments
Screen Shot 2023-03-07 at 10.01.08 AM.png
Screen Shot 2023-03-07 at 10.01.16 AM.png
Screen Shot 2023-03-07 at 10.01.48 AM.png
first model.zip
(586.32 KiB) Downloaded 128 times
data-SD.csv
(128.47 KiB) Downloaded 121 times

zeyuwang
Posts: 3
Joined: Thu Jan 12, 2023 4:26 pm

Re: Proj 3 - Student Defined Visualization Project

Post by zeyuwang » Tue Mar 07, 2023 11:25 am

My idea is to create a data visualization of COVID-19 cases and deaths over the past three years for the five permanent members of the United Nations Security Council: UK, Russia, France, USA, and China. The visualization aims to compare and contrast the different government strategies implemented in each country and how they have affected the development of the pandemic. I have already completed the data preprocessing and have a clear sketch of how the data will be presented. Initially, my focus will be on creating the visualization, but I may explore the correlation between vaccination rates and COVID-19 deaths if time allows.
new version.png
newest version (03/22/2023)
sketch:
Note Mar 2, 2023.pdf
(373.85 KiB) Downloaded 146 times
example data:
2020China.csv
(14.73 KiB) Downloaded 140 times
source code:
Attachments
project3_zeyu 2.zip
(94.14 KiB) Downloaded 99 times
picture3.png
picture2.png
picture1.png
Last edited by zeyuwang on Wed Mar 22, 2023 1:48 pm, edited 4 times in total.

jhutson
Posts: 3
Joined: Thu Jan 12, 2023 1:08 pm

Re: Proj 3 - Student Defined Visualization Project

Post by jhutson » Tue Mar 07, 2023 12:10 pm

For my third project, I will be working with a fertility treatment dataset (attached) that documents 160,000 individual, anonymized patients who have received fertility treatment. The visualization will aim to give an idea of general trends for individuals in certain groups, such as single parents and queer couples, as well as allowing a user to follow a single patient's journey as they underwent several stages of fertility treatment. Please see presentation of sketches for some details: https://docs.google.com/presentation/d/ ... sp=sharing.

Update 3/16:
I have taken ~3000 samples from this dataset--1000 patients with male partners, 1000 with female partners, and 1000 without a partner-- and visualized some of the steps they underwent while having IVF treatment. The visualization allows the user to select a patient manually or randomly, and will display a text description of their IVF treatment.
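For reference, drawing an equal number of patients per partner group is a stratified sample. A minimal sketch of that idea follows. The field names "patient" and "partner" are hypothetical and not the actual column names in the dataset.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_group, seed=0):
    """Draw up to `per_group` records at random from each value of `key`."""
    rng = random.Random(seed)  # fixed seed so the same patients come back each run
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)
    sample = []
    for name in sorted(groups):  # sorted for a stable group order
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample


# Made-up records; the real data would use per_group=1000.
records = [
    {"patient": i, "partner": ("male", "female", "none")[i % 3]}
    for i in range(30)
]
sample = stratified_sample(records, key="partner", per_group=5)
```

Equal group sizes make the three groups visually comparable, at the cost of no longer reflecting their true proportions in the full dataset.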

Update 3/20:
I have added options to select which patients are shown by partner type: the user can turn on/off patients with male partners, female partners, or no partners.
I also changed the scale for the last box so live birth is at the top of the cube rather than the middle. I made the size of the canvas static. I've uploaded a new zip file with the code.
Attachments
Hutson_Project3.zip
(290.31 KiB) Downloaded 126 times
Screen Shot 2023-03-16 at 10.28.10 AM.png
Screen Shot 2023-03-16 at 10.26.47 AM.png
Screen Shot 2023-03-16 at 10.25.39 AM.png
ar-2017-2018.xlsx
(25.49 MiB) Downloaded 111 times
Last edited by jhutson on Mon Mar 20, 2023 9:05 am, edited 2 times in total.

lu_yang
Posts: 9
Joined: Mon Sep 26, 2022 10:23 am

Re: Proj 3 - Student Defined Visualization Project

Post by lu_yang » Wed Mar 15, 2023 9:39 pm

1.jpg
2.jpg
3.jpg
Lu Yang MAT259 Project 3.pdf
(992.73 KiB) Downloaded 147 times
Flocking.zip
(2.57 MiB) Downloaded 147 times

qinghuang
Posts: 5
Joined: Thu Jan 12, 2023 1:09 pm

Re: Proj 3 - Student Defined Visualization Project

Post by qinghuang » Thu Mar 16, 2023 1:48 pm

Attachments
new_panel.zip
(265.17 KiB) Downloaded 99 times
Screen Shot 2023-03-16 at 2.36.24 PM.png
Screen Shot 2023-03-16 at 2.36.48 PM.png
Screen Shot 2023-03-16 at 2.37.08 PM.png
Screen Shot 2023-03-16 at 2.37.35 PM.png

arnavkumar
Posts: 3
Joined: Fri Jan 07, 2022 12:14 pm

Re: Proj 3 - Student Defined Visualization Project

Post by arnavkumar » Sat Mar 18, 2023 7:10 pm

Initial Concept:

The focus of this analysis is on exploring possible future pathways of data analysis for researching CBL-VR’s historical video archive, as well as incoming videos from current research. CBL-VR is a VR education project taking place under UCSB’s Gevirtz Graduate School of Education, and the videos themselves consist of research participants, who are mainly elementary school children, as well as CBL-VR researchers. The videos and their structure are extremely ad hoc, with the purpose being to study participants and how they interact within a 3D, virtual reality space. The analysis would provide a small part of the research foundation supporting the ongoing development of a VR game under the same project, which is being designed to help native and non-native English speakers learn English.
Screenshot 2023-03-18 at 7.32.30 PM.png
Screenshot 2023-03-18 at 7.34.35 PM.png
Attachments
AzureVideoIndexerInsights3DGraph.zip
(127.62 KiB) Downloaded 108 times
Arnav Kumar MAT259 Project 3 Report.pdf
(8.4 MiB) Downloaded 123 times
