Proj 3 - Student Defined Visualization Project
This is the 3rd of 3 assignments to be realized in the MAT 259 Data Visualization course:
https://www.mat.ucsb.edu/~g.legrady/aca ... 4w259.html
DETAILS
The final project integrates everything learned in the course, such as asking an interesting MySQL question, working with large data, and visualizing the data in Processing. The big difference is that each student selects their own data and visualization design. Data must be multi-dimensional and the visualization has to be in 3D.
Your first task is to identify and select your data. Some of you may be working with datasets in your studies. This is the opportunity to try an alternative visualization of the data. Key current research topics of interest include environmental data, political data, news language analysis, biodiversity, etc. One of the challenges for data visualization is that it takes time to learn what the data can do, so continuing to work with the Seattle library data is also an option.
Project evaluation criteria: Significant effort in innovative approaches to data content, data sampling, and analysis is key to a successful project. Data can also be correlated between multiple sources. The visualization software environment to be used is Processing, but data pre-processing can be done in other software such as Python or R. In fact, Processing does have Python and R modes, so these are also possibilities.
The data should be relevant and granular, meaning that there should be a significant density of data to be visualized in 3D space. Each data point's x, y, z position should be directly defined by the data's values. The preference is for the data to determine the visual form rather than for data to be matched to an existing form; a geographic map, for instance, has a pre-determined visual/spatial organization.
The project should reveal an understanding of how to use spatial relationships, color coding, interaction methods, and all the features of visual language basics covered in the previous demos and projects.
Re: Proj 3 - Student Defined Visualization Project
The dataset I am working with covers the season-by-season statistics of all National Basketball Association (NBA) players since 1948. This dataset is made available through the NBA API, and I am accessing it through a Python package called nba_api. There are a total of 4,831 players in the history of the NBA. There is an API request limit, but I have now compiled the full dataset, which is attached.
The idea for the visualization so far is a 3D plot of basic stats (points, rebounds, assists, steals, and blocks) per season for every player. My current idea is that you can click through each year, viewing historical traces of players who remained in the league the next season. I would also like other buttons that let you see all players and all years, the players with the max value for each stat, and maybe even a view of all players on a team in any given timeframe. Attached are sketches that show my idea for this visualization. What is yet to be figured out is how I will display my information. I could use Bezier curves, but I want the data to be readable, so I might end up combining rebounds+assists and steals+blocks. Or I might just drop steals and blocks altogether, as they are the less important statistics.
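As a rough sketch of the combining option mentioned above, the five stats could be collapsed onto three axes by summing the paired columns. The keys PTS, REB, AST, STL, and BLK are assumed column names and may not match the headers in full_nba_data.csv, and the sample values are made up.

```python
def to_three_axes(season):
    """Collapse five box-score stats into three plotting values.

    `season` is a dict for one player-season; the keys used here
    (PTS, REB, AST, STL, BLK) are assumed names, not confirmed headers.
    """
    x = season["PTS"]                    # points stay on their own axis
    y = season["REB"] + season["AST"]    # rebounds + assists
    z = season["STL"] + season["BLK"]    # steals + blocks
    return (x, y, z)


# Made-up season totals, for illustration only.
example = {"PTS": 1500, "REB": 400, "AST": 300, "STL": 80, "BLK": 40}
print(to_three_axes(example))
```

Summing raw totals lets the larger stat dominate each pair, so rescaling each column before adding may be worth trying.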
- Attachments
- full_nba_data.csv (2.98 MiB)
Last edited by paulkim on Thu Feb 29, 2024 8:29 pm, edited 1 time in total.
Re: Proj 3 - Student Defined Visualization Project
My first concept aims to visually explore and interpret how deep learning models (such as AlexNet) understand and process image data. Using activation layer outputs extracted from the model for various images, I plan to create a 3D visualization that shows the journey of image features through the model's layers.
The foundation of my visualization is the multidimensional activation data obtained from each layer of the AlexNet model, representing the model's response to different images.
To extract layer outputs from the model, I first used PyTorch to read an image from the ImageNet dataset and classify it with the AlexNet model, capturing the intermediate results for each of the model's layers. I then dumped the output values of all the activation layers to CSV. Because the activation layer outputs can contain thousands of values, which can produce very large CSV files, I created a separate CSV file for each activation layer and named each file after the corresponding layer: download/file.php?mode=view&id=7235
Also, since CSV is a two-dimensional data format, I wanted the CSV files to mimic the structure of higher-dimensional data, especially a 4D tensor, so I processed and saved the data in the form [batch_size, channels, height, width]. The final result looks like this:
download/file.php?mode=view&id=7236
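One possible way to fit a [batch_size, channels, height, width] tensor into a two-dimensional CSV is a long format with one row per value and its four indices. This is only an illustrative sketch using the standard library, not necessarily the layout used in the attached files. The function names and sample values are made up, and the tensor is assumed to be a nested list (with PyTorch, `tensor.tolist()` would produce one).

```python
import csv
import io


def tensor_to_rows(tensor):
    """Yield (batch, channel, row, col, value) for a nested 4D list."""
    for b, channels in enumerate(tensor):
        for c, grid in enumerate(channels):
            for h, line in enumerate(grid):
                for w, value in enumerate(line):
                    yield (b, c, h, w, value)


def write_layer_csv(tensor, handle):
    """Write one activation layer to an open text handle in long format."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["batch", "channel", "row", "col", "value"])
    writer.writerows(tensor_to_rows(tensor))


# Tiny stand-in for a [1, 2, 2, 2] activation tensor (made-up values).
demo = [[[[0.0, 0.5], [1.0, 1.5]], [[2.0, 2.5], [3.0, 3.5]]]]
buffer = io.StringIO()
write_layer_csv(demo, buffer)
print(buffer.getvalue())
```

Keeping the indices in explicit columns means the channel, row, and column of each value can later be read back directly as plotting coordinates.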
These values do not have a specific meaning. They are simply data generated by the deep learning model as it processes images, but I think the data pattern may change when the model processes different kinds of images. I'm very curious about how the patterns differ across kinds of images, so I'd like to present them as a data visualization.
Since ImageNet and AlexNet are a little outdated (though this does not affect observing the data patterns of image deep learning), I also have an alternative concept:
My second concept is based on this dataset: Stress Analysis in Social Media: https://www.kaggle.com/datasets/ruchi79 ... media/data
This concept aims to explore and represent the complex, interconnected discussions of stress within social media communities (Reddit). I draw inspiration from the natural world: the intricate and supportive networks formed by mycelium in fungal colonies. This project seeks to map out the spread and dynamics of stress-related discussions across various Reddit communities. By rendering these discussions as a mycelial network, the project aims to show how stress is shared, discussed, and mitigated through communal support in the digital realm.
- Attachments
- image0.zip (2.92 MiB)
- Stress Analysis in Social Media.zip (1.3 MiB)
Re: Proj 3 - Student Defined Visualization Project
For the final project, I want to realize a data visualization about trending movies in recent years.
For the dataset, I found this one on Kaggle, which contains information about the top 200 movies released in 2023 (https://www.kaggle.com/datasets/mohamma ... -2023/data).
I also found the TMDB API (https://developer.themoviedb.org/refere ... ng-started). It has information about movies and TV shows, as well as actor images.
Update: I got a license for the IMDb Non-Commercial Datasets (https://developer.imdb.com/non-commercial-datasets/). I plan to merge the title.basics.tsv.gz and title.ratings.tsv.gz datasets to extract up-to-date movie ratings and information. This combined data will be used for the visualization. (I cannot upload them because of the size limit. I'll upload the combined and filtered file once it is finished.)
For the visualization, I want to develop a model similar to what Weidi Zhang did in 2018 (https://vislab.mat.ucsb.edu/2018/p3/Wei ... index.html). With the movie's title as a structure, I could make several 3D polygon-shaped models that show movie trends, perhaps by distributor or by release month.
-------------------------------------------------------------------------------------------------------------
I used the IMDb Non-Commercial Datasets to select movies since 2020 and combined them with ratings using Dask (a Python library for parallel and distributed computing).
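The select-and-combine step can be illustrated with a small standard-library stand-in. This is not the Dask code used for the project, only a sketch of the same join on the shared tconst column. The sample rows are made up, the function names are hypothetical, and filtering on titleType == "movie" is an assumption about how "movies" were selected.

```python
import csv
import io

# Tiny made-up samples in the tab-separated layout of the IMDb files.
BASICS = (
    "tconst\ttitleType\tprimaryTitle\tstartYear\n"
    "tt0000001\tmovie\tOlder Film\t2015\n"
    "tt0000002\tmovie\tRecent Film\t2021\n"
    "tt0000003\ttvSeries\tRecent Show\t2022\n"
    "tt0000004\tmovie\tUndated Film\t\\N\n"
)
RATINGS = (
    "tconst\taverageRating\tnumVotes\n"
    "tt0000001\t7.1\t1200\n"
    "tt0000002\t6.4\t530\n"
)


def read_tsv(text):
    """Parse tab-separated text into dicts; quotes are treated literally."""
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def movies_since(basics, ratings, first_year=2020):
    """Inner-join movies on tconst, keeping startYear >= first_year."""
    by_id = {row["tconst"]: row for row in ratings}
    merged = []
    for row in basics:
        year = row["startYear"]
        if row["titleType"] != "movie" or not year.isdigit():
            continue  # skips non-movies and the \N missing-value marker
        rating = by_id.get(row["tconst"])
        if int(year) >= first_year and rating is not None:
            merged.append({
                "title": row["primaryTitle"],
                "year": int(year),
                "rating": float(rating["averageRating"]),
                "numVotes": int(rating["numVotes"]),
            })
    return merged


print(movies_since(read_tsv(BASICS), read_tsv(RATINGS)))
```

The full IMDb files are far too large for this in-memory approach to be comfortable, which is presumably why a library like Dask is a better fit for the real data.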
The sketch is attached below. For the visualization, the three axes would be year, numVotes, and rating. numVotes determines the point size and transparency, and the rating determines whether the title is drawn horizontally or vertically.
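A minimal sketch of how numVotes might drive point size and transparency is below. The logarithmic scale is an assumption (vote counts tend to span several orders of magnitude), and the size range, alpha floor of 40, and function names are made up for illustration.

```python
import math


def vote_scale(num_votes, max_votes):
    """Return a 0..1 weight that grows with log10 of the vote count."""
    if num_votes <= 0 or max_votes <= 1:
        return 0.0
    return min(1.0, math.log10(num_votes) / math.log10(max_votes))


def point_style(num_votes, max_votes, min_size=2.0, max_size=20.0):
    """Map a vote count to (size, alpha); alpha is on a 0..255 scale."""
    t = vote_scale(num_votes, max_votes)
    size = min_size + t * (max_size - min_size)
    alpha = round(40 + t * (255 - 40))  # floor of 40 keeps faint points visible
    return size, alpha


print(point_style(1000, 1_000_000))
```

The returned pair could then be passed to the sketch's drawing calls, with the alpha value used as the fourth argument of a fill color.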
I'm currently working on loading data and building the 3D-space model in Processing.
- Attachments
- Top_200_Movies_Dataset_2023(Cleaned).csv (15.44 KiB)
Last edited by jingpeng on Tue Mar 05, 2024 12:19 am, edited 3 times in total.
Re: Proj 3 - Student Defined Visualization Project
The dataset I wanted to work with is based on the pictures posted on the "Astronomy Picture of the Day" website by NASA.
This is the website: https://apod.nasa.gov/apod/archivepix.html and this is the GitHub page for the API connection: https://github.com/nasa/apod-api , https://www.kaggle.com/datasets/harshit ... collection
The dataset is fairly minimal in terms of data. The main source is the images of our cosmos that NASA posts daily, each with a title and a short description. There is one archive from 1995 to 2015 and then a separate source from 2015 until today.
The data that I can extract from the database is the picture for each day, the date of the post, and the concept tags for each picture.
**I'm still working on connecting the API correctly to receive the data**
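For reference, a request for a range of dates might be assembled as sketched below. The endpoint and parameter names are taken from the apod-api README as I understand it and should be checked against the current documentation. DEMO_KEY is a placeholder key, `apod_url` is a hypothetical helper, and whether concept tags are actually returned depends on the service.

```python
from urllib.parse import urlencode

# Hosted endpoint as described in the apod-api README (assumed current).
APOD_ENDPOINT = "https://api.nasa.gov/planetary/apod"


def apod_url(start_date, end_date, api_key="DEMO_KEY"):
    """Build a request URL for every APOD entry in a date range.

    Dates are YYYY-MM-DD strings. Parameter names follow the
    apod-api README and are assumptions to verify before use.
    """
    params = {
        "api_key": api_key,
        "start_date": start_date,
        "end_date": end_date,
        "concept_tags": "True",
    }
    return APOD_ENDPOINT + "?" + urlencode(params)


print(apod_url("2024-01-01", "2024-01-07"))
```

The sketch only builds the URL and makes no network call, so it can be checked before spending any of the API's request quota.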
I'm currently looking at visualizations related to large image datasets or, more generally, to the distribution of images in a 3D environment.
- Attachments
- archive.zip (9.52 MiB)
Re: Proj 3 - Student Defined Visualization Project
Here is an updated WIP:
The attached screenshot shows data for the 2023-2024 NBA season, with points on the X-axis, rebounds on the Y-axis, and assists on the Z-axis.
Left and right clicks take you through the seasons. Hovering over a sphere shows the player's name, points, rebounds, and assists for that season.
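The axis mapping described above can be sketched as a linear rescale of each stat into a cube centered on the origin. This is an illustration only, not the attached code. The cube half-size, the season totals, and the axis maxima are made-up values, and `remap` simply mirrors the arithmetic of Processing's map().

```python
def remap(value, in_min, in_max, out_min, out_max):
    """Linearly rescale value from one range to another.

    Same arithmetic as Processing's map(); no clamping is applied.
    """
    return out_min + (value - in_min) * (out_max - out_min) / (in_max - in_min)


def to_position(points, rebounds, assists, maxima, half_size=300.0):
    """Place one player-season inside a cube centered on the origin."""
    return tuple(
        remap(stat, 0, top, -half_size, half_size)
        for stat, top in zip((points, rebounds, assists), maxima)
    )


# Made-up season totals and made-up axis maxima, for illustration only.
print(to_position(1000, 250, 600, maxima=(2000, 1000, 800)))
```

Using the same maxima for every season would keep the axes comparable while clicking through the years; per-season maxima would instead fill the cube each time.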
To-do remaining:
1. Create trace of previous season
2. Create more interesting color gradient as values increase along axes
3. Create all-years toggle view
Code is also attached. I will schedule a meeting with Jenni for tomorrow!
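For to-do item 2, one simple option is to blend between two colors according to how far a value sits along its axis. The sketch below is an assumption-laden illustration, not part of the attached code. The two palette colors and the function names are made up, and the blend is a plain per-channel linear interpolation.

```python
def lerp_color(start, end, t):
    """Blend two (r, g, b) tuples; t=0 gives start, t=1 gives end."""
    t = max(0.0, min(1.0, t))  # clamp so out-of-range stats stay valid
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def stat_color(value, top, low=(40, 80, 200), high=(240, 60, 40)):
    """Color a stat by its share of the axis maximum (made-up palette)."""
    return lerp_color(low, high, value / top if top else 0.0)


print(stat_color(500, 2000))
```

A gradient through more than two colors could be built by chaining the same blend across several palette stops.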
- Attachments
- proj3.zip (21.29 MiB)