## Proj 3 - Student Defined Visualization Project

Posts: 163
Joined: Wed Sep 22, 2010 12:26 pm

### Proj 3 - Student Defined Visualization Project


Final Project Schedule
Feb 16 - Schedule review, examples of previous final projects and introduction to JSON by Weihao
Feb 18 - Discussion of final project ideas by each student
--
Feb 23 - Association Rule-Learning [FP Tree Algorithm]
Feb 25 - Lab and individual meetings
--
Mar 02 - Work-in-Progress group presentation
Mar 04 - Overview of online project documentation template for vislab.mat.ucsb.edu website
--
Mar 09 - Final Project Class Presentation
Mar 11 - Final Project Class Presentation & Online Documents
--

--------------
DETAILS

Students determine their own data and visualization design. The data must be multi-dimensional, and the visualization must be in 3D.

Your first task is to identify and select your data. This can be a continuation of the Seattle library data or data from other sources. Innovative approaches to data content, sampling, and analysis are key to a successful project. Data can also be correlated across multiple sources. The visualization software to be used is Processing, so that we can compare and learn from each other's projects.

The data should be granular, meaning there should be a significant density of data to be visualized in 3D space. Each data point's x, y, z position should be directly defined by the data's values. The preference is for the data to determine the visual form rather than matching the data to an existing form; a geographic map, for instance, has a pre-determined visual/spatial organization.

The project should reveal an understanding of how to use spatial relationships, color coding, interaction methods, and all the features of visual language basics covered in the previous demos and projects.

--------------

ashleybruce
Posts: 3
Joined: Thu Jan 07, 2021 2:59 pm

### Re: Proj 3 - Student Defined Visualization Project

Introduction and Data Collection: For my initial idea for this project, I wanted to dive a bit deeper into my background as a biologist and find some data in that field. After looking through countless databases, I stumbled upon an experiment that intrigued me: it looked at the sensitivity of E. coli to different antibiotics and treatment methods. A link to the data and experiment can be found here:
Dryad Data -- Associations between sensitivity to antibiotics, disinfectants, and heavy metals in natural, clinical and labor.url.zip
Website with link to experiment article and relevant files to the experiment
Upon digging deeper into the data provided, one file in particular seemed it would be the best candidate for data visualization -- the raw data for their experiment. This file can be found here:
datavisData.csv
Raw Data with some notes on the right hand side for myself
Compared to the original file downloaded from the website, I organized the data a bit more so it would be easier to read, and included some notes on the right-hand side to make visualizing it easier.

Because a lot of different terms are used throughout the experiment, I will give a brief summary of the data collection process to help understand the data a little better.

Experiment Background: This experiment looked at the sensitivity of 96 different strains of E. coli to a range of antibiotics and phages. To have two different strains of E. coli means that each strain differs genetically in some way from the others. A more common example is the flu: H1N1 is a different strain than H3N2 (both pandemic-causing flu strains) due to genetic differences, but the two are similar enough to remain the same species of virus. The same is true for E. coli, except that instead of 2 strains, 96 were being looked at.

Each strain of E. coli was subjected to different treatment methods, consisting of antibiotics and phages. Each treatment method was then tested against each strain at different concentrations, and the growth of each strain under each treatment was measured. Growth was measured with a spectrophotometer, which looks at how much light passes through a given sample. In this experiment, it was used to measure the optical density (OD) of each sample at 600 nm. OD600 is commonly used for measuring the concentration of bacteria or other cells in a liquid because the 600 nm wavelength does little to damage the cells or hinder their growth. The higher the concentration of bacteria in a liquid culture, the higher the measured optical density of that culture.
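The OD600 reading described above can be turned into a comparable "relative growth" value by normalizing each treated culture against an untreated control. A minimal sketch, assuming hypothetical readings (the actual column layout of datavisData.csv may differ):

```python
# Hypothetical sketch: normalizing OD600 readings against a no-treatment
# control. The example values are illustrative, not from the real dataset.
def relative_growth(od_treated, od_control):
    """Growth of a treated culture relative to its untreated control.

    OD600 rises with cell concentration, so the ratio gives a rough measure
    of how much a treatment inhibited growth (1.0 = no effect,
    0.0 = complete inhibition).
    """
    if od_control <= 0:
        raise ValueError("control OD must be positive")
    return od_treated / od_control

# e.g. a strain that grew to OD 0.25 under an antibiotic vs. 1.0 untreated
print(relative_growth(0.25, 1.0))  # 0.25
```

Normalizing this way would also make strains with different baseline growth rates directly comparable in the visualization.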

Initial Ideas: Understanding a bit more about the data, I began to sketch out a few ways to visualize the data.
Sketch of Initial Ideas
As can be seen from the sketches, the main thing I felt was important was that each strain be considered separately from the others. Perhaps one treatment is more effective against one strain than another, so lumping them all together seemed counter-intuitive to me. My initial sketch shows each strain and treatment as a point on an axis, but with 96 strains each subject to 24 treatment methods, a single axis would have held over 2,000 discrete values. I felt this would make it extremely hard to compare the effectiveness of treatments across strains.

Initial Visualizations: In order to better separate the strains from one another, I decided it would be best to implement a drop-down so that the user could choose which strain they wanted to look at, which can be seen below.
Next, I plan on adding lines to connect the points of each treatment method within each strain. This will help visualize the effectiveness of each treatment within each strain.
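The drop-down idea amounts to grouping the rows by strain so that selecting one strain yields just its treatment points, in a stable order for drawing connecting lines. A sketch in Python for brevity (the same grouping applies in Processing); the field names and sample rows are assumptions about the CSV layout:

```python
from collections import defaultdict

# Hypothetical sketch of the strain drop-down logic: group rows by strain,
# then sort each strain's points so lines can connect them consistently.
def rows_by_strain(rows):
    groups = defaultdict(list)
    for r in rows:
        groups[r["strain"]].append(r)
    for pts in groups.values():
        pts.sort(key=lambda r: r["treatment"])  # stable order for line drawing
    return groups

rows = [
    {"strain": "K-12", "treatment": "phage T4", "od600": 0.05},
    {"strain": "K-12", "treatment": "ampicillin", "od600": 0.31},
    {"strain": "O157", "treatment": "ampicillin", "od600": 0.78},
]
groups = rows_by_strain(rows)
print(sorted(groups))  # ['K-12', 'O157']
```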
Last edited by ashleybruce on Tue Mar 02, 2021 4:50 pm, edited 2 times in total.

wsheppard
Posts: 3
Joined: Thu Jan 07, 2021 3:09 pm

### Re: Proj 3 - Student Defined Visualization Project

My first idea for this project is to visualize the relative performance of tennis players over time by looking at head-to-head matchups. For now, the data I've collected describes men's professional tennis matches on the ATP tour during the "Open Era" from 1968 through 2016. The data lists the winners and losers of every match, the match score, the tournament at which the match was played, the tournament seed of the winning and losing players, and a few other statistics like the number of games won by each player.

The data I've compiled comes from https://datahub.io/sports-data/atp-worl ... ennis-data, which I believe ultimately comes from the ATP website.

EDIT: I've changed my project to visualizing the characteristics of microplastics in the San Francisco Bay. This data was collected by the California Natural Resources Agency and is available here: https://data.cnra.ca.gov/dataset/microplastic-sf-bay

This data represents characteristics of individual plastic particles retrieved from the San Francisco Bay from surface water, stormwater, sediment, effluent emissions, or prey fish tissue. The data includes the color of the particle, the type of plastic, the morphology of the particle, the date it was collected, and in some cases the length and width of the particle in mm. My first draft of this project attempts to map the particles onto a 3D grid whose axes represent the length, width, and morphology of the particles. This should make some of the coarse patterns in the data clear. Some properties are already apparent: for each morphology, the size distribution of particles takes a particular shape, either a long blob or a boomerang. The different morphologies also have different color distributions.
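Since morphology is categorical, placing it on an axis means assigning each category an integer position. A minimal sketch of that mapping; the category names here are illustrative, not necessarily the dataset's actual labels:

```python
# Hypothetical sketch: two continuous axes (length, width in mm) plus a
# categorical morphology axis mapped to integer positions.
MORPHOLOGY_AXIS = {"fiber": 0, "fragment": 1, "film": 2, "foam": 3, "sphere": 4}

def particle_xyz(length_mm, width_mm, morphology):
    """(x, y, z) for one particle; raises KeyError on an unknown morphology."""
    return (length_mm, width_mm, MORPHOLOGY_AXIS[morphology])

print(particle_xyz(1.2, 0.3, "fiber"))  # (1.2, 0.3, 0)
```

In practice a little jitter along the categorical axis keeps particles of the same morphology from overlapping into a flat plane.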
Attachments
sheppard plastic sketch.pdf
Last edited by wsheppard on Tue Mar 02, 2021 4:47 pm, edited 2 times in total.

zhuowei
Posts: 3
Joined: Thu Jan 07, 2021 3:00 pm

### Re: Proj 3 - Student Defined Visualization Project

Concept:

I’ve been really interested in the stock market recently. I’ve heard people say things like ‘Buy the rumor, sell the news’ or ‘Price drops on good earnings and rises on bad earnings’, which is kind of counterintuitive. For this project, I’m interested in looking at how news, earnings reports, and social media popularity actually affect stock prices.

Design:

Use the SPY return as the baseline and plot the relative return of all 500 stocks in the S&P 500 around SPY. When there is a news, earnings, or social media event, a colored point will be drawn for that stock on that date, with different colors representing different kinds and degrees of event. For example, green means good news and blue means bad news, with a color gradient representing the degree of good or bad news. I also want to show aggregated results across all stocks for the entire time period of the data. The idea is to show the return of the stocks 1-5 days after the events. We will categorize the events by their degree (good news → bad news; good earnings → bad earnings; popular → not so popular), and for each category we will show the distribution of returns with boxplots.
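The baseline idea can be sketched in a few lines: a stock's relative return is its daily return minus SPY's return over the same day, so SPY itself plots flat at zero. The prices below are made-up examples:

```python
# Sketch of the relative-return baseline (illustrative prices, not real data).
def daily_return(prev_close, close):
    return (close - prev_close) / prev_close

def relative_return(stock_prev, stock_close, spy_prev, spy_close):
    # Positive means the stock beat the index that day; SPY vs. itself is 0.
    return daily_return(stock_prev, stock_close) - daily_return(spy_prev, spy_close)

# a stock up 3% on a day SPY rose 1% -> +2% relative return
r = relative_return(100.0, 103.0, 400.0, 404.0)
print(round(r, 4))  # 0.02
```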

Sketch is attached.

Data Description:

We will use historical stock price data for all stocks in the S&P 500, plus news data, earnings data, and social media data. Stock data is usually used for stock price prediction; combined with news sentiment, it could potentially improve the prediction. Screenshots of some of the datasets are attached.

The earnings data contains:
symbol: ticker of the stock
date: earnings date
qtr: which quarter's earnings are reported
eps_est: estimated earnings per share
eps: actual earnings per share
release_time: when the report was released ("post" means after market close, "pre" means before market open)

For social media data, I'm thinking about getting Reddit data. Some of the important variables in the dataset:
Title: title of the post, from which we can parse out the ticker under discussion
Upvote_ratio: the upvote/downvote ratio, which gives the sentiment toward the post
Total comments: the number of comments under the post, which tells us its popularity

I’m still working on the news data. The idea is to scrape news titles and run sentiment analysis on them to determine whether the news is positive or negative, and then use that in the visualization.
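Parsing tickers out of post and news titles can be done with a simple regex plus a whitelist of known symbols. A hedged sketch; the tiny `KNOWN_TICKERS` set stands in for the full S&P 500 symbol list:

```python
import re

# Hypothetical sketch: match $-prefixed cashtags or standalone 1-5 letter
# uppercase words, then filter against a known ticker list (a tiny stand-in
# here for the real S&P 500 symbols).
KNOWN_TICKERS = {"AAPL", "TSLA", "GME", "SPY"}

def tickers_in_title(title):
    candidates = re.findall(r"\$?\b[A-Z]{1,5}\b", title)
    return {c.lstrip("$") for c in candidates} & KNOWN_TICKERS

print(sorted(tickers_in_title("Why $TSLA and AAPL moved after earnings")))
# ['AAPL', 'TSLA']
```

The whitelist step matters: without it, ordinary all-caps words ("CEO", "IPO") would be mistaken for tickers.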

colette_lee
Posts: 3
Joined: Thu Jan 07, 2021 3:07 pm

### Re: Proj 3 - Student Defined Visualization Project

My idea for my project is to visualize data from Twitter obtained with the Twitter scraping tool Twint. I can use it to obtain data from tweets on a given topic, including username, likes, retweets, etc. I would like to continue with the concept from my first project and use Twitter to analyze and visualize people's general attitudes toward capitalism and/or socialism. So far I've collected a dataset of tweets containing the word 'capitalism' with at least 1,000 likes, from January 2019 to the present, along with each tweet's tweetId, username, date created, likes, and retweets. Ideally I would like a way to check whether a tweet expresses a positive or negative attitude about the topic, but I'm not sure how to do that accurately. I would also like to analyze the overall language used in these tweets, such as commonly used words, overlapping topics, etc. I am still working on the design concept for the project.
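The "commonly used words" analysis can start with simple tokenizing and counting. A minimal sketch; the stop-word list and example tweets are illustrative only:

```python
import re
from collections import Counter

# Hypothetical sketch: tokenize tweet text, drop a few stop words, and count.
STOP = {"the", "a", "of", "is", "and", "to", "in"}

def top_words(tweets, n=5):
    words = []
    for t in tweets:
        words += [w for w in re.findall(r"[a-z']+", t.lower()) if w not in STOP]
    return Counter(words).most_common(n)

tweets = ["Capitalism is the problem", "capitalism and socialism compared"]
print(top_words(tweets)[0])  # ('capitalism', 2)
```

A fuller version would use a real stop-word list and perhaps stemming, but even this crude count surfaces the dominant vocabulary.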

ingmar_sturm
Posts: 3
Joined: Thu Jan 07, 2021 3:10 pm

### Re: Proj 3 - Student Defined Visualization Project

Concept Idea and data

I'd like to visualize migration stock data, i.e. how many citizens of another country reside in any given country. For 235 countries, I have data on how many people from each of the 235 countries lived there over 7 time periods (1990-2019). I also have data on migration flows for 45 countries, i.e. how many people moved from one country to another in a given year. There is aggregate data available on "net migration", i.e. people moving to a country minus people leaving that country, but it doesn't specify where somebody is moving from.

The source of the stock data is the United Nations. I mainly picked it because I research human migration for my PhD. I think that it is interesting because migration is such a big part of our lives and at least since the 1990s, there is data available on migrant stocks for virtually all countries.

I envision that each country is represented as a sphere (see picture). The size of each sphere is proportional to the number of migrants living in the country it represents. To tell each country apart, each sphere has a different color. There will be a toggle that allows the user to switch through different years. On rollover, some country data is displayed.
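One design note worth flagging (my assumption, not from the post): if sphere *size* should be proportional to migrant count, it matters whether "size" means radius or volume. Scaling radius linearly makes large countries visually dominate, since perceived volume grows with the cube of the radius; a cube-root scale keeps volume proportional to the value:

```python
# Design-note sketch: radius proportional to the cube root of the migrant
# count, so sphere *volume* is proportional to the value itself.
def sphere_radius(migrants, scale=1.0):
    return scale * migrants ** (1.0 / 3.0)

# a country with 8x the migrants gets only 2x the radius
r = sphere_radius(8_000_000) / sphere_radius(1_000_000)
print(round(r, 6))  # 2.0
```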
Extension ideas

After implementing the concept above, I can implement the following extensions:
1. There could be ribbons/vertices visualizing the migration movements. Unfortunately, I only have flow data for 45 countries, and the dates of the stocks and flows datasets do not match up. I will keep looking for better data, but it doesn't look good.
2. The spheres can be organized in space according to their location on planet earth. This has the nice effect of allowing for a change of perspective, i.e. south can be up.

richardjiang
Posts: 3
Joined: Thu Jan 07, 2021 3:05 pm

### Re: Proj 3 - Student Defined Visualization Project

Project Description

I am primarily aiming to visualize certain aspects of the game of basketball in a 3D way.

Main Data Source

The data I will use is obtained via the very extensive stats.nba.com REST API; several libraries exist to accelerate this process, depending on the language used. For this project, I will focus on the data of every shot taken by every player during the 2018-19 NBA regular season. Other seasons are available, but to limit the scope I chose the most recent complete season (excluding the COVID-affected 2019-20 season, which has a gap in data collection).

Briefly, various fields are available for each shot, but of primary interest are the X and Y coordinates of the shot and whether it was made or missed. The X and Y coordinates are mapped to half-court coordinates on the 47' x 50' NBA court, despite the slight asymmetry in the game. Altogether, this encompasses ~190k shots spread among 30 teams over 1,230 games and all of the shot-taking NBA players.
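A common way to work with half-court shot coordinates like these is to bin them into grid cells and compute a field-goal percentage per cell. A hedged sketch; the `(x_ft, y_ft, made)` tuples are my assumed shape, not the actual API fields:

```python
from collections import defaultdict

# Hypothetical sketch: bin shots on the 47' x 50' half court into square
# cells and compute field-goal percentage per cell.
def fg_pct_by_cell(shots, cell_ft=1.0):
    made = defaultdict(int)
    attempts = defaultdict(int)
    for x, y, hit in shots:
        cell = (int(x // cell_ft), int(y // cell_ft))
        attempts[cell] += 1
        made[cell] += hit  # bool counts as 0/1
    return {c: made[c] / attempts[c] for c in attempts}

shots = [(5.2, 10.1, True), (5.7, 10.9, False), (23.0, 4.0, True)]
print(fg_pct_by_cell(shots)[(5, 10)])  # 0.5
```

The per-cell percentages (and attempt counts) map naturally onto height and color in a 3D shot chart.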

In the past few years, professional sports have become significantly more analytical, with teams designing their rosters and plays around statistics. One of the most apparent changes is shot selection in the NBA, which came largely from noting the success of slightly riskier 3-point shots. This type of data is known to have changed the way basketball games flow and has been well documented by several people.

Work-in-Progress

To be completed.

lfloegelshetty
Posts: 4
Joined: Thu Jan 07, 2021 3:02 pm

### Re: Proj 3 - Student Defined Visualization Project

Project Description

I want to visualize the amount of CO2 generated by each state over the years and the sources it comes from.

Data Source

The data source I will be using is from the EPA, on greenhouse gases generated in the US. It contains the metric tons of CO2 generated by each state, the location of each source, and a breakdown of the source type, e.g. pulp and paper, metals, etc.
flight (1).xls
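The planned breakdown boils down to summing tons of CO2 per (state, source type). A minimal sketch; the column names and sample rows are assumptions about the EPA file's layout, not its actual schema:

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical sketch: aggregate CO2 totals by state and source type.
SAMPLE = """state,source_type,co2_tons
CA,pulp and paper,1200
CA,metals,800
TX,metals,2500
"""

def tons_by_state_and_source(csv_text):
    totals = defaultdict(float)
    for row in csv.DictReader(StringIO(csv_text)):
        totals[(row["state"], row["source_type"])] += float(row["co2_tons"])
    return dict(totals)

print(tons_by_state_and_source(SAMPLE)[("CA", "metals")])  # 800.0
```

The same per-(state, source) totals could then drive bar heights or sphere sizes in the 3D view.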