The 3D visualization project explores the placement of data in 3D space as a way to reveal patterns in how the data behaves.
1.28 p5, 3D labeling, Treemap, InfoGraph (Weidi: in-class coding for p5)
1.30 Work in progress presentation - You will describe your concept to the class
2.04 Lab individual Meetings
2.06 3D Student project presentations (1/2 of class)
2.11 3D Student project presentations (2/2 of class)
PeasyCam is the Processing library that provides 3D spatialization and mouse interaction: http://mrfeinberg.com/peasycam/
ControlP5: http://www.sojamo.de/libraries/controlP5/ to add buttons and other UI controls if needed
Color Sampler: http://tristen.ca/hcl-picker/#/hlc/6/1/A7E57C/2C4321
Some Processing functions for 3D:
The translate(), pushMatrix(), and popMatrix() functions are introduced. Information about pushMatrix(), popMatrix(), and translation can be found at:
P5 or PROCESSING?
We are introducing P5.js https://p5js.org/ as students have asked whether they can post their interactive works on the internet. P5 is an interpretation of Processing for the web. You can choose to work either in Processing or P5. We restrict assignments to these environments because the quarter is short, and this way you can learn from each other's work and from previous projects, which you can find at http://vislab.mat.ucsb.edu/ (click on course). If you use code from previous assignments, from the internet, or from anywhere else, it is critical to give credit to where you got the code; otherwise it is plagiarism.
For instance, if you were to use a code segment from Mert Toka's 3D demo, which I showed in class (viewtopic.php?f=72&t=291&start=10#p1930), inside your own code, you would add something like:
//This function was previously published by Mert Toka at viewtopic.php?f=72&t=291&start=10#p1930
X,Y,Z,etc. DATA COLUMNS NEEDED
For this to work, each record needs a minimum of four metadata values: the horizontal position (x), the vertical position (y), the depth (z), and the brightness, color, or size of the (x, y, z) point in 3D space. Additionally, if you are using texts from the titles or subjects, that will be a fifth column. Also, if you want to place pictures (book, CD, or movie covers), which you can get from Amazon, Apple, and other sources, the JSON demo in the syllabus shows how this is done.
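As a minimal illustration (not course code; the column names and offsets are hypothetical), a single record with four metadata values might be mapped onto the 3D axes like this:

```python
# One library record with the minimum four metadata values mapped onto
# the three spatial axes plus one visual channel (size/color/brightness).
record = {"year": 2015, "month": 6, "deweyClass": 520, "checkouts": 42}

x = record["year"] - 2006          # horizontal: years since start of dataset
y = record["month"]                # vertical: month of the year
z = record["deweyClass"] / 10.0    # depth: scaled Dewey class
size = record["checkouts"]         # visual channel: checkout count
```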
GENERAL DATA SCIENCE KNOWLEDGE DISCOVERY APPROACHES:
1) Clustering & Searching for Patterns – finding how a specific topic, dataset, or relationship performs
2) Anomaly Detection – identifying instances that do not conform to the typical data in a set
3) Association rule-mining: Looking for co-occurrences, finding combinations in a set of transactions: If (hot dogs + ketchup) then BEER
4) Predictions: Recommendations based on past events as introduced in class by GuanYuchen: viewtopic.php?f=78&t=317&start=10#p2179
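The association-rule idea in (3) can be sketched with a toy example in Python (the transactions here are invented for illustration):

```python
# Compute support and confidence for the rule {hot dogs, ketchup} -> {beer}
# over a tiny invented set of transactions.
transactions = [
    {"hot dogs", "ketchup", "beer"},
    {"hot dogs", "ketchup", "beer"},
    {"hot dogs", "ketchup"},
    {"beer", "chips"},
]

antecedent = {"hot dogs", "ketchup"}
consequent = {"beer"}

# Count transactions containing the whole rule, and just the antecedent.
both = sum(1 for t in transactions if antecedent | consequent <= t)
ante = sum(1 for t in transactions if antecedent <= t)

support = both / len(transactions)   # fraction of all transactions with the rule
confidence = both / ante             # P(beer | hot dogs + ketchup)
```

Here the rule holds in 2 of 4 transactions (support 0.5), and in 2 of the 3 transactions that contain hot dogs and ketchup (confidence about 0.67).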
The outcome of this project will be graded and will constitute the mid-term grade. If it turns out that you get a lower grade than desired, you can revise the project during the course to raise the grade. So no need to worry.
What are the conditions for a good grade:
1) An interesting MySQL query
2) A working interactive visualization in 3D
3) Design based on the data: Let the metadata values determine where and how the data is to be located in 3D
4) Visual Coherence: The visualization should follow standard design rules. Consider space, the function of color, and clean fonts (Arial, Helvetica, Futura, etc.). Review examples at the course website: https://www.mat.ucsb.edu/~g.legrady/aca ... ences.html If unclear, ask me.
CONTENT INNOVATION: your query question and outcomes. How original, engaging, or unusual your query or your approach to it may be, and how interesting the data is. The data has to be multivariate and granular (meaning a lot of records) so that we can see patterns forming in the data.
DESIGN: The design can build on our demos but hopefully go beyond. Areas of exploration are in how you use space, form, colors, data organization, timing, interaction, coherence, direction, etc.
COMPUTATION: The third evaluation is the computational component. First of all, the code needs to work. Special consideration will be given to unusual, elegant expression: functions, algorithms, and other techniques that you can introduce to the class.
This is a lot to cover in the short time we have. Take one step at a time!
Please check again as I may update this assignment description!
Project online demo:
1 Data processing
Monthly checkouts of the novels related to A Song of Ice and Fire, and monthly total checkouts of all novels, were extracted using MySQL.
1.1 SQL Queries
Code: Select all
SELECT
    EXTRACT(YEAR_MONTH FROM cout) AS year_months,
    COUNT(CASE WHEN (LOWER(title) IN ('game of thrones',
            'game of thrones the graphic novel volume 1',
            'game of thrones the graphic novel volume 2',
            'game of thrones the graphic novel volume 3',
            'game of thrones the graphic novel volume 4')
        AND itemtype = 'acbk') THEN 1 END) AS novel1,
    COUNT(CASE WHEN (LOWER(title) IN ('clash of kings',
            'clash of kings the graphic novel volume 1',
            'clash of kings the graphic novel volume 2',
            'clash of kings book two of a song of ice and fire')
        AND itemtype = 'acbk') THEN 1 END) AS novel2,
    COUNT(CASE WHEN (LOWER(title) = 'storm of swords' AND itemtype = 'acbk') THEN 1 END) AS novel3,
    COUNT(CASE WHEN (LOWER(title) = 'feast for crows' AND itemtype = 'acbk') THEN 1 END) AS novel4,
    COUNT(CASE WHEN (LOWER(title) = 'dance with dragons' AND itemtype = 'acbk') THEN 1 END) AS novel5,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete first season' AND itemtype = 'acdvd') THEN 1 END) AS season1,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete second season' AND itemtype = 'acdvd') THEN 1 END) AS season2,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete third season' AND itemtype = 'acdvd') THEN 1 END) AS season3,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete fourth season' AND itemtype = 'acdvd') THEN 1 END) AS season4,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete fifth season' AND itemtype = 'acdvd') THEN 1 END) AS season5,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete sixth season' AND itemtype = 'acdvd') THEN 1 END) AS season6,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete seventh season' AND itemtype = 'acdvd') THEN 1 END) AS season7,
    COUNT(CASE WHEN (LOWER(title) = 'game of thrones the complete eighth season' AND itemtype = 'acdvd') THEN 1 END) AS season8
FROM spl_2016.outraw
WHERE EXTRACT(YEAR_MONTH FROM cout) != '202002'
GROUP BY EXTRACT(YEAR_MONTH FROM cout)
LIMIT 500;
1.2.1 Tidy the dataset.
Monthly checkout percentages of the novels related to A Song of Ice and Fire were calculated. The same pre-processing procedure was done for the DVDs of the HBO TV-series counterparts. I calculated the monthly percentage instead of using raw checkouts because, in Project 1, my classmates found similar checkout trends for other items as well. Thus, I wanted to control for total checkouts, to exclude the possibility that people have simply been borrowing fewer items in general in recent years. Then the monthly percentages were loaded into RStudio to develop two time-series models, one for the novels and one for the DVDs.
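A minimal Python sketch of the percentage step described above (the author's actual pre-processing was done in MySQL and RStudio; the variable names and numbers here are illustrative):

```python
# Convert raw monthly novel checkouts into percentages of total checkouts,
# so a general decline in borrowing does not masquerade as a decline in interest.
novel_checkouts = {"201101": 120, "201102": 95}        # illustrative counts
total_checkouts = {"201101": 240000, "201102": 190000}  # illustrative totals

novel_percent = {
    month: 100.0 * novel_checkouts[month] / total_checkouts[month]
    for month in novel_checkouts
}
```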
1.2.2 Missing values for 02/2018 and 01/2018
Before model development, it was noted that there were missing values for January and February 2018. Thus, I used an interpolation function to estimate the values for these two months (see details in the .rmd file).
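The author's interpolation was done in R; a comparable linear interpolation in plain Python might look like this (a sketch that assumes the series starts and ends with known values):

```python
# Fill None gaps in a monthly series by linear interpolation between the
# nearest known neighbors. Assumes the first and last entries are known.
def fill_missing(values):
    filled = list(values)
    for i, v in enumerate(filled):
        if v is None:
            lo = i - 1          # previous entry is already filled
            hi = i + 1
            while filled[hi] is None:   # find the next known value
                hi += 1
            step = (filled[hi] - filled[lo]) / (hi - lo)
            filled[i] = filled[lo] + step * (i - lo)
    return filled
```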
1.2.3 Time-series model development
Based on time-series theory, the variance in some types of time series can be decomposed into three parts. The first part is the main trend: in general, a decreasing, increasing, or quadratic trend over the years. The second part is seasonality: the monthly checkouts vary with the time of year; for example, people may borrow more novels or DVDs in winter because of Christmas. The last part is a random component: trivial and dynamic factors that cause fluctuations in the monthly checkouts. For an example, see the variance decomposition figure for the novels below.

2. 3D Visualization
2.1 Initial Sketch

I would like to show the data differently on each layer. The layers for the raw data will be bar graphs of the monthly checkout percentages, based on the data retrieved from MySQL. (The values for January and February 2018 have been interpolated.)
As for the layers for the time-series model, the thicker line shows the main trend, the two bounds near it show the random part, and the line spiraling along the year axis shows the seasonality.
The general structure of the visualization consists of four layers. Users can use the "Change Color Solution" button to switch among three color schemes, press different buttons to view different perspectives of the structure (the instructions are at the bottom-left of the screen), and use a checkbox to turn the random dots bouncing between the bounds of the random component on and off.

2.3 Interactivities
Other checkboxes were added to turn on/off different labels, such as year.
If the mouse moves to the left side of the screen (close to the buttons and checkboxes), the instructions below show up; if the mouse moves away, the instructions disappear. I added this feature to keep the canvas tidier.
3. Interpretation and Improvement
The result shows that the TV series did help boost novel reading at the beginning, but after season 4, interest in reading the novels went down. Also, people tended to borrow the DVDs when they were newly released, but the interest in borrowing did not last very long.
Both novel reading and DVD watching show seasonality, but it contributes only a small amount of the fluctuation. In general, people are still more interested in watching the DVDs than in reading the novels. The random component accounts for more of the fluctuation in novel reading than in DVD watching.
This visualization is far from perfect. First, it just combines several 2D visualizations instead of taking full advantage of the 3D space. Second, it would be better to show the connection and relationship between the novels and the DVDs, as opposed to plotting them in different layers.
In this assignment, I wanted to explore how the popularity of different programming languages has changed over the past years. Using the Seattle Public Library checkout records, I fetched, classified, and visualized these records to analyze the popularity change of each language, and also tried to see whether there is any relationship between the trends of different programming languages.
Project online demo: https://editor.p5js.org/boningUCSB/full/EsJxpC1m
Source Code: https://github.com/boningdong/MAT259-3D-Visualization
or download the attachment:
In addition to controlling the camera angle with the mouse, you can also navigate the camera using the 'w', 'a', 's', 'd' keys to move forward, left, backward, and right. The spacebar and 'ctrl' translate the camera up and down, and the Up and Down arrow keys tilt the camera.
I searched for the names of different programming languages to fetch the corresponding data from the MySQL server. Yet the results obtained this way include non-programming materials; for example, there are a lot of non-programming books that contain the word "ruby". So I also check whether the Dewey class is between 000 and 006, the range assigned to computer science books. Here is the query I used to fetch all of the data.
Code: Select all
To fully utilize the extra dimension of 3D space, I use the angle to represent the month and the height to represent the year, so overall the data follows a helix pattern.
To differentiate the programming languages, I colored most of them based on their logo colors, and each language takes its own radius as its track so that the languages do not collide with each other.
The daily checkouts should be displayed as circles or tori so that the viewer can see the result from any angle.
Also, the user should be able to select the years and languages to be shown, which makes it easy to compare languages.
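The helix layout described above (month as angle, year as height) can be sketched as follows; the radius and spacing constants are assumptions for illustration, not the project's actual values:

```python
import math

# Map a (year, month) pair onto a helix: the 12 months sweep one full
# revolution, and each year is stacked one step higher.
def helix_position(year, month, radius=100.0, year_spacing=40.0, base_year=2008):
    angle = (month - 1) / 12.0 * 2.0 * math.pi  # month -> angle around circle
    x = radius * math.cos(angle)
    z = radius * math.sin(angle)
    y = (year - base_year) * year_spacing       # year -> vertical height
    return x, y, z
```

In the sketch itself, each language would use its own `radius` so its track never collides with another language's.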
Final Design & Screenshots
Here it shows the data from 2008 to 2018 for all the programming languages. Users can select a single year to show, which makes it easier to see the relative popularity of the languages within that year. Seen from the top, users can also clearly see the daily checkouts over the years, because the circle diameter encodes the number of checkouts. Users can turn off other languages to show only the ones they want to compare. For the following picture, I turned on only 'Python' and 'C/C++', so we can see the trends of these two languages over the years. Clearly, Python is getting more and more popular, while C/C++ is declining.
One problem with p5 or Processing is that the data is drawn by the CPU. Even though the final rendering is handled by the GPU, the CPU takes a lot of time to create the 3D objects and calculate their locations. So the fps of my program is pretty low, because so many pieces of data need to be displayed on the screen. I tried to optimize the code, but because I cannot avoid the CPU drawing stage inside the draw() function, there was no way to boost the speed. For this amount of data, it is probably better to use low-level WebGL or OpenGL to render the 3D objects after they are first initialized by the CPU. But that solution probably puts too much burden on the programmer, especially one who is not experienced with computer graphics.
It is probably also a good idea to add an orthographic camera mode because, without perspective effects, users could view the data within one year and compare relative popularities more easily. I tried to implement this feature with a button called 'perspective', but when switching from perspective mode to orthographic mode, the camera does not work properly. Using two separate camera systems may solve this issue.
- Boning's 3D Visualization Project Report.pdf
I was interested in the checkout status of Dewey classes 520-529. I wanted to compare the differences between the sub-classes to see which of them is the most popular among readers. The data I obtained contains total checkout numbers, Dewey classes, and all checkout-related items from 2006 to 2018.
Below is my SQL Queries for obtaining the data:
Code: Select all
SELECT YEAR(cout) AS year, MONTH(cout) AS month,
       SUBSTRING(deweyClass, 1, 3) AS dewey, title, Counts
FROM (
    SELECT cout, bibNumber, itemType, deweyClass, title,
           COUNT(bibNumber) AS Counts
    FROM spl_2016.outraw
    WHERE YEAR(cout) BETWEEN 2006 AND 2019
      AND deweyClass >= 520 AND deweyClass < 530
      AND (deweyClass IS NOT NULL)
      AND deweyClass NOT LIKE ' '
      AND deweyClass NOT LIKE ''
    GROUP BY cout, deweyClass ASC, bibNumber, title
    ORDER BY Counts DESC, deweyClass ASC, bibNumber, title
) AS tbl
The first draft is a simple point plot. The x axis is the year, from 2006 to 2019, and the y axis is the month. Each point represents the checkout of a single item. More revisions will follow, since this graph didn't reveal the information I wanted.
Project 2 is an extension of my Project 1. The goal of this project is to compare true and predicted checkouts in different Dewey classes from the Seattle Library database.
- Deal with Missing data
- Rescale different dewey classes data into same range
- Predict future data
- Use checkouts from 2006 to 2010 as training dataset
- Make predictions for checkouts from 2011 to 2019 for every month
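The report does not specify the prediction method, so as an illustrative stand-in, here is a seasonal-naive baseline in Python: it predicts each future month from the average of that calendar month in the 2006-2010 training window.

```python
# Build a seasonal-naive model: for each calendar month, average the
# checkouts seen in the training years; predictions just look up that average.
def seasonal_naive(train):
    """train: {(year, month): checkouts} -> {month: mean checkouts}"""
    monthly = {}
    for (year, month), count in train.items():
        monthly.setdefault(month, []).append(count)
    return {m: sum(v) / len(v) for m, v in monthly.items()}

def predict(model, year, month):
    # The year is ignored by this baseline: every future January gets
    # the average training January, and so on.
    return model[month]
```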
I mapped my data points as spheres with coordinates X (year), Y (month), and Z (counts). Based on all the data points, it is possible to draw smooth curves across months and years in order to see how the true checkouts and predicted checkouts differ or converge over time.
After receiving useful feedback from my classmates, the TA, and Prof. Legrady, I revised my first draft to add more labels.
Here is an example:
- Change curve lines to DNA shapes : https://www.openprocessing.org/sketch/439787
I was interested in visualizing data for four entertainment categories (General Music, Indoor Games, Sports, Shows) from 2006 to 2019. Therefore, I used MySQL to collect the data from the Seattle Public Library, and I separated the data set by month, from January to December.
Here is a part of the data:
Code: Select all
SELECT YEAR(cout) AS year, MONTH(cout) AS month,
       SUM(IF(deweyClass >= 780 AND deweyClass < 781, 1, NULL)) AS 'General music',
       SUM(IF(deweyClass >= 793 AND deweyClass < 796, 1, NULL)) AS 'Indoor game',
       SUM(IF(deweyClass >= 796 AND deweyClass < 800, 1, NULL)) AS 'Sports',
       SUM(IF(deweyClass >= 791 AND deweyClass < 792, 1, NULL)) AS 'Shows'
FROM spl_2016.outraw
WHERE (deweyClass > 750 AND deweyClass < 900)
  AND YEAR(cout) BETWEEN 2006 AND 2019
GROUP BY year, month
ORDER BY year, month;
Here is the code attachment:
After watching the movie Hereditary, I was inspired to investigate trends and patterns in checkouts related to the paranormal and the occult. Many projects have previously examined popular topics such as science and religion, but I was fascinated by this genre since it belongs to neither of these categories. Sometimes, we experience things that cannot be explained by science-- so, when do we turn to the supernatural for explanation?
The project visualizes checkout data from Dewey classes 130-139, which are topics on parapsychology and the occult. Each moving point represents a single title and each ring represents a single month. The speed and size of the point correlates with the checkout count for that month, while the volume of points on a single ring represents the total volume of checkouts in these Dewey classes for that given month. The rings are organized in chronological order, as the user should be able to scroll through time to see trends in this topic over the course of the dataset’s lifetime. They can also pause the sketch to mouse over individual points and see the title's details.
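A sketch of the encoding described above, with checkout count driving both the size and the orbit speed of a point (the output ranges are assumptions for illustration, not the project's actual values):

```python
# Linearly remap a value from one range to another, like Processing's map().
def lerp_map(value, in_min, in_max, out_min, out_max):
    t = (value - in_min) / (in_max - in_min)
    return out_min + t * (out_max - out_min)

# Higher checkout counts yield bigger, faster-orbiting points.
def point_style(checkouts, max_checkouts):
    size = lerp_map(checkouts, 0, max_checkouts, 2.0, 12.0)      # pixels
    speed = lerp_map(checkouts, 0, max_checkouts, 0.005, 0.05)   # radians/frame
    return size, speed
```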
Visually, I found that many occult beliefs and traditions involve circle-shaped symbols and astrological imagery; an example of this type of symbol would be a pentacle. After some brainstorming, I looked for ways to take this shape and draw it in 3D space. Since there is a lot of occult imagery related to stars and planets, I designed the data points to appear as orbiting planets, with the rings representing each month from February 2006 to the present. I wanted the user to feel as if they are "travelling through time" as they zoom forwards and backwards inside the rings.
Here are some of the preliminary sketches I made:
Additionally, as I was developing the initial design, I realized that my design was very similar to Flying Lotus's album cover for Cosmogramma, which I was subconsciously thinking about when brainstorming ideas.
The query for this dataset was very simple. The only data I needed was the checkout date of each title, the Dewey class and the checkout counts. The results were then returned in chronological order to make it easier to loop through the data when drawing it in the sketch.
Code: Select all
SELECT deweyClass, bibNumber, title, DATE(cout), COUNT(bibNumber)
FROM spl_2016.outraw
WHERE deweyClass < 140 AND deweyClass >= 130
GROUP BY title, bibNumber
ORDER BY DATE(cout) ASC
To view details for a single title, you can pause the sketch by holding down the spacebar and mousing over the point to reveal the popup box with the title name and the checkout count. To turn off a category in the sketch, you can click on the colored box for that category. Click on the box again to make it appear back in the sketch.
The project was done in Processing for Java. The source code for the project is uploaded on GitHub and can be downloaded locally.
Here is what the final product looks like:
Challenges and Future Improvements
One of the things I'd like to improve upon in this project would be the label placement for the years. I'm thinking of implementing new labels for each ring that, as a whole, form a sort of spiral pattern so it's easier to see the individual labels. Dealing with text sizing and quality proved to be a challenge, as the resolutions vary greatly depending on the mode and style of the text. Hopefully I can find some way around this issue so the labels are easier to read.
Additionally, I would like to change certain settings in PeasyCam to make the zooming more intuitive. I will most likely slow down the speed of the scroll as well as increase the zoom distance.
My main goal in visualizing the Seattle Public Library (SPL) data was to bring locational information into the investigation. In my previous project I identified and joined locational attributes for 11,000 books, to include information on the branch where each book is stored. Another crucial element of the proposed visualization was generating the representation at the finest level of detail. That is, the base unit of analysis should be the most basic one: individual check-ins or check-outs. Overall, around 130,000 identifiable check-ins/outs over the course of 3 years (2012-12-31 to 2015-12-31) were found in the database for the 11,000 books, followed by a simple exploratory analysis. Here my first design decision was to switch from p5.js to standalone Processing due to the volume of the data.
Code: Select all
SELECT *
FROM spl_2016.inraw
WHERE (cout > '2012-12-31' AND cout < '2015-12-31')
  AND CONCAT(bibNumber, collcode, itemtype) IN (
      '261cs9rarbk',
      '813canfacbk',
      </.../>  -- insert unique identifiers from the attached sql file (11,000 lines of code).
  );
Because the granular analysis was based on individual check-in trajectories in time, I decided to project each individual book onto the horizontal (X) axis. A new variable, 'ranker', was generated as a re-ordered version of the barcodes (see code below), with the books sorted according to the branch in which they are stored. This allowed visible differentiation of patterns in the data, with visible groupings.
Code: Select all
# generate unique identifier as reordered version of barcodes
dur5['ranker'] = dur5.barcode.rank(method='dense')
print(dur5.ranker.max())
Aesthetics and Controls
Various colors denote the five locations with the most check-ins/outs: the Central, Northeast, Southwest, Lake City, and Douglass-Truth branches. Each location was assigned a key controller. Check-ins from every other location are drawn in grey so as not to distract from the first five.
Given enough time and modest progress in Processing coding, I could utilize bundling techniques to group the similar trajectories closer in space. Additionally, it would be interesting to add animation along the lines in the form of an electric charge running from the top to the bottom as the time progresses.
This is a 3D data visualization project built with MySQL and the p5 library. With the data provided by Professor George Legrady, I was able to access some of the data from The Seattle Public Library. I chose a data set of check-in and check-out records from 2018 to visualize the flow among different Dewey classes in the library, and I used Bezier curves to express different meanings in the dimensions of 3D space.
At first, I tried to collect all the data from 2006 to 2018 and piece it together by month. But I found that the data over these years was too scattered to show in 3D space, because I needed to connect each month's check-ins with each month's check-outs, which would produce 12 (months) x 12 (months) x 13 (years) connections. Though it is easy for p5 to handle this much data, the result would be too complex for the audience to read. So I chose the 2018 data set and represented each month in order. Here is the MySQL code for collecting check-in and check-out data for each month in 2018, with the Dewey classes divided into separate parts. This is just a sample of the code; since I have to adjust it every time I change the month, it can differ each time.
Code: Select all
SELECT MONTH(cout),
    SUM(CASE WHEN deweyClass > 770 AND deweyClass < 771 THEN 1 ELSE 0 END) AS D000to099,
    SUM(CASE WHEN deweyClass > 771 AND deweyClass < 772 THEN 1 ELSE 0 END) AS D100to199,
    SUM(CASE WHEN deweyClass > 772 AND deweyClass < 773 THEN 1 ELSE 0 END) AS D200to299,
    SUM(CASE WHEN deweyClass > 773 AND deweyClass < 774 THEN 1 ELSE 0 END) AS D300to399,
    SUM(CASE WHEN deweyClass > 774 AND deweyClass < 775 THEN 1 ELSE 0 END) AS D400to499,
    SUM(CASE WHEN deweyClass > 775 AND deweyClass < 776 THEN 1 ELSE 0 END) AS D500to599,
    SUM(CASE WHEN deweyClass > 776 AND deweyClass < 777 THEN 1 ELSE 0 END) AS D600to699,
    SUM(CASE WHEN deweyClass > 777 AND deweyClass < 778 THEN 1 ELSE 0 END) AS D700to799,
    SUM(CASE WHEN deweyClass > 778 AND deweyClass < 779 THEN 1 ELSE 0 END) AS D800to899,
    SUM(CASE WHEN deweyClass > 779 AND deweyClass < 780 THEN 1 ELSE 0 END) AS D900to999
FROM spl_2016.inraw
WHERE deweyClass >= 0 AND YEAR(cout) = '2018'
GROUP BY MONTH(cout);
All the points represent the amount of check-ins or check-outs per month in 2018. The sizes of the points are controlled by the number of check-ins or check-outs in each month. For each Dewey class, I chose a different color and a control point to draw the curve from the start point. The start and end points are always the points representing the check-out months and check-in months. Each big Dewey class, such as 0-99, 100-199, 200-299, etc., has its own control point to direct the curve.
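The cubic Bezier evaluation behind these curves (Processing's bezier() draws the same polynomial) can be sketched in Python as:

```python
# Evaluate one coordinate of a cubic Bezier curve at parameter t in [0, 1].
# p0 and p3 are the start/end points; p1 and p2 are the control points
# that bend the curve toward a Dewey class's control location.
def bezier_point(p0, p1, p2, p3, t):
    u = 1.0 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3
```

Sampling t from 0 to 1 traces the curve from the check-out point to the check-in point, pulled toward the class's shared control point along the way.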