Proj 2 - 3D Visualization

Posts: 160
Joined: Wed Sep 22, 2010 12:26 pm

Post by glegrady » Mon Dec 30, 2019 5:54 pm

The 3D visualization project explores the placement of data into 3D space as a way to visualize the performance of data.

1.28 p5, 3D labeling, Treemap, InfoGraph (Weidi in class coding for the p5)
1.30 Work in progress presentation - You will describe your concept to the class
2.04 Lab individual Meetings
2.06 3D Student project presentations (1/2 of class)
2.11 3D Student project presentations (2/2 of class)

The PeasyCam is the Processing library that allows for the 3D spatialization and mouse interaction:
Control P5: to add buttons if needed
Color Sampler:

Some Processing functions for 3D:
The translate(), pushMatrix(), and popMatrix() functions are introduced. Information about push, pop and translation can be found at:
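
As a rough illustration of what translate, push and pop do, here is a minimal sketch in plain JavaScript. It tracks only translation, whereas Processing and p5.js keep a full transformation matrix, and the TransformStack class is a made-up name for this example, not part of either library.

```javascript
// Translation-only stand-in for the transform stack (illustration only).
class TransformStack {
  constructor() {
    this.current = { x: 0, y: 0, z: 0 }; // accumulated translation
    this.saved = [];                     // states stored by push()
  }
  translate(dx, dy, dz) {
    this.current = {
      x: this.current.x + dx,
      y: this.current.y + dy,
      z: this.current.z + dz,
    };
  }
  push() {
    this.saved.push({ ...this.current });
  }
  pop() {
    if (this.saved.length === 0) throw new Error("pop() without matching push()");
    this.current = this.saved.pop();
  }
  // Where a shape drawn at the local origin ends up in world space.
  origin() {
    return { ...this.current };
  }
}

const stack = new TransformStack();
stack.translate(100, 0, 0);
stack.push();                 // remember the state after the first move
stack.translate(0, 50, -20);  // only affects shapes drawn before pop()
const inner = stack.origin();
stack.pop();                  // back to the remembered state
const outer = stack.origin();
```

In Processing the calls are pushMatrix() and popMatrix(); in p5.js they are push() and pop().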

We are introducing P5.js as students have asked if they can post their interactive works on the internet. P5 is an interpretation of Processing for the web. You can choose to work either in Processing or P5. We restrict assignments to these environments as the quarter is short, and this way you can learn from each other's work and from previous projects which you can find at (click on course). If you find code in previous assignments, on the internet, or anywhere else, it is critical to credit where you got the code from; otherwise it's plagiarism.

For instance, if you were to use a code segment from Mert Toka's 3D demo which I showed in class: viewtopic.php?f=72&t=291&start=10#p1930 where you use the code inside your code, you would add something like:

//This function was previously published by Mert Toka at viewtopic.php?f=72&t=291&start=10#p1930

For this to work, each record needs a minimum of 4 metadata values: the horizontal (x), the vertical (y), the depth (z), and the brightness, color, or size of the x,y,z point in the 3D space. Additionally, if you are using text from the titles or subjects, that will be a 5th column. Also, if you want to place pictures (book, CD, or movie covers), which you can get from Amazon, Apple, and other sources, the JSON demo at the syllabus shows how this is done.
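
A minimal sketch of this mapping in plain JavaScript, assuming made-up field names (year, month, deweyClass, checkouts) and an arbitrary cube of -200 to 200 units. The mapRange helper does the same job as the built-in map() function in Processing and p5.js.

```javascript
// Linear re-mapping of a value from one range to another.
function mapRange(value, inMin, inMax, outMin, outMax) {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// Hypothetical records: the field names and numbers are assumptions.
const records = [
  { year: 2006, month: 1, deweyClass: 520, checkouts: 10 },
  { year: 2019, month: 12, deweyClass: 529, checkouts: 250 },
];

// Each of the four metadata values drives one visual variable.
function toPoint(r) {
  return {
    x: mapRange(r.year, 2006, 2019, -200, 200),      // horizontal
    y: mapRange(r.month, 1, 12, -200, 200),          // vertical
    z: mapRange(r.deweyClass, 520, 529, -200, 200),  // depth
    size: mapRange(r.checkouts, 0, 250, 2, 20),      // fourth value
  };
}

const points = records.map(toPoint);
```

The same pattern works if the fourth value drives brightness or color instead of size.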

1) Clustering & Searching for Patterns – finding the performance of a specific topic, data, relationships
2) Anomaly Detection – identifying instances that do not conform to the typical data in a set
3) Association rule-mining: Looking for co-occurrences, finding combinations in a set of transactions: If (hot dogs + ketchup) then BEER
4) Predictions: Recommendations based on past events as introduced in class by GuanYuchen: viewtopic.php?f=78&t=317&start=10#p2179
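
To make approach 3 more concrete, here is a small sketch that computes the two usual measures for a rule, support and confidence. The transactions are made up for illustration, and the function names are just for this example.

```javascript
// Hypothetical transactions (made-up data).
const transactions = [
  ["hot dogs", "ketchup", "beer"],
  ["hot dogs", "ketchup", "beer"],
  ["hot dogs", "ketchup"],
  ["beer"],
];

// Support: fraction of transactions that contain every item in `items`.
function support(items, txns) {
  const hits = txns.filter((t) => items.every((item) => t.includes(item))).length;
  return hits / txns.length;
}

// Confidence of the rule "if antecedent then consequent":
// among transactions with the antecedent, how many also have the consequent.
function confidence(antecedent, consequent, txns) {
  return support([...antecedent, ...consequent], txns) / support(antecedent, txns);
}

const ruleSupport = support(["hot dogs", "ketchup", "beer"], transactions);
const ruleConfidence = confidence(["hot dogs", "ketchup"], ["beer"], transactions);
```

Here the rule holds in 2 of the 4 transactions (support 0.5), and in 2 of the 3 transactions that contain hot dogs and ketchup (confidence about 0.67).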

The outcome of this project will be graded and will constitute the mid-term grade. If it turns out that you get a lower grade than desired, you can upgrade the project during the course to increase the grade. So no need to worry.

What are the conditions for a good grade:
1) An interesting MySQL query
2) A working interactive visualization in 3D
3) Design based on the data: Let the metadata values determine where and how the data is to be located in 3D
4) Visual Coherence: Visualization should follow standard design rules. Consider space, the function of color, clean fonts (Arial, Helvetica, Futura, etc.) Review examples at the course website: ... ences.html If unclear, ask me.

CONTENT INNOVATION: your query question and outcomes. How original, engaging, unusual, your query, or your approach to the query may be, and how interesting the data may be. The data has to be multivariate, and granular (meaning a lot of data) so that we can see patterns forming in the data.

DESIGN: The design can build on our demos but hopefully go beyond. Areas of exploration are in how you use space, form, colors, data organization, timing, interaction, coherence, direction, etc.

COMPUTATION: The third evaluation is the computational component. First of all, the code needs to work. Special consideration will be for unusual, elegant expression, utilizing functions, algorithms, etc that you can introduce to the class.

This is a lot to cover in the short time we have. Take one step at a time!

Please check again as I may update this assignment description!
George Legrady

Posts: 3
Joined: Wed Jan 08, 2020 10:53 am

Re: Proj 2 - 3D Visualization

Post by chuanxiuyue » Thu Feb 06, 2020 2:19 pm

With this 3D visualization, I would like to show the relation between the popularity of the novels of A Song of Ice and Fire and the popularity of the DVDs of the HBO TV series adapted from these novels, from 2010 to 2019. I used time series models to show that the variance in the monthly checkout percentage of the novels and DVDs is composed of seasonal variation, random variation, and a main trend.

Project online demo:

1 Data processing

Monthly checkouts of the novels related to A Song of Ice and Fire and monthly total checkouts of all novels were extracted using MySQL.

1.1 SQL Queries

Code: Select all

    -- Assumed: the SELECT, SUM(CASE WHEN ... THEN 1 ELSE 0), FROM, WHERE and GROUP BY
    -- keywords and the table name are not in the post as archived.
    SELECT
    EXTRACT(YEAR_MONTH FROM cout) AS year_months,
    SUM(CASE WHEN
            (LOWER(title) IN ('game of thrones' , 'game of thrones the graphic novel Volume 1',
                'game of thrones the graphic novel Volume 2',
                'game of thrones the graphic novel Volume 3',
                'game of thrones the graphic novel Volume 4')
                AND itemtype = 'acbk')
        THEN 1 ELSE 0
    END) AS novel1,
    SUM(CASE WHEN
            (LOWER(title) IN ('clash of kings' , 'clash of kings the graphic novel Volume 1',
                'clash of kings the graphic novel Volume 2',
                'clash of kings book two of A song of ice and fire')
                AND itemtype = 'acbk')
        THEN 1 ELSE 0
    END) AS novel2,
    SUM(CASE WHEN
            (LOWER(title) = 'storm of swords'
                AND itemtype = 'acbk')
        THEN 1 ELSE 0
    END) AS novel3,
    SUM(CASE WHEN
            (LOWER(title) = 'feast for crows'
                AND itemtype = 'acbk')
        THEN 1 ELSE 0
    END) AS novel4,
    SUM(CASE WHEN
            (LOWER(title) = 'dance with dragons'
                AND itemtype = 'acbk')
        THEN 1 ELSE 0
    END) AS novel5,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete first season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season1,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete second season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season2,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete third season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season3,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete fourth season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season4,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete fifth season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season5,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete sixth season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season6,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete seventh season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season7,
    SUM(CASE WHEN
            (LOWER(title) = 'game of thrones the complete eighth season'
                AND itemtype = 'acdvd')
        THEN 1 ELSE 0
    END) AS season8
    FROM spl_2016.inraw
    WHERE
    EXTRACT(YEAR_MONTH FROM cout) != '202002'
    GROUP BY year_months
LIMIT 500;

1.2 Data analysis
1.2.1 Tidy the dataset.
The monthly checkout percentage of the novels related to A Song of Ice and Fire was calculated. The same pre-processing procedure was applied to the DVDs of the HBO TV series. I calculated the monthly percentage instead of using raw checkouts because, in project 1, my classmates found similar checkout trends for other items. Thus, I wanted to control for total checkouts, to exclude the possibility that people simply borrow fewer items in general in recent years. The monthly percentages were then loaded into RStudio to develop two separate time series models, one for the novels and one for the DVDs.

1.2.2 Missing values for 02/2018 and 01/2018
Before model development, it was noted that there are missing values for January and February in 2018. Thus, I used an interpolation function to estimate the values for these two months (See details in the .rmd file).
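
The actual interpolation was done in R (see the .rmd file). As an illustration only, a linear interpolation of interior gaps could look like this in JavaScript. The function name and the sample numbers are made up, and the R function used may interpolate differently.

```javascript
// Fill runs of null values by linear interpolation between the nearest
// known neighbours. Leading or trailing gaps are left as null.
function interpolateGaps(series) {
  const out = series.slice();
  let lastKnown = -1; // index of the most recent non-null value
  for (let i = 0; i < out.length; i++) {
    if (out[i] === null) continue;
    if (lastKnown >= 0 && i - lastKnown > 1) {
      const step = (out[i] - out[lastKnown]) / (i - lastKnown);
      for (let j = lastKnown + 1; j < i; j++) {
        out[j] = out[lastKnown] + step * (j - lastKnown);
      }
    }
    lastKnown = i;
  }
  return out;
}

// Made-up monthly values; the two nulls stand in for January and February 2018.
const filled = interpolateGaps([12, null, null, 18]);
```

With these sample values the two missing months become 14 and 16, evenly spaced between the known neighbours.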

1.2.3 Time-series model development
According to time series theory, the variance in some types of time series can be decomposed into three parts. The first part is the main trend: a generally decreasing, increasing, or quadratic trend over the years. The second part is seasonality: monthly checkouts vary with the time of year. For example, people may borrow more novels or DVDs in winter because of Christmas. The last part is the random component: trivial and dynamic factors that cause fluctuations in monthly checkouts. For example, please see the variation decomposition figure for novels below.
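
The three parts can be sketched with a classical additive decomposition (value = trend + seasonal + random). This is an illustration in JavaScript on made-up numbers, assuming an even season length; it is not necessarily the model fitted in the .rmd file.

```javascript
// Additive decomposition. `period` is the season length (12 for monthly data)
// and is assumed to be even.
function decompose(series, period) {
  const n = series.length;
  const half = period / 2;
  // Trend: centred moving average with half weights on the two end points.
  const trend = series.map((_, i) => {
    if (i < half || i >= n - half) return null; // not enough neighbours
    let sum = 0.5 * series[i - half] + 0.5 * series[i + half];
    for (let k = i - half + 1; k < i + half; k++) sum += series[k];
    return sum / period;
  });
  // Seasonal: mean detrended value per position in the cycle, centred on 0.
  const sums = new Array(period).fill(0);
  const counts = new Array(period).fill(0);
  series.forEach((v, i) => {
    if (trend[i] === null) return;
    sums[i % period] += v - trend[i];
    counts[i % period] += 1;
  });
  const means = sums.map((s, k) => (counts[k] ? s / counts[k] : 0));
  const offset = means.reduce((a, b) => a + b, 0) / period;
  const seasonal = series.map((_, i) => means[i % period] - offset);
  // Random: whatever trend and seasonality do not explain.
  const random = series.map((v, i) =>
    trend[i] === null ? null : v - trend[i] - seasonal[i]
  );
  return { trend, seasonal, random };
}

// Made-up series: a rising line plus a repeating 4-step pattern.
const pattern = [1, -1, 2, -2];
const demo = Array.from({ length: 16 }, (_, i) => i + pattern[i % 4]);
const parts = decompose(demo, 4);
```

On this synthetic series the trend recovers the rising line, the seasonal part recovers the repeating pattern, and the random part is close to zero, because no noise was added.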
2. 3D Visualization

2.1 Initial Sketch
I would like to show the data differently in each layer. The layers for the raw data will be bar graphs of the monthly checkout percentage based on the data retrieved from MySQL. (The values for January and February 2018 have been interpolated.)

In the layers for the time series model, the thicker line shows the main trend, the two bounds near the main trend show the random component, and the line spiraling along the year axis shows the seasonality.

2.2 Screenshots
The general structure of the visualization, which consists of 4 layers.
Users can use the "Change Color Solution" button to switch among 3 color schemes.
Users can view the structure from different perspectives by pressing different buttons; the instructions are at the bottom left of the screen.
Users can use the checkbox to turn on/off the random dots bouncing between the bounds of the random component.
2.3 Interactivities
Other checkboxes were added to turn different labels, such as year, on or off.
If the mouse moves to the left side of the screen (close to the buttons and checkboxes), the instructions below will show up. When the mouse moves away, the instructions disappear. I added this feature to keep the canvas tidy.

3. Interpretation and Improvement
The result shows that the TV series did help to boost novel reading at the beginning, but after season 4 the interest in reading the novels went down. Also, people liked to borrow the DVDs when they were newly released, but the interest in borrowing them did not last very long.
Both novel reading and DVD watching show seasonality, but it contributes only a small amount of the fluctuation. In general, people are still more interested in watching DVDs than reading novels. The random component accounts for more of the fluctuation in novel reading than in DVD watching.

This visualization is far from perfect. First, it just combines several 2D visualizations instead of taking full advantage of 3D space. Second, it would be better to show the connection and relationship between novels and DVDs as opposed to plotting them in different layers.
Last edited by chuanxiuyue on Thu Mar 19, 2020 5:38 pm, edited 11 times in total.

Posts: 3
Joined: Thu Jan 09, 2020 4:46 pm

Re: Proj 2 - 3D Visualization

Post by boningdong » Wed Feb 12, 2020 5:52 pm

In this assignment, I wanted to explore how the popularity of different programming languages has changed over the past years. Using the Seattle Public Library checkout records, I fetched, classified, and visualized the records to analyze the change in popularity of a specific language, and also tried to see whether there is any relationship between the trends of different programming languages.

Project Links
Project online demo:
Source Code:
or download the attachment:
3D Visualization Project Source Code by

In addition to controlling the camera angle with the mouse, you can also navigate the camera using the 'w', 'a', 's', 'd' keys to move forward, left, backward, and right. 'spacebar' and 'ctrl' translate the camera up and down. Arrow Up and Arrow Down tilt the camera.

I searched for the names of different programming languages to fetch the corresponding data from the MySQL server. However, the results obtained this way include non-programming materials; for example, many non-programming books contain the word “ruby”. So I also check that the Dewey class is between 000 and 006, the Dewey class numbers assigned to computer science books. Here is the query I used to fetch all of the data.

Code: Select all

# Coding books
# Assumed: the SELECT, FROM and WHERE keywords, the title column and the table name
# are not in the post as archived.
SELECT
    title,
    COUNT(cout) AS checktimes,
    YEAR(cout) AS years,
    MONTH(cout) AS months,
    DAY(cout) AS days
FROM
    spl_2016.inraw
WHERE
    (itemtype LIKE '%bk')
        AND (deweyClass != ' ')
        AND (deweyClass != '')
        AND (deweyClass BETWEEN 000 AND 006)
        AND ((LOWER(title) LIKE '%python%')
        OR (LOWER(title) LIKE '%c++%')
        OR (LOWER(title) LIKE '% c %')
        OR (LOWER(title) LIKE '%swift%')
        OR (LOWER(title) LIKE '%javascript%')
        OR (LOWER(title) LIKE '%java%')
        OR (LOWER(title) LIKE '%php%')
        OR (LOWER(title) LIKE '%cpp%')
        OR (LOWER(title) LIKE '% sql %')
        OR (LOWER(title) LIKE '%kotlin%')
        OR (LOWER(title) LIKE '%ruby%'))
GROUP BY deweyClass , title , years , months, days
ORDER BY years , months, days ASC

To show the popularity of a specific programming language, I also need to classify the results and sum the checkout counts for each language. Doing that in SQL is too hard, so I decided to process this in JavaScript. I create a data matrix that holds the daily checkout counts for each language. Here is the code that handles book classification.

Code: Select all

// retrieve data from table and classify the data.
    for (var i = 0; i < num_rows; i++) {
        var langIdx = -1;
        var title = dataset.getString(i, tableIdx.title);
        var checkoutTimes = dataset.getNum(i, tableIdx.times);
        var year = dataset.getNum(i, tableIdx.year);
        var month = dataset.getNum(i, tableIdx.month);
        // tableIdx.day is an assumed name: the argument is missing in the post as archived.
        var day = dataset.getNum(i, tableIdx.day);
        title = title.toLowerCase();
        // pad with spaces so that ' c ' or ' sql ' also match at the start or end of a title
        title = ' ' + title + ' ';

        //['Python', 'C/C++', 'Swift', 'Javascript', 'Java', 'PHP', 'SQL', 'Kotlin', 'Ruby'];
        if (title.includes(' python '))
            langIdx = langIdxList.python;
        else if (title.includes('javascript'))
            langIdx = langIdxList.javascript;
        else if (title.includes('java'))   // checked after 'javascript' so that it does not capture it
            langIdx = langIdxList.java;    // assumed name: the value is missing in the post as archived
        else if (title.includes(' php '))
            langIdx = langIdxList.php;
        else if (title.includes(' sql '))
            langIdx = langIdxList.sql;
        else if (title.includes(' kotlin '))
            langIdx = langIdxList.kotlin;
        else if (title.includes(' swift '))
            langIdx = langIdxList.swift;
        else if (title.includes('ruby'))
            langIdx = langIdxList.ruby;
        else if (title.includes(' c++ ') || title.includes(' c '))
            langIdx = langIdxList.ccpp;

        if (langIdx == -1) {
            print("Cannot classify the book based on its title.");
            print("Title: " + title);
            continue; // assumed: skip rows that match no language
        }
        datasetMatrixDays[getYearIdx(year)][getMonthIdx(month)][getDayIdx(day)][langIdx] += checkoutTimes;
    }

Basically, it checks whether the title includes any keyword that indicates which programming language the book is about, and then adds the count to the cell determined by the date and the language.

Design Concept
To fully utilize the extra dimension in 3D space, I use the angle to represent the month and the height to represent the year, so overall the data follows a helix pattern.
To differentiate the programming languages, I colored most of them based on their logo color, and each language takes its own radius as its track so that they do not collide with each other.
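
A minimal sketch of such a helix mapping in plain JavaScript. The function name and all constants (first year, height per year, radii) are illustrative assumptions, not the values used in the project.

```javascript
// Position of one data point on a helix: the angle encodes the month,
// the height encodes the year, and the radius separates the languages.
function helixPosition(year, month, langIdx, opts = {}) {
  const {
    firstYear = 2008,  // assumed start of the data
    yearHeight = 100,  // vertical distance covered in one year
    baseRadius = 150,  // radius of the innermost track
    trackGap = 20,     // distance between language tracks
  } = opts;
  const angle = ((month - 1) / 12) * 2 * Math.PI; // January at 0 rad
  const radius = baseRadius + langIdx * trackGap; // one track per language
  // Height rises continuously so that December meets the next January.
  const height = (year - firstYear + (month - 1) / 12) * yearHeight;
  return {
    x: radius * Math.cos(angle),
    y: -height, // the y axis points down in Processing and p5.js
    z: radius * Math.sin(angle),
  };
}

const start = helixPosition(2008, 1, 0);  // first month, innermost track
const later = helixPosition(2009, 4, 2);  // April 2009, third track
```

Each returned position can then be passed to translate() before drawing the circle or torus for that day or month.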

The daily checkouts should be displayed using circles or tori so that the viewer can see the result from any angle.

Also, the user should be able to select the years and languages to be shown, so that the languages are easy to compare.

Final Design & Screenshots
This shows the data from 2008 to 2018 for all the programming languages.
Screenshots 5.png
Users can select only one year to show, which makes it easier to visualize the relative popularity of these languages over one year.
Screenshots 1.png
Seen from the top, the daily checkouts over the years are also clearly visible, because the circle diameter indicates the number of checkouts.
Screenshots 2.png
Users can turn off other languages to show only the languages they want to compare. In the following picture, I turned on only ‘Python’ and ‘C/C++’, so we can see the trend of these two languages over the years. Python is clearly becoming more popular, while C/C++ shows the opposite trend.
Screenshots 3.png

Future Improvement
One problem with p5 and Processing is that the scene is built by the CPU. Even though the actual rendering is handled by the GPU, the CPU takes a lot of time to create the 3D objects and calculate their locations on every frame. In my program the frame rate is quite low because so many data points need to be displayed. I tried to optimize the code, but because I cannot avoid the CPU drawing stage inside the draw() function, I could not speed it up. For this amount of data, it would probably be better to use low-level WebGL or OpenGL to render the 3D objects after they are first initialized by the CPU. But this solution probably puts too much burden on the programmer, especially one who is not experienced with computer graphics.

It is also probably a good idea to add an orthographic camera mode because, without perspective effects, users can view the data within one year and compare the relative popularity easily. I tried to implement this feature with a button called perspective, but after switching from perspective mode to orthographic mode the camera does not work properly. Using two camera systems might solve this issue.
Boning's 3D Visualization Project Report.pdf
Screenshots 4.png
Last edited by boningdong on Thu Feb 13, 2020 3:56 pm, edited 1 time in total.

Posts: 3
Joined: Wed Jan 08, 2020 10:50 am

Re: Proj 2 - 3D Visualization

Post by yuleiyuan » Thu Feb 13, 2020 12:26 pm

Concept Description

I was interested in the checkout status of Dewey classes 520-529. I wanted to compare the sub-classes to see which of them is the most popular among readers. The data that I obtained contains the total number of checkouts, the Dewey classes, and all checkout-related items from 2006 to 2018.

SQL Queries

Below is my SQL query for obtaining the data:

Code: Select all

SELECT YEAR(cout) as year, 
MONTH(cout) as month, 
SUBSTRING(deweyClass, 1, 3) as dewey, 
Counts  FROM (
    -- Assumed: the inner SELECT list, FROM, WHERE and the table name
    -- are not in the post as archived.
    SELECT cout, deweyClass, bibNumber, title,
    COUNT(bibNumber) AS Counts
    FROM spl_2016.inraw
    WHERE
    YEAR(cout) BETWEEN 2006 AND 2019
        AND  deweyClass >= 520 AND deweyClass < 530 AND (deweyClass IS NOT NULL) AND deweyClass NOT LIKE ' ' 
        AND deweyClass NOT LIKE ''
GROUP BY cout, deweyClass, bibNumber , title
ORDER BY Counts DESC,deweyClass ASC, bibNumber , title) as tbl

First Draft:

The first draft is a simple point plot. The x axis is the year, from 2006 to 2019, and the y axis is the month. Each point refers to the checkout of one single item. Further revisions will follow, since this graph did not reveal the information that I wanted.
Screen Shot 2020-02-13 at 11.58.52 AM.jpg

Posts: 3
Joined: Fri Jan 10, 2020 10:22 am

Re: Proj 2 - 3D Visualization

Post by guanyuchen » Thu Feb 13, 2020 1:24 pm

Project 2 is an extension of my Project 1. The goal of this project is to compare true and predicted checkouts in different Dewey classes from the Seattle Public Library database.

  • Deal with Missing data
  • Rescale different dewey classes data into same range
  • Predict future data
  • Use checkouts from 2006 to 2010 as training dataset
  • Make predictions for checkouts from 2011 to 2019 for every month
I mapped my data onto a sphere with coordinates X (Year), Y (Month), Z (Counts). Based on all data points, it is possible to draw smooth curves through the points for months and years in order to see how the difference between true checkouts and predicted checkouts changes over time.
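
The rescaling step in the list above could be done with a min-max normalization, sketched here in JavaScript on made-up counts. Whether the project used this exact formula is an assumption.

```javascript
// Min-max rescaling of one Dewey class's monthly counts into [0, 1],
// so that classes of very different size become comparable.
function rescale(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0); // flat series: avoid dividing by zero
  return values.map((v) => (v - min) / (max - min));
}

// Made-up counts for a small and a large class.
const smallClass = rescale([2, 4, 6]);
const largeClass = rescale([2000, 2500, 4000]);
```

After rescaling, both classes run from 0 to 1, so their shapes can be drawn on the same axis even though the raw counts differ by three orders of magnitude.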

After receiving useful feedback from my classmates, the TA, and Prof. Legrady, I revised my first draft to add more labels.

Here is an example:

Further Studies:
Last edited by guanyuchen on Mon Mar 16, 2020 9:48 pm, edited 1 time in total.

Posts: 3
Joined: Wed Jan 08, 2020 11:00 am

Re: Proj 2 - 3D Visualization

Post by jingxuan » Thu Feb 13, 2020 2:51 pm

Concept description:
I am interested in visualizing the data for 4 kinds of entertainment (General Music, Indoor Games, Sports, Shows) from 2006 to 2019. Therefore, I used MySQL to collect data from the Seattle Public Library. I also separated the data set by month, from January to December.
Here is a part of the data:
Screen Shot 2020-02-13 at 14.17.48.png

MySQL Queries:

Code: Select all

    -- Assumed: the SELECT, FROM and WHERE keywords and the table name
    -- are not in the post as archived.
    SELECT
    YEAR(cout) AS year,
    MONTH(cout) AS month,
    SUM(IF(deweyClass >= 780 and deweyClass < 781, 1, NULL)) AS 'General music',
    sum(IF(deweyClass >= 793 and deweyClass < 796, 1, NULL)) AS 'Indoor game',
    sum(IF(deweyClass >= 796 and deweyClass < 800, 1, NULL)) AS 'Sports',
    sum(IF(deweyClass >= 791 and deweyClass < 792,1, NULL)) AS 'Shows'
    FROM spl_2016.inraw
    WHERE
    (deweyClass > 750 AND deweyClass < 900)
        AND YEAR(cout) BETWEEN 2006 AND 2019
GROUP BY year , month
ORDER BY year , month

Screenshots and analysis:
Screen Shot 2020-02-13 at 14.33.39.png
When the user opens the visualization, the above image is shown. Each circle is split into 12 pieces, and each piece represents a month from Jan to Dec. The data for each month is shown on the corresponding line. Each color in the graph indicates a kind of entertainment, listed in the top right corner. The translucent effect gives the user an initial overview of the data set.
Screen Shot 2020-02-08 at 21.13.55.png
This image shows the graph after the user rotates it with the mouse to view it from a different direction. Each layer of the graph and the change in the data across years are clearer from this angle.
Screen Shot 2020-02-08 at 21.14.28.png
The left-hand side buttons change from white to gray when the user hovers the mouse over them. Clicking a button limits the data to the corresponding year. Because there are many layers of graphs, this function lets the user see the data more clearly.

Here is the attachment of code:

Posts: 2
Joined: Wed Jan 08, 2020 10:48 am

The Embedded Spur

Post by jbevan » Thu Feb 13, 2020 4:01 pm

Please Check out the Documentation. C:

This project is about the evolution of discourses over the years - with some machine intelligence used for data imputation and semantic definition.
Bevan 3D Project Documentation.pdf

Posts: 3
Joined: Wed Jan 08, 2020 11:02 am

Re: Proj 2 - 3D Visualization

Post by erinpwoo » Thu Feb 13, 2020 5:25 pm

Project Idea

After watching the movie Hereditary, I was inspired to investigate trends and patterns in checkouts related to the paranormal and the occult. Many projects have previously examined popular topics such as science and religion, but I was fascinated by this genre since it belongs to neither of these categories. Sometimes, we experience things that cannot be explained by science-- so, when do we turn to the supernatural for explanation?

The project visualizes checkout data from Dewey classes 130-139, which are topics on parapsychology and the occult. Each moving point represents a single title and each ring represents a single month. The speed and size of the point correlate with the checkout count for that month, while the volume of points on a single ring represents the total volume of checkouts in these Dewey classes for that given month. The rings are organized in chronological order, so the user can scroll through time to see trends in this topic over the course of the dataset’s lifetime. They can also pause the sketch to mouse over individual points and see the title's details.


Visually, I found that many occult beliefs and traditions involved circle-shaped symbols and astrological imagery. An example of this type of symbol would be a pentacle. After some brainstorming, I was figuring out ways to take this shape and draw it into a 3D space. Since there is a lot of occult imagery related to stars and planets, I designed the data points to appear as orbiting planets, with each of the rings representing each month from February 2006 - present. I wanted the user to feel as if they are "travelling through time" as they zoom forwards and backwards inside the rings.

Here are some of the preliminary sketches I made:
Screen Shot 2020-02-13 at 4.34.01 PM.png
Additionally, as I was developing the initial design, I realized that my design was very similar to Flying Lotus's album cover for Cosmogramma, which I was subconsciously thinking about when brainstorming ideas.


The query for this dataset was very simple. The only data I needed was the checkout date of each title, the Dewey class and the checkout counts. The results were then returned in chronological order to make it easier to loop through the data when drawing it in the sketch.

Code: Select all

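	-- Assumed: the SELECT, FROM and WHERE keywords and the table name
	-- are not in the post as archived.
	SELECT
	deweyClass, bibNumber, title, DATE(cout), COUNT(bibNumber)
	FROM spl_2016.inraw
	WHERE
	deweyClass < 140 AND deweyClass >= 130
 GROUP BY title, bibNumber
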
How to Use

To view details for a single title, you can pause the sketch by holding down the spacebar and mousing over the point to reveal the popup box with the title name and the checkout count. To turn off a category in the sketch, you can click on the colored box for that category. Click on the box again to make it appear back in the sketch.

Final Result

The project was done in Processing for Java. The source code for the project is uploaded on GitHub and can be downloaded locally.
  • GitHub:
  • Download:

Here is what the final product looks like:
Screen Shot 2020-02-13 at 5.05.09 PM.png
Screen Shot 2020-02-13 at 5.05.50 PM.png
Screen Shot 2020-02-13 at 5.06.13 PM.png
Screen Shot 2020-02-13 at 5.11.15 PM.png
Screen Shot 2020-02-13 at 5.14.02 PM.png
Challenges and Future Improvements

One of the things I'd like to improve upon in this project would be the label placement for the years. I'm thinking of implementing new labels for each ring that, as a whole, form a sort of spiral pattern so it's easier to see the individual labels. Dealing with text sizing and quality proved to be a challenge, as the resolutions vary greatly depending on the mode and style of the text. Hopefully I can find some way around this issue so the labels are easier to read.

Additionally, I would like to change certain settings in PeasyCam to make the zooming more intuitive. I will most likely slow down the speed of the scroll as well as increase the zoom distance.

Posts: 3
Joined: Wed Jan 08, 2020 10:54 am

Re: Proj 2 - 3D Visualization

Post by evgenynoi » Sat Feb 15, 2020 3:19 pm

My main goal in visualizing the Seattle Public Library (SPL) data was to bring locational information into the investigation. In my previous project I identified and joined locational attributes of 11,000 books to include information on the branch where the books are stored. Another crucial element of the proposed visualization was generating the representation at the finest level of detail. That is, the base unit of analysis is the most basic one: individual check-ins or check-outs. Overall, around 130,000 identifiable check-ins/outs over 3 years (2012-12-31 – 2015-12-31) were found in the database for the 11,000 books, followed by a simple exploratory analysis. Here my first design decision was to switch from p5.js to standalone Processing due to the volume of the data.

Code: Select all

select *
from spl_2016.inraw
where (cout>'2012-12-31' and cout<'2015-12-31') and
	CONCAT(bibNumber,collcode,itemtype) in ('261cs9rarbk',
</.../> -- insert unique identifiers from the attached sql file (11,000 lines of code). 
Data Wrangling and Visual Variables
Because the granular analysis was based on individual check-in trajectories in time, I decided to project each individual book on the horizontal (X) axis. A new variable ‘ranker’ was generated as a re-ordered version of the barcodes (see code below), where the books are sorted according to the branch in which they are stored. This produces visible groupings and makes patterns in the data easier to distinguish.

Code: Select all

# generate unique identifier as reordered version of barcodes
dur5['ranker'] = dur5.barcode.rank(method='dense')
To identify the duration of the book loan, I calculated the difference between the check-in and check-out dates. The loan of a book is a continuous process, and as such it is traditionally represented as a line. To locate the line on the vertical (Y) axis in relation to other books, a new scale of days from min(check-out) to max(check-in) was generated. While the mean loan duration is 16 days, there are 48,000 transactions with a loan period above the mean. After generating several visualizations, and in order to minimize clutter on the canvas, the filtering duration threshold was set to 75.
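
The duration calculation and the filter can be sketched as follows in JavaScript. The rows, the field names, the unit of the threshold (days) and the choice to keep loans at or below the threshold are all assumptions for illustration; the post does not state which side of the threshold was kept.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Loan length in whole days between check-out and check-in timestamps.
function loanDays(cout, cin) {
  return Math.round((Date.parse(cin) - Date.parse(cout)) / MS_PER_DAY);
}

// Hypothetical rows (made-up barcodes and dates).
const loans = [
  { barcode: "A", cout: "2013-01-02T00:00:00Z", cin: "2013-01-18T00:00:00Z" },
  { barcode: "B", cout: "2013-03-01T00:00:00Z", cin: "2013-06-30T00:00:00Z" },
];

const THRESHOLD_DAYS = 75; // assumed to be measured in days

// Attach the duration to each row, then drop the loans above the threshold.
const kept = loans
  .map((loan) => ({ ...loan, days: loanDays(loan.cout, loan.cin) }))
  .filter((loan) => loan.days <= THRESHOLD_DAYS);
```

With these sample rows, loan A lasts 16 days and is kept, while loan B lasts 121 days and is filtered out.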
Finally, I decided to add time of day as a 3rd dimension of the visualization, with a minimum at the opening hour of the library and a maximum at the closing time. In doing so, I added curvature and converted the simple straight-line representations into curves built with curveVertex() (in Processing terminology), where the degree of curvature depends on the middle point, which was warped on the y-axis (see the code in the project folder for further details). In the last step, I added points at the start and end of a book loan to better differentiate the duration.
The whole creative process was more inductive in its manner. I was merely letting the inputs manifest their structure in the visual forms of 3d cube, which provided a more easily navigable space for data visualization and exploration. In the end, I tweaked a few things here and there, to accentuate certain patterns in the data, but this was merely an exercise in developing my programming skills in data visualization.

Aesthetics and Controls
Various colors were used to denote the five locations with the most check-ins/outs: Central branch, Northeast branch, Southwest branch, Lake City branch, and Douglass-Truth branch. Each location was assigned a key-controller. Check-ins from every other location are shown in grey so as not to distract from the first five.

Future Developments
Given enough time and modest progress in Processing coding, I could utilize bundling techniques to group the similar trajectories closer in space. Additionally, it would be interesting to add animation along the lines in the form of an electric charge running from the top to the bottom as the time progresses.

Posts: 4
Joined: Wed Jan 08, 2020 10:55 am

Re: Proj 2 - 3D Visualization

Post by ziyanlin » Thu Feb 20, 2020 3:32 pm

Project Description

This is a 3D data visualization project built with MySQL and the P5 library. With the data provided by Professor George Legrady, I was able to access some of the data from the Seattle Public Library. I chose the check-in and check-out data from 2018 to visualize the flow trends of different Dewey classes in the library. I chose to use Bezier curves to express different meanings in the dimensions of 3D space.


At first, I tried to collect all the data from 2006 to 2018 and split it by month. But I found that the data over these years is too scattered to show in 3D space, because I needed to connect each month with its own check-ins and check-outs, which would give 12 (months per year) * 12 (months per year) * 13 years of connections. Though p5 can handle this data easily, the result would be too complex for the audience to read. I chose to use the 2018 data set and represent each month in order. Here is the MySQL code for collecting check-in and check-out data for each month in 2018, with each Dewey class divided into its own part. This is just a sample of the code: I have to adjust it every time I change the month, so it differs each time.

Code: Select all

 -- Assumed: the SELECT, COUNT(CASE, FROM and WHERE keywords and the table name
 -- are not in the post as archived.
 SELECT
 COUNT(CASE
 WHEN deweyClass > 770 AND deweyClass < 771 THEN 1
 END) AS D000to099,
 COUNT(CASE
 WHEN deweyClass > 771 AND deweyClass < 772 THEN 1
 END) AS D100to199,
 COUNT(CASE
 WHEN deweyClass > 772 AND deweyClass < 773 THEN 1
 END) AS D200to299,
 COUNT(CASE
 WHEN deweyClass > 773 AND deweyClass < 774 THEN 1
 END) AS D300to399,
 COUNT(CASE
 WHEN deweyClass > 774 AND deweyClass < 775 THEN 1
 END) AS D400to499,
 COUNT(CASE
 WHEN deweyClass > 775 AND deweyClass < 776 THEN 1
 END) AS D500to599,
 COUNT(CASE
 WHEN deweyClass > 776 AND deweyClass < 777 THEN 1
 END) AS D600to699,
 COUNT(CASE
 WHEN deweyClass > 777 AND deweyClass < 778 THEN 1
 END) AS D700to799,
 COUNT(CASE
 WHEN deweyClass > 778 AND deweyClass < 779 THEN 1
 END) AS D800to899,
 COUNT(CASE
 WHEN deweyClass > 779 AND deweyClass < 780 THEN 1
 END) AS D900to999
 FROM spl_2016.inraw
 WHERE
    deweyClass >= 0 and 
    year(cout) = '2018'

Visualization Design

All the points represent the number of check-ins or check-outs per month in 2018. The size of each point is controlled by the number of check-ins or check-outs for that month. For each Dewey class, I chose a different color and control point to draw the curve from the start point. The start and end points are always the points that represent the check-out month and the check-in month. Each major Dewey class (0-99, 100-199, 200-299, etc.) has its own control point to direct the curve.
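
As an illustration of how one control point directs a curve between a check-out point and a check-in point, here is a quadratic Bezier evaluation in plain JavaScript. The coordinates are made up, and the project itself may use a different Bezier variant (for example the cubic bezier() function in p5, which takes two control points).

```javascript
// Point at parameter t (0..1) on a quadratic Bezier curve in 3D.
function quadBezier(p0, ctrl, p1, t) {
  const a = (1 - t) * (1 - t); // weight of the start point
  const b = 2 * (1 - t) * t;   // weight of the control point
  const c = t * t;             // weight of the end point
  return {
    x: a * p0.x + b * ctrl.x + c * p1.x,
    y: a * p0.y + b * ctrl.y + c * p1.y,
    z: a * p0.z + b * ctrl.z + c * p1.z,
  };
}

// Made-up positions: check-out month, class control point, check-in month.
const checkOut = { x: 0, y: 0, z: 0 };
const control = { x: 100, y: 200, z: 50 };
const checkIn = { x: 200, y: 0, z: 0 };

// Sample the curve at 11 evenly spaced parameters for drawing as a polyline.
const curvePoints = [];
for (let step = 0; step <= 10; step++) {
  curvePoints.push(quadBezier(checkOut, control, checkIn, step / 10));
}
```

The curve starts at the check-out point, ends at the check-in point, and bends toward the control point without passing through it, so giving each major class its own control point pulls its curves into a distinct region of the space.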
