3D Visualization

glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

3D Visualization

Post by glegrady » Wed Jan 04, 2017 6:27 pm

In this assignment, we are exploring interactive 3D volumetric visualization.

Use the full capabilities of 3D where each x,y,z location represents data. You may map data directly into 3D space based on the 4 datasets, or explore clustering algorithms as a means of organizing the data in 3D space.

Feb 16: Work-in-Progress
Feb 23: Project is Due

Query
Your query will need to have minimum 4 columns for x,y,z location and then to color or scale the pixel/voxel. Your MySQL query should be more complex then previous ones and cross the full spectrum of the database. As usual, you are expected to come up with an interesting question that will give interesting results! Your query should use MySQL syntax that goes beyond those presented in the demos, and the query should go for granular detail, so mapping main Dewey categories will not be sufficient, nor fulfill the expectation of an interesting query. "Change over Time", correlating data, or spatially distributing data based on algorithms can be interesting to explore. The important thing is that the x,y,z location of where the data is placed should have meaning.

--------------
Libraries to use:
3D spatial navigation and interactivity is introduced using the PeasyCam library: http://mrfeinberg.com/peasycam/

ControlP5 is a GUI and controller library that includes functions like Sliders, Buttons, Toggles, Knobs, Textfields, RadioButtons, etc. http://www.sojamo.de/libraries/controlP5/

OpenGL (Open Graphics Library), a cross-platform graphics interface is used in this demo. Translate, pushmatrix, popmatrix functions are introduced. Information about push, pop and translation can be found at:
https://www.processing.org/tutorials/transform2d/
http://processing.org/tutorials/p3d

--------------
Now that we have acquired all the necessary skills, the

EVALUATION CRITERIA will be the following:

Innovation in content : your query question and outcomes. How original, engaging, unusual, your query, or your approach to the query may be, and how interesting the data may be. The data has to be multivariate, and granular (meaning a lot of data) so that we can see patterns forming in the data.

Innovation in design/form : The design needs to go beyond our demos. Areas of exploration are in how you use space, form, colors, data organization, timing, interaction, coherence, elegance, etc.

Edward Tufte’s “Visual Display of Quantitative Information” https://www.amazon.com/Visual-Display-Q ... 1930824130 is a classic reference

https://www.edwardtufte.com/tufte/

Computation : The third evaluation is the computational component. First of all, the code needs to work. Special consideration will be for unusual, elegant expression, utilizing functions, algorithms, etc that you can introduce to the class.
George Legrady
legrady@mat.ucsb.edu

wolfe
Posts: 5
Joined: Wed Jan 11, 2017 10:43 am

Re: 3D Visualization

Post by wolfe » Wed Feb 15, 2017 9:11 pm

I wanted to look at mapping fiction titles. I ended up using the same query (fiction that was only checked out once) but I did further analysis on the items titles. I analyzed them using tensorflow's implementation of word2vec (https://www.tensorflow.org/tutorials/word2vec). Word2Vec converts words into multidimensional vectors based on their relationship with other words. I built a skip-gram neural network, trained it using gradient descent, and then brought down the vectors to 3 dimensions using t-SNE (t-Distributed Stochastic Neighbor Embedding).

I then created geometry for each title based on the words as vertices. For longer titles I just mapped the first 4 words. You can toggle off the geometry to view only the word map. I kept the same keyboard functionality and color scheme from the previous assignment.

I used the demo'd text highlighting and modified the code so it was bill-boarded, staggered, and drawn on top of the geometry.

There is still a lot of room to explore the best way to display this data and interact with it. Also I would like to do this with all of the fiction titles, but the query keeps timing out.
Screenshot from 2017-02-15 20-42-10.png
Screenshot from 2017-02-15 20-41-34.png
Screenshot from 2017-02-15 20-39-46.png
Screenshot from 2017-02-15 20-10-23.png
Last Assignment's query results:
output.csv
(1.72 MiB) Downloaded 227 times
t-SNE output from Word2Vec analysis
tsne-justTitle-onlyOnePerBook.csv
(677.23 KiB) Downloaded 235 times
-----------------------------------------------------------------------------------------------------------------------------------
UPDATE
-----------------------------------------------------------------------------------------------------------------------------------
I added a controlp5 interface in the corner to easily be able to choose filters and viewing style. This also doubles as a legend.

I added a title vector view where each title is the average of all the word vectors in the title.

I added the ability to select a word or title using the arrow keys. When the word/title is selected, the nearby words/titles are also highlighted. This allows the viewer to see what words/titles are clustered together. When in title view, the selected title's path also appears. When in word view, all the titles that use the selected word's paths are drawn. The titles are also written to the left.
-------------------------------------------------------------------------------------------------------------------------------------
The project can be viewed where each title is a vector.
Screenshot from 2017-02-22 15-39-36.png
A title can be selected and the title path is drawn above it.
Screenshot from 2017-02-22 15-40-05.png
Screenshot from 2017-02-22 15-40-33.png
The titles can be filtered by category.
Screenshot from 2017-02-22 15-44-30.png
-------------------------------------------------------------------------------------------------------------------------------------
The project can be viewed where each word is a point.
Screenshot from 2017-02-22 15-40-57.png
When a word is selected all the title paths are drawn that connect with that word.
Screenshot from 2017-02-22 15-41-31.png
Screenshot from 2017-02-22 15-41-41.png
-------------------------------------------------------------------------------------------------------------------------------------
The project can be viewed where each title is a polygon.
Screenshot from 2017-02-22 15-42-13.png
When a word is selected all the titles paths are drawn that connect with that word brighter.
Screenshot from 2017-02-22 15-42-25.png
Here is the same word selected without the polygons.
Screenshot from 2017-02-22 15-42-44.png
-------------------------------------------------------------------------------------------------------------------------------------
Source code:
HannahWolfe3DAssignment.zip
(12.61 MiB) Downloaded 157 times
Last edited by wolfe on Thu Feb 23, 2017 9:50 am, edited 2 times in total.

sara.lafia
Posts: 5
Joined: Wed Jan 11, 2017 10:41 am

Autodidact Diaries: 3D Visualization

Post by sara.lafia » Thu Feb 16, 2017 8:11 am

Concept
This is a visualization of all reference books checked out from the Seattle Public Library since 2005. Library patrons often borrow reference books to learn new skills, so these checkouts serve as a suitable proxy for better understanding what people are teaching themselves, when they are learning, and for how long they continue. While reference books have free-text subject descriptions, there is a need to topically model books in a systematic way in order to draw such conclusions. Regions, or bands in this case, of topics emerge from the book subjects, which correspond loosely to the Dewey Decimal classifications.

Dimensions
The horizontal axis (x) is time (with temporal resolution of a day). The vertical axis (y) is dewey decimal classification. The reference books are organized in a theme series (z), where each slice is one topic derived from LDA analysis of the text subject descriptions and resultant 37 topic groups. The topic legend is seen on the left panel and summarizes the top three terms for each topic. The width of each rectangle in the visualization also indicates the duration of time that the reference book was borrowed.

Query
I expanded my original query significantly to include much more granular thematic and temporal information. The query returns 7,638 records in 5.559 seconds.

Code: Select all

SELECT 
    deweyClass AS Dewey,
    title AS Title,
    subject AS Subject,
    DATE(checkOut) AS CheckoutDate,
    COUNT(checkOut) AS CheckoutCount,
    TIMESTAMPDIFF(DAY, checkOut, checkIn) AS DaysCheckedOut
FROM
    spl_2016.deweyClass,
    spl_2016.transactions,
    spl_2016.title,
    spl_2016.itemType,
    spl_2016.itemToBib,
    spl_2016.subject
WHERE
	(TIMESTAMPDIFF(DAY, checkOut, checkIn) != '')
    AND deweyClass != "" 
    AND subject != ""
    AND spl_2016.deweyClass.bibNumber = spl_2016.transactions.bibNumber 
    AND spl_2016.deweyClass.bibNumber = spl_2016.title.bibNumber 
    AND spl_2016.deweyClass.bibNumber = spl_2016.subject.bibNumber 
    AND spl_2016.deweyClass.bibNumber = spl_2016.itemToBib.bibNumber 
    AND spl_2016.itemToBib.itemNumber = spl_2016.itemType.itemNumber
    AND (itemType = 'arbk') 
    AND YEAR(checkOut) >= '2005' 
    GROUP BY title 
    ORDER BY CheckoutDate
Method
The LDA algorithm was applied to reference books using the MALLET tool (http://mallet.cs.umass.edu/topics.php). Themes are generated from reference book subject theme, based on word co-occurrence. The purpose is to discover the latent themes hidden in the heterogenous free-text descriptions and annotate each document with a best fit topic. The annotations are used to organize and summarize the checkout information. Stopwords (the, and, etc.) have been removed. The threshold for tagging a document is 0.05 (default). The number of iterations is 200 (default). Heuristic for number of topics (k=7527) is 10-40 topics (fit the model w/iterations). The top ten words for each generated topic are used to generate a theme (subjective). This is done by cross-referencing best-fit books for each theme. Each reference book is assigned a theme based on its probability distribution best-fit.

Update
This project was changed substantially. New more granular data were obtained and algorithms (LDA) were applied to the subject data. Interactivity, such as mouse-over for book titles, and on-demand filtering, improve the interface usability. Future versions of this project would benefit from additional filtering (by topic, ) and by rendering the titles in a separate display box (as they are difficult to view when boxes overlap).
Attachments
export.png
screenshot 1: "How to Start a Business in WA"
export1.png
screenshot 2: "Teaching English Overseas Job Guide"
export2.png
screenshot 3: "Dictionary of Musical Terms and Concepts"
SaraLafia_3DVer2.zip
updated project data, code (version 2)
(733.39 KiB) Downloaded 147 times
Last edited by sara.lafia on Sat Feb 25, 2017 9:08 am, edited 4 times in total.

brooksjaredc
Posts: 5
Joined: Wed Jan 11, 2017 10:39 am

Re: 3D Visualization

Post by brooksjaredc » Wed Feb 22, 2017 6:46 pm

High School English Class Books:

I wanted to find some data that were related by some topic, but also had books that were checked out often. I decided high school english class books would be a good choice. I looked up the top 8 most popular books for high school english and looked at their checkout frequency. I also wanted to see if the same old copies are being used over and over, or if the library was buying new copies.

I looked up the checkout and checkin dates for these 8 books, along with the barcode for each checkout with the following query:

Code: Select all

SELECT title,
year(cout),
week(cout),
year(cin),
week(cin),
barcode
FROM spl_2016.inraw
WHERE itemType like "%bk"
AND (title = "great Gatsby"
OR title = "lord of the flies"
OR title = "Romeo and Juliet"
OR title = "to kill a mockingbird"
OR title = "of mice and men"
OR title = "scarlet letter"
OR title = "catcher in the rye"
OR title = "macbeth")
AND year(cout) > 2004
AND barcode > 10000000000
I drew circular arcs from checkout to checkin along the xaxis, with different colors for each title, and placed them on the yaxis according to their barcode. I've also added some animation that shows the dates as a wave, so the wave lights up 2005 data, then 2006, and so on. Along with that, you can toggle music so that the wave plays different notes for each title and the note are modulated by the average barcode so that low barcode have soft, pure notes, and the higher barcodes sound more distorted.
Screen Shot 2017-02-22 at 6.30.53 PM.png
On the flip-side of this visualization, I've plotted a histogram of the popularity of each title by month, and this also can do the wave animation. Both height and opacity depend on checkout popularity.
Screen Shot 2017-02-22 at 6.38.02 PM.png
There is interaction functionality to quickly flip between the visualization top and bottom by pressing 'f'. The data for this side come from the following query:

Code: Select all

SELECT DATE_FORMAT(cout,'%Y/%m') AS yearmonth,
sum(case when title = "great Gatsby" then 1 else 0 end) as greatgatsby,
sum(case when title = "lord of the flies" then 1 else 0 end) as lordoftheflies,
sum(case when title = "Romeo and Juliet" then 1 else 0 end) as romeoandjuliet,
sum(case when title = "to kill a mockingbird" then 1 else 0 end) as tokillamockingbird,
sum(case when title = "of mice and men" then 1 else 0 end) as ofmiceandmen,
sum(case when title = "scarlet letter" then 1 else 0 end) as scarletletter,
sum(case when title = "catcher in the rye" then 1 else 0 end) as catcherintherye,
sum(case when title = "macbeth" then 1 else 0 end) as macbeth
FROM spl_2016.inraw
WHERE itemType like "%bk"
AND year(cout) > 2004
AND barcode > 10000000000
GROUP BY yearmonth
ORDER BY yearmonth
Lessons Learned:
Mainly I learned a lot about the subtleties of plotting in 3D, too many to go into here. But as for the data, it is interesting to see that most of the barcodes existed since 2005, but almost half have been created since then. The rate at which the library gets new copies of the books can be seen from the top-side visualization.
Still, though, some books have old copies that still get used. And some books go in and out of popularity (Lord of the Flies).
project3a.zip
(1.24 MiB) Downloaded 208 times

griessbaum
Posts: 4
Joined: Wed Jan 11, 2017 10:40 am

Re: 3D Visualization

Post by griessbaum » Wed Feb 22, 2017 8:51 pm

Perfect title (3d)

I am interested in finding good book titles in respect to the dewey category.
I tried to answer this question by first finding the two most popular first words in a book title for each dewey category. I defined the popularity of the first word as the number of checkouts of books that start with that word. For each of these two words, I searched for the two most popular words that follow these first words. I iterated accordingly for several iterations.

It appeared to be easier to (automatically) re-execute the a query with changing parameters than writing a potentially complicated SQL query that would get all the values at once.
The formatable query to retrieve the first two words:

Code: Select all

SELECT     
    LEFT(deweyClass, 1) as dewey,
    '' as parent,
    count(id) as checkouts,    
    LOWER(SUBSTRING_INDEX(title, ' ', 1)) as nextWord
FROM
    spl_2016.inraw
WHERE
    LEFT(deweyClass, 1) = "{dewey}"
AND
    cout > "{startDate}"
GROUP BY 
    nextWord
ORDER BY 
    checkouts DESC
LIMIT 
    2;
And the formatable query to retrieve each subsequent words:

Code: Select all

SELECT     
    LEFT(deweyClass, 1) as dewey,
    LOWER('{word}') as parent,
    COUNT(id) as checkouts,    
    SUBSTRING_INDEX(SUBSTRING_INDEX(LOWER(title), '{word} ', -1),' ', 1) as nextWord
FROM
	spl_2016.inraw
WHERE
    LEFT(deweyClass, 1) = "{dewey}"
AND
    cout > "{startDate}"
AND 
	title LIKE '%{word} %' 
GROUP BY 
	nextWord
ORDER BY 
	checkouts DESC
LIMIT 
	2;
I then represented each word as a node in a 3D space. Each node is repulsed by any other node. The repulsing force hereby is proportionally to the inverse of the square of the distance. At the same time, nodes are connected through arcs to their predecessor and successor word-nodes. The arcs impose attracting forces on the nodes they are connecting. The nodes visualize their directionality through a small dot traveling alongside the arc from the predecessor to the sucessor word-node.
Since the nodes experience friction, they eventually will end up in a stable position.
snapshot_pydap4.png
snapshot
Attachments
HW4.zip
(11.61 KiB) Downloaded 186 times

christopherchen0
Posts: 5
Joined: Wed Jan 11, 2017 10:44 am

Re: 3D Visualization

Post by christopherchen0 » Thu Feb 23, 2017 3:36 am

I was interested in patterns of how checkout times and dates reflected on checkout durations. For this representative data set, I focused my ideas on three main search terms: cyber, virtual, and digital as I believed this would help to create the overall design of my visualization.
digital.png
main view
My query is as follows:

Code: Select all

SELECT title, itemNumber, DATE(cout), TIME(cout), DATE(cin), TIME(cin)
FROM inraw
WHERE year(cout) > 2004 AND title LIKE "%cyber%" OR title LIKE "%digital%" OR title LIKE "%virtual%"
GROUP BY cout, itemNumber
ORDER BY cout
For the design, I was playing around with a number of different designs and ways to represent the data. I quickly stumbled upon an interesting design using the box() method which produced results similar to a city skyline. Around this time I focused my search on those three terms as they fit the theme of the visualization.
digital3.png
top down view
I arranged the data of checkout on the Z-axis, moving further away. Along the X-axis, I had the time of the item's checkout. Finally, the Y-axis represented the checkout duration. The width and depth of the boxes was determined by their item number, which added a bit of variation.
digital2.png
side view
Each bar's color was determined by their search term. Items with "cyber" are drawn in red, "virtual" in green, and "digital" in blue. Items with multiple words are colored according to additive color rules: red and green make yellow, blue and red make magenta, and green and blue make cyan. Other functionalities like filtering out the different terms was included: you can use your keys 1, 2, 3, respectively to interact and hide them.
digital5.png
digital/blue filtered out
Attachments
digitalskyline.zip
(1.45 MiB) Downloaded 83 times

ariellalgilmore
Posts: 7
Joined: Wed Jan 11, 2017 10:40 am

Re: 3D Visualization

Post by ariellalgilmore » Fri Feb 24, 2017 6:10 pm

I was interested in knowing how the decade from 2006 to 2016 effected people's interests in the planets in our solar system. I looked at the average duration of checkout for each item that contained a planet's name in the title and the amount of times items were checked out that contained a planet's name. I also included Pluto as being one of the planets because I thought it might give interesting data points compared to the rest of the planets.

Query Code:

1st Query: Amount of Time's items were checked out with a planet's name in the title.

Code: Select all

SELECT		
year(cout),		
COUNT(CASE WHEN title LIKE '%Mercury%' THEN 1 END) as		
Mercury,
COUNT(CASE WHEN title LIKE '%Venus%' THEN 1 END) as		
Venus,
COUNT(CASE WHEN title LIKE '%Earth%' THEN 1 END) as		
Earth,
COUNT(CASE WHEN title LIKE '%Mars%' THEN 1 END) as		
Mars,
COUNT(CASE WHEN title LIKE '%Jupiter%' THEN 1 END) as		
Jupiter,
COUNT(CASE WHEN title LIKE '%Saturn%' THEN 1 END) as		
Saturn,
COUNT(CASE WHEN title LIKE '%Uranus%' THEN 1 END) as		
Uranus,
COUNT(CASE WHEN title LIKE '%Neptune%' THEN 1 END) as		
Neptune,
COUNT(CASE WHEN title LIKE '%Pluto%' THEN 1 END) as		
Pluto
FROM spl_2016.inraw		
WHERE year(cout) > 2005 and year(cout) < 2017 		
GROUP BY year(cout)		
ORDER BY year(cout);
2nd Query:

Code: Select all

SELECT		
year(cout),		
AVG(CASE WHEN title LIKE '%Mercury%' THEN datediff(cin,cout) END) as		
Mercury,
AVG(CASE WHEN title LIKE '%Venus%' THEN datediff(cin,cout) END) as		
Venus,
AVG(CASE WHEN title LIKE '%Earth%' THEN datediff(cin,cout) END) as		
Earth,
AVG(CASE WHEN title LIKE '%Mars%' THEN datediff(cin,cout) END) as		
Mars,
AVG(CASE WHEN title LIKE '%Jupiter%' THEN datediff(cin,cout) END) as		
Jupiter,
AVG(CASE WHEN title LIKE '%Saturn%' THEN datediff(cin,cout) END) as		
Saturn,
AVG(CASE WHEN title LIKE '%Uranus%' THEN datediff(cin,cout) END) as		
Uranus,
AVG(CASE WHEN title LIplaKE '%Neptune%' THEN datediff(cin,cout) END) as		
Neptune,
AVG(CASE WHEN title LIKE '%Pluto%' THEN datediff(cin,cout) END) as		
Pluto
FROM spl_2016.inraw		
WHERE year(cout) > 2005 and year(cout) < 2017 		
GROUP BY year(cout)		
ORDER BY year(cout);
Third Query: similar to the first, but including all the months

The design I decided to create a space metaphor, so the center is the first planet, Mercury, and each ring is the following planet all the way to Pluto being the last outer ring. The spheres rotating around each ring represent each year, so each ring as 10 spheres on it. The spheres are different sizes determined by the average duration each item was checked out and the rotation is determined by the amount of times they were checked out. The longer duration the bigger the radius and more times it is checked out the faster it will rotate. The years are being represented in sequential order by the colors: grey, pink, green, blue, purple, dark grey, red, dark green, dark blue, and black. I also added shapes in the center one for each planet. I am using a PShape were vectors are being made in the x,y,z axis representing month, year, amount checked out.

The first image is the view looking down at the image, so looks almost like a 2D form.
Screen Shot 2017-02-24 at 5.00.25 PM.png
The second image is the side view
Screen Shot 2017-02-24 at 5.00.49 PM.png
This is all the shapes being formed at once
Screen Shot 2017-02-24 at 5.01.47 PM.png
And this is one shape
Screen Shot 2017-02-24 at 5.01.59 PM.png
I am planning on adding labels and different buttons to filter through each mode, but have been really struggling on positioning it and stopping it from rotating. Currently, I have the keys 1 to 9 being able to filter through which year you want to show for each planet. Also you can click the letters, m(mercury, v(venus), e(earth), r(mars), j(jupiter), s(saturn), u(uranus), n(neptune), p(pluto), and 'a' for all of them.

Analysis:

When creating the different PShape's for each planet I first began by mapping all the values compared to the other planet, but the values of Earth were to overpowering, so I had to map each planet separately. This then created all similar shapes, which I thought was interesting showing that the amounts checked out compared to each planet is all very relative to the other planets. The speed for the the spheres is more dramatic in Earth and Pluto. Pluto was no longer called a planet in 2006, but it became more popular to checkout items with pluto in the title in 2009, which is interesting.
Attachments
Project3D_1_working 5.zip
(10.43 KiB) Downloaded 70 times
Last edited by ariellalgilmore on Fri Feb 24, 2017 10:44 pm, edited 1 time in total.

freeberg
Posts: 5
Joined: Wed Jan 11, 2017 10:39 am

Re: 3D Visualization

Post by freeberg » Fri Feb 24, 2017 7:37 pm

Project Name: The Vinyl Frontier

Question: My original question was to investigate the volume and genres of the CD's checked out from the Seattle Public Library. However, I came to find out that there is a massive number of CDs checked out every year, so I could not look look at all CD's over a wide period.

I reformed my question to focus on the creator of the #1 best-selling album of all time: Michael Jackson. He is also interesting to look at because he passed away in mid-2009. My hypothesis was that there would be an uptick in the checkouts of his albums immediately following his passing.

Query: I queried the checkouts of his most common albums from 2008 to 2011. I joined that result set with the weekly checkouts of all those titles. The database was running quite slow, so I made the two queries separately and joined the results with R.

Granular Query:

Code: Select all

SELECT DISTINCT # lots of duplicate transactions in this table
	unix_timestamp(tr.checkOut) as checkOut,
    IFNULL(unix_timestamp(tr.checkIn), 0) as checkIn,
    o.title,
    year(tr.checkOut) as yearOut,
    month(tr.checkOut) as monthOut,
    weekOfYear(tr.checkOut) as weekOut
FROM
	spl_2016.transactions as tr
JOIN spl_2016.outraw as o
	ON tr.itemNumber = o.itemNumber
WHERE
	 lower(o.title) IN(
        "number ones",
		"off the wall",
		"thriller",
		"bad",
		"dangerous",
		"history past present and future book 1",
		"invincible",
        "michael",
		"escape"
        )    
     AND (IFNULL( 
		PERIOD_DIFF( date_format(tr.checkIn, "%YYYY%MM"), date_format(tr.checkOut, "%YYYY%MM") ) +
            (year(tr.checkIn) - year(tr.checkOut)) * 12, 0) <= 1) 
	AND year(tr.checkOut) >= 2007 
    AND year(tr.checkOut) <= 2011 
    AND (o.itemType REGEXP "cd" AND NOT o.itemtype REGEXP "(cas|bccd|acdisk|dvd|rec)") # Only CDs
    AND ROUND(o.deweyClass) IN(781, 782) # 782 is "vocal music"
Weekly Count Query:

Code: Select all

SELECT
    year(tr.checkOut) as yearOut,
    month(tr.checkOut) as monthOut,
    weekOfYear(tr.checkOut) as weekOut,
    if(o.bibnumber = 2149593, 1, 0) as thrillerCount
FROM
	spl_2016.transactions as tr
JOIN spl_2016.outraw as o
	ON tr.itemNumber = o.itemNumber
WHERE
	year(tr.checkOut) >= 2009 AND 
    year(tr.checkOut) <= 2011 AND
	(o.itemType REGEXP "cd" AND NOT o.itemtype REGEXP "(cas|bccd|acdisk|dvd|rec)") AND # Only CDs
    ROUND(o.deweyClass) = 782 # 782 is "vocal music"
GROUP BY
	yearOut, monthOut, dayOut
Design: Since the query focuses around music, I wanted the form of the data to resemble a vinyl record. To accomplish this, I imagined the space being organized by a cylindrical coordinate system. Each individual transaction is represented as a segment of the record's spiraling grooves. Each groove is color-coded by month. If a segment is very long and towards the outside of the object, then it was checked out closer to 2011. If it is shorter and near the center, it was checked out closer to 2008. Each spiral segment's position along the Z-axis is proportional to its popularity. Items close to the front were checked out in a very popular week, less popular weeks are in the back.

Sound Component: Jackson's best-selling album was Thriller. I used that as my inspiration for the sonification of the data. When the user presses "p", the song, Thriller, will begin to play. The volume of the track is calculated directly from the weekly checkouts of the albums. For the data before Jackson's passing, the song plays quietly and with little fluctuation. As the song approaches the data for mid/late 2009, the song becomes much louder and fluctuates until the end of the song. It was cool to literally hear the data.

Interactivity: I used controlP5 to make a HUD with some options for the user. There are checkboxes for each month of the year and for the years 2008, 2009, 2010, 2011. On startup, all are selected. Deselecting a checkbox will remove that portion of data from the visualization. There is a last checkbox: "Plot Missing Items" which will plot each incomplete transaction as a dot. The dots are color-coded just like the lines.
  • Space: resets camera to default view
    P: plays "Thriller" in accordance to data
    R: rewinds the song.
    Y: Reset year selections
    M: Reset month selections
Analysis: As expected, when we look at the checkouts year-by-year, we can clearly see the surge in Jackson's popularity following his death. The same result can be heard from the sound component. Furthermore, it seems that Jackson's popularity carried through the remaining portion of the data. Some months in 2011 were very popular for Jackson.

Retrospective: Overall I am quite happy with the result. The single hardest technical aspect was plotting the spiral segments efficiently, as they are a series of connected line segments in a PShape object and expensive to render. The hardest non-technical aspect was narrowing my question from my original and general one. If time allowed, I would have added a reference accompanying the song and more labeling to help guide the viewer's understanding of the data's organization.

(The photos below are best viewed in reverse order.)
Attachments
vinylViz.zip
(11.86 MiB) Downloaded 74 times
mising_items.png
missing items, a whole lot of them too!
2008_inside_2011_outside.png
angled view, we can see 2008 checkouts along the center and 2011 checkouts on the outside
2009_only.png
checkouts in 2009, notice the summer months in yellow/red
2008_only.png
only 2008, notice how the data is centered along Z-axis
side_View.png
side view, we can see popularity along this axis
opening_View.png
the view on sketch startup

jingyi_xiao
Posts: 5
Joined: Wed Jan 11, 2017 10:43 am

Re: 3D Visualization

Post by jingyi_xiao » Sat Feb 25, 2017 1:42 am

Question
Some books have longer title while others have shorter ones. Does this difference change over time? Does the difference vary in different Dewey classes?
Idea
I queried the average title length in different Dewey classes in different years (from 2006 to 2016) using Seattle Public Library data. I wanted to see if the average title length changes a lot or being stable over time.

Query
This query is used to calculate the average title length of books in all 1000 Dewey sections over 11 years, and grouped by Dewey sections and different year.

Code: Select all

SELECT 
    a.yr, a.dewey, 
    AVG(CASE
        WHEN a.itemtype LIKE '%bk%' THEN a.length
        ELSE 0
    END) AS AvgTitleLength
FROM
    (SELECT DISTINCT
        bibNumber,
            itemtype,
            FLOOR(deweyClass) AS dewey,
            LENGTH(title) AS length,
            year(cout) as yr
    FROM
        spl_2016.inraw
    WHERE
        YEAR(cout) > 2005 AND YEAR(cout) < 2017
            AND FLOOR(deweyClass) != '') AS a
GROUP BY a.dewey, a.yr;
Data and visualization
The query and data is not very complicated, I used a matrix to store all the data. For visualization, I used PShape functions (beginShape(), curveVertex() and endShape()). And there are a lot functions and details we can explore in PeasyCam and ControlP5.
PeasyCam and ControlP5
I spent a lot of time figuring out what I can do with PeasyCam and ControlP5. The examples of these two packages in processing are very helpful, because they can give you a basic idea of what it can do for your project. But customization need further exploration. Some question I had at first would be:
How to keep ControlP5 user interface still on top of 3D part (not rotate when camera changes perspectives)?
How to use ControlP5 interface to add checkbox, range and so forth? How they listen to the event?
When use ControlP5 interface, how to keep 3D part still (when drag mouse move over the interface, how to camera still)?
How to keep the label (text) always face front even you rotate the camera?
There are many details needs attention. And the solution of above questions can be found in the source code below.

Visualization concept
In the front view, there are lots of curves overlap together. Every color represents a Dewey class. Time goes from back to front. There are 11 'bundles' represent 11 years (2006-2016). In every bundle, there are 10 curves represent 10 Dewey divisions. Every curve is made up of 10 points (linked together), every point is a Dewey section within a Dewey division.
Interaction
Click 'OVERVIEW' checkbox to turn on/off the overview view.
Click the colorful checkbox to turn on/off the particular Dewey class.
Click 'LABEL' checkbox to turn on/off the label view.
Click 'NOTATION' checkbox to turn on/off the notation view.
Click 'ROTATE' checkbox to turn on/off the another perspective.
Drag 'TIMELINE' buttons to change to particular time interval.
Drag 'DEWET DIVISION' buttons to change to particular Dewey divisions.
Click 'INTERFACE' icon to turn on/off the title information.

Evaluation/Analysis
Just look the shape of every Dewey class, we can see the difference within some Dewey class is really small, like Dewey 300, 500, 600 and 800, while the others differ much, like Dewey 000, 400 and 900. So there is really no single pattern can be used to describe this phenomena.
Time-dependent
Average title length in some Dewey section changes a lot in the past 11 year. Like Dewey 93(incunable), changes from around 70 to 140(almost double). Dewey 298 (no longer used) changes from around 30 to 120, and then drop to 60 and go up a little bit in 2016. Title length in Dewey 435(Grammer of standard German), 561(Paleobotany), 626(Not assigned or no longer used), 673(Nonferrous metals), 756(Not assigned or no longer used), 825(English speeches), 903(Dictionaries, encyclopedias, concordances of history) and 927(Biography) also changes over time.
Time-independent
Many Dewey sections have a very stable average title length. For example, title length of Dewey 512(Algebra) is very stable at around 30. And Dewey 600, 800 is also very stable.
Assumption
Some books are in a category of no longer used or not assigned, so books are mixed up and hard to find a pattern.
Some books in certain Dewey class is not very popular and does not have many checkouts, the average is easily influenced by outliers when we do not have a lot of data. But when books are popular and have many checkouts, it is more like a normal distribution, and the average(mean) are more close to mathematical expectation value.
Attachments
final_2.png
Side view
final_1.png
Overview

kschlesi
Posts: 6
Joined: Wed Jan 11, 2017 10:42 am

Re: 3D Visualization

Post by kschlesi » Sat Feb 25, 2017 2:05 am

Concept: For this project, I wanted to keep the idea of looking at the contexts in which patrons of the Seattle Public Library interact with different countries and cultures. I used a similar query to my 2D project, counting the number of checkouts of media related to each country, and grouping the checkouts by Dewey number and time to highlight topic trends and changes over the past decade. I wanted to create objects whose shapes encapsulated these changes.

Query: This was a similar query to the one for my previous project, but I wanted to make the data more granular. I counted the checkouts by month instead of just by year, and I broke down the topics into 1000 separate Dewey numbers rather than rounding the Dewey classifications to the nearest 10. I also made sure to include only those with a Dewey classification. This query runs in approximately 5 minutes.

Code: Select all

SELECT year(cout), month(cout), floor(deweyClass) as deweyBin, 
       sum(CASE WHEN spl_2016.outraw.title LIKE "%mexic%" THEN 1 ELSE 0 END) as Mexico,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%ireland%" OR spl_2016.outraw.title LIKE "%irish%" THEN 1 ELSE 0 END) as Ireland,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%spain%" OR spl_2016.outraw.title LIKE "%spanish%" THEN 1 ELSE 0 END) as Spain,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%israel%" THEN 1 ELSE 0 END) as Israel,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%egypt%" THEN 1 ELSE 0 END) as Egypt,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%viet%nam%" THEN 1 ELSE 0 END) as Vietnam,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%german%" THEN 1 ELSE 0 END) as Germany,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%russia%" THEN 1 ELSE 0 END) as Russia,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%china%" OR spl_2016.outraw.title LIKE "%chinese%" THEN 1 ELSE 0 END) as China,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%cuba%" THEN 1 ELSE 0 END) as Cuba,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%iraq%" THEN 1 ELSE 0 END) as Iraq,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%japan%" THEN 1 ELSE 0 END) as Japan
FROM spl_2016.outraw
WHERE year(cout) > 2005 AND year(cout) < 2017
  AND deweyClass IS NOT NULL AND deweyClass != ""
  AND (spl_2016.outraw.title LIKE "%mexic%" OR 
       spl_2016.outraw.title LIKE "%ireland%" OR spl_2016.outraw.title LIKE "%irish%" OR
       spl_2016.outraw.title LIKE "%spain%" OR spl_2016.outraw.title LIKE "%spanish%" OR
       spl_2016.outraw.title LIKE "%israel%" OR
       spl_2016.outraw.title LIKE "%egypt%" OR
       spl_2016.outraw.title LIKE "%viet%nam%" OR
       spl_2016.outraw.title LIKE "%german%" OR
       spl_2016.outraw.title LIKE "%russia%" OR
       spl_2016.outraw.title LIKE "%china%" OR spl_2016.outraw.title LIKE "%chinese%" OR
       spl_2016.outraw.title LIKE "%cuba%" OR
       spl_2016.outraw.title LIKE "%iraq%" OR
       spl_2016.outraw.title LIKE "%japan%")
GROUP BY deweyBin, year(cout), month(cout)
ORDER BY deweyBin, year(cout), month(cout);
In addition, I ran a separate query to search for all the media with no Dewey classification, and break down these checkouts by topics found in their call numbers. I started with a query to pull out all biographies in this set, whose call numbers start with "B ":

Code: Select all

SELECT year(cout), month(cout), callNumber, 
       sum(CASE WHEN spl_2016.outraw.title LIKE "%mexic%" THEN 1 ELSE 0 END) as Mexico,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%ireland%" OR spl_2016.outraw.title LIKE "%irish%" THEN 1 ELSE 0 END) as Ireland,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%spain%" OR spl_2016.outraw.title LIKE "%spanish%" THEN 1 ELSE 0 END) as Spain,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%israel%" THEN 1 ELSE 0 END) as Israel,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%egypt%" THEN 1 ELSE 0 END) as Egypt,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%china%" OR spl_2016.outraw.title LIKE "%chinese%" THEN 1 ELSE 0 END) as China,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%cuba%" THEN 1 ELSE 0 END) as Cuba,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%iraq%" THEN 1 ELSE 0 END) as Iraq,
       sum(CASE WHEN spl_2016.outraw.title LIKE "%japan%" THEN 1 ELSE 0 END) as Japan
FROM spl_2016.outraw
INNER JOIN spl_2016.callNumber
ON spl_2016.outraw.itemNumber = spl_2016.callNumber.itemNumber
WHERE year(cout) > 2005 AND year(cout) < 2017
  AND (deweyClass IS NULL OR deweyClass = "")
  AND (spl_2016.outraw.title LIKE "%mexic%" OR 
       spl_2016.outraw.title LIKE "%ireland%" OR spl_2016.outraw.title LIKE "%irish%" OR
       spl_2016.outraw.title LIKE "%spain%" OR spl_2016.outraw.title LIKE "%spanish%" OR
       spl_2016.outraw.title LIKE "%israel%" OR
       spl_2016.outraw.title LIKE "%egypt%" OR
       spl_2016.outraw.title LIKE "%china%" OR spl_2016.outraw.title LIKE "%chinese%" OR
       spl_2016.outraw.title LIKE "%cuba%" OR
       spl_2016.outraw.title LIKE "%iraq%" OR
       spl_2016.outraw.title LIKE "%japan%")
  AND (callNumber REGEXP "^B ")
GROUP BY deweyBin, year(cout), month(cout)
ORDER BY deweyBin, year(cout), month(cout);
This query runs in about 2 minutes. I ran more similar queries to search for a few more call number-defined topics, including "J" = juniors, "E" = early childhood (I think), "YA" = young adult, "MYSTERY," "SCI-FIC," general "FIC," "CD," "DVD," and "VHS."

Process: I started by trying to define the shape I wanted to give to each object, that would define the topic and frequency patterns. I started with cylindrical objects where the radius indicated the number of checkouts, the polar angle indicated the month from Jan 2006 to Dec 2016, and the height indicated the topic. I modified a Processing example for drawing cylinders to change the radius and stack a bunch of them. Here is an early try or two at sketching it for ten Dewey numbers in the History class:
early_1.png
early_2.png
Early Object Sketches

Since the data is more granular than in the 2D project, the yearly spikes around the polar angle and the wild changes between checkouts for adjacent topics made the object look too jagged. So I decided to separate Dewey numbers into separate objects, and separate the year and month to different axes (polar angle is month, height is year). This made the objects more smooth. I also experimented with just drawing dots or lines instead of full cylinders, and I liked this effect better (plus it was faster to render). I spent a long time scaling the radii so the data would be visible, and the key turned out to be a log scale, which highlights the most popular while allowing small numbers of checkouts to still be visible. I also scaled saturation and brightness linearly with number of checkouts to emphasize the effect. I chose a different hue to represent each country.

Each object itself is 3D, but when it came to arranging them in space, I was not sure of the best way. I started with a 2D sheet of them, arranged by Dewey number, and I thought it worked well. Unlike a few other, more complex arrangements, this one didn't block data from view. I also decided to draw the objects for all the countries on top of each other, so the dominant countries for each topic over can be easily compared. Changing the locations was made much easier by making each object an instance of a class I called "Bead," where each Bead object stored its own location and had its own draw method.

Result: My final visualization includes all the objects for the non-Dewey items as well as each Dewey category (starting with 200, since there is very little interesting data in 000-199). Starting the sketch shows the objects head-on in dot mode, but pressing "r" or "h" will bring the viewer to a more instructive top or side view. (Figuring out the camera was one of the most difficult parts of this project, but I got it down pretty well by the end.) Pressing "s" (or switching to the preprogrammed views) will also plot in spiral mode, which for some reason is much faster to render than dots.
top_view_dots.png
Top View
hanging_view_spirals.png
Side View

Interactivity: In addition to switching between dots and spirals and changing views, there are other interactive features. Since there is a lot of data and a lot of interesting patterns, I wanted to make exploration easy for the user. Pressing the keys 1-6 will toggle each country on and off. Mousing over the center of an object will display that object's Dewey number and topic name, for better understanding the patterns. Pressing "t" turns on a mode that also prints month and year coordinates around an object when mousing over it. Other keys will also turn on and off the major class labels on the side, and the instructional interface itself. Finally, since switching to the pre-programmed views restricts rotation ability, pressing the space bar restores full camera control.
music_countries.png
Comparing Music Styles of 3 Countries

Analysis: There are a lot of interesting patterns to discover in this data. Overall, I felt some of the most interested sections were history/geography and social sciences, where certain countries were much more prominent in some areas than others, such as the topics corresponding to their continent's history. A few interesting observations: Almost all of the religion section topics are devoted to Christianity, so Ireland and Spain are widespread and countries with other primary religions are relegated to the "other" subsection -- except Israel, which is the most prominent country in the "Bible" topics. China, Japan, and Spain have many more cookbooks than Israel, Ireland, and Cuba. China dominates most of the technology section, and Japan most of the arts section, while the most popular languages to learn are clearly Chinese and Spanish. Japan is well-represented in sci-fi, and Ireland in mystery. Topics where all countries are similarly well represented include the music section and the "history of the United States" section, as well as biographies.

The time frequency patterns are often more subtle than topic variations, but one notable long-term change is the slow dying out of VHS checkouts after about 2008 for all countries, and a notable yearly pattern is the uptick in checkouts about Chinese customs in January/February, around the time of the Chinese New Year.
vhs_cd_dvd.png
The Death of VHS
chinese_customs.png
Chinese Customs

Code: Zipped code, query, and data are included below.
Beads_3D_final.zip
(357.83 KiB) Downloaded 83 times

Post Reply