## Proj 2: 2D Spatial Map

Posts: 160
Joined: Wed Sep 22, 2010 12:26 pm

### Proj 2: 2D Spatial Map

In preparation for this assignment we have a guest speaker: Sara Irina Fabrikant, Professor of Geography and head of the Geographic Information Visualization and Analysis (GIVA) group at the GIScience Center of the Geography Department of the University of Zurich (UZH). Her talk, "The Art and Science of Charting Non-Spatial Data Landscapes", is on Monday, January 27, 2014, from 5:30 pm to 6:45 pm in ESB 2001: http://engineering.ucsb.edu/about/directions

-------------------
2D Spatial Map is the first Visualization project in the Processing Environment. The criteria for the project are as follows:

. Collect data with MySQL either inside Processing, or by creating txt files to be used by the program.

. The visualization should not be a linear graph but should use x, y, z settings in a 2D grid system. This means using at least 3 datasets from the Seattle Library Data. The length and height of the matrix each represent a data value. The 3rd value is represented by coloring cells, using saturation, brightness, or scale to convey relative values between cells. Examples include the attached image: a matrix of color-coded values.

Assignment Due Dates:
1.28: Concept, data, doodle and design decisions (colors, where to label, etc.)
2.04: 2D lecture / Class lab: individual meetings
2.06: Project Presentation to class

. Advanced students can explore placement by algorithmically calculated location, such as the Kohonen Self-Organizing Map algorithm: http://www.nceas.ucsb.edu/~hespanha/srh/Welcome.html or any other spatial distribution algorithm, as long as it's not a linear frequency version.

grant.mckenzie
Posts: 4
Joined: Tue Jan 14, 2014 11:47 am

### Re: Proj 2: 2D Spatial Map

CONCEPT

A “signature” applies not only to the human autograph but also to data and human behavior. For example, the “check-out” history of a specific book could be used to describe the book. Rather than individual books, I plan to explore the “check-out signatures” of specific classes of material in the Seattle Public Library. I believe that certain classes of material have more similar check-out signatures than others, and a visualization of this data would be quite informative. For example, patrons of the library may check out books related to Fishing and Cooking at similar times, producing signatures very different from those of books related to Ice Hockey.

DATA

The foundation of this assignment will be the Dewey Decimal Class system. The 3-digit base code (before the decimal place) was extracted, producing a total of 1000 possible Dewey Classes. In reality the Seattle Public Library system only contains items tagged with one of 915 Dewey Classes [1]. For each of these 915 Dewey Classes, an array of check-out times was extracted, aggregated by the hour of the day. Normalizing this data produced an array with 12 entries (12 open hours of the library) that sums to 1 for each of the Dewey Classes. The hours were then grouped into “perceived parts of the day.” The hours between 8a-11a were classified as Morning, 11a-3p as Afternoon and 3p-7p as Evening. Summing the normalized “check-out” counts across these groups and multiplying the number (bounded by 0 and 1) by 255 produced three distinct values for each of the Dewey Classes; these values were represented as Red, Green and Blue in an RGB color palette.
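As a rough illustration of the normalization described above, here is a minimal Python sketch (the submitted code is a Processing sketch, so this is not the author's implementation). The function name, the `PARTS_OF_DAY` table, and the choice of half-open hour ranges are assumptions made for the example; hours outside the three ranges are simply ignored.

```python
# Hypothetical grouping of check-out hours into "perceived parts of the day".
# Half-open ranges are an assumption: 8-11 means hours 8, 9 and 10.
PARTS_OF_DAY = {
    "morning": range(8, 11),    # mapped to Red
    "afternoon": range(11, 15), # mapped to Green
    "evening": range(15, 19),   # mapped to Blue
}


def signature_to_rgb(hourly_counts):
    """Turn {hour: check-out count} for one Dewey Class into an (r, g, b) tuple.

    Each channel is the share of check-outs in that part of the day,
    scaled from the 0..1 range to 0..255.
    """
    totals = [
        sum(hourly_counts.get(hour, 0) for hour in hours)
        for hours in PARTS_OF_DAY.values()
    ]
    grand_total = sum(totals)
    if grand_total == 0:
        return (0, 0, 0)  # no check-outs at all: render as black
    return tuple(round(255 * total / grand_total) for total in totals)


# A class checked out mostly in the evening comes out predominantly blue.
print(signature_to_rgb({8: 1, 11: 1, 15: 3}))
```

Because the three shares sum to 1, the channels always sum to roughly 255, so every class lands on the same brightness plane and only the hue distinguishes its signature.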

Additionally, a CSV file [2] containing the numeric Dewey Decimal value along with its “Name” property was downloaded, since numeric values are often hard to interpret. This was used to describe (label) the rectangles in the final visualization.

DESIGN DECISIONS

Originally, I wanted to use a Self-Organizing Map approach to this assignment (Figure 1). Unfortunately, time and complexity got the best of me, so I took a more standard approach, which allowed me to show all of the Dewey Classes in the SPL dataset rather than just a subset of 100. By temporally clustering the item check-outs into Morning, Afternoon and Evening, I was able to describe each of the Dewey Classes by 3 numerical values (between 0 and 1). Multiplying these values by 255 allowed the Dewey Classes to be represented on an RGB color scale. For example, a high proportion of “check-outs” in the morning returns a color value high in RED and lower in the others.

I plotted out the Dewey Classes in rectangles (10px by 60px) and organized them by rows and columns. Each column represents a “section” of the Dewey Class System and each section contains up to 10 classes. The difficulty with showing 915 classes is that labeling is an issue. While I do feel it is important to first look at the visualization without any labels (to really appreciate the artistic side of the data), I decided to add a “mouse over” label that states the name and number of each Dewey Class as you move around the visualization. Lastly, I also included a legend describing what the colors represent, shown in the bottom right corner of the visualization (Figure 2).

Please see attachments for Database Queries and Figures. Additionally the code is zipped and included.
Attachments
McKenzie_MAT259_PROJ2.zip
Source Code
McKenzie_Proj2_v2.pdf
Final Submission
McKenzie_Proj2.pdf
Original Submission
Last edited by grant.mckenzie on Sun Feb 02, 2014 9:53 pm, edited 2 times in total.

kevin.g.deweese
Posts: 2
Joined: Tue Jan 14, 2014 11:43 am

### Re: Proj 2: 2D Spatial Map

Kevin Deweese

Concept:
Which types of material have seasonal dependence on checkout activity? I suspect that summer months will see an increase in children's reading material as compared to adults. I'm curious if this is the case and what similar patterns might exist.

Data:
The top 10 types of media will be examined for the year 2013. For each type of media, the activity in each month will be calculated along with the monthly average. The query to find the top 10 types and their totals in 2013 is shown here. Other queries will be similar.

SELECT kind.kind, COUNT(*) AS checkouts
FROM kind
INNER JOIN activity ON kind.item = activity.item
WHERE activity.o >= '2013-01-01'
  AND activity.o < '2014-01-01'
GROUP BY kind.kind
ORDER BY checkouts DESC
LIMIT 10

Visualization:
Source code for the visualization is attached. The data shown is the distance from the average, normalized by the average.

In the color scheme, results that are above average are shown in green while results below average are shown in red. Additionally, the calculated distance from the average can be seen for each grid cell by hovering the cursor over the entry.
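To make the "distance from the average, normalized by the average" concrete, here is a small Python sketch rather than the attached Processing source. The function names and the `max_abs` clamp that maps a deviation to a color intensity are assumptions for illustration; the post only states that above-average cells are green and below-average cells are red.

```python
def relative_deviation(monthly_counts):
    """Return (count - mean) / mean for each month of one media type."""
    mean = sum(monthly_counts) / len(monthly_counts)
    if mean == 0:
        return [0.0 for _ in monthly_counts]  # no activity: nothing deviates
    return [(count - mean) / mean for count in monthly_counts]


def deviation_color(deviation, max_abs=1.0):
    """Map a deviation to an (r, g, b) tuple: green above average, red below.

    max_abs is a hypothetical clamp: deviations at or beyond it get full
    intensity, smaller ones are scaled linearly.
    """
    intensity = round(255 * min(abs(deviation), max_abs) / max_abs)
    if deviation >= 0:
        return (0, intensity, 0)
    return (intensity, 0, 0)


deviations = relative_deviation([50, 100, 150])
print(deviations)
print([deviation_color(d) for d in deviations])
```

Dividing by the mean puts every media type on the same scale, so a rarely borrowed type and a very popular one can share one color legend.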

Results and Conclusions:
The first thing I notice about this is that there is more green to the left and more red to the right as if everyone starts the new year with a fresh zeal for using the library. The most variable data seems to be the jckit type which has a much higher than average checkout rate in the beginning of the year and declines towards the end of the year. Almost all types show a decreased activity in December. Perhaps during the winter holidays everyone is checking out less. The juvenile book, cd, and dvd types do show an increase in summer months. The adult types show a lesser increase during these months except for bcbk which shows a decrease.
Attachments
Assignment2.zip
DeweeseProj2.pdf
Deweese2Ddoodle.pdf
Last edited by kevin.g.deweese on Thu Feb 06, 2014 12:55 pm, edited 2 times in total.

songgaogeo
Posts: 4
Joined: Tue Jan 14, 2014 11:48 am

### Re: Proj 2: 2D Spatial Map

Concept
As a geographer, I am interested in whether I can apply cartographic designs such as choropleth maps or spatial density maps to 2D spatial maps, in order to visualize the temporal patterns of check-outs at different granularities: hourly, daily, weekly and monthly. More importantly, different visualization methods have their own pros and cons, so I will try several to find a good way of showing the temporal patterns of check-outs. Specifically, a choropleth map is a thematic 2D map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map. A spatial density map, in turn, calculates the density of features around each output cell (a cell is one or several pixels) based on its value and its neighbors’ values. Conceptually, a neighborhood is defined around each cell center, and the number of features that fall within the neighborhood is summed and divided by the area of the neighborhood.
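The neighborhood density idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the neighborhood is taken to be a square window of cells (the post does not say which shape is used), the area is counted in cells, and the name `neighborhood_density` is made up for the example.

```python
def neighborhood_density(grid, radius=1):
    """Smooth a 2D grid of counts into a density surface.

    For each cell, sum the counts inside a square window of the given
    radius (clipped at the grid edges) and divide by the window's area,
    measured in cells.
    """
    rows, cols = len(grid), len(grid[0])
    density = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            area = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += grid[rr][cc]
                    area += 1
            density[r][c] = total / area
    return density


# Hypothetical grid: rows could be days, columns hours, values check-out counts.
counts = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
for row in neighborhood_density(counts):
    print(row)
```

Compared with a choropleth-style grid, where each cell shows only its own value, this spreads a single busy cell over its neighbors, which makes broad temporal patterns easier to see at the cost of blurring sharp peaks.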

Please check the attachments for the data queries and the doodle picture :-)
SongGao_MAT259_HW2_Doodle.pdf
SongGao_MAT259_HW2_Final.pdf
Song_Proj2_SourceCode_Data_Figures.zip
Last edited by songgaogeo on Wed Feb 05, 2014 10:35 am, edited 1 time in total.

currier
Posts: 4
Joined: Tue Jan 14, 2014 11:50 am

### Re: Proj 2: 2D Spatial Map (revised)

Project 2: 2D spatial map (revised)

In our data set a handful of library items—technically, item numbers—are associated with more than one bar code. As of this writing, 26,901 item numbers fall into this category, or about 0.8% of the total unique itemNumbers in the spl2 database. These are anomalies—according to metadata, each physical item should have one item number and one bar code. This visualization is an attempt to characterize this handful of anomalous items and reveal patterns that could suggest explanations as to why multiple bar codes may be associated with a single item number.
Attachments
currier_proj2_images.zip
currier_proj2_revised2.pdf
currier_proj2.zip
2014-01-27_queryresults.zip
Last edited by currier on Thu Feb 27, 2014 9:44 pm, edited 2 times in total.

milrober
Posts: 4
Joined: Tue Jan 14, 2014 11:44 am

### Re: Proj 2: 2D Spatial Map

Concept:
I want to explore how the popularity of items changes with time. I will map all items in the 780s (Music) of the Dewey system. The proposed mapping would look similar to the Map of the Market below, but it will take into account time (and the evolution thereof.) The x-axis will represent time, similar to the Google music genre visualization below, the y-axis will map the different subcategories in the Dewey system, and z will map the popularity (number of checkouts) for the specific point in time.

Concept (Updated):
Treemaps utilize the entire allotted space, and they distribute “leaves” economically depending on metrics set by the programmer. Beyond purely visualizing hierarchical structures, treemaps can convey more information through color coding and varying sizes that display additional aspects of the leaves. My treemap shows the popularity of different items within the “Arts and Recreation” Dewey class (700s). The larger the leaf, the more it appears within the data results; specifically, the more checkouts of items within a Dewey class, the larger that Dewey class’s leaf.
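The core of the size encoding can be illustrated with the simplest possible treemap layout, a single "slice" along one axis. This Python sketch is an assumption-laden illustration: the post does not say which treemap algorithm the Processing sketch uses, and the function name and sample counts are invented.

```python
def slice_layout(checkouts, x, y, width, height):
    """Split one rectangle into side-by-side leaves.

    checkouts maps a Dewey class label to its checkout count. Each leaf
    gets a share of the width proportional to its count, so leaf area is
    proportional to popularity. Returns {label: (x, y, w, h)}.
    """
    total = sum(checkouts.values())
    leaves = {}
    offset = x
    for label, count in checkouts.items():
        leaf_width = width * count / total
        leaves[label] = (offset, y, leaf_width, height)
        offset += leaf_width
    return leaves


# Invented counts, for illustration only.
print(slice_layout({"741": 1, "780": 3}, 0, 0, 100, 50))
```

Practical treemaps usually alternate the split direction at each level of the hierarchy, or use a squarified layout, to avoid long thin leaves; the proportional-area rule stays the same.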
Attachments
sketch_2dspatial.zip
RMiller-2DSpatialTreemap.docx
RobertMiller_doodle.pdf
Last edited by milrober on Thu Feb 06, 2014 3:32 pm, edited 7 times in total.

mohithingorani
Posts: 5
Joined: Tue Jan 14, 2014 11:46 am

### Re: Proj 2: 2D Spatial Map

What is your Second Language, Seattle?

Concept:
For this assignment I have studied the interest of Seattle’s citizens in learning a new European language. I have chosen to study French, German, Spanish and Italian over a 6 year period (2007-2013). The checkouts fall into the following itemtype categories: books, cassettes and DVDs.
I intend to visualize not just the new language trends but also the means of study employed.

Query:
for Italian:
select month(cout), year(cout),
  sum(case when itemtype in ("acbk", "jcbk") then 1 else 0 end) as book,
  sum(case when itemtype in ("accas", "jccas") then 1 else 0 end) as cassette,
  sum(case when itemtype in ("acdvd", "jcdvd") then 1 else 0 end) as dvd
from inraw
where deweyClass >= 450 and deweyClass < 460
  and date(cout) >= "2006-01-01" and date(cout) <= "2013-12-31"
group by month(cout), year(cout)
order by year(cout), month(cout)
Attachments
Assignment 2.pdf
Last edited by mohithingorani on Thu Feb 06, 2014 3:44 pm, edited 1 time in total.

hellobuaazl
Posts: 4
Joined: Tue Jan 14, 2014 11:54 am

### Re: Proj 2: 2D Spatial Map

Calculate the number of books about football, baseball, basketball and hockey checked out in recent years to form a map.
Attachments
sketch_2D_mapping_sports.zip
2D spatial map assignment.pdf
Last edited by hellobuaazl on Thu Feb 06, 2014 3:30 pm, edited 1 time in total.

Posts: 2
Joined: Tue Jan 14, 2014 11:41 am

### Re: Proj 2: 2D Spatial Map

Please find attached the results of the queries in the XLSX file.

Question:
I wanted to find out what the most popular keywords from 2005-2013 were, and how they varied across months. More concretely, I was curious whether, irrespective of the year, there is a discernible relationship between the most popular keywords in one month and those in another month. One way of determining the most popular keywords in a particular month is to query the most popular items lent out that month, and then collect the keywords associated with those titles. Of course, the data needs some cleanup, since words like ‘of’, ‘the’, ‘and’, numbers, etc. lend little insight into any trend.
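The cleanup step can be sketched in Python as follows. This is an illustration, not the method used in the post (which filters inside the SQL query): the stop-word list, the rule of dropping keywords of one or two characters, and the function name are assumptions chosen for the example.

```python
import re

# Hypothetical stop-word list; extend as needed.
STOP_WORDS = {"of", "the", "and", "for", "not", "any"}


def clean_keywords(keywords):
    """Drop keywords that carry little meaning for spotting trends.

    Removes stop words, keywords that start with a digit, and keywords
    of one or two characters. Matching is done on whole words.
    """
    kept = []
    for keyword in keywords:
        word = keyword.strip().lower()
        if not word or word in STOP_WORDS:
            continue
        if re.match(r"[0-9]", word):  # starts with a digit
            continue
        if len(word) <= 2:
            continue
        kept.append(word)
    return kept


print(clean_keywords(["The", "history", "of", "1945", "WW", "Seattle "]))
```

Matching whole words is a deliberate choice here: a substring pattern such as `the` would also discard keywords like "mother" or "theater".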

Data:
I used the tip that Prof Legrady provided last time about not using the spl2.inraw table, and instead used the other tables to query from. As is evident, I populated the Item/Bib number as an identifier to the title, and soon enough came to a conclusion that sorting by item number is not a good way to go as multiple item numbers can map to a single bib number. So instead I decided to sort via the bibliography number, and retrieved the most popular titles in a particular month. To decide the most lent out items, I had to determine the number of loans of the title and for this I restricted my query to count the number of times a specific title was lent out in a given month from 2005-2013.

Query:
SELECT t1.item AS Item, t1.bib AS BibNo,
       UPPER(SUBSTR(t5.kind, 3)) AS Media_Type, t2.title AS Title,
       COUNT(t1.o) AS Num_Loans,
       t3.kywds AS Keywords
FROM spl2.activity AS t1
INNER JOIN spl2.title AS t2 ON t2.bib = t1.bib
INNER JOIN (SELECT bib, GROUP_CONCAT(LOWER(keyword)) AS kywds
            FROM spl2.keyword
            WHERE LOWER(keyword) NOT REGEXP 'the|and|for|not|any|^([0-9]+)|^([a-z]){1}$|^([a-z]){2}$| '
            GROUP BY bib) AS t3 ON t3.bib = t1.bib
INNER JOIN spl2.kind AS t5 ON t5.item = t1.item
WHERE t1.item > 0 AND YEAR(t1.i) >= 2005 AND YEAR(t1.o) >= 2005 AND YEAR(t1.o) < 2014
  AND LOWER(t2.title) NOT REGEXP '^uncatalog+' AND MONTH(t1.o) = X
GROUP BY t1.bib
ORDER BY Num_Loans DESC
LIMIT 20

where X is a number from 1-12, denoting the month.

Processing time for each query is between 120 and 140 s.

Visualization:

Attachments
MAT259_HW2_WriteUp_Dey.pdf
Top20_Items_Kywds_month wise 2005_2013.xlsx

m_uppal
Posts: 4
Joined: Tue Jan 14, 2014 11:43 am

### Re: Proj 2: 2D Spatial Map

Introduction and approach:

Comic books and children's books are among the most searched terms within the Seattle Public Library.

In this visualization, I wanted to explore the emergence of comic books in 2012 and 2013: not just books, but also other media like CDs/DVDs, audiobooks, movies, graphic novels, etc.
How does this relate to the biggest comic-con festival in Seattle? How much did transactions increase or decrease around that period? What is patrons' favorite comic character? All these questions have been pursued in this project.
I primarily used DeweyClass 741.5 to get my results; the rest was keyword searching.
Attached: detailed explanation with sql queries and processing source code.
Attachments
HW2-Prototype.pdf