Proj 1: Data Query

glegrady
Posts: 160
Joined: Wed Sep 22, 2010 12:26 pm

Proj 1: Data Query

Post by glegrady » Mon Jan 05, 2015 3:43 pm

Create a MySQL query that asks an interesting question about the SPL database.
Assignment due: January 15, 2014

The query should consist of the following:
. Results to consist of multivariate numeric data to allow for minimum 3 columns metadata
. Standardize columns for output to .csv file: (vertical, horizontal, pixel value + more)
. Your search results should be sufficient to feature subtleties expressed visually within a screen size between 1920x 1280 to 2560 x 1850 pixels

Post your project by clicking POST REPLY. The project should consist of the following:
A brief description of your query (to be visible in your post) followed by:
. Give the MySQL Query
. Explain it and the motivation
. Give the processing time
. Provide an analysis/commentary: Discuss what you found through the query process in the data

Post all of this here, or as a pdf attachment
. Attach the csv file
George Legrady
legrady@mat.ucsb.edu

a.lazareva
Posts: 5
Joined: Sat Jan 10, 2015 11:29 am

Re: Proj 1: Data Query

Post by a.lazareva » Sun Jan 11, 2015 1:28 pm

Frequently Borrowed Together
I wanted to create a query that would return pairs of items that were borrowed together and the number of times the items were borrowed together. I believe this is an interesting question because it could allow us to create a simple ‘recommendation’ program similar to Amazon’s ‘Frequently bought together’ feature. I also wanted to see if there are any interesting data patterns that might give insights into user behavior.

Please see the attached file for the query, results, and analysis
Attached a simple co-occurance matrix with pixel coordinates corresponding to dewey classes. For example, (100,200) shows that items from these categories were checked out together. This is based on data going back to 2010 (1million lines). I excluded comic books from this one.
A.Lazareva_HW1.pdf
Project submission
(102.85 KiB) Downloaded 253 times
output.csv
output
(2.18 KiB) Downloaded 217 times
mockup.jpg
mockup

boyan
Posts: 5
Joined: Tue Jan 14, 2014 11:48 am

Re: Proj 1: Data Query

Post by boyan » Wed Jan 14, 2015 8:18 pm

People borrow books for different reasons, some of them wish to read the best­selling novels, others wish to learn programming skills. In other words, people have their personal preference for certain kinds of books. The specific reading interests may influence the checkout duration. It would be interesting to see if there is a pattern between checkout duration and reading interests. In this database, the reading interests could be represented by the particular dewey classes to which the borrowed books belong.
Attachments
output.csv
(222 Bytes) Downloaded 210 times
BYan_HW1.pdf
(115.9 KiB) Downloaded 298 times
Last edited by boyan on Thu Jan 15, 2015 10:06 am, edited 1 time in total.

donghaoren
Posts: 5
Joined: Sat Jan 10, 2015 11:33 am

Re: Proj 1: Data Query

Post by donghaoren » Wed Jan 14, 2015 8:51 pm

Average Lending Time

One interesting question about this dataset is to see the difference between the check-out and check-in time (lending time), which can reveal how long people read the books to some extent. Although people may not read the book during the entire lending time, but this number is the only available measure of reading time in the database. In this assignment, I explored the statistics of the lending times of top-level Dewey classes using SQL queries.

Update:

Included yearly average lending time for each of the 100 dewey classes.
MAT259 Assignment 1.pdf
(107.31 KiB) Downloaded 242 times
dewey-count-average_lending_time.csv
(4.38 KiB) Downloaded 224 times
dewey-count-average_lending_time-yearly.csv
(10.77 KiB) Downloaded 204 times
Last edited by donghaoren on Fri Jan 16, 2015 11:01 am, edited 3 times in total.

kurtoon
Posts: 5
Joined: Sat Jan 10, 2015 11:28 am

Re: Proj 1: Data Query

Post by kurtoon » Wed Jan 14, 2015 11:22 pm

Urban Agriculture

Seattle is home the P-Patch Community Garden Program, an unusually successful urban agriculture organization that manages 85 gardens and 2850 plots. Established in 1973, the program continues to expand, growing food on 14.9 acres and donating over 28,000 pounds of produce annually.

Given the widespread presence of urban farming in Seattle, I was curious if any patterns could be discerned from the number of holdings and checkout frequency of books within the Dewey classification 630, Agriculture.

MySQL query and commentary:
MAT259_Kaminski_hw1.pdf
(105.43 KiB) Downloaded 335 times
Data:
MAT259_Kaminski_hw1_data.csv
(4.03 KiB) Downloaded 207 times

younkeehong
Posts: 4
Joined: Sat Jan 10, 2015 11:36 am

Re: Proj 1: Data Query

Post by younkeehong » Wed Jan 14, 2015 11:43 pm

Loan period, sorted by Pages and Dewey number.

Q1: Does longer books get checked out longer?
Q2: Is there any specific type of subject that gets checked out longer?

avg loantime will be calculated grouped by dewey class and book length (number of pages)

/* --- RENEWEAL --- */

second query.
to dig deeper into the specific topics, the philosophy (100~199) was chosen.

Philosophy for quite sure deals a way to wisdom for you,
and you can find philosophy of ancients in the books,
books are at library.

And what time of the year do you feel wisdom-less?
When end of the year comes? When new year starts?
or when the leaves start to fall?

so this new query investigates loan popularity, loan period
VS
dewey class in range of philosophy, and checkout time w.r.t. months in year.

Code: Select all

select
        floor(deweyClass) as dewey,
        month(checkOut) as month,
        sum(timestampdiff(day, date(checkOut), date(checkIn))) as totaltime,
        sum(1) as totalnum
from
	spl3._rawXmlDataCheckIns
where
	_rawXmlDataCheckIns.deweyClass >= 100 and _rawXmlDataCheckIns.deweyClass < 200
	and (checkOut is not null and checkIn is not null
		and timestampdiff(day, date(checkOut), date(checkIn)) >= 1
		and timestampdiff(day, date(checkOut), date(checkIn)) <= 100)
	and (year(checkOut) >= 2013 and year(checkOut) < 2014)
group by
	floor(deweyClass),
        month(checkOut);
Attachments
HW1.pdf
revised with specific explanation
(207.28 KiB) Downloaded 226 times
hw1.csv
(8.58 KiB) Downloaded 219 times
Last edited by younkeehong on Wed Feb 18, 2015 11:08 pm, edited 3 times in total.

intae
Posts: 19
Joined: Tue Oct 14, 2014 10:56 am

Re: Proj 1: Data Query

Post by intae » Wed Jan 14, 2015 11:51 pm

Relationship between local cultural event and book checkout.

I tried to verify how much the local cultural event in Seattle effects the book checkout in the Seattle public library. For this study, I choose one of the biggest cosplay events in the Northwest “Sakura-Con”, which is held in downtown Seattle every spring. A lot of players dress up as same characters in a cartoon. Especially, Japanese mangas are the most welcomed topic in this event. Famous cartoons artists, novel writers, and singers are invited for this event. This event gathers a lot of people who communicates their interests through a website into a physical place. I assume whether the people like the Japanese cartoon or not, once they meet all the characters, these encounters would invoke interest in the cartoon culture. I expect this leads to an increase in the request of Japanese mangas in - public library.

FIrst, I started to run the program based on the Dewey Decimal Classification(DDC) with three categories “Entire comic book”, “Japanese mangas”, and “American comic book”. Then I performed the checkout time at two weeks interval per month to compare the checkout traffic.

Processing times were varied.
Attachments
assignment1_intae.csv
(3.12 KiB) Downloaded 201 times
Assignment1_Intae.pdf
(229.82 KiB) Downloaded 313 times
Last edited by intae on Thu Jan 15, 2015 12:12 am, edited 2 times in total.

menzer
Posts: 5
Joined: Sat Jan 10, 2015 11:30 am

Re: Proj 1: Data Query

Post by menzer » Thu Jan 15, 2015 12:07 am

Empirical Hours of a Library

The comprehensive data base of check-outs from the Seattle Public Library between 2005 and 2014 covers a wide range of media types (e.g., books, DVDs, records) and subjects (e.g., Philosophy, Computer Science, Language). Based on more than 67 million entries one can study behavioral questions related to its users and variability of apparent patterns over time. While some patterns are expected to be found on a global data base level, other patterns may only be apparent when specifically selecting a subset of media types, subjects or a combination of both.

Here, I am posing the question of how long the library is awake every day in the study period from a data perspective (empirical approach). The term awake will first be defined as the difference of the first check-out of the day and the last check-out of each day on a global data base level covering all media types and subjects. One would expect the time of the first and last check out of a day to correlate well with the actual opening hours of the library, which is the first question of my study. In the following are the current opening hours of the library that I pulled from the website:
Mon - Thu: 10am – 8pm
Fri - Sat: 10am – 6pm
Sun: 12pm – 6pm


One can easily calculate the time difference between first and last check-out of a day, to test if the opening hours are undermined well by actual observed check-outs. Specific trends in the data that are related to the day of the week can be identified, for example, some days of the week may be less awake then other days of the week when averaging awake hours over the whole study period by weekday. A study from McKenzie (2014) that uses the same data set, suggests that there is a trend in number of check-outs and their variability by media subject over the course of a typical week, with Monday having the highest number of check-outs and a steady decline throughout the course of the week. The question of when the library is more or less awake can then be further specialized to media types or media subjects to study behavioral patterns of groups of people that favor e.g., “Phineas and Ferb” comic books or records from “The Beatles”. Based on my query, one can in turn also ask the question of when the library sleeps, and search for media types or subjects that have not been rented out over the longest time in the data base. This could potentially lead to conclusions of which selections in the library should be reduced or extended in the future.
Attachments
OM-Empirical_Hours_Of_A_Library.pdf
pdf of SQL queries, runtimes and analyses
(134.5 KiB) Downloaded 201 times
awake_2005_2014.csv
awake data set
(394.08 KiB) Downloaded 189 times
Last edited by menzer on Fri Jan 16, 2015 12:06 am, edited 5 times in total.

brocknoah
Posts: 5
Joined: Sat Jan 10, 2015 11:36 am

Re: Proj 1: Data Query

Post by brocknoah » Thu Jan 15, 2015 12:37 am

I wanted to create a query to view the popularity of a programming language over time. The Dewey Decimal Class system made it much easier to search and remove a lot of static and noise. A constant downward trend can be seen from 2007-2013 with eight of the ten languages.
Attachments
HTML.csv
(3.12 KiB) Downloaded 151 times
Project1.pdf
(166.62 KiB) Downloaded 243 times

nataly_moreno
Posts: 5
Joined: Sat Jan 10, 2015 11:31 am

Re: Proj 1: Data Query

Post by nataly_moreno » Thu Jan 15, 2015 2:09 am

Harry Potter: Movies vs Reading - Checkouts Over Time

Do movies with sequels lose popularity in the sequels? I’ve often heard that the sequels are not better than the first so I wanted to explore this question, and I have also heard that movies that come from books tend to be better than movies that do not. I was not able to find all the movie titles I had in mind, so I decided to take a closer look at the Harry Potter series. I wanted to know which of the 7 movies was checked out most frequently and how the movie checkouts compared to the book checkouts. I also wanted to see when the books and movies were checked out most often over time.

The CSV file for the movie and book checkouts with dates is too big to upload, so I only uploaded the shorter version with the counts instead. However, the queries can be found within the pdf.

~ Update - Friday, January 16, 2015 ~
CSV files need to have all numerical data so I modified the query to just output numerical data. I then further processed the data using Python to have the data separated into a more convenient format for making the grid with Processing.
Attachments
Updated_Hw1_NatalyMoreno.pdf
Contains new query at the bottom along with Python code
(140.64 KiB) Downloaded 252 times
HarryPotterBookMovieWithCounts.csv
This one only shows the counts, the other one with all the dates is too big.
(2.02 KiB) Downloaded 163 times
MAT259A_Hw1_NatalyMoreno.pdf
(118.35 KiB) Downloaded 243 times
Last edited by nataly_moreno on Fri Jan 16, 2015 12:12 am, edited 3 times in total.

Post Reply