Proj 1 - MySQL Assignment | Knowledge Discovery

Posts: 160
Joined: Wed Sep 22, 2010 12:26 pm

Proj 1 - MySQL Assignment | Knowledge Discovery

Post by glegrady » Thu Jan 03, 2019 1:25 pm

Proj 1 - MySQL Assignment | Knowledge Discovery

Assignment due: Thursday, January 17, 2019

DATA MINING: This first assignment is to explore retrieving data from a database that may result in interesting insight about the accumulation of the data. Our approach is "Knowledge Discovery in Databases" or KDD or "Data mining", which involves the discovery of patterns in large datasets. The goal is the discovery and extraction of patterns and knowledge from large amounts of data. The library database we have access to consists of hourly checkouts (and returns) of books, cds, dvds, and other items from the Seattle Public Library (SPL). On a broad level, it provides a historical overview of cultural interests of the Seattle downtown community but it can also be seen as a sampling of North American cultural interests from September 2005 to today.

There are approx. 91 million datasets in the database representing 13 year's of activity. The data consists of multivariate data as each item in the database has the following metadata, checkout, checkin, title, media (itemtype), acquisition number (item number), item classification by Library of Congress (bibnumber), individual object label (barcode) and librarian classification labels (subject). These are described at : ... ataDef.pdf

The data is classified in two ways. The library uses the Dewey Decimal system by which to organize non-fiction items: ... al_classes but over 50% of the items in the library collection are fiction, and therefore are coded in a more arbitrary ways. For instance, music, movies, seem to be distributed in both Dewey and non-Dewey ways of classification. The most popular Dewey items can be viewed hourly at: ... &d=05&h=17 The top most popular items are comic books, cookbooks, health, travel books, etc.


MYSQL ASSIGNMENT - THE CONCEPT: In any database, there lies hidden knowledge. What does a database contain, and what can MySQL queries reveal? Your first assignment is to find something of interest based on your own cultural / knowledge interests. Here are some options:

1) Topics of Cultural Interest:
. What is the performance of a topic, a title, - a media over time, or by volume, or by trends, or by news events?
. How does the collection change over time?

2) The Database Organizational Structure
. What is the classification approach that the library uses
. What anomalies, errors, outliers, illogical classification methods, etc. may be revealed.
(All databases have these as its impossible to precisely classify all things within a structured system)

3) Data Analytics Query Methods
. Explore statistical methods or algorithms to retrieve or process data

View previous student examples in the student forum: viewtopic.php?f=75&t=303#p2015 (2018)


GETTING THE DATA: Use the MySQL Workbench to write a query by which to retrieve the data from the SPL database. Here are the guidelines of how to run the query: ... LayOut.pdf

Use the spl_2016 database which gets updated daily. Become familiar with the metadata that describes each checkout: ... ataDef.pdf

Decide which metadata you need/want - these will become columns in the csv file. "Select *" from the inraw or outraw tables will get you 11 columns. They are:
ID: The numeric location in the database
itemNumber: The items' acquisition date
bibNumber: Defined by Library of Congress - multiple copies of the same title have the same bibnumber
cout: checkout date/timestamp
cin: checkin date/timestamp
collcode: Collection type, name and physical branch location
itemtype: What media (book, cd, dvd, etc.)
title: title
callNumber: Multiple copies of same item will share same call number but have different barcodes and itemNumber
DeweyClass: The item's dewey classification if it is recorded as a Dewey (non-fiction) item

In the example below, the query will return 4 vertical columns consisting of all the different bibnumbers (17 have been found); their itemtype (book, dvd, etc.); item's titles; and the total count for each bibnumber sequenced from most to least.

bibNumber, itemType, title, COUNT(bibNumber) AS Counts
title LIKE '%Blade Runner%'
GROUP BY bibNumber , itemType , title

Do a report on your research, and propose what to visualize based on time, or volume of activity. Possibly make a comparison between book(s) and movie(s) and soundtracks (cd)

Your query may function to process the existing data, or to calculate retrieved information. To create visualizations, you will need to have 4 columns of numeric values, for instance, time-based, or ordered data (itemNumber, dewey, volume, etc.) These will be used for horizontal, vertical, depth positions, and the 4th to define color density of the positioned item.

Once you have all the material - click on "POST REPLY" to this link and add your info to complete the assignment.


Case Study of a Topic: The Sci-fi Movie Blade Runner

Rodger assembled a case study here: ... Report.pdf
George Legrady

Posts: 14
Joined: Mon Apr 16, 2018 10:23 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by yichenli » Mon Jan 14, 2019 8:03 pm

I compared check-out activities of high school instruction related dewey classes (especially, 378 Higher Education, and 510-519 Mathematics) to the dismissal time of two local high schools during the week of September 5th to 9th, 2016.

(I hope no one else's assignment be misled by this submission)
(7.23 MiB) Downloaded 110 times

Posts: 4
Joined: Thu Jan 10, 2019 10:57 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by meilinshi » Tue Jan 15, 2019 2:03 pm

I’m interested in the checkouts within Dewey class 400: Language. I picked ten languages and would like to see if there is a relative popularity at Seattle Public Library from 2006 to 2018. I was also curious about if there is a pattern for the checkouts and for how long people usually keep these items.
(494.57 KiB) Downloaded 106 times

Posts: 3
Joined: Thu Jan 10, 2019 10:50 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by anaghauppal » Thu Jan 17, 2019 11:23 am

I was interested in popularity (measured by checkouts) of different media (books, CDs, DVDs, VHS, audio, magazines) over time. This led to some interesting results.
(20 MiB) Downloaded 102 times

Posts: 4
Joined: Thu Jan 10, 2019 10:58 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by suburtner » Thu Jan 17, 2019 11:45 am

Map Use and Musings at the Seattle Public Library - 2016

I'm interested in knowing how maps are catalogued at the Seattle Public Library (SPL) in 2016. Maps create an interesting challenge for library and database organizers, because they exist beyond just the medium of text, but as images or collections of vector-based features. My research questions try to explore how the library organizes maps, and the structure by which they are made accessible.
(178.2 KiB) Downloaded 95 times
Last edited by suburtner on Fri Jan 18, 2019 10:56 am, edited 1 time in total.

Posts: 2
Joined: Thu Jan 10, 2019 11:02 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by carlylarsson » Thu Jan 17, 2019 12:35 pm

My project was an exploration of how the genre of non-dewey books can be determined by words found in the subject field.
HW1 MAT259.pdf
(1.22 MiB) Downloaded 117 times

Posts: 4
Joined: Thu Jan 10, 2019 10:51 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by chantalnguyen » Thu Jan 17, 2019 1:24 pm

I'm interested in exploring the popularity of various world cuisines and how it varies over time, especially for certain cuisines (e.g. Vietnamese, Filipino) that have been considered "trendy" in recent years.
(16.3 MiB) Downloaded 99 times

Posts: 4
Joined: Thu Jan 10, 2019 11:03 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by aschoettler » Thu Jan 17, 2019 1:49 pm

I was interested in finding items with a sustained, consistent level of popularity or monthly checkouts. A trendy item might be a 'flash in the pan,' but I was interested in identifying items which, by contrast, have at least some popularity, but which have a low variance in their number of monthly checkouts. This turned out to be a difficult question to compute answers to, but I was able to gather the data I wanted on a sample of items, while making progress toward executing a larger query.
(299.7 KiB) Downloaded 93 times

Posts: 4
Joined: Wed Jan 16, 2019 10:17 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by jiaheng » Thu Jan 17, 2019 2:50 pm

The topic I'm interested in is about a famous novel series "Fifty Shades of Grey", as well as its film adaptation with the same name. From the table below, we can see that the book series is published from 2011 to 2012, while the movies are released from 2015 to 2018, around Valentine's day.
(1.35 MiB) Downloaded 92 times

Posts: 4
Joined: Thu Jan 10, 2019 11:00 am

Re: Proj 1 - MySQL Assignment | Knowledge Discovery

Post by yokoebata » Thu Jan 17, 2019 3:03 pm

In my eagerness to discover how SPL visitors behave in accordance to the major release dates of Crazy Rich Asians (book publication release, movie trailer release, & movie premiere date), I will be analyzing the checkout behaviors influenced by society’s habits of bandwagoning, based on the monthly total checkouts of the book, Crazy Rich Asians.
(317.51 KiB) Downloaded 45 times

Post Reply