Proj 1 - MySQL & Knowledge Discovery

Professor George Legrady
glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

Proj 1 - MySQL & Knowledge Discovery

Post by glegrady » Thu Jan 06, 2022 5:16 pm

Proj 1 - MySQL Query & Knowledge Discovery

This is the 1st of 3 assignments to be realized in the MAT 259 Data Visualization course: https://www.mat.ucsb.edu/~g.legrady/aca ... 2w259.html
----------------------------------
POST A REPLY to this assignment description with the following:

a) Provide a 1 paragraph description of your SQL data search project in the student forum
b) Add a PDF that documents the work you did. The PDF should have your name in the title.

the PDF should include the following:
. A one-paragraph description of your query search
. Concept description
. MySQL Query
. The data and data results
. Discussion/Analysis of results
----------------------------------
WHAT SHOULD THE QUERY BE ABOUT?
. The query can be about a cultural subject of interest to you that might be represented in the library database
. The query can explore the structure of how the database is organized, how things are classified
. The query can be about anomalies, errors, outliers in either the data or how the data is organized, or how things are classified
----------------------------------
EVALUATION CRITERIA
. Is the query interesting?
. Include in your post all requested information listed above in "Post a reply"
----------------------------------
SCHEDULE
Tues - January 04, 2021 - Course Overview & Introduction to MySQL
Thur - January 06, 2021 - Review MySQL examples
Tues - January 11, 2021 - Data Processing steps
Thur - January 13, 2021 - Student presentation
----------------------------------
Data Exploration & Knowledge Discovery through a large multivariate dataset.

The first assignment is to explore a large database consisting of multivariate data with the intent to discover and extract patterns with MySQL that may reveal something of interest. SQL is the standardized language used to access the database: https://www.mysqltutorial.org/what-is-mysql/


We have access to a unique database consisting of checkouts of books, cds, dvd's documented by the hour since January 1, 2006 that represents the aggregated cultural interests of downtown Seattle, but also of the larger national interests. The database currently consists of over 101 million checkouts and returns.
----------------------------------
MYSQL ASSIGNMENT - THE CONCEPT: In any database, there lies hidden knowledge. What does a database contain, and what can MySQL queries reveal? Your first assignment is to find something of interest based on your own interests and skillsets. Here are some options:

1) Patterns, Probability & Prediction
. What is the performance of a topic(s), a title(s), media over time, or by volume, or by trends?
. What can be predicted based on previous performance?
. Can it be predicted how a sequence of data change over time?

2) The Database Organizational Structure
. What anomalies, errors, outliers, illogical classification methods, etc. may be revealed within the organizational and classification of how items are encoded
(All databases have outliers, anomalies, errors in the system as its impossible to precisely classify all things within a structured form)

3) Covid Situation
. 2020 has had an unprecedented impact on the database as the library was closed for over 5 months between March to September. Nonetheless items were circulating.
. Electronic books have been in the collection since 2009 but these are not recorded in the database we receive. Nonetheless they can be reviewed through other means:

4) Data Analytics Query Methods
. Explore statistical methods or algorithms to retrieve or process data
. Are there any machine-learning opportunities in analyzing the data?
----------------------------------
The database consists of multivariate data. For each checkout there exists the following metadata:

Ordinal (In a numeric sequence)
ID: Assigned by the database to keep track of each entry
ItemNumber: Assigned by the library when an object enters the system
Dewey Classification (Dewey numeric) The item's dewey classification if it is recorded as a Dewey (non-fiction) item

Interval Scale (Time-Stamp)
Check-out/check-in in minutes, hour, day, month, year

Categorical (Not necessarily numerically orderable)
BibNumber: Each title has a specific number, copies of titles all have same number. Defined by the LIbrary of Congress
Barcode: Each item has a unique number on RFID sticker
CallNumber: by which to locate items on shelves - Ordinal if Dewey, otherwise categorical. Multiple copies of same item may share same call number but have different barcodes and itemNumber
CollCode: What the item is and where its located: https://data.seattle.gov/Community/Libr ... /6vkj-f5xf

Semantic (Text-based)
Title: Each Item has a title
ItemType: books, cds, dvds, music sheets, etc.
Subjects: Keywords (arbitrary labeling). These are located in a separate database:
----------------------------------
The library uses the Dewey Decimal system by which to organize non-fiction item but the majority of the items in the library collection do not have Dewey classification labels. For instance, music, movies, seem to be distributed in both Dewey and non-Dewey ways of classification. The most popular Dewey tend to be comic books, cookbooks, health, travel books, etc. A daily insight to the Dewey performance can be tracked at: http://128.111.26.109/parsing/index.php
---------------------------------
Label your Documents
Please make sure to label your documents like csv files by the name of your project, or your name so we can identify where they come from
George Legrady
legrady@mat.ucsb.edu

bdprice
Posts: 1
Joined: Fri Jan 07, 2022 12:19 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by bdprice » Tue Jan 11, 2022 2:57 pm

Pro Sports, Lockouts, and Impact on Fan Interest
Brad Price
Major League Baseball (MLB) is currently in a lockout. Because the players’ union and MLB could not ratify a new collective bargaining agreement, MLB has closed its doors to players. This means that for the time being, MLB players are not employees of MLB. Players are not allowed to use team facilities in any capacity (hence, “lockout”) but MLB is also not allowed to market the players to fans. This means that there is little-to-no MLB media of any form right now. I suspect that this drives fan interest in MLB down and hurts revenue in the short term (maybe for approximately one calendar year?).
I can test this hypothesis by analyzing fan interest (in this case, book and DVD checkouts from the Seattle Public Library) over time while paying specific attention to changes during recent lockouts in the other major pro sports (NBA, NFL, NHL).

Data on lockout dates came from https://abc17news.com/news/2021/12/04/p ... t-facts-2/
Checkout data from Seattle Public Library database (2016)
SQL code assisted by https://www.mat.ucsb.edu/~g.legrady/aca ... imensions'
Attachments
SQLquery.txt
(624 Bytes) Downloaded 56 times
pythonfile.txt
(3.2 KiB) Downloaded 63 times
sports data.csv
(4.13 KiB) Downloaded 48 times
proj1BradPrice.pdf
(413.69 KiB) Downloaded 70 times
Last edited by bdprice on Thu Jan 13, 2022 1:16 pm, edited 2 times in total.

jiaxinwu
Posts: 3
Joined: Fri Jan 07, 2022 12:21 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by jiaxinwu » Tue Jan 11, 2022 11:59 pm

There are various fields in Computer Science to study. As a student majoring in Computer Science, I want to know whether the popularity of different fields has an influence on library checkouts. So I write SQL about the checkouts related to fields within the Dewey Class 100. Also, as a programmer, I am interested in analyzing the trend of different programming languages from this data. After that, I compared my results with the statistics collected by the authority to make a deeper explanation.
Attachments
Proj1-PPT.pdf
(373.09 KiB) Downloaded 61 times
Proj1.pdf
(323.81 KiB) Downloaded 63 times

zijianwan
Posts: 3
Joined: Fri Jan 07, 2022 12:32 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by zijianwan » Wed Jan 12, 2022 11:58 am

In Dewey decimal classification, there are 10 topics (see Figure 1), which are subdivided into 100 subclasses for more detailed classification. It would be interesting to see the overall trend of the number of checkouts of each topic in the past 10 years (from 2012 to 2021). More interestingly, we might be able to identify anomalies, if there are any. The data was obtained from the database with MySQL and processed with Python for visualization and further analysis.
Attachments
ZWan_MAT259A_Project1.pdf
(474.36 KiB) Downloaded 75 times

yifei_liu
Posts: 3
Joined: Fri Jan 07, 2022 12:31 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by yifei_liu » Wed Jan 12, 2022 8:37 pm

Popularity of Harry Potter Series and Related Books
Yifei Liu
Since the card RPG mobile game “Harry Potter: Magic Awaken” released in September 2021, and “Harry Potter 20th Anniversary: Return to Hogwarts” reunited the main cast members across all eight Harry Potter films to share their memorable moment in December 2021, which aroused my emotional memory of childhood. Taking this opportunity, I would like to explore whether Harry Potter (HP) Series is still popular. Whether the release of films may have a positive impact on the book popularity. As the eight films of HP series have been released for a long time and HP is known for its related products, whether the related books and films, like the Fantasy Beast series, have positive impacts on the popularity of the book series. In order to investigate the popular trend, I collected 12 books in total, including 7 HP Series books and the 5 most popular HP official related books that can be found in Seattle Public Library, compared the time that each book reaches the peak check-out number with the newly published books and released films in the past 16 years, and compared the monthly check-out number with other popular book series.
Attachments
Yifei Liu_Project1_code.txt
(1.14 KiB) Downloaded 51 times
Yifei Liu_Project 1_Presentation.pdf
(833.93 KiB) Downloaded 62 times
Yifei Liu_Project 1.pdf
(274.02 KiB) Downloaded 57 times
Last edited by yifei_liu on Thu Jan 13, 2022 2:10 pm, edited 1 time in total.

ziyu_zhang309
Posts: 3
Joined: Fri Jan 07, 2022 12:34 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by ziyu_zhang309 » Wed Jan 12, 2022 9:33 pm

I have always been interested in Culture Exposure from various counties, especially my country, China. Therefore, through the analysis of the data in Seattle Public Library, I want to find out the Chinese cultural impact on the people in the United States from 2006 till today. With the continuous development of culture and technology in China, the main content and symbols of Chinese culture exposure have also kept pace with times and made corresponding changes. Setting books as the object of analysis, I first collected statistics on the types of books people borrowed the most. Taking 2006-2010\2011-2015\2016-2020 as the time boundary, I planned to analyze the trend of Chinese cultural exposure through changes in the number of times people borrowed related books in three time periods.
Attachments
ZiyuZhang_code.pdf
(88.92 KiB) Downloaded 58 times
ZiyuZhang_Presentation.pdf
(1.09 MiB) Downloaded 66 times
ZiyuZhang_Project1.pdf
(861.09 KiB) Downloaded 57 times

sdinulescu
Posts: 3
Joined: Fri Jan 07, 2022 12:30 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by sdinulescu » Wed Jan 12, 2022 10:04 pm

The COVID-19 Pandemic introduced various health and lifestyle challenges, consequences, and disturbances, affecting the daily lives and routines of all people in some capacity. I am interested in investigating the patterns of library material check-ins (i.e. requiring that one checks out and brings back the item) from the SPL database in the two years before the COVID-19 pandemic (01/01/2018 - 12/31/2019) compared to the two years after the start of the COVID-19 pandemic (01/01/2020 - 12/31/2021). Furthermore, I am interested in how these trends vary by Dewey Classification, which is a system that organizes and classifies subject matter in the SPL database. The research questions and hypotheses detailed below were chosen based on my personal thoughts when examining the Dewey Classification system in detail from the following citation – https://www.oclc.org/content/dam/oclc/d ... maries.pdf. Discrete and continuous pictures of the queried data were investigated to provide a greater understanding of check-in trends in the SPL over the past four years.
Attachments
MAT259A_Assignment1_StejaraDinulescu (3).pdf
Quick edit to plot legend (it was backwards in previous file).
(231.64 KiB) Downloaded 55 times
Last edited by sdinulescu on Thu Jan 13, 2022 2:23 pm, edited 1 time in total.

zilongliu
Posts: 3
Joined: Fri Jan 07, 2022 12:32 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by zilongliu » Wed Jan 12, 2022 11:26 pm

The Geography of Geography in Seattle Public Library

Seattle Public library has a collection of books, cds, etc., centered around the topic of geography (and history) labeled with Dewey Decimal numbers ranging from 900 to 999, and there are rich geospatial semantics that remain to be discovered from their titles, item types and subjects. The aim of this project, the Geography of Geography in Seattle Public Library, is to answer several interesting geographic questions (GQs) such as co-occurrence of location mentions, relatedness between a place and an ideology and regional differences in place attractiveness. The analysis of results focuses on temporal and spatial dimensions of extracted semantics.

Continent data is provided on ArcGIS Hub: https://hub.arcgis.com/datasets/esri::w ... ts/explore.
Attachments
MAT259_ZilongLiu_Project1.pdf
(644.74 KiB) Downloaded 59 times

siming
Posts: 3
Joined: Fri Jan 07, 2022 12:29 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by siming » Thu Jan 13, 2022 1:13 am

Siming, Su
I am interested to see the usage time of each Dewey Class in the library. By analyzing the estimated time people spend on each Dewey class, we could get a sense of which categories are easier to understand, which takes a long time to read, and so on. Based on this information, we could set up different deadlines for returning each Dewey Class. For example, Arts and Recreation usually take less time to borrow than others, the library can set up a shorter deadline of returning this category so that others may have more chance to borrow it. To dive deeper, I would like to make some visualizations to visualize how estimated usage time (day) is distributed among each Dewey class.
Presentation_proejct.pdf
(326.54 KiB) Downloaded 40 times

lijuan
Posts: 3
Joined: Fri Jan 07, 2022 12:25 pm

Re: Proj 1 - MySQL & Knowledge Discovery

Post by lijuan » Thu Jan 13, 2022 12:40 pm

Last year, I watched two film adaptations of the famous novels , “Little Women” and “Dune”, which have the same leading actor “Timothée Chalamet”. So I am very interested in the correlation between the movie releasing activity and the check out behavior of the original novel. Hence, I designed the first query to search the check out numbers of these two novels from 2006 to 2021 to observe the variation trend. Furthermore, I also noticed that the population diversity of the Seattle metropolitan area keeps growing in the last decade. Especially more and more Asian people move to Seattle due to the increasing job opportunities in tech companies. Therefore, the second query in my project is to investigate the reading interest of history knowledge about different continents, which can help us to understand the correlation between reading behavior and population composition.
Attachments
MAT259_Project1_LijuanCheng.pdf
(92.12 KiB) Downloaded 41 times

Post Reply