Proj 1 - Data Exploration Concept & MySQL Basics

Post Reply
glegrady
Posts: 203
Joined: Wed Sep 22, 2010 12:26 pm

Proj 1 - Data Exploration Concept & MySQL Basics

Post by glegrady » Wed Jan 10, 2024 10:11 pm

Proj 1 - Data Exploration Concept & MySQL Basics

This is the 1st of 3 assignments to be realized in the MAT 259 Data Visualization course:
https://www.mat.ucsb.edu/~g.legrady/aca ... 4w259.html
----------------------------------
PROJECT ASSIGNMENT: The assignment is to 1) acquire basic skills in the MySQL relational database SQL language, and to 2) Familiarize with the Seattle Library multivariate database to ask an interesting question by which to retrieve data and analyse it.

The Seattle (SPL) database we are using is called "spl2016". The most common tables are outraw (checkouts), inraw(checkins), subject (keywords attached to each entry). Descriptions of all the metadata are here: https://www.mat.ucsb.edu/~g.legrady/aca ... ataDef.pdf

MYSQL QUERIES EXAMPLES:
https://www.mat.ucsb.edu/~g.legrady/aca ... wMysql.pdf
https://www.mat.ucsb.edu/~g.legrady/aca ... tudent.pdf
https://www.mat.ucsb.edu/~g.legrady/aca ... MySQL2.pdf
https://www.mat.ucsb.edu/~g.legrady/aca ... demos_.pdf

MYSQL BASICS. Review the following functions by Tuesday, January 16, 2024: Count(), *, distinct, as, from, group by, order by, %, having, like, soundex, where, timestampdiff(), limit, desc, not in, between, >= <=, inner join, year(), date(), time, hour(), sum(), avg(), min(), max(), case, when, then, else, not, and, or, as, substring(), rand(), floor(), variance().

Tutorials: https://www.mysqltutorial.org/mysql-com ... functions/ and https://www.w3schools.com/sql/default.asp
----------------------------------
SCHEDULE:
January 11: Introduction to MySQL
January 16: Example reviews in class
January 18: 1st hour lab work, 2nd hour, short casual presentation to class
January 23: Formal presentation to class of your assignment documentation
----------------------------------
The assignment can be posted in full detail here in the Student Forum or else attached as a PDF document. In the case of the PDF document please provide:
a) A one paragraph description of your SQL data search project in the student forum which can be repeated in the attached PDF file
b) Add a PDF that documents the work you did. The PDF should have your name in the title
c) Also post in the Student Forum, the results of your query as a CSV file(s).
----------------------------------
The PDF should include the following:
. A one-paragraph description of your query search
. Concept description
. MySQL Query
. The data and data results
. Discussion/Analysis of results
----------------------------------
WHAT SHOULD THE QUERY BE ABOUT?
. The query can be about a topic of interest to you that might be represented in the library database
. The query can explore the structure of how the database is organized, how things are classified
. The query can reveal anomalies, errors, outliers in either the data or how the data is organized, or how things are classified
. Consider that this is a warm-up exercise to produce data to be used to create a 3D visualization.

For reference, we are currently only doing MySQL queires so only need a minimum of 2 csv columns but the 3D visualization assignement will will follow will require 5 columns of values:
1) A value for horizontal position
2) A value for vertical position
3) A value for depth position
4) A value for scaling or color for each cell in the visualization
5) Possibly a string value for labeling each indivisual cell
----------------------------------
SOME PREVIOUS STUDENT EXAMPLES
. PREDICTION: https://vislab.mat.ucsb.edu/2020/p1/Gua ... rName.html
. ITEMNUMBER: https://vislab.mat.ucsb.edu/2019/p1/Jia ... index.html
. CORRELATION: https://vislab.mat.ucsb.edu/2022/p1/Lij ... Cheng.html
. INTEREST IN A TOPIC: https://vislab.mat.ucsb.edu/2021/p1/Lar ... hetty.html

ADVANCED EXAMPLES
Good to review to become familiar with how to describe the approach of how your project has evolved: https://www.mat.ucsb.edu/~g.legrady/aca ... projs.html

I presented in class:
"Hunger Games" (Brianna): https://www.mat.ucsb.edu/~g.legrady/aca ... rianna.pdf
"Legacy Media" (Shaokang): https://www.mat.ucsb.edu/~g.legrady/aca ... aokang.pdf

ALL PREVIOUS STUDENT PROJECTS: https://vislab.mat.ucsb.edu/courses.html
----------------------------------
EVALUATION CRITERIA
. The topic should be relevant or interesting. Describe why
. The query should result in extensive data saved as a csv file
. Consider the whole database, going beyond the Dewey numeric classification
George Legrady
legrady@mat.ucsb.edu

paulkim
Posts: 6
Joined: Thu Jan 11, 2024 2:20 pm

Re: Proj 1 - Data Exploration Concept & MySQL Basics

Post by paulkim » Sun Jan 21, 2024 5:50 pm

Library as third space

Concept description: Much has been written about the increasing wealth gap in Seattle and the increasing number of homeless people in Seattle. Seattle is just one city of many in the US where the wealth gap is widening and the number of homeless people is increasing; this is a nation-wide trend, and Seattle is an interesting example where a relatively recently developed dominant industry could be the explanation for the wealth gap, thus perhaps putting the wealth gap and homelessness as correlated factors. Where this project comes in is the intersection of these economic trends along with the national defunding of libraries. I am extremely defensive of the library as a critical “third space,” neither the home nor the workplace, as third spaces increasingly are disappearing or becoming privately-funded ventures.
This project, interestingly enough, began as an inquiry into whether a data type of ‘renewal’ could be reasonably extracted from the data in the SPL database. This turns out to be fairly true:

SELECT
title,
itemtype,
bibNumber,
COUNT(*) as renewals
FROM
spl_2016.inraw
WHERE
TIMESTAMPDIFF(MINUTE, cout, cin)<=1
GROUP BY
title,
itemtype,
bibNumber
ORDER BY
renewals desc
LIMIT 1000

The result of this query showed interesting results, however. The most “renewed” items, going off of my definition of the same item being checked in and out within a minute, were by far non-book items such as headphones and laptops. As such, what came to mind immediately was how this shows indeed the “third space” characteristic of the public library. It is a place that is neither home nor work. But it is also a place that might substitute for home or work. This is the abstracting work that data can sometimes do. It removes the space, or instrumentalizes information. Thus, I wanted to use data to re-inscribe the space. I am curious to see how these sorts of “non-book”, “non-CD,” etc. items have changed over time as the wealth gap and homelessness has increased in Seattle, perhaps indicating the increasing importance of the public library as a “third space” or otherwise public institution. Below is the query that I ultimately landed on:

SELECT
CASE
WHEN itemtype IN ('alaptop' , 'areq', 'aceq') THEN 'Laptops, Headphones, or Hot Spots'
WHEN
itemtype LIKE 'a%'
AND itemtype NOT IN ('alaptop' , 'areq', 'aceq')
THEN
'Adult checkouts'
WHEN itemtype LIKE 'j%' THEN 'Children checkouts'
ELSE 'Other'
END AS item_types,
CASE
WHEN HOUR(cout) < 12 THEN 'Morning'
WHEN HOUR(cout) >= 12 AND HOUR(cout) < 16 THEN 'Afternoon'
WHEN HOUR(cout) >= 16 THEN 'Evening'
END AS checkout_time,
YEAR(cout) AS year,
COUNT(*) AS number_of_checkouts,
ROUND(AVG(TIMESTAMPDIFF(HOUR, cout, cin)), 1) AS average_checkout_time_hours
FROM
spl_2016.inraw
WHERE
DAY(cout) = DAY(cin)
AND TIMESTAMPDIFF(DAY, cout, cin) < 1
GROUP BY year , item_types , checkout_time
ORDER BY year ASC , CASE
WHEN item_types = 'Laptops, Headphones, or Hot Spots' THEN 1
WHEN item_types = 'Adult Checkouts' THEN 2
WHEN item_types = 'Children Checkouts' THEN 3
WHEN item_types = 'Other' THEN 4
END ASC , CASE
WHEN checkout_time = 'Morning' THEN 1
WHEN checkout_time = 'Afternoon' THEN 2
WHEN checkout_time = 'Evening' THEN 3
END ASC

Here in this query one can see that I have filtered the items I am looking at for those items that were checked out and checked back in on the same day. I then separated these items by check out time, either in the morning, the afternoon, or the evening. I then separated different types of items that were checked out into four different categories: Laptops, Headphones, or Hot Spots; Adult Checkouts; Children Checkouts; and Other. I also looked at how long these different kinds of items were checked out, on average.

The results of this query were interesting. I originally was only interested in the Laptops, Headphones, or Hot Spots item types, because they show a different kind of function of the library, but I decided to include other item types as well because I didn’t want to just limit the inquiry to this relatively narrow category. Generally speaking, however, the checkout times of these items were much longer than the items outside of this category (which included books, CDs, DVDs, and other items). I think this shows that it was well worth separating out these items from the others; the significant difference in checkout times shows exactly how these items are used differently than others. In any case, I think this query gives a different kind of picture of the library. Instead of looking at what was checked in and out of the library, kind of not considering the library as an actual space, this query gives a glimpse of how, and when, people are using the library in the space itself.
Attachments
Same Day Checkouts.csv
(8.73 KiB) Downloaded 13 times
MAT 231 Project 1.pdf
(390.51 KiB) Downloaded 14 times

jingpeng
Posts: 3
Joined: Thu Jan 11, 2024 2:22 pm

Mental Health from Checkouts over time

Post by jingpeng » Mon Jan 22, 2024 4:53 pm

Concept:
According to an article from the University of Michigan, “College students’ anxiety, depression higher than ever, but so are efforts to receive care”: “The annual Healthy Minds Study report is based on web surveys taken by 96,000 US students across 133 campuses in the 2021-22 academic year. It found that 44% of students reported symptoms of depression, 37% reported anxiety disorders, and 15% reported having seriously considered suicide in the past year—the highest recorded rates in the history of the 15-year-old survey.”

With the checkout data from the Seattle Public Library, I’m trying to figure out the trend in books and other publications about anxiety in recent years, which can be seen as a projection of the trend in mental health changes in modern society, to some extent.

One of the perspectives is to collect data about how many publications have been released in recent years whose titles are more related to "anxiety," "stress," "panic," "depression," and “mental health.”. However, I could not get publications trending over these years and months with the current database.
Another one is to find out how many books and other things were checked out to show the needs of people who realize anxiety has already become a big problem in their lives.
Then I realized that the rapid development of media would have such a big effect that people seldom go to the library these days.
So I get all the checkout numbers, regardless of their theme and other things. Then calculate the percentage of data related to mental health in these checkout records, which should show the rule hidden behind these data.

Code: Select all

SELECT 
    YEAR(cout) AS years,
    MONTH(cout) AS months,
    COUNT(cout) AS counts
FROM
    spl_2016.outraw
WHERE
    (LOWER(title) LIKE '%anxiety%')
        OR (LOWER(title) LIKE '%stress%')
        OR (LOWER(title) LIKE '%panic%')
        OR (LOWER(title) LIKE '%depression%')
        OR (LOWER(title) LIKE '%mental health%')
GROUP BY years , months
ORDER BY years , months ASC;

SELECT 
    YEAR(cout) AS years,
    MONTH(cout) AS months,
    COUNT(cout) AS counts
FROM
    spl_2016.outraw
GROUP BY years , months
ORDER BY years , months ASC;

SELECT 
    YEAR(cout) AS years,
    MONTH(cout) AS months,
    SUM(CASE
        WHEN LOWER(title) LIKE '%anxiety%' THEN 1
        ELSE 0
    END) AS Anxiety,
    SUM(CASE
        WHEN LOWER(title) LIKE '%stress% THEN 1
        ELSE 0
    END) AS Stress,
    SUM(CASE
        WHEN LOWER(title) LIKE '%panic%' THEN 1
        ELSE 0
    END) AS Panic,
    SUM(CASE
        WHEN LOWER(title) LIKE '%depression%' THEN 1
        ELSE 0
    END) AS Depression,
    SUM(CASE
        WHEN LOWER(title) LIKE '%mental health%' THEN 1
        ELSE 0
    END) AS Mental_Health
FROM
    spl_2016.outraw
GROUP BY years , months
ORDER BY years , months ASC;
Process
Using MySQL, I have gathered checkout records from 2006 to 2023 and specifically identified those related to "mental health." I organized the data by month and calculated the percentage of checkouts that fall under the "mental health" category. This provides insights into the proportion of "mental health" checkouts concerning the overall checkout activity over these years.

Final Result
I draw several diagrams to visualize the relationship between the data.
The chart illustrating the relationship between the number of checkout publications related to anxiety and the total number of publications over time shows that before 2018, the trend was basically the same, and as the total number of publications rose, anxiety-related publications also rose. After 2018, with the downward trend of the total number, surprisingly, the number of anxiety-related publications still rose.

The chart depicting the percentage of checkout publications related to mental health out of the total number of checkout publications over time shows that despite several unusually high points, after 2018 or so, there is a higher percentage of checkout books related to mental health.

I selected the data before 2015 to get this chart. There is a lowest point in 2007-02 and a highest point in 2009.03.
The global financial crisis, often referred to as the Great Recession, started in the latter part of 2007 and continued into 2008, which could be one of the reasons why this percentage increased gradually.

Then I selected the data after 2015 to get this chart.
There is no date for the checkout record related to mental health topics in 2018-01 and 2018-02, which gives us two breakpoints. Also, due to COVID, the Seattle Public Library has been closed since March 13, 2020. Things went back to normal after July 2020.
And it still demonstrates that the percentage of anxiety-related publications rose over time, especially from 2022–11 to 2023–07.

I also draw a graph to show how each of those words performs against each other. It shows that before 2018, the quantity of stress and depression was relatively high. However, in 2015 and 2020, there was a noticeable decline. On the other hand, for other categories, the quantity has been gradually increasing since 2018.
Attachments
seperate category.csv
(6.44 KiB) Downloaded 10 times
Mental Health from Checkouts Over TIme.pdf
(1.99 MiB) Downloaded 15 times
mental health checkout.csv
(2.71 KiB) Downloaded 11 times
total checkout.csv
(2.99 KiB) Downloaded 15 times
mental health percentage.csv
(4.2 KiB) Downloaded 12 times
Last edited by jingpeng on Sun Feb 04, 2024 9:43 pm, edited 3 times in total.

yiranxiao
Posts: 4
Joined: Thu Jan 11, 2024 2:23 pm

Music Festival Echoes in SPL

Post by yiranxiao » Mon Jan 22, 2024 6:42 pm

Concept:

My concept is to analyze the impact of specific music festivals in Seattle (such as Capitol Hill Block Party, Bumbershoot, Belltown Bloom Festival, Beyond Wonderland Seattle and Northwest Folklife), on the popularity of various music genres in the Seattle Public Library's collection. While acknowledging the shift in music consumption to digital platforms, this project aims to examine how checkouts of CDs, DVDs, and books across different music genres have fluctuated over the years from 2015 to 2023.

The project will query data trends in the number of checkouts for genres like pop, rock, jazz, electronica, country, folk, soul, blues, hip-hop and funk. The focus on a range of years will help in understanding how physical media usage has changed in the context of evolving digital consumption habits. While I plan to visualize these trends on a monthly basis for a broader overview, a key focus will be on analyzing data around the specific dates of different music festivals. This approach aims to pinpoint any impacts these events may have on borrowing patterns. By observing checkout fluctuations during and around music festivals, I intend to analyze whether there is a direct correlation between these cultural events and shifts in genre popularity at SPL.

The data will reveal broader trends in music consumption and genre popularity over the years, offering a historical perspective on how music interests have evolved in the era of digital transformation.

Query:

This query provides a detailed view of the checkouts of different music genres for each day from 2015 to 2023, as a source of data for viewing specific festival dates.

Code: Select all

SELECT 
    YEAR(cout) AS year,
    MONTH(cout) AS month,
    DAY(cout) AS day,
    COUNT(IF(spl_2016.subject.subject LIKE '%jazz%',
        1,
        NULL)) AS 'jazz',
    COUNT(IF(spl_2016.subject.subject LIKE '%rock%',
        1,
        NULL)) AS 'rock',
    COUNT(IF(spl_2016.subject.subject LIKE '%electronic%'
            OR spl_2016.subject.subject LIKE '%electronica%',
        1,
        NULL)) AS 'electronic',
    COUNT(IF(spl_2016.subject.subject LIKE '%pop%',
        1,
        NULL)) AS 'pop',
    COUNT(IF(spl_2016.subject.subject LIKE '%country%',
        1,
        NULL)) AS 'country',
    COUNT(IF(spl_2016.subject.subject LIKE '%folk%',
        1,
        NULL)) AS 'folk',
    COUNT(IF(spl_2016.subject.subject LIKE '%soul%',
        1,
        NULL)) AS 'soul',
    COUNT(IF(spl_2016.subject.subject LIKE '%blues%',
        1,
        NULL)) AS 'blues',
    COUNT(IF(spl_2016.subject.subject LIKE '%funk%',
        1,
        NULL)) AS 'funk',
    COUNT(IF(spl_2016.subject.subject LIKE '%rap music%'
            OR spl_2016.subject.subject LIKE '%hip hop%',
        1,
        NULL)) AS 'hip hop'
FROM
    spl_2016.subject
        JOIN
    spl_2016.outraw ON spl_2016.outraw.bibNumber = spl_2016.subject.bibNumber
WHERE
    (itemtype LIKE '%cd%'
        OR itemtype LIKE '%dvd%'
        OR itemtype LIKE '%bk%')
        AND deweyClass LIKE '78%'
        AND YEAR(cout) BETWEEN 2015 AND 2023
GROUP BY year , month , day
ORDER BY year , month , day;
This query groups the checkout data by year and month, to observe how the popularity of different music genres has changed month by month over the years from 2015 to 2023. It will help to identify seasonal variations in genre popularity.

Code: Select all

SELECT 
    YEAR(cout) AS year,
    MONTH(cout) AS month,
    COUNT(IF(spl_2016.subject.subject LIKE '%jazz%',
        1,
        NULL)) AS 'jazz',
    COUNT(IF(spl_2016.subject.subject LIKE '%rock%',
        1,
        NULL)) AS 'rock',
    COUNT(IF(spl_2016.subject.subject LIKE '%electronic%'
            OR spl_2016.subject.subject LIKE '%electronica%',
        1,
        NULL)) AS 'electronic',
    COUNT(IF(spl_2016.subject.subject LIKE '%pop%',
        1,
        NULL)) AS 'pop',
    COUNT(IF(spl_2016.subject.subject LIKE '%country%',
        1,
        NULL)) AS 'country',
    COUNT(IF(spl_2016.subject.subject LIKE '%folk%',
        1,
        NULL)) AS 'folk',
    COUNT(IF(spl_2016.subject.subject LIKE '%soul%',
        1,
        NULL)) AS 'soul',
    COUNT(IF(spl_2016.subject.subject LIKE '%blues%',
        1,
        NULL)) AS 'blues',
    COUNT(IF(spl_2016.subject.subject LIKE '%funk%',
        1,
        NULL)) AS 'funk',
    COUNT(IF(spl_2016.subject.subject LIKE '%rap music%'
            OR spl_2016.subject.subject LIKE '%hip hop%',
        1,
        NULL)) AS 'hip hop'
FROM
    spl_2016.subject
        JOIN
    spl_2016.outraw ON spl_2016.outraw.bibNumber = spl_2016.subject.bibNumber
WHERE
    (itemtype LIKE '%cd%'
        OR itemtype LIKE '%dvd%'
        OR itemtype LIKE '%bk%')
        AND deweyClass LIKE '78%'
        AND YEAR(cout) BETWEEN 2015 AND 2023
GROUP BY year , month
ORDER BY year , month;
This query provides a year-by-year comparison of checkouts across different music genres from 2015 to 2023. The results will show how the popularity of each genre has changed annually, helping to identify long-term trends in music genre popularity.

Code: Select all

SELECT 
    YEAR(cout) AS year,
    COUNT(IF(spl_2016.subject.subject LIKE '%jazz%',
        1,
        NULL)) AS 'jazz',
    COUNT(IF(spl_2016.subject.subject LIKE '%rock%',
        1,
        NULL)) AS 'rock',
    COUNT(IF(spl_2016.subject.subject LIKE '%electronic%'
            OR spl_2016.subject.subject LIKE '%electronica%',
        1,
        NULL)) AS 'electronic',
    COUNT(IF(spl_2016.subject.subject LIKE '%pop%',
        1,
        NULL)) AS 'pop',
    COUNT(IF(spl_2016.subject.subject LIKE '%country%',
        1,
        NULL)) AS 'country',
    COUNT(IF(spl_2016.subject.subject LIKE '%folk%',
        1,
        NULL)) AS 'folk',
    COUNT(IF(spl_2016.subject.subject LIKE '%soul%',
        1,
        NULL)) AS 'soul',
    COUNT(IF(spl_2016.subject.subject LIKE '%blues%',
        1,
        NULL)) AS 'blues',
    COUNT(IF(spl_2016.subject.subject LIKE '%funk%',
        1,
        NULL)) AS 'funk',
    COUNT(IF(spl_2016.subject.subject LIKE '%rap music%'
            OR spl_2016.subject.subject LIKE '%hip hop%',
        1,
        NULL)) AS 'hip hop'
FROM
    spl_2016.subject
        JOIN
    spl_2016.outraw ON spl_2016.outraw.bibNumber = spl_2016.subject.bibNumber
WHERE
    (itemtype LIKE '%cd%'
        OR itemtype LIKE '%dvd%'
        OR itemtype LIKE '%bk%')
        AND deweyClass LIKE '78%'
        AND YEAR(cout) BETWEEN 2015 AND 2023
GROUP BY year
ORDER BY year;
I’m also curious about the most frequently checked-out album each month, along with its genre. To explore this, I focused on the data for July 2021 as a case study. My query retrieves check-out data for CDs across a range of music genres. This can help to identify the most popular album for that month.

Code: Select all

SELECT 
    CASE
        WHEN spl_2016.subject.subject LIKE '%jazz%' THEN 'jazz'
        WHEN spl_2016.subject.subject LIKE '%rock%' THEN 'rock'
        WHEN
            spl_2016.subject.subject LIKE '%electronic%'
                OR spl_2016.subject.subject LIKE '%electronica%'
        THEN
            'electronic'
        WHEN spl_2016.subject.subject LIKE '%pop%' THEN 'pop'
        WHEN spl_2016.subject.subject LIKE '%country%' THEN 'country'
        WHEN spl_2016.subject.subject LIKE '%folk%' THEN 'folk'
        WHEN spl_2016.subject.subject LIKE '%soul%' THEN 'soul'
        WHEN spl_2016.subject.subject LIKE '%blues%' THEN 'blues'
        WHEN spl_2016.subject.subject LIKE '%funk%' THEN 'funk'
        WHEN
            spl_2016.subject.subject LIKE '%hip hop%'
                OR spl_2016.subject.subject LIKE '%rap music%'
        THEN
            'hip hop'
    END AS genre,
    spl_2016.outraw.title AS Title
FROM
    spl_2016.subject
        JOIN
    spl_2016.outraw ON spl_2016.outraw.bibNumber = spl_2016.subject.bibNumber
WHERE
    (spl_2016.outraw.itemtype LIKE '%cd%')
        AND YEAR(spl_2016.outraw.cout) = 2021
        AND MONTH(spl_2016.outraw.cout) = 7
        AND (spl_2016.subject.subject LIKE '%jazz%'
        OR spl_2016.subject.subject LIKE '%rock%'
        OR spl_2016.subject.subject LIKE '%electronic%'
        OR spl_2016.subject.subject LIKE '%electronica%'
        OR spl_2016.subject.subject LIKE '%pop%'
        OR spl_2016.subject.subject LIKE '%country%'
        OR spl_2016.subject.subject LIKE '%folk%'
        OR spl_2016.subject.subject LIKE '%soul%'
        OR spl_2016.subject.subject LIKE '%blues%'
        OR spl_2016.subject.subject LIKE '%funk%'
        OR spl_2016.subject.subject LIKE '%hip hop%'
        OR spl_2016.subject.subject LIKE '%rap music%')
        AND spl_2016.outraw.deweyClass LIKE '78%'
ORDER BY genre , Title;
Conclusion:

Rock and pop music consistently lead in yearly music checkouts in SPL, with a peak in 2015 and a decline thereafter, reflecting that music consumption is shifting to digital formats. Monthly data shows a surge in these genres around September, aligning with Seattle's Bumbershoot festival. Additionally, daily checkouts peak before and after major festivals like Capitol Hill Block Party and Northwest Folklife, particularly in their respective genres. This pattern suggests that music festivals influence public interest in specific music genres, with increased borrowing of related CDs, DVDs, and books around these events.
Attachments
CD_Titles_case_of_2021_7.csv
(742.53 KiB) Downloaded 16 times
CheckOutNum_Date.csv
(116.87 KiB) Downloaded 13 times
CheckOutNum_Month.csv
(5.55 KiB) Downloaded 10 times
CheckOutNum_Year.csv
(635 Bytes) Downloaded 11 times
MAT259_Project1_Shaw_Xiao.pdf
(2.7 MiB) Downloaded 13 times

nefeli
Posts: 4
Joined: Thu Jan 11, 2024 2:21 pm

Re: Proj 1 - Data Exploration Concept & MySQL Basics

Post by nefeli » Mon Jan 22, 2024 11:03 pm

Description:
This study explores the intersection of Textile Art and Artificial Intelligence within the Seattle Public Library (SPL).
Concept Description:
After the sudden expansion of AI in the creative field, some artist expressed their concern about traditional art forms. This exploration aims to understand how traditional art forms like textile art are related or influenced by modern technological fields like AI, as reflected in the Seattle library checkout patterns.

MySQL Query:

This query focuses on consolidating the checkout data for the topic of "Artificial Intelligence", then categorizing and combining the Subject information, and later quantifying and organizing by the checkout value.

Code: Select all

SELECT 
    o.title,
    o.itemtype,
    o.deweyClass,
    GROUP_CONCAT(DISTINCT s.subject
        SEPARATOR '; ') AS subjects,
    COUNT(*) AS checkout_count
FROM
    spl_2016.outraw o
        INNER JOIN
    spl_2016.subject s ON o.bibNumber = s.bibNumber
WHERE
    (LOWER(o.title) LIKE '%artificial intelligence%'
        OR LOWER(o.title) LIKE '%creative ai%')
        AND o.cout >= '2021-01-01'
        AND o.cout < '2024-01-01'
GROUP BY o.title , o.itemtype , o.deweyClass
ORDER BY checkout_count DESC;

SELECT 
    YEAR(cout) AS year,
    MONTH(cout) AS month,
    COUNT(*) AS total_checkout_count
FROM
    spl_2016.outraw
WHERE
    (LOWER(title) LIKE '%artificial intelligence%'
        OR LOWER(title) LIKE '%creative ai%')
        AND cout >= '2021-01-01'
        AND cout < '2024-01-01'
GROUP BY year , month
ORDER BY year , month;
The same process was conducted for checkouts titled "Textile Art" or certain Dewey Decimal classifications( 729, 730, 745, 746). All the selected Dewey Classifications are related to either textile or the notion of crafting in general.

Code: Select all

SELECT 
    o.title,
    o.itemtype,
    o.deweyClass,
    GROUP_CONCAT(DISTINCT s.subject
        SEPARATOR '; ') AS subjects,
    COUNT(*) AS checkout_count
FROM
    spl_2016.outraw o
        INNER JOIN
    spl_2016.subject s ON o.bibNumber = s.bibNumber
WHERE
    (LOWER(title) LIKE '%textile%'
        OR deweyClass IN ('745' , '729', '730', '746'))
        AND o.cout >= '2021-01-01'
        AND o.cout < '2024-01-01'
GROUP BY o.title , o.itemtype , o.deweyClass
ORDER BY checkout_count DESC;

SELECT 
    YEAR(cout) AS year,
    MONTH(cout) AS month,
    COUNT(*) AS total_checkout_count
FROM
    spl_2016.outraw
WHERE
    (LOWER(title) LIKE '%textile%'
        OR deweyClass IN ('745' , '729', '730', '746'))
        AND cout >= '2021-01-01'
        AND cout < '2024-01-01'
GROUP BY year , month
ORDER BY year , month;
Conclusion:

The data indicated a cleared disparity in the volume of available resources for Textile Art compared to artificial Intelligence, suggesting a broader range of choices for the former. Both categories exhibited a similar pattern, including a significant drop in checkouts towards the end of 2022. This drop could indicate either a data recording error or a temporary closure of the library. A peak in interest for Artificial Intelligence was observed at the beginning and middle of 2023. Interestingly, this peak coincided with a dip in the Textile Arts category, after which both categories returned to their usual patterns. The observed declining trend might be attributable to data recording errors.

The analysis suggests a dynamic relationship between the interest in traditional and modern technological fields. The fluctuations in checkout patterns could reflect shifting public interests, possibly influenced by external events or technological advancements. The observed data inconsistencies highlight the importance of accurate data recording for reliable trend analysis.
Attachments
ai_month_popularity.csv
(397 Bytes) Downloaded 12 times
ai_info_popularity.csv
(10.28 KiB) Downloaded 15 times
textileart_month_popularity.csv
(405 Bytes) Downloaded 14 times
textileart_info_popularity.csv
(48.83 KiB) Downloaded 12 times
MAT259_Assignment_01_NM.pdf
(948.88 KiB) Downloaded 16 times

Post Reply