1a Basic MySQL

glegrady
Posts: 138
Joined: Wed Sep 22, 2010 12:26 pm

1a Basic MySQL

Post by glegrady » Wed Jan 04, 2017 6:20 pm

Assignment due: Thursday, January 12, 2017

After going through the MySQL class examples post 3 MySQL Queries using the Seattle Public Library database. For each of the queries provide:
. The MySQL query
. Explanation of the query - what you were looking for
. A csv file attached with 3 or 4 columns and over 300 entries
. The time it took for the query to complete (processing time)
. Give an Analysis (report back on your results, explain how you did it, and what your outcomes are)

Process:
. Become familiar with the database and imagine a question
. Explore all fields of the database
. Test the query for logical and other errors
. Start with Booleans (AND, OR NOT), wildcards
. Make the query efficient (finetune)
. Interpret the results (data analysis)
. Export to .csv file to be used in visualization

Click on "PostReply" to post your project
George Legrady
legrady@mat.ucsb.edu

glegrady
Posts: 138
Joined: Wed Sep 22, 2010 12:26 pm

Re: 1a Basic MySQL

Post by glegrady » Tue Jan 10, 2017 10:55 am

2nd posting
George Legrady
legrady@mat.ucsb.edu

sara.lafia
Posts: 5
Joined: Wed Jan 11, 2017 10:41 am

Re: 1a Basic MySQL

Post by sara.lafia » Wed Jan 11, 2017 3:59 pm

Question 1: What are the titles and subjects of the library items about California Geography?

Code: Select all

SELECT spl_2016.title.bibNumber, spl_2016.title.title, spl_2016.subject.subject, spl_2016.deweyClass.deweyClass 
FROM (spl_2016.subject INNER JOIN spl_2016.title
ON spl_2016.subject.bibNumber= spl_2016.title.bibNumber)
INNER JOIN spl_2016.deweyClass ON(spl_2016.subject.bibNumber= spl_2016.deweyClass.bibNumber)
WHERE (spl_2016.title.title LIKE '%california%') and (spl_2016.deweyClass.deweyClass>=900)
GROUP BY spl_2016.title.bibNumber
ORDER BY spl_2016.deweyClass.deweyClass
Explanation: This query is a multi-table join among the title, subject, and dewey class. It uses the bibnumber as the primary key for a join and then limits the results and sorts by to those contents that are within the dewey range of 900 or greater, which is the History and Geography subject, grouped by bibnumber, which bundles duplicates with the same id. The query returns 500 records in 0.558 seconds.

Question 2: Which of these California Geography items were checked out within the last five years?

Code: Select all

SELECT spl_2016.transactions.checkOut, spl_2016.title.bibNumber, spl_2016.title.title, spl_2016.deweyClass.deweyClass 
FROM (spl_2016.transactions INNER JOIN spl_2016.title
ON spl_2016.transactions.bibNumber= spl_2016.title.bibNumber)
INNER JOIN spl_2016.deweyClass ON(spl_2016.transactions.bibNumber= spl_2016.deweyClass.bibNumber)
WHERE (spl_2016.title.title LIKE '%california%') and (spl_2016.deweyClass.deweyClass>=900) and (date(spl_2016.transactions.checkOut)>'2010')
GROUP BY spl_2016.deweyClass.deweyClass
ORDER BY spl_2016.transactions.checkOut
Explanation: This query replaces the subject join with the checkout date and takes a subset of the California Geography contents that were active in the last year. Surprisingly, of the 500 items, only 15 items (3%) were borrowed. If the library is interested in encouraging exploration and use of this collection, this might signal that these materials would benefit from promotion. The query returns the 15 items in 0.621 seconds.

Question 3: When were California Geography items last checked out and which of the items were checked out the most overall?

Code: Select all

SELECT spl_2016.title.title, spl_2016.transactions.checkOut, spl_2016.deweyClass.deweyClass, COUNT(spl_2016.title.title)
FROM (spl_2016.transactions INNER JOIN spl_2016.title
ON spl_2016.transactions.bibNumber= spl_2016.title.bibNumber)
INNER JOIN spl_2016.deweyClass ON(spl_2016.transactions.bibNumber= spl_2016.deweyClass.bibNumber)
WHERE (spl_2016.title.title LIKE '%california%') and (spl_2016.deweyClass.deweyClass>=900)
GROUP BY spl_2016.title.bibNumber
ORDER BY COUNT(spl_2016.title.bibNumber)DESC
Explanation: Over the last 11 years, there has been great variation in the number of times particular items have been checked out. Some items, like a Pacific Crest Trail Guidebook, have been checked out 1,738 times (which seems extraordinarily high), while other items have been checked out much less often. The query creates a summary where each item has a count of checkouts, sorted in descending order. The last checkout date is also available. The query returned all 500 original items from the first query in 0.968 seconds.
Attachments
multitableJoinExport.csv
This is the output of Query 1.
(51.12 KiB) Downloaded 77 times

brooksjaredc
Posts: 5
Joined: Wed Jan 11, 2017 10:39 am

Re: 1a Basic MySQL

Post by brooksjaredc » Wed Jan 11, 2017 6:05 pm

Query 1: What are the checkout times, item types, and titles of entries on Pluto?

Code: Select all

SELECT cout, itemtype, title, deweyClass FROM spl_2016.inraw WHERE title LIKE '%pluto %'
Explanation: Instead of joining three tables, I just used the inraw table and selected the checkout times, item types, and titles of entries where the title includes the word "pluto". I included a space at the end of pluto so I wouldn't catch words like 'plutonium'.

Analysis: This query takes only 36.2 milliseconds to complete and returns just over 4000 entries. That's 4000 someone has checked out something with "pluto" in the title
query1.csv
(290.64 KiB) Downloaded 81 times
Query 2: Same as query 2, but only selecting entries classified as science?

Explanation: I took the same query as before, but then added another WHERE clause for the deweyClass to be above or equal to 500 and less than 600.

Code: Select all

SELECT cout, itemtype, title, deweyClass FROM spl_2016.inraw WHERE title LIKE '%pluto %' AND (deweyClass >= 500 and deweyClass < 600)
Analysis: This query takes only 5.2 milliseconds to complete and returns just over 1800 entries. This means that less than half of all the checkouts with 'pluto' in the title are actually science related (or at least classified as such).
query2.csv
(150.01 KiB) Downloaded 83 times
Query 3: Of the checked out items with 'pluto' in the title, for each year, what is the distribution of item types?

Explanation: I created new columns by summing up all the entries that had 'cd', or 'dvd', for example, in the item type and then grouping them by year. I've also excluded any entries that have checkout years of 1970.

Code: Select all

SELECT year(cout), sum( case when itemType LIKE '%bk%' then 1 else 0 end) as book, 
sum( case when itemType LIKE '%cd%' then 1 else 0 end) as cd,
sum( case when itemType LIKE '%dvd%' then 1 else 0 end) as dvd,
sum( case when itemType LIKE '%vhs%' then 1 else 0 end) as vhs,
sum( case when itemType LIKE '%per%' then 1 else 0 end) as magazine,
sum( case when itemType LIKE '%np%' then 1 else 0 end) as newspaper,
sum( case when itemType LIKE '%photo%' then 1 else 0 end) as photo,
sum( case when itemType LIKE '%post%' then 1 else 0 end) as poster
FROM spl_2016.inraw WHERE (title LIKE '%pluto %') AND (year(cout) > 1970)
GROUP BY year(cout)
Analysis: This query takes only 2.4 milliseconds and returns 12 rows, one for each year from 2005 to 2016. I know the assignment said that each query should return over 300 entries, but this one "collapses" many entries, so to speak, to give some interesting information. What this output tell us is that no one is checking out magazines, newspapers, photos, or posters with 'pluto' in the title that are classified as science. VHS tapes were only checked out in 2006 and 2007. Most checkouts are books (unsurprisingly), and cd's are slightly more popular than dvd's for pluto. We can also see that interest in pluto peaks in 2010, four years after being demoted from planet status, after which, interest begins to die off. There is a slight uptick in interest in 2015, possibly related to the New Horizons probe that flew by pluto in 2015 and recorded high resolution photographs and returned them to earth.
query3.csv
(370 Bytes) Downloaded 86 times
Note: I got a range of results for the processing time. Sometimes it was up to 10 seconds, and other times the same query took like 2 milliseconds. Not sure why.

wolfe
Posts: 5
Joined: Wed Jan 11, 2017 10:43 am

Re: 1a Basic MySQL

Post by wolfe » Wed Jan 11, 2017 7:50 pm

Question 1: Where do people checkout items from the library about god?

Code: Select all

  Query 1a:
  select count(checkOut), title, deweyClass, spl_2016.transactions.bibNumber 
 from spl_2016.title 
 INNER JOIN spl_2016.deweyClass
ON spl_2016.deweyClass.bibNumber=spl_2016.title.bibNumber
 INNER JOIN spl_2016.transactions
ON spl_2016.transactions.bibNumber=spl_2016.title.bibNumber
 where title LIKE '%god%' 
 group by
 spl_2016.title.bibNumber, spl_2016.title.title, deweyClass  
 order by count(checkout) desc;
 /*runtime duration: 5.099sec, fetch: 0.089 sec, 5467 rows  */
 
 Query 1b:
  select count(checkOut), deweyClass, itemType
 from spl_2016.transactions
  INNER JOIN spl_2016.title 
ON spl_2016.transactions.bibNumber=spl_2016.title.bibNumber
 INNER JOIN spl_2016.deweyClass
ON spl_2016.deweyClass.bibNumber=spl_2016.title.bibNumber
 INNER JOIN spl_2016.itemType
ON spl_2016.itemType.itemNumber=spl_2016.transactions.itemNumber
 where title LIKE '%god%' 
 group by
 deweyClass, itemType 
 order by count(checkout) desc;
 /*runtime duration: 16.360 sec, fetch: 0.211 sec, 16200 rows  */
 
god3.csv
(6.77 KiB) Downloaded 80 times
Method/Results:

I thought it would be fun to try to "find god" in the library. First I looked for items with "god" in the title to see what was checked out the most frequently. I did this by searching the titles for god, joining it with the Dewey class table and the transactions table, with a count on the checkout, grouping by title and Dewey class. Next I looked for where the items came from. I did this by joining the transactions table with Dewey class and item type and counting the checkouts where grouping by Dewey class and item type.

I found that the most popular section of the library to check out books with god in the title were books not from the Dewey class at all (140,858 checkouts), followed by films not from the Dewey class (133,482). The first group that was from a Dewey class was 38,492 checkouts from the "drawing and drawings" section. The second group from the Dewey classification system were 29,453 checkouts of CD's from the vocal music section. Next was 25,349 checkouts of books from the juvenile collection. The 6th most popular section was "Christian experience, practice, life" with 19,895 checkouts. I chose "god" because I thought it was a common word that would show up a lot. At first I did a search to see what the top items were that were checked out with god in the title. I found that
"Godzilla", "Cidade de Deus City of god", "godfather the coppola restoration", "god delusion" and "exodus gods and kings" where the top 5 items checked out. I was surprised assuming that there would be more non-fiction, but found 4 out of 5 to be films. This lead me to see where these items were.


Question 2: What kind of books only get checkout out once and either never returned or never checked out again?

Code: Select all

Query 2a:
 SELECT 
    c, title, t.bibNumber, checkout, checkin
 FROM
    (
 select count(checkOut) as c, title, spl_2016.transactions.bibNumber as bibNumber
 from spl_2016.title 
 INNER JOIN spl_2016.transactions
ON spl_2016.transactions.bibNumber=spl_2016.title.bibNumber
where title not like '%ILLM%'
 group by
 spl_2016.title.bibNumber, spl_2016.title.title 
 order by c
 ) t
 inner join spl_2016.transactions on
spl_2016.transactions.bibNumber=t.bibNumber
where c = 1
order by checkin, checkout; 
/*runtime duration: 147.291 sec, fetch: 0.258 sec, 18295 rows  */

Query 2b:
 SELECT 
    sum(c) as sum, deweyClass, itemType
 FROM
    (
 select count(checkOut) as c, title, spl_2016.transactions.bibNumber as bibNumber
 from spl_2016.title 
 INNER JOIN spl_2016.transactions
ON spl_2016.transactions.bibNumber=spl_2016.title.bibNumber
where title not like '%ILLM%'
 group by
 spl_2016.title.bibNumber, spl_2016.title.title 
 order by c
 ) t
 inner join spl_2016.transactions on
spl_2016.transactions.bibNumber=t.bibNumber
 INNER JOIN spl_2016.deweyClass
ON spl_2016.deweyClass.bibNumber=spl_2016.transactions.bibNumber
 INNER JOIN spl_2016.itemType
ON spl_2016.itemType.itemNumber=spl_2016.transactions.itemNumber
where c = 1
group by spl_2016.deweyClass.deweyClass, spl_2016.itemType.itemType
order by sum; 
/*runtime duration: 142.205 sec, fetch: .132 sec, 1356 rows  */
empty2.csv
(15.08 KiB) Downloaded 82 times
Method/Results:

I thought it would be interesting to see what books were only checked out once, sometimes due to never being returned. Originally I wanted to look at books that were never checked out, but I could not find a book that fulfilled this criteria, most likely due to the dataset being checkout history, not the entire library collection.

I first looked at this by creating a subTable which counted how many times all books were checked out. I decided to ignore books with ILLM in the title because I assume they are Interlibrary loan books or something similar. I then joined the subTable with the transactions table to get the checkin and checkout date of those books and filtered it by where the checkout count was one. This was interesting but didn't seem to give me much useful information about those books, so I decided to group the results by section and type.

Again I found that the items that were most likely to be only checked out once were adult fiction, followed by juvenile fiction and dcillb (which I don't know). The Dewey books that were most likely to be checked out only once were "895 Literatures of East & Southeast Asia", "891 East Indo-European & Celtic literatures" and "811 American poetry in English". These subjects again are literature. Outside of literature in the top 10 types were "658 General management" and "629 Other branches of engineering". This data may not be relevant because I don't know how many of each type of Dewey class and item type there are, so novels might be the most to be only checked out once, because there are more novels.

Question 3: Is there a relationship between checkout dates of books on government and politics and election years?

Code: Select all

Query 3a:
 SELECT 
    c, month, year, title, t.bibNumber
 FROM
 (select count(checkin) as c, month(checkout) as month, year(checkout) as year, title, spl_2016.transactions.bibNumber as bibNumber
 from spl_2016.title 
 INNER JOIN spl_2016.transactions
ON spl_2016.transactions.bibNumber=spl_2016.title.bibNumber
where (title like '%government%' or title like '%politic%') and year(checkout) != 1970
 group by
 spl_2016.title.bibNumber, spl_2016.title.title, month(checkout), year(checkout)
 order by year(checkout), month(checkout) )t
where c > 0; 
/*runtime duration: 2.708 sec, fetch: 0.416 sec, 46979 rows  */

Query 3b:
 SELECT 
    sum(c) as s, year,month
 FROM
 (select count(checkin) as c, month(checkout) as month, year(checkout) as year, title, spl_2016.transactions.bibNumber as bibNumber
 from spl_2016.title 
 INNER JOIN spl_2016.transactions
ON spl_2016.transactions.bibNumber=spl_2016.title.bibNumber
where (title like '%government%' or title like '%politic%') and year(checkout) != 1970
 group by
 spl_2016.title.bibNumber, spl_2016.title.title, month(checkout), year(checkout)
 order by year(checkout), month(checkout) )t
 group by month, year
 order by s desc; 
 /*runtime duration: 2.508 sec, fetch: 0.0003 sec, 260 rows  */
government2.csv
(2.7 KiB) Downloaded 85 times
Methods/Results:
I thought it would be interesting to see if there is a connection between the year or month and when people checked out books with politic or government in the title. I did this by creating a subTable which selected a count of how many each item which had "politic" or "government" in the title was checked out each month. Then I did a sum of these results grouped by month and year, ordering the table by the sum. I think this information is most useful to see graphed. When looking at it graphed there are two interesting local maximum in 2008 and 2012. The peak is in January of 2009 or Obama's inauguration.

griessbaum
Posts: 4
Joined: Wed Jan 11, 2017 10:40 am

Re: 1a Basic MySQL

Post by griessbaum » Wed Jan 11, 2017 10:12 pm

The attachment query_1.csv is no longer available
### 1. Checkouts vs seasons ###
I was suspecting that there would be more checkouts during the winter. I therefore queried for the sum of all checkouts per week number (regardless of the year).

Code: Select all

SELECT WEEK(checkOut), count(itemNumber) 
FROM spl_2016.transactions
WHERE checkOut > '2011'
GROUP BY WEEK(checkOut)
ORDER BY WEEK(checkOut);
The query took 82.608 seconds.

My assumption proved to be wrong. However, there appears to be a declining trend of checkouts throughout the year.

### 2. Top 10 books ###
I wanted to know the top 10 checked items

Code: Select all

SELECT title, subq.count
FROM 
(SELECT transactions.bibNumber as bibNumber,
 count(transactions.bibNumber) as count
FROM spl_2016.transactions
GROUP BY transactions.bibNumber
ORDER BY count(transactions.bibNumber) DESC
LIMIT 10) AS subq
LEFT JOIN spl_2016.title
ON subq.bibNumber = title.bibNumber
The query took 20.624 seconds.

Appears that the people of Seattle have a pretty good taste in movies!

### 3. Checkout duration ###
I wanted to know which items are checked out for the longest time.

Code: Select all

SELECT subq.avg, subq.count, subq.max, title.title
FROM 
(SELECT bibNumber, 
AVG(DATEDIFF(checkIn, checkOut)) as avg, 
MAX(DATEDIFF(checkIn, checkOut)) as max,
count(bibNumber) as count
FROM spl_2016.transactions
WHERE checkOut IS NOT NULL
AND checkIn IS NOT NULL
AND checkOut >= '2011-01-01 00:00:00'
GROUP BY bibNumber
ORDER BY AVG(DATEDIFF(checkIn, checkOut)) DESC) AS subq
LEFT JOIN spl_2016.title
ON subq.bibNumber = title.bibNumber
WHERE subq.count > 100
AND subq.max < 365
LIMIT 10
The query took 244.776 seconds.

I found that I had to filter outliers such as single, very long checkouts. However, due to the long processing time, I was not able to find appropriate filter parameters.
Attachments
query_2.csv
(255 Bytes) Downloaded 82 times
query_1.csv
(613 Bytes) Downloaded 84 times

freeberg
Posts: 5
Joined: Wed Jan 11, 2017 10:39 am

Re: 1a Basic MySQL

Post by freeberg » Wed Jan 11, 2017 10:58 pm

Question 1: How often are books about Donald Trump being checked out?

The Query:
I wanted a count for each ISBN, counting the number of times that material was checked out. To get this I used a sub-query and joined on it. Results are ordered by the most recent checkouts. The query ran in 22.2 seconds. Not bad!

Code: Select all

SELECT
	DISTINCT t.checkOut,
    sub.nCheckouts,
    ti.title,
    s.`subject`
FROM spl_2016.transactions AS t
JOIN 
	(SELECT                 # Sub-query to get the counts
		bibNumber,
		count(bibNumber) as nCheckouts
	FROM
		spl_2016.transactions
	GROUP BY
		bibNumber
	) sub
	ON
		sub.bibNumber = t.bibNumber
JOIN spl_2016.title AS ti
	ON ti.bibNumber = t.bibNumber
JOIN spl_2016.`subject` AS s
	ON s.bibNumber = t.bibNumber
WHERE
	(ti.title LIKE "%trump%" AND ti.title LIKE "%donald%") AND
    ti.title NOT LIKE "%trumpet%"  # There are a lot of books on trumpets
ORDER BY t.checkout DESC
LIMIT 300;
Question 2: What words and phrases are most frequent in each Dewey classification?

The Query: I wanted to aggregate the information stored in the title and subject of each piece of material in each Dewey classification. Luckily I found that MySQL supports a group_concat() function, which acts as an aggregator and concatenates the contents of the column given. My idea was that now I could pull this into Python or R and parse it there. Query took 4 seconds.

Code: Select all

SELECT 
	d.deweyClass,
    group_concat(s.`subject` separator ' ') AS subjects,
    group_concat(t.title separator ' ') AS titles
FROM spl_2016.deweyClass as d
JOIN spl_2016.`subject` as s
	ON s.bibNumber = d.bibNumber
JOIN spl_2016.title as t
	ON t.bibNumber = d.bibNumber
GROUP BY
	d.deweyClass
LIMIT 300;


Question 3: For how long do most people check out items? (pretty straightforward question)

The Query: To get a rough measure on the distribution of times I grabbed the minimum, maximum, and mean length of the checkout period. The results are grouped by the dewey classification. I added a count for each Dewey class. I am still trying to optimize this query, as it took ~580 seconds to run. I believe I can speed it up in the JOIN clause, but I need to brush up more on my SQL.

Code: Select all

SELECT
	# Grouping variables
    d.deweyClass,
    year(t.checkOut) as yearOut,
    # Aggregated metrics
	round(max(( unix_timestamp(checkIn) - unix_timestamp(checkOut) )/60/60/24), 3) as maxOut,
    round(avg(( unix_timestamp(checkIn) - unix_timestamp(checkOut) )/60/60/24), 3) as meanOut,
    round(min(( unix_timestamp(checkIn) - unix_timestamp(checkOut) )/60/60/24), 3) as minOut,
	max(checkout) as mostRecentOut,
    count(deweyClass) as N
FROM spl_2016.transactions as t
JOIN spl_2016.deweyClass as d
	ON d.bibNumber = t.bibNumber
WHERE
    year(t.checkOut) >= 2006 AND
    t.checkOut IS NOT NULL
GROUP BY 
	d.deweyClass, yearOut
LIMIT 300;
Attachments
TrumpCheckouts.csv
(29.42 KiB) Downloaded 87 times
group_concat_test.csv
(496.52 KiB) Downloaded 86 times
checkOutsByDewey.csv
(16.44 KiB) Downloaded 79 times

ariellalgilmore
Posts: 7
Joined: Wed Jan 11, 2017 10:40 am

Re: 1a Basic MySQL

Post by ariellalgilmore » Thu Jan 12, 2017 12:48 am

Code: Select all

SELECT DISTINCT title, subject, itemType, checkOut
FROM spl_2016.title, spl_2016.subject, spl_2016.itemType, spl_2016.transactions
WHERE subject = "music" and itemType = "jccd" and checkOut > "2016-06-21 18:00:00" and checkOUT < "2016-06-21 20:00:00"
LIMIT 400;

In this query, I was looking for music CD’s that were checked out on National Music Day June 21, 2016. The table includes the title, subject, itemType, and checkOut. In order to not have duplicate titles I included the DISTINCT function. It is also showing the checkouts of the two hours before closing and it is being limited to 400. It takes 5.1 ms to return the query.

Code: Select all

SELECT cout, itemtype, title, deweyClass
FROM spl_2016.inraw 
WHERE (deweyClass >= 780 and deweyClass < 790)
UNION
SELECT cout, itemtype, title, deweyClass
FROM spl_2016.outraw 
WHERE (deweyClass >= 780 and deweyClass < 790)
ORDER BY deweyClass
LIMIT 400;
In this query, I am combining the cout, itemtype, title, and deweyClass columns from the inraw table and and outraw table. It is also ordering the columns by the deweyClass. Unlike the previous query, which took 5.1 ms to return, this one took approximately 6 minutes. This code seemed a lot simpler than the first one too, but I assume because of the union the time is largely effected.

Code: Select all

SELECT title, HOUR(cout), count(itemNumber), deweyClass
FROM spl_2016.outraw
WHERE cout > '2016-02-015'AND title LIKE "%music%"
GROUP BY HOUR(cout)
This query demonstrates each hour of February 15, 2016, which is the Grammy award show. It gives each hour in increasing order with the count of the itemNumber, deweyClass, and title. This took only 3.3 ms. It is interesting to see that the more titles with music in them were checked out closer to the start of the Grammy award show.
Attachments
query_result3.csv
(1.14 KiB) Downloaded 80 times
query_result2.csv
(26.49 KiB) Downloaded 67 times
query_result1.csv
(27.47 KiB) Downloaded 80 times

christopherchen0
Posts: 5
Joined: Wed Jan 11, 2017 10:44 am

Re: 1a Basic MySQL

Post by christopherchen0 » Thu Jan 12, 2017 12:57 am

Question 1: How many items about gluten were checked out prior to 2007?

Code: Select all

SELECT DISTINCT spl_2016.title.title, spl_2016.collectionCode.itemNumber, spl_2016.transactions.checkOut FROM spl_2016.transactions INNER JOIN spl_2016.title ON spl_2016.transactions.bibNumber = spl_2016.title.bibNumber INNER JOIN spl_2016.itemToBib ON spl_2016.title.bibNumber = spl_2016.itemToBib.bibNumber INNER JOIN spl_2016.collectionCode ON spl_2016.itemToBib.itemNumber = spl_2016.collectionCode.itemNumber WHERE title LIKE "%gluten%" AND year(checkOut)<'2007' GROUP BY spl_2016.transactions.checkOut;
question1.csv
Question1
(57.29 KiB) Downloaded 77 times
This query gave me 526 responses in 16.281 seconds.

Analysis: I tend to like food, so all of my questions were food-themed. I know that the gluten-free movement is quite common now, but I wanted to see how many people were actively looking at items that discussed gluten before about ten years ago? Turns out a decent amount - but when playing around with the time constraints after, I found out the number of checkouts starting around 2011-2013 started to escalate significantly. In 2015 alone? 5148 checkouts were made on items about gluten.


Question 2: How many items were about food but were not in the food/health categories according to the Dewey system?

Code: Select all

SELECT spl_2016.title.title, spl_2016.deweyClass.deweyClass, spl_2016.transactions.checkOut FROM spl_2016.title INNER JOIN spl_2016.deweyClass ON spl_2016.title.bibNumber = spl_2016.deweyClass.bibNumber INNER JOIN spl_2016.transactions ON spl_2016.deweyClass.bibNumber = spl_2016.transactions.bibNumber WHERE title LIKE "%food%" AND deweyClass < 600 GROUP BY spl_2016.title.title ORDER BY deweyClass;
question2.csv
Question2
(83.71 KiB) Downloaded 77 times
This query gave me 914 responses in 2.094 seconds.

Analysis: I was wondering how many items there were about food that weren't actually in the Technology/Food and Health (600s) sections, so I just looked for items in classes 000-599. Turns out a decent amount. Everything from How do animals find food (591 - specific topics in natural history of animals), to Potluck exploring American foods and meals (428 - standard English usage prescriptive linguistics), to Shopping at Giant Foods Chinese American supermarkets in Northern California (381 - commerce & trade) and Nanny state how food fascists teetotaling do gooders priggish moralists and other boneheaded bureaucrats are turning America into a nation of children (303 - social processes). It made sense that the majority of results clustered in the 300s - the social sciences.


Question 3: Of all items about Japanese cooking, which one was the most checked out in the last 10 years?

Code: Select all

SELECT COUNT(cout), cout, spl_2016.title.title, spl_2016.deweyClass.deweyClass, subject FROM spl_2016.title INNER JOIN spl_2016.outraw ON spl_2016.title.bibNumber = spl_2016.outraw.bibNumber INNER JOIN spl_2016.deweyClass ON spl_2016.outraw.bibNumber = spl_2016.deweyClass.bibNumber INNER JOIN spl_2016.subject ON spl_2016.deweyClass.bibNumber = spl_2016.subject.bibNumber WHERE spl_2016.title.title LIKE "%japan%" AND spl_2016.deweyClass.deweyClass = 641 AND year(cout)>2007 OR spl_2016.subject.subject LIKE "%japan%" AND spl_2016.deweyClass.deweyClass = 641 AND year(cout) > 2007 GROUP BY spl_2016.title.title ORDER BY COUNT(cout)
question3.csv
Question3
(20.55 KiB) Downloaded 84 times
This query gave me 213 responses in 1.969 seconds.

Analysis: For this one, I wanted to a little more specific. First off all, I put a restriction on strictly food items (I used the Dewey class number 641 - food & drink.) Next, I put one on Japanese food (looking in both subject and titles). And finally, I wanted to only get items that were checked out within the last 10 years. First the neglected ones - 5 books were only checked out once, with 4 of them being checked out last in 2008. Somewhat unsurprisingly, the fantastic documentary Jiro Dreams of Sushi was checked out the most - a whopping 4596 times. Trailing behind at 1995 check outs, Japanese soul cooking ramen tonkatsu tempura and more from the streets and kitchens of Tokyo and beyond came in second while Japanese women dont get old or fat secrets of my mothers Tokyo kitchen took 3rd with 1659 check outs.

jingyi_xiao
Posts: 5
Joined: Wed Jan 11, 2017 10:43 am

Re: 1a Basic MySQL

Post by jingyi_xiao » Thu Jan 12, 2017 8:42 am

Question 1: How is ‘map’ related books being classified as different Dewey classification?

Code: Select all

select spl_2016.subject.bibNumber, spl_2016.subject.subject, 
spl_2016.deweyClass.deweyClass
from spl_2016.subject
inner join spl_2016.deweyClass
on spl_2016.deweyClass.bibNumber = spl_2016.subject.bibNumber
where subject like '%map%'
group by spl_2016.deweyClass.deweyClass
limit 300;
Explanation: I am interested in maps, and I wonder if map related items are mainly in Dewey classification (Class 900-History & Geography). So I query the subject of items which include ‘map’ and join the table with deweyClass table to see if that is the case.

Results and processing time: 148 rows were returned, and the processing time is 5.016sec. It’s kind of a long time for computer, I think both searching for ‘map’ and matching with deweyClass took long time.

Analysis: as expected, quite a lot of map related items are classified as Class 900 (35/148). However, Class 300(27/148), Class 500(24/148), Class 600(19/148), Class 700(17/148) also take a large portion. Class 300, 500, 600 and 700 represent social sciences, science, technology and arts& recreation, respectively. Actually, the results make sense as map has been widely used in various part of our lives, and map itself has been changed as well.
map.csv
(5.86 KiB) Downloaded 79 times
Question 2: How long do people usually keep an item(book)?

Code: Select all

select tDay, count(tDay) as num from
(select id, datediff(cin,cout) as tDay
from spl_2016.inraw 
where id< 1000) a
group by tDay;
Explanation: I chose 1000 sample in 2016 data, and try to find the pattern of the time people keep an item. And the query is to get the number of days people keep an item and how many items are keep for same number of days.

Results and processing time: 116 rows were returned, and it took 0.0054sec.

Analysis: Over 70% of items are returned within 40 days. So people usually keep an item for about a month. But there are several outliers, 76 items are returned in 65th day, 15 items are returned in 92th day. More information is needed to explain this. Maybe people need to return the book within 65 days or they would be charged with some late-return fee.
tDay.csv
(653 Bytes) Downloaded 75 times
Question 3: What is the normal title length range of an item (book, dvd, cd, etc.)?

Code: Select all

/*query of the first 1000 items and grouped by item type*/
select itemtype, count(id) as total
from spl_2016.inraw
where id < 1000
group by itemtype;

/*query of the first 1000 items which title length is less than 15 and grouped by item type*/
select itemtype, count(id) as len15
from spl_2016.inraw
where id < 1000 and length(title)<15
group by itemtype;

/*query of the first 1000 items which title length is between 15 and 30 and grouped by item type*/
select itemtype, count(id) as len1530
from spl_2016.inraw
where id < 1000 and length(title) between 15 and 30
group by itemtype;

/*query of the first 1000 items which title length is between 30 and 45 and grouped by item type*/
select itemtype, count(id) as len3045
from spl_2016.inraw
where id < 1000 and length(title) between 30 and 45
group by itemtype;

/*query of the first 1000 items which title length is greater than 45 and grouped by item type*/
select itemtype, count(id) as len15
from spl_2016.inraw
where id < 1000 and length(title)>45
group by itemtype;
Explanation: Different books have different title length. But generally speaking, is there a normal title length range for books and other items, like cd and dvd? In this query, I chose 1000 items, got the number of items which title length is less than 15, between 15 and 30, between 30 and 45, and greater than 45, to see if there is a normal range?

Results and processing time: 16, 10, 13, 12, 12 rows were returned respectively, and it took 0.020sec, 0.158sec, 0.0051sec, 0.0050sec, 0.0048sec respectively. 0.158 sec is the first time to calculate the length of title, so it took a relatively long time. After that, the processing time declined to around 0.0050sec, maybe it just calculates the length once so it saved some time.

Analysis: For most of the items, like book, cd, dvd, acvhs, the title length is normally less than 45, and concentrated between 15-30. But for music, dcillb, title length is greater than 30. So for different items, the title length tends to be different. The limitation of this query is that sample is not good enough to give a strong support.

(I manually combine the results together, because I didn’t know if there is way I can change the condition ‘WHERE’ for different situations in one query. Maybe after learning deeper into MySQL, I will figure it out.)
length.csv
(343 Bytes) Downloaded 76 times

Post Reply