1a Basic MySQL

merttoka
Posts: 21
Joined: Wed Jan 11, 2017 10:42 am

Re: 1a Basic MySQL

Post by merttoka » Sun Jan 15, 2017 5:15 pm

Question 1: What are the most checked out books per year?
2: What are the most checked out items per year?

Books query:

Code: Select all

select ye as 'Year', MAX(num) as '# of couts', title, subject.subject, deweyClass, itemtype, subject.bibNumber
	from (SELECT year(cout) as ye, count(*) as num, title, itemtype, inraw.bibNumber, deweyClass
	FROM spl_2016.inraw 
	where title != '' and cout != '1970-01-01 00:00:00' and (itemtype = 'acbk' or
															 itemtype = 'arbk' or
                                                             itemtype = 'bcbk' or
                                                             itemtype = 'drbk' or
                                                             itemtype = 'jcbk' or
                                                             itemtype = 'jrbk')
	group by title, year(cout)
	order by num DESC, year(cout) DESC) as sth 
    join subject where sth.bibNumber = subject.bibNumber
group by ye
order by ye DESC;
Items query:

Code: Select all

select ye as 'Year', MAX(num) as '# of couts', title, subject.subject, deweyClass, itemtype, subject.bibNumber
	from (SELECT year(cout) as ye, count(*) as num, title, itemtype, inraw.bibNumber, deweyClass
	FROM spl_2016.inraw 
	where title != '' and cout != '1970-01-01 00:00:00' 
	group by title, year(cout)
	order by num DESC, year(cout) DESC) as sth 
    join subject where sth.bibNumber = subject.bibNumber
group by ye
order by ye DESC;
Explanation: In a dataset, the first thing I would like to do is to see extremes. Therefore I came up with queries that lists most checked out books and items per year.

Results and processing time: 31 rows were returned for both queries and book search took 283.25 secs (4.7 mins) whereas item search was longer with 469.641 secs (7.8 mins). I know the queries are not in the most optimal shape at the moment. Using inraw table and applying a join on that is costly but gathering this information requires lots of computation and it deserves the time it took.

Analysis: When the item list is examined, it's easy to see most of them are digital media products (CDs, DVDs etc.). One of the Interesting items is 2015, where the book "Gone Girl" surpasses all digital media and become number one checked out item. Another one is 2016 in which something called "SPL HotSpot connecting Seattle" is at the top place with an itemtype of "aceqnh", which doesn't appear in itemTypes list.
mostCheckedoutBookPerYear.csv
(2.66 KiB) Downloaded 172 times
mostCheckedoutItemPerYear.csv
(2.55 KiB) Downloaded 190 times

Question 3: What are the newest items in the library?

Code: Select all

SELECT title, datediff(NOW(), min(cout)) as days, itemtype, (cout) as regDate
FROM inraw
WHERE cout > '1970-01-01 00:00:00'
GROUP BY itemNumber
HAVING days < 30
LIMIT 500
Explanation: I wanted to see which items in the library are the newest ones. I considered the earliest check out time is the arrival time of the item to the library (excluding the UNIX timestamp beginning -- 1970) and took time difference between NOW (2017-01-15) and arrival time. Time differences are in days.

Results and processing time: 500 items gathered and it took 117.266 seconds. The long query time is expected and it is due to the HAVING statement, which forces the query to run over all instances in the database. If HAVING is deleted, it runs in fractions of a second. However, without that statement, I couldn't retrieve small numbers in the set.

Analysis: I would not expect any specific item in this query since newest items could be from every domain, background or item type. But when I quickly look at the item types and titles, I find that most of the entries are fictional books.
mostCheckedoutItemPerYear.csv
(2.55 KiB) Downloaded 190 times
Attachments
newestItems.csv
(33.11 KiB) Downloaded 176 times

kschlesi
Posts: 6
Joined: Wed Jan 11, 2017 10:42 am

Re: 1a Basic MySQL

Post by kschlesi » Mon Jan 16, 2017 2:51 pm

1. I was interested to explore the locations of other countries or cultures in the library, which could indicate how SPL patrons primarily interact with those cultures. My first query asked which Dewey classifications were given to books (or other media) about Egypt. The query involved joining the “subject” and “deweyClass” tables; it ran in 4.7 sec and produced 1728 distinct results. Scrolling through, I found that common classifications are in the 910s (geography), the 930s (history of the ancient world), the 960s (history of Africa), and throughout the 700s (art/music). Most other entries are in the 290s (religions other than Christianity), the 300s (political history and mythology), and the 400s (languages — mostly hieroglyphics related). Overall, it seems the culture and history of ancient Egypt are more prominently featured than those of modern Egypt.

After scrolling through the full list, I used “floor,” “count,” and “group by” to bin the results in Dewey number increments of 10. This query ran in 86.7 ms and produced 62 rows, binning all 1728 records about Egypt. This highlights the popular categories much more easily than the initial attempt. The “subject” included with each bin is just one sample subject from that bin.

Code: Select all

SELECT floor(deweyClass/10)*10 AS deweyBin, count(deweyClass) AS deweyCount, spl_2016.subject.subject
FROM spl_2016.subject, spl_2016.deweyClass
WHERE spl_2016.subject.bibNumber = spl_2016.deweyClass.bibNumber
  AND spl_2016.subject.subject LIKE "%egypt%"
GROUP BY floor(deweyClass/10)*10
ORDER BY floor(deweyClass/10)*10;
q1_egypt_hist.csv
(2.33 KiB) Downloaded 184 times
2. Does interest in Egypt increase when the modern country is in the news? I searched for checkouts of Egyptian media in 2011, the year of the Egyptian revolution which began the “Arab spring,” compared to 2009, 2010, and 2012. I joined the “outraw” table with “subject” to search by subject, which seemed to significantly increase the time of the query (compared to, for example, searching by title, which is already in the outlaw table). Similar to the previous query, I binned the results by Dewey class increments and ordered by year. The query ran in 254 sec (!) and produced 202 rows (each collapsing multiple checkout records).

Code: Select all

SELECT year(cout) AS cOutYear, floor(deweyClass/10)*10 AS deweyClassCat, count(deweyClass) AS deweyClassCount
FROM outraw, subject
WHERE year(cout) BETWEEN "2009" AND "2012"
  AND subject LIKE "%egypt%"
  AND subject.bibNumber = outraw.bibNumber
GROUP BY floor(deweyClass/10)*10, year(cout)
ORDER BY floor(deweyClass/10)*10, year(cout);
The trends in Egypt interest depend a lot on the Dewey category. Interest in non-Dewey items and more modern-history items did climb in 2011. However, interest in “Other Religion” media concerning Egypt spiked in 2010, and interest in Egyptian geography dropped in 2011.
q2_egypt_year.csv
(2.45 KiB) Downloaded 180 times
3. Does the time of day affect the number of checkouts of children’s media? Looking at checkouts in 2016, I searched for the number of children’s media checkouts per hour, grouped by portion of the day. The query ran in 23.8 sec and produced 356 rows, one for each day of the year on which books were checked out.

Code: Select all

SELECT day(cout), month(cout),
	   sum(case when hour(cout) BETWEEN "10" AND "11" then 1/2 else 0 end) AS 10AM_12PM,
       sum(case when hour(cout) BETWEEN "12" AND "13" then 1/2 else 0 end) AS 12PM_2PM,
       sum(case when hour(cout) BETWEEN "14" AND "15" then 1/2 else 0 end) AS 2PM_4PM,
       sum(case when hour(cout) BETWEEN "16" AND "17" then 1/2 else 0 end) AS 4PM_6PM,
       sum(case when hour(cout) BETWEEN "18" AND "19" then 1/2 else 0 end) AS 6PM_8PM,
       sum(case when (hour(cout) BETWEEN "00" AND "09") OR (hour(cout) BETWEEN "20" AND "23") then 1/14 else 0 end) AS other
FROM outraw
WHERE itemtype LIKE "j%"
  AND year(cout) = "2016"
GROUP BY month(cout), day(cout);
On weekdays (M-Th) when the library is open from 10am - 8pm, the period from 4-6pm often has the most checkouts of children’s media. This is true almost every day during school months, but the distinction blurs a bit during summer months and on weekend days. The periods of time outside library hours include much lower checkout rates that likely represent online activity. Only a few days each year completely lack checkouts, including New Year’s Eve/Day, Christmas Eve/Day, and Thanksgiving Day. It would be interesting to see whether these children’s rates track with adult checkout rates, and whether they differ for checkins or for different item types and subjects.
q3_children_hours.csv
(19.68 KiB) Downloaded 185 times

Post Reply