Question 1: How is ‘map’ related books being classified as different Dewey classification?
Code: Select all
select spl_2016.subject.bibNumber, spl_2016.subject.subject,
spl_2016.deweyClass.deweyClass
from spl_2016.subject
inner join spl_2016.deweyClass
on spl_2016.deweyClass.bibNumber = spl_2016.subject.bibNumber
where subject like '%map%'
group by spl_2016.deweyClass.deweyClass
limit 300;
Explanation: I am interested in maps, and I wonder if map related items are mainly in Dewey classification (Class 900-History & Geography). So I query the subject of items which include ‘map’ and join the table with deweyClass table to see if that is the case.
Results and processing time: 148 rows were returned, and the processing time is 5.016sec. It’s kind of a long time for computer, I think both searching for ‘map’ and matching with deweyClass took long time.
Analysis: as expected, quite a lot of map related items are classified as Class 900 (35/148). However, Class 300(27/148), Class 500(24/148), Class 600(19/148), Class 700(17/148) also take a large portion. Class 300, 500, 600 and 700 represent social sciences, science, technology and arts& recreation, respectively. Actually, the results make sense as map has been widely used in various part of our lives, and map itself has been changed as well.
- map.csv
- (5.86 KiB) Downloaded 150 times
Question 2: How long do people usually keep an item(book)?
Code: Select all
select tDay, count(tDay) as num from
(select id, datediff(cin,cout) as tDay
from spl_2016.inraw
where id< 1000) a
group by tDay;
Explanation: I chose 1000 sample in 2016 data, and try to find the pattern of the time people keep an item. And the query is to get the number of days people keep an item and how many items are keep for same number of days.
Results and processing time: 116 rows were returned, and it took 0.0054sec.
Analysis: Over 70% of items are returned within 40 days. So people usually keep an item for about a month. But there are several outliers, 76 items are returned in 65th day, 15 items are returned in 92th day. More information is needed to explain this. Maybe people need to return the book within 65 days or they would be charged with some late-return fee.
- tDay.csv
- (653 Bytes) Downloaded 145 times
Question 3: What is the normal title length range of an item (book, dvd, cd, etc.)?
Code: Select all
/*query of the first 1000 items and grouped by item type*/
select itemtype, count(id) as total
from spl_2016.inraw
where id < 1000
group by itemtype;
/*query of the first 1000 items which title length is less than 15 and grouped by item type*/
select itemtype, count(id) as len15
from spl_2016.inraw
where id < 1000 and length(title)<15
group by itemtype;
/*query of the first 1000 items which title length is between 15 and 30 and grouped by item type*/
select itemtype, count(id) as len1530
from spl_2016.inraw
where id < 1000 and length(title) between 15 and 30
group by itemtype;
/*query of the first 1000 items which title length is between 30 and 45 and grouped by item type*/
select itemtype, count(id) as len3045
from spl_2016.inraw
where id < 1000 and length(title) between 30 and 45
group by itemtype;
/*query of the first 1000 items which title length is greater than 45 and grouped by item type*/
select itemtype, count(id) as len15
from spl_2016.inraw
where id < 1000 and length(title)>45
group by itemtype;
Explanation: Different books have different title length. But generally speaking, is there a normal title length range for books and other items, like cd and dvd? In this query, I chose 1000 items, got the number of items which title length is less than 15, between 15 and 30, between 30 and 45, and greater than 45, to see if there is a normal range?
Results and processing time: 16, 10, 13, 12, 12 rows were returned respectively, and it took 0.020sec, 0.158sec, 0.0051sec, 0.0050sec, 0.0048sec respectively. 0.158 sec is the first time to calculate the length of title, so it took a relatively long time. After that, the processing time declined to around 0.0050sec, maybe it just calculates the length once so it saved some time.
Analysis: For most of the items, like book, cd, dvd, acvhs, the title length is normally less than 45, and concentrated between 15-30. But for music, dcillb, title length is greater than 30. So for different items, the title length tends to be different. The limitation of this query is that sample is not good enough to give a strong support.
(I manually combine the results together, because I didn’t know if there is way I can change the condition ‘WHERE’ for different situations in one query. Maybe after learning deeper into MySQL, I will figure it out.)