Re: 2D Frequency Pattern Visualization
Posted: Thu Feb 01, 2018 4:28 pm
Concept Description
I wanted to see whether new releases in the Pirates of the Caribbean series spark interest in watching, or rewatching, the previous movies in the series. So I pulled the number of checkouts per month for the first 4 Pirates of the Caribbean movies from the SPL data. To visualize the dataset, I decided to make a symmetric line graph, meaning I reflect each line about the time axis. The colormap encodes the absolute value of the month-to-month change in checkouts, scaled relative to the largest of the movies' maximum changes, with white being the highest and black being 0.
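The color encoding and the reflection can be sketched roughly as follows. This is only an illustration, not the code behind the figures: the function names `gray_levels` and `mirror`, the 0 to 255 gray scale, and the sample numbers are assumptions made for the example, and it reads "the maximum of each movie's maximum change" as one scale shared by all movies.

```python
def gray_levels(abs_changes_by_movie):
    """Map absolute changes to gray levels, 0 (black) to 255 (white).

    All movies share one scale: the largest of the per-movie maximum changes.
    """
    # Maximum change of each movie, then the maximum of those maxima.
    per_movie_max = [max(c) for c in abs_changes_by_movie.values() if c]
    scale = max(per_movie_max, default=0)
    levels = {}
    for movie, changes in abs_changes_by_movie.items():
        if scale == 0:
            # No change anywhere: everything is black.
            levels[movie] = [0 for _ in changes]
        else:
            levels[movie] = [round(255 * c / scale) for c in changes]
    return levels


def mirror(checkouts):
    """Reflect a series about the time axis: return (upper, lower) curves."""
    upper = list(checkouts)
    lower = [-c for c in checkouts]
    return upper, lower


# Made-up absolute changes, for illustration only.
sample = {"black pearl": [0, 10, 40], "dead mans chest": [5, 20, 0]}
levels = gray_levels(sample)
upper, lower = mirror([3, 7, 2])
```

With a shared scale, a large jump in one movie makes the smaller changes of the other movies look darker, which keeps the four lines comparable.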
SQL Queries/Data Analysis
Processing Time: 134 seconds
Code:
-- Monthly checkouts of the first four Pirates of the Caribbean movies, 2006 onward
SELECT
    YEAR(cout),
    MONTH(cout),
    SUM(CASE WHEN LOWER(spl_2016.inraw.title) LIKE 'pirates of the caribbean the curse of the black pearl' THEN 1 ELSE 0 END) AS 'black pearl',
    SUM(CASE WHEN LOWER(spl_2016.inraw.title) LIKE 'pirates of the caribbean dead mans chest' THEN 1 ELSE 0 END) AS 'dead mans chest',
    SUM(CASE WHEN LOWER(spl_2016.inraw.title) LIKE 'pirates of the caribbean at worlds end' THEN 1 ELSE 0 END) AS 'at worlds end',
    SUM(CASE WHEN LOWER(spl_2016.inraw.title) LIKE 'pirates of the caribbean on stranger tides' THEN 1 ELSE 0 END) AS 'on stranger tides'
FROM
    spl_2016.inraw
WHERE
    YEAR(cout) > 2005
GROUP BY
    YEAR(cout), MONTH(cout)
-- Keep the months in chronological order
ORDER BY
    YEAR(cout), MONTH(cout);
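The query returns one row per month, with a year, a month, and one checkout count per movie. A minimal sketch of taking the absolute month-to-month change of each movie column from such rows is below. It is not the script used for this project (that one is not posted here); the function name `abs_monthly_change`, the tuple layout, and the sample numbers are assumptions for illustration, and it assumes no month is missing from the result.

```python
def abs_monthly_change(rows):
    """Absolute change between consecutive months, per movie column.

    rows: list of (year, month, count_1, ..., count_n) tuples, one per month.
    Returns one list per movie with |count[t] - count[t-1]|.
    """
    # Put the rows in chronological order before differencing.
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    # Transpose the count columns: one tuple of monthly counts per movie.
    columns = list(zip(*[r[2:] for r in rows]))
    return [
        [abs(later - earlier) for earlier, later in zip(col, col[1:])]
        for col in columns
    ]


# Made-up rows, for illustration only: (year, month, movie A, movie B).
rows = [
    (2006, 2, 9, 0),
    (2006, 1, 4, 0),
    (2006, 3, 6, 2),
]
changes = abs_monthly_change(rows)
```

Each output list is one element shorter than the number of months, because the first month has no earlier month to compare against.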
To do some further data processing, I wrote a small Python script that gives me the absolute value of the time derivative for each of the movie columns.
Sketches/Works in Progress
I initially wanted to represent the number of checkouts by the width of a "river", but I figured that, for functionality reasons, it would be better to use a line graph. Then, to keep the "river" idea, I decided to reflect the line graph about the x-axis. Here is a picture of that, now with all the movies.
And here are the final results.
Results and Analysis
The vertical lines that indicate movie releases mark the library releases, meaning the month that each movie was made available in the library. I chose this date because I wanted to measure the influence a new movie has after it has been watched by someone who borrowed the DVD from the library, rather than by someone who watched it at the movie theater.
There seems to be a "lag" between when a movie is released and when the maximum number of checkouts for that movie occurs. This is probably because people who want the DVD version of the movie are "casuals", who would not know (or care) when the movie will first be made available in the library. One piece of supporting evidence for this claim is that the first local maximum is usually not the global maximum. In any case, the graph does not seem to indicate an increased interest in the previous movies as each new movie is released.
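The "lag" observation could be checked numerically along these lines. This is a hedged sketch and not part of the project code: the helper names `first_local_max` and `global_max`, the release index, and the sample counts are all made up for illustration.

```python
def first_local_max(series, start=0):
    """Index of the first local maximum at or after `start`, or None."""
    # A local maximum needs a neighbor on each side, so skip the endpoints.
    for i in range(max(start, 1), len(series) - 1):
        if series[i - 1] < series[i] >= series[i + 1]:
            return i
    return None


def global_max(series, start=0):
    """Index of the largest value at or after `start` (earliest on ties)."""
    if start >= len(series):
        return None
    return max(range(start, len(series)), key=lambda i: series[i])


# Made-up monthly checkouts; the movie reaches the library at index 1.
counts = [0, 3, 8, 5, 6, 12, 9, 4]
release = 1

first_peak = first_local_max(counts, release)
top_peak = global_max(counts, release)
lag_to_first_peak = first_peak - release  # months until the first bump
lag_to_top_peak = top_peak - release      # months until the highest month
```

If the two lags differ, the first local maximum is not the global maximum, which is the pattern described above.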
Note that the code (not yet posted) is written in such a way that minimal modification is needed to visualize another dataset. Eventually, all that will be required is changing the file names in the pirates.py file and the titles in the SQL query.