2D Frequency Pattern Visualization

kschlesi

Post by kschlesi » Thu Jan 26, 2017 1:40 pm

Idea: My goal in this project is to explore how often Seattle Public Library (SPL) patrons check out media about different countries, and how these patterns differ between Dewey categories and change over time. This gives an idea of the contexts in which SPL users interact with these cultures. I wrote a query that counts, for each Dewey category, the checkouts since 2006 of items whose title contains a country's name or adjective (e.g., "Ireland" or "Irish"). I chose a set of interesting countries with enough records to use, and grouped the results by year and by Dewey division.
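To make the counting step concrete, here is a minimal Python sketch of the same idea applied to records that are already in memory. It is not the MySQL query from the zip file: the `COUNTRY_TERMS` table, the `(title, year, dewey)` record layout, and the whole-word matching are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical search terms: each country maps to its name and adjective.
COUNTRY_TERMS = {
    "Ireland": ["Ireland", "Irish"],
    "Iraq": ["Iraq", "Iraqi"],
}


def dewey_division(dewey):
    """Round a Dewey number down to its division, e.g. 941.5 -> 940."""
    return int(float(dewey) // 10) * 10


def count_checkouts(records):
    """Count checkouts per (country, year, Dewey division).

    `records` is an iterable of (title, year, dewey) tuples, one per checkout.
    """
    # One case-insensitive, whole-word pattern per country (an assumption;
    # a SQL LIKE '%...%' match would also hit substrings).
    patterns = {
        country: re.compile(
            r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
        )
        for country, terms in COUNTRY_TERMS.items()
    }
    counts = Counter()
    for title, year, dewey in records:
        for country, pattern in patterns.items():
            if pattern.search(title):
                counts[(country, year, dewey_division(dewey))] += 1
    return counts
```

Each checkout row adds one to the counter for every country whose name or adjective appears in its title, so the result is already grouped by year and division.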
early_data_vis_sketch.png
Layout: I started by plotting the data as a grid, shown in the initial sketch above. Columns are countries, and rows correspond to (year, Dewey category) combinations, sorted first by Dewey class and then by year within each class. Grayscale intensity indicates the number of checkouts of items with the country's name or adjective in the title. Even at this early stage, there is interesting structure in the data, as certain countries stand out as particularly popular within certain Dewey divisions.
900_layer_vis.png
I wanted to make the temporal "profile" of popularity for each country within each Dewey category stand out more clearly. In my next iteration, shown above, I accomplished this by varying the widths of the bars. Within each Dewey category block, the top bar is 2006 and the year increases as the bars descend the screen, depicting the profile of popularity for the country in that category over the past decade. (Note: the bottom bar in each block shows data for 2017, which actually helps to distinguish the blocks from each other, because the 2017 bars are all very light and narrow and mostly look like gaps. In a future iteration I may just put an empty line there for consistency.)

By default, all the bar intensities correspond directly to the overall number of checkouts, so intensities can be compared between countries and years in the same panel. The bar widths are renormalized within each individual Dewey category "block" -- the most popular year within each block has a bar that spans the whole block, and the remaining bars in that block are scaled proportionally to that longest bar. This creates a distinctive shape within each block that represents the waxing and/or waning of interest in that country/category over the years. The darkest bars in the entire layout still correspond to the most popular overall categories. For example, in the screenshot above, "Iraq" has bars that span the whole column in both 930 and 950, representing the most popular year in each block. But the color shows that there were many more checkouts in 950 overall than in 930.
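The two scalings described above can be sketched in a few lines of Python. This is only an illustration under assumed names (`bar_geometry`, `column_width`, `global_max`), not the Processing code from the zip file: width is scaled to the block's own peak, intensity to the largest count in the panel.

```python
def bar_geometry(checkouts, column_width, global_max):
    """Return a list of (width, intensity) pairs for each block.

    `checkouts` maps (country, division) -> list of yearly counts (one block).
    Width is scaled to the block's own maximum; intensity (0 to 1) is scaled
    to `global_max`, the largest count anywhere in the panel.
    """
    bars = {}
    for block, counts in checkouts.items():
        block_max = max(counts) or 1  # avoid dividing by zero in all-zero blocks
        bars[block] = [
            (column_width * c / block_max, c / global_max) for c in counts
        ]
    return bars


# Made-up numbers in the spirit of the Iraq example: both blocks reach full
# width in their peak year, but the 950 block is ten times more intense.
example = bar_geometry(
    {("Iraq", 930): [5, 10], ("Iraq", 950): [50, 100]},
    column_width=80,
    global_max=100,
)
```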

In the final version, I cleaned up some bugs and added text, as well as an optional color scheme and several interactive options to display more information about the countries and Dewey categories. Screenshots of the final version, both colored and non-colored, are below.
900_layer_color_final.png
300_layer_norm_final.png
Interactivity: The display can be switched between different Dewey classes (900s, 800s, etc.) by pressing the number keys (0-9). Above, the 900 level and the 300 level are shown.

The most significant interactive feature is a mouse-over component. Mousing over a bar prints the country, the year, and a description of that Dewey classification (e.g. "war/military" for 350, "the Bible" for 220, "ancient history" for 930, "home and family management" for 640). This is displayed at the bottom of the visualization, along with the total number of checkouts in that category/country/year, and adds much more meaning to explorations of the data.

Since some countries consistently have many more checkouts overall than others (China has the most), I added the ability to normalize by each country's total checkouts by pressing the "n" key. This helps keep China's intensities from overpowering the others. In the screenshots above, the 300-layer version is normalized, and the overall total checkouts for the country are also displayed.
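As a rough Python sketch of this per-country normalization (the function name and the `(country, division, year)` key layout are assumptions, and the totals here are taken over whatever data is passed in, which may differ from how the actual sketch computes them):

```python
def normalize_by_country(checkouts):
    """Divide each count by the total checkouts of its country.

    `checkouts` maps (country, division, year) -> count.
    Returns the same keys mapped to a fraction between 0 and 1.
    """
    totals = {}
    for (country, _division, _year), count in checkouts.items():
        totals[country] = totals.get(country, 0) + count
    return {
        key: (count / totals[key[0]] if totals[key[0]] else 0.0)
        for key, count in checkouts.items()
    }
```

After this step a country with few checkouts overall can still show dark bars in its own strongest years, because every country's values sum to 1.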

I added a color scheme, which can be turned on/off by pressing "c". After talking about it in class, I agree that the black and white version actually brings out the data and patterns quite clearly, so I stuck with a very subtle color scheme that uses brown tones and slowly changes the saturation with the number of checkouts (visible in the screenshot above). I accomplished this by interpolating between two RGB colors with the Processing function "lerpColor", but a better way may have been to switch to HSB mode and adjust the saturation directly.
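To compare the two approaches outside of Processing, here is a small Python sketch. `lerp_rgb` mimics per-channel linear interpolation between two RGB colors, and `saturation_ramp` varies only the saturation at a fixed hue and brightness using the standard `colorsys` module. Both function names and the sample brown tones are made up for illustration.

```python
import colorsys


def lerp_rgb(c1, c2, t):
    """Linearly interpolate two RGB tuples channel by channel (t in 0..1)."""
    return tuple(a + (b - a) * t for a, b in zip(c1, c2))


def saturation_ramp(hue, brightness, t):
    """Return an RGB tuple (0-255) with saturation t at a fixed hue/brightness.

    hue, brightness and t are all in the range 0..1.
    """
    r, g, b = colorsys.hsv_to_rgb(hue, t, brightness)
    return (r * 255, g * 255, b * 255)


# Hypothetical endpoints: a pale tan and a darker brown.
halfway = lerp_rgb((230, 220, 200), (120, 80, 40), 0.5)
# Same idea in HSB terms: fix an orange-brown hue and raise the saturation.
muted = saturation_ramp(hue=0.08, brightness=0.6, t=0.3)
```

With the HSB version the brightness stays constant as the checkout count grows, whereas interpolating between two RGB colors changes brightness and saturation together.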

Further ideas: I considered showing more information, like the most popular sub-category for the given country and year (e.g., within 740 -- "graphic/decorative arts" -- are people preferring drawing, textile arts, or glass/ceramics for each country?). This would require further MySQL queries, so for now I did not explore this idea.

One final issue is that I only searched for countries in the title, not the subject, to keep the query efficient. But searching in the subject returns more results and gives a better idea of the cultural interaction (as I discovered by trying one country at a time). Since the queries for different countries can be run in parallel, I might borrow an idea mentioned in a post for an earlier project: query smaller groups of countries separately and concatenate the resulting .csv files later. As I look to expand my idea for a 3D visualization, I will consider this possibility.
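The concatenation step is simple enough to sketch in Python. This assumes every per-group .csv file starts with the same header row; the function name and file names are placeholders.

```python
import csv


def concatenate_csvs(paths, out_path):
    """Append the rows of several CSV files that share one header row."""
    header_written = False
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)  # keep the header only once
                    header_written = True
                writer.writerows(reader)
```

For example, `concatenate_csvs(["group1.csv", "group2.csv"], "all_countries.csv")` would merge two query results into one file.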

Code for final version: I included my code and data in a zipped file. The query for the main .csv file is included (commented out) in a .pde file. I also wrote a Python script that takes the main .csv file and generates a new .csv file listing the maximum number of checkouts for each Dewey category "block," which is needed to calculate the box widths. Both .csv files are included here, along with the Python script for reference.
project1_final.zip
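For readers who do not want to open the zip, a script of this kind could look roughly like the following. This is a sketch, not the script in the archive, and the column names (`country`, `division`, `year`, `checkouts`) are assumptions about the .csv layout.

```python
import csv
from collections import defaultdict


def write_block_maxima(in_path, out_path):
    """Write the peak yearly checkout count for each (country, division) block.

    Assumes the input CSV has columns: country, division, year, checkouts.
    """
    maxima = defaultdict(int)
    with open(in_path, newline="") as f:
        for row in csv.DictReader(f):
            block = (row["country"], row["division"])
            maxima[block] = max(maxima[block], int(row["checkouts"]))
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "division", "max_checkouts"])
        for (country, division), peak in sorted(maxima.items()):
            writer.writerow([country, division, peak])
```

The output has one row per block, which is all the drawing code needs to scale each bar against the longest bar in its block.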
Code for previous version:
project1_test6.zip
Code for initial sketch:
project1_sketch.zip
Last edited by kschlesi on Tue Jan 31, 2017 7:09 pm, edited 2 times in total.

christopherchen0

Post by christopherchen0 » Tue Jan 31, 2017 11:31 am

My question was how the popularity of different media types varies over a ten-year period. Since media formats go out of fashion with the advent of newer platforms, I wanted to track when certain media types experience life, growth, or death.

My query:

-- Daily checkout counts per media type, 2007 through 2016
SELECT DATE(cout),
       SUM(CASE WHEN itemtype LIKE '%bk%'  THEN 1 ELSE 0 END) AS Books,
       SUM(CASE WHEN itemtype LIKE '%cd%'  THEN 1 ELSE 0 END) AS CDs,
       SUM(CASE WHEN itemtype LIKE '%dvd%' THEN 1 ELSE 0 END) AS DVDs,
       SUM(CASE WHEN itemtype LIKE '%vhs%' THEN 1 ELSE 0 END) AS VHS,
       SUM(CASE WHEN itemtype LIKE '%cas%' THEN 1 ELSE 0 END) AS Cassettes
FROM outraw
WHERE YEAR(cout) > 2006 AND YEAR(cout) < 2017
GROUP BY DATE(cout)
lifeanddeath.png
For the visual representation, I made 10 concentric circles, one per year (the innermost circle being 2007, the outermost 2016). The position on a given circle corresponds to a specific date: January 1st starts on the right side of the circle, and the dates proceed clockwise through February, March, and so on, until December ends right above January.
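The date-to-position mapping can be sketched in Python as follows. The pixel values (`ring_spacing`, `inner_radius`) and the function name are invented for illustration, and the handling of leap years is my own assumption rather than something taken from the attached code.

```python
import math
from datetime import date

FIRST_YEAR = 2007  # innermost ring, per the description above


def polar_position(d, ring_spacing=30.0, inner_radius=50.0):
    """Map a date to (radius, angle) on the concentric-ring layout.

    The angle is in radians: 0 at the right-hand side, growing clockwise
    in screen coordinates (y pointing down), with one full turn per year.
    """
    days_in_year = (date(d.year + 1, 1, 1) - date(d.year, 1, 1)).days
    day_index = d.timetuple().tm_yday - 1  # 0 for January 1st
    angle = 2 * math.pi * day_index / days_in_year
    radius = inner_radius + (d.year - FIRST_YEAR) * ring_spacing
    return radius, angle
```

In a y-down coordinate system such as Processing's, the point at `(radius * cos(angle), radius * sin(angle))` moves clockwise as the angle grows, which matches the direction described.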

Each year's circle is made up of 5 color bands, one for each media type (red for books, green for CDs, blue for DVDs, yellow for VHS, and cyan for cassettes).

Each band is drawn as a series of small arcs, one per day, so each band for a given media type and year is made up of 365 arcs. The transparency of an arc corresponds to how popular the media type was on that day, compared to the maximum daily value for that media type over the entire 10 years. A blank space therefore means the media type was not popular (no checkouts), whereas a very luminous part indicates high popularity. Each arc is placed by its date: the year determines which circle it sits on, and the day of the year determines its position around that circle.
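A minimal Python sketch of this per-type scaling, assuming the daily counts are already loaded into lists (the function name, the dictionary layout, and the 0-255 alpha range are assumptions, not taken from the attached code):

```python
def arc_alphas(daily_counts, max_alpha=255):
    """Scale each media type's daily counts to an alpha value.

    `daily_counts` maps media type -> list of daily checkout counts over the
    whole period. Each type is scaled against its own maximum, so alphas are
    comparable within a type but not across types.
    """
    alphas = {}
    for media, counts in daily_counts.items():
        peak = max(counts, default=0)
        alphas[media] = [
            round(max_alpha * c / peak) if peak else 0 for c in counts
        ]
    return alphas
```

Because each type is divided by its own peak, a format with few checkouts overall still reaches full opacity on its busiest day.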

I added a few interactive options to make the view easier to read: "1"-"5" hide or show the different media types, "P" resets the view, "L" shows the labels for each of the months, and "H" hides the text on the sides.
dailyyearmedia.zip
From the visuals themselves, we can see certain patterns: VHS and cassettes died around 2009 and 2010, respectively. Other, less obvious patterns only appear after filtering out the other media types: CDs appear to have peaked in popularity in 2008-2010, as did DVDs. Books are much more constant and maintain a similar level throughout the entire 10-year period. Note that the transparency does not compare checkouts of one media type to another: it only tracks how popular a given media type was on each day relative to its own peak over the decade. In this way, we can see how each media type fares over the course of the 10 years.

Obvious blanks were often errors in the data itself: for some reason, around September 2009-2012 there was consistently no checkout data for any of the media types. Similarly, there are strange gaps in the data in 2007 (around Jan., Jun., and Oct.). Other "streaks" of no checkouts radiating outwards might correspond to holidays: a streak can be seen for Christmas, New Year's, MLK Day, Presidents' Day, etc.
