Re: 2 2D Frequency Pattern Visualization
Posted: Thu Jan 26, 2017 1:40 pm
Idea: My goal in this project is to explore how often Seattle Public Library patrons check out media about different countries, and how these patterns differ between Dewey categories and change over time. This gives an idea of the contexts in which SPL users interact with these cultures. I wrote a query for the total number of checkouts since 2006, in each Dewey category, of items that have the country's name or adjective (e.g., "Ireland" or "Irish") in the title. I chose a set of interesting countries with enough records to use, and grouped the results by year and by Dewey division.
Layout: I started by plotting the data as a grid, shown in the initial sketch above. Columns are countries, and rows correspond to (year, Dewey category) combos. Rows are sorted by Dewey class, and within each class by year. Grayscale intensity indicates the number of checkouts of items with the name of the country or its adjective in the title. Even at this early stage there is interesting structure in the data, as certain countries stand out as particularly popular within certain Dewey divisions.
I wanted to make the temporal "profile" of popularity for each country within each dewey category stand out more clearly. In my next iteration, shown above, I accomplished this by changing the widths of the bars. Within each dewey category block, the top bar is 2006 and the year increases as the bars descend the screen, depicting the profile of popularity for the country in that category over the past decade. (Note: the bottom bar in each block shows data for 2017, which actually helps to distinguish the blocks from each other, because the 2017 bars are all very light/narrow and mostly look like gaps. In a future iteration I may just put an empty line there for consistency.)
By default, all the bar intensities correspond directly to number of overall checkouts, so intensities can be compared between countries and years in the same panel. The bar widths are renormalized within each individual dewey category "block" -- the most popular year within each block has a bar that spans the whole block, and the remaining bars in that block are scaled proportionally to that longest bar. This creates a distinctive shape within each block that represents the waxing and/or waning of interest in that country/category over the years. The darkest bars in the entire layout still correspond to the most popular overall categories. For example, in the screenshot above, "Iraq" has bars that span the whole column in both 930 and 950, representing the most popular year in each block. But the color shows that there were many more checkouts in 950 overall compared to 930.
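As a rough sketch of the two scalings described above (in Python rather than the Processing code in the zip; the function name, parameters, and numbers are made up for illustration), the width is normalized per block while the intensity is normalized against one global maximum:

```python
def block_bar_geometry(checkouts_by_year, block_width, global_max):
    """Return {year: (width, intensity)} for one country/Dewey block.

    checkouts_by_year: hypothetical {year: checkouts} for a single block.
    block_width: full width of the block in pixels.
    global_max: largest checkout count anywhere in the panel.
    """
    block_max = max(checkouts_by_year.values())
    bars = {}
    for year, n in sorted(checkouts_by_year.items()):
        # Width is relative to the most popular year in THIS block only.
        width = block_width * n / block_max if block_max else 0.0
        # Intensity is relative to the whole panel, so blocks stay comparable.
        intensity = n / global_max if global_max else 0.0
        bars[year] = (width, intensity)
    return bars


if __name__ == "__main__":
    # Invented counts, not values from the SPL data.
    for year, (w, i) in block_bar_geometry({2006: 10, 2007: 40, 2008: 20}, 80, 200).items():
        print(year, w, i)
```

With this split, two blocks can both contain a full-width bar while differing a lot in darkness, which is the Iraq 930 vs. 950 situation mentioned above.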
In the final version, I cleaned up some bugs and added text, as well as an optional color scheme and several interactive options to display more information about the countries and dewey categories. Screen shots of the final version, both colored and non-colored, are below.
Interactivity: The display can be switched between different dewey classes (900s, 800s, etc.) by pressing the number keys (0-9). Above, the 900 level and the 300 level are shown.
The most significant interactive feature is a mouse-over component. Mousing over a bar will print the country, the year, and the information about that dewey classification (e.g. "war/military" for 350, "the Bible" for 220, "ancient history" for 930, "home and family management" for 640). This is displayed at the bottom of the visualization, along with the total number of checkouts in that category/country/year, and provides much more meaning to explorations of the data.
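One way the mouse-over lookup could work, assuming a regular grid where every column has the same width and every bar the same height (this is a guess at the layout, not the actual sketch code; all names and sizes are hypothetical):

```python
def cell_under_mouse(mx, my, col_width, bar_height, bars_per_block, n_cols, n_blocks):
    """Map mouse coordinates to (country_index, block_index, bar_index), or None.

    Assumes the grid starts at (0, 0), columns are countries, and each
    Dewey block is a vertical stack of bars_per_block bars (one per year).
    """
    if mx < 0 or my < 0:
        return None
    col = int(mx // col_width)
    row = int(my // bar_height)
    # Split the overall row number into "which block" and "which year bar".
    block, bar = divmod(row, bars_per_block)
    if col >= n_cols or block >= n_blocks:
        return None
    return col, block, bar


if __name__ == "__main__":
    # 12 bars per block would cover 2006 through 2017.
    print(cell_under_mouse(130, 95, 50, 10, 12, 8, 10))
```

The returned indices can then be used to look up the country name, the year (2006 + bar index), and the Dewey description text to print at the bottom.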
Since some countries have consistently many more checkouts overall than others (China has the most), I added the ability to normalize by total overall checkouts per country by pressing the "n" key. This sometimes prevents China's intensities from overpowering the other ones. In the screen shots above, the 300-layer version is normalized, and the overall total checkouts for the country are also displayed.
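A minimal sketch of the per-country normalization, assuming the counts are held as a nested dictionary and that "total" means the sum over everything stored for that country (the data structure and names here are illustrative, not taken from the sketch):

```python
def normalize_by_country(counts):
    """Convert raw counts to each country's share of its own total.

    counts: hypothetical {country: {(year, dewey_division): checkouts}}.
    Returns the same shape with values in the range 0..1.
    """
    normalized = {}
    for country, cells in counts.items():
        total = sum(cells.values())
        normalized[country] = {
            key: (n / total if total else 0.0) for key, n in cells.items()
        }
    return normalized


if __name__ == "__main__":
    # Invented numbers: a high-volume and a low-volume country.
    raw = {
        "China": {(2006, 950): 300, (2007, 950): 100},
        "Ireland": {(2006, 940): 30, (2007, 940): 10},
    }
    print(normalize_by_country(raw))
```

After this step both countries in the example have the same relative profile, so the larger one no longer dominates the intensity scale.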
I added a color scheme, which can be turned on/off by pressing the "c" key. After talking about it in class, I agree that the black and white version actually brings out the data and patterns quite clearly, so I stuck with a very subtle color scheme that uses brown tones and slowly changes the saturation with the number of checkouts (visible in the screen shot above). I accomplished this by interpolating between two RGB colors with the Processing function lerpColor(), but a better way may have been to switch to HSB mode and adjust the saturation directly.
Further ideas: I considered showing more information, like the most popular sub-category for the given country and year (e.g., within 740 -- "graphic/decorative arts" -- are people preferring drawing, textile arts, or glass/ceramics for each country?). This would require further MySQL queries, so for now I did not explore this idea.
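To compare the two approaches outside of Processing, here is a small Python sketch using the standard colorsys module. It is only an analogy: lerp_rgb mimics a plain linear blend between two RGB colors, and saturation_ramp keeps hue and brightness fixed while varying saturation. The specific brown values are invented.

```python
import colorsys


def _clamp01(t):
    return max(0.0, min(1.0, t))


def lerp_rgb(c1, c2, t):
    """Linearly blend two RGB tuples (components 0..1); t=0 gives c1, t=1 gives c2."""
    t = _clamp01(t)
    return tuple(a + (b - a) * t for a, b in zip(c1, c2))


def saturation_ramp(hue, value, t):
    """Fixed hue and brightness; saturation follows t (0 = gray, 1 = full color)."""
    return colorsys.hsv_to_rgb(hue, _clamp01(t), value)


if __name__ == "__main__":
    pale, brown = (0.9, 0.85, 0.8), (0.45, 0.3, 0.15)  # made-up endpoints
    for t in (0.0, 0.5, 1.0):
        print(t, lerp_rgb(pale, brown, t), saturation_ramp(0.08, 0.6, t))
```

The HSB-style ramp makes "number of checkouts" map to exactly one perceptual knob, whereas the RGB blend changes brightness and hue along with saturation unless the two endpoint colors are chosen carefully.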
One final issue is that I only searched for countries in the title, not the subject, to keep the query efficient. But searching in the subject provides more results and gives a better idea of the cultural interaction (as I discovered by trying one country at a time). Since the different country possibilities can be run in parallel queries, I might try using the idea that a previous post mentioned in an earlier project, to calculate smaller groups of countries separately and concatenate the resulting .csv files later. As I look to expand my idea for a 3D visualization, I will consider this possibility.
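Concatenating the per-group results could look something like the following sketch. It assumes every partial .csv file has the same header row (an assumption; the actual export format is not shown here), and the file names in the usage note are hypothetical.

```python
import csv


def concat_csv(sources, dest):
    """Append the rows of several CSV inputs into one output, writing the header once.

    sources: iterable of open text files (or file-like objects).
    dest: open text file (or file-like object) to write to.
    """
    writer = csv.writer(dest)
    header = None
    for src in sources:
        reader = csv.reader(src)
        first = next(reader, None)
        if first is None:
            continue  # skip empty inputs
        if header is None:
            header = first
            writer.writerow(header)
        elif first != header:
            raise ValueError("CSV headers differ between inputs")
        writer.writerows(reader)
```

With real files this would be called with handles opened using newline="" (as the csv module documentation recommends), e.g. files named something like countries_a.csv and countries_b.csv written into combined.csv.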
Code for final version: I included my code and data in a zipped file. The query for the main .csv file is included (commented out) in a .pde file. I also wrote a Python script that takes the main .csv file and generates a new .csv file listing the maximum number of checkouts for each Dewey category "block," which is needed to calculate the bar widths. Both .csv files are included here, along with the Python script for reference.
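For readers without the zip, the block-maximum step could be sketched roughly as below. This is not the attached script; the column names ("country", "dewey", "checkouts") are assumptions about the .csv layout.

```python
from collections import defaultdict


def block_maxima(rows):
    """Return {(country, dewey): max checkouts over all years in that block}.

    rows: iterable of dicts with hypothetical keys 'country', 'dewey',
    and 'checkouts', e.g. as produced by csv.DictReader.
    """
    maxima = defaultdict(int)
    for row in rows:
        key = (row["country"], row["dewey"])
        maxima[key] = max(maxima[key], int(row["checkouts"]))
    return dict(maxima)


if __name__ == "__main__":
    # Invented rows standing in for the main .csv file.
    sample = [
        {"country": "Iraq", "dewey": "950", "year": "2006", "checkouts": "120"},
        {"country": "Iraq", "dewey": "950", "year": "2007", "checkouts": "310"},
        {"country": "Iraq", "dewey": "930", "year": "2006", "checkouts": "12"},
    ]
    print(block_maxima(sample))
```

Each bar's width is then its own count divided by the maximum for its block, times the block width.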
Code for previous version: Code for initial sketch: