Proj 2: 2D Matrix
The assignment is to create a 2D Matrix visualization in Processing that is to include the following:
A 2D grid of cells based on 3 numeric columns of data from a MySQL query you have stored in a csv file as per the first assignment.
Each cell's vertical (Y) and horizontal (X) position and color value to be determined by the 3 csv columns.
--------------------------------------------
VISUALIZATION DETAILS:
MATRIX BASED: The visualization should not be a linear (time/change) frequency graph (we will do this in the 3D version); instead, each cell should represent a data value. This means using at least 3 datasets from the Seattle Library Data.
SIZE/DATA: The length and height of the matrix each represent a data value. The 3rd value is to be represented by coloring the cells, using color in RGB, saturation or brightness in HSB, or any other scheme in which each cell has an x, y location and is differentiated from the other cells through color, scale, or any other means.
COLOR CODING: 99% of science visualizations use the green to yellow to red color system. Experiment with, or invent, a different color scheme.
LABELS and TEXTS: Use standard sans-serif fonts from the Swiss graphic design tradition for your texts and labels: Arial, Helvetica, Futura, Univers, or related fonts. Limit yourself to a few font sizes.
CENTER VISUALIZATION: Your visualization should be at the center of your screen.
DUE DATE: 1st Draft due on January 22.
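As a starting point, the core mapping (three CSV columns to a cell's x position, y position, and color) can be sketched in a few lines. The sketch below is illustrative Python; the logic ports directly to Processing's map() function. The column meanings, ranges, and canvas size are assumptions for the example, not part of the assignment.

```python
def map_value(v, lo, hi, out_lo, out_hi):
    """Linear rescale of v from [lo, hi] to [out_lo, out_hi],
    like Processing's map()."""
    return out_lo + (v - lo) * (out_hi - out_lo) / (hi - lo)

def cell_from_row(col1, col2, col3, grid_w=800, grid_h=600):
    # Hypothetical mapping: col1 -> x pixel, col2 -> y pixel,
    # col3 -> a 0..255 color channel. Ranges are example guesses.
    x = map_value(col1, 0, 99, 0, grid_w)       # e.g. Dewey class 0-99
    y = map_value(col2, 2006, 2013, 0, grid_h)  # e.g. checkout year
    c = map_value(col3, 0, 100, 0, 255)         # e.g. checkout count
    return x, y, c
```

Each cell is then drawn as a rect at (x, y) filled with the color derived from c.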
--------------------------------------------
Your REPLY POST should include the following:
Concept description
--
Sketch if useful
Screen Shots of your project with descriptions
--
MySQL query
Processing time
--
Analysis
--
csv file and code that created the image
--------------------------------------------
PREVIOUS PROJECTS TO REVIEW:
These are past examples that you can study, as they have some of the basic components. You should aim to build on what has already been done and go beyond it.
Sandeep Bhat (2011):
http://www.mat.ucsb.edu/~g.legrady/acad ... sbhat.html
Yoon Chung Han (2012):
http://vislab.mat.ucsb.edu/2012/p2/hanyoon/index.html
Yeu-Shuan Tang (2013):
http://vislab.mat.ucsb.edu/2013/p3/Yeu/index.html
Anastasiya (2013):
http://vislab.mat.ucsb.edu/2013/p3/Anas ... index.html
Grant McKenzie (2014):
http://vislab.mat.ucsb.edu/2014/p1/Grant/index.html
More specialized:
Yun Teng’s is a Treemap (2011):
http://www.mat.ucsb.edu/~g.legrady/acad ... eemap.html
Qian Liu’s is circular (2011):
http://www.mat.ucsb.edu/~g.legrady/acad ... /11w259/p2
George Legrady
legrady@mat.ucsb.edu
Re: Proj 2: 2D Matrix
Average Lending Time for each Dewey Class and Year
Description:
The visualization shows the average lending time for each Dewey class and each year from 2006 to 2013, using a heatmap metaphor. The color scales were designed using HCL Picker (http://tristen.ca/hcl-picker/), which provides a perceptually linear color scale (one exception: the blueish color scale comes from Mathematica). See http://vis4.net/blog/posts/avoid-equidi ... sv-colors/ for why linear RGB/HSV is not ideal for data visualization. Each row of the matrix represents a year, and each column a Dewey category (100 sub-categories). The color of each cell represents the average lending time for the corresponding year and category. A legend at the bottom right shows the color scale.
From the visualization we can see that the overall trend in lending time is decreasing, and that lending times in 2011, 2012, and 2013 are much shorter than in previous years. From the categories' perspective, the 800 categories (literature in different languages) have the highest lending times. Other interesting patterns include 090 Manuscripts & rare books in 2009 and 480 Classical & modern Greek languages in 2007. These might relate to external events.
Query:
Code:
SELECT
class,
GROUP_CONCAT(CAST(year AS CHAR(40)) ORDER BY year) AS years,
GROUP_CONCAT(CAST(average_lending_time AS CHAR(40)) ORDER BY year) AS average_lending_times,
GROUP_CONCAT(CAST(count AS CHAR(40)) ORDER BY year) AS counts
FROM (
SELECT
SUBSTRING(deweyClass, 1, 2) AS class,
YEAR(cout) AS year,
COUNT(*) AS count,
AVG(TIMESTAMPDIFF(DAY, cout, cin)) AS average_lending_time
FROM
spl2.inraw
WHERE
cout >= "2006" AND cout < "2014"
AND itemtype LIKE "%bk"
AND deweyClass != ""
GROUP BY
class, year
) AS innerTable
GROUP BY
class
ORDER BY
class ASC;
Processing Code:
Table myTable;
int numRows, numColumns;
float data[][] = null;
int yearStart = 2006;
int yearEnd = 2013;
int numYears;
float maxValue;
PFont font = null;
// Colormap generated using HCL Picker
float colormap_hcl_1[][] = {
{ 0.1882, 0.0784, 0.0588 },
{ 0.2549, 0.1137, 0.1176 },
{ 0.3176, 0.1529, 0.1922 },
{ 0.3686, 0.2000, 0.2745 },
{ 0.4000, 0.2588, 0.3686 },
{ 0.4039, 0.3294, 0.4588 },
{ 0.3804, 0.4039, 0.5451 },
{ 0.3255, 0.4824, 0.6196 },
{ 0.2431, 0.5608, 0.6667 },
{ 0.1373, 0.6392, 0.6902 },
{ 0.0667, 0.7176, 0.6863 },
{ 0.1804, 0.7882, 0.6549 },
{ 0.3451, 0.8510, 0.6039 },
{ 0.5176, 0.9098, 0.5412 },
{ 0.7020, 0.9608, 0.4745 },
{ 0.9020, 1.0000, 0.4196 }
};
float colormap_hcl_2[][] = {
{ 0.0902, 0.1098, 0.1529 },
{ 0.1098, 0.1608, 0.2118 },
{ 0.1176, 0.2157, 0.2667 },
{ 0.1176, 0.2745, 0.3216 },
{ 0.1137, 0.3333, 0.3686 },
{ 0.1098, 0.3961, 0.4118 },
{ 0.1137, 0.4627, 0.4431 },
{ 0.1451, 0.5255, 0.4706 },
{ 0.1961, 0.5922, 0.4863 },
{ 0.2667, 0.6549, 0.4941 },
{ 0.3529, 0.7176, 0.4941 },
{ 0.4510, 0.7804, 0.4863 },
{ 0.5608, 0.8392, 0.4745 },
{ 0.6784, 0.8941, 0.4627 },
{ 0.8078, 0.9490, 0.4471 },
{ 0.9451, 0.9961, 0.4431 }
};
float colormap_hcl_3[][] = {
{ 0.1804, 0.0824, 0.0510 },
{ 0.2431, 0.1255, 0.0784 },
{ 0.3059, 0.1686, 0.1020 },
{ 0.3725, 0.2157, 0.1255 },
{ 0.4392, 0.2667, 0.1490 },
{ 0.5020, 0.3176, 0.1686 },
{ 0.5647, 0.3765, 0.1882 },
{ 0.6235, 0.4314, 0.2078 },
{ 0.6824, 0.4941, 0.2314 },
{ 0.7412, 0.5569, 0.2510 },
{ 0.7922, 0.6235, 0.2745 },
{ 0.8431, 0.6941, 0.2980 },
{ 0.8863, 0.7647, 0.3255 },
{ 0.9294, 0.8353, 0.3569 },
{ 0.9686, 0.9098, 0.3922 }
};
// Colormap from Mathematica: Table[List @@ ColorData["DeepSeaColors"][t], {t, 0, 1, 1/20}]
float colormap_deep_sea_colors[][] = {
{0.16791,0.,0.301671},
{0.195491,0.0121524,0.361726},
{0.223072,0.0243049,0.421782},
{0.250652,0.0364573,0.481837},
{0.278233,0.0486098,0.541893},
{0.305814,0.0607622,0.601948},
{0.291737,0.11414,0.648217},
{0.277661,0.167517,0.694485},
{0.263584,0.220895,0.740754},
{0.249508,0.274272,0.787022},
{0.235431,0.32765,0.833291},
{0.24481,0.394494,0.861249},
{0.254189,0.461337,0.889207},
{0.263567,0.528181,0.917166},
{0.272946,0.595024,0.945124},
{0.282325,0.661868,0.973082},
{0.380272,0.714418,0.978206},
{0.478219,0.766969,0.98333},
{0.576167,0.819519,0.988455},
{0.674114,0.87207,0.993579},
{0.772061,0.92462,0.998703}
};
//float colormap[][] = colormap_hcl_1;
//float colormap[][] = colormap_hcl_2;
float colormap[][] = colormap_hcl_3;
//float colormap[][] = colormap_deep_sea_colors;
void setFillWithColormap(float t) {
if(t < 0.0f) t = 0.0f; if(t > 1.0f) t = 1.0f;
t = t * (colormap.length - 1);
int pos = floor(t);
if(pos < 0) pos = 0;
if(pos > colormap.length - 2) pos = colormap.length - 2;
t = t - pos;
float r = colormap[pos][0] * (1.0f - t) + colormap[pos + 1][0] * t;
float g = colormap[pos][1] * (1.0f - t) + colormap[pos + 1][1] * t;
float b = colormap[pos][2] * (1.0f - t) + colormap[pos + 1][2] * t;
fill(r * 255.0f, g * 255.0f, b * 255.0f);
}
void setup() {
size(1000, 500, "processing.core.PGraphicsRetina2D");
background(255);
smooth();
font = createFont("Helvetica", 16, true);
myTable = loadTable("dewey-count-average_lending_time-yearly.csv", "header");
numRows = myTable.getRowCount();
numColumns = myTable.getColumnCount();
numYears = yearEnd - yearStart + 1;
data = new float[numYears][numRows];
maxValue = 100.0f;
for(int i = 0; i < numRows; i++) {
String[] splitted = myTable.getRow(i).getString(2).split(",");
for(int k = 0; k < numYears; k++) {
data[k][i] = Float.parseFloat(splitted[k]);
}
}
println(numRows);
}
void draw() {
background(0);
textAlign(CENTER);
noStroke();
fill(255, 255, 255);
textFont(font, 20);
text("Average Lending Time for each Dewey Class", 500, 40);
float margin_left = 80;
float box_margin = 1;
float box_width = 8;
float box_height = 40;
float row_margin = 4;
translate(margin_left, 80);
noStroke();
for(int year = yearStart; year <= yearEnd; year++) {
pushMatrix();
for(int i = 0; i < numRows; i++) {
float value = data[year - yearStart][i];
setFillWithColormap(value / maxValue);
rect(-box_width / 2.0f, -box_height / 2.0f, box_width, box_height);
if(value > maxValue) {
stroke(255, 0, 0);
noFill();
rect(-box_width / 2.0f + 0.5f, -box_height / 2.0f + 0.5f, box_width - 1.0f, box_height - 1.0f);
noStroke();
}
translate(box_width + box_margin, 0);
}
popMatrix();
noStroke();
textFont(font);
fill(255, 255, 255);
textAlign(RIGHT);
text(year, -20, 5);
translate(0, box_height + row_margin);
}
pushMatrix();
for(int i = 0; i < numRows; i++) {
translate(box_width + box_margin, 0);
pushMatrix();
if(myTable.getRow(i).getString(0).charAt(1) == '0') {
textFont(font, 12);
fill(255, 255, 255);
textAlign(RIGHT);
rotate(radians(-90.0));
text(myTable.getRow(i).getString(0) + "0", box_height / 2, -6);
} else {
textFont(font, 7);
fill(128, 128, 128);
textAlign(RIGHT);
rotate(radians(-90.0));
text(myTable.getRow(i).getString(0) + "0", box_height / 2, -8);
}
popMatrix();
}
popMatrix();
float legend_bar_width = 30;
translate(515, 30);
for(int i = 0; i <= 10; i += 1) {
float value = (float)i * 10.0f;
setFillWithColormap(value / maxValue);
rect(0, 0, legend_bar_width, 10);
textAlign(CENTER);
fill(192, 192, 192);
textFont(font, 10);
text((int)value, legend_bar_width / 2.0f, 20);
translate(legend_bar_width + 1, 0);
}
setFillWithColormap(1.0f);
rect(0, 0, legend_bar_width, 10);
stroke(255, 0, 0);
noFill();
rect(0.5, 0.5, legend_bar_width - 1, 10 - 1);
textAlign(CENTER);
fill(192, 192, 192);
textFont(font, 10);
text(" > 100", legend_bar_width / 2.0f, 20);
translate(legend_bar_width + 1, 0);
}
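For readers porting this heatmap elsewhere, the clamp-and-interpolate logic of setFillWithColormap above can be expressed language-neutrally. Here is an illustrative Python sketch of the same steps (a sketch mirroring the Processing code, not the author's code):

```python
def colormap_lookup(t, colormap):
    """Piecewise-linear interpolation into a list of (r, g, b) rows in 0..1,
    mirroring the clamp, floor, and lerp steps of setFillWithColormap."""
    t = min(max(t, 0.0), 1.0)              # clamp to [0, 1]
    t *= len(colormap) - 1                 # scale into segment index space
    pos = min(int(t), len(colormap) - 2)   # segment start (floor, clamped)
    f = t - pos                            # fraction within the segment
    a, b = colormap[pos], colormap[pos + 1]
    return tuple(a[i] * (1.0 - f) + b[i] * f for i in range(3))
```

Multiplying each channel by 255 then gives the fill() arguments, as in the sketch above.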
Attachments:
dewey-count-average_lending_time-yearly.csv
Last edited by donghaoren on Wed Jan 21, 2015 11:25 am, edited 1 time in total.
Re: Proj 2: 2D Matrix
Borrowed Together Co-occurrence matrix
I wanted to create a high-level view of my co-occurrence dataset, but also give the user the ability to zoom in for a granular view. I wanted something that looked like a satellite image of the Earth at night, where developed areas appear as brighter clusters of lights. For this I used transparency and image size to encode frequency information: more frequently co-occurring classes show up brighter and bigger. I used an 'orb' image from Reza's OpenGL demo.
This query returned 1,363,686 records, with a max co-occurrence of 180,695. Interestingly, a large fraction of the pairs were comic books that had been borrowed together (Dewey 741). In my code, I give the option to include the comic book data, which dominates the high-level visualization. Removing the comic books results in a dataset of 968,566 records and a max co-occurrence of 28,971.
The query:
Code:
select
floor(t1.deweyClass) as dewey1,
floor(t2.deweyClass) as dewey2
from
spl2.inraw as t1,
spl2.inraw as t2
where
t1.cout = t2.cout and t1.cin = t2.cin
and t1.itemNumber != t2.itemNumber
and t1.deweyClass != ''
and t2.deweyClass != ''
and year(t1.cout) > 2010;
I couldn't attach the Processing folder because it's 5MB; here is a link to it on Dropbox: https://www.dropbox.com/s/pi2c9cngaumab21/HW2.zip?dl=0
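Since the Processing code is only available via the Dropbox link, here is a hedged sketch (in Python, for illustration) of the encoding described above: counting the (dewey1, dewey2) pairs returned by the query, then mapping each count to an alpha (brightness) and an image size. The square-root scaling and all constants are my assumptions; the original may scale differently.

```python
from collections import Counter
import math

def encode(pairs, max_size=40.0, min_alpha=8, max_alpha=255):
    """Count co-occurring Dewey-class pairs and map each count to
    (alpha, size). sqrt() compresses the huge dynamic range so rare
    pairs stay visible next to the dominant ones (e.g. Dewey 741)."""
    counts = Counter(pairs)
    peak = max(counts.values())
    cells = {}
    for pair, n in counts.items():
        f = math.sqrt(n / peak)                     # 0..1 brightness factor
        cells[pair] = (min_alpha + f * (max_alpha - min_alpha),  # alpha
                       f * max_size)                             # orb size
    return cells
```

Each cell would then be drawn as the orb image tinted with that alpha at that size.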
Re: Proj 2: 2D Matrix
It's The Economy, Stupid! - Relative Visualization
--- Edit ---
Thank you all for the feedback in class. I changed the sketch to include an outline that appears when the cursor is within a cell's boundaries, to make it more apparent that my visualization is indeed a 2D matrix. I also should have made it clearer in my presentation that the number of titles checked out in a particular month for a particular keyword is represented by a cell's color, while the cell's width represents the number of titles checked out in the Dewey categories 330-339 for that month - a feature that I call "Dewey warping". I included a switch to turn this feature off in the new version.
New screenshots
New zipped project
---
The visualization picks up my first submission's (very simple) premise: to look at possible effects of the 2007-2008 financial crisis on the choice of books borrowed by patrons of the Seattle public library system. I examine four keywords and map their prevalence over time against the patrons' general interest in topics related to economics (Dewey decimal numbers 330-339).
For this first draft, I tried to keep the design as minimalist as possible. My goal was to reduce the necessary number of labels in particular. Hence, the visualization uses relative values - the minimum and maximum values for each column are mapped to the full range of colors within the chosen color scheme (a gradient derived from the web color "Dollar Bill Green", hex. #85bb65). This allows for some interesting design options, namely mapping the y-axis labels to the color scheme as well - in fact the year/month structure is even more apparent like this. The fifth value - the total number of titles borrowed in the Dewey categories 330-339 - changes the width of the cells in all four columns.
I also included a simple interactive element: The actual value of a cell (this includes the year and month cells) appears in the background if the cursor is within the cell's boundaries.
From the analytical perspective, the visualization emphasizes visually what already could be inferred from the numerical data: All four keywords see a noticeable increase with the onset of the 2007-2008 financial crisis, as does the general interest in topics related to economics.
Screenshots
Query
Code:
SELECT
YEAR(cout),
MONTH(cout),
SUM(CASE
WHEN title LIKE "%marx%" THEN 1
ELSE 0
END) AS marx,
SUM(CASE
WHEN title LIKE "%keynes%" THEN 1
ELSE 0
END) AS keynes,
SUM(CASE
WHEN title LIKE "%capital%" THEN 1
ELSE 0
END) AS capital,
SUM(CASE
WHEN title LIKE "%crisis%" THEN 1
ELSE 0
END) AS crisis,
SUM(CASE
WHEN deweyClass >= 330 AND deweyClass < 340 THEN 1
ELSE 0
END) AS isEcon
FROM
spl2.inraw
WHERE
DATE(cout) > '2006-1-1' AND DATE(cout) < '2012-12-31'
GROUP BY YEAR(cout) , MONTH(cout)
Code:
// Data containers
Table data;
int rows,cols;
int hiddenCol;
int[][] dataMatrix;
int[] dataMatrixMax;
int[] dataMatrixMin;
// Layout parameters
final int cellWidthMax = 100;
final int cellWidthMin = 10;
int cellHeight = 6;
int cellXDist = 1;
int cellYDist = 2;
int sideDist = 60;
int middleDist = 30;
final int fontYOffset = -35;
final float labelRelX = 1.7;
final float labelRelY = 2.0;
// Color parameters
final int bgColor = 255;
final int bgFontColor = 200;
final int labelFontColor = 180;
final float gradientAngle = 97.7;
final int gradientMin = 2;
final int gradientMax = 255;
// Misc.
final int absMax = 2147483647;
PFont labelFont,bgFont;
String bgValUpper = "";
String bgValLower = "";
void setup() {
// Be smooth yo
smooth();
// Set font
labelFont = createFont("Futura",10);
bgFont = createFont("Futura",280);
// Load data
data = loadTable("assignment2_data.csv", "header");
rows = data.getRowCount();
cols = data.getColumnCount();
// Hidden column is the last column
hiddenCol = cols-1;
// Resize arrays
dataMatrix = new int[rows][cols];
dataMatrixMax = new int[cols];
dataMatrixMin = new int[cols];
// Read data into array
for (int i=0; i<rows; i++) {
for (int j=0; j<cols; j++) {
dataMatrix[i][j] = data.getInt(i,j);
print(dataMatrix[i][j] + " ");
}
println();
}
// Initialize arrays for minimum and maximum values
for (int j=0; j<cols; j++) {
dataMatrixMax[j] = 0;
dataMatrixMin[j] = absMax;
}
// Find minimum and maximum values for each column
for (int i=0; i<rows; i++) {
for (int j=0; j<cols; j++) {
if (dataMatrix[i][j] < dataMatrixMin[j]) dataMatrixMin[j] = dataMatrix[i][j];
if (dataMatrix[i][j] > dataMatrixMax[j]) dataMatrixMax[j] = dataMatrix[i][j];
}
}
// Cut the hidden column
cols-=1;
// Dynamic size, first two columns are always minimum width and followed by middleDist
size(2*(cellWidthMin+cellXDist)+middleDist+(cols-2)*(cellWidthMax+cellXDist)+2*sideDist,rows*(cellHeight+cellYDist)+2*sideDist);
}
void draw() {
background(255);
// Draw the background values
textFont(bgFont);
textAlign(CENTER,CENTER);
fill(bgFontColor);
// Position includes hardcoded offset because Processing SUCKS at text positioning
text(bgValUpper,width/2,(height/3)*1+fontYOffset);
text(bgValLower,width/2,(height/3)*2+fontYOffset);
// Reset background values in each draw cycle
bgValUpper = "";
bgValLower = "";
for (int i=0; i<rows; i++) {
for (int j=0; j<cols; j++) {
noStroke();
int cellWidth,x;
// If we're not in the first two columns, resize relative to global column width
if (j!=0 && j!=1) {
cellWidth = int(map(float(dataMatrix[i][hiddenCol]),float(dataMatrixMin[hiddenCol]),float(dataMatrixMax[hiddenCol]),cellWidthMin,cellWidthMax));
x = 2*(cellWidthMin+cellXDist)+middleDist+(j-2)*(cellWidthMax+cellXDist);}
// If not, use minimum width
else {
cellWidth = cellWidthMin;
x = j*(cellWidthMin+cellXDist);}
int y = i*(cellHeight+cellYDist);
// Map cell value to column minimum and maximum
float c = map(float(dataMatrix[i][j]),float(dataMatrixMin[j]),float(dataMatrixMax[j]),gradientMin,gradientMax);
// "Dollar bill green" - see http://www.colorhexa.com/85bb65
// Inline conversion from degrees to 8 bit integer
colorMode(HSB);
fill(map(gradientAngle,0,360,0,255),c,255);
colorMode(RGB);
// If the mouse is inside the rectangle, show the corresponding values in the background
if (mouseX >= sideDist+x && mouseX <= sideDist+x+cellWidth && mouseY >= sideDist+y && mouseY <= sideDist+y+cellHeight) {
bgValUpper = str(dataMatrix[i][j]);
// Show the lower values only if we're not in the first two columns
if (j!=0 && j!=1) {bgValLower = str(dataMatrix[i][hiddenCol]);}
}
rect(sideDist+x,sideDist+y,cellWidth,cellHeight);
}
}
textFont(labelFont);
textAlign(LEFT,CENTER);
fill(labelFontColor);
// Horizontal labels
int firstTwo = sideDist+2*(cellWidthMin+cellXDist)+middleDist;
text("MARX", firstTwo,sideDist/labelRelX);
text("KEYNES", firstTwo+1*(cellWidthMax+cellXDist),sideDist/labelRelX);
text("CAPITAL", firstTwo+2*(cellWidthMax+cellXDist),sideDist/labelRelX);
text("CRISIS", firstTwo+3*(cellWidthMax+cellXDist),sideDist/labelRelX);
// Vertical labels
pushMatrix();
translate(sideDist/labelRelY,height/2);
rotate(radians(270));
textAlign(CENTER,CENTER);
text("FOUR RECESSION RELATED KEYWORDS IN TITLES 2006-2012 AND TOTAL TITLES FOR DEWEY DECIMALS 330-339: ECONOMICS",0,0);
popMatrix();
}
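The relative-value mapping described above (each column's minimum and maximum stretched to the full gradient range) is the heart of this sketch. A minimal Python illustration of the same per-column min/max normalization (function names are mine, not the author's; the gradient bounds match the sketch's gradientMin/gradientMax):

```python
def column_minmax(matrix):
    """Per-column (min, max) over a list-of-rows matrix, as the
    setup() loop above computes into dataMatrixMin/dataMatrixMax."""
    return [(min(c), max(c)) for c in zip(*matrix)]

def to_gradient(v, lo, hi, g_min=2, g_max=255):
    """Map a cell value onto the gradient range relative to its own
    column's extremes, as draw() does with Processing's map()."""
    return g_min + (v - lo) * (g_max - g_min) / (hi - lo)
```

Because each column is normalized independently, every column uses the full color range, which is what removes the need for absolute-value labels.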
Zipped Project
Last edited by fabian.offert on Sat Jan 24, 2015 1:47 pm, edited 2 times in total.
Re: Proj 2: 2D Matrix
"[...] When the chips are down"
Description:
Based on Hannah Arendt’s famous testimony ([…] on the Banality of Evil), this visualization explores the following question: in times of hardship, where do US citizens seek comfort, answers, or explanations (through books)? Do they turn towards religion, science, philosophy, or even history? Or, going further: in recent conflicts with the Middle East, how much effort is put into understanding a different historic perspective?
Analog sketch:
Six color schemes were explored, five of them based on Swiss-style graphic design.
Query:
Code:
SELECT
SUM(CASE
WHEN deweyClass >= 100 AND deweyClass < 199 THEN 1
ELSE 0
END) AS wtc100Religion,
SUM(CASE
WHEN deweyClass >= 200 AND deweyClass < 299 THEN 1
ELSE 0
END) AS wtc200Philosophy,
SUM(CASE
WHEN deweyClass >= 900 AND deweyClass < 999 THEN 1
ELSE 0
END) AS wtc900History,
SUM(CASE
WHEN deweyClass >= 297 AND deweyClass < 298 THEN 1
ELSE 0
END) AS wtc100ISLAM,
SUM(CASE
WHEN deweyClass >= 956 AND deweyClass < 957 THEN 1
ELSE 0
END) AS wtc900HistoryMiddleEast
FROM
spl2.inraw
WHERE
DATE(cout) >= '2001-09-11' AND DATE(cout) <= '2001-09-18'
/* We group items only by year (since these events have occurred in different years) */
GROUP BY YEAR(cout)
ORDER BY YEAR(cout);
/* This query is repeated four times. Each time corresponding to each event */
Processing time: 58.1 seconds for each event; total time: 216.0 seconds.
Studies:
Different visual explorations. They were made to test which way was the best to convey our data and question.
Semi-Radial 2D Matrix result:
Final thoughts:
Certainly there is a slightly noticeable difference between the interest in religion topics after one of these attacks and in other periods of the year (such as the early days of December, for example, when people are distracted). However, the difference between Islam and other religions is not highly noticeable. On the other hand, it is important to mention that there is indeed an increase in interest in Middle East history just seven days after an attack.
*
A note about the satellites:
I tried to make the data graspable at first glance. I chose to draw each checked-out item as a dot/satellite orbiting its cell. For example, in the following image we can see (without any numbers or colors) that more items have been checked out in "History" than in, say, Religion. On top of that, we can see that the number of books checked out on the Middle East is significantly smaller.
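The orbiting-dot layout described here can be sketched simply: place a cell's n satellites evenly on a circle around the cell center. An illustrative Python version (the radius handling and naming are my assumptions, since the Processing files are in the attached zip):

```python
import math

def satellite_positions(n, cx, cy, radius):
    """Place n checkout 'satellites' evenly on a circle of the given
    radius around a cell center (cx, cy): angle k * 2*pi / n each."""
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n))
            for k in range(n)]
```

Cells with more checkouts get more dots, so relative magnitudes read off visually without numbers or color.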
Attachments:
chipsAreDown_05.zip (Processing files)
Last edited by jmd on Sat Jan 24, 2015 9:48 am, edited 2 times in total.
Re: Proj 2: 2D Matrix
Comic books checkout.
Revised Version
I am posting my homework again; I totally misunderstood the instructions. I extended my survey to cover 2006 to 2013.
My research topic for the first assignment was to verify the relationship between local cultural events and book checkouts at the Seattle Public Library. "Sakura-Con" is one of the biggest cosplay events in the US, held every spring. I expected this event to promote residents' interest in comic books, especially Japanese comic books, and there was a slight increase during the event. To make this result clear, I compared comic book checkouts to the Harry Potter books; the numbers showed that the Harry Potter books have no specific pattern by month, unlike the comic books.
MySQL processing time varied by month; it was about 40 seconds.
I set four parameters: comic books (entire), American comic books, Japanese comic books, and Harry Potter related books. The four numbers have huge gaps between them, so I tried to minimize the differentiation while still conveying the changes visually for each month.
I chose a conventional bar graph for this assignment, but put more effort into arranging all the text within the canvas area. I spent extra time finding the ideal spacing between the texts in the graph. For now, I was only able to include the results for 2013, but I will work on putting all the information from 2006 onward into the Processing program, with a function that pops up when clicking the year icons on the canvas. This would be an additional assignment.
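The color scaling used in the Processing code that follows (scaling one base RGB color by value/max, so all four series share a single scale) can be sketched as below. This is illustrative Python; the base color matches the sketch's colorR/colorG/colorB, and the integer division mirrors Java's int arithmetic.

```python
def shade(value, max_count, base=(255, 135, 25)):
    """Scale a base RGB color by value/max_count, as the fill() call
    in the Processing draw() loop does; dim cells mean low counts."""
    return tuple(c * value // max_count for c in base)
```

Because every series is divided by the same global maximum, a bright Harry Potter cell and a bright comics cell are directly comparable.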
Code:
SELECT
SUM(case
When deweyClass > '741.49' and deweyClass < '741.6' then 1
else 0 end) as Comic,
SUM(case
When deweyClass = '741.5973' then 1
else 0 end) as US,
SUM(case
When deweyClass = '741.5952' then 1
else 0 end) as Japan,
SUM(case
When title like '%harry potter%' then 1
else 0 end) as HP
FROM
spl2.inraw
WHERE
DATE(cout) >= '2006-01-01'
and DATE(cout) < '2006-02-01'
GROUP BY
YEAR(cout), MONTH(cout)
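The SUM(CASE …) pattern above pivots individual checkout rows into one count per category and month. A minimal Python sketch of the same aggregation, using invented sample rows (the tuple layout is an assumption, not the real spl2.inraw schema):

```python
from collections import defaultdict

# Hypothetical checkout rows: (checkout date 'YYYY-MM-DD', deweyClass, title)
rows = [
    ("2006-01-03", "741.5952", "Naruto vol. 1"),
    ("2006-01-10", "741.5973", "Watchmen"),
    ("2006-01-15", "520.1",    "Harry Potter and the Goblet of Fire"),
]

def categorize(dewey, title):
    """Mirror the CASE expressions: one row can count toward several columns."""
    cats = []
    if "741.49" < dewey < "741.6":     # string comparison, like the SQL query
        cats.append("Comic")
    if dewey == "741.5973":
        cats.append("US")
    if dewey == "741.5952":
        cats.append("Japan")
    if "harry potter" in title.lower():
        cats.append("HP")
    return cats

counts = defaultdict(int)              # (year-month, category) -> count
for date, dewey, title in rows:
    month = date[:7]                   # like GROUP BY YEAR(cout), MONTH(cout)
    for cat in categorize(dewey, title):
        counts[(month, cat)] += 1
```

Note that both "741.5952" and "741.5973" fall inside the Comic range, so they count twice, exactly as the independent SUM columns do in the query.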
Code: Select all
import processing.data.StringList;
PFont myFont;
PFont myFont1;
int [][] dataMatrix = null;
String [][] dataMatrix1 = null;
String [] Month = {"Jan","Feb","Mar","Apr","May","Jun",
"Jul","Aug","Sep","Oct","Nov","Dec"};
int numMonth = 12;
int numYear = 8;
int colorR = 255;
int colorG = 135;
int colorB = 25;
int barwidth = 500;
color from = color(1,1,1);
color to = color(colorR,colorG,colorB);
int maxCount;
int numRows, numColumns;
int xPosition = 20;
int yearcut = 12;
int ycut = 5;
PFont font= createFont("Helvetica",16,true);
PFont font_1;
Table myTable;
void setup()
{
size(1340,960);
smooth();
noStroke();
myFont =createFont("Helvetica", 38);
textFont(myFont);
myTable = new Table();
myTable = loadTable( "data06-13.csv", "header");
numRows = myTable.getRowCount();
numColumns = myTable.getColumnCount();
dataMatrix = new int[numRows][numColumns];
for (int i = 0; i < numRows; i ++)
{
for ( int j = 1; j < numColumns; j ++)
{
dataMatrix[i][j] = myTable.getInt(i,j);
}
}
maxCount = dataMatrix[0][0];
for (int i = 0; i < numRows; i ++)
{
for ( int j = 1; j < numColumns; j ++)
{
if (dataMatrix[i][j] > maxCount)
maxCount = dataMatrix[i][j];
}
}
}
void draw()
{
colorMode(RGB);
background(85,85,85);
translate(200,150);
for(int i = 0; i <numRows; i++)
{
for(int j = 1; j <numColumns; j++)
{
fill( colorR* dataMatrix[i][j] / maxCount
, colorG* dataMatrix[i][j] / maxCount
, colorB* dataMatrix[i][j] / maxCount);
if(i < yearcut*8){
int row = i / yearcut; // which of the 8 year bands (2006-2013)
int col = i % yearcut; // month position within the band
rect(col*92, (j + ycut*row)*16, 88, 13);
/*
textSize(14);
fill(255);
text(dataMatrix[i][j], col*92, (j + ycut*row)*16 + 10);
*/
}
}
}
for(int p = 0; p < numMonth; p++){
textFont(myFont);
fill(133,203,43);
textAlign(LEFT);
textFont(myFont,28);
text(Month[p],p*92,-5);
}
for(int k = 0; k <numYear; k ++){
text(2006+k, -110, k*80+58);
}
for(int t = 0; t <numYear; t ++){
myFont1 = createFont("Helvetica-Bold",12);
textSize(12);
textFont(myFont1);
text("C\nU\nJ\nH", -20, t*80+28);
}
textSize(32);
text("Comic books checkout", -174, -110);
textFont(myFont1);
text("C = Comic books [741.5]\nU = Comic books - US [741.5973]\nJ = Comic books - Japan [741.5952]\nH = Harry Potter books",-170,-90);
for(int q = 1; q < barwidth; q++){
color inter=lerpColor(from,to,(float(q)/barwidth));
noStroke();
fill(inter);
rect(q+600,-100, 1,20);
}
textAlign(CENTER);
fill(133,203,43);
text("0",600,-110);
text("10000",600+500*10000/maxCount,-110);
text("20000",600+500*20000/maxCount,-110);
text("30000",600+500*30000/maxCount,-110);
text(maxCount,600+500,-110);
save("hw2-3.jpg");
}
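The legend above is drawn by calling lerpColor once per pixel column. The same linear interpolation can be sketched outside Processing; this is a hedged stand-in for what lerpColor does in RGB colorMode:

```python
def lerp_color(c1, c2, t):
    """Linearly interpolate two (r, g, b) tuples for t in [0, 1] --
    the same math Processing's lerpColor performs in RGB mode."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))

frm = (1, 1, 1)          # the sketch's 'from' color
to = (255, 135, 25)      # the sketch's 'to' color (colorR, colorG, colorB)

# One sample per legend column, as in: lerpColor(from, to, q / barwidth)
barwidth = 500
legend = [lerp_color(frm, to, q / barwidth) for q in range(1, barwidth)]
```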
Code: Select all
SELECT
SUM(case
When deweyClass > '741.49' and deweyClass < '741.6' then 1 else 0 end) as Comic,
SUM(case
When deweyClass = '741.5952' then 1 else 0 end) as Japan,
SUM(case
When deweyClass = '741.5973' then 1 else 0 end) as US
FROM
spl2.inraw
WHERE
DATE(cout) > '20121122'
AND DATE(cout) < '20121210'
LIMIT 100000
Code: Select all
import processing.data.StringList;
/*
class demo on Jan 15, 2015
By Mohit Hingorani
please email me at mohit@mat.ucsb.edu
*/
int [][] dataMatrix = null;
String [][] dataMatrix1 = null;
int maxCount ;
int numRows, numColumns;
PFont font = createFont( "Helvetica", 16, true);
PFont font_1;
Table myTable;
void setup()
{
size( 1000, 780);
background(0);
smooth();
frameRate(10);
myTable = loadTable( "data2013.csv", "header");
numRows = myTable.getRowCount();
numColumns = myTable.getColumnCount();
dataMatrix = new int[numRows][numColumns];
for ( int i = 0; i< numRows; i++)
{
for ( int j = 0; j< numColumns; j++)
{
dataMatrix[i][j] = myTable.getInt(i, j);
print( dataMatrix[i][j] + " ");
}
}
maxCount = dataMatrix[0][0];
for ( int i = 0; i< numRows; i++)
{
for ( int j = 0; j< 2; j++)
{
if ( dataMatrix[i][j] > maxCount) maxCount = dataMatrix[i][j];
}
}
println( numRows +" "+ numColumns + " " + maxCount );
println(maxCount);
}
void draw() {
background(164,198,184);
fill(222,237,226);
noStroke();
rect(760,0,280,1800);
fill(234,142,71);
noStroke();
rect(760,342,15,363);//comicbooks
fill(160,128,77);
noStroke();
rect(775,471,15,234);//us
fill(80,63,35);
noStroke();
rect(790,631,15,74);//japan
fill(4,5,6);
noStroke();
rect(805,685,15,20);//harry
translate( 70, 50);
noStroke();
font_1 = createFont("Calibri-Bold",26);
textFont(font_1,26);
textSize(26);
fill(0,102,153,204);
textAlign(LEFT);
text("Comic books\nCheckout\nBy countries\n2013", 705,20);
textFont(font_1,20);
textSize(20);
fill(0,102,153,204);
textAlign(LEFT);
text("Sakura-Con\nMarch 29-31", 705,220);
textFont(font_1,20);
textSize(20);
fill(234,142,71);
textAlign(LEFT);
text("Comic books\n 741.5", 710,310);
textFont(font_1,26);
textSize(16);
fill(160,128,77);
textAlign(LEFT);
text("Comic books-US\n 741.5973", 725,433);
textFont(font_1,26);
textSize(16);
fill(80,63,35);
textAlign(LEFT);
text("Comic books-Japan\n 741.5952", 740,593);
textFont(font_1,26);
textSize(16);
fill(5,4,2);
textAlign(LEFT);
text("Harry Potter books", 755,650);
/*
for(TableRow row : myTable.rows()){
String checkoutdate = row.getString("Year 2013");
String[] list = split(checkoutdate, " ");
for (int k = 0; k < list.length; k++){
textAlign(CENTER);
textSize(22);
text(list[k],k*100+500,k*100);
println(checkoutdate);
println(list.length);
println(list.getRowCount());
}
}
*/
for ( int i = 1; i< numRows ; i++)
{
for ( int j = 1; j < numColumns ; j++)
{
fill( 349* dataMatrix[i][j] / maxCount
, 249* dataMatrix[i][j] / maxCount
, 149* dataMatrix[i][j] / maxCount);
rect( i * 60 - 65, j +650 , 43, -dataMatrix[i][j]/30 );
font_1 = loadFont("Helvetica-48.vlw");
textSize(14);
fill(0,102,153,204);
textAlign(RIGHT);
text(dataMatrix[i][j]+5,i*60-23,-dataMatrix[i][j]/29.8 + 653);
}
if ( frameCount % 20 ==0) println( "draw called " + frameCount );
}
for(int q = 1; q < numRows; q++){
String s = myTable.getString(q,0);
//println(s);
textFont(font_1,15);
textSize(15);
fill(160,128,77);
textAlign(LEFT);
//pushMatrix();
rotate(-PI/3.2);
//translate(0,20*q);
translate(0,51);
text(s, q*34-100,0);
rotate(PI/3.2);
//translate(0,-20*q);
//popMatrix();
}
//save("datahw_intae.jpg");
}
Attachments: as2.zip (61.87 KiB)
Last edited by intae on Fri Jan 23, 2015 1:19 am, edited 2 times in total.
Re: Proj 2: 2D Matrix
Awake - Empirical Hours of a Library
Description
This visualization exercise explores the active hours of the Seattle Public Library. Based on the time difference between the latest and the earliest checkout of a day, the daily awake time in minutes is calculated. This time measure is generally related to the opening hours of the library, but is also influenced by activities such as online reservations, interlibrary loans, transfers and removal of books. By mapping the daily awake time onto a 2D matrix indexed by weekday and week of year, the temporal distribution of such events is explored as it is currently found in the database. If daily awake time and/or renewal activity were well constrained, this visualization would make it possible to explore behavioral patterns of different user groups in more depth, for example by repeating this analysis for different item categories or Dewey classes.
For the design of the 2D matrix I relied on a representation of the data using only a single color, scaling the brightness and transparency factors of the HSB color ranges. Daily awake times < 200 minutes resulting from national holidays were removed from the data set to improve the visibility of patterns. In another visualization approach, the cell sizes are directly determined by the observed awake hours. Here, I used the maximum awake value across the whole time range from 2005-2013, but the matrices could also be plotted for each year separately, or turned into an animation or a 3D cube in the future. The current form of visualization will be a useful diagnostic for the quality of filtering approaches that can help identify library activity patterns, such as awake/asleep times and activity categories (first-time check-outs, renewals, online loans/renewals, transfers).
Analysis
Besides the already known weekday-weekend differences, which show up in a filmstrip fashion in the first visualization (Fig. 1), we also note the following.
(1) There seems to be a seasonal cycle in the awake hours that lines up with the hours of daylight throughout the year.
(2) Library activities in the database occur all the way to the end of a day, beyond opening hours until midnight, but only rarely between 12am and 10am (background-colored gaps in Fig. 2).
(3) Diurnal, weekly and seasonal cycles of awake hours will be more pronounced if corrections can be made for database entries that are not directly associated with user activity. By using more advanced SQL queries that incorporate data flags from inlog and outlog, different activities can be disentangled and then re-mapped using the visualization concepts presented here.
Viz
SQL Query
Code: Select all
#QUERY 2 modified/shortened for hw2:
SELECT barcode, deweyClass,
DAYOFWEEK(cout) as weekday_num,
WEEK(cout,1) as week_num,
TIMESTAMPDIFF(MINUTE,
MIN(cout),
MAX(cout)) as awake,
DATE_FORMAT(MIN(cout), '%H') as earliest_cout_hour,
DATE_FORMAT(MAX(cout), '%H') as latest_cout_hour,
DATE_FORMAT(cout, '%Y-%m-%d') as day_cout
FROM
(SELECT barcode, itemtype, cout, cin, deweyClass
FROM spl2.inraw
WHERE
TIME_TO_SEC(cout) > 0
and deweyClass <> '') as db1
WHERE
date(cout) >= '2005-02-23'
and date(cout) <= '2014-01-05'
GROUP BY day_cout
ORDER BY weekday_num;
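TIMESTAMPDIFF(MINUTE, MIN(cout), MAX(cout)) reduces each day to a single "awake" span. A small Python sketch of the same per-day reduction (the timestamps are invented examples, not library data):

```python
from datetime import datetime
from collections import defaultdict

# Hypothetical checkout timestamps
couts = [
    "2005-03-01 10:02:00",
    "2005-03-01 21:47:00",
    "2005-03-02 09:58:00",
    "2005-03-02 23:59:00",
]

# Group timestamps by calendar day, like GROUP BY day_cout
by_day = defaultdict(list)
for s in couts:
    t = datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
    by_day[t.date()].append(t)

# Awake minutes = latest checkout minus earliest checkout, per day
awake = {day: int((max(ts) - min(ts)).total_seconds() // 60)
         for day, ts in by_day.items()}
```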
Code: Select all
int cellWidth = 22;
int cellHeight = 50;
int bgColor = 15;
float factHeight;
float morningOffset;
int rowMargin = 180;
int colMargin = 80;
int numWeekdays = 7;
int numWeeks = 52;
String[] weekdayNames = {
"Sunday",
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday"
};
float [][] dataMatrix = null;
int [][] dataAwake = null;
int [][] dataEarliestCout = null;
int [][] dataLatestCout = null;
int maxAwake;
int numRows, numCols;
PFont myFont1 = createFont( "Helvetica", 24, true);
PFont myFont2 = createFont( "Helvetica", 16, true);
Table myTable;
void setup()
{
//setting up a screen size 1400*500
size( 1400, 500);
colorMode(RGB);
// set background to white
background(255);
//activate smoothing anti-aliasing functions between pixels
smooth();
// set framerate to 10 per second. NOrmal video is 30 or 60
frameRate( 10);
//load the awake data set
myTable = loadTable( "awake_Week_weekday_matrix.csv", "header");
//header for col names as follows:
//barcode deweyClass weekday_num week_num awake earliest_cout_hour latest_cout_hour
// assign these variables to contain the number of rows and columns from myTable
numRows = myTable.getRowCount();
numCols = myTable.getColumnCount();
// initialize the 2D array with number of rows and columns
dataMatrix = new float[numRows][numCols];
// copy everything from table into a 2D array
maxAwake = (int)dataMatrix[0][4]; //find max awake value [minutes] for scaling of colorbar
for ( int i = 0; i< numRows; i++) // a for loop where i is set to 0 and increments all the way to numRows by 1
{
for ( int j = 0; j< numCols; j++)// a for loop where j is set to 0 and increments all the way to numColumns by 1
{
dataMatrix[i][j] = myTable.getInt(i, j); // copying the table integer value at mytable (i,j) position into the dataMatrix
print( dataMatrix[i][j ] + " "); // print out the value of dataMatrix
}
if ( dataMatrix[i][4] > maxAwake) // this is an if statement which checks for the condition in brackets // if true, it executes the statements in brackets
{
maxAwake = (int)dataMatrix[i][4];
}
println(); // switch to next line in the prompt, improves legibility
}
//get sorted week - weekday matrix from getWeekdayWeek function
println("getting sorted week - weekday matrix...");
dataAwake = getWeekWeekday(dataMatrix, 4); // 4 is colIndex for awake
println("interpolating missing data...");
dataAwake = interpMissing(dataAwake);
dataEarliestCout = getWeekWeekday(dataMatrix, 5); // 4 is colIndex for earliest cout hour
dataLatestCout = getWeekWeekday(dataMatrix, 6); // 4 is colIndex for earliest cout hour
}
// Mapping function to convert SQL output to week-weekday matrix that is easier to plot
// This look up table could be generated much more efficiently, but given the size of the dataset it works OK.
int[][] getWeekWeekday(float[][]data, int colIndex) {
int[][] dataWeekWeekday = new int[numWeeks][numWeekdays];
for (int w=1; w<=numWeeks; w++) {
for (int wd=1; wd<=numWeekdays; wd++) {
dataWeekWeekday[w-1][wd-1] = 0;//zero initializer
for (int i=0; i<data.length; i++) {//go through values in data file and sort into week weekday matrix
if (((int)data[i][3] == w) && ((int)data[i][2] == wd)) {
if ((int)data[i][colIndex] > dataWeekWeekday[w-1][wd-1]) {
dataWeekWeekday[w-1][wd-1] = (int)data[i][colIndex];
}
}
}
print( dataWeekWeekday[w-1][wd-1] + " ");
}
println(); // switch to next line in the prompt, improves legibility
}
return dataWeekWeekday;
}
// Interpolation function for missing values in data matrix
int[][] interpMissing(int[][]data) {
for (int w=1; w<=numWeeks; w++) {
for (int wd=1; wd<=numWeekdays; wd++) {
if (data[w-1][wd-1] == 0) {
println("zero value detected, interpolate with data for the same day from rest of the year");
int wdaySum = 0;
for (int w1=1; w1<=numWeeks; w1++) {
wdaySum += data[w1-1][wd-1];
}
data[w-1][wd-1] = wdaySum/numWeeks;
}
print( data[w-1][wd-1] + " ");
}
println(); // switch to next line in the prompt, improves legibility
}
return data;
}
void draw()
{
// refresh background every frame
background(bgColor);
textFont(myFont1);
text("Awake - Empirical hours of the library [2005-2013]", 1000, 40);
//creating the cells sequentially
// this where the actual drawing of data happens
// we run two nested loops, one for rows and the other for columns , we go through each cell to retrive its value
// and fill the color of the cell accordingly
for ( int i = 0; i< numWeeks; i++)
{
for ( int j = 0; j < numWeekdays; j++) // -1 as the last column is empty, our dataMatrix is smaller than the table by one column
{
colorMode(HSB);
stroke(1, 192, 224* dataAwake[i][j]/maxAwake,224* dataAwake[i][j]/maxAwake );
strokeWeight(1);
fill(bgColor); // fill boxes with same color as background
//plot 24 hour rectangle with stroke around it
rect(rowMargin + i * cellWidth, colMargin + j * cellHeight, cellWidth-2, cellHeight);//*;
fill(1, 192, 224* dataAwake[i][j]/maxAwake, 224* dataAwake[i][j]/maxAwake );
//colorMode(RGB);
factHeight = (float)(dataLatestCout[i][j]-dataEarliestCout[i][j])/24;//hours dependent scaling factor for height
morningOffset = (float)dataEarliestCout[i][j]/24 * cellHeight;
//plot rectangle that is mapped onto the area of the observed hours
rect(rowMargin + i * cellWidth, colMargin + j * cellHeight + morningOffset, cellWidth-2, cellHeight*factHeight);//*;
}
}
textAlign(RIGHT, CENTER);
textFont(myFont2);
fill(154);
for (int i=0; i<numWeekdays; i++) {
text(weekdayNames[i], rowMargin * 0.91, colMargin + cellHeight/2 + i*cellHeight);
}
if ( frameCount % 20 ==0)
{
println( "draw called " + frameCount ); // here we are printing out the frameCount ( how many frames have been drawn so far)
}
}
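The interpMissing() function above fills a missing (zero) day with the mean of the same weekday across all weeks. A Python sketch of the same column-mean imputation; it is slightly simplified in that each weekday's sum is computed once, whereas the Java version re-sums after each fill:

```python
def interp_missing(matrix):
    """Replace zero cells with the column (weekday) mean, as interpMissing()
    does; integer division mirrors the int arithmetic in the sketch."""
    num_weeks = len(matrix)
    num_days = len(matrix[0])
    for wd in range(num_days):
        col_sum = sum(matrix[w][wd] for w in range(num_weeks))
        for w in range(num_weeks):
            if matrix[w][wd] == 0:
                matrix[w][wd] = col_sum // num_weeks
    return matrix
```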
Attachments: awake_Week_weekday_matrix.csv (data file, 103.88 KiB)
Last edited by menzer on Thu Jan 22, 2015 10:18 am, edited 5 times in total.
Re: Proj 2: 2D Matrix
Forgotten Books
Description:
The Seattle Public Library has more than 100,000 books recorded as checked out on Jan 1st, 1970 and returned between 2006 and 2014. I am really curious about the hidden information behind this group of books, as they 1) share the same strange checkout date; and 2) somehow got returned after 36 years. The data visualization below takes a first step toward exploring the data by figuring out what the types of these books are.
The books are grouped by Dewey Classification and item type, and each group is drawn as a rectangle whose area indicates the number of books in the group. The upper-left corner is connected to the group's Dewey class, while the lower-right corner is connected to its item type. From the data we can see, for example, that the biggest group among the forgotten books is Arts & Recreation in CD format. The position of each rectangle is determined by a noise algorithm, representing that these books are a kind of noise compared with the library as a whole.
For the updated version: I will make the sketch interactive so each rectangle can be moved around together with its two connecting lines, which will be highlighted while moving.
SQL:
Code: Select all
SELECT
FLOOR(deweyClass / 100) * 100 AS Dewey,
SUM(CASE
WHEN itemtype LIKE '%cdrom' THEN 1
ELSE 0
END) AS CD_ROM,
SUM(CASE
WHEN itemtype LIKE '%vhs' THEN 1
ELSE 0
END) AS Video_VHS,
SUM(CASE
WHEN itemtype LIKE '%bk' THEN 1
ELSE 0
END) AS Book,
SUM(CASE
WHEN itemtype LIKE '%cd' THEN 1
ELSE 0
END) AS CD,
SUM(CASE
WHEN itemtype LIKE '%mus' THEN 1
ELSE 0
END) AS Music_Score,
SUM(CASE
WHEN itemtype LIKE '%dvd' THEN 1
ELSE 0
END) AS DVD,
SUM(CASE
WHEN itemtype LIKE '%disk' THEN 1
ELSE 0
END) AS Diskette,
SUM(CASE
WHEN itemtype LIKE '%kit' THEN 1
ELSE 0
END) AS Kit,
SUM(CASE
WHEN itemtype LIKE '%vid' THEN 1
ELSE 0
END) AS Video,
SUM(CASE
WHEN itemtype LIKE '%cas' THEN 1
ELSE 0
END) AS Audio_Tape,
SUM(CASE
WHEN itemtype = 'aceq' OR itemtype = 'ucunkn' THEN 1
ELSE 0
END) AS Uncataloged
FROM
spl2.inraw
WHERE
cout = '1970-01-01' AND deweyClass > 0
GROUP BY FLOOR(deweyClass / 100) * 100
ORDER BY Dewey DESC
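Each LIKE '%suffix' test above matches any itemtype ending in that suffix, which is exactly Python's str.endswith. A sketch with a few of the query's patterns (the sample itemtype strings are invented):

```python
# A few of the suffix patterns from the query; LIKE '%cdrom' corresponds
# to endswith('cdrom'), and so on.
SUFFIXES = [
    ("cdrom", "CD_ROM"),
    ("vhs",   "Video_VHS"),
    ("bk",    "Book"),
    ("cd",    "CD"),
    ("dvd",   "DVD"),
]

def classify(itemtype):
    """Return the first matching label, else 'Uncataloged' (simplified:
    the real query also checks itemtype = 'aceq' or 'ucunkn')."""
    for suffix, label in SUFFIXES:
        if itemtype.endswith(suffix):
            return label
    return "Uncataloged"
```

Note that with endswith these particular suffixes are mutually exclusive; a value ending in 'cdrom' cannot also end in 'cd'.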
Processing:
Code: Select all
PFont hel;
Table table;
float[][] bookMatrix = null;
int[] dewey;
String[] deweyType;
String[] itemType;
int[] verticalDistanceOne;
int[] verticalDistanceTwo;
int minimum = 2000;
int numRows, numCols;
void setup() {
size(1280,720);
background(230);
hel = loadFont("Calibri-48.vlw");
textFont(hel);
table = loadTable("ForgottenBooks.csv");
numRows = table.getRowCount();
numCols = table.getColumnCount();
bookMatrix = new float[numRows-1][numCols-1];
dewey = new int[numRows-1];
deweyType = new String[numRows-1];
itemType = new String[numCols-1];
verticalDistanceOne = new int[numRows-1];
verticalDistanceTwo = new int[numCols-1];
for(int k=1; k<numCols; k++){
itemType[k-1] = table.getString(0,k);
}
for(int i=1; i<numRows; i++){
for(int j=0; j<numCols; j++){
if(j == 0){
dewey[i-1] = table.getInt(i,j);
if(dewey[i-1] == 900){
deweyType[i-1] = "History & Geography";
}
if(dewey[i-1] == 800){
deweyType[i-1] = "Literature";
}
if(dewey[i-1] == 700){
deweyType[i-1] = "Arts & Recreation";
}
if(dewey[i-1] == 600){
deweyType[i-1] = "Technology";
}
if(dewey[i-1] == 500){
deweyType[i-1] = "Pure Science";
}
if(dewey[i-1] == 400){
deweyType[i-1] = "Language";
}
if(dewey[i-1] == 300){
deweyType[i-1] = "Social Sciences";
}
if(dewey[i-1] == 200){
deweyType[i-1] = "Religion";
}
if(dewey[i-1] == 100){
deweyType[i-1] = "Philosophy and Psychology";
}
if(dewey[i-1] == 0){
deweyType[i-1] = "General, CS & Info";
}
}
else{
bookMatrix[i-1][j-1] = table.getInt(i,j);
if(bookMatrix[i-1][j-1] < minimum){
bookMatrix[i-1][j-1] = minimum + random(-300,300);
}
}
}
}
//Drawing on canvas
fill(0);
textSize(12);
for(int i=0; i<numRows-1; i++){
verticalDistanceOne[i] = 120 + 60 * i;
textAlign(RIGHT);
text(deweyType[i], 140, verticalDistanceOne[i]);
}
for(int j=0; j<numCols-1; j++){
verticalDistanceTwo[j] = 100 + 60 * j;
textAlign(LEFT);
text(itemType[j], width-90, verticalDistanceTwo[j]);
}
textSize(30);
textAlign(CENTER);
text("Forgotten Books", width/2, 30);
textSize(12);
text("(Books that had been checked out for more than 36 years)", width/2, 45);
translate(width/2, height/2);
fill(#4A88DE,50);
for(int i=0; i<numRows-1; i++){
for(int j=0; j<numCols-1; j++){
if(bookMatrix[i][j] > 0){
stroke(20,100);
float xRect = randomGaussian()*120;
float yRect = randomGaussian()*120;
if(bookMatrix[i][j] > 1000){
rect(xRect,yRect,bookMatrix[i][j]/500,bookMatrix[i][j]/500);
stroke(20,20);
line(140 - width/2, verticalDistanceOne[i] - height/2 - 3.5, xRect, yRect);
line(width-90-width/2, verticalDistanceTwo[j]-height/2 -3.5, xRect + bookMatrix[i][j]/500, yRect + bookMatrix[i][j]/500);
}
}
}
}
}
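One caveat with bookMatrix[i][j]/500 as the rectangle's side length: if the side grows linearly with the count, the area grows with its square, so large groups look disproportionately big. If area is meant to encode the count (as the description says), the side should scale with the square root. A hedged sketch of both mappings (the scale constant 500 is taken from the code above):

```python
import math

def side_linear(count, scale=500):
    """Side proportional to count, as in the sketch -- area grows quadratically."""
    return count / scale

def side_area_true(count, scale=500):
    """Side chosen so that the rectangle's *area* is proportional to count."""
    return math.sqrt(count / scale)
```

Under area-true scaling, a group with 4x the count gets 4x the area rather than 16x.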
Re: Proj 2: 2D Matrix
Seattle Library's Most Controversial
The visualizations below show the controversy score calculated on each item, binned by subject and popularity. Since the data is somewhat sparse, the graphs also indirectly communicate the distribution of popular items by subject (black space indicates a lack of data).
To recap, the controversy score is calculated as
Alpha and beta were set equal to each other for these visualizations.
The query was quite intensive, and 10 separate queries actually had to be used to get the full dataset without timing out. The query is below, selecting the final 100 rows:
Afterwards, the data was appended and Microsoft Excel was used to bin items based on popularity, yielding the final matrix that could be visualized, as below:
Unfortunately, scaling the color values does not reveal the differences in value very well. The first attempt at solving this problem was to highlight bins that had high differentials from their neighbors (I also tried a secondary color).
Finally, I tried using hue rather than value to indicate the differences in controversy score.
The third visualization seems the most useful, as it highlights the few high controversy scores amid a distribution that does not tend to deviate from the mean. The highest-controversy subjects (as determined by our score) were 470 (Latin) and 700 (The Arts). This means that books related to Latin or the arts had the highest variance in checkout time, perhaps meaning some people enjoyed or found a book useful while others checked it back in immediately. Do some people gain and quickly lose enthusiasm for these subjects, while others keep the books close? More investigation would be needed; however, neither of these subjects is very popular. Another interesting data point was the highly popular books in subject 380 (Transportation and Commerce), which also requires further investigation. The next versions of the visualizations may be able to answer these questions in more detail, perhaps through interaction.
The code is given below.
Finally, everything is attached.
Code: Select all
SELECT
floor(deweyClass) as topic,
AVG(popularity.popularity) / 18552 + (STD(UNIX_TIMESTAMP(inraw.cin) - UNIX_TIMESTAMP(inraw.cout))) / 7775136 AS controversy,
AVG(popularity.popularity) as pops
FROM
spl2.inraw,
spl2.popularity
WHERE
inraw.bibNumber = popularity.bib
AND
(UNIX_TIMESTAMP(inraw.cin) - UNIX_TIMESTAMP(inraw.cout)) > 0
AND
(UNIX_TIMESTAMP(inraw.cin) - UNIX_TIMESTAMP(inraw.cout)) < 7776000
AND deweyClass >= 900
AND deweyClass < 1000
AND itemtype = 'acbk'
GROUP BY floor(deweyClass)
ORDER BY topic ASC;
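Spelled out, the query's controversy score is the normalized mean popularity plus the normalized standard deviation of the loan duration (cin − cout, in seconds). A Python sketch with the same constants (18552 and 7775136 are taken straight from the query; the sample numbers are invented):

```python
import statistics

MAX_POPULARITY = 18552   # normalizer for mean popularity, from the query
MAX_STD = 7775136        # normalizer for the loan-duration std (seconds)

def controversy(popularities, loan_seconds):
    """AVG(popularity)/18552 + STD(cin - cout)/7775136, as in the SQL."""
    avg_pop = sum(popularities) / len(popularities)
    std_dur = statistics.pstdev(loan_seconds)  # MySQL STD() is the population std
    return avg_pop / MAX_POPULARITY + std_dur / MAX_STD

# e.g. items kept 1 day vs ~90 days produce a large duration spread
score = controversy([100, 200], [86400, 7776000])
```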
Code: Select all
/*
Controversy of Public Library Items
By James Schaffer
*/
int radius = 100;
int xPosition = 20;
float [][] dataMatrix = null;
float maxValue ;
int numRows, numColumns;
int rectWidth = 14;
int rectHeight = 18;
int drawSpaceWidth, drawSpaceHeight;
int horizontalMargin, verticalMargin;
PFont headerFont = createFont( "Arial", 20, true);
PFont legendFont = createFont( "Calibri", 14, true);
Table myTable;
void setup() {
size( 1870, 950);
background(0);
smooth();
frameRate( 10 );
myTable = loadTable( "matrix.csv" );
numRows = myTable.getRowCount();
numColumns = myTable.getColumnCount();
dataMatrix = new float[numRows][numColumns];
for ( int i = 0; i< numRows; i++) {
for ( int j = 0; j< numColumns; j++) {
dataMatrix[ i ][ j ] = myTable.getFloat( i, j );
print( dataMatrix[ i ][ j ] + " ");
}
println();
}
maxValue = dataMatrix[0][0];
for ( int i = 0; i< numRows; i++) {
for ( int j = 0; j< numColumns; j++) {
maxValue = max( dataMatrix[ i ][ j ], maxValue );
}
}
drawSpaceWidth = rectWidth * numRows;
drawSpaceHeight = rectHeight * numColumns;
horizontalMargin = ( width - drawSpaceWidth ) / 2;
verticalMargin = ( height - drawSpaceHeight ) / 2;
noLoop();
println( numRows + " "+ numColumns + " " + maxValue );
println( " setup done");
}
void draw() {
background(0);
colorMode(RGB);
fill( 200 );
textAlign( LEFT );
textFont( headerFont );
text( "Controversy of Books for Each Subject Over Popularity", 15, 25 );
textFont( headerFont );
textAlign(CENTER, CENTER );
text( "Popularity", horizontalMargin / 2, verticalMargin + drawSpaceHeight / 2 );
text( "Subject", horizontalMargin + drawSpaceWidth / 2, verticalMargin + drawSpaceHeight + verticalMargin / 2 );
textFont( legendFont );
textAlign( RIGHT, CENTER );
for ( int i = 0; i < numColumns; i++ ) {
if ( i == 0 )
text( "<10", horizontalMargin - 10, verticalMargin + rectHeight / 2 + i * rectHeight );
else if ( i == numColumns - 1 )
text( "400+", horizontalMargin - 10, verticalMargin + rectHeight / 2 + i * rectHeight );
else
text( "" + i * 10 + "-" + ( i + 1 ) * 10, horizontalMargin - 10, verticalMargin + rectHeight / 2 + i * rectHeight );
}
textAlign( LEFT, CENTER );
rotate( PI / 2 );
for ( int i = 0; i < numRows; i++ ) {
text( "" + i * 10, verticalMargin + drawSpaceHeight + 5, -1 * ( horizontalMargin + rectHeight / 2 + i * rectWidth ) );
}
rotate( - PI / 2 );
textFont( legendFont );
textAlign( LEFT, CENTER );
text( "less controversial", drawSpaceWidth + horizontalMargin - 400, verticalMargin / 2 - 25 );
textAlign( RIGHT, CENTER );
text( "more controversial", drawSpaceWidth + horizontalMargin, verticalMargin / 2 - 25 );
colorMode(RGB);
noStroke();
for ( int i = 0; i < 100; i += 1 ) {
fill( 0, 55 + 200 * i / 100, 55 + 200 * i / 100 );
rect( drawSpaceWidth + horizontalMargin - 400 + i * 4, verticalMargin / 2 - 15, 4, 30 );
}
translate( horizontalMargin, verticalMargin );
for ( int i = 0; i< numRows; i++) {
for ( int j = 0; j < numColumns; j++) {
if ( dataMatrix[ i ][ j ] != -1 ) {
noStroke();
if ( dataMatrix[ i ][ j ] > 0.27 )
stroke( 255 );
fill( 0, 55 + 200 * dataMatrix[ i ][ j ] / maxValue, 55 + 200 * dataMatrix[ i ][ j ] / maxValue );
rect( i * rectWidth, j * rectHeight, rectWidth, rectHeight );
}
}
}
}
Attachments: controversy.zip
Re: Proj 2: 2D Matrix
Harry Potter Books vs Movie Checkouts
This visualization compares the number of checkouts per month, over time, for the Harry Potter series. Surprisingly, there were checkouts in the 1970s even though the first book was not published until 1997. A Google search shows that 1970 is an important year in the Harry Potter series timeline (for example, the First Wizarding War against Lord Voldemort), which could mean either an abnormality in the database or fans having fun. Furthermore, there are no checkouts in the 1990s; the first checkout occurs in 2004. This was found through initial queries to the database for single books/movies, which produced various files that were condensed into one file with a Python script.
Query Times:
movie 1 = 4.571 / 85.180
movie 2 = 2.041 / 55.817
movie 3 = 0.990 / 55.873
movie 4 = 1.722 / 54.279
movie 5 = 10.379 / 49.689
movie 6 = 25.976 / 32.707
movie 7 = 39.783 / 19.633
movie 8 = 45.162 / 13.898
book 1 = 8.434 / 73.689
book 2 = 6.752 / 70.252
book 3 = 7.034 / 70.564
book 4 = 6.105 / 73.784
book 5 = 3.755 / 76.823
book 6 = 1.472 / 73.619
book 7 = 13.705 / 64.365
The y-axis shows the Harry Potter book or movie, while the x-axis shows the months ordered within each year. The number of checkouts in a month gives each cell its transparency value. Red is driven by the numerical value of the month, green by whether the item is a book or a movie, and blue by the year.
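The channel mapping just described can be sketched as a standalone helper. The scale factors below (including the checkout divisor 494) are taken from the posted Processing sketch; the helper name and the assumption that the book/movie flag is a small integer are mine:

```java
// Mirrors the fill() mapping in the sketch: red from the month (1-12),
// green from the book-vs-movie flag, blue from the year, alpha from
// checkouts (494 is the divisor used in the posted code).
public class CellColor {
    // returns {r, g, b, a}, each intended to land in 0-255
    static float[] channels(int month, int typeFlag, int year, int checkouts) {
        float r = month * 255.0f / 12.0f;
        float g = (typeFlag + 1) * 14.0f / 2.0f;
        float b = year % 255;
        float a = checkouts / 494.0f * 255.0f;
        return new float[]{r, g, b, a};
    }
}
```

Note that `year % 255` makes adjacent years differ by only one blue step, which is why the year contributes a subtle shift rather than a strong color change.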
The SQL query below does something similar to what the Python script did (excluding the 1970s, which cannot be reproduced); however, it does not run all at once because it takes more than 600 seconds.
Query Times: 208.733 ; 186.213 ; 188.482 ; 189.655 ; 185.660 ; 183.450 ; 187.194 ; 188.523 ; 190.531 ; 188.359 ; 188.204 ; 186.845 ; 186.194 ; 185.098 ; 188.341
Combined Query Time: 2831.482
Code: Select all
SELECT
YEAR(cout),
MONTH(cout),
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%sorcerers stone%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN
1
ELSE 0
END) AS Book1,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%sorcerers stone%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie1,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%chamber of secrets%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN
1
ELSE 0
END) AS Book2,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%chamber of secrets%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie2,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%prisoner of azkaban%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN
1
ELSE 0
END) AS Book3,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%prisoner of azkaban%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie3,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%goblet of fire%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN
1
ELSE 0
END) AS Book4,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%goblet of fire%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie4,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%order of the phoenix%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN
1
ELSE 0
END) AS Book5,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%order of the phoenix%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie5,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%half%' AND title LIKE '%prince%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN
1
ELSE 0
END) AS Book6,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%half%' AND title LIKE '%prince%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie6,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%deathly hallows%'
AND title NOT LIKE '%soundtrack%'
AND itemtype LIKE '%bk%')
THEN
1
ELSE 0
END) AS Book7,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%deathly hallows%'
AND title LIKE '%part 1%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie7,
SUM(CASE
WHEN
(title LIKE '%harry potter%'
AND title LIKE '%deathly hallows%'
AND title LIKE '%part 2%'
AND title NOT LIKE '%soundtrack%'
AND (itemtype LIKE '%dvd%'
OR itemtype LIKE '%vhs%'))
THEN
1
ELSE 0
END) AS Movie8
FROM
spl2.inraw
WHERE
DATE(cout) >= '2004-1-1'
GROUP BY YEAR(cout) , MONTH(cout)
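Each repeated SUM(CASE ...) block above is a manual pivot: it tests one title/itemtype pattern and counts matching rows into one Book/Movie column. The same test for the first column, sketched in plain Java (the class and method names are hypothetical; the patterns are the ones used in the query):

```java
// Reimplements the Book1 CASE test from the SQL query as a predicate:
// title must mention "harry potter" and "sorcerers stone", must not be
// a soundtrack, and the item type must be a book ("bk").
public class HPClassifier {
    static boolean isBook1(String title, String itemtype) {
        String t = title.toLowerCase();
        return t.contains("harry potter")
            && t.contains("sorcerers stone")
            && !t.contains("soundtrack")
            && itemtype.toLowerCase().contains("bk");
    }
}
```

Doing this classification once per row in a script (as the Python pre-processing did) avoids re-scanning the table fifteen times, which is why the combined query is so much slower than the per-title queries.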
Result 2
The Processing code for the first and second results is below, with the fun fonts and signature commented out.
Code: Select all
int [][] dataMatrix = null;
int numRows = 15; //7 books and 8 movies
int numColumns = 576; //12 years (1970, 2004-2014) * 12 months * 4 data entries
PFont funFont1, funFont2, funFont3;
PFont font1, font2; //:(
Table myTable;
int cellWidth = 9;
int cellHeight = 18;
int rowMargin = 100;
int colMargin = 100;
void setup()
{
size(1490, 560);
background(255);
smooth();
//createFont() should be called after size(), so the fonts are built here
funFont1 = createFont("Courier", 14, true);
funFont2 = createFont("Chalkboard", 24, true);
funFont3 = createFont("Apple Chancery", 24, true);
font1 = createFont("Arial", 12, true); //:(
font2 = createFont("Helvetica", 18, true); //:(
println("SetUp Done!");
myTable = loadTable("HP_Summary.csv", "header");
println("Rows: " + myTable.getRowCount());
println("Columns: " + myTable.getColumnCount());
println(144 * 15);
dataMatrix = new int[numRows][numColumns];
int k = 0;
int m = 0;
println("---------------------BEGIN--------------------");
for(int i = 0; i < myTable.getRowCount(); i++)
{
for(int j = 1; j < myTable.getColumnCount(); j++)
{
dataMatrix[k][m] = myTable.getInt(i,j);
m++;
//print("(K,M: " + k + "," + m + "), (i,j: " + i + "," + j + "), " + myTable.getInt(i,j) + ", ");
//println(k + ", " + m);
if(m == 576)
{
m = 0;
k++;
}
}
}
}
void draw()
{
background(255);
stroke(0, 0, 0, 100);
strokeWeight(0.2);
//Background with Transparency
fill(255,0,0,10);
rect(0, 0, width, height);
//Background for Cells
fill(255);
rect(rowMargin, colMargin, cellWidth * (numColumns/4), cellHeight * numRows);
//Outline Border for Cells
noFill();
rect(rowMargin, colMargin, cellWidth * (numColumns/4), cellHeight * numRows);
//Title
textAlign(CENTER, CENTER);
//textFont(funFont2);
textFont(font2);
fill(101, 55, 253);
text("Harry Potter Book and Movie Checkouts", width / 2, height * 0.14);
//Data Cells
//colorMode(HSB); //looks ugly =[
for(int i = 0, k = 0; i < numColumns; i+= 4, k++)
{
for(int j = 0; j < numRows; j++)
{
stroke(255, 153, 255, 100);
strokeWeight(0.014);
fill(((float)dataMatrix[j][i + 1] * 255.0) / 12.0, //month
(((float)dataMatrix[j][i] + 1.0) * 14.0) / 2.0, //book or movie
dataMatrix[j][i + 2] % 255, //year
((float)dataMatrix[j][i + 3]/494.0) * 255.0); //checkouts
rect(rowMargin + k * cellWidth, colMargin + j * cellHeight, cellWidth, cellHeight);
}
}
//Line Annual Separators
for(int i = 0, k = 0; i < numColumns; i += 4, k++)
{
stroke(101, 55, 253, 100);
strokeWeight(1);
if(k == 0)
line(rowMargin + (k * cellWidth),
colMargin + numRows * cellHeight,
rowMargin + (k * cellWidth),
colMargin + numRows * cellHeight + 15);
if((k + 1) % 12 == 0)
line(rowMargin + ( (k + 1) * cellWidth),
colMargin + numRows * cellHeight,
rowMargin + ( (k + 1) * cellWidth),
colMargin + numRows * cellHeight + 15);
}
//Row Labels
String[] labels = {"Book 1", "Film 1", "Book 2", "Film 2", "Book 3", "Film 3",
"Book 4", "Film 4", "Book 5", "Film 5", "Book 6", "Film 6",
"Book 7", "Film 7", "Film 8"};
textAlign(CENTER, CENTER);
//textFont(funFont1);
textFont(font1);
fill(101, 55, 253, 200);
//fill(128, 3, 253);
for(int i = 0; i < 1500; i += 100)
text(labels[i/100],
rowMargin * 0.7,
colMargin + cellHeight / 2 + i / 100 * cellHeight);
//Column Labels
textAlign(RIGHT, CENTER);
String[] yearL = {"1970", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011",
"2012", "2013", "2014"};
for(int i = 1; i <= 144; i++)
{
if(i % 12 == 0 )
{
text(yearL[i / 12 - 1],
rowMargin - (cellWidth / 2) + (i * cellWidth) - (cellWidth * 3.5),
height - 1.8 * colMargin);
}
}
//Signature
/*
textAlign(CENTER, CENTER);
textFont(funFont3);
fill(101, 55, 253, 214);
text("by Nataly Moreno", width - (width * 0.15), height - (height * 0.15) );
*/
}
Analysis
From the image, it can be seen that the Harry Potter movies were checked out more frequently after the third movie, and the most frequently checked-out book was the last one, in 2007. The most checked-out movie, however, was the sixth. The colors also show that frequencies are more solid (less transparent) during the initial period after an item appears in the library; the colors fade out as the years move on. Lastly, the 1970s database abnormality remains somewhat of a mystery, and the only boxes filled in for that year are in the first column, January. It is also interesting to note that the sixth book did not have many checkouts in 1970, and the last movie appears to have none.
Revision
Here are new results using HSB, together with the new Processing code. These use the last .csv generated, excluding 2004 and 2014. To reproduce the results above, just change "colorTheme" to a value from 0-5.
Code: Select all
///////////////////////////////////////////////////////////////////////////////////////////
// MAT 259 - Assignment 1 Revision
// by Nataly Moreno
///////////////////////////////////////////////////////////////////////////////////////////
int [][] dataMatrix = null;
int numRows = 15; //7 books and 8 movies
int numColumns = 108; //2005-2013: 9 years * 12 months
//Fun Fonts for me :D (note: createFont() is best called in setup(), after size())
PFont funFont1 = createFont("Courier", 14, true);
PFont funFont2 = createFont("Chalkboard", 24, true);
PFont funFont3 = createFont("Apple Chancery", 24, true);
//Other Fonts
PFont font1 = createFont("Arial", 12, true);
PFont font2 = createFont("Helvetica", 18, true);
Table myTable;
int maxCheckOuts = -1;
int cellWidth = 9;
int cellHeight = 18;
int rowMargin = 100;
int colMargin = 100;
int barPosX = 675;
int barPosY = 450;
int barLength = 400;
int barHeight = 30;
int barColor = -1;
int colorTheme = 1;
int colorMode[] = {HSB, HSB, HSB, HSB, HSB, HSB};
color backgroundColor[] = {color(0),
color(0),
color(0),
color(0),
color(114, 114, 114, 100),
color(0)
};
color cellBackground[] = {color(128),
color(128),
color(128),
color(128),
color(128),
color(128)
};
color cellOutline[] = {color(255, 153, 255, 100), //scheme 0
color(180, 153, 255, 100), //scheme 1
color(180, 150, 255, 100), //scheme 2
color(180, 150, 255, 100),
color(180, 150, 255, 100),
color(180, 150, 255, 100)
};
color dataCells[] = {color(180, 155, 255), //scheme 0
color(180, 150, 255, 100), //scheme 1
color(255, 150, 50), //scheme 2
color(255, 150, 50),
color(170),
color(178, 255, 102)
};
color lineSeparator[] = {color(30, 144, 255, 100), //scheme 0
color(100, 255, 100, 100), //scheme 1
color(255, 150, 50, 100), //scheme 2
color(255, 150, 50, 100),
color(255, 150, 50, 100),
color(255, 235, 0, 100)
};
color colorBarOutline[] = {color(30, 144, 255,200), //scheme 0
color(180, 155, 255, 100), //scheme 1
color(255, 150, 50, 100), //scheme 2
color(255, 150, 50, 100),
color(255, 150, 50, 100),
color(255, 150, 50, 100)
};
color colorLabel[] = {color(30, 144, 255), //scheme 0
color(100, 255, 100, 100), //scheme 1
color(255, 150, 50), //scheme 2
color(255, 150, 50),
color(0),
color(255, 220, 0)
};
color outlineBorder[] = {color(30, 144, 255, 130), //scheme 0
color(200, 255, 100, 100), //scheme 1
color(255, 150, 50, 130), //scheme 2
color(255, 150, 50, 130),
color(0),
color(255, 255, 0, 130)
};
int barDiv[] = {100, 100, 100, 100, 100, 100};
///////////////////////////////////////////////////////////////////////////////////////////
// Helper Functions
///////////////////////////////////////////////////////////////////////////////////////////
void printDataMatrix()
{
for(int i = 0; i < numRows; i++)
{
for(int j = 0; j < numColumns; j++)
{
print(dataMatrix[i][j] + " ");
}
println("\n");
}
}
void cellColor(int i, int j)
{
switch(colorTheme)
{
case 0:
fill(red(dataCells[colorTheme]), green(dataCells[colorTheme]), dataMatrix[j][i]);
break;
case 1:
if(dataMatrix[j][i] != 0)
{
fill( red(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts,
green(dataCells[colorTheme]),
blue(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts);
}
else
{
fill(0, 0, 0);
}
break;
case 2:
if(dataMatrix[j][i] != 0)
{
fill( hue(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts,
saturation(dataCells[colorTheme]),
brightness(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts);
}
else
{
fill(0, 0, 0);
}
break;
case 3:
if(dataMatrix[j][i] != 0)
{
fill( red(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts,
green(dataCells[colorTheme]),
blue(dataCells[colorTheme]) * 4);
}
else
fill(0);
break;
case 4:
if(dataMatrix[j][i] != 0)
{
fill(red(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts);
}
else
fill(0);
break;
case 5:
if((j < 13) && (j % 2 == 0))
{
if(dataMatrix[j][i] != 0)
{
fill( hue(dataCells[colorTheme]) - 30,
saturation(dataCells[colorTheme]) + 70,
brightness(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts);
}
else
fill(0);
}
else
{
if(dataMatrix[j][i] != 0)
{
fill( hue(dataCells[colorTheme]),
saturation(dataCells[colorTheme]) + 30,
brightness(dataCells[colorTheme]) * dataMatrix[j][i] / maxCheckOuts);
}
else
fill(0);
}
break;
default:
break;
}
}
void colorBarFill(int i)
{
switch(colorTheme)
{
case 0:
fill( red(dataCells[colorTheme]), green(dataCells[colorTheme]),
blue(dataCells[colorTheme]) * i / barDiv[colorTheme]);
break;
case 1:
fill( red(dataCells[colorTheme]) * i / barDiv[colorTheme],
green(dataCells[colorTheme]),
blue(dataCells[colorTheme]) * i / barDiv[colorTheme]);
break;
case 2:
fill( hue(dataCells[colorTheme]) * i / barDiv[colorTheme],
saturation(dataCells[colorTheme]),
brightness(dataCells[colorTheme]) * i / barDiv[colorTheme]);
break;
case 3:
fill( red(dataCells[colorTheme]) * i / barDiv[colorTheme],
green(dataCells[colorTheme]),
blue(dataCells[colorTheme]) * 4);
break;
case 4:
fill( red(dataCells[colorTheme]) * i / barDiv[colorTheme]);
break;
case 5:
if(barColor == 0) //it's a book
{
fill( hue(dataCells[colorTheme]) - 30,
saturation(dataCells[colorTheme]) + 70,
brightness(dataCells[colorTheme]) * i / barDiv[colorTheme]);
}
else // it's a film
{
fill( hue(dataCells[colorTheme]),
saturation(dataCells[colorTheme]) + 30,
brightness(dataCells[colorTheme]) * i / barDiv[colorTheme]);
}
break;
default:
break;
}
}
///////////////////////////////////////////////////////////////////////////////////////////
// Program's Basic Functions
///////////////////////////////////////////////////////////////////////////////////////////
void setup()
{
size(1190, 560);
background(255);
smooth();
println("SetUp Done!");
myTable = loadTable("XcelCombined.csv", "header");
println("Rows: " + myTable.getRowCount());
println("Columns: " + myTable.getColumnCount());
dataMatrix = new int[numRows][numColumns];
int k = 0; //row
int m = 0; //column
for(int i = 0; i < myTable.getRowCount(); i++)
{
for(int j = 2; j < myTable.getColumnCount(); j++)
{
maxCheckOuts = max(myTable.getInt(i,j), maxCheckOuts);
dataMatrix[k][m] = myTable.getInt(i,j);
k++;
if(k == 15)
{
k = 0;
m++;
}
}
}
}
void draw()
{
colorMode(RGB);
background(255);
//Background Over-Color
fill(backgroundColor[colorTheme]);
rect(0, 0, width, height);
//Background for Cells
fill(cellBackground[colorTheme]);
rect(rowMargin, colMargin, cellWidth * numColumns, cellHeight * numRows);
//Data Cells
for(int i = 0, k = 0; i < numColumns; i++, k++)
{
for(int j = 0; j < numRows; j++)
{
colorMode(RGB);
stroke(cellOutline[colorTheme]);
strokeWeight(0.014);
colorMode(colorMode[colorTheme]);
cellColor(i,j);
colorMode(RGB);
rect(rowMargin + k * cellWidth, colMargin + j * cellHeight, cellWidth, cellHeight);
}
}
//Line Annual Separators
stroke(lineSeparator[colorTheme]);
fill(colorLabel[colorTheme]);
for(int i = 0, k = 0; i < numColumns; i++, k++)
{
strokeWeight(1);
if(k == 0)
line(rowMargin + (k * cellWidth),
colMargin + numRows * cellHeight,
rowMargin + (k * cellWidth),
colMargin + numRows * cellHeight + 15);
if((k + 1) % 12 == 0)
line(rowMargin + ( (k + 1) * cellWidth),
colMargin + numRows * cellHeight,
rowMargin + ( (k + 1) * cellWidth),
colMargin + numRows * cellHeight + 15);
}
//Row Labels
String[] labels = {"Book 1", "Film 1", "Book 2", "Film 2", "Book 3", "Film 3",
"Book 4", "Film 4", "Book 5", "Film 5", "Book 6", "Film 6",
"Book 7", "Film 7", "Film 8"};
textAlign(CENTER, CENTER);
textFont(font1);
for(int i = 0; i < 1500; i += 100)
text(labels[i/100],
rowMargin * 0.7,
colMargin + cellHeight / 2 + i / 100 * cellHeight);
//Column Labels
textAlign(RIGHT, CENTER);
String[] yearL = {"2005", "2006", "2007", "2008", "2009", "2010", "2011",
"2012", "2013"};
for(int i = 1; i <= numColumns; i++)
{
if(i % 12 == 0 ) //12 months
{
text(yearL[i / 12 - 1],
rowMargin - (cellWidth / 2) + (i * cellWidth) - (cellWidth * 3.5),
height - 1.8 * colMargin);
}
}
//Title
textAlign(CENTER, CENTER);
textFont(font2);
text("Harry Potter Book and Movie Checkouts", width / 2, height * 0.14);
//Outline Border for Cells
stroke(outlineBorder[colorTheme]);
strokeWeight(0.5);
noFill();
rect(rowMargin, colMargin, cellWidth * numColumns, cellHeight * numRows);
//Color Bar
if(colorTheme < 5)
{
noStroke();
for(int i = 0; i < barDiv[colorTheme]; i+=1)
{
colorMode(colorMode[colorTheme]);
colorBarFill(i);
rect(barPosX + (i * (barLength / barDiv[colorTheme])), barPosY,
barLength / barDiv[colorTheme], barHeight);
}
}
else //Fill the two color bars
{
noStroke();
for(int i = 0; i < barDiv[colorTheme]; i+=1)
{
colorMode(colorMode[colorTheme]);
barColor = 0;
colorBarFill(i);
rect(barPosX + (i * (barLength / barDiv[colorTheme])), barPosY,
barLength / barDiv[colorTheme], barHeight / 2);
barColor = 1;
colorBarFill(i);
rect(barPosX + (i * (barLength / barDiv[colorTheme])), barPosY + 30,
barLength / barDiv[colorTheme], barHeight / 2);
}
}
//Color Bar Outline
if(colorTheme < 5)
{
noFill();
colorMode(RGB);
stroke(colorBarOutline[colorTheme]);
strokeWeight(0.4);
rect(barPosX, barPosY, barLength, barHeight);
}
else //This is for the double bar
{
noFill();
colorMode(RGB);
stroke(colorBarOutline[colorTheme]);
strokeWeight(0.4);
rect(barPosX, barPosY, barLength, barHeight / 2);
rect(barPosX, barPosY + 30, barLength, barHeight / 2);
}
//Color Bar Label
if(colorTheme < 5)
{
fill(colorLabel[colorTheme]);
textAlign(CENTER, CENTER);
textFont(font1);
text("Least checkouts", barPosX, barPosY + 45);
text("Most checkouts", barPosX + barLength, barPosY + 45);
}
else //This is for the double bar
{
fill(colorLabel[colorTheme]);
textAlign(CENTER, CENTER);
textFont(font1);
text("Least checkouts", barPosX + 45, barPosY + 65);
text("Most checkouts", barPosX + barLength - 40, barPosY + 65);
text("Books", barPosX - 25, barPosY + 5);
text("Films", barPosX - 25, barPosY + 35);
}
}
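All of the HSB themes above share one core mapping: a nonzero cell keeps the theme's hue and saturation while brightness is scaled by checkouts relative to maxCheckOuts, and empty cells are drawn black. A simplified standalone sketch of that scaling (plain Java; the helper name is made up):

```java
// Brightness scaling shared by the HSB color themes: a cell's brightness
// is the theme brightness scaled by checkouts / maxCheckouts, and cells
// with zero checkouts are drawn black (brightness 0).
public class HsbScale {
    static float brightnessFor(int checkouts, int maxCheckouts, float themeBrightness) {
        if (checkouts == 0) return 0;  // empty cells stay black
        return themeBrightness * checkouts / (float) maxCheckouts;
    }
}
```

Because only brightness varies, the busiest month in the whole matrix reaches the full theme brightness and everything else is a proportional dimming of the same color.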
Attachments: XcelCombined.csv (results from the second query); HP_Summary.csv (combined by the Python script)
Last edited by nataly_moreno on Sun Feb 01, 2015 2:22 am, edited 3 times in total.