Proj 2 - 2D Visualization

glegrady
Posts: 138
Joined: Wed Sep 22, 2010 12:26 pm

Proj 2 - 2D Visualization

Post by glegrady » Thu Jan 03, 2019 1:26 pm

Schedule:
Jan 22: Assignment Description / Introduction to Processing
Jan 24: Visual Language Overview / class lab work / Student research on 2D visualization examples
Jan 29: Work-in-Progress presentation
Jan 31: Final project presentation / Documentation to be complete
---------------------------------------
Description
Create a 2D visualization in Processing using the results of a MySQL query stored in a CSV file. Each cell's vertical (Y) position, horizontal (X) position, and color value are to be determined by three CSV columns.

SIZE/DATA: The length and height of the matrix each represent a data value. The third value is to be represented by coloring the cells, using color in RGB, saturation or brightness in HSB, or any other means by which each cell has an x,y location and is differentiated from the other cells through color, scale, or another channel. The screen size should be large enough to explore detail (HD, 1920 x 1080 pixels). Your visualization should be centered on the screen.

COLOR CODING: 99% of science visualizations use the green-yellow-red color system. Experiment with, or invent, a different color scheme. Try http://tristen.ca/hcl-picker/#/hlc/6/1/21313E/EFEE69

LABELS and TEXTS: Use standard sans-serif fonts from the Swiss Graphic Design tradition for your texts and labels, and limit yourself to a few font sizes. Suitable fonts include Arial, Helvetica, Futura, Univers, and related faces.
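As a minimal illustration of the mapping the description asks for, here is the core logic in plain Java (not a full Processing sketch): one CSV column drives X, one drives Y, and a third is interpolated between the two endpoint colors of the HCL picker linked above. The cell sizes and the sample record are made up.

```java
public class CellMapping {
    // Linear interpolation, equivalent to Processing's map()
    static float map(float v, float inLo, float inHi, float outLo, float outHi) {
        return outLo + (outHi - outLo) * (v - inLo) / (inHi - inLo);
    }

    // Per-channel RGB interpolation between two endpoint colors, t in [0, 1]
    static int[] lerpColor(int[] a, int[] b, float t) {
        int[] c = new int[3];
        for (int i = 0; i < 3; i++) c[i] = Math.round(a[i] + (b[i] - a[i]) * t);
        return c;
    }

    public static void main(String[] args) {
        String[] cell = "3,5,42".split(",");      // one CSV record: col,row,value
        float cellW = 40, cellH = 40, maxValue = 56;

        float x = Integer.parseInt(cell[0]) * cellW;                 // X position
        float y = Integer.parseInt(cell[1]) * cellH;                 // Y position
        float t = map(Integer.parseInt(cell[2]), 0, maxValue, 0, 1);
        int[] dark  = {0x21, 0x31, 0x3E};         // #21313E from the HCL picker
        int[] light = {0xEF, 0xEE, 0x69};         // #EFEE69 from the HCL picker
        int[] c = lerpColor(dark, light, t);      // cell color from the 3rd column

        // In Processing this would become: fill(c[0], c[1], c[2]); rect(x, y, cellW, cellH);
        System.out.println(x + "," + y + " rgb(" + c[0] + "," + c[1] + "," + c[2] + ")");
    }
}
```

In a real sketch you would read the records with loadTable() and draw one rect() per cell instead of printing.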
---------------------------------------
We are demonstrating a 2D matrix demo in class to:
1) introduce how to program in Processing;
2) show how to organize data in 2D;
3) show how to label and use color in a 2D visualization.
---------------------------------------
A few examples to explore:
Christina Last - 2D Visualization of Travel Books: http://vislab.mat.ucsb.edu/2018/p1/Chri ... index.html
Hannah Wolfe - Lost & Borrowed Books: http://vislab.mat.ucsb.edu/2017/p1/Hann ... index.html
Anastasiya Lazareva - Borrowed Co-Occurrence: http://vislab.mat.ucsb.edu/2015/anastas ... index.html (Processing 2.21)
Bo Yan - Average / Standard Deviation: http://vislab.mat.ucsb.edu/2015/boyan/p1/index.html
Robert Miller - 2D Spatial Treemap: http://vislab.mat.ucsb.edu/2014/p1/Robert/index.html
Mike Godwin - Kohonen Self-Organizing 2D Map: https://www.mat.ucsb.edu/g.legrady/acad ... index.html
---------------------------------------
Other Visualization and Data Science References:
Course Resources: https://www.mat.ucsb.edu/~g.legrady/aca ... ences.html
Chord Diagram: https://charticulator.com/gallery/globa ... urces.html
Nodes: http://www.visualcomplexity.com/vc/
--
Principal Component Analysis, Isomap, Locally Linear Embedding: https://www.youtube.com/watch?v=RJVL80G ... 8PLl8nGXmw
t-SNE: https://lvdmaaten.github.io/tsne/#implementations
https://www.displayr.com/using-t-sne-to ... rediction/
---------------------------------------
Your REPLY POST should include the following:
. Concept description
. MySQL Queries
. Processing time
. Sketches and work-in-progress screenshots of your project with descriptions
. Final results & Analysis
---------------------------------------
Please zip your Processing code, including the data folder, so it is "ready to run". Add the zip file as an attachment.
George Legrady
legrady@mat.ucsb.edu

carlylarsson
Posts: 2
Joined: Thu Jan 10, 2019 11:02 am

Re: Proj 2 - 2D Visualization

Post by carlylarsson » Wed Jan 30, 2019 9:26 pm

I decided to use the data I collected from the first assignment to create word clouds.

sarahwells
Posts: 4
Joined: Thu Jan 10, 2019 10:52 am

Re: Proj 2 - 2D Visualization

Post by sarahwells » Wed Jan 30, 2019 9:39 pm

For this project I explored the popularity of a favorite childhood book series: A Series of Unfortunate Events by Lemony Snicket.
The new Netflix series was released in three parts from 2017 to 2019, so I was curious about the relationship between these releases and checkouts of the book series. I was also interested in the proportional number of checkouts of each book in the 13-book series. I queried the database for total checkouts by month and book from 2016 to January 2019 with the following MySQL query, which took 0.159 sec to run:

Code:

SELECT 
    title,
    DATE_FORMAT(cout, '%Y-%m') AS yearmonth,
    COUNT(*) AS count
FROM
    spl_2016.outraw
WHERE
    title IN ('austere academy' , 'bad beginning',
        'carnivorous carnival',
        'end',
        'ersatz elevator',
        'grim grotto',
        'hostile hospital',
        'miserable mill',
        'penultimate peril',
        'reptile room',
        'slippery slope',
        'vile village',
        'wide window')
        AND callnumber IN ('J SNICKET' , 'DVD J LEMONY')
        AND YEAR(cout) > 2015
GROUP BY title , DATE_FORMAT(cout, '%Y-%m')
ORDER BY DATE_FORMAT(cout, '%Y-%m');
I created a pivot table in Google Sheets to get the desired 2D data format: "TitleYearMonth Pivot Table 1.csv"

The first display I made shows titles vs. months and represents popularity with shading, using a different color for each title. I experimented with a number of color schemes but ended up using a rainbow ("darkerrainbow.jpg") to indicate the series' progression pleasantly. [You can get back to this view by pressing 'o']
shot1.png
From here you can highlight a title and click to see the behavior of that specific title over the months. Here we see The Vile Village, with the bar length normalized to the largest month count for any title and the shading indicating the popularity proportion for that specific title. You can use the arrow keys to navigate the books in order from this view. [You can get back to this view by pressing 'a'-'m', representing books 1-13 respectively]
shot2.png
I was also interested in the total trends and how much each title contributed to them, so I made a new table using the monthly total count and representing the portion provided by each title in its color. Here the leaps on/after the release dates (January 13, 2017; March 30, 2018; and January 1, 2019) are clearer. [You can get back to this view by pressing 't']
shot3.png
From here we can highlight a month, such as the most popular month, March 2017. Clicking it shows the data for that month, using the book cover images from the SPL website as the marks, with size and height indicating popularity.
shot4.png
Analysis: Across these views, we find an increase in checkouts throughout the series in accordance with the Netflix release dates. Generally, the first book, "The Bad Beginning", shows the most checkouts, and the numbers drop as we continue along the series. We also find that the rate of this decline each month is fairly consistent with the overall picture, indicating no strong correlation between which title was being adapted and which title was being read. I posit this may be because readers who get past the first book read through the series rather than waiting to synchronize with the Netflix episodes.

Here is my Processing code for the final project (please excuse the mess of commented-out ideas left over from many competing edits):

Code:

/********************************************************
 * Data Visualization - A Series of Unfortunate Events  *
 * Tested in Processing 3                               *     
 *                                                      *
 * Author: Sarah Wells                                  *
 * Supervisor: George Legrady                           *
 *                                                      *
 * Purpose: Data visualization by title and month for   *
 *          A Series of Unfortunate Events              *
 *                                                      *
 *                                                      *
 * Usage:   No key: Overview by Title per month ('o')   *
 *          'a' -1, 'b' -2, 'c' -3, 'd' -4, 'e' -5,     *
 *          'f' -6, 'g' -7, 'h'-8, 'i' -9, 'j' -10,     *
 *          'k' -11, 'l' -12, 'm' - 13 (Book Numbers)   *
 *          't' Month cumulative Titles                 *
 *          arrows to move through Books or Months      *
 ********************************************************/

// Layout variables 
float constantHorizontalMargin = 160;
float constantVerticalMargin = 80;
float placeholder = 0;
int booknum = 14, monthnum = -1;

// width and height for each cell
float cellWidth, cellHeight, standardCellWidth, otherCellHeight, otherCellWidth, gridWidth, gridHeight, bookMaxHeight;

// 2D matrix table related variables
Table table;

int numRows, numColumns;
float[][] dataMatrix;
float maxValue=0, minValue=5000, maxMonth=0;
float[] maxMonthBook;

String[] titles;
String[] dateym;

PImage colorBar;
String[] colorPath = {"darkerrainbow.jpg"};


PImage[] books;

/*
int[][] positions = 
{{0,0},
{5,1},
{8,3},
{10,7},
{8,12},
{4,14},
{5,19},
{10,18},
{15,16},
{19,13},
{24,14},
{27,17},
{29,21}
};

*/




//diagonal 
int[][] positions = 
{{0,0},
{3,2},
{6,4},
{9,6},
{12,8},
{15,10},
{18,12},
{21,14},
{24,16},
{27,18},
{30,20},
{33,22},
{36,24}
};


/*
bookcolor =     R: 219 G: 187 B: 124
1 bad beginning =   R: 43 G: 55 B: 78
2 reptile room = R: 57 G: 12 B: 39
3 wide window =   R: 50 G: 62 B: 64
4 miserable mill =   R: 113 G: 97 B: 136
5 austere =   R: 84 G: 84 B: 86
6 ersatz = R: 212 G: 97 B: 104
7 vile = R: 91 G: 102 B: 88
8 hostile =   R: 201 G: 124 B: 96
9 carnivorous =   R: 60 G: 143 B: 185
10 slippery = R: 119 G: 99 B: 48
11 grim = R: 166 G: 211 B: 186
12 penultimate = R: 233 G: 69 B: 41
13 end = R: 167 G: 145 B: 108
*/

int[][] bookcolors = {{249,220,180},
{43,55,78},
{57,12,39},
{50,62,64},
{113,97,136},
{84,84,86},
{212, 97, 104},
{91, 102, 88},
{201, 124, 96},
{60, 143, 185},
{119, 99, 48},
{166, 211, 186},
{233, 69, 41},
{167, 145, 108}
};

boolean colorScheme = false;

PFont fontBold, fontReg;
String[] fontlist = PFont.list();



void setup(){
  size(1280, 800); // setup the size of the window
  // In Processing 3, the size() function must be the first line of code inside setup()
  // and needs to use numbers instead of variables. 

  // Make the screen resizable.                     
  //surface.setResizable(true);
  
  
  //check available fonts
  //println(fontlist);
  fontBold = createFont("TimesNewRomanPS-BoldMT.ttf", 10);
  fontReg = createFont("TimesNewRomanPSMT.ttf", 10);
  //set reg font
  textFont(fontReg, 10);

  
  //Put data into Processing table from csv
  table = loadTable("TitleYearMonth Pivot Table 1.csv");
  
  // Get row and column information
  numRows = table.getRowCount(); 
  numColumns = table.getColumnCount();
  println("Rows: " + numRows + " Columns: " + numColumns);
  
  //Rows - 17
  //Cols -38
  
  // Create a 2D array to receive data from the table 
  dataMatrix = new float[numRows-3][numColumns-2]; // doesn't need the first two columns, first three rows
  titles = new String[numRows-4];
  dateym = new String[numColumns-3];
  maxMonthBook = new float[numRows-3];
  books = new PImage[13];
  
  for(int i=0; i<numRows-3; i++){
    for(int j=0; j<numColumns-2; j++){
      dataMatrix[i][j] = table.getFloat(i+3,j+2);
     
      //Find the max/min value of the whole data except totals in last rows/columns
      if(dataMatrix[i][j] > maxValue && i<numRows-4 && j<numColumns-3){
        maxValue = dataMatrix[i][j];
      }
      if(dataMatrix[i][j] < minValue && i<numRows-4 && j<numColumns-3){
        minValue = dataMatrix[i][j];
      }
      if(dataMatrix[i][j] > maxMonth && j<numColumns-3){
        maxMonth = dataMatrix[i][j];
      }
      if(dataMatrix[i][j] > maxMonthBook[i] && i<numRows-4 && j<numColumns-3){
        maxMonthBook[i] = dataMatrix[i][j];
      }
    }
    //check max each book per month worked
    if(i<numRows-4){
    println("Max Month #" + (i+1) + ": " + maxMonthBook[i]);
    }
  }
  
  println("Max Value is " + maxValue);
  println("Min Value is " + minValue);
  //max = 56
  //min = 0
  
  //check total overall count
  println("Total overall checkouts: " + dataMatrix[numRows-4][numColumns-3]);
  //5683, correct
  
  //check max count per Month total overall count
    
  println("Total checkout in Max Month: " + maxMonth);
  //297, correct
  
    //modifying the cell size according to window size WAY ONE
  otherCellWidth = (width - constantHorizontalMargin*2)/(numColumns-3);
  standardCellWidth = (width - constantHorizontalMargin*2)/(numRows-4);
  otherCellHeight = (height - constantVerticalMargin*2)/(numRows-4);
  
  //grid 32*24
  gridWidth = (width - constantHorizontalMargin*2)/32;
  gridHeight = (height - constantVerticalMargin*2)/24;
  bookMaxHeight = (height - constantVerticalMargin*2)/10;
  
  //load book images
  for(int i=0; i<13; i++){
    if(i<9){
    books[i]= loadImage("0" + (i+1) + ".jpeg");
    }
    else{
      books[i]= loadImage((i+1) + ".jpeg");
    }
  }
  
  //modifying the cell size according to window size WAY TWO
  //cellWidth = (width - constantHorizontalMargin*2)/(numRows-3);
  cellWidth = (width - constantHorizontalMargin*2)/maxMonth;
  cellHeight = (height - constantVerticalMargin*3)/(numColumns-4);
  
    
    // put title and date in year month format data from the table to the array, print to check
    for(int i=0; i<numRows-4; i++){
    titles[i] = table.getString(i+3, 0);
      println("Title #" + (i+1) + " " + titles[i]);
    }
    for(int i=0; i<numColumns-3; i++){
    dateym[i] = table.getString(1, i+2);
    println("Date #" + (i+1) + " " + dateym[i]);
    }
  
  
}

void draw(){
  background(255);
  if(monthnum != -1){
    
  //bookcolored background
  //background(bookcolors[0][0],bookcolors[0][1], bookcolors[0][2]);
  background(0);
  }
  //backdrop of image white will have to modify for different views
  //fill(255);
  //rect(constantHorizontalMargin+60,constantVerticalMargin, 1280-2*constantHorizontalMargin, 736-2*constantVerticalMargin);
  
  translate(60, 0);
  colorBar = loadImage(colorPath[0]);
  
  /*
    if(monthnum != -1){
    //draw lines
  for(int i=0;i<12; i++){
    fill(0);
    stroke(20);
  line(constantHorizontalMargin + gridWidth*positions[i][0], constantHorizontalMargin + gridHeight*positions[i][1],
  constantHorizontalMargin + gridWidth*positions[i+1][0], constantHorizontalMargin + gridHeight*positions[i+1][1]);
  }
    }
     */
     // FIRST WAY
   // shading table
  //Draw 2D Matrix
  
  if(booknum==14&&monthnum==-1){
  for(int i=0; i<dataMatrix.length-1; i++){
    for(int j=0; j<dataMatrix[0].length-1; j++){
      noStroke();
      color c = colorBar.get((int)((colorBar.width-2)*(i+1)/13), colorBar.height/2);
      fill(c, (int)255*dataMatrix[i][j]/maxValue);
      //color like the book:
      //fill(bookcolors[i+1][0],bookcolors[i+1][1],bookcolors[i+1][2], 50+(int)205*dataMatrix[i][j]/maxValue);
      rect(constantHorizontalMargin + otherCellWidth*j, constantVerticalMargin 
           + otherCellHeight*i, otherCellWidth, otherCellHeight);
    }
  }
  
    //Draw Title
  
  textAlign(CENTER, CENTER);
  textSize(16);
  fill(0);
  text("A Series of Unfortunate Events: Popularity By Month", width/2, constantVerticalMargin/2);

//side labels
  for(int i=0; i<titles.length; i++){
    //date year month format
    textAlign(RIGHT, CENTER);
          color c = colorBar.get((int)((colorBar.width-2)*(i+1)/13), colorBar.height/2);

    fill(c);
    textSize(10);
          //color like the book:
      //fill(bookcolors[i+1][0],bookcolors[i+1][1],bookcolors[i+1][2]);
      if(mouseX > constantHorizontalMargin-30 && mouseX < constantHorizontalMargin
      && mouseY > (constantVerticalMargin +otherCellHeight*i) && mouseY <(constantVerticalMargin +otherCellHeight*(1+i))){
        textSize(16);
      }
      
    
    text(titles[i], constantHorizontalMargin-3, constantVerticalMargin +otherCellHeight*(0.5+i));
    
    
     //show total number on the right side
 fill(0);
    textAlign(LEFT, CENTER);
    text((int)dataMatrix[i][numColumns-3], width-constantHorizontalMargin+3,constantVerticalMargin +otherCellHeight*(0.5+i));
  }
  
  //bottom slanted labels
  
  
  for(int i=0; i<numColumns-3; i++){ 
    
 fill(0);
    
    textAlign(CENTER, TOP);
    textSize(12);
    translate(constantHorizontalMargin + otherCellWidth*i - 2, constantVerticalMargin + otherCellHeight*(numRows-3)-10);
    rotate(-1);
    text(dateym[i], 0, 0);
    rotate(1);
    translate(-(constantHorizontalMargin + otherCellWidth*i - 2), -(constantVerticalMargin + otherCellHeight*(numRows-3)-10));
  
}
  
  }
  else if(monthnum==-1&&booknum!=14){
  
  // SECOND WAY
  //cell based on book
  
  //Draw 2D Matrix
  for(int j=0; j<dataMatrix[0].length-1; j++){
    placeholder=0;
     for(int i=0; i<dataMatrix.length-1; i++){
     if(booknum!=0){
        i+=(booknum-1);
        cellWidth = (width - constantHorizontalMargin*2)/maxValue;
        //fill(5*(i-7)^2,255-5*(i-7)^2,(int) pow(1.4,i), (int)255*dataMatrix[i][j]/maxMonthBook[i]);
      }
      else{
         cellWidth = (width - constantHorizontalMargin*2)/maxMonth;
                  //fill((5*pow((i-7),2)),255 - (5*pow((i-7),2)), (int)pow(2,i) , (int)255*dataMatrix[i][j]/maxValue);

         //fill(5*(i-7)^2,255-5*(i-7)^2,(int) pow(1.4,i), (int)255*dataMatrix[i][j]/maxValue);
      }
      noStroke();
      strokeWeight(1);
      
      
        color c = colorBar.get((int)((colorBar.width-2)*(i+1)/13), colorBar.height/2);
  
        if(booknum!=0){
        fill(c, (int)255*dataMatrix[i][j]/maxMonthBook[i]);
        //color like the book
        //fill(bookcolors[i+1][0],bookcolors[i+1][1],bookcolors[i+1][2], 50+(int)205*dataMatrix[i][j]/maxMonthBook[i]);
        
        }
        else{
        fill(c);
        if(mouseX>(constantHorizontalMargin)
        && mouseX < (1280-constantHorizontalMargin)
        && mouseY > (constantVerticalMargin + cellHeight*j)
        && mouseY < (constantVerticalMargin + cellHeight*(j+1))
        ){
          fill(c, 100);
        }
        //color like the book
        //fill(bookcolors[i+1][0],bookcolors[i+1][1],bookcolors[i+1][2]);
        }
        
        //sets to book per key pressed and rescales

      
      /*
      if(!colorScheme){
        float greyScale = map(dataMatrix[i][j], minValue, maxValue, 255, 0);
        fill(greyScale);
      }
      else{
        color c = colorBar.get((int)map(dataMatrix[i][j], minValue, maxValue, 2, colorBar.width-2), colorBar.height/2);
        fill(c);
      }
      
      */
      rect(constantHorizontalMargin + placeholder, constantVerticalMargin 
           + cellHeight*j, cellWidth*dataMatrix[i][j], cellHeight);
      placeholder += cellWidth*dataMatrix[i][j];
              
        //ends loop, only the one book
      if(booknum!=0){
        i=dataMatrix.length;
      }
    }
  }
  
  

  
    
//bottom slanted labels
  
if(booknum==0){
  
  for(int i=0; i<numRows-4; i++){ 
    
    color c = colorBar.get((int)((colorBar.width-2)*(i+1)/13), colorBar.height/2);
    fill(c);
            //color like the book
        //fill(bookcolors[i+1][0],bookcolors[i+1][1],bookcolors[i+1][2]);
    
    textAlign(CENTER, TOP);
    textSize(12);
    translate(constantHorizontalMargin + standardCellWidth*i - 2, constantVerticalMargin +cellHeight*(numColumns-3) + 60);
    rotate(-1);
    text(titles[i], 0, 0);
    rotate(1);
    translate(-(constantHorizontalMargin + standardCellWidth*i - 2), -(constantVerticalMargin +cellHeight*(numColumns-3) + 60));
  }
  
}
   
  //Draw vertical labels

      
  for(int j=0; j<dateym.length; j++){
    //date year month format
    textAlign(RIGHT, CENTER);
    fill(0);
    textSize(10);
    
            if(mouseX>(constantHorizontalMargin)
        && mouseX < (1280-constantHorizontalMargin)
        && mouseY > (constantVerticalMargin + cellHeight*j)
        && mouseY < (constantVerticalMargin + cellHeight*(j+1))
        ){
          textSize(16);
        }
        
    text(dateym[j], constantHorizontalMargin-3, constantVerticalMargin +cellHeight*(0.5+j));
    
    
     //show total number on the right side
         if(booknum ==0){
    textAlign(LEFT, CENTER);
    text((int)dataMatrix[numRows-4][j], width-constantHorizontalMargin+3,constantVerticalMargin +cellHeight*(0.5+j));
         }
         else{
           textAlign(LEFT, CENTER);
           //textFont('Arial Bold');
    text((int)dataMatrix[booknum-1][j], width-constantHorizontalMargin+3,constantVerticalMargin +cellHeight*(0.5+j));
         }
  }

  
 
  //Draw Title
  
  textAlign(CENTER, CENTER);
  textSize(16);
  fill(0);
  if(booknum ==0){
  text("A Series of Unfortunate Events: Popularity By Month", width/2, constantVerticalMargin/2);
  }
  else{
    
   color c = colorBar.get((int)((colorBar.width-2)*(booknum)/13), colorBar.height/2);
    fill(c);
            //color like the book
        //fill(bookcolors[booknum][0],bookcolors[booknum][1],bookcolors[booknum][2]);

    text(titles[booknum-1], width/2, constantVerticalMargin/2);
  }
  
  
  
  }
  else{
    //do stuff for monthnum
    placeholder=0;
    
    //bookMaxHeight
    
   
    
     for(int i=0; i<dataMatrix.length-1; i++){
       
                cellWidth = (width - constantHorizontalMargin*2)/maxMonth;
                
                  //fill((5*pow((i-7),2)),255 - (5*pow((i-7),2)), (int)pow(2,i) , (int)255*dataMatrix[i][j]/maxValue);

         //fill(5*(i-7)^2,255-5*(i-7)^2,(int) pow(1.4,i), (int)255*dataMatrix[i][j]/maxValue);
      noStroke();
      /*position equations
       image(books[i], constantHorizontalMargin + placeholder, constantVerticalMargin 
           , (int) bookMaxHeight*dataMatrix[i][monthnum]/(1.5*dataMatrix[numRows-4][monthnum]/13), 
           (int) bookMaxHeight*dataMatrix[i][monthnum]/(dataMatrix[numRows-4][monthnum]/13));
           
            textAlign(CENTER, TOP);
    textSize(12);
    translate(constantHorizontalMargin + placeholder, constantVerticalMargin +cellHeight*(numColumns-3) -280);
    rotate(-1);
    text(titles[i], 0, 0);
    rotate(1);
    translate(-(constantHorizontalMargin + placeholder), -(constantVerticalMargin +cellHeight*(numColumns-3) -280));
    
      placeholder += bookMaxHeight*dataMatrix[i][monthnum]/(1.5*dataMatrix[numRows-4][monthnum]/13)+5;

*/

/*

image(books[i], constantHorizontalMargin + gridWidth*positions[i][0], constantVerticalMargin + gridWidth*positions[i][1]
           , (int) bookMaxHeight*dataMatrix[i][monthnum]/(1.5*dataMatrix[numRows-4][monthnum]/13), 
           (int) bookMaxHeight*dataMatrix[i][monthnum]/(dataMatrix[numRows-4][monthnum]/13));
           
           */
           
           
           /*
           
image(books[i], constantHorizontalMargin + i*standardCellWidth, 680-36*bookMaxHeight*dataMatrix[i][monthnum]/dataMatrix[numRows-4][monthnum]
           , (int) bookMaxHeight,
           (int) bookMaxHeight*3/2);
           */
           
           image(books[i], constantHorizontalMargin + i*standardCellWidth, 680-640*dataMatrix[i][monthnum]/56,
           (int) bookMaxHeight*dataMatrix[i][monthnum]/(1.5*dataMatrix[numRows-4][monthnum]/10),
           (int) bookMaxHeight*dataMatrix[i][monthnum]/(dataMatrix[numRows-4][monthnum]/10));
           
           
           
           
          
      

       /*
         cellWidth = (width - constantHorizontalMargin*2)/maxMonth;
                  //fill((5*pow((i-7),2)),255 - (5*pow((i-7),2)), (int)pow(2,i) , (int)255*dataMatrix[i][j]/maxValue);

         //fill(5*(i-7)^2,255-5*(i-7)^2,(int) pow(1.4,i), (int)255*dataMatrix[i][j]/maxValue);
      noStroke();
      
      
        color c = colorBar.get((int)((colorBar.width-2)*(i+1)/13), colorBar.height/2);
        fill(c);
        




      rect(constantHorizontalMargin + placeholder, constantVerticalMargin 
           , cellWidth*dataMatrix[i][monthnum], cellHeight*(numRows-4));
           
           
           
           textAlign(CENTER, TOP);
    textSize(12);
    translate(constantHorizontalMargin + placeholder, constantVerticalMargin +cellHeight*(numColumns-3) -280);
    rotate(-1);
    text(titles[i], 0, 0);
    rotate(1);
    translate(-(constantHorizontalMargin + placeholder), -(constantVerticalMargin +cellHeight*(numColumns-3) -280));
    
      placeholder += cellWidth*dataMatrix[i][monthnum];
      
      */
              
    }
    
      //Draw Title
  
  textAlign(CENTER, CENTER);
  textSize(20);
  fill(255);
  text("A Series of Unfortunate Events Popularity: " + dateym[monthnum] + " Total: " + dataMatrix[numRows-4][monthnum], width/2, constantVerticalMargin/2);

  
      /*
      for(int i=0; i<numRows-4; i++){ 
    
    color c = colorBar.get((int)((colorBar.width-2)*(i+1)/13), colorBar.height/2);
    fill(c);
    
    textAlign(CENTER, TOP);
    textSize(12);
    translate(constantHorizontalMargin + standardCellWidth*i - 2, constantVerticalMargin +cellHeight*(numColumns-3) -260);
    rotate(-1);
    text(titles[i], 0, 0);
    rotate(1);
    translate(-(constantHorizontalMargin + standardCellWidth*i - 2), -(constantVerticalMargin +cellHeight*(numColumns-3) -260));
  }
  */
  }
}


void mouseClicked(){
  if(monthnum==-1 && booknum==0 && mouseX > constantHorizontalMargin && mouseX < 1280 - constantHorizontalMargin/2){
    
  for(int j=0; j<dateym.length; j++){

    if(mouseY > (constantVerticalMargin + cellHeight*j) && mouseY <= (constantVerticalMargin +cellHeight*(1+j))){
      monthnum=j;
    }
    
  }
  }
  else if(booknum==14){
    for(int i=0; i<13; i++){
          if(mouseX > constantHorizontalMargin-30 && mouseX < constantHorizontalMargin
      && mouseY > (constantVerticalMargin +otherCellHeight*i) && mouseY <(constantVerticalMargin +otherCellHeight*(1+i))){
        booknum = i+1;
      }
    }
  }
  else{
    monthnum=-1;
    booknum=0;
  }
}


void keyPressed(){
  if(monthnum>-1 && monthnum <34 && key == CODED && (keyCode == UP || keyCode == RIGHT)){
    monthnum +=1;
  }
  else if(monthnum>0 && monthnum <35 && key == CODED && (keyCode == DOWN || keyCode == LEFT)){
    monthnum -=1;
  }
  else if(monthnum==-1 && booknum >0  && booknum <13 && key == CODED && (keyCode == UP || keyCode == RIGHT)){
    booknum +=1;
  }
  else if(monthnum==-1 && booknum >1 && booknum <=13 && key == CODED && (keyCode == DOWN || keyCode == LEFT)){
    booknum -=1;
  }
  else{
  monthnum=-1;
  if(key == 'a'){
    booknum = 1;
  }
    else if(key == 'b'){
    booknum = 2;
  }
    else if(key == 'c'){
    booknum = 3;
  }
    else if(key == 'd'){
    booknum = 4;
  }
    else if(key == 'e'){
    booknum = 5;
  }
    else if(key == 'f'){
    booknum = 6;
  }
    else if(key == 'g'){
    booknum = 7;
  }
    else if(key == 'h'){
    booknum = 8;
  }
    else if(key == 'i'){
    booknum = 9;
  }
    else if(key == 'j'){
    booknum = 10;
  }
    else if(key == 'k'){
    booknum = 11;
  }
    else if(key == 'l'){
    booknum = 12;
  }
    else if(key == 'm'){
    booknum = 13;
  }
  else if(key == 'o'){
    booknum=14;
  }
  else if(key =='t'){
    booknum=0;
  }
  }
}

Attachments
SarahWellsSeriesOfUnfortunateEvents2D.zip
(119.6 KiB) Downloaded 27 times
Last edited by sarahwells on Thu Jan 31, 2019 4:09 pm, edited 2 times in total.

meilinshi
Posts: 4
Joined: Thu Jan 10, 2019 10:57 am

Re: Proj 2 - 2D Visualization

Post by meilinshi » Thu Jan 31, 2019 12:00 am

Concept Description
I'm interested in the checkouts within Dewey Class 400, Language. For this project, I focused on the 10 languages listed below to see whether there is a relative popularity ranking in language learning at the Seattle Public Library. To measure popularity, I used the total number of checkouts and the average keep time of language-related items from 2006 to 2018.
language.png

MySQL Queries & Processing Time
Query #1 On total checkouts (27.002 sec)

Code:

SELECT 
    YEAR(cout) AS years,
    MONTH(cout) AS months,
    COUNT(IF(deweyClass >= 420 AND deweyClass < 430,
        1,
        NULL)) AS 'English',
    COUNT(IF(deweyClass >= 430 AND deweyClass < 440,
        1,
        NULL)) AS 'German & related',
    COUNT(IF(deweyClass >= 440 AND deweyClass < 450,
        1,
        NULL)) AS 'French & related',
    COUNT(IF(deweyClass >= 450 AND deweyClass < 460,
        1,
        NULL)) AS 'Italian & related',
    COUNT(IF(deweyClass >= 460 AND deweyClass < 470,
        1,
        NULL)) AS 'Spanish & related',
    COUNT(IF(deweyClass >= 491.7
            AND deweyClass < 491.8,
        1,
        NULL)) AS 'Russian',
    COUNT(IF(deweyClass >= 492.7
            AND deweyClass < 492.8,
        1,
        NULL)) AS 'Arabic',
    COUNT(IF(deweyClass >= 495.1
            AND deweyClass < 495.2,
        1,
        NULL)) AS 'Chinese',
    COUNT(IF(deweyClass >= 495.6
            AND deweyClass < 495.7,
        1,
        NULL)) AS 'Japanese',
    COUNT(IF(deweyClass >= 495.7
            AND deweyClass < 495.8,
        1,
        NULL)) AS 'Korean'
FROM
    spl_2016.outraw
WHERE
    deweyClass >= 420 AND deweyClass < 495.8
        AND YEAR(cout) BETWEEN 2006 AND 2018
GROUP BY MONTH(cout),YEAR(cout)
ORDER BY YEAR(cout), MONTH(cout);
Query #2 On average keep time of items (first half: 39.060 sec, second half: 65.159 sec)

Code:

SELECT 
    class, years, AVG(TIMESTAMPDIFF(DAY, cout, cin)) AS AVG_TIME
FROM
    (SELECT 
        #SUBSTRING(deweyClass, 1, 5) AS class,
        SUBSTRING(deweyClass, 1, 2) AS class, #for deweyClass 420 to 470
            YEAR(cout) AS years,
            cin,
            cout,
            TIMESTAMPDIFF(DAY, cout, cin)
    FROM
        spl_2016.inraw
    WHERE
        YEAR(cout) BETWEEN 2006 AND 2018
            AND TIMESTAMPDIFF(DAY, cout, cin) > 0
	#		AND deweyClass >= 420 AND deweyClass < 470
	#         AND (deweyClass >= 491.7
	#         AND deweyClass < 491.8
        #         OR deweyClass >= 492.7
        #         AND deweyClass < 492.8
	#         OR deweyClass >= 495.1
	#         AND deweyClass < 495.2
	#         OR deweyClass >= 495.6
	#         AND deweyClass < 495.7
	#         OR deweyClass >= 495.7
	#         AND deweyClass < 495.8)
    GROUP BY years , class , cin , cout) AS aTable
GROUP BY class , years;
Sketches & Screenshots
I wanted to show the checkout counts as a matrix and use color to convey the change in pattern. With time on the x-axis and language on the y-axis, I overlaid keep time as another layer of data on the checkouts matrix.
Image from iOS (1).jpg
Image from iOS.png

I used two colorbars to show the total checkout pattern [press 'c' to change the colorbar]. There are some minor differences between them, which I assume are due to the different cutoff values for each colorbar (e.g., see the difference in Arabic and Korean). I put the average number of checkouts for each language on the right to aid understanding.
color.gif
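My guess at what the cutoff difference amounts to, sketched in plain Java (the function and parameter names are mine): each checkout count is clamped at a colorbar-specific cutoff before being mapped to a pixel column of the colorbar image, so two bars with different cutoffs color the same value differently.

```java
public class ColorbarLookup {
    // Map a value to a column index of a colorbar image, clamping at `cutoff`
    // so all values beyond it get the last (most saturated) color.
    static int colorbarColumn(float value, float cutoff, int barWidth) {
        float t = Math.min(value, cutoff) / cutoff;   // clamp, then normalize to [0,1]
        return Math.round(t * (barWidth - 1));
    }

    public static void main(String[] args) {
        // Same value, two cutoffs: the chosen column (and hence color) differs.
        System.out.println(colorbarColumn(50, 100, 256));  // mid-bar
        System.out.println(colorbarColumn(50, 60, 256));   // further along the bar
    }
}
```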
I also highlighted each year of data and added the language name and Dewey class number for each flag icon.
full.gif
Then I wanted to visualize the pattern in keep time of items [press 't' to show the keep time]. I normalized the data by cell height and added a line within each row to show the change in keep time over the years.
3.png
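The per-row normalization can be sketched like this in plain Java (names and sample values are mine): each row is scaled by its own maximum, so the keep-time line fills the cell height regardless of a language's absolute scale.

```java
public class RowNormalize {
    // Scale one row of the matrix into [0, cellHeight] using the row's own max,
    // so each language's keep-time line is readable within its own row.
    static float[] normalizeRow(float[] row, float cellHeight) {
        float max = 0;
        for (float v : row) max = Math.max(max, v);
        float[] out = new float[row.length];
        for (int i = 0; i < row.length; i++)
            out[i] = (max == 0) ? 0 : row[i] / max * cellHeight;
        return out;
    }

    public static void main(String[] args) {
        float[] keepDays = {10, 20, 40};                 // illustrative keep times
        float[] heights = normalizeRow(keepDays, 20);    // -> 5, 10, 20
        for (float h : heights) System.out.println(h);
    }
}
```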

Final Result & Analysis
From the visualization, we can see that English and Spanish are the two most popular languages, followed by French, German, Italian, Chinese, and Japanese. Russian, Arabic, and Korean are the three least popular of the selected ten. Overall checkouts show a decreasing pattern, with a peak from 2008 to 2012.
Keep time is higher for English, German, Russian (the highest), Arabic, and Korean at the beginning, but interestingly, after 2012 all of them decreased to 25-30 days and have remained stable since.
Attachments
meilin_2dvisualization.zip
(33.45 KiB) Downloaded 28 times

suburtner
Posts: 4
Joined: Thu Jan 10, 2019 10:58 am

Re: Proj 2 - 2D Visualization

Post by suburtner » Thu Jan 31, 2019 3:21 am

Proj 2 - 2D Visualization
Susan Burtner
----------

Concept description

Create a network where nodes represent subjects of maps and edges represent whether two subjects appear in the same collection. The size of a node corresponds to the number of titles in that subject. While I used the shared attribute collectionCode for this visualization, something like deweyClass could be used as well.

MySQL queries

Query

Code: Select all

SELECT `subject`, COUNT(title.title) AS numberOfTitlesInSubject, collectionCode.collectionCode#,
    #COUNT(outraw.itemNumber) AS numberOfCheckouts, 
FROM title, `subject`, itemType, itemToBib, collectionCode#, outraw
WHERE itemType.itemtype LIKE '%acmap' AND
    collectionCode.itemNumber = itemType.itemNumber AND
    itemType.itemNumber = itemToBib.itemNumber AND
    itemToBib.bibNumber = subject.bibNumber AND
    subject.bibNumber = title.bibNumber
    #title.bibNumber = outraw.bibNumber
GROUP BY `subject`, collectionCode
ORDER BY numberOfTitlesInSubject DESC
LIMIT 20;
Output
table.png
Processing time

> 2.10 s

Sketches and work-in-progress screenshots of your project with descriptions

Initial sketch
sketch1.JPG
The first sketch of my idea for Assignment #2. Although I originally wanted to use deweyClass, there were too few maps with Dewey classifications, so I went with collectionCode instead.

Work-in-progress screenshots
error1.png
Placing nodes randomly on the screen.
error2.png
Trying to place nodes on a circle. My error was that I was only comparing each node's value to the next one, when I needed to compare it to every other one.
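The fix described above amounts to comparing every pair of nodes, not just consecutive ones, when deciding whether to draw an edge. A minimal sketch, with hypothetical (subject, collectionCode) tuples:

```python
from itertools import combinations

# Build edges between every pair of nodes that share a collection code.
nodes = [("Spotted Owl", "canf"),
         ("Peru maps", "camap"),
         ("Endangered species California", "canf")]
edges = [(a[0], b[0]) for a, b in combinations(nodes, 2) if a[1] == b[1]]
```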
error3.png
Trying to get more meaningful sizes and colors.
error4.png
Normalizing the size of the nodes using log.
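The log normalization mentioned above can be sketched like this: the log compresses the huge spread in title counts so subjects with few titles remain visible (min_r and scale are hypothetical tuning constants, not the ones in my sketch):

```python
import math

# Map a subject's title count to a node radius via log scaling.
def node_radius(title_count, min_r=5.0, scale=4.0):
    return min_r + scale * math.log(title_count + 1)
```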


Final results & analysis
final.png
The final product (so far). I rotated the nodes, modified the colors, and added labels and text.

This 2D visualization shows how map items are often grouped together by subject. For example, if you look at "Spotted Owl," you can see that it is in the same collection as subjects concerning "wildlife management California," "wildlife management Northwest Pacific," "Endangered species California," and "Endangered species Northwest Pacific." It is interesting to note that maps with no subject have the most titles, followed by topographic maps. It also seems that the more specific the map, such as "Peru maps" and "Carson National Forest Maps," the less likely it is to be part of a collection.

Overall I'm happy with how the visualization turned out. The most difficult part was placing the labels close to their nodes but offset by the node's width, which changes according to which quadrant the node is in (think "top-left" quadrant, "bottom-right," etc.). In the future, I would like to add an interaction: moving the mouse over a node would highlight the edges and nodes it is connected to.
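One quadrant-aware way to do the label placement described above is to push each label outward from the circle's center by the node's radius, so it clears the node in any quadrant (a minimal sketch with hypothetical coordinates):

```python
import math

# Offset a label away from the layout circle's center (cx, cy) by r,
# so it lands just outside the node regardless of quadrant.
def label_offset(x, y, cx, cy, r):
    ang = math.atan2(y - cy, x - cx)
    return x + r * math.cos(ang), y + r * math.sin(ang)
```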


REVISION: March 4th, 2019

I have modified the original assignment in the following ways:
  • include functionality of hovering over a node to see which other subjects are in the same collection
  • show collection name when hovering over a node
as2_modifed.png
Attachments
mapNetwork.zip
(7.9 KiB) Downloaded 15 times
Last edited by suburtner on Mon Mar 04, 2019 3:14 pm, edited 2 times in total.

anaghauppal
Posts: 3
Joined: Thu Jan 10, 2019 10:50 am

Media Type Visualization

Post by anaghauppal » Thu Jan 31, 2019 11:54 am

My work is very much a work-in-progress, leading to more questions than answers. I used the data collected for my previous SQL project (checkouts by media type), and slightly changed the formats of my files by re-running my query a bit differently.

Code: Select all

SELECT YEAR(cout) AS CheckoutYear,
	MONTH(cout) AS CheckoutMonth,
	count(id) AS CheckoutCount
FROM spl_2016.outraw
WHERE itemtype='acbk'
GROUP BY YEAR(cout), MONTH(cout)
I produced four such files: book, DVD, VHS, and CD-ROM.

My initial hope was to do a circular chart, which I am still working to figure out - I've made some progress.

But for the time being, I decided to pursue a simpler bar-based chart, which still has many problems code-wise, but there seem to be bigger data-related problems with it.
Attachments
Screen Shot 2019-01-31 at 11.50.13 AM.png
Project2.zip
(10.25 KiB) Downloaded 18 times

jiaheng
Posts: 4
Joined: Wed Jan 16, 2019 10:17 am

Re: Proj 2 - 2D Visualization

Post by jiaheng » Thu Jan 31, 2019 2:51 pm

Concept Description
I'm interested in one particular piece of metadata: itemNumber. It is a unique, incrementing id that gets assigned each time a new item is acquired. I want to plot the itemNumber as well as the deweyClass of each item over time to see if there are any interesting patterns.

MySQL Queries

Code: Select all

SELECT DISTINCT
    i.itemNumber,
    i.bibNumber,
    it.itemType,
    t.title,
    b.deweyClass
FROM
    spl_2016.itemToBib AS i
    INNER JOIN spl_2016.itemType AS it ON i.itemNumber = it.itemNumber
    INNER JOIN spl_2016.title AS t ON i.bibNumber = t.bibNumber
    INNER JOIN spl_2016.deweyClass AS b ON i.bibNumber = b.bibNumber
WHERE
    b.deweyClass > 0
ORDER BY
    i.itemNumber
This query returns the itemNumber, bibNumber, itemType, title, and deweyClass of all items, sorted by itemNumber. It takes about 29 seconds and returns 2,391,281 results.
It's worth noting that there are duplicates with the same item number that differ only slightly in their titles. I removed them later when loading into Processing; after removing the duplicates, 2,326,621 records remain.
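The duplicate removal mentioned above can be sketched in Python; keep the first record seen per itemNumber (the rows here are hypothetical):

```python
# Drop records that repeat an itemNumber, keeping the first occurrence.
def dedupe(rows):
    seen, unique = set(), []
    for row in rows:
        if row["itemNumber"] not in seen:
            seen.add(row["itemNumber"])
            unique.append(row)
    return unique

rows = [{"itemNumber": 1, "title": "A"},
        {"itemNumber": 1, "title": "A."},   # near-duplicate title
        {"itemNumber": 2, "title": "B"}]
```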
I then used a Python script to map the itemType of each record to book, media, or misc based on its original itemType. Here is the script I wrote.

Code: Select all

import csv

MEDIA = {'accas', 'arcas', 'bccas', 'jccas', 'jrcas', 'accd', 'arcd', 'jccd', 'jrcd', 'accdrom', 'arcdrom',
         'bccdrom', 'drcdrom', 'jccdrom', 'acdisk', 'ardisk', 'jrdisk', 'acdvd', 'ardvd', 'bcdvd', 'jcdvd', 'jrdvd', 'xrcdrom', 'acrec', 'arrec', 'jcrec', 'jrrec', 'ucflpdr', 'acvhs', 'alvhs', 'arvhs', 'bcvhs', 'blvhs', 'jcvhs', 'jlvhs', 'jrvhs', 'xrvhs', 'scmed', 'acvid', 'arvid', 'arweb', 'jrweb'}
BOOK = {'acbk', 'arbk', 'bcbk', 'drbk', 'jcbk', 'jrbk', 'bccd'}
MISC = {'acfold', 'acart', 'dcillb', 'dcilll', 'ackit', 'arkit', 'bckit', 'jckit', 'jrkit', 'xrper', 'acper', 'arper', 'bcper', 'drper', 'jrper', 'acmap', 'armap', 'ahmfc', 'armfc', 'drmfc',
        'ahmfm', 'armfm', 'drmfmnp', 'drmfper', 'acmus', 'armus', 'xrmus', 'arnp', 'drnp', 'dcord', 'drord', 'acunkn', 'arunkn', 'acpam', 'arpam', 'xrpam', 'xrbk', 'acphoto', 'arphoto', 'acpost', 'arpost', 'acslide', 'arslide', 'jcmus', 'jrmus', 'drtest', 'ucfold', 'ucunknj', 'ucunkn', 'unk'}


def main():
    with open('data/out.csv', 'r') as csv_file:
        with open('data/out_processed.csv', 'w+') as out:
            csv_reader = csv.DictReader(csv_file, delimiter=',')
            fieldnames = ["itemNumber", "bibNumber",
                          "itemType", "title", "deweyClass"]
            csv_writer = csv.DictWriter(out, fieldnames=fieldnames)
            csv_writer.writeheader()
            for row in csv_reader:
                item_type = row['itemType']
                if item_type in MEDIA:
                    row['itemType'] = 'media'
                elif item_type in BOOK:
                    row['itemType'] = 'book'
                else:
                    # covers MISC as well as any unknown item types
                    row['itemType'] = 'misc'
                csv_writer.writerow(row)
            print('Done!')


if __name__ == '__main__':
    main()
WIP screenshots
I started with a simple black background, plotting on the x-y plane with deweyClass on the y-axis and item number on the x-axis. Since the item number is merely a counter that gets incremented over time, it's hard to recover the exact time an item was bought.
wip.png
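The x-y placement above is a linear remap of each value's data range onto screen coordinates, analogous to Processing's map() function (the ranges below are hypothetical):

```python
# Linearly remap a value from [lo, hi] to [out_lo, out_hi],
# e.g. itemNumber onto x in [0, 1920] and deweyClass onto y in [0, 1080].
def remap(v, lo, hi, out_lo, out_hi):
    return out_lo + (v - lo) / (hi - lo) * (out_hi - out_lo)
```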
I then changed the background color to white and decided to use three different colors to represent the three categories: book, media, and misc. I used the colors from Maroon 5's album cover "Red Pill Blues".
wip_full.png
wip_book.png
wip_media.png
wip_misc.png
Lastly, I added a title, some instructions on how to use the program, and hover text when moving the mouse over a dot. The hover text can be switched between showing the current dot's Dewey class or showing its title.
final_full_title.png
final_full.png
final_book.png
final_media.png
final_misc.png
Optimization
I also care about the performance of the program. After finishing the project, I spent some time trying to optimize it further. Initially, I put my drawing function inside draw(), which runs in a loop. Because about 2.3 million circles need to be drawn on the canvas, each redraw takes about 30 seconds, making keyboard and mouse interaction practically impossible since all the time is spent drawing. After some research, I switched to PGraphics, which lets me draw to an offscreen buffer once and then reuse its contents without redrawing the frame. I use four PGraphics buffers to save the full, book, media, and misc graphs individually when the application starts up. I also run each drawing function in a separate thread so the work happens in parallel. This really improved the performance: it now takes about 30 seconds on startup to draw the four graphs, and afterwards switching between them is instantaneous. If needed, I could further divide the drawing area into smaller regions and use more threads to accelerate it further.
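A language-agnostic sketch of the buffering strategy above: render each layer once, in parallel threads, into a cache, then reuse the cached result instead of redrawing millions of points every frame (the render function and layer data are stand-ins, not the actual Processing code):

```python
import threading

cache = {}

def render(layer, points):
    # stand-in for drawing into an offscreen PGraphics buffer
    cache[layer] = f"rendered {len(points)} points"

# one thread per layer, mirroring the four PGraphics buffers
layers = {"book": range(3), "media": range(5), "misc": range(2)}
threads = [threading.Thread(target=render, args=(name, pts))
           for name, pts in layers.items()]
for t in threads:
    t.start()
for t in threads:
    t.join()
# later frames just reuse cache[layer] instead of re-rendering
```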

Final Results & Analysis
From the graph, we can clearly see distinct vertical and horizontal lines where dots are missing. Horizontal lines mean that some particular Dewey classes were never acquired; 28 - 50 in Computer Science, 134 - 154 in Philosophy, 302 - 307 in Social Sciences, and 917 - 929 are some examples. Vertical lines mean that some item numbers are missing. However, it's hard to find the exact purchase date of an item, since I wasn't even sure the item numbers are distributed evenly over time. Therefore, I added the hover text feature: when the mouse is over a dot, it shows the title of that item. This gives more information about the item, and the title can be used to look up its publication date.

Future Improvements
I thought about adding the checkout time of each item; for example, if I could get the first checkout time of each item, I could use it to approximate when the item was purchased. I wrote two versions of the query to do that, as shown below. However, both were too slow to run, and I couldn't finish them in time. I'm hoping to find better ways to go about it.

Code: Select all

SELECT
    i.itemNumber,
    i.bibNumber,
    it.itemType,
    cout_tmp.cout,
    cout_tmp.title,
    b.deweyClass
FROM
    spl_2016.itemToBib AS i
    INNER JOIN spl_2016.itemType AS it ON i.itemNumber = it.itemNumber
    INNER JOIN (
        SELECT DISTINCT ir.itemNumber, ir.cout, ir.title
    FROM spl_2016.inraw AS ir
        INNER JOIN spl_2016.inraw AS ir2
        ON ir.itemNumber=ir2.itemNumber AND ir.cout < ir2.cout
    ) AS cout_tmp
    ON (i.itemNumber=cout_tmp.itemNumber)
    INNER JOIN spl_2016.deweyClass AS b ON i.bibNumber = b.bibNumber
WHERE
    b.deweyClass > 0
ORDER BY
    i.itemNumber

Code: Select all

SELECT
    i.itemNumber,
    i.bibNumber,
    it.itemType,
    MIN(ir.cout),
    ir.title,
    b.deweyClass
FROM
    spl_2016.itemToBib AS i
    INNER JOIN spl_2016.itemType AS it ON i.itemNumber = it.itemNumber
    INNER JOIN spl_2016.inraw AS ir ON i.itemNumber=ir.itemNumber
    INNER JOIN spl_2016.deweyClass AS b ON i.bibNumber = b.bibNumber
WHERE
    b.deweyClass > 0
GROUP BY i.itemNumber, i.bibNumber, it.itemType, ir.title, b.deweyClass
ORDER BY i.itemNumber
Updates - 2/7/2019
I've made a few updates to make the visualization clearer and the program more user-friendly. Changes include:
  • hover text shows Dewey class and title together
  • hover text now adjusts its position so it won't go off the canvas
  • changed background and legend colors
  • made the circles smaller
  • added a comment section and more comments
Below are some screenshots.
update_full.png
update_book.png
update_media.png
update_misc.png
time_series.zip
(86.98 MiB) Downloaded 20 times
Last edited by jiaheng on Thu Feb 07, 2019 12:58 pm, edited 2 times in total.

wilsonmui
Posts: 5
Joined: Mon Apr 16, 2018 10:21 am

Re: Proj 2 - 2D Visualization

Post by wilsonmui » Thu Jan 31, 2019 2:56 pm

Concept description

In this project, I wanted to visualize the checkout activity for Dewey classes 914-918 over a four year period. The goal was to create a visualization that could easily show checkout activity over a period of time. Hopefully, it would make changes in checkout activity very clear to viewers. Dewey classes in 910 relate to geography and travel. The classes 914-918 refer to Europe, Asia, Africa, North America, and South America.

The changes in checkouts are mainly shown through color. Each row's color scale is independent of the others, which means that checkout volume may vary greatly between categories without the color reflecting it. To show differences in checkout volume, the font size of each category label changes: a bigger font means more checkouts overall.
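The per-row color scaling described above can be sketched like this: each Dewey class's monthly counts are normalized against that row's own maximum, so color shows within-row change rather than absolute volume (the counts are hypothetical):

```python
# Map one row's counts to color-shade indices, scaled by that row's max.
def row_colors(counts, n_shades):
    row_max = max(counts) or 1  # guard against an all-zero row
    return [round(c / row_max * (n_shades - 1)) for c in counts]
```

Note that two rows with very different absolute volumes produce identical color patterns, which is exactly why the font size carries the volume information instead.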

Pressing the numbers 0-4 changes the colors used. Some color schemes have a much greater difference between colors, which can make the changes in checkouts more apparent. Hovering the mouse over a colored square displays the number of checkouts for that category in that month.

MySQL Queries

Code: Select all

SELECT 
    SUBSTR(deweyClass, 1, 3) AS dClass,
    SUM(IF(MONTH(cout) = 1, 1, 0)) AS Jan,
    SUM(IF(MONTH(cout) = 2, 1, 0)) AS Feb,
    SUM(IF(MONTH(cout) = 3, 1, 0)) AS Mar,
    SUM(IF(MONTH(cout) = 4, 1, 0)) AS Apr,
    SUM(IF(MONTH(cout) = 5, 1, 0)) AS May,
    SUM(IF(MONTH(cout) = 6, 1, 0)) AS Jun,
    SUM(IF(MONTH(cout) = 7, 1, 0)) AS Jul,
    SUM(IF(MONTH(cout) = 8, 1, 0)) AS Aug,
    SUM(IF(MONTH(cout) = 9, 1, 0)) AS Sep,
    SUM(IF(MONTH(cout) = 10, 1, 0)) AS Oct,
    SUM(IF(MONTH(cout) = 11, 1, 0)) AS Nov,
    SUM(IF(MONTH(cout) = 12, 1, 0)) AS 'Dec'
FROM
    spl_2016.outraw
WHERE
    (FLOOR(deweyClass) = 914
        OR FLOOR(deweyClass) = 915
        OR FLOOR(deweyClass) = 916
        OR FLOOR(deweyClass) = 917
        OR FLOOR(deweyClass) = 918)
        AND YEAR(cout) BETWEEN 2014 AND 2018
GROUP BY dClass;
Time to process: 54 seconds.
When first creating the visualization, I used the same dataset but combined the months across years: "January" represented the sum of checkouts of the Januarys from 2015-2018. I hoped this would exaggerate the variances and make patterns easier to see. However, the variance did not look big enough, so the pattern was hard to see.
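The first approach described above collapses (year, month) counts into one total per month across the years; a minimal sketch with hypothetical counts:

```python
from collections import defaultdict

# Sum checkout counts per calendar month, ignoring the year.
def by_month(counts):
    totals = defaultdict(int)
    for (year, month), n in counts.items():
        totals[month] += n
    return dict(totals)
```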

Code: Select all

SELECT 
    SUBSTR(deweyClass, 1, 3) AS dClass,
    SUM(IF(MONTH(cout) = 1, 1, 0) and YEAR(cout)=2015) AS "1/15",
    SUM(IF(MONTH(cout) = 2, 1, 0) and YEAR(cout)=2015) AS "2/15",
    SUM(IF(MONTH(cout) = 3, 1, 0) and YEAR(cout)=2015) AS "3/15",
    SUM(IF(MONTH(cout) = 4, 1, 0) and YEAR(cout)=2015) AS "4/15",
    SUM(IF(MONTH(cout) = 5, 1, 0) and YEAR(cout)=2015) AS "5/15",
    SUM(IF(MONTH(cout) = 6, 1, 0) and YEAR(cout)=2015) AS "6/15",
    SUM(IF(MONTH(cout) = 7, 1, 0) and YEAR(cout)=2015) AS "7/15",
    SUM(IF(MONTH(cout) = 8, 1, 0) and YEAR(cout)=2015) AS "8/15",
    SUM(IF(MONTH(cout) = 9, 1, 0) and YEAR(cout)=2015) AS "9/15",
    SUM(IF(MONTH(cout) = 10, 1, 0) and YEAR(cout)=2015) AS "10/15",
    SUM(IF(MONTH(cout) = 11, 1, 0) and YEAR(cout)=2015) AS "11/15",
    SUM(IF(MONTH(cout) = 12, 1, 0) and YEAR(cout)=2015) AS "12/15",
    SUM(IF(MONTH(cout) = 1, 1, 0) and YEAR(cout)=2016) AS "1/16",
    SUM(IF(MONTH(cout) = 2, 1, 0) and YEAR(cout)=2016) AS "2/16",
    SUM(IF(MONTH(cout) = 3, 1, 0) and YEAR(cout)=2016) AS "3/16",
    SUM(IF(MONTH(cout) = 4, 1, 0) and YEAR(cout)=2016) AS "4/16",
    SUM(IF(MONTH(cout) = 5, 1, 0) and YEAR(cout)=2016) AS "5/16",
    SUM(IF(MONTH(cout) = 6, 1, 0) and YEAR(cout)=2016) AS "6/16",
    SUM(IF(MONTH(cout) = 7, 1, 0) and YEAR(cout)=2016) AS "7/16",
    SUM(IF(MONTH(cout) = 8, 1, 0) and YEAR(cout)=2016) AS "8/16",
    SUM(IF(MONTH(cout) = 9, 1, 0) and YEAR(cout)=2016) AS "9/16",
    SUM(IF(MONTH(cout) = 10, 1, 0) and YEAR(cout)=2016) AS "10/16",
    SUM(IF(MONTH(cout) = 11, 1, 0) and YEAR(cout)=2016) AS "11/16",
    SUM(IF(MONTH(cout) = 12, 1, 0) and YEAR(cout)=2016) AS "12/16",
    SUM(IF(MONTH(cout) = 1, 1, 0) and YEAR(cout)=2017) AS "1/17",
    SUM(IF(MONTH(cout) = 2, 1, 0) and YEAR(cout)=2017) AS "2/17",
    SUM(IF(MONTH(cout) = 3, 1, 0) and YEAR(cout)=2017) AS "3/17",
    SUM(IF(MONTH(cout) = 4, 1, 0) and YEAR(cout)=2017) AS "4/17",
    SUM(IF(MONTH(cout) = 5, 1, 0) and YEAR(cout)=2017) AS "5/17",
    SUM(IF(MONTH(cout) = 6, 1, 0) and YEAR(cout)=2017) AS "6/17",
    SUM(IF(MONTH(cout) = 7, 1, 0) and YEAR(cout)=2017) AS "7/17",
    SUM(IF(MONTH(cout) = 8, 1, 0) and YEAR(cout)=2017) AS "8/17",
    SUM(IF(MONTH(cout) = 9, 1, 0) and YEAR(cout)=2017) AS "9/17",
    SUM(IF(MONTH(cout) = 10, 1, 0) and YEAR(cout)=2017) AS "10/17",
    SUM(IF(MONTH(cout) = 11, 1, 0) and YEAR(cout)=2017) AS "11/17",
    SUM(IF(MONTH(cout) = 12, 1, 0) and YEAR(cout)=2017) AS "12/17",
    SUM(IF(MONTH(cout) = 1, 1, 0) and YEAR(cout)=2018) AS "1/18",
    SUM(IF(MONTH(cout) = 2, 1, 0) and YEAR(cout)=2018) AS "2/18",
    SUM(IF(MONTH(cout) = 3, 1, 0) and YEAR(cout)=2018) AS "3/18",
    SUM(IF(MONTH(cout) = 4, 1, 0) and YEAR(cout)=2018) AS "4/18",
    SUM(IF(MONTH(cout) = 5, 1, 0) and YEAR(cout)=2018) AS "5/18",
    SUM(IF(MONTH(cout) = 6, 1, 0) and YEAR(cout)=2018) AS "6/18",
    SUM(IF(MONTH(cout) = 7, 1, 0) and YEAR(cout)=2018) AS "7/18",
    SUM(IF(MONTH(cout) = 8, 1, 0) and YEAR(cout)=2018) AS "8/18",
    SUM(IF(MONTH(cout) = 9, 1, 0) and YEAR(cout)=2018) AS "9/18",
    SUM(IF(MONTH(cout) = 10, 1, 0) and YEAR(cout)=2018) AS "10/18",
    SUM(IF(MONTH(cout) = 11, 1, 0) and YEAR(cout)=2018) AS "11/18",
    SUM(IF(MONTH(cout) = 12, 1, 0) and YEAR(cout)=2018) AS "12/18"
    
FROM
    spl_2016.outraw
WHERE
    (FLOOR(deweyClass) = 914
        OR FLOOR(deweyClass) = 915
        OR FLOOR(deweyClass) = 916
        OR FLOOR(deweyClass) = 917
        OR FLOOR(deweyClass) = 918)
        AND YEAR(cout) BETWEEN 2014 AND 2018
GROUP BY dClass;
Time to process: 56 seconds.
This was the final query used to retrieve the number of checkouts for each month from 2015-2018.

Screenshots
Screen Shot 2019-01-31 at 3.33.17 AM.png
This was the result of using the first query. The number of columns and the color scheme did not help display the changes in checkout activity, and there was no information about a category's checkout volume relative to the others. I didn't think this showed the data set well, so I made changes.

Final results & Analysis
Screen Shot 2019-01-31 at 3.47.28 AM.png
Screen Shot 2019-01-31 at 3.33.41 AM.png
The higher-contrast color schemes (0 and 4) are best at visualizing the changes in monthly checkout activity over the four-year period. Europe and North America had significantly greater checkout volume than the others. The library may have been closed in January and February of 2018, as there were no checkouts for any category. The low checkouts for March 2018 are likely a continuation of whatever happened in the previous months.

For the categories with greater volume, there seems to be a pattern in checkout activity: there are more checkouts at the start of the year, and the number of checkouts tends to decrease toward the end of the year.

There does not appear to be a defined pattern for low-volume categories such as Asia and South America; their changes don't follow any particular trend. However, their monthly checkouts are less than a tenth of those for the large-volume categories (Europe, North America).
Attachments
wilson_proj_2.zip
(205.08 KiB) Downloaded 15 times
Last edited by wilsonmui on Thu Jan 31, 2019 4:12 pm, edited 2 times in total.

yichenli
Posts: 7
Joined: Mon Apr 16, 2018 10:23 am

Re: Proj 2 - 2D Visualization

Post by yichenli » Thu Jan 31, 2019 3:01 pm

. Concept description
A feeling of boredom or meaninglessness is a major symptom of depression. The English word "boredom" was preceded by words expressing similar concepts: historically, the concept could be found in the words "acedia", "ennui", "melancholia", and "tedium".
This visualization shows the checkout behavior and keep duration of items whose titles contain variations of the words "depression", "melancholia", "boredom", and "tedium". ("Acedia" and "ennui" were omitted for the neatness of the color scheme and to avoid overcrowding; these two were chosen for omission because of their obscurity as borrowed words.)
Checkouts of items associated with each word are color-coded and visualized by vertically repeating four different characters. I hypothesized that items related to "depression" are checked out most frequently, so I chose the "|" character for depression to provide a grid for the visualization.
Only one year (2016) is shown, to avoid overcrowding of the characters. The vertical axis is the checkout month (denoted by the number of dashes for aesthetic purposes); the horizontal axis is the checkout day of the month. The day numbers (1-31) also contribute to the vertical value, since we don't experience time in jumps of months, and this makes keep time easier to visualize. Each "I" represents 5 days.
A red axis system (with the mouse position as its origin) is used to navigate the titles and checkout times listed on the right. The titles are partially cut off, which can read as poetic.
Ideally, the color scheme could be switched between RGB and CMYK primary colors; this is not yet functional, so I have included screenshots made by manually changing the fill colors.
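The vertical placement described above, where the month sets the row band and the day-of-month nudges y within the band so time reads continuously, can be sketched as (row_height is a hypothetical constant):

```python
# Month selects the band; day-of-month offsets y within that band.
def y_position(month, day, row_height=62):
    return (month - 1) * row_height + (day - 1) / 31 * row_height
```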

. MySQL Queries

Code: Select all

SELECT 
    title,
    (title LIKE '%depression%'
        AND title NOT LIKE '%great depression%'
        AND title NOT LIKE '%new deal%') AS depression,
    (title LIKE '%melanchol%') AS melancholia,
    (title LIKE '%boredom%') AS boredom,
    (title LIKE '%tedium%') AS tedium,
--     (title LIKE '%ennui%') AS ennui
    MONTH(cout),
    DAY(cout),
    HOUR(cout),
    MONTH(cin),
    DAY(cin),
    HOUR(cin)
FROM
    spl_2016.inraw
WHERE
    DATE(cout) BETWEEN '2016-01-01' AND '2016-12-31'
    AND(HOUR(cout) BETWEEN 10 and 20)
        AND (-- title LIKE '%schizophrenia%'
--         OR title LIKE '%bipolar%'
        (title LIKE '%depression%'
        AND title NOT LIKE '%great depression%'
        AND title NOT LIKE '%new deal%')
        OR title LIKE '%melanchol%'
        OR title LIKE '%boredom%'
        OR title LIKE '%tedium%'
--         OR title LIKE '%ennui%'
)
GROUP BY title , MONTH(cout), DAY(cout),HOUR(cout),MONTH(cin),DAY(cin),HOUR(cin)
ORDER BY MONTH(cout) ASC,DAY(cout) ASC
. Processing time
The query took 35.556 seconds.

. Sketches and work-in-progress screenshots
I do not have sketches for this visualization, since it is a second draft following a visually unappealing visualization of another query.
IMG_4624.JPG
^ a cell phone photo of one of the earliest versions, which was not saved.
IMG_4637.JPG
^ a cell phone photo of an early version. I think this version actually shows more correspondence between the four keywords.
1.png
^ an early version of this visualization. I adjusted the pixels to maintain more consistent margins.

. Final results & Analysis
The end result is kind of an interactive confusion.
2_b.png
^grayscale
2_cmyk.png
^cmyk color scheme
2_rgb.png
^rgb color scheme
Items related to depression are the most checked out, compared to items related to the other words.

Remaining Issues
The checkout times and titles on the right crowd together, and the color scheme switch is not functional.

The visualization that was scrapped altogether (draft 1 of the project)
2.png
1.png
Attachments
asgn2_draft2.zip
(2.15 MiB) Downloaded 15 times
Last edited by yichenli on Fri Feb 01, 2019 8:55 pm, edited 7 times in total.

chantalnguyen
Posts: 4
Joined: Thu Jan 10, 2019 10:51 am

Re: Proj 2 - 2D Visualization

Post by chantalnguyen » Thu Jan 31, 2019 3:18 pm

Concept description
For this project, I continued in the same direction as my first assignment, analyzing the most popular foreign cuisines as determined by SPL checkouts of food and drink-related Dewey media.

MySQL queries and processing time
I queried the number of checkouts for each month between 2006 and 2018 for a total of 197 countries, not including the United States. After a few test runs, I noticed that a number of country names appear as substrings of unrelated phrases used in titles -- for example, the string "dutch" is part of "dutch oven", which has nothing to do with Dutch food -- so the query below includes some AND NOT title LIKE '...' clauses to account for the instances I noticed. I wasn't able to catch everything; I'm sure some of the titles that include "Turkey" might be about the poultry and not the country, for example.
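The substring matching plus exclusions done in the query above boils down to this check: a title counts toward a country only if it contains the keyword and none of the known false-positive phrases (the exclusion list is illustrative):

```python
# Case-insensitive keyword match with false-positive exclusions.
def matches(title, keyword, exclude=()):
    t = title.lower()
    return keyword in t and not any(phrase in t for phrase in exclude)
```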

Code: Select all

SELECT 
    YEAR(cout),
    MONTH(cout),
    COUNT(IF(title LIKE '%afghan%', 1, NULL)) AS 'Afghanistan',
    COUNT(IF(title LIKE '%albania%', 1, NULL)) AS 'Albania',
    COUNT(IF(title LIKE '%algeri%', 1, NULL)) AS 'Algeria',
    COUNT(IF(title LIKE '%andorra%', 1, NULL)) AS 'Andorra',
    COUNT(IF(title LIKE '%angola%', 1, NULL)) AS 'Angola',
    COUNT(IF(title LIKE '%anguilla%', 1, NULL)) AS 'Anguilla',
    COUNT(IF(title LIKE '%antigua%'
            OR title LIKE '%barbuda%',
        1,
        NULL)) AS 'Antigua & Barbuda',
    COUNT(IF(title LIKE '%argentin%', 1, NULL)) AS 'Argentina',
    COUNT(IF(title LIKE '%armenia%', 1, NULL)) AS 'Armenia',
    COUNT(IF(title LIKE '%australia%', 1, NULL)) AS 'Australia',
    COUNT(IF(title LIKE '%austria%', 1, NULL)) AS 'Austria',
    COUNT(IF(title LIKE '%azerbaijan%'
            OR title LIKE '%azeri%',
        1,
        NULL)) AS 'Azerbaijan',
    COUNT(IF(title LIKE '%baham%', 1, NULL)) AS 'Bahamas',
    COUNT(IF(title LIKE '%bahrain%', 1, NULL)) AS 'Bahrain',
    COUNT(IF(title LIKE '%bangladesh%', 1, NULL)) AS 'Bangladesh',
    COUNT(IF(title LIKE '%barbad%', 1, NULL)) AS 'Barbados',
    COUNT(IF(title LIKE '%belarus%', 1, NULL)) AS 'Belarus',
    COUNT(IF(title LIKE '%belgi%', 1, NULL)) AS 'Belgium',
    COUNT(IF(title LIKE '%belize%', 1, NULL)) AS 'Belize',
    COUNT(IF(title LIKE '%benin%', 1, NULL)) AS 'Benin',
    COUNT(IF(title LIKE '%bermuda%', 1, NULL)) AS 'Bermuda',
    COUNT(IF(title LIKE '%bhutan%', 1, NULL)) AS 'Bhutan',
    COUNT(IF(title LIKE '%bolivia%', 1, NULL)) AS 'Bolivia',
    COUNT(IF(title LIKE '%bosnia%'
            OR title LIKE '%herzegovina%',
        1,
        NULL)) AS 'Bosnia & Herzegovina',
    COUNT(IF(title LIKE '%botswana%', 1, NULL)) AS 'Botswana',
    COUNT(IF(title LIKE '%brazil%'
            OR title LIKE '%brasil%',
        1,
        NULL)) AS 'Brazil',
    COUNT(IF(title LIKE '%brunei%', 1, NULL)) AS 'Brunei',
    COUNT(IF(title LIKE '%bulgaria%', 1, NULL)) AS 'Bulgaria',
    COUNT(IF(title LIKE '%burkina%', 1, NULL)) AS 'Burkina Faso',
    COUNT(IF(title LIKE '%burundi%', 1, NULL)) AS 'Burundi',
    COUNT(IF(title LIKE '%cambod%', 1, NULL)) AS 'Cambodia',
    COUNT(IF(title LIKE '%cameroon%', 1, NULL)) AS 'Cameroon',
    COUNT(IF(title LIKE '%canad%', 1, NULL)) AS 'Canada',
    COUNT(IF(title LIKE '%cape verde%', 1, NULL)) AS 'Cape Verde',
    COUNT(IF(title LIKE '%cayman%', 1, NULL)) AS 'Cayman Islands',
    COUNT(IF(title LIKE '%central african republic%',
        1,
        NULL)) AS 'Central African Republic',
    COUNT(IF(title LIKE '%chad%', 1, NULL)) AS 'Chad',
    COUNT(IF(title LIKE '%chilean%', 1, NULL)) AS 'Chile',
    COUNT(IF(title LIKE '%china%'
            OR title LIKE '%chinese%',
        1,
        NULL)) AS 'China',
    COUNT(IF(title LIKE '%hong kong%', 1, NULL)) AS 'Hong Kong',
    COUNT(IF(title LIKE '%macau%'
            OR title LIKE '%macanese%'
            OR title LIKE '%macao%',
        1,
        NULL)) AS 'Macau',
    COUNT(IF(title LIKE '%colombia%', 1, NULL)) AS 'Colombia',
    COUNT(IF(title LIKE '%comoros%', 1, NULL)) AS 'Comoros',
    COUNT(IF(title LIKE '%congo%'
            AND NOT title LIKE '%democratic%',
        1,
        NULL)) AS 'Congo',
    COUNT(IF(title LIKE '%drc%'
            OR title LIKE '%democratic republic of congo%',
        1,
        NULL)) AS 'Democratic Republic of Congo',
    COUNT(IF(title LIKE '%costa rica%', 1, NULL)) AS 'Costa Rica',
    COUNT(IF(title LIKE '%croat%', 1, NULL)) AS 'Croatia',
    COUNT(IF(title LIKE '%cuba%', 1, NULL)) AS 'Cuba',
    COUNT(IF(title LIKE '%cypr%', 1, NULL)) AS 'Cyprus',
    COUNT(IF(title LIKE '%czech%', 1, NULL)) AS 'Czech Republic',
    COUNT(IF(title LIKE '%denmark%'
            OR title LIKE '%danish%',
        1,
        NULL)) AS 'Denmark',
    COUNT(IF(title LIKE '%djibouti%', 1, NULL)) AS 'Djibouti',
    COUNT(IF(title LIKE '%dominica%'
            OR title LIKE '%dominiquais%'
            AND NOT title LIKE '%republic%'
            AND NOT title LIKE '%dominicana%',
        1,
        NULL)) AS 'Dominica',
    COUNT(IF(title LIKE '%dominican%', 1, NULL)) AS 'Dominican Republic',
    COUNT(IF(title LIKE '%ecuador%', 1, NULL)) AS 'Ecuador',
    COUNT(IF(title LIKE '%egypt%', 1, NULL)) AS 'Egypt',
    COUNT(IF(title LIKE '%salvador%', 1, NULL)) AS 'El Salvador',
    COUNT(IF(title LIKE '%equatorial%', 1, NULL)) AS 'Equatorial Guinea',
    COUNT(IF(title LIKE '%eritrea%', 1, NULL)) AS 'Eritrea',
    COUNT(IF(title LIKE '%estonia%', 1, NULL)) AS 'Estonia',
    COUNT(IF(title LIKE '%ethiopia%', 1, NULL)) AS 'Ethiopia',
    COUNT(IF(title LIKE '%fiji%', 1, NULL)) AS 'Fiji',
    COUNT(IF(title LIKE '%finland%'
            OR title LIKE '%finnish%',
        1,
        NULL)) AS 'Finland',
    COUNT(IF(title LIKE '%france%'
            OR title LIKE '%french%',
        1,
        NULL)) AS 'France',
    COUNT(IF(title LIKE '%french guiana%',
        1,
        NULL)) AS 'French Guiana',
    COUNT(IF(title LIKE '%gabon%'
            AND NOT title LIKE '%vagabond%',
        1,
        NULL)) AS 'Gabon',
    COUNT(IF(title LIKE '%gambia%', 1, NULL)) AS 'Gambia',
    COUNT(IF(title LIKE '%georgia%', 1, NULL)) AS 'Georgia',
    COUNT(IF(title LIKE '%german%', 1, NULL)) AS 'Germany',
    COUNT(IF(title LIKE '%ghana%', 1, NULL)) AS 'Ghana',
    COUNT(IF(title LIKE '%britain%'
            OR title LIKE '%british%'
            OR title LIKE '%united kingdom%'
            AND NOT title LIKE '%british columbia%',
        1,
        NULL)) AS 'United Kingdom',
    COUNT(IF(title LIKE '%greek%'
            OR title LIKE '%greece%',
        1,
        NULL)) AS 'Greece',
    COUNT(IF(title LIKE '%grenada%', 1, NULL)) AS 'Grenada',
    COUNT(IF(title LIKE '%guadeloupe%', 1, NULL)) AS 'Guadeloupe',
    COUNT(IF(title LIKE '%guatemala%', 1, NULL)) AS 'Guatemala',
    COUNT(IF(title LIKE '%guinea%'
            AND NOT title LIKE '%bissau%',
        1,
        NULL)) AS 'Guinea',
    COUNT(IF(title LIKE '%bissau%', 1, NULL)) AS 'Guinea-Bissau',
    COUNT(IF(title LIKE '%guyan%', 1, NULL)) AS 'Guyana',
    COUNT(IF(title LIKE '%haiti%', 1, NULL)) AS 'Haiti',
    COUNT(IF(title LIKE '%hondura%', 1, NULL)) AS 'Honduras',
    COUNT(IF(title LIKE '%hungar%', 1, NULL)) AS 'Hungary',
    COUNT(IF(title LIKE '%iceland%', 1, NULL)) AS 'Iceland',
    COUNT(IF(title LIKE '%india%', 1, NULL)) AS 'India',
    COUNT(IF(title LIKE '%indonesia%', 1, NULL)) AS 'Indonesia',
    COUNT(IF(title LIKE '%iran%', 1, NULL)) AS 'Iran',
    COUNT(IF(title LIKE '%iraq%', 1, NULL)) AS 'Iraq',
    COUNT(IF(title LIKE '%ireland%'
            OR title LIKE '%irish%',
        1,
        NULL)) AS 'Ireland',
    COUNT(IF(title LIKE '%israel%', 1, NULL)) AS 'Israel',
    COUNT(IF(title LIKE '%italy%'
            OR title LIKE '%itali%',
        1,
        NULL)) AS 'Italy',
    COUNT(IF((title LIKE '%ivory%'
            OR title LIKE '%ivoire%')
            AND NOT title LIKE '%carnivory%',
        1,
        NULL)) AS 'Ivory Coast',
    COUNT(IF(title LIKE '%jamaica%', 1, NULL)) AS 'Jamaica',
    COUNT(IF(title LIKE '%japan%', 1, NULL)) AS 'Japan',
    COUNT(IF(title LIKE '%jordan%'
            AND NOT title LIKE '%julie jordan%',
        1,
        NULL)) AS 'Jordan',
    COUNT(IF(title LIKE '%kazakh%', 1, NULL)) AS 'Kazakhstan',
    COUNT(IF(title LIKE '%kenya%', 1, NULL)) AS 'Kenya',
    COUNT(IF(title LIKE '%korea%', 1, NULL)) AS 'Korea',
    COUNT(IF(title LIKE '%kuwait%', 1, NULL)) AS 'Kuwait',
    COUNT(IF(title LIKE '%kyrgyz%', 1, NULL)) AS 'Kyrgyzstan',
    COUNT(IF(title LIKE '%laos%'
            OR title LIKE '%lao roots%',
        1,
        NULL)) AS 'Laos',
    COUNT(IF(title LIKE '%latvia%', 1, NULL)) AS 'Latvia',
    COUNT(IF(title LIKE '%leban%', 1, NULL)) AS 'Lebanon',
    COUNT(IF(title LIKE '%lesoth%', 1, NULL)) AS 'Lesotho',
    COUNT(IF(title LIKE '%liberia%', 1, NULL)) AS 'Liberia',
    COUNT(IF(title LIKE '%libya%', 1, NULL)) AS 'Libya',
    COUNT(IF(title LIKE '%liechtenstein%',
        1,
        NULL)) AS 'Liechtenstein',
    COUNT(IF(title LIKE '%lithuania%', 1, NULL)) AS 'Lithuania',
    COUNT(IF(title LIKE '%luxembourg%', 1, NULL)) AS 'Luxembourg',
    COUNT(IF(title LIKE '%macedonia%', 1, NULL)) AS 'Macedonia',
    COUNT(IF(title LIKE '%madag%', 1, NULL)) AS 'Madagascar',
    COUNT(IF(title LIKE '%malawi%', 1, NULL)) AS 'Malawi',
    COUNT(IF(title LIKE '%malay%', 1, NULL)) AS 'Malaysia',
    COUNT(IF(title LIKE '%maldiv%', 1, NULL)) AS 'Maldives',
    COUNT(IF(title LIKE '% mali%'
            AND NOT title LIKE '%somali%'
            AND NOT title LIKE '%minimal%'
            AND NOT title LIKE '%malibu%',
        1,
        NULL)) AS 'Mali',
    COUNT(IF(title LIKE '%malta%'
            OR title LIKE '%maltese%',
        1,
        NULL)) AS 'Malta',
    COUNT(IF(title LIKE '%martiniqu%', 1, NULL)) AS 'Martinique',
    COUNT(IF(title LIKE '%mauritania%', 1, NULL)) AS 'Mauritania',
    COUNT(IF(title LIKE '%mauriti%', 1, NULL)) AS 'Mauritius',
    COUNT(IF(title LIKE '%mayotte%', 1, NULL)) AS 'Mayotte',
    COUNT(IF(title LIKE '%mexic%', 1, NULL)) AS 'Mexico',
    COUNT(IF(title LIKE '%moldova%', 1, NULL)) AS 'Moldova',
    COUNT(IF(title LIKE '%monaco%'
            OR title LIKE '%monegasque%',
        1,
        NULL)) AS 'Monaco',
    COUNT(IF(title LIKE '%mongol%', 1, NULL)) AS 'Mongolia',
    COUNT(IF(title LIKE '%montegr%', 1, NULL)) AS 'Montenegro',
    COUNT(IF(title LIKE '%montserrat%', 1, NULL)) AS 'Montserrat',
    COUNT(IF(title LIKE '%morocc%', 1, NULL)) AS 'Morocco',
    COUNT(IF(title LIKE '%mozambique%', 1, NULL)) AS 'Mozambique',
    COUNT(IF(title LIKE '%burma%'
            OR title LIKE '%burmese%'
            OR title LIKE '%myanmar%',
        1,
        NULL)) AS 'Burma',
    COUNT(IF(title LIKE '%namibia%', 1, NULL)) AS 'Namibia',
    COUNT(IF(title LIKE '%nepal%', 1, NULL)) AS 'Nepal',
    COUNT(IF((title LIKE '%netherlands%'
            OR title LIKE '%holland%'
            OR title LIKE '%dutch%')
            AND NOT title LIKE '%dutch oven%',
        1,
        NULL)) AS 'The Netherlands',
    COUNT(IF(title LIKE '%new zealand%', 1, NULL)) AS 'New Zealand',
    COUNT(IF(title LIKE '%nicaragua%', 1, NULL)) AS 'Nicaragua',
    COUNT(IF(title LIKE '%niger%'
            AND NOT title LIKE '%nigeria%'
            AND NOT title LIKE '%feniger%',
        1,
        NULL)) AS 'Niger',
    COUNT(IF(title LIKE '%nigeria%', 1, NULL)) AS 'Nigeria',
    COUNT(IF(title LIKE '%norw%', 1, NULL)) AS 'Norway',
    COUNT(IF(title LIKE '% oman%', 1, NULL)) AS 'Oman',
    COUNT(IF(title LIKE '%pakistan%', 1, NULL)) AS 'Pakistan',
    COUNT(IF(title LIKE '%palestin%', 1, NULL)) AS 'Palestine',
    COUNT(IF(title LIKE '%panama%', 1, NULL)) AS 'Panama',
    COUNT(IF(title LIKE '%papua new guinea%',
        1,
        NULL)) AS 'Papua New Guinea',
    COUNT(IF(title LIKE '%paraguay%', 1, NULL)) AS 'Paraguay',
    COUNT(IF(title LIKE '%peru%', 1, NULL)) AS 'Peru',
    COUNT(IF(title LIKE '%philippin%'
            OR title LIKE '%filipin%',
        1,
        NULL)) AS 'Philippines',
    COUNT(IF(title LIKE '%poland%'
            OR title LIKE '%polish%',
        1,
        NULL)) AS 'Poland',
    COUNT(IF(title LIKE '%portug%', 1, NULL)) AS 'Portugal',
    COUNT(IF(title LIKE '%puerto ric%', 1, NULL)) AS 'Puerto Rico',
    COUNT(IF(title LIKE '%qatar%', 1, NULL)) AS 'Qatar',
    COUNT(IF(title LIKE '%romania%', 1, NULL)) AS 'Romania',
    COUNT(IF(title LIKE '%russia%', 1, NULL)) AS 'Russia',
    COUNT(IF(title LIKE '%rwanda%', 1, NULL)) AS 'Rwanda',
    COUNT(IF(title LIKE '%saint kitts%', 1, NULL)) AS 'Saint Kitts and Nevis',
    COUNT(IF(title LIKE '%saint lucia%', 1, NULL)) AS 'Saint Lucia',
    COUNT(IF(title LIKE '%saint vincent%',
        1,
        NULL)) AS 'Saint Vincent and the Grenadines',
    COUNT(IF(title LIKE '%samoa%', 1, NULL)) AS 'Samoa',
    COUNT(IF(title LIKE '%sao tome%', 1, NULL)) AS 'Sao Tome and Principe',
    COUNT(IF(title LIKE '%saudi arabia%', 1, NULL)) AS 'Saudi Arabia',
    COUNT(IF(title LIKE '%senegal%', 1, NULL)) AS 'Senegal',
    COUNT(IF(title LIKE '%serbia%', 1, NULL)) AS 'Serbia',
    COUNT(IF(title LIKE '%seychell%', 1, NULL)) AS 'Seychelles',
    COUNT(IF(title LIKE '%sierra leone%', 1, NULL)) AS 'Sierra Leone',
    COUNT(IF(title LIKE '%singapor%', 1, NULL)) AS 'Singapore',
    COUNT(IF(title LIKE '%slovakia%', 1, NULL)) AS 'Slovakia',
    COUNT(IF(title LIKE '%slovenia%', 1, NULL)) AS 'Slovenia',
    COUNT(IF(title LIKE '%solomon islands%',
        1,
        NULL)) AS 'Solomon Islands',
    COUNT(IF(title LIKE '%somalia%', 1, NULL)) AS 'Somalia',
    COUNT(IF(title LIKE '%south africa%', 1, NULL)) AS 'South Africa',
    COUNT(IF(title LIKE '%spain%'
            OR title LIKE '%spanish%',
        1,
        NULL)) AS 'Spain',
    COUNT(IF(title LIKE '%sri lanka%', 1, NULL)) AS 'Sri Lanka',
    COUNT(IF(title LIKE '%sudan%', 1, NULL)) AS 'Sudan',
    COUNT(IF(title LIKE '%suriname%', 1, NULL)) AS 'Suriname',
    COUNT(IF(title LIKE '%swazi%', 1, NULL)) AS 'Swaziland',
    COUNT(IF(title LIKE '%sweden%'
            OR title LIKE '%swedish%',
        1,
        NULL)) AS 'Sweden',
    COUNT(IF(title LIKE '%switzerland%'
            OR title LIKE '%swiss%',
        1,
        NULL)) AS 'Switzerland',
    COUNT(IF(title LIKE '%syria%', 1, NULL)) AS 'Syria',
    COUNT(IF(title LIKE '%taiwan%', 1, NULL)) AS 'Taiwan',
    COUNT(IF(title LIKE '%tajik%', 1, NULL)) AS 'Tajikistan',
    COUNT(IF(title LIKE '%tanzania%', 1, NULL)) AS 'Tanzania',
    COUNT(IF(title LIKE '%thai%', 1, NULL)) AS 'Thailand',
    COUNT(IF(title LIKE '%togo%'
            AND NOT title LIKE '%prostogo%',
        1,
        NULL)) AS 'Togo',
    COUNT(IF(title LIKE '%tonga%', 1, NULL)) AS 'Tonga',
    COUNT(IF(title LIKE '%trinidad%'
            OR title LIKE '%tobag%',
        1,
        NULL)) AS 'Trinidad & Tobago',
    COUNT(IF(title LIKE '%tunis%', 1, NULL)) AS 'Tunisia',
    COUNT(IF(title LIKE '%turkey%'
            OR title LIKE '%turkish%',
        1,
        NULL)) AS 'Turkey',
    COUNT(IF(title LIKE '%turkmen%', 1, NULL)) AS 'Turkmenistan',
    COUNT(IF(title LIKE '%caicos%', 1, NULL)) AS 'Turks & Caicos',
    COUNT(IF(title LIKE '%tuval%', 1, NULL)) AS 'Tuvalu',
    COUNT(IF(title LIKE '%uganda%', 1, NULL)) AS 'Uganda',
    COUNT(IF(title LIKE '%ukrain%', 1, NULL)) AS 'Ukraine',
    COUNT(IF(title LIKE '%united arab emirates%'
            OR title LIKE '%uae%',
        1,
        NULL)) AS 'United Arab Emirates',
    COUNT(IF(title LIKE '%uruguay%', 1, NULL)) AS 'Uruguay',
    COUNT(IF(title LIKE '%uzbek%', 1, NULL)) AS 'Uzbekistan',
    COUNT(IF(title LIKE '%vanuat%', 1, NULL)) AS 'Vanuatu',
    COUNT(IF(title LIKE '%venezuela%', 1, NULL)) AS 'Venezuela',
    COUNT(IF(title LIKE '%viet%' AND NOT title LIKE '%soviet%', 1, NULL)) AS 'Vietnam',
    COUNT(IF(title LIKE '%yemen%', 1, NULL)) AS 'Yemen',
    COUNT(IF(title LIKE '%zambia%', 1, NULL)) AS 'Zambia',
    COUNT(IF(title LIKE '%zimbabwe%', 1, NULL)) AS 'Zimbabwe'
FROM
    spl_2016.inraw
WHERE
    YEAR(cout) BETWEEN 2006 AND 2018
        AND deweyClass >= 641
        AND deweyClass < 642
GROUP BY MONTH(cout) , YEAR(cout)
ORDER BY YEAR(cout) , MONTH(cout)
This query took 215 seconds.
I then also queried the top 8 items for each country, along with the respective item type and number of check-ins. I'm not sure whether there is a more elegant way to accomplish this, but I essentially had to chain (QUERY) UNION (QUERY) UNION ... (QUERY) for each country, where one such QUERY looks like this (the full statement is too long to paste here):

Code: Select all

SELECT 
    SUBSTRING(itemType, 3) AS shortType,
    title,
    COUNT(bibNumber) AS Counts
FROM
    spl_2016.inraw
WHERE
    (title LIKE '%afghan%')
        AND deweyClass >= 641
        AND deweyClass < 642
GROUP BY shortType , title
ORDER BY Counts DESC
LIMIT 8
This query took 2.5 hours (9204 seconds).
The number of rows in the output is not 8 times the number of countries, as some countries have fewer than 8 different items, or no related check-ins at all. I had to manually pad the .csv file with empty rows so that each country had 8 rows (empty or not), to make importing the result into Processing easier. I couldn't figure out whether there's a way to do this within the MySQL query itself.
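The padding step could also be scripted instead of done by hand; here is a minimal Python sketch (the four-column layout is illustrative, not necessarily the exact columns in my csv):

```python
from collections import defaultdict

def pad_rows(rows, per_country=8, width=4):
    """Group rows by country (first column) and pad each group with
    empty rows so every country ends up with exactly per_country rows.
    Hypothetical layout: country, type, title, counts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[0]].append(row)
    padded = []
    for country, group in groups.items():
        group = group[:per_country]            # trim any excess
        while len(group) < per_country:        # pad the rest
            group.append([country] + [""] * (width - 1))
        padded.extend(group)
    return padded

# Example: a country with one item gets padded to 8 rows.
rows = [["Samoa", "bk", "Samoan cookbook", "12"]]
print(len(pad_rows(rows)))  # 8
```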
Edit 2/7/19: I also realized that if a title matched multiple queries, it would only show up in the results for the first query. For instance, the top result of an individual query for Singapore is "Cradle of flavor home cooking from the spice islands of Indonesia Malaysia and Singapore", but this result won't show up under Singapore in the full query, as it is already part of the results for Indonesia (UNION discards rows that duplicate an earlier subquery's output; UNION ALL would keep them). I would have to manually patch together several queries to get the correct set of top 8 items for each country.

Sketches
photo_2019-01-31_14-47-37.jpg
My plan was to arrange the data as concentric circles. The outermost circle would represent the country with the highest total checkout counts, with each smaller circle representing the next most popular country.
I also wanted to show, when a circle is hovered over, the corresponding country's number of counts per month as a curve wrapped around the circle. Lines radiating from the center would mark out each year.
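The wrapped-curve idea boils down to a polar mapping: each month becomes an angle around the ring, and the count pushes the point outward from the circle's base radius. A rough Python sketch of the math (function and parameter names are illustrative, not from the actual sketch code):

```python
import math

def wrap_counts(counts, base_radius, scale, cx=0.0, cy=0.0):
    """Map monthly checkout counts onto a circle: month i becomes an
    angle (one full turn over all months), and the count perturbs the
    radius. Returns a list of (x, y) vertices for the wrapped curve."""
    n = len(counts)
    points = []
    for i, c in enumerate(counts):
        theta = 2 * math.pi * i / n      # angular position of month i
        r = base_radius + scale * c      # count pushes the point outward
        points.append((cx + r * math.cos(theta),
                       cy + r * math.sin(theta)))
    return points

# Twelve months of equal counts trace a circle of radius base + scale * c.
pts = wrap_counts([10] * 12, base_radius=100, scale=1)
```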

Work-in-progress screenshots

I considered three options: making each circle's radius proportional to the corresponding country's total number of checkouts; proportional to the square root of the checkouts; or evenly spaced apart.
propradius.png
sqrtradius.png
evenradius.png
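The three spacing schemes compare like this (a Python sketch with made-up totals and max radius; the actual Processing code differs):

```python
import math

def radii(totals, max_radius):
    """Compute circle radii for countries sorted by total checkouts
    (largest first) under three schemes: proportional to the total,
    proportional to its square root, and evenly spaced."""
    top = totals[0]
    n = len(totals)
    prop = [max_radius * t / top for t in totals]
    sqrt = [max_radius * math.sqrt(t / top) for t in totals]
    even = [max_radius * (n - i) / n for i in range(n)]
    return prop, sqrt, even

# Square-root scaling compresses the spread between popular and
# unpopular countries; even spacing ignores the totals entirely.
prop, sqrt_r, even = radii([400, 100, 25], max_radius=200)
# proportional: [200, 50, 12.5]; sqrt: [200, 100, 50]
```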
Final results and analysis
I ended up choosing two modes: one with the radii evenly spaced apart, and one with the radii proportional to the square root of the checkouts. Pressing any key will toggle between the two modes.
Screen Shot 2019-01-31 at 3.08.01 PM.png
Screen Shot 2019-01-31 at 3.08.07 PM.png
Hovering over a circle will show the number of counts per month. In the lower portion of the screen, the top items (up to 8) are listed for the country, along with a colored bar to the left of the title. The length of the bar indicates the number of check-ins and the color of the bar indicates whether the item is a book, CD, DVD, or VHS. The bottom right corner shows the total number of checkouts for the country.
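The bar encoding amounts to a length scale plus a categorical color map; a sketch with placeholder colors and widths (not the actual values used):

```python
# Hypothetical type-to-color map for the rollover panel; the RGB
# values here are placeholders, not the colors in the final sketch.
TYPE_COLORS = {"bk":  (60, 120, 200),   # book
               "cd":  (200, 160, 60),   # CD
               "dvd": (200, 80, 80),    # DVD
               "vhs": (120, 120, 120)}  # VHS

def bar_width(count, max_count, max_width=300):
    """Scale a check-in count to a bar length in pixels, relative to
    the country's most-checked-in item."""
    return max_width * count / max_count

w = bar_width(50, 200)  # 75.0
```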

Unfortunately, for some of the most popular cuisines (#1 Italy, #2 France, #3 India), the rollover graph goes off the screen. (I tried plotting log counts or square-root counts instead, but that smoothed out too much of the month-to-month variation, so I decided against it.)
Screen Shot 2019-01-31 at 3.03.49 PM.png

Some results that caught my eye included the noticeable peak of interest in Chinese food around 2008-2011 followed by a decline:
Screen Shot 2019-01-31 at 3.03.51 PM.png
while Korean food seemed to regain some popularity from 2016 onwards:
Screen Shot 2019-01-31 at 3.02.49 PM.png
and interest in the United Kingdom jumped significantly in 2016. Moreover, the top 5 items for the UK are all related to the Great British Bake Off, and half of the most popular items are DVDs for that show:
Screen Shot 2019-01-31 at 3.02.52 PM.png
Here is my Processing code and data.
MAT259_Proj2_CN.zip
(Edit 2/7/19: fixed minor bug in source code)
Last edited by chantalnguyen on Thu Feb 07, 2019 4:44 pm, edited 1 time in total.
