I started by acquiring historical weather data for Seattle, downloaded from the National Oceanic and Atmospheric Administration (NOAA) Surface Data Hourly Global dataset (https://www7.ncdc.noaa.gov/CDO/cdopoema ... olution=40). The dataset contains entries from April 2005 up to today. After removing unnecessary columns, I ended up with the following fields:
Code: Select all
DATE: Date of the entry,
TIME: Time of the entry,
DIR: Wind direction (0-360), 0 is North,
SPD: Wind speed in mph,
TEMPF: Temperature in °F,
TEMPC: Temperature in °C,
SLP: Sea level pressure
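For clarity, here is how one row of this cleaned-up file might be parsed with the standard library. The column order follows the list above, but the function, the assumed HH:MM time format, and the sample row are my own illustration, not part of the original pipeline:

```python
from datetime import datetime

def parse_weather_row(row):
    """Parse one cleaned NOAA row (DATE, TIME, DIR, SPD, TEMPF, TEMPC, SLP)
    into typed values. The exact file layout is an assumption."""
    return {
        'datetime': datetime.strptime("%s %s" % (row[0], row[1]), '%Y-%m-%d %H:%M'),
        'dir': int(row[2]),      # wind direction, 0-360, 0 = North
        'spd': float(row[3]),    # wind speed in mph
        'tempf': float(row[4]),  # temperature in Fahrenheit
        'tempc': float(row[5]),  # temperature in Celsius
        'slp': float(row[6]),    # sea level pressure
    }

# Made-up example row:
sample = ['2005-04-19', '13:00', '180', '7.0', '55.0', '12.8', '1013.2']
parsed = parse_weather_row(sample)
```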
Code: Select all
SELECT
    DATE(sth.checkOut) 'date',
    HOUR(sth.checkOut) 'hour',
    COUNT(*) 'count',
    GROUP_CONCAT(sth.subject SEPARATOR '\n') 'subjects'
FROM
    (SELECT
        s.bibNumber, s.subject, t.checkOut
    FROM
        spl_2016.transactions t
        JOIN spl_2016.subject s ON s.bibNumber = t.bibNumber
    WHERE
        t.checkOut >= '2005-04-19'
        AND t.checkOut <= '2017-01-01') AS sth
GROUP BY DATE(sth.checkOut), FLOOR(HOUR(sth.checkOut) / 6)
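One detail worth noting in the query: the GROUP BY uses FLOOR(HOUR(sth.checkOut) / 6), so checkouts are aggregated into four 6-hour windows per day rather than per individual hour. A minimal Python sketch of the same bucketing (illustration only, not part of the pipeline):

```python
from datetime import datetime

def six_hour_bucket(ts):
    """Mirror the SQL term FLOOR(HOUR(checkOut) / 6):
    bucket 0 = 00:00-05:59, bucket 1 = 06:00-11:59,
    bucket 2 = 12:00-17:59, bucket 3 = 18:00-23:59."""
    return ts.hour // 6

morning = six_hour_bucket(datetime(2005, 4, 19, 9, 30))   # falls in bucket 1
evening = six_hour_bucket(datetime(2005, 4, 19, 21, 0))   # falls in bucket 3
```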
Code: Select all
# UTILITIES
import csv
from datetime import datetime
# NLP STUFF
import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.probability import FreqDist

lmtzr = WordNetLemmatizer()

class Item(object):
    # datetime, count, subjects
    def __init__(self, d, c, s):
        self.datetime = d
        self.count = c
        self.subjects = s

def getFrequencyList(sentence):
    # TOKEN EXTRACTION
    tokens = nltk.word_tokenize(sentence)
    # CONVERT TO LOWERCASE
    tokens = [w.lower() for w in tokens]
    # LEMMATIZATION
    lemmatized = []
    for token in tokens:
        lemmatized.append(lmtzr.lemmatize(token))
    # FREQUENCY
    fdist1 = FreqDist(lemmatized)
    # returns as [(',', 18713), ('the', 13721), ...]
    return fdist1.most_common(len(fdist1))

theArray = []
with open('words_05_17.csv', 'rt', encoding='utf-8') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',')
    try:
        for row in csvreader:
            if row[0] != 'date':  # skip the header row
                print("%s %s:00" % (row[0], row[1]))
                dt = datetime.strptime("%s %s:00" % (row[0], row[1]), '%Y-%m-%d %H:00')
                subject = ''
                for x in getFrequencyList(row[3]):
                    subject += "%s %s#" % (x[0], x[1])
                theArray.append(Item(dt, int(row[2]), subject))
    except (UnicodeDecodeError, UnicodeEncodeError):
        print("wrong read...")

with open('nltk_05_17.csv', 'wt', encoding='utf-8') as csvfile:
    csvwriter = csv.writer(csvfile)
    for item in theArray:
        csvwriter.writerow([item.datetime, item.count, item.subjects])
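The subjects column written by the script above packs each (word, count) pair as "word count#". A helper like the following (my own sketch, not part of the original script) can unpack that string when reading nltk_05_17.csv back:

```python
def parse_subjects(packed):
    """Split a 'word count#word count#...' string, as written by the
    export script, back into (word, count) pairs."""
    pairs = []
    for chunk in packed.split('#'):
        if not chunk:
            continue  # trailing '#' leaves an empty chunk
        word, _, count = chunk.rpartition(' ')
        pairs.append((word, int(count)))
    return pairs

pairs = parse_subjects("the 13721#science 42#")
```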
Whenever the user types something on the keyboard, the system searches for the phrase within the keywords (regex is allowed: http://regexr.com). On the right-hand side of the GUI, the search results are listed; on the left side, the first word of the list is shown in detail. The search results are also highlighted in the word cloud. The Time Barcode shape in the detail panel marks the usage dates of the current word, and the red line corresponds to its average date. The barcode spans the full data range, from April 2005 on the left to January 2017 on the right.

In addition, the sliders at the bottom left filter what is drawn inside the cube. The left-most buttons focus the camera on a specific axis. The Invert Colors option switches the color scheme from dark to light, Auto Rotation rotates the camera automatically, and the Color Search option paints the words matching the current search term according to their average temperature values.

A couple of minutes after using the visualization, I discovered a pattern among the low-frequency items. Since these items appear only once in the dataset, their average temperature is simply the temperature at the moment they were used. As a result, the Date-Temperature plane clearly shows seasonality (winter-summer differences in temperature). The source code (requires the PeasyCam and ControlP5 libraries) is below:
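Before the Processing sketch itself, the low-frequency observation can be made concrete: a word checked out only once inherits the temperature of that single checkout, while a frequent word averages out toward the yearly mean. A quick illustration in Python (made-up temperatures, not values from the dataset):

```python
def average_temperature(temps):
    """Average temperature (°C) over all checkouts of a word."""
    return sum(temps) / len(temps)

# A one-off word keeps the temperature of its single checkout,
# so it lands exactly on that day's position in the Date-Temperature plane:
rare = average_temperature([2.0])
# A frequent word is pulled toward the middle of the temperature range:
common = average_temperature([2.0, 12.0, 25.0, 11.0])
```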