3D Visualization

chantelchan
Posts: 8
Joined: Wed Apr 12, 2017 5:15 pm

Re: 3D Visualization

Post by chantelchan » Thu Feb 22, 2018 3:30 pm

Concept
I wanted to show the different kinds of books related to food that the Seattle Public Library has to offer. "Food" has its own category under the Dewey Decimal System, 641. In addition, the digits after the decimal point specify which subgroup a book belongs to. By following this catalog, I was able to narrow the books down into 6 categories:
  • Gastronomy, Epicurism, Nutrition
  • Drinks
  • Food History, Preservation
  • Cooking
  • Cooking Specific Dishes, Techniques
  • Miscellaneous
I took inspiration for my animated visualization from agar.io, a game where you eat other cells to become bigger. This is a fun and appropriate concept to apply to the topic I am working on: food!

Query
In my query, I wanted to retrieve the deweyClass of each title and the number of checkouts it has per year. After exporting to .csv, I took the yearly checkout counts and computed each title's average and standard deviation. The average shows the typical number of checkouts per year, and the standard deviation shows how much the yearly counts vary around that average.

Code: Select all

SELECT 
    title,
    MOD(deweyClass, 641) AS Dewey,
    COUNT(bibNumber) AS Counts,
    SUM(CASE
        WHEN (YEAR(cout) = 2006) THEN 1
        ELSE 0
    END) AS '2006',
    SUM(CASE
        WHEN (YEAR(cout) = 2007) THEN 1
        ELSE 0
    END) AS '2007',
    SUM(CASE
        WHEN (YEAR(cout) = 2008) THEN 1
        ELSE 0
    END) AS '2008',
    SUM(CASE
        WHEN (YEAR(cout) = 2009) THEN 1
        ELSE 0
    END) AS '2009',
    SUM(CASE
        WHEN (YEAR(cout) = 2010) THEN 1
        ELSE 0
    END) AS '2010',
    SUM(CASE
        WHEN (YEAR(cout) = 2011) THEN 1
        ELSE 0
    END) AS '2011',
    SUM(CASE
        WHEN (YEAR(cout) = 2012) THEN 1
        ELSE 0
    END) AS '2012',
    SUM(CASE
        WHEN (YEAR(cout) = 2013) THEN 1
        ELSE 0
    END) AS '2013',
    SUM(CASE
        WHEN (YEAR(cout) = 2014) THEN 1
        ELSE 0
    END) AS '2014',
    SUM(CASE
        WHEN (YEAR(cout) = 2015) THEN 1
        ELSE 0
    END) AS '2015',
    SUM(CASE
        WHEN (YEAR(cout) = 2016) THEN 1
        ELSE 0
    END) AS '2016',
    SUM(CASE
        WHEN (YEAR(cout) = 2017) THEN 1
        ELSE 0
    END) AS '2017'
FROM
    spl_2016.inraw
WHERE
    (FLOOR(deweyClass) = 641) AND
    (MOD(deweyClass, 641) < 1)
GROUP BY bibNumber , deweyClass , title
ORDER BY Counts DESC
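
The averaging step happened after export. A minimal sketch of it in Python (the file name is hypothetical; the year columns come from the query above):

Code: Select all

import pandas as pd

# Hypothetical export of the query above.
df = pd.read_csv('food_books.csv')
year_cols = [str(y) for y in range(2006, 2018)]

# Per-title average and standard deviation of the yearly checkout counts.
df['avg'] = df[year_cols].mean(axis=1)
df['std'] = df[year_cols].std(axis=1)
df.to_csv('food_books_stats.csv', index=False)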

Pictures
ILF_table.png
ILF1.png
Start
ILF2.png
Cooking, angled
ILF3.png
Specific Food
ILF4.png
Drinks
Animation
Each title is a point in space, characterized by these attributes:
1. Size: determined by the number of counts
2. Speed: determined by the average
3. Jitter: determined by the standard deviation

Each point's initial position is randomly generated.
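
A minimal sketch of how those three mappings could drive a node each frame (Python, with hypothetical scaling constants; the actual implementation is in the attached sketch):

Code: Select all

import random

class Node:
    def __init__(self, counts, avg, std):
        # Initial position is randomly generated inside a unit cube.
        self.pos = [random.uniform(-1, 1) for _ in range(3)]
        self.size = counts * 0.01    # 1. size from the number of counts
        self.speed = avg * 0.001     # 2. speed from the yearly average
        self.jitter = std * 0.001    # 3. jitter from the standard deviation

    def update(self, direction):
        # Drift along a direction at `speed`, plus random jitter.
        for i in range(3):
            self.pos[i] += direction[i] * self.speed
            self.pos[i] += random.uniform(-self.jitter, self.jitter)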

Improvements
When the mouse hovers over a node, that node may also be in front of or behind multiple other nodes. Their labels then overlap, making the information illegible. I was hoping to make a header that would display every node the cursor hovers over, along with each node's category, title, and count; a sketch of one way to collect those nodes follows.
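
One way to build that header (a sketch only, not the attached code) is to gather every node whose screen-space projection falls near the cursor and list them all in one place:

Code: Select all

def nodes_under_cursor(nodes, project, mouse_x, mouse_y, radius=10):
    # `project` maps a node's 3D position to screen coordinates,
    # e.g. a wrapper around screenX()/screenY() in Processing.
    hits = []
    for node in nodes:
        sx, sy = project(node)
        if (sx - mouse_x) ** 2 + (sy - mouse_y) ** 2 < radius ** 2:
            hits.append((node.category, node.title, node.counts))
    return hits  # draw these in a fixed header, not on the nodes themselves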

The information and functions that are calculated each frame are taxing on the processor. This makes the frame rate on my laptop very low, so the animation looks choppy and jagged, and the lag makes it difficult for the user to keep track of the nodes' movements. The only solution I can think of is to make my algorithm more efficient, so the information is processed faster.
iLoveFood.zip
(762.92 KiB) Downloaded 168 times

echotheohar
Posts: 4
Joined: Fri Jan 19, 2018 11:14 am

Re: 3D Visualization

Post by echotheohar » Tue Feb 27, 2018 2:22 pm

Concept:

I was interested in a visualization that would resemble a pin board for the purpose of side-by-side comparison. The pin board would be staggered back along its Z-axis but still maintain a side-by-side view. I also would have liked to incorporate some line relationships between the "swatches" (similar to the image below), but I was too unfamiliar with the HE_MESH documentation to isolate exactly where each dot was being drawn in the matrix, so I had to scrap that last aspect of the idea.

Image

This is how far I got (FRONT VIEW):
Image

And the SIDE VIEW:
Image

Discoveries:

I found out that the HE_MESH library (a Java library for creating and manipulating polygonal meshes) is incompatible with certain features of other libraries, and prevents some from loading at all! For example, HE_MESH breaks the HUD function in PeasyCam, leaving me unable to overlay text on the camera. It also affects zooming in orthographic view in P3D. I do not know enough about the HE_MESH documentation to isolate the problem completely, but I would not recommend using this library if you can help it! While working on a project unrelated to this class, I noticed that I could not even run the basic video library released by the Processing Foundation while I had HE_MESH installed.

View of the project in perspective view:
Image

Changes:

If I could do this project again, I would fix the spacing that's occurring in each of the boxes (whenever I attempt to shift along the Z-axis, it throws the X and Y axes off kilter), and I would probably redraw the matrix of dots by hand instead of using HE_MESH to generate it, as sketched below. That way I would have more control over the location of each dot, and hopefully be able to map relationships between the swatches along the Z-axis.
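
A minimal sketch of generating such a dot matrix by hand (Python, hypothetical spacing values), so every dot's location is known and dots can be linked across swatches along the Z-axis:

Code: Select all

def dot_matrix(cols, rows, spacing=10.0, z=0.0):
    # Key each dot by (col, row) so the same grid cell can be
    # connected across swatches at different Z depths.
    return {(c, r): (c * spacing, r * spacing, z)
            for c in range(cols) for r in range(rows)}

# One swatch per Z layer, staggered back without disturbing X or Y.
swatches = [dot_matrix(20, 20, z=-100.0 * layer) for layer in range(4)]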
Attachments
Echo3D.zip
(9.47 KiB) Downloaded 151 times

bensonli
Posts: 4
Joined: Fri Jan 19, 2018 11:04 am

Re: 3D Visualization

Post by bensonli » Tue Feb 27, 2018 3:12 pm

Dynamical Representation of Supply and Demand
Concept
For this project, I wanted to map statistics from my dataset to the parameters of a physical system. The specific statistics are:
1. The current supply in the library of each "title" that I am considering.
2. The time difference between when a specific book is checked into the library and when it is checked out again.

These values are mapped, respectively, to:
1. The strength of a 1/r radial field.
2. The strength of a 1/r circulation field, which also sets the color of the particles.

This field dictates the motion of many particles inside a box. Because the supply and time-difference values change over time, the particles' motion is governed by a time-varying acceleration field.

To summarize, the behavior of the particles in relation to supply and demand is roughly (see the sketch after this list):
1. As supply goes up, the strength of the radial field increases and pulls the particles into the center, making the system more ordered.
2. As supply goes down, the radial strength decreases, so particles are able to explore a greater region of the cube. To emphasize this, I added a spring-like noise field.
3. As demand goes up, circulation goes up, and the system circulates faster. Note that I use an exponentially smoothed time_diff rather than an average or instantaneous one.
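
A minimal sketch of that field in Python (hypothetical constants; the real implementation lives in the attached Processing sketch):

Code: Select all

import numpy as np

def acceleration(p, supply, demand, k_radial=1.0, k_circ=1.0):
    # p is the particle position (NumPy array) relative to the box center.
    r = np.linalg.norm(p) + 1e-6
    radial = -k_radial * supply * p / r**2        # 1/r pull toward the center
    tangent = np.cross([0.0, 0.0, 1.0], p / r)    # direction of circulation
    circ = k_circ * demand * tangent / r          # 1/r circulation field
    return radial + circ

def smooth(prev, new, alpha=0.1):
    # Exponentially smoothed time_diff used as the demand signal.
    return alpha * new + (1 - alpha) * prev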


Work in Progress
Sql Query

Code: Select all

SELECT 
    itemNumber, title, cout, cin
FROM
    spl_2016.inraw
WHERE
    (deweyClass < 5.121 AND deweyClass > 5.09) -- all same except (6.451,6.29) (4.671,4.59) (5.134,5.132)
ORDER BY itemNumber
Python script
This Python script takes the CSVs generated by the SQL query and converts them into a time_diff array and an influx array for each title. The steps are:
1. Shift the checkin column down one row relative to checkout.
2. Calculate the time difference across each row.
3. Filter out negative time diffs and time diffs over 150 (to prevent outliers from significantly affecting the data).
4. Convert each checkout and checkin time to a relative position in an array.
5. Create the influx array by adding +/-1 at the relative positions from step 4.
6. "Hash" the time_diffs from step 3, with the hash being the value from step 4, then average element-wise by the number of checkouts per day.
7. Along the way, also calculate the max supply for each title, as well as a global max supply and global max/min time_diffs.

Code: Select all

import datetime
import pandas as pd
import numpy as np

path1='/home/benson/Dropbox/Code/Projects/Mat259_3D/raw_data/programming_languages.csv'
path2='/home/benson/Dropbox/Code/Projects/Mat259_3D/raw_data/networking.csv'
path3='/home/benson/Dropbox/Code/Projects/Mat259_3D/raw_data/machine_learning.csv'
path4='/home/benson/Dropbox/Code/Projects/Mat259_3D/raw_data/software.csv'

paths = [path1, path2, path3, path4]


def data_extractor(path, number_of_days=4500):
    number = number_of_days
    df = pd.read_csv(path)
    # Shift cin down one row so each cout pairs with the previous checkin.
    # This relies on the CSV being ordered by itemNumber; the first row of
    # each group (which inherits the previous item's cin) is dropped here.
    df['cin'] = df['cin'].shift(1)
    shifted_df = df.groupby('itemNumber').apply(lambda group: group.iloc[1:])


    ################################################################
    def stringToDatetime(string):
        return datetime.datetime.strptime(string, '%Y-%m-%d %H:%M:%S')

    shifted_df['cout_date']= shifted_df['cout'].apply(stringToDatetime)
    shifted_df['cin_date']= shifted_df['cin'].apply(stringToDatetime)
    shifted_df['time_diff_date']= shifted_df['cout_date']- shifted_df['cin_date']

    shifted_df['time_diff']= shifted_df['time_diff_date'].apply(lambda date: date.days)
    ################################################################
    ## Creating a new dataframe for better readability
    filtered_df = shifted_df[shifted_df.time_diff>0]
    cutoff = 150
    def frac_over(series):
        over_sum =0
        for s in series:
            if s>cutoff:
                over_sum += 1
        total = len(series)
        return over_sum/total

    # print the fraction to make sure we aren't cutting off too much
    print("If this fraction is too high, increase the cutoff variable (current value = {}) in this script: {}".format(cutoff, frac_over(filtered_df.time_diff)))
    filtered_df = filtered_df[filtered_df.time_diff<cutoff]
    max_time_diff = filtered_df['time_diff'].max()
    print("Max time diff: {}".format(max_time_diff))
    min_time_diff = filtered_df['time_diff'].min()
    print("Min time diff: {}".format(min_time_diff))
    supply = len(list(filtered_df['itemNumber'].unique()))
    print("Max supply: {}".format(supply))
    ################################################################
    def base_diff(string):
        start= datetime.datetime(2006,1,1,0,0)
        end = datetime.datetime.strptime(string, '%Y-%m-%d %H:%M:%S')
        diff=end-start
        return diff.days

    filtered_df['cout_time']= filtered_df['cout'].apply(base_diff)
    filtered_df['cin_time']= filtered_df['cin'].apply(base_diff)
    # get max time to make sure there is no error
    print("latest checkout time: {}".format(filtered_df['cout_time'].max()))
    print("latest checkout time: {}".format(filtered_df['cin_time'].max()))
    ################################################################
    def checkout_array():
        a=np.zeros(number, dtype ='float64')
        for time in filtered_df.cout_time:
            a[time] +=1
        return a
    def checkin_array():
        a=np.zeros(number, dtype = 'float64')
        for time in filtered_df.cin_time:
            a[time] +=1
        return a
    def timediffs_array():
        a = np.zeros(number, dtype='float64')
        # zip pairs each checkout day with its time_diff directly; indexing
        # with an enumerate counter fails here because the grouped frame
        # does not have a sequential 0..n-1 index.
        for time, diff in zip(filtered_df.cout_time, filtered_df.time_diff):
            a[time] += diff
        return a

    checkouts = checkout_array()
    checkins = checkin_array()
    time_diffs = timediffs_array()
    final_time_diffs = np.divide(time_diffs, checkouts,
            out= np.zeros_like(time_diffs), where=(checkouts!=0))
    # checking with my jupyter notebook
    print("Average checkouts per day: {}".format(checkouts.mean()))
    print("Unnormalized time diffs average: {}".format(time_diffs.mean()))
    print("Normalized time diffs average: {}".format(final_time_diffs.mean()))

    influx = checkouts - checkins
    return influx, final_time_diffs, supply, max_time_diff, min_time_diff

number_of_days = 4500
dataset= np.zeros((number_of_days,2*len(paths)), dtype='float64')

if __name__ == '__main__':
    global_max_time_diff = 0
    global_max_supply = 0
    global_min_time_diff = float('inf')  # start high so the first minimum sticks
    for i,item in enumerate(paths):
        print("Logs for title {}".format(i))
        influx, final_time_diffs, supply, max_time_diff, min_time_diff = data_extractor(item, number_of_days)
        dataset[:,2*i] = influx
        dataset[:,2*i+1] = final_time_diffs
################################################################
        if max_time_diff>global_max_time_diff:
            global_max_time_diff= max_time_diff
        if supply> global_max_supply:
            global_max_supply = supply
        if min_time_diff < global_min_time_diff:  # was comparing against the max
            global_min_time_diff = min_time_diff
################################################################
    print("Change the global_max_supply variable in the Mat259_3D pde file to: {}".format(global_max_supply))
    print("Change the global_max_time_diff variable in the Mat259_3D pde file to: {}".format(global_max_time_diff))
    print("Change the global_min_time_diff variable in the Mat259_3D pde file to: {}".format(global_min_time_diff))
    np.savetxt("dataset.csv", dataset, delimiter = ',')
        

Processing Code
v0.2: Initial system of box particles constrained to a box
3D_1.png
v0.3: changed from particles bouncing to a flowfield
3D_2.png
v0.4: noise was too big, so I turned it down
3D-3.png
v0.5: I changed from a velocity field to a flowfield
3D_4.png
v0.6: now I have the 4 systems set up
3D_5.png
v0.7: now each individual system is dependent on the data
3D_6.png
v0.8: increased max speed from 10 to 40, so particles can better resist the field
3D_7.png
v0.9: added labels, colors, buttons, and time lapse, but the buttons are not functional yet.
3D_8.png
v1.0: added background, framerate indicator, and changed colors and fonts of buttons. Buttons are now functional.
3D_9.png
Analysis
Although things look somewhat nice, the ability to distinguish high demand and low supply (which indicates that the library should perhaps buy more books in a genre) is lacking. One reason is my choice of dataset: because I chose categories rather than specific titles, the current supply stayed relatively stable, always above 90 percent. And because I had a max speed constraint, accelerations could not accumulate, which flattened the behavior of the particles. This is somewhat mitigated by the color encoding.
Attachments
Mat259_3D.zip
(8.82 MiB) Downloaded 150 times
Last edited by bensonli on Fri Mar 09, 2018 6:37 pm, edited 1 time in total.

admjahnke
Posts: 5
Joined: Fri Jan 19, 2018 11:07 am

Re: 3D Visualization

Post by admjahnke » Tue Feb 27, 2018 11:29 pm

Description:
This 3D visualization is meant to reflect all the different types of DIY (Do It Yourself) media offered through the Seattle Public Library.
The MySQL search was fairly basic but yielded over 5,000 results. I ended up editing out close to 500 of them, because those results did not contain a checkout date or checkout time (see the sketch after the query).

Code: Select all

SELECT
 bibNumber, itemType, COUNT(bibNumber) AS Counts
FROM
 spl_2016.inraw
WHERE
 title LIKE '%DIY%'
GROUP BY bibNumber , itemType
ORDER BY Counts DESC 
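
A minimal sketch of that cleanup step in pandas (assuming an export that includes the checkout timestamp column cout; the file name is hypothetical):

Code: Select all

import pandas as pd

df = pd.read_csv('diy_media.csv')   # hypothetical export
before = len(df)
df = df.dropna(subset=['cout'])     # drop rows with no checkout date/time
print('removed {} rows'.format(before - len(df)))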
Attached are some screenshots of the CSV file rendered in my Processing model.

The Y axis of the cube (marked 7am and 2pm) is the checkout time of the displayed media. The X axis covers the checkout years from 2006 to 2015. When the display is rotated, the Z axis of the model reflects the frequency of checkouts; checkouts for any given media top out at 155. There are a number of outlier dots: those dots come from rows in the Excel file that do not contain a date or a time. Red = books, green = DVDs, yellow = CDs, turquoise = books as well (jcbk).
Attachments
DIY_Project_3_AdamJahnke.zip
(5.56 MiB) Downloaded 144 times
Screen Shot 2018-02-27 at 9.57.37 PM.png
Screen Shot 2018-02-27 at 9.54.15 PM.png
Screen Shot 2018-02-27 at 9.54.00 PM.png
Screen Shot 2018-02-27 at 9.52.25 PM.png
Screen Shot 2018-02-27 at 9.51.56 PM.png

annikatan
Posts: 6
Joined: Fri Jan 19, 2018 11:03 am

3D Visualization: Feminism

Post by annikatan » Thu Mar 01, 2018 3:37 pm

CONCEPT
I followed the same theme of using terms like 'feminism' and 'feminist' from my first projects. I was inspired by the examples from the HE_Mesh library. My goal was to build a spherical harmonic, or a shape similar to one. Spherical harmonics are used in mathematics and the physical sciences; they are special functions that form a complete set of orthogonal functions defined on the surface of a sphere.
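
For reference, the spherical harmonics can be written as:

Code: Select all

Y_l^m(\theta, \varphi) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}}\; P_l^m(\cos\theta)\, e^{i m \varphi}

where P_l^m is the associated Legendre polynomial; shapes like those in the HE_Mesh examples are typically drawn by modulating a sphere's radius with combinations of such terms.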

DATA ATTRIBUTES
12,388 records
elements (string) - categorized under Dewey classification
times (time) - item checkout time
dates (date) - item checkout date
counts (integer) - categorized under collection code
titles (string) - item title

X-axis: time
Y-axis: dates
Z-axis: counts
color: elements

QUERY
Duration: 0.962 sec / 59.221 sec

Code: Select all

SELECT title, bibNumber, collcode, deweyClass, TIME(cout) AS times, DATE(cout) AS dates
FROM spl_2016.outraw
WHERE
	(title LIKE "%feminism%") 
    OR (title LIKE "%feminist%")
LIMIT 50000;
ANALYSIS
My visualization builds 4 stems. It's difficult to detect relationships between the axes since the points look scattered at random. We can see that the most popular Dewey classification category is Social Sciences (golden orange). I also noticed that items without a Dewey class (None; white) are mostly concentrated toward the center.

IMPROVEMENTS
In the future, I would focus on months and years instead of the checkout time. Since a single item, or several items with the same title, can be checked out multiple times, I conceptually lose data points this way. My original SQL code keeps a count of checkouts per year-month; graphing the dataset on that information would let viewers see chronological changes in checkout spikes and dips, which could be related to popular feminist movements such as the Women's March. A sketch of that grouping follows.
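
A minimal sketch of that year-month grouping in pandas (the dates column comes from the query above; the file name is hypothetical):

Code: Select all

import pandas as pd

df = pd.read_csv('feminism.csv', parse_dates=['dates'])  # hypothetical export
# Checkout counts per year-month, for spotting spikes and dips.
monthly = df.groupby(df['dates'].dt.to_period('M')).size()
print(monthly)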
mat_proj3.zip
(234.71 KiB) Downloaded 141 times
Attachments
Screen Shot 2018-03-07 at 1.23.04 AM.png
Screen Shot 2018-03-07 at 1.17.35 AM.png
