4 Student Defined Final Project

glegrady
Posts: 138
Joined: Wed Sep 22, 2010 12:26 pm

4 Student Defined Final Project

Post by glegrady » Wed Jan 04, 2017 6:29 pm

Final Assignment Definition


Each student chooses their datasource. Anyone wanting to continue with the Seattle data can do so.

Software to be used is Processing.

Please provide us with any relevant information pertaining to your project, such as a brief concept description, sketch, SQL query, data, screenshots, analysis, and code.

As with the previous assignment, the three critical issues to address are:

Innovation in content: your query question and outcomes. How original, engaging, or unusual your query or your approach to it is, and how interesting the data is. The data has to be multivariate and granular (meaning a lot of data) so that we can see patterns forming in it.

Innovation in design/form : The design needs to go beyond our demos. Areas of exploration are in how you use space, form, colors, data organization, timing, interaction, coherence, elegance, etc.
Interaction with the Data: Each student selects their own data but you can also continue with the Seattle data if you want. Each student selects the topic of what the final project will address.

Computation: The third evaluation is the computational component. First of all, the code needs to work. Special consideration will be given to unusual, elegant expression that uses functions, algorithms, etc. that you can introduce to the class.
George Legrady
legrady@mat.ucsb.edu

sara.lafia
Posts: 5
Joined: Wed Jan 11, 2017 10:41 am

Sara Lafia Final Project *updated 3/16/17*

Post by sara.lafia » Wed Mar 08, 2017 9:29 pm

Concept
This project exploits the semantics of graduate student research by classifying theses and publications into research topics, based on their free-text descriptions. This visualization provides an alternative view of research trajectories at UCSB; rather than siloing graduate research by department, as most research is reported, a treemap visualization of topics is employed, showing thematic groupings over time.

Query
The data were obtained by scraping the Alexandria Digital Research Library using several Python libraries, including Beautiful Soup. This was done in the absence of a dedicated API or other means of batch downloading document metadata. Permission to access and publish the data online was obtained from the UCSB Library. The data are graduate student theses and dissertations that have been made available through an open access agreement. The fields obtained for each document are Title, Year, Author, Department, Degree Supervisor, and Description. The script returned records for the 1,730 theses and dissertations completed from 2011 to 2016, which were loaded into a CSV table.

Process
The first version of this treemap series showed change in publication counts over time by university department. This was not particularly insightful, as the treemap hierarchy makes larger departments, such as engineering, more visually salient, dwarfing smaller research units, such as art. LDA topic modeling was applied to the document descriptions, using the MALLET tool, to create thematic groupings. Topic counts between 30 and 100 were tested; the best fit was 70 topics. Each document is assigned to its most probable topic, and the topics are coded from 0 to 69. The area of each square in the treemap corresponds to the number of documents in that topic. A cluster tiling algorithm is applied to the topics to configure them into a planar, space-filling map, which changes based on a topic’s area over time. Tracking the movement of a square across the time slices is thus a proxy for assessing the change in thesis or dissertation counts for each of the 70 generated research topics.
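The assignment step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names and the toy `docs` list are hypothetical, and three topics stand in for the 70 that MALLET produced.

```python
from collections import Counter


def most_probable_topic(weights):
    """Return the index of the largest topic weight (ties go to the lowest index)."""
    return max(range(len(weights)), key=lambda i: weights[i])


def topic_counts_by_year(docs):
    """Map each year to a Counter of {topic index: number of documents}."""
    counts = {}
    for year, weights in docs:
        counts.setdefault(year, Counter())[most_probable_topic(weights)] += 1
    return counts


# Hypothetical (year, topic-weight vector) pairs; real vectors would have 70 entries.
docs = [
    (2011, [0.7, 0.2, 0.1]),
    (2011, [0.1, 0.8, 0.1]),
    (2012, [0.6, 0.3, 0.1]),
    (2012, [0.5, 0.4, 0.1]),
]
counts = topic_counts_by_year(docs)
```

Each per-year counter could then drive the area of the matching treemap cell for that time slice.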

Result
Several interesting trends are visible. Some research areas, such as work done on climate, appear to increase slightly in volume in more recent years. However, there is a lag in research, so such conclusions should be drawn with caution. The university library, particularly the IRC, would be interested in this visualization as one of several research outputs. Displaying this visualization in such a space would facilitate discovery of research happening across campus that transcends academic boundaries. Grouping the documents by topic, rather than department, is a first step toward this goal.

Next Steps
As this is related to my ongoing research on applying spatialization frameworks to non-spatial information, future work could track a variety of other relationships, such as advisor-student collaborations and cross-disciplinary collaboration. This visualization, while interesting as a high-level view of university-level productivity and trends, is not particularly granular, although the data obtained are far more detailed. Combining the department and topical views of research could allow for further exploration of individual documents, shown as cells clustered within the treemap.

Interface
export1.png
Subject Treemap - Climate
Code - updated (3/16/17)
SaraLafia_Final3.zip
Final Project Code
(1.47 MiB) Downloaded 52 times
Last edited by sara.lafia on Thu Mar 16, 2017 1:07 pm, edited 2 times in total.

wolfe
Posts: 5
Joined: Wed Jan 11, 2017 10:43 am

Re: 4 Student Defined Final Project

Post by wolfe » Thu Mar 09, 2017 2:44 pm

Concept
Visualize fiction titles using word2vec mapping for words in 3 dimensions, and graphing the titles on the resulting word cloud.

Training Data
I am using the google news word vectors described here:
https://code.google.com/archive/p/word2vec/
and accessible here:
https://github.com/mmihaltz/word2vec-GoogleNews-vectors
After that data is loaded, I continue training the model using gensim, a topic modeling library:
https://radimrehurek.com/gensim/
The resulting vectors are brought down to three dimensions using the t-SNE implementation in scikit-learn:
http://scikit-learn.org/stable/modules/ ... .TSNE.html

Next Steps
-Continue to work on analyzing the data
-Use a larger collection of book titles
-Potentially create a timeline and be able to see the process of books being sorted
-Continue to explore how to visualize this data
-Create lookup table for titles so it doesn't have to run each time
-Look at user interface, make it more intuitive

Progress Screenshots

"aunt" selected when viewing the word cloud
aunt_wordView.png
book selected when viewing the title cloud
aNovel_titleView.png
"fire" hovered over when viewing the title geometries
fire_geometry.png
fire selected when viewing the title geometries
fire_geometry_selected.png
there is an interesting divide in the titles when looking at genre
theSCIFI_MYSTERY_divide2.png
-------------------------------------------------------------------------------------------------------------------------------
Update:

Revised Concept
Visualize fiction titles using a word2vec mapping of the books' subjects in 3 dimensions, and graph the titles and subjects on the resulting word cloud.

Training Data
I am getting the fiction titles and subjects from Open Library:
https://openlibrary.org/data#bulk_download
I analyzed the books' subjects using LDA from the gensim Python library:
https://radimrehurek.com/gensim/models/ldamodel.html
I then trained the subject words on the google word2vec vectors described above.

Interaction
Each LDA defined subject is given a separate color which can be turned on and off.
The titles can be viewed in Title Mode, where the vectors can be turned on and off.
The word cloud can be used to see which books have subjects that include the selected word. The word subject information can be turned on and off.
The geometry mode is pretty, but slow and relatively useless.
When a point is hovered over, the words/titles appear, and the user can use the up and down arrow keys to cycle through them. The right and left arrow keys can be used to select a specific word/title and see the connections related to it.

Result
Overall, the project works best in title view when comparing two separate datasets, or in word view when seeing which books are similar. For example, in the title view, the "Fiction/Character/Fictitious/Mystery/Detective/Investigator" group (which I think of as the mystery category) deviates the most from all other groups, particularly from the "kids books" and "christian/life/friendship books". The "Fiction/State/Unit/Murder/War/Historical/Civil" group also separated itself a lot from the "Fiction/English/Women/Young/Jew/Lesbian" group and the "Fiction/Relationship/MANWOMAN,Travel,Person,Magic,Teenager" group. I see this as "War/Mystery/Suspense" separating from books that are targeted at older and younger women and focus more on relationships. The vector mode can be useful for seeing the divides visually.

The word view is interesting because you can see which words are associated with which topics and the relationships between them. For example, most books with "prairie" in the title fall under "christian/family" books, while "indians" is a combination of those books (pilgrim/indians) and "war/mystery" books (war/indians).

Final Screenshots
All the topics visualized at once
01-rainbow.png
Two topics shown
02-juvenile+myst2.png
The subject path of a title selected
03-title-selected.png
Two views of the same data with vectors turned on and off
04-state+women2.png
05-state+women.png
Two topics shown with vectors turned on
06-relation+state.png
Subjects associated with "prairies"
07-word-praries.png
Subjects associated with "indians"
08-word-indians2.png
Geometry view with indians selected
09-geometry-indians.png
Pretty, zoomed-in geometry view surrounding "needles"
10-geometry-needles-selected.png
11-geometry-needles.png
Source code
HannahWolfeFinalProject.zip
(23.12 MiB) Downloaded 52 times
Last edited by wolfe on Sat Mar 18, 2017 1:42 pm, edited 2 times in total.

jingyi_xiao
Posts: 5
Joined: Wed Jan 11, 2017 10:43 am

Re: 4 Student Defined Final Project

Post by jingyi_xiao » Fri Mar 10, 2017 11:48 am

I found an interesting dataset, the American Time Use Survey (ATUS, https://www.bls.gov/tus/), from the Bureau of Labor Statistics. I am going to explore the temporal and spatial characteristics of how Americans use their time.

Data (2015)
o 10905 interviewees
o basic info: age, sex, location (the state they live in), salary (weekly earnings; some records have no data because this is sensitive), the day of the week on which the interviewee was interviewed, and so on. (file name: info_2015)
o Time-use data (around 180,000 records): from 4am to 4am the following day, so 24 hours of activities. (file name: activity_2015)
All activities, such as sleeping and eating, are classified into groups, each with a unique code; for example, sleeping is 1.
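Because the diary day runs from 4am to 4am, it can help to put every time on one axis that starts at 04:00. A minimal sketch, assuming times arrive as "HH:MM" or "HH:MM:SS" strings (the actual file format may differ, and the function names are hypothetical):

```python
MINUTES_PER_DAY = 24 * 60


def minutes_since_4am(clock):
    """Convert 'HH:MM' or 'HH:MM:SS' to minutes elapsed since 04:00."""
    hours, minutes = (int(part) for part in clock.split(":")[:2])
    return ((hours - 4) % 24) * 60 + minutes


def activity_duration(start, stop):
    """Length of an activity in minutes, measured on the 4am-to-4am axis."""
    return (minutes_since_4am(stop) - minutes_since_4am(start)) % MINUTES_PER_DAY


bedtime = minutes_since_4am("23:30:00")           # late evening on the diary day
night_sleep = activity_duration("23:30", "03:00")  # spans midnight, stays inside the diary day
```

With this axis, an activity that crosses midnight no longer needs special handling when it is drawn.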

Visualization
The default setting is age 15-24, male, no children or elders, weekday, and no data on weekly earnings or working hours. Different colors distinguish different groups of data (the colors themselves carry no meaning; they were chosen at random).

Top view
picture1.png
Front view
pciture2.png
Check the "Label" and "transparency" checkboxes, then mouse over a vertex at the margin
picture3.png
Check "Annotation" checkbox
picture 4.png
Check the "Overview" checkbox (all data are displayed together. Overall it makes a nice pattern, but because of the large amount of data to render, it is really slow if you want to change angles. It also takes a lot of memory to render, so I don't suggest running this mode for a long time.)
picture5.png
Check different age, gender group to see different patterns.

Code
FinalProject_Jingyi.zip
(749.77 KiB) Downloaded 49 times
Last edited by jingyi_xiao on Sat Mar 18, 2017 9:56 am, edited 2 times in total.

brooksjaredc
Posts: 5
Joined: Wed Jan 11, 2017 10:39 am

Re: 4 Student Defined Final Project

Post by brooksjaredc » Fri Mar 17, 2017 10:31 am

The Life and Death of the Sun

For my final project I used data from my own research. I used the stellar evolution software MESA (http://mesa.sourceforge.net/) to simulate a 1-dimensional model of a star with the same mass as the Sun from its birth to its death. I recorded the burning and mixing regions in the model, along with the mass, radius, wind rate, surface temperature, and the elemental abundances.

I went through a few different visualizations before landing on my final idea. The first was based on cylinders. The center cylinder showed the mixing and burning regions by mass coordinate, with the main abundances as smaller cylinders on the sides that were colored by the burning rate of that element. The background was black if there was weak-to-no wind and white if there was a strong wind. There was also a sound aspect that I got rid of because it was really annoying.
Screen Shot 2017-03-14 at 10.32.11 AM.png
Screen Shot 2017-03-14 at 10.33.08 AM.png
Screen Shot 2017-03-14 at 10.33.35 AM.png
After going through a couple more ideas, I settled on the visualization shown below, where the mixing and burning regions are shown as rectangular boxes in mass coordinate from center on the right to surface on the left, and each timestep is laid along the y-axis. Since not all the data can fit in the visualization at one time without significant lag, I have the boxes moving towards the viewer and only render 200 timesteps at a time. The surface is capped by a yellow square to show the mass of the model as it changes through its evolution.
Screen Shot 2017-03-17 at 10.38.13 AM.png
Screen Shot 2017-03-17 at 10.38.40 AM.png
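The "only render 200 timesteps at a time" idea amounts to a sliding window over the timestep index. Here is a rough sketch of that bookkeeping; it is not taken from the project's Processing code, and `visible_range` is a hypothetical helper name:

```python
WINDOW = 200  # number of timesteps drawn at once, as described in the post


def visible_range(frame, total_steps, window=WINDOW):
    """Return the (start, stop) timestep indices to draw for the current frame.

    The window advances with the frame counter and is clamped so that it
    never runs past the last timestep of the model.
    """
    start = min(frame, max(total_steps - window, 0))
    return start, min(start + window, total_steps)


first = visible_range(0, 1000)    # the start of the star's life
last = visible_range(950, 1000)   # clamped to the final 200 timesteps
```

Drawing only `range(start, stop)` each frame keeps the per-frame cost constant no matter how long the full simulation is.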
I also have a sphere that sits above the boxes and changes its radius based on the radius of the model, and the surface temperature of the model is roughly mapped to the color of the sphere. There are tick marks to show the scale of the radius of the model.
Screen Shot 2017-03-17 at 10.39.17 AM.png
Screen Shot 2017-03-17 at 10.39.37 AM.png
I turned the elemental abundance cylinders into 2D bars on the HUD, but they work essentially the same way. The mapping from wind rate to background color is also the same.
Screen Shot 2017-03-17 at 10.40.13 AM.png
Screen Shot 2017-03-17 at 10.41.09 AM.png
The animation can be paused with the spacebar and sped up or slowed down with the right and left arrow keys. The HUD can also be toggled by pressing 'h'.
Screen Shot 2017-03-17 at 10.41.22 AM.png
Screen Shot 2017-03-17 at 10.41.43 AM.png
The visualization is a fun and engaging way to show people how a star like the Sun evolves through its life-cycle. I am happy with the way it turned out.
Screen Shot 2017-03-17 at 10.41.57 AM.png
Screen Shot 2017-03-17 at 10.42.29 AM.png
Screen Shot 2017-03-17 at 10.42.45 AM.png
Here is the code:
kipp_hills.zip
(1.51 MiB) Downloaded 44 times

ariellalgilmore
Posts: 7
Joined: Wed Jan 11, 2017 10:40 am

Re: 4 Student Defined Final Project

Post by ariellalgilmore » Fri Mar 17, 2017 2:24 pm

Trends of the Top Crunchbase People

For my final project, I decided to take the top ten people from https://www.crunchbase.com/app/search/people and show their Google Trends data. Crunchbase "is the leading platform to discover innovative companies and the people behind them." I had an internship at an equity crowdfunding company (similar to Kickstarter) that helped me discover this website. For each person listed, Crunchbase also lists the primary company they are related to. I then used Google Trends to see how often each person and their company were searched over the past 5 years. This gave a weekly report of how much each person/company was being googled.

After collecting all this data, I was unsure what type of visualization I wanted to do, so I looked at openprocessing.org for inspiration. I started with a few ideas that I thought would give an interesting perspective, but had a hard time transforming them into a 3D visualization. I struggled a lot with this, but eventually came across one that I thought would show the information best and in a unique way. I used this idea https://www.openprocessing.org/sketch/409820 and was able to distort it into my own creation.

Visualization:
Each person has their own shape created from different quads. The length of each quad depends on how many times that person was searched in that given week. As you can see in the image below, Mark Zuckerberg was searched a lot during a period in 2012, while Steve Jobs was searched a lot consistently from 2012 to 2017. Each quad in the shape is also rotated at an angle that depends on how often the person's company was searched. Because this is hard to see and mainly adds an interesting effect to the shape, I also added color to reinforce how often the company was being searched. Again looking at Mark Zuckerberg, one can see that Facebook was being googled a lot more in 2012 than in 2017. The reason Microsoft is all pink is probably that people are constantly trying to figure out how to use or download Microsoft software on their computers. In the first image, I used pink and blue because I liked those colors the most.
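The encoding described above can be sketched as two small mappings: one from a person's weekly search interest to quad length, and one from the company's interest to a color between blue and pink. This is only an illustration in Python, not the sketch's Processing code; the function names, the exact RGB values, and `max_length` are assumptions, and the 0-100 input range is how Google Trends reports relative interest.

```python
BLUE = (0, 0, 255)       # assumed color for "company rarely searched"
PINK = (255, 105, 180)   # assumed color for "company searched a lot"


def remap(value, in_lo, in_hi, out_lo, out_hi):
    """Linearly rescale value from [in_lo, in_hi] to [out_lo, out_hi]."""
    return out_lo + (value - in_lo) * (out_hi - out_lo) / (in_hi - in_lo)


def lerp_color(c0, c1, t):
    """Blend two RGB colors; t=0 gives c0 and t=1 gives c1."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def quad_style(person_score, company_score, max_length=200.0):
    """Return (length, color) for one week's quad from two 0-100 scores."""
    length = remap(person_score, 0, 100, 0.0, max_length)
    color = lerp_color(BLUE, PINK, company_score / 100)
    return length, color


quiet_week = quad_style(50, 0)     # mid-length quad, fully blue
peak_week = quad_style(100, 100)   # longest quad, fully pink
```

Using position for one variable and color for the other is what lets the person's and the company's trends be compared in a single shape.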

Controls:
I came across a few other colors that also helped show the data, so I decided to add a slider that lets the viewer choose what colors they want to see. In the second image, I set the values to zero, which creates a brighter red when the company is searched more and a dark red when the company is not searched as much. The rotate button at the top rotates each person's shape, so quads that are obstructed at the current angle can become more visible. Also, pressing the "f" key will fill each shape, which might give the viewer a clearer perspective, and pressing "d" will reset the fill. Any other key will rotate the entire image, so one can view it from the side or back to get a different perspective.
Screen Shot 2017-03-16 at 2.12.03 PM.png
Screen Shot 2017-03-16 at 2.12.30 PM.png
Screen Shot 2017-03-16 at 2.13.13 PM.png
Analysis:
I thought it was really interesting to compare the shapes and colors across all of the top ten Crunchbase people. Some people are searched a lot exactly when their company is searched a lot, while others are searched a lot all the time even though their company is only searched occasionally. For example, Mark Zuckerberg was searched a lot in 2012, and so was Facebook. This was the moment when Facebook surpassed 1 billion users. The shape becomes larger again in 2016 and is still on the pink side, because this is when they announced Facebook Live. On the other hand, from 2012 to 2017 Steve Jobs seems to be searched a lot consistently, because the shapes are larger, but a large majority of them are colored blue, suggesting that Apple is not being searched that often. The few moments when the shapes are bright pink are, I assume, when Apple is announcing or releasing new products. People with smaller shapes, such as Evan Spiegel (Snapchat), were not popular until closer to 2017.
final8.zip
(12.35 KiB) Downloaded 37 times

freeberg
Posts: 5
Joined: Wed Jan 11, 2017 10:39 am

Re: 4 Student Defined Final Project

Post by freeberg » Sat Mar 18, 2017 9:42 am

WikiViz: Visualizing ~16,000 Wikipedia articles by their text-based Principal Components.

The visualization opens with each Wikipedia article represented as a sphere in 3D space. Each sphere is positioned by its 1st, 2nd, and 3rd Principal Components. The remaining Principal Components determine the size and color of the sphere. Interestingly, under certain conditions, 97% of all Wikipedia articles can be linked back to Philosophy by recursively clicking the first hyperlinked article in the main body. The user can click the numbered buttons at the top to remove "layers" of articles from the visualization. A "layer" is how many steps the article is from Philosophy, so Philosophy is at layer 0, Empiricism is at layer 1, and so on. The drop-down text box allows the user to select an article by title and see its path traced back to Philosophy. An article at layer 2 will have a shorter path than an article at layer 6. Many of these paths are logical, for example:

Philosophy -> Empiricism -> Aristotelianism -> Thomism -> Value (ethics) -> Wrongdoing -> Murder

But some paths are more surprising:

Philosophy -> Cubism -> Indigenous People of the Americas -> Native Americans -> Washington Redskins Name Controversy
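The path and layer of an article can be recovered by following first links until Philosophy is reached. A minimal sketch, assuming the scraper's output has been reduced to a dictionary from each article to its first linked article; the `first_link` table below is toy data copied from the first example path above, not real scraper output, and the function name is hypothetical:

```python
def path_from_philosophy(title, first_link, root="Philosophy"):
    """Follow first links from `title` up to `root` and return the path root-first.

    Returns None if the chain hits a dead end or loops before reaching root.
    """
    path, seen = [title], {title}
    while path[-1] != root:
        nxt = first_link.get(path[-1])
        if nxt is None or nxt in seen:
            return None
        path.append(nxt)
        seen.add(nxt)
    return path[::-1]


# Toy data: article -> the article its first link points to.
first_link = {
    "Murder": "Wrongdoing",
    "Wrongdoing": "Value (ethics)",
    "Value (ethics)": "Thomism",
    "Thomism": "Aristotelianism",
    "Aristotelianism": "Empiricism",
    "Empiricism": "Philosophy",
}

path = path_from_philosophy("Murder", first_link)
layer = len(path) - 1  # number of steps away from Philosophy
```

The loop check matters because first-link chains on Wikipedia can cycle without ever reaching Philosophy.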

There are some other tidbits of interactivity: The "HELP" button at the bottom right will give the user some background information and the instructions to interact with the visualization. Pressing "1" will remove the axes, "2" removes the HUD, "3" inverts all colors, "4" brings up the help menu.

Overall I am happy with the visualization. There was actually a happy accident in the data collection! I misplaced a step in a recursive function within the Python web scraper and it went far deeper into Wikipedia than I planned. However, the extra depth showed some very long and interesting paths like the two listed above. Also, all the text rotates to be easily readable for the user. So even if the user flips and rotates the visualization, all the information is easily ingestible.

If I could go back and redo the project or plan for a second iteration, there are some aspects I would change. Firstly, I had to cut a corner and run PCA on batches at a time because my computer could not handle it all at once. So it would be advisable to buy some space on AWS and do the PCA there. That may show more distinct “neighborhoods” of articles in the visualization.
Attachments
colorsInverted.png
helpMenu.png
rollOverLabel.png
murderPath.png
main.png
wikiViz.zip
code + data
(1.29 MiB) Downloaded 29 times

christopherchen0
Posts: 5
Joined: Wed Jan 11, 2017 10:44 am

Re: 4 Student Defined Final Project

Post by christopherchen0 » Mon Mar 20, 2017 3:40 pm

My goal for this project was to look at interesting ways to create connections within a set of data. I knew from the start that I wanted to do something with music, as I have a somewhat familiar relationship with it. I turned towards Spotify – a music streaming/social media platform – because it had an interesting feature I had previously taken for granted when using the application: the related artists tab.

When you look at an artist’s page in Spotify, you might notice a feature called “Related artists” in which the application compiles a list of 20 artists that are connected to the original musician – handy for finding new music that you might like. According to their posts online, these connections are made through user-created playlists. If Artist A is placed in many different playlists with Artist B, those links are eventually created. (Of course there are some odd recommendations that may be made in such a system – one user complained about how their calm Christian rock was apparently similar to death metal.) For our purposes we’re using a well-known artist, Radiohead, so such anomalies might not be created.

First, the gathering of the data: I wrote some JavaScript to gather the information. I knew it had to be in JSON format and I knew I would have to make recursive searches – one query would lead to 20 more, added to the end of an array. From those 20 new results, I would select the top one and get 20 more, appended to the end (array.push). I found fetch() hugely useful. Here’s the code:

Code:

<!DOCTYPE html>
<html>

<head>
    <title>Display Name</title>
</head>

<body>

    <script>
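        // Number of related-artist queries completed so far.
        var count = 0;
        // Flat list of every artist object collected, in discovery order.
        var dataArr = [];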
        var firstEntry = {
          "external_urls": {
              "spotify": "https://open.spotify.com/artist/4Z8W4fKeB5YxbusRsdQVPb"
          },
          "followers": {
              "href": null,
              "total": 2016328
          },
          "genres": ["alternative rock", "indie rock", "melancholia", "permanent wave", "rock"],
          "href": "https://api.spotify.com/v1/artists/4Z8W4fKeB5YxbusRsdQVPb",
          "id": "4Z8W4fKeB5YxbusRsdQVPb",
          "images": [{
              "height": 640,
              "url": "https://i.scdn.co/image/afcd616e1ef2d2786f47b3b4a8a6aeea24a72adc",
              "width": 640
          }, {
              "height": 320,
              "url": "https://i.scdn.co/image/563754af10b3d9f9f62a3458e699f58c4a02870f",
              "width": 320
          }, {
              "height": 160,
              "url": "https://i.scdn.co/image/4067ea225d8b42fa6951857d3af27dd07d60f3c6",
              "width": 160
          }],
          "link": null,
          "name": "Radiohead",
          "popularity": 76,
          "type": "artist",
          "uri": "spotify:artist:4Z8W4fKeB5YxbusRsdQVPb"
        }

        dataArr.push(firstEntry);

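        // Fetch the artists related to `id`, append them to dataArr, then
        // continue with the next unvisited entry in dataArr.
        function getData(id) {
            console.log("fetching related artists for " + id);
            fetch("https://api.spotify.com/v1/artists/" + id + "/related-artists")
                .then(function(response) {
                    return response.json();
                })
                .then(function(SpotifyData) {
                    var infoLength = SpotifyData.artists.length;
                    for (var infoIndex = 0; infoIndex < infoLength; infoIndex++) {
                        // Record which artist this entry was reached from.
                        SpotifyData.artists[infoIndex].link = id;
                        dataArr.push(SpotifyData.artists[infoIndex]);
                    }
                    count++;
                    // Query the first 421 entries (1 + 20 + 400 when every
                    // query returns 20 artists).
                    if (count < 421) {
                        getData(dataArr[count].id);
                        console.log(dataArr);
                    }
                })
                .catch(function(error) {
                    // Surface network or parsing failures instead of stopping silently.
                    console.error(error);
                });
        }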

        getData(firstEntry.id);
    </script>
</body>

</html>

For the design of the project, I played around with different ways of representing the correlation. The one I showed on Thursday was a failure, though an…interesting attempt. My hesitation to use lines connecting data points was that it felt too simple – and not intricate enough. But by playing through enough different designs, I eventually settled on one I found challenging and intricate enough.
spotifynetwork2.png
Mostly, though, I just wanted to see what the thing would look like.
I settled on a conical design. Each data point, which spawned 20 further searches, would be the pointy top of a cone. 20 lines radiated out from that one point, before drawing each of the other cones. I decided to create 4 generations of searches: the first generation, which was just Radiohead. That gave me the second generation, 20 artists that were similar to Radiohead. The third gave me 400, and the fourth gave me 8000.
spotifynetwork3.png
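The generation sizes follow directly from the fan-out of 20. The sketch below reproduces that arithmetic and the shape of the crawl, with a stub standing in for the Spotify request; `fake_related` and the other names are hypothetical and nothing here touches the real API.

```python
FAN_OUT = 20  # related artists returned per query, as described in the post


def generation_sizes(generations, fan_out=FAN_OUT):
    """Number of artists in each generation when every query returns fan_out results."""
    return [fan_out ** g for g in range(generations)]


def crawl(seed, related, max_queries):
    """Breadth-first crawl: query entries in discovery order, appending results.

    Like the JavaScript listing in the post, this does not remove duplicates.
    """
    data = [seed]
    for count in range(max_queries):
        if count >= len(data):
            break
        data.extend(related(data[count]))
    return data


def fake_related(artist):
    """Stub for the related-artists request: 20 made-up names per artist."""
    return [artist + "." + str(i) for i in range(FAN_OUT)]


sizes = generation_sizes(4)        # [1, 20, 400, 8000]
queries_needed = sum(sizes[:-1])   # 421 queries produce all four generations
entries = crawl("Radiohead", fake_related, queries_needed)
```

The 421 matches the cutoff in the listing: querying generations one to three is what yields the 8000 entries of the fourth.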
To finish it off, some final touches: I created node points between the lines to indicate the artists. The data itself came with a host of useful information: artist names, image URLs, genres, follower counts, popularity, etc. Each node's transparency correlates to the artist's follower count on Spotify, and the point size corresponds to the artist's current popularity, according to Spotify. Then came the roll-over effects. I displayed the artist name on hover, in addition to genre(s), follower count, and the artist's primary image in Spotify.
spotifynetwork7.png
NOTE/TIP: When navigating my project, it might be very handy to have a mouse with a scroll wheel…using peasycam’s controls, it might be best to CLICK DOWN MIDDLE CLICK AND DRAG in order to pan in a given direction.

spotifynetwork4.png
Attachments
spotifynetwork.zip
(1.46 MiB) Downloaded 24 times

kschlesi
Posts: 6
Joined: Wed Jan 11, 2017 10:42 am

Re: 4 Student Defined Final Project

Post by kschlesi » Wed Mar 22, 2017 10:56 pm

3D Final Project: Capital Bike Flow

Concept: With this project, I chose to explore the visualization of transportation data. I was specifically interested in finding a design to highlight interesting patterns in the flow of traffic between different points, without relying on geographical locations and routes. I chose to use publicly available data from Capital Bikeshare, Washington, DC’s bike share system (http://www.capitalbikeshare.com).

Data: Obtaining the data for this project was as simple as downloading some zipped files from the Capital Bikeshare website. However, since this data simply consisted of large .csv files listing the characteristics of millions of bike trips, figuring out the best ways to work with the data took more effort. I ended up creating my own locally hosted MySQL database so that I could store the data and design queries. Although it took some time to resolve formatting inconsistencies in the original data, this setup made the analysis much easier.

After looking through multiple years of data, I decided to focus on data from about 2.5 million bike rides from Jan - Sep 2016. It includes the start and end times, duration, start and end bike stations, bike ID, and account type of the user for each ride. Eventually I would like to compare data from different years and especially from different cities, but the work involved in cleaning and processing data from other cities wasn't realistic for this project.

Process: I treated the data as a network of 392 bike stations and the routes between them, and used network properties to determine the 3D layout. First, I used a force-directed graph layout algorithm (implemented with the graphviz algorithms found in the python graph module "networkx") to find the station positions in 2D, keeping each pair of stations closer together if they had more total rides between them. To determine the 3rd coordinate of each station, I tried a few different graph metrics that quantify the role of each node in the graph. These choices create interesting 3D shapes for the city based on its bike riding patterns.

My initial idea was to show each bike ride separately as its own data point in the visualization of the bike routes between stations, but the variation in number of rides between each pair of stations, and the sheer number of rides overall, made this too slow and unwieldy. Instead, I found the average properties of the rides on each route by month, and drew a curve representing each route, whose shape was determined by these properties. Since the many curves in the graph caused a lot of visual clutter, I made them mostly translucent, and only plotted those bike routes that had at least 200 total rides. I added interactivity that allows the user to highlight a station’s routes by mousing over it, and to make the other stations invisible.
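The per-route monthly averaging and the 200-ride threshold could look roughly like this. It is a sketch under assumptions: rides are taken to be (start station, end station, month, duration in seconds) tuples, routes are treated as directed, and `summarize_routes` is a hypothetical name rather than one of the project's scripts.

```python
from collections import defaultdict


def summarize_routes(rides, min_total=200):
    """Return {(route, month): (ride count, mean duration)} for busy routes only.

    A route is kept when its total ride count over all months is at least
    min_total.
    """
    monthly = defaultdict(lambda: [0, 0.0])  # (route, month) -> [count, summed duration]
    totals = defaultdict(int)                # route -> rides over all months
    for start, end, month, duration in rides:
        route = (start, end)
        monthly[(route, month)][0] += 1
        monthly[(route, month)][1] += duration
        totals[route] += 1
    return {
        key: (count, summed / count)
        for key, (count, summed) in monthly.items()
        if totals[key[0]] >= min_total
    }


# Toy data with a threshold of 3 instead of 200.
rides = [
    ("A", "B", 1, 600),
    ("A", "B", 1, 900),
    ("A", "B", 2, 300),
    ("C", "D", 1, 100),
]
summary = summarize_routes(rides, min_total=3)
```

Aggregating before loading the sketch is also what keeps the number of drawn curves, and so the loading time, manageable.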

Overall, I am fairly happy with the way the visualization turned out and how I was able to capture different aspects of station and route relationships with 3D positioning. The hardest parts of this project were working out issues with the display and camera, and preprocessing the data: even after performing MySQL queries, I had to write several python scripts to perform the clustering analysis, and to re-format and threshold the data in order to reduce the sketch loading time.

Final Visualization:

Each bike station is represented by a white or gray dot, whose 3D position is determined by its properties in the network or graph of bike rides. First, the stations are clustered in 2D, using a force-directed graph layout algorithm. This method simulates the dots as if they were connected by mechanical forces such as springs, and arranges them to satisfy the forces by minimizing the spring energy. I used the number of rides between each pair of stations to set the force strengths; this means that stations with more rides between them will tend to be placed closer together, and those with very few rides or no rides between them will be pushed further apart. Running this algorithm provides the 2D positions (as seen from the top view).
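To make the spring picture concrete, here is a small pure-Python force simulation. It is a simplified stand-in, not the graphviz/networkx layout actually used: pairs are pulled together by a spring whose strength is proportional to their ride count, and every pair is pushed apart by a weak repulsion. The function names, parameter values, and toy ride counts are all assumptions.

```python
import math


def spring_layout(nodes, rides, iterations=500, step=0.1, repulsion=0.05):
    """Place nodes in 2D so that pairs with more rides end up closer together."""
    max_rides = max(rides.values())
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        n: [math.cos(2 * math.pi * i / len(nodes)), math.sin(2 * math.pi * i / len(nodes))]
        for i, n in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                weight = rides.get((a, b), rides.get((b, a), 0)) / max_rides
                # Spring pull grows with distance; repulsion fades with distance.
                pull = weight * dist - repulsion / dist
                fx, fy = pull * dx / dist, pull * dy / dist
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        for n in nodes:
            pos[n][0] += step * force[n][0]
            pos[n][1] += step * force[n][1]
    return pos


def distance(pos, a, b):
    """Euclidean distance between two placed nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])


# Toy network: A and B share many rides, C is only weakly tied to both.
layout = spring_layout(["A", "B", "C"], {("A", "B"): 1000, ("A", "C"): 100, ("B", "C"): 100})
```

In this toy case the heavily connected pair settles much closer together than either does to the third station, which is the behavior the top view relies on.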

The height (third position coordinate) of each station is then determined by either the station's "degree of connectivity" or its "betweenness centrality." (Pressing "d" and "c" will switch between these options.) Stations with a higher degree of connectivity have routes between them and many other stations, while those with a low degree only have routes to a few other stations. Stations with a high betweenness centrality are on the most efficient paths of travel through the network, and are important "links" between different areas of the network. Examples of these are the stations right on the river between DC and Virginia, or other places where there are only a few paths between different clusters of stations.
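Both height metrics can be computed from an adjacency list. The sketch below uses Brandes' algorithm for betweenness on an unweighted, undirected graph; it ignores ride counts, is unnormalized, and is not the project's own analysis script. The five-station network is made up to mimic a river crossing.

```python
from collections import deque


def degree(graph):
    """Number of stations each station has a route to."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def betweenness(graph):
    """Unnormalized betweenness centrality (Brandes' algorithm, unweighted)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from source.
        order = []
        preds = {node: [] for node in graph}
        paths = {node: 0 for node in graph}
        dist = {node: -1 for node in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Back-propagate path dependencies from the farthest nodes inward.
        dependency = {node: 0.0 for node in graph}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


# Hypothetical network: two clusters joined only through "River".
graph = {
    "VA1": ["VA2", "River"],
    "VA2": ["VA1", "River"],
    "River": ["VA1", "VA2", "DC1", "DC2"],
    "DC1": ["River", "DC2"],
    "DC2": ["River", "DC1"],
}
deg = degree(graph)
btw = betweenness(graph)
```

Here "River" scores highest on both metrics, but in a larger network a bridge station can have a modest degree and still a high betweenness, which is why the two views give different shapes.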

The routes between pairs of stations are represented by orange or blue curves. Orange lines represent rides by long-term users (locals) and blue lines represent rides by visiting users (tourists). As each route moves from January to September (denoted by increasing saturation), the displacement of the route from the straight-line path between its two stations denotes the number of rides along that route in that month. By pressing "1" and "2", a user can change the route displacements to instead represent the average duration of rides on that route.
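One way to picture the displaced routes is as a list of control points: one point per month, spaced evenly along the straight line between the two stations and pushed sideways in proportion to that month's value. The sketch below works in 2D for brevity; the even spacing, the side the curve bulges toward, and the `scale` parameter are assumptions, and the actual sketch draws in 3D.

```python
import math

def route_control_points(p0, p1, monthly_values, scale=1.0):
    """Control points for a route curve from station p0 to station p1.

    monthly_values: one value per month (e.g. ride counts, January first).
    Each month gets one interior point, displaced perpendicular to the
    straight line by scale * value. Spacing and direction are assumptions.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("stations must be at different positions")
    nx, ny = -dy / length, dx / length  # unit normal to the straight path

    n = len(monthly_values)
    points = [(x0, y0)]
    for i, value in enumerate(monthly_values):
        t = (i + 1) / (n + 1)           # fraction of the way from p0 to p1
        offset = scale * value
        points.append((x0 + t * dx + offset * nx, y0 + t * dy + offset * ny))
    points.append((x1, y1))
    return points


if __name__ == "__main__":
    # Made-up monthly ride counts for January through September.
    rides = [12, 15, 30, 55, 80, 95, 110, 100, 70]
    for point in route_control_points((0.0, 0.0), (100.0, 0.0), rides, scale=0.2):
        print(point)
```

Swapping the ride counts for average durations in `monthly_values` corresponds to the "1" and "2" modes described above; the geometry stays the same.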

There are several interactive components. Mousing over a station will display the station’s name, and highlight the routes to and from that station. Pressing "r" causes all stations to disappear when not moused over. As mentioned above, a user can switch between viewing the station connectivity or centrality, and between viewing the number of bike rides and the average duration of rides. Pressing "v", "l", or "b" will turn on the visitor or local rides, or both. Pressing "i" toggles the text interface, and pressing “m” toggles the station name display. Finally, different views are available by pressing "." and "t".

Screenshots:

Initial view. Shape of the city is visible with station height corresponding to the degree of connectivity.
initial_view.png
Rotated view, with Ballston Metro station and its connections highlighted with the mouseover interaction.
highlighted_route.png
Top view. The split running along the left side corresponds to the Potomac River. Lincoln Memorial, an often-used tourist station, is highlighted.
top_view_LM.png
Another top view with a different station highlighted, this one with
top_view_dupont.png
Side view with station height = betweenness centrality. Many of the stations with highest betweenness centrality lie near the river crossing, which can be seen as the split near the right hand side.
Lynn & 19th St station in Virginia is highlighted, a popular point for crossing the river into DC.
side_centrality_lynn_rideno.png
Same side view as above, now with routes showing ride duration. Clearly the blue (tourist) rides take much longer to complete on average.
side_centrality_lynn_duration.png
Close-up of route mainly used by commuters: 8th and F St NE (highlighted) to Union station (in top left corner). Many locals ride this route, especially after the winter months are over.
8&F_commuter.png
The same route, this time with all other routes invisible for easier viewing.
highlight1.png
Another example of a highlighted route with the other routes invisible.
highlight2.png
Close-up of Union Station, one of the most popular bike stations, with route lines showing number of rides. Most of these routes see many more locals than tourists riding.
union_rideno.png
Close-up of Union Station, now in “ride duration” mode. As with most stations, the tourist rides from this station are much longer than the local rides in duration.
union_duration.png
Code: Here I have attached my code for rendering the sketch in Processing, as well as the data. I have also included the Python scripts written to perform the data preprocessing and 2D clustering, as well as a table of the graph metrics calculated in Python.
bikestream_vF.zip
(2.91 MiB) Downloaded 25 times
(Edit: Fixed typo in display text)

griessbaum
Posts: 4
Joined: Wed Jan 11, 2017 10:40 am

Re: 4 Student Defined Final Project

Post by griessbaum » Thu Mar 23, 2017 12:41 pm

Visualizing power production in Germany.

A large part of German electricity generation is traded on the European Power Exchange day-ahead spot market (EPEX Spot). Electricity producers, retailers and traders place bids here every day to sell or buy electricity for each hour of the upcoming day. Once a day, the market is cleared, the clearing price for each hour is determined, and all the corresponding transactions are executed.
Since historically large portions of the power-producing assets are owned by a small number of companies, insider knowledge is a big concern. The EPEX (and a corresponding law) therefore enforces a number of transparency measures. One of them is that every power plant with a nominal power above 100 MW is obliged to report its power production at hourly resolution. This data is publicly available on the website http://www.eex-transparency.com.

The data set contains more than 200 reporting power plants over a time span of more than two years. Visualizing these time series in two dimensions is messy, and the information is not easy to grasp. I therefore attempted to line up the time series next to each other in a three-dimensional space. Instead of spreading the data out in a linear space, though, I bent time around the x-axis of the visualization. This has two effects: a) The visualization resembles the rotation of a steam turbine in a thermal power plant, which, given the data used, appears to be an appropriate metaphor. b) Since data further in the future is hidden behind the horizon, it does not have to be drawn, which saves computational power.
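Bending time around the x-axis amounts to a cylindrical mapping: the plant index runs along x, the hour becomes an angle, and the power output becomes the radius. The sketch below shows one possible version of that mapping together with a simple horizon test. It is not taken from the attached sketch: all function names, parameter values, and the symmetric visibility window are assumptions.

```python
import math

def cylinder_point(plant_index, hour, power_mw, hours_per_turn=24 * 28,
                   base_radius=100.0, power_scale=0.1, plant_spacing=5.0):
    """Map one hourly sample to 3D, wrapping time around the x-axis.

    plant_index picks the position along x, hour becomes the angle,
    and power output pushes the point outward from the cylinder surface.
    All scale factors are illustrative assumptions.
    """
    angle = 2.0 * math.pi * hour / hours_per_turn
    radius = base_radius + power_scale * power_mw
    x = plant_index * plant_spacing
    y = radius * math.cos(angle)
    z = radius * math.sin(angle)
    return x, y, z

def is_visible(hour, current_hour, horizon_hours):
    """Skip samples that lie beyond the horizon of the cylinder.

    A symmetric window around the current hour is an assumption; the post
    only states that data further in the future is hidden.
    """
    return abs(hour - current_hour) <= horizon_hours


if __name__ == "__main__":
    current_hour = 8
    # Made-up series for plant 2: hour -> power output in MW.
    series = {6: 300.0, 8: 450.0, 12: 500.0, 40: 200.0}
    for hour, power in series.items():
        if is_visible(hour, current_hour, horizon_hours=6):
            print(hour, cylinder_point(2, hour, power, hours_per_turn=24))
```

Rotating the whole cylinder by the angle of the current hour then keeps "now" facing the camera, so only the samples that pass the horizon test need to be drawn each frame.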

From a theoretical point of view, each power plant has a specific short-term marginal power production price (i.e. not considering depreciation cost etc.), which is defined by the fuel price and the efficiency of the plant. If the plants are ordered from lowest to highest power production price, a so-called merit order list is created. Again theoretically, given a certain power demand, the merit order list can be used to determine the most expensive power plant that has to be activated to meet that demand. The set of activated power plants therefore theoretically corresponds to the current power demand. There are occasions where we expect power plants not to be activated according to the merit order list, e.g. because the power demand is changing too quickly for larger, cheaper plants to react, so that smaller, more expensive power plants have to be activated first.
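The idealized merit order dispatch described above can be written down in a few lines. The sketch assumes that the marginal cost is simply the fuel price divided by the efficiency and ignores everything else (emission costs, ramping limits, network constraints). The plant names, capacities, and prices are made up for illustration.

```python
def marginal_cost(fuel_price, efficiency):
    """Short-term marginal cost per MWh of electricity.

    fuel_price: cost per MWh of fuel (thermal); efficiency: between 0 and 1.
    Simplifying assumption: no other cost components are considered.
    """
    return fuel_price / efficiency

def dispatch(plants, demand_mw):
    """Activate plants in merit order until the demand is covered.

    plants: list of dicts with keys name, capacity_mw, fuel_price, efficiency.
    Returns (schedule, clearing_price), where schedule is a list of
    (name, output_mw) and clearing_price is the marginal cost of the most
    expensive activated plant (None if nothing was activated).
    """
    merit_order = sorted(
        plants, key=lambda p: marginal_cost(p["fuel_price"], p["efficiency"])
    )
    remaining = demand_mw
    schedule = []
    clearing_price = None
    for plant in merit_order:
        if remaining <= 0:
            break
        output = min(plant["capacity_mw"], remaining)
        schedule.append((plant["name"], output))
        clearing_price = marginal_cost(plant["fuel_price"], plant["efficiency"])
        remaining -= output
    return schedule, clearing_price


if __name__ == "__main__":
    # Hypothetical plants; all numbers are invented for this example.
    plants = [
        {"name": "gas", "capacity_mw": 400, "fuel_price": 24.0, "efficiency": 0.6},
        {"name": "lignite", "capacity_mw": 1000, "fuel_price": 4.0, "efficiency": 0.4},
        {"name": "coal", "capacity_mw": 800, "fuel_price": 9.0, "efficiency": 0.45},
    ]
    print(dispatch(plants, demand_mw=1500))
```

Under this model the set of running plants is a pure function of the demand, which is exactly the assumption the visualization was meant to test.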

Given the theoretical behavior, I expected to see waves in my three-dimensional visualization. The waves would be induced by the changes in the residual power demand (i.e. the power demand minus the power from renewable power generation). I expected that we would clearly see how the more flexible gas power plants would ramp up quickly with increasing residual load and would then be relieved by cheaper coal power plants.

Since the behavior of the power plant operation is theoretically driven by the residual load, I included power production from wind and solar in the visualization. They are visualized as a counter-rotating object at the top of the screen. I expected that we would see how the renewable power production would "push down" the conventional power generation.

However, the waves are, at best, very hard to observe in the chosen visualization. The dispatch appears to be far more random than expected.
This ultimately leads to the conclusion that plant dispatch is far more complex (in its original meaning) than anticipated by the mental model described above. A few of the invalid simplifications may include:
a) The power production cost of a power plant does not stay constant over its range of power output. I.e. the marginal cost of power output at 50 % load may be very different from that at 100 % load. Consequently, power plants are not activated strictly one after another.
b) There is a geospatial component to power production. Network congestion may force operators to activate more expensive power plants before cheaper power plants.
c) Power plant dispatch is optimized heterogeneously by the individual operators and may follow long-term strategies and tactics, resulting in dispatch that is surprising to the uninformed observer.
Attachments
final_4.jpg
final_3.jpg
final_2.jpg
final_1.jpg
HW5.zip
(56.48 KiB) Downloaded 25 times
