4 Student Defined Final Project


Reddit Image Data Visualization

Post by merttoka » Thu Mar 23, 2017 4:48 pm

Reddit, judged by its post activity, is one of the biggest social media platforms in use today. In some contexts, it is commonly referred to as *the internet* thanks to its wide range of subreddits and its management structure. Subreddits are managed by internet users rather than company employees, and almost anything has a subreddit that brings like-minded people together. Images on Reddit have carried a special meaning since the burst of internet memes from websites such as 4chan and 9gag, which gave rise to a post-modern internet culture organized around memes. In that sense, the following visualization can be considered a modest attempt at laying down a map of this cultural phenomenon.

I used the Reddit image data found on Stanford SNAP. The dataset contains a subset of images uploaded to Reddit from July 2008 to January 2013. The same image always carries the same unique image_id, regardless of its owner and subreddit. Given a reddit_id, it is fairly easy to navigate to the image URL: https://www.reddit.com/ + reddit_id.
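The two identifiers described above can be combined in a small sketch: build the URL from a `reddit_id` and group submissions by `image_id` to see where the same image was posted. The class name `RedditImageIndex`, the method names, and the two-column row layout `{image_id, reddit_id}` are assumptions made for illustration; the actual column order of the SNAP file is not given in this post.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original sketch.
final class RedditImageIndex {
    static final String BASE_URL = "https://www.reddit.com/";

    // Build the link exactly as described in the post: base URL + reddit_id.
    static String urlFor(String redditId) {
        return BASE_URL + redditId;
    }

    // Group submission URLs by image_id.
    // Each row is assumed to be {image_id, reddit_id}.
    static Map<String, List<String>> urlsByImage(List<String[]> rows) {
        Map<String, List<String>> byImage = new LinkedHashMap<>();
        for (String[] row : rows) {
            byImage.computeIfAbsent(row[0], k -> new ArrayList<>())
                   .add(urlFor(row[1]));
        }
        return byImage;
    }
}
```

An `image_id` that maps to more than one URL indicates the same image submitted several times, possibly by different users or to different subreddits.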

The visualization is built with `Processing 3.3` and relies on the `PeasyCam` and `ControlP5` libraries.

I decided to use a polar coordinate system, assigning `subreddit`s to `theta` and the `user`s in each subreddit to the `radius`. The `user` values are placed on a `log scale` that increases from the outer circle towards the center. The images belonging to each `user` in a given `subreddit` are then displayed along the vertical axis, with a height proportional to the number of `image`s that `user` has in that `subreddit`.
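The mapping above can be sketched in plain Java as follows. This is one possible reading of the layout, not the code used in the project: it assumes subreddits are evenly spaced around the circle, that users are ranked `0..userCount-1` within a subreddit, and that the log scale is `log(1 + rank) / log(userCount)`. The names `PolarLayout`, `outerRadius` and `heightPerImage` are hypothetical.

```java
// Hypothetical layout helper; the formulas are assumptions, see the text above.
final class PolarLayout {
    // Angle of a subreddit, evenly spaced over the full circle.
    static double theta(int subredditIndex, int subredditCount) {
        return 2.0 * Math.PI * subredditIndex / subredditCount;
    }

    // Log-scaled radius: rank 0 sits on the outer circle,
    // the last rank (userCount - 1) reaches the center.
    static double radius(int userRank, int userCount, double outerRadius) {
        if (userCount <= 1) {
            return outerRadius;
        }
        double t = Math.log(1 + userRank) / Math.log(userCount);
        return outerRadius * (1.0 - t);
    }

    // Cartesian position {x, y, z}; z is the vertical axis and
    // grows proportionally to the user's image count.
    static double[] position(int subredditIndex, int subredditCount,
                             int userRank, int userCount,
                             int imageCount,
                             double outerRadius, double heightPerImage) {
        double th = theta(subredditIndex, subredditCount);
        double r = radius(userRank, userCount, outerRadius);
        return new double[] {
            r * Math.cos(th),
            r * Math.sin(th),
            imageCount * heightPerImage
        };
    }
}
```

In a Processing sketch the returned coordinates could be passed to `translate()` or `vertex()`, with PeasyCam handling the camera.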

Right-clicking on one of the `subreddit`s selects it and lists the `image` information for that selection in the drop-down list. The user can also search for specific words or regular-expression phrases in the textbox, which highlights the matched items in yellow. Clicking on any item in the drop-down list follows its `image URL` and displays the comment page of that `image`.
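The search behaviour can be approximated with `java.util.regex`. The sketch below is an assumption about how such a filter might work, not the project's code: it assumes the search runs over image titles, matches case-insensitively, and falls back to a literal search when the query is not a valid regular expression. `SearchHighlighter` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical search helper for the textbox query.
final class SearchHighlighter {
    // Compile the query as a regex; if it is malformed,
    // treat it as plain text instead of failing.
    static Pattern compileQuery(String query) {
        try {
            return Pattern.compile(query, Pattern.CASE_INSENSITIVE);
        } catch (PatternSyntaxException e) {
            return Pattern.compile(Pattern.quote(query), Pattern.CASE_INSENSITIVE);
        }
    }

    // Return the indices of the titles that should be highlighted.
    static List<Integer> matchingIndices(List<String> titles, String query) {
        List<Integer> hits = new ArrayList<>();
        if (query == null || query.isEmpty()) {
            return hits; // empty query: nothing is highlighted
        }
        Pattern pattern = compileQuery(query);
        for (int i = 0; i < titles.size(); i++) {
            if (pattern.matcher(titles.get(i)).find()) {
                hits.add(i);
            }
        }
        return hits;
    }
}
```

The drawing code would then color the items at the returned indices yellow and leave the rest unchanged.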

Pressing the ` key (located next to 1 on most keyboards) resets the selection and returns to the unfiltered view.

After finalizing the visualization, I realized that a couple of `subreddit`s dominate this dataset in terms of the popularity of their `image`s. The dominating `subreddit`s are mostly `internet meme` hosts, such as `funny`, `pics`, `WTF` and `gifs`, so a large number of `image`s in them is to be expected. However, the domination could just as easily be a result of the sampling methodology behind the dataset.

Another point that can be deduced from the visualization is that the popularity of a `subreddit` is boosted by a few users who upload comparatively more to that `subreddit`. In other words, if there is a highly active `user` in a `subreddit`, there is a greater chance of that `subreddit` going viral and gaining even more visits.
