Subscribers to my Flickr stream have probably noticed a number of images of some kind of map flowing past lately. They were the result of me tracking my progress on a pet project. I have more or less finished work on it this week, so I thought I’d detail what I did over here.
Background
Following my Twitter dataviz sketches, I thought I’d take another stab at prototyping with Processing. On the one hand I wanted to increase my familiarity with the environment. On the other, I continued to be fascinated with data-visualization, so I wanted to do another design exercise in this domain. I was particularly interested in creating displays that assist in decision making and present data in a way that allows people to ‘play’ with it — explore it and learn from it.
The seed for this thing was planted when I saw Stamen’s work on the mySociety travel-time maps. I thought the idea of visually overlaying two datasets and allowing the intersection to be manipulated by people was simple but powerful. But, at that time, I saw no way to ‘easily’ try my hand at something similar. I had no ready access to any potentially interesting data, and my scraping skills are limited at best.
Luckily, I was not the only one whose curiosity was piqued. After seeing Ben Cerveny demoing the same maps at The Web and Beyond 2008, Alper wondered how hard it would be to create something similar for the Netherlands. He presented a way to do it with freely available tools and data (to an extent) in a workshop at a local unconference.
I did not attend the event, but after seeing his blog post, I sent him an email and asked if he was willing to part with the data he had collected from the Dutch public transport travel planning site 9292. Alper being the nice guy he is, he soon emailed me a JSON containing of the data.
So that’s the background. I had an example, I had some data, and I had a little experience with making things in Processing.
JSON
The first step was to read the data in the JSON file from Processing. I followed the instructions on how to get the JSON library into Processing from Ben Fry’s book (pages 315–316). On the Processing boards, a cursory search unearthed some code examples. After a little fiddling, I got it to work and could print the data to Processing’s console.
Plotting
Next up was to start visualizing it. I used the examples of scatterplot maps in Visualizing Data as a starting point, and plugged in the JSON data. Pretty soon, I had a nice plot of the postal codes that actually resembled the Netherlands.
Coloring
From there, it was rather easy to show each postal code’s travel time. I simply mapped travel times to a hue in the HSB spectrum. The result nicely shows colored bands of travel-time regions and also allows you to pick out some interesting outliers (such as Groningen in the north).
Selecting
At this point, I wanted to be able to select travel-time ranges and hide postal codes outside of that range. Initially, I used the keyboard for input. This was OK for this stage of the project, but of course it would need to be replaced with something more intuitive later on. In any case, I could highlight selected points and dim others, which increased the display’s explorability considerably.
Coloring, again
The HSB spectrum is quick and easy way of getting access to a full a range of colors. It served me well in my Twitter visualizations. However in this case it left something to be desired, aesthetically speaking. Via Tom Carden I found the wonderful cpt-city, which catalogues gradients for cartography and the like. Initially I struggled with ways to get these colors into Processing, but then it turned out you could easily read out the colors of pixels from images. This allowed me to cycle through many palettes just by adding the files to my Processing sketch. I discovered that a palette with a clear division in the middle was best, because that provides you with an extra reference point besides the beginning and end.
Selecting, again
I next turned to the interaction bits. I knew I wanted a so-called dual slider that would allow people to select the upper and lower limit of travel time. In the Processing book, there is code for plenty of interface widgets, but sadly no dual slider. I looked around on the Processing board and could find none either, to my surprise. Even in the UI libraries (such as controlP5 and Interfascia) I could not locate one.
So I decided to lower the bar and first include two horizontal sliders, one for the upper and one for the lower limit. These I made using the code on pages 448–452 of the Processing book. Not perfect, but an improvement over the keyboard controls.
Selecting, yet again
Next, I decided I’d see if I could modify the code of the horizontal scrollbar so that I would end up with a dual slider. After some messing about (which did increase my understanding of the original code considerably) I managed to get it to work. This was an unexpected success. I now had a decent dual slider.
Exploring
So far there was no way of telling which point corresponded to which postal code. So, I added a rollover that displayed the postal code’s name and travel time. At this point it became clear the data wasn’t perfect — some postal codes were erroneously geocoded by GeoNames. For instance, code 9843 (which is Grijpskerk, 199 minutes to the Dam) was placed on the map as Amsterdam Noord-Oost!
Adding more data
Around this point I visited Alper in Delft and we discussed adding a second dataset. Although housing prices à la mySociety would have been interesting, we decided to take a different route and add a second travel-time set for cars. My first step in integrating this was to simply generate a map each for the public transport and car travel data and manually juxtapose them. What I liked about this was that even though you know intuitively that traveling by car is faster, the two maps next to each other provide a dramatic visual confirmation of this piece of knowledge.
Representing
Moving ahead with the extra data, I started to struggle with how to represent both travel times. My first effort was to draw two sets of dots on top of each other (one for car travel times and one for public transport) and color each accordingly. For each set I introduced a separate slider. I wasn’t very satisfied with the result of this. It did not help in understanding what was going on that much.
Showing differences
After discussions with Alper and several other people, I decided it would make more sense to show the difference between travel times. So I calculated the percentage difference between public transport and car travel time for each postal code. This value I mapped to a color. Here, a simple gradient worked better than the palettes used earlier for travel times.
I also discarded the idea of having two dual sliders and simply went with one travel time selector. Although more user-friendly, it created a new problem: for some points both travel times would fall within the selected range, and for others one or the other. So I needed an extra visual dimension to show this. This turned out to be the greatest challenge.
After trying many approaches, I eventually settled on using the shape of the point to show which travel times fell within the range. A small dot meant that only the public transport travel time is within the range, a donut means only the car travel time is selected, and a big dot represents selection of both times.
Final tweaks
Around this point I felt that it was time to wrap up. I had learnt about all I could from the exercise and any extra time spent on the project would result in marginal improvements at best. I added a legend for both the shapes and color, improved the legibility of the rollover and increased the visual affordance of the slider, and that was it.
Thoughts
It is becoming apparent to me that the act of building displays like this is playful in its own way. Through sketching in code, you can have something like a conversation with the data and get a sense of what’s there. Perhaps the end result is merely a byproduct of this process?
I’m amazed at how far a novice programmer like myself, with a dramatic lack of affinity for anything related to mathematics or physics, can get by simply modifying, augmenting and combining code that is already out there. I have no ambition whatsoever of becoming a professional developer of production-quality code. But building a collection bits and pieces of code that can do useful and interesting things seems like a good strategy for any designer. I am learning to trust my innate reluctance to code stuff from scratch.
Also, isn’t it cool that it is becoming increasingly feasible for regular citizens to start analyzing data that is — or at least should be — publicly available? Government still has a long way to go. Why do we need to go through the painstaking process of scraping this data from sources such as 9292 which for all intents and purposes is a public service?
I will probably make the final prototype available online at some point in the future. For now, if you have any questions or comments I would love to hear them here, or via email.
Update: Alper has released a JSON file containing all the data I used to make this. Go on and grab it, and make some displays of your own!
And another update: I’ve decided to make this application available for download, including source files.