Getting Started with D3: A review of Mike Dewar’s guide
This new addition from O’Reilly, the first book to be written on D3, walks users through the process of writing visualizations on open data sources, in particular the MTA’s awesome open data feed. [1]
The book starts by walking the uninitiated through some basic visualizations. In so doing, one can learn what D3 actually is.
What D3 is and isn’t
On the one hand, I would be lying if I said that there wasn’t something frustrating about drafting a basic visualization in D3. It takes quite a bit of work to get a graph that would be a snap in Excel (if the data were already in readable form). Part of the frustration is that there is so much to learn. Part of it is that D3 exposes you directly to the data, so the process of assembling any visualization is much more complicated. But there is a light at the end of the tunnel.
Most of the book is on wielding the D3.js library, which handles data and cleanly slots the coder right between the design of the webpage and the data. D3 allows for beautiful style, as its many proponents convincingly show. Ultimately, though, D3 is about rendering, not style. Dewar sums it up nicely:
“The D3 library focuses on the layout, using scales to let us accurately place data…leaving the designer to worry about matters of style.”
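To make the idea of a scale concrete, here is a minimal sketch in plain JavaScript of what a linear scale does: it maps a value in the data's domain to a position in pixels. The `linearScale` helper and the numbers are my own illustration, not D3's API and not code from the book; D3 ships its own scale functions that play this role.

```javascript
// Hypothetical helper (not D3's API): maps a value from a data domain
// onto a pixel range by linear interpolation.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  return function (value) {
    const t = (value - d0) / (d1 - d0); // fraction of the way through the domain
    return r0 + t * (r1 - r0);          // same fraction of the way through the range
  };
}

// Assumed numbers: values between 0 and 200, drawn on a 400-pixel-wide chart.
const x = linearScale([0, 200], [0, 400]);

console.log(x(50)); // 100
```

The scale decides where a data point lands on the page. What that point looks like (color, font, stroke) is left to CSS, which is the division of labor the quote describes.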
Dewar uses Python to clean a data set, D3 to turn that data set into a visualization, and other tools (CSS) to get the colors and fonts to look good.
If you are looking for a simple library that outputs the standard visualizations in a very concise format, D3 is not for you. Rather, D3 gives users more detailed control over the calculations and manipulations that display the data. This allows for considerable freedom in the kinds of visualizations that scientists can come up with, and also makes it simple to add transitions and a UI.
The first example of the book (a color-coded list of the service-status of train lines) demonstrates the way in which the visualization itself is built directly off of the data set. The code literally first lists the names of the data points, then color-codes accordingly.
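The shape of that first example can be sketched without D3 at all. The snippet below is a plain-JavaScript approximation, not the book's code: the sample records, the field names `name` and `status`, and the color rule are all made up for illustration, and the real feed may look different.

```javascript
// Made-up sample records; the real feed's field names and values may differ.
const lines = [
  { name: "123", status: "GOOD SERVICE" },
  { name: "456", status: "DELAYS" },
  { name: "L", status: "PLANNED WORK" },
];

// Assumed color rule: green for good service, orange for anything else.
function statusColor(status) {
  return status === "GOOD SERVICE" ? "green" : "orange";
}

// One list item per data point: first the name, then a color derived from its status.
function renderStatusList(data) {
  const items = data.map(
    (d) => `  <li style="color: ${statusColor(d.status)}">${d.name}</li>`
  );
  return ["<ul>", ...items, "</ul>"].join("\n");
}

console.log(renderStatusList(lines));
```

Every element on the page corresponds to exactly one record, and its appearance is computed from that record. D3 expresses this same idea by binding data to DOM elements rather than by building strings.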
This is a very novel approach for anyone used to building graphs with pre-fab tools — in something like Microsoft Excel or even R (ggplot somewhat excluded).
In D3, one does not just plot(). This means that a basic example (a bar plot) is comparatively difficult to render. One has to build the bar plot from the data on up (it starts with a list). Once this is accomplished, though, it’s only a hop, skip, and a jump to interactive visualizations that would be cumbersome or impossible in other frameworks.
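To give a feel for what "from the data on up" means, here is a rough plain-JavaScript sketch that turns a list of records into SVG rectangles by hand. The `barChart` helper, the sample data, and the chart dimensions are hypothetical, and none of this is D3's API; D3 would handle the scaling and the element creation, but the work being done is similar in spirit.

```javascript
// Made-up sample data; labels and values are for illustration only.
const data = [
  { label: "A", value: 30 },
  { label: "B", value: 80 },
  { label: "C", value: 45 },
];

// Hypothetical helper: turns a list of records into SVG markup for a bar chart.
function barChart(records, width, height) {
  const max = Math.max(...records.map((d) => d.value));
  const barWidth = width / records.length;
  const bars = records.map((d, i) => {
    const h = (d.value / max) * height; // bar height relative to the tallest bar
    const x = i * barWidth;             // bars laid out left to right
    const y = height - h;               // SVG's y axis points down, so bars hang from the top
    return `  <rect x="${x}" y="${y}" width="${barWidth - 2}" height="${h}"><title>${d.label}</title></rect>`;
  });
  return [`<svg width="${width}" height="${height}">`, ...bars, "</svg>"].join("\n");
}

const svg = barChart(data, 300, 160);
console.log(svg);
```

Every position and size is computed explicitly from the data, which is why the first bar chart feels like so much work. It is also why adding interaction later is a small step: each rectangle is already tied to the record that produced it.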
If we are witnessing a burgeoning in the development and deployment of novel visualizations, then frameworks like D3 seem like they will be instrumental in this continued growth — if they haven’t been already.
The comparison to artisanal vs. pre-fab construction seems apt. A hundred years ago, plumbers would routinely solder steel piping to fit customized designs. Today, those skills pretty much don’t exist (in the developed world) — short of the rare custom-construction. Rather, piping arrives in a delimited set of designs, which fit together according to a fixed set of rules.
But lest D3 come off as backward — what it really does is allow for the flexibility of artisanal construction while minimizing (as much as possible) the necessary training. But training is required — hence the book.
What the book is
One neat aspect of the book is the way that the work of visualization is tied to interpretation. The book unfolds the visualization process (which is itself quite detailed, involving the minutiae of every pixel) in parallel to data interpretation or analysis.
The development process of the book is then one of ongoing refinement in which, after each attempt, one asks questions of the data (e.g. ‘are we seeing a diurnal pattern?’) and tries to make the patterns as explicit as possible. It reminded me of Tufte, as each refinement clarifies the mental and physical picture.
As a nice bonus, the reader is exposed to Dewar’s own creative process. Besides dealing with munging problems (getting the data into a usable form), the data scientist often confronts the problem of where to find, supplement, or complete the data. This solution, too, is iterative. As Dewar points out, it can include asking an internet user group to make certain data available.
For a concise introduction that covers the basics and leads right into some really neat, slick visualizations — I’d recommend the book.
[1] Add to to-do stack: sleek visualizations of the same for Chicago.