No Unitaskers in Your Data Exploration Kitchen


What does Alton Brown, a Food Network personality, have to do with data exploration? More than you’d expect – his wise maxim that unitaskers don’t belong in your kitchen transcends the culinary world.

Alton Brown holds up a strawberry slicer as he reviews Amazon’s dumbest kitchen gadgets.

The strawberry slicer is the quintessential unitasker. The only thing it does well is produce a fancy-looking garnish, and it’s hardly the only tool for that job. In the kitchen, every tool makes every other tool a little less accessible, so only the most versatile tools belong in a well-run kitchen. There, the chef is often seen with the chef’s knife, never the strawberry slicer.

What is a unitasker in the data exploration kitchen? It is a chart that may be attractive but is really only capable of conveying one message. Like the strawberry slicer, it can make a good garnish – for example, when it’s time to present results in a newspaper article or a PowerPoint presentation. However, it doesn’t belong in the main preparation, where the data exploration is done. That work is done with the chef’s knife.

A unitasker – more pizzazz than substance. Looks pretty, communicates the overall shape of the data, but the difference between the years is nearly impossible to see. The layers are there more for form than function.

Data visualization is a vital part of the data analysis workflow. Early on in the process, the goal is for the analyst to get a broad sense of the dataset and to discover potential paths to insight. The search space is huge at this point, as there are a million ways to slice and dice the data using a plethora of techniques. To make this problem tractable, we must take advantage of our keen sense of visual perception and our domain expertise to reduce the search space. Charts offer a quick and effective way to spot the most relevant prospects.

At this stage, you want to look for possible relationships between many data dimensions. To observe many data dimensions simultaneously, we need to achieve a high information density using limited screen real estate. The screen here is our metaphorical kitchen: we can’t afford to clutter it with unitaskers if we want to walk away with something useful. Each chart must be able to reveal multiple aspects of the data to justify the space it takes up.

Our goal with LL Notebook is to provide users with the chef’s knife of data exploration. To the casual bystander, LL Notebook’s charts admittedly look plain, but like the chef’s knife, they are multi-purpose, put function before form, and are optimized for human cognition. With a little bit of practice, they become the general-purpose tools that you can rely on every single time.

Let’s see how LL Notebook’s approach differs from the unitasker chart. The underlying dataset is from a smart meter capturing hourly electricity usage (exported via pge.com) from a single-family household over 6 years. Instead of gratuitously munging data dimensions together, we let each dimension stand in its own right. Relationships between dimensions are revealed by applying filters and dragging the filters around. Think of it as a pivot table on steroids.

We can see the same overall shape of usage, and if we brush over the year dimension, the qualitative change over time becomes clear. The modes around 1.7 kWh and 3.0 kWh disappeared in the later years. Clearly, something was done in this household in terms of energy efficiency.
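
For readers who think in code, brushing boils down to filtering on one dimension and recomputing the distribution of another. Here is a minimal Python sketch of that idea. It uses made-up readings rather than the actual smart-meter export, and the `readings` tuples, the `usage_histogram` helper, and the 0.5 kWh bin width are all assumptions for illustration, not how LL Notebook is implemented.

```python
from collections import Counter

# Hypothetical (year, hour_of_day, usage_kwh) readings; not the real export.
readings = [
    (2012, 19, 1.7), (2012, 20, 3.0), (2012, 3, 0.4), (2012, 21, 3.1),
    (2016, 19, 0.9), (2016, 20, 1.1), (2016, 3, 0.3), (2016, 21, 0.8),
]

def usage_histogram(rows, years, bin_width=0.5):
    """Count readings per usage bin, keeping only the brushed years."""
    counts = Counter()
    for year, _hour, usage in rows:
        if year in years:
            # Label each bin by its lower edge, e.g. 1.7 kWh lands in the 1.5 bin.
            counts[int(usage // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))

# "Brush" two different year ranges and compare the resulting usage shapes.
early = usage_histogram(readings, years={2012})
late = usage_histogram(readings, years={2016})
print("early:", early)
print("late: ", late)
```

In this toy data the high-usage bins are populated only for the early year, which is the kind of shift the brushed chart makes visible at a glance.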

Since we’re dealing with multi-purpose charts, we can proceed to do so much more.

We can see the relationship between year and usage from another perspective. What immediately jumps out is the strong relationship between hour of day and usage. This makes intuitive sense, as usage should follow the circadian rhythm of the household’s inhabitants. The linear relationship between usage and cost is clear as well, though it’s interesting that the two modes diverge in cost as usage increases. This reveals the tiered pricing of electricity.

We can also filter on year at the same time to see whether these effects persist over time. The modes diverge more in the early years than in the later years, where the two cost modes barely emerge. Maybe the pricing tiers converged over time, or maybe the higher tiers were just not reached at all in the later years? Ask more questions!
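
To see why tiered pricing would show up as two diverging cost modes, consider a rough Python sketch. The two rates below are invented for illustration and are not an actual PG&E tariff. The sketch also simplifies by billing each hour entirely at one tier's rate, whereas a real bill would pick the tier from cumulative usage over the billing period.

```python
# Hypothetical per-kWh rates for two pricing tiers; not an actual tariff.
TIER_RATES = {"baseline": 0.15, "upper": 0.30}

def hourly_cost(usage_kwh, tier):
    """Cost of one hour of usage billed entirely at the given tier's rate."""
    return usage_kwh * TIER_RATES[tier]

for usage in (0.5, 1.0, 2.0, 3.0):
    low = hourly_cost(usage, "baseline")
    high = hourly_cost(usage, "upper")
    # The gap between the two cost lines grows in proportion to usage.
    print(f"{usage:.1f} kWh: {low:.2f} vs {high:.2f} (gap {high - low:.2f})")
```

Each tier traces its own straight line through the usage-versus-cost chart, and the lines fan out as usage grows. If the upper tier is rarely reached, only one line is populated, which would be consistent with the second explanation floated above.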

In this brief example, you can already see how easy and intuitive it is to interact with and to perceive relationships in the data using only simple charts. In this instance, the household was able to verify that their energy efficiency investments were working, and to further tweak their energy usage. The instantaneous feedback in LL Notebook nudges the analyst to ask more questions. If you’re on your computer, visit the LL Notebook demo to explore this dataset using the chef’s knife of data exploration!


Published by

David Lin

Founder at LiquidLandscape
