Geography · Economics · Visualization

One Man’s Public Comment: “More Data, Less Infrastructure”

A thought experiment:

A well-meaning colleague, knowing you’re the “map person”, approaches with a seemingly straightforward request.

“Where can I get the most up-to-date file of the official US state boundaries?”

There are dozens of places to get such a file, but the most up-to-date one? Wanting to save professional face, you surreptitiously google “us states shapefile” and see the following top three entries:

  1. a 163 kb file from ESRI, undetermined date
  2. a 23 mb file from NOAA, 1:2M scale, valid date 2012, last modified 2007
  3. two choices from the National Atlas: a 1:2M 6.8 mb file dated June 2005; a 1:1M 10.6 mb file dated July 2012

And that, in a nutshell, is a problem that a National Spatial Data Infrastructure (NSDI) should solve, right?

*   *   *   *   *   *   *

The FGDC recently released an NSDI Strategic Plan draft document for public comment. And I was informed I was supposed to have an opinion. So your dutiful blogger reluctantly disturbed his summer reverie, downloaded the PDF, and had a look.

What I found was a combination of geo-is-important boilerplate and marketing copy for a technology called the Geospatial Platform. When I see 10 bullet points for “The Desired Future State of the NSDI”, I wonder if we’ll end up any closer to solving our “most up-to-date US state boundaries” conundrum. Given the documented shortcomings of past FGDC performance, I imagine my skepticism is broadly shared.


Infrastructure? We Have Infrastructure Everywhere

I’ve covered in the past how we’re in an era of unprecedented private-sector investment in spatial data infrastructure (see: Google, Apple, Microsoft). So when I read

“The National Spatial Data Infrastructure extends far beyond data”

my reaction is: let’s not extend anywhere until we get the data piece right.

This suspicion of scope creep is deepened when we read that

“the Geospatial Platform initiative is a critical component for the continued development of the NSDI.”

What exactly is this Geospatial Platform?

“The Platform is a Web-based first generation service environment that provides access to a suite of well managed, highly available, and trusted geospatial data, services, applications, and tools for use by Federal agencies and their State, local, Tribal, and regional partners.”

That there is some enterprise-ready commercial vendor marketing copy.

The larger point is that the FGDC should have a single-minded focus on data and leave the tools and applications to others. I’m happy to pay my water utility bill every month, but I don’t need them selling me my bathroom sink as well.

What’s Missing

Do a word search of the proposal and you know what term doesn’t come up?

“search engine”


The failure of the geospatial community to search-engine optimize its content is why, when you google your address, you’re much more likely to find real estate content than, say, the property boundary from your county GIS department. To Geospatial Professionals™, “authoritative” means accuracy and precision; to normal people, what’s authoritative is what’s atop the first page of Google results. An SDI that doesn’t confront that fact is one whose Portal/Platform-centric approach is destined for irrelevance.

The most important step forward in spatial data infrastructure this year has been GitHub adding visualization support for data in the GeoJSON format. Though originally a platform for managing programming code, GitHub has evolved into a platform for sharing data as well.
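To make that concrete, here is a minimal sketch of the kind of file GitHub now renders as an interactive map: any valid GeoJSON committed with a .geojson extension. The feature below (a single point, with coordinates and attributes chosen purely for illustration) is built with nothing but the standard library.

```python
import json

# A minimal GeoJSON FeatureCollection -- the format GitHub renders as an
# interactive map when a .geojson file is committed to a repository.
# Coordinates are [longitude, latitude], per the GeoJSON convention.
capitals = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-104.9847, 39.7392]},
            "properties": {"name": "Denver", "state": "Colorado"},
        }
    ],
}

# Serialize; writing this string to capitals.geojson and pushing it to a
# GitHub repo is all it takes to get a shareable, viewable map.
geojson_text = json.dumps(capitals, indent=2)
```

The point of the format’s simplicity is exactly the argument above: plain text diffs cleanly, so data changes can be reviewed, forked, and merged like code.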

A Modest Proposal

What does a core set of curated geospatial data files maintained on GitHub look like?
A lot like Nathaniel Kelso’s Natural Earth project (main site, GitHub). Kelso states his motivation for the project:

“In a time when the web is awash in geospatial data, cartographers are forced to waste time sifting through confusing tangles of poorly attributed data to make clean, legible maps.”

So here’s my proposal: why don’t we take 5% of the taxpayers’ money we’re poised to hand over to a commercial vendor to create the Geospatial Platform and use it to post the official, up-to-date US government versions of the data found on Natural Earth on GitHub as well. That way data users can “fork” and “watch” the repository, ensuring everyone (the government, its citizens, and the private sector busy creating tools, interfaces, and services) is working off the latest, most authoritative version. In other words, can the FGDC replicate, for a small sliver of the NSDI budget, the pro bono efforts of Kelso and his small group of volunteers?

Infrastructure is Easy, Culture is Hard

In one respect throwing money at a vendor to create a white-label version of their cloud-based platform and calling it the national SDI is the easiest option. Because the hard part isn’t standing up a portal, it’s ensuring that the agencies that “own” the various datasets in a common authoritative repository make a credible long-term commitment to keeping key data up-to-date in a complete, transparent manner.

Fresh, accurate, and open government data has never been in higher demand. Let’s make that the singular focus and de-emphasize the “infrastructure” bit, which seems a little too much like a mere continuation of two decades of benefits flowing mostly to vendors rather than end-users.


—Brian Timoney


Tunnel photo courtesy of Sprengstoff72’s Flickr stream
Shadow photo courtesy of Kevin Dooley’s Flickr stream


There Are Few 2nd Impressions Online

Mom was right: we live in a judgmental world where first impressions linger.


On the web, it’s worse: the majority of us are impulsive information grazers on a restless daily quest to gather the factoids, bite-sized insights, and small amusements that feed a post-modern soul.

So when building a web map to capture those 40 seconds of distracted attention, the challenge is to manage the initial perception of your content on the assumption that no one reads the small print before diving in. And as we’ve mentioned many times before, the user response to confusion is not “maybe I should read the Help”; they simply leave.

The maps I discuss below aren’t “bad” per se (a couple of them are very well executed), but they come with perception-management challenges that, while addressed through text, nonetheless open a gulf between what they’re trying to communicate and what the distracted, hurried user (admittedly, me) absorbs in the first few seconds of inspection.

One Dot/One Vote In Los Angeles Mayoral Race


I loved this map, published by the LA Times the day after the mayoral election. But then I winced when I saw “One dot = one vote”. Because having a background in elections, I was darn glad I wasn’t manning the phones the next day…

Irate Citizen: “Why does the LA Times have a map of where every voter lives and who they voted for? That’s an unlawful violation of privacy!”

Before you can explain the very defensible methodology of placing each dot randomly within the voter’s precinct boundary, Irate Citizen has slammed the phone down and is now composing invective for his City Council-person.

The larger point is that people impute a very high degree of precision/accuracy to anything that is placed on an official-looking map. So if one dot equals one voter, then that voter must be exactly where he/she is placed on the map.  Never mind it’s written twice that dots were randomly placed. I would’ve gone for One Dot = 5 Voters just to break that instant mental link between a single voter, their location, and their secret ballot that is the cornerstone of democracy.
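The dot-placement technique at issue is simple to sketch. Below is an illustrative implementation of the approach the Times described (one dot placed at random inside the voter’s precinct polygon), with the dot-to-voter ratio as a parameter so you can dial in the “One Dot = 5 Voters” aggregation suggested above. The polygon test is a standard ray-casting check; the function names and the rejection-sampling approach are my own choices, not the LA Times’ code.

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test: a point is inside if a ray cast to the right
    crosses the polygon's edges an odd number of times."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_dots(precinct, votes, voters_per_dot=5, seed=0):
    """Place one dot per `voters_per_dot` votes, each at a random
    location inside the precinct polygon (rejection sampling against
    the bounding box)."""
    rng = random.Random(seed)  # seeded for reproducible placement
    xs = [p[0] for p in precinct]
    ys = [p[1] for p in precinct]
    dots = []
    for _ in range(votes // voters_per_dot):
        while True:  # retry until a sample lands inside the polygon
            x = rng.uniform(min(xs), max(xs))
            y = rng.uniform(min(ys), max(ys))
            if point_in_polygon(x, y, precinct):
                dots.append((x, y))
                break
    return dots
```

Note how the aggregation falls out of one integer division: at `voters_per_dot=1` you get the privacy-panic map; at 5, the instant mental link between a dot and a ballot is broken.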


Oklahoma Tornado Damage Estimates

Over the last few years I have been impressed by the work of SpatialKey and would rank Doug McCune (@dougmccune) in my Top-5-Funniest-Presenters-I’ve-Seen-At-A-Tech-Conference. But disaster mapping is tricky because things are moving fast and emotions are raw.


It clearly announces itself as a population-density map with tornado tracks overlaid. In the immediate aftermath, such information is handy for wrapping one’s head around a basic sense of scale. But as soon as the skies clear, we expect actual damage assessments. Further, the use of a red palette is not emotionally neutral, especially in the context of a disaster. With varying shades of red, you’re asking too much of the viewer NOT to infer that the most intense reds indicate the greatest damage (rather than the highest concentration of residents). Add the mercurial nature of tornado damage (one neighbor’s house is leveled, another’s is untouched) and one wonders whether the choropleth approach should have stayed on the sideline until actual damage could be tallied and mapped.

Yes, That Is A Lot of Data


This Guardian map depicts 11,000 CIA rendition flights. Which is a lot. But with an initial view that displays all of those routes, the first impression is one of visual confusion. A handy rule of thumb: give your viewer an insightful look at the data on the initial load, so that even if they don’t interact with the map but merely gaze at it for 5-7 seconds, they’ll have a useful takeaway. I’m not sure this map meets that threshold.

When Photoshop Attacks

A symptom of advanced cartophilia is looking for meaning in a map where there is none. This gem was passed around among a few of us on Twitter–

You’re immediately tempted to grab your cube-mate: “bro, check out this pattern of crazy intense poverty in southern Hunterdon County.”  Then you remember that Hunterdon County, NJ is one of the top ten wealthiest counties in the US.

You’ve just been punk’d by a headphone-wearing, Photoshop-wielding Millennial who is so much better than his crap graphic design job.  And is probably too young to recall this Onion classic.

*   *   *   *   *   *   *   *

There is an unfortunate asymmetry between the time it takes to craft an effective online map and the time viewers actually spend with it. Often we’re presenting large amounts of not-easily-digestible data to someone who isn’t completely paying attention, may not be wholly fluent in the emerging grammars of visual information, and has a hair-trigger mouse finger eternally poised to click through to the next thing. Bridging the gap between the reality of your map and the perception of your map can often seem thankless.

Welcome to the web.


—Brian Timoney


Fashion photo courtesy of Pricey’s Flickr stream

PDF Sharing is Not Data Sharing–A Public Service Announcement


“More tears are shed over answered prayers than unanswered ones.”

- Truman Capote


Like many readers of this blog, I was heartened by last week’s Executive Order from the President of the United States declaring “Making Open and Machine Readable the New Default for Government Information.”  Finally, tangible progress on the rocky road leading to Transparency, Accountability, and Economic Benefit.

But as I finished said Executive Order, my heart sank. For it didn’t include an explicit disqualification of the PDF format as meeting the “machine readable” threshold.

No criminal classifications either.

Nor mandatory jail sentences.

In a word: Weak.

‘The Heart of Nerd Darkness’


If you’ve never personally suffered the travails of extracting usable data from a PDF, I strongly recommend Jeremy Merrill’s wrenching first-person account of compiling usable, ready-for-analysis data from some 2 million records trapped inside PDFs. Similarly painful experiences led to Caitlin Rivers’ post “‘Send Me Your Data–PDF is Fine,’ Said No One Ever…”, which includes helpful guidelines for would-be data publishers.

How bad is the problem? People have admitted online to reading PDF data aloud while a colleague keys it into a spreadsheet.
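Even in the best case, where a tool like pdftotext can pull the text layer out intact, you are still left writing throwaway parsers for whatever layout the print designer chose. A minimal sketch of that drudgery, with an entirely hypothetical parcel-record layout (the column offsets and field names are invented for illustration, not from any real county’s PDF):

```python
def parse_fixed_width(lines, colspecs):
    """Slice each extracted text line into fields by (start, end)
    character offsets -- the typical chore after PDF text extraction."""
    records = []
    for line in lines:
        if not line.strip():  # skip blank lines between "pages"
            continue
        records.append(tuple(line[a:b].strip() for a, b in colspecs))
    return records

# Text as it might come back from a PDF extractor (hypothetical layout:
# parcel ID, site address, assessed value).
extracted = [
    "123-456-789  1600 Main St        245000",
    "123-456-790  1602 Main St        251500",
]

# (start, end) character offsets for each column, found by eyeballing
# the extracted text -- this is the part that has to be redone for
# every new report layout.
COLSPECS = [(0, 11), (13, 33), (33, 40)]

rows = parse_fixed_width(extracted, COLSPECS)
```

And this is the happy path: a scanned-image PDF yields no text layer at all, which is where the OCR (or the reading-aloud) begins.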

This is not the world Gutenberg intended.

Sins of Omission, Sins of Commission

With PDFs fulfilling the deep human need to impose one’s print layout on others, we’re tempted to forgive their progenitors for they know not what they do.

And then there’s Orange County, California.

In a long-running case against the Sierra Club over the county’s right to charge $475,000 for its parcel database, the county defended the access granted to citizens by observing that information about each of the county’s 640,000 parcels was available as a freely downloadable PDF.


Who among the GIS crowd has not re-digitized features from a PDF while internally raging with the knowledge that the source data already exists in digital form?

Exactly.  What makes the PDF so infuriating is what makes it so beloved among the passive-aggressive set dedicated to being only semi-helpful.


The Scanned Image of Data Inside a PDF

The horror.

The horror.



—Brian Timoney

UPDATE: Steve Romalewski blogged his experience with NYC open data as PDF in 2011.

* “PDF Sharing is Not Data Sharing” was the title of a talk given by Victoria Smith-Campbell at the 2013 DRCOG Regional Data Summit wherein she recounted her vast experience re-tracing wildfire boundaries from PDFs so as to enable re-distribution as KML.


Shadow photo courtesy of antonychammond’s Flickr stream