MapBrief™

Geography · Economics · Visualization

PDF Sharing is Not Data Sharing–A Public Service Announcement

 

“More tears are shed over answered prayers than unanswered ones.”

- Truman Capote

 

Like many readers of this blog, I was heartened by last week’s Executive Order from the President of the United States declaring “Making Open and Machine Readable the New Default for Government Information.”  Finally, tangible progress on the rocky road leading to Transparency, Accountability, and Economic Benefit.

But as I finished said Executive Order, my heart sank. For it didn’t include an explicit disqualification of the PDF format as meeting the “machine readable” threshold.

No criminal classifications either.

Nor mandatory jail sentences.

In a word: Weak.

‘The Heart of Nerd Darkness’

 

If you’ve never personally suffered the travails of extracting usable data from a PDF, I strongly recommend Jeremy Merrill’s wrenching first-person account of compiling usable, ready-for-analysis data from some 2 million records trapped inside PDFs.  Similarly painful experiences led to Caitlin Rivers’ post “‘Send Me Your Data–PDF is Fine’, Said No One Ever…”, which includes helpful guidelines for would-be data publishers.

How bad is the problem? People have admitted online to reading PDF data aloud and having a colleague key it into a spreadsheet.

This is not the world Gutenberg intended.

Sins of Omission, Sins of Commission

With PDFs fulfilling the deep human need to impose one’s print layout on others, we’re tempted to forgive their progenitors for they know not what they do.

And then there’s Orange County, California.

In a long-running case against the Sierra Club over the county’s right to charge $475,000 for its parcel database, the county defended the access granted to citizens by observing that information about each of the county’s 640,000 parcels was available as a freely downloadable PDF.

Lovely.

Who among the GIS crowd has not re-digitized features from a PDF while internally raging with the knowledge that the source data already exists in digital form?

Exactly.  What makes the PDF so infuriating is what makes it so beloved among the passive-aggressive set dedicated to being only semi-helpful.

 

The Scanned Image of Data Inside a PDF

The horror.

The horror.

 

 

—Brian Timoney

 
 
UPDATE: Steve Romalewski blogged his experience with NYC open data as PDF in 2011.

* “PDF Sharing is Not Data Sharing” was the title of a talk given by Victoria Smith-Campbell at the 2013 DRCOG Regional Data Summit wherein she recounted her vast experience re-tracing wildfire boundaries from PDFs so as to enable re-distribution as KML.

 

Shadow photo courtesy of   antonychammond’s Flickr stream


Geospatial Contractors Cynically Attempt to Take Over US Federal Mapping

The party is over.

During the 1990s anything related to IT was expensive and fat profit margins were easily procured. Post-9/11 was very good for geospatial contracting, with both the escalation of defense spending to support three wars and the mushrooming requirements of the Department of Homeland Security. But now sequestration–and its impacts on the DoD in particular–is the unmistakable sign that a golden era of contracting has drawn to a close.

But over the last decade another geospatial industry sprang up–the one we’re all familiar with: Internet-based, massive high-performance platforms taking full advantage of the plunging costs of computing to elevate mapping to its current status as a core component of the everyday web experience.

After a couple of decades of easy living, what would you do when confronted with the prospect of competing against lower-margin, faster-paced innovation?

You wouldn’t settle for half-measures, that’s for sure.  No, you too would get your lobbying group MAPPS busy helping draft something like H.R. 1604 “Map It Once, Use It Many Times”: a private sector takeover of Federal mapping activities in the United States.

 

US Federal Geospatial Has Always Been A Mess

No one is disputing that the problem is large.  The government’s watchdog arm–the GAO–has consistently identified the same problem, over and over:

For decades, the federal government has tried to reduce duplicative geospatial data collection by coordinating GIS activities within and outside the federal government.

- June 2, 2003

But measures such as better implementation of FGDC standards and NSDI initiatives such as Geospatial One-Stop were going to fix all of that.  How did that work out?

We found that federal agencies had not effectively implemented policies and procedures that would help them to identify and coordinate geospatial data acquisitions across the government. As a result, the agencies make duplicative investments and risk missing opportunities to jointly acquire data.

- April, 2013

So MAPPS comes along with this “Map It Once, Use It Many Times.”  We should all be on our feet cheering, right? Because the narrative is so easy to buy into:

Lumbering, ineffective government bureaucracies wasting tax-payer money for decades in desperate need of the efficiency-creating skills of the private sector.

It’s a great story.

Except for one glaring, very inconvenient fact.

MAPPS Members Made Hundreds of Millions of Dollars From All of That Inefficiency, Redundancy, and Lack of Coordination

Somehow the fairy tale MAPPS is trying to sell omits the small detail that many of the things their legislation claims to fix were essential features of a business model that was very, very lucrative for its members. Because where did the billions go, actually? Are there hordes of retired Park Service cartographers and USGS geodesists kicking back on their yachts in La Jolla that I don’t know about?

“Map It Once, Use It Many Times”?  After decades of the contractor game of “Capture it Once, And Sell It To As Many Different Agencies as Possible”?

So MAPPS members fancy themselves the solution to the problems that just so happened to have made them a lot of money over the years?

Got it.

Let’s dig into the details of the legislation to more fully appreciate the nobility of our White Knight Geospatial Saviors.

All Your Mapping Belongs to Us

Acronyms.  We need more acronyms.

The first order of business is to establish the National Geospatial Technology Administration (NGTA) within the US Geological Survey.  As part of the NGTA we are also establishing a National Geospatial Policy Commission (Sec 201).  The National Geospatial Policy Commission will be tasked with creating a National Geospatial Data Plan.  Of course, a National Geospatial Database will also be created (Sec 103) that will house the predictable stuff: cadastral, orthoimagery, elevation and bathymetry, etc., etc.  Additionally, other data “useful in carrying out national priorities”–healthcare, broadband, home mortgages, emergency response, and a bunch more–will be included.

What agencies’ geospatial functions will be completely taken over by the NGTA?

How about all of the geospatial functions that reside in the Department of Interior (BIA, BLM, NPS, USFWS, USGS), USDA (including Forest Service), and NOAA?

That’s a lot.

Now how might this National Geospatial Database be funded? Section 103 e(2) suggests “user fees” will be considered.  Since a vast quantity of geospatial data generated by the agencies listed above is provided free to the public, discussion of “user fees” is crazy talk, right?

Not in the MAPPS universe. These folks have openly described the attractiveness of user fees–taxes!–with respect to the use of geospatial data.

So a possible result of all of this efficiency and cost savings is that the public will end up paying for data that now is free…

 
The Gutting of the Federal Mapping Workforce

    SECTION 303 CONVERSION TO CONTRACTOR PERFORMANCE.
    (a) Conversion of Activities Identified by Commission- Each agency head shall convert, to the maximum extent possible, to performance by private geospatial firms, all activities identified by the National Geospatial Policy Commission for conversion under section 202(b)(2) that are performed by or for the agency.

We’ve seen this movie before.

In 1998 President Clinton signed the Federal Activities Inventory Reform (FAIR) Act that called for streamlining government by using the private sector to provide services more cheaply than government employees whenever possible.

Fifteen years in, what does the data say?

The net cost to the taxpayer of a contractor is twice that of a federal employee.

And we get to throw out institutional memory and trample what’s left of the rich mapping traditions of the USGS, NPS, etc. in the bargain.

An Industry in Desperate Need of Government Encouragement

    SEC. 402 STRATEGY FOR ENCOURAGING FEDERAL USE OF PRIVATE GEOSPATIAL FIRMS.
    (a) Development of Strategy- Not later than one year after the date of the enactment of this Act, the Administrator shall cooperate with private geospatial firms, and any associations composed exclusively of such firms, to develop a comprehensive strategy to encourage and enhance the use of private geospatial firms by Federal agencies and other entities that receive Federal funds, including State and local governmental agencies, universities, nonprofit organizations, and foreign governments.

It’s not enough that the private sector take over Federal mapping activities in toto–now anyone who receives government dollars will be “encouraged” to use private geospatial contractors. Because heaven forbid a university that receives Federal funds use its own cartography lab to create the campus map when there is a private firm at the ready.

Unbelievable.

But all in the name of free enterprise, of course.

      Hey, look: Private Sector investment in Spatial Data Infrastructure

 
Yet the Free Market is Massively Investing in Spatial Data Infrastructure

You would never know from reading MAPPS press releases that right now, in 2013, in the United States of America, some notable companies are making sizable investments in spatial data: Google, Microsoft, Apple, and even Amazon.

National Spatial Data Infrastructure?  That would be Google to the average citizen.

Isn’t this the joy of capitalism writ large?  Private companies risking private capital to invest millions in mapping and create a massive consumer surplus that gives joy every time you turn on your smartphone?

MAPPS intentionally confuses its pro-business stance with the principles of free markets.  Because as Luigi Zingales so memorably puts it, “true capitalism lacks a strong lobby”.

 

If You’re Part of the Problem, Perhaps You Should Have a Lesser Role in the Solution

No one disputes that government IT and its procurement systems are broken.  But the solutions put forth by H.R. 1604 simply aren’t credible given the two decades-long track record of heavy contractor involvement in federal geospatial activities.  Recent innovative projects at the FCC and National Park Service offer glimpses of the alternative paths available when agencies don’t reflexively cede control of their projects to outside vendors. Because frankly, the technology is the easy part: it’s the anthropology–the organizational culture–that’s difficult. For far too long the upper management at these agencies has rubber-stamped increasingly cumbersome technology from private contractors seemingly optimized to create and perpetuate prolonged organizational dependency.

*    *    *    *    *    *

Admittedly, whinging at the mendacity of a lobbying group does have a rather pronounced tilting-at-windmills feel to it.  If MAPPS is vigorously representing the interests of their members, so be it, right?  But what this proposed legislation makes abundantly clear is that these members–Sanborn, ESRI, Hexagon, et al.–have a dispiriting, narrow vision of their future: utterly backwards looking; taxpayer-be-damned, Federal-employee-be-damned.

What remains unclear is if this baldly retrograde expression of self-interest will be countered with anything other than servile, silent assent by the industry at large.

 
 
 
—Brian Timoney

 
 
 

Times Square photo courtesy of   Tasayu Tasnaphun’s Flickr stream
Stealing donuts photo courtesy of   whatmattdoes’ Flickr stream
Dirty quarters photo courtesy of   .sanden.’s Flickr stream
Old map photo courtesy of   iwouldificould’s Flickr stream
GeoEye rocket photo courtesy of   misterbisson’s Flickr stream

 


 

The Flawed Economics of Closed Government Data


 

How much should citizens pay a county for a digital copy of property records and aerial photos?  Scioto County, Ohio says $2000.  Actually it hired Woolpert to figure it out for them, and they said $2000.  Which sounds a bit spendy, especially given that cities like Philadelphia and Denver give away the same type of information. For free.  Such a discrepancy in pricing would indicate someone is doing it wrong.  But Scioto County went to the trouble of arguing their point to the Ohio Supreme Court where six of the seven judges agreed with them.

Others have pointed out the wrongness of the decision. But what if the majority’s conclusion flowed from premises about the relationships between data, software, and distribution formats that simply don’t apply in 2013?  To what degree should citizens be economically punished for the technical inefficiency of their government?

What if It’s More Cost-Effective To Give Data Away Than To Charge For It?

The costs of distributing digital information over the web have plunged dramatically over the past few years:  the infrastructure to, say, distribute aerial imagery of a city, would have cost thousands annually–now it’s probably a few dollars per month.  But while distribution costs have plunged, employee costs haven’t: wouldn’t efficient government dictate that data dissemination be as automated as possible, without the costly frictions of employees processing paperwork and money? To which the gatekeepers of closed data gravely utter the phrase “cost recovery”.


 

Are You Recovering Costs Or Merely Acting Out an Accounting Fiction?

The conversion of paper records to digital data was a large expense:  this was an era when databases and GIS programs required their own specialized hardware (kids, google “Sun Sparc”).  So naturally jurisdictions wanted to get some of that expense back by charging the public for derivative products–printed maps, as well as data files on floppies, CD-ROMs, etc.  And with money changing hands comes paperwork and the legal licensing and disclaimers: in a word, overhead.  The city of Denver stepped back and evaluated their data sales program, and concluded they weren’t making real money because–

  • a large number of data sales were to city contractors, who turned around and billed the city for their outlay
  • the license for the data was so restrictive that 3rd party usage options were severely limited
  • besides processing paperwork, employees spent significant time walking purchasers through the process of FTP download

In short, if someone wants to play the “cost recovery” card as an economic rationale for selling government data, let’s open up the books and get a full accounting of employee time spent processing paperwork, hand-holding over the phone, and determining whether a contractor is simply adding another cost to the bill, leaving the net gain from the sale at less than zero.
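What might that full accounting look like? Here is a minimal sketch in Python. Every number in it is made up for illustration (none comes from Denver or any other jurisdiction), and the field names are hypothetical. It also assumes contractor purchases are billed back to the jurisdiction dollar for dollar, which is the simplest reading of the first bullet above.

```python
from dataclasses import dataclass


@dataclass
class DataSalesYear:
    """One year of a data sales program. All fields are hypothetical."""
    gross_sales: float         # total revenue booked from data sales
    contractor_share: float    # fraction of sales made to the jurisdiction's own contractors (0 to 1)
    staff_hours: float         # hours spent on paperwork and download hand-holding
    loaded_hourly_cost: float  # fully loaded cost of one staff hour


def net_recovery(year):
    """Revenue actually recovered once bill-backs and staff overhead are counted."""
    # Assumption: contractors bill their purchases straight back to the
    # jurisdiction, so that slice of revenue is a wash.
    external_revenue = year.gross_sales * (1.0 - year.contractor_share)
    overhead = year.staff_hours * year.loaded_hourly_cost
    return external_revenue - overhead


# Illustrative figures only: $40,000 in sales, half of it to contractors,
# and 600 staff hours at $45 per hour.
example = DataSalesYear(
    gross_sales=40_000,
    contractor_share=0.5,
    staff_hours=600,
    loaded_hourly_cost=45.0,
)
print(net_recovery(example))  # -7000.0
```

With these invented inputs the program books $40,000 and still loses $7,000. The point is not the particular figures but that the calculation takes a handful of lines once someone actually opens the books.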

License-Free Publishing Without Vendor Baggage

Another important piece of the value puzzle is having effective, low-cost publishing platforms. A project with impressive traction is the open source publishing platform CKAN.  Already used around the world by governments of all levels, the US recently announced that CKAN will be powering the next version of Data.gov, including Geo.Data.gov.  Such a critical mass of usage means that the cycles of iterative innovation will be difficult for any commercial vendor to keep pace with.  The City of Chicago is taking an even simpler tack by posting datasets on Github.
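Part of what makes a CKAN catalog "machine readable" in practice is that it exposes a JSON web API alongside the web pages. The sketch below, in Python, builds a query against CKAN's Action API `package_search` endpoint and pulls the download links out of the response. The portal address is a placeholder rather than a real catalog, and the helper names are my own, so treat this as a rough illustration and not as anyone's official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(portal, query, rows=5):
    """Build a CKAN Action API package_search URL for a portal base URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal.rstrip('/')}/api/3/action/package_search?{params}"


def list_downloads(response):
    """Return (dataset title, resource format, resource URL) tuples
    from a decoded package_search response."""
    if not response.get("success"):
        return []
    rows = []
    for dataset in response["result"]["results"]:
        for resource in dataset.get("resources", []):
            rows.append(
                (dataset.get("title"), resource.get("format"), resource.get("url"))
            )
    return rows


def search(portal, query, rows=5):
    """Query a live CKAN portal (needs network access) and list its downloads."""
    with urlopen(package_search_url(portal, query, rows)) as resp:
        return list_downloads(json.load(resp))
```

Calling something like `search("https://catalog.example.org", "parcels")` against a real CKAN portal (substituting its actual address) would return a list of titles, formats, and direct file URLs. No phone call, no paperwork, no FTP tutorial.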

Github as A Potent Accelerant For Tools and Best Practices

Github?

Isn’t that a website where dorks post programming code?

Yes and no. Yes, code is posted, but that isn’t why some consider it a more potent social force than Facebook or LinkedIn. Clay Shirky lays out some far-reaching implications of the Git architecture that go way beyond being simply a code repository.  Sure, large open source projects such as CKAN are freely accessible, but just as important are all manner of “glue” tools to build bridges for information out of proprietary systems.  Here in Colorado we have the OpenColorado data catalog based on CKAN.  But also available on Github is a suite of tools to automate the transfer of data out of the ESRI ecosystem.  It’s not just about lowering the sticker price of data publishing, but also lowering the level-of-effort so that it’s more cost-effective to push your information to the commons than any one-off custom in-house solution.  With such an efficient platform for best-practices dissemination, small jurisdictions such as Scioto County, Ohio are no longer dependent on the technical savvy of their in-house personnel (or the mixed motives of Woolpert).


 

Do You Believe In Economic Multiplier Effects?

Of course you do, especially if you own a smart phone.  Because in 2000, when the US turned off the “selective availability” that intentionally degraded GPS accuracy (yes, Al Gore was involved!) you and I did not have GPS-enabled phones.  Think of all of the businesses and services that use GPS that didn’t exist in 2000.  One consultant’s report estimates the direct economic impact of GPS in the US to be a cool $67 billion per year.

The last refuge of the zealous closed-data bureaucrat reluctant to release datasets to the public is “what would they use it for anyway?”  Parcels and foreclosure lists have obvious economic benefits, but fire hydrants? But given a critical mass of reliably updated datasets, who can predict what economically important insights will be gleaned in the near future, especially with the advent of the sensor web?

Consider the City of Denver’s experience: the 75% reduction in phone calls for assistance in purchasing and downloading data was predictable.  But what has surprised the Open Data backers has been the tangible increase in interdepartmental efficiency.  You know the drill: mid-level person in Department A emails mid-level person in Department B looking for data.  Department B person asks her boss; boss tells her to ask Department A what they want the data for, etc., etc., ad nauseam.  Now communication is limited to a single email response, a hyperlink, and “have a nice day.”  New York City, an early pioneer in the Open Data Movement, has a team dedicated to analyzing seemingly disparate datasets to tally everything from how many trees were lost to Hurricane Sandy (9,662) to which restaurants were clogging the sewers with illegally dumped cooking grease.  It turns out that unclogging a city’s data flows actually unclogs its sewers: who knew?

*   *   *   *   *   *   *

With three of the US’ five largest cities (New York, Chicago, and Philadelphia) making clear, credible commitments to Open Data, such everyday “victories” will become more commonplace. The Governor of New York recently launched a state-wide open data catalog along with an Executive Order to state agencies to publish data, further nudging along a “critical mass” of information that opens up opportunities for using data in creative, unforeseeable ways. Given these large-scale commitments, and the technology platforms highlighted above, it’s increasingly obvious that in an environment of restricted government spending few jurisdictions will be able to continue down the economically irrational path of keeping information collected at the taxpayers’ expense walled off from the public.

 

 
—Brian Timoney


Lincoln photo courtesy of   Cayusa’s Flickr stream
Door photo courtesy of   doc(q)man’s Flickr stream
Loops photo courtesy of   Kevin Dooley’s Flickr stream