tecosystems

What’s a Document?


One of the most interesting byproducts of the transition, fully underway around the world, from binary document formats to XML-based alternatives is the ability to treat the asset as a container of items rather than a discrete item itself. Both ODF and OOXML allow applications to manipulate, at a minute, granular level, the contents of assets that were previously opaque, even as their respective proponents would doubtless argue their respective superiority at that particular game.

For those of you – and there are one or two at least, I’m sure – who are not office format wonks, here’s the English translation of the above: the files that you today produce in Excel, PowerPoint, or Word can now be carved up, dynamically reassembled and presented. Annual reports can contain continually updating economic data, mortgage applications real-time interest rates, or – nearer and dearer to my heart – baseball scouting reports, moving performance data.
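For the wonks who want to see what "container of items" means in practice, here is a minimal sketch in Python. It assumes only what is true of both formats in general: the file is a ZIP archive whose parts are XML. The package built here is a toy, not a valid ODF file (it has no manifest or styles), and the function names and sample paragraphs are my own inventions for illustration. The namespace URIs and the `content.xml` part name are the ones ODF uses.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Real ODF namespace URIs; everything else in this sketch is illustrative.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ET.register_namespace("office", OFFICE_NS)
ET.register_namespace("text", TEXT_NS)

# Hypothetical sample content: two paragraphs, one of them a placeholder.
CONTENT_XML = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<office:body><office:text>"
    "<text:p>Mortgage application</text:p>"
    "<text:p>Interest rate: PLACEHOLDER</text:p>"
    "</office:text></office:body>"
    "</office:document-content>"
)


def build_toy_package() -> bytes:
    """Build an ODF-like ZIP in memory (a toy, not a complete ODF file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        zf.writestr("content.xml", CONTENT_XML)
    return buf.getvalue()


def list_parts(package: bytes) -> list[str]:
    """Return the names of the items inside the container."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def read_paragraphs(package: bytes) -> list[str]:
    """Pull the text of each paragraph out of content.xml."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return [p.text or "" for p in root.iter(f"{{{TEXT_NS}}}p")]


def replace_paragraph(package: bytes, index: int, new_text: str) -> bytes:
    """Return a new package with one paragraph's text swapped out."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        parts = {name: zf.read(name) for name in zf.namelist()}
    root = ET.fromstring(parts["content.xml"])
    paragraphs = list(root.iter(f"{{{TEXT_NS}}}p"))
    paragraphs[index].text = new_text
    parts["content.xml"] = ET.tostring(root, encoding="utf-8")
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():  # dict order keeps mimetype first
            zf.writestr(name, data)
    return buf.getvalue()


if __name__ == "__main__":
    pkg = build_toy_package()
    print(list_parts(pkg))
    print(read_paragraphs(replace_paragraph(pkg, 1, "Interest rate: 6.25%")))
```

The interest rate here is a made-up value passed in by hand; in the scenario described above it would presumably come from an external source of record, which is exactly the part this sketch leaves out.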

Documents today can have, as IBM’s Doug Heintzman noted last Wednesday at IBM’s annual analyst event, more in common with a web page than the document you or I might have authored a few years – or a year – ago. Parts of it might be static, parts of it might be dynamic, but each of those parts might arrive from separate, external sources of record. The days of static documentation are drawing to a close, thanks to innovation – finally – in an area that should have seen it years ago.

While we at RedMonk are so far out on the bleeding edge that we can’t even see the mainstream when it comes to our own work habits (though not our coverage, hopefully), it’s nevertheless worth noting that I really don’t create documents at this point. Customer, expense and other operational spreadsheets are kept in Google Docs, and frankly they’re more webpage – even database – than they are spreadsheet at this point. At no point in their lifecycle, generally, are they transmitted as ODF, OOXML, or PDF: I can’t honestly remember the last time I exported one for the purposes of sending. When we need to collaborate with an external party, we simply share the asset. Even the pieces I author for this space are documents only in a nominal sense. Each is composed in emacs, then pasted to WordPress. There, it is reforged as an entirely different asset, pulling in pictures, videos, or other embedded assets, all while collecting comments, trackbacks, and revisions to become something new and distinct.

Is that a document? I’d argue not.

The closest I come to creating documents, at least in the traditional sense, is in Impress – the OpenOffice.org PowerPoint alternative. This I use to create the presentations I deliver at conferences, customer events and the like. The presentations tend to be discrete, unevolving assets that I “share” simply by posting them to the web. We do reuse presentations (occasionally) and slides (frequently) within RedMonk, but for the most part presentations are not living documents in the way that a customer spreadsheet is.

But that’s the exception to the rule, which is living assets, and it’s driven primarily by technical limitations. Limitations that I hope are removed. Soon.

For us then, settling on the definition of a “document” is problematic, because it reflects a lifecycle and a lifespan that are, at best, antiquated. Much, if not most, of our output is collaborative, rather than singularly authored, and most of it has a life expectancy far beyond any of the Word documents I authored in my capacity as a systems integrator. Particularly the content that lives on the web. A document, for me, has become a snapshot of the real, living asset, rather than an asset in and of itself. If our Google Docs spreadsheet is the Platonic ideal, the ODF capture of it is merely the shadow on the wall.

Which raises the question: are we creating documents, really, anymore? What does document mean in a networked, composable, and programmatically manipulable age? Or perhaps your natural inclination might be – like mine – to view the above as splitting hairs, a pointless, unresolvable debate of semantics.

Whatever my natural inclination might be towards such questions, however, my considered opinion is that the question matters. Maybe a lot.

Not to me, personally. First, because as mentioned, I live on the cutting edge and I’m not terribly relevant relative to the average office user of today, or maybe three to four years out. But more because I’m in a position to realize how documents are evolving, and what they might be capable of if we can get creative. The terminology is not going to have much bearing on what I think of a given technology.

Not everyone is so lucky, however.

As I see it, the danger in continuing to call the content we’ll be creating – using a rapidly evolving set of tools – over the next few years “documents” is that it will stunt the imagination. An example: when I was approached, years ago, about attending the ODF Summit, I had to explain in detail why I believed that messaging (email) and collaboration (wiki) vendors should be included in the discussion. So tight was the focus on an “office productivity” format, it was non-obvious even to some ODF experts that wikis might, at some point, become consumers and producers of ODF.

The term document, in my view, is a legacy term, and as such, it brings with it preconceived notions of what a document is, should be, and can be. My concern, then, is that these preconceived notions end up predetermining the perceptions of what the assets are capable of.

To be sure, we should not – must not – try to reframe the traditional definition of a document. For those mainstream folks who will make up the bulk of the user population for the foreseeable future, the definition of a document is set, and it would be folly to try to change it.

But neither should we let that definition carry forward, tainting more capable formats with the legacy of its limited capabilities. No, we need a new definition or term, I believe. Something more accurately descriptive, and yet non-threatening. Database? Too intimidating, too misleading. Web page? Likewise. Container? I don’t love it.

So I don’t have the replacement term worked out yet: sue me. That doesn’t change the fact, in my opinion, that we’ll need one.

And if the format advocates have their way, probably soon.