James Governor's Monkchips

OpenXML vs ODF: does the archiving argument stack up?

Rob Weir is a master rhetorician and a great storyteller, with a wonderful grasp of history. He is also one of the most visible voices in the anti-OpenXML brigade. For those of you who haven’t followed the story, Microsoft has put forward an XML format to compete with the Anyone-but-Microsoft Club’s Open Document Format (ODF).

It’s not my goal to rehash the story here, but to ask a question about electronic documents and forensics. I don’t think the world needs one document format, and I am not sure Rob’s argument that paper is a single format that enabled innovation over hundreds of years holds together. Paper is found in, and underpins, a variety of formats, and that’s what actually makes it so valuable, rather than the other way around. Document storage, yes, it’s an important issue. Document retrieval – even more so. But what I really take issue with is the idea that Microsoft’s OpenXML format could become uninspectable in future. I just don’t buy it. Today I saw that WordPress can render Microsoft Office documents using Thinkfree. I also learned that the PowerPoint equivalent Google just acquired allowed the authoring of PowerPoint slides without using PowerPoint. It begins to put me in mind of Benjamin Franklin – the only things sure in life are death, taxes, and third parties (reverse) engineering around Microsoft Office formats.

As soon as we get into the forensics of document history, and want to know who said what in 1922, or who had babies with whom, then tools are needed to analyse that information. Microfiches, libraries and so on. It’s not the fact the records are on paper that makes them searchable; it’s the way the documents are organised and indexed. And you know Google is going to index all documents, regardless of format. Weir says he doesn’t see PDF as the answer. The answer to what, Rob? PDF is the reality of the official document world today, just as Microsoft Office is the reality of the hard-drive corporation. Is OpenXML really so heinous?

The pro-ODF argument almost begins to feel like the pitch for The Semantic Web: where top-down standardisation will enable all kinds of cool stuff. I have a lot of sympathy for the Clay Shirky pushback, which argues that useful semantics actually emerge from tagging and how people use stuff.

One of my current hobby horses is that we, the industry, need to move beyond good vs evil, Manichaean black vs white, beyond the single answer to a problem. Our monotheism does us no favours. We need a more polytheistic sense of using the right tools for the job and being in mastery, bringing a more distributed spirituality into our technology-saturated lives. And document formats seem an obvious place for that kind of thinking. One true format? What do we need that for and what god are we worshipping? What are the problems we’re trying to solve?

26 comments

  1. […] There are interesting responses to Weir’s claims on the blog of James Governor, a well-known open source advocate. […]

  2. […] There are interesting responses to Weir’s claims on the blog of James Governor, a well-known open source advocate. […]

  3. Great post. I generally am much more sympathetic to the ODF side of the argument, but Rob went a little too far off the deep end on this argument.

  4. Good post; I reckon the pro-ODF folk are damaging their own case with over-the-top rhetoric; having said which, Microsoft is not doing a great job on its side either. I thrashed out some of the issues with Jean Paoli recently; it’s on my blog (link above).

    Tim

  5. I think the tactical error that most of us on the pro-ODF side of the argument make is the “101 reasons why OpenXML isn’t great” frenzy we get into; thinking of every possible angle rather than focusing on the top 2 or 3. To me the biggest single disappointment isn’t any technical pros or cons but the mockery MSFT is making of the standards process (after it became clear governments were going to mandate open standards). The single weakest argument is ‘the world only needs one open document format’ – just like we only need 640K, 5 computers worldwide (or whatever the IBM quote is), or any other short-sighted prediction the IT world is littered with – on the other hand, neither do we need an XML standard littered with works like Win95 / Word 6 / WordPerfect / Lotus 1-2-3 backwards compatibility hacks.

    It’s also worth pointing out that while many have reverse engineered the MS formats to the point of being able to extract text or numeric data, most (small) business I’ve worked with would make these observations about the alternatives (OpenOffice mostly, but also Google Docs)

    * Functionality that I need is all there

    * Minimal training required

    * Main reason not to deploy is that can’t guarantee 100% Office document fidelity both in and out.

    * Other deployment block is macro execution, which OOo is only just starting to tackle and AFAIK Google Docs / search doesn’t (yet). What if what you’re searching for is the result of a macro execution – or the embodied formula or process? (AFAIK this is an area where ODF is weak too.)

  6. James – knowing you are not exactly a Microsoft apologist, your post made me actually reevaluate which camp I was rooting for. I was finding it so easy to get caught up in whose side is winning when I realized, wait, we all could actually lose if it’s just one format. That solves nothing, plus ODF sort of sucks when you strip it down, despite how emotionally I wanted MSFT to take one. Hats off for helping me at least question myself and get back to my day job, which has always been “what problem are we trying to solve”. My boss the CIO isn’t going to ask me which friggin format I implemented, but does it work, who supports it and does it WORK WITH WHAT WE HAVE.

  7. James-

    It’s utter naivete. You’d amend your view if you understood how the MOOXML format operates throughout the stack.

    Horses for courses is an appealing sentiment; but not here.

  8. […] I asked whether the archiving argument for ODF against OpenXML stacked up. One comment came from Sam Hiser, who came back with: “It’s utter naivete. You’d amend […]

  9. James, I try to avoid taking sides on issues before I understand them. I presume something like that underlies your file format agnosticism.

    I do not understand why you cite the examples of the Thinkfree plugin and the Tonic Systems presentations app acquired by Google. As nearly as I can tell, neither supports Microsoft’s new XML formats; both instead support the old Microsoft binary formats, which all concerned could probably agree are heading for the ashbin of history. So those applications’ relevance to your article’s thesis seems weak.

    I also think you are missing the key market requirement of software interoperability. In the old days of software as an end-point, less than full interoperability was tolerable because each document would be manually inspected for fidelity in conversion. And those of us who have learned to cope with lossy file conversions in our personal work tend to believe that what works well enough for us is good enough for all.

    But in the emerging business processes and line of business systems, as in a SOA — where data integrity must be ensured in wholly automated document parsing and conversion processes — lossy conversion fidelity is unacceptable. Only full fidelity is acceptable.

    The market requirement of file conversions without data loss is also at least arguably compelled for governments and enterprises. See e.g., Sarbanes-Oxley Act, 15 U.S.C. 7261(b) (financial information must “not contain an untrue statement of a material fact”); E-SIGN Act, 15 U.S.C. 7001(d)(1)(B) (electronically preserved records must “accurately reflect[] the information set forth in the contract or other record” and be “in a form that is capable of being accurately reproduced for later reference, whether by transmission, printing, or otherwise”).

    Given such market requirements, particularly in the context of fully automated business processes, the issue of dueling file formats really boils down to the accuracy of Microsoft’s claim that interoperability can be achieved by file conversions between its XML formats and OpenDocument. If *full* fidelity cannot be reliably achieved in wholly automated conversions between the two formats, then the case for the world having duplicative document file format standards for the same functionality falls flat on its face.

    Thus far, the news on full fidelity conversions between the two formats is not good. There are sound technical reasons for believing full fidelity will never be achieved, given that Microsoft’s XML formats appear to defy document parsing using XPath, a critical requirement in automated XML transformations. And of course Steve Ballmer predicted that the Microsoft-CleverAge-Novell ODF file conversion plug-ins never would achieve 100 per cent fidelity. http://www.eweek.com/article2/0,1895,2050848,00.asp?kc=EWEWEMNL103006EP17A

    So the case for dueling document file formats seems weak. I’ll leave you with a quotation from an IBM document describing the complexity facing IT managers in migrating to a Service Oriented Architecture and ask that you ponder why any sane IT manager would prefer to have dueling incompatible file formats in the administered system. http://www-128.ibm.com/developerworks/library/ws-migratesoa/

    “Over the last four decades the practice of software development has gone through several different programming models. Each shift was made in part to deal with greater levels of software complexity and to enable the assembly of applications through parts, components, or services. More recently, Java technology contributed platform-neutral programming, and XML contributed self-describing, and thus platform-neutral, data. Now Web services has removed another barrier by allowing the interconnection of applications in an object-model-neutral way. Using a simple XML-based messaging scheme, Java applications can invoke DCOM-based, CORBA-compliant, or even COBOL applications. CICS or IMS transactions on a mainframe in Singapore can be invoked by a COM-based application driven by Lotus Script running on a Domino server in Munich. Best of all, the invoking application most likely has no idea where the transaction will run, what language it is written in, or what route the message may take along the way. A service is requested, and an answer is provided.

    . . .

    “As an industry, we have gone through multiple computing architectures designed to allow fully distributed processing, programming languages designed to run on any platform, greatly reducing implementation schedules, and a myriad of connectivity products designed to allow better and faster integration of applications. However, the complete solution continues to elude us. … Now you find more complex environments. Legacy systems must be reused rather than replaced, because with even more constrained budgets, replacement is cost-prohibitive. You find that cheap, ubiquitous access to the Internet has created the possibility of entirely new business models, which must at least be evaluated since the competition is already doing it. Growth by merger and acquisition has become standard fare, so entire IT organizations, applications, and infrastructures must be integrated and absorbed. In an environment of this complexity, point solutions merely exacerbate the problem, and will never lead us out of the woods. Systems must be developed where heterogeneity is fundamental to the environment, because they must accommodate an endless variety of hardware, operating systems, middleware, languages, and data stores. The cumulative effect of decades of growth and evolution has produced severe complexity. With all these business challenges for IT, it is no wonder that application integration tops the priority list of many CIOs[.]”

  10. Andew, Tim and Paul: thanks.

    Marbux – thanks for your comment. I think I actually do understand the issues quite well, but there is plenty more to learn. I like you calling out the regulatory requirement for fidelity. That is a good argument, and one that talks to a specific vertical use case. Unfortunately the IBM quote brings very little additional meat to the argument. SOA Web Services, as explained there, is still a work in progress, and a somewhat ugly one at that. What is more, it arguably *supports* my argument in that it calls for reuse rather than rip and replace. Things need to interoperate. It’s not about dueling formats but working formats. The Thinkfree argument is one by history. We have a good few years of proprietary MS formats to go by, and interop was indeed possible. We have to take users forward into the automated process world you describe. But rip and replace is not going to happen.

  11. James-

    “It’s not about dueling formats but working formats.”

    Again, you betray something like misunderstanding. MOOXML is conceived, designed & implemented to not work with other than MS software.

    It’s not a format; it’s a manifestation of competitive values running untethered. It doesn’t “work” in the way the Internet tells us a post-modern format should function.

    James. It doesn’t work. It’s not a contender because it doesn’t meet the requirements of the market. You may be bored with the rhetoric but that’s no excuse for bad recommendations.

  12. […] than replying to each one individually. James Governor takes a closer look at archival formats OpenXML vs ODF: does the archiving argument stack up? “The industry needs to move beyond good vs evil, manichaen black vs white, beyond the single answer […]

  13. Thanks for the compliments, James. As you know I’ve been working on a series of posts, attempting to make the case that there should be a single document format. Three of the four posts have been written so far: http://www.robweir.com/blog/2007/03/case-for-single-document-format-part-i.html

    I’ve been looking at the problem mainly from economic and historical angles, examining the forces that tend to lead to a single standard. These are forces or tendencies, not absolutes, of course. The reader should note, however, that in none of these posts in this series have I called for ODF to be that single format, nor have I called for OOXML to be thrown away. In fact, I don’t believe I’ve even mentioned ODF or OOXML in these posts.

    Sure I have my own thoughts on that subject, and have voiced them in other posts. But the argument about the desirability, and in my mind at least, the inevitability of a single document format, is independent of the actual underlying formats. Similarly, one could argue that a single rail gauge was desirable and inevitable in the US without having to call one gauge evil or another gauge a gift from heaven. True interoperability is the goal. The format is just the means to that goal. Would you agree on that point?

    I think we also need to get away from the black & white thinking that having a single document format is about picking winners and losers. That is the Manichaeism that I’m seeing, the view that we either have multiple formats in simultaneous use or we will all suffer from lack of innovation and competition. A single format should benefit all, the entire community, just like a single internet, a single telephone network and a single rail gauge benefit all. That is what formal standards are about.

    I’d note DIN’s (the German standards organization) definition of a standard as a, “…document which has been elaborated consensually and accepted by an acknowledged institution and which lays down for general and recurrent application rules, guidelines or characteristics for activities or the results thereof, whereby an optimal degree of regulation in a given connection is striven for.”

    I’d emphasize the phrases “elaborated consensually”, “general and recurrent application,” and “optimal degree of regulation”. Certainly, a format dictated by Microsoft and fully usable only within their product is lacking by these criteria. Is ODF perfect? No. But it continues to be “elaborated consensually”, and it is that open, transparent process that continues to improve a file format which is capable of, and has already demonstrated, “general and recurrent” use. I think the key is to get Microsoft to come to the table to participate in this consensual standardization effort, rather than having them create islands of non-interoperable documents and document processing systems in a widening ocean of interoperability.

  14. I believe that most of us have asked Microsoft to sit down with OASIS and work to make a unified format that meets their needs, so that users of software and those of us that support them can have their needs met.

    The #1 request I get is for office document format conversions between the old Microsoft formats, WordPerfect formats, older OpenOffice formats, and OpenDocument formats. I also get a few requests for people trying to convert home-user Microsoft Works formats into something else. Except for the Works conversions, I see the same thing in the search requests for my blog: “how do I convert between document formats?”

    Until and unless there is faithful conversion between the three major XML-based office document formats (don’t forget China’s UOF), it is *best for end users* for there to be a single format that is usable by any vendor at any time for any purpose. As far as archiving goes, the format needs those qualifications so that it can still be opened after the original software maker has gone on to something else. As I have said before, “users want choice of applications, not file formats, while vendors want choice of file formats, not applications.” I am firmly on the side of the users (whom I support–I know which side my bread is buttered on).

    So once again, I encourage Microsoft to stop the partisan warfare and work with OASIS to unify the two formats (and hopefully China’s UOF) into a single interoperable format for all vendors.

  15. [J.Governor]: “One true format? What do we need that for and what god are we worshipping? What are the problems we’re trying to solve?”
    ________________________________________________
    That statement alone indicates you haven’t studied the issue on which you’re writing. Marbux made several cogent arguments, all but one of which you nonchalantly dismissed as suggesting a “rip and replace” document format strategy. But no one is arguing that. The .doc format is well parsed, but it’s only one of Microsoft’s proprietary formats that had to be updated six out of the last eight versions of MS Office — and came with its own incompatibilities with itself!

    “The problem we’re trying to solve” is to free ourselves and our documents from Microsoft. Microsoft did not want to open its formats, so the world built a better, and easier one in ODF using open standards.

    James you seem to want to hang on to MS-OOXML and legacy formats forever. In the past fifteen years, those very formats have changed so drastically to the point that I cannot even read my Word 2.0 files in Word 2003 or 2007. I don’t want that. You think a government wants that? Why should taxpayers fund such nonsense in the coming decades just to make Microsoft’s shareholders richer?

    In the end, it’s really simple: it’s not about the program, it’s about the format. MS .doc is dying; MS-OOXML is DOA, and ODF not only “works,” but makes sense in a wide variety of ways. I haven’t “gone off the deep end” because I support ODF. Quite the contrary. If the goal is interoperability — beyond Microsoft Office of, by, and for itself — then the ball is in Microsoft’s court.

    When I read pro-MS-OOXML posts like yours, they follow the same structure: “Why don’t ‘those’ people just shut up and go away.” The pro-Microsoft crowd invested in the MS-OOXML format and the format is already dead. Let’s have the intelligence, let’s have the DECENCY to sign the death certificate, collect the insurance, and invest in a format with a future: ODF.

  16. I wondered about your arguments until I picked up a copy of “Joel on Software” and had a read. What he said about project specifications (specs) seemed to make sense in this (standards-focused) context.

    If you consider the document standard as an industry-wide specification for office productivity/automation suites, you avoid the hassles faced by Speedy in Joel’s entertaining article/s.

    Of course, DOC is an MS Word memory dump and is only included in modern Word Processors such as AbiWord because it currently has a stranglehold on the wordprocessing user, and furthermore changes whenever a new version of MS Word comes out; it can’t be realistically called a standard. ODF indeed has a different focus, and so, can accurately be termed a specification for word processing and other office productivity/automation.

    Joel also makes an interesting point on specifications – the longer the specification, the less likely it is to be read.

    And yes, we’re all hoping Microsoft will turn up to the ODF party, sooner or later, but better sooner than later.

  17. […] but so does Rob Weir Filed under: Uncategorized — ctrambler @ 12:02 pm James Governor disagree with Rob Weir over the issues of long term archiving potential of ODF and OpenXML. He has a point […]

  18. Zaine- I wouldn’t say I was nonchalant in pushing back against Marbux. I largely pushed back against a quote from an IBM primer. I would say if the reason we need ODF is to underpin WS-* style interop then we’re in trouble, given that approach is questionable. I maintain that the market will dictate what happens next, and do we really expect that no companies will adopt the latest versions of Office? A lot of people have taken my piece to be an argument in favour of OOXML at the expense of ODF, that is, an advisory to use the former. It was never intended thusly.

    Thanks Wesley.

    W^L+ – The throwaway line about “hopefully China too” is a very interesting one. I had not even been considering that issue in the broader analysis. It makes a single format even less likely, doesn’t it?

  19. I should’ve added that, from my perspective, DOCX and related file formats seem to be XML-annotated memory dumps; ECMA 376 seems to tie itself to replicating _all_ of MS Office 2k7’s quirks, and some of them are too absurd to be taken seriously – ones like the clipart one, and the Leap Year. (I’ve left some rather scathing remarks about the monocultural expectations behind the clipart fiasco on Rob Weir’s blog – I don’t feel the need to repeat them here. 😉)

    And so, tied as it is to MS Office 2k7, as opposed to ODF’s plethora of re-implementations, I think ECMA 376 would fail as an archiving file format because of its single source nature. (Please note: ECMA 376 will no doubt have a wide collection of sub-implementations – Brian Jones is sure of that, and I expect he is right. MS Office has always had a vast set of independent software developed on top of it, but then, so does AutoCAD, and Oracle, etc. But that isn’t the same as a wide set of independent re-implementations a la OO.org, WASCE, KOffice, et alii.)

  20. James-

    I apologize for insulting you and one of your commentors.

    It certainly is true that a more constructive approach to the discussion by me would be more useful to everybody.

    With regards,
    -Sam

  21. China’s UOF is under active “harmonization” with ODF, as I understand it. There is a project to create a translator, but the goal is to merge the two into one or at least make them close enough that they are easily converted one to another. I believe someone from Sun articulated this at a conference recently.

    I sincerely hope that Microsoft will give up its dreams of empire and instead work with the rest of the world on a unified format. Users have enough to worry about trying to choose between Blu-ray and HD DVD, trying to avoid picking a loser like so many did with VHS vs. Beta. They don’t need to get stuck trying to pick a file format.

  22. […] writes in reference to an article by James Governor.  Governor writes that he does not see where the need for document archiving necessarily […]

  23. Third comment – once we start talking about macros and other little nasties, we rapidly run into the question of “what language?”

    It’s perfectly well understood – by me at least – that Visual Basic in some form or another, rules the roost in MS Office. But if you consider these file formats as part and parcel of a web-inclusive SOA – ie, you have software components that include web servers and web browsers in the stack – you also have to consider J[ava]script, Perl, php, et alii.

    Merely including Visual Basic as a macro language no longer cuts the mustard. And that is where Microsoft’s ECMA 376 again comes a cropper. I haven’t seen any evidence on any Microsoft blogger’s site that they’re aware of this – mind you, I haven’t seen any evidence on any FOSS blogger’s site that they’re aware of a wider macro language issue either. It’s just that I think for the FOSS people, it’s a case of getting a kick in the pants and nobody’ll be able to stop them. Microsoft’ll need to stifle the Steelypips first.

  24. […] First, James Governor commented and say that he cannot see 100% ODF in enterprise to the exclusion of OpenXML. As I had replied, its true. I cannot see that either. A few years back I would not believe it is possible to see non 100% Microsoft format, no matter what it is. However, today this is possible. In the next few years, bar some catastrophic events, we will probably see ODF slowly going into enterprise. Best case scenario is it follows the trend of Firefox but slightly faster. Lets say 15% in two years. That is good enough to break the document format monopoly held by Microsoft. […]

  25. The problem with OOXML for archiving is surely that it is not fully documented. It specifies things in terms of Microsoft application behaviour and binary objects, the format of which the specification does not document and which only Microsoft knows.

    Surely this is completely unsuitable for any kind of archiving – in order to archive things for posterity it is necessary to force conversion of documents into a fully documented format, and any undocumented content should be manually checked to see if it is converted correctly rather than allowing the unspecified content to get into the archive.

    To do this I would suggest the best way to go is to save all legacy Microsoft documents in PDF format, which is fully documented, and to create and store any new documents in ODF or PDF depending on whether they need to be edited or not.

  26. […] (as an aside, I wasn’t aware that Google had its own format). Likewise, my colleague has said that he doesn’t “think the world needs one document format”. I don’t agree. […]
