OOXML is defective by design

Monday, May 10, 2010

The OOXML interoperability scam

Stéphane Rodriguez, May 2010

Every time the Microsoft Office team pushes a comment on the wire, there is another pledge for interoperability. It has been so common over the last few years that, unless you have looked closely at what it might mean, OOXML is pretty much synonymous with interoperability.

Of course, it does not matter that the word interoperability alone does not mean anything. That is why Microsoft uses it so much. You can put an interoperability label on pretty much anything as long as it is not accurately defined. Does it mean document-level interoperability? Application-level interoperability? Or perhaps it is just Microsoft-only interoperability (a good guess!)?

The pledge for interoperability cannot possibly mean document-level interoperability, since we are not there: OOXML is full of non-XML streams, barely defined at all (the official papers lack everything related to international features, and that is just one example), which ends any serious discussion prematurely. In the remainder of this article, I'll be taking a look at application-level interoperability, in case that is what Microsoft means.

Just out of generosity, I'll be doing the tests with Office 2010, the latest version of their crappy product suite, which gives them no less than three years' worth of improvements on features that were supposed to be part of Office 2007 already (which RTM'ed in November 2006). Specifically, I'll be focusing on Excel 2010, since I've already pointed out a lack of application-level interoperability in my article about content controls in Word (so-called XML features like these actually require running Word instances to work, contrary to the open standard pledge).

A good example of application-level interoperability is what gets stored in the Windows clipboard when you copy/paste content to or from an Excel spreadsheet. Microsoft Office has traditionally stored binary formats, i.e. proprietary ones, in the clipboard, thereby limiting interoperability across applications. The scenarios at stake include copying and pasting content with high fidelity between a running instance of Excel and a running instance of another application (spreadsheet-related or not, assuming it's an OOXML client). A good measure of whether Microsoft has improved interoperability in the OOXML timeframe is therefore whether Excel 2010 actually stores OOXML in the clipboard. Let's see if that is the case.

To run the test, we simply create a trivial spreadsheet with a few values in cells and a couple of graphics:

A simple spreadsheet to run the test

The test consists of selecting this cell area and hitting Ctrl-C. That stores the content in the clipboard (hosted by the Windows shell). We know that if we hit Ctrl-V in a new spreadsheet, the content will be pasted with high fidelity. That leads us to believe that everything needed to do just that is stored in the clipboard and, given the advances and openness of OOXML, that we might be able to do the same from a separate application.

Let's see what is in the clipboard at that moment. To do that, we simply use the built-in Windows clipboard viewer: hit Windows+R, enter clipbrd.exe and hit Enter. By default, we see a text version of what is supposed to be there:

The content of the clipboard

The clipboard stores content in one or more formats, either standard or proprietary. The clipboard viewer lists them:

Storing part of a spreadsheet : internal file formats in the clipboard

  • Enhanced metafile (0 bytes)
  • Metafile (16 bytes)
  • Bitmap (0 bytes)
  • Unicode text (254 bytes)
  • Text (127 bytes)
  • Displayed text (14 bytes)
  • Regional parameters (4 bytes)
  • OEM text (127 bytes)
  • DIB Bitmap (829320 bytes)
  • DataObject (4 bytes)
  • Biff12 (6219 bytes)
  • Biff8 (15872 bytes)
  • Biff5 (10752 bytes)
  • Sylk (1615 bytes)
  • DIF (848 bytes)
  • XML Spreadsheet (1972 bytes)
  • HTML Format (13763 bytes)
  • Csv (127 bytes)
  • Hyperlink (156 bytes)
  • Rich Text Format (32768 bytes)
  • Embed source (161348 bytes)
  • Native (161348 bytes)
  • OwnerLink (35 bytes)
  • Object Descriptor (154 bytes)
  • Link Source (188 bytes)
  • Link Source Descriptor (154 bytes)
  • Link (45 bytes)
  • ObjectLink (54 bytes)
  • Ole Private Data (728 bytes)

In the screen capture above, non-grayed formats are standard and implemented by pretty much all applications, whereas grayed-out formats are proprietary and application-specific (Excel). Of all these formats, only a few can actually hold the relevant content in high fidelity. These are:

  • Biff12
  • Biff8
  • Biff5
  • XML Spreadsheet
  • HTML Format

Let's take a look at the list we have.

  • BIFF is the acronym of the binary Excel file format. 5, 8 and 12 are revisions of the file format.
  • XML Spreadsheet is actually Excel 2003's data-only XML file format, a file format deemed so poor that it was completely rewritten into what became OOXML. And of course, this is just data, so we lose the graphics.
  • HTML Format: this is spaghetti HTML-MSO markup for display purposes only. Graphics are expressed as bitmaps, so we would lose their definition.

As a result, only the BIFF formats are really at our disposal. It gets even worse when you realize that BIFF12 is actually not BIFF at all. In fact, with every single Excel release, Microsoft creates a new revision of the BIFF file format, and for some reason it is still called BIFF8. BIFF12, on the other hand, is a completely new file format. It is zipped and follows the OPC packaging conventions, but it is made of .bin entries which store a new file format that is neither regular BIFF nor XML.
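To check for yourself what a Biff12 clipboard payload contains, you can save its raw bytes to a file (with any clipboard tool) and list the package entries. The Python sketch below is an illustration, not a tool from this article: the function name `classify_package_parts` and the demo package are hypothetical, and it only sorts entries by extension (XML parts versus .bin parts) after checking for a ZIP signature.

```python
import io
import zipfile


def classify_package_parts(data: bytes) -> dict:
    """Sort the entries of a ZIP/OPC package into XML parts and .bin parts."""
    result = {"zip": False, "xml": [], "bin": [], "other": []}
    # A ZIP archive normally starts with a local file header signature, PK\x03\x04
    if not data.startswith(b"PK\x03\x04"):
        return result
    result["zip"] = True
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for name in package.namelist():
            if name.endswith((".xml", ".rels")):
                result["xml"].append(name)
            elif name.endswith(".bin"):
                result["bin"].append(name)
            else:
                result["other"].append(name)
    return result


if __name__ == "__main__":
    # Hypothetical demo package shaped like an .xlsb: XML plumbing, binary content
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("_rels/.rels", "<Relationships/>")
        z.writestr("xl/workbook.bin", b"\x83\x01\x00")
        z.writestr("xl/worksheets/sheet1.bin", b"\x81\x01\x00")
    print(classify_package_parts(buffer.getvalue()))
```

On a real Biff12 payload, you would expect the "bin" list to hold the workbook and worksheet parts, with only the packaging plumbing on the XML side.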

The conclusion is that there is actually no way for an OOXML consumer application to rely on the "standard" OOXML to interoperate with Excel 2010 (or Excel 2007). It's all back to binary formats, even though the Excel team, which registers proprietary formats in the clipboard anyway, had years to seize the opportunity to store real OOXML in there. What is a "standard" good for if it's not used by the one and only reference application out there? This is lousy engineering at its worst. Or, for that matter, a lock-in strategy.

The article could end here, but just to show how backwards just about everything Microsoft does is, let's take a look at how Excel interoperates with itself when it comes to actually performing a Ctrl-C / Ctrl-V.

Let's take our test spreadsheet again. Select the cell area and hit Ctrl-C. Now create a new workbook in the same Excel instance while leaving the other workbook open, then hit Ctrl-V. Everything is copied there, in high fidelity including graphics.

Now if you create the new workbook in a new Excel instance (so you have two running Excel instances), hitting Ctrl-V will not copy the graphics and will even copy the data incorrectly: for instance, the list of values {2,3,5} has been turned into {2-Jan-00, 3-Jan-00, 5-Jan-00}. Formulas are lost as well. This is utterly ridiculous.

Copying content across two Excel instances is inaccurate, incorrect and incomplete
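One plausible reading of {2,3,5} turning into {2-Jan-00, 3-Jan-00, 5-Jan-00} is that the pasted numbers were reinterpreted as date serial numbers in Excel's 1900 date system, where serial 1 is 1 January 1900. This is an inference, not something the test proves. The Python sketch below (function name hypothetical) shows that mapping, including Excel's well-known treatment of 1900 as a leap year:

```python
from datetime import date, timedelta


def excel_serial_to_date(serial: int) -> date:
    """Map an Excel 1900-date-system serial number to a calendar date.

    Excel counts 1 Jan 1900 as serial 1 and, for Lotus 1-2-3 compatibility,
    treats 1900 as a leap year: serial 60 is the nonexistent 29 Feb 1900,
    so serials from 61 onward are one day ahead of a plain day count.
    """
    if serial < 1:
        raise ValueError("serial numbers start at 1 (1 Jan 1900)")
    if serial == 60:
        raise ValueError("serial 60 is the fictitious 29 Feb 1900")
    if serial < 60:
        return date(1899, 12, 31) + timedelta(days=serial)
    return date(1899, 12, 30) + timedelta(days=serial)


if __name__ == "__main__":
    # The values from the test spreadsheet, read as date serials
    for value in (2, 3, 5):
        print(value, "->", excel_serial_to_date(value).strftime("%d-%b-%y"))
```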

At this point, we can conclude one of three things: either copying content accurately across Excel instances does not work at all (so a few hundred million clients out there are using a broken product), or Excel makes incorrect use of the content in the clipboard, or copy/paste only really works when the SAME Excel instance passes internal data structures and graphics structures between workbooks. The latter means the clipboard is just for second-class or third-class citizens, and copy/paste is simply unreliable unless you are doing it within Excel. Which is the very definition of a lock-in product. Microsoft has every right to sell lock-in products, but they should not lie to governments and the public out there: the OOXML standard is no evidence of any form of application-level interoperability. We have just proven it with a trivial scenario.

Just for the record, the BIFF12 file in the clipboard does not contain the graphics at all. So even what is stored in the clipboard gets second-class or third-class citizenship.

Comparing OpenOffice and Microsoft Office

Microsoft Office is just a disaster. Out of sheer curiosity, and out of fairness, it is interesting to see how OpenOffice compares. Remember that, out there in public, Microsoft apologists call OpenOffice a simplistic product that does not even support formulas.

Again, we'll be sticking with OpenOffice Calc, the counterpart of Microsoft Excel. Tests are done with OpenOffice 3.2, the latest major release of the product suite.

Comparing the application-level interoperability in OpenOffice 3.2

The content in question is a replica of what has been used so far. In OpenOffice, we select a cell area containing a few cells, formulas and graphics, and we hit Ctrl-C. Looking at the clipboard, we find the following formats:

  • DataObject (4 bytes)
  • Star Embed Source (XML) (10563 bytes)
  • Star Object Descriptor (XML) (77 bytes)
  • GDIMetaFile (28597 bytes)
  • CF_ENHMETAFILE (0 bytes)
  • CF_METAFILEPICT (16 bytes)
  • CF_DIB (632648 bytes)
  • Windows Bitmap (632662 bytes)
  • HTML (HyperText Markup Language) (10679 bytes)
  • HTML Format (10785 bytes)
  • CF_SYLK (164 bytes)
  • Link (36 bytes)
  • CF_DIF (1422 bytes)
  • CF_UNICODETEXT (472 bytes)
  • CF_TEXT (236 bytes)
  • Rich Text Format (5876 bytes)
  • Ole Private Data (568 bytes)
  • CF_LOCALE (4 bytes)
  • CF_OEMTEXT (236 bytes)
  • CF_BITMAP (0 bytes)

Of these, only a few can represent the content in full fidelity (judging by their size and name). The potential candidates are:

  • Star Embed Source (XML) (10563 bytes)
  • HTML (HyperText Markup Language) (10679 bytes)
  • HTML Format (10785 bytes)

HTML can be dismissed right away, since it is by design spaghetti markup meant for display purposes only. Graphics are replaced by bitmaps, so we lose their definitions.

In fact only one format remains: Star Embed Source. Chances are that this name comes from StarOffice, the product from which OpenOffice originated. And indeed, upon inspection, it is an .ODS file, that is, a full-fidelity copy of the content.
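Because the Star Embed Source (XML) payload is an ODS package, an ODF client can recognize it the same way it recognizes a file on disk: by the mimetype entry at the start of the ZIP. A minimal Python sketch, assuming the clipboard bytes have already been extracted; the function names are hypothetical:

```python
import io
import zipfile

ODS_MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet"


def odf_mimetype(data: bytes):
    """Return the media type declared by an ODF package, or None."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            names = package.namelist()
            # ODF packages store an uncompressed 'mimetype' entry first
            if not names or names[0] != "mimetype":
                return None
            return package.read("mimetype").decode("ascii", errors="replace").strip()
    except zipfile.BadZipFile:
        return None


def is_ods(data: bytes) -> bool:
    """True if the bytes look like an OpenDocument spreadsheet package."""
    return odf_mimetype(data) == ODS_MIMETYPE
```

Passing the saved clipboard bytes to `is_ods()` is then enough to decide whether the payload can be handed to an ODS reader as-is.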

If we grab the content of the clipboard, save it with an .ODS extension and open it in OpenOffice, it shows the following:

Grabbing the content of the clipboard : full fidelity .ODS file

It has everything you would expect, including formulas, formatting and the graphics themselves. So if you were an ODS client just reading the clipboard, you would be able to interoperate with OpenOffice. This is the very definition of application-level interoperability. And it is quite surreal that it works with OpenOffice, and not with Microsoft Office...

Now for the second test: if we copy the content and paste it into a new workbook in another instance of OpenOffice, does it remain in full fidelity?

Simply select the cell area again and hit Ctrl-C. Then go to the OpenOffice install folder (double-clicking the OpenOffice shortcut on the desktop will not start a new instance, so we have to work around that), open c:\Program files\OpenOffice 3\program and double-click scalc.exe. A new OpenOffice instance starts. Hit Ctrl-V, and you can see for yourself that everything is there: cells, formatting, formulas, graphics.

Simple tests like this leave me a bit speechless, given that Microsoft Office is supposed to be the Rolls-Royce of office suites in the world, the de facto standard. And in fact it's just crap. OpenOffice, the free suite, is actually a more serious product when it comes to application-level interoperability. This had to be said...

Friday, January 8, 2010

Shaving off standard XML for proprietary stuff

Stéphane Rodriguez, Jan 2010

Previous articles :
- Office 2010, "operation Barbarossa" edition
- Microsoft's latest aggression on ODF, codenamed "cast lead"
- Beating a dead horse
- Follow up on Microsoft latest bullshit announcement
- Microsoft latest bullshit : native support of ODF in Office 2007
- Custom XML? What Custom XML?
- Backwards compatible? One more lie by omission
- Bad surprise in Microsoft Office binary documents : interoperability remains impossible
- Typical B.S. in technical articles about OOXML
- The truth about Microsoft Office compatibility
- OOXML is defective by design

To get a feel for what happened when Microsoft shaved off from Word 2003 and 2007 the XML feature that, as we are being told, infringes one of I4I's patents, all you have to do is start Word 2003 and look at the help menus.

Unfold the XML section and, depending on whether you are online or not, you'll see something different. First start with the offline Word 2003 help, as shown below:

Word 2003's offline help uncovers the XML features

Then go online, only to see the entire XML section shaved off:

Word 2003's online help disclosing shaved off XML features

Ironically enough, all the XML features before the cut-off were regular, shall we say interoperable, features such as XML documents, XML schemas (standard XSD), document binding (standard XSLT), validation and so on. Pretty much your perfect XML toolkit.

It's now shaved off. And Microsoft has made it mandatory by pushing updates of their Office suite that affect both the present and previous Word releases.

(As a side note, only Word is affected because neither Excel nor PowerPoint ever implemented a standard XML toolkit. All their built-in or canada dry XML was already proprietary crap à la Microsoft.)

Since those features are shaved off, one would expect XML to be entirely gone from Word. Nope.

In fact XML is still implemented there, because the Word team (the driving force behind it being Mr Brian Jones, whose name appears in a gazillion patents related to the Office Open XML formats, the ISO file format bought by and for Microsoft) tried many times to come up with some kind of XML stuff. They never got there on the first try, but each feature remained in the product when they came up with another attempt. The second try remained there when they came up with the third, and so on. Long story short, there is XML crap all over the place in Word, left over from previous failed attempts.

At the time of writing they still have smart tags, smart documents, content controls. And more.

Now what is of interest is what Microsoft is touting as the alternative XML features made available to customers, which do not infringe I4I's patents, or so they tell us.

The stuff in question is called content controls. It used to be called Custom XML. See a previous article I wrote in which I have excerpts from a few Microsoft Office guys telling us what this thing is, including Brian Jones.

Of course Custom XML is now called "content controls" since Custom XML is too close to I4I's patents. It's obvious that it touches the patents and therefore should be removed from the product as well.

So what are content controls, you ask? The Office Word team tells us in a couple of articles they wrote. You can look here for the definition, and here for a few tutorials for developers. Those articles were written with much innocence back in 2006.

Microsoft's response to the infringing XML : more infringing XML...

Content controls are an extension of ActiveX controls, themselves based on the two-decade-old OLE technology. See the kind of waters we are treading here...

Content controls are, for instance, rich text controls, date controls and so on. They are 100% proprietary. Rendering, activating and binding a content control at run-time is Microsoft proprietary, undocumented and therefore subject to implementation restrictions. Again, for more, take a look at the patents by Brian Jones et al. on the subject here.

Running content controls requires a license of Word.

Let me repeat. Running content controls requires a license of Word.

To summarize, Microsoft shaved off interoperable and regular XML technologies (XSD schemas, XSLT transforms) in favor of a proprietary alternative, content controls.

Who really benefits? That is left as an exercise to the reader.

Tuesday, November 24, 2009

Office 2010, "operation Barbarossa" edition

Stéphane Rodriguez, November 2009

Previous articles :
- Microsoft's latest aggression on ODF, codenamed "cast lead"
- Beating a dead horse
- Follow up on Microsoft latest bullshit announcement
- Microsoft latest bullshit : native support of ODF in Office 2007
- Custom XML? What Custom XML?
- Backwards compatible? One more lie by omission
- Bad surprise in Microsoft Office binary documents : interoperability remains impossible
- Typical B.S. in technical articles about OOXML
- The truth about Microsoft Office compatibility
- OOXML is defective by design

With the public release of the latest crap from Redmond in the form of a beta preview of Office 2010, I thought it was a good time to revisit some blatant design and interoperability flaws in the file format, a file format which in and of itself is more a farce and a fraud, an embarrassment for anyone with a software engineering degree. Coming from Microsoft however, the digital nazis of our time, that was expected.

What is discussed here is not an exhaustive list of flaws which, if fixed, would make it a proper file format. There is no "just a few bad apples" thingy going on here. It is fatally flawed. This file format is defective by design. Furthermore it was designed and maintained by persons who should not be allowed to touch a keyboard ever again.

As we are going to see, even though no less than four years have passed (the beta of Office 2007 was made available in late 2005), the situation hasn't changed a bit. Microsoft has obviously concentrated their budgets elsewhere, thereby showing that their agenda was not to come up with an XML-based file format for Office files that would serve the general interest, but instead to rubber-stamp whatever piece of crap they had and then move on. As can be seen by anyone who gets their hands on it, XML is in fact a pretext. The very design of the file format makes it unsuitable in a standards world revolving around XML tools, mappings and data/application platform interoperability.

Let us not forget that OOXML was defined not as an XML format per se, but as a mapping meant to preserve a so-called binary legacy. A mapping whose absence from the ECMA/ISO papers made the ISO blitz a fraud. Microsoft has since gone even further by beginning a global ethnic cleansing policy, the introduction of a malware called MS-ODF, i.e. a Microsoft canada dry version of the ODF file format, whose intent is no secret: what can be the purpose of deliberately storing Microsoft's own proprietary formulas and other details in ODF files (where ODF formulas are obviously expected), and of limiting the support of ODF to a small set of features? Interestingly, while they had the resources to cover the planned genocide of existing ODF assets, they apparently did not have the resources to provide a Save As ISO 29500 option in Microsoft Office 2007 or Microsoft Office 2010, i.e. the very claim with which they began the war of aggression on civilians. This basic fact is now routinely hidden behind words. The Gauleiters at Microsoft claim to support transitional OOXML files, not strict OOXML files. The only problem is that "transitional" means just about anything they want, i.e. including undocumented stuff, breaking changes, whatever pleases them. While "strict" means the actual ISO papers they fought so hard for, to the point of bribing people left and right when the disastrous quality of the draft was pointed out by well-meaning parties. Microsoft does not intend to support what they fought for. Redmond-Nuremberg trials pending.

Anyway, here is a summary of flaws I came up with two years ago, and for each flaw I revisit whether its status has improved in Office 2010 beta.

1) Self-exploding spreadsheets
==> problem remains as is. More about it below.

2) Entered versus stored values
==> problem remains as is. Excel derailed the IT world two years ago with a floating-point flaw.

3) Optimization artefacts become a feature instead of an embarrassment
==> problem remains as is

4) VML isn't XML
==> problem remains as is

5) Open packaging parts minefield
==> problem remains as is

6) International, but US English first and foremost
==> problem remains as is. More to say about it below.

7) Many ways to get in trouble
==> problem remains as is

8) Windows dates
==> problem made even worse. See below.

9) All roads lead to Office 2007
==> problem remains as is

10) A world of ZIP+OLE files
==> problem remains as is

11) Document security is a (bad) joke
==> problem remains as is

12) BIFF is gone...not!
==> problem made even worse. Introduction of a new binary file format, BIFF14, with .XLS file extension to account for new features

13) Document backwards compatibility subject to neutrino radioactivity
==> problem made even worse. See below

14) ECMA 376 documents just do not exist
==> problem remains as is. Obviously ECMA 376 continues to be just vaporware. There is simply no "Save as ECMA 376" or "Save as strict ISO 29500" option available.

15) How the ISO OpenDocument format (ODF) compares?
==> problem made even worse. With the introduction of the MS-ODF file format, incompatible with the ISO ODF standard, the situation was intentionally made worse.
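On item 14: whether a given .xlsx file is Strict or Transitional ISO 29500 can be checked without Excel, because the two use different namespaces (Strict moved to purl.oclc.org URIs, Transitional kept the ECMA-376 schemas.openxmlformats.org ones). A minimal Python sketch that looks only at the SpreadsheetML root namespace of xl/workbook.xml; the function name is hypothetical, and a full check would also inspect the other parts and the relationship types:

```python
import xml.etree.ElementTree as ET

# SpreadsheetML main namespaces: ECMA-376 / ISO 29500 Transitional vs ISO 29500 Strict
TRANSITIONAL_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
STRICT_NS = "http://purl.oclc.org/ooxml/spreadsheetml/main"


def workbook_conformance(workbook_xml) -> str:
    """Classify an xl/workbook.xml part (str or bytes) by its root namespace."""
    root = ET.fromstring(workbook_xml)
    # ElementTree spells namespaced tags as '{namespace-uri}localname'
    namespace = root.tag[1:].partition("}")[0] if root.tag.startswith("{") else ""
    if namespace == STRICT_NS:
        return "strict"
    if namespace == TRANSITIONAL_NS:
        return "transitional"
    return "unknown"
```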

Now that we have a pretty good idea of how "good" (NOT!) Office 2010 is, let's add a few more flaws.

16) Self-exploding charts
17) International, but US English first and foremost
18) Worsening the issue with Windows dates
19) XML as bad as binary

16) Self-exploding charts

All it takes to get an idea of how much Microsoft pays attention to backwards compatibility is a counter-example. Here is a trivial .XLS spreadsheet including a chart, which opens fine in all Excel versions including Excel 2007, but explodes in Excel 2010 beta. This spreadsheet is generated by a third-party component. Obviously Excel 2010 beta introduces breaking changes in how it parses .XLS spreadsheets. In this case, the legend is destroyed and the plot area bounces off the boundaries.

A trivial .XLS spreadsheet including a chart, opened in Excel 2003

The same file, opened in Excel 2007 SP1

The same file, opened in Excel 2010 beta

17) International, but US English first and foremost

All it takes to get an idea of how much Microsoft respects the assets of their own customers is to use some of the functions exposed by Excel. Simply creating a spreadsheet with one language version of Excel, say English, and handing it to someone using a different language version, say French, results in a corrupt spreadsheet with no way to fix it.

Take the CELL() function. It's a helper function meant to extract content or information from an arbitrary cell. The problem is that, by design, the first parameter passed to this function is a string which is only evaluated correctly in the corresponding language version of Excel.

In a spreadsheet created using a French version of Excel, let's insert the following function in cell A1: CELLULE("adresse";B2:B5). The function returns $B$2. Now if you open this spreadsheet in an English version of Excel, cell A1 returns an error, #VALUE!, with a floating tooltip that says: "A value used in the formula is of the wrong data type".

Well, not only is the error message incorrect, but one immediately sees the ripple effect of poorly designed cell functions.

Guess what, this has not been fixed in OOXML.

As a side note, here is how Excel 2007 or Excel 2010 beta stores the mentioned function: CELL("adresse",B2:B5). This means Excel went through several legitimate conversion steps to make it US-English, but not all of them: "adresse" should have been converted to "address". It is amazing that this tool is used outside the USA. Any serious global compliance draft for platform interoperability would crush Excel as unsuitable for such a purpose.
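Since the stored form keeps the English function name but the localized argument, a file consumer can at least detect and patch the leftover string. A minimal Python sketch, not a complete solution: the translation table is a hypothetical placeholder holding only the single pair discussed here, precisely because the full per-locale list is not documented.

```python
import re

# Hypothetical and deliberately partial: only the French value discussed above.
# A real tool would have to build the full per-locale table itself.
CELL_INFO_TYPES_TO_ENGLISH = {"adresse": "address"}

# Captures: CELL(" prefix, the quoted info_type value, the closing quote
CELL_CALL = re.compile(r'(\bCELL\(\s*")([^"]*)(")', re.IGNORECASE)


def fix_cell_info_type(formula: str, table: dict = CELL_INFO_TYPES_TO_ENGLISH) -> str:
    """Replace localized CELL() info_type strings with their English spelling."""
    def replace(match):
        value = match.group(2)
        english = table.get(value.lower(), value)  # unknown values are left untouched
        return match.group(1) + english + match.group(3)

    return CELL_CALL.sub(replace, formula)


if __name__ == "__main__":
    print(fix_cell_info_type('CELL("adresse",B2:B5)'))
```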

Of course, a true fix would be to create a CELL.ADDRESS() function to address the matter, and to make sure the locale variants of Excel functions are listed in the public documentation.

You would think that at least all the locale flavors of the function and of the "address" string would be documented. They are not. You would also think that Excel's user interface warns the user against using such a function. It does not.

If you are interested in the (fake) ongoing debate between ODF and OOXML, you must have read Microsoft apologists claiming that ODF does not support formulas and that OpenOffice Calc is therefore just a simplistic spreadsheet tool, unlike Microsoft Excel. Not only is this a lie from a user point of view, but from a standards point of view, there is much left to be desired in the Excel OOXML documentation. Where is the map of function names across the 35+ Excel locales?

If that's not amateur work, it's intentionally defective.

18) Worsening the issue with Windows dates

Microsoft has recently made a huge return on investment in terms of public image (in technical circles) on the subject of Windows dates, you know, the fact that dates in Excel spreadsheets use a Windows encoding, flawed in multiple ways. Microsoft was quick to announce they took great care of the issue by implementing in Excel 2010 a truly standard date type, namely one that would respect ISO standards for once, i.e. ISO 8601.

Well, the implementation is far worse than one can imagine.

  • First, Excel 2010 does implement cell date types for spreadsheets, but dates are found in multiple other places in spreadsheet files as well. So this is not a converging implementation meant to avoid interoperability issues. Rather, it just gives more work to implementers.
  • Second, Excel 2010 does implement cell date types for spreadsheets, but it's off by default. It's an option, and it's unchecked by default. The mechanism is enforced only if the user checks the corresponding flag in the advanced Excel options. Expecting users to find and check a vague option hidden in the mass of Excel options is an insult to those who expected Microsoft to do something good at least once. This pretty much guarantees that spreadsheets with truly ISO 8601 dates will never exist out there. Clever Microsoft.
  • Third, Excel 2010 does implement cell date types in spreadsheet files without letting file implementers know, which means any application that is not ready to read and parse "d" types in cells will break. The implementation could not be further from implicit XML rules. Instead of adding a child element of date type to the cell, Microsoft Excel simply implants a date type right where it isn't expected according to the previous implementation (Excel 2007). Application breakage guaranteed.
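To make the third point concrete: a SpreadsheetML reader written against Excel 2007 output typically expects a date to arrive as a numeric serial in the <v> element of a cell; with the date type, the same element may instead hold an ISO 8601 string flagged by t="d". A simplified Python sketch of a reader that tolerates both (the function name is hypothetical; shared strings, booleans and number formats are not resolved):

```python
import xml.etree.ElementTree as ET
from datetime import datetime

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"m": MAIN_NS}


def read_cell(cell_xml):
    """Return a Python value for a SpreadsheetML <c> element (simplified sketch)."""
    cell = ET.fromstring(cell_xml)
    value = cell.find("m:v", NS)
    if value is None or value.text is None:
        return None
    cell_type = cell.get("t", "n")  # no t attribute means a number
    if cell_type == "d":
        # ISO 8601 date stored directly in the cell; strip a trailing 'Z'
        # because datetime.fromisoformat only accepts it from Python 3.11 on
        return datetime.fromisoformat(value.text.rstrip("Z"))
    if cell_type == "n":
        # A date here is still a serial number; only the cell style says it is a date
        return float(value.text)
    return value.text  # shared-string index, boolean, error, etc. left raw
```

A reader that only implements the last two branches is exactly the kind of Excel 2007-era consumer that breaks when the first one is needed.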

The level of hypocrisy from Microsoft is astounding. If you are not speechless already, you should be.

19) XML as bad as binary

Yet another flawed claim is that Microsoft pushes XML, the idea that their software is now more XML-native. This is simply not true. Take a spreadsheet case where changing an unrelated attribute in the XML part describing the workbook simply corrupts the charts described in other XML parts. That goes against the premise of XML, what makes XML better than a binary file format, namely that modifying an XML fragment somewhere should not, in principle, affect other XML fragments.

Here is the flaw:

A simple chart created with Excel 2010 beta

The same file in Excel 2010 beta after a minor modification in the workbook part

Notice how the plot area exploded off the boundaries, killing all the automatic positioning of the chart title and the legend.

Here is how to reproduce the flaw:

- start Excel 2010 beta
- create a spreadsheet, add a few values, and create a chart from those values
- save the file as Chart.xlsx
- close and quit Excel 2010 beta
- rename the file as Chart.xlsx.zip
- unzip it and edit the part called xl/workbook.xml
- in this part, replace lastEdited="5" with lastEdited="1", as in:

<fileVersion appName="xl" lastEdited="1" lowestEdited="5" rupBuild="9114"/>

- zip the file again
- double-click on it
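The unzip/edit/re-zip steps above can also be scripted, which makes the experiment easy to repeat on any file. A minimal Python sketch; the function name is hypothetical, and it assumes, as in the fileVersion element quoted above, that lastEdited occurs only once in xl/workbook.xml:

```python
import io
import re
import zipfile


def set_last_edited(xlsx_bytes: bytes, value: int) -> bytes:
    """Return a copy of an .xlsx package with fileVersion/@lastEdited rewritten."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "xl/workbook.xml":
                text = data.decode("utf-8")
                # Assumes the only lastEdited attribute is the one on <fileVersion>
                text = re.sub(r'lastEdited="\d+"', f'lastEdited="{value}"', text, count=1)
                data = text.encode("utf-8")
            # Reusing the ZipInfo keeps each entry's name, order and compression
            target.writestr(item, data)
    return output.getvalue()
```

For example, `Path("Chart-edited.xlsx").write_bytes(set_last_edited(Path("Chart.xlsx").read_bytes(), 1))` (with `pathlib.Path`) produces the modified file to double-click.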

A flaw like this annihilates the reason why an OOXML file was broken into "independent" pieces, which would otherwise have allowed template-based scenarios. It's not like it's impossible to do right. It's needlessly complicated, and reasons for the file to become corrupt after tiny modifications simply arise. It's no better than the problems we used to have with binary files.

Let's see what the official specification tells us about the lastEdited attribute. In section 3.2.13, page 1897:

lastEdited (Last Edited Version)

Specifies the version of the application that last saved the workbook. This attribute is application-dependent.
The possible values for this attribute are defined by the XML Schema string datatype.

So what do we learn? Nothing. Furthermore, "application-dependent" is Microsoft newspeak for "undocumented". How clever indeed. Good luck if you are an implementer.

Microsoft, knowing that much of the actual file format information is left for one to guess, has published a site that is supposed to document flaws and omissions in the documentation. Regarding the lastEdited attribute, the additional note lets us know that the integer should be between 0 and 32767. How helpful indeed...
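In practice, the range given in that note is all an implementer can enforce. A Python sketch of such a check (the function name is hypothetical), which also shows how little it says about what Excel does with each value:

```python
import xml.etree.ElementTree as ET

MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def read_last_edited(workbook_xml) -> int:
    """Read fileVersion/@lastEdited and apply the only documented constraint."""
    root = ET.fromstring(workbook_xml)
    file_version = root.find(f"{{{MAIN_NS}}}fileVersion")
    if file_version is None or file_version.get("lastEdited") is None:
        raise ValueError("no fileVersion/@lastEdited in this part")
    value = int(file_version.get("lastEdited"))  # the schema itself only says string
    # Range from the additional note; nothing says what each value means
    if not 0 <= value <= 32767:
        raise ValueError(f"lastEdited={value} is outside 0..32767")
    return value
```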

Sunday, May 17, 2009

Microsoft's latest aggression on ODF, codenamed "cast lead"

Stéphane Rodriguez, May 2009

Previous articles :
- Beating a dead horse
- Follow up on Microsoft latest bullshit announcement
- Microsoft latest bullshit : native support of ODF in Office 2007
- Custom XML? What Custom XML?
- Backwards compatible? One more lie by omission
- Bad surprise in Microsoft Office binary documents : interoperability remains impossible
- Typical B.S. in technical articles about OOXML
- The truth about Microsoft Office compatibility
- OOXML is defective by design

Once again they did it. Microsoft is telling the world that they are improving interoperability across existing office formats and applications thanks to their native support for the ODF file format, a leading office file format based on existing ISO standards. But it could not be further from the truth.

Microsoft are actually killing ODF, like the digital nazis that they are. Kissinger is proud of their spiritual sons.

What kind of white phosphorus are they using?

First they don't write to ODF but to a canada dry version that we shall call MS-ODF, a variant filled with countless exploding mines, thrown from the air like any coward would do. Namely they are implanting the proprietary Excel formula syntax right inside files expecting the ODF formula syntax as exposed by all the ODF compatible applications out there. Since formulas are used in many elements such as charts, conditional formattings and so on, it wrecks any serious spreadsheet.

Second, this canada dry version only barely tries to look like ODF, implementing a tiny subset of it (listed here for spreadsheets) and thereby making sure that it is a one-way trip only.

Third, and that is where their true spirit shows, no matter how proprietary MS-ODF is, they make sure to read back what they write, and that alone, not what ODF applications write, thus faking the whole round-trip thing with utter lack of respect for actual users facing various files in their daily lives. In other words, it tells users they'd better keep a Microsoft Office license and pay particular attention to only accepting MS-ODF files as correct, or face the wrath of arbitrarily corrupt files. Here Microsoft is building 8-meter tall walls and every single user becomes a Palestinian.
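In ODF files, the syntax of a cell formula is announced by a namespace prefix at the start of its table:formula attribute: of: for OpenFormula, while Office 2007 SP2 was reported to write Excel-syntax formulas under an msoxl: prefix (treat that prefix as an assumption based on those reports). A consumer can at least flag such cells before trusting them. A minimal Python sketch; the function name is hypothetical:

```python
def formula_dialect(table_formula: str) -> str:
    """Classify an ODF table:formula value by its leading namespace prefix.

    Assumes the conventional prefixes; a rigorous check would resolve the
    prefix to the namespace URI declared in the document.
    """
    prefix, separator, _ = table_formula.partition(":=")
    if not separator:
        return "no prefix"
    if prefix == "of":
        return "OpenFormula"
    if prefix == "msoxl":
        return "Excel syntax"
    return f"other ({prefix})"
```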

It would not be so bad if Microsoft, the digital nazis of our time, had called this new proprietary file format MS-ODF and given it a new file extension, .odf.microsoft, or something like that. But calling MS-ODF .odf is entirely intentional, and fully budgeted by the neo-cons of the 4th reich: what they really do is steal the .odf brand, right down to the shell integration. Which brings me to resistance.

How should the resistance fight back against the aggressor? First, by telling the peaceful world about the genocide being committed. Second, by distributing ODF applications that make better use of shell file associations. Microsoft will not tell you, but that is their Achilles' heel.
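On the shell association point: an ODF application does not have to trust the file extension alone. Below is a minimal Python sketch, assuming the ODF packaging rule that a package starts with an uncompressed "mimetype" entry naming the document type; the function name and the table of kinds are illustrative. Note that this only tells you that a package claims to be ODF, not which application wrote it or which formula dialect it carries.

```python
import zipfile
from typing import Optional

# MIME types ODF defines for the three main document kinds.
ODF_MIMETYPES = {
    "application/vnd.oasis.opendocument.text": "text (.odt)",
    "application/vnd.oasis.opendocument.spreadsheet": "spreadsheet (.ods)",
    "application/vnd.oasis.opendocument.presentation": "presentation (.odp)",
}


def odf_kind(path: str) -> Optional[str]:
    """Return the ODF document kind declared by the package, or None.

    Checks the ODF packaging convention: a "mimetype" entry that comes first
    in the ZIP archive and is stored without compression.
    """
    try:
        with zipfile.ZipFile(path) as zf:
            infos = zf.infolist()
            if not infos or infos[0].filename != "mimetype":
                return None
            if infos[0].compress_type != zipfile.ZIP_STORED:
                return None
            mimetype = zf.read(infos[0]).decode("ascii").strip()
    except (zipfile.BadZipFile, UnicodeDecodeError):
        return None
    return ODF_MIMETYPES.get(mimetype)
```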

Good luck!

Monday, July 7, 2008

Beating a dead horse

Stéphane Rodriguez, July 2008

Previous articles:
- Follow up on Microsoft latest bullshit announcement
- Microsoft latest bullshit : native support of ODF in Office 2007
- Custom XML? What Custom XML?
- Backwards compatible? One more lie by omission
- Bad surprise in Microsoft Office binary documents : interoperability remains impossible
- Typical B.S. in technical articles about OOXML
- The truth about Microsoft Office compatibility
- OOXML is defective by design

It's been 3 months since ISO pulled that April 1st gag, declaring OOXML a valid candidate for an "open standard" even though it's riddled with patents.

Microsoft made the situation even more ridiculous by making available, after April 1st, documents that are absolutely necessary in order to fully implement their file formats. Well, if those documents were not part of the ISO proposal in the first place, then what is the ISO proposal good for? Isn't an "open standard" meant to be implemented by more than one vendor?

Completely, utterly, shamelessly, ridiculous. Typical Microsoft.

Let's get on with the ridicule. Remember the days before April 1st? Not a day passed without some so-called independent company claiming support for OOXML in one way or another, and telling everyone how good it was. Well, since April 1st, it's as if not a single freaking person cares about it. Silence. How so? Wasn't it a fraud to begin with?

Let's take the binary migration project that Microsoft launched back in February. Here is a refresher:

The "Office Binary (doc, xls, ppt) Translator to Open XML" project is now live on sourceforge: http://b2xtranslator.sourceforge.net/

As you may remember, this was a request from a number of national bodies, and while Ecma TC45 believed it was outside of the scope of DIS 29500, they did talk with Microsoft and come to this agreement:

Nonetheless, Ecma International discussed this subject with Microsoft Corporation, the author of the Binary Formats. To make it even easier for third party conversion of Binary Format-to-DIS 29500, Microsoft agreed to:

* Initiate a Binary Format-to-ISO/IEC JTC 1 DIS 29500 Translator Project on the open source software development web site SourceForge (http://sourceforge.net/ ) in collaboration with independent software vendors. The Translator Project will create software tools, plus guidance, showing how a document written using the Binary Formats can be translated to DIS 29500. The Translator will be available under the open source Berkeley Software Distribution (BSD) license, and anyone can use the mapping, submit bugs and feedback, or contribute to the Project. The Translator Project will start on February 15, 2008.
* Make it even easier to get access to the Binary Formats documentation by posting it and making it available for a direct download on the Microsoft web site no later than February 15, 2008. The Binary Formats have been under a covenant not to sue and Microsoft will also make them available under its Open Specification Promise (see www.microsoft.com/interop/osp) by the time they are posted.

We will modify DIS 29500 to include an informative reference to the SourceForge project.

While the project is still in its infancy, you can see what the planned project roadmap is, as well as an early draft of a mapping table between the Word binary format (.doc) and the Open XML format (.docx).

How is this project going? See for yourself on this web page. The project is still a first-revision source code dump, and it's 4 months old. It's hard not to laugh.

Who believed Microsoft was serious when they started this project? Anyone worth their salt knows that a project like this involves an almost complete rewrite of both engines, and that it could take a decade. It's ridiculous to think that a company or independent developers would spend their lives essentially rewriting the Microsoft Office code base (the non-UI part). After all, isn't that essentially what was already done with OpenOffice? Why doesn't Microsoft instead pledge support for the OpenOffice suite by helping implement the undocumented stuff? Alternatively, why don't they open source their compatibility pack, the component that converts Office documents back and forth between the binary and Open XML formats?

It gets better.

Earlier this month, Microsoft released another 5000+ pages of documentation. This additional documentation is a direct acknowledgement that what I have been saying on this blog was spot on, which is that the documentation made available earlier was just a fraction of what is needed to implement a full runtime for Office documents. So Microsoft ends up conceding the point to a so-called anti-Microsoft person. How ironic. Well, not so ironic once you understand that the Microsoft bloggers were actually backing me and my products (diffopc+, xlsgen, ...) until I started becoming vocal and critical on the subject. In other words, they backed me only as long as I was saying positive things. Those are not technical people; those are stinking marketing people ready to bribe.

If we take a look at the new documentation, things get a little interesting. First of all, if you are not one of the five people on the planet who have been involved in hardcore BIFF/Word/PPT/MSO internals, you are just wasting your time here. This puzzle (it smells like a typical PM document) can only be understood by people in the trenches.

Then, Microsoft does the typical thing: they make the document almost impossible to use by sorting everything alphabetically instead of logically, by theme. This attitude amounts to releasing valuable information while making life miserable for anyone who tries to read it. Again, you cannot fully comprehend behavior like this without a good dose of cynicism.

Next, this documentation still lacks plenty of information. For instance, in [MS-XLS].pdf, page 364, we learn:
dwBuild (4 bytes): An unsigned integer that specifies the recalculation engine identifier of the recalculation engine that performed the last recalculation. If the value is less than the recalculation engine identifier associated with the application, the application will recalculate the results of all formulas on this workbook immediately after loading the file.

Apparently, it's up to each implementer to figure out the build numbers used by the Excel products shipped over the years. Getting those numbers wrong means the spreadsheet gets recalculated the next time it's opened which, needless to say, is something you want to avoid at all costs.
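In OOXML workbooks, the counterpart of dwBuild appears, as far as I can tell, to be the calcId attribute of the calcPr element in the workbook part. Here is a minimal Python sketch of the rule quoted above, assuming that mapping. The reading application's engine identifier is deliberately left as a parameter, since those numbers are precisely what the documentation does not give. The xl/workbook.xml path is the conventional location, not a guarantee (the real location is found through the package relationships).

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

SPREADSHEETML_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def stored_calc_id(workbook_xml: bytes) -> Optional[int]:
    """Return the calcId of <calcPr> in a workbook part, or None if absent."""
    root = ET.fromstring(workbook_xml)
    calc_pr = root.find("{%s}calcPr" % SPREADSHEETML_NS)
    if calc_pr is None:
        return None
    value = calc_pr.get("calcId")
    return int(value) if value is not None else None


def will_recalculate_on_load(workbook_xml: bytes, app_engine_id: int) -> bool:
    """Apply the quoted rule: recalculate if the stored id is lower.

    app_engine_id is a hypothetical input: the reading application's own
    engine identifier, which the documentation does not list.
    Assumption: a missing calcId is treated like an older engine.
    """
    stored = stored_calc_id(workbook_xml)
    return stored is None or stored < app_engine_id


def calc_id_from_xlsx(path: str) -> Optional[int]:
    """Read calcId from the conventional workbook part of an .xlsx file."""
    with zipfile.ZipFile(path) as zf:
        return stored_calc_id(zf.read("xl/workbook.xml"))
```

The point of the sketch is that the comparison itself is trivial; the whole difficulty lies in knowing which value to write, which is exactly what is missing.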

Above all, what I find most shameful is that earlier this year, while Microsoft was buying people's votes left and right to obtain the $$ISO$$ rubber stamp, they were also putting the finishing touches on Office 2009 (codenamed Office 14), and they never talked about it. Whatever they have already baked in won't be part of the ISO proposal, which makes the proposal as useful as an old sock. In doing so, they keep iterating through their fire and motion war strategy: come up with new stuff and have competitors spend their time catching up instead of concentrating on their own applications.

The ISO episode hasn't changed things a bit in this regard.

[Update, July 9]: I wrote this blog post on July 7. When I wrote it, the b2xtranslator project I mention was showing a 4-month-old developer branch with just a single commit in it. Guess what happened in a matter of two days! Monopoly employees probably urgently asked their contractors to push something new to the developer branch, to avoid ridicule. Just two days after I pointed out this blatant deception that Monopoly had created in the first place (a project that will never be completed anyway, given the amount of work needed and the fact that it makes little sense when the Office compatibility pack already does the job), the thing gets "fixed". Very interesting... It looks as if Monopoly is extremely sensitive on this subject while the OOXML appeals are being processed.

Tuesday, May 27, 2008

Follow up on Microsoft latest bullshit announcement

Stéphane Rodriguez, May 2008

Previous articles:
- Microsoft latest bullshit : native support of ODF in Office 2007
- Custom XML? What Custom XML?
- Backwards compatible? One more lie by omission
- Bad surprise in Microsoft Office binary documents : interoperability remains impossible
- Typical B.S. in technical articles about OOXML
- The truth about Microsoft Office compatibility
- OOXML is defective by design

A quick article to follow up on the latest Microsoft bullshit. There is a lot of floating rot in the air at the moment. The blogosphere is buzzing with wild rumours, and tech reporters are carefully echoing Microsoft's PR releases without the slightest skepticism. I want to take the time to explain what is really going on.

Microsoft has won. They wanted the ISO stamp. They got it. They needed it because governments (and the EU) now require such a thing for document formats.

OOXML? It’s just a rough spec, as Office file format specs have always been in the past. The share of undocumented stuff in that document is the same as in the past, and I don’t think it is even intentional. To understand why this is the case, you only have to consider the fact that Microsoft was not prepared to go to ISO when they were working on Office 2007. They got caught by surprise. Then, when government pressure increased, they rushed the spec through with some of the internal documents they had. But it is important to understand that those documents are only useful when you have the source code alongside them; otherwise they’re just a puzzle, and any implementer has to resort to reverse engineering to implement anything significant. I know this all too well, having played this game for many years.

Of course, the ISO process was a farce. With all the money they have, Microsoft went through the $$ECMA$$ rubber-stamp organization, and then ISO, an even more grotesque clique.

The delay in supporting OOXML? If you believe Microsoft is having a hard time supporting their own file formats, I think you did not get the memo. Most changes adopted at the February 2008 BRM (Ballot Resolution Meeting, the final step of the ISO process) were merely cosmetic. Those who blogged about the BRM made it very clear that Microsoft insisted that no change that would break Office 2007 documents would be accepted. Estimated time needed to implement the "changes": a few days. Remember also that the Office team at Microsoft is a group of a thousand people. In fact, those changes are more appropriately called "migration changes", since they are not "breaking changes".

Let's take two examples:
- ISO 8601 dates: ISO 8601 dates were already implemented in Excel 2003. You can easily check this by saving a spreadsheet as "XML spreadsheet (.xml)" or "XML data (.xml)" in Excel 2003 or Excel 2007. Microsoft changed their mind in Excel 2007, whose .xlsx files store dates as serial numbers instead.
- VML: Microsoft will simply put VML under a DrawingML namespace and call it part of DrawingML, even though implementers will continue to be burdened by a redundant graphics file format. In addition, the existing documents containing VML generated by Microsoft Office (MHTML, WordML 2003, ExcelML 2003, PowerpointML 2003, some substantial features in WordML 2007, some substantial features in ExcelML 2007, some substantial features in PowerpointML 2007) guarantee that VML has to be implemented by any serious competitor. Worse, VML has come in several variants over the years, and each one is another variant to implement. This has a name: fire and motion. And the goal is to protect Microsoft Office's bottom line.
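To make the ISO 8601 point concrete: an Excel 2007 .xlsx cell typically stores a date as a serial number plus a date number format, so any other application has to convert it back. Here is a minimal Python sketch of that conversion, assuming the default 1900 date system (the 1904 system and time-only values below 1 are not handled). It also has to reproduce Excel's fictitious 1900-02-29 (serial 60), a quirk kept for compatibility with Lotus 1-2-3.

```python
from datetime import datetime, timedelta


def excel_serial_to_iso8601(serial: float) -> str:
    """Convert an Excel serial date (1900 date system) to ISO 8601.

    Only serials >= 1 are handled; the 1904 date system is not supported.
    """
    if serial < 1:
        raise ValueError("this sketch only handles serials >= 1")
    if 60 <= serial < 61:
        # Excel treats 1900 as a leap year: serial 60 is 1900-02-29,
        # a day that never existed.
        raise ValueError("serial 60 is the fictitious 1900-02-29")
    # Serial 1 is 1900-01-01. Past the fake leap day, every serial is one
    # day "too high", which moving the epoch back by one day compensates.
    epoch = datetime(1899, 12, 31) if serial < 60 else datetime(1899, 12, 30)
    return (epoch + timedelta(days=serial)).isoformat(timespec="seconds")
```

For example, excel_serial_to_iso8601(39448) returns "2008-01-01T00:00:00", which is the kind of value the Excel 2003 XML formats could store directly.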

The native (and naive) support of ODF in Office 2007? That is a trick to please governments (and the EU) so that Microsoft comes across as a nice group of folks, not the digital Nazis that they are. The implementation of ODF will be horrible, and inter-application capabilities such as copy/paste will be nonexistent. And implementing ODF 1.1 when ODF 1.2 is already in the works is a move intended to stall ODF forever.

The future of OOXML? There are two answers. First, frankly, who freaking cares about the paper? This paper will ALWAYS be at odds with the actual Office implementation. We have a good example right now, and it has always been the case. Second, what about the actual file format? It will be reverse engineered by implementers whose only recourse is to catch up with all the undocumented stuff. Make no mistake, though: now it's about applications, not documents anymore.

Conclusion: if you are still in the OOXML conspiracy game, it's about time to move on, guys.

Thursday, May 22, 2008

Microsoft latest bullshit : native support of ODF in Office 2007

Stéphane Rodriguez, May 2008

Previous articles:
- Custom XML? What Custom XML?
- Backwards compatible? One more lie by omission
- Bad surprise in Microsoft Office binary documents : interoperability remains impossible
- Typical B.S. in technical articles about OOXML
- The truth about Microsoft Office compatibility
- OOXML is defective by design

I wanted to post a quick reaction to the latest Microsoft bullshit announcement, in which they reportedly plan to "add native support for ODF 1.1". The way they put it is very succinct, probably intentionally, and it opens the door to wild guesses.

First of all, Microsoft's Office licensing business is a huge monopoly, so big that it even surpasses Windows in sales. Any decline in Office licensing would be dramatic for Microsoft's future. With that alone, you know that any announcement from Microsoft that they are willing to interoperate with other people's software, namely applications, should be taken with a grain of salt.

Here is how, with the release of Office 2007, Microsoft intends to keep their monopoly in Office licensing:

Phase 1 - as long as there are not enough Office 2007 documents out there, make sure that customers understand that only Office 2007 can reliably migrate binary files to the new file formats. Hence the backwards compatibility claims, which are part of the OOXML ISO marketing diversion (ironically amplified by critics).

During this phase, which began in 2006 and should take at least 5 years (at least one major corporate upgrade cycle), Microsoft's bottom line is at risk. The strategy was therefore to retain exclusivity when it comes to migrating file formats, and to spend money making sure customers hear this message. This is exactly what the infuriating OOXML ECMA, and then OOXML ISO, process was about, not anything remotely related to an international standard meant to "improve interoperability across platforms".

Technically speaking, only Microsoft can reliably migrate binary files, since only they know the implementation required to do so. Have you noticed that Office 97 shipped 11 years ago, and we have yet to hear about a non-Microsoft application that fully interoperates with those files? Besides, the so-called interoperability documents made available back in February 2008 are a farce: they are a bare update of the old MSDN documentation, and everything that was undocumented back in the Office 97 days remains just as undocumented. Nothing was improved, as I explained in a previous article, but sure enough Microsoft exploited it to lure the sheep who did not actually read it. The real message: keep buying Office 2007 licenses, otherwise you will be hurt. Or, somewhat more verbosely: if you deploy non-Microsoft office software, soon enough you'll get document fidelity issues that will damage your business; we provide the only application that ensures full fidelity, so it's suicidal to use competing products. In other words, competitor exclusion.

Needless to say, this fire and motion strategy has been going on for two decades. Microsoft adds a number of features to their spaghetti codebase, announces them when the product ships, then competitors have to catch up (usually years of work) instead of concentrating on their own products' capabilities. The irony is that, with Office 2007 for instance, Microsoft was itself guilty of a lack of full fidelity: they replaced the chart engine with a new one that dropped a number of features of the old one (rendering and chart options).

Phase 2 - there is enough Office 2007 documents out there. Game over.

With that said, a few more words.

Regarding competition between file formats, it has to be understood that Microsoft's point is to ensure that, going forward, their in-memory representation of a file remains the one that encompasses every other file format. In other words, ODF will become a second-class citizen, and it is to be expected that this will be exploited to downgrade or significantly hamper the fidelity of ODF files on open/save. Classic Embrace, Extend, Extinguish. Likewise, since Office 2007 is not a native XML application (the internal representation is a bunch of binary structures, not an XML DOM), it will never be able to become a basis for ODF-based applications that would really take advantage of XML. In other words, XML itself is being deceptively exploited simply to preserve the monopoly.

Regarding scenarios that would be enabled by "adding native support for ODF 1.1", what about external references from an ODS file to an XLS/XLSX file? What about copy/pasting back and forth while preserving data, context and formatting? The list goes on and on.

"Native support for ODF 1.1" would also imply that the open source projects that were a big part of the marketing machinery were just that: marketing. Of course, corporate customers got the message from day one (to them, "open source project" means "unsupported"), so there is nothing really surprising here.

Last but not least, with ODF 1.2 and its dramatic improvements just around the corner, there has never been a more strategic time for an enemy of ODF to announce support for ODF...1.1. In case you think that sentence contradicts itself, consider that the massive distribution power Microsoft amassed over two decades of hard-pushing Microsoft Office onto everyone's desktop (reportedly 500+ million licenses) guarantees that, whatever version of ODF ships with Microsoft Office, it will be very hard for those actually contributing to ODF in the OpenOffice project to migrate the user base to the next version of ODF, say ODF 1.2. This is how you end up able to make any change you want to ODF, yet unable to get it into the hands of the user base. Yet another win for Microsoft.