Wednesday, February 27, 2008

Typical B.S. in technical articles about OOXML

Stéphane Rodriguez, February 2008

Previous articles :
The truth about Microsoft Office compatibility
OOXML is defective by design



Microsoft has posted an article on their community website related to OOXML. That technical article is supposed to compare OOXML's DrawingML markup for drawing objects and SVG, an open standard (W3C) developed by independent parties for drawing objects.

That article tries to appear neutral and fair, even though the author uses plenty of hyperbole to paint OOXML's DrawingML markup as the best thing since sliced bread and to demonize SVG as a simplistic markup. This point of view is strikingly similar to Burton Group's study, which painted OOXML as a wonderful file format and demonized ODF as a simplistic one. (We know Burton Group is a Microsoft client and business partner.)

I recommend reading the article in question before reading further.

To call it a mess is an understatement.

It is supposed to be a technical comparison, but the author never bothers to mention what kind of comparison he is trying to make. There are indeed several types of comparison :
- strict markup and semantics compatibility
- simple mapping to achieve markup and semantics compatibility
- blocking incompatibilities

Even in the case of blocking incompatibilities, since DrawingML and SVG are both based on XML, there cannot be core incompatibilities that prevent scenarios such as round-tripping. Indeed, the point of using XML is that you can always add markup from one language into another to carry whatever markup and semantics the second language lacks.
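To make the round-tripping argument concrete, here is a minimal sketch in Python. It assumes one possible convention (not taken from either specification's guidance on conversion): the exporter stores the original DrawingML element, in its own namespace, inside the SVG shape's metadata child, so that a later importer can get it back untouched. The helper names embed_foreign and extract_foreign are hypothetical, and the cyan fill is simply assumed to be the resolved rendering color.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
DML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
ET.register_namespace("", SVG_NS)
ET.register_namespace("a", DML_NS)


def embed_foreign(svg_shape, dml_fragment):
    """Hypothetical helper: store a DrawingML element under the shape's <metadata>."""
    metadata = ET.SubElement(svg_shape, f"{{{SVG_NS}}}metadata")
    metadata.append(dml_fragment)
    return svg_shape


def extract_foreign(svg_shape):
    """Hypothetical helper: return the DrawingML elements stored on the shape."""
    metadata = svg_shape.find(f"{{{SVG_NS}}}metadata")
    if metadata is None:
        return []
    return [el for el in metadata if el.tag.startswith(f"{{{DML_NS}}}")]


# SVG rectangle with an explicit fill (assumed to be the already-resolved color).
rect = ET.Element(
    f"{{{SVG_NS}}}rect", {"width": "10", "height": "10", "fill": "#00ffff"}
)

# Original DrawingML color definition: red with a complement transform.
srgb = ET.Element(f"{{{DML_NS}}}srgbClr", {"val": "FF0000"})
ET.SubElement(srgb, f"{{{DML_NS}}}comp")

# DrawingML ==> SVG: keep the source markup alongside the rendered shape.
embed_foreign(rect, srgb)
serialized = ET.tostring(rect, encoding="unicode")

# SVG ==> DrawingML: a later consumer recovers the original definition.
recovered = extract_foreign(ET.fromstring(serialized))
```

An SVG viewer that knows nothing about DrawingML can still draw the rectangle from its explicit fill, while a converter that does know DrawingML finds the original a:srgbClr and a:comp elements intact.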

That said, the introduction of the document incorrectly defines SVG, portraying it as "something for the web", and the conclusion uses this mischaracterization to show the author's affinity for DrawingML : "a richer and nicer fit for Office documents".

The simple fact that the author writes "Office documents" instead of "Microsoft Office documents" tells how little the author is willing to take into account the needs of general-purpose office documents, as opposed to a single vendor's Office document model.

What that document actually reflects is the author's own biased knowledge of the domain at hand.


1) The SVG definition : what it's about

According to the author, "Scalable Vector Graphics (SVG) is a graphics file format and Web development language based on XML. With SVG, user can describe the output (i.e. look and feel) of the content. SVG describes the common and extended feature set of today's graphical authoring environments, both tools and program. SVG is a common export format in today's graphical authoring environments. SVG is a language for describing two-dimensional graphics in XML. It allows three types of graphic objects: vector graphic shapes (e.g., paths consisting of straight lines and curves), images and text. With SVG, developers can create Web applications based on data-driven, interactive, and personalized graphics."

On the W3C website (http://www.w3.org/Graphics/SVG/About.html), however, SVG is defined as follows :

"SVG is a language for describing two-dimensional graphics and graphical applications in XML. SVG 1.1 is a W3C Recommendation and forms the core of the current SVG developments. SVG Tiny 1.2 is the specification currently being developed as the core of the SVG 1.2 language (comments welcome). The SVG Mobile Profiles: SVG Basic and SVG Tiny are targeted to resource-limited devices and are part of the 3GPP platform for third generation mobile phones. SVG Print is a set of guidelines to produce final-form documents in XML suitible for archiving and printing.

Furthermore,

SVG is a platform for two-dimensional graphics. It has two parts: an XML-based file format and a programming API for graphical applications. Key features include shapes, text and embedded raster graphics, with many different painting styles. It supports scripting through languages such as ECMAScript and has comprehensive support for animation.

SVG is used in many business areas including Web graphics, animation, user interfaces, graphics interchange, print and hardcopy output, mobile applications and high-quality design.

SVG is a royalty-free vendor-neutral open standard developed under the W3C Process. It has strong industry support; Authors of the SVG specification include Adobe, Agfa, Apple, Canon, Corel, Ericsson, HP, IBM, Kodak, Macromedia, Microsoft, Nokia, Sharp and Sun Microsystems. SVG viewers are deployed to over 100 million desktops, and there is a broad range of support in many authoring tools.

SVG builds upon many other successful standards such as XML (SVG graphics are text-based and thus easy to create), JPEG and PNG for image formats, DOM for scripting and interactivity, SMIL for animation and CSS for styling.

SVG is interoperable. The W3C release a test suite and implementation results to ensure conformance.
"


As the definition indicates, SVG can be seen as ongoing work: a core set of markup to which an arbitrary number of so-called profiles can be attached, for instance to target specific devices. For example :

SVG Basic
SVG Tiny
SVG Mobile
SVG Print
...

SVG Basic itself uses fundamental standards such as CSS and JavaScript to provide styling as well as programmability.

This definition provides ground for more profiles, such as a hypothetical "SVG chart", a profile that would define a charting package. In other words, SVG is fundamentally extensible to accommodate today's and tomorrow's needs.

Contrary to what the author claims, SVG is not specifically for the web.



2) DrawingML's definition.

In ECMA 376, Part 1

"DrawingML specifies the location and appearance of drawing elements in a package. For example, these elements could be, but are not limited to, shapes, pictures, and tables. The root element of a DrawingML XML fragment specifies the presence of a drawing at this location in the document.
A shape is a geometric object such as a circle, square, or rectangle; a picture is an image presented inside the document; and a table is a two-dimensional grid of cells organized into rows and columns. Cells and whole tables can have associated properties. A cell can contain text, for example. DrawingML also specifies the location and appearance of charts in a package. The root element of a chart part is chart, and specifies the appearance of the chart at this location in the document. In addition, DrawingML specifies package-wide appearance characteristics, such as the package's theme. The theme of a document specifies the color scheme, fonts, and effects, which can be referenced by parts of the document—such as text, drawings, charts, and diagrams—in order to create a consistent visual presentation.
A chart is a presentation of data in a graphical fashion, such as a pie chart, bar chart, line chart, in order to make trends and exceptions in the data more visually apparent.
"

This definition makes it very clear that, in contrast with SVG, DrawingML ties together hardcoded drawing object representations. No matter how well they suit a particular vendor's product, they entrench anyone who relies on DrawingML.

Just like elsewhere in ECMA 376, DrawingML is not based on any ISO standard. For instance, even though SVG can be tied to ISO-based ICC color profiles, used by industrial manufacturers, DrawingML colors are just the reflection of a vendor's own internal API.

No matter how "rich and nice" this internal API is, it cannot compare with markup that can be adapted to the profiles of arbitrary independent parties.

Since SVG is being built by a number of independent parties, whereas DrawingML is built by a single vendor, comparing SVG to DrawingML only shines a light on DrawingML's proprietary development. It actually works against portraying DrawingML as a good thing to have.


3) Comparisons

3.1 Given SVG's extensible nature, there is no question whether SVG can support all of DrawingML's elements. It can. It is a direct consequence of XML-based markup.

Now, that would be for markup compatibility.

3.2 Another type of comparison is simple mapping to achieve markup and semantics compatibility. Here, the author would want to know whether there is a way to map any DrawingML fragment to an *EXISTING* SVG fragment. But even that breaks down into a few cases : which way, and does it work both ways?
- which way : if we have a specific markup element in DrawingML that we want to represent in existing SVG markup, the task is bound to fail when we try to map intricacies of DrawingML rather than general-purpose drawing markup. For instance, in section 3.1.2.2 of that document, the author tries to match DrawingML's a:comp markup with an equivalent in existing SVG markup, and fails to find one. Well, there are good reasons for that. First of all, a:comp puts a severe burden on implementers, who have to compute and maintain color stacks (to resolve a:comp into an actual color when rendering that element), which is probably not desirable on devices where SVG is supposed to be thin and fast. For instance, if SVG is used for printing, the printer driver handling the equivalent of a:comp would have to compute and maintain color stacks. It is not clear that doing so (an implementation tax) is in the interest of printer manufacturers, or of manufacturers in general. A preferable approach is to use SVG profiles, which leave room for selecting explicit colors from alternate lists (most notably CSS-based). This allows cheaper device support for SVG-based documents.

While a:comp is just one example, the entire document proceeds with the same logic.

- the other way : since DrawingML is hardcoded, an arbitrary number of SVG fragments cannot be represented in DrawingML with markup and semantics compatibility. This problem is directly tied to the fact that 1) DrawingML is not based on SVG, it defines its own world, and 2) DrawingML lacks a number of SVG concepts and therefore cannot directly map markup and semantics. And there is a further problem: today's Office 2007 runtime does not allow ANY FOREIGN XML ELEMENT in any of the core ECMA 376 languages, including DrawingML. In other words, if you take a DrawingML fragment and add some markup to tag it with information that defines an SVG concept taken from an SVG fragment, Office 2007 treats the result as a corrupt document.

- both ways : this is the two cases above combined.
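The a:comp case discussed above can be illustrated with a short Python sketch of the "explicit colors" approach: the exporter resolves the transform once, so the device rendering the SVG never has to. It rests on a loud assumption: the complement is taken to be the channel-wise inverse of the RGB value (so that red maps to cyan), which may not match what a given Office implementation actually computes. The function names complement and resolve_color are hypothetical.

```python
import xml.etree.ElementTree as ET

DML_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"


def complement(hex_rgb):
    """Channel-wise inverse of an RRGGBB hex color (assumed meaning of a:comp)."""
    value = int(hex_rgb, 16)
    return f"{0xFFFFFF - value:06X}"


def resolve_color(srgb_clr):
    """Apply any a:comp children to the base color and return an SVG fill value."""
    color = srgb_clr.get("val")
    for transform in srgb_clr:
        if transform.tag == f"{{{DML_NS}}}comp":
            color = complement(color)
    return f"#{color}"


# DrawingML color: red with a complement transform.
fragment = ET.fromstring(
    '<a:srgbClr xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" '
    'val="FF0000"><a:comp/></a:srgbClr>'
)

# Explicit color that can be written straight into an SVG fill attribute.
svg_fill = resolve_color(fragment)
```

With the color resolved at export time, a thin SVG renderer such as a printer driver only ever sees a plain fill value and has no transform to evaluate.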


3.3 The last case of comparison is strict markup and semantics compatibility. This case is automatically bound to fail since DrawingML was developed by a single vendor and is not based on other standards.



4) Conclusion

The article does not demonstrate anything other than :

- there is no direct mapping between DrawingML and SVG, which has the unfortunate consequence of pointing out the single-vendor, proprietary design that went into DrawingML. Had the vendor designed DrawingML on top of SVG, the situation would be drastically different.

- the author's lack of understanding of the fundamental reason why people use XML markup : in essence, you can add all the missing markup from one language into another to satisfy scenarios such as round-tripping (DrawingML ==> SVG ==> DrawingML) without loss.



Some additional background information. The author of that document is Sheela E N, an employee of Sonata Software, an Indian consulting company. Sonata Software has made a name for itself since 2006 by contracting with... Microsoft to help test a Microsoft-sponsored open source project, ODF-OOXML, under the guidance of French-based CleverAge, and a number of other consulting companies. Those ties make it a biased (directly paid) source when it comes to discussing Microsoft file formats.