darcusblog » 2005 » November - geek tools and the scholar

Archive for November, 2005

XML Office Taste Test?

Posted in Uncategorized on November 29th, 2005 by darcusb – Comments Off

The Groklaw article seems to have sparked, or at least coincided with, some debate about the technical details of ODF vs. MS XML. One example of that debate is here. As I said in a comment there, I contributed to that article because I was tired of the “OpenDocument is a nice little format, but not good enough for our needs” argument I keep hearing out of Redmond. I believe it is perfectly possible to argue not only that ODF is equal to MS’s format, but that it is superior.

I suppose in considering technical merit you have to have some benchmarks. Top among mine is how easy each format is to transform using standard XML tools like XSLT.

Here might be a good test:

Choose ten random programmers who claim to have XSLT skills. Confirm that at least some of them would consider their skills to be modest; XSLT beginners if you will.

Now, give them 48 hours to write two stylesheets for each format: one that converts from the format to XHTML, and another that converts from that XHTML back to the format. The document in question would be non-trivial, including a variety of different paragraph types, inline styling, footnotes, sections, and images.

Now, compare the quality of results.

My contention is that ODF would win this challenge by a significant margin. Put simply, you will end up with more consistent and better results.
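To make the round-trip idea concrete, here is a minimal sketch. It is written in Python rather than XSLT (a deliberate swap, purely for brevity), and it assumes a drastically simplified document: a hypothetical wrapper element holding only ODF `text:h` and `text:p` children, with no inline styling, footnotes, sections, or images. The function names are illustrative and not part of any real tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by ODF text elements.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Simplified input: a real ODF content.xml has far more structure.
SAMPLE = f"""<body xmlns:text="{TEXT_NS}">
  <text:h text:outline-level="1">Title</text:h>
  <text:p>First paragraph.</text:p>
</body>"""


def odf_to_xhtml(root):
    """Map text:h to h1..h6 and text:p to p; skip anything else."""
    html_body = ET.Element("body")
    for child in root:
        if child.tag == f"{{{TEXT_NS}}}h":
            level = child.get(f"{{{TEXT_NS}}}outline-level", "1")
            out = ET.SubElement(html_body, f"h{level}")
        elif child.tag == f"{{{TEXT_NS}}}p":
            out = ET.SubElement(html_body, "p")
        else:
            continue
        out.text = "".join(child.itertext())
    return html_body


def xhtml_to_odf(html_body):
    """Reverse mapping: h1..h6 back to text:h, p back to text:p."""
    root = ET.Element("body")
    for child in html_body:
        if child.tag[0] == "h" and child.tag[1:].isdigit():
            out = ET.SubElement(root, f"{{{TEXT_NS}}}h")
            out.set(f"{{{TEXT_NS}}}outline-level", child.tag[1:])
        elif child.tag == "p":
            out = ET.SubElement(root, f"{{{TEXT_NS}}}p")
        else:
            continue
        out.text = child.text
    return root


def shape(root):
    """Comparable summary of a tree's direct children."""
    return [(c.tag, dict(c.attrib), c.text) for c in root]


original = ET.fromstring(SAMPLE)
xhtml = odf_to_xhtml(original)
round_tripped = xhtml_to_odf(xhtml)
print(shape(round_tripped) == shape(original))
```

The comparison at the end is the interesting part: judging "quality of results" in the proposed test amounts to checking how much of the original structure survives the trip there and back.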

BTW, Dorothea Salo makes a good point that we failed to address: the issue of overlapping tags and well-formedness. I am not sure how big a practical problem it is with Word’s XML.

Anatomy of XML Office Formats

Posted in Uncategorized on November 26th, 2005 by darcusb – Comments Off

I contributed to an article over on Groklaw that compares OpenDocument and Microsoft’s Office 12 XML formats. It might be good to read that alongside a presentation from a while back by Daniel Vogelheim (who then worked for Sun, and is responsible for much of OpenDocument) [pdf].

The take-home point is that OpenDocument is a really well-designed XML format technically, and that the choices its designers made empower users and third-party developers. By contrast, I argue, Microsoft’s design choices focus on ease of use for their internal developers, and result in an often obtuse, difficult-to-work-with format for the rest of us.

While openness does not by definition yield better technical solutions, it often does; certainly in the case of the grand confrontation over XML office formats. If you read the OASIS charter for the OpenDocument Technical Committee, you will find that reuse of existing standards and ease of transformation are central to the very design of the format, and the mission of the group overseeing its continued evolution. Such is not the case for Microsoft’s formats.

New MS XML Strategy?

Posted in Uncategorized on November 22nd, 2005 by darcusb – Comments Off

So a while back I wrote, regarding Microsoft’s XML file formats:

As for Microsoft, they have two choices: open their file formats (e.g. submit them to a standards body, and remove any legal restrictions) to stem the tide toward OD, or add the ability to read and write OD to its suite.

Now MS appears to have chosen something vaguely like the first option, agreeing to submit the formats to ECMA and then ISO, and modifying their license.

Certainly this looks like a kind of progress (though their choosing to support OD would be better still), but the devil will be in the details. First, I seriously doubt the standards process here will allow any real outside input. This seems quite likely to be more of a ratification process than anything.

I wrote a small namespaced citation schema that has been approved for inclusion in OpenDocument. There’s no technical reason (legal may be another thing; see below) it couldn’t be included in Microsoft’s file formats too. If this were an open process, that could well be a positive outcome from my standpoint, certainly for interoperability’s sake. Likewise, I have figured out a way to transplant the logic from my citation style language into OpenDocument. Why not Office XML?

My point above is not necessarily a specific proposal, so much as an attempt to give a sense of what most of us mean when we talk about openness.

Second, the major issue for me beyond this has always been the licensing details. Here’s Microsoft’s Brian Jones:

In addition to this move towards standardization, we are also going to make some changes to our licensing approach. I’ve definitely heard the concern from folks over the past few months around the licenses. We want to make this issue much simpler as well as address the core concern, which was that some folks thought we might somehow sue people for using the formats. Obviously we don’t want anyone to have that concern, so in order to clear up any other uncertainties related to how and where you can use our formats, we are moving away from our royalty free license, and instead we are going to provide a very simple and general statement that we make an irrevocable commitment not to sue. I’m not a lawyer, but from what I can see, this “covenant not to sue” looks like it should clear the way for GPL development which was a concern for some folks.

Forgive me for harboring some well-deserved skepticism, Brian, but until the license has been vetted by outside lawyers, I’ll reserve judgment. The criterion is really simple: can the MS file formats be implemented by anyone without restriction? All else is just marketing magic.

update: Andy Updegrove has a detailed comparison of the new MS no-sue covenant with the one Sun offered for OpenDocument. His conclusion? That MS has not gone nearly far enough in addressing the concerns that open standards advocates have about the legal encumbrances MS has attached to the schemas, and that it falls significantly short of where Sun stands with OpenDocument. Much of the smart commentary from today has come to a similar conclusion: that MS has tried to do as little as possible while trying to use this as a marketing opportunity to convince people (particularly the press) that it has done much more.

Dare on XML Schema

Posted in Uncategorized on November 20th, 2005 by darcusb – Comments Off

From Dare Obasanjo, by way of Danny Ayers:

After working with XSD for about three years, I came to the conclusion that XSD has held back the proliferation and advancement of XML technologies by about two or three years. The lack of adoption of web services technologies like SOAP and WSDL on the world wide web is primarily due to the complexity of XSD. The fact that XQuery has spent over 5 years in standards committees and has evolved to become a technology too complex for the average XML developer is also primarily the fault of XSD. This is because XSD is extremely complex and yet is rather inflexible with minimal functionality.


I wish more people would stand up and point out that the emperor has no clothes. There is still intense pressure to choose XSD over superior standards like RELAX NG. And even when groups make the decision to go for the latter (as with, say, OpenDocument), they often fail to exploit really useful RELAX NG features like interleave or attribute-based validation because those features are unsupported in XSD. These are not features that are useful only in theory, mind you, but are deeply practical in many circumstances (say RDF metadata?).
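For readers who have not met interleave, the gist is that child content may appear in any order while still being validated. The sketch below is a rough illustration in Python, not RELAX NG itself, and it is simplified to the case where each required child element appears exactly once; real interleave applies to arbitrary patterns. The element names and function names are made up for the example.

```python
def matches_sequence(children, required):
    """Ordered content model: children must appear in exactly this order."""
    return list(children) == list(required)


def matches_interleave(children, required):
    """Interleave-style model: same children, each once, in any order."""
    return sorted(children) == sorted(required)


# Hypothetical metadata record whose fields arrive in arbitrary order,
# as is common when the XML is serialized from RDF.
required = ["title", "creator", "date"]
record = ["creator", "title", "date"]

print(matches_sequence(record, required))
print(matches_interleave(record, required))
```

The shuffled record fails the ordered check but passes the interleave-style one, which is why the feature matters for metadata whose serialization order carries no meaning.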

In CSL, I made no such compromises. My thinking has always been that if XSD really has any future, then at some point it will catch up to having the RELAX NG features I care about, and at that point I can create an XSD version.

Looking for a Few Good C++ Coders

Posted in Uncategorized on November 19th, 2005 by darcusb – 1 Comment

A Sun developer has offered to work with one or two developers on prototyping for the OpenOffice Bibliographic Project. This help will mostly be in the form of technical advice and such, but this is a great offer as OOo is known for being rather big and complex for newcomers to get into.

The problem: we need someone with strong C++ skills, which seem to be rather hard to find, at least in comparison to Python or Java. If you’re interested in contributing, please join the project and post a note to the dev list. Likewise, if you know someone who might have the skill and interest, point ‘em our way. It could be a great student project, in fact.

Leigh Dodds on SPARQL

Posted in Uncategorized on November 16th, 2005 by darcusb – Comments Off

Leigh Dodds has been quiet lately, and has finally come up for air with an impressive collection of SPARQL-related work. He describes the work here. Nice job, Leigh!

I’d like to pull out one quote from his article:

A key aspect of the Web 2.0 idea is the ability to extract and query information held across many different ad hoc, third party apps, services, or repositories. That ability to move in and among various data sources is key to the Web 2.0 idea of the mashup — take a little Google Maps, salt with some eBay, and sprinkle with a heaping hunk of Flickr, right?

SPARQL, which is both a query language and a data access protocol, has the ability to become a key component in Web 2.0 applications: as a standard backed by a flexible data model, it can provide a common query mechanism for all Web 2.0 applications.

I can’t help but wonder if SPARQL might not pick up traction in the space currently occupied by library-based standards like SRU.


Posted in Uncategorized on November 15th, 2005 by darcusb – 1 Comment

Question for the day: what constitutes an open standards process?

Evidence for deliberation, from the OASIS ODF listserv:

I’ll leave out any editorial comment.

Mapping Existing ODF Metadata to RDF

Posted in Uncategorized on November 14th, 2005 by darcusb – Comments Off

I created a new wiki page, complete with a file upload of the latest proposal from Sun engineers Lars Opperman and Florian Reuter for mapping existing ODF metadata to RDF. If you have time (or know others who might have feedback), please point ‘em here for comment:

This process, BTW, is an experiment of sorts to off-load some of the technical work from the OASIS TC to more open venues. Please help make this a success!
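As a rough idea of what such a mapping can look like, here is a minimal sketch in Python. It is not the Sun proposal, whose details are not reproduced here; it only assumes that the Dublin Core elements already present in an ODF `meta.xml` (such as `dc:title` and `dc:creator`) can be read off as RDF statements about the document. The document URI and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
DC = "http://purl.org/dc/elements/1.1/"

# Tiny stand-in for the meta.xml found inside an ODF package.
META_XML = f"""<office:document-meta xmlns:office="{OFFICE}" xmlns:dc="{DC}">
  <office:meta>
    <dc:title>XML Office Taste Test</dc:title>
    <dc:creator>darcusb</dc:creator>
  </office:meta>
</office:document-meta>"""


def meta_to_triples(meta_xml, doc_uri):
    """Turn each Dublin Core child of office:meta into a (s, p, o) triple."""
    root = ET.fromstring(meta_xml)
    meta = root.find(f"{{{OFFICE}}}meta")
    triples = []
    for el in meta:
        if el.tag.startswith(f"{{{DC}}}") and el.text:
            local_name = el.tag.split("}", 1)[1]
            triples.append((doc_uri, DC + local_name, el.text.strip()))
    return triples


def to_ntriples(triples):
    """Serialize triples with plain-literal objects as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        literal = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{literal}" .')
    return "\n".join(lines)


# The document URI is an assumption made for the example.
triples = meta_to_triples(META_XML, "http://example.org/doc.odt")
print(to_ntriples(triples))
```

Elements in the ODF `meta:` namespace are deliberately ignored here, since choosing RDF properties for them is exactly the kind of question the proposal is meant to settle.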

OpenDocument Metadata Proposal

Posted in Uncategorized on November 13th, 2005 by darcusb – Comments Off

The OpenDocument Fellowship has offered to host a mailing list and wiki to develop a proposal for enhanced metadata support in OpenDocument. This would basically formalize the ideas I’ve been discussing here. Suffice it to say I’ll need help on some of the technical details, particularly from those knowledgeable about the intersections of XML, RDF and metadata.

Custom Data

Posted in Uncategorized on November 8th, 2005 by darcusb – Comments Off

Brian Jones from MS blogs about the ability to embed custom XML files in their forthcoming new proprietary-ODF-clone file formats:

In Office 12, we’ve introduced a new feature to the formats that we’re currently calling the XML data store, and the way it works is really simple. As you should all know by now, the new format consists of a ZIP file with a bunch of XML parts (files) inside. Up until now we’ve talked about all the parts that we in Office have defined to create our documents. You as a developer also have the ability to add your own parts though. You can take any XML file and put it inside the ZIP package. Then all you need to do is create a relationship from the main document part to your XML part, and the Office applications will roundtrip your XML with the file …

Seems sensible, and just what we at the OpenOffice bibliographic project want to do with ODF and OOo. One area where I think ODF can trump MS here is with the current discussion around using RDF for metadata, providing a consistent model for custom metadata content.
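The packaging half of that idea, dropping an extra XML file into a ZIP-based document, is easy to sketch. The Python below is a minimal illustration under stated assumptions: the toy package, the part name `custom/citations.xml`, and the function names are all hypothetical, and the format-specific step of registering the new part (the relationship Jones mentions for Office, or a manifest entry in ODF) is not shown.

```python
import io
import zipfile


def make_toy_package():
    """Build a stand-in ZIP package with a single XML part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("document.xml", "<doc/>")
    return buf.getvalue()


def add_custom_part(package_bytes, part_name, xml_text):
    """Return a copy of the package with one extra XML part appended."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as src:
        with zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
            # Copy the existing parts unchanged, then add the new one.
            for item in src.infolist():
                dst.writestr(item, src.read(item.filename))
            dst.writestr(part_name, xml_text)
    return out_buf.getvalue()


updated = add_custom_part(make_toy_package(), "custom/citations.xml", "<refs/>")
with zipfile.ZipFile(io.BytesIO(updated)) as package:
    print(package.namelist())
```

Whether an application preserves such a part on save is up to the application, which is where a shared model for custom metadata would help.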