darcusblog » 2009 » May - geek tools and the scholar

Archive for May, 2009

Google Wave Free Association

Posted in Teaching, Technology on May 29th, 2009 by darcusb – Comments Off

So Google’s announcement of Wave seems like a big deal. Rather than the typical deep and thoughtful post, which I’m sure others have done, some random thoughts:

  • from what I can tell, unlike much of Google’s current application infrastructure (GMail and Docs), the code will be open source; this would be really big
  • also unlike the current apps, Wave is distributed; this is also a really big deal
  • the collaborative document-editing seems wicked cool, and goes a fair bit beyond Docs
  • I absolutely cringe when I hear the phrase rich text in 2009, particularly when Google has so failed to get the basics of structured documents right in Docs
  • OTOH, HTML 5 provides some room for them to improve this if they put their mind to it
  • would really love if the extension mechanism was rich enough to allow integration of citations (say a Zotero extension; though perhaps something more distributed), and flexible enough to do it right (which by definition means not based on bibtex)
  • This has potentially big, though as yet unclear, implications for higher education, and for the sort of work that happens these days in LMSs.

On the Inclusion of BibTeX in HTML5

Posted in Technology on May 20th, 2009 by darcusb – 9 Comments

As part of the HTML5 effort, editor Ian Hickson has proposed a new way to encode structured data in HTML. Ian has since included within the proposal encodings of various widely used standards to describe events, contacts and citations. These vocabularies have normative status within the proposed spec, and have a privileged place within the DOM.

On the last use case, he has chosen BibTeX, on the basis that it is widely used and simple to author and process. Ian and I have chatted about this via email. To summarize my thoughts, then, I would like to argue against the inclusion of BibTeX based on the following points:

  1. BibTeX is designed for the sciences, which typically cite only secondary academic literature. It is thus neither adequate for, nor widely used in, many fields outside of the sciences: the humanities and law being quite obvious examples. For this reason, BibTeX cannot by default adequately represent even the use cases Ian has identified. For example, there are many citations on Wikipedia that can only be represented using effectively useless types such as “misc” and which require new properties to be invented.
  2. Related, BibTeX cannot represent much of the data in widely used bibliographic applications such as Endnote, RefWorks and Zotero except in very general ways.
  3. The BibTeX extensibility model puts a rather large burden on inventing new properties to accommodate data not in the core model. For example, the core model has no way to represent a DOI identifier (this is no surprise, as BibTeX was created before DOIs existed). As a consequence, people have gradually added this to their BibTeX records and styles in a more ad hoc way. This ad hoc approach to extensibility has one of two consequences: either the vocabulary terms are understood as completely uncontrolled strings, or one needs to standardize them. If we assume the first case, we introduce potential interoperability problems. If we assume the second, we have an organizational and process problem: that the WHATWG and/or the W3C—neither of which have expertise in this domain—become the gate-keepers for such extensions. In either case, we have a rather brittle and anachronistic approach to extension.
  4. The BibTeX model conflicts with Dublin Core and with vCard, both of which are quite sensibly used elsewhere in the microdata spec to encode information related to the document proper. There seems little justification in having two different ways to represent a document depending on whether it is THIS document or THAT document.
  5. Aspects of BibTeX’s core model are ambiguous/confusing. For example, what number does “number” refer to? Is it a document number, or an issue number? [note: it's actually both, depending on context; in a report it's the former, while in an article it's the latter]
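
To make the last point concrete, here is a minimal sketch (in Python, purely for illustration) of what a consumer has to do with BibTeX's overloaded "number" field. The function name and the output keys `issue` and `report_number` are my own inventions for this example, not part of BibTeX or of any proposal; the point is only that the meaning of the field cannot be read without also consulting the entry type.

```python
def interpret_number(entry_type: str, number: str) -> dict:
    """Re-key BibTeX's overloaded "number" field by entry type.

    Hypothetical helper: the output keys are illustrative assumptions.
    """
    if entry_type == "article":
        # In an article, "number" is the issue number of the periodical.
        return {"issue": number}
    if entry_type == "techreport":
        # In a report, "number" identifies the document itself.
        return {"report_number": number}
    # For any other type this sketch makes no claim, so the field
    # is passed through under its original, ambiguous name.
    return {"number": number}


print(interpret_number("article", "3"))
print(interpret_number("techreport", "TR-42"))
```

Every tool that reads the data needs a lookup like this one, which is exactly the sort of hidden, type-dependent semantics a cleaner model would avoid.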

My suggestion instead?

  1. reuse Dublin Core and vCard for the generic data: titles, creators/contributors, publisher, dates, part/version relations, etc., and only add those properties (volume, issue, pages, editors, etc.) that they omit
  2. typing should NOT be handled by a bibtex-type property, but the same way everything else is typed in the microdata proposal: with a global identifier
  3. make it possible for people to interweave other, richer, vocabularies such as bibo within such item descriptions. In other words, extension properties should be URIs.
  4. define the mapping to RDF of such an “item” description; can we say, for example, that it constitutes a dct:references link from the document to the described source?

The result would be something more consistent, general and extensible, while also still being easy to author and process. From a DOM perspective, we’re just talking about things like ref1.type returning a URI rather than doing ref1.bibtex-type that returns a string, and accessing a periodical title like ref1.isPartOf.title rather than ref1.journal (which of course doesn’t work for newspapers, or magazines, or court reporters, or weblogs, all of which have the exact same characteristics: they’re publications of sorts).
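
A rough sketch of the shape I have in mind, written in Python rather than against any real DOM API. The `Item` class, the `container_title` helper and the example titles are all hypothetical, and `is_part_of` is simply a Python-style spelling of isPartOf; the bibo type URIs are there only to illustrate typing by global identifier, not as a proposal for which vocabulary to bless.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative namespace for type URIs (the bibo vocabulary).
BIBO = "http://purl.org/ontology/bibo/"


@dataclass
class Item:
    """Hypothetical stand-in for a described source."""
    type: str                            # a URI, not a token like "article"
    title: str
    is_part_of: Optional["Item"] = None  # the containing publication, if any


def container_title(ref: Item) -> Optional[str]:
    """Title of whatever the item is part of, regardless of its kind."""
    return ref.is_part_of.title if ref.is_part_of else None


# Made-up data: the same access path works for a journal and a newspaper.
journal_article = Item(
    type=BIBO + "Article",
    title="An Example Article",
    is_part_of=Item(type=BIBO + "Journal", title="An Example Journal"),
)
news_story = Item(
    type=BIBO + "Article",
    title="An Example Story",
    is_part_of=Item(type=BIBO + "Newspaper", title="An Example Daily"),
)

print(container_title(journal_article))
print(container_title(news_story))
```

With a journal-specific property, the newspaper case would need its own field; here the container is just another typed item.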

A Home NAS and Backup Solution

Posted in General, Technology on May 18th, 2009 by darcusb – Comments Off

So I’ve been thinking for a while now that I need to get more serious about a storage and backup solution for my personal and household data. After casually looking around at alternatives, I finally decided on a solution. I effectively took this information about hardware, with this and this information about using OpenSolaris and ZFS for software, and now have 1 TB of mirrored networked storage (and automated snapshots when I get to it), all for less than $500.

It was far more of a PITA getting OpenSolaris running as I wanted than I’d hoped, but I think the end product is both better and cheaper than the commercial alternatives.

HTML 5 Microdata Use Cases

Posted in Technology on May 10th, 2009 by darcusb – Comments Off

I mentioned in my previous post on the HTML 5 microdata draft that it included a use case from me; it’s this one:

A scholar and teacher wants other scholars (and potentially students) to be able to easily extract information about who he is to add it to their contact databases.

This is close to my description, but significantly narrower. Compare my words:

I want to be able to add structured data to my web site to denote who I am, what I have published, and what I teach in such a way that other scholars (and potentially students) can easily extract that information to add it to their contact databases, or to their bibliographic applications, or whatever. This involves contact data, for sure, but also other, domain specific, data, as well, and so presumes a flexible and extensible model and syntax.

The distinction is important because it makes clear that fixed encoding formats like hCard are not close to adequate; this is not just about a one-size-fits-all profile format, nor about possible integration into one particular kind of application (a contact database).

HTML5 Microdata Proposal

Posted in Technology on May 10th, 2009 by darcusb – 4 Comments

I’ve been following the discussion about extensible metadata in HTML 5 from afar, not really having the time to get any more involved. The bottom line for one of the primary use cases I provided was, can I represent what’s embedded in my home profile and publications pages? This isn’t just about data relating to me and my pages, but linking them to other data, elsewhere. For example, I will be changing my subject pages to link to the new Library of Congress id service, such as subject headings. Can I do that in HTML 5?

The group (well, let’s be real, Ian Hickson) released a first draft of a proposal today. I haven’t really looked at it carefully and thought through all the implications, but my initial take is it seems an attempt to split the difference between RDFa and microformats. So one can encode metadata properties, for example, using either plain string tokens (the microformat way), or using URIs (the RDF/RDFa way). I might well prefer to use RDFa, but perhaps with some tweaks, the microdata proposal might well allow the most important pieces of RDFa. At least I hope so.

But there are places where there seem some arbitrary restrictions. For example, I see no way to define a microdata item’s identity as anything but local to the document (the spec only allowing local IDs; not global URIs). If I have that right, that’s a critical and arbitrary flaw, and needs to be changed.

And, as Shelley Powers points out, it’s really, really strange and arbitrary to allow one to use a “reversed DNS identifier” as a global identifier alternative to an HTTP URI, but not allow other prefix mechanisms (such as CURIEs), particularly when the common argument against namespace prefixes in general and CURIEs in particular is that they are too difficult. I’d rather see all three, or only URIs.
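
For what it’s worth, telling these forms apart, and expanding a prefixed name, is not much code. The following is a sketch in Python under loud assumptions: the classification rules are my own simplistic heuristics and not the parsing rules of the microdata draft or of the CURIE spec, and the prefix map and function names are hypothetical.

```python
from typing import Mapping
from urllib.parse import urlparse

# Hypothetical prefix map; a real one would be declared by the author.
PREFIXES: Mapping[str, str] = {"dct": "http://purl.org/dc/terms/"}


def classify(name: str, prefixes: Mapping[str, str] = PREFIXES) -> str:
    """Guess which naming style a property name uses (heuristic only)."""
    if urlparse(name).scheme in ("http", "https"):
        return "uri"
    prefix, sep, _ = name.partition(":")
    if sep and prefix in prefixes:
        return "curie"
    if "." in name:
        # Assumption: any dotted name is treated as reversed DNS.
        return "reversed-dns"
    return "token"


def expand_curie(curie: str, prefixes: Mapping[str, str] = PREFIXES) -> str:
    """Expand prefix:local into a full URI using the prefix map."""
    prefix, _, local = curie.partition(":")
    return prefixes[prefix] + local


for name in ("title", "org.example.title", "dct:title",
             "http://purl.org/dc/terms/title"):
    print(name, "->", classify(name))
print(expand_curie("dct:title"))
```

If a dotted name is not too hard for authors and tools, it’s difficult to see why a declared prefix would be.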

Finally, the “item” attribute is odd. It’s effectively equivalent to the RDFa typeOf attribute, in that it allows one to type the related properties. But then a) why not just call it typeOf?, and b) related to my point about identity above, the notion of an “item” is quite ambiguous, and seems to confuse identity and type.

I’d really love if the relevant open-minded experts in this space could find time to have a f2f meeting over this proposal, and iron out these sorts of details.

Thomson Reuters Wants Your Name

Posted in Technology on May 7th, 2009 by darcusb – 1 Comment

I recently learned that, as part of their lawsuit regarding Zotero, Thomson Reuters has successfully forced GMU to release the contact information for all 286 people who have SVN and Trac accounts at

I don’t personally care, because I’m sure these lawyers already know my name. But this seems nothing more than yet more thuggish intimidation.

New Laptop

Posted in Technology on May 6th, 2009 by darcusb – 3 Comments

It’s hard not to notice Microsoft’s new ad push against Apple. The punchline is that buying a “PC” (the ads never mention Windows, oddly enough) tends to give a consumer more choice and better value compared to buying a Mac.

As a longtime Mac user, I tend to agree. Except the logical extension to the argument is to point out that Windows isn’t the only non-Mac OS in town, and that Linux-based alternatives such as Ubuntu offer the same value proposition: more choice and better value (not to mention “free”).

So it’s with that thinking in mind that I finally bought a new laptop after casually looking around for something to replace my aging Mac iBook G4. I wanted a machine with the following characteristics:

  1. good battery life
  2. good screen
  3. excellent keyboard (since I intend to use it for writing and notetaking)
  4. light weight
  5. rugged
  6. inexpensive
  7. decent performance

I seriously considered one of the recent larger netbooks, but ultimately went with a Thinkpad X61s. I got a refurbished model for less than $700 direct from Lenovo, complete with 3 GB of RAM, a 9-cell battery, and a free bag.

Is it quite as elegant from a design standpoint as a Mac alternative? Not in the least! But despite being last year’s model, it’s really fast, it’s really light, it has very good battery life (haven’t really tested it, but I expect to get over four hours of real use out of it), and a great keyboard and screen. It’s also really nicely built.

So what about the OS? Some version of Linux was clear (I did boot into Vista at first in order to prepare the USB boot image, but subsequently wiped it out completely; good riddance), but which one? I started out with Arch, but gave up when I couldn’t establish a network connection to finish the basic installation. I then moved over to Ubuntu, which installed and configured without a hitch; everything simply worked: wireless network connection, suspend and wake, etc., etc.

But one thing I really like about Mac OS X is the design aesthetics. There’s something nice about working in a beautiful environment. Sadly, Ubuntu is not that for me. But xubuntu, on the other hand, is right up my alley! So a quick addition of the xubuntu packages and I’m happy.

The only thing that makes me a little hesitant to do a wholesale switch off of the Mac OS is its superior support in the image editing arena. If and when GIMP catches up to the ease-of-use and resolution-independent editing of Lightroom and Aperture, that will probably be it for me.

RDFa, Microformats and HTML 5 QOTD

Posted in General on May 5th, 2009 by darcusb – Comments Off

Shelley Powers, on a rather typical IRC conversation on RDFa in HTML 5:

Unfortunately, too many people who really don’t know data are making too many decisions about how data will be represented in the web of the future.

As usual, Shelley nails it.