darcusblog » 2008 » March - geek tools and the scholar

Archive for March, 2008

A Brain-Dead Library of Congress Decision

Posted in Technology on March 31st, 2008 by darcusb – 3 Comments

In the “WTF” category: someone please tell me that the news of the Library of Congress deal with Microsoft to adopt Silverlight is an early April Fool’s joke! A public agency whose purpose is the openness of information adopting a proprietary technology?! And this happens just as the LoC has been doing really good work on better exploiting the open technologies of the web?

I’m too disgusted to say much else, and will be contacting my Congressional representatives on this one.

Drupal, CSL, and Google SOC

Posted in Technology on March 17th, 2008 by darcusb – Comments Off

Ron Jerome has recently started work on a PHP port of CiteProc, for integration with Drupal. This would add the sort of citation processing support one sees in Zotero to Drupal, and potentially any other PHP application.

Having gotten roughly halfway through the port, Ron got busy with other responsibilities, like updating his Biblio module for Drupal 6. So, instead, he’s decided to submit a project for consideration in the upcoming Google Summer of Code. It seems the idea piqued the interest of the right people, and it’s now on the “official” list of project ideas.

So if you’re a student with good PHP skills and an interest in contributing in this space, feel free to apply. Or, if you know someone who might fit the bill, urge them to do so. If the project is accepted, I’ll be a co-mentor, along with Ron.


Posted in Technology on March 9th, 2008 by darcusb – Comments Off

I mentioned RDF and SPARQL in the previous post. I’ve been thinking about SPARQL again in part because of this really cool little IRC bot. From an IRC session:

<bdarcus>   sparqlbot, count graphs
<sparqlbot> bdarcus, I count 75 graphs.
<bdarcus>   sparqlbot, count triples
<sparqlbot> bdarcus, 19888 triples found
<bdarcus>   sparqlbot, load
<bdarcus>   sparqlbot, Scobleizer's contacts
<sparqlbot> 1132 triples loaded in 21.5 seconds
<sparqlbot> bdarcus, I found rael, Biz Stone, Evan Williams, sara, Andy Keep, Krissy, Philip Kaplan, veen, Jason Shellen, Sacca, Scott Fegette, Matt Galligan, Jerry Richardson, Mary Hodder, Brian Walsh, Clint G, Jim Williams, Paul Morriss, Ian Hay, Wayne Sutton, nanek, Ross, caroline, Hunter, Brad Barrish, necrodome, Mack D. Male, Nitin, om, steve epstein, Dav...

So basically, the natural language terms like “count” and “contacts” invoke specialized SPARQL queries, and return the results in natural language form. Really nice, and illustrates a world of possibility!
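To illustrate the idea, here is a minimal Python sketch of how such a bot might map phrases to queries. It is not the actual bot's code: the command patterns, the `to_sparql` function, and the queries themselves are assumptions made up for illustration, and the `COUNT` aggregates come from SPARQL 1.1, which the bot in the session above may well not have used.

```python
import re
from string import Template

# Hypothetical phrase patterns paired with SPARQL templates. The real bot's
# vocabulary and queries are not shown in the post; these are guesses.
COMMANDS = [
    (re.compile(r"^count graphs$", re.I),
     Template("SELECT (COUNT(DISTINCT ?g) AS ?n) "
              "WHERE { GRAPH ?g { ?s ?p ?o } }")),
    (re.compile(r"^count triples$", re.I),
     Template("SELECT (COUNT(*) AS ?n) "
              "WHERE { GRAPH ?g { ?s ?p ?o } }")),
    (re.compile(r"^(?P<name>.+?)'s contacts$", re.I),
     Template("PREFIX foaf: <http://xmlns.com/foaf/0.1/> "
              "SELECT ?contact WHERE { "
              '?person foaf:name "$name" . '
              "?person foaf:knows ?friend . "
              "?friend foaf:name ?contact }")),
]


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_sparql(utterance):
    """Return the SPARQL query for a recognized phrase, or None."""
    text = utterance.strip()
    for pattern, template in COMMANDS:
        match = pattern.match(text)
        if match:
            params = {key: escape_literal(val)
                      for key, val in match.groupdict().items()}
            return template.substitute(params)
    return None


if __name__ == "__main__":
    print(to_sparql("count triples"))
    print(to_sparql("Scobleizer's contacts"))
```

The other half of the job, turning result rows back into a sentence like "I count 75 graphs", would be a similar table of response templates.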

Learning from the Tumblelog

Posted in Technology on March 9th, 2008 by darcusb – Comments Off

As I’ve been looking into revamping and expanding my personal website, I’ve been interested in the Tumblelog. A traditional weblog essentially has one main object: the post. A post is a chunk of (typically) text content, with an author, a title, and so forth. A blog is thus a collection of posts, ordered by date.

A Tumblelog breaks out of the single object box. In addition to the post, depending on implementation you can also have links, people, places, photos, music, and quotes. That content can in turn be assembled from other sources: Delicious feeds, Flickr photo sets, etc.

From this perspective, then, a Tumblelog allows one to weave together a range of different kinds of content. The date-ordered list can include different kinds of objects, and these objects can also be woven together even within, say, a single post.

So what lessons might this have for a scholar? What ideas might I steal from the Tumblelog, and how might I extend them?

I’d say the general approach goes really far. I think I would probably just get a little more generic. For example, a post and an article have little that distinguishes them, except the view. A draft manuscript isn’t conceptually any different than a draft blog post (unless you wanted to model sections). Notes are really just informal content, but still not really fundamentally different. Citations might be thought of as just a special kind of link.

So in the ideal CMS I am imagining, it would weave together links and associated metadata from Delicious and Zotero 2.0*, images from Flickr, and have a project view that allows me to group content and publications.

But what about the details? How to implement this?

In the world of Django, the approach seems to be to have a different model for each kind of content, and then to use a generic relation model to weave the content together. So, separate classes/tables for links, photos, quotes, and so forth. This approach seems to work well for Jeff Croft, Wilson Miner, and Nathan Borror.
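The generic-relation idea can be sketched in plain Python, without Django itself. In Django the equivalent is a generic foreign key from the contenttypes framework; everything below (the `StreamItem` class, the `TABLES` dictionary, the sample data) is a hypothetical stand-in, not code from any of the sites mentioned.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Link:
    id: int
    url: str
    title: str


@dataclass
class Photo:
    id: int
    image_url: str
    caption: str


@dataclass
class Quote:
    id: int
    text: str
    source: str


@dataclass
class StreamItem:
    """Generic pointer: names a content type and an object id, plus a date."""
    content_type: str
    object_id: int
    published: date


# One "table" per content type, keyed by id (hypothetical sample data).
TABLES = {
    "link": {1: Link(1, "http://example.org/", "An example link")},
    "photo": {1: Photo(1, "http://example.org/p.jpg", "An example photo")},
    "quote": {1: Quote(1, "An example quote.", "Someone")},
}


def resolve(item, tables):
    """Follow the generic pointer to the concrete object."""
    return tables[item.content_type][item.object_id]


def tumblelog(items, tables):
    """Return (date, object) pairs, newest first, mixing all content types."""
    ordered = sorted(items, key=lambda item: item.published, reverse=True)
    return [(item.published, resolve(item, tables)) for item in ordered]
```

The stream itself knows nothing about links or photos; adding a new kind of content means adding a class and a table, not changing `StreamItem`.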

I have to say, though, that after dealing a lot with RDF, a relational database feels a little claustrophobic: having to define an entire model upfront, and to worry about the consequences of changes later. And while I love the automatic Django admin interface, I’m starting to wonder if it’s really worth all the hassle. For a personal site, it’s not like I’m creating and managing that much structured data.

On the other hand, the (currently PHP-based but soon to include Ruby) Chyrp project takes a more generic approach, where there is essentially a single object again, but one that can be extended. This makes sense, since projects like Chyrp are designed both as dedicated tools and to be easily extended with plug-ins.

But given the straitjacket restrictions of a traditional relational database, exactly how can one store quotes, events, and images all in the same table? In the current implementation, it seems that extended data is embedded as XML in the database. Ouch, this just feels wrong! Extended data becomes essentially a second-class citizen.

This seems a perfect place to borrow from RDF, either in whole or in part. One approach would simply be to include an RDF store wholesale, as planned in Drupal. With something like ARC, you can just have a few tables sit alongside the main application tables, and handle all the flexibility you want. If a plug-in developer wanted to add extended data, they could just register the common data in the post table, and then add the extended triples to the generic RDF tables. Since each post gets a URI, it’s then easy to merge the data.
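Here is a rough sketch of that layout using Python's built-in `sqlite3` module. The table names, the `describe` function, and the `http://example.org/` URIs and vocabulary are all assumptions for illustration; this is not ARC's actual schema.

```python
import sqlite3


def create_store():
    """Create an in-memory database with a post table and a triple table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE post (uri TEXT PRIMARY KEY, title TEXT, created TEXT);
        CREATE TABLE triple (subject TEXT, predicate TEXT, object TEXT);
    """)
    return conn


def add_post(conn, uri, title, created):
    """Store the common data every post has."""
    conn.execute("INSERT INTO post VALUES (?, ?, ?)", (uri, title, created))


def add_triple(conn, subject, predicate, obj):
    """Store one piece of extended data about any URI."""
    conn.execute("INSERT INTO triple VALUES (?, ?, ?)",
                 (subject, predicate, obj))


def describe(conn, uri):
    """Merge the post row and its extended triples, joined on the URI."""
    row = conn.execute(
        "SELECT title, created FROM post WHERE uri = ?", (uri,)).fetchone()
    if row is None:
        return None
    record = {"uri": uri, "title": row[0], "created": row[1]}
    triples = conn.execute(
        "SELECT predicate, object FROM triple WHERE subject = ? "
        "ORDER BY predicate, object", (uri,))
    for predicate, obj in triples:
        record.setdefault(predicate, []).append(obj)
    return record


if __name__ == "__main__":
    conn = create_store()
    uri = "http://example.org/posts/1"  # hypothetical post URI
    add_post(conn, uri, "A quote post", "2008-03-09")
    # Hypothetical plug-in vocabulary for quotes.
    add_triple(conn, uri, "http://example.org/vocab#quoteText", "Hello.")
    add_triple(conn, uri, "http://example.org/vocab#quoteSource", "Someone")
    print(describe(conn, uri))
```

A plug-in adds new predicates without any schema change, and the extended data is queryable like everything else rather than buried in an XML blob.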

Of course, this raises the question: why not just go all RDF? If my project, publication, image, etc. metadata are all stored as RDF, then creating a Tumblelog could be a simple SPARQL query away.

I hope to figure this all out soon, as I really want to get this new website up and forget about it!

RefDB and Word-Processor Integration

Posted in Technology on March 7th, 2008 by darcusb – Comments Off

RefDB author Markus Hoenicka discusses work he’s been doing on integrating the application with word-processors like OpenOffice. His argument:

Instead of expecting from all m bibliography tools out there to develop plugins for n word processors, thus placing a burden of maintaining m*n interfaces on the community, each word processor should implement a standardized interface to query and retrieve bibliographic data from any number of bibliography tools, which in turn have to support the same interface.

Indeed; I’ve been saying the same thing for years.
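To make the arithmetic concrete: with 5 bibliography tools and 4 word processors, pairwise plugins mean 5 × 4 = 20 interfaces to maintain, while a shared interface means each side implements it once, for 5 + 4 = 9. A minimal Python sketch of what "a shared interface" could look like follows; the `BibliographySource` protocol and its `search` method are hypothetical names, not part of RefDB or any existing standard.

```python
from typing import Dict, List, Protocol


class BibliographySource(Protocol):
    """The one interface every bibliography tool would implement."""

    def search(self, query: str) -> List[Dict[str, str]]:
        ...


class InMemorySource:
    """A toy tool that satisfies the interface with a list of records."""

    def __init__(self, records: List[Dict[str, str]]) -> None:
        self.records = records

    def search(self, query: str) -> List[Dict[str, str]]:
        needle = query.lower()
        return [r for r in self.records if needle in r["title"].lower()]


def find_references(sources: List[BibliographySource],
                    query: str) -> List[Dict[str, str]]:
    """What a word processor would do: query any number of tools alike."""
    results: List[Dict[str, str]] = []
    for source in sources:
        results.extend(source.search(query))
    return results


def interfaces_needed(tools: int, processors: int, standardized: bool) -> int:
    """Count the interfaces the community has to maintain."""
    return tools + processors if standardized else tools * processors
```

The word-processor side only ever calls `search`, so supporting a new bibliography tool costs that tool's author one implementation and costs the word processors nothing.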

Of course, the devil is in the details, and this is a complicated problem. If your goal is to standardize the interface so that it becomes easier for different tools to support a wider range of editors and word-processors, that doesn’t per se solve other problems. For example, it still leaves unanswered:

  1. How are citations encoded in the document?
  2. How are the citations and bibliography processed?

Even more fundamental from a use case perspective, if a user is now free to use different word-processors, are they also free to use different bibliographic data sources? Can they collaborate easily with their co-authors, who may or may not be using the same applications?

Markus’ proposed solution for the standard interface protocol is SRU, which is what we at the OOo bib project have advocated for quite a while. With respect to my questions above, he has chosen to:

  1. use plain text markup within the document (rather than, say, fields) to encode the citation, using local (not global) identifiers
  2. have a script scan the document, with RefDB then outputting a formatted RTF file

The implementation, then, has some limitations. As with Zotero, formatting is essentially specific to a particular application (and perhaps, even, database instance).
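For readers who have not met SRU: a search is an ordinary HTTP GET whose parameters carry the operation name and a CQL query. The sketch below builds such a URL in Python. The parameter names (`operation`, `version`, `query`, `maximumRecords`) are standard SRU; the endpoint address and the `dc.title` index are assumptions, since which indexes a given server supports (RefDB's included) is not stated here.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, max_records=10, version="1.1"):
    """Build an SRU searchRetrieve request URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint; substitute the address of a real SRU server.
    print(sru_search_url("http://localhost:8080/sru",
                         'dc.title = "semantic web"', max_records=5))
```

Because any word processor can issue this request and any bibliography tool can answer it, neither side needs to know anything else about the other.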

While I think Markus is right about the need for a standard interface, I think to really solve the issues I note above may well require moving more of the data and formatting logic and processing into the word-processor.

So imagine a Python/Perl/Ruby/Java library that was installed within the word-processor, and whose job it was to read standardized citation fields, match them to embedded source data (RDF in the case of OOo), and format the fields. So long as compliant applications could send the data in the right form, those documents would then be truly portable.
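A toy version of such a library, in Python, might look like the sketch below. The document structure, the field and source keys, and the hard-coded author-date style are all assumptions for illustration; a real processor would read the word processor's own field format and take its formatting rules from a style language such as CSL.

```python
def format_fields(document):
    """Fill in the display text of each citation field from embedded data."""
    sources = document["sources"]
    rendered = []
    for field in document["fields"]:
        source = sources.get(field["source"])
        if source is None:
            # Leave a visible marker rather than guessing at the reference.
            text = "(missing source: %s)" % field["source"]
        else:
            text = "(%s %s)" % (source["author_family"], source["year"])
        rendered.append(dict(field, text=text))
    return rendered


def bibliography(document):
    """List each cited source once, sorted by author and year."""
    cited = {field["source"] for field in document["fields"]}
    entries = [document["sources"][uri] for uri in cited
               if uri in document["sources"]]
    entries.sort(key=lambda s: (s["author_family"], s["year"]))
    return ["%s. %s. %s." % (s["author_family"], s["year"], s["title"])
            for s in entries]


if __name__ == "__main__":
    # Hypothetical document: citation fields point at sources by global URI,
    # and the source data travels inside the document itself.
    doc = {
        "fields": [
            {"id": "cite1", "source": "http://example.org/refs/doe2007"},
            {"id": "cite2", "source": "http://example.org/refs/roe2005"},
        ],
        "sources": {
            "http://example.org/refs/doe2007": {
                "author_family": "Doe", "year": "2007",
                "title": "An Example Title"},
            "http://example.org/refs/roe2005": {
                "author_family": "Roe", "year": "2005",
                "title": "Another Example"},
        },
    }
    for field in format_fields(doc):
        print(field["id"], field["text"])
    print(bibliography(doc))
```

Since both the fields and the source data live in the document, a co-author with a different bibliography tool, or none at all, could still open the file and reformat it.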

RDF in Drupal

Posted in Technology on March 6th, 2008 by darcusb – Comments Off

Wondering how RDF might enhance the traditional CMS? Take a look at this recounting (and linked screencast) of a recent keynote on adding semantic web support to Drupal.