darcusblog » 2004 » January - geek tools and the scholar

Archive for January, 2004

Semantic Blogging and Structured Search

Posted in Uncategorized on January 30th, 2004 by darcusb – Comments Off

Hot on the heels of Dan’s Linkstacking paper comes Jon Udell’s latest about similar issues. As he puts it:

The next phase of my structured search project is coming to life. For the new version I’m parsing all 200+ of the RSS feeds to which I subscribe, XHTML-izing the content, storing it in Berkeley DB XML, and exposing it to the same kinds of searches I’ve been applying to my own content.

He’s using Tidy to convert the HTML content to XHTML, which is then available to standard XML query and processing tools. Very, very cool!
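To illustrate why the XHTML step matters: once content is well-formed, even a standard-library XML parser can run structured queries over it. The sketch below is not Jon's code (his setup uses Berkeley DB XML); the sample fragment, the `find_links` helper, and the search-by-URL idea are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical post content, already cleaned up into well-formed XHTML.
POST = """<div xmlns="http://www.w3.org/1999/xhtml">
  <p>See <a href="http://example.org/linkstacking">the draft paper</a> and
     <a href="http://example.com/other">another page</a>.</p>
</div>"""


def find_links(xhtml: str, needle: str) -> list[tuple[str, str]]:
    """Return (href, link text) for every <a> whose href contains `needle`."""
    root = ET.fromstring(xhtml)
    hits = []
    for a in root.iter(f"{{{XHTML_NS}}}a"):
        href = a.get("href", "")
        if needle in href:
            hits.append((href, "".join(a.itertext()).strip()))
    return hits


print(find_links(POST, "example.org"))
```

The same query against raw tag-soup HTML would fail at the parsing step, which is the gap Tidy fills.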

This is the kind of thing that really needs to be picked up in the scholarly community, something a la the Semantic Blogging Demo, where feeds also contain bibliographic metadata. I’d really like at some point to be able to:

  1. store content and (bibliographic) metadata in the same XML DB
  2. export that content and metadata in a variety of forms, from academic papers to public blog posts
  3. with respect to blogs, mark up a citation and then refer to an ID in a bibliographic record, which is itself embedded as RDF in a feed

Linkstacking

Posted in Uncategorized on January 29th, 2004 by darcusb – 1 Comment

Dan Chudnov has posted a draft of an interesting paper on a new project he’s been working on. In his words:

This informal paper proposes that libraries could merge the functions of weblogging, reference management, and link resolution into a new library groupware infrastructure, helping users to better manage the entire lifecycle of the bibliographic research process. Several scenarios explore how such an application suite might help library users by integrating their bibliographic research more closely with communication – scholarly and otherwise, from private annotation to public discussion. A discussion of related architectural issues suggests a new model of “link routing” to augment “link resolution,” and describes how link routing systems could enable library visitors to become users of our groupware services as much as they already are users of the information resources we procure.

Biblio

Posted in Uncategorized on January 29th, 2004 by darcusb – Comments Off

Mostly as an effort to document my thinking about my needs for bibliographic metadata, I’ve gone back and worked on my biblio schema a bit.

File is here. I still want to break out the person and org stuff into a separate schema that I import.

Below is an example instance. It shows influence from the recent RDF discussions, though I’m not sure whether that’s a good thing for the element naming.

<biblio>
  <item id="one">
    <isCreatedBy role="author">
      <person id="doe">
        <name>
          <given>Jane</given>
          <family>Doe</family>
        </name>
      </person>
    </isCreatedBy>
    <hasTitle>
      <titleMain>A Title</titleMain>
      <titleSub>Subtitle</titleSub>
    </hasTitle>
    <isPartOf type="single-issue">
      <isCreatedBy role="editor">
        <person>
          <name>
            <given>John</given>
            <family>Smith</family>
          </name>
        </person>
      </isCreatedBy>
      <hasTitle>
        <titleMain>Book Title</titleMain>
      </hasTitle>
      <hasOrigin>
        <isPublishedBy>
          <organization>
            <name>
              <fullName>ABC Publishers</fullName>
            </name>
            <address>
              <city>New York</city>
            </address>
          </organization>
        </isPublishedBy>
        <dateIssued year="2002"/>
      </hasOrigin>
      <partDesc>
        <range unit="page">
          <start>23</start>
          <end>45</end>
        </range>
      </partDesc>
    </isPartOf>
  </item>
</biblio>
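As a rough check on whether an instance like this carries enough information to format a citation, here is a minimal sketch that reads a trimmed copy of the example and prints a one-line reference. The `format_item` helper and the output style are invented for illustration; they are not part of the schema or any existing tool.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the example instance (same element names).
BIBLIO = """<biblio>
  <item id="one">
    <isCreatedBy role="author">
      <person id="doe"><name><given>Jane</given><family>Doe</family></name></person>
    </isCreatedBy>
    <hasTitle><titleMain>A Title</titleMain><titleSub>Subtitle</titleSub></hasTitle>
    <isPartOf type="single-issue">
      <hasTitle><titleMain>Book Title</titleMain></hasTitle>
      <hasOrigin><dateIssued year="2002"/></hasOrigin>
      <partDesc><range unit="page"><start>23</start><end>45</end></range></partDesc>
    </isPartOf>
  </item>
</biblio>"""


def format_item(item: ET.Element) -> str:
    """Build a rough one-line citation; the output style is made up."""
    name = item.find("isCreatedBy/person/name")
    author = "{}, {}".format(name.findtext("family"), name.findtext("given"))
    title = item.findtext("hasTitle/titleMain")
    sub = item.findtext("hasTitle/titleSub")
    if sub:
        title = "{}: {}".format(title, sub)
    host = item.find("isPartOf")  # the containing book
    year = host.find("hasOrigin/dateIssued").get("year")
    pages = host.find("partDesc/range")
    return '{}. {}. "{}." In {}, {}-{}.'.format(
        author,
        year,
        title,
        host.findtext("hasTitle/titleMain"),
        pages.findtext("start"),
        pages.findtext("end"),
    )


for item in ET.fromstring(BIBLIO).findall("item"):
    print(format_item(item))
```

Because the paths are relative to each element, the item's own title and the host's title never get confused, even though both use `hasTitle`.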

Presentations: Workflow

Posted in General on January 28th, 2004 by darcusb – Comments Off

For writing presentations, there is perhaps nothing better than a good outliner (maybe emacs + nXML?). I like Keynote, but its performance problems and UI eccentricities have led me to give up using it for creating the text of a presentation. It’s simply too frustrating: it constantly gets in my way and slows down my thought process.

I’ve thus gone back to using OmniOutliner, which remains quite elegant in many ways. It has also had XML support for the past year or so, and export to Keynote.

Still, I really wish Omni would get its OO development in higher gear. There is still a lot of improvement possible in this application, and I’ve not seen much of it in the past year.

CVs and XML

Posted in Uncategorized on January 28th, 2004 by darcusb – 3 Comments

Raymond Yee asks another good question:

As I was updating my list my presentations and papers, I started wondering whether there is some XML format I could use to present the information I have in the document. I couldn’t find anything I would end up using at this time…

Given that neither of these specifications seem to do what I wanted, I started pondering the use of MODS…

I’ve thought the same.

Since I’ve managed to convert my syllabi (almost) to XML/XSLT, the CV was the obvious next step, and Raymond’s post pushed me to action. To wit, I’ve created two RELAX NG schemas: one called “cv”, and the other “contacts.” The cv schema allows you to use MODS data directly in the publications and presentations sections, but having them as external links seems a better idea. Among other things, it allows use of other metadata models.

I’ll post the schemas – and later the XSLT files – when I find time to “finish” them.

Anyone know how to get an XSLT processor to handle data from external online records?

Anyway, here’s a simple example…

<cv dateCreated="2004-01-27">
  <person linkend="darcusb" base="http://www.users.muohio.edu/darcusb/contacts"/>
  <education>
    <degree type="PhD">
      <organization>
        <name>Syracuse University</name>
      </organization>
      <years>
        <start>1997</start>
        <end>2001</end>
      </years>
    </degree>
    <degree type="MA">
      <organization>
        <name>University of Colorado</name>
      </organization>
      <years>
        <start>1995</start>
        <end>1997</end>
      </years>
    </degree>
  </education>
  <publications base="http://www.users.muohio.edu/darcusb/pubs">
    <publication linkend="pb-bd-2004a"/>
    <publication linkend="pb-bd-2003a"/>
    <publication linkend="pb-bd-2003b"/>
    <publication linkend="pb-bd-2000a"/>
  </publications>
  <presentations base="http://www.users.muohio.edu/darcusb/pres">
    <presentation linkend="pr-bd-2003a"/>
    <presentation linkend="pr-bd-2002a"/>
    <presentation linkend="pr-bd-2001a"/>
    <presentation linkend="pr-bd-2000a"/>
  </presentations>
</cv>
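One way to picture the external-link approach is to collect each `linkend` together with the `base` of its parent section, and then fetch the records in a separate step. The sketch below does only the collecting. Note the big assumption: the example does not say how `base` and `linkend` combine into an address, so joining them with "/" is a guess, and `external_refs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the cv example (same element and attribute names).
CV = """<cv dateCreated="2004-01-27">
  <publications base="http://www.users.muohio.edu/darcusb/pubs">
    <publication linkend="pb-bd-2004a"/>
    <publication linkend="pb-bd-2003a"/>
  </publications>
  <presentations base="http://www.users.muohio.edu/darcusb/pres">
    <presentation linkend="pr-bd-2003a"/>
  </presentations>
</cv>"""


def external_refs(cv_xml: str) -> list[str]:
    """List one address per linked record, in document order."""
    root = ET.fromstring(cv_xml)
    urls = []
    for section in root:
        base = section.get("base")
        if base is None:
            continue  # sections such as <education> hold inline data
        for ref in section:
            linkend = ref.get("linkend")
            if linkend:
                # ASSUMPTION: base and linkend are joined with "/".
                urls.append(f"{base.rstrip('/')}/{linkend}")
    return urls


for url in external_refs(CV):
    print(url)
```

Keeping the join rule in one place makes it easy to swap in whatever convention the schema ends up defining.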

LibDB: Bringing RDF and the FRBR to the Masses?

Posted in Uncategorized on January 24th, 2004 by darcusb – Comments Off

The bleeding edge of thinking on bibliographic metadata in the library world is the FRBR. The basic premise of the recommendations is that it is useful to distinguish between the different levels of abstraction attached to content. I won’t go into detail on a complex subject that I don’t perfectly understand myself, but in a nutshell, the principles laid out in the recommendations are comprehensive and based on current thinking about how to model metadata that goes far beyond MARC. For the same reason, the FRBR strikes me, at least, as rather abstract, and I find it difficult to see its immediate concrete relevance to my needs. Many have much the same complaint about RDF.

Well, the new LibDB project seeks to change that. The announcement release describes LibDB as:

An open-sourced Perl/MySQL library and asset management system based on and inspired by the Functional Requirements for Bibliographic Records, triples from the semantic web, and “the end-user doesn’t, and shouldn’t, need to know this stuff”. In English, this means that you’ll be able to smartly and easily catalog your movies, books, magazines, comics, etc. into your own computerized “personal library.”

The immediate impetus is a movie database, but the project is designed to be much broader from the beginning. This could be really interesting. It will support MODS, so in theory it could interact with a citation formatting engine such as Bibliofile.

Here’s my attempt to come to grips with the FRBR, representing a speech, later published as a text in a book. I’m not really sure I have this right, but I suspect I have it more-or-less close.

<work ID="one">
  <isCreatedBy role="speaker">
    <person ID="doej">
      <name>
        <given>John</given>
        <other abbrev="yes">Q</other>
        <family>Doe</family>
      </name>
    </person>
  </isCreatedBy>
  <hasTitle>
    <titleMain>Title</titleMain>
    <titleSub>Subtitle</titleSub>
  </hasTitle>
  <isRealizedThrough ID="one-A">
    <event>
      <hasTitle>
        <titleMain>A Conference</titleMain>
      </hasTitle>
      <date>2002-10-10</date>
      <place>New York</place>
    </event>
    <isEmbodiedIn status="published">
      <text>
        <isPartOf>
          <monograph>
            <hasTitle>
              <titleMain>A Book</titleMain>
            </hasTitle>
            <hasOrigin>
              <publisher>
                <organization>
                  <name>
                    <full>ABC Publishers</full>
                  </name>
                  <place>New York</place>
                </organization>
              </publisher>
              <dateIssued>2003</dateIssued>
            </hasOrigin>
            <hasNumbers>
              <range unit="page">
                <start>21</start>
                <end>34</end>
              </range>
            </hasNumbers>
          </monograph>
        </isPartOf>
      </text>
      <isExemplifiedIn>
        <location>archive</location>
      </isExemplifiedIn>
    </isEmbodiedIn>
  </isRealizedThrough>
</work>
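The four FRBR levels (work, expression, manifestation, item) may be easier to see as plain data structures than as nested markup. The sketch below mirrors the nesting of the speech example; the class and field names are illustrative assumptions, and the mapping of the speech to an expression and the printed text to a manifestation simply follows the example, which may or may not be orthodox FRBR.

```python
from dataclasses import dataclass, field


@dataclass
class Item:  # a single copy, e.g. the one held in an archive
    location: str


@dataclass
class Manifestation:  # a physical embodiment, e.g. pages in a printed book
    description: str
    items: list[Item] = field(default_factory=list)


@dataclass
class Expression:  # a realization, e.g. the speech as delivered
    description: str
    manifestations: list[Manifestation] = field(default_factory=list)


@dataclass
class Work:  # the abstract intellectual creation
    title: str
    expressions: list[Expression] = field(default_factory=list)


def describe(work: Work) -> list[str]:
    """Flatten the hierarchy into indented lines, one per entity."""
    lines = [f"work: {work.title}"]
    for e in work.expressions:
        lines.append(f"  expression: {e.description}")
        for m in e.manifestations:
            lines.append(f"    manifestation: {m.description}")
            for i in m.items:
                lines.append(f"      item: {i.location}")
    return lines


speech = Work("Title: Subtitle", [
    Expression("speech at A Conference, 2002-10-10", [
        Manifestation("text in A Book (2003), pp. 21-34", [Item("archive")]),
    ]),
])
print("\n".join(describe(speech)))
```

Each level holds a list of the next, so a second edition or a second archived copy would be one more entry rather than a new record.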

Names (again)

Posted in Uncategorized on January 23rd, 2004 by darcusb – Comments Off

I posted an example to the MODS list of where I’m at in my thinking about how bibliographic name representation ought to be handled in XML. I’m not quite sure how to translate this into RDF, but I doubt it’s hard given that it’s influenced by what I’ve seen in some RDF efforts…

<creator role="editor">
  <person ID="doej">
    <name>
      <termOfAddress>Sir</termOfAddress>
      <given>John</given>
      <other abbrev="yes">Q</other>
      <articular>van</articular>
      <family>Doe</family>
      <termOfAddress>Duke of X</termOfAddress>
      <full>Sir John Q. van Doe, Duke of X</full>
    </name>
  </person>
</creator>
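One test of a name model like this is whether the display form can be derived from the parts. The sketch below tries that for the example; the rules it applies (a period after abbreviated parts, a comma before any `termOfAddress` that follows the family name) are assumptions chosen to reproduce the `full` value, not rules the schema defines.

```python
import xml.etree.ElementTree as ET

# The name parts from the example, without the precomposed <full> element.
NAME = """<name>
  <termOfAddress>Sir</termOfAddress>
  <given>John</given>
  <other abbrev="yes">Q</other>
  <articular>van</articular>
  <family>Doe</family>
  <termOfAddress>Duke of X</termOfAddress>
</name>"""


def display_name(name: ET.Element) -> str:
    """Join name parts in document order into a display string."""
    before, after = [], []
    seen_family = False
    for part in name:
        text = (part.text or "").strip()
        if part.tag == "full" or not text:
            continue  # skip the precomposed form and empty parts
        if part.get("abbrev") == "yes":
            text += "."  # ASSUMPTION: abbreviated parts take a period
        if part.tag == "termOfAddress" and seen_family:
            after.append(text)  # a title that follows the name
        else:
            before.append(text)
        if part.tag == "family":
            seen_family = True
    return ", ".join([" ".join(before)] + after)


print(display_name(ET.fromstring(NAME)))
```

If the derived string always matched, `full` would be redundant; keeping it allows for names that do not follow these rules.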

SCO CEO Darl McBride is an Idiot

Posted in General on January 23rd, 2004 by darcusb – Comments Off

I’m being deliberately inflammatory with the title above, but I really don’t know how else to describe my reaction to SCO CEO Darl McBride’s pathetic attempt to link post-9/11-era “red-baiting” to an argument against the GPL. The GPL as “threat to the U.S. information technology industry”? To “our international competitive position”? To “our national security”?

Sigh…. As disappointed as I am to read this sort of thing, I’ll be deeply troubled if the argument ever gets anywhere with the people whose opinion he is trying to sway. Not only is it reactionary, it’s simply stupid. Clearly the “our” above refers to the narrow interests of a company that cannot otherwise compete, and the argument is akin to that of U.S. proponents of free trade who can’t seem to bring themselves to eliminate subsidies to U.S. agriculture.

Bibutils

Posted in Uncategorized on January 22nd, 2004 by darcusb – Comments Off

Chris Putnam has a new web page and a beta release of the new version of his conversion tools. Aside from a few minor issues, this is a nice piece of work! If you have RIS, Endnote, or BibTeX data and an interest in migrating to (or at least exploring) XML and MODS, give it a try. The more people bang on the tools with real data, the better the end result.

Note-Taking and Metadata: Acrobat and RDF

Posted in Uncategorized on January 21st, 2004 by darcusb – Comments Off

I recently installed Adobe’s new Creative Suite, and started playing with Acrobat Pro. Among other things, I was quite intrigued to find exportable (presumably AppleScriptable) RDF metadata! Here’s an excerpt from an example:

 <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/"
   rdf:about="uuid:239e512c-4b5e-11d8-905d-000a959f0e52">
  <dc:format>application/pdf</dc:format>
  <dc:title>
   <rdf:Alt>
    <rdf:li xml:lang="en">Democracy and Political Activism</rdf:li>
   </rdf:Alt>
  </dc:title>
  <dc:creator>
   <rdf:Seq>
    <rdf:li>Jane Doe</rdf:li>
   </rdf:Seq>
  </dc:creator>
  <dc:subject>
   <rdf:Bag>
    <rdf:li>democracy</rdf:li>
    <rdf:li>public sphere</rdf:li>
    <rdf:li>United States</rdf:li>
   </rdf:Bag>
  </dc:subject>
 </rdf:Description>

OK, so I can highlight and annotate PDF-based documents directly. I can now also embed useful (and open) metadata. Any clever ideas on how to tie it together with citation and bibliographic management?
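A possible first step is simply getting the fields out of the exported RDF and into a plain record that a bibliographic tool could consume. The sketch below does that with Python's standard library. It assumes the excerpt is wrapped in an `rdf:RDF` root (needed so the `rdf:` prefix is declared), and `dc_values` is a hypothetical helper, not part of any Adobe API.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The excerpt, wrapped in an rdf:RDF root so the rdf: prefix is declared.
PDF_METADATA = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/"
   rdf:about="uuid:239e512c-4b5e-11d8-905d-000a959f0e52">
  <dc:title><rdf:Alt>
    <rdf:li xml:lang="en">Democracy and Political Activism</rdf:li>
  </rdf:Alt></dc:title>
  <dc:creator><rdf:Seq><rdf:li>Jane Doe</rdf:li></rdf:Seq></dc:creator>
  <dc:subject><rdf:Bag>
    <rdf:li>democracy</rdf:li>
    <rdf:li>public sphere</rdf:li>
    <rdf:li>United States</rdf:li>
  </rdf:Bag></dc:subject>
 </rdf:Description>
</rdf:RDF>"""


def dc_values(desc: ET.Element, prop: str) -> list[str]:
    """Collect the rdf:li values under one dc: property, whatever the container."""
    return [li.text for li in desc.findall(f"dc:{prop}//rdf:li", NS)]


desc = ET.fromstring(PDF_METADATA).find("rdf:Description", NS)
record = {p: dc_values(desc, p) for p in ("title", "creator", "subject")}
print(record)
```

Every property comes back as a list, since the RDF containers (`Alt`, `Seq`, `Bag`) can each hold several values.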

update: So it seems Adobe has a customization API for metadata entry. Anyone out there with experience writing these? I just want to take the standard DC metadata and extend it a bit as outlined in previous posts.