Thursday, December 27, 2007

Mapping NITF into Plone's metadata

As I mentioned previously, Plone's standard News Items do not have enough metadata to be used seriously in a newspaper publication.

When we started using Plone for our breaking news site, we tried to fill the gap using some fancy Web 2.0 features like tags and tag clouds. I was convinced (well, in fact I am still convinced) that online media should face the problem of organizing news in a different way. Unfortunately, the implementation of this solution was problematic from the beginning.

First, tag clouds are wild beasts and you have to learn a lot before trying to implement one (my early version of the TagCloud product proved to have a lot of flaws). Second, social bookmarking in our environment showed many disadvantages, such as the lack of a controlled vocabulary, the use of "unclear" tags, and spelling errors.

Worst of all, publishers never liked the concept.

A tag cloud was used for navigation in the microsite for the Mexican general election in 2006

I also think many ordinary readers never understood the tag cloud either, because not many newspapers were using one at that time (even today, almost 2 years later, it's difficult to find tag clouds in the online versions of traditional media).

In the end we just had to abandon the idea. It became evident that we needed to extend the functionality of our site to include concepts like sections and a way to indicate whether one news article was more important than another.

At La Jornada we have been using NITF to store news articles for the printed edition's site for some time, and it has simplified our work. We needed to bring that experience to the breaking news site.

As you can see in its documentation, NITF defines a lot of metadata to be used at the different stages of a news article's life. So, the first thing I did was to map Plone's metadata to NITF elements and attributes:

  • Subject (nitf/head/docdata/key-list): holds a list of keywords about the document
  • Contributors (nitf/body/body.end/tagline): a byline at the end of a story
  • Creation Date (nitf/head/docdata/date.issue): date/time document was issued
  • Last Modified Date (nitf/head/revision-history): information about the creative history of the document; also used as an audit trail (includes who made changes, when the changes were made, and why)
  • Effective Date (nitf/head/docdata/date.release): date/time document is available to be released
  • Expiration Date (nitf/head/docdata/date.expire): date/time at which the document has no validity
  • Language (nitf/body/@xml:lang): language value governed by RFC 3066
  • Rights (nitf/head/docdata/doc.copyright): copyright information for document header
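
The mapping above can be written down as a plain Python dictionary. This is only a minimal sketch: the names PLONE_TO_NITF and split_nitf_path are made up for illustration and are not part of Plone or of any NITF library. The helper just separates the element steps from a trailing "@attribute" step.

```python
# Plone metadata field -> NITF location, copied from the list above.
# PLONE_TO_NITF is a hypothetical name used only for this illustration.
PLONE_TO_NITF = {
    "Subject": "nitf/head/docdata/key-list",
    "Contributors": "nitf/body/body.end/tagline",
    "Creation Date": "nitf/head/docdata/date.issue",
    "Last Modified Date": "nitf/head/revision-history",
    "Effective Date": "nitf/head/docdata/date.release",
    "Expiration Date": "nitf/head/docdata/date.expire",
    "Language": "nitf/body/@xml:lang",
    "Rights": "nitf/head/docdata/doc.copyright",
}


def split_nitf_path(path):
    """Split a path into (list of element names, attribute name or None)."""
    steps = path.split("/")
    attribute = None
    if steps[-1].startswith("@"):
        # A trailing "@name" step points at an attribute of the last element.
        attribute = steps.pop()[1:]
    return steps, attribute


for field, path in PLONE_TO_NITF.items():
    elements, attribute = split_nitf_path(path)
    print(field, "->", elements, attribute)
```

Keeping the paths as data means the same table can drive both an importer and an exporter later on.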

Then, I identified the information we were missing:

  • Property (nitf/head/tobject/tobject.property/@tobject.property.type): subject code property; includes such items as analysis, feature, and obituary. In our case we use it to differentiate news articles produced in-house from the ones written by our associates
  • Section (nitf/head/pubdata/@position.section): named section of a publication where a news object appears, such as Science, Sports, Weekend, etc.
  • Urgency (nitf/head/docdata/urgency/@ed-urg): defines the importance of a news article (1=most, 5=normal, 8=least)
  • Byline (nitf/body/body.head/byline): container for byline information; it can be unstructured text or structured text with direct specification of the responsible person/entity and their title
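
To make these four paths concrete, here is a small sketch that reads them from a NITF fragment using Python's standard xml.etree.ElementTree module. The sample document, its values ("current", "Sports", "Jane Doe") and the names MISSING_FIELDS and lookup are all invented for illustration; a real NITF file carries many more elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical NITF fragment; all values are made up for this example.
SAMPLE = """<nitf>
  <head>
    <tobject tobject.type="news">
      <tobject.property tobject.property.type="current"/>
    </tobject>
    <docdata>
      <urgency ed-urg="5"/>
    </docdata>
    <pubdata position.section="Sports"/>
  </head>
  <body>
    <body.head>
      <byline>By <person>Jane Doe</person></byline>
    </body.head>
  </body>
</nitf>"""

# The four missing fields and their NITF locations, as listed above.
MISSING_FIELDS = {
    "Property": "nitf/head/tobject/tobject.property/@tobject.property.type",
    "Section": "nitf/head/pubdata/@position.section",
    "Urgency": "nitf/head/docdata/urgency/@ed-urg",
    "Byline": "nitf/body/body.head/byline",
}


def lookup(root, path):
    """Return the attribute value or text found at path, or None."""
    steps = path.split("/")
    attribute = steps.pop()[1:] if steps[-1].startswith("@") else None
    if steps[0] != root.tag:
        return None
    node = root
    for tag in steps[1:]:
        # Take the first child with a matching tag name.
        node = next((child for child in node if child.tag == tag), None)
        if node is None:
            return None
    if attribute is not None:
        return node.get(attribute)
    # For elements, join all nested text (a byline may be structured).
    return "".join(node.itertext()).strip()


root = ET.fromstring(SAMPLE)
values = {field: lookup(root, path) for field, path in MISSING_FIELDS.items()}
print(values)
```

Note that the values come back as strings, so the urgency would still need converting to an integer before comparing two articles.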

With this in mind, I started looking for the easiest way to accomplish the task.
