hvelarde: January 2008

Lightning Talks are among the best investments of time you can make at any Plone Conference: there are always a lot of smart people doing great things.

During the Plone Conference 2007, in Naples, Florian Schulze spoke about schemaextender, a package that allows you to inject new fields into an Archetypes schema using an adapter. When I saw his talk, I knew I was going to use it for what I had in mind.

As I mentioned before in Mapping NITF into Plone's metadata, at La Jornada we were looking for a way to adapt Plone's standard News Item content type to:

add new fields to it (property, section, urgency and byline)
change the order of the fields among different schemata to make editing new content easier for the publishers

Using schemaextender to accomplish these tasks was so easy that I was really excited when I finished. I spent about 40 hours (in fact, a little bit more after the second release) reading Part 1 of Philip von Weitershausen's excellent book Web Component Development with Zope 3, understanding the way schemaextender works, finding out how to make it work on Plone 2.5, starting to use the adapted content with Smart Folders, and even writing some tests for it.

All this work is available in a product called nitf4plone in case you want to try it (be aware this is a beta release). The product works on both Plone 2.5 and Plone 3.0.

Let's analyze the code... but first, a disclaimer: don't try this on Plone 2.5.

Why? schemaextender was written to work with Archetypes version 1.5 or later (that is, Plone 3.0 and later). There's a branch, patched by Erik Rose of the WebLion Project Team at PSU, that makes it work with Plone 2.5, but it will never be merged into the maintenance branch. If you use this branch you are on your own in the event of a bug. Also, keep in mind that there's no way back: after installing schemaextender in Plone 2.5, the adapter will be available for all sites in the instance.

So, yes, do as I say, not as I did.

Having warned you, let's dive a little bit into the code; schemaextender includes 3 types of adapters:

ISchemaExtender lets you add new fields to a schema
IOrderableSchemaExtender lets you add new fields and reorder them
ISchemaModifier is a low-level hook that allows direct manipulation of the schema

You can find information and examples on all of them in the source code. As you might have expected, I decided to use IOrderableSchemaExtender.

To write the adapter, we need to declare the new fields first; let's take a look at the section field as an example:

class SectionField(ExtensionField, atapi.StringField):
    """Named section of a publication where a news object appears
    """

    def getDefault(self, instance):
        …
        return nitf.default_section

    def Vocabulary(self, instance):
        …
        return atapi.DisplayList([(x, x) for x in nitf.sections])

As you can see, we need to subclass from ExtensionField and StringField; please note that it's mandatory to keep this order. ExtensionField will provide standard accessors and mutators which are not generated on the class. StringField will provide standard attributes and the widget for the field. Also we override getDefault() and Vocabulary() methods to set the default value and vocabulary.
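
To see why the order of the base classes matters, here is a minimal sketch in plain Python. FakeStringField, FakeExtensionField, NewsItem and the way the accessors are looked up are hypothetical stand-ins, not the real Archetypes or schemaextender classes; the only point is that Python picks the method from the first base class that defines it.

```python
# Stand-in classes: these are NOT the real Archetypes/schemaextender classes.
class FakeStringField(object):
    """Plays the role of atapi.StringField in this sketch."""

    def __init__(self, name):
        self.name = name

    def getAccessor(self, instance):
        # Expects a generated method such as getSection on the content class.
        return getattr(instance, 'get' + self.name.capitalize(), None)


class FakeExtensionField(object):
    """Plays the role of ExtensionField in this sketch."""

    def getAccessor(self, instance):
        # Builds the accessor on the fly instead of relying on a generated one.
        def accessor():
            return instance.values.get(self.name)
        return accessor


class GoodField(FakeExtensionField, FakeStringField):
    """Extension class first: its getAccessor wins."""


class BadField(FakeStringField, FakeExtensionField):
    """Wrong order: the standard getAccessor wins."""


class NewsItem(object):
    # No getSection method was ever generated for the injected field.
    values = {'section': 'Politics'}


item = NewsItem()
good_accessor = GoodField('section').getAccessor(item)
bad_accessor = BadField('section').getAccessor(item)
print(good_accessor())   # value comes from the on-the-fly accessor
print(bad_accessor)      # nothing found: no generated method exists
```

With the bases swapped, the lookup falls back to the behaviour that expects generated methods, which is presumably why the real ExtensionField has to come first.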

Let's take a look at the adapter's class now:

class NITFExtender(object):
    """Adapter to add NITF fields to News Items
    """
    implements(IOrderableSchemaExtender)
    adapts(IATNewsItem)

    fields = [
        …
        SectionField('section',
            languageIndependent = 1,
            enforceVocabulary = 1,
            required = 1,
            widget = atapi.SelectionWidget(
                label = 'Section',
                label_msgid = 'section',
                description = 'Named section where the news object appear',
                description_msgid = 'help_section',
                i18n_domain = 'nitf4plone')),
        …
    ]

    def __init__(self, context):
        self.context = context

    def getFields(self):
        return self.fields

    def getOrder(self, original):
        # we only need to change the order of the fields in Plone 2.5
        if 'metadata' in original:
            # first we remove the fields from whichever schemata they are in
            for schemata in original.keys():
                if 'relatedItems' in original[schemata]:
                    original[schemata].remove('relatedItems')
                if 'subject' in original[schemata]:
                    original[schemata].remove('subject')

            # now we insert them where we want them to appear
            idx = original['default'].index('property')
            original['default'].insert(idx, 'subject')
            original['metadata'].insert(0, 'relatedItems')
        return original
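
The reordering done in getOrder() can be tried outside Plone with a plain dictionary. This is only a sketch: it assumes that the argument maps each schemata name to a list of field names and that the new fields are already present in it; the function name move_fields and the sample field layout are made up for the example.

```python
def move_fields(original):
    """Same logic as getOrder() above, applied to a plain dict of lists."""
    if 'metadata' in original:
        # Remove the two fields from whichever schemata currently holds them.
        for fields in original.values():
            for name in ('relatedItems', 'subject'):
                if name in fields:
                    fields.remove(name)

        # Put 'subject' right before 'property' and 'relatedItems' first
        # in the metadata schemata.
        idx = original['default'].index('property')
        original['default'].insert(idx, 'subject')
        original['metadata'].insert(0, 'relatedItems')
    return original


# Hypothetical layout, only for illustration.
order = {
    'default': ['title', 'description', 'text', 'relatedItems',
                'property', 'section', 'urgency', 'byline'],
    'metadata': ['subject', 'contributors', 'creators'],
}
result = move_fields(order)
print(result['default'])
print(result['metadata'])
```

Note that the lists are modified in place and the same mapping is returned, just as in the adapter.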

As I mentioned, our adapter implements IOrderableSchemaExtender. In Plone 3.0 adapters can be registered locally at installation time:

sm = portal.getSiteManager()
sm.registerAdapter(NITFExtender, (IATNewsItem, ), IOrderableSchemaExtender)

In Plone 2.5 we can't have local adapters and registrations aren't persistent, so we have to handle this in a different way:

Here you can see the way news items look after applying the adapter:

[Screenshot: schemaextender in action. The news item now contains new attributes, and the order of the fields is modified to make the publishers' work easier.]

The code of the adapter is pretty clean and easy to understand.

Having finished it, we followed Mikko Ohtamaa's procedure for adding new fields to the Smart Folders search in order to display all news articles for a given section, and it worked fine.

Right now we are working on migrating the content of our site to use the new fields; we are also preparing some templates to display the new information and some CompositePack viewlets that use them to build the front pages in a better way.

Please let me know if you find this development interesting or if you want to participate in some way.

I want to thank all the members of the Plone community who helped me by answering my questions on the #plone channel on IRC and in the Product Developers forum, especially Martin Aspeli and Florian Schulze (who helped me with the schemaextender internals and were really patient with me), Mikel Larreategui and Erik Rose (who helped me with the installer), and Wichert Akkerman and Andreas Jung (who are always available to answer all sorts of questions on the forums).

A question arose today on the Plone general mailing list (a.k.a. Plone-users): is it possible to create a list of related content automatically?

Well, the answer is yes and I'm going to tell you how.

Some time ago Benjamin Saller created a proof-of-concept product called Haystack to do auto-classification of content. Haystack was built around Open Text Summarizer and the haystack_tool included a couple of methods to summarize text and to get a list of "topics" extracted from the content. Haystack also included some portlets to demonstrate its functionality.

We used Haystack at La Jornada for some time with mixed results: the summarizer worked well; we called it through Ajax to fill in the description field of our content, which reduced the work of our publishers at editing time.

On the other hand, we used the "topics" obtained to build a portlet that retrieved the related content. The main problems with this were the low quality of the "topics" and the implementation of the relation. Sometimes we had embarrassing results, relating content about Iraq to content about, let's say, Shakira, just because they shared some "topic".

Haystack didn't understand the meaning of words and, of course, Ben Saller was aware of that. The last time I saw him was during the Plone Conference 2006 in Seattle. He gave a talk on Haystack 2.0 and he was really excited about its new features: linguistic mapping and automated conceptual mapping, providing high-quality relationships with little or no human effort.

Unfortunately for us, Ben has been a little bit away from the Plone community for some time, so I don't know what the status of his work is.

Going back to the original question on the mailing list, Matt Bowen pointed out to me that Yahoo! has a web service called Term Extraction that does almost the same thing, and he even found a Python implementation for it.

I tested Term Extraction with some text in Spanish and I was very pleased with the results:

<ResultSet xsi:schemaLocation="urn:yahoo:cate http://api.search.yahoo.com/ContentAnalysisService/V1/TermExtractionResponse.xsd">
    <Result>wong kar wai</Result>
    <Result>stephen frears</Result>
    <Result>festival de cannes</Result>
    <Result>sean penn</Result>
    <Result>25 de mayo</Result>
    <Result>cines</Result>
    <Result>organizadores</Result>
    <Result>evidencia</Result>
    <Result>el presidente</Result>
    <Result>hace mucho tiempo</Result>
    <Result>afp</Result>
    <Result>ya</Result>
</ResultSet>

Implementing this in Plone does not seem very complicated: you can trigger a script in a workflow transition, or use Content Rules in Plone 3.0, to fill the Subject field or, better, add an additional field to store this information. Just remember the Term Extraction web service is limited to 5,000 queries per IP address per day.
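
As a rough idea of the first step, here is a minimal sketch that turns a response like the one above into a list of terms. It uses only the standard library and makes no request to the web service; the function name extract_terms is made up, and the namespace declarations in SAMPLE are an assumption added so that the snippet parses.

```python
import xml.etree.ElementTree as ET

# Shortened sample; the xmlns declarations are assumed, not taken from
# the response shown above.
SAMPLE = """<ResultSet xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="urn:yahoo:cate">
    <Result>wong kar wai</Result>
    <Result>festival de cannes</Result>
    <Result>cines</Result>
</ResultSet>"""


def extract_terms(xml_text):
    """Return the text of every Result element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    terms = []
    for element in root.iter():
        # Tags look like '{urn:yahoo:cate}Result'; keep only the local name.
        local_name = element.tag.rsplit('}', 1)[-1]
        if local_name == 'Result' and element.text:
            terms.append(element.text.strip())
    return terms


terms = extract_terms(SAMPLE)
print(terms)
```

The resulting list is what a workflow script or a content rule would then store in the Subject field or in a dedicated field.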

Yes, I know this solution suffers from the same problems as Haystack, but the "topics" obtained here are of better quality and you can always find a better algorithm to do the relation, like testing for more than one "topic" or using only "topics" longer than one word.
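
The two refinements just mentioned could be sketched like this. It is only an illustration: the function names and the threshold of two shared "topics" are arbitrary choices, not something I have tested on real content.

```python
def significant(terms):
    """Keep only the "topics" made of more than one word, lowercased."""
    return set(t.lower() for t in terms if len(t.split()) > 1)


def is_related(terms_a, terms_b, min_shared=2):
    """Two items are related if they share at least min_shared topics."""
    shared = significant(terms_a) & significant(terms_b)
    return len(shared) >= min_shared


cannes = ['wong kar wai', 'festival de cannes', 'cines', 'ya']
review = ['Festival de Cannes', 'wong kar wai', 'afp']
other = ['cines', 'ya', 'el presidente']

print(is_related(cannes, review))   # shares two multi-word topics
print(is_related(cannes, other))    # shares only one-word topics
```

One-word "topics" such as "ya" or "cines" are ignored here, so they can no longer link unrelated content on their own.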

Anyway, I will put this on my list of pending stuff to test (with a little help from Matt Bowen, of course).

hvelarde

Friday, January 11, 2008

When adapting content types, schemaextender is the way to go

Wednesday, January 2, 2008

Relating content automatically in Plone