Note: These tutorials are out of date, please see the Lemur Wiki instead.
Lemur Project Tutorials:
Starting Out
Offset Annotations: Indexing a corpus with offset annotations
Telling the indexer to add the annotations:
Once you have created your offset annotations file, indexing your corpus with the annotations is straightforward.
IndriBuildIndex accepts an annotations parameter inside the corpus tag. The parameter names a file containing offset annotations for the documents in a collection. It is specified as:
<corpus>
<annotations>/path/to/file</annotations>
</corpus>
<parserName>OffsetAnnotationAnnotator</parserName>
Indexing offset annotations as fields:
For your offset annotation fields to be searchable, the parameter file must contain a <field> entry named after each annotation tag. These entries tell the indexer to index the annotation tags as fields.
Using the offset annotation example from the previous page, we would add the following field definitions to our indexing parameter file:
<field><name>NNP</name></field>
<field><name>VBZ</name></field>
<field><name>DT</name></field>
<field><name>NN</name></field>
<field><name>VBN</name></field>
<field><name>TO</name></field>
<field><name>VB</name></field>
<field><name>IN</name></field>
<field><name>CC</name></field>
For more about using indexing fields, see the intermediate track's section on field parameters.
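To show how the pieces described above might fit together, here is a minimal sketch of a complete indexing parameter file. The index path, the corpus path, and the corpus class (trectext) are placeholders chosen for illustration and do not come from this tutorial; substitute the values for your own collection. The <parserName> line is left out because the text above does not say where it nests.

```xml
<parameters>
  <!-- Placeholder: where the index will be written -->
  <index>/path/to/index</index>
  <corpus>
    <!-- Placeholders: corpus location and an assumed document format -->
    <path>/path/to/corpus</path>
    <class>trectext</class>
    <!-- Offset annotations file for the documents in this corpus -->
    <annotations>/path/to/file</annotations>
  </corpus>
  <!-- One field entry per annotation tag that should be searchable -->
  <field><name>NNP</name></field>
  <field><name>VBZ</name></field>
  <field><name>DT</name></field>
  <field><name>NN</name></field>
  <field><name>VBN</name></field>
  <field><name>TO</name></field>
  <field><name>VB</name></field>
  <field><name>IN</name></field>
  <field><name>CC</name></field>
</parameters>
```

If this were saved under a hypothetical name such as build_params.xml, the index would typically be built by passing that file to IndriBuildIndex on the command line.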
The Lemur Project
Last modified: June 21, 2007. 09:14:12 am




