
 
 

The Lemur Project is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation.


Note: These tutorials are out of date; please see the Lemur Wiki instead.


Lemur Project Tutorials:
Starting Out

Offset Annotations: Indexing a corpus with offset annotations


Telling the indexer to add the annotations:

Once you have your offset annotations file created, indexing your corpus with the annotations is easy.

IndriBuildIndex accepts an annotations parameter inside the corpus tag that names a file containing offset annotations for the documents in a collection. It is specified as:

<corpus>
  <annotations>/path/to/file</annotations>
</corpus>

in the parameter file. The value may be either a single annotations file or the name of a directory containing a separate annotations file for each input file in the corpus path entry. For numeric fields supplied through offset annotations, the field entry for that field must set parserName to the offset annotation parser instead of the default, e.g.:
 
<parserName>OffsetAnnotationAnnotator</parserName>  
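
To show where the parserName line sits, here is a minimal sketch of a complete field entry for a numeric field whose values come from offset annotations. The field name SCORE is a hypothetical placeholder, not part of the tutorial's example, and the numeric element is assumed to be needed as for any other numeric field.

```xml
<field>
  <!-- SCORE is a hypothetical annotation tag name used only for illustration -->
  <name>SCORE</name>
  <!-- Mark the field as numeric -->
  <numeric>true</numeric>
  <!-- Read the numeric value from the offset annotations -->
  <parserName>OffsetAnnotationAnnotator</parserName>
</field>
```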

Indexing offset annotations as fields:

For your offset annotation fields to be searchable, you must declare a <field> entry in the parameter file for each annotation tag name. These entries tell the indexer to index the annotation tags as fields.

Using our offset annotation example from the last page, we would want to add the following field definitions to our indexing parameter file:


<field><name>NNP</name></field>
<field><name>VBZ</name></field>
<field><name>DT</name></field>
<field><name>NN</name></field>
<field><name>VBN</name></field>
<field><name>TO</name></field>
<field><name>VB</name></field>
<field><name>IN</name></field>
<field><name>CC</name></field>
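
Putting the pieces together, an indexing parameter file might look roughly like the sketch below. The index location, corpus path, and the trectext document class are assumptions chosen for illustration; only the annotations element and the field entries come from this page, and the field list is abbreviated.

```xml
<parameters>
  <!-- Hypothetical location where the index will be written -->
  <index>/path/to/index</index>
  <corpus>
    <!-- Hypothetical corpus location and document format -->
    <path>/path/to/corpus</path>
    <class>trectext</class>
    <!-- Offset annotations file, or a directory holding one annotations file per input file -->
    <annotations>/path/to/file</annotations>
  </corpus>
  <!-- One field entry per annotation tag that should be searchable -->
  <field><name>NNP</name></field>
  <field><name>VBZ</name></field>
  <field><name>DT</name></field>
  <!-- Remaining tags (NN, VBN, TO, VB, IN, CC) are declared the same way -->
</parameters>
```

Assuming the file is saved under a name such as build_params.xml (a hypothetical name), it would be passed to the indexer as the argument to IndriBuildIndex.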
  

For more about using indexing fields, see the intermediate track's section on field parameters.

 


[Previous: Creating an offset annotation file] [Back to TOC] [Next: Retrieval with offset annotations]

 


The Lemur Project
Last modified: June 21, 2007. 09:14:12 am