Annotation support for metadata

IndriBuildIndex accepts a metadata parameter that names a file containing metadata annotations for the documents in a collection. It is specified in the parameter file as <corpus><metadata>/path/to/file</metadata></corpus>. The value may be either a single annotations file or the name of a directory containing a separate annotations file for each input file in the corpus path entry.
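
As a rough illustration of where the metadata entry sits, the sketch below builds a small parameter file with Python's standard xml.etree.ElementTree module. Only the <corpus><metadata> nesting comes from the text above. The <parameters> root and the <index>, <path> and <class> entries are assumptions based on the usual IndriBuildIndex parameter layout, and the function name and every path are placeholders.

```python
import xml.etree.ElementTree as ET


def build_parameters(index_path, corpus_path, corpus_class, metadata_path):
    """Return a parameter file as an XML string (illustrative sketch only)."""
    # Assumption: the usual IndriBuildIndex layout with a <parameters> root.
    root = ET.Element("parameters")
    ET.SubElement(root, "index").text = index_path

    corpus = ET.SubElement(root, "corpus")
    ET.SubElement(corpus, "path").text = corpus_path
    ET.SubElement(corpus, "class").text = corpus_class
    # From the text above: metadata is nested inside the corpus entry and
    # names a single annotations file or a directory of annotations files.
    ET.SubElement(corpus, "metadata").text = metadata_path

    return ET.tostring(root, encoding="unicode")


# Placeholder paths, not real locations.
params_xml = build_parameters(
    "/path/to/index", "/path/to/corpus", "trectext", "/path/to/file"
)
print(params_xml)
```

The printed string can be saved to a file and passed to IndriBuildIndex in the usual way. Pointing metadata_path at a directory instead of a file would cover the one-annotations-file-per-input-file case.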

Annotation File Format

The offset metadata file is a 3-column, tab-delimited text file. From left to right, the columns are:

docno
the external document ID of the document to annotate (string), e.g. 10
key
the key (name) of the metadata element (string), e.g. origURL
value
the value of the metadata element (string), e.g. http://bla
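
The three columns above can be produced and checked with a few lines of Python. This is a minimal sketch, not part of the Indri tools: the helper names format_annotation and parse_annotation are hypothetical, and rejecting tabs and newlines inside a field is an assumption, since the format description does not say how such characters would be escaped.

```python
def format_annotation(docno, key, value):
    """Return one annotation line: docno, key, value joined by tabs."""
    fields = (str(docno), str(key), str(value))
    for field in fields:
        # Assumption: no escaping mechanism is defined, so a tab or newline
        # inside a field would corrupt the 3-column layout.
        if "\t" in field or "\n" in field:
            raise ValueError("field contains a tab or newline: %r" % field)
    return "\t".join(fields)


def parse_annotation(line):
    """Split one annotation line into a (docno, key, value) tuple."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 3:
        raise ValueError("expected 3 tab-delimited columns, got %d" % len(parts))
    return tuple(parts)


# Example values taken from the column descriptions above.
line = format_annotation("10", "origURL", "http://bla")
print(repr(line))
print(parse_annotation(line + "\n"))
```

Writing one such line per metadata element, each terminated by a newline, gives a file in the layout described above.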


The Lemur Project
  Last modified: Wednesday, 14-Dec-2005 12:01:36 EST