Annotation support for metadata
IndriBuildIndex accepts the parameter metadata to specify
a file containing metadata annotations for the documents in a
collection. It is specified as
<corpus><metadata>/path/to/file</metadata></corpus>
in the parameter file. The value may be either a single
annotations file or the name of a directory containing a separate
annotations file for each input file in the corpus path entry.
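As a rough illustration, the corpus entry could be generated programmatically. The sketch below assumes the corpus location lives in a path element inside corpus (inferred from the "corpus path entry" wording above); the helper name corpus_element and both paths are hypothetical placeholders, not part of IndriBuildIndex.

```python
import xml.etree.ElementTree as ET


def corpus_element(corpus_path, metadata_path):
    """Build a <corpus> element holding <path> and <metadata> children.

    Assumption: <path> is the child element naming the corpus location.
    """
    corpus = ET.Element("corpus")
    ET.SubElement(corpus, "path").text = corpus_path
    ET.SubElement(corpus, "metadata").text = metadata_path
    return corpus


# Placeholder paths; metadata_path may name a single file or a directory.
xml_fragment = ET.tostring(
    corpus_element("/path/to/corpus", "/path/to/file"), encoding="unicode"
)
print(xml_fragment)
```

The printed fragment would still need to be placed inside a complete parameter file before being passed to IndriBuildIndex.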
Annotation File Format
Format of the offset metadata file: 3-column, tab-delimited.
From left to right, those columns are:
- docno: external doc id of the document to annotate (string), e.g. 10
- key: the key/name of the metadata element (string), e.g. origURL
- value: the value of the metadata element (string), e.g. http://bla
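The three-column layout above can be sketched as a pair of small helpers that format and parse one annotation line. The function names format_annotation and parse_annotation are hypothetical, and rejecting embedded tabs or newlines is a precaution of this sketch, not documented Indri behavior.

```python
def format_annotation(docno, key, value):
    """Return one tab-delimited annotation line (no trailing newline)."""
    fields = [str(docno), str(key), str(value)]
    for field in fields:
        # Assumption: a tab or newline inside a field would break the
        # 3-column layout, so this sketch refuses such values.
        if "\t" in field or "\n" in field:
            raise ValueError("field contains a tab or newline: %r" % field)
    return "\t".join(fields)


def parse_annotation(line):
    """Split one annotation line into a (docno, key, value) tuple."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        raise ValueError("expected 3 columns, got %d" % len(fields))
    return tuple(fields)


# Example values taken from the column descriptions above.
line = format_annotation("10", "origURL", "http://bla")
record = parse_annotation(line + "\n")
```

Writing one such line per metadata element, each terminated by a newline, yields a file in the layout described above.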
The Lemur Project
Last modified: Wednesday, 14-Dec-2005 12:01:36 EST