CMU - Language Technologies Institute
Carnegie Mellon University
CIIR, University of Massachusetts Amherst
University of Massachusetts
 

The Lemur Project is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation.


Note: These tutorials are out of date; please see the Lemur Wiki instead.


Lemur Project Tutorials:
Intermediate Track

Indexing: Indexing Overview


After performing any necessary pre-processing on your corpus, you are ready to index your data. The Lemur Toolkit comes with two applications for building indexes: IndriBuildIndex and BuildIndex. You can also use the Lemur Toolkit API to build your own indexer.

The two applications are similar but differ in a few ways. First, IndriBuildIndex can build only Indri-style indexes, whereas BuildIndex can build either Indri indexes or KeyFile indexes. A comparison of the two index types can be found in the Beginning Tutorial Track.

Second, the two applications interpret their parameter files differently. For an overview of the parameters each application accepts, see the indexing documentation.
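To give a rough sense of what a parameter file looks like, here is a minimal Python sketch that generates an XML parameter file in the style IndriBuildIndex reads. The helper name build_indri_parameters and all paths are hypothetical. The element names (index, memory, corpus, path, class) are assumed from the Indri parameter format and should be checked against the indexing documentation. BuildIndex interprets its parameter file differently, so this sketch does not apply to it.

```python
import xml.etree.ElementTree as ET


def build_indri_parameters(index_path, corpus_path, corpus_class, memory="256m"):
    """Return the XML text of an IndriBuildIndex-style parameter file.

    Hypothetical helper: the element names are assumptions based on the
    Indri parameter format; verify them in the indexing documentation.
    """
    root = ET.Element("parameters")
    # Where the index should be written (placeholder path).
    ET.SubElement(root, "index").text = index_path
    # Memory budget for the indexer (assumed element name and value format).
    ET.SubElement(root, "memory").text = memory
    # One corpus entry: where the documents live and their format class.
    corpus = ET.SubElement(root, "corpus")
    ET.SubElement(corpus, "path").text = corpus_path
    ET.SubElement(corpus, "class").text = corpus_class
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder paths; "trectext" is assumed to be a valid document class.
    print(build_indri_parameters("/path/to/index", "/path/to/corpus", "trectext"))
```

The generated text would be saved to a file and passed to the indexing application on the command line. See the next tutorial page for the actual usage of IndriBuildIndex.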

 


  [Back to TOC] [Next: Using IndriBuildIndex]

 


The Lemur Project
Last modified: June 21, 2007. 09:14:12 am