GenerateSmoothSupport Application

This application generates two support files for retrieval using the language modeling approach. Both files contain pre-computed quantities that speed up the retrieval process. The first file (its name is given by the parameter smoothSupportFile, see below) is needed for retrieval with a smoothed unigram language model. Each entry in this support file corresponds to one document and records two pieces of information: (a) the count of unique terms in the document; (b) the sum of the collection language model probabilities of the words in the document. The second file (which has the same name with an extra suffix ".mc") is needed if you run feedback based on the Markov chain query model. Each line in this file contains a term and the sum of the probabilities of that term given each document in the collection (i.e., the sum of p(w|d) over all documents d).
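The quantities stored in the two files can be illustrated with a small Python sketch. This is not the application's own code: the function name smooth_support and the toy input format are hypothetical, the sum in (b) is assumed to run over the unique terms of each document, and p(w|d) is assumed to be the unsmoothed maximum-likelihood estimate (term count divided by document length).

```python
from collections import Counter


def smooth_support(docs):
    """Compute the two kinds of support quantities for a toy collection.

    docs maps a document id to its list of tokens (hypothetical format).
    Returns (per_doc, term_sums):
      per_doc[doc_id] = (unique term count, sum of p(w|C) over unique terms)
      term_sums[w]    = sum of p(w|d) over all documents d
    """
    # Collection language model p(w|C) from raw term counts.
    coll_counts = Counter()
    for tokens in docs.values():
        coll_counts.update(tokens)
    total = sum(coll_counts.values())
    p_coll = {w: c / total for w, c in coll_counts.items()}

    per_doc = {}
    term_sums = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        # One entry per document: (a) unique terms, (b) summed p(w|C).
        per_doc[doc_id] = (len(counts), sum(p_coll[w] for w in counts))
        # Assumption: p(w|d) is the maximum-likelihood estimate.
        for w, c in counts.items():
            term_sums[w] = term_sums.get(w, 0.0) + c / len(tokens)
    return per_doc, term_sums


if __name__ == "__main__":
    toy = {"d1": ["a", "b", "a"], "d2": ["b", "c"]}
    per_doc, term_sums = smooth_support(toy)
    for doc_id, (n_unique, p_sum) in per_doc.items():
        print(doc_id, n_unique, p_sum)
    for term, p_sum in sorted(term_sums.items()):
        print(term, p_sum)
```

Because neither quantity depends on the query, both can be computed once per index and reused at retrieval time, which is where the speed-up comes from.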

To run the application, follow the general steps for running a Lemur application and set the following variables in the parameter file:

  1. index: the table-of-contents (TOC) record file of the index.

  2. smoothSupportFile: file path for the support file (e.g., /usr0/mydata/index.supp).

