
Structured Query Evaluation

This application runs retrieval experiments to evaluate the performance of the structured query model using the inquery retrieval method. StructQueryEval requires that its index parameter be a positional index (KeyfileIncIndex).

Feedback is implemented as a WSUM of the original query combined with terms selected from the feedback documents based on belief score. The expanded query has the form:

#wsum( (1 - a) <original query> a*w1 t1 a*w2 t2 ... a*wN tN )


where a is the value of the parameter feedbackPosCoeff.
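
As a rough illustration of the form above, the following Python sketch assembles the expanded query string from an original query and a list of weighted feedback terms. The function name build_expanded_query and its arguments are hypothetical and not part of Lemur; the application builds the expanded query internally, so this only mirrors the documented shape of the result.

```python
def build_expanded_query(original_query, weighted_terms, a):
    """Return a #wsum string mixing the original query with feedback terms.

    original_query -- structured query text, e.g. "#sum( apple pie )"
    weighted_terms -- list of (w_i, t_i) pairs for the selected feedback terms
    a              -- the feedbackPosCoeff value
    """
    # The original query receives weight (1 - a).
    parts = [f"{1 - a:g} {original_query}"]
    # Each feedback term t_i receives weight a * w_i.
    for weight, term in weighted_terms:
        parts.append(f"{a * weight:g} {term}")
    return "#wsum( " + " ".join(parts) + " )"
```

With a = 0.5, an original query of "#sum( apple pie )" and feedback terms (2.0, crust) and (1.0, oven), the sketch yields "#wsum( 0.5 #sum( apple pie ) 1 crust 0.5 oven )".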

Scoring is done either over a working set of documents (essentially re-ranking) or over the whole collection. This is controlled by the parameter "useWorkingSet". When "useWorkingSet" has either a non-zero (integer) value or the value true, scoring is restricted to the working set specified in the file given by "workingSetFile". The file should have three columns: the first is the query id, the second is the document id, and the last is a numerical value, which is ignored. By default, scoring is over the whole collection.
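
To make the working set layout concrete, here is a minimal Python sketch that reads such a file into a mapping from query id to document ids. It assumes the columns are separated by whitespace, which the description above does not state explicitly, and the function name read_working_set is hypothetical rather than anything provided by Lemur.

```python
from collections import defaultdict


def read_working_set(lines):
    """Map each query id to the list of document ids listed for it."""
    working_set = defaultdict(list)
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated columns
        if len(fields) < 3:
            continue  # skip blank or malformed lines (a choice of this sketch)
        query_id, doc_id = fields[0], fields[1]
        # fields[2] is the numerical value, which is ignored.
        working_set[query_id].append(doc_id)
    return dict(working_set)
```

The function accepts any iterable of lines, so an open file object for the file named by "workingSetFile" can be passed directly.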

The parameters are:

  1. index: the complete name of the table-of-contents file for the database index. This must be a positional index (currently KeyfileIncIndex).

  2. textQuery: the query text stream parsed by ParseInQuery

  3. resultFile: the result file

  4. resultFormat: whether the result file is written in TREC format (i.e., six-column) or in a simple three-column format <queryID, docID, score>. String value, either trec for TREC format or 3col for three-column format. The integer values used in previous versions of Lemur, zero for non-TREC format and non-zero for TREC format, are still accepted. Default: TREC format.

  5. resultCount: the number of documents to return as the result for each query

  6. defaultBelief: the default belief for a document. Default: 0.4

  7. feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback)

  8. feedbackTermCount: the number of terms to add to a query when doing feedback.

  9. feedbackPosCoeff: the coefficient for positive terms in the expanded query.
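
The two result formats selected by resultFormat can be sketched as follows. This Python example assumes the six-column layout follows the conventional TREC run format (query id, the literal Q0, document id, rank, score, run tag). The exact columns written by StructQueryEval are not spelled out above, and the function name format_results and the run tag "Exp" are placeholders chosen for illustration.

```python
def format_results(query_id, ranked, result_format="trec", run_id="Exp"):
    """Render a ranked list of (doc_id, score) pairs as result-file lines."""
    lines = []
    for rank, (doc_id, score) in enumerate(ranked, start=1):
        if result_format == "trec":
            # Conventional six-column TREC run line (assumed layout).
            lines.append(f"{query_id} Q0 {doc_id} {rank} {score} {run_id}")
        elif result_format == "3col":
            # Simple three-column line: <queryID, docID, score>.
            lines.append(f"{query_id} {doc_id} {score}")
        else:
            raise ValueError(f"unknown result format: {result_format}")
    return lines
```

Truncating the ranked list to resultCount entries before formatting would mirror the resultCount parameter.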

Generated on Tue Jun 15 11:02:58 2010 for Lemur by doxygen 1.3.4