
Query Clarity

This application (QueryClarity.cpp) computes clarity scores for a query model using the KL-divergence retrieval method. The query model can be the original query model, or an expanded model based on feedback documents and the original query model. The original query model can be computed from the original query text (when the parameter "initQuery" is not set, or is set to a null string), or taken from a previously saved query model (the file given by the parameter "initQuery"). If feedbackDocCount==0, the application computes clarity scores only for the original or given queries. Clarity scores for each entire query, and for each individual term within each query, are written to the file specified by the parameter "expandedQuery".
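As a rough illustration of what a clarity score measures, the sketch below computes the KL divergence between a query model and a collection model, together with each term's contribution to the total. The function name `clarity_scores`, the dictionary representation, the toy probabilities, and the base-2 logarithm are all assumptions made for this example; the application's own computation may differ in smoothing and log base.

```python
import math

def clarity_scores(query_model, collection_model):
    """Return (total, per_term), where per_term[w] = P(w|Q) * log2(P(w|Q) / P(w|C)).

    Both arguments are hypothetical dicts mapping a word to a probability.
    Every word in query_model is assumed to have a positive collection probability.
    """
    per_term = {}
    for word, p_q in query_model.items():
        if p_q <= 0.0:
            continue  # a zero-probability term contributes nothing
        p_c = collection_model[word]
        per_term[word] = p_q * math.log2(p_q / p_c)
    return sum(per_term.values()), per_term

# Toy numbers, purely illustrative.
query_model = {"jaguar": 0.5, "car": 0.5}
collection_model = {"jaguar": 0.125, "car": 0.25}
total, per_term = clarity_scores(query_model, collection_model)
```

In this toy case the rarer word ("jaguar") contributes more to the total than the more common one, which matches the intuition that a query model far from the collection model is a "clearer" query.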

Feedback can be based on true relevance judgments or any previously returned retrieval results.
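When relevance judgments are used as feedback, the feedbackDocuments parameter described below expects a three-column file rather than a TREC-style judgment file. The following sketch drops the second column from each judgment line. It assumes the usual four-column, whitespace-separated TREC layout (query ID, iteration, document ID, judgment); the helper name `trec_qrels_to_three_column` and the sample lines are hypothetical.

```python
def trec_qrels_to_three_column(lines):
    """Drop the second column of each whitespace-separated four-column line."""
    converted = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        query_id, _iteration, doc_id, judgment = fields
        converted.append(f"{query_id} {doc_id} {judgment}")
    return converted

# Made-up sample lines, for illustration only.
sample = ["301 0 DOC-0001 1", "301 0 DOC-0002 1"]
three_col = trec_qrels_to_three_column(sample)
```

To convert a file, the same function can be applied to the lines read from it and the result written out one entry per line.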

Parameters:

  1. index: The complete name of the index table-of-content file for the database index.

  2. smoothSupportFile: The name of the smoothing support file (e.g., one generated by GenerateSmoothSupport).

  3. textQuery: the original query text stream

  4. initQuery: the file with a saved initial query model. When this parameter is set to a non-empty string, the model stored in this file will be used for expansion; otherwise, the original query text is used as the initial query model for expansion.

  5. feedbackDocuments: the file of feedback documents to be used for feedback. In the case of pseudo feedback, this can be a result file generated from an initial retrieval process. In the case of relevance feedback, this is usually a three-column relevance judgment file. Note that this means you can NOT use a TREC-style judgment file directly; you must remove the second column to convert it to the three-column format.

  6. resultFormat: whether the feedback document file (given by feedbackDocuments) is in the TREC format (i.e., six-column) or in a simple three-column format <queryID, docID, score>. String value, either trec for TREC format or 3col for three-column format. The integer values used in previous versions of Lemur (zero for non-TREC format, non-zero for TREC format) are also accepted. Default: TREC format.

  7. expandedQuery: the file to store the query clarity scores.

  8. feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback).

  9. queryUpdateMethod: feedback method, one of:

  10. Method-specific feedback parameters:

    For all interpolation-based approaches (i.e., the new query model is an interpolation of the original model with a (feedback) model computed based on the feedback documents), the following four parameters apply:

    1. feedbackCoefficient: the coefficient of the feedback model for interpolation. The value is in [0,1], with 0 meaning using only the original model (thus no updating/feedback) and 1 meaning using only the feedback model (thus ignoring the original model).

    2. feedbackTermCount: Truncate the feedback model to no more than a given number of words/terms.

    3. feedbackProbThresh: Truncate the feedback model to include only words with a probability higher than this threshold. Default value: 0.001.

    4. feedbackProbSumThresh: Truncate the feedback model until the sum of the probability of the included words reaches this threshold. Default value: 1.

The parameters feedbackTermCount, feedbackProbThresh, and feedbackProbSumThresh work conjunctively to control the truncation, i.e., the truncated model must satisfy all three constraints.
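The interaction of these parameters can be sketched as follows: words are taken from the feedback model in decreasing order of probability until any one of the three limits is hit, and the truncated model is then interpolated with the original model using feedbackCoefficient. This is a minimal sketch under stated assumptions, not the toolkit's implementation: the function names, the dictionary representation, the order in which the limits are checked, and the absence of renormalization after truncation are all choices made for the example.

```python
def truncate_feedback_model(model, term_count, prob_thresh=0.001, prob_sum_thresh=1.0):
    """Keep the highest-probability words while all three constraints hold."""
    kept = {}
    running_sum = 0.0
    ranked = sorted(model.items(), key=lambda kv: kv[1], reverse=True)
    for word, prob in ranked:
        if len(kept) >= term_count:          # feedbackTermCount
            break
        if prob <= prob_thresh:              # feedbackProbThresh (keep only higher values)
            break
        if running_sum >= prob_sum_thresh:   # feedbackProbSumThresh
            break
        kept[word] = prob
        running_sum += prob
    return kept

def interpolate(original, feedback, coefficient):
    """Mix two models: coefficient=0 keeps only the original, 1 only the feedback."""
    words = set(original) | set(feedback)
    return {
        w: (1.0 - coefficient) * original.get(w, 0.0) + coefficient * feedback.get(w, 0.0)
        for w in words
    }

# Toy feedback model, purely illustrative.
feedback_model = {"engine": 0.5, "speed": 0.25, "road": 0.125, "the": 0.0005}
top_two = truncate_feedback_model(feedback_model, term_count=2)
above_thresh = truncate_feedback_model(feedback_model, term_count=10)
updated = interpolate({"engine": 1.0}, top_two, coefficient=0.5)
```

With term_count=2 the term-count limit is the binding constraint; with term_count=10 the probability threshold is what excludes "the".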

All three feedback methods also recognize the parameter feedbackMixtureNoise (default value: 0.5), but with different interpretations.

In addition, the collection mixture model also recognizes the parameter emIterations, which is the maximum number of iterations the EM algorithm will run. Default: 50. (The EM algorithm can terminate earlier if the log-likelihood converges quickly, where convergence is measured by a hard-coded criterion. See the source code in SimpleKLRetMethod.cpp for details.)
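To make the role of emIterations concrete, here is a minimal sketch of an EM loop for a two-component mixture in which each word occurrence in the feedback documents is generated either by the feedback model or by the collection model. It is an illustration only: treating feedbackMixtureNoise as the weight of the collection component is an assumption, the `tolerance` convergence test is a stand-in for the hard-coded criterion in the source, and the function name and toy data are hypothetical.

```python
import math

def estimate_feedback_model(counts, collection_model, noise=0.5,
                            max_iterations=50, tolerance=1e-6):
    """Estimate a feedback model by EM; returns (model, iterations_run).

    counts: word -> count in the feedback documents.
    collection_model: word -> positive collection probability.
    noise: assumed weight of the collection component, in [0, 1).
    """
    if not 0.0 <= noise < 1.0:
        raise ValueError("noise must be in [0, 1)")
    total = sum(counts.values())
    theta = {w: c / total for w, c in counts.items()}  # start from relative frequencies
    previous_ll = None
    iterations = 0
    for iterations in range(1, max_iterations + 1):
        # E-step: probability that an occurrence of w came from the feedback model.
        posterior = {}
        for w in counts:
            from_topic = (1.0 - noise) * theta[w]
            posterior[w] = from_topic / (from_topic + noise * collection_model[w])
        # M-step: re-estimate the feedback model from the expected counts.
        expected = {w: counts[w] * posterior[w] for w in counts}
        norm = sum(expected.values())
        theta = {w: v / norm for w, v in expected.items()}
        # Log-likelihood of the feedback counts under the current mixture.
        ll = sum(
            counts[w] * math.log((1.0 - noise) * theta[w] + noise * collection_model[w])
            for w in counts
        )
        if previous_ll is not None and abs(ll - previous_ll) < tolerance:
            break  # converged before reaching max_iterations
        previous_ll = ll
    return theta, iterations

# Toy data, purely illustrative.
counts = {"jaguar": 6, "the": 4}
collection_model = {"jaguar": 0.01, "the": 0.5}
theta, iterations_run = estimate_feedback_model(counts, collection_model)
```

In this toy run, the common word "the" is largely explained by the collection component, so the estimated feedback model shifts probability toward "jaguar" relative to the raw counts.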

Generated on Tue Jun 15 11:02:58 2010 for Lemur by doxygen 1.3.4