Cross Lingual Retrieval Evaluation

This application runs cross-lingual retrieval experiments.

Parameters are:

sourceIndex: The complete name of the index for the source language collection. This provides the background model for the source language.
targetIndex: The complete name of the index for the target language collection. This is the collection that is searched.
textQuery: the query text stream, in the source language
XLlambda: The smoothing parameter for mixing P(t|D) and P(s|GS).
XLbeta: The Jelinek-Mercer lambda for estimating P(t|D).
sourceBackgroundModel: One of "term" or "doc". If term, the background model for the source language is estimated as tf(s)/|V|. If doc, the background model for the source language is estimated as df(s)/sum_w_in_V df(w). Default is term.
targetBackgroundModel: One of "term" or "doc". If term, the background model for the target language is estimated as tf(t)/|V|. If doc, the background model for the target language is estimated as df(t)/sum_w_in_V df(w). Default is term.
resultFile: the result file
resultFormat: the format of the result file. String value, either trec for the six-column TREC format or 3col for a simple three-column format <queryID, docID, score>. Default: TREC format.
resultCount: the number of documents to return for each query
feedbackDocCount: the number of documents to use for pseudo-feedback (0 means no feedback)
feedbackTermCount: the number of terms to add to a query when doing feedback.
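
As a rough illustration of the two options for sourceBackgroundModel and targetBackgroundModel, the sketch below computes both estimates over a toy collection. It is not Lemur code: the function names and the toy documents are hypothetical, and |V| is read here as the total number of term occurrences in the collection, which is an assumption about the notation.

```python
from collections import Counter


def term_background(docs):
    """'term' option: tf(w) divided by the total token count (assumed meaning of |V|)."""
    tf = Counter(w for doc in docs for w in doc)
    total = sum(tf.values())
    return {w: count / total for w, count in tf.items()}


def doc_background(docs):
    """'doc' option: df(w) divided by the sum of df over all terms in the vocabulary."""
    # set(doc) counts each term at most once per document
    df = Counter(w for doc in docs for w in set(doc))
    total = sum(df.values())
    return {w: count / total for w, count in df.items()}


# Hypothetical two-document collection, already tokenized
docs = [["a", "a", "b"], ["a", "c"]]
term_model = term_background(docs)
doc_model = doc_background(docs)
```

With this toy collection the term estimate gives more weight to "a" than the doc estimate does, because repeated occurrences inside one document count only once toward df.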

Simple KL parameters:

smoothSupportFile: The name of the smoothing support file
smoothMethod: One of the four:
- jelinikmercer or jm for Jelinek-Mercer
- dirichletprior or dir for Dirichlet prior
- absolutediscount or ad for Absolute discounting
- twostage or 2s for two stage.
smoothStrategy: Either interpolate for interpolation or backoff for back-off.
adjustedScoreMethod: Which type of score to output, one of:
- "querylikelihood" or "ql" for query likelihood.
- "crossentropy" or "ce" for cross entropy.
- "negativekld" or "-d" for negative KL divergence.
JelinekMercerLambda: The collection model weight in the JM interpolation method. Default: 0.5
DirichletPrior: The prior parameter in the Dirichlet prior smoothing method. Default: 1000
discountDelta: The delta (discounting constant) in the absolute discounting method. Default: 0.7
queryUpdateMethod: feedback method, one of:
- relevancemodel1 or rm1 for relevance model 1.
- relevancemodel2 or rm2 for relevance model 2.
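
To make the roles of JelinekMercerLambda, DirichletPrior, and discountDelta more concrete, the sketch below gives the usual textbook interpolated forms of three of the smoothMethod options. It is an illustration only, not the Lemur implementation, which may differ in detail; the function names and arguments are hypothetical, and two-stage smoothing is left out.

```python
def jelinek_mercer(tf, doc_len, p_coll, lam=0.5):
    """Jelinek-Mercer: lam is the weight of the collection model."""
    return (1.0 - lam) * tf / doc_len + lam * p_coll


def dirichlet_prior(tf, doc_len, p_coll, mu=1000.0):
    """Dirichlet prior: mu acts as a pseudo-count spread over the collection model."""
    return (tf + mu * p_coll) / (doc_len + mu)


def absolute_discount(tf, doc_len, unique_terms, p_coll, delta=0.7):
    """Absolute discounting: subtract delta from each seen count and give the
    freed mass (delta * unique_terms / doc_len) to the collection model."""
    return max(tf - delta, 0.0) / doc_len + delta * unique_terms / doc_len * p_coll


# Hypothetical document: term seen 2 times, 10 tokens, 8 distinct terms,
# collection probability of the term 0.01
p_jm = jelinek_mercer(2, 10, 0.01)
p_dir = dirichlet_prior(2, 10, 0.01)
p_ad = absolute_discount(2, 10, 8, 0.01)
```

The defaults in the sketch mirror the documented defaults (0.5, 1000, and 0.7).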

feedbackCoefficient: the coefficient of the feedback model for interpolation. The value is in [0,1], with 0 meaning using only the original model (thus no updating/feedback) and 1 meaning using only the feedback model (thus ignoring the original model).
feedbackTermCount: Truncate the feedback model to no more than a given number of words/terms.
feedbackProbThresh: Truncate the feedback model to include only words with a probability higher than this threshold. Default value: 0.001.
feedbackProbSumThresh: Truncate the feedback model until the sum of the probability of the included words reaches this threshold. Default value: 1.

Parameters feedbackTermCount, feedbackProbThresh, and feedbackProbSumThresh work conjunctively to control the truncation, i.e., the truncated model must satisfy all three constraints.
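
The following sketch shows one way the three truncation constraints and feedbackCoefficient could fit together. It is an assumption-laden illustration rather than Lemur's code: the function names are hypothetical, terms are taken in order of decreasing probability, and the truncated model is not renormalized because the text above does not say whether it should be.

```python
def truncate_feedback_model(model, term_count, prob_thresh=0.001, prob_sum_thresh=1.0):
    """Keep the most probable terms while all three constraints still hold."""
    kept = {}
    mass = 0.0
    for term, prob in sorted(model.items(), key=lambda kv: kv[1], reverse=True):
        # Stop once any limit is hit: term count, per-term threshold, or total mass
        if len(kept) >= term_count or prob <= prob_thresh or mass >= prob_sum_thresh:
            break
        kept[term] = prob
        mass += prob
    return kept


def interpolate(original, feedback, coeff):
    """Mix the models: coeff=0 keeps only the original, coeff=1 only the feedback."""
    terms = set(original) | set(feedback)
    return {
        t: (1.0 - coeff) * original.get(t, 0.0) + coeff * feedback.get(t, 0.0)
        for t in terms
    }


# Hypothetical feedback model over four terms
feedback = {"a": 0.5, "b": 0.3, "c": 0.1999, "d": 0.0001}
top3 = truncate_feedback_model(feedback, term_count=3)
by_mass = truncate_feedback_model(feedback, term_count=10, prob_sum_thresh=0.7)
updated = interpolate({"a": 1.0}, {"a": 0.5, "b": 0.5}, coeff=0.5)
```

In this toy run, "d" is dropped in both calls: in the first by the term count, and it would also fail the default feedbackProbThresh of 0.001.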

Generated on Tue Jun 15 11:02:58 2010 for Lemur by Doxygen 1.3.4