
Indri Query Retrieval

QueryEnvironment Parameters

Retrieval Parameters

index
path to an Indri Repository. Specified as <index>/path/to/repository</index> in the parameter file and as -index=/path/to/repository on the command line. This element can be specified multiple times to combine Repositories.
server
hostname of a host running an Indri server (IndriDaemon). Specified as <server>hostname</server> in the parameter file and as -server=hostname on the command line. The hostname can include an optional port number to connect to, using the form hostname:portnum. This element can be specified multiple times to combine servers.
count
an integer value specifying the maximum number of results to return for a given query. Specified as <count>number</count> in the parameter file and as -count=number on the command line.
query
An Indri query language query to run. This element can be specified multiple times.
rule
specifies the smoothing rule (TermScoreFunction) to apply. Format of the rule is:

( key ":" value ) [ "," key ":" value ]*

Here's an example rule in command line format:

-rule=method:linear,collectionLambda:0.2,field:title

and in parameter file format:
<rule>method:linear,collectionLambda:0.2,field:title</rule>

This corresponds to Jelinek-Mercer smoothing with background lambda equal to 0.2, only for items in a title field.

If nothing is listed for a key, all values are assumed. So, a rule that does not specify a field matches all fields. This makes -rule=method:linear,collectionLambda:0.2 a valid rule.
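
The rule grammar and the "missing key matches everything" behaviour can be sketched in Python as follows. This is only an illustration of the format described above, not Indri's own parser; the function names parse_rule and rule_applies are hypothetical.

```python
def parse_rule(rule):
    """Split 'key:value,key:value' into a dict; reject malformed items."""
    pairs = {}
    for item in rule.split(","):
        key, sep, value = item.partition(":")
        key, value = key.strip(), value.strip()
        if not sep or not key or not value:
            raise ValueError(f"malformed rule item: {item!r}")
        pairs[key] = value
    return pairs


def rule_applies(rule, field=None, operator=None):
    """A key that the rule omits matches every value of that key."""
    context = {"field": field, "operator": operator}
    return all(key not in rule or rule[key] == value
               for key, value in context.items())


title_rule = parse_rule("method:linear,collectionLambda:0.2,field:title")
any_field_rule = parse_rule("method:linear,collectionLambda:0.2")

print(title_rule)
print(rule_applies(title_rule, field="title"))    # rule names this field
print(rule_applies(title_rule, field="body"))     # rule names a different field
print(rule_applies(any_field_rule, field="body")) # rule names no field at all
```

The second rule omits the field key, so in this sketch it applies to every field, mirroring the text above.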

Valid keys:

method
smoothing method (text)
field
field to apply this rule to
operator
type of item in query to apply to { term, window }

Valid methods:

dirichlet
(also 'd', 'dir') (default mu=2500)
jelinek-mercer
(also 'jm', 'linear') (default collectionLambda=0.4, documentLambda=0.0). collectionLambda may also be written as just "lambda"; either name will work.
twostage
(also 'two-stage', 'two') (default mu=2500, lambda=0.4)
If the rule doesn't parse correctly, the default is Dirichlet, mu=2500.
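
For orientation, the textbook forms of Dirichlet and Jelinek-Mercer smoothing can be sketched in Python. This is an assumption-laden illustration, not the TermScoreFunction code: the function names are hypothetical, documentLambda is ignored (it defaults to 0.0), and the example counts are made up.

```python
def dirichlet(tf, doc_len, p_collection, mu=2500.0):
    """Textbook Dirichlet smoothing: (tf + mu * P(term|C)) / (|D| + mu)."""
    return (tf + mu * p_collection) / (doc_len + mu)


def jelinek_mercer(tf, doc_len, p_collection, collection_lambda=0.4):
    """Textbook linear interpolation:
    (1 - lambda) * tf / |D| + lambda * P(term|C)."""
    p_doc = tf / doc_len if doc_len else 0.0
    return (1.0 - collection_lambda) * p_doc + collection_lambda * p_collection


# Illustrative numbers only: a term seen 5 times in a 100-token document,
# with collection probability 0.001.
print(dirichlet(5, 100, 0.001))            # uses the documented default mu
print(jelinek_mercer(5, 100, 0.001, 0.2))  # collectionLambda from the example rule
```

A larger mu or collectionLambda pulls the estimate toward the collection probability; a smaller value trusts the document counts more.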
stopper
a complex element containing one or more subelements named word, specifying the stopword list to use. Specified as <stopper><word>stopword</word></stopper> in the parameter file and as -stopper.word=stopword on the command line. This is an optional parameter with the default of no stopping.
maxWildcardTerms
(optional) An integer specifying the maximum number of wildcard terms that can be generated for a synonym list for this query or set of queries. If this limit is reached for a wildcard term, an exception will be thrown. If this parameter is not specified, a default of 100 will be used.
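
A parameter file combining the retrieval elements above could be generated with a short Python script like the one below. Treat it as a sketch: the root element name "parameters" is an assumption not stated on this page, the helper build_parameters is hypothetical, and the query text is only an example.

```python
import xml.etree.ElementTree as ET


def build_parameters(index_paths, queries, count=None, rule=None, stopwords=()):
    """Return parameter-file XML as a string (element names as documented)."""
    root = ET.Element("parameters")  # assumed root element name
    for path in index_paths:         # <index> may be repeated
        ET.SubElement(root, "index").text = path
    if count is not None:
        ET.SubElement(root, "count").text = str(count)
    if rule:
        ET.SubElement(root, "rule").text = rule
    if stopwords:
        stopper = ET.SubElement(root, "stopper")
        for word in stopwords:
            ET.SubElement(stopper, "word").text = word
    for query in queries:            # <query> may be repeated
        ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


xml_text = build_parameters(
    ["/path/to/repository"],
    ["#combine(information retrieval)"],
    count=10,
    rule="method:linear,collectionLambda:0.2,field:title",
    stopwords=["the", "of"],
)
print(xml_text)
```

Building the file through an XML library rather than string concatenation means characters such as & or < inside a query are escaped automatically.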

Baseline (non-LM) retrieval

baseline
Specifies the baseline (non-language modeling) retrieval method to apply. This enables running baseline experiments on collections too large for the Lemur RetMethod API. When running a baseline experiment, the queries may not contain any Indri query language operators; they must contain only terms.

Format of the parameter value:

(tfidf|okapi) [ "," key ":" value ]*

Here's an example value in command line format:

-baseline=tfidf,k1:1.0,b:0.3

and in parameter file format:
<baseline>tfidf,k1:1.0,b:0.3</baseline>

Methods:

tfidf
Performs retrieval via tf.idf scoring as implemented in lemur::retrieval::TFIDFRetMethod using BM25TF term weighting. Pseudo-relevance feedback may be performed via the parameters below.

Parameters (optional):

k1
k1 parameter for term weight (default 1.2)
b
b parameter for term weight (default 0.75)

okapi
Performs retrieval via Okapi scoring as implemented in lemur::retrieval::OkapiRetMethod. Pseudo-relevance feedback may not be performed with this baseline method.

Parameters (optional):

k1
k1 parameter for term weight (default 1.2)
b
b parameter for term weight (default 0.75)
k3
k3 parameter for query term weight (default 7)
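
The baseline value format and the documented defaults can be illustrated with a small Python parser. This is a sketch rather than the code Indri uses; parse_baseline is a hypothetical name, and rejecting unknown keys is a choice made here, not documented behaviour.

```python
# Defaults as listed above for each baseline method.
BASELINE_DEFAULTS = {
    "tfidf": {"k1": 1.2, "b": 0.75},
    "okapi": {"k1": 1.2, "b": 0.75, "k3": 7.0},
}


def parse_baseline(spec):
    """Parse '(tfidf|okapi)[,key:value]*' into (method, parameters)."""
    method, *items = [part.strip() for part in spec.split(",")]
    if method not in BASELINE_DEFAULTS:
        raise ValueError(f"unknown baseline method: {method!r}")
    params = dict(BASELINE_DEFAULTS[method])  # start from the defaults
    for item in items:
        key, sep, value = item.partition(":")
        key = key.strip()
        if not sep or key not in params:
            raise ValueError(f"bad baseline parameter: {item!r}")
        params[key] = float(value)
    return method, params


print(parse_baseline("tfidf,k1:1.0,b:0.3"))
print(parse_baseline("okapi"))
```

Any parameter left out of the value keeps its default, so "okapi" alone yields k1, b and k3 at their documented defaults.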

Formatting Parameters

queryOffset
an integer value specifying one less than the starting query number, e.g., 150 for TREC formatted output. Specified as <queryOffset>number</queryOffset> in the parameter file and as -queryOffset=number on the command line.
runID
a string specifying the id for a query run, used in TREC scorable output. Specified as <runID>someID</runID> in the parameter file and as -runID=someID on the command line.
trecFormat
the symbol true to produce TREC scorable output, otherwise the symbol false. Specified as <trecFormat>true</trecFormat> in the parameter file and as -trecFormat=true on the command line. Note that 0 can be used for false, and 1 can be used for true.
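
How queryOffset and runID show up in TREC scorable output can be sketched in Python. The six-column layout used here (query number, the literal Q0, document name, rank, score, run id) is the usual trec_eval input format and is an assumption about what Indri emits; trec_lines is a hypothetical helper and the documents and scores are made up.

```python
def trec_lines(results, query_index, query_offset, run_id):
    """Format one query's results as TREC run lines.

    results      -- list of (docno, score) pairs, best first
    query_index  -- 0-based position of the query in the parameter file
    query_offset -- one less than the starting query number
    """
    query_number = query_offset + query_index + 1
    return [
        f"{query_number} Q0 {docno} {rank} {score:.6f} {run_id}"
        for rank, (docno, score) in enumerate(results, start=1)
    ]


for line in trec_lines([("DOC-1", -4.5), ("DOC-2", -5.25)], 0, 150, "run1"):
    print(line)
```

With queryOffset set to 150, the first query in the file is numbered 151, matching the "one less than the starting query number" description.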
inex participant-id
triggers output of results in INEX format and specifies the participant-id attribute used in submissions. Specified as <inex><participantID>someID</participantID></inex> in the parameter file and as -inex.participantID=someID on the command line.
inex task
triggers output of results in INEX format and specifies the task attribute (default CO.Thorough). Specified as <inex><task>someTask</task></inex> in the parameter file and as -inex.task=someTask on the command line.
inex query
triggers output of results in INEX format and specifies the query attribute (default automatic). Specified as <inex><query>someQueryType</query></inex> in the parameter file and as -inex.query=someQueryType on the command line.
inex topic-part
triggers output of results in INEX format and specifies the topic-part attribute (default T). Specified as <inex><topicPart>someTopicPart</topicPart></inex> in the parameter file and as -inex.topicPart=someTopicPart on the command line.
inex description
triggers output of results in INEX format and specifies the contents of the description tag. Specified as <inex><description>some description</description></inex> in the parameter file and as -inex.description="some description" on the command line.
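
The correspondence between nested parameter-file elements and dotted command-line options can be sketched in Python. The helper to_command_line is hypothetical and handles only nested dictionaries and scalar values.

```python
def to_command_line(params, prefix=""):
    """Flatten nested parameters into -name=value arguments.

    Nested elements become dotted names, so <inex><task>...</task></inex>
    turns into -inex.task=...
    """
    args = []
    for key, value in params.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            args.extend(to_command_line(value, prefix=name + "."))
        else:
            args.append(f"-{name}={value}")
    return args


args = to_command_line({
    "trecFormat": "false",
    "inex": {"participantID": "someID", "description": "some description"},
})
print(args)
```

The result is an argument list rather than a single string, so a value containing spaces needs no shell quoting when the list is passed directly to a process launcher such as subprocess.run.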

Pseudo-Relevance Feedback Parameters

fbDocs
an integer specifying the number of documents to use for feedback. Specified as <fbDocs>number</fbDocs> in the parameter file and as -fbDocs=number on the command line.
fbTerms
an integer specifying the number of terms to use for feedback. Specified as <fbTerms>number</fbTerms> in the parameter file and as -fbTerms=number on the command line.
fbMu
a floating point value specifying the value of mu to use for feedback. Specified as <fbMu>number</fbMu> in the parameter file and as -fbMu=number on the command line.
fbOrigWeight
a floating point value in the range [0.0..1.0] specifying the weight for the original query in the expanded query. Specified as <fbOrigWeight>number</fbOrigWeight> in the parameter file and as -fbOrigWeight=number on the command line.
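
One way to picture fbOrigWeight is as an interpolation weight between the original query and the feedback terms. The Python sketch below builds such an expanded query with the #weight and #combine operators; the exact query shape is an assumption rather than a statement of what Indri generates, expand_query is a hypothetical helper, and the terms and weights are invented.

```python
def expand_query(original, feedback_terms, fb_orig_weight):
    """Combine the original query with weighted feedback terms.

    feedback_terms -- list of (term, weight) pairs, at most fbTerms of them
    fb_orig_weight -- weight of the original query, in [0.0, 1.0]
    """
    if not 0.0 <= fb_orig_weight <= 1.0:
        raise ValueError("fbOrigWeight must lie in [0.0, 1.0]")
    expansion = " ".join(f"{weight:.4f} {term}" for term, weight in feedback_terms)
    return (
        f"#weight( {fb_orig_weight:.2f} #combine( {original} ) "
        f"{1.0 - fb_orig_weight:.2f} #weight( {expansion} ) )"
    )


print(expand_query("hubble telescope", [("space", 0.6), ("orbit", 0.4)], 0.7))
```

In this sketch a weight of 1.0 reduces the feedback terms to zero weight, while 0.0 leaves only the expansion terms in play.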

Generated on Tue Jun 15 11:02:58 2010 for Lemur by doxygen 1.3.4