This application parses a file containing structured queries into BasicDocStream format.

The parameters are:

  1. stopwords: name of file containing the stopword list.
  2. acronyms: name of file containing the acronym list.
  3. docFormat:
  4. stemmer:
  5. outputFile: name of the output file.

The structured query operators are:

Sum Operator: #sum(T1 ... Tn)

The terms or nodes contained in the sum operator are treated as having equal influence on the final result. The belief values provided by the arguments of the sum are averaged to produce the belief value of the #sum node.
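The averaging rule can be illustrated with a minimal sketch. The function name sum_belief is hypothetical and is not part of Lemur; it only shows the arithmetic described above.

```python
def sum_belief(beliefs):
    """Average the belief values supplied by the arguments of a #sum node.

    `beliefs` is a list of floats, one per argument (hypothetical input shape).
    """
    if not beliefs:
        raise ValueError("#sum needs at least one argument")
    # Every argument has equal influence, so this is a plain arithmetic mean.
    return sum(beliefs) / len(beliefs)
```

For instance, arguments with beliefs 0.4, 0.6 and 0.8 would give the #sum node a belief of about 0.6.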

Weighted Sum Operator: #wsum(W1 T1 ... Wn Tn)

The terms or nodes contained in the wsum operator contribute unequally to the final result according to the weight associated with each (Wx). Note that this is a change from the InQuery operator, as there is no initial weight, Ws, for scaling the belief value of the sum.
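As a rough illustration, a weighted combination might look like the sketch below. It assumes the weighted beliefs are normalized by the sum of the weights, which this page does not state explicitly; the function name wsum_belief is hypothetical.

```python
def wsum_belief(weighted_args):
    """Combine (weight, belief) pairs of a #wsum node.

    Assumption: the result is the weighted average, i.e. the weighted
    beliefs divided by the total weight. There is no separate scaling
    weight Ws, matching the description above.
    """
    total_weight = sum(weight for weight, _ in weighted_args)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weight * belief for weight, belief in weighted_args) / total_weight
```

Under this assumption, the pairs (1, 0.5) and (3, 0.9) combine to roughly 0.8, so the more heavily weighted argument dominates.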

Ordered Distance Operator: #N(T1 ... Tn) or #odN(T1 ... Tn)

The terms within an ODN operator must appear in the given order, within N words of each other in the text, to contribute to the document's belief value. The "#N" form is an abbreviation of #odN; thus #3(health care) is equivalent to #od3(health care).
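The matching condition can be sketched as follows. This is an illustration only: it assumes that each term must occur after the previous one and at most N word positions later, which is one reading of the description above. The function name and the input shape (one list of word offsets per term) are hypothetical.

```python
def matches_ordered_window(positions, n):
    """Return True if the terms occur in query order, each within n words
    of the preceding term.

    `positions` holds one list of word offsets per term, in query order.
    """
    if not positions:
        return False

    def extend(term_index, previous):
        # All terms have been placed: a full ordered match was found.
        if term_index == len(positions):
            return True
        for offset in positions[term_index]:
            # The next term must come later, but no more than n words later.
            if previous < offset <= previous + n and extend(term_index + 1, offset):
                return True
        return False

    return any(extend(1, start) for start in positions[0])
```

With "health" at offset 3 and "care" at offset 4, #od3(health care) would match; with "care" at offset 10, or appearing before "health", it would not.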

Un-ordered Window Operator: #uwN(T1 ... Tn)

The terms contained in a UWN operator must be found in any order within a window of N words in order for this operator to contribute to the belief value of the document.
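A minimal sketch of the window test is shown below. It assumes a window of N words covers N consecutive word offsets, and it tries each term occurrence as a possible window start. The function name and input shape are hypothetical.

```python
def matches_unordered_window(positions, n):
    """Return True if every term occurs at least once inside some window
    of n consecutive word offsets, in any order.

    `positions` holds one list of word offsets per term.
    """
    # A matching window can always be shifted to start at a term occurrence,
    # so only those offsets need to be tried as window starts.
    starts = sorted(offset for term in positions for offset in term)
    for start in starts:
        if all(any(start <= offset < start + n for offset in term)
               for term in positions):
            return True
    return False
```

For example, terms at offsets 10 and 8 fit inside a window of 3 words regardless of their order, but not inside a window of 2.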

Phrase Operator: #phrase(T1 ... Tn)

The operator is treated as an ordered distance operator of 3 (#od3).

Passage Operator: #passageN(T1 ... Tn)

The passage operator looks for the terms or nodes within the operator in a passage window of N words. The document is rated based upon the score of its best passage.

Synonym Operator: #syn(T1 ... Tn)

The terms of the operator are treated as instances of the same term.

And Operator: #and(T1 ... Tn)

The more of the terms contained in the AND operator that are found in a document, the higher the belief value of that document.

Boolean And Operator: #band(T1 ... Tn)

All of the terms within a BAND operator must be found in a document in order for this operator to contribute to the belief value of that document.

Boolean And Not Operator: #bandnot(T N)

Searches for documents matching the first argument but not the second.

Or Operator: #or(T1 ... Tn)

At least one of the terms within the OR operator must be found in a document for that document to get credit for this operator.

Maximum Operator: #max(T1 ... Tn)

The maximum belief value of all the terms or nodes contained in the MAX operator is taken to be the belief value of this operator.
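The #max rule, together with the #and and #or operators described earlier, can be compared in a short sketch. The #max function follows the description directly. The product forms used for #and and #or are the ones commonly associated with InQuery-style inference networks and are an assumption here; this page does not give the formulas Lemur uses. All function names are hypothetical.

```python
import math


def max_belief(beliefs):
    """#max: the largest belief value among the arguments."""
    return max(beliefs)


def and_belief(beliefs):
    """#and (assumed form): product of the argument beliefs, so each
    additional well-matched term raises the result."""
    return math.prod(beliefs)


def or_belief(beliefs):
    """#or (assumed form): one minus the product of the complements, so a
    single strongly matched term is enough to earn credit."""
    return 1.0 - math.prod(1.0 - belief for belief in beliefs)
```

With two arguments of belief 0.5 each, these sketches give 0.5 for #max, 0.25 for #and, and 0.75 for #or.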

Filter Require Operator: #filreq(arg1 arg2)

Use the documents returned by the first argument (its belief list) if and only if the second argument would return documents. The value of the second argument does not affect the belief values of the first argument; it only determines whether they will be returned or not.

Filter Reject Operator: #filrej(arg1 arg2)

Use the documents returned by the first argument if and only if there were no documents returned by the second argument. The value of the second argument does not affect the belief values of the first argument; it only determines whether they will be returned or not.
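The two filter operators can be sketched as set-style filters over belief lists. The sketch assumes the test is applied per document (a document from the first argument is kept or dropped depending on whether the second argument also returns it), which is one reading of the descriptions above. Belief lists are modeled as dictionaries from document id to belief value, and the function names are hypothetical.

```python
def filreq(first, second):
    """Keep documents from `first` that `second` also returns.

    Belief values come from `first` only; `second` acts purely as a filter.
    """
    return {doc: belief for doc, belief in first.items() if doc in second}


def filrej(first, second):
    """Keep documents from `first` that `second` does not return."""
    return {doc: belief for doc, belief in first.items() if doc not in second}
```

Given a first argument returning d1 (0.7) and d2 (0.4) and a second argument returning only d2, this #filreq sketch keeps d2 with belief 0.4, while #filrej keeps d1 with belief 0.7.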

Negation Operator: #not(T1)

The term or node contained in this operator is negated so that documents which do not contain it are rewarded.

The input query file is of the form:

#qN = queryNode ;
where N is the query id and queryNode is one of the aforementioned query operators. The query may span multiple lines and must be terminated with a semicolon. The body of the query must not contain a semicolon, as that will prematurely terminate the query.

An example query:

#q18=#wsum(1 #sum(Languages and compilers for #1(parallel processors)) 2 #sum(highly horizontal microcoded machines) 1 code 1 compaction );
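To make the file layout concrete, the sketch below splits the text of a query file into (query id, query node) pairs using the semicolon rule described above. It is not the parser used by this application; the function name, the regular expression, and the choice to collapse whitespace are all assumptions made for illustration.

```python
import re

# "#q", then the query id, then "=", then the query node (may span lines).
QUERY_RE = re.compile(r"#q(\w+)\s*=\s*(.*)", re.DOTALL)


def split_queries(text):
    """Return a list of (query_id, query_node) pairs from query file text."""
    queries = []
    # Every query ends at a semicolon, so the body can never contain one.
    for chunk in text.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = QUERY_RE.match(chunk)
        if match is None:
            raise ValueError(f"not a query: {chunk!r}")
        query_id, node = match.groups()
        # Collapse line breaks and repeated spaces inside multi-line queries.
        queries.append((query_id, " ".join(node.split())))
    return queries
```

Applied to the example query, this would yield the id 18 and a node beginning with #wsum(1 #sum(Languages and compilers ....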

Generated on Tue Jun 15 11:02:58 2010 for Lemur by doxygen 1.3.4