
The Lemur Project is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation.


Note: These tutorials are out of date; please see the Lemur Wiki instead.


Lemur Project Tutorials:
Starting Out

Retrieval: Retrieval via the Web Interface


Once the Lemur CGI Web interface is installed and configured, you can use it both to run queries and to inspect the databases (indexes). If you call the CGI without any parameters, it presents an interactive search box into which you can type queries. By default, query terms are combined with a simple "sum" operator, but you can also submit queries written in the Indri query language or the InQuery query language.
 
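Programmatically, you can invoke the CGI by requesting "lemur.cgi?name=value&name=value&...". The name/value pairs are processed in order, from left to right, so a parameter that changes a setting (such as datasource) should come before the parameters that depend on it (such as query).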
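Because the pairs are processed in order, it can help to build request URLs from an ordered list rather than a dictionary. The Python sketch below does this with the standard library only. The server address is a hypothetical placeholder, and the helper name build_lemur_url is illustrative, not part of Lemur. The helper also percent-encodes each value; this matters for Indri and InQuery operators, because an unencoded '#' in a URL starts a fragment, and everything after it would never reach the CGI.

```python
from urllib.parse import quote, urlencode

# Hypothetical address of the Lemur CGI; replace with your own installation.
BASE_URL = "http://localhost/cgi-bin/lemur.cgi"


def build_lemur_url(pairs, base_url=BASE_URL):
    """Join ordered (name, value) pairs into a lemur.cgi request URL.

    The list order is preserved because the CGI processes pairs left to right.
    A value of None emits a bare parameter name (e.g. "listdatasources").
    """
    parts = []
    for name, value in pairs:
        if value is None:
            parts.append(name)
        else:
            # quote() with no safe characters percent-encodes '#', '(', ')',
            # '&' and spaces, so operators such as #combine reach the CGI intact.
            parts.append(urlencode([(name, value)], quote_via=quote))
    return base_url + "?" + "&".join(parts)


# Select index 0 first, then run an Indri query against it.
url = build_lemur_url([
    ("datasource", 0),
    ("query", "#combine(information retrieval)"),
])
print(url)
```

Running it prints http://localhost/cgi-bin/lemur.cgi?datasource=0&query=%23combine%28information%20retrieval%29. Swapping the two pairs would, under left-to-right processing, run the query before the intended index is selected.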
 
The available parameters are:

| Parameter | Function | Description |
| --- | --- | --- |
| termstats=<term> | prints corpus statistics for <term> | Returns the total number of occurrences of the term in the corpus and the number of documents that contain it. |
| datasource=n | selects the n'th database (index) | Sets the database to use for the current call. |
| listdatasources | lists the available databases (indexes) | Returns the valid index IDs and their descriptions. |
| datasourcestats=n | displays statistics for database ID n | Displays statistics for the given database, such as the number of documents, the number of words, the number of unique terms, and the average document length. |
| getdocext=<string> | fetches the document with external ID <string> | Returns the unparsed document with the given external ID. |
| setoutput=debug | sets the CGI interface to Diagnostic mode | All output is in plain text. |
| setoutput=interactive | sets the CGI interface to Interactive mode | The default mode, which allows interactive use. |
| setoutput=program | sets the CGI interface to Program mode | All output is streamed back without regard to formatting. |
| help | prints the help message | |
| getdoc=<integer> | fetches the document with internal ID <integer> | Returns the unparsed document. |
| getparseddoc=<integer> | fetches the parsed form of the document with internal ID <integer> | Returns the parsed document (generally a bag of words). |
| getterm=<string> | shows the lexicalized (stopped and stemmed) form of <string> | |
| maxresults=x | sets the number of documents to retrieve to x | |
| query=<string> | searches the database with the query <string> | The query can use the Indri or InQuery query language (use the querytype parameter below to choose which). |
| start=n | starts the query results at rank n | |
| querytype=<query_type> | sets the query language to use | Either "indri" (default) or "inquery". |
| invlist=<term> | returns the inverted list for <term> | Use term.field for a field-specific list. |
| invposlist=<term> | returns the inverted list for <term> with positions | Like invlist, but also returns the positions at which the term occurs within each document. Use term.field for a field-specific list. |
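As a rough illustration of how these parameters combine, the Python sketch below builds a typical sequence of requests: list the indexes, inspect one, look up a term, fetch a field-specific inverted list, run a paged search, and retrieve a document. The server address, index ID 0, the term "retrieval", the field name "title", and document ID 42 are all hypothetical, as are the helper names lemur_url, search_url, and fetch. Going by the description of datasource above ("for the current call"), the sketch assumes the index must be selected in every request, and it puts setoutput=program first so the response comes back unformatted. termstats and invlist may expect the indexed (stemmed) form of a term, which getterm can show. Requests are only built and printed here; fetch() shows one way to send them to a live server.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Hypothetical CGI location; adjust for your installation.
BASE_URL = "http://localhost/cgi-bin/lemur.cgi"


def lemur_url(pairs, base_url=BASE_URL):
    """Join ordered (name, value) pairs into a lemur.cgi URL.

    A value of None emits a bare parameter such as "listdatasources".
    """
    parts = [name if value is None else urlencode([(name, value)], quote_via=quote)
             for name, value in pairs]
    return base_url + "?" + "&".join(parts)


def search_url(index_id, query, querytype="indri", maxresults=10, start=None):
    """Build a search request; settings come before the query they affect."""
    pairs = [
        ("setoutput", "program"),   # unformatted output, easier to consume from code
        ("datasource", index_id),   # choose the index before querying it
        ("querytype", querytype),   # "indri" (default) or "inquery"
        ("maxresults", maxresults), # number of documents to retrieve
    ]
    if start is not None:
        pairs.append(("start", start))  # rank at which results start
    pairs.append(("query", query))      # processed last, after the settings above
    return lemur_url(pairs)


def fetch(url, timeout=30):
    """Send a request and return the response body as text (needs a live server)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# A typical sequence of requests against hypothetical index 0 (built, not sent).
session = [
    lemur_url([("listdatasources", None)]),
    lemur_url([("datasourcestats", 0)]),
    lemur_url([("datasource", 0), ("termstats", "retrieval")]),
    lemur_url([("datasource", 0), ("invlist", "retrieval.title")]),  # field-specific list
    search_url(0, "#combine(information retrieval)", maxresults=20),
    lemur_url([("datasource", 0), ("getdoc", 42)]),
]
for u in session:
    print(u)
```

Keeping every request self-contained (index selection included) avoids depending on any state the CGI might or might not retain between calls.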

 


Previous: Retrieval via GUI Back to TOC Next: Batch Retrieval

 


The Lemur Project
Last modified: June 21, 2007. 09:14:12 am