 
CMU - Language Technologies Institute, Carnegie Mellon University
CIIR, University of Massachusetts Amherst
 

The Lemur Project is sponsored by the Advanced Research and Development Activity in Information Technology (ARDA) under its Statistical Language Modeling for Information Retrieval Research Program and by the National Science Foundation.


Note: These tutorials are out of date; please see the Lemur Wiki instead.


Lemur Project Tutorials:
Starting Out

Installing & Compiling: LemurCGI and the Lemur GUI


Contents

  1. Lemur CGI
  2. Lemur Index and Retrieval GUIs

 

Lemur CGI

The Lemur CGI is a CGI executable that runs under an HTTP server (web server) and provides access to indexes and general search capabilities.

Beginning with version 4.3 of the Lemur Toolkit, the Lemur CGI is included as part of the site search package, and is built and installed by default on Unix-like systems (Linux, Solaris, OS X).

The CGI files will be installed in ${prefix}/share/cgi. Copy the contents of this folder to a location accessible via your webserver. Be sure that your webserver configuration allows executables to be run. Consult your webserver documentation or system administrator if you are uncertain how to ensure this.
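
The copy step above could be scripted roughly as follows. This is only a sketch: the function name install_cgi and the example paths are hypothetical, and it does not touch the webserver configuration, which still has to permit CGI execution.

```python
import shutil
import stat
from pathlib import Path


def install_cgi(source_dir: Path, web_dir: Path, executable: str = "lemur.cgi") -> Path:
    """Copy everything in source_dir into web_dir and mark the CGI as executable."""
    # dirs_exist_ok lets the copy run again over an existing installation (Python 3.8+).
    shutil.copytree(source_dir, web_dir, dirs_exist_ok=True)
    cgi = web_dir / executable
    # Add execute permission for user, group, and others on top of the current mode.
    cgi.chmod(cgi.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return cgi


# Example call; both paths are placeholders for your own ${prefix} and web root:
# install_cgi(Path("/usr/local/share/cgi"), Path("/var/www/cgi-bin"))
```

Because lemur.config must stay in the same directory as lemur.cgi, the sketch copies the whole folder rather than the executable alone.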

Before the initial execution, edit the "lemur.config" file (which should stay in the same directory as lemur.cgi) to reflect your configuration.

Configuration Elements

The configuration file is a well-formed XML file with the opening tag <lemurconfig>. There are two required elements within the configuration file:

<templatepath>:
this should reflect the path (either relative or absolute) to the template files.
<indexes>:
this section lists the available indexes and can contain as many <index> items as needed. Each <index> item must contain a <path> element giving the location of the index, and may optionally contain a <description> element describing that index. The path should be the full path to the index constructed by the crawl-index script.

There are also some optional elements:

<rootpaths>:
this element specifies path prefixes to strip from the original document path shown in each search result. This is most useful for enabling a site-search capability where there are locally mirrored versions of the indexed web pages. For example, if the local cache of your website is at "/var/cache/mirrored_site/" and the LemurCGI is not set to strip paths, every URL displayed in the results would begin with the prefix "/var/cache/mirrored_site/". This option is not necessary for indexes built with the crawl-index script.
<supportanchortext>:
If this element is set to true, the CGI includes support for retrieval of inlinks, provided you have used the harvestlinks program to gather them from your corpus. This is the default setting for indexes built with crawl-index.
<querylog>:
This tells the CGI to log every query that is given to it.
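
The effect of <rootpaths> described above can be pictured with a small sketch. It only illustrates prefix stripping and is not the CGI's actual implementation; the function name strip_root_path and the matching rule (first listed prefix that matches wins) are assumptions.

```python
from typing import Sequence


def strip_root_path(original_path: str, root_paths: Sequence[str]) -> str:
    """Return original_path without the first root prefix that matches it."""
    for root in root_paths:
        if original_path.startswith(root):
            # Drop the local mirror prefix so only the site-relative part remains.
            return original_path[len(root):]
    # No configured prefix matched, so the path is left unchanged.
    return original_path


roots = ["/var/cache/mirrored_site/"]
print(strip_root_path("/var/cache/mirrored_site/docs/index.html", roots))
# prints "docs/index.html"
```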

 

A sample configuration containing the above elements might look like:

<lemurconfig>
	<templatepath>./templates/</templatepath>
	<rootpaths strippath="true">
		<path>/home/lemur/data/</path>
	</rootpaths>
	<supportanchortext>true</supportanchortext>
	<querylog>./logging/lemurlog.txt</querylog>
	<indexes>
		<index>
			<path>/home/lemur/indexes/sampleIndex</path>
			<description>Sample Lemur Index</description>
		</index>
		<index>
			<path>/home/lemur/indexes/sampleIndex_2</path>
			<description>A Second Sample Lemur Index</description>
		</index>
	</indexes>
</lemurconfig>	
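
Before pointing the CGI at a new lemur.config, it may help to check the file against the rules listed above. The sketch below uses Python's standard xml.etree.ElementTree module; the function name check_lemur_config is hypothetical, and the checks cover only what this tutorial states (well-formed XML, a <lemurconfig> root, the two required elements, and a <path> in every <index>).

```python
import xml.etree.ElementTree as ET

REQUIRED_ELEMENTS = ("templatepath", "indexes")


def check_lemur_config(xml_text: str) -> list:
    """Return a list of problems; an empty list means these basic checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "lemurconfig":
        problems.append(f"root element is <{root.tag}>, expected <lemurconfig>")
    for name in REQUIRED_ELEMENTS:
        if root.find(name) is None:
            problems.append(f"missing required element <{name}>")
    # Every <index> needs a non-empty <path>; <description> is optional.
    for number, index in enumerate(root.findall("indexes/index"), start=1):
        if not index.findtext("path", default="").strip():
            problems.append(f"index #{number} has no <path>")
    return problems


if __name__ == "__main__":
    # Assumes lemur.config is in the current directory.
    with open("lemur.config", encoding="utf-8") as handle:
        for problem in check_lemur_config(handle.read()) or ["no problems found"]:
            print(problem)
```

The check does not verify that the index paths or the template directory exist on disk.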
	

Edit the file help-db.html to describe the contents of the text database(s) being searched by the Lemur search engine. You can describe the documents in whatever way you feel is most helpful to your users.

If you wish to use the default HTML templates, no modifications are necessary, but if you want to modify the HTML templates for your own uses, be sure to read the "README_Templates.txt" file for instructions on available commands that you can use within the templates.

The LemurCGI has several classes of functions that allow interactive access to an index. To see the list of functions and a description of what each does, open "http://[your_path]/lemur.cgi?h=?" in your web browser, where [your_path] is the path (via HTTP) to your lemur.cgi installation. See the online documentation for more information.


To build from Microsoft Visual C++ .NET 2003 (Version 7.1):

  1. Install the Lemur source code when running the Lemur installer (choose custom).
  2. Open the Lemur.sln solution in the source directory.
  3. Select the LemurCGI project.
  4. Select either Debug or Release mode.
  5. Right-click the LemurCGI project and choose Build.
  6. Copy the created executable (LemurCGI.exe, typically found in your C:\Program Files\Lemur\Lemur 4.4\src\lemur-4.4\site-search\cgi\Debug folder, or C:\Program Files\Lemur\Lemur 4.4\src\lemur-4.4\site-search\cgi\Release if built in release mode) along with the entire contents of the C:\Program Files\Lemur\Lemur 4.4\src\lemur-4.4\site-search\cgi\bin folder to a location accessible via your webserver. Be sure that your webserver configuration allows executables to be run. Consult your webserver documentation or system administrator if you are uncertain how to ensure this.

Lemur Index and Retrieval GUIs

There are two separate Java GUIs for the Lemur Toolkit, one for indexing and one for retrieval.

The Lemur indexing and retrieval GUIs are either installed as precompiled executable JAR files (if you use the Windows installer) or built from the Java sources using the "--enable-java" flag. If you install the precompiled JAR files with the Windows installer, you should be able to double-click on a JAR file to execute it.

If your system is not set up to automatically execute JAR files, you can start the GUI from the command line by issuing the command "java -jar LemurIndex.jar" for the indexing application or "java -jar LemurRetrieval.jar" for the retrieval application. In addition to the JAR itself, your system will need to be able to find the JLemur and JLemurIndexing shared libraries.

If you get the error "java.lang.UnsatisfiedLinkError: no JLemur in java.library.path", it means that the GUI cannot find JLemur.dll (on Windows) or libJLemur.so (on Linux/Solaris). On Windows, Java looks for the shared library in the current directory and in the directories listed in your PATH environment variable. On Linux-based systems, it looks for the library in the directories listed in your LD_LIBRARY_PATH environment variable.
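
When this error appears, a quick way to see whether the library is reachable is to walk the same environment variable by hand. The sketch below is a diagnostic aid, not part of the Lemur Toolkit; the function name find_library is hypothetical, and treating every non-Windows platform like Linux is an assumption.

```python
import os
import sys
from pathlib import Path


def find_library(search_path: str, library_name: str, pathsep: str = os.pathsep):
    """Return the first directory in search_path that contains library_name, or None."""
    for entry in search_path.split(pathsep):
        # Skip empty entries, which appear when the variable is unset or has stray separators.
        if entry and (Path(entry) / library_name).is_file():
            return Path(entry)
    return None


if __name__ == "__main__":
    if sys.platform.startswith("win"):
        name, variable = "JLemur.dll", "PATH"
    else:
        # Assumption: other platforms follow the Linux/Solaris naming and variable.
        name, variable = "libJLemur.so", "LD_LIBRARY_PATH"
    found = find_library(os.environ.get(variable, ""), name)
    if found:
        print(f"{name} found in {found}")
    else:
        print(f"{name} not found in any directory listed in {variable}")
```

On Windows the current directory is searched as well, so a "not found" result there only covers the PATH entries.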

Lemur Indexing GUI

 



 


The Lemur Project
Last modified: June 21, 2007. 09:14:12 am