Lemur User Interfaces

Overview
Tips on building an index for interactive applications
Installing the CGI
Installing the stand-alone GUIs

1. Overview

Lemur was originally designed as a research system for batch retrieval. However, we are moving towards supporting more interactive applications. To this end we provide a CGI script for web-based retrieval and a stand-alone GUI application. These applications still use the Lemur API and require Lemur indexes.

2. Tips on building an index for interactive applications

Lemur supports more than one type of index. All of them are technically compatible with the interactive applications, but not all are well suited to interactive use. Especially for larger text collections, the speed at which the index loads might be an issue. For this reason, we recommend using the KeyfileIncIndex (.key) or an IndriIndex with the CGI and GUI. The KeyfileIncIndex loads quickly because it does not need to load all of the document IDs and the term dictionary into memory. For the same reason, it also requires less memory at runtime.

Another common requirement for interactive use is being able to see the full document text from the results list. In Lemur, you have to build a DocumentManager and associate it with your index to have this functionality. Otherwise, you will still get results, but not be able to open the documents for viewing. You can build a DocumentManager and an Index simultaneously by using the application BuildDocMgr. This application is provided with Lemur.

BuildDocMgr is capable of building a few different kinds of DocumentManagers as well. In this case, we recommend using the ElemDocMgr (specify with managerType elem). This DocumentManager is capable of grabbing separate elements back from the original document if it was tagged and parsed appropriately. This way, when you see your list of results, you can see the document title or headline instead of its ID, which may not be easy to read. This does not change the operation of the programs, only the visualization of the results list. The TrecParser and WebParser included with Lemur can parse and recognize titles enclosed in <TTL> and <TITLE> tags, respectively. The TrecParser also recognizes headlines in <HL>, <HEAD>, or <HEADLINE> tags.

It is important that you build the Index and DocumentManager in such a way that it can later be opened and used from any other directory. To do this, you must specify full path names in your parameter files when specifying the Index and DocumentManager names. You should also use full path names to point to your data files (so that the DocumentManager can find them later), or keep them in the same directory as the DocumentManager. Data files are listed in a file specified by the dataFiles parameter.
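As a minimal sketch of the full-path advice above, the following shell function writes a list of data files using absolute paths. It assumes the file named by the dataFiles parameter holds one path per line; DATA_DIR, LIST_FILE, and make_file_list are hypothetical names chosen for this example and are not part of Lemur.

```shell
#!/bin/sh
# Sketch: write a file list with absolute paths for the dataFiles parameter.
# DATA_DIR and LIST_FILE are hypothetical names used only in this example.
DATA_DIR="${DATA_DIR:-./data}"
LIST_FILE="${LIST_FILE:-./datafiles.list}"

make_file_list() {
    # Resolve the data directory to an absolute path so the list
    # stays valid no matter which directory the index is opened from.
    abs_dir=$(cd "$1" && pwd) || return 1
    # One absolute file path per line (assumed list format), sorted.
    find "$abs_dir" -type f | sort > "$2"
}

# Only generate the list if the data directory actually exists.
if [ -d "$DATA_DIR" ]; then
    make_file_list "$DATA_DIR" "$LIST_FILE"
fi
```

Keep the list file outside the data directory so that it does not end up listing itself.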

3. Installing the CGI

The Lemur CGI is a CGI executable that runs under an HTTP server (web server) and provides access to indexes and general search capabilities.

Beginning with version 4.3 of the Lemur Toolkit, the Lemur CGI is included as part of the site search package, and is built and installed by default on Unix-like systems (Linux, Solaris, OS X).

The CGI files will be installed in ${prefix}/share/cgi. Copy the contents of this folder to a location accessible via your web server. Be sure that your web server configuration allows executables to be run. Consult your web server documentation or system administrator if you are uncertain how to ensure this.
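The copy step might be sketched as the shell function below. The function name install_cgi and both directories in the usage note are hypothetical; the permission change assumes the web server runs lemur.cgi directly as an executable file.

```shell
#!/bin/sh
# Sketch: copy the installed CGI files into a web server's CGI directory.
# install_cgi is a hypothetical helper, not part of the Lemur distribution.
install_cgi() {
    src=$1   # e.g. ${prefix}/share/cgi
    dest=$2  # directory your web server executes CGI programs from
    [ -d "$src" ] || { echo "missing source directory: $src" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # The trailing "/." copies the directory contents, including dotfiles.
    cp -R "$src/." "$dest/" || return 1
    # The web server can only run lemur.cgi if it is executable.
    if [ -f "$dest/lemur.cgi" ]; then
        chmod 755 "$dest/lemur.cgi"
    fi
}
```

For example, `install_cgi /usr/local/share/cgi /var/www/cgi-bin/lemur` would copy the files, where both paths are placeholders for your own installation prefix and web server layout.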

Before the initial execution, edit the "lemur.config" file (which should stay in the same directory as lemur.cgi) to reflect your configuration.

See the ${prefix}/share/cgi/lemur.config file for an example configuration.

The configuration file is a well-formed XML file with the opening tag <lemurconfig>. There are two required elements within the configuration file:

<templatepath>: this should reflect the path (either relative or absolute) to the template files.

<indexes>: this section describes the available indexes, and can contain as many indexes as needed. Each <index> item takes two elements: a required <path> element that points to where the index is located, and an optional <description> element that describes that index. The path should be the full path to the index constructed by the crawl-index script.
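To make the layout concrete, the sketch below writes a minimal configuration containing only the two required elements described above. The nesting follows that description; the template path, index path, and description text are hypothetical placeholders, and write_lemur_config is a name invented for this example. Compare the result with the shipped ${prefix}/share/cgi/lemur.config before relying on it.

```shell
#!/bin/sh
# Sketch: write a minimal lemur.config with only the two required elements.
# All paths and the description text are hypothetical placeholders.
write_lemur_config() {
    # The quoted 'EOF' keeps the shell from expanding anything in the XML.
    cat > "$1" <<'EOF'
<lemurconfig>
  <templatepath>./templates/</templatepath>
  <indexes>
    <index>
      <path>/data/indexes/my_site_index</path>
      <description>Example site index</description>
    </index>
  </indexes>
</lemurconfig>
EOF
}
```

Calling `write_lemur_config ./lemur.config` in the directory that holds lemur.cgi would overwrite any existing configuration there, so back it up first.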

There is also an optional element, <rootpaths>, that defines path prefixes to strip from the URLs shown in search results. This is most useful for enabling a site-search capability where there are locally mirrored versions of the indexed web pages. For example, if the local cache of your website is at "/var/cache/mirrored_site/" and the Lemur CGI is not set to strip paths, every original URL displayed would include the prefix "/var/cache/mirrored_site/". This option is not necessary for indexes built with the crawl-index script.

There is also an element, <supportanchortext>, that can be set to true to support retrieval of inlinks if you have used the harvestlinks program to gather these from your corpus. This is the default setting for indexes built with crawl-index.

Edit the file help-db.html to describe the contents of the text database(s) being searched by the Lemur search engine. You can describe the documents in whatever way you feel is most helpful to your users.

If you wish to use the default HTML templates, no modifications are necessary, but if you want to modify the HTML templates for your own uses, be sure to read the "README_Templates.txt" file for instructions on available commands that you can use within the templates.

The LemurCGI has several classes of functions that allow interactive access to an index. To see the list of functions and a description of what they do, open "http://[your_path]/lemur.cgi?h=?" in your web browser, where [your_path] is the path (via HTTP) to your lemur.cgi installation. See the online documentation for more information.

To build from Microsoft Visual C++ .NET 2003 (Version 7.1):

Install the Lemur source code when running the Lemur installer (choose custom).
Open the Lemur.sln solution in the source directory.
Select the LemurCGI project.
Select either Debug or Release mode.
Right-click the LemurCGI project and choose Build.
Copy the created executable (LemurCGI.exe, typically found in your C:\Program Files\Lemur\Lemur 4.6\src\lemur-4.6\site-search\cgi\Debug folder, or C:\Program Files\Lemur\Lemur 4.6\src\lemur-4.6\site-search\cgi\Release if built in release mode) along with the entire contents of the C:\Program Files\Lemur\Lemur 4.6\src\lemur-4.6\site-search\cgi\bin folder to a location accessible via your web server. Be sure that your web server configuration allows executables to be run. Consult your web server documentation or system administrator if you are uncertain how to ensure this.

4. Installing the stand-alone GUIs

There are four separate Java GUIs for the Lemur Toolkit: two for indexing and two for retrieval. As of version 4.3 of the Lemur Toolkit, all four are included as optional components in the main distribution. In the Windows installer, the GUIs are an optional component (select custom install).

If configured with --enable-java, documentation for the Lemur JNI will be installed in <install-directory>/share/lemur/JNIdoc. The file index.html points into the javadoc-generated documentation.

If configured with --enable-java, the shared library will be installed in <install-directory>/lib/liblemur_jni.so, and the Java class files will be installed in <install-directory>/share/lemur/lemur.jar and <install-directory>/share/lemur/indri.jar for the Lemur and Indri APIs, respectively. To use the JNI interface, add <install-directory>/lib to your LD_LIBRARY_PATH and add the appropriate jar file(s) to your CLASSPATH.
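The environment setup might look like the following sketch. INSTALL_DIR stands in for <install-directory>, and its default of /usr/local is an assumption; substitute the prefix you actually configured.

```shell
#!/bin/sh
# Sketch: make the JNI shared library and the jar files visible to Java.
# INSTALL_DIR stands in for <install-directory>; /usr/local is an assumed default.
INSTALL_DIR="${INSTALL_DIR:-/usr/local}"

# Prepend <install-directory>/lib, keeping any entries that are already set.
LD_LIBRARY_PATH="$INSTALL_DIR/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Add both jars; drop one if you only use the Lemur API or the Indri API.
CLASSPATH="$INSTALL_DIR/share/lemur/lemur.jar:$INSTALL_DIR/share/lemur/indri.jar${CLASSPATH:+:$CLASSPATH}"

export LD_LIBRARY_PATH CLASSPATH
```

The `${VAR:+:$VAR}` form appends the previous value only when one exists, which avoids leaving a stray empty entry in either search path.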

Four additional jar files are installed. RetUI.jar provides a basic document retrieval GUI for interactive queries, using the Indri API. IndexUI.jar provides a basic collection indexing GUI for building an Indri repository. LemurRet.jar provides a basic document retrieval GUI for interactive queries, using the Lemur API. LemurIndex.jar provides a basic collection indexing GUI for building Lemur indexes. All are installed in <install-directory>/share/lemur and can be run with:

java -jar <jarfilename>

If you get the error "java.lang.UnsatisfiedLinkError: no lemur_jni in java.library.path", it means that the GUI cannot find lemur_jni.dll (on Windows) or liblemur_jni.so (on Linux/Solaris). On Windows, Java looks for the shared library in the current directory and in the directories specified in your PATH environment variable. On Linux and Solaris, it looks for the library in the directories listed in your LD_LIBRARY_PATH environment variable.
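When diagnosing this error on Linux or Solaris, a quick check like the sketch below can show whether any directory in LD_LIBRARY_PATH actually contains the library. The helper name find_in_path is hypothetical, and the check only mirrors the search rule described above; it does not query Java itself.

```shell
#!/bin/sh
# Sketch: report which directory in a colon-separated search path
# contains the JNI shared library. find_in_path is a hypothetical helper.
find_in_path() {
    lib_name=$1     # e.g. liblemur_jni.so
    search_path=$2  # e.g. "$LD_LIBRARY_PATH"
    old_ifs=$IFS
    IFS=:           # split the search path on colons
    for dir in $search_path; do
        if [ -n "$dir" ] && [ -f "$dir/$lib_name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$lib_name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1        # no directory in the path holds the library
}

find_in_path liblemur_jni.so "${LD_LIBRARY_PATH:-}" \
    || echo "liblemur_jni.so not found in LD_LIBRARY_PATH" >&2
```

If the library is reported missing, adding <install-directory>/lib to LD_LIBRARY_PATH before running java -jar is the fix described above.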
