On the command line, type in the following commands to unpack the package. This should create a directory named lemur-4.0.
> gunzip lemur-4.0.tar.gz
> tar -xvf lemur-4.0.tar
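If your tar supports the -z option (GNU tar and bsdtar both do), the two steps can be combined. The helper below is a hypothetical sketch and is not part of Lemur. It unpacks the archive in one step and then checks that the expected directory was created.

```shell
#!/bin/sh
# unpack_lemur is a hypothetical helper, not part of the Lemur distribution.
# It assumes a tar that understands -z (gunzip while extracting).
unpack_lemur() {
  tarball="$1"                          # e.g. lemur-4.0.tar.gz
  dir=$(basename "$tarball" .tar.gz)    # expected top-level directory, e.g. lemur-4.0
  tar -xzf "$tarball" || return 1
  if [ ! -d "$dir" ]; then
    echo "expected directory $dir was not created" >&2
    return 1
  fi
  echo "unpacked into $dir"
}

# Usage: unpack_lemur lemur-4.0.tar.gz
```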
Go to the lemur-4.0 directory and run the configuration script, configure. This generates a file named "MakeDefns", which contains customized definitions used by the makefiles. configure accepts the following arguments:
--enable-distrib compiles and installs the distributed retrieval components. Default is disabled.
--enable-summarization compiles and installs the summarization components. Default is disabled.
--enable-cluster compiles and installs the clustering components. Default is disabled.
--enable-assert enables assert statements in the code. Default is disabled.
--prefix=DIR specifies the directory in which to install the toolkit. Default is /usr/local.
For example, to configure Lemur with the default libraries:
lemur-4.0> ./configure
Or, to configure Lemur with some of the optional modules:
lemur-4.0>./configure --enable-distrib --enable-summarization
With directory lemur-4.0 as the current working directory, type in "make". This will compile the whole Lemur toolkit and link all the Lemur applications.
lemur-4.0> make
After compiling Lemur, type "make install". This installs the Lemur library and include files under the directory specified by the --prefix option of the configure script.
For example:
lemur-4.0> ./configure --prefix=/usr0/mydir-for-lemur
lemur-4.0> make install
will create /usr0/mydir-for-lemur/lib/liblemur.a and install the header files in /usr0/mydir-for-lemur/include/. The application executables will all be in /usr0/mydir-for-lemur/bin.
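The configure, make, and make install steps can be collected into a small wrapper. This is a hypothetical sketch and is not shipped with Lemur. It assumes you run it from inside lemur-4.0. The first argument becomes the --prefix value, and any remaining arguments (for example --enable-distrib) are passed through to configure unchanged.

```shell
#!/bin/sh
# build_lemur is a hypothetical wrapper, not shipped with Lemur.
# Run it from inside the lemur-4.0 directory.
build_lemur() {
  [ $# -ge 1 ] || { echo "usage: build_lemur PREFIX [configure options...]" >&2; return 2; }
  prefix="$1"; shift            # install location, passed to --prefix
  # Remaining arguments go to configure, e.g. --enable-distrib
  ./configure --prefix="$prefix" "$@" || return 1
  make || return 1              # compile the toolkit and link the applications
  make install                  # install lib/, include/ and bin/ under $prefix
}

# Usage:
#   build_lemur /usr0/mydir-for-lemur --enable-distrib --enable-summarization
```

Stopping at the first failing step avoids running make install against a half-built tree.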
For users who are only interested in using Lemur as a library and application suite, the original source tree (i.e., the lemur-4.0 directory) can be removed after this step.
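Before you remove the source tree, it may be worth confirming that the installed files are where the example above says they should be. The helper below is a hypothetical sketch. It only checks for the library, include directory, and bin directory named in that example.

```shell
#!/bin/sh
# check_lemur_install is a hypothetical helper. It looks for the files that
# "make install" is described as creating under the prefix directory.
check_lemur_install() {
  prefix="$1"; status=0
  for path in "$prefix/lib/liblemur.a" "$prefix/include" "$prefix/bin"; do
    if [ -e "$path" ]; then
      echo "found:   $path"
    else
      echo "missing: $path"; status=1
    fi
  done
  return $status
}

# Usage: check_lemur_install /usr0/mydir-for-lemur
```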
We have dropped support for gcc versions older than 3.0. Solutions to some problems with installing Lemur have been posted on the Lemur Forum.
Download and install the Lemur toolkit with the Windows executable installer. The Lemur applications, libraries, and include files will be installed in the selected target directory (default is C:\Program Files\Lemur\Lemur 4.0). The applications go in the bin subfolder, the library lemur.lib in the lib subfolder, and the include files in the include subfolder. The data files, including the folders for the Krovetz and Arabic stemmer data, go in the data subfolder. The installer can add the bin subfolder to the search path to make it easier to run the Lemur applications.
After installing the Lemur toolkit, you can use the library in a Visual Studio project by setting the following project properties:
C/C++ > General > Additional Include Directories: add the include subfolder of the target directory, e.g. C:\Program Files\Lemur\Lemur 4.0\include
Linker > General > Additional Library Directories: add the lib subfolder of the target directory, e.g. C:\Program Files\Lemur\Lemur 4.0\lib
Linker > Input > Additional Dependencies: add lemur.lib and wsock32.lib
C/C++ > Language > Enable Run-Time Type Info: set to Yes
C/C++ > Code Generation > Runtime Library: choose Multi-threaded Debug (/MTd) if your project is configured as Debug, or Multi-threaded (/MT) if it is configured as Release
The Lemur library and applications were built in Release mode using the Multi-threaded (/MT) runtime library.
The installer can optionally install the full Lemur toolkit source tree in the src subfolder of the target directory. That folder contains the Visual Studio solution file Lemur.sln, with a separate project file for each library and each application in Lemur.
By default, the project configurations are built in "Debug" mode. To build in "Release" mode instead, which compiles with fewer warnings and produces faster code, open the "Build" menu, choose "Configuration Manager", and set "Active Solution Configuration" to "Release".
When Lemur is built from source, the solution produces a separate library for each of the sublibraries contained in lemur.lib. It does not produce lemur.lib itself.
The executables for the Lemur applications are generated in the directory app/obj. Running "make install" copies them to prefix/bin, where prefix is the directory given to configure with the --prefix option.
The usage for different applications may vary, but most applications tend to have the following general usage.
Create a parameter file with value definitions for all the input variables of an application. Parameter files are in XML format. The top level element in the parameter file is named parameters. For example,
<parameters>
  <dataFiles>/usr3/web/sourcelist</dataFiles>
  <index>/usr3/web/myindex</index>
  <indexType>inv</indexType>
  <memory>128000000</memory>
  <docFormat>web</docFormat>
  <position>true</position>
</parameters>
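One way to create this file from the shell is with a here-document. The quoted 'EOF' delimiter stops the shell from expanding anything inside the block. This sketch writes the example parameters to a file named buildparam in the current directory, so run it where you want the file to live (for example, /usr3/web). The paths are the ones from the example and should be adjusted for your system.

```shell
#!/bin/sh
# Write the example BuildIndex parameters to ./buildparam.
# The quoted 'EOF' keeps the shell from expanding anything in the block.
cat > buildparam <<'EOF'
<parameters>
  <dataFiles>/usr3/web/sourcelist</dataFiles>
  <index>/usr3/web/myindex</index>
  <indexType>inv</indexType>
  <memory>128000000</memory>
  <docFormat>web</docFormat>
  <position>true</position>
</parameters>
EOF
```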
In general, all file paths in a parameter file must be absolute paths in the form used by your operating system. Lemur does not search other directories for files.
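Because a relative path fails without any search fallback, a quick check before a long run can save time. The helper below is a hypothetical sketch for Unix-style paths, where "absolute" means the path starts with "/". It assumes each tag appears at most once per line. It checks only the tags listed in PATH_TAGS, which come from the example file, so you would add any other path-valued tags your application uses.

```shell
#!/bin/sh
# check_abs_paths is a hypothetical helper for Unix-style paths.
# Assumes each tag appears at most once per line; only the tags in
# PATH_TAGS (taken from the example parameter file) are checked.
PATH_TAGS="dataFiles index"

check_abs_paths() {
  file="$1"; status=0
  for tag in $PATH_TAGS; do
    # Extract each <tag>value</tag> and keep the values not starting with "/".
    bad=$(sed -n "s:.*<$tag>\(.*\)</$tag>.*:\1:p" "$file" | grep -v '^/')
    if [ -n "$bad" ]; then
      echo "relative path in <$tag>: $bad"
      status=1
    fi
  done
  return $status
}

# Usage: check_abs_paths /usr3/web/buildparam
```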
Run the application with the parameter file as its only argument, or as its first argument if the application also accepts other parameters on the command line. Most applications only recognize parameters defined in the parameter file, but there are some exceptions.
For example, if the parameter file above is named buildparam and is in the directory /usr3/web, run:
/usr3/web> BuildIndex buildparam
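If you script these runs, a small wrapper can confirm that the parameter file exists before it starts the application. It passes the file as the first argument and forwards any extra arguments after it, following the convention described above. This is a hypothetical sketch and is not part of Lemur.

```shell
#!/bin/sh
# run_lemur_app is a hypothetical convenience wrapper, not part of Lemur.
# It checks that the parameter file exists, passes it as the first argument,
# and forwards any extra command-line arguments after it.
run_lemur_app() {
  [ $# -ge 2 ] || { echo "usage: run_lemur_app APP PARAMFILE [args...]" >&2; return 2; }
  app="$1"; param="$2"; shift 2
  if [ ! -f "$param" ]; then
    echo "parameter file not found: $param" >&2
    return 1
  fi
  "$app" "$param" "$@"
}

# Usage (from /usr3/web): run_lemur_app BuildIndex buildparam
```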
Most applications display their usage, or a list of required input variables, when run with the "--help" option. For more information about specific applications and their parameters, please see Lemur Modules and Applications.
The directory contains several test scripts, including test_indri_index.sh, test_pos_index.sh, test_key_index.sh, and test_struct_query.sh. The index test scripts each exercise the index named in the script and demonstrate most of the functionality of Lemur, from formatting a database and building an index to running various kinds of retrieval experiments. clean.sh removes any files generated by the test scripts. For more information about the indexes and how they differ, please see the indexing guide.
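To run all of the test scripts in one pass and keep their output for comparison with the sample output files, a loop like the one below may help. This is a hypothetical sketch. It assumes the scripts are executable and that you run it in the directory that contains them.

```shell
#!/bin/sh
# run_lemur_tests is a hypothetical helper. It runs every executable
# test_*.sh in the current directory and saves each script's output to a
# matching .log file for later comparison with the sample output files.
run_lemur_tests() {
  status=0
  for script in test_*.sh; do
    [ -f "$script" ] || continue          # no match: the glob stays literal
    log="${script%.sh}.log"
    if "./$script" > "$log" 2>&1; then
      echo "ok:     $script (output in $log)"
    else
      echo "failed: $script (see $log)"; status=1
    fi
  done
  return $status
}

# Usage: run_lemur_tests; when finished, run clean.sh to remove generated files.
```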
Your output should be close to the output in the sample output files. Roundoff error should cause only minor deviations from these results.
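A plain diff will flag harmless roundoff differences. The awk-based helper below is a hypothetical sketch that tolerates them. It assumes both files are plain text with whitespace-separated fields and with lines in the same order. Decimal-looking fields may differ by up to a tolerance (default 0.001), and all other fields must match exactly. Scientific notation is not handled. The file names in the usage line are placeholders, since the sample file names are not given here.

```shell
#!/bin/sh
# compare_with_tolerance is a hypothetical helper. Assumes whitespace-separated
# fields and lines in the same order in both files. Decimal fields may differ
# by up to TOL (default 0.001); other fields must match exactly.
# Exit status 0 means the files agree within the tolerance.
compare_with_tolerance() {
  awk -v tol="${3:-0.001}" '
    BEGIN { tol += 0 }                                  # force numeric tolerance
    FILENAME == ARGV[1] { ref[FNR] = $0; nref = FNR; next }
    {
      nout = FNR
      n1 = split(ref[FNR], a); n2 = split($0, b)
      if (n1 != n2) { print "line " FNR ": different number of fields"; bad = 1; next }
      for (i = 1; i <= n1; i++) {
        if (a[i] == b[i]) continue
        if (a[i] ~ /^-?[0-9]+(\.[0-9]+)?$/ && b[i] ~ /^-?[0-9]+(\.[0-9]+)?$/) {
          d = a[i] - b[i]
          if (d < 0) d = -d
          if (d <= tol) continue                        # within roundoff tolerance
        }
        print "line " FNR ", field " i ": " a[i] " vs " b[i]
        bad = 1
      }
    }
    END {
      if (nout != nref) { print "line counts differ: " nref " vs " nout; bad = 1 }
      exit bad
    }' "$1" "$2"
}

# Usage: compare_with_tolerance SAMPLE_OUTPUT YOUR_OUTPUT 0.001
```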
The scripts start from a source database file and a query file in a simple SGML format. They build an index of the database, along with a support file that makes some retrieval algorithms fast, and then run different retrieval experiments with different parameter files. The retrieval results are evaluated with a Perl script (ireval.pl) in the app/src directory, and a precision-recall summary file is generated for each experiment.
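For reading the precision-recall summaries, the standard definitions are given below. For a single query, R is the set of relevant documents and A is the set of retrieved documents. The exact set of measures that ireval.pl reports is not described here.

```latex
\[
\text{Precision} = \frac{|R \cap A|}{|A|},
\qquad
\text{Recall} = \frac{|R \cap A|}{|R|}
\]
```

For example, suppose 10 documents are retrieved, 4 of them are relevant, and 8 relevant documents exist in total. Then precision is 4/10 = 0.4 and recall is 4/8 = 0.5.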
You can try changing some of the settings in the parameter files to see how they affect retrieval performance.
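To try several values of one setting, it can help to generate one copy of a parameter file per value. The helper below is a hypothetical sketch. It assumes the tag appears at most once per line and that the values contain no ":" or "&" characters, which would confuse sed. Each output file is named FILE.TAG-VALUE.

```shell
#!/bin/sh
# make_variants is a hypothetical helper. For each VALUE it copies FILE,
# replaces the content of <TAG>...</TAG>, and writes FILE.TAG-VALUE.
# Assumes the tag appears at most once per line and values contain no ":" or "&".
make_variants() {
  [ $# -ge 3 ] || { echo "usage: make_variants FILE TAG VALUE..." >&2; return 2; }
  file="$1"; tag="$2"; shift 2
  for value in "$@"; do
    out="$file.$tag-$value"
    sed "s:<$tag>[^<]*</$tag>:<$tag>$value</$tag>:" "$file" > "$out" || return 1
    echo "$out"
  done
}

# Usage: make_variants PARAMFILE TAG VALUE1 VALUE2 ...
```

Each generated file can then be run with the same application and evaluated in the same way, so that only the chosen setting differs between runs.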
The Lemur Project
Last modified: Tuesday, 26-Jul-2005 20:41:20 EDT