Lemur 4.7 release notes

4.7 (Jun 23, 2008)

(Older versions: 1.1, 1.9, 2.0, 2.1, 2.2, 3.0, 3.1, 4.0, 4.1, 4.2, 4.3, 4.4, 4.5, 4.6)


Bugs Fixed (4.7):

(see SourceForge for the complete tickets):

Bug #   Issue
1866819 docPool used uninitialized, producing an error on VS2005. Initialize to NULL.
1866820 SimpleKLQueryModel loads OOV terms causing segfault. If an OOV term is put into the model, the retrieval step will fetch the document list for it, receiving a NULL, causing a segfault.
1866821 String.hpp fails to compile with GCC 4.1. The ostream operators need a second declaration outside of the String class and inside the namespace.
1866915 CORI retrieval method produces scores > 1.0. The value of rmax for score adjustment does not take into account the weight of each query term.
1878939 UTF8CaseNormalizationTransformation::transform doesn't copy. The buf_index is not advanced as new characters are copied.
1893631 harvestlinks yields incorrect anchor text. The HTMLParser does not adjust the token counters after inserting the original URL into the document's terms vector, so subsequent access to the anchor text terms drifts backwards through the terms array as hrefs are processed.
1893632 harvestlinks strips URL parameters. HTMLParser::normalizeURL removes the parameters from the URL.
1893637 Combiner produces duplicate entries. When running harvestlinks, the Combiner will process the final entry in a file twice, producing duplicated output and incorrect link counts.
1897910 backward indexing empty metadata yields Exception. addDocument needs to test the value to ensure that it is not empty before trying to add it to the reverse lookup table.
1907126 LemurIndriIndex fails to open repository with extension. If the Indri repository has a file extension, e.g., CACM.index, LemurIndriIndex will fail to open the repository because it strips the extension from the path.
1911091 Windows lemur.lib missing dependencies in installer. The combined lemur.lib built for the binary install on Windows does not include the contrib libraries: zlib.lib, antlr.lib, xpdf.lib. This can cause linking of projects built against lemur.lib to fail with undefined externals.
1911208 wildcard operator fails on numerics. The query grammar has been updated to allow numerics to appear in wildcard expressions.
1913665 keyfile: **prefix_simple_insert failed in replace_max_key. When using long keys, such as URLs, the size needed to insert a key can be overestimated, causing prefix_simple_insert to fail.
1926060 path queries fail to retrieve documents due to the TagList not setting the parent field for the first tag in the outermost containing scope.
1927219 nexi language queries yield default scores. The ShrinkageBeliefNode is using the wrong Extent constructor, passing a weight of 1, which is being interpreted as the begin. The signature mismatch was introduced by constructor changes in Extent when the parent field was added.
1927244 nexi language queries yield incorrect scores when a query term occurs in the last inner field of the document.
1927493 "about" cannot appear as a term in a nexi query. Add a special case to rawText to enable the ABOUT keyword to appear as a term.
1937678 Adding an index twice to a QueryEnvironment. If an index is added twice to a query environment (a logical error), it is impossible to remove both instances via the removeIndex API. Change addIndex and addServer to silently ignore an entry that has already been added.
1950814 Out-of-bounds memory accesses in Krovetz stemmer. Several suffix rules access word[j-1] without regard to the possibility that j == 0.
1988909 bits/atomicity.h not found for GCC 4.2+. Configure and atomic.hpp updated to find ext/atomicity.h.
1993141 Memory leak in LemurIndriIndex plugged.
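
Several of the fixes above follow familiar defensive patterns. The following is a minimal sketch of three of them, not the actual Lemur or Indri code: guarding a word[j-1] access when j == 0 (1950814), silently ignoring a repeated add so that one remove clears the entry (1937678), and skipping an empty value before inserting it into a reverse lookup table (1897910). All names here (precededByVowel, IndexRegistry, addReverseLookup) are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not the Krovetz stemmer's code): reports whether the
// character before position j is a vowel. The j == 0 case is rejected first,
// so word[j - 1] is never read out of bounds.
bool precededByVowel(const std::string& word, std::size_t j) {
    if (j == 0 || j > word.size()) return false;  // nothing precedes position 0
    assert(j - 1 < word.size());                  // invariant after the guard
    const char c = word[j - 1];
    return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
}

// Hypothetical registry (not the QueryEnvironment API): adding a path that is
// already present is silently ignored, so one removeIndex call clears it.
class IndexRegistry {
public:
    void addIndex(const std::string& path) {
        if (std::find(paths_.begin(), paths_.end(), path) == paths_.end())
            paths_.push_back(path);
    }
    void removeIndex(const std::string& path) {
        paths_.erase(std::remove(paths_.begin(), paths_.end(), path),
                     paths_.end());
    }
    std::size_t size() const { return paths_.size(); }

private:
    std::vector<std::string> paths_;
};

// Hypothetical reverse lookup (metadata value -> document id): empty values
// are skipped instead of being inserted. Returns true if the value was added.
bool addReverseLookup(std::map<std::string, int>& table,
                      const std::string& value, int docId) {
    if (value.empty()) return false;
    table[value] = docId;
    return true;
}
```

In this sketch, calling addIndex("CACM.index") twice leaves one entry in the registry, and a single removeIndex("CACM.index") empties it.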

Known Problems:

This is a list of bugs and known problems with the current version of Lemur (4.7) and Indri (2.7). Many problems have fixes or workarounds that are posted on the Lemur Forums. There may also be open bug tickets on SourceForge; see https://sourceforge.net/tracker/?group_id=161383&atid=819615 for the complete tickets. Please check there if you do not see something here.

Enhancements (4.7):