Lemur 4.6 release notes
4.6 (Dec 21, 2007)
- 4.6 includes changes to the keyed file package that are not backwards compatible with previous versions. All KeyfileIncIndexes and Indri indexes will need to be rebuilt.
- 4.6 corrects various issues in the 4.5 distribution package and fixes 20 bugs in the 4.5 release. See the bug fix list below for complete details.
- Applications compiled with the Lemur Toolkit require the following libraries: z, iberty, pthread, and m on Linux, and additionally socket and nsl on Solaris. Applications built in Visual Studio require the additional library wsock32.lib. The Java jar files were built with Java 5 (JDK 1.5.0). We have tested using GCC 3.2 (Solaris), 3.2.2 (Linux), 3.4 (Linux), 3.4.3 (Linux x86_64), 4.0.2 (Linux), VC++ .NET 7.1 (Windows XP), and Visual Studio 2005 (Windows XP).
Bugs Fixed (4.5):
-
Problem: BUG# 1756203 -- #uw/#band fail to retrieve all documents.
Solution: Don't cast a signed value to unsigned.
-
Problem: BUG# 1756205 -- DocumentStructureNode loads structure for a
nonexistent document.
Solution: Don't try to load the structure for the docid equal to the maximum document in the index.
-
Problem: BUG# 1756210 -- dumpindex merge yields broken index.
Solution: Correctly set the value of maximumDocument when merging indexes in IndexWriter.
-
Problem: BUG# 1756815 -- Uninitialized variable in NexiParser.cpp
Solution: Initialize the variable.
-
Problem: BUG# 1788458 -- MMRSummApp always generates autoquery
Solution: The test for the query parameter was inverted.
-
Problem: BUG# 1788462 -- _writeFieldList has infinite loop with deleted
documents
Solution: Advance the iterator when a deleted document is encountered.
-
Problem: BUG# 1791557 -- Cluster db type parameter misnamed in docs
Solution: Give the correct parameter name in the documentation string.
-
Problem: BUG# 1792297 -- TFIDFDocRep does not initialize document length
Solution: Initialize document length.
-
Problem: BUG# 1794672 -- merge indexes hangs in infinite loop
Solution: Update the decoded integer before throwing an Exception for a too small buffer.
-
Problem: BUG# 1801211 -- key corruption with prefix compression in
keyfile
Solution: delete_keys should reset the block prefix_lc to 0 when it empties a block. [NB superseded by keyfile package update below]
-
Problem: BUG# 1803680 -- OfflineCluster dies w/ bus error
Solution: The Parameters::get for numeric values did not validate the input string. Have it throw an Exception when given an empty string.
-
Problem: BUG# 1804135 -- workSetFile parameter should be workingSetFile
Solution: Give the correct parameter name in the documentation.
-
Problem: BUG# 1815896 -- URL tokenization breaks harvestlinks
Solution: Insert the untokenized URL into terms before inserting the tokenized URL.
-
Problem: BUG# 1821826 -- Segfault when creating ParsedDocument from Java
Solution: Update the swig typemap to correctly create an input ParsedDocument parameter.
-
Problem: BUG# 1831157 -- Merging KeyfileIncIndex inverted list segments
loses data
Solution: Correctly invert operator<
-
Problem: BUG# 1833118 -- Arabic stemmer strips [A-Za-z0-9]
Solution: Pass ASCII-encoded characters below 105 through unchanged.
-
Problem: BUG# 1841795 -- Memory corruption when indexing large files
Solution: Make TaggedDocumentIterator more conservative with respect to growing its buffer, reducing memory usage. Add space to the allocations made by DocListMemoryBuilder and DocListExtentMemory builder to ensure sufficient space is available for the new entry.
-
Problem: BUG# 1843450 -- FixedPassageNode causes infinite loop when
nested
Solution: Ensure that one of extents or _subextents is advanced on each iteration of the loop in hasMatch.
-
Problem: IndriTimer.hpp doesn't #define INDRI_TIMER_HPP.
Solution: #define INDRI_TIMER_HPP in IndriTimer.hpp.
-
Problem: ParamGet(String&, double&) returns an incorrect default value.
Solution: Change default from -1 to -1.0 to get proper p->get method called.
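The fix for BUG# 1756203 above ("Don't cast a signed value to unsigned") names a common C++ pitfall. The notes do not show the offending code, so the following is only a generic illustration with hypothetical function names, not the Indri source: a negative difference wraps to a very large value when cast to unsigned, so a window test that should pass fails.

```cpp
// Hypothetical helpers, not taken from Indri: does the gap between two
// positions fit inside a window of the given width?

// Buggy variant: a negative gap cast to unsigned wraps around to a huge
// value, so the comparison against the window width fails.
inline bool fitsWindowBuggy(int begin, int end, unsigned window) {
    return static_cast<unsigned>(end - begin) <= window;
}

// Fixed variant: keep the arithmetic signed so a negative gap stays negative.
inline bool fitsWindowFixed(int begin, int end, int window) {
    return (end - begin) <= window;
}
```

With begin = 5, end = 3 and a window of 10, the buggy variant rejects the pair while the signed variant accepts it.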
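The fix for BUG# 1831157 above mentions inverting operator<. One common situation where that is deliberate is a k-way merge driven by std::priority_queue, which is a max-heap: to pop the smallest key first, the comparison has to be reversed. The sketch below assumes a hypothetical SegmentHead type and is not the KeyfileIncIndex code; the notes do not say that this is the exact shape of the fix.

```cpp
#include <queue>
#include <vector>

// Hypothetical record for the current head of one inverted list segment.
struct SegmentHead {
    int termID;   // key being merged on
    int segment;  // which segment the entry came from
};

// Deliberately inverted comparison: std::priority_queue keeps the "largest"
// element on top, so reversing the order makes the smallest termID pop first.
// Ties are broken by segment number so the ordering is strict.
inline bool operator<(const SegmentHead& a, const SegmentHead& b) {
    if (a.termID != b.termID) return a.termID > b.termID;
    return a.segment > b.segment;
}

// Returns the termIDs in the order a merge loop would consume them.
inline std::vector<int> mergeOrder(const std::vector<SegmentHead>& heads) {
    std::priority_queue<SegmentHead> pq(heads.begin(), heads.end());
    std::vector<int> order;
    while (!pq.empty()) {
        order.push_back(pq.top().termID);
        pq.pop();
    }
    return order;
}
```

If the comparison were left in its natural direction, the queue would hand back the largest termID first and a merge that expects ascending keys would skip or misplace entries.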
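The ParamGet fix above turns on C++ overload resolution: the literal -1 has type int, while -1.0 has type double, so the two select different overloads of a getter. A minimal sketch of that effect, assuming a hypothetical DemoParams type whose overloads only report which one ran (it is not the actual Lemur or Indri parameter class):

```cpp
#include <string>

// Hypothetical stand-in for a parameter store with overloaded getters.
// Each overload returns a tag naming itself so the choice is observable.
struct DemoParams {
    std::string get(const std::string& /*name*/, int /*def*/) const {
        return "int overload";
    }
    std::string get(const std::string& /*name*/, double /*def*/) const {
        return "double overload";
    }
};

// The default -1 is an int, so get(name, int) is the better match.
inline std::string lookupWithIntLiteral(const DemoParams& p) {
    return p.get("weight", -1);
}

// The default -1.0 is a double, so get(name, double) is selected.
inline std::string lookupWithDoubleLiteral(const DemoParams& p) {
    return p.get("weight", -1.0);
}
```

The parameter name "weight" is arbitrary; only the type of the default argument decides which overload is called.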
Known Problems:
This is a list of bugs and known problems with the current version of Lemur (4.6) and Indri (2.6). Many problems have fixes or workarounds that are posted on the forum. There may also be open bug tickets on SourceForge; see https://sourceforge.net/tracker/?group_id=161383&atid=819615 for the complete tickets. Please check there if you do not see something here.
- No known problems currently exist.
Enhancements (4.6):
-
Feature Request 1840290:
PRF at the passage level.
Modify the RMExpander to use the RelevanceModel class, enabling passage level pseudo-relevance feedback. If the query to expand contains an extent restriction, the resultant expanded query will also contain that extent restriction.
-
Feature Request 1760464:
Windows linking fix.
Add the /Zl (Omit default library name) flag to the VS 2005 library project files.
-
Keyfile: keyfile package update.
-
FileClassEnvFactory: add trecchar file class to provide character at a
time indexing of trec format documents.
-
Query evaluation/annotation speed improvements (indri query language):
- The children of UnorderedWindowNode now get sorted at the beginning of evaluation which can lead to a quicker exit from the node evaluation if any of the terms are not found in the children.
- Nodes which can have extent parent, child or sibling interactions now have optimizations (which are rank-preserving) for speeding up queries via an early exit if the parent/child relationship criteria are not met during the query.
-
Heritrix Web Crawler in site search component has been updated to version 1.12.1.
-
Field support for the lemur::api::Index class (to get at Indri
fields from the Lemur API in addition to using the Indri API).
-
Java UI Updates: A new tab has been added to both the Lemur Indexing UI and the
Indri Indexing UI to allow named fields to be added to the index.
Furthermore, there are now options to add offset annotation files for each of the
input data files. There is also a field for adding anchor text data
(as gathered via harvestLinks). On the Lemur Indexing UI, a new field has also been
added to allow a list of metadata fields to be indexed.
The Lemur Project
Last modified: December 19, 2007. 13:59:35 pm

