- Due to the changes required for UTF-8 language-independent tokenization
and for byte offset processing of offset annotations, an indri Repository
built with this version is not backwards compatible with Lemur version 4.1.
All indri indexes must be rebuilt to ensure correct behavior.
- Applications compiled with the Lemur Toolkit require the following
libraries: z, iberty, pthread, and m on Linux, plus socket and nsl on
Solaris. Applications built in Visual Studio additionally require
wsock32.lib.
- We have tested using GCC 3.2 (Solaris), 3.2.2 (Linux), 3.4 (Linux),
3.4.3 (Linux x86_64), 4.0.2 (Linux), and VC++ .NET 7.1 (Windows XP).
- Enhancements:
- Lemur 4.2 only
- Additional support for INEX task activities, including Ogilvie's
hierarchical shrinkage model. See INEX task support for information about
the INEX task, and see IndriRunQuery for a description of the INEX task
output parameters.
- GCC 4.0 and Mac OS X build support. The Lemur Toolkit builds out of the
box with GCC 4.0.x and on Mac OS X, though Mac OS X has received only
limited testing.
- Support for metadata annotations when constructing an indri repository.
See metadata annotations for details.
- Offset annotation support now permits byte offsets, rather than token
offsets, in the annotations file.
- Support for parsing UTF-8 encoded text in indri repositories, including
tokenization based on UTF-8 character classes.
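The UTF-8 tokenization and byte offset enhancements can be illustrated with a small C++ sketch. It is not the indri tokenizer: the names Token, utf8Length, isWordChar, and tokenize are hypothetical, and the character classes are deliberately simplified (ASCII letters and digits plus any multi-byte character count as word characters). It shows why byte offsets and character offsets diverge once a token contains multi-byte characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token record: the token text plus byte offsets into the input.
struct Token {
  std::string text;
  std::size_t begin;  // byte offset of the first byte
  std::size_t end;    // byte offset one past the last byte
};

// Length in bytes of the UTF-8 sequence that starts with lead byte c.
// Invalid lead bytes are treated as 1 byte so scanning always advances.
static std::size_t utf8Length(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 1;
}

// Simplified character class (an assumption, not indri's rules):
// ASCII letters/digits and every multi-byte character are word characters.
static bool isWordChar(const std::string& s, std::size_t pos, std::size_t len) {
  if (len > 1) return true;
  unsigned char c = static_cast<unsigned char>(s[pos]);
  return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
         (c >= 'a' && c <= 'z');
}

// Splits text into tokens, recording byte (not character) offsets.
std::vector<Token> tokenize(const std::string& text) {
  std::vector<Token> tokens;
  std::size_t pos = 0;
  std::size_t start = 0;
  bool inToken = false;
  while (pos < text.size()) {
    std::size_t len = utf8Length(static_cast<unsigned char>(text[pos]));
    if (pos + len > text.size()) len = text.size() - pos;  // truncated sequence
    bool word = isWordChar(text, pos, len);
    if (word && !inToken) {
      start = pos;
      inToken = true;
    } else if (!word && inToken) {
      tokens.push_back({text.substr(start, pos - start), start, pos});
      inToken = false;
    }
    pos += len;
  }
  if (inToken) tokens.push_back({text.substr(start), start, text.size()});
  return tokens;
}
```

For the input "caf\xC3\xA9 au lait" (café au lait), this yields café at bytes [0, 5), au at [6, 8), and lait at [9, 13). The first token has four characters but spans five bytes because é is encoded in two bytes, which is exactly the case where token or character offsets and byte offsets disagree.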
- Bugs Fixed:
- Problem: SimpleKLRetMethod does not initialize the adjustedScoreMethod
parameter in the score function.
Solution: Initialize the value to its default.
- Problem: The parsers do not index the *eos token, contrary to the
documentation for the summarization component.
Solution: Modify the parsers to pass the *eos token along the TextHandler
chain.
- Problem: DistRetEval writes bogus output to the rankings file for some
queries.
Solution: Write only rankings.size() elements to the rankings file, rather
than doccount elements. If a collection contains none of the query terms,
it has no entry in rankings, so writing doccount elements reads
uninitialized memory.
- Problem: DistSearchMethod ignores the retrieval method parameter.
Solution: Modify DistSearchMethod to use the retrieval method parameter
value when creating the model.
- Problem: When a collection is split across multiple indri indexes, some
queries may yield incorrect results.
Solution: Have TermFrequencyBeliefNode return the score for 0 occurrences
when a term is not present in a given index but does occur in the set of
indexes used by the QueryEnvironment.
- Problem: In SequentialReadBuffer, a read larger than the buffer size
returns only a buffer's worth of data, so the file is underread.
Solution: When calling cache, use the maximum of the requested length and
the buffer size, rather than the buffer size alone.
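The DistRetEval fix amounts to bounding the output loop by the number of rankings actually produced rather than by the number of collections. The following is a minimal sketch, assuming a hypothetical Ranking type of (collection id, score) pairs and a hypothetical writeRankings function with an invented one-line-per-entry format; the real DistRetEval code and rankings file format may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical (collection id, score) pair. A collection that contains none
// of the query terms gets no entry, so rankings.size() can be < doccount.
using Ranking = std::pair<std::string, double>;

// Writes one line per ranked collection. The loop is bounded by
// rankings.size(), never by the total collection count, so it cannot read
// past the end of the vector.
void writeRankings(std::ostream& out, int queryId,
                   const std::vector<Ranking>& rankings) {
  for (std::size_t i = 0; i < rankings.size(); ++i) {
    out << queryId << ' ' << rankings[i].first << ' ' << rankings[i].second
        << '\n';
  }
}
```

With a hypothetical three-collection setup where only one collection matches the query, the loop emits a single line instead of three, two of which would have come from memory outside the vector.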
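The TermFrequencyBeliefNode fix relies on a smoothed belief that stays defined when a term has zero occurrences in one index. This sketch assumes Dirichlet smoothing (indri's usual default, with mu = 2500) and collection statistics aggregated over every index in the QueryEnvironment; dirichletBelief and scoreAbsentTerm are hypothetical names, not indri API.

```cpp
#include <cassert>
#include <cmath>

// Dirichlet-smoothed log belief (an assumed scoring rule for illustration):
//   log((tf + mu * P(term | collection)) / (|D| + mu))
// where the collection statistics span all indexes being queried.
double dirichletBelief(double termFrequency, double documentLength,
                       double collectionTermFrequency, double collectionLength,
                       double mu = 2500.0) {
  double collectionProb = collectionTermFrequency / collectionLength;
  return std::log((termFrequency + mu * collectionProb) /
                  (documentLength + mu));
}

// Score for a document in an index that lacks the term entirely. Instead of
// skipping the term, return the 0-occurrence score, which is finite as long
// as the term occurs somewhere in the combined collection.
double scoreAbsentTerm(double documentLength, double collectionTermFrequency,
                       double collectionLength) {
  assert(collectionTermFrequency > 0.0);  // term occurs in some index
  return dirichletBelief(0.0, documentLength, collectionTermFrequency,
                         collectionLength);
}
```

Because the collection probability comes from all indexes, a document in an index without the term still receives a finite score on the same scale as documents in other indexes, rather than no score at all.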
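The SequentialReadBuffer fix can be shown with a simplified, in-memory stand-in. SequentialReadBufferSketch, read, and its cache helper are hypothetical; the real class reads from a file, but the sizing rule is the one described above: cache max(length, bufferSize) bytes so that a request larger than the buffer is not truncated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical buffered reader over an in-memory "file".
class SequentialReadBufferSketch {
 public:
  SequentialReadBufferSketch(const std::vector<char>& file,
                             std::size_t bufferSize)
      : file_(file), bufferSize_(bufferSize) {}

  // Copies up to `length` bytes starting at `position` into `dest`.
  // Returns the number of bytes copied (fewer only at end of file).
  std::size_t read(char* dest, std::size_t position, std::size_t length) {
    assert(position <= file_.size());
    bool covered = position >= cacheStart_ &&
                   position + length <= cacheStart_ + cache_.size();
    if (!covered) {
      // The fix: use max(length, bufferSize) instead of bufferSize alone.
      cache(position, std::max(length, bufferSize_));
    }
    std::size_t available = cacheStart_ + cache_.size() - position;
    std::size_t n = std::min(length, available);
    if (n > 0) {
      std::memcpy(dest, cache_.data() + (position - cacheStart_), n);
    }
    return n;
  }

 private:
  // Loads up to `length` bytes starting at `position` into the cache.
  void cache(std::size_t position, std::size_t length) {
    std::size_t end = std::min(file_.size(), position + length);
    cacheStart_ = position;
    cache_.assign(file_.begin() + position, file_.begin() + end);
  }

  std::vector<char> file_;
  std::size_t bufferSize_;
  std::size_t cacheStart_ = 0;
  std::vector<char> cache_;
};
```

With a 16-byte buffer, a 64-byte read now returns all 64 bytes; with the original sizing, only 16 would have been cached and copied.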