- We have tested using gcc 3.2.2, gcc 3.2.3, and VC++ .NET.
- This version is backwards compatible with Lemur version 3.0, except that indexes using keyfiles need to be rebuilt. This includes KeyfileIncIndex, KeyfileDocMgr, and ElemDocMgr. For information about deprecations and other code changes, see the "Code Update" section below.
- New Applications:
- BuildIndex - This application replaces BuildInvertedIndex and BuildKeyfileIncIndex. It is capable of building an Inv(FP) index, a KeyfileIncIndex, or an IndriIndex.
- IndriBuildIndex - This application builds an IndriIndex using the IndriIndex API.
- IndriRunQuery - This application runs retrieval evaluation using the Indri Structured Query Language, with smoothing options.
- IndriDaemon - This application opens an Indri Repository and listens on a socket for requests from an Indri query client, and processes them as they arrive.
- dumpTerm - Convenience application that opens an index and dumps out the DocInfoList (inverted list) for a given term or term ID.
- dumpDoc - Convenience application that opens an index and dumps out the TermInfoList for a given document ID.
- Additions, Enhancements, and other changes:
- New IndriIndex that can handle very large amounts of data (e.g., the TREC Terabyte collection). Using IndriBuildIndex, a repository can be built and stored across multiple machines. IndriBuildIndex supports trec, web, MS doc, and pdf formats. Lemur users with existing TextHandlers can use BuildIndex to build an IndriIndex. This new index also has a built-in document manager, so a separate DocumentManager is not necessary to retrieve original documents later. IndriIndex compresses and stores a copy of the original documents in its own repository.
- Also new is the Indri Structured Query Language with full field support, e.g., searching titles and dates. This retrieval method can currently be used only on top of an IndriIndex. For more information about this query language, please view Indri Structured Query Retrieval.
- Inclusion of IndriRetMethod for use with the Lemur Retrieval GUI.
- Improved keyfile class uses machine-independent byte ordering. This supports moving indexes that use keyfile, e.g., KeyfileIncIndex, from one machine to another, regardless of which platform they were originally built on.
- Java GUI for Lemur Indexing.
- Addition of solution (Lemur.sln) and project (.vcproj) files for Visual C++ .NET support.
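The machine-independent byte ordering mentioned above for the keyfile class can be illustrated with a minimal sketch. This is not the toolkit's Keyfile code: `putUInt32` and `getUInt32` are hypothetical helper names, and the choice of big-endian layout is an assumption made only for the example. The point is that values are written byte by byte in a fixed order, so the stored bytes do not depend on the host platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not Lemur's Keyfile implementation.
// Write a 32-bit value in a fixed big-endian layout (an assumed choice),
// most significant byte first, regardless of the host byte order.
void putUInt32(unsigned char* out, std::uint32_t value) {
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Read the value back by reassembling the bytes in the same fixed order.
std::uint32_t getUInt32(const unsigned char* in) {
    assert(in != nullptr);
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

Because both functions use shifts rather than copying the in-memory representation, a buffer written on one platform decodes to the same value on another.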
- Code Update:
- This version of Lemur has a new directory called parsing. Many of the files that support parsing elements have been moved to this directory from the utility directory, along with new parsing classes for the indri index. Files that remain in the utility directory are for general supporting classes.
- Here is a list of deprecations. These are things that no longer exist in the toolkit:
- Deprecated: class BasicIndex and its supporting classes (BasicDocInfoList, BasicTermInfoList, IndexCount, IndexProb, Array, Compress, GammaCompress, FastList, FLL, Int, List, LL, MemList, Number, Timer)
Replaced with: use another index (IndriIndex, KeyfileIncIndex, InvFPIndex, or InvIndex)
- Deprecated: cluster/FreqCounter
Replaced with: renamed to FloatFreqVector
- Deprecated: application BuildInvertedIndex
Replaced with: application BuildIndex
- Deprecated: application BuildKeyfileIncIndex
Replaced with: application BuildIndex
- Deprecated: application BuildBasicIndex
Replaced with: the BasicIndex has been deprecated; use BuildIndex
- Deprecated: .mak Windows makefiles
Replaced with: use .sln and .vcproj files with .NET
- Here is a list of things that will be deprecated in the near future:
- Will be deprecated: class InvDocInfo
Replaced with: class DocInfo
- Will be deprecated: class QueryDocument
Replaced with: class StringQuery
- Will be deprecated: sequential iteration methods in DocInfoList and TermInfoList (startIteration, hasMore, nextEntry)
Replaced with: STL style external iterators
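The difference between the sequential iteration methods and STL style external iterators can be sketched with a toy class. `ToyList` below is a hypothetical stand-in and does not reproduce the DocInfoList or TermInfoList interfaces; only the method names startIteration, hasMore, and nextEntry are taken from the list above, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a list class; NOT the Lemur DocInfoList interface.
class ToyList {
public:
    typedef std::vector<int>::const_iterator iterator;

    explicit ToyList(const std::vector<int>& ids) : ids_(ids), pos_(0) {}

    // Sequential style: the cursor is stored inside the list object.
    void startIteration() { pos_ = 0; }
    bool hasMore() const { return pos_ < ids_.size(); }
    int nextEntry() {
        assert(hasMore());
        return ids_[pos_++];
    }

    // STL style: each caller holds its own iterator.
    iterator begin() const { return ids_.begin(); }
    iterator end() const { return ids_.end(); }

private:
    std::vector<int> ids_;
    std::size_t pos_;
};

// Traversal with the sequential methods; needs a non-const list
// because iterating changes the list's internal cursor.
int sumSequential(ToyList& list) {
    int total = 0;
    list.startIteration();
    while (list.hasMore()) {
        total += list.nextEntry();
    }
    return total;
}

// Traversal with external iterators; works on a const list.
int sumWithIterators(const ToyList& list) {
    int total = 0;
    for (ToyList::iterator it = list.begin(); it != list.end(); ++it) {
        total += *it;
    }
    return total;
}
```

With external iterators the traversal state lives in the caller, so several traversals of the same list can be in progress at once and the list itself can be const.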
- Bugs Fixed:
- Problem: Inv(FP)Index and KeyfileIncIndex try to construct a string from NULL when a call to term or document is out of range
Solution: Replace creation of NULL string with "" (empty) string
- Problem: StructQueryEval proximity operators fail if no position information is available
Solution: Change operators to execute #band operator and emit warning
- Problem: Keyfile::getNext does not return key length or null terminate the key
Solution: Fix to null terminate key, add 2nd method to return key length
- Problem: FlattextDocMgr and KeyfileDocMgr can't handle file names that contain spaces
Solution: Replace use of >> with use of getline
- Problem: InvIndex::docLengthCounted returns wrong value
Solution: Fix to return correct value
- Problem: QryBasedSampler cannot create new queries
Solution: Fix LemurDBManager to return the same parser with each call
- Problem: InQueryRetMethod SQL ordered distance operator does not work as expected
Solution: Fix syntax error that caused logic error
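Two of the fixes above (the NULL string fix and the file name fix) rely on standard C++ behavior and can be shown in isolation. The sketch below is not the toolkit's code: `safeString`, `readNameWithExtraction`, and `readNameWithGetline` are hypothetical helper names chosen for illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: constructing std::string from a null pointer is
// undefined behavior, so a null pointer is mapped to the empty string.
std::string safeString(const char* text) {
    return text ? std::string(text) : std::string();
}

// operator>> skips leading whitespace and stops at the next whitespace
// character, so a file name containing spaces is cut short.
std::string readNameWithExtraction(std::istream& in) {
    std::string name;
    in >> name;
    return name;
}

// std::getline reads up to the newline and discards it, keeping any
// spaces inside the line.
std::string readNameWithGetline(std::istream& in) {
    std::string name;
    std::getline(in, name);
    return name;
}
```

For an input line `my docs/file one.txt` wrapped in a `std::istringstream`, `readNameWithExtraction` returns `my`, while `readNameWithGetline` returns the whole line.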
