Lemur 3.1 release notes


 
(Older versions: 1.1, 1.9, 2.0, 2.1, 2.2, 3.0)

  • We have tested with gcc 3.2.2, gcc 3.2.3, and VC++ .NET.

  • This version is backwards compatible with Lemur version 3.0, except that indexes using keyfiles need to be rebuilt. This includes KeyfileIncIndex, KeyfileDocMgr, and ElemDocMgr. For information about deprecations and other code changes, see the "Code Update" section below.

  • New Applications:
    • BuildIndex - This application replaces BuildInvertedIndex and BuildKeyfileIncIndex. It is capable of building an Inv(FP) index, a KeyfileIncIndex, or an IndriIndex.
    • IndriBuildIndex - This application builds an IndriIndex using the IndriIndex API.
    • IndriRunQuery - This application runs retrieval evaluation using the Indri Structured Query Language (SQL), with smoothing options.
    • IndriDaemon - This application opens an Indri Repository and listens on a socket for requests from an Indri query client, and processes them as they arrive.
    • dumpTerm - Convenience application that opens an index and dumps out the DocInfoList (inverted list) for a given term or term ID.
    • dumpDoc - Convenience application that opens an index and dumps out the TermInfoList for a given document ID.

  • Additions, Enhancements, and other changes:
    • New IndriIndex that can handle very large amounts of data (e.g., the TREC Terabyte collection). Using IndriBuildIndex, a repository can be built and stored across multiple machines. IndriBuildIndex supports TREC, web, MS doc, and PDF documents. Lemur users with existing TextHandlers can use BuildIndex to build an IndriIndex.
      This new index also has a document manager built in so a separate DocumentManager is not necessary to retrieve original documents later. IndriIndex compresses and stores a copy of the original documents in its own repository.
    • Also new is the Indri Structured Query Language (SQL) with full field support, e.g., searching within titles and dates. This retrieval method can currently be used only on top of an IndriIndex. For more information about this query language, please see Indri Structured Query Retrieval.
    • Inclusion of IndriRetMethod for use with the Lemur Retrieval GUI.
    • Improved keyfile class uses machine-independent byte ordering. This supports moving indexes that use keyfiles (e.g., KeyfileIncIndex) from one machine to another, regardless of the platform they were originally built on.
    • Java GUI for Lemur Indexing.
    • Addition of solution (Lemur.sln) and project (.vcproj) files for Visual C++ .NET support.
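
The machine-independent byte ordering mentioned for the keyfile class can be illustrated with a minimal sketch. This is not the keyfile implementation: the helper names put_u32_be and get_u32_be are hypothetical, and the choice of big-endian order is an assumption made only for the example. The idea is that integers are written one byte at a time in a fixed order, so the stored bytes are identical on every platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the Lemur keyfile API.
// Write a 32-bit value as four bytes, most significant byte first.
// Shifts operate on the value, not on memory, so the result does not
// depend on the byte order of the machine running the code.
inline void put_u32_be(std::uint32_t value, unsigned char* out) {
    assert(out != 0);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Read back a value written by put_u32_be.
inline std::uint32_t get_u32_be(const unsigned char* in) {
    assert(in != 0);
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
            static_cast<std::uint32_t>(in[3]);
}
```

An index file written this way on one platform can be read on another, because neither side ever copies the in-memory representation of an integer directly to or from disk.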

  • Code Update:
    • This version of Lemur has a new directory called parsing. Many of the files that support parsing have been moved to this directory from the utility directory, along with new parsing classes for the Indri index. Files that remain in the utility directory are general supporting classes.
    • Here is a list of deprecations. These are things that no longer exist in the toolkit:
      Deprecated: class BasicIndex and its supporting classes (BasicDocInfoList, BasicTermInfoList, IndexCount, IndexProb, Array, Compress, GammaCompress, FastList, FLL, Int, List, LL, MemList, Number, Timer)
        Replaced with: another index (IndriIndex, KeyfileIncIndex, InvFPIndex, or InvIndex)
      Deprecated: cluster/FreqCounter
        Replaced with: FloatFreqVector (renamed)
      Deprecated: application BuildInvertedIndex
        Replaced with: application BuildIndex
      Deprecated: application BuildKeyfileIncIndex
        Replaced with: application BuildIndex
      Deprecated: application BuildBasicIndex
        Replaced with: application BuildIndex (the BasicIndex has been deprecated)
      Deprecated: .mak Windows makefiles
        Replaced with: .sln and .vcproj files with .NET


    • Here is a list of things that will be deprecated in the near future:
      Will be deprecated: class InvDocInfo
        Replaced with: class DocInfo
      Will be deprecated: class QueryDocument
        Replaced with: class StringQuery
      Will be deprecated: sequential iteration methods in DocInfoList and TermInfoList (startIteration, hasMore, nextEntry)
        Replaced with: STL style external iterators
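
The planned move from the sequential iteration methods (startIteration, hasMore, nextEntry) to STL style external iterators can be sketched as follows. The PostingList and Posting types and both totalCount functions are hypothetical stand-ins written for this example; they are not the Lemur DocInfoList or TermInfoList classes, whose actual signatures may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entry type, standing in for one element of an inverted list.
struct Posting {
    int docID;
    int count;
};

// Hypothetical list offering both iteration styles side by side.
class PostingList {
public:
    typedef std::vector<Posting>::const_iterator const_iterator;

    explicit PostingList(const std::vector<Posting>& entries)
        : entries_(entries), cursor_(0) {}

    // Sequential style: the position is hidden state inside the list,
    // so only one traversal can be in progress at a time.
    void startIteration() { cursor_ = 0; }
    bool hasMore() const { return cursor_ < entries_.size(); }
    const Posting& nextEntry() {
        assert(hasMore());
        return entries_[cursor_++];
    }

    // STL style: the position lives in the iterator held by the caller,
    // so the list itself can stay const during traversal.
    const_iterator begin() const { return entries_.begin(); }
    const_iterator end() const { return entries_.end(); }

private:
    std::vector<Posting> entries_;
    std::size_t cursor_;
};

// Sum the counts using the sequential methods.
inline int totalCountSequential(PostingList& list) {
    int total = 0;
    list.startIteration();
    while (list.hasMore()) {
        total += list.nextEntry().count;
    }
    return total;
}

// Sum the counts using external iterators.
inline int totalCountIterators(const PostingList& list) {
    int total = 0;
    for (PostingList::const_iterator it = list.begin(); it != list.end(); ++it) {
        total += it->count;
    }
    return total;
}
```

Code that currently loops with startIteration, hasMore, and nextEntry would be converted to the second form once the sequential methods are removed.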

  • Bugs Fixed:
    1. Problem: Inv(FP)Index and KeyfileIncIndex try to construct string from NULL when call to term or document is out of range
      Solution: Replace creation of NULL string with "" (empty) string
    2. Problem: StructQueryEval proximity operators fail if no position information is available
      Solution: Change operators to execute #band operator and emit warning
    3. Problem: Keyfile::getNext does not return key length or null terminate the key
      Solution: Fix to null terminate key, add 2nd method to return key length
    4. Problem: FlattextDocMgr and KeyfileDocMgr can't handle file names that contain spaces
      Solution: Replace use of >> with use of getline
    5. Problem: InvIndex::docLengthCounted returns wrong value
      Solution: Fix to return correct value
    6. Problem: QryBasedSampler cannot create new queries
      Solution: Fix LemurDBManager to return the same parser with each call
    7. Problem: InQueryRetMethod SQL ordered distance operator does not work as expected
      Solution: Fix syntax error that caused logic error
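
Two of the fixes above (items 1 and 4) follow general C++ patterns that can be shown in a short sketch. The function names toStringOrEmpty, readFirstToken, and readFileName are hypothetical and written only for illustration; they are not the code changed in the toolkit.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Pattern behind fix 1: constructing std::string from a null pointer is
// undefined behavior, so map a null pointer to the empty string instead.
inline std::string toStringOrEmpty(const char* s) {
    return s ? std::string(s) : std::string();
}

// The behavior that fix 4 replaces: operator>> stops at the first
// whitespace character, so a file name containing spaces is cut short.
inline bool readFirstToken(std::istream& in, std::string& name) {
    in >> name;
    return !in.fail();
}

// Pattern behind fix 4: getline reads up to the end of the line, so
// spaces inside the file name are preserved.
inline bool readFileName(std::istream& in, std::string& name) {
    return !std::getline(in, name).fail();
}
```

For an input line such as "my docs/file 1.txt", readFirstToken yields only "my", while readFileName yields the full name.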

The Lemur Project
  Last modified: Wednesday, 20-Jun-2007 12:00:47 EDT