Lemur 1.9 release notes
(Older versions: 1.1)
- Beginning with this version, Lemur no longer supports gcc versions below 3.0.
We have tested using gcc 3.0.1, gcc 3.1, and VC++ 6.0.
- New libraries:
- summarization - The basic Summarizer class provides a generic interface for various summary generation techniques. Two implementations demonstrate how to use this interface: BasicSumm, which implements a simple sentence selection algorithm, and MMRSumm, which implements an MMR (Maximal Marginal Relevance) algorithm that includes automatic query generation for generic summaries.
- distrib - This library supports distributed information retrieval and query-based sampling. The QryBasedSampler utility class provides an API for building applications that require a query-based sampling component and is an extensible tool for creating descriptions of text databases. DistSearchMethod allows simultaneous search of multiple databases. DistMergeMethod is an abstract API for merging scores from distributed databases. A sample implementation, CORIMergeMethod, is provided.
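To make the MMR idea behind MMRSumm concrete, here is a minimal sketch of greedy Maximal Marginal Relevance selection. It is not taken from the Lemur sources: the function name mmrSelect, its parameters, and the use of precomputed, non-negative similarity values are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Lemur: greedy MMR selection over
// precomputed similarities (assumed to be non-negative).
//   querySim[i]   : similarity of sentence i to the query
//   pairSim[i][j] : similarity between sentences i and j
//   lambda        : 1.0 favors relevance only, 0.0 favors novelty only
//   k             : number of sentences to select
std::vector<std::size_t> mmrSelect(const std::vector<double>& querySim,
                                   const std::vector<std::vector<double> >& pairSim,
                                   double lambda, std::size_t k) {
  std::vector<std::size_t> selected;
  std::vector<bool> used(querySim.size(), false);

  while (selected.size() < k && selected.size() < querySim.size()) {
    bool found = false;
    std::size_t best = 0;
    double bestScore = 0.0;

    for (std::size_t i = 0; i < querySim.size(); ++i) {
      if (used[i]) continue;

      // Redundancy: highest similarity to any sentence already selected.
      double maxRedundancy = 0.0;
      for (std::size_t j = 0; j < selected.size(); ++j) {
        double s = pairSim[i][selected[j]];
        if (s > maxRedundancy) maxRedundancy = s;
      }

      // MMR score: reward relevance, penalize redundancy.
      double score = lambda * querySim[i] - (1.0 - lambda) * maxRedundancy;
      if (!found || score > bestScore) {
        found = true;
        best = i;
        bestScore = score;
      }
    }

    used[best] = true;
    selected.push_back(best);
  }
  return selected;
}
```

With lambda set to 1.0 this reduces to plain ranking by query similarity; lower values trade relevance for coverage of different content.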
- New applications:
- BasicSummApp - simple summarizer
- MMRSummApp - Maximal Marginal Relevance summarization
- DistRetEval - distributed retrieval (rank, search, and merge) using a collection selection index and individual indexes
- CollSelIndex - builds a collection selection index
- QryBasedSample - performs query-based sampling on text databases
- TwoStageRetEval - evaluation of two-stage smoothing algorithms
- EstimateDirPrior - estimates Dirichlet prior smoothing parameter
- Other major additions:
retrieval:
- RetMethodManager creates the appropriate retrievalMethod given a method id (from parameters)
- added two-stage smoothing method to SimpleKLDocModel
- CORIRetMethod added for collection ranking and for individual index retrieval to be used with CORIMergeMethod
index:
- DocumentManager added as abstract API for retrieving document sources after indexing. The Index abstract API was modified to allow retrieval of the DocumentManager associated with a document.
A sample class, FlattextDocMgr, is provided and can be used with a PushIndex.
- InvPushIndex and InvIndex added for building and retrieving from a push index that has no positions. PushIndexer application modified to accept a new parameter, position, which controls whether the index is built with or without positions.
- Compression added to both push indexers (for Inv and InvFP indexes). They also support partial loading and a configurable output stream for messages from the indexer. InvFPIndex retrieval maintains backwards compatibility.
utility:
- Document and BasicDocStream updated to allow multiple iterations through the same document and allow the stream to skip to the next document
- new DocScoreVector data structure added to store a list of document ids and scores.
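As a rough picture of what a list of document ids and scores looks like in use, here is a small stand-in container. It is not Lemur's DocScoreVector: the class ScoredDocList and its methods are hypothetical names chosen for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in, not Lemur's DocScoreVector class.
typedef std::pair<int, double> DocScore;  // (document id, score)

// Orders entries so that higher scores come first.
bool higherScore(const DocScore& a, const DocScore& b) {
  return a.second > b.second;
}

class ScoredDocList {
 public:
  // Appends one (document id, score) entry.
  void add(int docId, double score) {
    entries_.push_back(DocScore(docId, score));
  }

  // Sorts by descending score; entries with equal scores keep insertion order.
  void sortByScore() {
    std::stable_sort(entries_.begin(), entries_.end(), higherScore);
  }

  std::size_t size() const { return entries_.size(); }

  // Bounds-checked access to the i-th entry.
  const DocScore& at(std::size_t i) const { return entries_.at(i); }

 private:
  std::vector<DocScore> entries_;
};
```

A retrieval loop would typically call add once per scored document and sortByScore once before reporting results.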
- Bugs Fixed:
- Problem: gcc 3.1 issued warnings about deprecated implicit typename usage
Solution: added the explicit typename keyword where needed in the ISet, PSet, and CSet classes
- Problem: when getting bag-of-words (not sequence-of-words) termInfoList from positional push index, position information was lost
Solution: modified InvFPTermList to retain list of positions
- Problem: BasicIndex returned an inaccurate value for total termCount()
Solution: removed [OOV] from the count
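The typename fix in the list above concerns type names that depend on a template parameter. The generic example below shows the pattern; it is not taken from the ISet, PSet, or CSet sources, and countMatches is a hypothetical function used only for illustration.

```cpp
#include <cstddef>
#include <vector>

// Counts how many elements of items are equal to target.
template <typename T>
std::size_t countMatches(const std::vector<T>& items, const T& target) {
  std::size_t n = 0;
  // std::vector<T>::const_iterator depends on the template parameter T,
  // so it must be introduced with the explicit "typename" keyword.
  // Omitting the keyword here is what draws the warning on gcc 3.1.
  for (typename std::vector<T>::const_iterator it = items.begin();
       it != items.end(); ++it) {
    if (*it == target) ++n;
  }
  return n;
}
```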
Last modified: Tue Aug 6 16:13:38 EDT 2002