
lemur::retrieval::PassageRep Class Reference

Passage representation for a document.

#include <PassageRep.hpp>

Inherits lemur::api::DocumentRep.

Public Member Functions

PassageRep (lemur::api::DocumentRep &dRep, int d, int p, int o)
 Fixed-size window passage with overlap.

PassageRep::iterator begin ()
PassageRep::iterator end ()
void setEnd (int s, int e, int dl)
 Update end and length values.

int passageTF (lemur::api::TERMID_T tid, lemur::api::MatchInfo *matches) const
 Term frequency of a term within the current passage.

int getStart () const
 Start of the current passage.

int getEnd () const
 End + 1 of the current passage.

virtual double termWeight (lemur::api::TERMID_T termID, const lemur::api::DocInfo *info) const
 Delegate call to termWeight of the encapsulated DocumentRep.

virtual double scoreConstant () const
 Delegate call to scoreConstant of the encapsulated DocumentRep.


Protected Attributes

lemur::api::DocumentRep & docRep
 DocumentRep for the whole document. Calls to termWeight and scoreConstant are delegated to it.

int psgSize
 Size of the passage, in number of tokens.

int overlap
 Number of tokens to overlap when advancing the passage window.

int docEnd
 Length of the whole document.

int start
 Index of the start of the current passage.

int pEnd
 Index of the end of the current passage.


Detailed Description

Passage representation for a document.

Supports iteration over passages of a fixed window size, with consecutive windows overlapping by K terms. Encapsulates the DocumentRep for the whole document and modifies its docLength attribute. Calls to termWeight and scoreConstant are delegated to the encapsulated DocumentRep. TFIDFRetMethod with BM25 tf weighting and OkapiRetMethod will not compute correct scores for passages, because their formulas use the average document length of the collection. The difference should be small.
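
The windowing described above can be sketched without any Lemur dependency. The helper below is hypothetical (passageWindows is not part of the Lemur API), and it assumes that each step advances the window by the passage size minus the overlap and that the last window is clipped at the document end; the source does not confirm either detail.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Lemur: lists the [start, end) token
// ranges of fixed-size passages over a document of docLen tokens.
// Assumption: each step advances the window by (psgSize - overlap) tokens.
std::vector<std::pair<int, int> > passageWindows(int docLen, int psgSize, int overlap) {
    assert(psgSize > 0 && overlap >= 0 && overlap < psgSize);
    std::vector<std::pair<int, int> > windows;
    const int step = psgSize - overlap;
    for (int start = 0; start < docLen; start += step) {
        // end is one past the last token, clipped to the document length
        const int end = std::min(start + psgSize, docLen);
        windows.push_back(std::make_pair(start, end));
        if (end == docLen) break;  // this window already reaches the document end
    }
    return windows;
}
```

Under these assumptions, a 10-token document with a passage size of 4 and an overlap of 2 yields the ranges [0, 4), [2, 6), [4, 8) and [6, 10).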


Constructor & Destructor Documentation

lemur::retrieval::PassageRep::PassageRep (lemur::api::DocumentRep & dRep, int d, int p, int o) [inline]

Fixed-size window passage with overlap.

Parameters:
dRep DocumentRep for the document as returned by computeDocRep.
d length of whole document.
p size of passage in terms of tokens.
o number of tokens to overlap.


Member Function Documentation

PassageRep::iterator lemur::retrieval::PassageRep::begin () [inline]

PassageRep::iterator lemur::retrieval::PassageRep::end () [inline]

int lemur::retrieval::PassageRep::getEnd () const [inline]

End + 1 of the current passage.

int lemur::retrieval::PassageRep::getStart () const [inline]

Start of the current passage.

int lemur::retrieval::PassageRep::passageTF (lemur::api::TERMID_T tid, lemur::api::MatchInfo * matches) const [inline]

Term frequency of a term within the current passage.

Parameters:
tid the term id to count.
matches the term matches returned by MatchInfo::getMatches for the document. This list is used for efficiency, as it is shorter than the whole TermInfoList for the document.
Returns:
the frequency of a term within the current passage.
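
As an illustration of counting from a short match list instead of the whole term list, here is a minimal self-contained sketch. TermMatch and countInPassage are hypothetical names introduced for this example, not the Lemur types, and the half-open range [start, end) is an assumption based on getEnd returning end + 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one query-term match in a document.
struct TermMatch {
    int tid;       // term id
    int position;  // token index of the match within the document
};

// Counts the matches of `tid` whose position lies in [start, end).
// Only the (short) match list is scanned, not every token of the document.
int countInPassage(const std::vector<TermMatch>& matches, int tid, int start, int end) {
    assert(start <= end);
    int tf = 0;
    for (std::size_t i = 0; i < matches.size(); ++i) {
        const TermMatch& m = matches[i];
        if (m.tid == tid && m.position >= start && m.position < end) {
            ++tf;
        }
    }
    return tf;
}
```

The cost of this loop grows with the number of query-term matches in the document, which is the efficiency argument made for the matches parameter.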

virtual double lemur::retrieval::PassageRep::scoreConstant () const [inline, virtual]

Delegate call to scoreConstant of the encapsulated DocumentRep.

Implements lemur::api::DocumentRep.

void lemur::retrieval::PassageRep::setEnd (int s, int e, int dl) [inline]

Update end and length values.

virtual double lemur::retrieval::PassageRep::termWeight (lemur::api::TERMID_T termID, const lemur::api::DocInfo * info) const [inline, virtual]

Delegate call to termWeight of the encapsulated DocumentRep.
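
The delegation pattern can be sketched with stand-in types. WholeDocRep and PassageView below are hypothetical and only mimic the shape of the classes described here; the toy weight formula, and the assumption that setEnd writes dl into the wrapped representation as the passage length, are illustrative guesses and not the Lemur implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for a whole-document representation.
class WholeDocRep {
public:
    explicit WholeDocRep(int len) : docLength(len) {}
    virtual ~WholeDocRep() {}
    // Toy length-normalized weight, for illustration only.
    virtual double termWeight(int tf) const {
        return static_cast<double>(tf) / docLength;
    }
    void setDocLength(int len) { docLength = len; }
    int getDocLength() const { return docLength; }
private:
    int docLength;
};

// Hypothetical passage wrapper: holds a reference to the whole-document
// representation and forwards scoring calls to it.
class PassageView {
public:
    explicit PassageView(WholeDocRep& rep) : docRep(rep), start(0), pEnd(0) {}
    // Moves the window and overwrites the wrapped length (assumed behaviour).
    void setEnd(int s, int e, int dl) {
        assert(s <= e && dl > 0);
        start = s;
        pEnd = e;
        docRep.setDocLength(dl);
    }
    int getStart() const { return start; }
    int getEnd() const { return pEnd; }
    // Delegates to the wrapped representation, which now sees the new length.
    double termWeight(int tf) const { return docRep.termWeight(tf); }
private:
    WholeDocRep& docRep;
    int start;
    int pEnd;
};
```

Because the wrapper mutates the wrapped object instead of copying it, any length-normalized weight computed after setEnd reflects the passage length, while collection-level statistics such as the average document length stay unchanged.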


Member Data Documentation

int lemur::retrieval::PassageRep::docEnd [mutable, protected]
 

Length of the whole document.

lemur::api::DocumentRep& lemur::retrieval::PassageRep::docRep [protected]
 

DocumentRep for the whole document. Calls to termWeight and scoreConstant are delegated to it.

int lemur::retrieval::PassageRep::overlap [protected]
 

Number of tokens to overlap when advancing the passage window.

int lemur::retrieval::PassageRep::pEnd [mutable, protected]
 

Index of the end of the current passage.

int lemur::retrieval::PassageRep::psgSize [protected]
 

Size of the passage, in number of tokens.

int lemur::retrieval::PassageRep::start [mutable, protected]
 

Index of the start of the current passage.


The documentation for this class was generated from the following file:
PassageRep.hpp
Generated on Tue Jun 15 11:03:06 2010 for Lemur by doxygen 1.3.4