
lemur::parse::TrecParser Class Reference

#include <TrecParser.hpp>

Inheritance diagram for lemur::parse::TrecParser:

lemur::api::Parser lemur::api::TextHandler

Public Member Functions

 TrecParser ()
void parseFile (const string &filename)
 Parse a file.

void parseBuffer (char *buf, int len)
 Parse a buffer of length len.

long fileTell () const

Static Public Attributes

const string identifier = "trec"

Private Member Functions

void doParse ()
 Actual parsing action flow.


Private Attributes

int state
 The state of the parser.

Property begelem
 Keeps a property for the beginning and end of elements.

Property endelem

LinkedPropertyList proplist
 The property list.


Detailed Description

Parses documents in NIST's TREC format. Words that are not in the acronym list are case-folded. Contraction suffixes and possessive suffixes are stripped.

U.S.A., USA's, and USAs are converted to USA. Does not recognize acronyms with numbers.
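The normalization rules above can be illustrated with a small standalone sketch. This is not Lemur's implementation: the function name normalizeToken, the helper isUpperAlpha, and the caller-supplied acronym set are hypothetical, and cutting a token at its first apostrophe is a simplifying assumption about how contraction and possessive suffixes are stripped.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>

// Hypothetical helper: true when s is non-empty and consists only of
// upper-case ASCII letters. Digits fail the check, which mirrors the
// "no acronyms with numbers" limitation in the description.
static bool isUpperAlpha(const std::string& s) {
    return !s.empty() &&
           std::all_of(s.begin(), s.end(), [](unsigned char c) {
               return std::isupper(c) != 0;
           });
}

// Hypothetical sketch of the documented token normalization.
std::string normalizeToken(const std::string& token,
                           const std::set<std::string>& acronyms) {
    // Assumption: a possessive or contraction suffix is everything from
    // the first apostrophe onward ("USA's" -> "USA").
    std::string word = token.substr(0, token.find('\''));

    // Build an acronym candidate by dropping periods ("U.S.A." -> "USA").
    std::string candidate;
    for (char c : word) {
        if (c != '.') candidate += c;
    }

    // Treat a trailing lower-case 's' after capitals as a plural ("USAs").
    if (candidate.size() > 1 && candidate.back() == 's' &&
        isUpperAlpha(candidate.substr(0, candidate.size() - 1))) {
        candidate.pop_back();
    }

    // Words on the acronym list keep their case.
    if (isUpperAlpha(candidate) && acronyms.count(candidate) > 0) {
        return candidate;
    }

    // Everything else is folded to lower case.
    std::string folded = word;
    std::transform(folded.begin(), folded.end(), folded.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return folded;
}
```

With an acronym set containing only "USA", the sketch maps "U.S.A.", "USA's", and "USAs" to "USA", and folds "Parser's" to "parser".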

The following fields are parsed: TEXT, HL, HEAD, HEADLINE, LP, and TTL.
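To show what restricting parsing to these fields means, here is a minimal standalone sketch that collects the content of those elements from a TREC-style document held in a string. The function name extractParsedFields is hypothetical and the code does not use or reproduce Lemur's parser. It assumes upper-case tags without attributes and no nesting of the listed fields.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: return the raw content of every TEXT, HL, HEAD,
// HEADLINE, LP, and TTL element, in document order.
std::vector<std::string> extractParsedFields(const std::string& doc) {
    static const std::set<std::string> fields = {
        "TEXT", "HL", "HEAD", "HEADLINE", "LP", "TTL"};

    std::vector<std::string> contents;
    std::size_t pos = 0;
    while ((pos = doc.find('<', pos)) != std::string::npos) {
        std::size_t close = doc.find('>', pos);
        if (close == std::string::npos) break;  // malformed tag, stop

        std::string name = doc.substr(pos + 1, close - pos - 1);
        pos = close + 1;

        // Tags outside the field list (DOC, DOCNO, closing tags, ...) are
        // skipped, so their text never reaches the output.
        if (fields.count(name) == 0) continue;

        std::string endTag = "</" + name + ">";
        std::size_t end = doc.find(endTag, pos);
        if (end == std::string::npos) break;  // unterminated element, stop

        contents.push_back(doc.substr(pos, end - pos));
        pos = end + endTag.size();
    }
    return contents;
}
```

For a document containing a DOCNO, a HEAD, and a TEXT element, the sketch returns two strings (the HEAD and TEXT contents) and ignores the DOCNO text.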


Constructor & Destructor Documentation

lemur::parse::TrecParser::TrecParser ()
 


Member Function Documentation

void lemur::parse::TrecParser::doParse () [private]
 

Actual parsing action flow.

long lemur::parse::TrecParser::fileTell () const [virtual]
 

Returns the current byte offset into the file being parsed. Do not use with parseBuffer().

Implements lemur::api::Parser.
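One plausible use for a byte offset into the file is recording where each document starts, so it can be located again later. The sketch below illustrates that idea with standard streams only. The function name docOffsets is hypothetical, the code does not call fileTell(), and it assumes each document opens with a line beginning with <DOC>.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: return the byte offset of every line that starts
// with "<DOC>" in the given stream.
std::vector<std::streamoff> docOffsets(std::istream& in) {
    std::vector<std::streamoff> offsets;
    std::string line;
    while (true) {
        // Remember the position before reading the line.
        std::streampos start = in.tellg();
        if (!std::getline(in, line)) break;
        if (line.rfind("<DOC>", 0) == 0) {  // line begins with "<DOC>"
            offsets.push_back(static_cast<std::streamoff>(start));
        }
    }
    return offsets;
}
```

Offsets of this kind describe positions in a file stream. A caller-supplied memory buffer has no such file position, which is consistent with the warning not to combine fileTell() with parseBuffer().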

void lemur::parse::TrecParser::parseBuffer (char *buf, int len) [virtual]
 

Parse a buffer of length len.

Implements lemur::api::Parser.

void lemur::parse::TrecParser::parseFile (const string &filename) [virtual]
 

Parse a file.

Implements lemur::api::Parser.


Member Data Documentation

Property lemur::parse::TrecParser::begelem [private]
 

Keeps a property for the beginning and end of elements.

Property lemur::parse::TrecParser::endelem [private]
 

const string lemur::parse::TrecParser::identifier = "trec" [static]
 

Reimplemented from lemur::api::Parser.

LinkedPropertyList lemur::parse::TrecParser::proplist [private]
 

The property list.

int lemur::parse::TrecParser::state [private]
 

The state of the parser.


The documentation for this class was generated from the following files:
Generated on Tue Jun 15 11:03:06 2010 for Lemur by doxygen 1.3.4