
lemur::distrib::FreqCounter Class Reference

#include <FreqCounter.hpp>

Inherits lemur::api::TextHandler.

Public Member Functions

 FreqCounter (const lemur::api::Stopper *stopWords=NULL)
 FreqCounter (const string &filename, const lemur::api::Stopper *stopWords=NULL)
 ~FreqCounter ()
 Delete the frequency counter.

void clear ()
 Clear the frequency counter (set all counts to 0).

void output (const string &filename) const
 Output the frequency information to a file.

char * randomWord ()
void setRandomMode (int mode)
int getRandomMode () const
char * randomCtf () const
char * randomDf () const
char * randomAveTf () const
char * randomUniform () const
int numWords () const
int totWords () const
const freqmap * getFreqInfo () const
int getCtf (const char *word) const
int getDf (const char *word) const
double getAveTf (const char *word) const
double ctfRatio (FreqCounter &lm1) const
char * handleDoc (char *docno)
 Overridden from TextHandler.

char * handleWord (char *word)
 Overridden from TextHandler - increments collection term frequencies.

void endDoc ()
 Specifies end of a document - updates document frequencies.

void setName (const string &freqCounterName)
 Set the name of the language model described by the frequency counter.

const string & getName () const
 Get the counter's name.

void pruneBottomWords (int topWords)
 Prune least frequent words, keeping only topWords most frequent words.


Protected Member Functions

void input (const string &filename)

Protected Attributes

freqmap freqInfo
stringset doc
stringset randdone
string name
const lemur::api::Stopper * stopper
long ctfTot
int dfTot
long double avetfTot
bool atfValid
int randomMode
int nWords

Detailed Description

Counts collection term frequencies and document frequencies. Also provides a means for selecting random words. The FreqCounter can use a stopword list.
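The counting behavior described above can be sketched without the Lemur headers. The class below is a hypothetical stand-in, not the Lemur implementation: MiniFreqCounter, WordFreqSketch, and the use of std::set for the stopword list are assumptions made for illustration. It assumes stopwords are skipped entirely, which the source does not state.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical per-word record; the real freqmap layout is not documented here.
struct WordFreqSketch {
    int ctf = 0;  // collection term frequency: total occurrences of the word
    int df = 0;   // document frequency: number of documents containing the word
};

// Hypothetical stand-in for FreqCounter, built only on the standard library.
class MiniFreqCounter {
public:
    explicit MiniFreqCounter(std::set<std::string> stopWords = std::set<std::string>())
        : stop_(std::move(stopWords)) {}

    // Mirrors handleWord: count one occurrence unless the word is a stopword.
    void handleWord(const std::string& word) {
        assert(!word.empty());
        if (stop_.count(word) > 0) return;  // assumption: stopwords are ignored
        ++freq_[word].ctf;
        ++total_;
        doc_.insert(word);  // remember the word for the current document
    }

    // Mirrors endDoc: every distinct word of the finished document gains one df.
    void endDoc() {
        for (const std::string& w : doc_) ++freq_[w].df;
        doc_.clear();
    }

    int getCtf(const std::string& word) const {
        auto it = freq_.find(word);
        return it == freq_.end() ? 0 : it->second.ctf;
    }

    int getDf(const std::string& word) const {
        auto it = freq_.find(word);
        return it == freq_.end() ? 0 : it->second.df;
    }

    int numWords() const { return static_cast<int>(freq_.size()); }  // unique words
    int totWords() const { return total_; }                          // counted occurrences

private:
    std::set<std::string> stop_;
    std::map<std::string, WordFreqSketch> freq_;
    std::set<std::string> doc_;  // distinct words seen in the current document
    int total_ = 0;
};
```

Feeding the sketch two documents, "the cat cat" and "cat dog", with "the" as a stopword gives a ctf of 3 and a df of 2 for "cat".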


Constructor & Destructor Documentation

lemur::distrib::FreqCounter::FreqCounter (const lemur::api::Stopper * stopWords = NULL)
 

Create a frequency counter with the specified stopword list. The stopWords parameter is optional.

lemur::distrib::FreqCounter::FreqCounter (const string & filename, const lemur::api::Stopper * stopWords = NULL)


Create a frequency counter (loading it from file) with the specified stopword list. The stopWords parameter is optional.

lemur::distrib::FreqCounter::~FreqCounter ()


Delete the frequency counter.


Member Function Documentation

void lemur::distrib::FreqCounter::clear ()
 

Clear the frequency counter (set all counts to 0).

double lemur::distrib::FreqCounter::ctfRatio (FreqCounter & lm1) const
 

Compare lm1 to this language model, returning the ctf ratio.
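The source does not give the formula for the ctf ratio. One common reading, assumed here and not confirmed by this page, is the share of the reference model's term occurrences that fall on words also present in the compared model. The function name ctfRatioSketch and the CtfTable type are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical word -> collection term frequency table.
using CtfTable = std::map<std::string, long>;

// Assumed definition: sum of reference ctf over words that also occur in
// `compared`, divided by the total ctf of `reference`.
double ctfRatioSketch(const CtfTable& reference, const CtfTable& compared) {
    long covered = 0;
    long total = 0;
    for (const auto& entry : reference) {
        assert(entry.second >= 0);  // counts are never negative
        total += entry.second;
        if (compared.count(entry.first) > 0) covered += entry.second;
    }
    // An empty reference model has no occurrences to cover.
    return total == 0 ? 0.0 : static_cast<double>(covered) / total;
}
```

Under this reading a value of 1.0 means every word occurrence in the reference model belongs to a word the compared model also knows.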

void lemur::distrib::FreqCounter::endDoc ()
 

Specifies end of a document - updates document frequencies.

double lemur::distrib::FreqCounter::getAveTf (const char * word) const
 

Get the average term frequency for a word.
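The page does not define average term frequency. A minimal sketch, assuming it means the collection term frequency divided by the document frequency (the mean number of occurrences per document that contains the word), could look like this. The helper name aveTfSketch is hypothetical.

```cpp
#include <cassert>

// Assumed definition: ctf / df, with 0 for a word that was never seen.
double aveTfSketch(int ctf, int df) {
    assert(ctf >= 0 && df >= 0);
    return df == 0 ? 0.0 : static_cast<double>(ctf) / df;
}
```

For example, a word seen 6 times across 4 documents has an average term frequency of 1.5 under this assumption.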

int lemur::distrib::FreqCounter::getCtf (const char * word) const
 

Get the collection term frequency for a word.

int lemur::distrib::FreqCounter::getDf (const char * word) const
 

Get the document frequency for a word.

const lemur::distrib::freqmap * lemur::distrib::FreqCounter::getFreqInfo () const


Get a pointer to the internal frequency count map.

const string & lemur::distrib::FreqCounter::getName () const
 

Get the counter's name.

int lemur::distrib::FreqCounter::getRandomMode () const


Get the current random word selection mode. See setRandomMode().

char * lemur::distrib::FreqCounter::handleDoc (char * docno) [virtual]
 

Overridden from TextHandler.

Reimplemented from lemur::api::TextHandler.

char * lemur::distrib::FreqCounter::handleWord (char * word) [virtual]
 

Overridden from TextHandler - increments collection term frequencies.

Reimplemented from lemur::api::TextHandler.

void lemur::distrib::FreqCounter::input (const string & filename) [protected]
 

int lemur::distrib::FreqCounter::numWords () const
 

Return the number of unique words seen across all documents processed.

void lemur::distrib::FreqCounter::output (const string & filename) const
 

Output the frequency information to a file.

void lemur::distrib::FreqCounter::pruneBottomWords (int topWords)
 

Prune least frequent words, keeping only topWords most frequent words.
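Pruning to the most frequent words can be sketched as a sort followed by a truncation. This is an illustration only: the page does not say which count defines "most frequent" or how ties are resolved, so the sketch assumes ranking by collection term frequency with alphabetical tie-breaking. The names WordCounts and pruneBottomWordsSketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical word -> collection term frequency table.
using WordCounts = std::map<std::string, int>;

// Keeps only the topWords entries with the highest counts.
void pruneBottomWordsSketch(WordCounts& freq, int topWords) {
    assert(topWords >= 0);
    if (freq.size() <= static_cast<std::size_t>(topWords)) return;  // nothing to drop

    std::vector<std::pair<std::string, int>> ranked(freq.begin(), freq.end());
    // Highest count first; ties broken alphabetically so the result is deterministic.
    std::sort(ranked.begin(), ranked.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  if (a.second != b.second) return a.second > b.second;
                  return a.first < b.first;
              });
    ranked.resize(static_cast<std::size_t>(topWords));
    freq = WordCounts(ranked.begin(), ranked.end());
}
```

Sorting the whole vocabulary is the simplest approach; for a large vocabulary and a small topWords, std::partial_sort would avoid ordering the discarded tail.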

char * lemur::distrib::FreqCounter::randomAveTf () const


Select a word at random using average term frequency. This word is not guaranteed to be unique from other calls to this function.

char * lemur::distrib::FreqCounter::randomCtf () const


Select a word at random using collection term frequency. This word is not guaranteed to be unique from other calls to this function.

char * lemur::distrib::FreqCounter::randomDf () const


Select a word at random using document frequency. This word is not guaranteed to be unique from other calls to this function.

char * lemur::distrib::FreqCounter::randomUniform () const


Select a word at random with equal probability for each word. This word is not guaranteed to be unique from other calls to this function.
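The four selection functions differ only in the weight attached to each word. A minimal sketch of weighted selection with replacement, assuming the probability of a word is proportional to its weight (which this page implies but does not spell out), is shown below. WeightedWord and drawWeighted are hypothetical names, and std::discrete_distribution is a stand-in for whatever sampling method Lemur actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical pairing of a word with its selection weight:
// ctf, df, average tf, or 1.0 for uniform selection.
struct WeightedWord {
    std::string word;
    double weight;
};

// Draws one word with probability proportional to its weight.
// Calls are independent, so the same word can be returned more than once.
template <class Rng>
const std::string& drawWeighted(const std::vector<WeightedWord>& words, Rng& rng) {
    assert(!words.empty());
    std::vector<double> weights;
    weights.reserve(words.size());
    for (const WeightedWord& w : words) weights.push_back(w.weight);
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return words[static_cast<std::size_t>(pick(rng))].word;
}
```

Passing a weight of 1.0 for every word gives the uniform case; a word with weight 0 is never selected.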

char * lemur::distrib::FreqCounter::randomWord ()


Get a random word from the distribution specified by setRandomMode. The word returned is unique among the words returned since the last clear operation.
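Uniqueness across calls implies some record of the words already handed out, which is presumably the role of the randdone attribute. The sketch below illustrates that bookkeeping with uniform selection only. UniqueWordPicker, next, and resetDrawn are hypothetical names, and returning an empty string once every word has been used is a choice made for this sketch, not documented Lemur behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper that never returns the same word twice between resets.
class UniqueWordPicker {
public:
    explicit UniqueWordPicker(std::vector<std::string> words)
        : words_(std::move(words)) {}

    // Returns a word not yet returned since the last resetDrawn(),
    // or an empty string when no such word remains (sketch-only convention).
    template <class Rng>
    std::string next(Rng& rng) {
        std::vector<std::string> remaining;
        for (const std::string& w : words_) {
            if (done_.count(w) == 0) remaining.push_back(w);
        }
        if (remaining.empty()) return std::string();
        std::uniform_int_distribution<std::size_t> pick(0, remaining.size() - 1);
        const std::string chosen = remaining[pick(rng)];
        assert(done_.count(chosen) == 0);  // invariant: never hand out a used word
        done_.insert(chosen);
        return chosen;
    }

    // Forgets which words were drawn; stands in for that part of clear().
    void resetDrawn() { done_.clear(); }

private:
    std::vector<std::string> words_;
    std::set<std::string> done_;  // words already returned
};
```

Rebuilding the list of remaining words on every call keeps the sketch short at the cost of linear work per draw.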

void lemur::distrib::FreqCounter::setName (const string & freqCounterName)


Set the name of the language model described by the frequency counter.

void lemur::distrib::FreqCounter::setRandomMode (int mode)


Set the random word selection mode:
  R_CTF - select using collection term frequency
  R_DF - select using document frequency
  R_AVE_TF - select using average term frequency
  R_UNIFORM - select each word with equal probability

int lemur::distrib::FreqCounter::totWords () const


Return the total number of words seen across all documents processed.


Member Data Documentation

bool lemur::distrib::FreqCounter::atfValid [mutable, protected]
 

long double lemur::distrib::FreqCounter::avetfTot [mutable, protected]
 

long lemur::distrib::FreqCounter::ctfTot [protected]
 

int lemur::distrib::FreqCounter::dfTot [protected]
 

stringset lemur::distrib::FreqCounter::doc [protected]
 

freqmap lemur::distrib::FreqCounter::freqInfo [mutable, protected]
 

string lemur::distrib::FreqCounter::name [protected]
 

int lemur::distrib::FreqCounter::nWords [protected]
 

stringset lemur::distrib::FreqCounter::randdone [protected]
 

int lemur::distrib::FreqCounter::randomMode [protected]
 

const lemur::api::Stopper* lemur::distrib::FreqCounter::stopper [protected]
 


The documentation for this class was generated from the following files:
Generated on Tue Jun 15 11:03:05 2010 for Lemur by doxygen 1.3.4