
lemur::api::DocStream Class Reference

Abstract interface for a collection of documents.

#include <DocStream.hpp>

Inherited by lemur::parse::BasicDocStream.

Public Member Functions

virtual ~DocStream ()
virtual void startDocIteration ()=0
 start document iteration

virtual bool hasMore ()=0
virtual Document * nextDoc ()=0
 Return a pointer to the next document. The pointer refers to static memory, so do not delete the returned instance. Call hasMore() before calling nextDoc().


Detailed Description

Abstract interface for a collection of documents.

DocStream is an abstract interface for a collection of documents. A concrete implementation can apply its own tokenization, document header format, and so on, and returns a correspondingly specialized Document instance to reflect this.

The following is an example of supporting an index with position information:


// a DocStream that handles position
class PosDocStream : public DocStream {
  ...
  Document *nextDoc() {
    return (new PosDocument(...)); // returns a special Document
  }
  ...
};

// a Document that has position information
class PosDocument : public Document {
  ...
  TokenTerm *nextTerm() {
    return (new PosTerm(...)); // returns a special Term
  }
};

// a Term that has position
class PosTerm : public TokenTerm {
  int getPosition() { ... }
};

// Indexer that records term positions
class PosIndex : public Index {
  ...
  PosDocStream *db;

  ... // when indexing

  db->startDocIteration();
  while (db->hasMore()) {
    Document *doc = db->nextDoc(); // we'll actually get a PosDocument
    doc->startTermIteration();
    PosTerm *term;
    while (doc->hasMore()) {
      term = (PosTerm *)doc->nextTerm(); // note the down-cast!
      term->getPosition();
      term->spelling();
      ...
    }
  }
  ...
};
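The down-cast in the example above relies on the caller knowing which concrete term type the document produces. As a rough, self-contained sketch of the same pattern, the code below swaps the C-style cast for dynamic_cast, which yields a null pointer when the term is not a PosTerm. The Term, Document, PosTerm and PosDocument classes here are simplified stand-ins written for illustration, not the Lemur classes, and collectPositions is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a term (assumption: the real class has more members).
class Term {
public:
  explicit Term(const std::string &s) : spelling_(s) {}
  virtual ~Term() {}
  const std::string &spelling() const { return spelling_; }
private:
  std::string spelling_;
};

// A term that also records its position in the document.
class PosTerm : public Term {
public:
  PosTerm(const std::string &s, int pos) : Term(s), pos_(pos) {}
  int getPosition() const { return pos_; }
private:
  int pos_;
};

// Simplified stand-in for a document with term iteration.
class Document {
public:
  virtual ~Document() {}
  virtual void startTermIteration() = 0;
  virtual bool hasMore() = 0;
  virtual Term *nextTerm() = 0;
};

// A document whose nextTerm() hands out PosTerm objects through a base pointer.
class PosDocument : public Document {
public:
  explicit PosDocument(const std::vector<std::string> &words) : next_(0) {
    for (std::size_t i = 0; i < words.size(); ++i)
      terms_.push_back(PosTerm(words[i], static_cast<int>(i)));
  }
  void startTermIteration() { next_ = 0; }
  bool hasMore() { return next_ < terms_.size(); }
  Term *nextTerm() {
    assert(next_ < terms_.size()); // caller must check hasMore() first
    return &terms_[next_++];       // owned by the document, do not delete
  }
private:
  std::vector<PosTerm> terms_;
  std::size_t next_;
};

// Hypothetical helper: collect (spelling, position) pairs from a document.
// Terms that carry no position information are skipped.
std::vector<std::pair<std::string, int> > collectPositions(Document &doc) {
  std::vector<std::pair<std::string, int> > out;
  doc.startTermIteration();
  while (doc.hasMore()) {
    // Checked down-cast: null if the term is not actually a PosTerm.
    PosTerm *term = dynamic_cast<PosTerm *>(doc.nextTerm());
    if (term != 0)
      out.push_back(std::make_pair(term->spelling(), term->getPosition()));
  }
  return out;
}
```

The checked cast costs a run-time type lookup per term. Whether that matters depends on the indexer, and the unchecked cast remains an option when the stream type is fixed at compile time.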


Constructor & Destructor Documentation

virtual lemur::api::DocStream::~DocStream ( ) [inline, virtual]
 


Member Function Documentation

virtual bool lemur::api::DocStream::hasMore ( ) [pure virtual]
 

Implemented in lemur::parse::BasicDocStream.

virtual Document* lemur::api::DocStream::nextDoc ( ) [pure virtual]
 

Return a pointer to the next document. The pointer refers to static memory, so do not delete the returned instance. Call hasMore() before calling nextDoc().

Implemented in lemur::parse::BasicDocStream.
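The calling contract described for nextDoc() (check hasMore() first, never delete the returned pointer) can be illustrated with a minimal, self-contained sketch. The Document and DocStream classes below are simplified stand-ins that only mirror the three pure virtual functions documented here. VectorDocStream, the ID-only Document, and collectIDs are hypothetical names introduced for illustration and are not part of Lemur. The sketch assumes one plausible reading of "static memory": the stream reuses a single Document object, so each call to nextDoc() overwrites the previous result.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for a document (assumption: only an ID is modeled).
class Document {
public:
  virtual ~Document() {}
  void setID(const std::string &id) { id_ = id; }
  const std::string &getID() const { return id_; }
private:
  std::string id_;
};

// Stand-in mirroring the three pure virtual functions of the interface.
class DocStream {
public:
  virtual ~DocStream() {}
  virtual void startDocIteration() = 0;
  virtual bool hasMore() = 0;
  virtual Document *nextDoc() = 0;
};

// Hypothetical in-memory stream that reuses one Document instance.
class VectorDocStream : public DocStream {
public:
  explicit VectorDocStream(const std::vector<std::string> &ids)
      : ids_(ids), pos_(0) {}
  void startDocIteration() { pos_ = 0; }
  bool hasMore() { return pos_ < ids_.size(); }
  Document *nextDoc() {
    assert(pos_ < ids_.size());   // caller must check hasMore() first
    current_.setID(ids_[pos_++]); // overwrite the shared instance
    return &current_;             // owned by the stream, do not delete
  }
private:
  std::vector<std::string> ids_;
  std::size_t pos_;
  Document current_;
};

// Hypothetical helper: walk the stream and copy out each document ID.
std::vector<std::string> collectIDs(DocStream &stream) {
  std::vector<std::string> out;
  stream.startDocIteration();
  while (stream.hasMore()) {
    Document *doc = stream.nextDoc();
    out.push_back(doc->getID()); // copy now; the next call reuses *doc
  }
  return out;
}
```

Under this reading, a caller that needs a document beyond the next call to nextDoc() has to copy the relevant data, because holding on to the pointer only gives access to whatever document the stream produced most recently.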

virtual void lemur::api::DocStream::startDocIteration ( ) [pure virtual]
 

start document iteration

Implemented in lemur::parse::BasicDocStream.


The documentation for this class was generated from the following file:
Generated on Tue Jun 15 11:03:04 2010 for Lemur by doxygen 1.3.4