
indri::index::TermBitmap Class Reference

#include <TermBitmap.hpp>


Public Member Functions

 TermBitmap ()
 ~TermBitmap ()
int lastFrom ()
void add (int to)
void add (int from, int to)
int get (int from)
size_t memorySize ()

Private Member Functions

void _addBufferIfNecessary ()
indri::utility::Buffer * _findBuffer (int from)
const char * _findInBuffer (indri::utility::Buffer *b, int from)
int _bitsSet (unsigned char c)

Private Attributes

std::vector< indri::utility::Buffer * > _maps
int _fromBase
int _toBase
int _lastFrom
char * _current

Detailed Description

TermBitmap is used to convert termIDs when many DiskIndexes are merged together. The add() function has very strict preconditions: both from and to must increase on every call, and from must never be greater than to (in the dense case described below, (1,1), (2,2), ..., from is equal to to).
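The ordering rules for add() can be illustrated with a small checker. This is a minimal sketch and not part of TermBitmap: validAddSequence is a hypothetical helper, IDs are assumed to be non-negative, and the sketch accepts from equal to to because the dense case (1,1), (2,2), ... is described as valid input.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of TermBitmap. Returns true when a sequence
// of (from, to) pairs could be passed to add() in order:
//   - from strictly increases on every call,
//   - to strictly increases on every call,
//   - from never exceeds to (assumption: equality is allowed).
inline bool validAddSequence(const std::vector<std::pair<int, int> >& pairs) {
    int lastFrom = -1;  // assumes IDs are non-negative
    int lastTo = -1;
    for (const auto& p : pairs) {
        const int from = p.first;
        const int to = p.second;
        if (from <= lastFrom || to <= lastTo) {
            return false;  // one of the two values failed to increase
        }
        if (from > to) {
            return false;  // from may not be greater than to
        }
        lastFrom = from;
        lastTo = to;
    }
    return true;
}
```

A caller merging indexes could run such a check in a debug build before feeding the pairs to add(); a sequence such as (2, 7) followed by (1, 4) is rejected because from decreases.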

This data is stored in 32-byte bitmap chunks with the following form:

   4 bytes - fromBase
   4 bytes - toBase
  24 bytes - bitmap

Each bit set in the bitmap region corresponds to a (from, to) pair.

Suppose the beginning of the bitmap looks like this: 000100100110000.... This could be represented by the following pairs: (1, 4), (2, 7), (3, 10), (4, 11). For instance, the (2, 7) pair says that the second non-zero bit is at index 7, counting bit positions from 1. This (2, 7) pair is translated to mean that (fromBase + 2, toBase + 7) is a pair stored by some explicit add() call.
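The chunk encoding described above can be sketched as follows. This is an illustration under stated assumptions, not the TermBitmap implementation: the Chunk struct, makeChunk, decodePairs and lookup are hypothetical names, and the bit order inside each byte (most significant bit first) is assumed, since the layout description does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Illustrative 32-byte chunk: 4 bytes fromBase, 4 bytes toBase, 24 bytes bitmap.
struct Chunk {
    std::int32_t fromBase;
    std::int32_t toBase;
    unsigned char bitmap[24];
};
static_assert(sizeof(Chunk) == 32, "chunk is expected to occupy 32 bytes");

// Builds a chunk from a textual pattern such as "000100100110".
// The leftmost character is bit index 1, matching the example in the text.
inline Chunk makeChunk(std::int32_t fromBase, std::int32_t toBase,
                       const std::string& bits) {
    Chunk c = {fromBase, toBase, {0}};
    for (std::size_t i = 0; i < bits.size() && i < 192; ++i) {
        if (bits[i] == '1') {
            // Assumed bit order: most significant bit of each byte comes first.
            c.bitmap[i / 8] |= static_cast<unsigned char>(0x80u >> (i % 8));
        }
    }
    return c;
}

// Expands a chunk into explicit (from, to) pairs: the k-th set bit, found at
// 1-based index i, yields the pair (fromBase + k, toBase + i).
inline std::vector<std::pair<int, int> > decodePairs(const Chunk& c) {
    std::vector<std::pair<int, int> > pairs;
    int rank = 0;  // number of set bits seen so far
    for (int i = 0; i < 192; ++i) {
        const bool set = (c.bitmap[i / 8] & (0x80u >> (i % 8))) != 0;
        if (set) {
            ++rank;
            pairs.emplace_back(c.fromBase + rank, c.toBase + i + 1);
        }
    }
    return pairs;
}

// Returns the 'to' value stored for 'from', or -1 if this chunk has no such pair.
inline int lookup(const Chunk& c, int from) {
    for (const auto& p : decodePairs(c)) {
        if (p.first == from) {
            return p.second;
        }
    }
    return -1;
}
```

With both bases set to 0, makeChunk(0, 0, "000100100110000") decodes to the four pairs listed in the example. A real lookup would more likely count set bits a byte at a time (the role suggested by the name _bitsSet) rather than expand every pair.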

To save on heap overhead, we manage blocks of 64K each in Buffer objects, which are stored in the vector called _maps.

The TermBitmap is used because, in the ideal case, it is much more space efficient than the simpler approach of using an array mapping. In an array, we'd need 32 bits for each (from, to) pair. In the case where the (from, to) pairs are optimally dense [e.g. (1,1), (2,2), (3,3), ...], the TermBitmap uses about 1.33 bits per pair, since each 256-bit chunk then holds 192 pairs.
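The space figures follow directly from the chunk layout, and the arithmetic can be written out as compile-time constants. This is a sketch only: the constant and function names are hypothetical, and "64K" is assumed to mean 65536 bytes.

```cpp
#include <cassert>

// Sizes taken from the description above; names are illustrative only.
constexpr int kChunkBytes = 32;                       // one bitmap chunk
constexpr int kHeaderBytes = 8;                       // fromBase + toBase
constexpr int kChunkBits = kChunkBytes * 8;           // 256 bits per chunk
constexpr int kBitmapBits = (kChunkBytes - kHeaderBytes) * 8;  // 192 bits

constexpr int kBlockBytes = 64 * 1024;                // assumed meaning of "64K"
constexpr int kChunksPerBlock = kBlockBytes / kChunkBytes;     // 2048 chunks

// In the optimally dense case every bitmap bit is set, so each chunk
// carries 192 pairs and each block carries 2048 * 192 pairs.
constexpr int kDensePairsPerBlock = kChunksPerBlock * kBitmapBits;

// Storage cost per pair when a chunk holds 'pairsInChunk' pairs (1..192).
constexpr double bitsPerPair(int pairsInChunk) {
    return static_cast<double>(kChunkBits) / pairsInChunk;
}
```

bitsPerPair(192) gives 256 / 192, roughly 1.33 bits, against 32 bits per pair for the array mapping. The cost rises as chunks become sparser, and under this layout a chunk holding fewer than 8 pairs costs more than 32 bits per pair.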

Constructor & Destructor Documentation

indri::index::TermBitmap::TermBitmap ( ) [inline]

indri::index::TermBitmap::~TermBitmap ( ) [inline]

Member Function Documentation

void indri::index::TermBitmap::_addBufferIfNecessary ( ) [inline, private]

int indri::index::TermBitmap::_bitsSet ( unsigned char c ) [inline, private]

indri::utility::Buffer* indri::index::TermBitmap::_findBuffer ( int from ) [inline, private]

const char* indri::index::TermBitmap::_findInBuffer ( indri::utility::Buffer * b, int from ) [inline, private]

void indri::index::TermBitmap::add ( int from, int to )

void indri::index::TermBitmap::add ( int to ) [inline]

int indri::index::TermBitmap::get ( int from ) [inline]

int indri::index::TermBitmap::lastFrom ( ) [inline]

size_t indri::index::TermBitmap::memorySize ( ) [inline]

Member Data Documentation

char* indri::index::TermBitmap::_current [private]

int indri::index::TermBitmap::_fromBase [private]

int indri::index::TermBitmap::_lastFrom [private]

std::vector<indri::utility::Buffer*> indri::index::TermBitmap::_maps [private]

int indri::index::TermBitmap::_toBase [private]

The documentation for this class was generated from the following file:
Generated on Tue Jun 15 11:03:00 2010 for Lemur by doxygen 1.3.4