indri::api::QueryEnvironment Class Reference

Principal class for interacting with Indri indexes during retrieval. Provides the API for opening one or more Repository servers, either local or remote, for querying those servers with the Indri query language, and for requesting aggregate collection statistics.

#include <QueryEnvironment.hpp>

Public Member Functions

void addIndex (class IndexEnvironment &environment)
 Add an IndexEnvironment object.
void addIndex (const std::string &pathname)
 Add a local repository.
void addServer (const std::string &hostname)
 Add a remote server.
void close ()
 Close the QueryEnvironment.
INT64 documentCount (const std::string &term)
 Return total number of documents containing term in the collection.
INT64 documentCount ()
 Return total number of documents in the collection.
std::vector< lemur::api::DOCID_T > documentIDsFromMetadata (const std::string &attributeName, const std::vector< std::string > &attributeValue)
 Return a list of document IDs where the document has a metadata key that matches attributeName, with a value matching one of the attributeValues.
int documentLength (lemur::api::DOCID_T documentID)
 Return the length of a document.
std::vector< std::string > documentMetadata (const std::vector< indri::api::ScoredExtentResult > &documentIDs, const std::string &attributeName)
 Fetch the named metadata attribute for a list of ScoredExtentResults.
std::vector< std::string > documentMetadata (const std::vector< lemur::api::DOCID_T > &documentIDs, const std::string &attributeName)
 Fetch the named metadata attribute for a list of document ids.
std::vector< indri::api::ParsedDocument * > documents (const std::vector< indri::api::ScoredExtentResult > &results)
 Fetch the parsed documents for a given list of ScoredExtentResults. Caller is responsible for deleting the returned elements.
std::vector< indri::api::ParsedDocument * > documents (const std::vector< lemur::api::DOCID_T > &documentIDs)
 Fetch the parsed documents for a given list of document ids. Caller is responsible for deleting the returned elements.
std::vector< indri::api::ParsedDocument * > documentsFromMetadata (const std::string &attributeName, const std::vector< std::string > &attributeValues)
 Fetch all documents with a metadata key that matches attributeName, with a value matching one of the attributeValues.
std::vector< DocumentVector * > documentVectors (const std::vector< lemur::api::DOCID_T > &documentIDs)
 Fetch the document vectors for a list of documents. Caller is responsible for deleting the returned DocumentVector objects.
double expressionCount (const std::string &expression, const std::string &queryType="indri")
 Return the total number of times this expression appears in the collection.
std::vector< ScoredExtentResult > expressionList (const std::string &expression, const std::string &queryType="indri")
 Return all the occurrences of this expression in the collection. The returned vector can be very large for large collections and may exhaust the machine's memory, so use this method with discretion.
std::vector< std::string > fieldList ()
 Return the list of fields.
std::vector< std::string > pathNames (const std::vector< indri::api::ScoredExtentResult > &results)
 Fetch the XPath names of extents for a list of ScoredExtentResults.
 QueryEnvironment ()
void removeIndex (const std::string &pathname)
 Remove a local repository.
void removeServer (const std::string &hostname)
 Remove a remote server.
QueryAnnotation * runAnnotatedQuery (const std::string &query, const std::vector< lemur::api::DOCID_T > &documentSet, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.
QueryAnnotation * runAnnotatedQuery (const std::string &query, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.
std::vector< indri::api::ScoredExtentResult > runQuery (const std::string &query, const std::vector< lemur::api::DOCID_T > &documentSet, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.
std::vector< indri::api::ScoredExtentResult > runQuery (const std::string &query, int resultsRequested, const std::string &queryType="indri")
 Run an Indri query language query.
QueryResults runQuery (QueryRequest &request)
 Run an Indri query language query.
void setMaxWildcardTerms (int maxTerms)
 Set the maximum number of terms a wildcard may expand to.
void setMemory (UINT64 memory)
 Set the amount of memory to use.
void setScoringRules (const std::vector< std::string > &rules)
 Set the scoring rules.
void setSingleBackgroundModel (bool background)
 Set whether there should be one single background model or context-sensitive models.
void setStopwords (const std::vector< std::string > &stopwords)
 Set the stopword list for query processing.
INT64 stemCount (const std::string &term)
 Return total number of stem occurrences.
INT64 stemFieldCount (const std::string &term, const std::string &field)
 Return total number of stem occurrences within a field.
INT64 termCount (const std::string &term)
 Return total number of term occurrences.
INT64 termCount ()
 Return total number of terms.
INT64 termFieldCount (const std::string &term, const std::string &field)
 Return total number of term occurrences within a field.
 ~QueryEnvironment ()

Private Member Functions

void _annotateQuery (indri::infnet::InferenceNetwork::MAllResults &results, const std::vector< lemur::api::DOCID_T > &documentIDs, std::string &annotatorName, indri::lang::Node *queryRoot)
void _copyStatistics (std::vector< indri::lang::RawScorerNode * > &scorerNodes, indri::infnet::InferenceNetwork::MAllResults &statisticsResults)
void _mergeQueryResults (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::server::QueryServerResponse * > &responses)
void _mergeServerQuery (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::lang::Node * > &roots, int resultsRequested)
std::vector< indri::api::ScoredExtentResult > _runQuery (indri::infnet::InferenceNetwork::MAllResults &results, const std::string &q, int resultsRequested, const std::vector< lemur::api::DOCID_T > *documentIDs, QueryAnnotation **annotation, const std::string &queryType="indri")
std::vector< indri::server::QueryServerResponse * > _runServerQuery (std::vector< indri::lang::Node * > &roots, int resultsRequested)
void _scoredQuery (indri::infnet::InferenceNetwork::MAllResults &results, indri::lang::Node *queryRoot, std::string &accumulatorName, int resultsRequested, const std::vector< lemur::api::DOCID_T > *documentSet)
void _sumServerQuery (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::lang::Node * > &roots, int resultsRequested)
 QueryEnvironment (QueryEnvironment &other)

Private Attributes

std::vector< indri::net::NetworkMessageStream * > _messageStreams
Parameters _parameters
std::vector< indri::collection::Repository * > _repositories
std::map< std::string, std::pair< indri::server::QueryServer *, indri::collection::Repository * > > _repositoryNameMap
std::map< std::string, std::pair< indri::server::QueryServer *, indri::net::NetworkStream * > > _serverNameMap
std::vector< indri::server::QueryServer * > _servers
std::vector< indri::net::NetworkStream * > _streams


Detailed Description

Principal class for interacting with Indri indexes during retrieval. Provides the API for opening one or more Repository servers, either local or remote, for querying those servers with the Indri query language, and for requesting aggregate collection statistics.
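
The following is a minimal sketch of the open, query, fetch-metadata, close sequence that the description above implies. It is not taken from the Indri sources. SketchResult and SketchEnvironment are hypothetical stand-ins that mirror four of the documented signatures and return canned data, so the snippet compiles without Indri installed. The helper topDocnos, the index path, and the query string are likewise illustrative assumptions; the 'docno' attribute name is the one used as an example elsewhere on this page.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for indri::api::ScoredExtentResult, reduced to the
// two fields this sketch reads.
struct SketchResult {
  double score;
  int document;
};

// Hypothetical stand-in mirroring four documented QueryEnvironment
// signatures. It returns canned data instead of touching a repository.
class SketchEnvironment {
public:
  void addIndex(const std::string& pathname) { _open.push_back(pathname); }

  std::vector<SketchResult> runQuery(const std::string& query, int resultsRequested) {
    std::vector<SketchResult> results;
    if (_open.empty() || query.empty()) return results;
    const int available = 3;  // canned collection of three documents
    for (int i = 0; i < std::min(resultsRequested, available); ++i)
      results.push_back({-1.0 - i, i + 1});
    return results;
  }

  std::vector<std::string> documentMetadata(const std::vector<SketchResult>& results,
                                            const std::string& attributeName) {
    std::vector<std::string> values;
    for (const SketchResult& r : results)
      values.push_back(attributeName + "-" + std::to_string(r.document));
    return values;
  }

  void close() { _open.clear(); }

private:
  std::vector<std::string> _open;
};

// Open a local repository, run one query, and return the 'docno' metadata
// value of each result. Env is a template parameter so that any type with
// the documented member functions can be passed in.
template <typename Env>
std::vector<std::string> topDocnos(Env& env, const std::string& indexPath,
                                   const std::string& query, int resultsRequested) {
  env.addIndex(indexPath);
  auto results = env.runQuery(query, resultsRequested);
  auto docnos = env.documentMetadata(results, "docno");
  env.close();
  return docnos;
}
```

Because the helper is a template, the same call sequence should apply to a real QueryEnvironment instance, assuming the header from the #include line above is available. Error handling is deliberately left out of the sketch.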


Constructor & Destructor Documentation

indri::api::QueryEnvironment::QueryEnvironment (QueryEnvironment &other) [inline, private]


indri::api::QueryEnvironment::QueryEnvironment ()


indri::api::QueryEnvironment::~QueryEnvironment ()
 


Member Function Documentation

void indri::api::QueryEnvironment::_annotateQuery (indri::infnet::InferenceNetwork::MAllResults &results, const std::vector< lemur::api::DOCID_T > &documentIDs, std::string &annotatorName, indri::lang::Node *queryRoot) [private]


void indri::api::QueryEnvironment::_copyStatistics (std::vector< indri::lang::RawScorerNode * > &scorerNodes, indri::infnet::InferenceNetwork::MAllResults &statisticsResults) [private]


void indri::api::QueryEnvironment::_mergeQueryResults (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::server::QueryServerResponse * > &responses) [private]


void indri::api::QueryEnvironment::_mergeServerQuery (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::lang::Node * > &roots, int resultsRequested) [private]


std::vector<indri::api::ScoredExtentResult> indri::api::QueryEnvironment::_runQuery (indri::infnet::InferenceNetwork::MAllResults &results, const std::string &q, int resultsRequested, const std::vector< lemur::api::DOCID_T > *documentIDs, QueryAnnotation **annotation, const std::string &queryType="indri") [private]


std::vector< indri::server::QueryServerResponse * > indri::api::QueryEnvironment::_runServerQuery (std::vector< indri::lang::Node * > &roots, int resultsRequested) [private]


void indri::api::QueryEnvironment::_scoredQuery (indri::infnet::InferenceNetwork::MAllResults &results, indri::lang::Node *queryRoot, std::string &accumulatorName, int resultsRequested, const std::vector< lemur::api::DOCID_T > *documentSet) [private]


void indri::api::QueryEnvironment::_sumServerQuery (indri::infnet::InferenceNetwork::MAllResults &results, std::vector< indri::lang::Node * > &roots, int resultsRequested) [private]
 

void indri::api::QueryEnvironment::addIndex (class IndexEnvironment &environment)
 

Add an IndexEnvironment object. Unlike the other add calls, this one will not close the index when QueryEnvironment::close is called.

Parameters:
environment an IndexEnvironment instance

void indri::api::QueryEnvironment::addIndex (const std::string &pathname)
 

Add a local repository.

Parameters:
pathname the path to the repository.

void indri::api::QueryEnvironment::addServer (const std::string &hostname)
 

Add a remote server.

Parameters:
hostname the host the server is running on

void indri::api::QueryEnvironment::close ()
 

Close the QueryEnvironment.

INT64 indri::api::QueryEnvironment::documentCount (const std::string &term)
 

Return total number of documents containing term in the collection.

Parameters:
term the term to count documents for.
Returns:
total number of documents containing term in the aggregated collection

INT64 indri::api::QueryEnvironment::documentCount ()
 

Return total number of documents in the collection.

Returns:
total number of documents in the aggregated collection

std::vector< DOCID_T > indri::api::QueryEnvironment::documentIDsFromMetadata (const std::string &attributeName, const std::vector< std::string > &attributeValue)
 

Return a list of document IDs where the document has a metadata key that matches attributeName, with a value matching one of the attributeValues.

Parameters:
attributeName the name of the metadata attribute (e.g. 'url' or 'docno')
attributeValue values that the metadata attribute should match
Returns:
a vector of document IDs for the documents that match the given metadata criteria

int indri::api::QueryEnvironment::documentLength (lemur::api::DOCID_T documentID)
 

Return the length of a document.

Parameters:
documentID the document id.
Returns:
the length of the document identified by documentID

std::vector< std::string > indri::api::QueryEnvironment::documentMetadata (const std::vector< indri::api::ScoredExtentResult > &documentIDs, const std::string &attributeName)
 

Fetch the named metadata attribute for a list of ScoredExtentResults.

Parameters:
documentIDs the list of ScoredExtentResults
attributeName the name of the metadata attribute
Returns:
the vector of string values for that attribute

std::vector<std::string> indri::api::QueryEnvironment::documentMetadata (const std::vector< lemur::api::DOCID_T > &documentIDs, const std::string &attributeName)
 

Fetch the named metadata attribute for a list of document ids.

Parameters:
documentIDs the list of ids
attributeName the name of the metadata attribute
Returns:
the vector of string values for that attribute

std::vector< indri::api::ParsedDocument * > indri::api::QueryEnvironment::documents (const std::vector< indri::api::ScoredExtentResult > &results)


Fetch the parsed documents for a given list of ScoredExtentResults. Caller is responsible for deleting the returned elements.

Parameters:
results the list of ScoredExtentResults
Returns:
the vector of ParsedDocument pointers.

std::vector<indri::api::ParsedDocument*> indri::api::QueryEnvironment::documents (const std::vector< lemur::api::DOCID_T > &documentIDs)
 

Fetch the parsed documents for a given list of document ids. Caller is responsible for deleting the returned elements.

Parameters:
documentIDs the list of ids
Returns:
the vector of ParsedDocument pointers.
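
Since both documents overloads hand ownership of the returned elements to the caller, one way to avoid leaks is to move the raw pointers into smart pointers straight away. The helper below is a sketch only: the name adopt is hypothetical and not part of Indri, and it assumes the elements were allocated with new so that plain delete is the correct way to release them.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Take ownership of every raw pointer in `raw` and return owning
// unique_ptrs in the same order. Space is reserved before any pointer is
// adopted, so no reallocation can fail halfway through and leak the rest.
template <typename T>
std::vector<std::unique_ptr<T>> adopt(std::vector<T*> raw) {
  std::vector<std::unique_ptr<T>> owned;
  owned.reserve(raw.size());
  for (T* pointer : raw)
    owned.emplace_back(pointer);
  return owned;
}
```

Under these assumptions a call such as adopt(env.documents(documentIDs)) would delete every ParsedDocument when the returned vector goes out of scope, including when an exception is thrown later.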

std::vector< indri::api::ParsedDocument * > indri::api::QueryEnvironment::documentsFromMetadata (const std::string &attributeName, const std::vector< std::string > &attributeValues)
 

Fetch all documents with a metadata key that matches attributeName, with a value matching one of the attributeValues.

Parameters:
attributeName the name of the metadata attribute (e.g. 'url' or 'docno')
attributeValues values that the metadata attribute should match
Returns:
a vector of ParsedDocuments that match the given metadata criteria

std::vector<DocumentVector*> indri::api::QueryEnvironment::documentVectors (const std::vector< lemur::api::DOCID_T > &documentIDs)


Fetch the document vectors for a list of documents. Caller is responsible for deleting the returned DocumentVector objects.

Parameters:
documentIDs the vector of document ids.
Returns:
the vector of DocumentVector pointers for the specified documents.

double indri::api::QueryEnvironment::expressionCount (const std::string &expression, const std::string &queryType="indri")


Return the total number of times this expression appears in the collection.

Parameters:
expression the expression to evaluate, typically an ordered or unordered window expression

std::vector< indri::api::ScoredExtentResult > indri::api::QueryEnvironment::expressionList (const std::string &expression, const std::string &queryType="indri")


Return all the occurrences of this expression in the collection. The returned vector can be very large for large collections and may exhaust the machine's memory, so use this method with discretion.

Parameters:
expression the expression to evaluate, typically an ordered or unordered window expression

std::vector< std::string > indri::api::QueryEnvironment::fieldList ()
 

Return the list of fields.

Returns:
vector of field names.

std::vector< std::string > indri::api::QueryEnvironment::pathNames (const std::vector< indri::api::ScoredExtentResult > &results)
 

Fetch the XPath names of extents for a list of ScoredExtentResults.

Parameters:
results the list of ScoredExtentResults
Returns:
the vector of string XPath names for the extents

void indri::api::QueryEnvironment::removeIndex (const std::string &pathname)
 

Remove a local repository.

Parameters:
pathname the path to the repository.

void indri::api::QueryEnvironment::removeServer (const std::string &hostname)
 

Remove a remote server.

Parameters:
hostname the host the server is running on

QueryAnnotation* indri::api::QueryEnvironment::runAnnotatedQuery (const std::string &query, const std::vector< lemur::api::DOCID_T > &documentSet, int resultsRequested, const std::string &queryType="indri")
 

Run an Indri query language query.

See also:
QueryAnnotation
Parameters:
query the query to run
documentSet the working set of document ids to evaluate
resultsRequested maximum number of results to return
Returns:
pointer to QueryAnnotations for the query

indri::api::QueryAnnotation * indri::api::QueryEnvironment::runAnnotatedQuery (const std::string &query, int resultsRequested, const std::string &queryType="indri")
 

Run an Indri query language query.

See also:
QueryAnnotation
Parameters:
query the query to run
resultsRequested maximum number of results to return
Returns:
pointer to QueryAnnotations for the query

std::vector<indri::api::ScoredExtentResult> indri::api::QueryEnvironment::runQuery (const std::string &query, const std::vector< lemur::api::DOCID_T > &documentSet, int resultsRequested, const std::string &queryType="indri")
 

Run an Indri query language query.

See also:
ScoredExtentResult
Parameters:
query the query to run
documentSet the working set of document ids to evaluate
resultsRequested maximum number of results to return
Returns:
the vector of ScoredExtentResults for the query
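
The documentSet overload takes plain document ids, while queries return ScoredExtentResults. A small conversion step, sketched below, bridges the two when the results of one query are to serve as the working set for another. The helper documentIdsOf and the SketchExtent struct are hypothetical; the sketch assumes only that each result exposes its document id in a member named document.

```cpp
#include <vector>

// Hypothetical stand-in for indri::api::ScoredExtentResult, reduced to the
// fields this sketch needs.
struct SketchExtent {
  double score;
  int document;
};

// Collect the document id of every result, preserving result order. The id
// type is taken from the result type itself rather than being hard-coded.
template <typename Result>
auto documentIdsOf(const std::vector<Result>& results)
    -> std::vector<decltype(Result::document)> {
  std::vector<decltype(Result::document)> ids;
  ids.reserve(results.size());
  for (const Result& result : results)
    ids.push_back(result.document);
  return ids;
}
```

The returned vector could then be passed as the documentSet argument of runQuery or runAnnotatedQuery to restrict evaluation to those documents.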

std::vector< indri::api::ScoredExtentResult > indri::api::QueryEnvironment::runQuery (const std::string &query, int resultsRequested, const std::string &queryType="indri")
 

Run an Indri query language query.

See also:
ScoredExtentResult
Parameters:
query the query to run
resultsRequested maximum number of results to return
Returns:
the vector of ScoredExtentResults for the query

indri::api::QueryResults indri::api::QueryEnvironment::runQuery (QueryRequest &request)


Run an Indri query language query.

Parameters:
request the query to run
Returns:
the QueryResults for the request

void indri::api::QueryEnvironment::setMaxWildcardTerms (int maxTerms)


Set the maximum number of terms a wildcard may expand to.

Parameters:
maxTerms the maximum number of terms to expand a wildcard operator argument (default 100).

void indri::api::QueryEnvironment::setMemory (UINT64 memory)
 

Set the amount of memory to use.

Parameters:
memory number of bytes to allocate

void indri::api::QueryEnvironment::setScoringRules (const std::vector< std::string > &rules)
 

Set the scoring rules.

Parameters:
rules the vector of scoring rules.

void indri::api::QueryEnvironment::setSingleBackgroundModel (bool background)


Set whether there should be one single background model or context-sensitive models.

Parameters:
background true for one background model, false for context-sensitive models

void indri::api::QueryEnvironment::setStopwords (const std::vector< std::string > &stopwords)
 

Set the stopword list for query processing.

Parameters:
stopwords the list of stopwords

INT64 indri::api::QueryEnvironment::stemCount (const std::string &term)
 

Return total number of stem occurrences.

Parameters:
term the stem to count
Returns:
total frequency of this stem in the aggregated collection

INT64 indri::api::QueryEnvironment::stemFieldCount (const std::string &term, const std::string &field)
 

Return total number of stem occurrences within a field.

Parameters:
term the stem to count
field the name of the field
Returns:
total frequency of this stem within this field in the aggregated collection

INT64 indri::api::QueryEnvironment::termCount (const std::string &term)
 

Return total number of term occurrences.

Parameters:
term the term to count
Returns:
total frequency of this term in the aggregated collection

INT64 indri::api::QueryEnvironment::termCount ()
 

Return total number of terms.

Returns:
total number of terms in the aggregated collection
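
The two termCount overloads can be combined to give a term's relative frequency in the aggregated collection, that is, termCount(term) divided by termCount(). The sketch below illustrates the arithmetic only. SketchStatistics is a hypothetical stand-in with canned counts, and relativeFrequency is an illustrative helper, not an Indri function.

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-in mirroring the two documented termCount overloads,
// with canned counts so the sketch runs without an index.
struct SketchStatistics {
  std::int64_t termCount(const std::string& term) {
    return term == "retrieval" ? 25 : 0;
  }
  std::int64_t termCount() { return 1000; }
};

// Relative frequency of `term`: its occurrence count divided by the total
// number of terms. Returns 0.0 for an empty collection to avoid dividing
// by zero.
template <typename Env>
double relativeFrequency(Env& env, const std::string& term) {
  const auto total = env.termCount();
  if (total <= 0) return 0.0;
  return static_cast<double>(env.termCount(term)) / static_cast<double>(total);
}
```

With the canned counts above, a term occurring 25 times among 1000 terms has a relative frequency of 0.025. The same helper should accept a QueryEnvironment, since it relies only on the two documented overloads.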

INT64 indri::api::QueryEnvironment::termFieldCount (const std::string &term, const std::string &field)
 

Return total number of term occurrences within a field.

Parameters:
term the term to count
field the name of the field
Returns:
total frequency of this term within this field in the aggregated collection


Member Data Documentation

std::vector<indri::net::NetworkMessageStream*> indri::api::QueryEnvironment::_messageStreams [private]
 

Parameters indri::api::QueryEnvironment::_parameters [private]
 

std::vector<indri::collection::Repository*> indri::api::QueryEnvironment::_repositories [private]
 

std::map<std::string, std::pair<indri::server::QueryServer *, indri::collection::Repository *> > indri::api::QueryEnvironment::_repositoryNameMap [private]
 

std::map<std::string, std::pair<indri::server::QueryServer *, indri::net::NetworkStream *> > indri::api::QueryEnvironment::_serverNameMap [private]
 

std::vector<indri::server::QueryServer*> indri::api::QueryEnvironment::_servers [private]
 

std::vector<indri::net::NetworkStream*> indri::api::QueryEnvironment::_streams [private]
 


Generated on Thu Jun 19 11:12:02 2008 for LEMUR by  doxygen 1.4.4