INDRI
Language modeling meets inference networks

Indri is a new search engine from the Lemur Project, a cooperative effort between the University of Massachusetts and Carnegie Mellon University to build language modeling information retrieval tools.

Effective

  • Best-in-class ad hoc retrieval performance

Flexible

  • Supports popular structured query operators from INQUERY
  • Open source, with a flexible BSD-inspired license
  • Parses PDF, HTML, XML, and TREC documents
  • Word and PowerPoint parsing (Windows only)
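
The structured operators inherited from INQUERY combine the beliefs of child nodes in an inference network. The Python sketch below illustrates the classic combination rules: product for #and, noisy-or for #or, complement for #not, mean for #sum, weighted mean for #wsum, and maximum for #max. It adds a geometric mean as an assumed stand-in for Indri's #combine. The function names are hypothetical and this is not Indri's code.

```python
import math
from typing import Sequence

# Hypothetical helpers illustrating INQUERY-style belief operators.
# Beliefs are probabilities in (0, 1]; children are assumed independent.


def op_and(beliefs: Sequence[float]) -> float:
    """#and: probability that every child is true."""
    return math.prod(beliefs)


def op_or(beliefs: Sequence[float]) -> float:
    """#or: noisy-or, one minus the chance that every child is false."""
    return 1.0 - math.prod(1.0 - b for b in beliefs)


def op_not(belief: float) -> float:
    """#not: complement of a single child's belief."""
    return 1.0 - belief


def op_sum(beliefs: Sequence[float]) -> float:
    """#sum: unweighted mean of the children's beliefs."""
    return sum(beliefs) / len(beliefs)


def op_wsum(weights: Sequence[float], beliefs: Sequence[float]) -> float:
    """#wsum: weighted mean, normalised by the total weight."""
    return sum(w * b for w, b in zip(weights, beliefs)) / sum(weights)


def op_max(beliefs: Sequence[float]) -> float:
    """#max: the strongest single piece of evidence."""
    return max(beliefs)


def op_combine(beliefs: Sequence[float]) -> float:
    """Geometric mean computed in log space (assumed stand-in for #combine).

    Requires every belief to be strictly positive, since log(0) is undefined.
    """
    return math.exp(sum(math.log(b) for b in beliefs) / len(beliefs))
```

Because #and multiplies beliefs, its score keeps falling as children are added. A length-normalised geometric mean such as op_combine avoids that, which is one reason to average log beliefs rather than simply summing them.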

Usable

  • Supports UTF-8 encoded text
  • Language-independent tokenization of UTF-8 encoded documents
  • Includes both command line tools and a Java user interface
  • API can be used from Java, PHP, or C++
  • Works on Windows, Linux, Solaris, and Mac OS X

Powerful

  • Can be used on a cluster of machines for faster indexing and retrieval
  • Suffix-based wildcard term matching
  • Field retrieval
  • Passage retrieval
  • Scales to terabyte-sized collections
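
This sketch reads "suffix-based wildcard" as a wildcard at the end of a term, such as retriev*, which matches every vocabulary term with that prefix. It keeps the vocabulary in sorted order, so matching terms form one contiguous run that a binary search can find. The Vocabulary class and the trailing-"*" notation are hypothetical illustrations, not Indri's index structures or query syntax.

```python
import bisect
from typing import Iterable, List


class Vocabulary:
    """Hypothetical sorted term dictionary (not Indri's index structure)."""

    def __init__(self, terms: Iterable[str]) -> None:
        # Sorting once places all terms sharing a prefix in one contiguous run.
        self.terms: List[str] = sorted(set(terms))

    def expand(self, pattern: str) -> List[str]:
        """Return the vocabulary terms matched by `pattern`.

        A trailing '*' matches any suffix (e.g. 'retriev*'); any other
        pattern must match a term exactly.
        """
        if not pattern.endswith("*"):
            i = bisect.bisect_left(self.terms, pattern)
            found = i < len(self.terms) and self.terms[i] == pattern
            return [pattern] if found else []
        prefix = pattern[:-1]
        # Binary-search to the first term >= prefix, then scan while it matches.
        i = bisect.bisect_left(self.terms, prefix)
        matches: List[str] = []
        while i < len(self.terms) and self.terms[i].startswith(prefix):
            matches.append(self.terms[i])
            i += 1
        return matches
```

With the terms retrieval, retrieve, retrieved, retriever, return, and index, expanding retriev* yields the first four. A lookup costs O(log n + k) for n vocabulary terms and k matches. A retrieval system could then score the expanded terms as a single synonym group.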

Related Links