Quick Start

It isn't difficult to get started with Sifaka. Just follow the steps below.

Java 8: Sifaka requires the 64-bit version of Java 8. If you don't have it already, download the Java 8 Runtime Environment (JRE).
Download Sifaka: It is available from SourceForge, on the Lemur Project page.
Index documents: Sifaka uses a document index that enables it to search and analyze your documents quickly. Thus, the next step is to build an index for your documents.
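The Java requirement in the first step can be checked from a terminal before going further. The following is a minimal sketch, not part of Sifaka: the helper name is_java8_64bit is hypothetical, and it assumes the usual HotSpot/OpenJDK output of java -version, in which Java 8 reports a version string starting with 1.8 and a 64-bit runtime reports "64-Bit" in the VM line.

```shell
# Hypothetical helper: succeeds if the given `java -version` output
# looks like a 64-bit Java 8 runtime (assumes HotSpot/OpenJDK wording).
is_java8_64bit() {
  case "$1" in
    *'version "1.8'*64-Bit*) return 0 ;;
    *) return 1 ;;
  esac
}

# `java -version` prints to stderr, so redirect it to stdout to capture it.
if command -v java >/dev/null 2>&1; then
  if is_java8_64bit "$(java -version 2>&1)"; then
    echo "64-bit Java 8 found"
  else
    echo "Java found, but it does not look like 64-bit Java 8"
  fi
else
  echo "Java not found on PATH"
fi
```

If the check does not report 64-bit Java 8, install the JRE described in the first step before continuing.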

To build a document index: Sifaka can index plain text, the Reuters-21578 dataset from UCI, simplified TREC documents, the Wall Street Journal 1987-1992 collection, HTML documents, WARC files, and tweets in the Twitter Spritzer format. (The Document Parser Tutorial provides information about each supported format and how to build parsers for other document formats.)
1. Run the Sifaka Build Index application: The startup process differs slightly between operating systems.
  
  Windows: There are several options for running Sifaka in Windows.
  1. All versions: Double-click on sifakaBuildIndex.jar. This is the best choice for most people.
  2. Windows 7: Open a command prompt. Navigate to the directory that contains sifakaBuildIndex.jar. Type: java -jar sifakaBuildIndex.jar
  3. Windows 10: Open a bash shell. Navigate to the directory that contains sifakaBuildIndex.jar. Type: java -jar sifakaBuildIndex.jar

  Mac: Open a terminal. Navigate to the directory that contains sifakaBuildIndex.jar. Type: java -jar sifakaBuildIndex.jar

  Linux: Open a terminal. Navigate to the directory that contains sifakaBuildIndex.jar. Type: java -jar sifakaBuildIndex.jar
2. Create a Sifaka index: Use Sifaka Build Index to index the documents in a directory, using the indexing options and annotations that you specify.
  1. To build an index with the Reuters-21578 Text Categorization Data Set from UCI, download the dataset: reuters21578.tar.gz. (Refer to the Document Parser Tutorial to build an index with a different set of documents.)
  2. Extract the archive.
  3. Specify the indexing parameters.
  4. Press the Build Index button to start indexing documents.
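On Mac, Linux, or a Windows bash shell, the extraction step above can be done with tar. This is a minimal sketch: the helper name extract_archive and the destination directory reuters21578 are illustrative choices, not names that Sifaka requires.

```shell
# Hypothetical helper: extract a gzip-compressed tar archive into a
# dedicated directory, creating the directory if it does not exist.
extract_archive() {
  archive=$1
  dest=$2
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Run this from the directory where reuters21578.tar.gz was downloaded.
if [ -f reuters21578.tar.gz ]; then
  extract_archive reuters21578.tar.gz reuters21578
fi
```

Extracting into its own directory keeps the dataset files together whatever the archive's internal layout, which makes it easy to point Sifaka Build Index at a single document directory.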
Analyze documents: The startup process for the Sifaka TextMiner application differs slightly between operating systems.

Windows:
1. All versions: Double-click on sifakaTextMiner.jar. This is the best choice for most people.
2. Windows 7: Open a command prompt. Navigate to the directory that contains sifakaTextMiner.jar. Type: java -jar sifakaTextMiner.jar
3. Windows 10: Open a bash shell. Navigate to the directory that contains sifakaTextMiner.jar. Type: java -jar sifakaTextMiner.jar
Mac: Open a terminal. Navigate to the directory that contains sifakaTextMiner.jar. Type: java -jar sifakaTextMiner.jar

Linux: Open a terminal. Navigate to the directory that contains sifakaTextMiner.jar. Type: java -jar sifakaTextMiner.jar
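The terminal instructions above can be wrapped in a small helper that reports a clear error when the jar file is not where you expect. This is only a sketch: the function name run_sifaka_jar and the SIFAKA_DIR variable are hypothetical, and the only command taken from the instructions is java -jar.

```shell
# Hypothetical helper: change to the directory that holds a Sifaka jar
# and start it, or print an error if the jar is not there.
run_sifaka_jar() {
  jar_dir=$1
  jar_name=$2
  if [ ! -f "$jar_dir/$jar_name" ]; then
    echo "Cannot find $jar_name in $jar_dir" >&2
    return 1
  fi
  # Use a subshell so the caller's working directory is left unchanged.
  (cd "$jar_dir" && java -jar "$jar_name")
}

# Example usage; set SIFAKA_DIR to wherever you unpacked Sifaka:
# run_sifaka_jar "$SIFAKA_DIR" sifakaBuildIndex.jar
# run_sifaka_jar "$SIFAKA_DIR" sifakaTextMiner.jar
```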

Follow the tutorials to open the index you built and perform text analysis.