This example application demonstrates the basic offline clustering task. It provides kmeans and bisecting kmeans partitional clustering. It runs each algorithm on the first 100 documents in the index (or on all of them if the index holds fewer than 100) and prints the results.
The parameters accepted by OfflineCluster are:

index  The index to use. Default is none.

clusterType  Type of cluster to use, either agglomerative or centroid. Centroid is agglomerative clustering using the mean, which trades memory use for speed of clustering. Default is centroid.

simType  The similarity metric to use. Default is cosine similarity (COS), which is the only implemented method.
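As an illustration of the cosine similarity metric named above, here is a minimal Python sketch. It is not Lemur code: the function name `cosine_similarity` and the choice to represent a document as a dict of term weights are assumptions made only for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse term-weight vectors (dict: term -> weight)."""
    # Dot product over the terms that appear in both vectors.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)

# Hypothetical documents, invented for the example.
doc1 = {"cluster": 2.0, "index": 1.0}
doc2 = {"cluster": 1.0, "kmeans": 3.0}
print(cosine_similarity(doc1, doc1))
print(cosine_similarity(doc1, doc2))
```

Because the score depends only on the angle between the vectors, two documents with the same term proportions but different lengths score close to 1.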

docMode  The integer encoding of the scoring method to use for the agglomerative cluster type. The default is max (maximum). The choices are:

max  Maximum score over documents in a cluster.

mean  Mean score over documents in a cluster. This is identical to the centroid cluster type.

avg  Average score over documents in a cluster.

min  Minimum score over documents in a cluster.
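The four scoring modes can be sketched in Python as follows. This is one plausible reading of the descriptions above, not the Lemur implementation: it assumes that max, avg and min aggregate the per-document cosine scores, and that mean scores the document against the cluster's mean (centroid) vector. The names `cosine` and `cluster_score` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_score(doc, members, doc_mode="max"):
    """Score a document against a cluster under one of the four modes."""
    if doc_mode == "mean":
        # Assumed reading: compare against the mean (centroid) vector.
        dim = len(doc)
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        return cosine(doc, centroid)
    # The other modes aggregate the scores against each member document.
    sims = [cosine(doc, m) for m in members]
    if doc_mode == "max":
        return max(sims)
    if doc_mode == "min":
        return min(sims)
    if doc_mode == "avg":
        return sum(sims) / len(sims)
    raise ValueError(f"unknown doc_mode: {doc_mode}")

# Hypothetical two-document cluster, invented for the example.
members = [[1.0, 0.0], [0.0, 1.0]]
doc = [1.0, 0.0]
for mode in ("max", "mean", "avg", "min"):
    print(mode, round(cluster_score(doc, members, mode), 4))
```

Under this reading, mean and avg generally differ: the first needs only one stored vector per cluster, while the second compares against every member document.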

numParts  Number of partitions to split into. Default is 2.

maxIters  Maximum number of iterations for kmeans. Default is 100.

bkIters  Number of kmeans iterations for bisecting kmeans. Default is 5.
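To show how numParts, maxIters and bkIters might interact, here is a small Python sketch of kmeans and bisecting kmeans over cosine similarity. It is an illustration under stated assumptions rather than the Lemur algorithm: the farthest-first seeding, the rule of always splitting the largest cluster, and the reading of bkIters as the iteration cap for each two-way split are all choices made for this example, and every function name is hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mean_vector(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def seed_centroids(vectors, k):
    """Farthest-first seeding: a deterministic choice made for this sketch."""
    seeds = [list(vectors[0])]
    while len(seeds) < k:
        # Pick the vector whose most similar seed is least similar to it.
        nxt = min(vectors, key=lambda v: max(cosine(v, s) for s in seeds))
        seeds.append(list(nxt))
    return seeds

def kmeans(vectors, k, max_iters=100):
    """Plain kmeans; stops early once assignments no longer change."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    centroids = seed_centroids(vectors, k)
    assignment = None
    for _ in range(max_iters):
        new_assignment = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = mean_vector(members)
    return [[v for v, a in zip(vectors, assignment) if a == c] for c in range(k)]

def bisecting_kmeans(vectors, num_parts=2, bk_iters=5):
    """Repeatedly split the largest cluster in two until num_parts exist."""
    clusters = [list(vectors)]
    while len(clusters) < num_parts:
        clusters.sort(key=len)
        target = clusters.pop()  # largest cluster
        if len(target) < 2:
            clusters.append(target)
            break  # nothing left to split
        halves = kmeans(target, 2, max_iters=bk_iters)
        if not halves[0] or not halves[1]:
            clusters.append(target)
            break  # split failed, e.g. all vectors identical
        clusters.extend(halves)
    return clusters

# Hypothetical document vectors, invented for the example.
vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0], [0.0, 0.0, 0.9],
]
for part in kmeans(vectors, 3, max_iters=100):
    print("kmeans:", part)
for part in bisecting_kmeans(vectors, num_parts=3, bk_iters=5):
    print("bisecting:", part)
```

The two methods need not agree on the same data: kmeans optimizes all partitions at once, while bisecting kmeans commits to each two-way split before making the next one.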
Generated on Tue Jun 15 11:02:58 2010 for Lemur by Doxygen 1.3.4