IdentifinderParser.hpp File Reference

#include "Parser.hpp"
#include "TextHandler.hpp"
#include "LinkedPropertyList.hpp"


Namespaces

namespace lemur

namespace lemur::parse

Defines

#define BEGIN_PREFIX   "B_"

#define END_PREFIX   "E_"

#define PREFIX_LEN   2

Define Documentation

#define BEGIN_PREFIX "B_"

Parses documents whose document separation tags are similar to NIST's Web format: <DOC></DOC> around documents and <DOCNO></DOCNO> around docids. This parser recognizes named entity tags from the Identifinder tagger and passes them along as properties. For each tag X, it also adds b_X to the first token and e_X to the last token of each entity. For example, if "Carnegie Mellon University" were identified as a place, it would be parsed with the following properties:

Carnegie [b_place] [place]
Mellon [place]
University [e_place] [place]

A single-token entity, like Madonna, would be parsed as:

Madonna [b_person] [person] [e_person]

Case folding is applied to words that are not in the acronym list. Contraction suffixes and possessive suffixes are stripped. U.S.A., USA's, and USAs are all converted to USA. Acronyms containing numbers are not recognized.
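The normalization rules above can be approximated in a few lines. The following is a hypothetical C++ sketch, not Lemur's tokenizer: the function name `normalizeToken`, the use of a `std::set` as the acronym list, and the order of the steps are all assumptions. It strips everything from the first apostrophe, collapses dotted single-letter acronyms, drops a plural `s` after an all-uppercase stem, and lowercases anything that is not a listed acronym. Tokens containing digits never qualify as acronyms.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// True if s is non-empty and consists only of uppercase ASCII letters.
// Digits disqualify a token, so "B52" is never treated as an acronym.
static bool allUpperLetters(const std::string& s) {
  return !s.empty() && std::all_of(s.begin(), s.end(), [](unsigned char c) {
           return std::isupper(c) != 0;
         });
}

// Hypothetical normalizer approximating the documented rules (not Lemur code).
std::string normalizeToken(const std::string& raw,
                           const std::set<std::string>& acronyms) {
  std::string word = raw;

  // 1. Strip a contraction or possessive suffix: drop everything from the
  //    first apostrophe on ("USA's" -> "USA").
  std::string::size_type apos = word.find('\'');
  if (apos != std::string::npos) word.erase(apos);

  // 2. Collapse a dotted acronym of single uppercase letters ("U.S.A." -> "USA").
  bool dotted = word.size() >= 2;
  for (std::size_t i = 0; dotted && i < word.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(word[i]);
    dotted = (i % 2 == 0) ? (std::isupper(c) != 0) : (c == '.');
  }
  if (dotted) word.erase(std::remove(word.begin(), word.end(), '.'), word.end());

  // 3. Drop a plural "s" after an all-uppercase stem of two or more letters
  //    ("USAs" -> "USA").
  if (word.size() >= 3 && word.back() == 's' &&
      allUpperLetters(word.substr(0, word.size() - 1))) {
    word.pop_back();
  }

  // 4. Keep listed acronyms as-is; case-fold everything else.
  if (allUpperLetters(word) && acronyms.count(word) > 0) return word;
  std::transform(word.begin(), word.end(), word.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return word;
}
```

With an acronym list containing only `USA`, the inputs `U.S.A.`, `USA's`, and `USAs` all yield `USA`, while `Carnegie` becomes `carnegie` and `B52` becomes `b52`.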

#define END_PREFIX "E_"
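The begin/end marking rule from the BEGIN_PREFIX description can be illustrated with a small sketch. The helper `entityProperties` is hypothetical and not part of Lemur's API. It only reproduces the documented tagging rule for a single entity, using this header's `BEGIN_PREFIX` and `END_PREFIX` values. The macros are uppercase (`B_`, `E_`) while the description writes the markers in lowercase (`b_place`), so this sketch produces `B_place` and `E_place`. The order of properties within a token is arbitrary here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Values copied from IdentifinderParser.hpp.
#define BEGIN_PREFIX "B_"
#define END_PREFIX "E_"

// Hypothetical helper (not Lemur code): given the tokens of one entity and its
// type, return the properties each token would carry.
std::vector<std::vector<std::string>> entityProperties(
    const std::vector<std::string>& tokens, const std::string& type) {
  std::vector<std::vector<std::string>> props(tokens.size());
  for (std::size_t i = 0; i < tokens.size(); ++i) {
    if (i == 0) props[i].push_back(BEGIN_PREFIX + type);  // first token
    props[i].push_back(type);                             // every token
    if (i + 1 == tokens.size()) props[i].push_back(END_PREFIX + type);  // last
  }
  return props;
}
```

For the tokens `Carnegie`, `Mellon`, `University` with type `place`, the first token gets `B_place` and `place`, the middle token only `place`, and the last token `place` and `E_place`. A single token such as `Madonna` gets all three markers.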

#define PREFIX_LEN 2
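`PREFIX_LEN` appears to record the length shared by both prefixes, which would let a consumer strip the marker off a property name. The sketch below checks that assumption at compile time and shows one hypothetical way to recover the entity type from a marker name. The function `entityTypeOf` is illustrative only and does not come from the header.

```cpp
#include <string>

// Values copied from IdentifinderParser.hpp.
#define BEGIN_PREFIX "B_"
#define END_PREFIX "E_"
#define PREFIX_LEN 2

// sizeof a string literal counts the terminating '\0', hence the "- 1".
static_assert(sizeof(BEGIN_PREFIX) - 1 == PREFIX_LEN, "BEGIN_PREFIX length");
static_assert(sizeof(END_PREFIX) - 1 == PREFIX_LEN, "END_PREFIX length");

// Hypothetical helper: if name starts with BEGIN_PREFIX or END_PREFIX and has
// something after it, return the entity type; otherwise return "".
std::string entityTypeOf(const std::string& name) {
  if (name.size() > PREFIX_LEN &&
      (name.compare(0, PREFIX_LEN, BEGIN_PREFIX) == 0 ||
       name.compare(0, PREFIX_LEN, END_PREFIX) == 0)) {
    return name.substr(PREFIX_LEN);
  }
  return "";
}
```

Under these assumptions, `B_place` and `E_place` both map to `place`, while a plain type name such as `place` maps to the empty string.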

Generated on Tue Jun 15 11:02:56 2010 for Lemur by Doxygen 1.3.4

