Returns true iff a character should be included in a token. This
tokenizer generates as tokens adjacent sequences of characters which
satisfy this predicate. Characters for which this is false are used to
define token boundaries and are not included in tokens.
Called on each token character to normalize it before it is added to the
token. The default implementation does nothing. Subclasses may use this
to, e.g., lowercase tokens.