| java.lang.Object org.antlr.runtime.BaseRecognizer
All known Subclasses: org.antlr.runtime.Lexer, org.antlr.runtime.tree.TreeParser, org.antlr.runtime.Parser,
BaseRecognizer | abstract public class BaseRecognizer (Code) | | A generic recognizer that can handle recognizers generated from
lexer, parser, and tree grammars. This is all the parsing
support code essentially; most of it is error recovery stuff and
backtracking.
|
Field Summary | |
final public static int | DEFAULT_TOKEN_CHANNEL | final public static int | HIDDEN | final public static int | INITIAL_FOLLOW_STACK_SIZE | final public static int | MEMO_RULE_FAILED | final public static Integer | MEMO_RULE_FAILED_I | final public static int | MEMO_RULE_UNKNOWN | final public static String | NEXT_TOKEN_RULE_NAME | protected int | _fsp | protected int | backtracking If 0, no backtracking is going on. | protected boolean | errorRecovery This is true when we see an error and before having successfully
matched a token. | protected boolean | failed In lieu of a return value, this indicates that a rule or token
has failed to match. | protected BitSet[] | following Track the set of token types that can follow any rule invocation.
Stack grows upwards. | protected int | lastErrorIndex The index into the input stream where the last error occurred.
This is used to prevent infinite loops where an error is found
but no token is consumed during recovery...another error is found,
ad naseum. | protected Map[] | ruleMemo An array[size num rules] of Map that tracks
the stop token index for each rule. |
Method Summary | |
public boolean | alreadyParsedRule(IntStream input, int ruleIndex) Has this rule already parsed input at the current index in the
input stream? Return the stop token index or MEMO_RULE_UNKNOWN. | public void | beginResync() A hook to listen in on the token consumption during error recovery. | protected BitSet | combineFollows(boolean exact) | protected BitSet | computeContextSensitiveRuleFOLLOW() Compute the context-sensitive FOLLOW set for current rule.
This is set of token types that can follow a specific rule
reference given a specific call chain. | protected BitSet | computeErrorRecoverySet() | public void | consumeUntil(IntStream input, int tokenType) | public void | consumeUntil(IntStream input, BitSet set) | public void | displayRecognitionError(String[] tokenNames, RecognitionException e) | public void | emitErrorMessage(String msg) | public void | endResync() | public int | getBacktrackingLevel() | public String | getErrorHeader(RecognitionException e) | public String | getErrorMessage(RecognitionException e, String[] tokenNames) What error message should be generated for the various
exception types?
Not very object-oriented code, but I like having all error message
generation within one method rather than spread among all of the
exception classes. | public String | getGrammarFileName() For debugging and other purposes, might want the grammar name. | public List | getRuleInvocationStack() Return List of the rules in your parser instance
leading up to a call to this method. | public static List | getRuleInvocationStack(Throwable e, String recognizerClassName) A more general version of getRuleInvocationStack where you can
pass in, for example, a RecognitionException to get it's rule
stack trace. | public int | getRuleMemoization(int ruleIndex, int ruleStartIndex) Given a rule number and a start token index number, return
MEMO_RULE_UNKNOWN if the rule has not parsed input starting from
start index. | public int | getRuleMemoizationCacheSize() return how many rule/input-index pairs there are in total.
TODO: this includes synpreds. | public String | getTokenErrorDisplay(Token t) How should a token be displayed in an error message? The default
is to display just the text, but during development you might
want to have a lot of information spit out. | public String[] | getTokenNames() Used to print out token names like ID during debugging and
error reporting. | public void | match(IntStream input, int ttype, BitSet follow) Match current input symbol against ttype. | public void | matchAny(IntStream input) | public void | memoize(IntStream input, int ruleIndex, int ruleStartIndex) Record whether or not this rule parsed the input at this position
successfully. | protected void | mismatch(IntStream input, int ttype, BitSet follow) factor out what to do upon token mismatch so tree parsers can behave
differently. | protected void | pushFollow(BitSet fset) | public void | recover(IntStream input, RecognitionException re) Recover from an error found on the input stream. | protected boolean | recoverFromMismatchedElement(IntStream input, RecognitionException e, BitSet follow) This code is factored out from mismatched token and mismatched set
recovery. | public void | recoverFromMismatchedSet(IntStream input, RecognitionException e, BitSet follow) | public void | recoverFromMismatchedToken(IntStream input, RecognitionException e, int ttype, BitSet follow) Attempt to recover from a single missing or extra token.
EXTRA TOKEN
LA(1) is not what we are looking for. | public void | reportError(RecognitionException e) Report a recognition problem.
This method sets errorRecovery to indicate the parser is recovering
not parsing. | public void | reset() | public List | toStrings(List tokens) A convenience method for use most often with template rewrites. | public void | traceIn(String ruleName, int ruleIndex, Object inputSymbol) | public void | traceOut(String ruleName, int ruleIndex, Object inputSymbol) |
DEFAULT_TOKEN_CHANNEL | final public static int DEFAULT_TOKEN_CHANNEL(Code) | | |
HIDDEN | final public static int HIDDEN(Code) | | |
INITIAL_FOLLOW_STACK_SIZE | final public static int INITIAL_FOLLOW_STACK_SIZE(Code) | | |
MEMO_RULE_FAILED | final public static int MEMO_RULE_FAILED(Code) | | |
MEMO_RULE_FAILED_I | final public static Integer MEMO_RULE_FAILED_I(Code) | | |
MEMO_RULE_UNKNOWN | final public static int MEMO_RULE_UNKNOWN(Code) | | |
NEXT_TOKEN_RULE_NAME | final public static String NEXT_TOKEN_RULE_NAME(Code) | | |
backtracking | protected int backtracking(Code) | | If 0, no backtracking is going on. Safe to exec actions etc...
If >0 then it's the level of backtracking.
|
errorRecovery | protected boolean errorRecovery(Code) | | This is true when we see an error and before having successfully
matched a token. Prevents generation of more than one error message
per error.
|
failed | protected boolean failed(Code) | | In lieu of a return value, this indicates that a rule or token
has failed to match. Reset to false upon valid token match.
|
following | protected BitSet[] following(Code) | | Track the set of token types that can follow any rule invocation.
Stack grows upwards. When it hits the max, it grows 2x in size
and keeps going.
|
lastErrorIndex | protected int lastErrorIndex(Code) | | The index into the input stream where the last error occurred.
This is used to prevent infinite loops where an error is found
but no token is consumed during recovery...another error is found,
ad naseum. This is a failsafe mechanism to guarantee that at least
one token/tree node is consumed for two errors.
|
ruleMemo | protected Map[] ruleMemo(Code) | | An array[size num rules] of Map that tracks
the stop token index for each rule. ruleMemo[ruleIndex] is
the memoization table for ruleIndex. For key ruleStartIndex, you
get back the stop token for associated rule or MEMO_RULE_FAILED.
This is only used if rule memoization is on (which it is by default).
|
alreadyParsedRule | public boolean alreadyParsedRule(IntStream input, int ruleIndex)(Code) | | Has this rule already parsed input at the current index in the
input stream? Return the stop token index or MEMO_RULE_UNKNOWN.
If we attempted but failed to parse properly before, return
MEMO_RULE_FAILED.
This method has a side-effect: if we have seen this input for
this rule and successfully parsed before, then seek ahead to
1 past the stop token matched for this rule last time.
|
beginResync | public void beginResync()(Code) | | A hook to listen in on the token consumption during error recovery.
The DebugParser subclasses this to fire events to the listenter.
|
combineFollows | protected BitSet combineFollows(boolean exact)(Code) | | |
computeContextSensitiveRuleFOLLOW | protected BitSet computeContextSensitiveRuleFOLLOW()(Code) | | Compute the context-sensitive FOLLOW set for current rule.
This is set of token types that can follow a specific rule
reference given a specific call chain. You get the set of
viable tokens that can possibly come next (lookahead depth 1)
given the current call chain. Contrast this with the
definition of plain FOLLOW for rule r:
FOLLOW(r)={x | S=>*alpha r beta in G and x in FIRST(beta)}
where x in T* and alpha, beta in V*; T is set of terminals and
V is the set of terminals and nonterminals. In other words,
FOLLOW(r) is the set of all tokens that can possibly follow
references to r in *any* sentential form (context). At
runtime, however, we know precisely which context applies as
we have the call chain. We may compute the exact (rather
than covering superset) set of following tokens.
For example, consider grammar:
stat : ID '=' expr ';' // FOLLOW(stat)=={EOF}
| "return" expr '.'
;
expr : atom ('+' atom)* ; // FOLLOW(expr)=={';','.',')'}
atom : INT // FOLLOW(atom)=={'+',')',';','.'}
| '(' expr ')'
;
The FOLLOW sets are all inclusive whereas context-sensitive
FOLLOW sets are precisely what could follow a rule reference.
For input input "i=(3);", here is the derivation:
stat => ID '=' expr ';'
=> ID '=' atom ('+' atom)* ';'
=> ID '=' '(' expr ')' ('+' atom)* ';'
=> ID '=' '(' atom ')' ('+' atom)* ';'
=> ID '=' '(' INT ')' ('+' atom)* ';'
=> ID '=' '(' INT ')' ';'
At the "3" token, you'd have a call chain of
stat -> expr -> atom -> expr -> atom
What can follow that specific nested ref to atom? Exactly ')'
as you can see by looking at the derivation of this specific
input. Contrast this with the FOLLOW(atom)={'+',')',';','.'}.
You want the exact viable token set when recovering from a
token mismatch. Upon token mismatch, if LA(1) is member of
the viable next token set, then you know there is most likely
a missing token in the input stream. "Insert" one by just not
throwing an exception.
|
computeErrorRecoverySet | protected BitSet computeErrorRecoverySet()(Code) | | |
consumeUntil | public void consumeUntil(IntStream input, BitSet set)(Code) | | Consume tokens until one matches the given token set
|
emitErrorMessage | public void emitErrorMessage(String msg)(Code) | | Override this method to change where error messages go
|
endResync | public void endResync()(Code) | | |
getBacktrackingLevel | public int getBacktrackingLevel()(Code) | | |
getErrorMessage | public String getErrorMessage(RecognitionException e, String[] tokenNames)(Code) | | What error message should be generated for the various
exception types?
Not very object-oriented code, but I like having all error message
generation within one method rather than spread among all of the
exception classes. This also makes it much easier for the exception
handling because the exception classes do not have to have pointers back
to this object to access utility routines and so on. Also, changing
the message for an exception type would be difficult because you
would have to subclassing exception, but then somehow get ANTLR
to make those kinds of exception objects instead of the default.
This looks weird, but trust me--it makes the most sense in terms
of flexibility.
For grammar debugging, you will want to override this to add
more information such as the stack frame with
getRuleInvocationStack(e, this.getClass().getName()) and,
for no viable alts, the decision description and state etc...
Override this to change the message generated for one or more
exception types.
|
getGrammarFileName | public String getGrammarFileName()(Code) | | For debugging and other purposes, might want the grammar name.
Have ANTLR generate an implementation for this method.
|
getRuleInvocationStack | public List getRuleInvocationStack()(Code) | | Return List of the rules in your parser instance
leading up to a call to this method. You could override if
you want more details such as the file/line info of where
in the parser java code a rule is invoked.
This is very useful for error messages and for context-sensitive
error recovery.
|
getRuleInvocationStack | public static List getRuleInvocationStack(Throwable e, String recognizerClassName)(Code) | | A more general version of getRuleInvocationStack where you can
pass in, for example, a RecognitionException to get it's rule
stack trace. This routine is shared with all recognizers, hence,
static.
TODO: move to a utility class or something; weird having lexer call this
|
getRuleMemoization | public int getRuleMemoization(int ruleIndex, int ruleStartIndex)(Code) | | Given a rule number and a start token index number, return
MEMO_RULE_UNKNOWN if the rule has not parsed input starting from
start index. If this rule has parsed input starting from the
start index before, then return where the rule stopped parsing.
It returns the index of the last token matched by the rule.
For now we use a hashtable and just the slow Object-based one.
Later, we can make a special one for ints and also one that
tosses out data after we commit past input position i.
|
getRuleMemoizationCacheSize | public int getRuleMemoizationCacheSize()(Code) | | return how many rule/input-index pairs there are in total.
TODO: this includes synpreds. :(
|
getTokenErrorDisplay | public String getTokenErrorDisplay(Token t)(Code) | | How should a token be displayed in an error message? The default
is to display just the text, but during development you might
want to have a lot of information spit out. Override in that case
to use t.toString() (which, for CommonToken, dumps everything about
the token). This is better than forcing you to override a method in
your token objects because you don't have to go modify your lexer
so that it creates a new Java type.
|
getTokenNames | public String[] getTokenNames()(Code) | | Used to print out token names like ID during debugging and
error reporting. The generated parsers implement a method
that overrides this to point to their String[] tokenNames.
|
match | public void match(IntStream input, int ttype, BitSet follow) throws RecognitionException(Code) | | Match current input symbol against ttype. Upon error, do one token
insertion or deletion if possible. You can override to not recover
here and bail out of the current production to the normal error
exception catch (at the end of the method) by just throwing
MismatchedTokenException upon input.LA(1)!=ttype.
|
memoize | public void memoize(IntStream input, int ruleIndex, int ruleStartIndex)(Code) | | Record whether or not this rule parsed the input at this position
successfully. Use a standard java hashtable for now.
|
mismatch | protected void mismatch(IntStream input, int ttype, BitSet follow) throws RecognitionException(Code) | | factor out what to do upon token mismatch so tree parsers can behave
differently. Override this method in your parser to do things
like bailing out after the first error; just throw the mte object
instead of calling the recovery method.
|
pushFollow | protected void pushFollow(BitSet fset)(Code) | | Push a rule's follow set using our own hardcoded stack
|
recover | public void recover(IntStream input, RecognitionException re)(Code) | | Recover from an error found on the input stream. Mostly this is
NoViableAlt exceptions, but could be a mismatched token that
the match() routine could not recover from.
|
recoverFromMismatchedElement | protected boolean recoverFromMismatchedElement(IntStream input, RecognitionException e, BitSet follow)(Code) | | This code is factored out from mismatched token and mismatched set
recovery. It handles "single token insertion" error recovery for
both. No tokens are consumed to recover from insertions. Return
true if recovery was possible else return false.
|
recoverFromMismatchedToken | public void recoverFromMismatchedToken(IntStream input, RecognitionException e, int ttype, BitSet follow) throws RecognitionException(Code) | | Attempt to recover from a single missing or extra token.
EXTRA TOKEN
LA(1) is not what we are looking for. If LA(2) has the right token,
however, then assume LA(1) is some extra spurious token. Delete it
and LA(2) as if we were doing a normal match(), which advances the
input.
MISSING TOKEN
If current token is consistent with what could come after
ttype then it is ok to "insert" the missing token, else throw
exception For example, Input "i=(3;" is clearly missing the
')'. When the parser returns from the nested call to expr, it
will have call chain:
stat -> expr -> atom
and it will be trying to match the ')' at this point in the
derivation:
=> ID '=' '(' INT ')' ('+' atom)* ';'
^
match() will see that ';' doesn't match ')' and report a
mismatched token error. To recover, it sees that LA(1)==';'
is in the set of tokens that can follow the ')' token
reference in rule atom. It can assume that you forgot the ')'.
|
reportError | public void reportError(RecognitionException e)(Code) | | Report a recognition problem.
This method sets errorRecovery to indicate the parser is recovering
not parsing. Once in recovery mode, no errors are generated.
To get out of recovery mode, the parser must successfully match
a token (after a resync). So it will go:
1. error occurs
2. enter recovery mode, report error
3. consume until token found in resynch set
4. try to resume parsing
5. next match() will reset errorRecovery mode
|
reset | public void reset()(Code) | | reset the parser's state; subclasses must rewinds the input stream
|
toStrings | public List toStrings(List tokens)(Code) | | A convenience method for use most often with template rewrites.
Convert a List to List
|
|
|