| A module to perform optimizations on DFAs.
I could more easily (and more quickly) do some optimizations (such as
PRUNE_EBNF_EXIT_BRANCHES) during DFA construction, but then it
messes up the determinism checking. For example, it looks like
loop exit branches are unreachable if you prune exit branches
during DFA construction and before determinism checks.
In general, ANTLR's NFA->DFA->codegen pipeline seems very robust
to me which I attribute to a uniform and consistent set of data
structures. Regardless of what I want to "say"/implement, I do so
within the confines of, for example, a DFA. The code generator
can then just generate code--it doesn't have to do much thinking.
Putting optimizations in the code gen code really starts to make
it a spagetti factory (uh oh, now I'm hungry!). The pipeline is
very testable; each stage has well defined input/output pairs.
### Optimization: PRUNE_EBNF_EXIT_BRANCHES
There is no need to test EBNF block exit branches. Not only is it
an unneeded computation, but counter-intuitively, you actually get
better errors. You can report an error at the missing or extra
token rather than as soon as you've figured out you will fail.
Imagine optional block "( DOT CLASS )? SEMI". ANTLR generates:
int alt=0;
if ( input.LA(1)==DOT ) {
alt=1;
}
else if ( input.LA(1)==SEMI ) {
alt=2;
}
Clearly, since Parser.match() will ultimately find the error, we
do not want to report an error nor do we want to bother testing
lookahead against what follows the (...)? We want to generate
simply "should I enter the subrule?":
int alt=2;
if ( input.LA(1)==DOT ) {
alt=1;
}
NOTE 1. Greedy loops cannot be optimized in this way. For example,
"(greedy=false:'x'|.)* '\n'". You specifically need the exit branch
to tell you when to terminate the loop as the same input actually
predicts one of the alts (i.e., staying in the loop).
NOTE 2. I do not optimize cyclic DFAs at the moment as it doesn't
seem to work. ;) I'll have to investigate later to see what work I
can do on cyclic DFAs to make them have fewer edges. Might have
something to do with the EOT token.
### PRUNE_SUPERFLUOUS_EOT_EDGES
When a token is a subset of another such as the following rules, ANTLR
quietly assumes the first token to resolve the ambiguity.
EQ : '=' ;
ASSIGNOP : '=' | '+=' ;
It can yield states that have only a single edge on EOT to an accept
state. This is a waste and messes up my code generation. ;) If
Tokens rule DFA goes
s0 -'='-> s3 -EOT-> s5 (accept)
then s5 should be pruned and s3 should be made an accept. Do NOT do this
for keyword versus ID as the state with EOT edge emanating from it will
also have another edge.
### Optimization: COLLAPSE_ALL_INCIDENT_EDGES
Done during DFA construction. See method addTransition() in
NFAToDFAConverter.
### Optimization: MERGE_STOP_STATES
Done during DFA construction. See addDFAState() in NFAToDFAConverter.
|