| java.lang.Object org.antlr.tool.NFAFactory
NFAFactory | public class NFAFactory (Code) | | Routines to construct StateClusters from EBNF grammar constructs.
No optimization is done to remove unnecessary epsilon edges.
TODO: add an optimization that reduces number of states and transitions
will help with speed of conversion and make it easier to view NFA. For
example, o-A->o-->o-B->o should be o-A->o-B->o
|
Method Summary | |
public StateCluster | build_AB(StateCluster A, StateCluster B) From A B build A-e->B (that is, build an epsilon arc from right
of A to left of B). | public StateCluster | build_AlternativeBlock(List alternativeStateClusters) From A|B|..|Z alternative block build
o->o-A->o->o (last NFAState is blockEndNFAState pointed to by all alts)
| ^
o->o-B->o--|
| |
... | public StateCluster | build_AlternativeBlockFromSet(StateCluster set) | public StateCluster | build_Aoptional(StateCluster A) | public StateCluster | build_Aplus(StateCluster A) From (A)+ build
|---| (Transition 2 from A.right points at alt 1)
v | (follow of loop is Transition 1)
o->o-A-o->o
Meaning that the last NFAState in A points back to A's left Transition NFAState
and we add a new begin/end NFAState. | public StateCluster | build_Astar(StateCluster A) From (A)* build
|---|
v |
o->o-A-o--o (Transition 2 from block end points at alt 1; follow is Transition 1)
| ^
o---------| (optional branch is 2nd alt of optional block containing A+)
Meaning that the last (end) NFAState in A points back to A's
left side NFAState and we add 3 new NFAStates (the
optional branch is built just like an optional subrule).
See the Aplus() method for more on the loop back Transition.
The new node on right edge is set to RIGHT_EDGE_OF_CLOSURE so we
can detect nested (A*)* loops and insert an extra node. | public StateCluster | build_Atom(int label) | public StateCluster | build_CharLiteralAtom(String charLiteral) | public StateCluster | build_CharRange(String a, String b) From char 'c' build StateCluster o-intValue(c)->o
can include unicode spec likes '\u0024' later. | public int | build_EOFStates(List rules) add an EOF transition to any rule end NFAState that points to nothing
(i.e., for all those rules not invoked by another rule). | public StateCluster | build_Epsilon() | public StateCluster | build_Range(int a, int b) Can only complement block of simple alts; can complement build_Set()
result, that is. | public StateCluster | build_RuleRef(int ruleIndex, NFAState ruleStart) For reference to rule r, build
o-e->(r) o
where (r) is the start of rule r and the trailing o is not linked
to from rule ref state directly (it's done thru the transition(0)
RuleClosureTransition. | public StateCluster | build_SemanticPredicate(GrammarAST pred) Build what amounts to an epsilon transition with a semantic
predicate action. | public StateCluster | build_Set(IntSet set) From set build single edge graph o->o-set->o. | public StateCluster | build_StringLiteralAtom(String stringLiteral) For a non-lexer, just build a simple token reference atom.
For a lexer, a string is a sequence of char to match. | public StateCluster | build_Wildcard() | protected IntSet | getCollapsedBlockAsSet(State blk) Given a collapsed block of alts (a set of atoms), pull out
the set and return it. | public int | getNumberOfStates() | public NFAState | newState() | public void | optimizeAlternative(StateCluster alt) Optimize an alternative (list of grammar elements).
Walk the chain of elements (which can be complicated loop blocks...)
and throw away any epsilon transitions used to link up simple elements.
This only removes 195 states from the java.g's NFA, but every little
bit helps. |
nfa | NFA nfa(Code) | | This factory is attached to a specifc NFA that it is building.
The NFA will be filled up with states and transitions.
|
stateCounter | protected int stateCounter(Code) | | Used to assign state numbers
|
build_AB | public StateCluster build_AB(StateCluster A, StateCluster B)(Code) | | From A B build A-e->B (that is, build an epsilon arc from right
of A to left of B).
As a convenience, return B if A is null or return A if B is null.
|
build_AlternativeBlock | public StateCluster build_AlternativeBlock(List alternativeStateClusters)(Code) | | From A|B|..|Z alternative block build
o->o-A->o->o (last NFAState is blockEndNFAState pointed to by all alts)
| ^
o->o-B->o--|
| |
... |
| |
o->o-Z->o--|
So every alternative gets begin NFAState connected by epsilon
and every alt right side points at a block end NFAState. There is a
new NFAState in the NFAState in the StateCluster for each alt plus one for the
end NFAState.
Special case: only one alternative: don't make a block with alt
begin/end.
Special case: if just a list of tokens/chars/sets, then collapse
to a single edge'd o-set->o graph.
Set alt number (1..n) in the left-Transition NFAState.
|
build_AlternativeBlockFromSet | public StateCluster build_AlternativeBlockFromSet(StateCluster set)(Code) | | From a set ('a'|'b') build
o->o-'a'..'b'->o->o (last NFAState is blockEndNFAState pointed to by all alts)
|
build_Aoptional | public StateCluster build_Aoptional(StateCluster A)(Code) | | From (A)? build either:
o--A->o
| ^
o---->|
or, if A is a block, just add an empty alt to the end of the block
|
build_Aplus | public StateCluster build_Aplus(StateCluster A)(Code) | | From (A)+ build
|---| (Transition 2 from A.right points at alt 1)
v | (follow of loop is Transition 1)
o->o-A-o->o
Meaning that the last NFAState in A points back to A's left Transition NFAState
and we add a new begin/end NFAState. A can be single alternative or
multiple.
During analysis we'll call the follow link (transition 1) alt n+1 for
an n-alt A block.
|
build_Astar | public StateCluster build_Astar(StateCluster A)(Code) | | From (A)* build
|---|
v |
o->o-A-o--o (Transition 2 from block end points at alt 1; follow is Transition 1)
| ^
o---------| (optional branch is 2nd alt of optional block containing A+)
Meaning that the last (end) NFAState in A points back to A's
left side NFAState and we add 3 new NFAStates (the
optional branch is built just like an optional subrule).
See the Aplus() method for more on the loop back Transition.
The new node on right edge is set to RIGHT_EDGE_OF_CLOSURE so we
can detect nested (A*)* loops and insert an extra node. Previously,
two blocks shared same EOB node.
There are 2 or 3 decision points in a A*. If A is not a block (i.e.,
it only has one alt), then there are two decisions: the optional bypass
and then loopback. If A is a block of alts, then there are three
decisions: bypass, loopback, and A's decision point.
Note that the optional bypass must be outside the loop as (A|B)* is
not the same thing as (A|B|)+.
This is an accurate NFA representation of the meaning of (A)*, but
for generating code, I don't need a DFA for the optional branch by
virtue of how I generate code. The exit-loopback-branch decision
is sufficient to let me make an appropriate enter, exit, loop
determination. See codegen.g
|
build_CharLiteralAtom | public StateCluster build_CharLiteralAtom(String charLiteral)(Code) | | From char 'c' build StateCluster o-intValue(c)->o
|
build_CharRange | public StateCluster build_CharRange(String a, String b)(Code) | | From char 'c' build StateCluster o-intValue(c)->o
can include unicode spec likes '\u0024' later. Accepts
actual unicode 16-bit now, of course, by default.
TODO not supplemental char clean!
|
build_EOFStates | public int build_EOFStates(List rules)(Code) | | add an EOF transition to any rule end NFAState that points to nothing
(i.e., for all those rules not invoked by another rule). These
are start symbols then.
Return the number of grammar entry points; i.e., how many rules are
not invoked by another rule (they can only be invoked from outside).
These are the start rules.
|
build_Epsilon | public StateCluster build_Epsilon()(Code) | | From an empty alternative build StateCluster o-e->o
|
build_Range | public StateCluster build_Range(int a, int b)(Code) | | Can only complement block of simple alts; can complement build_Set()
result, that is. Get set and complement, replace old with complement.
public StateCluster build_AlternativeBlockComplement(StateCluster blk) {
State s0 = blk.left;
IntSet set = getCollapsedBlockAsSet(s0);
if ( set!=null ) {
// if set is available, then structure known and blk is a set
set = nfa.grammar.complement(set);
Label label = s0.transition(0).target.transition(0).label;
label.setSet(set);
}
return blk;
}
|
build_RuleRef | public StateCluster build_RuleRef(int ruleIndex, NFAState ruleStart)(Code) | | For reference to rule r, build
o-e->(r) o
where (r) is the start of rule r and the trailing o is not linked
to from rule ref state directly (it's done thru the transition(0)
RuleClosureTransition.
If the rule r is just a list of tokens, it's block will be just
a set on an edge o->o->o-set->o->o->o, could inline it rather than doing
the rule reference, but i'm not doing this yet as I'm not sure
it would help much in the NFA->DFA construction.
TODO add to codegen: collapse alt blks that are sets into single matchSet
|
build_SemanticPredicate | public StateCluster build_SemanticPredicate(GrammarAST pred)(Code) | | Build what amounts to an epsilon transition with a semantic
predicate action. The pred is a pointer into the AST of
the SEMPRED token.
|
build_Set | public StateCluster build_Set(IntSet set)(Code) | | From set build single edge graph o->o-set->o. To conform to
what an alt block looks like, must have extra state on left.
|
build_StringLiteralAtom | public StateCluster build_StringLiteralAtom(String stringLiteral)(Code) | | For a non-lexer, just build a simple token reference atom.
For a lexer, a string is a sequence of char to match. That is,
"fog" is treated as 'f' 'o' 'g' not as a single transition in
the DFA. Machine== o-'f'->o-'o'->o-'g'->o and has n+1 states
for n characters.
|
build_Wildcard | public StateCluster build_Wildcard()(Code) | | Build an atom with all possible values in its label
|
getCollapsedBlockAsSet | protected IntSet getCollapsedBlockAsSet(State blk)(Code) | | Given a collapsed block of alts (a set of atoms), pull out
the set and return it.
|
getNumberOfStates | public int getNumberOfStates()(Code) | | |
optimizeAlternative | public void optimizeAlternative(StateCluster alt)(Code) | | Optimize an alternative (list of grammar elements).
Walk the chain of elements (which can be complicated loop blocks...)
and throw away any epsilon transitions used to link up simple elements.
This only removes 195 states from the java.g's NFA, but every little
bit helps. Perhaps I can improve in the future.
|
|
|