A DFA state represents a set of possible NFA configurations.
As Aho, Sethi, Ullman p. 117 says "The DFA uses its state
to keep track of all possible states the NFA can be in after
reading each input symbol. That is to say, after reading
input a1a2..an, the DFA is in a state that represents the
subset T of the states of the NFA that are reachable from the
NFA's start state along some path labeled a1a2..an."
In conventional NFA->DFA conversion, therefore, the subset T
would be a bitset representing the set of states the
NFA could be in. We need to track the alt predicted by each
state as well, however. More importantly, we need to maintain
a stack of states, tracking the closure operations as they
jump from rule to rule, emulating rule invocations (method calls).
Recall that NFAs do not normally have a stack like a pushdown-machine
so I have to add one to simulate the proper lookahead sequences for
the underlying LL grammar from which the NFA was derived.
I use a list of NFAConfiguration objects. An NFAConfiguration
is both a state (ala normal conversion) and an NFAContext describing
the chain of rules (if any) followed to arrive at that state. There
is also the semantic context, which is the "set" of predicates found
on the path to this configuration.
A DFA state may have multiple references to a particular state,
but with different NFAContexts (with same or different alts)
meaning that state was reached via a different set of rule invocations.
abortedDueToMultipleRecursiveAlts If we detect recursion on more than one alt, decision is non-LL(*),
but try to isolate it to only those states whose closure operations
detect recursion.
protected boolean
abortedDueToRecursionOverflow If a closure operation finds that we tried to invoke the same
rule too many times (stack would grow beyond a threshold), it
marks the state has aborted and notifies the DecisionProbe.
protected int
acceptStateReachable The NFA->DFA algorithm may terminate leaving some states
without a path to an accept state, implying that upon certain
input, the decision is not deterministic--no decision about
predicting a unique alternative can be made.
protected int
cachedHashCode Build up the hash code for this state as NFA configurations
are added as it's monotonically increasing list of configurations.
reachableLabels As this state is constructed (i.e., as NFA states are added), we
can easily check for non-epsilon transitions because the only
transition that could be a valid label is transition(0).
protected boolean
resolvedWithPredicates Rather than recheck every NFA configuration in a DFA state (after
resolving) in findNewDFAStatesAndAddDFATransitions just check
this boolean.
addReachableLabel(Label label) Add label uniquely and disjointly; intersection with
another set or int/char forces breaking up the set(s).
Example, if reachable list of labels is [a..z, {k,9}, 0..9],
the disjoint list will be [{a..j,l..z}, k, 9, 0..8].
As we add NFA configurations to a DFA state, we might as well track
the set of all possible transition labels to make the DFA conversion
more efficient.
getConflictingAlts() Walk each NFA configuration in this DFA state looking for a conflict
where (s|i|ctx) and (s|j|ctx) exist, indicating that state s with
context conflicting ctx predicts alts i and j.
getGatedPredicatesInNFAConfigurations() For gated productions, we need an OR'd list of all predicates for the
target of an edge so we can gate the edge based upon the predicates
associated with taking that path (if any).
For syntactic predicates, we only want to generate predicate
evaluations as it transitions to an accept state; waste to
do it earlier.
getSyntacticPredicatesInNFAConfigurations() For gated productions, we need a list of all predicates for the
target of an edge so we can gate the edge based upon the predicates
associated with taking that path (if any).
getUniqueAlt() Return the uniquely mentioned alt from the NFA configurations;
Ignore the resolved bit etc...
public int
getUniquelyPredictedAlt() Walk each configuration and if they are all the same alt, return
that alt else return NFA.INVALID_ALT_NUMBER.
public int
hashCode() A decent hash for a DFA state is the sum of the NFA state/alt pairs.
This is used when we add DFAState objects to the DFA.states Map and
when we compare DFA states.
If we detect recursion on more than one alt, decision is non-LL(*),
but try to isolate it to only those states whose closure operations
detect recursion. There may be other alts that are cool:
a : recur '.'
| recur ';'
| X Y // LL(2) decision; don't abort and use k=1 plus backtracking
| X Z
;
If a closure operation finds that we tried to invoke the same
rule too many times (stack would grow beyond a threshold), it
marks the state has aborted and notifies the DecisionProbe.
The NFA->DFA algorithm may terminate leaving some states
without a path to an accept state, implying that upon certain
input, the decision is not deterministic--no decision about
predicting a unique alternative can be made. Recall that an
accept state is one in which a unique alternative is predicted.
Used to prevent the closure operation from looping to itself and
hence looping forever. Sensitive to the NFA state, the alt, and
the context. This just the nfa config set because we want to
prevent closures only on states contributed by closure not reach
operations.
When doing an acyclic DFA, this is the number of lookahead symbols
consumed to reach this state. This value may be nonzero for most
dfa states, but it is only a valid value if the user has specified
a max fixed lookahead.
As this state is constructed (i.e., as NFA states are added), we
can easily check for non-epsilon transitions because the only
transition that could be a valid label is transition(0). When we
process this node eventually, we'll have to walk all states looking
for all possible transitions. That is of the order: size(label space)
times size(nfa states), which can be pretty damn big. It's better
to simply track possible labels.
This is type List
Rather than recheck every NFA configuration in a DFA state (after
resolving) in findNewDFAStatesAndAddDFATransitions just check
this boolean. Saves a linear walk perhaps DFA state creation.
Every little bit helps.
Add an NFA configuration to this DFA node. Add uniquely
an NFA state/alt/syntactic&semantic context (chain of invoking state(s)
and semantic predicate contexts).
I don't see how there could be two configurations with same
state|alt|synCtx and different semantic contexts because the
semantic contexts are computed along the path to a particular state
so those two configurations would have to have the same predicate.
Nonetheless, the addition of configurations is unique on all
configuration info. I guess I'm saying that syntactic context
implies semantic context as the latter is computed according to the
former.
As we add configurations to this DFA state, track the set of all possible
transition labels so we can simply walk it later rather than doing a
loop over all possible labels in the NFA.
Add label uniquely and disjointly; intersection with
another set or int/char forces breaking up the set(s).
Example, if reachable list of labels is [a..z, {k,9}, 0..9],
the disjoint list will be [{a..j,l..z}, k, 9, 0..8].
As we add NFA configurations to a DFA state, we might as well track
the set of all possible transition labels to make the DFA conversion
more efficient. W/o the reachable labels, we'd need to check the
whole vocabulary space (could be 0..\uFFFF)! The problem is that
labels can be sets, which may overlap with int labels or other sets.
As we need a deterministic set of transitions from any
state in the DFA, we must make the reachable labels set disjoint.
This operation amounts to finding the character classes for this
DFA state whereas with tools like flex, that need to generate a
homogeneous DFA, must compute char classes across all states.
We are going to generate DFAs with heterogeneous states so we
only care that the set of transitions out of a single state are
unique. :)
The idea for adding a new set, t, is to look for overlap with the
elements of existing list s. Upon overlap, replace
existing set s[i] with two new disjoint sets, s[i]-t and s[i]&t.
(if s[i]-t is nil, don't add). The remainder is t-s[i], which is
what you want to add to the set minus what was already there. The
remainder must then be compared against the i+1..n elements in s
looking for another collision. Each collision results in a smaller
and smaller remainder. Stop when you run out of s elements or
remainder goes to nil. If remainder is non nil when you run out of
s elements, then add remainder to the end.
Single element labels are treated as sets to make the code uniform.
Two DFAStates are equal if their NFA configuration sets are the
same. This method is used to see if a DFA state already exists.
Because the number of alternatives and number of NFA configurations are
finite, there is a finite number of DFA states that can be processed.
This is necessary to show that the algorithm terminates.
Cannot test the state numbers here because in DFA.addState we need
to know if any other state exists that has this exact set of NFA
configurations. The DFAState state number is irrelevant.
Walk each NFA configuration in this DFA state looking for a conflict
where (s|i|ctx) and (s|j|ctx) exist, indicating that state s with
context conflicting ctx predicts alts i and j. Return an Integer set
of the alternative numbers that conflict. Two contexts conflict if
they are equal or one is a stack suffix of the other or one is
the empty context.
Use a hash table to record the lists of configs for each state
as they are encountered. We need only consider states for which
there is more than one configuration. The configurations' predicted
alt must be different or must have different contexts to avoid a
conflict.
Don't report conflicts for DFA states that have conflicting Tokens
rule NFA states; they will be resolved in favor of the first rule.
When more than one alternative can match the same input, the first
alternative is chosen to resolve the conflict. The other alts
are "turned off" by setting the "resolved" flag in the NFA
configurations. Return the set of disabled alternatives. For
a : A | A | A ;
this method returns {2,3} as disabled. This does not mean that
the alternative is totally unreachable, it just means that for this
DFA state, that alt is disabled. There may be other accept states
for that alt.
For gated productions, we need an OR'd list of all predicates for the
target of an edge so we can gate the edge based upon the predicates
associated with taking that path (if any).
For syntactic predicates, we only want to generate predicate
evaluations as it transitions to an accept state; waste to
do it earlier. So, only add gated preds derived from manually-
specified syntactic predicates if this is an accept state.
Also, since configurations w/o gated predicates are like true
gated predicates, finding a configuration whose alt has no gated
predicate implies we should evaluate the predicate to true. This
means the whole edge has to be ungated. Consider:
X : ('a' | {p}?=> 'a')
| 'a' 'b'
;
Here, you 'a' gets you from s0 to s1 but you can't test p because
plain 'a' is ok. It's also ok for starting alt 2. Hence, you can't
test p. Even on the edge going to accept state for alt 1 of X, you
can't test p. You can get to the same place with and w/o the context.
Therefore, it is never ok to test p in this situation.
TODO: cache this as it's called a lot; or at least set bit if >1 present in state
public int getNumberOfEOTNFAStates() {
int n = 0;
Iterator iter = nfaConfigurations.iterator();
NFAConfiguration configuration;
while (iter.hasNext()) {
configuration = (NFAConfiguration) iter.next();
NFAState s = dfa.nfa.getState(configuration.state);
if ( s.isEOTState() ) {
n++;
}
}
return n;
}
For gated productions, we need a list of all predicates for the
target of an edge so we can gate the edge based upon the predicates
associated with taking that path (if any).
experimental 11/29/2005
public Set getGatedPredicatesInNFAConfigurations() {
Set preds = new HashSet();
Iterator iter = nfaConfigurations.iterator();
NFAConfiguration configuration;
while (iter.hasNext()) {
configuration = (NFAConfiguration) iter.next();
if ( configuration.semanticContext.isGated() ) {
preds.add(configuration.semanticContext);
}
}
if ( preds.size()==0 ) {
return null;
}
return preds;
}
Return the uniquely mentioned alt from the NFA configurations;
Ignore the resolved bit etc... Return INVALID_ALT_NUMBER
if there is more than one alt mentioned.
Walk each configuration and if they are all the same alt, return
that alt else return NFA.INVALID_ALT_NUMBER. Ignore resolved
configurations, but don't ignore resolveWithPredicate configs
because this state should not be an accept state. We need to add
this to the work list and then have semantic predicate edges
emanating from it.
A decent hash for a DFA state is the sum of the NFA state/alt pairs.
This is used when we add DFAState objects to the DFA.states Map and
when we compare DFA states. Computed in addNFAConfiguration()