Token updater fixes a list of tokens constructed for a document
after text of the document gets modified.
Subclasses need to define all the abstract methods
so that the updating method can work on real token sequences.
Updater looks similar to list iterator
but there are differences in the semantics
of iterator's modification operations.
The algorithm used in the
TokenListUpdater.update(int,int) is based on "General Incremental Lexical Analysis" written
by Tim A. Wagner and Susan L. Graham, University
of California, Berkeley. It's available online
at
twagner-lexing.pdf.
Ending EOF token is not used but the lookahead
of the ending token(s) is increased by one (past the end of the input)
if they have reached the EOF.
Non-startable tokens are not supported.
When updating a token with lookback one as a result
of modification the lookahead of the preceding token is inspected
to find out whether the modification has really affected it.
This can often save the previous token from being relexed.
Currently the algorithm computes the lookback values on the fly
and it does not store the lookback in the tokens. For typical languages
the lookback is reasonably small (0, 1 or 2) so it's usually not worth
to consume extra space in token instances for storing of the lookback.
There would also be an additional overhead of updating the lookback
values in the tokens after the modification and the algorithm code would
be somewhat less readable.
The algorithm removes the affected tokens in the natural order as they
follow in the token stream. That can be used when the removed tokens
need to be collected (e.g. in an array).
If the offset and state after token recognition matches
the end offset and state after recognition of the originally present
token then the relexing is stopped because a match was found and the newly
produced tokens would match the present ones.
Otherwise the token(s) in the list are removed and replaced
by the relexed token and the relexing continues until a match is reached.
author: Miloslav Metelka version: 1.00 |