| java.lang.Object org.netbeans.editor.Syntax
All known Subclasses: org.netbeans.editor.ext.plain.PlainSyntax, org.netbeans.editor.ext.java.JavaSyntax, org.netbeans.editor.ext.html.HTMLSyntax, org.netbeans.editor.ext.MultiSyntax
Syntax | public class Syntax (Code) | | Lexical analyzer that works on a given text buffer. It allows sequential
parsing of a given character buffer by calling nextToken(), which returns
the token-ids.
After a token is found by calling the nextToken() method, the
getTokenOffset() method can be used to get the starting offset of
the current token in the buffer. The getTokenLength() method gives the
length of the current token.
The heart of the analyzer is the parseToken() method, which parses
the text and returns the token-id of the last token found. The
parseToken() method is called from nextToken(). It
operates with two important variables. The offset variable
identifies the currently scanned character in the buffer. The
tokenOffset is the beginning of the current token. The
state variable that identifies the current internal state of the
analyzer is set accordingly as the characters are parsed. If
parseToken() recognizes a token, it returns its ID; the
tokenOffset is then its beginning in the buffer and
offset - tokenOffset is its length. When the token is processed,
the value of tokenOffset is set to the current value of
offset and the parsing continues.
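For illustration, a caller might drive the analyzer roughly as follows. This is a minimal sketch that assumes a hypothetical concrete subclass MySyntax and a plain character array as input; the load(), nextToken(), getTokenOffset() and getTokenLength() calls are the methods documented below:

    import org.netbeans.editor.Syntax;
    import org.netbeans.editor.TokenID;

    public class TokenDump {
        public static void main(String[] args) {
            char[] buf = "int x = 10;".toCharArray();
            Syntax syntax = new MySyntax(); // hypothetical concrete analyzer
            // No previous state, scan the whole buffer, no relation to a document (-1).
            syntax.load(null, buf, 0, buf.length, true, -1);
            TokenID tokenID;
            while ((tokenID = syntax.nextToken()) != null) {
                int start = syntax.getTokenOffset();
                int len = syntax.getTokenLength();
                System.out.println(tokenID + ": \"" + new String(buf, start, len) + "\"");
            }
        }
    }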
Internal states are the integer constants used internally by the analyzer. They
are assigned to the state variable to express that the analyzer
has moved from one state to another. They are usually numbered starting from
zero, but they don't have to be. The only reserved value is -1, which is reserved
for the INIT state - the initial internal state of the analyzer.
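For illustration only, a subclass might declare its internal states like this (the names and values are hypothetical); the one constraint stated above is that -1 stays reserved for INIT:

    // Hypothetical internal states of a custom analyzer; any non-negative
    // values can be chosen, -1 is reserved for the inherited INIT state.
    private static final int ISI_TEXT = 0;          // inside plain text
    private static final int ISI_LINE_COMMENT = 1;  // inside a line comment
    private static final int ISI_STRING = 2;        // inside a string literal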
There is also support for defining persistent info about the current
state of the analyzer. This info can later be used to restore the parsing
from some particular state instead of parsing from the beginning of the
buffer. This feature is very useful if modifications are performed
in the document. The info is stored in the StateInfo interface,
with BaseStateInfo as the basic implementation. It allows getting and
setting the two values important from the persistence point of view. The
first one is the value of the state variable. The other one is the
difference offset - tokenOffset, which is called pre-scan. A
particular analyzer can define additional values important for the persistent
storage. The createStateInfo() can be overridden to create a custom
state-info, and loadState() and storeState() can be
overridden to get/set the additional values.
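A sketch of how an analyzer that carries one extra value might extend BaseStateInfo and override these three methods; the nestingDepth field and the MyStateInfo class are purely illustrative, and the code is assumed to live inside a hypothetical Syntax subclass:

    // Extra value the analyzer wants to survive across state save/restore.
    private int nestingDepth;

    public static class MyStateInfo extends BaseStateInfo {
        int nestingDepth;
    }

    public StateInfo createStateInfo() {
        return new MyStateInfo();
    }

    public void loadState(StateInfo stateInfo) {
        super.loadState(stateInfo); // restores state and applies the pre-scan to tokenOffset
        nestingDepth = ((MyStateInfo) stateInfo).nestingDepth;
    }

    public void storeState(StateInfo stateInfo) {
        super.storeState(stateInfo);
        ((MyStateInfo) stateInfo).nestingDepth = nestingDepth;
    }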
The load() method sets the buffer to be parsed. There is a special
parameter of the load() method called stopPosition that relates the
character buffer passed to load() to the position of the buffer's
data in the document. For this extended functionality the document must be
passed to the constructor of the lexical analyzer at some level.
author: Miloslav Metelka version: 1.00 |
Inner Class: public interface StateInfo | |
Inner Class: public static class BaseStateInfo implements StateInfo | |
Field Summary | |
final public static int | DIFFERENT_STATE |
final public static int | EQUAL_STATE |
final public static int | INIT |
protected char[] | buffer |
protected boolean | lastBuffer Setting this flag to true means that there are currently no more buffers available, so the analyzer should return all the tokens, including those whose successful scanning would otherwise be left for later when the next buffer becomes available. |
protected int | offset |
protected int | state Internal state of the lexical analyzer. |
protected int | stopOffset Offset in the buffer at which scanning should stop. |
protected int | stopPosition The position in the document that logically corresponds to the stopOffset value. |
protected TokenID | supposedTokenID This variable can be populated by the parseToken() method when the user types an erroneous construction but it is clear which correct token was meant. |
protected TokenContextPath | tokenContextPath Path from which the found token-id comes. |
protected int | tokenLength |
protected int | tokenOffset |
Method Summary | |
public int | compareState(StateInfo stateInfo) |
public StateInfo | createStateInfo() |
public char[] | getBuffer() |
public int | getOffset() |
public int | getPreScan() Get the pre-scan, which is the number of characters between offset and tokenOffset. |
public String | getStateName(int stateNumber) Get the state name as a string. |
public TokenID | getSupposedTokenID() |
public TokenContextPath | getTokenContextPath() Get the token-context-path of the returned token. |
public int | getTokenLength() Get the length of the token in the scanned buffer. |
public int | getTokenOffset() Get the start of the token in the scanned buffer. |
public void | load(StateInfo stateInfo, char[] buffer, int offset, int len, boolean lastBuffer, int stopPosition) Load the state from a syntax mark into the analyzer. |
public void | loadInitState() Initialize the analyzer when scanning from the beginning of the document, when the state stored in a syntax mark is null for some reason, or to explicitly reset the analyzer to the initial state. |
public void | loadState(StateInfo stateInfo) Load a valid mark state into the analyzer. |
public TokenID | nextToken() Function that should be called externally to scan the text. |
protected TokenID | parseToken() The core function of the analyzer; it returns either the token-id or null to indicate that the end of the buffer was reached. |
public void | relocate(char[] buffer, int offset, int len, boolean lastBuffer, int stopPosition) Relocate scanning to another buffer. |
public void | reset() |
public void | storeState(StateInfo stateInfo) Store the state of this analyzer into the given mark state. |
public String | toString() |
DIFFERENT_STATE | final public static int DIFFERENT_STATE(Code) | | Is the state of the analyzer different from the given state info?
|
EQUAL_STATE | final public static int EQUAL_STATE(Code) | | Is the state of the analyzer equal to the given state info?
|
INIT | final public static int INIT(Code) | | Initial internal state of the analyzer
|
buffer | protected char[] buffer(Code) | | Text buffer to scan
|
lastBuffer | protected boolean lastBuffer(Code) | | Setting this flag to true means that there are currently no more buffers
available, so the analyzer should return all the tokens, including those
whose successful scanning would otherwise be left for later when the next
buffer becomes available. Setting this flag to true ensures that all the
characters in the current buffer will be processed. The lexical analyzer
should on one hand process all the characters, but on the other hand it
should "save" its context. For example, if the scanner finds an unclosed
comment at the end of the buffer, it should return the comment token but
stay in the "being in comment" internal state.
|
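A sketch of how a parseToken() implementation might honor this flag when it reaches the end of the buffer inside a block comment; the state name and token-ids are hypothetical:

    // Reached stopOffset while still inside a block comment.
    if (offset >= stopOffset) {
        if (lastBuffer) {
            // Return what was scanned so far as a comment token,
            // but keep the "being in comment" state for any later call.
            return MyTokenContext.BLOCK_COMMENT; // hypothetical token-id
        }
        return null; // EOT: keep the incomplete token for the next buffer
    }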
offset | protected int offset(Code) | | Current offset in the buffer
|
state | protected int state(Code) | | Internal state of the lexical analyzer. At the beginning it is set to the INIT
value, but it is changed by parseToken() as the characters are
processed one by one.
|
stopOffset | protected int stopOffset(Code) | | Offset in the buffer at which scanning should stop.
|
stopPosition | protected int stopPosition(Code) | | The position in the document that logically corresponds to the stopOffset
value. If there's no relation to the document, it's -1. The relation to the
document's data is expressed through the stopOffset-to-stopPosition pair
because stopOffset is the only offset that doesn't change rapidly during
the operation of the lexical analyzer.
|
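Given that relation, any offset in the scanned buffer can be mapped back to a document position using the formula documented for load() and relocate() below; a small hypothetical helper inside a Syntax subclass:

    /** Convert an offset in the scanned buffer to a document position.
     *  Hypothetical helper; only meaningful when stopPosition is not -1. */
    private int toDocumentPosition(int bufferOffset) {
        return stopPosition + bufferOffset - stopOffset;
    }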
supposedTokenID | protected TokenID supposedTokenID(Code) | | This variable can be populated by the parseToken() method in case the
user types an erroneous construction but it's clear what correct token
was meant. For example, if the user writes a lone '0x', it is an
erroneous construct, but it's clear that the user wants to enter a
hexadecimal number. In this situation parseToken() should report an error,
but it should also set supposedTokenID to the hexadecimal-number token. This
information is used while drawing the text. If the caret stands inside or
around such a token, the drawing code calls getSupposedTokenID() after calling
nextToken(), and if it is non-null it is used instead of the original
token.
|
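For illustration, a parseToken() implementation might handle the lone '0x' case roughly like this; the token-ids and the hexPrefixSeen, actChar and isHexDigit names are hypothetical:

    // Hypothetical fragment: the prefix "0x" was seen but no hex digit follows.
    if (hexPrefixSeen && !isHexDigit(actChar)) {
        supposedTokenID = MyTokenContext.HEX_LITERAL; // what the user most likely meant
        return MyTokenContext.ERROR;                  // what was actually typed
    }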
tokenContextPath | protected TokenContextPath tokenContextPath(Code) | | Path from which the found token-id comes. The
TokenContext.getContextPath() can be used to get the path. If
the lexical analyzer doesn't use any child token-contexts, it can
assign the path in the constructor.
|
tokenLength | protected int tokenLength(Code) | | This variable holds the length of the token that was found.
|
tokenOffset | protected int tokenOffset(Code) | | Offset holding the beginning of the current token
|
compareState | public int compareState(StateInfo stateInfo)(Code) | | Compare the state of this analyzer to the given state info.
|
createStateInfo | public StateInfo createStateInfo()(Code) | | Create a state info appropriate for the particular analyzer.
|
getBuffer | public char[] getBuffer()(Code) | | Get the current buffer
|
getOffset | public int getOffset()(Code) | | Get the current scanning offset
|
getPreScan | public int getPreScan()(Code) | | Get the pre-scan, which is the number of characters between offset and
tokenOffset. If there are no more characters in the current buffer, the
analyzer returns EOT, but it can be in a state where there are already
some characters parsed at the end of the current buffer but the token is
still incomplete and cannot be returned yet. The pre-scan value helps
to determine how many characters from the end of the current buffer
should be present at the beginning of the next buffer so that the current
incomplete token can be returned as the first token when parsing the next
buffer.
|
getStateName | public String getStateName(int stateNumber)(Code) | | Get the state name as a string. It can be used for debugging purposes by
the developer of a new syntax analyzer. The states that this function
recognizes can include all constants used in the analyzer, so that it can be
used anywhere in the analyzer to convert numbers to more readable strings.
|
getTokenContextPath | public TokenContextPath getTokenContextPath()(Code) | | Get the token-context-path of the returned token.
|
getTokenLength | public int getTokenLength()(Code) | | Get length of token in scanned buffer.
|
getTokenOffset | public int getTokenOffset()(Code) | | Get start of token in scanned buffer.
|
load | public void load(StateInfo stateInfo, char[] buffer, int offset, int len, boolean lastBuffer, int stopPosition)(Code) | | Load the state from a syntax mark into the analyzer.
Parameters: stateInfo - info about the state of the lexical analyzer to load. It can be null to indicate there's no previous state, so the analyzer starts from its initial state.
Parameters: buffer - buffer that will be scanned
Parameters: offset - offset of the first character that will be scanned
Parameters: len - length of the area to be scanned
Parameters: lastBuffer - whether this is the last buffer in the document. All the tokens will be returned, including the last possibly incomplete one. If the data come from the document, the simple rule for this parameter is (doc.getLength() == stop-position), where stop-position is the position corresponding to (offset + len) in the buffer that comes from the document data.
Parameters: stopPosition - position in the document that corresponds to the (offset + len) offset in the provided buffer. It only makes sense if the data in the buffer come from the document. It helps in writing advanced analyzers that need to interact with other data in the document than only those provided in the character buffer. If there is no relation to the document data, the stopPosition parameter must be filled with -1, which means an invalid value. The stop-position is passed (instead of a start-position) because it doesn't change through the analyzer operation. It corresponds to stopOffset, which also doesn't change through the analyzer operation, so any buffer-offset can be transferred to a position by computing stopPosition + buffer-offset - stopOffset, where stopOffset is the instance variable that is assigned to offset + len in the body of relocate(). |
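A sketch of calling load() for a range of document data, following the rules above for lastBuffer and stopPosition; the document access (doc, getChars) and the start/len variables are hypothetical, only the load() call itself comes from this class:

    // Scan the document range [start, start + len) from the initial state.
    char[] chars = new char[len];
    doc.getChars(start, chars, 0, len);                    // hypothetical document access
    boolean lastBuffer = (doc.getLength() == start + len); // rule given for the lastBuffer parameter
    syntax.load(null, chars, 0, len, lastBuffer, start + len);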
loadInitState | public void loadInitState()(Code) | | Initialize the analyzer when scanning from the beginning of the document,
when the state stored in a syntax mark is null for some reason, or to
explicitly reset the analyzer to the initial state. The offsets must not
be touched by this method.
|
loadState | public void loadState(StateInfo stateInfo)(Code) | | Load a valid mark state into the analyzer. Offsets are already initialized
when this method is called. This method must get the state from the mark
and set it to the analyzer. Then it must decrease tokenOffset by the
preScan stored in the mark state.
Parameters: stateInfo - mark state to be loaded into the syntax. It must be a non-null value. |
nextToken | public TokenID nextToken()(Code) | | Function that should be called externally to scan the text. It manages
the call to parseToken() and takes care of the proper setting of the
offsets. It can be extended to support any custom debugging required.
|
parseToken | protected TokenID parseToken()(Code) | | This is the core function of the analyzer; it returns either the token-id or
null to indicate that the end of the buffer was reached. The function scans the
active character and does one or more of the following actions:
1. change the internal analyzer state
2. set the token-context-path and return the token-id
3. adjust the current position to signal a different end of the token; the
character that offset points to is not included in the token
|
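A deliberately tiny parseToken() sketch that recognizes only whitespace runs and runs of other characters, to show the offset/state/return conventions described above; the state constants and token-ids are hypothetical:

    protected TokenID parseToken() {
        while (offset < stopOffset) {
            char ch = buffer[offset];
            switch (state) {
                case INIT: // starting a new token
                    state = Character.isWhitespace(ch) ? ISI_WHITESPACE : ISI_OTHER;
                    break;
                case ISI_WHITESPACE:
                    if (!Character.isWhitespace(ch)) {
                        state = INIT;
                        return MyTokenContext.WHITESPACE; // token ends before 'offset'
                    }
                    break;
                case ISI_OTHER:
                    if (Character.isWhitespace(ch)) {
                        state = INIT;
                        return MyTokenContext.OTHER;
                    }
                    break;
            }
            offset++;
        }
        if (lastBuffer && offset > tokenOffset) {
            // No more buffers will come, so flush the trailing token.
            TokenID id = (state == ISI_WHITESPACE) ? MyTokenContext.WHITESPACE
                                                   : MyTokenContext.OTHER;
            state = INIT;
            return id;
        }
        return null; // EOT: incomplete token kept as pre-scan for the next buffer
    }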
relocate | public void relocate(char[] buffer, int offset, int len, boolean lastBuffer, int stopPosition)(Code) | | Relocate scanning to another buffer. This is used to continue scanning
after a previously reported EOT. The relocation delta between the current offset
and the requested offset is computed and all the offsets are relocated.
If there's a non-zero preScan in the analyzer, it is the caller's
responsibility to provide all the preScan characters in the relocation
buffer.
Parameters: buffer - next buffer where the scan will continue.
Parameters: offset - offset where the scan will continue. It's not decremented by the current preScan.
Parameters: len - length of the area to be scanned. It's not extended by the current preScan.
Parameters: lastBuffer - whether this is the last buffer in the document. All the tokens will be returned, including the last possibly incomplete one. If the data come from the document, the simple rule for this parameter is (doc.getLength() == stop-position), where stop-position is the position corresponding to (offset + len) in the buffer that comes from the document data.
Parameters: stopPosition - position in the document that corresponds to the (offset + len) offset in the provided buffer. It only makes sense if the data in the buffer come from the document. It helps in writing advanced analyzers that need to interact with other data in the document than only those provided in the character buffer. If there is no relation to the document data, the stopPosition parameter must be filled with -1, which means an invalid value. The stop-position is passed (instead of a start-position) because it doesn't change through the analyzer operation. It corresponds to stopOffset, which also doesn't change through the analyzer operation, so any buffer-offset can be transferred to a position by computing stopPosition + buffer-offset - stopOffset, where stopOffset is the instance variable that is assigned to offset + len in the body of relocate(). |
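A sketch of continuing a scan in the next chunk of a document after nextToken() reported EOT, prepending the pre-scan characters as required; the document access (doc, getChars) and the nextStart/nextLen variables are hypothetical:

    int preScan = syntax.getPreScan();
    // The next chunk of the document starts at position nextStart and is nextLen long.
    char[] next = new char[preScan + nextLen];
    doc.getChars(nextStart - preScan, next, 0, preScan + nextLen); // hypothetical document access
    boolean last = (doc.getLength() == nextStart + nextLen);
    // Scanning resumes at offset preScan; relocate() does not shift the offsets
    // by the pre-scan itself, so the caller supplies those characters up front.
    syntax.relocate(next, preScan, nextLen, last, nextStart + nextLen);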
reset | public void reset()(Code) | | |
storeState | public void storeState(StateInfo stateInfo)(Code) | | Store the state of this analyzer into the given mark state.
|
toString | public String toString()(Code) | | Syntax information as String
|