| java.lang.Object de.susebox.jtopas.Token
Token | public class Token (Code) | |
Instances of this class are created by the classes implementing the
Tokenizer interface. Token describes a portion of text
according to the settings given to the producing
Tokenizer in the form of
a
TokenizerProperties object. Besides the token type, the token image
itself, its position in the input stream, its line and column position, and associated
information can be obtained from the Token (provided the necessary
parse flags are set in the tokenizer).
This class replaces the older
de.susebox.java.util.Token which is
deprecated.
author: Heiko Blau See Also: Tokenizer See Also: TokenizerProperties |
Field Summary | |
final public static byte | BLOCK_COMMENT Block comments are also a special form of a whitespace sequence. | final public static byte | EOF A token of the type EOF is used to indicate an end-of-file condition
on the input stream of the tokenizer. | final public static byte | KEYWORD The token is a keyword registered with the used
Tokenizer . | final public static byte | LINE_COMMENT Although a line comment is - in most cases - actually a whitespace sequence, it
is often necessary to handle it separately. | final public static byte | NORMAL The token is nothing special (no keyword, no whitespace, etc.). | final public static byte | PATTERN The token matches a pattern. | final public static byte | SEPARATOR Separators are otherwise not remarkable characters. | final public static byte | SPECIAL_SEQUENCE Special sequences are characters or character combinations that have a certain
meaning to the parsed language or dialect. | final public static byte | STRING The token is one of the quoted strings known to the
Tokenizer . | final public static byte | UNKNOWN This is for the leftovers of the lexical analysis of a text. | final public static byte | WHITESPACE Whitespaces are portions of the text, that contain one or more characters
that separate the significant parts of the text. | protected Object | _companion Information associated with the token. | protected int | _endColumn The column where the token ends in the source of data. | protected int | _endLine The line where the token ends in the source of data. | protected String | _image The string representing the token. | protected String[] | _imageParts Array with the image parts. | protected int | _length The length of the string representing the token. | protected int | _startColumn The column where the token starts in the source of data. | protected int | _startLine The line where the token starts in the source of data. | protected int | _startPosition The absolute position where the token starts in the source of data. | protected int | _type The token type. |
Constructor Summary | |
public | Token() Default constructor. | public | Token(int type) Constructs a token of a given type. | public | Token(int type, String image) Construct a token of a given type with the given image. | public | Token(int type, String image, Object companion) Construct a token of a given type with the given image and a companion. |
Method Summary | |
public boolean | equals(Object object) Implementation of the well known method
java.lang.Object.equals .
Note that two tokens are equal if all of their members are equal. | public Object | getCompanion() Obtaining the associated information of the token. | public int | getEndColumn() Obtaining the column number where the Token ends. | public int | getEndLine() Obtaining the line number where the token ends. | public int | getEndPosition() Obtaining the end position of this token. | public String | getImage() Obtaining the token image as a
java.lang.String . | public String[] | getImageParts() Image parts are substrings of a token image. | public int | getLength() Obtaining the length of the token. | public int | getStartColumn() Obtaining the column number of the Token start. | public int | getStartLine() Obtaining the line number where the Token starts. | public int | getStartPosition() Obtaining the starting position of the token. | public int | getType() Obtaining the type of the Token . | public static String | getTypeName(int type) Getting a type name for displaying. | public void | setCompanion(Object companion) Some tokens may have associated information for the user of the Token . | public void | setEndColumn(int colno) In
Tokenizer implementations that count lines and columns, this method is used to set the
column number where the end of the Token was found.
The end column number is that of the first character that does
NOT belong to the token. | public void | setEndLine(int lineno) In
Tokenizer implementations that count lines and columns, this method is used to
set the line number where the end of the Token was found. | public void | setEndPosition(int endPosition) Setting the end position of the token relative to the start of the input
stream. | public void | setImage(String image) Setting the token image. | public void | setImageParts(String[] imageParts) The counterpart to
Token.getImageParts . | public void | setLength(int length) Setting the length of the token. | public void | setStartColumn(int colno) In
Tokenizer implementations that count lines and columns, this method is used to
set the column number where the beginning of the Token was
found. | public void | setStartLine(int lineno) In
Tokenizer implementations that count lines and columns, this method is used to
set the line number where the beginning of the Token was found. | public void | setStartPosition(int startPosition) Setting the start position of the token relative to the start of the input
stream. | public void | setType(int type) Setting the type property of the Token . | public String | toString() Implementation of the well known method
java.lang.Object.toString . |
BLOCK_COMMENT | final public static byte BLOCK_COMMENT(Code) | | Block comments are also a special form of a whitespace sequence. See
Token.LINE_COMMENT for details.
|
EOF | final public static byte EOF(Code) | | A token of the type EOF is used to indicate an end-of-file condition
on the input stream of the tokenizer.
|
KEYWORD | final public static byte KEYWORD(Code) | | The token is a keyword registered with the used
Tokenizer .
|
LINE_COMMENT | final public static byte LINE_COMMENT(Code) | | Although a line comment is - in most cases - actually a whitespace sequence, it
is often necessary to handle it separately. Syntax highlighting, for example,
needs to recognize line comments.
|
NORMAL | final public static byte NORMAL(Code) | | The token is nothing special (no keyword, no whitespace, etc.).
|
PATTERN | final public static byte PATTERN(Code) | | The token matches a pattern. This can be a number or identifier pattern for
instance.
|
SEPARATOR | final public static byte SEPARATOR(Code) | | Separators are otherwise not remarkable characters. An opening parenthesis
might be necessary for a syntactically correct text, but has no special
meaning to the compiler, interpreter etc. after it has been detected.
|
SPECIAL_SEQUENCE | final public static byte SPECIAL_SEQUENCE(Code) | | Special sequences are characters or character combinations that have a certain
meaning to the parsed language or dialect. In computer languages we have for
instance operators, end-of-statement characters etc.
A companion might have been associated with a special sequence. It probably
contains information important to the user of the Token .
|
STRING | final public static byte STRING(Code) | | The token is one of the quoted strings known to the
Tokenizer . In Java
this would be for instance a "String" or a 'c' (haracter).
|
UNKNOWN | final public static byte UNKNOWN(Code) | | This is for the leftovers of the lexical analysis of a text.
|
WHITESPACE | final public static byte WHITESPACE(Code) | | Whitespaces are portions of the text, that contain one or more characters
that separate the significant parts of the text. Generally, a sequence of
whitespaces is equally represented by one single whitespace character. That
is the difference to separators.
|
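The equivalence between a whitespace sequence and a single whitespace character can be illustrated without JTopas. The following is a minimal sketch that uses only java.lang.String; the class WhitespaceSketch and its method are hypothetical and not part of the library.

```java
// Hypothetical helper, not part of JTopas: collapses every run of
// whitespace into one space, which is the equivalence described above.
final class WhitespaceSketch {

    static String collapse(String text) {
        // \s+ matches one or more whitespace characters (space, tab, newline, ...).
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("int \t\n  x"));
    }
}
```

A separator such as a parenthesis cannot be collapsed this way, since each occurrence is significant.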
Token | public Token()(Code) | | Default constructor.
|
Token | public Token(int type)(Code) | | Constructs a token of a given type. Only the type of the token is known but not
its image or positions.
Parameters: type - token type, one of the class constants. |
Token | public Token(int type, String image)(Code) | | Construct a token of a given type with the given image. No position information
is given.
Parameters: type - token type, one of the class constants. Parameters: image - the token image itself |
Token | public Token(int type, String image, Object companion)(Code) | | Construct a token of a given type with the given image and a companion. This
constructor is most useful for keywords or special sequences.
Parameters: type - token type, one of the class constants. Parameters: image - the token image itself Parameters: companion - information associated with the token type |
equals | public boolean equals(Object object)(Code) | | Implementation of the well known method
java.lang.Object.equals .
Note that two tokens are equal if all of their members are equal. That means
that tokens retrieved by two different
Tokenizer instances can be
equal.
Parameters: object - the java.lang.Object to compare Returns: true if the two tokens are equal, false otherwise |
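Member-wise equality can be sketched as follows. TokenLike is a hypothetical stand-in that carries only three of the members of Token (type, image, start position); it illustrates the principle under that assumption and is not the JTopas implementation.

```java
import java.util.Objects;

// Hypothetical stand-in for Token with only three of its members.
// It illustrates member-wise equality; it is not the JTopas implementation.
final class TokenLike {
    final int type;
    final String image;
    final int startPosition;

    TokenLike(int type, String image, int startPosition) {
        this.type = type;
        this.image = image;
        this.startPosition = startPosition;
    }

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (!(object instanceof TokenLike)) {
            return false;
        }
        TokenLike other = (TokenLike) object;
        // Equal only if every member is equal; which tokenizer produced
        // the instance plays no role.
        return type == other.type
            && startPosition == other.startPosition
            && Objects.equals(image, other.image);
    }

    @Override
    public int hashCode() {
        // Kept consistent with equals, as the java.lang.Object contract requires.
        return Objects.hash(type, image, startPosition);
    }
}
```

Because position members take part in the comparison, two tokens with the same image found at different positions are not equal in this sketch.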
getCompanion | public Object getCompanion()(Code) | | Obtaining the associated information of the token. Can be null . See
Token.setCompanion for details.
Returns: the associated information of this token |
getEndColumn | public int getEndColumn()(Code) | | Obtaining the column number where the Token ends. See
Token.setEndColumn for more.
If a tokenizer doesn't count lines and columns, the returned value is -1.
Returns: column number where the token ends, or -1 if no line counting is performed See Also: Token.setEndColumn |
getEndLine | public int getEndLine()(Code) | | Obtaining the line number where the token ends. See
Token.setEndLine for
more. If a tokenizer doesn't count lines and columns, the returned value is
-1.
Returns: line number where the token ends, or -1 if no line counting is performed See Also: Token.setEndLine |
getImageParts | public String[] getImageParts()(Code) | | Image parts are substrings of a token image. The operation returns a meaningful
result only if the flag
TokenizerProperties.F_RETURN_IMAGE_PARTS is
set for the TokenizerProperties , the
Tokenizer or the
TokenizerProperty that "produced" the token. If that flag is not set,
the returned array contains only the image itself (see
Token.getImage ).
Number and contents of the image parts depend on the token type:
-
Token.NORMAL ,
Token.KEYWORD ,
Token.SPECIAL_SEQUENCE ,
Token.SEPARATOR : These tokens have one image part that is identical to
the image itself (
Token.getImage ).
-
Token.WHITESPACE : Whitespaces have one image part for each substring
on a single line without any line separators. For whitespace sequences
without line separators there will be one part that is identical to the
image itself (
Token.getImage ). More generally, whitespaces have
separatorCount + 1 image parts. For multi-line whitespaces
some or all of these image parts can be empty.
-
Token.STRING : One image part per line containing the characters between
and excluding the string start and end sequences and/or the line
separators, equivalent to the handling of whitespaces. The string escape
sequences are resolved. For instance, the image part of the SQL string
'select ''hello'' from dual' is select 'hello' from dual .
Multiline strings may have empty image parts (if empty lines are included
in the string). The string "line1\n" has two image parts: "line1" and the
empty string (since the string ends on a new line). The string "\nline2"
has also two image parts: the empty string and "line2" (since the string
starts on one line and ends on the next).
-
Token.PATTERN : a pattern has image parts according to the groups defined
in the regular expression of the pattern. The
java.util.regex.Pattern class speaks of "Capturing groups" that are expressions in parentheses.
Image parts are especially important for pattern tokens, where access
to parts of the match is usually necessary. For instance, in Java, Unicode
characters can be written in the form of the
"\\u[0-9A-Fa-f]{4}"
pattern. For further processing the hexadecimal part must be accessed.
By using the pattern "\\u([0-9A-Fa-f]{4})" , a token containing
the Unicode notation "\\u00AC" has the two image parts
"\\u00AC" (capturing group 0) and "00AC"
(capturing group 1).
-
Token.LINE_COMMENT : Line comments have one image part that contains
the substring after the line comment start sequence up to and excluding
the line separator sequence.
-
Token.BLOCK_COMMENT : Like whitespaces and strings, block comments have
one image part per line they are spanning. The first part is without the
block comment start sequence, the last without the block comment end
sequence. The line separator sequences are also not included in the parts.
-
Token.EOF : The method returns an empty array.
The return value is an array of strings rather than a
java.util.Enumeration or
java.util.Iterator , since it can be used more easily and contains
only one element in many, if not most, cases.
Returns: an array of image parts according to the token type if the flag TokenizerProperties.F_RETURN_IMAGE_PARTS is set, or an array containing the image itself otherwise (Token.getImage). |
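The rules for pattern tokens and for multi-line tokens can be reproduced with the Java standard library alone. The sketch below is not JTopas code: ImagePartsSketch and its methods are hypothetical names, and the line splitting assumes that "\n" is the only line separator. Note that in Java source code the regular expression for a literal backslash followed by u needs four backslashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helpers, not part of JTopas. They mimic the image part
// rules described above using only the Java standard library.
final class ImagePartsSketch {

    // Pattern tokens: one part per capturing group, group 0 being the whole match.
    static String[] patternParts(String regex, String image) {
        Matcher matcher = Pattern.compile(regex).matcher(image);
        if (!matcher.matches()) {
            return new String[0];
        }
        String[] parts = new String[matcher.groupCount() + 1];
        for (int i = 0; i < parts.length; i++) {
            parts[i] = matcher.group(i);
        }
        return parts;
    }

    // Multi-line tokens: separatorCount + 1 parts, empty parts are kept.
    // Assumption: "\n" is the only line separator.
    static String[] lineParts(String text) {
        // The limit -1 keeps trailing empty strings in the result.
        return text.split("\n", -1);
    }

    public static void main(String[] args) {
        // Regex: a literal backslash, the letter u, then four captured hex digits.
        String[] parts = patternParts("\\\\u([0-9A-Fa-f]{4})", "\\u00AC");
        System.out.println(parts[0] + " / " + parts[1]);
        System.out.println(lineParts("line1\n").length);
    }
}
```

The limit argument of String.split matters here: without it, the trailing empty part of "line1\n" would be dropped and the separatorCount + 1 rule would no longer hold.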
getLength | public int getLength()(Code) | | Obtaining the length of the token. Note that some token types have a zero length
(like EOF or UNKNOWN).
Returns: the length of the token. See Also: Token.setLength See Also: Token.getEndPosition |
getStartColumn | public int getStartColumn()(Code) | | Obtaining the column number of the Token start. See
Token.setStartColumn for details.
If a tokenizer doesn't count lines and columns, the returned value is -1.
Returns: the column number where the token starts, or -1 if no line counting is performed See Also: Token.setStartColumn |
getStartLine | public int getStartLine()(Code) | | Obtaining the line number where the Token starts. See also
Token.setStartLine for details.
If a tokenizer doesn't count lines and columns, the returned value is -1.
Returns: the line number where the token starts, or -1 if no line counting is performed See Also: Token.setStartLine |
getType | public int getType()(Code) | | Obtaining the type of the Token . This is one of the constants
defined in the Token class.
Returns: the token type See Also: Token.setType |
getTypeName | public static String getTypeName(int type)(Code) | | Getting a type name for displaying. The method never fails even if the
given type is unknown.
Parameters: type - one of the Token type constants Returns: a string representation of the given type constant |
setCompanion | public void setCompanion(Object companion)(Code) | | Some tokens may have associated information for the user of the Token .
A popular use is the association of an integer constant with a special
sequence or keyword, to be used in fast switch statements.
Parameters: companion - the associated information for this token |
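The switch-based use of companions might look like the sketch below. It does not call JTopas: the map stands in for the keyword registration a tokenizer would hold, and the class name, the constants KW_IF and KW_WHILE, and their values are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not JTopas code: integer companions attached to
// keyword images allow dispatching with a switch instead of string compares.
final class CompanionSketch {
    // Illustrative constants; the values are chosen here, not by the library.
    static final int KW_IF = 1;
    static final int KW_WHILE = 2;

    // Stands in for the keyword-to-companion registration of a tokenizer.
    static final Map<String, Object> COMPANIONS = new HashMap<>();
    static {
        COMPANIONS.put("if", KW_IF);
        COMPANIONS.put("while", KW_WHILE);
    }

    static String describe(Object companion) {
        // A companion can be null or of another type, so check before unboxing.
        if (!(companion instanceof Integer)) {
            return "no companion";
        }
        int code = (Integer) companion;
        switch (code) {
            case KW_IF:
                return "conditional";
            case KW_WHILE:
                return "loop";
            default:
                return "other keyword";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(COMPANIONS.get("while")));
    }
}
```

In real use the argument of describe would be the value returned by getCompanion() of a keyword or special sequence token.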
setEndColumn | public void setEndColumn(int colno)(Code) | | In
Tokenizer implementations that count lines and columns, this method is used to set the
column number where the end of the Token was found.
The end column number is that of the first character that does
NOT belong to the token. This approach is chosen in accordance
with the endIndex parameter of
java.lang.String.substring(int, int) .
Parameters: colno - column number where the token ends |
setEndLine | public void setEndLine(int lineno)(Code) | | In
Tokenizer implementations that count lines and columns, this method is used to
set the line number where the end of the Token was found.
See
Token.setStartLine for more.
The end line number is the one where the first character was found that does
NOT belong to the token. This approach is chosen in accordance
with the endIndex parameter of
java.lang.String.substring(int, int) .
Parameters: lineno - line number where the token ends |
setEndPosition | public void setEndPosition(int endPosition)(Code) | | Setting the end position of the token relative to the start of the input
stream. For instance, the first character in a file has the start position
0. The character at the given end position is NOT part of
this Token . This is the same principle as in the
java.lang.String.substring(int, int) method.
This method is an alternative to
Token.setLength depending on which
information is at hand or easier to obtain for the
Tokenizer producing
this Token .
Note that this method MUST be called after
Token.setStartPosition since it affects the length of the token. Its effect is in turn overridden
by calls to
Token.setLength and
Token.setImage . Parameters: endPosition - the position where the token ends in the input stream. |
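The relationship between start position, end position and length can be sketched as follows. PositionSketch is a hypothetical stand-in, and the arithmetic (length = endPosition - startPosition) is an assumption drawn from the description above rather than copied from the JTopas source.

```java
// Hypothetical stand-in for the position members of Token. The arithmetic
// is an assumption based on the documentation, not the JTopas source.
final class PositionSketch {
    private int startPosition;
    private int length;

    void setStartPosition(int startPosition) {
        this.startPosition = startPosition;
    }

    void setLength(int length) {
        this.length = length;
    }

    // Must run after setStartPosition, because it derives the length from it.
    void setEndPosition(int endPosition) {
        this.length = endPosition - startPosition;
    }

    int getLength() {
        return length;
    }

    int getEndPosition() {
        return startPosition + length;
    }

    // The end position is exclusive, as in String.substring(int, int).
    String imageFrom(String input) {
        return input.substring(startPosition, getEndPosition());
    }

    public static void main(String[] args) {
        PositionSketch token = new PositionSketch();
        token.setStartPosition(8);
        token.setEndPosition(10);
        System.out.println(token.imageFrom("int x = 42;"));
    }
}
```

A later call to setLength replaces the length derived by setEndPosition, which is why the order of the setter calls matters.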
setImage | public void setImage(String image)(Code) | | Setting the token image. Note that some
Tokenizer implementations only fill in position
and length information rather than setting the token image. This strategy
might have a tremendous influence on the parse performance and the memory
allocation.
Parameters: image - the token image See Also: Token.getImage |
setImageParts | public void setImageParts(String[] imageParts)(Code) | | The counterpart to
Token.getImageParts . It sets all image parts in one
operation. The method accepts null and empty arrays.
Parameters: imageParts - an array of image parts according to the token type or null |
setLength | public void setLength(int length)(Code) | | Setting the length of the token. Some
Tokenizer implementations may prefer or may be
configured not to return a token image, but only the position and length
information. This may save a lot of time wherever only a subset of the found
tokens is actually needed by the user.
This method is an alternative to
Token.setEndPosition depending on which
information is at hand or easier to obtain for the
Tokenizer producing
this Token .
Note that this method is implicitly called by
Token.setImage and
Token.setEndPosition .
Parameters: length - the length of the token See Also: Token.getLength See Also: Token.setEndPosition |
setStartColumn | public void setStartColumn(int colno)(Code) | | In
Tokenizer implementations that count lines and columns, this method is used to
set the column number where the beginning of the Token was
found. Column numbers start with 0.
Parameters: colno - column number where the token begins See Also: Token.getStartColumn |
setStartLine | public void setStartLine(int lineno)(Code) | | In
Tokenizer implementations that count lines and columns, this method is used to
set the line number where the beginning of the Token was found.
Line numbers start with 0.
Parameters: lineno - line number where the token begins See Also: Token.getStartLine |
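Zero-based line and column numbers can be derived from an absolute position as in the sketch below. It is an illustration only: LineColumnSketch is a hypothetical name, the code assumes that "\n" is the only line separator, and it makes no claim about how a JTopas tokenizer counts internally.

```java
// Hypothetical sketch, not JTopas code: derives 0-based line and column
// numbers from an absolute position. Assumes "\n" is the only line separator.
final class LineColumnSketch {

    // Returns {line, column} of the character at the given position.
    static int[] lineAndColumn(String text, int position) {
        int line = 0;
        int column = 0;
        for (int i = 0; i < position; i++) {
            if (text.charAt(i) == '\n') {
                // A line separator starts a new line at column 0.
                line++;
                column = 0;
            } else {
                column++;
            }
        }
        return new int[] {line, column};
    }

    public static void main(String[] args) {
        int[] lc = lineAndColumn("ab\ncd", 4);
        System.out.println("line " + lc[0] + ", column " + lc[1]);
    }
}
```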
setStartPosition | public void setStartPosition(int startPosition)(Code) | | Setting the start position of the token relative to the start of the input
stream. For instance, the first character in a file has the start position
0.
Parameters: startPosition - the position where the token starts in the input stream. See Also: Token.getStartPosition See Also: Token.setEndPosition |
setType | public void setType(int type)(Code) | | Setting the type property of the Token . This is one of the constants
defined in this class.
Parameters: type - the token type See Also: Token.getType |
|
|