java.lang.Object
   simple.page.translate.Tokenizer
Tokenizer
final class Tokenizer implements Lexer

The Tokenizer extracts valid tokens from the stream of characters
given to it for scanning. Tokens are identified in the input using
delimiters that mark the start and end of a valid token. For example,
take the well-known JSP syntax: a parsable segment typically opens
with the token <% and closes with %>, as in the JSP expression below.
<%= new java.util.Date() %>
This tokenizer can also extract HTML expressions and other such
formats, given the opening and closing delimiters of the expression.
For example, the following HTML could be used to mark the opening and
closing of a valid token.
<script language='groovy'>
java.util.Date();
</script>
The above token is identified using a case-insensitive match, and
whitespace can be ignored, so the HTML does not have to be formatted
exactly for this tokenizer to extract it as a valid token.
Author: Niall Gallagher
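The case-insensitive delimiter matching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tokenizer's actual implementation: the class CaseInsensitiveSegments and its methods are hypothetical, only segment bodies are returned (plain text is skipped), whitespace inside the delimiters is not ignored, and an unterminated segment is assumed to end the search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Tokenizer's own code: finds segments whose
// opening and closing delimiters match case insensitively and returns
// their bodies. Whitespace inside the delimiters is NOT ignored here.
final class CaseInsensitiveSegments {

    static List<String> extract(String source, String start, String finish) {
        if (start.isEmpty() || finish.isEmpty()) {
            throw new IllegalArgumentException("delimiters must not be empty");
        }
        List<String> bodies = new ArrayList<>();
        int from = 0;
        while (true) {
            int open = indexOfIgnoreCase(source, start, from);
            if (open < 0) {
                break; // no further opening delimiter
            }
            int bodyStart = open + start.length();
            int close = indexOfIgnoreCase(source, finish, bodyStart);
            if (close < 0) {
                break; // unterminated segment: assumed to end the search
            }
            bodies.add(source.substring(bodyStart, close));
            from = close + finish.length();
        }
        return bodies;
    }

    // Like String.indexOf(String, int), but compares characters ignoring case.
    static int indexOfIgnoreCase(String haystack, String needle, int from) {
        for (int i = from; i + needle.length() <= haystack.length(); i++) {
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String page = "<p>Today</p><SCRIPT language='groovy'>java.util.Date();</Script>";
        // Prints [java.util.Date();]
        System.out.println(extract(page, "<script language='groovy'>", "</script>"));
    }
}
```

Because String.regionMatches with ignoreCase set to true compares character by character, the mixed-case <SCRIPT ...> and </Script> tags in the example still match the lower-case delimiters.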
Constructor Summary
public Tokenizer(Parser parser)
    Constructor for the Tokenizer object.

Method Summary
public void match(String start, String finish)
    This method tells the lexer how to extract the tokens from the source document.
public void match(String start, String finish, String special)
    This method tells the lexer how to extract the tokens from the source document, and which special characters may be surrounded by whitespace.
public void scan(char[] text)
    This will scan the provided characters for tokens that should be emitted to the Parser.
public void scan(char[] text, int pos, int len)
    This will scan the given range of characters for tokens that should be emitted to the Parser.
Tokenizer
public Tokenizer(Parser parser)

Constructor for the Tokenizer object. This is used to scan a stream
of characters and pass any tokens extracted from the stream to the
Parser.

Parameters:
parser - the parser used to parse extracted tokens
match
public void match(String start, String finish)

This method tells the lexer how to extract tokens from the source
document. It is given the opening and closing tokens used to identify
a segment. In languages such as JSP and PHP, code segments are
typically opened with a delimiter like <% for JSP and <?php for PHP.
This method allows the lexer to be configured to process such
delimiters.

Parameters:
start - this is the opening token for a segment
finish - this is the closing token for a segment
match
public void match(String start, String finish, String special)

This method tells the lexer how to extract tokens from the source
document. It is given the opening and closing tokens used to identify
a segment. In languages such as JSP and PHP, code segments are
typically opened with a delimiter like <% for JSP and <?php for PHP.
This method allows the lexer to be configured to process such
delimiters. With this match method a collection of special
characters can also be specified. These characters tell the lexer
which characters may be surrounded by whitespace. For example, take
the HTML expressions below.
< script language ='groovy' >
<script language='groovy'>
The above two HTML expressions should be considered equal when the
special characters <, >, and = are used.

Parameters:
start - this is the opening token for a segment
finish - this is the closing token for a segment
special - this is the set of special characters
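One way to get the behaviour described above is to normalize both the expected delimiter and the candidate text before comparing them: drop any whitespace that touches a special character and fold case. The sketch below assumes that rule and is not taken from the Tokenizer's source; SpecialNormalizer is a hypothetical name, and collapsing every other run of whitespace to a single space is an extra assumption not stated in the documentation.

```java
// Hypothetical sketch, not the Tokenizer's own code: normalizes text so that
// whitespace next to a "special" character is ignored and case is folded.
final class SpecialNormalizer {

    static String normalize(String value, String special) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            if (!Character.isWhitespace(c)) {
                out.append(Character.toLowerCase(c)); // case-insensitive comparison
                i++;
                continue;
            }
            // Find the end of this run of whitespace.
            int j = i;
            while (j < value.length() && Character.isWhitespace(value.charAt(j))) {
                j++;
            }
            boolean afterSpecial = out.length() > 0
                    && special.indexOf(out.charAt(out.length() - 1)) >= 0;
            boolean beforeSpecial = j < value.length()
                    && special.indexOf(value.charAt(j)) >= 0;
            if (!afterSpecial && !beforeSpecial) {
                out.append(' '); // assumption: other runs collapse to one space
            }
            i = j;
        }
        return out.toString();
    }

    static boolean equivalent(String a, String b, String special) {
        return normalize(a, special).equals(normalize(b, special));
    }

    public static void main(String[] args) {
        String special = "<>=";
        // Prints true: the extra whitespace only surrounds special characters.
        System.out.println(equivalent("< script language ='groovy' >",
                "<script language='groovy'>", special));
    }
}
```

With the special characters <, > and =, both HTML expressions above normalize to <script language='groovy'>. Whitespace between two ordinary words (script and language) is still significant, so <scriptlanguage='groovy'> would not match.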
scan
public void scan(char[] text)

This will scan the provided characters for tokens that should be
emitted to the Parser. The tokens emitted to the parser are either
plain text tokens or valid segments that require further processing
by the parser.

Parameters:
text - this is the buffer that contains the characters
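The push style described above, where scanning emits plain text and segment tokens to a parser, can be sketched as follows. The real Parser interface is not shown on this page, so TokenSink, with its text and segment callbacks, is a hypothetical stand-in. SketchTokenizer only mirrors the documented constructor, match(String, String) and scan signatures; its exact (case-sensitive) matching and its choice to emit an unterminated segment as plain text are assumptions, not the Tokenizer's documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Parser; the real interface is not documented here.
interface TokenSink {
    void text(String value);    // plain text outside any segment
    void segment(String value); // body of a delimited segment
}

// Collects callbacks as strings so they can be inspected.
final class RecordingSink implements TokenSink {
    final List<String> events = new ArrayList<>();
    public void text(String value) { events.add("text:" + value); }
    public void segment(String value) { events.add("segment:" + value); }
}

// Sketch that mirrors the documented method shapes; not the real Tokenizer.
final class SketchTokenizer {
    private final TokenSink sink;
    private String start;
    private String finish;

    SketchTokenizer(TokenSink sink) {
        this.sink = sink;
    }

    void match(String start, String finish) {
        if (start.isEmpty() || finish.isEmpty()) {
            throw new IllegalArgumentException("delimiters must not be empty");
        }
        this.start = start;
        this.finish = finish;
    }

    // Scans the whole buffer by delegating to the ranged overload.
    void scan(char[] text) {
        scan(text, 0, text.length);
    }

    void scan(char[] text, int pos, int len) {
        if (start == null) {
            throw new IllegalStateException("call match(start, finish) first");
        }
        String source = new String(text, pos, len); // only the requested range
        int mark = 0; // start of text not yet emitted
        while (true) {
            int open = source.indexOf(start, mark);
            int close = open < 0 ? -1 : source.indexOf(finish, open + start.length());
            if (close < 0) {
                break; // no complete segment remains
            }
            if (open > mark) {
                sink.text(source.substring(mark, open));
            }
            sink.segment(source.substring(open + start.length(), close));
            mark = close + finish.length();
        }
        if (mark < source.length()) {
            sink.text(source.substring(mark)); // trailing text, incl. unterminated segment
        }
    }

    public static void main(String[] args) {
        RecordingSink sink = new RecordingSink();
        SketchTokenizer tokenizer = new SketchTokenizer(sink);
        tokenizer.match("<?php", "?>");
        tokenizer.scan("Hi <?php echo 1; ?> there".toCharArray());
        // Prints [text:Hi , segment: echo 1; , text: there]
        System.out.println(sink.events);
    }
}
```

Having scan(char[]) delegate to the ranged overload keeps a single scanning loop, which is a common way to structure such overload pairs.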
scan
public void scan(char[] text, int pos, int len)

This will scan the given range of characters for tokens that should
be emitted to the Parser. The tokens emitted to the parser are either
plain text tokens or valid segments that require further processing
by the parser.

Parameters:
text - this is the buffer that contains the characters
pos - this is the offset within the buffer to start reading from
len - this is the number of characters to use
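The (text, pos, len) parameters imply that only text[pos] through text[pos + len - 1] are examined, which is convenient when a buffer is only partly filled, for example by a Reader. The sketch below illustrates that convention with hypothetical helpers (RangeSketch, window, fill). It assumes the range must lie inside the buffer, as the standard String(char[], int, int) constructor requires, and makes no claim about how the Tokenizer itself validates its arguments.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Objects;

// Hypothetical helpers illustrating the (text, pos, len) range convention.
final class RangeSketch {

    // Returns the characters a ranged scan would see, after checking bounds.
    static String window(char[] text, int pos, int len) {
        // Throws IndexOutOfBoundsException unless 0 <= pos, 0 <= len
        // and pos + len <= text.length.
        Objects.checkFromIndexSize(pos, len, text.length);
        return new String(text, pos, len);
    }

    // Fills as much of buf as the reader provides; returns how many chars were read.
    static int fill(Reader reader, char[] buf) {
        int count = 0;
        try {
            while (count < buf.length) {
                int n = reader.read(buf, count, buf.length - count);
                if (n < 0) {
                    break; // end of stream
                }
                count += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        char[] buf = new char[64]; // deliberately larger than the input
        int count = fill(new StringReader("<%= 1 + 1 %>"), buf);
        // Only the filled region is passed on; the unused '\0' slots are ignored.
        System.out.println(window(buf, 0, count));
    }
}
```

In terms of the documented API, the equivalent call would pass the same buffer with pos set to 0 and len set to the count returned by the reader, so the unused tail of the buffer is never scanned.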