Java Doc for Tokenizer.java in Parser » JTopas » de » susebox » jtopas



de.susebox.jtopas.Tokenizer

All known Subclasses:   de.susebox.jtopas.StandardTokenizer,  de.susebox.jtopas.AbstractTokenizer
public interface Tokenizer

The interface Tokenizer contains setup methods, parse operations and other getter and setter methods for a tokenizer. A tokenizer splits a stream of input data into various units like whitespaces, comments, keywords etc. These units are the tokens that are reflected in the Token class of the de.susebox.jtopas package.

A Tokenizer is configured using a TokenizerProperties object that contains declarations for whitespaces, separators, comments, keywords, special sequences and patterns. It is designed to enable a common approach for parsing texts like program code, annotated documents like HTML and so on.

To detect links in an HTML document, a tokenizer would be invoked like this (see StandardTokenizerProperties and StandardTokenizer for the classes mentioned here):

 import java.io.FileReader;
 import java.util.Vector;
 import de.susebox.jtopas.*;
 
 Vector               links     = new Vector();
 FileReader           reader    = new FileReader("index.html");
 TokenizerProperties  props     = new StandardTokenizerProperties();
 Tokenizer            tokenizer = new StandardTokenizer();
 Token                token;
 
 // F_NO_CASE is a property-level flag, so it is set on the properties object
 props.setParseFlags(TokenizerProperties.F_NO_CASE);
 props.setSeparators("=");
 props.addString("\"", "\"", "\\");
 props.addBlockComment(">", "<");       // text between tags is treated as a comment
 props.addKeyword("HREF");
 tokenizer.setTokenizerProperties(props);
 tokenizer.setSource(new ReaderSource(reader));
 try {
   while (tokenizer.hasMoreToken()) {
     token = tokenizer.nextToken();
     if (token.getType() == Token.KEYWORD) {
       tokenizer.nextToken();               // should be the '=' character
       links.addElement(tokenizer.nextImage());
     }
   }
 } finally {
   tokenizer.close();
   reader.close();
 }
 
This somewhat rough way to find links should work fine on syntactically correct HTML code. It finds common links as well as mail links, ftp links etc. Note the block comment. It starts with the ">" character, which is the closing character of HTML tags, and ends with "<", the starting character of HTML tags. The effect is that all the real text is treated as a comment.

To extract the contents of an HTML file, one would write:

 import java.io.FileReader;
 import de.susebox.jtopas.*;
 
 StringBuffer         contents  = new StringBuffer(4096);
 FileReader           reader    = new FileReader("index.html");
 TokenizerProperties  props     = new StandardTokenizerProperties();
 Tokenizer            tokenizer = new StandardTokenizer();
 Token                token;
 
 props.setParseFlags(TokenizerProperties.F_NO_CASE);
 props.addBlockComment("<", ">");              // every HTML tag is a comment
 props.addBlockComment("<HEAD>", "</HEAD>");   // so is the whole header
 props.addBlockComment("<!--", "-->");         // and real HTML comments
 tokenizer.setTokenizerProperties(props);
 tokenizer.setSource(new ReaderSource(reader));
 try {
   while (tokenizer.hasMoreToken()) {
     token = tokenizer.nextToken();
     if (token.getType() != Token.BLOCK_COMMENT) {
       // currentImage() returns the image of the token just read
       contents.append(tokenizer.currentImage());
     }
   }
 } finally {
   tokenizer.close();
   reader.close();
 }
 
Here the block comment is the exact opposite of the first example. Now all the HTML tags are skipped. Moreover, we declared the HTML header as a block comment as well, so the information from the header is skipped altogether.

Parsing (tokenizing) is done on a well defined priority scheme. See Tokenizer.nextToken for details.

NOTE: if a character sequence is registered for two categories of tokenizer properties (e.g. as the starting sequence of a line comment as well as a special sequence), the category with the highest priority wins (e.g. if the mentioned sequence is found, it is interpreted as a line comment).

The tokenizer interface is clearly designed for "readable" data, say ASCII- or UNICODE data. Parsing binary data has other characteristics that do not necessarily fit in a scheme of comments, keywords, strings, identifiers and operators.

Note that the interface has no methods that handle stream data sources. This is left to the implementations, which may have quite different data sources, e.g. java.io.InputStreamReader, database queries, string arrays etc. The interface TokenizerSource serves as an abstraction of such widely varying data sources.

The Tokenizer interface partly replaces the older de.susebox.java.util.Tokenizer interface which is deprecated.


See Also:   Token
See Also:   TokenizerProperties
author:
   Heiko Blau




Method Summary
public void changeParseFlags(int flags, int mask)
     Setting the control flags of the TokenizerProperties.
public void close()
     This method is necessary to release memory and remove object references if Tokenizer instances are frequently created for small tasks. Generally, the method shouldn't throw any exceptions.
public String currentImage()
     Convenience method to retrieve only the token image of the Token that would be returned by Tokenizer.currentToken.
public Token currentToken()
     Retrieve the Token that was found by the last call to Tokenizer.nextToken or Tokenizer.nextImage.
public int currentlyAvailable()
     Retrieving the number of the currently available characters.
public char getChar(int pos)
     Get a single character from the current text range.
public int getColumnNumber()
     If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the current column position starting with 0 in the input stream.
public de.susebox.jtopas.spi.KeywordHandler getKeywordHandler()
     Retrieving the current de.susebox.jtopas.spi.KeywordHandler.
public int getLineNumber()
     If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the line number starting with 0 in the input stream.
public int getParseFlags()
     Retrieving the parser control flags.
public de.susebox.jtopas.spi.PatternHandler getPatternHandler()
     Retrieving the current de.susebox.jtopas.spi.PatternHandler.
public int getRangeStart()
     This method returns the absolute offset in characters to the start of the parsed stream.
public int getReadPosition()
     Getting the current read offset.
public de.susebox.jtopas.spi.SeparatorHandler getSeparatorHandler()
     Retrieving the current de.susebox.jtopas.spi.SeparatorHandler.
public de.susebox.jtopas.spi.SequenceHandler getSequenceHandler()
     Retrieving the current de.susebox.jtopas.spi.SequenceHandler.
public TokenizerSource getSource()
     Retrieving the TokenizerSource of this Tokenizer.
public String getText(int start, int length)
     Retrieve text from the currently available range.
public TokenizerProperties getTokenizerProperties()
     Retrieving the current tokenizer characteristics.
public de.susebox.jtopas.spi.WhitespaceHandler getWhitespaceHandler()
     Retrieving the current de.susebox.jtopas.spi.WhitespaceHandler.
public boolean hasMoreToken()
     Check if there are more tokens available.
public String nextImage()
     Convenience method that returns only the image of the next token.
public Token nextToken()
     Retrieving the next Token.
public int readMore()
     Try to read more data into the text buffer of the tokenizer.
public void setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler handler)
     Setting a new de.susebox.jtopas.spi.KeywordHandler or removing any previously installed one.
public void setPatternHandler(de.susebox.jtopas.spi.PatternHandler handler)
     Setting a new de.susebox.jtopas.spi.PatternHandler or removing any previously installed one.
public void setReadPositionAbsolute(int position)
     This method sets the tokenizer's current read position to the given absolute read position.
public void setReadPositionRelative(int offset)
     This method moves the tokenizer's read position the given number of characters forward (positive value) or backward (negative value), starting from the current read position.
public void setSeparatorHandler(de.susebox.jtopas.spi.SeparatorHandler handler)
     Setting a new de.susebox.jtopas.spi.SeparatorHandler or removing any previously installed SeparatorHandler.
public void setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler handler)
     Setting a new de.susebox.jtopas.spi.SequenceHandler or removing any previously installed one.
public void setSource(TokenizerSource source)
     Setting the source of data.
public void setTokenizerProperties(TokenizerProperties props)
     Setting the tokenizer characteristics.
public void setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler handler)
     Setting a new de.susebox.jtopas.spi.WhitespaceHandler or removing any previously installed one.



Method Detail
changeParseFlags
public void changeParseFlags(int flags, int mask) throws TokenizerException
Setting the control flags of the TokenizerProperties. Use a combination of the F_... flags declared in TokenizerProperties for the flags parameter. The mask parameter contains a bit mask of the F_... flags to change.
The parse flags for a tokenizer can be set through the associated TokenizerProperties instance. These global settings take effect in all Tokenizer instances that use the same TokenizerProperties object.
Flags related to the parsing process, the dynamic flags, can also be set separately for each tokenizer during runtime. Other flags can also be set for each tokenizer separately, but should be set before the tokenizing starts to make sense. The remaining flags should only be used on the TokenizerProperties instance or on single TokenizerProperty objects; they influence all Tokenizer instances sharing the same TokenizerProperties object. For instance, using the flag TokenizerProperties.F_NO_CASE is an invalid operation on a Tokenizer. It affects the interpretation of keywords and sequences by the associated TokenizerProperties instance and, moreover, possibly the storage of these properties.
This method throws a TokenizerException if a flag is passed that cannot be handled by the Tokenizer object itself.
This method takes precedence over the TokenizerProperties.setParseFlags method of the associated TokenizerProperties object. Even if the global setting of one of the dynamic flags changes after a call to this method, the flags set separately for this tokenizer stay active.
Parameters:
  flags - the parser control flags
  mask - the mask for the flags to set or unset
throws:
  TokenizerException - if one or more of the flags given cannot be honored
See Also:   Tokenizer.getParseFlags
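
The interplay of the flags and mask parameters can be sketched with plain bit arithmetic. The following is only an illustration of the usual "change the bits selected by the mask" idiom, not the JTopas implementation; the class name ParseFlagsSketch and the numeric flag values are assumptions made up for this example (the real F_... constants are declared in TokenizerProperties and may have other values).

```java
// Illustration only: the flag values below are assumed, not taken from JTopas.
final class ParseFlagsSketch {
    static final int F_RETURN_WHITESPACES = 0x01;  // assumed value
    static final int F_COUNT_LINES        = 0x02;  // assumed value
    static final int F_TOKEN_POS_ONLY     = 0x04;  // assumed value

    /** Returns current with the bits selected by mask replaced by the bits of flags. */
    static int change(int current, int flags, int mask) {
        return (current & ~mask) | (flags & mask);
    }

    public static void main(String[] args) {
        int current = F_RETURN_WHITESPACES | F_TOKEN_POS_ONLY;
        // Switch F_COUNT_LINES on and F_RETURN_WHITESPACES off; all other bits stay untouched.
        int updated = change(current, F_COUNT_LINES, F_COUNT_LINES | F_RETURN_WHITESPACES);
        System.out.println(Integer.toBinaryString(updated));  // prints 110
    }
}
```

Only bits present in the mask are affected, which is why a flag can be switched off by naming it in the mask but not in flags.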



close
public void close()
This method is necessary to release memory and remove object references if Tokenizer instances are frequently created for small tasks. Generally, the method shouldn't throw any exceptions. It is also ok to call it more than once.
It is an error to call any other method of the implementing class after close has been called.



currentImage
public String currentImage() throws TokenizerException
Convenience method to retrieve only the token image of the Token that would be returned by Tokenizer.currentToken. This is an especially useful method if the parse flags for this Tokenizer have the flag TokenizerProperties.F_TOKEN_POS_ONLY set, since this method returns a valid string even in that case.
Since version 0.6.1 of JTopas, this method throws a TokenizerException rather than returning null if neither Tokenizer.nextToken nor Tokenizer.nextImage has been called before, or if Tokenizer.setReadPositionRelative or Tokenizer.setReadPositionAbsolute has been called after the last call to nextToken or nextImage.
Returns:
  the token image of the current token
throws:
  TokenizerException - if the tokenizer has no current token
See Also:   Tokenizer.currentToken
See Also:   Tokenizer.nextImage



currentToken
public Token currentToken() throws TokenizerException
Retrieve the Token that was found by the last call to Tokenizer.nextToken or Tokenizer.nextImage.
Since version 0.6.1 of JTopas, this method throws a TokenizerException rather than returning null if neither Tokenizer.nextToken nor Tokenizer.nextImage has been called before, or if Tokenizer.setReadPositionRelative or Tokenizer.setReadPositionAbsolute has been called after the last call to nextToken or nextImage.
Returns:
  the Token retrieved by the last call to Tokenizer.nextToken
throws:
  TokenizerException - if the tokenizer has no current token
See Also:   Tokenizer.nextToken
See Also:   Tokenizer.currentImage



currentlyAvailable
public int currentlyAvailable()
Retrieving the number of the currently available characters. This includes both characters already parsed by the Tokenizer and characters still to be analyzed.
Returns:
  number of currently available characters



getChar
public char getChar(int pos) throws IndexOutOfBoundsException
Get a single character from the current text range.
Parameters:
  pos - position of the required character
Returns:
  the character at the specified position
throws:
  IndexOutOfBoundsException - if the parameter pos is not in the available text range (text window)



getColumnNumber
public int getColumnNumber()
If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the current column position starting with 0 in the input stream. Displaying information about columns usually means adding 1 to the zero-based column number.
Returns:
  the current column position, or -1 if no column numbers are supplied (TokenizerProperties.F_COUNT_LINES is not set)
See Also:   Tokenizer.getLineNumber



getKeywordHandler
public de.susebox.jtopas.spi.KeywordHandler getKeywordHandler()
Retrieving the current de.susebox.jtopas.spi.KeywordHandler. The method may return null if there isn't any handler installed.
Returns:
  the currently active de.susebox.jtopas.spi.KeywordHandler, or null if keyword support is switched off
See Also:   Tokenizer.setKeywordHandler



getLineNumber
public int getLineNumber()
If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the line number starting with 0 in the input stream. The implementation of the Tokenizer interface can decide which end-of-line sequences should be recognized. The most flexible approach is to process the following end-of-line sequences:
  • Carriage Return (ASCII 13, '\r'). This EOL is used on Apple Macintosh.
  • Linefeed (ASCII 10, '\n'). This is the UNIX EOL character.
  • Carriage Return + Linefeed ("\r\n"). This is used on MS Windows systems.
Another legitimate and in many cases satisfactory way is to use the system property "line.separator".
Displaying information about lines usually means adding 1 to the zero-based line number.
Returns:
  the current line number starting with 0, or -1 if no line numbers are supplied (TokenizerProperties.F_COUNT_LINES is not set)
See Also:   Tokenizer.getColumnNumber
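
The "most flexible approach" to end-of-line handling mentioned above can be sketched in a few lines. This is a minimal illustration rather than the code of any JTopas implementation; the class and method names (LineCounterSketch, countLineBreaks) are hypothetical. The key point is that "\r\n" must be counted as one line break, not two.

```java
// Illustration only: counts end-of-line sequences, treating "\r\n" as a single one.
final class LineCounterSketch {
    /** Returns the number of end-of-line sequences ('\r', '\n' or "\r\n") in text. */
    static int countLineBreaks(CharSequence text) {
        int lines = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                // A linefeed directly after a carriage return belongs to the same EOL.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                lines++;
            } else if (c == '\n') {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Mixed Macintosh, UNIX and Windows line ends in one string.
        System.out.println(countLineBreaks("a\rb\nc\r\nd"));  // prints 3
    }
}
```

The returned count equals the zero-based line number of the position directly after the text, which matches the zero-based convention described for getLineNumber.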



getParseFlags
public int getParseFlags()
Retrieving the parser control flags. A bitmask containing the F_... constants is returned. This method returns both the flags that are set separately for this Tokenizer and the flags set for the associated TokenizerProperties object.
Returns:
  the current parser control flags
See Also:   Tokenizer.changeParseFlags



getPatternHandler
public de.susebox.jtopas.spi.PatternHandler getPatternHandler()
Retrieving the current de.susebox.jtopas.spi.PatternHandler. The method may return null if there isn't any handler installed.
Returns:
  the currently active de.susebox.jtopas.spi.PatternHandler, or null if patterns are not recognized by the tokenizer
See Also:   Tokenizer.setPatternHandler



getRangeStart
public int getRangeStart()
This method returns the absolute offset in characters to the start of the parsed stream. Together with Tokenizer.currentlyAvailable it describes the currently available text "window".
The positions returned by this method and by Tokenizer.getReadPosition are absolute rather than relative to a text buffer. This gives the tokenizer full control over how and when to refill its text buffer.
Returns:
  the absolute offset of the current text window in characters from the start of the data source of the Tokenizer
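
To make the notion of a text "window" with absolute positions concrete, here is a small sketch. It is not taken from JTopas; TextWindowSketch is a hypothetical class that only mimics the contracts of getRangeStart, currentlyAvailable, getChar and getText as described in this document, using an in-memory string as the buffer.

```java
// Illustration only: a text window addressed by absolute positions.
final class TextWindowSketch {
    private final String buffer;   // characters currently held in memory
    private final int rangeStart;  // absolute offset of buffer[0] in the data source

    TextWindowSketch(String buffer, int rangeStart) {
        this.buffer = buffer;
        this.rangeStart = rangeStart;
    }

    int getRangeStart()      { return rangeStart; }
    int currentlyAvailable() { return buffer.length(); }

    /** Returns the character at the absolute position pos. */
    char getChar(int pos) {
        int index = pos - rangeStart;  // translate absolute position to buffer index
        if (index < 0 || index >= buffer.length()) {
            throw new IndexOutOfBoundsException("position " + pos + " is outside the text window");
        }
        return buffer.charAt(index);
    }

    /** Returns length characters beginning at the absolute position start. */
    String getText(int start, int length) {
        int index = start - rangeStart;
        if (index < 0 || length < 0 || index + length > buffer.length()) {
            throw new IndexOutOfBoundsException("range is outside the text window");
        }
        return buffer.substring(index, index + length);
    }

    public static void main(String[] args) {
        // Pretend the first 6 characters of the source were already discarded.
        TextWindowSketch window = new TextWindowSketch("world", 6);
        System.out.println(window.getChar(6));     // prints w
        System.out.println(window.getText(7, 3));  // prints orl
    }
}
```

Because callers only ever see absolute positions, the buffer can be refilled or shifted without invalidating positions the caller remembered earlier, as long as they still lie inside the window.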



getReadPosition
public int getReadPosition()
Getting the current read offset. This is the absolute position where the next call to nextToken or nextImage will start. It is therefore not the same as the position returned by Token.getStartPosition of the current token (Tokenizer.currentToken).
It is the starting position of the token returned by the next call to Tokenizer.nextToken, if that token is not a whitespace or if whitespaces are returned (TokenizerProperties.F_RETURN_WHITESPACES).
The positions returned by this method and by Tokenizer.getRangeStart are absolute rather than relative to a text buffer. This gives the tokenizer full control over how and when to refill its text buffer.
Returns:
  the absolute offset in characters from the start of the data source of the Tokenizer where reading will be continued



getSeparatorHandler
public de.susebox.jtopas.spi.SeparatorHandler getSeparatorHandler()
Retrieving the current de.susebox.jtopas.spi.SeparatorHandler. The method may return null if there isn't any handler installed.
Returns:
  the currently active de.susebox.jtopas.spi.SeparatorHandler, or null if separators aren't recognized by the tokenizer
See Also:   Tokenizer.setSeparatorHandler



getSequenceHandler
public de.susebox.jtopas.spi.SequenceHandler getSequenceHandler()
Retrieving the current de.susebox.jtopas.spi.SequenceHandler. The method may return null if there isn't any handler installed.
A SequenceHandler deals with line and block comments, strings and special sequences.
Returns:
  the currently active de.susebox.jtopas.spi.SequenceHandler, or null if no handler is installed
See Also:   Tokenizer.setSequenceHandler



getSource
public TokenizerSource getSource()
Retrieving the TokenizerSource of this Tokenizer. The method may return null if there is no TokenizerSource associated with this Tokenizer.
Returns:
  the TokenizerSource associated with this Tokenizer
See Also:   Tokenizer.setSource



getText
public String getText(int start, int length) throws IndexOutOfBoundsException
Retrieve text from the currently available range. The range described by the start and length parameters must lie between Tokenizer.getRangeStart and Tokenizer.getRangeStart + Tokenizer.currentlyAvailable.
Example:
 int     startPos = tokenizer.getReadPosition();
 String  source;
 
 while (tokenizer.hasMoreToken()) {
   Token token = tokenizer.nextToken();
 
   switch (token.getType()) {
   case Token.LINE_COMMENT:
   case Token.BLOCK_COMMENT:
     // text between the previous comment (or the start) and this comment
     source   = tokenizer.getText(startPos, token.getStartPosition() - startPos);
     startPos = token.getStartPosition();
   }
 }
 

Parameters:
  start - position where the text begins
  length - length of the text
Returns:
  the text beginning at the given position with the given length
throws:
  IndexOutOfBoundsException - if the starting position or the length is out of the current text window



getTokenizerProperties
public TokenizerProperties getTokenizerProperties()
Retrieving the current tokenizer characteristics. The method may return null if Tokenizer.setTokenizerProperties has not been called so far.
Returns:
  the TokenizerProperties of this Tokenizer
See Also:   Tokenizer.setTokenizerProperties



getWhitespaceHandler
public de.susebox.jtopas.spi.WhitespaceHandler getWhitespaceHandler()
Retrieving the current de.susebox.jtopas.spi.WhitespaceHandler. The method may return null if whitespaces are not recognized.
Returns:
  the currently active whitespace handler, or null if the base implementation is working
See Also:   Tokenizer.setWhitespaceHandler



hasMoreToken
public boolean hasMoreToken()
Check if there are more tokens available. This method will return true until an end-of-file condition is encountered during a call to Tokenizer.nextToken or Tokenizer.nextImage.
That means that the EOF is returned one time; afterwards hasMoreToken will return false. Furthermore, it implies that the method will return true at least once, even if the input data stream is empty.
The method can be conveniently used in a while loop.
Returns:
  true if a call to Tokenizer.nextToken or Tokenizer.nextImage will succeed, false otherwise
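
The end-of-file contract described above (EOF is returned exactly once, and the loop body runs at least once even for empty input) can be illustrated with a toy word splitter. EofSketch is a hypothetical stand-in, not a JTopas class; it uses the string "<EOF>" as an assumed marker where a real tokenizer would return an EOF token.

```java
// Illustration only: mimics the hasMoreToken / EOF contract with a toy word splitter.
final class EofSketch {
    private final String[] words;
    private int next = 0;
    private boolean eofReturned = false;

    EofSketch(String input) {
        String trimmed = input.trim();
        words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** True until the EOF marker has been handed out once. */
    boolean hasMoreToken() {
        return !eofReturned;
    }

    /** Returns the next word, or the marker "<EOF>" once the input is exhausted. */
    String nextImage() {
        if (next < words.length) {
            return words[next++];
        }
        eofReturned = true;
        return "<EOF>";
    }

    /** Counts how often the loop body of a typical while loop is executed. */
    static int countCalls(String input) {
        EofSketch tokenizer = new EofSketch(input);
        int calls = 0;
        while (tokenizer.hasMoreToken()) {
            tokenizer.nextImage();
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(countCalls("a b"));  // prints 3: "a", "b" and the EOF marker
        System.out.println(countCalls(""));     // prints 1: only the EOF marker
    }
}
```

A consequence for callers is that the loop body should check for the EOF token (or tolerate it) rather than assume every iteration yields real content.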



nextImage
public String nextImage() throws TokenizerException
This method is a convenience method. It returns only the next token image without any information about its type or other associated information. This is an especially useful method if the parse flags for this Tokenizer have the flag TokenizerProperties.F_TOKEN_POS_ONLY set, since this method returns a valid string even in that case.
Returns:
  the token image of the next token
throws:
  TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)
See Also:   Tokenizer.nextToken
See Also:   Tokenizer.currentImage



nextToken
public Token nextToken() throws TokenizerException
Retrieving the next Token. The method works in this order:
  1. Check for an end-of-file condition. If there is such a condition then return it.
  2. Try to collect a sequence of whitespaces. If such a sequence can be found, return it if the flag F_RETURN_WHITESPACES is set, or skip these whitespaces.
  3. Check the next characters against all known patterns. A pattern is usually a regular expression that is used by java.util.regex.Pattern. But implementations of de.susebox.jtopas.spi.PatternHandler may use other pattern syntaxes. Note that patterns are not recognized within "normal" text (see below for a more precise description).
  4. Check the next characters against all known line and block comments. If a line or block comment starting sequence matches, return the comment if the flag F_RETURN_WHITESPACES is set, or skip the comment. If comments are returned they include their starting and ending sequences (newline in case of a line comment).
  5. Check the next characters against all known string starting sequences. If a string begin could be identified, return the string up to and including the closing sequence.
  6. Check the next characters against all known special sequences. In particular, find the longest possible match. If a special sequence could be identified then return it.
  7. Check for ordinary separators. If one could be found return it.
  8. Check the next characters against all known keywords. If a keyword could be identified then return it.
  9. Return the text portion until the next whitespace, comment, special sequence or separator. Note that patterns are not recognized within "normal" text. A pattern match therefore always has a whitespace, comment, special sequence, separator or another pattern match in front of it, or starts at position 0 of the data.
The method will return the EOF token as long as Tokenizer.hasMoreToken returns false. It will not return null in such conditions.
Returns:
  found Token including the EOF token
throws:
  TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)
See Also:   Tokenizer.nextImage
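
A heavily simplified sketch of the priority scheme above may help to see how the ordering resolves conflicts. It is not the JTopas algorithm: patterns, comments and strings (steps 3 to 5) are left out, whitespace is always skipped, and the tables of special sequences, separators and keywords as well as the class name PrioritySketch and the "TYPE:image" result strings are assumptions made up for this example.

```java
// Illustration only: a reduced version of the priority order (steps 1, 2, 6, 7, 8, 9).
final class PrioritySketch {
    // All three tables are assumptions made up for this illustration.
    static final String[] SEQUENCES = { "<", "<=", "<<=" };
    static final String   SEPARATORS = "=;()";
    static final String[] KEYWORDS = { "if", "while" };

    /** Longest special sequence starting at pos, or "" if none matches. */
    static String longestSequenceAt(String text, int pos) {
        String best = "";
        for (String s : SEQUENCES) {
            if (text.startsWith(s, pos) && s.length() > best.length()) {
                best = s;
            }
        }
        return best;
    }

    /** Classifies the token starting at or after pos; the result has the form "TYPE:image". */
    static String tokenAt(String text, int pos) {
        // Step 2: whitespace is skipped here, as if F_RETURN_WHITESPACES were not set.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        // Step 1: end-of-file.
        if (pos >= text.length()) {
            return "EOF:";
        }
        // Step 6: special sequences, longest match wins.
        String sequence = longestSequenceAt(text, pos);
        if (!sequence.isEmpty()) {
            return "SPECIAL_SEQUENCE:" + sequence;
        }
        // Step 7: ordinary separators.
        char c = text.charAt(pos);
        if (SEPARATORS.indexOf(c) >= 0) {
            return "SEPARATOR:" + c;
        }
        // Step 9: collect text up to the next whitespace, special sequence or separator.
        int end = pos;
        while (end < text.length()
                && !Character.isWhitespace(text.charAt(end))
                && SEPARATORS.indexOf(text.charAt(end)) < 0
                && longestSequenceAt(text, end).isEmpty()) {
            end++;
        }
        String image = text.substring(pos, end);
        // Step 8: a collected word that is a known keyword is reported as such.
        for (String keyword : KEYWORDS) {
            if (keyword.equals(image)) {
                return "KEYWORD:" + image;
            }
        }
        return "NORMAL:" + image;
    }

    public static void main(String[] args) {
        System.out.println(tokenAt("a<=b", 1));  // prints SPECIAL_SEQUENCE:<=
    }
}
```

Note how "<=" is reported as one special sequence even though "=" alone is a separator: special sequences are checked before separators, and the longest matching sequence is preferred over "<".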



readMore
public int readMore() throws TokenizerException
Try to read more data into the text buffer of the tokenizer. This can be useful when a method needs to look ahead of the available data or a skip operation should be performed.
The method returns the same value as an immediately following call to Tokenizer.currentlyAvailable would return.
Returns:
  the number of characters now available
throws:
  TokenizerException - generic exception (list) for all problems that may occur while reading (IOExceptions for instance)



setKeywordHandler
public void setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler handler)
Setting a new de.susebox.jtopas.spi.KeywordHandler or removing any previously installed one. If null is passed (installed handler removed), no keyword support is available.
Usually, the TokenizerProperties used by a Tokenizer implement the de.susebox.jtopas.spi.KeywordHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its KeywordHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
Parameters:
  handler - the (new) de.susebox.jtopas.spi.KeywordHandler to use or null to remove it
See Also:   Tokenizer.getKeywordHandler
See Also:   TokenizerProperties.addKeyword



setPatternHandler
public void setPatternHandler(de.susebox.jtopas.spi.PatternHandler handler)
Setting a new de.susebox.jtopas.spi.PatternHandler or removing any previously installed one. If null is passed, patterns are not supported by the tokenizer (any longer).
Usually, the TokenizerProperties used by a Tokenizer implement the de.susebox.jtopas.spi.PatternHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its PatternHandler. A different handler or a handler specific to a certain Tokenizer instance, can be set using this method.
Parameters:
  handler - the (new) de.susebox.jtopas.spi.PatternHandler to use or null to remove it
See Also:   Tokenizer.getPatternHandler
See Also:   TokenizerProperties.addPattern



setReadPositionAbsolute
public void setReadPositionAbsolute(int position) throws IndexOutOfBoundsException
This method sets the tokenizer's current read position to the given absolute read position. It realizes one type of rewind / forward operation. The given position must be inside the interval from Tokenizer.getRangeStart to Tokenizer.getRangeStart + Tokenizer.currentlyAvailable - 1.
The current read position is the end position of the current token. That means that the following assertion can be made:
 Token token1 = tokenizer.nextToken();
 tokenizer.setReadPositionAbsolute(tokenizer.getReadPosition() - token1.getLength());
 Token token2 = tokenizer.nextToken();
 assert(token1.equals(token2));
 

Since JTopas version 0.6.1, the operation clears the current token. Therefore, Tokenizer.currentImage and Tokenizer.currentToken will throw a TokenizerException if called after a setReadPositionAbsolute without a subsequent call to Tokenizer.nextToken or Tokenizer.nextImage.
Parameters:
  position - absolute position for the next parse operation
throws:
  IndexOutOfBoundsException - if the parameter position is not in the available text range (text window)
See Also:   Tokenizer.setReadPositionRelative



setReadPositionRelative
public void setReadPositionRelative(int offset) throws IndexOutOfBoundsException
This method moves the tokenizer's read position the given number of characters forward (positive value) or backward (negative value), starting from the current read position. It realizes one type of rewind / forward operation. The given offset must be greater than or equal to Tokenizer.getRangeStart - Tokenizer.getReadPosition and less than Tokenizer.currentlyAvailable - Tokenizer.getReadPosition.
Since JTopas version 0.6.1, the operation clears the current token. Therefore, Tokenizer.currentImage and Tokenizer.currentToken will throw a TokenizerException if called after a setReadPositionRelative without a subsequent call to Tokenizer.nextToken or Tokenizer.nextImage.
Parameters:
  offset - number of characters to move forward (positive offset) or backward (negative offset)
throws:
  IndexOutOfBoundsException - if the parameter offset would move the read position out of the available text range (text window)
See Also:   Tokenizer.setReadPositionAbsolute
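
The relation between the two repositioning methods can be sketched as follows. ReadPositionSketch is a hypothetical class, not part of JTopas. It assumes that a relative move is simply translated into an absolute one and then checked against the interval documented for setReadPositionAbsolute (getRangeStart to getRangeStart + currentlyAvailable - 1); whether a real implementation checks the bounds exactly this way is an assumption.

```java
// Illustration only: relative moves expressed through the absolute bounds check.
final class ReadPositionSketch {
    private final int rangeStart;            // absolute start of the text window
    private final int available;             // number of characters in the window
    private int readPosition;                // absolute read position
    private boolean hasCurrentToken = true;  // pretend a token has just been read

    ReadPositionSketch(int rangeStart, int available, int readPosition) {
        this.rangeStart = rangeStart;
        this.available = available;
        this.readPosition = readPosition;
    }

    void setReadPositionAbsolute(int position) {
        if (position < rangeStart || position > rangeStart + available - 1) {
            throw new IndexOutOfBoundsException("position " + position + " is outside the text window");
        }
        readPosition = position;
        hasCurrentToken = false;  // repositioning clears the current token
    }

    void setReadPositionRelative(int offset) {
        // Assumption: a relative move is an absolute move to readPosition + offset.
        setReadPositionAbsolute(readPosition + offset);
    }

    int getReadPosition()     { return readPosition; }
    boolean hasCurrentToken() { return hasCurrentToken; }

    public static void main(String[] args) {
        ReadPositionSketch tokenizer = new ReadPositionSketch(100, 50, 120);
        tokenizer.setReadPositionRelative(-20);           // rewind to the window start
        System.out.println(tokenizer.getReadPosition());  // prints 100
    }
}
```

After either call the sketch reports no current token, mirroring the behavior described for JTopas 0.6.1 and later, where currentToken and currentImage throw until the next token has been read.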



setSeparatorHandler
public void setSeparatorHandler(de.susebox.jtopas.spi.SeparatorHandler handler)
Setting a new de.susebox.jtopas.spi.SeparatorHandler or removing any previously installed SeparatorHandler. If null is passed, the tokenizer doesn't recognize separators.
Usually, the TokenizerProperties used by a Tokenizer implement the de.susebox.jtopas.spi.SeparatorHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its SeparatorHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
Parameters:
  handler - the (new) separator handler to use or null to remove it
See Also:   Tokenizer.getSeparatorHandler
See Also:   TokenizerProperties.setSeparators



setSequenceHandler
public void setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler handler)
Setting a new de.susebox.jtopas.spi.SequenceHandler or removing any previously installed one. If null is passed, the tokenizer will not recognize line and block comments, strings and special sequences.
Usually, the TokenizerProperties used by a Tokenizer implement the de.susebox.jtopas.spi.SequenceHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its SequenceHandler. A different handler, or a handler specific to a certain Tokenizer instance, can be set using this method.
Parameters:
  handler - the (new) de.susebox.jtopas.spi.SequenceHandler to use or null to remove it
See Also:   Tokenizer.getSequenceHandler
See Also:   TokenizerProperties.addSpecialSequence
See Also:   TokenizerProperties.addLineComment
See Also:   TokenizerProperties.addBlockComment
See Also:   TokenizerProperties.addString



setSource
public void setSource(TokenizerSource source)
Setting the source of data. This method is usually called during setup of the Tokenizer but may also be invoked while the tokenizing is in progress. It will reset the tokenizer's input buffer, line and column counters etc.
It is allowed to pass null. In that case, calls to Tokenizer.hasMoreToken will return false, while calling Tokenizer.nextToken will return an EOF token.
Parameters:
  source - a TokenizerSource to read data from
See Also:   Tokenizer.getSource



setTokenizerProperties
public void setTokenizerProperties(TokenizerProperties props) throws NullPointerException, IllegalArgumentException
Setting the tokenizer characteristics. This operation is usually done before the parse process. A common place is a constructor of a Tokenizer implementation. If the tokenizer characteristics change during the parse process they take effect with the next call of Tokenizer.nextToken or Tokenizer.nextImage. Usually, a Tokenizer implementation will also implement the TokenizerPropertyListener interface to be notified about property changes.
Generally, the Tokenizer implementation should also implement the de.susebox.jtopas.spi.DataProvider interface or provide an inner class that implements the DataProvider interface, while the TokenizerProperties implementation should in turn implement the handler interfaces (KeywordHandler, PatternHandler, SeparatorHandler, SequenceHandler and WhitespaceHandler). These handler interfaces are collected in the de.susebox.jtopas.spi.DataMapper interface.
Although the implementation of the mentioned interfaces is recommended, it is not mandatory. The exception is de.susebox.jtopas.spi.PatternHandler, which must be implemented by the TokenizerProperties implementation, since it is not possible for a Tokenizer to interpret a regular expression pattern only with the information provided through the TokenizerProperties interface.
If a Tokenizer implementation chooses to use an exclusively tailored TokenizerProperties implementation, it should throw a java.lang.IllegalArgumentException if it is not provided with an instance of that TokenizerProperties implementation.
If null is passed to the method it throws java.lang.NullPointerException.
Parameters:
  props - the TokenizerProperties for this tokenizer
throws:
  NullPointerException - if null is passed to the call
throws:
  IllegalArgumentException - if the TokenizerProperties implementation of the parameter cannot be used with the implementation of this Tokenizer
See Also:   Tokenizer.getTokenizerProperties



setWhitespaceHandler
public void setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler handler)
Setting a new de.susebox.jtopas.spi.WhitespaceHandler or removing any previously installed one. If null is passed, the tokenizer will not recognize whitespaces.
Usually, the TokenizerProperties used by a Tokenizer implement the de.susebox.jtopas.spi.WhitespaceHandler interface. If so, the Tokenizer object sets the TokenizerProperties instance as its WhitespaceHandler. A different handler or a handler specific to a certain Tokenizer instance, can be set using this method.
Parameters:
  handler - the (new) whitespace handler to use or null to switch off whitespace handling
See Also:   Tokenizer.getWhitespaceHandler
See Also:   TokenizerProperties.setWhitespaces


