Java Doc for AbstractTokenizer.java (JTopas, package de.susebox.jtopas)



java.lang.Object
   de.susebox.jtopas.AbstractTokenizer

All known Subclasses:   de.susebox.jtopas.StandardTokenizer
AbstractTokenizer
abstract public class AbstractTokenizer implements Tokenizer, TokenizerPropertyListener

Base class for Tokenizer implementations. AbstractTokenizer separates the data analysis from the actual data provision. Although the class maintains read and write positions, the physical representation of the logical character buffer behind these positions concerns only the subclasses.


See Also:   Tokenizer
See Also:   TokenizerProperties
author:
   Heiko Blau


Field Summary
final protected static  int VALID_FLAGS_MASK
     Mask of flags that can be set separately for an AbstractTokenizer.
protected  AbstractTokenizer _baseTokenizer
     For embedded tokenizers: this is the base tokenizer that reads the data.
protected  int _columnNumber
     If line counting is enabled, this contains the current column number starting with 0.
protected  int _currentReadPos
     Data index where AbstractTokenizer.nextToken will start parsing.
protected  int _currentWritePos
     Data index where AbstractTokenizer.readMoreDataFromBase will fill in new data.
protected  StandardTokenizerProperties _defaultProperties
     TokenizerProperties that are used if no others have been specified by calling AbstractTokenizer.setTokenizerProperties.
protected  int _flags
     Overall tokenizer flags.
protected  int _lineNumber
     If line counting is enabled, this contains the current line number starting with 0.
protected  AbstractTokenizer _nextTokenizer
     For embedded tokenizers: this is the list of the succeeding tokenizers.
protected  AbstractTokenizer _prevTokenizer
     For embedded tokenizers: this is the list of the previous tokenizers.
protected  Token[] _scannedToken
     List of currently known tokens.

Constructor Summary
public  AbstractTokenizer()
     Default constructor that sets the tokenizer control flags as appropriate for C/C++ and Java.
public  AbstractTokenizer(TokenizerProperties properties)
     Constructing an AbstractTokenizer with a backing TokenizerProperties instance.

Method Summary
public  void addTokenizer(AbstractTokenizer tokenizer)
     Adding an embedded tokenizer.
protected  void adjustLineAndColumn(int type, int length)
     The method recomputes the line and column position of the tokenizer, if the flag TokenizerProperties.F_COUNT_LINES is set.
public  void changeParseFlags(int flags, int mask)
     Setting the control flags of the Tokenizer.
public  void close()
     Closing this tokenizer frees resources and deregisters from the associated TokenizerProperties object.
protected  int comparePrefix(int offset, String prefix, boolean noCase)
     This method compares the characters at the given offset (from the current read position) with the given prefix.
protected  int completeBlockComment(TokenizerProperty prop)
     Completing a block comment.
protected  TokenizerProperty completeBoundedToken(Token token)
     The number of characters until the next comment, whitespace, string, special sequence or separator is determined.
protected  int completeLineComment(TokenizerProperty prop)
     Completing a line comment.
protected  int completeString(TokenizerProperty prop)
     Completing a string.
protected  int completeWhitespace()
     After having identified a whitespace, this method continues to read data until it detects a non-whitespace.
public  String currentImage()
     Convenience method to retrieve only the token image of the Token that would be returned by AbstractTokenizer.currentToken.
public  Token currentToken()
     Retrieve the Token that was found by the last call to AbstractTokenizer.nextToken.
public  int currentlyAvailable()
     Retrieving the number of the currently available characters.
protected  DataProvider getBaseDataProvider(int startPos, int length)
     Returns the de.susebox.jtopas.spi.DataProvider of the base tokenizer.
protected  AbstractTokenizer getBaseTokenizer()
     Embedded tokenizers have a base tokenizer with which they share the input stream.
public  char getChar(int pos)
     Returns the character at the given position.
public  int getColumnNumber()
     If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the current column position starting with 0 in the input stream.
public  int getCurrentColumn()
     Retrieve the current column.
public  int getCurrentLine()
     Query the current row.
abstract protected  DataProvider getDataProvider(int startPos, int length)
     Subclasses have to provide de.susebox.jtopas.spi.DataProvider instances for various token type handlers.
public  de.susebox.jtopas.spi.KeywordHandler getKeywordHandler()
     Retrieving the current de.susebox.jtopas.spi.KeywordHandler.
public  int getLineNumber()
     If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the line number starting with 0 in the input stream.
public  int getParseFlags()
     Retrieving the parser control flags.
public  de.susebox.jtopas.spi.PatternHandler getPatternHandler()
     Retrieving the current de.susebox.jtopas.spi.PatternHandler.
public  int getReadPosition()
     Getting the current read offset.
public  de.susebox.jtopas.spi.SeparatorHandler getSeparatorHandler()
     Retrieving the current de.susebox.jtopas.spi.SeparatorHandler.
public  de.susebox.jtopas.spi.SequenceHandler getSequenceHandler()
     Retrieving the current SequenceHandler.
public  TokenizerSource getSource()
     Retrieving the TokenizerSource of this Tokenizer.
public  String getText(int start, int len)
     Retrieve text from the currently available range.
public  TokenizerProperties getTokenizerProperties()
     Retrieving the current tokenizer characteristics.
public  de.susebox.jtopas.spi.WhitespaceHandler getWhitespaceHandler()
     Retrieving the current de.susebox.jtopas.spi.WhitespaceHandler.
public  boolean hasMoreToken()
     Checking if there are more tokens available.
protected  boolean isEOF(int offset)
     Checks the EOF condition at the given offset.
protected  boolean isFlagSet(int flag)
     Checking a given flag.
protected  boolean isFlagSet(TokenizerProperty prop, int flag)
     Checking if a given flag is set for the given TokenizerProperty, for this Tokenizer or for the used TokenizerProperties.
protected  TokenizerProperty isKeyword(int startingAtPos, int length)
     This method checks if the character sequence starting at a given position with a given length is a keyword.
protected  boolean isPattern(int offset, boolean freePatternOnly)
     Testing for pattern matching.
protected  boolean isSeparator(int offset)
     This method checks at the given offset if it contains a separator.
protected  boolean isSpecialSequence(int offset)
     This method checks at the given offset if it contains a special sequence.
protected  boolean isWhitespace(char testChar)
     This method checks if the character is a whitespace.
protected  boolean isWhitespace(int offset)
     This method checks at the given offset if it is a whitespace.
public  String nextImage()
     This method is a convenience method.
public  Token nextToken()
     Retrieving the next Token.
public  void propertyChanged(TokenizerPropertyEvent event)
     Event handler method.
public  int readMore()
     Try to read more data into the text buffer of the tokenizer.
abstract protected  int readMoreData()
     This method is called when the tokenizer runs out of data.
protected  int readMoreDataFromBase()
     This method organizes the input buffer.
protected  int readWhitespaces(int startingAtPos, int maxChars)
     This method detects the number of whitespace characters starting at the given position.
public  void setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler handler)
     Setting a new de.susebox.jtopas.spi.KeywordHandler or removing any previously installed one.
public  void setPatternHandler(de.susebox.jtopas.spi.PatternHandler handler)
     Setting a new de.susebox.jtopas.spi.PatternHandler or removing any previously installed one.
public  void setReadPositionAbsolute(int position)
     This method sets the tokenizer's current read position to the given absolute read position.
public  void setReadPositionRelative(int offset)
     This method moves the tokenizer's read position the given number of characters forward (positive value) or backward (negative value), starting from the current read position.
public  void setSeparatorHandler(de.susebox.jtopas.spi.SeparatorHandler handler)
     Setting a new de.susebox.jtopas.spi.SeparatorHandler or removing any previously installed SeparatorHandler.
public  void setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler handler)
     Setting a new de.susebox.jtopas.spi.SequenceHandler or removing any previously installed one.
public  void setSource(TokenizerSource source)
     Setting the source of data.
public  void setSource(Reader reader)
     Convenience method to avoid the construction of a TokenizerSource from the most important data source java.io.Reader.
public  void setTokenizerProperties(TokenizerProperties props)
     Setting the tokenizer characteristics.
public  void setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler handler)
     Setting a new de.susebox.jtopas.spi.WhitespaceHandler or removing any previously installed one.
protected  String[] splitBlockComment(TokenizerProperty prop, String image)
     Splits a given block comment into lines.
protected  String[] splitIntoLines(String image)
     Splits a given String into lines.
protected  String[] splitString(TokenizerProperty prop, String image)
     Splits a given string into lines and removes string escapes.
public  void switchTo(AbstractTokenizer tokenizer)
     Changing from one tokenizer to another.
protected  void synchronizeAll()
     When the method AbstractTokenizer.readMoreData changes the contents of the input buffer or the input buffer itself, all embedded tokenizers must be synchronized.

Field Detail
VALID_FLAGS_MASK
final protected static int VALID_FLAGS_MASK
Mask of flags that can be set separately for an AbstractTokenizer.



_baseTokenizer
protected AbstractTokenizer _baseTokenizer
For embedded tokenizers: this is the base tokenizer that reads the data.



_columnNumber
protected int _columnNumber
If line counting is enabled, this contains the current column number starting with 0.



_currentReadPos
protected int _currentReadPos
Data index where AbstractTokenizer.nextToken will start parsing.



_currentWritePos
protected int _currentWritePos
Data index where AbstractTokenizer.readMoreDataFromBase will fill in new data.



_defaultProperties
protected StandardTokenizerProperties _defaultProperties
TokenizerProperties that are used if no others have been specified by calling AbstractTokenizer.setTokenizerProperties.



_flags
protected int _flags
Overall tokenizer flags.



_lineNumber
protected int _lineNumber
If line counting is enabled, this contains the current line number starting with 0.



_nextTokenizer
protected AbstractTokenizer _nextTokenizer
For embedded tokenizers: this is the list of the succeeding tokenizers.



_prevTokenizer
protected AbstractTokenizer _prevTokenizer
For embedded tokenizers: this is the list of the previous tokenizers.



_scannedToken
protected Token[] _scannedToken
List of currently known tokens. The first element is the current token returned by the last call to AbstractTokenizer.nextToken. The following elements are look-ahead tokens that have already been identified when extracting the current token.




Constructor Detail
AbstractTokenizer
public AbstractTokenizer()
Default constructor that sets the tokenizer control flags as appropriate for C/C++ and Java. Found token images are copied. Neither line nor column information is provided. Nested comments are not allowed.
The tokenizer will use the TokenizerProperties.DEFAULT_WHITESPACES and TokenizerProperties.DEFAULT_SEPARATORS for whitespace and separator handling.



AbstractTokenizer
public AbstractTokenizer(TokenizerProperties properties)
Constructing an AbstractTokenizer with a backing TokenizerProperties instance.
Parameters:
  properties - a TokenizerProperties object containing the settings for the tokenizing process




Method Detail
addTokenizer
public void addTokenizer(AbstractTokenizer tokenizer) throws TokenizerException
Adding an embedded tokenizer. Embedded tokenizers work on the same input buffer as their base tokenizer. A situation where embedded tokenizers could be applied is an HTML stream with cascading style sheet (CSS) and JavaScript parts.
There are no internal means of switching from one tokenizer to another. This should be done by the caller using the method AbstractTokenizer.switchTo.
The TokenizerProperties.F_KEEP_DATA and TokenizerProperties.F_COUNT_LINES flags of the base tokenizer also take effect in the embedded tokenizers.
Since it might be possible that the given tokenizer is a derivation of the AbstractTokenizer class, this method is synchronized on tokenizer.
Parameters:
  tokenizer - an embedded tokenizer
throws:
  TokenizerException - if something goes wrong (not likely :-)



adjustLineAndColumn
protected void adjustLineAndColumn(int type, int length)
The method recomputes the line and column position of the tokenizer, if the flag TokenizerProperties.F_COUNT_LINES is set. It gets the token type of the Token that has been retrieved by the calling AbstractTokenizer.nextToken. Using the tokenizer control flags and certain other information, it tries to find end-of-line sequences as fast as possible. For example, a line comment should always contain an end-of-line sequence, so we can simply increase the line count and set the column count to 0.
Parameters:
  type - the type of the current token
Parameters:
  length - the length of the current token
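
The bookkeeping described above can be illustrated with a minimal, self-contained sketch. It is not the JTopas implementation: the class LineColumnSketch and its advance method are hypothetical names, and the sketch simply scans the whole token image instead of using the token-type shortcuts the method description mentions. Both counters start at 0, as in the tokenizer.

```java
// Hypothetical helper, not part of JTopas: keeps a 0-based line/column
// position up to date while token images are consumed.
final class LineColumnSketch {
    int line;    // 0-based line number
    int column;  // 0-based column number

    // Advance the position over the given token image.
    // "\r\n", "\r" and "\n" each count as one end-of-line sequence.
    void advance(String image) {
        for (int i = 0; i < image.length(); i++) {
            char c = image.charAt(i);
            if (c == '\r' || c == '\n') {
                // Treat "\r\n" as a single line break.
                if (c == '\r' && i + 1 < image.length() && image.charAt(i + 1) == '\n') {
                    i++;
                }
                line++;
                column = 0;
            } else {
                column++;
            }
        }
    }

    public static void main(String[] args) {
        LineColumnSketch pos = new LineColumnSketch();
        pos.advance("int x;");         // no line break: only the column moves
        pos.advance(" // comment\n");  // line comment ending with a line break
        pos.advance("y");
        System.out.println(pos.line + ":" + pos.column); // prints 1:1
    }
}
```

A real implementation could skip the scan for token types that cannot contain line breaks, which is the optimization the description hints at.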



changeParseFlags
public void changeParseFlags(int flags, int mask) throws TokenizerException
Setting the control flags of the Tokenizer. See the method description in Tokenizer .
Parameters:
  flags - the parser control flags
Parameters:
  mask - the mask for the flags to set or unset
throws:
  TokenizerException - if one or more of the flags given cannot be honored
See Also:   AbstractTokenizer.getParseFlags
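
A flags/mask pair is commonly interpreted as "the mask selects which bits to touch, the flags give their new values". The sketch below shows that convention with plain integers. It is an assumption about the intended semantics, not a copy of the JTopas code, and the FLAG_A, FLAG_B and FLAG_C constants are made-up stand-ins for the F_... constants in TokenizerProperties.

```java
// Hypothetical sketch of a flags/mask update, not taken from JTopas.
final class ParseFlagsSketch {
    // Illustrative values only; the real F_... constants are defined in TokenizerProperties.
    static final int FLAG_A = 0x01;
    static final int FLAG_B = 0x02;
    static final int FLAG_C = 0x04;

    // Bits selected by mask take their value from flags;
    // all other bits keep their current value.
    static int change(int current, int flags, int mask) {
        return (current & ~mask) | (flags & mask);
    }

    public static void main(String[] args) {
        int current = FLAG_A | FLAG_C;
        // Set FLAG_B, clear FLAG_C, leave FLAG_A untouched.
        int updated = change(current, FLAG_B, FLAG_B | FLAG_C);
        System.out.println(Integer.toBinaryString(updated)); // prints 11
    }
}
```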



close
public void close()
Closing this tokenizer frees resources and deregisters from the associated TokenizerProperties object.



comparePrefix
protected int comparePrefix(int offset, String prefix, boolean noCase) throws TokenizerException
This method compares the characters at the given offset (from the current read position) with the given prefix.
Parameters:
  offset - start comparing at this offset from the current read position
Parameters:
  prefix - compare read data with this prefix
Parameters:
  noCase - case- or not case-sensitive comparison
Returns:
  0 if the given prefix matches the input stream, -1 on EOF and 1 if not matching
throws:
  TokenizerException - failure while reading data from the input stream
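
The three-valued result (0 for a match, -1 on EOF, 1 for a mismatch) can be sketched over an in-memory character sequence. This is a hypothetical stand-in, not the JTopas method: the real one reads from the tokenizer buffer and may load more data, while this version takes the data and read position as extra parameters, assumes a non-negative offset, and assumes that a true noCase means a case-insensitive comparison.

```java
// Hypothetical sketch, not the JTopas implementation.
final class ComparePrefixSketch {
    // Returns 0 if prefix matches the data at readPos + offset,
    // -1 if the data ends before the prefix is complete (EOF),
    // and 1 if a character differs.
    static int comparePrefix(CharSequence data, int readPos, int offset,
                             String prefix, boolean noCase) {
        int start = readPos + offset;
        for (int i = 0; i < prefix.length(); i++) {
            if (start + i >= data.length()) {
                return -1; // ran out of data
            }
            char a = data.charAt(start + i);
            char b = prefix.charAt(i);
            if (noCase) {
                a = Character.toLowerCase(a);
                b = Character.toLowerCase(b);
            }
            if (a != b) {
                return 1; // mismatch
            }
        }
        return 0; // full prefix matched
    }

    public static void main(String[] args) {
        System.out.println(comparePrefix("select * from t", 0, 0, "SELECT", true)); // prints 0
        System.out.println(comparePrefix("sel", 0, 0, "select", false));            // prints -1
        System.out.println(comparePrefix("insert", 0, 0, "select", false));         // prints 1
    }
}
```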



completeBlockComment
protected int completeBlockComment(TokenizerProperty prop) throws TokenizerException
Completing a block comment. After a block comment sequence has been found, all characters up to and including the end sequence of the block comment belong to the block comment. Note that on reaching end-of-file a block comment does not necessarily end with an end-of-block-comment sequence.
Parameters:
  prop - the property describing the block comment to complete
Returns:
  length of the block comment
throws:
  TokenizerException - failure while reading data from the input stream
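
As an illustration of the rule above, including the end-of-file case, here is a small sketch that works on a complete String. It is an assumption-laden simplification rather than the JTopas code: the data, the start index and both delimiter strings are passed in directly instead of coming from a TokenizerProperty, and nested comments are not handled.

```java
// Hypothetical sketch, not the JTopas implementation.
final class BlockCommentSketch {
    // Returns the length of the block comment that begins at start,
    // counting the start sequence and, if present, the end sequence.
    // If the end sequence is missing, the comment runs to the end of the data.
    static int completeBlockComment(String data, int start, String startSeq, String endSeq) {
        // Search behind the start sequence so that "/*/" is not seen as complete.
        int end = data.indexOf(endSeq, start + startSeq.length());
        if (end < 0) {
            return data.length() - start; // EOF reached inside the comment
        }
        return end + endSeq.length() - start;
    }

    public static void main(String[] args) {
        System.out.println(completeBlockComment("a /* b */ c", 2, "/*", "*/")); // prints 7
        System.out.println(completeBlockComment("x /* open", 2, "/*", "*/"));   // prints 7
    }
}
```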



completeBoundedToken
protected TokenizerProperty completeBoundedToken(Token token) throws TokenizerException
The number of characters until the next comment, whitespace, string, special sequence or separator is determined. The character sequence is then checked for keyword or pattern matching.
Parameters:
  token - buffer to receive information about the keyword or normal token
Returns:
  null or a TokenizerProperty if a keyword or pattern is found
throws:
  TokenizerException - failure while reading data from the input stream



completeLineComment
protected int completeLineComment(TokenizerProperty prop) throws TokenizerException
Completing a line comment. After a line comment sequence has been found, all characters up to and including the end-of-line combination belong to the line comment. Note that on reaching end-of-file a line comment does not necessarily end with an end-of-line sequence (linefeed for example).
Parameters:
  prop - the property describing the line comment to complete
Returns:
  length of the line comment
throws:
  TokenizerException - failure while reading data from the input stream



completeString
protected int completeString(TokenizerProperty prop) throws TokenizerException
Completing a string. After a string start sequence has been found, all characters up to and including the end-of-string sequence belong to the string. Note that on reaching end-of-file a string does not necessarily end with an end-of-string sequence.
Parameters:
  prop - the property describing the string to complete
Returns:
  length of the string
throws:
  TokenizerException - failure while reading data from the input stream



completeWhitespace
protected int completeWhitespace() throws TokenizerException
After having identified a whitespace, this method continues to read data until it detects a non-whitespace.
Returns:
  number of consecutive whitespaces
throws:
  TokenizerException - failure while reading data from the input stream
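
A minimal sketch of this kind of whitespace run counting follows. It is not the JTopas code: the data and start index are explicit parameters, and the whitespace set is passed as a plain String. The " \t\r\n" set used in main is only an assumed example and is not claimed to equal TokenizerProperties.DEFAULT_WHITESPACES.

```java
// Hypothetical sketch, not the JTopas implementation.
final class WhitespaceSketch {
    // Counts consecutive whitespace characters beginning at start.
    // Any character contained in the whitespaces string counts as whitespace.
    static int completeWhitespace(CharSequence data, int start, String whitespaces) {
        int pos = start;
        while (pos < data.length() && whitespaces.indexOf(data.charAt(pos)) >= 0) {
            pos++;
        }
        return pos - start;
    }

    public static void main(String[] args) {
        String whitespaces = " \t\r\n"; // assumed whitespace set for this example
        System.out.println(completeWhitespace("a \t\nb", 1, whitespaces)); // prints 3
    }
}
```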



currentImage
public String currentImage() throws TokenizerException
Convenience method to retrieve only the token image of the Token that would be returned by AbstractTokenizer.currentToken. See the method description in Tokenizer.
Returns:
  the token image of the current token
See Also:   AbstractTokenizer.currentToken



currentToken
public Token currentToken() throws TokenizerException
Retrieve the Token that was found by the last call to AbstractTokenizer.nextToken. See the method description in Tokenizer.
Returns:
  the Token retrieved by the last call to AbstractTokenizer.nextToken
throws:
  TokenizerException - if the tokenizer has no current token



currentlyAvailable
public int currentlyAvailable()
Retrieving the number of the currently available characters. See the method description in Tokenizer.
Returns:
  number of currently available characters



getBaseDataProvider
protected DataProvider getBaseDataProvider(int startPos, int length)
Returns the de.susebox.jtopas.spi.DataProvider of the base tokenizer. This is this tokenizer if it is not an embedded one.
Parameters:
  startPos - position in the input data
Parameters:
  length - number of characters
Returns:
  the DataProvider for the given data range



getBaseTokenizer
protected AbstractTokenizer getBaseTokenizer()
Embedded tokenizers have a base tokenizer with which they share the input stream.
Returns:
  the base tokenizer (the one owning the input stream and text buffer)



getChar
public char getChar(int pos) throws IndexOutOfBoundsException
Returns the character at the given position. The method does not attempt to read more data.
Parameters:
  pos - get character on this position in the data stream
Returns:
  the character at the given position
throws:
  IndexOutOfBoundsException - if the parameter pos is not in the available text range (text window)



getColumnNumber
public int getColumnNumber()
If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the current column position starting with 0 in the input stream. See the method description in Tokenizer.
Returns:
  the current column position
See Also:   AbstractTokenizer.getLineNumber



getCurrentColumn
public int getCurrentColumn()
Retrieve the current column. The method can only be used if the flag F_COUNT_LINES has been set. Without this flag being set, the return value is undefined. Note that column counting starts with 0, while editors often use 1 for the first column in one row.
Returns:
  current column number (starting with 0)



getCurrentLine
public int getCurrentLine()
Query the current row. The method can only be used if the flag TokenizerProperties.F_COUNT_LINES has been set. Without this flag being set, the return value is undefined.
Note that row counting starts with 0, while editors often use 1 for the first row.
Returns:
  current row (starting with 0) or -1 if the flag TokenizerProperties.F_COUNT_LINES is not set



getDataProvider
abstract protected DataProvider getDataProvider(int startPos, int length)
Subclasses have to provide de.susebox.jtopas.spi.DataProvider instances for various token type handlers. The given start position is the absolute number of characters from the beginning of the data source.
Parameters:
  startPos - position in the input data
Parameters:
  length - number of characters
Returns:
  the DataProvider for the given data range



getKeywordHandler
public de.susebox.jtopas.spi.KeywordHandler getKeywordHandler()
Retrieving the current de.susebox.jtopas.spi.KeywordHandler. See the method description in Tokenizer.
Returns:
  the currently active keyword handler or null, if keyword support is switched off



getLineNumber
public int getLineNumber()
If the flag TokenizerProperties.F_COUNT_LINES is set, this method will return the line number starting with 0 in the input stream. See the method description in Tokenizer.
Returns:
  the current line number starting with 0 or -1 if no line numbers are supplied
See Also:   AbstractTokenizer.getColumnNumber



getParseFlags
public int getParseFlags()
Retrieving the parser control flags. See the method description in Tokenizer.
Returns:
  the current parser control flags
See Also:   AbstractTokenizer.changeParseFlags



getPatternHandler
public de.susebox.jtopas.spi.PatternHandler getPatternHandler()
Retrieving the current de.susebox.jtopas.spi.PatternHandler. See the method description in Tokenizer.
Returns:
  the currently active de.susebox.jtopas.spi.PatternHandler or null, if patterns are not recognized by the tokenizer
See Also:   AbstractTokenizer.setPatternHandler



getReadPosition
public int getReadPosition()
Getting the current read offset. See the method description in Tokenizer.
Returns:
  the absolute offset in characters from the start of the data source of the Tokenizer where reading will be continued
See Also:   AbstractTokenizer.setReadPositionAbsolute
See Also:   AbstractTokenizer.setReadPositionRelative



getSeparatorHandler
public de.susebox.jtopas.spi.SeparatorHandler getSeparatorHandler()
Retrieving the current de.susebox.jtopas.spi.SeparatorHandler. See the method description in Tokenizer.
Returns:
  the currently active SeparatorHandler or null, if separators aren't recognized by the tokenizer
See Also:   AbstractTokenizer.setSequenceHandler



getSequenceHandler
public de.susebox.jtopas.spi.SequenceHandler getSequenceHandler()
Retrieving the current SequenceHandler. See the method description in Tokenizer.
Returns:
  the currently active SequenceHandler or null, if the base implementation is working



getSource
public TokenizerSource getSource()
Retrieving the TokenizerSource of this Tokenizer. The method may return null if there is no TokenizerSource associated with it.
Returns:
  the TokenizerSource associated with this Tokenizer
See Also:   AbstractTokenizer.setSource



getText
public String getText(int start, int len) throws IndexOutOfBoundsException
Retrieve text from the currently available range. See the method description in Tokenizer.
Parameters:
  start - position where the text begins
Parameters:
  len - length of the text
Returns:
  the text beginning at the given position with the given length
throws:
  IndexOutOfBoundsException - if the starting position or the length is out of the current text window



getTokenizerProperties
public TokenizerProperties getTokenizerProperties()
Retrieving the current tokenizer characteristics. See the method description in Tokenizer.
Returns:
  the TokenizerProperties of this Tokenizer
See Also:   AbstractTokenizer.setTokenizerProperties



getWhitespaceHandler
public de.susebox.jtopas.spi.WhitespaceHandler getWhitespaceHandler()
Retrieving the current de.susebox.jtopas.spi.WhitespaceHandler. See the method description in Tokenizer.
Returns:
  the currently active whitespace handler or null, if the base implementation is working



hasMoreToken
public boolean hasMoreToken()
Checking if there are more tokens available. See the method description in Tokenizer.
Returns:
  true if a call to AbstractTokenizer.nextToken or AbstractTokenizer.nextImage will succeed, false otherwise



isEOF
protected boolean isEOF(int offset) throws TokenizerException
Checks the EOF condition at the given offset.
Parameters:
  offset - check at this position relative to the current read position
Returns:
  true if EOF has been reached, false otherwise
throws:
  TokenizerException - failure while reading data from the input stream



isFlagSet
protected boolean isFlagSet(int flag)
Checking a given flag. The method considers both the globally set flags in the associated TokenizerProperties instance and the flags set locally by AbstractTokenizer.changeParseFlags.
Parameters:
  flag - one of the F_... flags defined in TokenizerProperties



isFlagSet
protected boolean isFlagSet(TokenizerProperty prop, int flag)
Checking if a given flag is set for the given TokenizerProperty, for this Tokenizer or for the used TokenizerProperties. The method considers both the globally set flags in the associated TokenizerProperties instance and the flags set locally by AbstractTokenizer.changeParseFlags.
Parameters:
  prop - check the flag for this property
Parameters:
  flag - one of the Flags constants



isKeyword
protected TokenizerProperty isKeyword(int startingAtPos, int length) throws TokenizerException
This method checks if the character sequence starting at a given position with a given length is a keyword. If so, it returns the keyword description as TokenizerProperty object.
Parameters:
  startingAtPos - check at this position
Parameters:
  length - the candidate has this number of characters
Returns:
  TokenizerProperty describing the keyword or null
throws:
  TokenizerException - routed exception from the active de.susebox.jtopas.spi.KeywordHandler



isPattern
protected boolean isPattern(int offset, boolean freePatternOnly) throws TokenizerException
Testing for pattern matching.
Parameters:
  offset - check at this position relative to the current read position
Parameters:
  freePatternOnly - if true, consider only patterns that can occur anywhere in the data
Returns:
  true if a pattern match was found at the given offset, false otherwise
throws:
  TokenizerException - failure while reading data from the input stream



isSeparator
protected boolean isSeparator(int offset) throws TokenizerException
This method checks at the given offset if it contains a separator.
Parameters:
  offset - check at this position relative to the current read position
Returns:
  true if a separator was found at the given offset, false otherwise
throws:
  TokenizerException - failure while reading data from the input stream



isSpecialSequence
protected boolean isSpecialSequence(int offset) throws TokenizerException
This method checks at the given offset if it contains a special sequence. Unlike the method AbstractTokenizer.test4SpecialSequence it does nothing more.
Parameters:
  offset - check at this position relative to the current read position
Returns:
  true if a special sequence was found at the given offset, false otherwise
throws:
  TokenizerException - failure while reading data from the input stream



isWhitespace
protected boolean isWhitespace(char testChar)
This method checks if the character is a whitespace. Implement your own code for situations where this default implementation is not fast enough or otherwise unsuitable.
Parameters:
  testChar - check this character
Returns:
  true if the given character is a whitespace, false otherwise



isWhitespace
protected boolean isWhitespace(int offset) throws TokenizerException
This method checks at the given offset if it is a whitespace.
Parameters:
  offset - check at this position relative to the current read position
Returns:
  true if a whitespace sequence was found at the given offset, false otherwise
throws:
  TokenizerException - failure while reading data from the input stream



nextImage
public String nextImage() throws TokenizerException
This method is a convenience method. It returns only the next token image without any information about its type or associated information. See the method description in Tokenizer.
Returns:
  the token image of the next token
throws:
  TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)
See Also:   AbstractTokenizer.currentImage



nextToken
public Token nextToken() throws TokenizerException
Retrieving the next Token. See the method description in Tokenizer.
Returns:
  found Token including the EOF token
throws:
  TokenizerException - generic exception (list) for all problems that may occur while parsing (IOExceptions for instance)



propertyChanged
public void propertyChanged(TokenizerPropertyEvent event)
Event handler method. The given TokenizerPropertyEvent parameter contains the necessary information about the property change. We choose one single method in favour of various more specialized methods since the reactions on adding, removing and modifying tokenizer properties are often the same (flushing caches, rereading information etc.) or probably not very different.
Note that a modification of the parse flags in the backing TokenizerProperties object removes all flags previously modified through AbstractTokenizer.changeParseFlags .
Parameters:
  event - the TokenizerPropertyEvent that describes the change



readMore
public int readMore() throws TokenizerException(Code)
Try to read more data into the text buffer of the tokenizer. See the method description in Tokenizer.
Returns:
  the number of characters now available
throws:
  TokenizerException - generic exception (list) for all problems that may occur while reading (IOExceptions for instance)



readMoreData
abstract protected int readMoreData() throws TokenizerException(Code)
This method is called when the tokenizer runs out of data. Its main purpose is to call the TokenizerSource.read method. It is also responsible for handling the TokenizerProperties.F_KEEP_DATA flag.
Returns:
  number of read bytes or -1 if an end-of-file condition occurred
throws:
  TokenizerException - wrapped exceptions from the TokenizerSource.read method



readMoreDataFromBase
protected int readMoreDataFromBase() throws TokenizerException(Code)
This method organizes the input buffer. It moves the current text window if necessary, or allocates more space if data should be kept completely (see the TokenizerProperties.F_KEEP_DATA flag). Its main purpose is to call the TokenizerSource.read method.
Returns:
  number of read bytes or -1 if an end-of-file condition occurred
throws:
  TokenizerException - wrapped exceptions from the TokenizerSource.read method
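
The buffer handling described above can be sketched as follows. This is a stand-alone illustration under stated assumptions, not the JTopas implementation: the class BufferWindowSketch, its fields and the doubling growth strategy are hypothetical, the keepData argument merely stands in for the F_KEEP_DATA flag, and a java.io.Reader is used in place of a TokenizerSource.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical stand-alone sketch of input buffer handling; not JTopas code. */
class BufferWindowSketch {
    private final Reader reader;
    private final boolean keepData;  // stands in for the F_KEEP_DATA flag
    private char[] buffer;
    private int readPos;             // index of the next unparsed character in buffer
    private int fill;                // number of valid characters in buffer
    private int windowStart;         // absolute stream position of buffer[0]

    BufferWindowSketch(Reader reader, int capacity, boolean keepData) {
        this.reader = reader;
        this.keepData = keepData;
        this.buffer = new char[Math.max(1, capacity)];
    }

    /** Marks up to count buffered characters as parsed. */
    void consume(int count) {
        readPos = Math.min(fill, readPos + count);
    }

    /** Returns the number of characters read, or -1 at end of input. */
    int readMore() {
        if (!keepData && readPos > 0) {
            // Move the text window: drop characters that were already parsed.
            System.arraycopy(buffer, readPos, buffer, 0, fill - readPos);
            windowStart += readPos;
            fill -= readPos;
            readPos = 0;
        }
        if (fill == buffer.length) {
            // No room left: allocate more space (doubling is an assumption).
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        try {
            int count = reader.read(buffer, fill, buffer.length - fill);
            if (count > 0) {
                fill += count;
            }
            return count;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    int windowStart() { return windowStart; }
    int capacity()    { return buffer.length; }
}
```

Without the keep-data setting the buffer stays at its initial size and the window slides forward; with it, the buffer grows and every character read so far remains addressable.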



readWhitespaces
protected int readWhitespaces(int startingAtPos, int maxChars) throws TokenizerException(Code)
This method detects the number of whitespace characters starting at the given position. It should return the number of characters identified as whitespaces starting from and including the given start position.
When overriding this method, use AbstractTokenizer.getBaseDataProvider to access characters.
Do not attempt to actually read more data or do anything that changes the data source or leads to tokenizer switching. This is done by the tokenizer framework.
Parameters:
  startingAtPos - start checking for whitespace from this position
  maxChars - if there is no non-whitespace character, read up to this number of characters
Returns:
  number of whitespace characters starting from the given offset
throws:
  TokenizerException - failure while reading data from the input stream
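
The counting contract can be illustrated with a stand-alone sketch. It is not an override of the real method: the class ReadWhitespacesSketch is hypothetical, it scans a plain CharSequence instead of the data obtained through AbstractTokenizer.getBaseDataProvider, it assumes a non-negative start position, and it uses Character.isWhitespace as a stand-in for the tokenizer's configured whitespace set.

```java
/** Hypothetical stand-alone sketch of whitespace counting; not JTopas code. */
class ReadWhitespacesSketch {
    static int readWhitespaces(CharSequence data, int startingAtPos, int maxChars) {
        // Compute the exclusive end in long arithmetic to avoid int overflow.
        long limit = Math.min((long) data.length(), (long) startingAtPos + maxChars);
        int pos = startingAtPos;
        // Only inspect characters that are already available; never read more data.
        while (pos < limit && Character.isWhitespace(data.charAt(pos))) {
            pos++;
        }
        return pos - startingAtPos;
    }
}
```

The loop stops at the first non-whitespace character, at maxChars, or at the end of the available data, whichever comes first.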



setKeywordHandler
public void setKeywordHandler(de.susebox.jtopas.spi.KeywordHandler handler)(Code)
Setting a new de.susebox.jtopas.spi.KeywordHandler or removing any previously installed one. See the method description in Tokenizer .
Parameters:
  handler - the (new) KeywordHandler to use or null to remove it



setPatternHandler
public void setPatternHandler(de.susebox.jtopas.spi.PatternHandler handler)(Code)
Setting a new de.susebox.jtopas.spi.PatternHandler or removing any previously installed one. See the method description in Tokenizer .
Parameters:
  handler - the (new) de.susebox.jtopas.spi.PatternHandler to use or null to remove it
See Also:   AbstractTokenizer.getPatternHandler



setReadPositionAbsolute
public void setReadPositionAbsolute(int position) throws IndexOutOfBoundsException(Code)
This method sets the tokenizer's current read position to the given absolute read position. See the method description in Tokenizer.
When using this method with embedded tokenizers, the user is responsible for setting the read position in the currently used tokenizer. It will be propagated by the next call to AbstractTokenizer.switchTo. Until that point, a call to this method has no effect on the other tokenizers sharing the same data source.
Parameters:
  position - absolute position for the next parse operation
throws:
  IndexOutOfBoundsException - if the parameter position is not in the available text range (text window)
See Also:   AbstractTokenizer.setReadPositionRelative



setReadPositionRelative
public void setReadPositionRelative(int offset) throws IndexOutOfBoundsException(Code)
This method moves the tokenizer's read position the given number of characters forward (positive value) or backward (negative value), starting from the current read position. See the method description in Tokenizer.
When using this method with embedded tokenizers, the user is responsible for setting the read position in the currently used tokenizer. It will be propagated by the next call to AbstractTokenizer.switchTo. Until that point, a call to this method has no effect on the other tokenizers sharing the same data source.
Parameters:
  offset - number of characters to move forward (positive offset) or backward (negative offset)
throws:
  IndexOutOfBoundsException - if the parameter offset would move the read position out of the available text range (text window)
See Also:   AbstractTokenizer.setReadPositionAbsolute
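
The relation between the absolute and the relative variant, and the text window check both of them perform, can be sketched as follows. The class ReadPositionSketch and its fields are hypothetical, and whether the position just past the last buffered character counts as valid is an assumption of this sketch, not something the documentation states.

```java
/** Hypothetical stand-alone sketch of read position handling; not JTopas code. */
class ReadPositionSketch {
    private final int windowStart; // absolute position of the first buffered character
    private final int windowEnd;   // absolute position just past the last buffered character
    private int readPos;

    ReadPositionSketch(int windowStart, int windowEnd) {
        this.windowStart = windowStart;
        this.windowEnd = windowEnd;
        this.readPos = windowStart;
    }

    void setReadPositionAbsolute(int position) {
        // Assumption: windowEnd itself is accepted (nothing left to parse there).
        if (position < windowStart || position > windowEnd) {
            throw new IndexOutOfBoundsException(
                "position " + position + " outside [" + windowStart + ", " + windowEnd + "]");
        }
        readPos = position;
    }

    void setReadPositionRelative(int offset) {
        // The relative move is an absolute move computed from the current position.
        setReadPositionAbsolute(readPos + offset);
    }

    int getReadPosition() {
        return readPos;
    }
}
```

Because the check runs before the assignment, a rejected move leaves the read position unchanged.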



setSeparatorHandler
public void setSeparatorHandler(de.susebox.jtopas.spi.SeparatorHandler handler)(Code)
Setting a new de.susebox.jtopas.spi.SeparatorHandler or removing any previously installed SeparatorHandler. See the method description in Tokenizer .
Parameters:
  handler - the (new) separator handler to use or null to remove it
See Also:   AbstractTokenizer.getSeparatorHandler



setSequenceHandler
public void setSequenceHandler(de.susebox.jtopas.spi.SequenceHandler handler)(Code)
Setting a new de.susebox.jtopas.spi.SequenceHandler or removing any previously installed one. See the method description in Tokenizer .
Parameters:
  handler - the (new) SequenceHandler to use or null to remove it



setSource
public void setSource(TokenizerSource source)(Code)
Setting the source of data. This method is usually called during setup of the Tokenizer but may also be invoked while tokenizing is in progress. It will reset the tokenizer's input buffer, line and column counters etc.
Subclasses should override this method to perform their own actions on a data source change. Generally, this base method should be called first in the subclass implementation of setSource (equivalent to super calls in constructors of derived classes).
Parameters:
  source - a TokenizerSource to read data from
See Also:   AbstractTokenizer.getSource



setSource
public void setSource(Reader reader)(Code)
Convenience method to avoid the construction of a TokenizerSource from the most important data source java.io.Reader .
Parameters:
  reader - the java.io.Reader to get data from
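
The convenience-overload pattern described here can be sketched without JTopas on the classpath. Everything in the sketch is hypothetical: ReaderSourceSketch stands in for a tokenizer, and the nested CharSource interface stands in for TokenizerSource, whose exact signature is not shown in this documentation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical stand-alone sketch of a Reader convenience overload; not JTopas code. */
class ReaderSourceSketch {
    /** Hypothetical stand-in for a tokenizer data source. */
    interface CharSource {
        int read(char[] cbuf, int offset, int maxChars) throws IOException;
    }

    private CharSource source;

    void setSource(CharSource source) {
        this.source = source;
    }

    /** Convenience overload: wraps the Reader so callers need not build a CharSource. */
    void setSource(Reader reader) {
        CharSource adapter = (cbuf, offset, maxChars) -> reader.read(cbuf, offset, maxChars);
        setSource(adapter);
    }

    /** Drains the current source into a String. */
    String readAll() {
        StringBuilder text = new StringBuilder();
        char[] chunk = new char[16];
        try {
            int count;
            while ((count = source.read(chunk, 0, chunk.length)) != -1) {
                text.append(chunk, 0, count);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return text.toString();
    }
}
```

Routing the Reader overload through the general one keeps any reset logic in a single place, which matches the advice above that subclasses call the base setSource first.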



setTokenizerProperties
public void setTokenizerProperties(TokenizerProperties props) throws NullPointerException(Code)
Setting the tokenizer characteristics. See the method description in Tokenizer .
Parameters:
  props - the TokenizerProperties for this tokenizer
throws:
  NullPointerException - if null is passed to the call
See Also:   AbstractTokenizer.getTokenizerProperties



setWhitespaceHandler
public void setWhitespaceHandler(de.susebox.jtopas.spi.WhitespaceHandler handler)(Code)
Setting a new de.susebox.jtopas.spi.WhitespaceHandler or removing any previously installed one. See the method description in Tokenizer .
Parameters:
  handler - the (new) whitespace handler to use or null to switch off whitespace handling
See Also:   AbstractTokenizer.getWhitespaceHandler



splitBlockComment
protected String[] splitBlockComment(TokenizerProperty prop, String image)(Code)
Splits a given block comment into lines. The method is used to retrieve the image parts for block comment token types.
Parameters:
  prop - the TokenizerProperty describing a block comment
  image - split this string into lines
Returns:
  an array containing the lines of the image without line separator characters



splitIntoLines
protected String[] splitIntoLines(String image)(Code)
Splits a given String into lines. The method is used to retrieve the image parts of several token types.
Parameters:
  image - split this string into lines
Returns:
  an array containing the lines of the image without line separator characters
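
A minimal stand-alone sketch of such a split is shown below. It is not the JTopas implementation: the class SplitIntoLinesSketch is hypothetical, and both the set of recognized line separators (CR LF, CR, LF) and the choice to keep trailing empty lines are assumptions.

```java
/** Hypothetical stand-alone sketch of line splitting; not JTopas code. */
class SplitIntoLinesSketch {
    static String[] splitIntoLines(String image) {
        // "\r\n" is listed first so that it counts as one separator, not two.
        // The limit -1 keeps trailing empty lines instead of silently dropping them.
        return image.split("\r\n|\r|\n", -1);
    }
}
```

For example, splitIntoLines("a\r\nb\nc\rd") yields the four lines a, b, c and d, none of which contains a line separator.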



splitString
protected String[] splitString(TokenizerProperty prop, String image)(Code)
Splits a given string into lines and removes string escapes. The method is used to retrieve the image parts for string token types.
Parameters:
  prop - the TokenizerProperty describing a string
  image - split this string into lines
Returns:
  an array containing the lines of the image without line separator characters



switchTo
public void switchTo(AbstractTokenizer tokenizer) throws TokenizerException(Code)
Changes from one tokenizer to another. If the given tokenizer has not been added with AbstractTokenizer.addTokenizer, an exception is thrown.
The switchTo method does the necessary synchronisation between this and the given tokenizer. The user is therefore responsible for calling switchTo whenever a tokenizer change is necessary. It must be done this way:
 // declared as AbstractTokenizer, since switchTo(AbstractTokenizer) is declared there
 AbstractTokenizer base     = new MyTokenizer(...);
 AbstractTokenizer embedded = new MyTokenizer(...);
 // setting properties (comments, keywords etc.)
 ...
 // embedding a tokenizer
 base.addTokenizer(embedded);
 // tokenizing with base
 ...
 if (switch_condition) {
   base.switchTo(embedded);
 }
 // tokenizing with embedded
 ...
 if (switch_condition) {
   embedded.switchTo(base);
 }
 
That way we avoid a more complex synchronisation between tokenizers whenever one of them parses the next data in the input stream. However, the danger of unsynchronized tokenizers remains, so take care.
Since it is possible that the given tokenizer is a derivation of the AbstractTokenizer class, this method is synchronized on tokenizer.
Parameters:
  tokenizer - the tokenizer that should be used from now on
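
What the hand-over in switchTo amounts to can be illustrated with a toy model. The sketch is hypothetical throughout and far simpler than the real class: SwitchToSketch shares one String as data source, synchronizes only the read position, and throws IllegalArgumentException where the real method throws a TokenizerException.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical toy model of switching between tokenizers; not JTopas code. */
class SwitchToSketch {
    private String data = "";
    private Set<SwitchToSketch> family = new HashSet<>(); // base plus embedded tokenizers
    private int readPos;

    SwitchToSketch() {
        family.add(this);
    }

    void setSource(String text) {
        data = text;
        readPos = 0;
    }

    /** Registers an embedded tokenizer that shares this tokenizer's data. */
    void addTokenizer(SwitchToSketch embedded) {
        embedded.data = data;
        embedded.family = family;
        family.add(embedded);
    }

    /** Hands the current read position over to the other tokenizer. */
    void switchTo(SwitchToSketch other) {
        if (!family.contains(other)) {
            throw new IllegalArgumentException("tokenizer was not added with addTokenizer");
        }
        other.readPos = readPos;
    }

    /** Consumes and returns one character, standing in for parsing a token. */
    char nextChar() {
        return data.charAt(readPos++);
    }
}
```

In this model a tokenizer that parses without a preceding switchTo simply continues from its own stale position, which is the unsynchronized state the text above warns about.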



synchronizeAll
protected void synchronizeAll() throws TokenizerException(Code)
When the method AbstractTokenizer.readMoreData changes the contents of the input buffer or the input buffer itself, all embedded tokenizers must be synchronized. That means their member variables are adjusted to the base tokenizer.
throws:
  TokenizerException - if something goes wrong



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException(Code)(Java Doc)
public boolean equals(Object obj)(Code)(Java Doc)
protected void finalize() throws Throwable(Code)(Java Doc)
final native public Class getClass()(Code)(Java Doc)
native public int hashCode()(Code)(Java Doc)
final native public void notify()(Code)(Java Doc)
final native public void notifyAll()(Code)(Java Doc)
public String toString()(Code)(Java Doc)
final native public void wait(long timeout) throws InterruptedException(Code)(Java Doc)
final public void wait(long timeout, int nanos) throws InterruptedException(Code)(Java Doc)
final public void wait() throws InterruptedException(Code)(Java Doc)
