Java Doc for LexML.java (Web Server » Brazil » sunlabs.brazil.util)



java.lang.Object
   sunlabs.brazil.util.LexML

All known Subclasses:   sunlabs.brazil.util.LexHTML
LexML
public class LexML
This class breaks angle-bracket-separated markup languages like SGML, XML, and HTML into tokens. It understands three types of tokens:
tags
Formally known as "entities", tags are delimited by "<" and ">". The first word in the tag is the tag name and the rest of the tag consists of the attributes, a set of "name=value" or "name" data. Spaces in tags are not significant except for quoted values in the attributes.
string
Plain strings that are not in angle-brackets. Spaces are significant and preserved.
comments
Delimited by "<!--" and "-->". All text between the delimiters is part of the comment. However, by convention, some comments actually contain data, so the methods that extract the fields from tags can also be used to attempt to extract the fields from comments. Spaces are significant and preserved in a comment, unless the comment is treated as a tag, in which case the tag rules apply.

This class is intended to parse markup languages, not to validate them. "Malformed" data is interpreted as graciously as possible, in order to extract as much information as possible. For instance: spaces are allowed between the "<" and the tag name, values in tags do not need to be quoted, and unbalanced quotes are accepted.

One type of "malformed" data specifically not handled is a quoted ">" character occurring within the body of a tag. Even if it is quoted, a ">" in the attributes of a tag will be interpreted as the end of the tag. For example, the single tag <img src='foo.jpg' alt='xyz > abc'> will be erroneously broken by this parser into two tokens:

  • the tag <img src='foo.jpg' alt='xyz >
  • the string "abc'>" (and possibly whatever text follows after).
Unfortunately, this type of "malformed" data is known to occur regularly.
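The splitting rule described above can be illustrated with a small stand-alone sketch. This is not the Brazil source: `MiniLex`, its `Type` enum, and `tokenize` are hypothetical names invented for this example, and the scanner only models the documented behavior that the first ">" ends a tag, whether or not it sits inside quotes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the Brazil LexML implementation. */
final class MiniLex {
    enum Type { TAG, STRING, COMMENT }

    /** One token: its type plus the raw text, delimiters included. */
    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    /** Splits the input into tags, comments, and plain strings without validating it. */
    static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int end;
            Type type;
            if (s.startsWith("<!--", i)) {
                // A comment runs to the next "-->", or to end of input if unterminated.
                int close = s.indexOf("-->", i + 4);
                end = (close < 0) ? s.length() : close + 3;
                type = Type.COMMENT;
            } else if (s.charAt(i) == '<') {
                // A tag ends at the FIRST '>' seen; quotes are deliberately ignored.
                int close = s.indexOf('>', i + 1);
                end = (close < 0) ? s.length() : close + 1;
                type = Type.TAG;
            } else {
                // Plain text runs up to the next '<'; spaces are preserved.
                int open = s.indexOf('<', i);
                end = (open < 0) ? s.length() : open;
                type = Type.STRING;
            }
            out.add(new Token(type, s.substring(i, end)));
            i = end;
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<img src='foo.jpg' alt='xyz > abc'>")) {
            System.out.println(t.type + ": [" + t.text + "]");
        }
    }
}
```

Running the sketch on the `<img>` example yields one tag token that stops at the quoted ">" followed by one string token holding the leftover text, which mirrors the two-token breakage listed above.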

This class may also fail to parse some well-formed XML constructs, such as tags with extended paired delimiters <& and &>, <? and ?>, or <![CDATA[ and ]]>. Additionally, XML tags that have embedded comments containing the ">" character will not be parsed correctly (for example: <!DOCTYPE foo SYSTEM -- a > b -- foo.dtd>), since the ">" in the comment will be interpreted as the end of the declaration tag, for the same reason mentioned above.
author:
   Colin Stevens (colin.stevens@sun.com)
version:
   1.6, 01/01/16



Field Summary
final public static int COMMENT
final public static int STRING
final public static int TAG
int argsEnd
int argsStart
String str
int strEnd
int tagEnd
int tagStart
int tokenEnd
int tokenStart
int type

Constructor Summary
public  LexML(String str)
     Create a new ML parser, which can be used to iterate over the tokens in the given string.

Method Summary
public String getArgs()
     Gets the name/value pairs in the body of the current tag as a string.
public StringMap getAttributes()
     Gets the name/value pairs in the body of the current tag as a table.

Any quote marks in the body, either single or double quotes, are left on the values, so that the values can be easily re-emitted and still form a valid body.

For names that have no associated value in the tag, the value is stored as the empty string "".

public String getBody()
     Gets the string making up the current token, not including the angle brackets or comment delimiters, if appropriate.
public String getTag()
     Gets the tag name at the beginning of the current tag.
public String getToken()
     Gets the string making up the whole current token, including the brackets or comment delimiters, if appropriate.
public int getType()
     Gets the type of the current token.
public boolean nextToken()
     Advances to the next token.
public void replace(String str)
     Changes the string that this LexML is parsing.

Example use: the caller decided to parse part of the body, and now wants this LexML to pick up and parse the rest of it.
Parameters:
  str - The string that this LexML should now parse.

public String rest()
     Gets the rest of the string that has not yet been parsed.

Field Detail
COMMENT
final public static int COMMENT
The value returned by getType for comment tokens.

STRING
final public static int STRING
The value returned by getType for string tokens.

TAG
final public static int TAG
The value returned by getType for tag tokens.

argsEnd
int argsEnd

argsStart
int argsStart

str
String str

strEnd
int strEnd

tagEnd
int tagEnd

tagStart
int tagStart

tokenEnd
int tokenEnd

tokenStart
int tokenStart

type
int type

Constructor Detail
LexML
public LexML(String str)
Create a new ML parser, which can be used to iterate over the tokens in the given string.
Parameters:
  str - The ML to parse.




Method Detail
getArgs
public String getArgs()
Gets the name/value pairs in the body of the current tag as a string.
Returns:
  The name/value pairs, or null if the current token was a string.



getAttributes
public StringMap getAttributes()
Gets the name/value pairs in the body of the current tag as a table.

Any quote marks in the body, either single or double quotes, are left on the values, so that the values can be easily re-emitted and still form a valid body.

For names that have no associated value in the tag, the value is stored as the empty string "". Therefore, the two tags <table border> and <table border=""> cannot be distinguished based on the result of calling getAttributes.
Returns:
  The table of name/value pairs, or null if the current token was a string.




getBody
public String getBody()
Gets the string making up the current token, not including the angle brackets or comment delimiters, if appropriate.
Returns:
  The body of the token.



getTag
public String getTag()
Gets the tag name at the beginning of the current tag. In other words, the tag name for <table border=3> is "table". Any surrounding space characters are removed, but the case of the tag is preserved.

For comments, the "tag" is the first word in the comment. This can be used to help parse comments that are structured similarly to regular tags, such as server-side include comments like <!--#include virtual="file.inc">. The tag in this case would be "!--#include".
Returns:
  The tag name, or null if the current token was a string.
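As a rough illustration of the "first word" rule, the sketch below extracts a tag name from a token body. It is an assumption-laden stand-in rather than the Brazil code: `TagNameSketch` and `firstWord` are hypothetical names, and the input is assumed to be the token text with the leading "<" and trailing ">" already removed.

```java
/** Hypothetical helper for illustration; not part of sunlabs.brazil.util. */
final class TagNameSketch {
    /**
     * Returns the first whitespace-delimited word of a token body.
     * Surrounding spaces are dropped and the original case is kept.
     */
    static String firstWord(String body) {
        String trimmed = body.trim();
        int i = 0;
        while (i < trimmed.length() && !Character.isWhitespace(trimmed.charAt(i))) {
            i++;
        }
        return trimmed.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(firstWord(" table border=3"));
        System.out.println(firstWord("!--#include virtual=\"file.inc\""));
    }
}
```

Because the comparison is left to the caller, code that matches tag names would normally normalize case itself, for example with `equalsIgnoreCase`.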




getToken
public String getToken()
Gets the string making up the whole current token, including the brackets or comment delimiters, if appropriate.
Returns:
  The current token.



getType
public int getType()
Gets the type of the current token.
Returns:
  The type.
See Also:   LexML.COMMENT
See Also:   LexML.TAG
See Also:   LexML.STRING



nextToken
public boolean nextToken()
Advances to the next token. The user can then call the other methods in this class to get information about the new current token.
Returns:
  true if a token was found, false if there were no more tokens left.



replace
public void replace(String str)
Changes the string that this LexML is parsing.

Example use: the caller decided to parse part of the body, and now wants this LexML to pick up and parse the rest of it.
Parameters:
  str - The string that this LexML should now parse. Whatever string this LexML was parsing is forgotten, and it now starts parsing at the beginning of the new string.
See Also:   LexML.rest




rest
public String rest()
Gets the rest of the string that has not yet been parsed.

Example use: to help the parser in circumstances such as the HTML "<script>" tag, where the script body doesn't obey the rules because it might contain lone "<" or ">" characters, which this parser would interpret as the start or end of funny-looking tags.
Returns:
  The unparsed remainder of the string.
See Also:   LexML.replace
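One way a caller might combine rest and replace for the "<script>" case is sketched below. The string handling is self-contained and does not touch LexML itself; `ScriptSkipSketch` and `splitAtScriptEnd` are hypothetical names, and searching for a case-insensitive "</script" marker is an assumption about how a caller would locate the end of the script body.

```java
/** Hypothetical caller-side helper for illustration; not part of sunlabs.brazil.util. */
final class ScriptSkipSketch {
    /** Case-insensitive indexOf that never changes string length, so indices stay valid. */
    static int indexOfIgnoreCase(String s, String needle) {
        for (int i = 0; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Splits the unparsed remainder into { scriptBody, remainder }.
     * The remainder begins at the closing script tag, or is empty if none is found.
     */
    static String[] splitAtScriptEnd(String rest) {
        int close = indexOfIgnoreCase(rest, "</script");
        if (close < 0) {
            return new String[] { rest, "" };
        }
        return new String[] { rest.substring(0, close), rest.substring(close) };
    }

    public static void main(String[] args) {
        // Stands in for the value rest() would return just after a <script> tag.
        String rest = "if (a < b) { x > 1; }</SCRIPT><p>hi";
        String[] parts = splitAtScriptEnd(rest);
        System.out.println("script body: " + parts[0]);
        System.out.println("resume at:   " + parts[1]);
    }
}
```

In a real loop, the caller would presumably fetch the remainder with rest() once nextToken has returned the opening script tag, handle the script body itself, and then pass the second element to replace so that tokenizing resumes at the closing tag.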




Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
