| java.lang.Object com.microstar.xml.XmlParser
XmlParser | public class XmlParser (Code) | | Parse XML documents and return parse events through call-backs.
You need to define a class implementing the XmlHandler
interface: an object belonging to this class will receive the
callbacks for the events. (As an alternative to implementing
the full XmlHandler interface, you can simply extend the
HandlerBase convenience class.)
Usage (assuming that MyHandler is your implementation
of the XmlHandler interface):
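import com.microstar.xml.*;

XmlHandler handler = new MyHandler();
XmlParser parser = new XmlParser();
parser.setHandler(handler);
try {
  // Arguments: system ID (URI), public ID (none), suggested encoding (null if unknown).
  parser.parse("http://www.host.com/doc.xml", null, null);
} catch (Exception e) {
  // Handle or report the failure; printing the stack trace is the simplest option.
  e.printStackTrace();
}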
Alternatively, you can use the standard SAX interfaces
with the SAXDriver class as your entry point.
author: Copyright (c) 1997, 1998 by Microstar Software Ltd. author: Written by David Megginson <dmeggins@microstar.com> version: 1.1 See Also: XmlHandler See Also: HandlerBase See Also: SAXDriver |
Field Summary | |
final public static int | ATTRIBUTE_CDATA | Constant: the attribute value is a string value.
final public static int | ATTRIBUTE_DEFAULT_FIXED | Constant: the attribute was declared #FIXED.
final public static int | ATTRIBUTE_DEFAULT_IMPLIED | Constant: the attribute was declared #IMPLIED.
final public static int | ATTRIBUTE_DEFAULT_REQUIRED | Constant: the attribute was declared #REQUIRED.
final public static int | ATTRIBUTE_DEFAULT_SPECIFIED | Constant: the attribute has a literal default value specified.
final public static int | ATTRIBUTE_DEFAULT_UNDECLARED | Constant: the attribute is not declared.
final public static int | ATTRIBUTE_ENTITIES | Constant: the attribute value is a list of entity names.
final public static int | ATTRIBUTE_ENTITY | Constant: the attribute value is the name of an entity.
final public static int | ATTRIBUTE_ENUMERATED | Constant: the attribute value is a token from an enumeration.
final public static int | ATTRIBUTE_ID | Constant: the attribute value is a unique identifier.
final public static int | ATTRIBUTE_IDREF | Constant: the attribute value is a reference to a unique identifier.
final public static int | ATTRIBUTE_IDREFS | Constant: the attribute value is a list of ID references.
final public static int | ATTRIBUTE_NMTOKEN | Constant: the attribute value is a name token.
final public static int | ATTRIBUTE_NMTOKENS | Constant: the attribute value is a list of name tokens.
final public static int | ATTRIBUTE_NOTATION | Constant: the attribute value is the name of a notation.
final public static int | ATTRIBUTE_UNDECLARED | Constant: the attribute has not been declared for this element type.
final public static int | CONTENT_ANY | Constant: the element has a content model of ANY.
final public static int | CONTENT_ELEMENTS | Constant: the element has element content.
final public static int | CONTENT_EMPTY | Constant: the element has declared content of EMPTY.
final public static int | CONTENT_MIXED | Constant: the element has mixed content.
final public static int | CONTENT_UNDECLARED | Constant: the element has not been declared.
final public static int | ENTITY_INTERNAL | Constant: the entity is internal.
final public static int | ENTITY_NDATA | Constant: the entity is external, non-XML data.
final public static int | ENTITY_TEXT | Constant: the entity is external XML data.
final public static int | ENTITY_UNDECLARED | Constant: the entity has not been declared.
XmlHandler | handler |
Constructor Summary | |
public | XmlParser() Construct a new parser with no associated handler. |
Method Summary | |
void | checkEncoding(String encodingName, boolean ignoreEncoding) | Check that the encoding specified makes sense.
void | cleanupVariables() | Clean up after the parse to allow some garbage collection.
void | copyIso8859_1ReadBuffer(int count) | Convert a buffer of ISO-8859-1-encoded bytes into UTF-16 characters.
void | copyUcs2ReadBuffer(int count, int shift1, int shift2) | Convert a buffer of UCS-2-encoded bytes into UTF-16 characters.
void | copyUcs4ReadBuffer(int count, int shift1, int shift2, int shift3, int shift4) | Convert a buffer of UCS-4-encoded bytes into UTF-16 characters.
void | copyUtf8ReadBuffer(int count) | Convert a buffer of UTF-8-encoded bytes into UTF-16 characters.
void | dataBufferAppend(char c) | Add a character to the data buffer.
void | dataBufferAppend(String s) | Add a string to the data buffer.
void | dataBufferAppend(char[] ch, int start, int length) | Append (part of) a character array to the data buffer.
void | dataBufferFlush() | Flush the contents of the data buffer to the handler, if appropriate, and reset the buffer for new input.
void | dataBufferNormalize() | Normalise whitespace in the data buffer.
String | dataBufferToString() | Convert the data buffer to a string.
public Enumeration | declaredAttributes(String elname) | Get the declared attributes for an element type. Parameters: elname - The name of the element type.
public Enumeration | declaredElements() | Get the declared elements for an XML document.
public Enumeration | declaredEntities() | Get declared entities. Returns an Enumeration of all the entities declared for this XML document.
public Enumeration | declaredNotations() | Get declared notations. Returns an Enumeration of all the notations declared for this XML document.
void | detectEncoding() | Attempt to detect the encoding of an entity.
void | encodingError(String message, int value, int offset) | Report a character encoding error.
void | error(String message, String textFound, String textExpected) | Report an error.
void | error(String message, char textFound, String textExpected) | Report a serious error.
Object | extendArray(Object array, int currentSize, int requiredSize) | Ensure the capacity of an array, allocating a new one if necessary.
void | filterCR() | Filter carriage returns in the read buffer.
Object[] | getAttribute(String elName, String name) | Retrieve the three-member array representing an attribute declaration.
public String | getAttributeDefaultValue(String name, String aname) | Retrieve the default value of a declared attribute.
public int | getAttributeDefaultValueType(String name, String aname) | Retrieve the default value type of a declared attribute.
public String | getAttributeEnumeration(String name, String aname) | Retrieve the allowed values for an enumerated attribute type. Parameters: name - The name of the associated element. Parameters: aname - The name of the attribute.
public String | getAttributeExpandedValue(String name, String aname) | Retrieve the expanded value of a declared attribute; all general entities will be expanded.
public int | getAttributeType(String name, String aname) | Retrieve the declared type of an attribute. Parameters: name - The name of the associated element. Parameters: aname - The name of the attribute.
public int | getColumnNumber() | Return the current column number.
Hashtable | getElementAttributes(String name) | Look up the attribute hash table for an element.
public String | getElementContentModel(String name) | Look up the content model of an element.
public int | getElementContentType(String name) | Look up the content type of an element. Parameters: name - The element type name.
public String | getEntityNotationName(String eName) | Get the notation name associated with an NDATA entity.
public String | getEntityPublicId(String ename) | Return an external entity's public identifier, if any.
public String | getEntitySystemId(String ename) | Return an external entity's system identifier.
public int | getEntityType(String ename) | Find the type of an entity.
public String | getEntityValue(String ename) | Return the value of an internal entity.
public int | getLineNumber() | Return the current line number.
int | getNextUtf8Byte(int pos, int count) | Return the next byte value in a UTF-8 sequence.
public String | getNotationPublicId(String nname) | Look up the public identifier for a notation.
public String | getNotationSystemId(String nname) | Look up the system identifier for a notation.
void | initializeVariables() | Re-initialize the variables for each parse.
public String | intern(String s) | Return an internalised version of a string.
public String | intern(char[] ch, int start, int length) | Create an internalised string from a character array.
final boolean | isWhitespace(char c) | Test if a character is whitespace.
public void | parse(String systemId, String publicId, String encoding) | Parse an XML document from a URI.
public void | parse(String systemId, String publicId, InputStream stream, String encoding) | Parse an XML document from a byte stream.
public void | parse(String systemId, String publicId, Reader reader) | Parse an XML document from a character stream.
void | parseAttDef(String elementName) | Parse a single attribute definition.
void | parseAttlistDecl() | Parse an attribute list declaration.
void | parseAttribute(String name) | Parse an attribute assignment.
void | parseCDSect() | Parse a CDATA marked section.
void | parseCharRef() | Read a character reference.
void | parseComment() | Skip a comment.
void | parseConditionalSect() | Parse a conditional section.
void | parseContent() | Parse the content of an element.
void | parseContentspec(String name) | Parse a content specification.
void | parseCp() | Parse a content particle.
void | parseDefault(String elementName, String name, int type, String enumeration) | Parse the default value for an attribute.
void | parseDoctypedecl() | Parse a document type declaration.
void | parseDocument() | Parse an XML document; this is the top-level parsing function for a single XML document.
void | parseETag() | Parse an end tag.
void | parseElement() | Parse an element, with its tags.
void | parseElementdecl() | Parse an element type declaration.
void | parseElements() | Parse an element-content model.
void | parseEntityDecl() | Parse an entity declaration.
void | parseEntityRef(boolean externalAllowed) | Parse a reference.
void | parseEnumeration() | Parse an enumeration.
void | parseEq() | Parse an equals sign surrounded by optional whitespace.
void | parseMarkupdecl() | Parse a markup declaration in the internal or external DTD subset.
void | parseMisc() | Parse miscellaneous markup outside the document element and DOCTYPE declaration.
void | parseMixed() | Parse mixed content.
void | parseNotationDecl() | Parse a notation declaration.
void | parseNotationType() | Parse a notation type for an attribute.
void | parsePCData() | Parse PCDATA.
void | parsePEReference(boolean isEntityValue) | Parse a parameter entity reference.
void | parsePI() | Parse a processing instruction and do a call-back.
void | parseProlog() | Parse the prolog of an XML document ([24] prolog ::= XMLDecl? Misc* (Doctypedecl Misc*)?). There are a couple of tricks here.
void | parseTextDecl(boolean ignoreEncoding) | Parse the Encoding PI.
void | parseUntil(String delim) | Read all data until we find the specified string.
void | parseWhitespace() | Parse whitespace characters, and leave them in the data buffer.
void | parseXMLDecl(boolean ignoreEncoding) | Parse the XML declaration.
void | popInput() | Restore a previous input source.
void | pushCharArray(String ename, char[] ch, int start, int length) | Push a new internal input source. This method is useful for expanding an internal entity, or for unreading a string of characters.
void | pushInput(String ename) | Save the current input source onto the stack. This method saves all of the global variables associated with the current input source, so that they can be restored when a new input source has finished.
void | pushString(String ename, String s) | Push a string back onto input.
void | pushURL(String ename, String publicId, String systemId, Reader reader, InputStream stream, String encoding) | Push a new external input source.
void | read8bitEncodingDeclaration() | Read just the encoding declaration (or XML declaration) at the start of an external entity. When this method is called, we know that the declaration is present (or appears to be).
int | readAttType() | Parse the attribute type.
char | readCh() | Read a single character from the readBuffer.
void | readDataChunk() | Read a chunk of data from an external input source.
String[] | readExternalIds(boolean inNotation) | Try reading external identifiers.
String | readLiteral(int flags) | Read a literal.
String | readNmtoken(boolean isName) | Read a name or name token.
void | require(String delim) | Require a string to appear, or throw an exception.
void | require(char delim) | Require a character to appear, or throw an exception.
void | requireWhitespace() | Require whitespace characters.
void | setAttribute(String elName, String name, int type, String enumeration, String value, int valueType) | Register an attribute declaration for later retrieval.
void | setElement(String name, int contentType, String contentModel, Hashtable attributes) | Register an element.
void | setEntity(String eName, int eClass, String pubid, String sysid, String value, String nName) | Register an entity declaration for later retrieval.
void | setExternalDataEntity(String eName, String pubid, String sysid, String nName) | Register an external data entity.
void | setExternalTextEntity(String eName, String pubid, String sysid) | Register an external text entity.
public void | setHandler(XmlHandler handler) | Set the handler that will receive parsing events.
void | setInternalEntity(String eName, String value) | Register an entity declaration for later retrieval.
void | setNotation(String nname, String pubid, String sysid) | Register a notation declaration for later retrieval.
void | skipUntil(String delim) | Skip all data until we find the specified string.
void | skipWhitespace() | Skip whitespace characters.
boolean | tryEncoding(byte[] sig, byte b1, byte b2, byte b3, byte b4) | Check for a four-byte signature.
boolean | tryEncoding(byte[] sig, byte b1, byte b2) | Check for a two-byte signature.
void | tryEncodingDecl(boolean ignoreEncoding) | Check for an encoding declaration.
boolean | tryRead(char delim) | Return true if we can read the expected character. The character is removed from the input stream on success, but is put back on failure.
boolean | tryRead(String delim) | Return true if we can read the expected string. This is simply a convenience method; the string is removed from the input stream on success, but is put back on failure.
boolean | tryWhitespace() | Return true if we can read some whitespace.
void | unread(char c) | Push a single character back onto the current input stream.
void | unread(char[] ch, int length) | Push a char array back onto the current input stream. |
ATTRIBUTE_ENTITIES | final public static int ATTRIBUTE_ENTITIES(Code) | | Constant: the attribute value is a list of entity names.
See Also: XmlParser.getAttributeType |
ATTRIBUTE_ENUMERATED | final public static int ATTRIBUTE_ENUMERATED(Code) | | Constant: the attribute value is a token from an enumeration.
See Also: XmlParser.getAttributeType |
ATTRIBUTE_IDREF | final public static int ATTRIBUTE_IDREF(Code) | | Constant: the attribute value is a reference to a unique identifier.
See Also: XmlParser.getAttributeType |
ATTRIBUTE_UNDECLARED | final public static int ATTRIBUTE_UNDECLARED(Code) | | Constant: the attribute has not been declared for this element type.
See Also: XmlParser.getAttributeType |
checkEncoding | void checkEncoding(String encodingName, boolean ignoreEncoding) throws java.lang.Exception(Code) | | Check that the encoding specified makes sense.
Compare what the author has specified in the XML declaration
or encoding PI with what we have detected.
This is also important for distinguishing among the various
7- and 8-bit encodings, such as ISO-LATIN-1 (I cannot autodetect
those).
Parameters: encodingName - The name of the encoding specified by the user. See Also: XmlParser.parseXMLDecl See Also: XmlParser.parseTextDecl |
cleanupVariables | void cleanupVariables()(Code) | | Clean up after the parse to allow some garbage collection.
Leave around anything that might be useful for queries.
|
copyIso8859_1ReadBuffer | void copyIso8859_1ReadBuffer(int count)(Code) | | Convert a buffer of ISO-8859-1-encoded bytes into UTF-16 characters.
When readDataChunk() calls this method, the raw bytes are in
rawReadBuffer, and the final characters will appear in
readBuffer.
This is a direct conversion, with no tricks.
Parameters: count - The number of bytes to convert. See Also: XmlParser.readDataChunk See Also: XmlParser.rawReadBuffer See Also: XmlParser.readBuffer |
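Because ISO-8859-1 maps each byte value 0x00 to 0xFF onto the Unicode code point with the same number, the "direct conversion" amounts to masking and casting each byte. The following is a minimal standalone sketch, not Ælfred's code: the class `Latin1Sketch` and its `copyLatin1` method are hypothetical, and they take explicit source and destination arrays in place of the parser's rawReadBuffer and readBuffer fields.

```java
/** Hypothetical standalone sketch of an ISO-8859-1 to UTF-16 copy. */
final class Latin1Sketch {
    /**
     * Converts the first {@code count} bytes of {@code raw} into {@code out}.
     * Java bytes are signed, so each byte is masked with 0xff before the cast;
     * without the mask, 0xE9 would be sign-extended and become U+FFE9.
     */
    static void copyLatin1(byte[] raw, char[] out, int count) {
        for (int i = 0; i < count; i++) {
            out[i] = (char) (raw[i] & 0xff);
        }
    }
}
```

The mask is the only subtle point: casting `(byte) 0xE9` straight to `char` yields U+FFE9 rather than U+00E9.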
copyUcs2ReadBuffer | void copyUcs2ReadBuffer(int count, int shift1, int shift2) throws java.lang.Exception(Code) | | Convert a buffer of UCS-2-encoded bytes into UTF-16 characters.
When readDataChunk() calls this method, the raw bytes are in
rawReadBuffer, and the final characters will appear in
readBuffer.
Parameters: count - The number of bytes to convert. Parameters: shift1 - The number of bits to shift byte 1. Parameters: shift2 - The number of bits to shift byte 2. See Also: XmlParser.readDataChunk See Also: XmlParser.rawReadBuffer See Also: XmlParser.readBuffer |
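The two shift parameters let a single loop handle either byte order: each 16-bit code unit is assembled as (byte1 << shift1) | (byte2 << shift2). In this sketch, big-endian input is assumed to use shifts 8 and 0 and little-endian input 0 and 8; those values are inferred from the parameter descriptions rather than taken from the Ælfred source, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: assemble UCS-2 code units with caller-chosen shifts. */
final class Ucs2Sketch {
    /**
     * Converts {@code count} bytes (must be even) from {@code raw} into
     * {@code out}, returning the number of chars written.
     * Big-endian input: shift1 = 8, shift2 = 0.
     * Little-endian input: shift1 = 0, shift2 = 8.
     */
    static int copyUcs2(byte[] raw, char[] out, int count, int shift1, int shift2) {
        if (count % 2 != 0) {
            throw new IllegalArgumentException("UCS-2 input must have an even byte count");
        }
        int j = 0;
        for (int i = 0; i < count; i += 2) {
            // Mask each byte to 0..255 before shifting to avoid sign extension.
            out[j++] = (char) (((raw[i] & 0xff) << shift1) | ((raw[i + 1] & 0xff) << shift2));
        }
        return j;
    }
}
```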
copyUcs4ReadBuffer | void copyUcs4ReadBuffer(int count, int shift1, int shift2, int shift3, int shift4) throws java.lang.Exception(Code) | | Convert a buffer of UCS-4-encoded bytes into UTF-16 characters.
When readDataChunk() calls this method, the raw bytes are in
rawReadBuffer, and the final characters will appear in
readBuffer.
Java has 16-bit chars, but this routine will attempt to use
surrogates to encode values between 0x00010000 and 0x000fffff.
Parameters: count - The number of bytes to convert. Parameters: shift1 - The number of bits to shift byte 1. Parameters: shift2 - The number of bits to shift byte 2. Parameters: shift3 - The number of bits to shift byte 3. Parameters: shift4 - The number of bits to shift byte 4. See Also: XmlParser.readDataChunk See Also: XmlParser.rawReadBuffer See Also: XmlParser.readBuffer |
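A sketch of the same idea for four-byte units, including the surrogate step. It is not Ælfred's implementation: it follows the standard UTF-16 rules, which reach up to U+10FFFF (beyond the 0x000fffff limit mentioned above), and it rejects values in the surrogate block. The shift values (24, 16, 8, 0 for big-endian input) and all names are assumptions for illustration.

```java
/** Hypothetical sketch: UCS-4 bytes to UTF-16 chars using per-byte shifts. */
final class Ucs4Sketch {
    /**
     * Converts {@code count} bytes (a multiple of 4) and returns the number
     * of chars written to {@code out}. Big-endian input uses shifts 24, 16, 8, 0;
     * little-endian input uses 0, 8, 16, 24.
     */
    static int copyUcs4(byte[] raw, char[] out, int count,
                        int shift1, int shift2, int shift3, int shift4) {
        if (count % 4 != 0) {
            throw new IllegalArgumentException("UCS-4 input must be a multiple of 4 bytes");
        }
        int j = 0;
        for (int i = 0; i < count; i += 4) {
            int value = ((raw[i] & 0xff) << shift1)
                      | ((raw[i + 1] & 0xff) << shift2)
                      | ((raw[i + 2] & 0xff) << shift3)
                      | ((raw[i + 3] & 0xff) << shift4);
            // Reject negative (top bit set), out-of-range and surrogate values.
            if (value < 0 || value > 0x10FFFF || (value >= 0xD800 && value <= 0xDFFF)) {
                throw new IllegalArgumentException("Not a Unicode scalar value: 0x"
                        + Integer.toHexString(value));
            }
            if (value <= 0xFFFF) {
                out[j++] = (char) value;
            } else {
                // Split into a high (0xD800-based) and low (0xDC00-based) surrogate.
                int v = value - 0x10000;
                out[j++] = (char) (0xD800 | (v >> 10));
                out[j++] = (char) (0xDC00 | (v & 0x3FF));
            }
        }
        return j;
    }
}
```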
dataBufferAppend | void dataBufferAppend(char c)(Code) | | Add a character to the data buffer.
|
dataBufferAppend | void dataBufferAppend(String s)(Code) | | Add a string to the data buffer.
|
dataBufferAppend | void dataBufferAppend(char[] ch, int start, int length)(Code) | | Append (part of) a character array to the data buffer.
|
dataBufferFlush | void dataBufferFlush() throws java.lang.Exception(Code) | | Flush the contents of the data buffer to the handler, if
appropriate, and reset the buffer for new input.
|
dataBufferNormalize | void dataBufferNormalize()(Code) | | Normalise whitespace in the data buffer.
|
detectEncoding | void detectEncoding() throws java.lang.Exception(Code) | | Attempt to detect the encoding of an entity.
The trick here (as suggested in the XML standard) is that
any entity not in UTF-8, or in UCS-2 with a byte-order mark,
must begin with an XML declaration or an encoding
declaration; we simply have to look for "<?xml" in various
encodings.
This method has no way to distinguish among 8-bit encodings.
Instead, it assumes UTF-8, then (possibly) revises its assumption
later in checkEncoding(). Any ASCII-derived 8-bit encoding
should work, but most will be rejected later by checkEncoding().
I don't currently detect EBCDIC, since I'm concerned that it
could also be a valid UTF-8 sequence; I'll have to do more checking
later.
See Also: XmlParser.tryEncoding(byte[],byte,byte,byte,byte) See Also: XmlParser.tryEncoding(byte[],byte,byte) See Also: XmlParser.checkEncoding See Also: XmlParser.read8bitEncodingDeclaration |
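The signature checks described above can be sketched as a lookup on the first four bytes, along the lines of the autodetection table in Appendix F of the XML 1.0 specification. This is a hypothetical standalone helper, not Ælfred's tryEncoding(): it returns a provisional label, treats every ASCII-compatible 8-bit input as UTF-8 (as the text describes), and, like the original, does not attempt EBCDIC or the unusual UCS-4 byte orders.

```java
/** Hypothetical sketch of XML encoding autodetection from leading bytes. */
final class EncodingSniffer {
    /** True if buf begins with the given byte values (each 0..255). */
    private static boolean startsWith(byte[] buf, int... sig) {
        if (buf.length < sig.length) {
            return false;
        }
        for (int i = 0; i < sig.length; i++) {
            if ((buf[i] & 0xff) != sig[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Returns a provisional encoding label. Any ASCII-compatible 8-bit input
     * (including one starting with "<?xm") falls through to UTF-8, to be
     * confirmed or revised once the encoding declaration has been read.
     */
    static String detect(byte[] buf) {
        if (startsWith(buf, 0x00, 0x00, 0x00, 0x3C)) return "UCS-4BE"; // '<' in 4 bytes
        if (startsWith(buf, 0x3C, 0x00, 0x00, 0x00)) return "UCS-4LE";
        // Byte-order mark, or "<?" in two-byte units without a mark.
        if (startsWith(buf, 0xFE, 0xFF) || startsWith(buf, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE";
        if (startsWith(buf, 0xFF, 0xFE) || startsWith(buf, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE";
        return "UTF-8";
    }
}
```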
error | void error(String message, char textFound, String textExpected) throws java.lang.Exception(Code) | | Report a serious error.
Parameters: message - The error message. Parameters: textFound - The character that caused the error. |
extendArray | Object extendArray(Object array, int currentSize, int requiredSize)(Code) | | Ensure the capacity of an array, allocating a new one if
necessary.
|
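Because extendArray() takes and returns Object, one helper can serve char[], String[] and other array types. A minimal sketch using java.lang.reflect.Array; the doubling growth policy and the class name are assumptions, not documented Ælfred behaviour.

```java
import java.lang.reflect.Array;

/** Hypothetical sketch of a capacity-ensuring helper for any array type. */
final class ArrayGrowth {
    /**
     * Returns {@code array} unchanged if it already holds at least
     * {@code requiredSize} elements; otherwise allocates a larger array of the
     * same component type and copies the first {@code currentSize} elements.
     */
    static Object extendArray(Object array, int currentSize, int requiredSize) {
        int capacity = Array.getLength(array);
        if (requiredSize <= capacity) {
            return array;
        }
        // Doubling is an assumed growth policy; it keeps repeated appends cheap.
        int newCapacity = Math.max(requiredSize, capacity * 2);
        Object bigger = Array.newInstance(array.getClass().getComponentType(), newCapacity);
        System.arraycopy(array, 0, bigger, 0, currentSize);
        return bigger;
    }
}
```

Callers cast the result back, e.g. `buf = (char[]) ArrayGrowth.extendArray(buf, used, used + n);`.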
getAttribute | Object[] getAttribute(String elName, String name)(Code) | | Retrieve the three-member array representing an
attribute declaration.
|
getAttributeDefaultValue | public String getAttributeDefaultValue(String name, String aname)(Code) | | Retrieve the default value of a declared attribute.
Parameters: name - The name of the associated element. Parameters: aname - The name of the attribute. Returns: the default value, or null if the attribute was #IMPLIED or simply undeclared and unspecified. See Also: XmlParser.getAttributeExpandedValue |
getAttributeExpandedValue | public String getAttributeExpandedValue(String name, String aname)(Code) | | Retrieve the expanded value of a declared attribute.
All general entities will be expanded.
Parameters: name - The name of the associated element. Parameters: aname - The name of the attribute. Returns: the expanded default value, or null if the attribute was #IMPLIED or simply undeclared. See Also: XmlParser.getAttributeDefaultValue |
getColumnNumber | public int getColumnNumber()(Code) | | Return the current column number.
|
getElementAttributes | Hashtable getElementAttributes(String name)(Code) | | Look up the attribute hash table for an element.
The hash table is the second item in the element array.
|
getElementContentModel | public String getElementContentModel(String name)(Code) | | Look up the content model of an element.
The result will always be null unless the content type is
CONTENT_ELEMENTS or CONTENT_MIXED.
Parameters: name - The element type name. Returns: the normalised content model, as a string. See Also: XmlParser.getElementContentType |
getEntityNotationName | public String getEntityNotationName(String eName)(Code) | | Get the notation name associated with an NDATA entity.
Parameters: eName - The NDATA entity name. Returns: the associated notation name, or null if the entity was not declared, or if it is not an NDATA entity. See Also: XmlParser.getEntityType |
getEntityPublicId | public String getEntityPublicId(String ename)(Code) | | Return an external entity's public identifier, if any.
Parameters: ename - The name of the external entity. Returns: the entity's public identifier, or null if the entity was not declared, if it is not an external entity, or if no public identifier was provided. See Also: XmlParser.getEntityType |
getEntitySystemId | public String getEntitySystemId(String ename)(Code) | | Return an external entity's system identifier.
Parameters: ename - The name of the external entity. Returns: the entity's system identifier, or null if the entity was not declared, or if it is not an external entity. See Also: XmlParser.getEntityType |
getEntityValue | public String getEntityValue(String ename)(Code) | | Return the value of an internal entity.
Parameters: ename - The name of the internal entity. Returns: the entity's value, or null if the entity was not declared, or if it is not an internal entity. See Also: XmlParser.getEntityType |
getLineNumber | public int getLineNumber()(Code) | | Return the current line number.
|
getNextUtf8Byte | int getNextUtf8Byte(int pos, int count) throws java.lang.Exception(Code) | | Return the next byte value in a UTF-8 sequence.
If it is not possible to get a byte from the current
entity, throw an exception.
Parameters: pos - The current position in the rawReadBuffer. Parameters: count - The number of bytes in the rawReadBuffer. Returns: the significant six bits of a non-initial byte in a UTF-8 sequence. exception: EOFException - If the sequence is incomplete. |
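The contract above (return the low six bits of a continuation byte, fail with EOFException when the sequence runs past the end of the buffer) can be sketched as follows. The names are hypothetical; the decoder handles one- to three-byte sequences only and, for brevity, does not reject overlong encodings.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch of UTF-8 continuation-byte handling. */
final class Utf8Sketch {
    /** Returns the low six bits of the continuation byte at {@code pos}. */
    static int nextUtf8Byte(byte[] raw, int pos, int count) throws IOException {
        if (pos >= count) {
            throw new EOFException("Incomplete UTF-8 sequence");
        }
        int b = raw[pos] & 0xff;
        if ((b & 0xC0) != 0x80) {  // continuation bytes look like 10xxxxxx
            throw new IOException("Bad UTF-8 continuation byte: 0x" + Integer.toHexString(b));
        }
        return b & 0x3F;
    }

    /** Decodes one 1- to 3-byte sequence starting at {@code pos}. */
    static int decodeAt(byte[] raw, int pos, int count) throws IOException {
        int b1 = raw[pos] & 0xff;
        if (b1 < 0x80) {
            return b1;                                   // 0xxxxxxx
        } else if ((b1 & 0xE0) == 0xC0) {
            return ((b1 & 0x1F) << 6)                    // 110xxxxx 10xxxxxx
                 | nextUtf8Byte(raw, pos + 1, count);
        } else if ((b1 & 0xF0) == 0xE0) {
            return ((b1 & 0x0F) << 12)                   // 1110xxxx 10xxxxxx 10xxxxxx
                 | (nextUtf8Byte(raw, pos + 1, count) << 6)
                 | nextUtf8Byte(raw, pos + 2, count);
        }
        throw new IOException("Unsupported lead byte: 0x" + Integer.toHexString(b1));
    }
}
```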
getNotationPublicId | public String getNotationPublicId(String nname)(Code) | | Look up the public identifier for a notation.
You will normally use this method to look up a notation
that was provided as an attribute value or for an NDATA entity.
Parameters: nname - The name of the notation. Returns: a string containing the public identifier, or null if none was provided or if no such notation was declared. See Also: XmlParser.getNotationSystemId |
getNotationSystemId | public String getNotationSystemId(String nname)(Code) | | Look up the system identifier for a notation.
You will normally use this method to look up a notation
that was provided as an attribute value or for an NDATA entity.
Parameters: nname - The name of the notation. Returns: a string containing the system identifier, or null if no such notation was declared. See Also: XmlParser.getNotationPublicId |
initializeVariables | void initializeVariables()(Code) | | Re-initialize the variables for each parse.
|
intern | public String intern(String s)(Code) | | Return an internalised version of a string.
Ælfred uses this method to create an internalised version
of all names and attribute values, so that it can test equality
with == instead of String.equals().
If you want to be able to test for equality in the same way,
you can call this method on your XmlParser instance to internalise your own strings first:
String PARA = parser.intern("PARA");
Note that this will not return the same results as String.intern().
Parameters: s - The string to internalise. Returns: an internalised version of the string. See Also: XmlParser.intern(char[],int,int) See Also: java.lang.String.intern |
intern | public String intern(char[] ch, int start, int length)(Code) | | Create an internalised string from a character array.
This is much more efficient than constructing a non-internalised
string first, and then internalising it.
Note that this will not return the same results as String.intern().
Parameters: ch - an array of characters for building the string. Parameters: start - the starting position in the array. Parameters: length - the number of characters to place in the string. Returns: an internalised string. See Also: XmlParser.intern(String) See Also: java.lang.String.intern |
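A minimal sketch of a per-object symbol table offering both intern() overloads. It is a hypothetical illustration rather than Ælfred's internal hash table: it hashes character ranges with the String.hashCode() formula, so a range that is already in the table is found without constructing a new String, and because the table belongs to one object its results are independent of String.intern().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical symbol table: equal names map to one canonical String instance. */
final class SymbolTable {
    // Buckets keyed by a hash computed identically for Strings and char ranges.
    private final Map<Integer, List<String>> buckets = new HashMap<>();

    /** Same formula as String.hashCode(), applied to ch[start .. start+length). */
    private static int hash(char[] ch, int start, int length) {
        int h = 0;
        for (int i = start; i < start + length; i++) {
            h = 31 * h + ch[i];
        }
        return h;
    }

    private static boolean matches(String s, char[] ch, int start, int length) {
        if (s.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (s.charAt(i) != ch[start + i]) {
                return false;
            }
        }
        return true;
    }

    /** Interns a character range; a new String is created only on a miss. */
    String intern(char[] ch, int start, int length) {
        List<String> bucket = buckets.computeIfAbsent(hash(ch, start, length), k -> new ArrayList<>());
        for (String s : bucket) {
            if (matches(s, ch, start, length)) {
                return s;
            }
        }
        String s = new String(ch, start, length);
        bucket.add(s);
        return s;
    }

    /** Interns a String, storing the argument itself if it is new. */
    String intern(String s) {
        List<String> bucket = buckets.computeIfAbsent(s.hashCode(), k -> new ArrayList<>());
        for (String t : bucket) {
            if (t.equals(s)) {
                return t;
            }
        }
        bucket.add(s);
        return s;
    }
}
```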
isWhitespace | final boolean isWhitespace(char c)(Code) | | Test if a character is whitespace.
[1] S ::= (#x20 | #x9 | #xd | #xa)+
Parameters: c - The character to test. Returns: true if the character is whitespace. |
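The production admits exactly four characters, which is why a dedicated test is needed instead of Character.isWhitespace(): the JDK method also accepts characters such as form feed (U+000C) that XML does not treat as whitespace. A hypothetical standalone version:

```java
/** Hypothetical sketch of the XML S production test. */
final class XmlWhitespace {
    /** True only for the four characters allowed by S: space, tab, CR, LF. */
    static boolean isWhitespace(char c) {
        switch (c) {
            case ' ':   // #x20
            case '\t':  // #x9
            case '\r':  // #xD
            case '\n':  // #xA
                return true;
            default:
                return false;
        }
    }
}
```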
parse | public void parse(String systemId, String publicId, String encoding) throws java.lang.Exception(Code) | | Parse an XML document from a URI.
You may parse a document more than once, but only one thread
may call this method for an object at one time.
Parameters: systemId - The URI of the document. Parameters: publicId - The public identifier of the document, or null. Parameters: encoding - The suggested encoding, or null if unknown. exception: java.lang.Exception - Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself. |
parse | public void parse(String systemId, String publicId, InputStream stream, String encoding) throws java.lang.Exception(Code) | | Parse an XML document from a byte stream.
The URI that you supply will become the base URI for
resolving relative links, but Ælfred will actually read
the document from the supplied input stream.
You may parse a document more than once, but only one thread
may call this method for an object at one time.
Parameters: systemId - The base URI of the document, or null if not known. Parameters: publicId - The public identifier of the document, or null if not known. Parameters: stream - A byte input stream. Parameters: encoding - The suggested encoding, or null if unknown. exception: java.lang.Exception - Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself. |
parse | public void parse(String systemId, String publicId, Reader reader) throws java.lang.Exception(Code) | | Parse an XML document from a character stream.
The URI that you supply will become the base URI for
resolving relative links, but Ælfred will actually read
the document from the supplied character stream.
You may parse a document more than once, but only one thread
may call this method for an object at one time.
Parameters: systemId - The base URI of the document, or null if not known. Parameters: publicId - The public identifier of the document, or null if not known. Parameters: reader - A character stream. exception: java.lang.Exception - Any exception thrown by your own handlers, or any derivation of java.io.IOException thrown by the parser itself. |
parseAttDef | void parseAttDef(String elementName) throws java.lang.Exception(Code) | | Parse a single attribute definition.
[53] AttDef ::= S %Name S %AttType S %Default
|
parseAttlistDecl | void parseAttlistDecl() throws java.lang.Exception(Code) | | Parse an attribute list declaration.
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
NOTE: the '<!ATTLIST' has already been read. |
parseCDSect | void parseCDSect() throws java.lang.Exception(Code) | | Parse a CDATA marked section.
[20] CDSect ::= CDStart CData CDEnd
[21] CDStart ::= '<![CDATA['
[22] CData ::= (Char* - (Char* ']]>' Char*))
[23] CDEnd ::= ']]>'
(The '<![CDATA[' has already been read.)
Note that this just appends characters to the dataBuffer,
without actually generating an event.
|
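Since no markup is recognised inside a CDATA section, the work reduces to copying everything up to the first "]]>" into the data buffer. A hypothetical sketch that scans an in-memory String instead of Ælfred's streaming read buffer:

```java
/** Hypothetical sketch: collect CDATA content up to the closing "]]>". */
final class CDataScanner {
    /**
     * {@code pos} points just past "<![CDATA[". Appends everything before the
     * next "]]>" to {@code dataBuffer} and returns the index just after "]]>".
     */
    static int scanCData(String input, int pos, StringBuilder dataBuffer) {
        int end = input.indexOf("]]>", pos);
        if (end < 0) {
            throw new IllegalStateException("Unterminated CDATA section");
        }
        dataBuffer.append(input, pos, end);  // copied verbatim; '<' and '&' are plain text here
        return end + 3;
    }
}
```

The returned index lets the caller resume scanning after the terminator, and, as in the text, no event is generated here: the characters simply accumulate in the buffer.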
parseCharRef | void parseCharRef() throws java.lang.Exception(Code) | | Read a character reference.
[67] CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
NOTE: the '&#' has already been read.
|
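Decoding the rest of a character reference means choosing the radix from an optional 'x', accumulating ASCII digits up to ';', and appending the resulting code point. This is a hypothetical standalone sketch; it checks the numeric range and the surrogate block but not the full XML Char production.

```java
/** Hypothetical sketch of decoding the body of an XML character reference. */
final class CharRefDecoder {
    /**
     * {@code pos} points just past "&#". Appends the referenced character to
     * {@code out} (as a surrogate pair if above U+FFFF) and returns the index
     * just past the terminating ';'.
     */
    static int decode(String input, int pos, StringBuilder out) {
        int radix = 10;
        if (pos < input.length() && input.charAt(pos) == 'x') {
            radix = 16;  // "&#x" form: hexadecimal digits follow
            pos++;
        }
        int semi = input.indexOf(';', pos);
        if (semi <= pos) {
            throw new IllegalArgumentException("Missing digits or ';' in character reference");
        }
        int value = 0;
        for (int i = pos; i < semi; i++) {
            int d = asciiDigit(input.charAt(i), radix);
            if (d < 0) {
                throw new IllegalArgumentException("Bad digit in character reference");
            }
            value = value * radix + d;
            if (value > 0x10FFFF) {
                throw new IllegalArgumentException("Character reference out of range");
            }
        }
        // Surrogates are never XML characters; other Char exclusions are not checked.
        if (value >= 0xD800 && value <= 0xDFFF) {
            throw new IllegalArgumentException("Surrogate in character reference");
        }
        out.appendCodePoint(value);
        return semi + 1;
    }

    /** Value of an ASCII digit in the given radix (10 or 16), or -1. */
    private static int asciiDigit(char c, int radix) {
        if (c >= '0' && c <= '9') return c - '0';
        if (radix == 16 && c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (radix == 16 && c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```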
parseComment | void parseComment() throws java.lang.Exception(Code) | | Skip a comment.
[18] Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* "-->"
(The <!-- has already been read.)
|
parseConditionalSect | void parseConditionalSect() throws java.lang.Exception(Code) | | Parse a conditional section.
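[63] conditionalSect ::= includeSect | ignoreSect
[64] includeSect ::= '<![' %'INCLUDE' '[' %markupdecl* ']]>'
[65] ignoreSect ::= '<![' %'IGNORE' '[' ignoreSectContents* ']]>'
[66] ignoreSectContents ::= ((SkipLit | Comment | PI) -(Char* ']]>'))
| ('<![' ignoreSectContents* ']]>')
| (Char - (']' | [<'"]))
|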
parseContent | void parseContent() throws java.lang.Exception(Code) | | Parse the content of an element.
[37] content ::= (element | PCData | Reference | CDSect | PI | Comment)*
[68] Reference ::= EntityRef | CharRef
|
parseContentspec | void parseContentspec(String name) throws java.lang.Exception(Code) | | Content specification.
[41] contentspec ::= 'EMPTY' | 'ANY' | Mixed | elements
|
parseCp | void parseCp() throws java.lang.Exception(Code) | | Parse a content particle.
[43] cp ::= (Name | choice | seq) ('?' | '*' | '+')
NOTE: I actually use a slightly different production here:
cp ::= (elements | (Name ('?' | '*' | '+')?))
|
parseDefault | void parseDefault(String elementName, String name, int type, String enumeration) throws java.lang.Exception(Code) | | Parse the default value for an attribute.
[62] Default ::= '#REQUIRED' | '#IMPLIED' | ((%'#FIXED' S)? %AttValue)
|
parseDoctypedecl | void parseDoctypedecl() throws java.lang.Exception(Code) | | Parse a document type declaration.
[28] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S?
('[' %markupdecl* ']' S?)? '>'
(The <!DOCTYPE has already been read.)
|
parseDocument | void parseDocument() throws java.lang.Exception(Code) | | Parse an XML document.
[1] document ::= prolog element Misc*
This is the top-level parsing function for a single XML
document. At a minimum, a well-formed document must have
a document element, and a valid document must have a prolog
as well.
|
parseETag | void parseETag() throws java.lang.Exception(Code) | | Parse an end tag.
[36] ETag ::= '</' Name S? '>'
NOTE: parseContent() chains to here.
|
parseElement | void parseElement() throws java.lang.Exception(Code) | | Parse an element, with its tags.
[33] STag ::= '<' Name (S Attribute)* S? '>' [WFC: unique Att spec]
[38] element ::= EmptyElement | STag content ETag
[39] EmptyElement ::= '<' Name (S Attribute)* S? '/>'
[WFC: unique Att spec]
(The '<' has already been read.)
NOTE: this method actually chains onto parseContent(), if necessary,
and parseContent() will take care of calling parseETag().
|
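The sketch below shows the start-tag side of [33] and [39], including the unique-attribute well-formedness check. It makes several simplifying assumptions:

- The tag is in a `String` starting just after `<`.
- Names are matched loosely.
- Attribute values are returned with references unexpanded.

`StartTagSketch` is a hypothetical class, not XmlParser's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper illustrating [33], [39] and [WFC: unique Att spec]. */
final class StartTagSketch {
    String name;
    boolean empty;                                    // true for an EmptyElement ("/>")
    final Map<String, String> attributes = new LinkedHashMap<>();

    private final String src;
    private int pos;

    /** src starts just after the '<', e.g. "p class='x'>" or "br/>". */
    StartTagSketch(String src) {
        this.src = src;
        name = readName();
        while (true) {
            boolean sawSpace = skipSpace();
            if (src.startsWith("/>", pos)) { empty = true; return; }
            if (src.startsWith(">", pos)) { return; }
            if (!sawSpace) throw new IllegalArgumentException("whitespace required before attribute");
            String attName = readName();
            skipSpace();                              // [35] Eq ::= S? '=' S?
            if (pos >= src.length() || src.charAt(pos) != '=') throw new IllegalArgumentException("'=' expected");
            pos++;
            skipSpace();
            String value = readQuoted();
            // WFC: an attribute name may appear only once in a start tag
            if (attributes.containsKey(attName)) {
                throw new IllegalArgumentException("duplicate attribute: " + attName);
            }
            attributes.put(attName, value);
        }
    }

    /** Loose name scan: letters, digits, '_', ':', '.', '-'. */
    private String readName() {
        int start = pos;
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (!(Character.isLetterOrDigit(c) || c == '_' || c == ':' || c == '.' || c == '-')) break;
            pos++;
        }
        if (pos == start) throw new IllegalArgumentException("name expected at offset " + start);
        return src.substring(start, pos);
    }

    private boolean skipSpace() {
        int start = pos;
        while (pos < src.length() && " \t\r\n".indexOf(src.charAt(pos)) >= 0) pos++;
        return pos > start;
    }

    private String readQuoted() {
        if (pos >= src.length()) throw new IllegalArgumentException("quoted value expected");
        char quote = src.charAt(pos);
        if (quote != '"' && quote != '\'') throw new IllegalArgumentException("quoted value expected");
        int end = src.indexOf(quote, pos + 1);
        if (end < 0) throw new IllegalArgumentException("unterminated attribute value");
        String value = src.substring(pos + 1, end);
        pos = end + 1;
        return value;
    }
}
```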
parseElementdecl | void parseElementdecl() throws java.lang.Exception(Code) | | Parse an element type declaration.
[40] elementdecl ::= '<!ELEMENT' S %Name S %contentspec S? '>'
[VC: Unique Element Declaration]
NOTE: the '<!ELEMENT' has already been read. |
parseElements | void parseElements() throws java.lang.Exception(Code) | | Parse an element-content model.
[42] elements ::= (choice | seq) ('?' | '*' | '+')?
[44] cps ::= S? %cp S?
[45] choice ::= '(' S? %ctokplus (S? '|' S? %ctoks)* S? ')'
[46] ctokplus ::= cps ('|' cps)+
[47] ctoks ::= cps ('|' cps)*
[48] seq ::= '(' S? %stoks (S? ',' S? %stoks)* S? ')'
[49] stoks ::= cps (',' cps)*
NOTE: the opening '(' and S have already been read.
TODO: go over parameter entity boundaries more carefully.
|
parseEntityDecl | void parseEntityDecl() throws java.lang.Exception(Code) | | Parse an entity declaration.
[71] EntityDecl ::= '<!ENTITY' S %Name S %EntityDef S? '>'
| '<!ENTITY' S '%' S %Name S %EntityDef S? '>'
[72] EntityDef ::= EntityValue | ExternalDef
[73] ExternalDef ::= ExternalID %NDataDecl?
[74] ExternalID ::= 'SYSTEM' S SystemLiteral
| 'PUBLIC' S PubidLiteral S SystemLiteral
[75] NDataDecl ::= S %'NDATA' S %Name
NOTE: the '<!ENTITY' has already been read. |
parseEntityRef | void parseEntityRef(boolean externalAllowed) throws java.lang.Exception(Code) | | Parse a reference.
[69] EntityRef ::= '&' Name ';'
NOTE: the '&' has already been read.
Parameters: externalAllowed - External entities are allowed here. |
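A minimal sketch of resolving an EntityRef name against declared internal entities, following two rules from the XML 1.0 specification:

- The five predefined entities are always known.
- The first declaration of an entity is binding.

This is an assumption-laden simplification. The sketch returns the replacement text as a `String`, while the parser described here pushes it back onto the input (see pushString) so that nested markup gets parsed. It also ignores external entities and the externalAllowed flag. `EntityRefSketch` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper illustrating [69]; not part of XmlParser. */
final class EntityRefSketch {
    private final Map<String, String> internalEntities = new HashMap<>();

    EntityRefSketch() {
        // The five predefined entities
        internalEntities.put("lt", "<");
        internalEntities.put("gt", ">");
        internalEntities.put("amp", "&");
        internalEntities.put("apos", "'");
        internalEntities.put("quot", "\"");
    }

    /** Records an internal entity declaration; the first one wins. */
    void declare(String name, String value) {
        internalEntities.putIfAbsent(name, value);
    }

    /** Returns the replacement text for the Name between '&' and ';'. */
    String expand(String name) {
        String value = internalEntities.get(name);
        if (value == null) {
            throw new IllegalArgumentException("undeclared entity: &" + name + ";");
        }
        return value;
    }
}
```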
parseEnumeration | void parseEnumeration() throws java.lang.Exception(Code) | | Parse an enumeration.
[60] Enumeration ::= '(' S? %Etoks (S? '|' S? %Etoks)* S? ')'
[61] Etoks ::= %Nmtoken (S? '|' S? %Nmtoken)*
NOTE: the '(' has already been read.
|
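A minimal sketch of [60]-[61], assuming the text after the opening `(` is available as a `String`. The optional S around each `|` is trimmed, and a token that is empty or contains internal whitespace is rejected. Full Nmtoken character checks are omitted. `EnumerationSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating [60]-[61]; not part of XmlParser. */
final class EnumerationSketch {
    /** src starts just after '(', e.g. " red | green|blue ) 'red'". */
    static List<String> parse(String src) {
        int close = src.indexOf(')');
        if (close < 0) {
            throw new IllegalArgumentException("missing ')' in enumeration");
        }
        List<String> tokens = new ArrayList<>();
        // limit -1 keeps empty pieces, so "(a||b)" is caught below
        for (String part : src.substring(0, close).split("\\|", -1)) {
            String token = part.trim();               // S? on either side of '|'
            if (token.isEmpty() || token.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException("bad Nmtoken in enumeration: '" + part + "'");
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```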
parseEq | void parseEq() throws java.lang.Exception(Code) | | Parse an equals sign surrounded by optional whitespace.
[35] Eq ::= S? '=' S?
|
parseMarkupdecl | void parseMarkupdecl() throws java.lang.Exception(Code) | | Parse a markup declaration in the internal or external DTD subset.
[29] markupdecl ::= ( %elementdecl | %AttlistDecl | %EntityDecl |
%NotationDecl | %PI | %S | %Comment |
InternalPERef )
[30] InternalPERef ::= PEReference
[31] extSubset ::= (%markupdecl | %conditionalSect)*
|
parseMisc | void parseMisc() throws java.lang.Exception(Code) | | Parse miscellaneous markup outside the document element and DOCTYPE
declaration.
[27] Misc ::= Comment | PI | S
|
parseMixed | void parseMixed() throws java.lang.Exception(Code) | | Parse mixed content.
[50] Mixed ::= '(' S? %( %'#PCDATA' (S? '|' S? %Mtoks)* ) S? ')*'
| '(' S? %('#PCDATA') S? ')'
[51] Mtoks ::= %Name (S? '|' S? %Name)*
NOTE: the S and '#PCDATA' have already been read.
|
parseNotationDecl | void parseNotationDecl() throws java.lang.Exception(Code) | | Parse a notation declaration.
[81] NotationDecl ::= '<!NOTATION' S %Name S %ExternalID S? '>'
NOTE: the '<!NOTATION' has already been read. |
parseNotationType | void parseNotationType() throws java.lang.Exception(Code) | | Parse a notation type for an attribute.
[58] NotationType ::= %'NOTATION' S '(' S? %Ntoks (S? '|' S? %Ntoks)*
S? ')'
[59] Ntoks ::= %Name (S? '|' S? %Name)*
NOTE: the 'NOTATION' has already been read.
|
parsePCData | void parsePCData() throws java.lang.Exception(Code) | | Parse PCDATA.
[16] PCData ::= [^<&]*
The trick here is that the data stays in the dataBuffer without
necessarily being converted to a string right away.
|
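The buffering trick described above amounts to accumulating characters in a growable char buffer and deferring `String` creation. A minimal sketch, using `StringBuilder` as a stand-in for the dataBuffer. `PCDataSketch` is a hypothetical name.

```java
/** Hypothetical helper illustrating [16]; not part of XmlParser. */
final class PCDataSketch {
    /**
     * Appends characters up to the next '<' or '&' (or end of input) to
     * dataBuffer and returns the index where scanning stopped. The caller
     * decides when, if ever, to turn the buffer into a String.
     */
    static int readPCData(String input, int pos, StringBuilder dataBuffer) {
        int i = pos;
        while (i < input.length() && input.charAt(i) != '<' && input.charAt(i) != '&') {
            i++;
        }
        dataBuffer.append(input, pos, i);
        return i;
    }
}
```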
parsePEReference | void parsePEReference(boolean isEntityValue) throws java.lang.Exception(Code) | | Parse a parameter entity reference.
[70] PEReference ::= '%' Name ';'
NOTE: the '%' has already been read.
|
parsePI | void parsePI() throws java.lang.Exception(Code) | | Parse a processing instruction and do a call-back.
[19] PI ::= '<?' Name (S (Char* - (Char* '?>' Char*)))? '?>'
(The <? has already been read.)
An XML processing instruction must begin with
a Name, which is the instruction's target.
|
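A minimal sketch of splitting a processing instruction into its target (the leading Name) and its data, assuming the input starts just after `<?`. The target's Name characters are not validated. `PISketch` is a hypothetical class.

```java
/** Hypothetical helper illustrating [19]; not part of XmlParser. */
final class PISketch {
    final String target;
    final String data;

    /** src starts just after "<?", e.g. "xml-stylesheet href='a.css'?>". */
    PISketch(String src) {
        int end = src.indexOf("?>");
        if (end < 0) {
            throw new IllegalArgumentException("unterminated processing instruction");
        }
        String body = src.substring(0, end);
        int i = 0;
        while (i < body.length() && !isS(body.charAt(i))) {
            i++;                                      // the target is the leading Name
        }
        if (i == 0) {
            throw new IllegalArgumentException("processing instruction has no target");
        }
        target = body.substring(0, i);
        while (i < body.length() && isS(body.charAt(i))) {
            i++;                                      // the S separating target and data
        }
        data = body.substring(i);
    }

    private static boolean isS(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }
}
```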
parseProlog | void parseProlog() throws java.lang.Exception(Code) | | Parse the prolog of an XML document.
[24] prolog ::= XMLDecl? Misc* (Doctypedecl Misc*)?
There are a couple of tricks here. First, it is necessary to
declare the XML default attributes after the DTD (if present)
has been read. Second, it is not possible to expand general
references in attribute value literals until after the entire
DTD (if present) has been parsed.
We do not look for the XML declaration here, because it is
handled by pushURL().
See Also: pushURL |
parseTextDecl | void parseTextDecl(boolean ignoreEncoding) throws java.lang.Exception(Code) | | Parse the Encoding PI.
[78] EncodingDecl ::= S 'encoding' Eq QEncoding
[79] EncodingPI ::= '<?xml' S 'encoding' Eq QEncoding S? '?>'
[80] QEncoding ::= '"' Encoding '"' | "'" Encoding "'"
[81] Encoding ::= LatinName
[82] LatinName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
(The <?xml and whitespace have already been read.)
See Also: XmlParser.parseXMLDecl See Also: XmlParser.checkEncoding |
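Productions [80] and [82] translate directly into a regular expression plus a quote check. The sketch below assumes the quoted value has already been isolated. `EncodingNameSketch` is a hypothetical helper, not the parser's checkEncoding.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating [80]-[82]; not part of XmlParser. */
final class EncodingNameSketch {
    // [82] LatinName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
    private static final Pattern LATIN_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9._-]*");

    static boolean isLatinName(String s) {
        return LATIN_NAME.matcher(s).matches();
    }

    /** [80] QEncoding: strips matching single or double quotes and checks the name. */
    static String unquote(String q) {
        if (q.length() >= 2) {
            char open = q.charAt(0);
            if ((open == '"' || open == '\'') && q.charAt(q.length() - 1) == open) {
                String name = q.substring(1, q.length() - 1);
                if (isLatinName(name)) {
                    return name;
                }
            }
        }
        throw new IllegalArgumentException("malformed encoding name: " + q);
    }
}
```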
parseWhitespace | void parseWhitespace() throws java.lang.Exception(Code) | | Parse whitespace characters, and leave them in the data buffer.
|
parseXMLDecl | void parseXMLDecl(boolean ignoreEncoding) throws java.lang.Exception(Code) | | Parse the XML declaration.
[25] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[26] VersionInfo ::= S 'version' Eq ('"1.0"' | "'1.0'")
[33] SDDecl ::= S 'standalone' Eq "'" ('yes' | 'no') "'"
| S 'standalone' Eq '"' ("yes" | "no") '"'
[78] EncodingDecl ::= S 'encoding' Eq QEncoding
([80] to [82] are also significant.)
(The <?xml and whitespace have already been read.)
TODO: validate value of standalone.
See Also: XmlParser.parseTextDecl See Also: XmlParser.checkEncoding |
pushCharArray | void pushCharArray(String ename, char[] ch, int start, int length) throws java.lang.Exception(Code) | | Push a new internal input source.
This method is useful for expanding an internal entity,
or for unreading a string of characters. It creates a new
readBuffer containing the characters in the array, instead
of characters converted from an input byte stream.
I've added a couple of optimisations: don't push zero-length
strings, and just push back a single character for 1-character
strings; this should save some time and memory.
Parameters: ch - The char array to push. See Also: XmlParser.pushString See Also: XmlParser.pushURL See Also: XmlParser.readBuffer See Also: XmlParser.sourceType See Also: XmlParser.pushInput |
pushString | void pushString(String ename, String s) throws java.lang.Exception(Code) | | This method pushes a string back onto input.
It is useful either as the expansion of an internal entity,
or for backtracking during the parse.
Calls pushCharArray() to do the actual work.
Parameters: s - The string to push back onto input. See Also: XmlParser.pushCharArray |
read8bitEncodingDeclaration | void read8bitEncodingDeclaration() throws java.lang.Exception(Code) | | Read just the encoding declaration (or XML declaration) at the
start of an external entity.
When this method is called, we know that the declaration is
present (or appears to be). We also know that the entity is
in some sort of ASCII-derived 8-bit encoding.
The idea of this is to let us read what the 8-bit encoding is
before we've committed to converting any more of the file; the
XML or encoding declaration must be in 7-bit ASCII, so we're
safe as long as we don't go past it.
|
readAttType | int readAttType() throws java.lang.Exception(Code) | | Parse the attribute type.
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID' | 'IDREF' | 'IDREFS' | 'ENTITY' | 'ENTITIES' |
'NMTOKEN' | 'NMTOKENS'
[57] EnumeratedType ::= NotationType | Enumeration
TODO: validate the type!!
|
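One detail worth showing for [54]-[57] is that keywords share prefixes (`ID`, `IDREF`, `IDREFS`). Reading the whole keyword before classifying avoids mismatches. In this minimal sketch, the `AttType` enum is a hypothetical stand-in for the `ATTRIBUTE_*` int constants, whose actual values are not given here. `AttTypeSketch` is likewise hypothetical.

```java
/** Hypothetical helper illustrating [54]-[57]; not part of XmlParser. */
final class AttTypeSketch {
    // Hypothetical stand-in for XmlParser's ATTRIBUTE_* int constants
    enum AttType { CDATA, ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION, ENUMERATED }

    /** src starts at the AttType, e.g. "IDREFS #IMPLIED" or "(a|b) 'a'". */
    static AttType readAttType(String src) {
        if (src.startsWith("(")) {
            return AttType.ENUMERATED;                // [60] Enumeration
        }
        // Read the whole keyword first, so "IDREFS" is never taken for "ID"
        int end = 0;
        while (end < src.length() && !Character.isWhitespace(src.charAt(end))) {
            end++;
        }
        String keyword = src.substring(0, end);
        switch (keyword) {
            case "CDATA":    return AttType.CDATA;      // [55] StringType
            case "ID":       return AttType.ID;         // [56] TokenizedType
            case "IDREF":    return AttType.IDREF;
            case "IDREFS":   return AttType.IDREFS;
            case "ENTITY":   return AttType.ENTITY;
            case "ENTITIES": return AttType.ENTITIES;
            case "NMTOKEN":  return AttType.NMTOKEN;
            case "NMTOKENS": return AttType.NMTOKENS;
            case "NOTATION": return AttType.NOTATION;   // [58] NotationType
            default:
                throw new IllegalArgumentException("unknown attribute type: " + keyword);
        }
    }
}
```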
readCh | char readCh() throws java.lang.Exception(Code) | | Read a single character from the readBuffer.
The readDataChunk() method maintains the buffer.
If we hit the end of an entity, try to pop the stack and
keep going.
(This approach doesn't really enforce XML's rules about
entity boundaries, but this is not currently a validating
parser).
This routine also attempts to keep track of the current
position in external entities, but it's not entirely accurate.
The next available input character. See Also: XmlParser.unread(char) See Also: XmlParser.unread(String) See Also: XmlParser.readDataChunk See Also: XmlParser.readBuffer See Also: XmlParser.line |
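The interplay of pushCharArray()/pushString(), readCh() and unread(char) can be modelled as a stack of character sources:

- Pushing an entity's text puts a new source on top.
- Reading pops exhausted sources and resumes the enclosing one.
- Unreading steps back in the current buffer when possible.

This is a hypothetical model under those assumptions, not the parser's code. Like the original, it does not enforce entity-boundary rules or track line numbers.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of readCh()/pushString()/unread(char); not XmlParser's code. */
final class InputStackSketch {
    /** One input source (document or entity expansion) with its read position. */
    private static final class Source {
        final char[] chars;
        int pos;
        Source(char[] chars) { this.chars = chars; }
    }

    private final Deque<Source> stack = new ArrayDeque<>();

    InputStackSketch(String document) {
        pushString(document);
    }

    /** New text is read before whatever remains of the current source. */
    void pushString(String s) {
        if (!s.isEmpty()) {                           // zero-length pushes are skipped
            stack.push(new Source(s.toCharArray()));
        }
    }

    /** Returns the next character, or -1 once every source is exhausted. */
    int readCh() {
        while (!stack.isEmpty()) {
            Source top = stack.peek();
            if (top.pos < top.chars.length) {
                return top.chars[top.pos++];
            }
            stack.pop();                              // end of entity: resume the outer one
        }
        return -1;
    }

    /** Steps back in the current buffer when possible, else pushes a 1-char source. */
    void unread(char c) {
        Source top = stack.peek();
        if (top != null && top.pos > 0) {
            top.chars[--top.pos] = c;
        } else {
            stack.push(new Source(new char[] { c }));
        }
    }
}
```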
readExternalIds | String[] readExternalIds(boolean inNotation) throws java.lang.Exception(Code) | | Try reading external identifiers.
The system identifier is not required for notations.
Parameters: inNotation - Are we in a notation? A two-member String array containing the identifiers. |
readLiteral | String readLiteral(int flags) throws java.lang.Exception(Code) | | Read a literal.
[10] AttValue ::= '"' ([^<&"] | Reference)* '"'
| "'" ([^<&'] | Reference)* "'"
[11] SystemLiteral ::= '"' URLchar* '"' | "'" (URLchar - "'")* "'"
[13] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
| "'" ([^%&'] | PEReference | Reference)* "'"
|
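All four literal forms above share one rule: the closing quote must match the opening one, and the other quote character may appear inside. This minimal sketch shows that rule with a boolean flag for the AttValue ban on `<`. The boolean is a hypothetical simplification of readLiteral's int flags, and references are left unexpanded. `LiteralSketch` is hypothetical.

```java
/** Hypothetical helper illustrating the quoting rules of [9]-[13]; not XmlParser's readLiteral(). */
final class LiteralSketch {
    /**
     * Reads a literal starting at its opening quote and returns the text
     * between the quotes. If attValue is true, '<' is rejected as in [10].
     * References such as &amp; are returned unexpanded.
     */
    static String readLiteral(String src, int pos, boolean attValue) {
        char quote = src.charAt(pos);
        if (quote != '"' && quote != '\'') {
            throw new IllegalArgumentException("literal must start with a quote");
        }
        // The closing delimiter must match the opening one
        int end = src.indexOf(quote, pos + 1);
        if (end < 0) {
            throw new IllegalArgumentException("unterminated literal");
        }
        String value = src.substring(pos + 1, end);
        if (attValue && value.indexOf('<') >= 0) {
            throw new IllegalArgumentException("'<' is not allowed in an attribute value");
        }
        return value;
    }
}
```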
readNmtoken | String readNmtoken(boolean isName) throws java.lang.Exception(Code) | | Read a name or name token.
[5] Name ::= (Letter | '_' | ':') (NameChar)*
[7] Nmtoken ::= (NameChar)+
NOTE: [6] is implemented implicitly where required.
|
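The only difference between [5] and [7] is the first character, which is why one method with an isName flag can serve both. In this minimal sketch, the specification's full Letter, CombiningChar and Extender classes are approximated by `Character.isLetter` and `Character.isLetterOrDigit`. `NameSketch` is hypothetical.

```java
/** Hypothetical helper illustrating [5] and [7]; not XmlParser's readNmtoken(). */
final class NameSketch {
    // Approximate NameChar: the spec's character classes are simplified here
    static boolean isNameChar(char c) {
        return Character.isLetterOrDigit(c) || c == '.' || c == '-' || c == '_' || c == ':';
    }

    /** [5] Name ::= (Letter | '_' | ':') (NameChar)* */
    static boolean isName(String s) {
        if (s.isEmpty()) return false;
        char first = s.charAt(0);
        if (!(Character.isLetter(first) || first == '_' || first == ':')) return false;
        // A valid first character is also a NameChar, so checking every char covers the rest
        return isNmtoken(s);
    }

    /** [7] Nmtoken ::= (NameChar)+ */
    static boolean isNmtoken(String s) {
        if (s.isEmpty()) return false;
        for (int i = 0; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) return false;
        }
        return true;
    }
}
```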
requireWhitespace | void requireWhitespace() throws java.lang.Exception(Code) | | Require whitespace characters.
[1] S ::= (#x20 | #x9 | #xd | #xa)+
|
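The S production admits exactly four characters, which is narrower than `Character.isWhitespace` (that method also accepts form feed, for example). This minimal sketch pairs an optional skip with a mandatory check, mirroring the split between skipWhitespace() and requireWhitespace(). `WhitespaceSketch` is hypothetical.

```java
/** Hypothetical helper illustrating the S production; not XmlParser's code. */
final class WhitespaceSketch {
    // S ::= (#x20 | #x9 | #xd | #xa)+
    static boolean isS(char c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    /** Whitespace is optional: returns the index of the first non-S character. */
    static int skipWhitespace(String src, int pos) {
        while (pos < src.length() && isS(src.charAt(pos))) {
            pos++;
        }
        return pos;
    }

    /** At least one S character must be present. */
    static int requireWhitespace(String src, int pos) {
        int end = skipWhitespace(src, pos);
        if (end == pos) {
            throw new IllegalArgumentException("whitespace required at offset " + pos);
        }
        return end;
    }
}
```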
setAttribute | void setAttribute(String elName, String name, int type, String enumeration, String value, int valueType) throws java.lang.Exception(Code) | | Register an attribute declaration for later retrieval.
Format:
- String type
- String default value
- int value type
TODO: do something with attribute types.
|
setExternalTextEntity | void setExternalTextEntity(String eName, String pubid, String sysid)(Code) | | Register an external text entity.
|
setHandler | public void setHandler(XmlHandler handler)(Code) | | Set the handler that will receive parsing events.
Parameters: handler - The handler to receive callback events. See Also: XmlParser.parse See Also: XmlHandler |
setInternalEntity | void setInternalEntity(String eName, String value)(Code) | | Register an entity declaration for later retrieval.
|
skipWhitespace | void skipWhitespace() throws java.lang.Exception(Code) | | Skip whitespace characters.
[1] S ::= (#x20 | #x9 | #xd | #xa)+
|
tryEncoding | boolean tryEncoding(byte[] sig, byte b1, byte b2, byte b3, byte b4)(Code) | | Check for a four-byte signature.
Utility routine for detectEncoding().
Always looks for some part of "<?xml" in a specific encoding.
Parameters: sig - The first four bytes read. Parameters: b1 - The first byte of the signature Parameters: b2 - The second byte of the signature Parameters: b3 - The third byte of the signature Parameters: b4 - The fourth byte of the signature See Also: XmlParser.detectEncoding |
tryEncoding | boolean tryEncoding(byte[] sig, byte b1, byte b2)(Code) | | Check for a two-byte signature.
Looks for a UCS-2 byte-order mark.
Utility routine for detectEncoding().
Parameters: sig - The first four bytes read. Parameters: b1 - The first byte of the signature Parameters: b2 - The second byte of the signature See Also: XmlParser.detectEncoding |
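A minimal sketch of the two signature checks and a tiny detector built on them. Its byte patterns come from the XML 1.0 specification's non-normative appendix on encoding autodetection: the UCS-2 byte-order marks, then `<?` as 16-bit units, then `<?xm` in ASCII. `EncodingSignatureSketch` and `guess` are hypothetical. The real detectEncoding() may check more cases, for example UCS-4, or check them in a different order.

```java
/** Hypothetical re-creation of the two tryEncoding() checks plus a tiny detector. */
final class EncodingSignatureSketch {
    /** Four-byte form: does sig begin with b1 b2 b3 b4? sig must hold at least 4 bytes. */
    static boolean tryEncoding(byte[] sig, byte b1, byte b2, byte b3, byte b4) {
        return sig[0] == b1 && sig[1] == b2 && sig[2] == b3 && sig[3] == b4;
    }

    /** Two-byte form, used for byte-order marks. */
    static boolean tryEncoding(byte[] sig, byte b1, byte b2) {
        return sig[0] == b1 && sig[1] == b2;
    }

    /** Hypothetical detectEncoding()-style guess from the first four bytes. */
    static String guess(byte[] sig) {
        if (tryEncoding(sig, (byte) 0xFE, (byte) 0xFF)) return "UCS-2 big-endian (BOM)";
        if (tryEncoding(sig, (byte) 0xFF, (byte) 0xFE)) return "UCS-2 little-endian (BOM)";
        // "<?" encoded as 16-bit units without a byte-order mark
        if (tryEncoding(sig, (byte) 0x00, (byte) 0x3C, (byte) 0x00, (byte) 0x3F)) return "UCS-2 big-endian";
        if (tryEncoding(sig, (byte) 0x3C, (byte) 0x00, (byte) 0x3F, (byte) 0x00)) return "UCS-2 little-endian";
        // "<?xm" in an ASCII-derived 8-bit encoding: the declaration names the encoding
        if (tryEncoding(sig, (byte) 0x3C, (byte) 0x3F, (byte) 0x78, (byte) 0x6D)) return "8-bit";
        return "UTF-8";                               // XML's default when nothing matches
    }
}
```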
tryEncodingDecl | void tryEncodingDecl(boolean ignoreEncoding) throws java.lang.Exception(Code) | | Check for an encoding declaration.
|
tryRead | boolean tryRead(char delim) throws java.lang.Exception(Code) | | Return true if we can read the expected character.
Note that the character will be removed from the input stream
on success, but will be put back on failure. Do not attempt to
read the character again if the method succeeds.
Parameters: delim - The character that should appear next. For a case-insensitive match, you must supply this in upper-case. true if the character was successfully read, or false if it was not. See Also: XmlParser.tryRead(String) |
tryRead | boolean tryRead(String delim) throws java.lang.Exception(Code) | | Return true if we can read the expected string.
This is simply a convenience method.
Note that the string will be removed from the input stream
on success, but will be put back on failure. Do not attempt to
read the string again if the method succeeds.
This method will push back a character rather than an
array whenever possible (probably the majority of cases).
NOTE: This method currently has a hard-coded limit
of 100 characters for the delimiter.
Parameters: delim - The string that should appear next. true if the string was successfully read, or false if it was not. See Also: XmlParser.tryRead(char) |
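The read-then-restore contract of tryRead(String) maps naturally onto `java.io.PushbackReader`. The reader's pushback capacity plays a role similar to the hard-coded 100-character limit noted above. This is a sketch built on a standard-library class under that assumption, not the parser's own buffer logic. `TryReadSketch` is hypothetical.

```java
import java.io.IOException;
import java.io.PushbackReader;

/** Hypothetical tryRead(String) built on java.io.PushbackReader; not XmlParser's code. */
final class TryReadSketch {
    /**
     * Consumes delim and returns true if it comes next; otherwise pushes back
     * whatever was read and returns false. The reader's pushback buffer must
     * hold at least delim.length() chars.
     */
    static boolean tryRead(PushbackReader in, String delim) throws IOException {
        char[] seen = new char[delim.length()];
        int n = 0;
        boolean matched = true;
        while (matched && n < seen.length) {
            int c = in.read();
            if (c == -1) {
                matched = false;                      // input ended before delim did
            } else {
                seen[n++] = (char) c;
                matched = c == delim.charAt(n - 1);
            }
        }
        if (!matched) {
            in.unread(seen, 0, n);                    // restore the input exactly
        }
        return matched;
    }
}
```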
tryWhitespace | boolean tryWhitespace() throws java.lang.Exception(Code) | | Return true if we can read some whitespace.
This is simply a convenience method.
This method will push back a character rather than an
array whenever possible (probably the majority of cases).
true if whitespace was found. |
unread | void unread(char c) throws java.lang.Exception(Code) | | Push a single character back onto the current input stream.
This method usually pushes the character back onto
the readBuffer, while the unread(String) method treats the
string as a new internal entity.
I don't think that this would ever be called with
readBufferPos = 0, because the methods always read a character
before unreading it, but just in case, I've added a boundary
condition.
Parameters: c - The character to push back. See Also: XmlParser.readCh See Also: XmlParser.unread(String) See Also: XmlParser.unread(char[]) See Also: XmlParser.readBuffer |
|
|