| java.lang.Object au.id.jericho.lib.html.Segment au.id.jericho.lib.html.Element
Element | final public class Element extends Segment implements HTMLElementName(Code) | | Represents an element in a specific source document, which encompasses a start tag, an optional end tag and all content in between.
Take the following HTML segment as an example:
<p>This is a sample paragraph.</p>
The whole segment is represented by an Element object. This comprises the StartTag "<p>", the EndTag "</p>", as well as the text in between.
An element may also contain other elements between its start and end tags.
The term normal element refers to an element having a start tag with a type of StartTagType.NORMAL.
This comprises all HTML elements and non-HTML elements.
Element instances are obtained using one of the following methods:
See also the HTMLElements class, and the XML 1.0 specification for elements.
The three possible structures of an element are listed below:
- Single Tag Element:
Example:
<img src="mypicture.jpg">
The element consists only of a single start tag and has no element content (although the start tag itself may have tag content).
getEndTag() == null
isEmpty() == true
getEnd() == getStartTag().getEnd()
This occurs in the following situations:
- An HTML element for which the end tag is forbidden.
- An HTML element for which the end tag is required, but the end tag is not present in the source document.
- An HTML element for which the end tag is optional, where the implicitly terminating tag is situated immediately after the element's start tag.
- An empty-element tag.
- A non-HTML element that is not an empty-element tag but is missing its end tag.
- An element with a start tag of a type that does not define a corresponding end tag type.
- An element with a start tag of a type that does define a corresponding end tag type but is missing its end tag.
- Explicitly Terminated Element:
Example:
<p>This is a sample paragraph.</p>
The element consists of a start tag, content, and an end tag.
getEndTag() != null
isEmpty() == false (provided the end tag doesn't immediately follow the start tag)
getEnd() == getEndTag().getEnd()
This occurs in the following situations, assuming the start tag's matching end tag is present in the source document:
- Implicitly Terminated Element:
Example:
<p>This text is included in the paragraph element even though no end tag is present.
<p>This is the next paragraph.
The element consists of a start tag and content, but no end tag.
getEndTag() == null
isEmpty() == false
getEnd() != getStartTag().getEnd()
This only occurs in an HTML element for which the end tag is optional.
The element ends at the start of a tag which implies the termination of the element, called the implicitly terminating tag.
If the implicitly terminating tag is situated immediately after the element's start tag, the element is classed as a single tag element.
See the element parsing rules for HTML elements with optional end tags for details on which tags can implicitly terminate a given element.
See also the documentation of the HTMLElements.getEndTagOptionalElementNames method.
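The three structures above can be told apart using only the properties listed for each one. The following is a minimal, library-free sketch of that distinction; the class `ElementStructureSketch`, its `Structure` enum and the `classify` method are hypothetical names introduced for illustration and are not part of the parser's API. The three parameters stand in for `getEndTag() != null`, `getEnd()` and `getStartTag().getEnd()`.

```java
// Hypothetical stand-alone model; it does not use the real Element class.
final class ElementStructureSketch {
    enum Structure { SINGLE_TAG, EXPLICITLY_TERMINATED, IMPLICITLY_TERMINATED }

    /**
     * @param hasEndTag   models getEndTag() != null
     * @param elementEnd  models getEnd()
     * @param startTagEnd models getStartTag().getEnd()
     */
    static Structure classify(boolean hasEndTag, int elementEnd, int startTagEnd) {
        if (hasEndTag) {
            return Structure.EXPLICITLY_TERMINATED;
        }
        // No end tag: the element either stops with its start tag or runs on past it.
        if (elementEnd == startTagEnd) {
            return Structure.SINGLE_TAG;
        }
        return Structure.IMPLICITLY_TERMINATED;
    }

    public static void main(String[] args) {
        String img = "<img src=\"mypicture.jpg\">";
        // The element ends where its only tag ends.
        System.out.println(classify(false, img.length(), img.length()));

        String p = "<p>This is a sample paragraph.</p>";
        int startTagEnd = p.indexOf('>') + 1;
        // An end tag is present, so the element is explicitly terminated.
        System.out.println(classify(true, p.length(), startTagEnd));
    }
}
```

Positions here are plain character offsets into the example strings, which is an assumption made to keep the sketch self-contained.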
The following rules describe the algorithm used in the StartTag.getElement method to construct an element.
The detection of the start tag's matching end tag or other terminating tags always takes into account the possible nesting of elements.
- If the start tag has a type of StartTagType.NORMAL:
  - If the name of the start tag matches one of the recognised HTML element names (indicating an HTML element):
    - If the end tag for an element of this name is forbidden, the parser does not conduct any search for an end tag and a single tag element is created.
    - If the end tag for an element of this name is required, the parser searches for the start tag's matching end tag.
      - If the matching end tag is found, an explicitly terminated element is created.
      - If no matching end tag is found, the source document is not valid HTML and the incident is logged as a missing required end tag. In this situation a single tag element is created.
    - If the end tag for an element of this name is optional, the parser searches not only for the start tag's matching end tag, but also for any other tag that implicitly terminates the element. For each tag (T2) following the start tag (ST1) of this element (E1):
      - If T2 is a start tag:
        - If the name of T2 is in the list of non-terminating elements for E1, then continue evaluating tags from the end of T2's corresponding element.
        - If the name of T2 is in the list of terminating start tags for E1, then E1 ends at the beginning of T2. If T2 follows immediately after ST1, a single tag element is created, otherwise an implicitly terminated element is created.
      - If T2 is an end tag:
      - If no more tags are present in the source document, then E1 ends at the end of the file, and an implicitly terminated element is created.
    Note that the syntactical indication of an empty-element tag in the start tag is ignored when determining the end of HTML elements.
    See the documentation of the Element.isEmptyElementTag() method for more information.
  - If the name of the start tag does not match one of the recognised HTML element names (indicating a non-HTML element):
    - If the start tag is an empty-element tag, the parser does not conduct any search for an end tag and a single tag element is created.
    - Otherwise, section 3.1 of the XML 1.0 specification states that a matching end tag MUST be present, and the parser searches for the start tag's matching end tag.
      - If the matching end tag is found, an explicitly terminated element is created.
      - If no matching end tag is found, the source document is not valid XML and the incident is logged as a missing required end tag. In this situation a single tag element is created.
- If the start tag has any type other than StartTagType.NORMAL:
  - If the start tag's type does not define a corresponding end tag type, the parser does not conduct any search for an end tag and a single tag element is created.
  - If the start tag's type does define a corresponding end tag type, the parser assumes that a matching end tag is required and searches for it.
See Also: HTMLElements |
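The parsing rules described for the Element class can be read as a decision table that maps what the parser finds onto one of the three element structures. The sketch below models only that mapping, assuming the tag search has already been carried out; it is not the parser's implementation. All names in it (`ElementParsingRulesSketch`, `EndTagRule`, `SearchResult` and the three methods) are hypothetical and introduced for illustration.

```java
// Hypothetical decision-table model of the element parsing rules; not the library's code.
final class ElementParsingRulesSketch {
    enum Structure { SINGLE_TAG, EXPLICITLY_TERMINATED, IMPLICITLY_TERMINATED }

    // Whether the end tag of an HTML element is forbidden, required or optional.
    enum EndTagRule { FORBIDDEN, REQUIRED, OPTIONAL }

    // Assumed outcome of the search that follows the start tag.
    enum SearchResult {
        MATCHING_END_TAG,
        TERMINATOR_RIGHT_AFTER_START_TAG,
        TERMINATOR_AFTER_CONTENT,
        END_OF_FILE
    }

    // Normal start tag whose name is a recognised HTML element name.
    static Structure normalHtmlElement(EndTagRule rule, SearchResult result) {
        if (rule == EndTagRule.FORBIDDEN) {
            return Structure.SINGLE_TAG; // no search is conducted
        }
        if (result == SearchResult.MATCHING_END_TAG) {
            return Structure.EXPLICITLY_TERMINATED;
        }
        if (rule == EndTagRule.REQUIRED) {
            return Structure.SINGLE_TAG; // missing required end tag
        }
        // Optional end tag: a terminating tag or the end of the file closes the element.
        if (result == SearchResult.TERMINATOR_RIGHT_AFTER_START_TAG) {
            return Structure.SINGLE_TAG;
        }
        return Structure.IMPLICITLY_TERMINATED;
    }

    // Normal start tag whose name is not a recognised HTML element name.
    static Structure normalNonHtmlElement(boolean emptyElementTag, boolean matchingEndTagFound) {
        if (emptyElementTag) {
            return Structure.SINGLE_TAG; // no search is conducted
        }
        return matchingEndTagFound ? Structure.EXPLICITLY_TERMINATED : Structure.SINGLE_TAG;
    }

    // Start tag of any type other than the normal type.
    static Structure nonNormalElement(boolean typeDefinesEndTagType, boolean matchingEndTagFound) {
        if (!typeDefinesEndTagType) {
            return Structure.SINGLE_TAG; // no search is conducted
        }
        return matchingEndTagFound ? Structure.EXPLICITLY_TERMINATED : Structure.SINGLE_TAG;
    }

    public static void main(String[] args) {
        // A <p> start tag followed by text and then another <p> start tag.
        System.out.println(normalHtmlElement(EndTagRule.OPTIONAL, SearchResult.TERMINATOR_AFTER_CONTENT));
        // An <img> start tag, whose end tag is forbidden.
        System.out.println(normalHtmlElement(EndTagRule.FORBIDDEN, SearchResult.END_OF_FILE));
    }
}
```

The sketch leaves out nesting and the actual tag search, which the rules state must always be taken into account.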
getAttributeValue | public String getAttributeValue(String attributeName)(Code) | | Returns the value of the attribute with the specified name (case insensitive).
Returns null if the start tag does not have attributes, no attribute with the specified name exists or the attribute has no value.
This is equivalent to getStartTag().getAttributeValue(attributeName).
Parameters: attributeName - the name of the attribute to get. Returns: the value of the attribute with the specified name, or null if the attribute does not exist or has no value. |
getChildElements | final public List getChildElements()(Code) | | Returns a list of the immediate children of this element in the document element hierarchy.
The objects in the list are all of type Element.
See the Source.getChildElements method for more details.
Returns: a list of the immediate children of this element in the document element hierarchy, guaranteed not null. See Also: Element.getParentElement() |
getChildElements | final List getChildElements(int depth)(Code) | | |
getContent | public Segment getContent()(Code) | | Returns the segment representing the content of the element.
This segment spans between the end of the start tag and the start of the end tag.
If the end tag is not present, the content reaches to the end of the element.
Note that before version 2.0 this method returned null if the element was empty, whereas now a zero-length segment is returned.
Returns: the segment representing the content of the element, guaranteed not null. |
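The span rule for getContent can be illustrated with plain character offsets. This is a minimal sketch under the assumption that tag positions are already known; `ContentSpanSketch` and `contentSpan` are hypothetical names, and a negative `endTagBegin` is used here to model an element that has no end tag.

```java
// Hypothetical model of the content span rule; it does not use the real Segment class.
final class ContentSpanSketch {
    /** Returns {begin, end} offsets of the content within the source text. */
    static int[] contentSpan(int startTagEnd, int endTagBegin, int elementEnd) {
        // Content starts where the start tag ends. It stops at the end tag if there
        // is one, otherwise it reaches to the end of the element.
        int contentEnd = endTagBegin >= 0 ? endTagBegin : elementEnd;
        return new int[] { startTagEnd, contentEnd };
    }

    public static void main(String[] args) {
        String source = "<p>This is a sample paragraph.</p>";
        int startTagEnd = source.indexOf('>') + 1;
        int endTagBegin = source.lastIndexOf("</");
        int[] span = contentSpan(startTagEnd, endTagBegin, source.length());
        System.out.println(source.substring(span[0], span[1]));

        // A single tag element yields a zero-length span rather than null.
        String img = "<img src=\"mypicture.jpg\">";
        int[] empty = contentSpan(img.length(), -1, img.length());
        System.out.println(empty[1] - empty[0]);
    }
}
```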
getContentEnd | int getContentEnd()(Code) | | |
getDepth | public int getDepth()(Code) | | Returns the nesting depth of this element in the document element hierarchy.
The Source.fullSequentialParse method should be called after construction of the Source object if this method is to be used.
A top-level element has a nesting depth of 0.
An element formed from a
always has a nesting depth of 0, regardless of whether it is nested inside a normal element.
See the Source.getChildElements method for more details.
Returns: the nesting depth of this element in the document element hierarchy. See Also: Element.getParentElement() |
getEndTag | public EndTag getEndTag()(Code) | | Returns the end tag of the element.
If the element has no end tag this method returns null.
Returns: the end tag of the element, or null if the element has no end tag. |
getStartTag | public StartTag getStartTag()(Code) | | Returns the start tag of the element.
Returns: the start tag of the element. |
isEmptyElementTag | public boolean isEmptyElementTag()(Code) | | Indicates whether this element is an empty-element tag.
It is signified by an empty element with the characters "/>" at the end of the start tag.
This is equivalent to isEmpty() && getStartTag().isEmptyElementTag().
The StartTag.isEmptyElementTag property only checks whether the start tag is syntactically an empty-element tag, whereas this property also makes sure the element is in fact empty.
A syntactical empty-element tag that is not actually empty can occur if the end tag of an HTML element is either optional or required, but the start tag is erroneously terminated with the characters "/>" in the source document.
All major browsers ignore the syntactical hint of an empty element in this case, even in an XHTML document, so this parser does the same.
Returns: true if this element is an empty-element tag, otherwise false. |
|
|