Java Doc for Element.java in au.id.jericho.lib.html (jericho-html)

java.lang.Object

au.id.jericho.lib.html.Segment

au.id.jericho.lib.html.Element

Element

final public class Element extends Segment implements HTMLElementName

Represents an element in a specific document, which encompasses a start tag, an optional end tag and all content in between.

Take the following HTML segment as an example:

<p>This is a sample paragraph.</p>

The whole segment is represented by an Element object. It comprises the StartTag "<p>", the EndTag "</p>", and the text in between. An element may also contain other elements between its start and end tags.

The term normal element refers to an element having a start tag with a type of StartTagType.NORMAL. This comprises all HTML elements and non-HTML elements.

Element instances are obtained using methods such as StartTag.getElement() and the Segment.findAllElements variants.

See also the HTMLElements class, and the XML 1.0 specification for elements.

Element Structure

The three possible structures of an element are listed below:

Single Tag Element:

Example:
<img src="mypicture.jpg">

The element consists only of a single start tag and has no element content (although the start tag itself may have tag content).
Element.getEndTag() == null
Element.isEmpty() == true
Element.getEnd() == Element.getStartTag().getEnd()

This occurs in the following situations:

An HTML element for which the end tag is forbidden.
An HTML element for which the end tag is required, but the end tag is not present in the source document.
An HTML element for which the end tag is optional, where the implicitly terminating tag is situated immediately after the element's start tag.
An empty-element tag.
A non-HTML element that is not an empty-element tag but is missing its end tag.
An element with a start tag of a type that does not define a corresponding end tag type.
An element with a start tag of a type that does define a corresponding end tag type but is missing its end tag.

Explicitly Terminated Element:

Example:
<p>This is a sample paragraph.</p>

The element consists of a start tag, content, and an end tag.
Element.getEndTag() != null
Element.isEmpty() == false (provided the end tag doesn't immediately follow the start tag)
Element.getEnd() == Element.getEndTag().getEnd()

This occurs in the following situations, assuming the start tag's matching end tag is present in the source document:

An HTML element for which the end tag is either required or optional.
A non-HTML element that is not an empty-element tag.
An element with a start tag of a type that defines a corresponding end tag type.

Implicitly Terminated Element:

Example:
<p>This text is included in the paragraph element even though no end tag is present.
<p>This is the next paragraph.

The element consists of a start tag and content, but no end tag.
Element.getEndTag() == null
Element.isEmpty() == false
Element.getEnd() != Element.getStartTag().getEnd()

This only occurs in an HTML element for which the end tag is optional.

The element ends at the start of a tag which implies the termination of the element, called the implicitly terminating tag. If the implicitly terminating tag is situated immediately after the element's start tag, the element is classed as a single tag element.

See the element parsing rules for HTML elements with optional end tags for details on which tags can implicitly terminate a given element.

See also the documentation of the HTMLElements.getEndTagOptionalElementNames method.
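The three structures can be told apart from the invariants listed for each of them. The following is a minimal stand-alone sketch, not Jericho code: ElementStructureSketch, Structure and classify are hypothetical names, and the three parameters merely stand in for the values of getStartTag().getEnd(), getEnd() and getEndTag() != null.

```java
// Hypothetical stand-alone model; these are NOT the Jericho classes.
class ElementStructureSketch {
    enum Structure { SINGLE_TAG, EXPLICITLY_TERMINATED, IMPLICITLY_TERMINATED }

    /**
     * Classifies an element from two character offsets and a flag.
     *
     * @param startTagEnd offset just past the start tag (models getStartTag().getEnd())
     * @param elementEnd  offset just past the whole element (models getEnd())
     * @param hasEndTag   whether an end tag is present (models getEndTag() != null)
     */
    static Structure classify(int startTagEnd, int elementEnd, boolean hasEndTag) {
        if (hasEndTag) {
            return Structure.EXPLICITLY_TERMINATED;
        }
        // No end tag: the element either stops with its start tag or runs on
        // until an implicitly terminating tag (or the end of the document).
        return elementEnd == startTagEnd
                ? Structure.SINGLE_TAG
                : Structure.IMPLICITLY_TERMINATED;
    }

    public static void main(String[] args) {
        String img = "<img src=\"mypicture.jpg\">";
        System.out.println(classify(img.length(), img.length(), false));

        String p = "<p>This is a sample paragraph.</p>";
        System.out.println(classify("<p>".length(), p.length(), true));
    }
}
```

Because an explicitly terminated element is identified by its end tag alone, the offsets only matter when no end tag is present.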

Element Parsing Rules

The following rules describe the algorithm used in the StartTag.getElement method to construct an element. The detection of the start tag's matching end tag or other terminating tags always takes into account the possible nesting of elements.

If the start tag has a type of StartTagType.NORMAL:
- If the name of the start tag matches one of the recognised HTML element names (indicating an HTML element):
  - If the end tag for an element of this name is forbidden, the parser does not conduct any search for an end tag and a single tag element is created.
  - If the end tag for an element of this name is required, the parser searches for the start tag's matching end tag.
    - If the matching end tag is found, an explicitly terminated element is created.
    - If no matching end tag is found, the source document is not valid HTML and the incident is logged as a missing required end tag. In this situation a single tag element is created.
  - If the end tag for an element of this name is optional, the parser searches not only for the start tag's matching end tag, but also for any other tag that implicitly terminates the element.
    For each tag (T2) following the start tag (ST1) of this element (E1):
    - If T2 is a start tag:
      - If the name of T2 is in the list of non-terminating elements for E1, then continue evaluating tags from the end of T2's corresponding element.
      - If the name of T2 is in the list of terminating start tags for E1, then E1 ends at the start of T2. If T2 follows immediately after ST1, a single tag element is created, otherwise an implicitly terminated element is created.
    - If T2 is an end tag:
      - If the name of T2 is the same as that of ST1, an explicitly terminated element is created.
      - If the name of T2 is in the list of terminating end tags for E1, then E1 ends at the start of T2. If T2 follows immediately after ST1, a single tag element is created, otherwise an implicitly terminated element is created.
    - If no more tags are present in the source document, then E1 ends at the end of the file, and an implicitly terminated element is created.
  Note that the syntactical indication of an empty-element tag in the start tag is ignored when determining the end of HTML elements. See the documentation of the Element.isEmptyElementTag() method for more information.
- If the name of the start tag does not match one of the recognised HTML element names (indicating a non-HTML element):
  - If the start tag is an empty-element tag, the parser does not conduct any search for an end tag and a single tag element is created.
  - Otherwise, section 3.1 of the XML 1.0 specification states that a matching end tag MUST be present, and the parser searches for the start tag's matching end tag.
    - If the matching end tag is found, an explicitly terminated element is created.
    - If no matching end tag is found, the source document is not valid XML and the incident is logged as a missing required end tag. In this situation a single tag element is created.
If the start tag has any type other than StartTagType.NORMAL:
- If the start tag's type does not define a corresponding end tag type, the parser does not conduct any search for an end tag and a single tag element is created.
- If the start tag's type does define a corresponding end tag type, the parser assumes that a matching end tag is required and searches for it.
  - If the matching end tag is found, an explicitly terminated element is created.
  - If no matching end tag is found, the missing required end tag is logged and a single tag element is created.
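The scan performed for HTML elements with an optional end tag can be illustrated with a simplified stand-alone sketch. It is not the Jericho implementation: OptionalEndTagScanSketch, Tag and scan are hypothetical names, the terminating-tag sets passed in main are illustrative values rather than the library's actual lists, and the rule that skips over whole nested elements is deliberately left out (any tag that neither matches nor terminates is simply skipped).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the optional-end-tag scan; not the Jericho implementation.
class OptionalEndTagScanSketch {
    enum Structure { SINGLE_TAG, EXPLICITLY_TERMINATED, IMPLICITLY_TERMINATED }

    /** A tag found after the start tag: lower-case name, end-tag flag, begin offset. */
    static final class Tag {
        final String name;
        final boolean endTag;
        final int begin;

        Tag(String name, boolean endTag, int begin) {
            this.name = name;
            this.endTag = endTag;
            this.begin = begin;
        }
    }

    /**
     * Walks the tags (T2) that follow the start tag (ST1) of element E1.
     *
     * @param startName   name of ST1
     * @param startTagEnd offset just past ST1
     */
    static Structure scan(String startName, int startTagEnd, List<Tag> following,
                          Set<String> terminatingStartTags, Set<String> terminatingEndTags) {
        for (Tag t2 : following) {
            boolean terminates;
            if (t2.endTag) {
                if (t2.name.equals(startName)) {
                    return Structure.EXPLICITLY_TERMINATED; // matching end tag
                }
                terminates = terminatingEndTags.contains(t2.name);
            } else {
                terminates = terminatingStartTags.contains(t2.name);
            }
            if (terminates) {
                // E1 ends at the start of T2; directly after ST1 means no content.
                return t2.begin == startTagEnd
                        ? Structure.SINGLE_TAG
                        : Structure.IMPLICITLY_TERMINATED;
            }
            // Simplification: all other tags are skipped one at a time.
        }
        // No more tags: E1 runs to the end of the document.
        return Structure.IMPLICITLY_TERMINATED;
    }

    public static void main(String[] args) {
        Set<String> startTerminators = new HashSet<>(Arrays.asList("p", "div")); // illustrative
        Set<String> endTerminators = new HashSet<>(Arrays.asList("div", "body")); // illustrative
        List<Tag> following = Arrays.asList(
                new Tag("b", false, 8), new Tag("b", true, 15), new Tag("p", false, 19));
        System.out.println(scan("p", 3, following, startTerminators, endTerminators));
    }
}
```

A faithful version would also need the nesting-aware step from the rules, in which evaluation resumes after the end of a non-terminating child element rather than after its start tag.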

See Also: HTMLElements

Field Summary
final static Element	NOT_CACHED
Element	parentElement

Constructor Summary
	Element(Source source, StartTag startTag, EndTag endTag)

Method Summary
public String	getAttributeValue(String attributeName) Returns the value of the attribute with the specified name (case insensitive). Returns `null` if the start tag of this element does not have attributes, no attribute with the specified name exists or the attribute has no value. This is equivalent to Element.getStartTag().getAttributeValue(attributeName). Parameters: attributeName - the name of the attribute to get.
public Attributes	getAttributes() Returns the attributes specified in this element's start tag.
final public List	getChildElements() Returns a list of the immediate children of this element in the document element hierarchy.
final List	getChildElements(int depth)
public Segment	getContent() Returns the segment representing the content of the element.
int	getContentEnd()
public String	getDebugInfo()
public int	getDepth() Returns the nesting depth of this element in the document element hierarchy.
public EndTag	getEndTag() Returns the end tag of the element.
public FormControl	getFormControl() Returns the FormControl defined by this element.
public String	getName() Returns the name of the start tag of this element, always in lower case.
public Element	getParentElement() Returns the parent of this element in the document element hierarchy.
public StartTag	getStartTag() Returns the start tag of the element.
public boolean	isEmpty() Indicates whether this element has zero-length content.
public boolean	isEmptyElementTag() Indicates whether this element is an empty-element tag.

Field Detail

NOT_CACHED
final static Element NOT_CACHED

parentElement
Element parentElement

Constructor Detail

Element
Element(Source source, StartTag startTag, EndTag endTag)

Method Detail

getAttributeValue

public String getAttributeValue(String attributeName)

Returns the value of the attribute with the specified name (case insensitive).

Returns null if the start tag of this element does not have attributes, no attribute with the specified name exists or the attribute has no value.

This is equivalent to Element.getStartTag().getAttributeValue(attributeName).
Parameters:
attributeName - the name of the attribute to get.
Returns:
the value of the attribute with the specified name, or null if the attribute does not exist or has no value.
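The null cases described for this method can be modelled with an ordinary map. This is a stand-alone sketch under stated assumptions, not the library's code: AttributeLookupSketch and its method are hypothetical, a null map stands in for a start tag without attributes, keys are assumed to be stored in lower case, and a null map value stands in for an attribute that has no value.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy model of the lookup semantics; not the Jericho implementation.
class AttributeLookupSketch {
    /**
     * @param attributes lower-case attribute names mapped to values;
     *                   null models a start tag that has no attributes,
     *                   a null value models an attribute without a value
     */
    static String getAttributeValue(Map<String, String> attributes, String attributeName) {
        if (attributes == null) {
            return null; // start tag has no attributes
        }
        // Missing key and valueless attribute both yield null, as documented.
        return attributes.get(attributeName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("src", "mypicture.jpg");
        attrs.put("ismap", null); // attribute present but without a value
        System.out.println(getAttributeValue(attrs, "SRC"));
        System.out.println(getAttributeValue(attrs, "ismap"));
        System.out.println(getAttributeValue(attrs, "alt"));
    }
}
```

Since a null result does not distinguish a missing attribute from a valueless one, callers that need the difference would have to inspect the attributes themselves, for example through getAttributes().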

getAttributes
public Attributes getAttributes()
	Returns the attributes specified in this element's start tag. This is equivalent to Element.getStartTag().getAttributes().
	Returns:
	the attributes specified in this element's start tag.
	See Also: StartTag.getAttributes

getChildElements

final public List getChildElements()

Returns a list of the immediate children of this element in the document element hierarchy.

The objects in the list are all of type Element.

See the Source.getChildElements method for more details.
Returns:
a list of the immediate children of this element in the document element hierarchy, guaranteed not null.
See Also: Element.getParentElement()

getChildElements
final List getChildElements(int depth)

getContent

public Segment getContent()

Returns the segment representing the content of the element.

This segment spans between the end of the start tag and the start of the end tag. If the end tag is not present, the content reaches to the end of the element.

Note that before version 2.0 this method returned null if the element was empty, whereas now a zero-length segment is returned.
Returns:
the segment representing the content of the element, guaranteed not null.
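The span rule for the content segment can be written out as plain offset arithmetic. The sketch below is a stand-alone illustration rather than library code: ContentSpanSketch and content are hypothetical names, and an endTagBegin of -1 is an assumed convention for "no end tag".

```java
// Hypothetical toy model of the content span; not the Jericho implementation.
class ContentSpanSketch {
    /**
     * @param source      the full source text
     * @param startTagEnd offset just past the start tag
     * @param endTagBegin offset where the end tag begins, or -1 if there is no end tag
     * @param elementEnd  offset just past the whole element
     * @return the content text, possibly zero-length but never null
     */
    static String content(String source, int startTagEnd, int endTagBegin, int elementEnd) {
        // Without an end tag the content reaches to the end of the element.
        int contentEnd = endTagBegin >= 0 ? endTagBegin : elementEnd;
        return source.substring(startTagEnd, contentEnd);
    }

    public static void main(String[] args) {
        String explicit = "<p>This is a sample paragraph.</p>";
        System.out.println(content(explicit, 3, explicit.indexOf("</p>"), explicit.length()));

        String implicit = "<p>First<p>Second";
        // The first paragraph ends where the next <p> start tag begins.
        System.out.println(content(implicit, 3, -1, implicit.indexOf("<p>", 3)));
    }
}
```

For a single tag element the two offsets coincide, so the result is an empty string, mirroring the zero-length segment returned since version 2.0.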

getContentEnd
int getContentEnd()

getDebugInfo
public String getDebugInfo()

getDepth

public int getDepth()

Returns the nesting depth of this element in the document element hierarchy.

The Source.fullSequentialParse method should be called after construction of the Source object if this method is to be used.

A top-level element has a nesting depth of 0.

An element formed from a server tag always has a nesting depth of 0, regardless of whether it is nested inside a normal element.

See the Source.getChildElements method for more details.
Returns:
the nesting depth of this element in the document element hierarchy.
See Also: Element.getParentElement()
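One way to picture the nesting depth is as the number of parent links between an element and the top level. The following stand-alone sketch assumes exactly that relationship; DepthSketch and Node are hypothetical names and do not reflect how the library stores or computes the value.

```java
// Hypothetical toy model relating depth to the parent chain; not the Jericho implementation.
class DepthSketch {
    /** A minimal element stand-in holding only a name and a parent reference. */
    static final class Node {
        final String name;
        final Node parent; // null for a top-level element

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
        }
    }

    /** Counts the ancestors of the node; a top-level node has depth 0. */
    static int depth(Node node) {
        int depth = 0;
        for (Node p = node.parent; p != null; p = p.parent) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        Node html = new Node("html", null);
        Node body = new Node("body", html);
        Node p = new Node("p", body);
        System.out.println(depth(html) + " " + depth(body) + " " + depth(p));
    }
}
```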

getEndTag
public EndTag getEndTag()
	Returns the end tag of the element. If the element has no end tag this method returns `null`.
	Returns:
	the end tag of the element, or `null` if the element has no end tag.

getFormControl
public FormControl getFormControl()
	Returns the FormControl defined by this element.
	Returns:
	the FormControl defined by this element, or `null` if it is not a control.

getName

public String getName()

Returns the name of the start tag of this element, always in lower case.

This is equivalent to Element.getStartTag().getName().

See the Tag.getName method for more information.
Returns:
the name of the start tag of this element, always in lower case.

getParentElement

public Element getParentElement()

Returns the parent of this element in the document element hierarchy.

The Source.fullSequentialParse method should be called after construction of the Source object if this method is to be used.

This method returns null for a top-level element, as well as any element formed from a server tag, regardless of whether it is nested inside a normal element.

See the Source.getChildElements method for more details.
Returns:
the parent of this element in the document element hierarchy, or null if this element is a top-level element.
See Also: Element.getChildElements()

getStartTag
public StartTag getStartTag()
	Returns the start tag of the element.
	Returns:
	the start tag of the element.

isEmpty

public boolean isEmpty()

Indicates whether this element has zero-length content.

This is equivalent to Element.getContent().length() == 0.

Note that this is a broader definition than both the HTML definition of an empty element, which is only those elements whose end tag is forbidden, and the XML definition of an empty element, which is "either a start-tag immediately followed by an end-tag, or an empty-element tag". The other possibility covered by this property is the case of an HTML element with an optional end tag that is immediately followed by another tag that implicitly terminates the element.
Returns:
true if this element has zero-length content, otherwise false.
See Also: Element.isEmptyElementTag()

isEmptyElementTag

public boolean isEmptyElementTag()

Indicates whether this element is an empty-element tag.

It is signified by an element with the characters "/>" at the end of the start tag.

This is equivalent to Element.isEmpty() && Element.getStartTag().isEmptyElementTag().

The StartTag.isEmptyElementTag property only checks whether the start tag is syntactically an empty-element tag, whereas this property also makes sure the element is in fact empty.

A syntactical empty-element tag that is not actually empty can occur if the end tag of an HTML element is either optional or required, but the start tag is erroneously terminated with the characters "/>" in the source document. All major browsers ignore the syntactical hint of an empty element in this case, even in an XHTML document, so this parser does the same.
Returns:
true if this element is an empty-element tag, otherwise false.
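The difference between the syntactic check and this property can be shown with a small stand-alone sketch. It is illustrative only: EmptyElementTagSketch and both method names are hypothetical, and the element is reduced to two strings, the start tag text and the content text.

```java
// Hypothetical toy model of the two checks; not the Jericho implementation.
class EmptyElementTagSketch {
    /** Purely syntactic test: does the start tag end with "/>"? */
    static boolean startTagIsSyntacticallyEmpty(String startTagText) {
        return startTagText.endsWith("/>");
    }

    /** Element-level test: syntactically empty AND zero-length content. */
    static boolean isEmptyElementTag(String startTagText, String content) {
        return content.isEmpty() && startTagIsSyntacticallyEmpty(startTagText);
    }

    public static void main(String[] args) {
        // A genuine empty-element tag.
        System.out.println(isEmptyElementTag("<br/>", ""));
        // Erroneous "<p/>" followed by text: the syntactic hint is ignored.
        System.out.println(isEmptyElementTag("<p/>", "This is a sample paragraph."));
    }
}
```

The second call models the erroneous case described above, where the paragraph still has content despite the "/>" in its start tag.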

Fields inherited from au.id.jericho.lib.html.Segment

final int begin
List childElements
final int end
final Source source

Methods inherited from au.id.jericho.lib.html.Segment

final static StringBuffer appendCollapseWhiteSpace(StringBuffer sb, CharSequence text)
final public char charAt(int index)
public int compareTo(Object o)
final public boolean encloses(Segment segment)
final public boolean encloses(int pos)
final public boolean equals(Object object)
public String extractText()
public String extractText(boolean includeAttributes)
public List findAllCharacterReferences()
public List findAllElements()
public List findAllElements(String name)
public List findAllElements(StartTagType startTagType)
public List findAllElements(String attributeName, String value, boolean valueCaseSensitive)
public List findAllStartTags()
public List findAllStartTags(String name)
public List findAllStartTags(String attributeName, String value, boolean valueCaseSensitive)
public List findAllTags()
public List findAllTags(TagType tagType)
public List findFormControls()
public FormFields findFormFields()
final public int getBegin()
public List getChildElements()
public String getDebugInfo()
final public int getEnd()
public Renderer getRenderer()
public TextExtractor getTextExtractor()
public int hashCode()
public void ignoreWhenParsing()
final public boolean isWhiteSpace()
final public static boolean isWhiteSpace(char ch)
final public int length()
public Attributes parseAttributes()
final public CharSequence subSequence(int beginIndex, int endIndex)
public String toString()

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
