| java.lang.Object au.id.jericho.lib.html.SourceFormatter
SourceFormatter | final public class SourceFormatter implements CharStreamSource | | Formats HTML source by laying out each non-inline-level element on a new line with an appropriate indent.
Any indentation present in the original source text is removed.
Use one of the following methods to obtain the output: SourceFormatter.writeTo(Writer) or SourceFormatter.toString().
The output text is functionally equivalent to the original source and should be rendered identically unless otherwise specified below.
The following points describe the process in general terms.
Any aspect of the algorithm not specifically mentioned here is subject to change without notice in future versions.
- Every element that is not an inline-level element appears on a new line
with an indent corresponding to its depth in the document element hierarchy.
- The indent is formed by writing n repetitions of the string specified in the IndentString property (see SourceFormatter.setIndentString(String)),
where n is the depth of the indentation.
- The content of an indented element starts on a new line and is indented at a depth one greater than that of the element,
with the end tag appearing on a new line at the same depth as the start tag.
If the content contains only text and inline-level elements,
it may continue on the same line as the start tag. Additionally, if the output content contains no new lines, the end tag may also continue on the same line.
- The content of preformatted elements such as PRE and TEXTAREA is not indented,
nor is its white space modified in any way.
- Only
and
elements are indented.
All others are treated as inline-level elements.
- White space and indentation inside HTML
,
, or any
is preserved,
but with the indentation of new lines starting at a depth one greater than that of the surrounding text.
- White space and indentation inside SCRIPT elements are preserved,
but with the indentation of new lines starting at a depth one greater than that of the
SCRIPT element.
- If the TidyTags property (see SourceFormatter.setTidyTags(boolean)) is set to true,
every tag in the document is replaced with the output from its Tag.tidy method.
If this property is set to false, the tag from the original text is used, including all white space,
but with any new lines indented at a depth one greater than that of the element.
- If the CollapseWhiteSpace property (see SourceFormatter.setCollapseWhiteSpace(boolean))
is set to true, every string of one or more white space characters
located outside of a tag is replaced with a single space in the output.
White space located adjacent to a non-inline-level element tag (except
) may be removed.
- If the IndentAllElements property (see SourceFormatter.setIndentAllElements(boolean))
is set to true, every element appears indented on a new line, including inline-level elements.
This generates output that is a good representation of the actual document element hierarchy,
but is very likely to introduce white space that compromises the functional equivalency of the document.
- The NewLine property (see SourceFormatter.setNewLine(String)) specifies the character sequence
to use for each newline in the output document.
- If the source document contains
, the functional equivalency of the output document may be compromised.
Formatting an entire
Source object performs a
automatically.
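The general layout rules above can be illustrated with a deliberately simplified model. This is a minimal sketch, not Jericho's algorithm: it assumes a hypothetical Node tree in which every node is already flagged as inline-level or not, assumes text contains no line breaks, and ignores preformatted elements, comments, SCRIPT, white space collapsing and tag tidying. Each non-inline element starts on a new line indented by depth repetitions of the indent string; content made only of text and inline-level elements stays on the start tag's line; otherwise the content goes one level deeper and the end tag returns to the start tag's depth.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of the general layout rules.
 * Node is an illustrative type, NOT part of the Jericho API.
 */
final class LayoutSketch {
    static final class Node {
        final String name;          // element name, or null for a text node
        final String text;          // the text when this is a text node
        final boolean inline;       // text nodes and inline-level elements are inline
        final List<Node> children;

        private Node(String name, String text, boolean inline, List<Node> children) {
            this.name = name;
            this.text = text;
            this.inline = inline;
            this.children = children;
        }

        static Node text(String t) { return new Node(null, t, true, new ArrayList<>()); }
        static Node inline(String name, Node... kids) { return new Node(name, null, true, Arrays.asList(kids)); }
        static Node block(String name, Node... kids) { return new Node(name, null, false, Arrays.asList(kids)); }
    }

    private final String indentString;
    private final String newLine;

    LayoutSketch(String indentString, String newLine) {
        this.indentString = indentString;
        this.newLine = newLine;
    }

    /** Formats a non-inline root element starting at depth 0. */
    String format(Node root) {
        StringBuilder out = new StringBuilder();
        writeBlock(root, 0, out);
        return out.toString();
    }

    // The indent is n repetitions of the indent string, where n is the depth.
    private String indent(int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append(indentString);
        return sb.toString();
    }

    // Inline content is written as-is on the current line (blocks nested in inline elements are not handled).
    private void writeInline(Node n, StringBuilder out) {
        if (n.name == null) {
            out.append(n.text);
            return;
        }
        out.append('<').append(n.name).append('>');
        for (Node c : n.children) writeInline(c, out);
        out.append("</").append(n.name).append('>');
    }

    // A non-inline element starts on a new line at its own depth.
    private void writeBlock(Node n, int depth, StringBuilder out) {
        out.append(indent(depth)).append('<').append(n.name).append('>');
        boolean allInline = true;
        for (Node c : n.children) allInline &= c.inline;
        if (allInline) {
            // Only text and inline-level content: it stays on the start tag's line, and so does the end tag.
            for (Node c : n.children) writeInline(c, out);
            out.append("</").append(n.name).append('>').append(newLine);
            return;
        }
        out.append(newLine);
        boolean lineOpen = false; // true while a run of inline content is being written
        for (Node c : n.children) {
            if (c.inline) {
                if (!lineOpen) {
                    out.append(indent(depth + 1)); // content is one level deeper than the element
                    lineOpen = true;
                }
                writeInline(c, out);
            } else {
                if (lineOpen) {
                    out.append(newLine);
                    lineOpen = false;
                }
                writeBlock(c, depth + 1, out);
            }
        }
        if (lineOpen) out.append(newLine);
        // The end tag goes on a new line at the same depth as the start tag.
        out.append(indent(depth)).append("</").append(n.name).append('>').append(newLine);
    }
}
```

With a two-space indent string and "\n" as the new line, a div containing a p (with text and a b element) and a ul of two li elements formats as the div at depth 0, the p and ul at depth 1 with the p kept on one line, and each li at depth 2.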
|
Method Summary | |
public boolean | getCollapseWhiteSpace() | Indicates whether white space in the text between the tags is to be collapsed.
public long | getEstimatedMaximumOutputLength() |
public boolean | getIndentAllElements() | Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents.
public String | getIndentString() | Returns the string to be used for indentation.
public String | getNewLine() | Returns the string to be used to represent a newline in the output.
public boolean | getTidyTags() | Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy method.
public SourceFormatter | setCollapseWhiteSpace(boolean collapseWhiteSpace) | Sets whether white space in the text between the tags is to be collapsed.
public SourceFormatter | setIndentAllElements(boolean indentAllElements) | Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents.
public SourceFormatter | setIndentString(String indentString) | Sets the string to be used for indentation.
public SourceFormatter | setNewLine(String newLine) | Sets the string to be used to represent a newline in the output.
public SourceFormatter | setTidyTags(boolean tidyTags) | Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy method.
public String | toString() |
public void | writeTo(Writer writer) |
SourceFormatter | public SourceFormatter(Segment segment) | | Constructs a new SourceFormatter based on the specified Segment.
Parameters: segment - the segment containing the HTML to be formatted. See Also: Source.getSourceFormatter |
getEstimatedMaximumOutputLength | public long getEstimatedMaximumOutputLength() | | |
getIndentAllElements | public boolean getIndentAllElements() | | Indicates whether all elements are to be indented, including inline-level elements and those with preformatted contents.
See the SourceFormatter.setIndentAllElements(boolean) method for a full description of this property.
Returns: true if all elements are to be indented, otherwise false. |
getTidyTags | public boolean getTidyTags() | | Indicates whether the original text of each tag is to be replaced with the output from its Tag.tidy method.
See the SourceFormatter.setTidyTags(boolean) method for a full description of this property.
Returns: true if the original text of each tag is to be replaced with the output from its Tag.tidy method, otherwise false. |
setCollapseWhiteSpace | public SourceFormatter setCollapseWhiteSpace(boolean collapseWhiteSpace) | | Sets whether white space in the text between the tags is to be collapsed.
The default value is false.
If this property is set to true, every string of one or more white space characters
located outside of a tag is replaced with a single space in the output.
White space located adjacent to a non-inline-level element tag (except
) may be removed.
Parameters: collapseWhiteSpace - specifies whether white space in the text between the tags is to be collapsed. Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement. See Also: SourceFormatter.getCollapseWhiteSpace() |
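The collapsing rule can be sketched as a single pass over the text. This is a minimal illustration, not Jericho's implementation, under two stated assumptions: a tag is naively taken to run from '<' to the next '>' (so '>' inside attribute values, comments and server tags are not handled), and the white space set is space, tab, line feed, form feed and carriage return. The optional removal of white space adjacent to non-inline-level element tags is not attempted. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of CollapseWhiteSpace: every run of one or more white
 * space characters located outside a tag becomes a single space.
 * Naive assumption: a tag is anything from '<' to the next '>'.
 */
final class WhiteSpaceCollapseSketch {
    // Assumed white space set: space, tab, line feed, form feed, carriage return.
    static boolean isWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
    }

    static String collapse(String html) {
        StringBuilder out = new StringBuilder(html.length());
        boolean inTag = false;
        boolean pendingSpace = false; // true while inside a run of white space outside a tag
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (!inTag && isWhiteSpace(c)) {
                pendingSpace = true; // defer: the whole run is emitted as one space
                continue;
            }
            if (pendingSpace) {
                out.append(' ');
                pendingSpace = false;
            }
            if (c == '<') inTag = true;
            else if (c == '>') inTag = false;
            out.append(c); // characters inside tags are copied unchanged
        }
        if (pendingSpace) out.append(' '); // trailing run of white space
        return out.toString();
    }
}
```

Note that white space inside the tag itself is left alone, which matches the statement that only white space located outside of a tag is collapsed.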
setIndentAllElements | public SourceFormatter setIndentAllElements(boolean indentAllElements) | | Sets whether all elements are to be indented, including inline-level elements and those with preformatted contents.
The default value is false.
If this property is set to true, every element appears indented on a new line, including inline-level elements.
This generates output that is a good representation of the actual document element hierarchy,
but is very likely to introduce white space that compromises the functional equivalency of the document.
Parameters: indentAllElements - specifies whether all elements are to be indented. Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement. See Also: SourceFormatter.getIndentAllElements() |
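The effect of this property on the per-element layout decision can be summarised as two predicates. This is a hypothetical sketch derived only from the description above; the method names are illustrative and are not Jericho API.

```java
/**
 * Hypothetical sketch of the layout decisions controlled by IndentAllElements.
 * Method names are illustrative only, not Jericho API.
 */
final class IndentAllElementsSketch {
    /** Whether an element is placed on its own line, indented to its depth. */
    static boolean startsOnNewLine(boolean inlineLevel, boolean indentAllElements) {
        // By default only non-inline-level elements get their own line.
        return indentAllElements || !inlineLevel;
    }

    /** Whether an element's content is copied verbatim, with no indentation or white space changes. */
    static boolean contentCopiedVerbatim(boolean preformatted, boolean indentAllElements) {
        // Preformatted content (e.g. PRE, TEXTAREA) is left untouched unless IndentAllElements is true.
        return preformatted && !indentAllElements;
    }
}
```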
setIndentString | public SourceFormatter setIndentString(String indentString) | | Sets the string to be used for indentation.
The default value is a string containing a single tab character (U+0009).
The most commonly used indent strings are "\t" (single tab), " " (single space), "  " (2 spaces), and "    " (4 spaces).
Parameters: indentString - the string to be used for indentation, must not be null. Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement. See Also: SourceFormatter.getIndentString() |
setNewLine | public SourceFormatter setNewLine(String newLine) | | Sets the string to be used to represent a newline in the output.
The default is to use the same new line string as is used in the source document, which is determined via the Source.getNewLine method.
If the source document does not contain any new lines, a "best guess" is made by either taking the new line string of a previously parsed document,
or using the value from Config.NewLine.
Specifying a null argument resets the property to its default value, which is to use the same new line string as is used in the source document.
Parameters: newLine - the string to be used to represent a newline in the output, may be null. Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement. See Also: SourceFormatter.getNewLine() |
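The default resolution described above can be sketched as a small function. Everything here is hypothetical and not Jericho API: detection assumes the first line break found in the text ("\r\n", "\r" or "\n") is representative, the previously parsed document's new line is assumed to take precedence over Config.NewLine, and the Config.NewLine value is passed in as a parameter rather than read from the library.

```java
/**
 * Hypothetical sketch of how the effective NewLine value could be resolved.
 * None of these names are Jericho API.
 */
final class NewLineResolutionSketch {
    /** Returns the first new line sequence found in the text ("\r\n", "\r" or "\n"), or null if none. */
    static String detectNewLine(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') return "\n";
            if (c == '\r') {
                return (i + 1 < text.length() && text.charAt(i + 1) == '\n') ? "\r\n" : "\r";
            }
        }
        return null;
    }

    /**
     * explicitNewLine: the value passed to setNewLine, or null for the default.
     * previousNewLine: new line string of a previously parsed document, or null.
     * configNewLine:   stands in for Config.NewLine (assumed precedence: last).
     */
    static String resolve(String explicitNewLine, CharSequence source,
                          String previousNewLine, String configNewLine) {
        if (explicitNewLine != null) return explicitNewLine;   // explicit setting wins
        String detected = detectNewLine(source);
        if (detected != null) return detected;                 // same as the source document
        if (previousNewLine != null) return previousNewLine;   // best guess 1
        return configNewLine;                                   // best guess 2
    }
}
```

Passing null as explicitNewLine models the documented reset to the default behaviour.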
setTidyTags | public SourceFormatter setTidyTags(boolean tidyTags) | | Sets whether the original text of each tag is to be replaced with the output from its Tag.tidy method.
The default value is false.
If this property is set to false, the tag from the original text is used, including all white space,
but with any new lines indented at a depth one greater than that of the element.
Parameters: tidyTags - specifies whether the original text of each tag is to be replaced with the output from its Tag.tidy method. Returns: this SourceFormatter instance, allowing multiple property setting methods to be chained in a single statement. See Also: SourceFormatter.getTidyTags() |
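When TidyTags is false, the original tag text is kept but each of its new lines is re-indented one level deeper than the element; the class description states the same depth-plus-one rule for SCRIPT content. A minimal sketch, assuming that line breaks inside the tag are "\r\n", "\r" or "\n", that each is replaced by the output NewLine string, and that any white space already following a break is kept after the new indent. The class and method names are hypothetical.

```java
/**
 * Hypothetical sketch of the TidyTags=false behaviour: the original tag text
 * is kept, but every line break in it is followed by an indent one level
 * deeper than the element. Not Jericho API.
 */
final class TagReindentSketch {
    static String reindent(String tagText, int elementDepth, String indentString, String newLine) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i <= elementDepth; i++) indent.append(indentString); // depth + 1 repetitions
        StringBuilder out = new StringBuilder(tagText.length());
        for (int i = 0; i < tagText.length(); i++) {
            char c = tagText.charAt(i);
            if (c == '\r' || c == '\n') {
                // Treat CRLF as a single line break.
                if (c == '\r' && i + 1 < tagText.length() && tagText.charAt(i + 1) == '\n') i++;
                out.append(newLine).append(indent);
            } else {
                out.append(c); // original characters, including existing white space, are kept
            }
        }
        return out.toString();
    }
}
```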