| java.lang.Object au.id.jericho.lib.html.Renderer
Renderer | public final class Renderer implements CharStreamSource | | Performs a simple rendering of HTML markup into text.
This provides a human readable version of the segment content that is modelled on the way
Mozilla Thunderbird and other email clients provide an automatic conversion of
HTML content to text in their alternative MIME encoding of emails.
The output using default settings complies with the "text/plain; format=flowed" (DelSp=No) protocol described in
RFC3676.
Many properties are available to customise the output, possibly the most significant of which is
MaxLineLength (see Renderer.setMaxLineLength(int)).
See the individual property descriptions for details.
Use one of the following methods to obtain the output: writeTo(Writer) or toString().
The rendering of some constructs, especially tables, is very rudimentary.
No attempt is made to render nested tables properly, except to ensure that all of the text content is included in the output.
Rendering an entire Source object performs a full sequential parse automatically.
Any aspect of the algorithm not specifically mentioned here is subject to change without notice in future versions.
To extract pure text without any rendering of the markup, use the
TextExtractor class instead.
|
Constructor Summary | |
public | Renderer(Segment segment) Constructs a new Renderer based on the specified
Segment . |
Method Summary | |
public int | getBlockIndentSize() Returns the size of the indent to be used for anything other than
HTMLElementName.LI LI elements. | public boolean | getConvertNonBreakingSpaces() Indicates whether non-breaking space (
CharacterEntityReference._nbsp ) character entity references are converted to spaces. | public boolean | getDecorateFontStyles() Indicates whether decoration characters are to be included around the content of some
font style elements and
phrase elements. | public long | getEstimatedMaximumOutputLength() | public char[] | getListBullets() Returns the bullet characters to use for list items inside
HTMLElementName.UL UL elements. | public int | getListIndentSize() Returns the size of the indent to be used for
HTMLElementName.LI LI elements. | public int | getMaxLineLength() Returns the column at which lines are to be wrapped. | public String | getNewLine() Returns the string to be used to represent a newline in the output. | public String | getTableCellSeparator() Returns the string that is to separate table cells. | public Renderer | setBlockIndentSize(int blockIndentSize) Sets the size of the indent to be used for anything other than
HTMLElementName.LI LI elements.
At present this applies to
HTMLElementName.BLOCKQUOTE BLOCKQUOTE and
HTMLElementName.DD DD elements.
The default value is 4 .
Parameters: blockIndentSize - the size of the indent. | public Renderer | setConvertNonBreakingSpaces(boolean convertNonBreakingSpaces) Sets whether non-breaking space (
CharacterEntityReference._nbsp ) character entity references are converted to spaces.
The default value is true .
Parameters: convertNonBreakingSpaces - specifies whether non-breaking space (CharacterEntityReference._nbsp ) character entity references are converted to spaces. | public Renderer | setDecorateFontStyles(boolean decorateFontStyles) Sets whether decoration characters are to be included around the content of some
font style elements and
phrase elements.
The default value is false .
Below is a table summarising the decorated elements.
Parameters: decorateFontStyles - specifies whether decoration characters are to be included around the content of some font style elements. | public Renderer | setListBullets(char[] listBullets) Sets the bullet characters to use for list items inside
HTMLElementName.UL UL elements.
The values in the default array are * , o , + and # .
If the nesting of rendered lists goes deeper than the length of this array, the bullet characters start repeating from the first in the array.
WARNING: If any of the characters in the default array are modified, this will affect all other instances of this class using the default array.
Parameters: listBullets - an array of characters to be used as bullets, must have at least one entry. | public Renderer | setListIndentSize(int listIndentSize) Sets the size of the indent to be used for
HTMLElementName.LI LI elements.
The default value is 6 .
This applies to
HTMLElementName.LI LI elements inside both
HTMLElementName.UL UL and
HTMLElementName.OL OL elements.
The bullet or number of the list item is included as part of the indent.
Parameters: listIndentSize - the size of the indent. | public Renderer | setMaxLineLength(int maxLineLength) Sets the column at which lines are to be wrapped.
Lines that would otherwise exceed this length are wrapped onto a new line at a word boundary.
A line may still exceed this length if it consists of a single word, where the length of the word plus the line indent exceeds the maximum length.
In this case the line is wrapped immediately after the end of the word.
The default value is 76 , which reflects the maximum line length for sending
email data specified in RFC2049 section 3.5.
Parameters: maxLineLength - the column at which lines are to be wrapped. | public Renderer | setNewLine(String newLine) Sets the string to be used to represent a newline in the output.
The default value is "\r\n" (CR+LF) regardless of the platform on which the library is running.
This is so that the default configuration produces valid
MIME text/plain output, which mandates the use of CR+LF for line breaks.
Specifying a null argument causes the output to use the same newline string as is used in the source document, which is
determined via the
Source.getNewLine method.
If the source document does not contain any new lines, a "best guess" is made by either taking the new line string of a previously parsed document,
or using the value from
Config.NewLine .
Parameters: newLine - the string to be used to represent a newline in the output, may be null . | public Renderer | setTableCellSeparator(String tableCellSeparator) Sets the string that is to separate table cells.
The default value is " \t" (a space followed by a tab).
Parameters: tableCellSeparator - the string that is to separate table cells. | public String | toString() | public void | writeTo(Writer writer) |
Renderer | public Renderer(Segment segment) | | Constructs a new Renderer based on the specified Segment.
Parameters: segment - the segment containing the HTML to be rendered. See Also: Segment.getRenderer |
getDecorateFontStyles | public boolean getDecorateFontStyles() | | Indicates whether decoration characters are to be included around the content of some
font style elements and phrase elements.
See the Renderer.setDecorateFontStyles(boolean) method for a full description of this property.
Returns: true if decoration characters are to be included around the content of some font style elements, otherwise false. |
getEstimatedMaximumOutputLength | public long getEstimatedMaximumOutputLength() | | |
getMaxLineLength | public int getMaxLineLength() | | Returns the column at which lines are to be wrapped.
See the Renderer.setMaxLineLength(int) method for a full description of this property.
Returns: the column at which lines are to be wrapped. |
getTableCellSeparator | public String getTableCellSeparator() | | Returns the string that is to separate table cells.
See the Renderer.setTableCellSeparator(String) method for a full description of this property.
Returns: the string that is to separate table cells. |
setConvertNonBreakingSpaces | public Renderer setConvertNonBreakingSpaces(boolean convertNonBreakingSpaces) | | Sets whether non-breaking space (CharacterEntityReference._nbsp) character entity references are converted to spaces.
The default value is true.
Parameters: convertNonBreakingSpaces - specifies whether non-breaking space (CharacterEntityReference._nbsp) character entity references are converted to spaces.
Returns: this Renderer instance, allowing multiple property setting methods to be chained in a single statement. See Also: Renderer.getConvertNonBreakingSpaces() |
setDecorateFontStyles | public Renderer setDecorateFontStyles(boolean decorateFontStyles) | | Sets whether decoration characters are to be included around the content of some
font style elements and phrase elements.
The default value is false.
Below is a table summarising the decorated elements.
Parameters: decorateFontStyles - specifies whether decoration characters are to be included around the content of some font style elements.
Returns: this Renderer instance, allowing multiple property setting methods to be chained in a single statement. See Also: Renderer.getDecorateFontStyles() |
setListBullets | public Renderer setListBullets(char[] listBullets) | | Sets the bullet characters to use for list items inside
HTMLElementName.UL UL elements.
The values in the default array are *, o, + and #.
If the nesting of rendered lists goes deeper than the length of this array, the bullet characters start repeating from the first in the array.
WARNING: If any of the characters in the default array are modified, this will affect all other instances of this class using the default array.
Parameters: listBullets - an array of characters to be used as bullets, must have at least one entry.
Returns: this Renderer instance, allowing multiple property setting methods to be chained in a single statement. See Also: Renderer.getListBullets() |
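The cycling rule described for setListBullets can be illustrated with a minimal, self-contained sketch. BulletSketch and bulletFor are hypothetical names used only for illustration; this is not the library's implementation, and it assumes a zero-based nesting depth.

```java
final class BulletSketch {
    // Default bullets as listed in the description above.
    static final char[] DEFAULT_BULLETS = {'*', 'o', '+', '#'};

    /**
     * Returns the bullet for a list nested at the given zero-based depth.
     * When depth reaches the end of the array, selection wraps to the start.
     */
    static char bulletFor(char[] bullets, int depth) {
        if (bullets == null || bullets.length == 0) {
            throw new IllegalArgumentException("at least one bullet is required");
        }
        return bullets[depth % bullets.length];
    }

    /**
     * Returns a private copy of the defaults, so that editing the copy
     * cannot affect other users of the shared default array.
     */
    static char[] copyOfDefaults() {
        return DEFAULT_BULLETS.clone();
    }
}
```

With the default array, depths 0 to 3 select the four bullets in order and depth 4 wraps back to the first. Working on a clone is one way to avoid the shared-array side effect mentioned in the warning.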
setMaxLineLength | public Renderer setMaxLineLength(int maxLineLength) | | Sets the column at which lines are to be wrapped.
Lines that would otherwise exceed this length are wrapped onto a new line at a word boundary.
A line may still exceed this length if it consists of a single word, where the length of the word plus the line indent exceeds the maximum length.
In this case the line is wrapped immediately after the end of the word.
The default value is 76, which reflects the maximum line length for sending
email data specified in RFC2049 section 3.5.
Parameters: maxLineLength - the column at which lines are to be wrapped.
Returns: this Renderer instance, allowing multiple property setting methods to be chained in a single statement. See Also: Renderer.getMaxLineLength() |
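The wrapping behaviour described for setMaxLineLength can be sketched roughly as follows. WrapSketch and wrap are hypothetical names, and the sketch assumes that a line of exactly maxLineLength characters is still allowed and that the indent counts towards the length; the library's actual algorithm may differ in these details.

```java
import java.util.ArrayList;
import java.util.List;

final class WrapSketch {
    /**
     * Wraps text at word boundaries so that indent plus content fits within
     * maxLineLength where possible. A single word that is too long stays
     * whole on its own line, so that line may exceed the limit.
     */
    static List<String> wrap(String text, int maxLineLength, int indent) {
        StringBuilder padBuilder = new StringBuilder();
        for (int i = 0; i < indent; i++) {
            padBuilder.append(' ');
        }
        String pad = padBuilder.toString();

        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder(pad);
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only occurs for blank input
            }
            boolean lineHasWord = line.length() > indent;
            // The +1 accounts for the space that would precede the word.
            if (lineHasWord && line.length() + 1 + word.length() > maxLineLength) {
                lines.add(line.toString());
                line = new StringBuilder(pad);
                lineHasWord = false;
            }
            if (lineHasWord) {
                line.append(' ');
            }
            line.append(word);
        }
        if (line.length() > indent) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

Calling wrap with a long single word and a small limit shows the exception noted above: the word is emitted on its own over-length line and wrapping resumes immediately after it.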
setNewLine | public Renderer setNewLine(String newLine) | | Sets the string to be used to represent a newline in the output.
The default value is "\r\n" (CR+LF) regardless of the platform on which the library is running.
This is so that the default configuration produces valid
MIME text/plain output, which mandates the use of CR+LF for line breaks.
Specifying a null argument causes the output to use the same newline string as is used in the source document, which is
determined via the Source.getNewLine method.
If the source document does not contain any new lines, a "best guess" is made by either taking the new line string of a previously parsed document,
or using the value from Config.NewLine.
Parameters: newLine - the string to be used to represent a newline in the output, may be null.
Returns: this Renderer instance, allowing multiple property setting methods to be chained in a single statement. See Also: Renderer.getNewLine() |
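The fallback order described for a null newLine argument can be illustrated with a small sketch. NewLineSketch, detect and resolve are hypothetical names; the fallback parameter stands in for the "best guess" value, and the detection logic (first line break found in the text) is an assumption, not the library's documented algorithm.

```java
final class NewLineSketch {
    /** Returns the first line break found in the text, or null if there is none. */
    static String detect(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                boolean followedByLf = i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? "\r\n" : "\r";
            }
        }
        return null;
    }

    /**
     * Chooses the output newline: an explicit setting wins, otherwise the
     * newline detected in the source text, otherwise the supplied fallback.
     */
    static String resolve(String configured, CharSequence sourceText, String fallback) {
        if (configured != null) {
            return configured;
        }
        String detected = detect(sourceText);
        return detected != null ? detected : fallback;
    }
}
```

Keeping detection separate from resolution makes the three-step precedence (explicit value, source document, fallback) easy to see.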
setTableCellSeparator | public Renderer setTableCellSeparator(String tableCellSeparator) | | Sets the string that is to separate table cells.
The default value is " \t" (a space followed by a tab).
Parameters: tableCellSeparator - the string that is to separate table cells.
Returns: this Renderer instance, allowing multiple property setting methods to be chained in a single statement. See Also: Renderer.getTableCellSeparator() |
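Each setter above returns the Renderer itself so that calls can be chained. The pattern, together with the effect of the table cell separator, can be sketched with a stand-in class. RenderOptionsSketch and renderRow are hypothetical and exist only for this illustration; the defaults are copied from the descriptions above, and joining cells with the separator is an assumption about how a table row might be rendered.

```java
final class RenderOptionsSketch {
    // Defaults taken from the property descriptions above.
    private int maxLineLength = 76;
    private String newLine = "\r\n";
    private String tableCellSeparator = " \t";

    // Each setter returns this instance so that calls can be chained.
    RenderOptionsSketch setMaxLineLength(int maxLineLength) {
        this.maxLineLength = maxLineLength;
        return this;
    }

    RenderOptionsSketch setNewLine(String newLine) {
        this.newLine = newLine;
        return this;
    }

    RenderOptionsSketch setTableCellSeparator(String tableCellSeparator) {
        this.tableCellSeparator = tableCellSeparator;
        return this;
    }

    int getMaxLineLength() {
        return maxLineLength;
    }

    /** Joins the cells of one table row with the separator and ends the line. */
    String renderRow(String... cells) {
        return String.join(tableCellSeparator, cells) + newLine;
    }
}
```

A chained call such as new RenderOptionsSketch().setMaxLineLength(72).setNewLine("\n").setTableCellSeparator(" | ") configures all three properties in one statement.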