Java Doc for Renderer.java (package au.id.jericho.lib.html, Jericho HTML Parser)



java.lang.Object
   au.id.jericho.lib.html.Renderer

Renderer
public final class Renderer implements CharStreamSource
Performs a simple rendering of HTML markup into text.

This provides a human readable version of the segment content that is modelled on the way Mozilla Thunderbird and other email clients provide an automatic conversion of HTML content to text in their alternative MIME encoding of emails.

The output using default settings complies with the "text/plain; format=flowed" (DelSp=No) protocol described in RFC3676.

Many properties are available to customise the output, possibly the most significant of which is MaxLineLength (see Renderer.setMaxLineLength(int)). See the individual property descriptions for details.

Use one of the following methods to obtain the output: writeTo(Writer) or toString().

The rendering of some constructs, especially tables, is very rudimentary. No attempt is made to render nested tables properly, except to ensure that all of the text content is included in the output.

Rendering an entire Source object performs a full sequential parse automatically.

Any aspect of the algorithm not specifically mentioned here is subject to change without notice in future versions.

To extract pure text without any rendering of the markup, use the TextExtractor class instead.




Constructor Summary
public Renderer(Segment segment)
     Constructs a new Renderer based on the specified Segment.

Method Summary
public int getBlockIndentSize()
     Returns the size of the indent to be used for anything other than HTMLElementName.LI LI elements.
public boolean getConvertNonBreakingSpaces()
     Indicates whether non-breaking space (&nbsp;, CharacterEntityReference._nbsp) character entity references are converted to spaces.
public boolean getDecorateFontStyles()
     Indicates whether decoration characters are to be included around the content of some font style elements and phrase elements.
public long getEstimatedMaximumOutputLength()

public char[] getListBullets()
     Returns the bullet characters to use for list items inside HTMLElementName.UL UL elements.
public int getListIndentSize()
     Returns the size of the indent to be used for HTMLElementName.LI LI elements.
public int getMaxLineLength()
     Returns the column at which lines are to be wrapped.
public String getNewLine()
     Returns the string to be used to represent a newline in the output.
public String getTableCellSeparator()
     Returns the string that is to separate table cells.
public Renderer setBlockIndentSize(int blockIndentSize)
     Sets the size of the indent to be used for anything other than HTMLElementName.LI LI elements.

At present this applies to HTMLElementName.BLOCKQUOTE BLOCKQUOTE and HTMLElementName.DD DD elements.

The default value is 4.
Parameters:
  blockIndentSize - the size of the indent.

public Renderer setConvertNonBreakingSpaces(boolean convertNonBreakingSpaces)
     Sets whether non-breaking space (&nbsp;, CharacterEntityReference._nbsp) character entity references are converted to spaces.

The default value is true.
Parameters:
  convertNonBreakingSpaces - specifies whether non-breaking space (&nbsp;, CharacterEntityReference._nbsp) character entity references are converted to spaces.

public Renderer setDecorateFontStyles(boolean decorateFontStyles)
     Sets whether decoration characters are to be included around the content of some font style elements and phrase elements.

The default value is false.

Below is a table summarising the decorated elements.

Elements                                                Character  Example Output
HTMLElementName.B B and HTMLElementName.STRONG STRONG   *          *bold text*
HTMLElementName.I I and HTMLElementName.EM EM           /          /italic text/
HTMLElementName.U U                                     _          _underlined text_
HTMLElementName.CODE CODE                               |          |code|

Parameters:
  decorateFontStyles - specifies whether decoration characters are to be included around the content of some font style elements.

public Renderer setListBullets(char[] listBullets)
     Sets the bullet characters to use for list items inside HTMLElementName.UL UL elements.

The values in the default array are *, o, + and #.

If the nesting of rendered lists goes deeper than the length of this array, the bullet characters start repeating from the first in the array.

WARNING: If any of the characters in the default array are modified, this will affect all other instances of this class using the default array.
Parameters:
  listBullets - an array of characters to be used as bullets, must have at least one entry.

public Renderer setListIndentSize(int listIndentSize)
     Sets the size of the indent to be used for HTMLElementName.LI LI elements.

The default value is 6.

This applies to HTMLElementName.LI LI elements inside both HTMLElementName.UL UL and HTMLElementName.OL OL elements.

The bullet or number of the list item is included as part of the indent.
Parameters:
  listIndentSize - the size of the indent.

public Renderer setMaxLineLength(int maxLineLength)
     Sets the column at which lines are to be wrapped.

Lines that would otherwise exceed this length are wrapped onto a new line at a word boundary.

A line may still exceed this length if it consists of a single word, where the length of the word plus the line indent exceeds the maximum length. In this case the line is wrapped immediately after the end of the word.

The default value is 76, which reflects the maximum line length for sending email data specified in RFC2049 section 3.5.
Parameters:
  maxLineLength - the column at which lines are to be wrapped.

public Renderer setNewLine(String newLine)
     Sets the string to be used to represent a newline in the output.

The default value is "\r\n" (CR+LF) regardless of the platform on which the library is running. This is so that the default configuration produces valid MIME text/plain output, which mandates the use of CR+LF for line breaks.

Specifying a null argument causes the output to use the same newline string as is used in the source document, which is determined via the Source.getNewLine method. If the source document does not contain any newlines, a "best guess" is made by either taking the newline string of a previously parsed document, or using the value from Config.NewLine.
Parameters:
  newLine - the string to be used to represent a newline in the output, may be null.

public Renderer setTableCellSeparator(String tableCellSeparator)
     Sets the string that is to separate table cells.

The default value is " \t" (a space followed by a tab).
Parameters:
  tableCellSeparator - the string that is to separate table cells.

public String toString()
    
public void writeTo(Writer writer)
    


Constructor Detail
Renderer
public Renderer(Segment segment)
Constructs a new Renderer based on the specified Segment.
Parameters:
  segment - the segment containing the HTML to be rendered.
See Also:   Segment.getRenderer




Method Detail
getBlockIndentSize
public int getBlockIndentSize()
Returns the size of the indent to be used for anything other than HTMLElementName.LI LI elements.

See the Renderer.setBlockIndentSize(int) method for a full description of this property.
Returns:
  the size of the indent to be used for anything other than HTMLElementName.LI LI elements.




getConvertNonBreakingSpaces
public boolean getConvertNonBreakingSpaces()
Indicates whether non-breaking space (&nbsp;, CharacterEntityReference._nbsp) character entity references are converted to spaces.

See the Renderer.setConvertNonBreakingSpaces(boolean) method for a full description of this property.
Returns:
  true if non-breaking space (&nbsp;, CharacterEntityReference._nbsp) character entity references are converted to spaces, otherwise false.




getDecorateFontStyles
public boolean getDecorateFontStyles()
Indicates whether decoration characters are to be included around the content of some font style elements and phrase elements.

See the Renderer.setDecorateFontStyles(boolean) method for a full description of this property.
Returns:
  true if decoration characters are to be included around the content of some font style elements, otherwise false.




getEstimatedMaximumOutputLength
public long getEstimatedMaximumOutputLength()



getListBullets
public char[] getListBullets()
Returns the bullet characters to use for list items inside HTMLElementName.UL UL elements.

See the Renderer.setListBullets(char[]) method for a full description of this property.
Returns:
  the bullet characters to use for list items inside HTMLElementName.UL UL elements.




getListIndentSize
public int getListIndentSize()
Returns the size of the indent to be used for HTMLElementName.LI LI elements.

See the Renderer.setListIndentSize(int) method for a full description of this property.
Returns:
  the size of the indent to be used for HTMLElementName.LI LI elements.




getMaxLineLength
public int getMaxLineLength()
Returns the column at which lines are to be wrapped.

See the Renderer.setMaxLineLength(int) method for a full description of this property.
Returns:
  the column at which lines are to be wrapped.




getNewLine
public String getNewLine()
Returns the string to be used to represent a newline in the output.

See the Renderer.setNewLine(String) method for a full description of this property.
Returns:
  the string to be used to represent a newline in the output.




getTableCellSeparator
public String getTableCellSeparator()
Returns the string that is to separate table cells.

See the Renderer.setTableCellSeparator(String) method for a full description of this property.
Returns:
  the string that is to separate table cells.




setBlockIndentSize
public Renderer setBlockIndentSize(int blockIndentSize)
Sets the size of the indent to be used for anything other than HTMLElementName.LI LI elements.

At present this applies to HTMLElementName.BLOCKQUOTE BLOCKQUOTE and HTMLElementName.DD DD elements.

The default value is 4.
Parameters:
  blockIndentSize - the size of the indent.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getBlockIndentSize()
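
The chaining behaviour described for the return value can be illustrated with a minimal sketch. The class IndentSettings below is a hypothetical stand-in written only for illustration; it is not the library's Renderer, and it models just the two indent properties with the default values stated in this documentation.

```java
// Hypothetical stand-in for illustration only; this is NOT the library's Renderer.
final class IndentSettings {
    private int blockIndentSize = 4; // default stated in the documentation
    private int listIndentSize = 6;  // default stated in the documentation

    // Each setter returns this instance so that calls can be chained.
    IndentSettings setBlockIndentSize(int blockIndentSize) {
        this.blockIndentSize = blockIndentSize;
        return this;
    }

    IndentSettings setListIndentSize(int listIndentSize) {
        this.listIndentSize = listIndentSize;
        return this;
    }

    int getBlockIndentSize() { return blockIndentSize; }

    int getListIndentSize() { return listIndentSize; }

    // Prefixes one line of text with blockIndentSize spaces.
    String indentBlockLine(String line) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < blockIndentSize; i++) {
            sb.append(' ');
        }
        return sb.append(line).toString();
    }
}
```

With this shape, several properties can be set in one statement, for example new IndentSettings().setBlockIndentSize(2).setListIndentSize(8).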




setConvertNonBreakingSpaces
public Renderer setConvertNonBreakingSpaces(boolean convertNonBreakingSpaces)
Sets whether non-breaking space (&nbsp;, CharacterEntityReference._nbsp) character entity references are converted to spaces.

The default value is true.
Parameters:
  convertNonBreakingSpaces - specifies whether non-breaking space (&nbsp;, CharacterEntityReference._nbsp) character entity references are converted to spaces.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getConvertNonBreakingSpaces()
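
As a rough illustration of what this property controls, the sketch below replaces the non-breaking space character (U+00A0, the character that &nbsp; decodes to) with an ordinary space. NbspSketch and its render method are hypothetical names; the library's own handling is not shown on this page and may differ.

```java
// Hypothetical illustration; not the library's implementation.
final class NbspSketch {
    // U+00A0 is the character that the &nbsp; entity reference decodes to.
    static final char NBSP = '\u00A0';

    // decodedText is text in which entity references have already been decoded.
    static String render(String decodedText, boolean convertNonBreakingSpaces) {
        return convertNonBreakingSpaces
                ? decodedText.replace(NBSP, ' ')
                : decodedText;
    }
}
```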




setDecorateFontStyles
public Renderer setDecorateFontStyles(boolean decorateFontStyles)
Sets whether decoration characters are to be included around the content of some font style elements and phrase elements.

The default value is false.

Below is a table summarising the decorated elements.

Elements                                                Character  Example Output
HTMLElementName.B B and HTMLElementName.STRONG STRONG   *          *bold text*
HTMLElementName.I I and HTMLElementName.EM EM           /          /italic text/
HTMLElementName.U U                                     _          _underlined text_
HTMLElementName.CODE CODE                               |          |code|

Parameters:
  decorateFontStyles - specifies whether decoration characters are to be included around the content of some font style elements.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getDecorateFontStyles()
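
The mapping in the table can be sketched as a small lookup. DecorationSketch is a hypothetical helper for illustration only; it assumes the decoration character is placed once before and once after the element content, as the example output column suggests.

```java
// Hypothetical illustration of the decoration table; not the library's code.
final class DecorationSketch {
    // Returns the decoration character for an element name,
    // or '\0' if the element is not decorated.
    static char decorationFor(String elementName) {
        switch (elementName.toLowerCase(java.util.Locale.ROOT)) {
            case "b":
            case "strong":
                return '*';
            case "i":
            case "em":
                return '/';
            case "u":
                return '_';
            case "code":
                return '|';
            default:
                return '\0';
        }
    }

    // Wraps the content in the decoration character when decoration is enabled.
    static String decorate(String elementName, String content, boolean decorateFontStyles) {
        char c = decorationFor(elementName);
        if (!decorateFontStyles || c == '\0') {
            return content;
        }
        return c + content + c;
    }
}
```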




setListBullets
public Renderer setListBullets(char[] listBullets)
Sets the bullet characters to use for list items inside HTMLElementName.UL UL elements.

The values in the default array are *, o, + and #.

If the nesting of rendered lists goes deeper than the length of this array, the bullet characters start repeating from the first in the array.

WARNING: If any of the characters in the default array are modified, this will affect all other instances of this class using the default array.
Parameters:
  listBullets - an array of characters to be used as bullets, must have at least one entry.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getListBullets()
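
The repeating behaviour for deeply nested lists amounts to indexing the array modulo its length. The sketch below is a hypothetical illustration (BulletSketch is not part of the library) and assumes a zero-based nesting depth. It also shows why copying the default array before modifying it avoids the side effect mentioned in the warning.

```java
// Hypothetical illustration; not the library's implementation.
final class BulletSketch {
    // Default bullet characters as listed in the documentation.
    static final char[] DEFAULT_BULLETS = {'*', 'o', '+', '#'};

    // depth is zero-based: 0 for a top-level list, 1 for a list nested once, and so on.
    static char bulletFor(char[] listBullets, int depth) {
        return listBullets[depth % listBullets.length];
    }

    // Returns an independent copy, so changes do not affect the shared default array.
    static char[] copyOfDefaults() {
        return DEFAULT_BULLETS.clone();
    }
}
```

A caller wanting custom bullets based on the defaults would modify the array returned by copyOfDefaults() rather than the shared array itself.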




setListIndentSize
public Renderer setListIndentSize(int listIndentSize)
Sets the size of the indent to be used for HTMLElementName.LI LI elements.

The default value is 6.

This applies to HTMLElementName.LI LI elements inside both HTMLElementName.UL UL and HTMLElementName.OL OL elements.

The bullet or number of the list item is included as part of the indent.
Parameters:
  listIndentSize - the size of the indent.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getListIndentSize()
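
To make "included as part of the indent" concrete, the following sketch builds a list item prefix whose total width equals the indent size. ListIndentSketch is hypothetical, and the exact placement of the bullet or number within the indent (here: right-aligned and followed by one space) is an assumption, not something this page specifies.

```java
// Hypothetical illustration; the bullet placement within the indent is an assumption.
final class ListIndentSketch {
    // marker is the bullet (for example "*") or the item number (for example "10.").
    static String listItemPrefix(String marker, int listIndentSize) {
        String label = marker + " ";
        StringBuilder sb = new StringBuilder();
        // Pad on the left so that padding plus label fills the whole indent.
        for (int i = label.length(); i < listIndentSize; i++) {
            sb.append(' ');
        }
        return sb.append(label).toString();
    }
}
```

With the default indent of 6, a "*" bullet yields four spaces followed by "* ", so the item text starts in the same column as it would for a numbered item.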




setMaxLineLength
public Renderer setMaxLineLength(int maxLineLength)
Sets the column at which lines are to be wrapped.

Lines that would otherwise exceed this length are wrapped onto a new line at a word boundary.

A line may still exceed this length if it consists of a single word, where the length of the word plus the line indent exceeds the maximum length. In this case the line is wrapped immediately after the end of the word.

The default value is 76, which reflects the maximum line length for sending email data specified in RFC2049 section 3.5.
Parameters:
  maxLineLength - the column at which lines are to be wrapped.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getMaxLineLength()
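
The wrapping rule described above can be sketched as a greedy word wrap. WrapSketch is a hypothetical helper, not the library's algorithm: it ignores indents and the trailing-space conventions of format=flowed, and it assumes a line may be at most maxLineLength characters long.

```java
// Hypothetical illustration of greedy wrapping at word boundaries; not the library's code.
final class WrapSketch {
    static String wrap(String text, int maxLineLength, String newLine) {
        StringBuilder out = new StringBuilder();
        int column = 0; // number of characters already on the current line
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only when the input is blank
            }
            // Start a new line if the word (plus a separating space) would not fit.
            if (column > 0 && column + 1 + word.length() > maxLineLength) {
                out.append(newLine);
                column = 0;
            }
            if (column > 0) {
                out.append(' ');
                column++;
            }
            // A single word longer than maxLineLength is kept whole on its own line.
            out.append(word);
            column += word.length();
        }
        return out.toString();
    }
}
```

Note how an over-long word is never split: it occupies its own line and the next word starts a fresh line, matching the exception described for single words.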




setNewLine
public Renderer setNewLine(String newLine)
Sets the string to be used to represent a newline in the output.

The default value is "\r\n" (CR+LF) regardless of the platform on which the library is running. This is so that the default configuration produces valid MIME text/plain output, which mandates the use of CR+LF for line breaks.

Specifying a null argument causes the output to use the same newline string as is used in the source document, which is determined via the Source.getNewLine method. If the source document does not contain any newlines, a "best guess" is made by either taking the newline string of a previously parsed document, or using the value from Config.NewLine.
Parameters:
  newLine - the string to be used to represent a newline in the output, may be null.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getNewLine()
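
The fallback order for a null argument can be summarised in a few lines. NewLineSketch and its parameter names are hypothetical; the "best guess" step is reduced here to a single fallback value supplied by the caller, which is a simplification of the behaviour described above.

```java
// Hypothetical illustration of the fallback order; not the library's code.
final class NewLineSketch {
    // Default stated in the documentation: CR+LF on every platform.
    static final String DEFAULT_NEW_LINE = "\r\n";

    // requested:     the value passed to the setter, may be null
    // sourceNewLine: the newline string found in the source document, or null if it has none
    // fallback:      stand-in for the "best guess" value
    static String resolve(String requested, String sourceNewLine, String fallback) {
        if (requested != null) {
            return requested;
        }
        return sourceNewLine != null ? sourceNewLine : fallback;
    }
}
```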




setTableCellSeparator
public Renderer setTableCellSeparator(String tableCellSeparator)
Sets the string that is to separate table cells.

The default value is " \t" (a space followed by a tab).
Parameters:
  tableCellSeparator - the string that is to separate table cells.
Returns:
  this Renderer instance, allowing multiple property setting methods to be chained in a single statement.
See Also:   Renderer.getTableCellSeparator()
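
As a simple illustration of the separator's effect, the sketch below joins the text of the cells in one table row. TableRowSketch is hypothetical, and it assumes the separator appears only between adjacent cells, not after the last one.

```java
// Hypothetical illustration; not the library's implementation.
final class TableRowSketch {
    // Default stated in the documentation: a space followed by a tab.
    static final String DEFAULT_SEPARATOR = " \t";

    // Joins the cell texts of one row, placing the separator between adjacent cells.
    static String renderRow(String[] cells, String tableCellSeparator) {
        return String.join(tableCellSeparator, cells);
    }
}
```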




toString
public String toString()



writeTo
public void writeTo(Writer writer) throws IOException



Methods inherited from java.lang.Object
protected native Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
public final native Class getClass()
public native int hashCode()
public final native void notify()
public final native void notifyAll()
public String toString()
public final native void wait(long timeout) throws InterruptedException
public final void wait(long timeout, int nanos) throws InterruptedException
public final void wait() throws InterruptedException
