Java Doc for OutputDocument.java in HTML-Parser » jericho-html » au » id » jericho » lib » html

java.lang.Object

au.id.jericho.lib.html.OutputDocument

OutputDocument

public final class OutputDocument implements CharStreamSource

Represents a modified version of an original Source document.

An OutputDocument represents an original source document that has been modified by substituting segments of it with other text. Each of these substitutions must be registered in the output document, which is most commonly done using the various replace, remove or insert methods in this class. These methods internally register one or more OutputSegment objects to define each substitution. After all of the substitutions have been registered, the modified text can be retrieved using the OutputDocument.writeTo(Writer) or OutputDocument.toString() methods.

The registered output segments may be adjacent, and as of version 2.5 may also overlap. In most cases only output segments that generate no output, such as removed segments, can legitimately overlap each other. Registering overlapping output segments that generate output will generally yield unexpected results.

If unexpected results are being generated from an OutputDocument, the OutputDocument.getDebugInfo() method provides information on each registered output segment, which should be enough to determine the cause of the problem. In most cases the problem will be caused by overlapping output segments.

The following example converts all externally referenced style sheets to internal style sheets:

 URL sourceUrl=new URL(sourceUrlString);
 String htmlText=Util.getString(new InputStreamReader(sourceUrl.openStream()));
 Source source=new Source(htmlText);
 OutputDocument outputDocument=new OutputDocument(source);
 StringBuffer sb=new StringBuffer();
 List linkStartTags=source.findAllStartTags(Tag.LINK);
 for (Iterator i=linkStartTags.iterator(); i.hasNext();) {
   StartTag startTag=(StartTag)i.next();
   Attributes attributes=startTag.getAttributes();
   String rel=attributes.getValue("rel");
   if (!"stylesheet".equalsIgnoreCase(rel)) continue;
   String href=attributes.getValue("href");
   if (href==null) continue;
   String styleSheetContent;
   try {
     styleSheetContent=Util.getString(new InputStreamReader(new URL(sourceUrl,href).openStream()));
   } catch (Exception ex) {
     continue; // don't convert if URL is invalid
   }
   sb.setLength(0);
   sb.append("<style");
   Attribute typeAttribute=attributes.get("type");
   if (typeAttribute!=null) sb.append(' ').append(typeAttribute);
   sb.append(">\n").append(styleSheetContent).append("\n</style>");
   outputDocument.replace(startTag,sb);
 }
 String convertedHtmlText=outputDocument.toString();
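
The substitution model described above can be sketched without the library. The following is a minimal, self-contained illustration and not Jericho's implementation: the class SubstitutionSketch and its methods are hypothetical stand-ins, spans are given as character positions in the original source text (end exclusive), a null replacement is treated as an empty string, and overlapping spans are simply rejected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model of "register substitutions, then generate output".
// Hypothetical class for illustration only; not part of Jericho.
final class SubstitutionSketch {
    // One registered substitution over the span [begin, end) of the source text.
    private static final class Span {
        final int begin;
        final int end;
        final String text;

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    private final String source;
    private final List<Span> spans = new ArrayList<>();

    SubstitutionSketch(String source) {
        this.source = source;
    }

    // A null text is treated exactly like an empty string.
    void replace(int begin, int end, CharSequence text) {
        spans.add(new Span(begin, end, text == null ? "" : text.toString()));
    }

    // An insertion is a replacement of a zero-length span.
    void insert(int pos, CharSequence text) {
        replace(pos, pos, text);
    }

    void remove(int begin, int end) {
        replace(begin, end, null);
    }

    // Copies the untouched source text between spans and emits each replacement.
    @Override
    public String toString() {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.begin)); // stable sort
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Span s : sorted) {
            if (s.begin < pos) {
                throw new IllegalStateException("overlapping spans are not handled in this sketch");
            }
            out.append(source, pos, s.begin).append(s.text);
            pos = s.end;
        }
        return out.append(source, pos, source.length()).toString();
    }

    public static void main(String[] args) {
        SubstitutionSketch doc = new SubstitutionSketch("<p>Hello <b>world</b></p>");
        doc.replace(3, 8, "Goodbye"); // "Hello"
        doc.remove(9, 12);            // "<b>"
        doc.remove(17, 21);           // "</b>"
        doc.insert(21, "!");          // just before "</p>"
        System.out.println(doc);
    }
}
```

Every position refers to the original text, so an earlier substitution does not shift the positions used by a later one.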

See Also: OutputSegment

Constructor Summary
public	OutputDocument(Source source) Constructs a new output document based on the specified source document.
	OutputDocument(ParseText parseText)

Method Summary
public String	getDebugInfo() Returns a string representation of this object useful for debugging purposes.
public long	getEstimatedMaximumOutputLength()
public List	getRegisteredOutputSegments() Returns a list of all the OutputSegment objects in this output document.
public CharSequence	getSourceText() Returns the original source text upon which this output document is based.
public void	insert(int pos, CharSequence text) Inserts the specified text at the specified character position in this output document.
public void	register(OutputSegment outputSegment) Registers the specified output segment in this output document.
public void	remove(Segment segment) Removes the specified segment from this output document.
public void	remove(Collection segments) Removes all the segments from this output document represented by the specified source objects.
public void	replace(Segment segment, CharSequence text) Replaces the specified segment in this output document with the specified text.
public void	replace(int begin, int end, CharSequence text) Replaces the specified segment of this output document with the specified text.
public void	replace(int begin, int end, char ch) Replaces the specified segment of this output document with the specified character.
public void	replace(FormControl formControl) Replaces the specified FormControl in this output document.
public void	replace(FormFields formFields) Replaces all the constituent form controls from the specified FormFields in this output document.
public Map	replace(Attributes attributes, boolean convertNamesToLowerCase) Replaces the specified Attributes segment in this output document with the name/value entries in the returned `Map`.
public void	replace(Attributes attributes, Map map) Replaces the specified Attributes segment in this output document with the name/value entries in the specified `Map`.
public void	replaceWithSpaces(int begin, int end) Replaces the specified segment of this output document with a string of spaces of the same length.
public String	toString() Returns the final content of this output document as a `String`.
public void	writeTo(Writer writer) Writes the final content of this output document to the specified `Writer`.

Constructor Detail

OutputDocument
public OutputDocument(Source source)
	Constructs a new output document based on the specified source document. Parameters: source - the source document.

OutputDocument
OutputDocument(ParseText parseText)

Method Detail

getDebugInfo
public String getDebugInfo()
	Returns a string representation of this object useful for debugging purposes. The output includes details of all the registered output segments (see OutputDocument.getRegisteredOutputSegments()). Returns: a string representation of this object useful for debugging purposes.

getEstimatedMaximumOutputLength
public long getEstimatedMaximumOutputLength()

getRegisteredOutputSegments

public List getRegisteredOutputSegments()

Returns a list of all the OutputSegment objects in this output document.

The output segments are sorted in order of their starting position in the document.

The returned list is modifiable and any changes will affect the output generated by this OutputDocument.
Returns:
a list of all the OutputSegment objects in this output document.

getSourceText
public CharSequence getSourceText()
	Returns the original source text upon which this output document is based. Returns: the original source text upon which this output document is based.

insert
public void insert(int pos, CharSequence text)
	Inserts the specified text at the specified character position in this output document. Parameters: pos - the character position at which to insert the text. text - the text to insert.

register
public void register(OutputSegment outputSegment)
	Registers the specified output segment in this output document. Use this method if you want to use a customised OutputSegment class. Parameters: outputSegment - the output segment to register.

remove
public void remove(Segment segment)
	Removes the specified segment from this output document. This is equivalent to replace(segment,null). Parameters: segment - the segment to remove.

remove

public void remove(Collection segments)

Removes all the segments from this output document represented by the specified source objects.

This is equivalent to the following code:

 for (Iterator i=segments.iterator(); i.hasNext();)
   remove((Segment)i.next());

Parameters:
segments - a collection of segments to remove, represented by source Segment objects.

replace
public void replace(Segment segment, CharSequence text)
	Replaces the specified segment in this output document with the specified text. Specifying a `null` argument to the `text` parameter is exactly equivalent to specifying an empty string, and results in the segment being completely removed from the output document. Parameters: segment - the segment to replace. text - the replacement text, or `null` to remove the segment.

replace
public void replace(int begin, int end, CharSequence text)
	Replaces the specified segment of this output document with the specified text. Specifying a `null` argument to the `text` parameter is exactly equivalent to specifying an empty string, and results in the segment being completely removed from the output document. Parameters: begin - the character position at which to begin the replacement. end - the character position at which to end the replacement. text - the replacement text, or `null` to remove the segment.

replace
public void replace(int begin, int end, char ch)
	Replaces the specified segment of this output document with the specified character. Parameters: begin - the character position at which to begin the replacement. end - the character position at which to end the replacement. ch - the replacement character.

replace

public void replace(FormControl formControl)

Replaces the specified FormControl in this output document.

The effect of this method is to register zero or more output segments in the output document as required to reflect previous modifications to the control's state. The state of a control includes its submission value, output style, and whether it has been disabled.

The state of the form control should not be modified after this method is called, as there is no guarantee that subsequent changes either will or will not be reflected in the final output. A second call to this method with the same parameter is not allowed. It is therefore recommended to call this method as the last action before the output is generated.

Although the specifics of the number and nature of the output segments added in any particular circumstance are not defined in the specification, it can generally be assumed that only the minimum changes necessary are made to the original document. If the state of the control has not been modified, calling this method has no effect at all.
Parameters:
formControl - the form control to replace.
See Also: OutputDocument.replace(FormFields)

replace

public void replace(FormFields formFields)

Replaces all the constituent form controls from the specified FormFields in this output document.

This is equivalent to the following code:

 for (Iterator i=formFields.getFormControls().iterator(); i.hasNext();)
   replace((FormControl)i.next());

The state of any of the form controls in the specified form fields should not be modified after this method is called, as there is no guarantee that subsequent changes either will or will not be reflected in the final output. A second call to this method with the same parameter is not allowed. It is therefore recommended to call this method as the last action before the output is generated.
Parameters:
formFields - the form fields to replace.
See Also: OutputDocument.replace(FormControl)

replace

public Map replace(Attributes attributes, boolean convertNamesToLowerCase)

Replaces the specified Attributes segment in this output document with the name/value entries in the returned Map. The returned map initially contains entries representing the attributes from the source document, which can be modified before output.

The documentation of the OutputDocument.replace(Attributes,Map) method contains more information about the requirements of the map entries.

Specifying a value of true as an argument to the convertNamesToLowerCase parameter causes all original attribute names to be converted to lower case in the map. This simplifies the process of finding/updating specific attributes since map keys are case sensitive.

Attribute values are automatically decoded before being loaded into the map.

This method is logically equivalent to:
 replace(attributes, attributes.populateMap(new LinkedHashMap(),convertNamesToLowerCase))

The use of LinkedHashMap to implement the map ensures (probably unnecessarily) that existing attributes are output in the same order as they appear in the source document, and new attributes are output in the same order as they are added.

Example:

 Source source=new Source(htmlDocument);
 Attributes bodyAttributes
 =source.findNextStartTag(0,Tag.BODY).getAttributes();
 OutputDocument outputDocument=new OutputDocument(source);
 Map attributesMap=outputDocument.replace(bodyAttributes,true);
 attributesMap.put("bgcolor","green");
 String htmlDocumentWithGreenBackground=outputDocument.toString();

Parameters:
  attributes - the Attributes segment defining the span of the segment and initial name/value entries of the returned map.
  convertNamesToLowerCase - specifies whether all attribute names are converted to lower case in the map.
Returns:
  a Map containing the name/value entries to be output.
See Also:   OutputDocument.replace(Attributes,Map)
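
The effect of convertNamesToLowerCase and of using a LinkedHashMap can be illustrated with plain collections. This is a sketch under stated assumptions rather than the library's code: AttributeMapSketch and its populate method are hypothetical stand-ins for loading source attributes into a map, and attributes are modelled as simple name/value string pairs.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of Jericho.
final class AttributeMapSketch {
    // Loads name/value pairs into an insertion-ordered map,
    // optionally converting the names to lower case.
    static Map<String, String> populate(String[][] attrs, boolean convertNamesToLowerCase) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String[] attr : attrs) {
            String name = convertNamesToLowerCase ? attr[0].toLowerCase(Locale.ROOT) : attr[0];
            map.put(name, attr[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        String[][] bodyAttributes = {{"BGCOLOR", "white"}, {"Class", "main"}};

        // With lower-cased keys, put("bgcolor", ...) updates the existing entry in place.
        Map<String, String> lower = populate(bodyAttributes, true);
        lower.put("bgcolor", "green");
        System.out.println(lower);

        // With the original case kept, the same call adds a second, distinct key.
        Map<String, String> asIs = populate(bodyAttributes, false);
        asIs.put("bgcolor", "green");
        System.out.println(asIs);
    }
}
```

Because map keys are case sensitive, the second map ends up with both BGCOLOR and bgcolor, which is the situation that lower-casing the names avoids.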

replace

public void replace(Attributes attributes, Map map)

Replaces the specified Attributes segment in this output document with the name/value entries in the specified Map.

This method might be used if the Map containing the new attribute values should not be preloaded with the same entries as the source attributes, or a map implementation other than LinkedHashMap is required. Otherwise, the OutputDocument.replace(Attributes, boolean convertNamesToLowerCase) method is generally more useful.

Keys in the map must be String objects, and values must implement the CharSequence interface.

An attribute with no value is represented by a map entry with a null value.

Attribute values are stored unencoded in the map, and are automatically encoded if necessary during output.

The use of invalid characters in attribute names results in unspecified behaviour.

Note that methods in the Attributes class treat attribute names as case insensitive, whereas the Map treats them as case sensitive.
Parameters:
  attributes - the Attributes object defining the span of the segment to replace.
  map - the Map containing the name/value entries.
See Also:   OutputDocument.replace(Attributes, boolean convertNamesToLowerCase)
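
The map conventions described above (String keys, CharSequence values, a null value for an attribute with no value, and encoding at output time) can be sketched as follows. This is an illustration only: AttributeRenderSketch, render and encode are hypothetical names, and the escaping shown covers just the four characters &, <, > and ", which is a simplifying assumption and may differ from what the library encodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Jericho.
final class AttributeRenderSketch {
    // Simplified escaping for attribute values (assumption: only these four characters).
    static String encode(CharSequence value) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Renders each entry as ' name="value"', or as ' name' when the value is null.
    static String render(Map<String, ? extends CharSequence> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, ? extends CharSequence> entry : map.entrySet()) {
            sb.append(' ').append(entry.getKey());
            if (entry.getValue() != null) {
                sb.append("=\"").append(encode(entry.getValue())).append('"');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, CharSequence> map = new LinkedHashMap<>();
        map.put("href", "a?x=1&y=2");                    // stored unencoded
        map.put("title", new StringBuilder("say \"hi\"")); // any CharSequence is accepted
        map.put("disabled", null);                       // attribute with no value
        System.out.println("<a" + render(map) + ">");
    }
}
```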

replaceWithSpaces

public void replaceWithSpaces(int begin, int end)

Replaces the specified segment of this output document with a string of spaces of the same length.

This method is most commonly used to remove segments of the document without affecting the character positions of the remaining elements.

It is used internally to implement the functionality available through the Segment.ignoreWhenParsing method.

To remove a segment from the output document completely, use the OutputDocument.remove(Segment) method instead.
Parameters:
begin - the character position at which to begin the replacement.
end - the character position at which to end the replacement.
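
Why blanking a span with spaces preserves the character positions of the remaining text can be shown with a plain string. This is a minimal sketch, not the library's implementation; BlankSpanSketch is a hypothetical class and the span is given as begin (inclusive) and end (exclusive) positions.

```java
// Hypothetical illustration only; not part of Jericho.
final class BlankSpanSketch {
    // Overwrites every character in [begin, end) with a space, keeping the length unchanged.
    static String replaceWithSpaces(String source, int begin, int end) {
        StringBuilder sb = new StringBuilder(source);
        for (int i = begin; i < end; i++) {
            sb.setCharAt(i, ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<p><!-- note -->text</p>";
        int begin = html.indexOf("<!--");
        int end = html.indexOf("-->") + 3;
        String blanked = replaceWithSpaces(html, begin, end);

        System.out.println(blanked);
        // The text after the blanked comment is found at the same position as before.
        System.out.println(html.indexOf("text") == blanked.indexOf("text"));
    }
}
```

Removing the comment instead would shorten the text and shift every later position, which is why the two operations are kept distinct.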

toString
public String toString()
	Returns the final content of this output document as a `String`. Returns: the final content of this output document as a `String`. See Also: OutputDocument.writeTo(Writer)

writeTo

public void writeTo(Writer writer) throws IOException

Writes the final content of this output document to the specified Writer.

As of version 2.5, the presence of overlapping output segments no longer results in an OverlappingOutputSegmentsException. It is now up to the developer to detect unintentional overlapping segments.

If the output is required in the form of a Reader, use CharStreamSourceUtil.getReader(this) instead.
Parameters:
  writer - the destination java.io.Writer for the output.
throws:
  IOException - if an I/O exception occurs.
See Also:   OutputDocument.toString()
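
Since detecting unintentional overlaps is left to the developer, one possible check is sketched below. It is an assumption-laden illustration, not part of the library: OverlapCheckSketch is a hypothetical class, and each span is modelled as an int pair {begin, end} with an exclusive end, rather than as an OutputSegment object. Adjacent spans and zero-length insertions at a span boundary are not reported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Jericho.
final class OverlapCheckSketch {
    // Returns every span that begins before an earlier-starting span has ended.
    static List<int[]> overlapping(List<int[]> spans) {
        List<int[]> sorted = new ArrayList<>(spans);
        // Order by begin, then by end, so a zero-length insert sorts before a span starting at the same position.
        sorted.sort((a, b) -> a[0] != b[0] ? Integer.compare(a[0], b[0]) : Integer.compare(a[1], b[1]));
        List<int[]> result = new ArrayList<>();
        int maxEnd = Integer.MIN_VALUE; // furthest end position seen so far
        for (int[] span : sorted) {
            if (span[0] < maxEnd) {
                result.add(span);
            }
            maxEnd = Math.max(maxEnd, span[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> spans = Arrays.asList(
            new int[] {0, 5},    // adjacent to the next span: fine
            new int[] {5, 8},
            new int[] {7, 10},   // begins inside {5, 8}: reported
            new int[] {20, 20}); // zero-length insertion: fine
        for (int[] span : overlapping(spans)) {
            System.out.println("overlap at [" + span[0] + "," + span[1] + ")");
        }
    }
}
```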

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
