Java Doc for Clean.java (HTML Parser » JTidy » org.w3c.tidy)

java.lang.Object

org.w3c.tidy.Clean

Clean

public class Clean

Clean up misuse of presentation markup. Filters from other formats such as Microsoft Word often make excessive use of presentation markup such as font tags, B, I, and the align attribute. By applying a set of production rules, it is straightforward to transform this to use CSS. Some rules replace some of the children of an element by style properties on the element, e.g.

...

...

Such rules are applied to the element's content and then to the element itself until no more rules apply. Having applied all the rules to an element, it will have a style attribute with one or more properties. Other rules strip the element they apply to, replacing it by style properties on the contents, e.g.

...

...

These rules are applied to an element before processing its content and replace the current element by the first element in the exposed content. After applying both sets of rules, you can replace the style attribute by a class value and style rule in the document head. To support this, an association of styles and class names is built. A naive approach is to rely on string matching to test when two property lists are the same. A better approach would be to first sort the properties before matching.
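
The sort-before-matching idea can be illustrated with a minimal sketch. It is not JTidy's implementation: `StyleClassRegistry`, `normalize`, `classFor`, and the `c1`, `c2`, ... naming scheme are hypothetical names chosen for this example, and the parsing assumes simple `name: value` pairs separated by semicolons.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of JTidy): maps equivalent style strings to one class name. */
class StyleClassRegistry {
    // Normalized property list -> generated class name, in insertion order.
    private final Map<String, String> classByStyle = new LinkedHashMap<>();

    /** Sorts the properties so "b: 2; a: 1" and "a: 1;b: 2;" compare equal. */
    static String normalize(String style) {
        List<String> props = new ArrayList<>();
        for (String part : style.split(";")) {
            String trimmed = part.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty pieces, e.g. after a trailing semicolon
            }
            int colon = trimmed.indexOf(':');
            if (colon < 0) {
                props.add(trimmed); // malformed entry: keep it as-is
                continue;
            }
            String name = trimmed.substring(0, colon).trim().toLowerCase();
            String value = trimmed.substring(colon + 1).trim();
            props.add(name + ": " + value);
        }
        Collections.sort(props);
        return String.join("; ", props);
    }

    /** Returns the class name for a style, creating a new one on first sight. */
    String classFor(String style) {
        String key = normalize(style);
        String name = classByStyle.get(key);
        if (name == null) {
            name = "c" + (classByStyle.size() + 1); // assumed naming scheme
            classByStyle.put(key, name);
        }
        return name;
    }
}
```

With this sketch, `classFor("font-weight: bold; color: red")` and `classFor("color: red;font-weight: bold;")` return the same class name, which plain string matching on the raw attribute values would not.
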
author:
   Dave Raggett dsr@w3.org
author:
   Andy Quick ac.quick@sympatico.ca (translation to Java)
author:
   Fabrizio Giustina
version:
   $Revision: 1.25 $ ($Author: fgiust $)

Constructor Summary
public	Clean(TagTable tagTable) Instantiates a new Clean.

Method Summary
public void	bQ2Div(Node node) Replace implicit blockquote by div with an indent taking care to reduce nested blockquotes to a single div with the indent set to match the nesting depth.
static void	bumpObject(Lexer lexer, Node html) Where appropriate move object elements from head to body.
public void	cleanTree(Lexer lexer, Node doc) Clean an html tree.
public void	cleanWord2000(Lexer lexer, Node node) This is a major clean up to strip out all the extra stuff you get when you save as web page from Word 2000.
public void	dropSections(Lexer lexer, Node node) Drop if/endif sections inserted by word2000.
public void	emFromI(Node node) Replace i by em and b by strong.
Node	findEnclosingCell(Node node) Find the enclosing table cell for the given node.
public boolean	isWord2000(Node root) Check if the current document is a converted Word document.
public void	list2BQ(Node node) Some people use dir or ul without an li to indent the content.
public void	nestedEmphasis(Node node) simplifies ...
boolean	noMargins(Node node) Used to hunt for hidden preformatted sections.
public Node	pruneSection(Lexer lexer, Node node) node is `<![if ...]>` prune up to `<![endif]>`.
public void	purgeWord2000Attributes(Node node) Remove word2000 attributes from node.
boolean	singleSpace(Lexer lexer, Node node) Does element have a single space as its content?
public Node	stripSpan(Lexer lexer, Node span) Word2000 uses span excessively, so we strip span out.

Constructor Detail

Clean
public Clean(TagTable tagTable)
	Instantiates a new Clean.
	Parameters:
	   tagTable - tag table instance

Method Detail

bQ2Div
public void bQ2Div(Node node)
	Replace implicit blockquote by div with an indent, taking care to reduce nested blockquotes to a single div with the indent set to match the nesting depth.
	Parameters:
	   node - root Node
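
The collapsing of nested blockquotes can be sketched on a toy tree. Everything here is an assumption made for illustration: `MiniNode` stands in for JTidy's `Node`, `BlockquoteToDiv.convert` is a hypothetical name, and the indent of 2em per nesting level is an assumed value rather than one stated in this documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a parse-tree node; JTidy's own Node class is not used here. */
class MiniNode {
    String tag;
    boolean implicit; // true if the parser inferred the element
    String style;     // value of the style attribute, or null
    final List<MiniNode> children = new ArrayList<>();

    MiniNode(String tag, boolean implicit) {
        this.tag = tag;
        this.implicit = implicit;
    }

    MiniNode add(MiniNode child) {
        children.add(child);
        return this;
    }
}

class BlockquoteToDiv {
    /** Assumed indent per nesting level, in em. */
    static final int EM_PER_LEVEL = 2;

    static boolean isImplicitBlockquote(MiniNode n) {
        return n.implicit && "blockquote".equals(n.tag);
    }

    static void convert(MiniNode node) {
        if (isImplicitBlockquote(node)) {
            int depth = 1;
            // Collapse chains where the only child is another implicit blockquote.
            while (node.children.size() == 1 && isImplicitBlockquote(node.children.get(0))) {
                MiniNode inner = node.children.get(0);
                node.children.clear();
                node.children.addAll(inner.children);
                depth++;
            }
            node.tag = "div";
            node.implicit = false;
            node.style = "margin-left: " + (EM_PER_LEVEL * depth) + "em";
        }
        // Recurse into whatever content is left under this node.
        for (MiniNode child : node.children) {
            convert(child);
        }
    }
}
```

Two nested implicit blockquotes around a paragraph become one `div` with `margin-left: 4em` under these assumptions; an explicit blockquote is left untouched.
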

bumpObject
static void bumpObject(Lexer lexer, Node html)
	Where appropriate move object elements from head to body.
	Parameters:
	   lexer - Lexer
	   html - html node

cleanTree
public void cleanTree(Lexer lexer, Node doc)
	Clean an html tree.
	Parameters:
	   lexer - Lexer
	   doc - root node

cleanWord2000
public void cleanWord2000(Lexer lexer, Node node)
	This is a major clean up to strip out all the extra stuff you get when you save as web page from Word 2000. It doesn't yet know what to do with VML tags; these will appear as errors unless you declare them as new tags. For example, o:p needs to be declared as inline.
	Parameters:
	   lexer - Lexer
	   node - node to clean up

dropSections
public void dropSections(Lexer lexer, Node node)
	Drop if/endif sections inserted by word2000.
	Parameters:
	   lexer - Lexer
	   node - Node root node

emFromI
public void emFromI(Node node)
	Replace i by em and b by strong.
	Parameters:
	   node - root Node
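
The renaming rule can be shown on a toy tree. This is only a sketch of the idea: `TagNode` and `EmphasisRenamer` are hypothetical names and do not reflect JTidy's `Node` or its tag table lookups.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal element node used only for this illustration. */
class TagNode {
    String tag;
    final List<TagNode> children = new ArrayList<>();

    TagNode(String tag, TagNode... kids) {
        this.tag = tag;
        for (TagNode kid : kids) {
            children.add(kid);
        }
    }
}

class EmphasisRenamer {
    /** Walks the whole subtree, renaming i to em and b to strong. */
    static void emFromI(TagNode node) {
        if ("i".equals(node.tag)) {
            node.tag = "em";
        } else if ("b".equals(node.tag)) {
            node.tag = "strong";
        }
        for (TagNode child : node.children) {
            emFromI(child);
        }
    }
}
```
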

findEnclosingCell
Node findEnclosingCell(Node node)
	Find the enclosing table cell for the given node.
	Parameters:
	   node - Node
	Returns:
	   enclosing cell node

isWord2000
public boolean isWord2000(Node root)
	Check if the current document is a converted Word document.
	Parameters:
	   root - root Node
	Returns:
	   `true` if the document has been generated by Microsoft Word.

list2BQ
public void list2BQ(Node node)
	Some people use dir or ul without an li to indent the content. The pattern to look for is a list with a single implicit li. This is recursively replaced by an implicit blockquote.
	Parameters:
	   node - root Node

nestedEmphasis
public void nestedEmphasis(Node node)
	simplifies ... ... etc.
	Parameters:
	   node - root Node

noMargins
boolean noMargins(Node node)
	Used to hunt for hidden preformatted sections.
	Parameters:
	   node - checked node
	Returns:
	   `true` if the node has a "margin-top: 0" or "margin-bottom: 0" style

pruneSection
public Node pruneSection(Lexer lexer, Node node)
	node is `<![if ...]>`; prune up to `<![endif]>`.
	Parameters:
	   lexer - Lexer
	   node - Node
	Returns:
	   cleaned up Node
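
The pruning of if/endif sections can be sketched on a flat list of tokens rather than a node tree. The class `SectionPruner`, its method name, and the token representation are hypothetical; handling nested sections by counting depth is an assumption of this sketch, and any special cases in JTidy's real implementation are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: drops everything between <![if ...]> and <![endif]> markers. */
class SectionPruner {
    static List<String> dropSections(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        int depth = 0; // how many if-sections are currently open
        for (String token : tokens) {
            if (token.startsWith("<![if")) {
                depth++;       // section opens: start (or keep) skipping
                continue;
            }
            if (token.startsWith("<![endif")) {
                if (depth > 0) {
                    depth--;   // section closes
                }
                continue;      // the marker itself is always dropped
            }
            if (depth == 0) {
                kept.add(token); // outside any section: keep the token
            }
        }
        return kept;
    }
}
```
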

purgeWord2000Attributes
public void purgeWord2000Attributes(Node node)
	Remove word2000 attributes from node.
	Parameters:
	   node - node to clean up

singleSpace
boolean singleSpace(Lexer lexer, Node node)
	Does element have a single space as its content?
	Parameters:
	   lexer - Lexer
	   node - checked node
	Returns:
	   `true` if the element has a single space as its content

stripSpan
public Node stripSpan(Lexer lexer, Node span)
	Word2000 uses span excessively, so we strip span out.
	Parameters:
	   lexer - Lexer
	   span - Node span
	Returns:
	   cleaned node

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
