HTML Parser « Development « Java Products

1. NekoHTML
License: Apache Software License
URL: http://www.apache.org/~andyc/neko/doc/html/
Description: NekoHTML is a simple HTML scanner and tag balancer that enables application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements, automatically closes elements with optional end tags, and can handle mismatched inline element tags.
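The tag-balancing idea can be sketched in plain Java. This is not NekoHTML's implementation or API: the class name `TagBalancer`, the pre-tokenized input, and the small set of elements with optional end tags are hypothetical simplifications chosen for illustration. A stack of open elements is enough to close an unclosed `<li>` when the next one starts, to close intervening elements when an end tag arrives out of order, to drop end tags that match nothing, and to close whatever is still open at the end of input.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

public class TagBalancer {
    // Hypothetical, non-exhaustive subset of elements whose end tag HTML allows authors to omit.
    private static final Set<String> OPTIONAL_END = Set.of("p", "li", "td", "tr");

    /** Balances a pre-tokenized stream such as ["<ul>", "<li>", "one", "</ul>"] (hypothetical input format). */
    public static String balance(List<String> tokens) {
        Deque<String> open = new ArrayDeque<>(); // stack of currently open element names
        StringBuilder out = new StringBuilder();
        for (String tok : tokens) {
            if (tok.startsWith("</")) {
                String name = tok.substring(2, tok.length() - 1).toLowerCase();
                if (!open.contains(name)) {
                    continue; // stray end tag with no matching start tag: drop it
                }
                // Close every element opened after the matching start tag, then the element itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            } else if (tok.startsWith("<")) {
                String name = tok.substring(1, tok.length() - 1).toLowerCase();
                // A new <li> or <p> implicitly ends an unclosed sibling of the same kind.
                if (OPTIONAL_END.contains(name) && name.equals(open.peek())) {
                    out.append("</").append(open.pop()).append('>');
                }
                open.push(name);
                out.append('<').append(name).append('>');
            } else {
                out.append(tok); // character data passes through unchanged
            }
        }
        // Close anything still open at end of input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance(List.of("<ul>", "<li>", "one", "<li>", "two", "</ul>")));
        System.out.println(balance(List.of("<b>", "<i>", "x", "</b>", "</i>")));
    }
}
```

With these inputs, `<ul><li>one<li>two</ul>` comes out as `<ul><li>one</li><li>two</li></ul>`, and `<b><i>x</b></i>` as `<b><i>x</i></b>`. A real balancer also needs a proper tokenizer, attribute handling, and rules for inserting missing parent elements, all of which this sketch omits.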

2. HTMLParser
License: GNU Library or Lesser General Public License (LGPL)
URL: http://htmlparser.sourceforge.net/
Description: HTMLParser is a very fast, real-time parser for real-world HTML. Its main attractions for developers have been its simple design, its speed, and its ability to handle streaming real-world HTML.
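Streaming, callback-driven parsing of messy HTML can be tried without any third-party jar, because the JDK ships a tolerant HTML parser in `javax.swing.text.html.parser`. The sketch below uses that built-in `ParserDelegator` as a stand-in; it is not HTMLParser's API, and the class name `LinkStreamer` is hypothetical. The callback sees each start tag as the parser reaches it, so links are collected without first building a document tree, even though the sample markup leaves paragraphs unclosed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkStreamer {
    /** Collects href values as the JDK's HTML parser streams through the markup. */
    public static List<String> links(String html) throws IOException {
        List<String> hrefs = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        hrefs.add(href.toString());
                    }
                }
            }
        };
        // Third argument: ignore any charset declaration found inside the markup.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return hrefs;
    }

    public static void main(String[] args) throws IOException {
        // Deliberately sloppy input: unclosed <p> elements, mixed quote styles, no </html>.
        String page = "<html><body><p>Unclosed paragraph <a href=\"/a\">A</a>"
                + "<p><a href='/b'>B</a></body>";
        System.out.println(links(page));
    }
}
```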

3. Jericho HTML Parser
License: GNU General Public License (GPL)
URL: http://jerichohtml.sourceforge.net/
Description: Jericho HTML Parser is a simple but powerful Java library for analysing and manipulating parts of an HTML document, including some common server-side tags, while reproducing any unrecognised or invalid HTML verbatim. It also provides high-level functions for manipulating HTML forms.
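Reproducing unrecognised markup verbatim amounts to editing the source by character position rather than re-serialising a parsed tree. A minimal sketch of that idea follows, assuming a hypothetical `VerbatimEditor` class (not part of Jericho) and a simple regular expression to locate `<b>` tags; real tag recognition is considerably more involved. Only the matched ranges change, while the server-side tag and the invalid `<bogus attr=>` element are copied through exactly as written.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerbatimEditor {
    /** Replacement of source characters [begin, end) by new text. */
    private record Edit(int begin, int end, String text) {}

    private final String source;
    private final List<Edit> edits = new ArrayList<>();

    public VerbatimEditor(String source) {
        this.source = source;
    }

    public void replace(int begin, int end, String text) {
        edits.add(new Edit(begin, end, text));
    }

    /** Copies untouched characters verbatim and splices in the edits (assumed non-overlapping). */
    @Override
    public String toString() {
        List<Edit> sorted = new ArrayList<>(edits);
        sorted.sort(Comparator.comparingInt(Edit::begin));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Edit e : sorted) {
            out.append(source, pos, e.begin()).append(e.text());
            pos = e.end();
        }
        return out.append(source.substring(pos)).toString();
    }

    public static void main(String[] args) {
        String html = "<B>Bold</B> <% serverTag %> <bogus attr=>x</bogus>";
        VerbatimEditor editor = new VerbatimEditor(html);
        // Rename <b>/</b> tags only; everything else stays byte-for-byte as written.
        Matcher m = Pattern.compile("(?i)<(/?)b>").matcher(html);
        while (m.find()) {
            editor.replace(m.start(), m.end(), "<" + m.group(1) + "strong>");
        }
        System.out.println(editor);
    }
}
```

Running `main` prints `<strong>Bold</strong> <% serverTag %> <bogus attr=>x</bogus>`.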

4. JTidy
License: Open source
URL: http://jtidy.sourceforge.net/
Description: JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document being processed, so it can effectively serve as a DOM parser for real-world HTML.
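Once a document is available as a DOM tree, it can be queried with standard W3C DOM tooling. The sketch below does not call JTidy: it assumes the markup has already been repaired into well-formed XML without a namespace declaration or DOCTYPE, and uses the JDK's built-in `DocumentBuilder` and XPath to show the kind of tree-based access a DOM interface allows. The class name `DomQuery` and the `//td` query are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomQuery {
    /** Returns the trimmed text of every td element in an already well-formed document. */
    public static List<String> cellTexts(String wellFormed) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wellFormed)));
        // No namespace declaration in the input, so a plain //td path matches.
        NodeList cells = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//td", doc, XPathConstants.NODESET);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < cells.getLength(); i++) {
            texts.add(cells.item(i).getTextContent().trim());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String cleaned = "<html><body><table><tr><td>1</td><td> 2 </td></tr></table></body></html>";
        System.out.println(cellTexts(cleaned));
    }
}
```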

5. TagSoup
License: GNU General Public License (GPL)
URL: http://mercury.ccil.org/~cowan/XML/tagsoup/
Description: TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command-line processor that reads HTML files and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
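Because TagSoup presents itself as a SAX parser, its output is consumed by code written against the standard `org.xml.sax` interfaces. The handler below counts elements by name. To keep the sketch runnable without TagSoup on the classpath, it is driven by the JDK's own SAX parser over an already well-formed snippet; the class name `ElementCounter` is hypothetical. With TagSoup, a handler of this kind would instead be registered on TagSoup's `XMLReader`, so the input would not need to be well-formed.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementCounter extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so localName may be empty; use qName.
        counts.merge(qName, 1, Integer::sum);
    }

    public static Map<String, Integer> count(String xml) throws Exception {
        ElementCounter handler = new ElementCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.counts;
    }

    public static void main(String[] args) throws Exception {
        String xhtml = "<html><body><p>one</p><p>two <br/></p></body></html>";
        System.out.println(count(xhtml));
    }
}
```

For the sample document, `main` prints `{body=1, br=1, html=1, p=2}`.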

6. HotSAX
License: GNU Library or Lesser General Public License (LGPL)
URL: http://hotsax.sourceforge.net/
Description: HotSAX is a small, fast SAX2 parser for HTML, XHTML and XML.

7. Cobra HTML Toolkit
License: GNU Library or Lesser General Public License (LGPL)
URL: http://html.xamjwg.org/cobra.jsp
Description: The Cobra HTML Toolkit is an open source library that provides a pure Java HTML parser and renderer. Cobra is intended to support HTML 4, JavaScript and CSS 2. The parser can be used independently of the Cobra renderer.

www.java2java.com

Copyright 2010 - 2030 Java Source and Support. All rights reserved.

All other trademarks are property of their respective owners.