Adding Quotes to Attribute Values in HTML Documents : Parse HTML « Network « Python Tutorial

21. 21. 7. Adding Quotes to Attribute Values in HTML Documents
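# Python 3 version; the original listing used the Python 2 HTMLParser and urllib modules.
from html import escape
from html.parser import HTMLParser


class parseAttrs(HTMLParser):
    """Re-emit an HTML document with every attribute value wrapped in double quotes."""

    def __init__(self):
        # convert_charrefs=False leaves &name; and &#NNN; in the input so that
        # handle_entityref() and handle_charref() below are called for them.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def format_attrs(self, attrs):
        fixedAttrs = ""
        for name, value in attrs:
            if value is None:
                # Boolean attribute such as <input disabled>: there is no value to quote.
                fixedAttrs += " %s" % name
            else:
                # HTMLParser has already unescaped the value, so re-escape
                # &, <, >, " and ' before wrapping it in double quotes.
                fixedAttrs += ' %s="%s"' % (name, escape(value, quote=True))
        return fixedAttrs

    def handle_starttag(self, tag, attrs):
        self.pieces.append("<%s%s>" % (tag, self.format_attrs(attrs)))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; without this override the base class
        # would emit a separate start tag and end tag.
        self.pieces.append("<%s%s />" % (tag, self.format_attrs(attrs)))

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)

    def handle_entityref(self, ref):
        self.pieces.append("&%s;" % ref)

    def handle_data(self, text):
        self.pieces.append(text)

    def handle_comment(self, text):
        self.pieces.append("<!--%s-->" % text)

    def handle_pi(self, text):
        self.pieces.append("<?%s>" % text)

    def handle_decl(self, text):
        self.pieces.append("<!%s>" % text)

    def parsed(self):
        return "".join(self.pieces)


with open("test2.html", encoding="utf-8") as f:
    source = f.read()

attrParser = parseAttrs()
attrParser.feed(source)
attrParser.close()  # flush any buffered input before reading the result
print(source)
print(attrParser.parsed())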
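Quoting a value is not just a matter of wrapping it in double quotes, because the parser normalizes attributes before handle_starttag is ever called. This short sketch records the attrs list that the parser actually passes in. It assumes Python 3's html.parser module. The class name AttrDump and the sample markup are made up for illustration and are not part of the original example.

```python
from html.parser import HTMLParser


class AttrDump(HTMLParser):
    """Hypothetical helper: record the (tag, attrs) pairs given to handle_starttag."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples; value is None for bare attributes.
        self.seen.append((tag, attrs))


dump = AttrDump()
# Unquoted, single-quoted and valueless attributes, plus an entity inside a value.
dump.feed('<a href=page.html TITLE=\'say "hi"\' data-x=1&amp;2><input disabled>')
dump.close()
for tag, attrs in dump.seen:
    print(tag, attrs)
```

Running it prints:

a [('href', 'page.html'), ('title', 'say "hi"'), ('data-x', '1&2')]
input [('disabled', None)]

This output reveals four things that a quote-adding rewriter has to handle. Attribute names arrive lowercased. Entity references inside values arrive already decoded (1&amp;2 becomes 1&2). A value may itself contain double quotes, which must be escaped before the value is wrapped in double quotes. A boolean attribute arrives with the value None rather than an empty string.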
www.java2java.com
Copyright 2010 - 2030 Java Source and Support. All rights reserved.
All other trademarks are property of their respective owners.