Retrieving Text from HTML Documents : Parse HTML « Network « Python Tutorial


21.21.5. Retrieving Text from HTML Documents

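```python
# Python 2 code: the HTMLParser module and urllib.urlopen were renamed in Python 3.
import HTMLParser
import urllib

urlText = []  # collects each text fragment found between tags

class parseText(HTMLParser.HTMLParser):
    def handle_data(self, data):
        # Skip fragments that consist of a single newline
        if data != '\n':
            urlText.append(data)

lParser = parseText()
lParser.feed(urllib.urlopen("http://www.python.org/index.html").read())
lParser.close()

for item in urlText:
    print item
```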
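The listing above uses Python 2 module names. A minimal Python 3 sketch of the same idea might look like the following. The names `TextExtractor`, `extract_text`, and `fetch_text` are hypothetical and chosen for illustration. The sketch also assumes the page is UTF-8 encoded, which the original does not state.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collects the text found between tags (hypothetical helper name)."""

    def __init__(self):
        super().__init__()
        self.fragments = []  # per-instance list instead of a module-level global

    def handle_data(self, data):
        # Same filter as the original listing: drop fragments that are a lone newline
        if data != '\n':
            self.fragments.append(data)


def extract_text(html):
    """Return the text fragments of an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.fragments


def fetch_text(url):
    """Download a page and extract its text (assumes UTF-8 content)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_text(html)


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1>\n<p>Hello <b>world</b></p></body></html>"
    for item in extract_text(sample):
        print(item)
```

Calling `fetch_text("http://www.python.org/index.html")` would mirror the original listing. Note that `handle_data` also receives the contents of `script` and `style` elements, so the result can include more than the visible page text.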

21.21. Parse HTML
21.21.1. Extract list of URLs in a web page
21.21.2. Opening HTML Documents
21.21.3. Retrieving Links from HTML Documents
21.21.4. Retrieving Images from HTML Documents
21.21.5. Retrieving Text from HTML Documents
21.21.6. Retrieving Cookies in HTML Documents
21.21.7. Adding Quotes to Attribute Values in HTML Documents
21.21.8. Basic HTML Title Retriever
21.21.9. HTML Title Retriever With Entity Support
