Grab links from anchor and image tags in Ruby



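require 'rexml/document'
require 'rexml/streamlistener'
require 'set'

# Collects the URIs found in selected attributes of selected tags.
class LinkGrabber
  include REXML::StreamListener
  attr_reader :links

  # interesting_tags maps a tag name to the attributes that hold URIs.
  def initialize(interesting_tags = {'a' => %w{href}, 'img' => %w{src}}.freeze)
    @tags = interesting_tags
    @links = Set.new
  end

  # Called by the stream parser for every opening tag.
  def tag_start(name, attrs)
    @tags[name].each do |uri_attr|
      @links << attrs[uri_attr] if attrs[uri_attr]
    end if @tags[name]
  end

  def parse(text)
    REXML::Document.parse_stream(text, self)
  end
end

# REXML is an XML parser; wrapping the sample in one root element (<p>)
# keeps it well formed, which recent REXML versions require.
text = %{<p>test
<a href="http://www.example.com/">http://www.example.com/</a>, http://www.example.com/blog/. Email me at <a
href="mailto:bob@example.com">b@e.com</a>.</p>}

grabber = LinkGrabber.new
grabber.parse(text)
# Prints a Set holding "http://www.example.com/" and "mailto:bob@example.com".
# The bare http://www.example.com/blog/ is plain text, not a tag attribute,
# so it is skipped.
p grabber.links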
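
The interesting_tags argument of the constructor lets the same listener collect links from other tags. The following is a minimal sketch, assuming well-formed XHTML input. The map ASSET_TAGS, the variable page and the sample markup are hypothetical and are not part of the original example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'
require 'set'

# A self-contained copy of the listener so this sketch runs on its own.
class LinkGrabber
  include REXML::StreamListener
  attr_reader :links

  def initialize(interesting_tags = {'a' => %w{href}, 'img' => %w{src}}.freeze)
    @tags = interesting_tags
    @links = Set.new
  end

  def tag_start(name, attrs)
    @tags[name].each do |uri_attr|
      @links << attrs[uri_attr] if attrs[uri_attr]
    end if @tags[name]
  end

  def parse(text)
    REXML::Document.parse_stream(text, self)
  end
end

# Hypothetical map: watch stylesheet and script references instead of a/img.
ASSET_TAGS = {'link' => %w{href}, 'script' => %w{src}}.freeze

# Illustrative, well-formed sample markup (an assumption for this sketch).
page = %{<html><head>
<link rel="stylesheet" href="/css/site.css"/>
<script src="/js/app.js"></script>
</head><body><a href="/about">About</a></body></html>}

assets = LinkGrabber.new(ASSET_TAGS)
assets.parse(page)
# The <a> href is ignored because 'a' is not a key in ASSET_TAGS.
p assets.links.to_a.sort   # => ["/css/site.css", "/js/app.js"]
```

Passing the map in, rather than hard-coding tag names inside tag_start, keeps the listener unchanged when the set of tags to watch changes.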

 