java.lang.Object
  bdd.search.spider.HTMLLinkExtractor
HTMLLinkExtractor | public class HTMLLinkExtractor implements LinkExtractor | | Written by Tim Macinta, 1997.
Distributed under the GNU General Public License
(a copy of which is enclosed with the source).
This LinkExtractor can extract URLs from HTML files.
|
Constructor Summary | |
public | HTMLLinkExtractor(File cache_file, URL base_url) Creates a new HTMLLinkExtractor that will enumerate all the
URLs in the given "cache_file". |
HTMLLinkExtractor | public HTMLLinkExtractor(File cache_file, URL base_url) throws IOException | | Creates a new HTMLLinkExtractor that will enumerate all the
URLs in the given "cache_file".
|
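The constructor's description implies a pass over the cached HTML that picks out each tag. The sketch below shows only that scanning step, assuming the file contents have already been read into a String; TagScanner and tagBodies are hypothetical names, not part of bdd.search.spider.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not the original HTMLLinkExtractor code:
 * collects the text between each '<' and the next '>'.
 */
final class TagScanner {

    /** Returns the contents of every "<...>" tag, in document order. */
    static List<String> tagBodies(String html) {
        List<String> bodies = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = html.indexOf('<', pos);
            if (open < 0) break;                      // no more tags
            int close = html.indexOf('>', open + 1);
            if (close < 0) break;                     // unterminated tag: stop scanning
            bodies.add(html.substring(open + 1, close));
            pos = close + 1;                          // continue after this tag
        }
        return bodies;
    }
}
```

Given <html><a href="next.html">Next</a></html>, tagBodies returns html, a href="next.html", /a and /html; each entry is the kind of string analyze expects. The scan is deliberately naive: a '>' inside a quoted attribute value or an HTML comment ends the tag early.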
addURL | public void addURL(URL url) | | Adds "url" to the list of URLs.
|
analyze | public void analyze(String param) | | Analyzes "param", which should be the text between a '<' and the matching '>'
(the body of a single tag), and adds any URLs found in it to the list of URLs.
|
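One plausible shape for analyze, assuming it looks up a link-bearing attribute by tag name and resolves the value against the base URL given to the constructor. The class TagAnalyzerSketch, its tag-to-attribute table and the choice to silently skip malformed values are all assumptions; the original may handle different tags or attributes.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for analyze(String); not the original implementation. */
final class TagAnalyzerSketch {
    // Assumption: which attribute holds the link for each tag we follow.
    private static final Map<String, String> LINK_ATTRIBUTE =
            Map.of("a", "href", "area", "href", "img", "src", "frame", "src");

    private final URL base;
    final List<URL> urls = new ArrayList<>();

    TagAnalyzerSketch(String baseUrl) {
        try {
            this.base = new URL(baseUrl);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad base URL: " + baseUrl, e);
        }
    }

    /** "param" is the text between '<' and '>', e.g. a href="next.html". */
    void analyze(String param) {
        String[] parts = param.trim().split("\\s+", 2);
        if (parts.length < 2) return;                       // tag has no attributes
        String attr = LINK_ATTRIBUTE.get(parts[0].toLowerCase(Locale.ROOT));
        if (attr == null) return;                           // not a tag we follow
        // attr = value, where value is double-quoted, single-quoted or bare.
        Pattern p = Pattern.compile(
                "\\b" + attr + "\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|(\\S+))",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(parts[1]);
        if (!m.find()) return;
        String value = m.group(1) != null ? m.group(1)
                : m.group(2) != null ? m.group(2) : m.group(3);
        try {
            urls.add(new URL(base, value));                 // resolve relative links
        } catch (MalformedURLException e) {
            // skip values that cannot be turned into a URL
        }
    }
}
```

For example, with base http://example.com/dir/index.html, analyze("a href=\"next.html\"") records http://example.com/dir/next.html, while end tags such as /a are ignored because they are not in the table.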
analyzeFrame | void analyzeFrame(String frame) | | Analyzes the FRAME tag passed in "frame".
|
extract | String extract(String line, String key) | | Returns the value in "line" associated with "key", or null if "key"
is not found. For instance, if "line" were 'a href="blah blah blah"'
and "key" were "href", this method would return "blah blah blah".
Keys are case-insensitive.
|
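The extract contract above (case-insensitive key, quoted value, null when the key is absent) can be illustrated with a small independent sketch. It is not Tim Macinta's code: ExtractSketch and skipSpaces are hypothetical names, and accepting single-quoted or unquoted values is an assumption beyond the documented example.

```java
/** Hypothetical illustration of extract(line, key); not the original code. */
final class ExtractSketch {

    /** Returns the value bound to "key" in "line", or null if the key is absent. */
    static String extract(String line, String key) {
        for (int at = 0; at + key.length() <= line.length(); at++) {
            // Case-insensitive comparison without changing string lengths.
            if (!line.regionMatches(true, at, key, 0, key.length())) continue;
            // Reject matches that are the tail of a longer name (e.g. "xhref").
            if (at > 0 && Character.isLetterOrDigit(line.charAt(at - 1))) continue;
            int i = skipSpaces(line, at + key.length());
            // Reject matches not followed by '=' (e.g. "src" inside "srcset").
            if (i >= line.length() || line.charAt(i) != '=') continue;
            i = skipSpaces(line, i + 1);
            if (i >= line.length()) return "";
            char q = line.charAt(i);
            if (q == '"' || q == '\'') {                  // quoted value
                int end = line.indexOf(q, i + 1);
                return end < 0 ? line.substring(i + 1) : line.substring(i + 1, end);
            }
            int end = i;                                  // bare value: up to whitespace
            while (end < line.length() && !Character.isWhitespace(line.charAt(end))) end++;
            return line.substring(i, end);
        }
        return null;
    }

    private static int skipSpaces(String s, int i) {
        while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++;
        return i;
    }
}
```

Using regionMatches(true, ...) instead of lower-casing the whole line keeps character positions aligned with the original string. The sketch does not track quoting, so a key that appears inside another attribute's value could be matched first.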
extractBase | void extractBase(String b) | | Extracts the base URL from a BASE tag.
|
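A BASE tag changes the URL against which later relative links are resolved. A minimal sketch of that bookkeeping, assuming "b" is the body of the BASE tag and that java.net.URL's two-argument constructor does the resolution; BaseTracker and resolve are hypothetical names, not the original class.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of BASE-tag handling; not the original extractBase. */
final class BaseTracker {
    // Case-insensitive HREF attribute: double-quoted, single-quoted or bare value.
    private static final Pattern HREF = Pattern.compile(
            "\\bhref\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|(\\S+))", Pattern.CASE_INSENSITIVE);

    private URL base;

    BaseTracker(String documentUrl) {
        try {
            base = new URL(documentUrl);            // until a BASE tag is seen
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad document URL: " + documentUrl, e);
        }
    }

    /** "b" is the body of a BASE tag, e.g. base href="http://host/dir/". */
    void extractBase(String b) {
        Matcher m = HREF.matcher(b);
        if (!m.find()) return;                      // BASE without HREF: keep current base
        String href = m.group(1) != null ? m.group(1)
                : m.group(2) != null ? m.group(2) : m.group(3);
        try {
            base = new URL(base, href);             // a relative HREF resolves against the old base
        } catch (MalformedURLException e) {
            // keep the previous base if the HREF is unusable
        }
    }

    /** Resolves a link against the current base; returns null if it is malformed. */
    String resolve(String link) {
        try {
            return new URL(base, link).toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }
}
```

Starting from http://example.com/a/page.html, resolve("img.gif") gives http://example.com/a/img.gif; after extractBase("base href=\"http://mirror.example.org/docs/\"") the same call gives http://mirror.example.org/docs/img.gif.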
hasMoreElements | public boolean hasMoreElements() | | Returns true if there are more URLs to enumerate.
|
reset | public void reset() | | Resets this enumeration.
|
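addURL, hasMoreElements and reset suggest that the extractor behaves like a java.util.Enumeration whose cursor can be rewound. A minimal sketch under that assumption: ResettableEnumeration is a hypothetical name, add stands in for addURL, and nextElement (not listed in this section) is assumed to return URLs in the order they were added. The element type is generic so the same class can hold java.net.URL values.

```java
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical list-backed enumeration with a rewindable cursor. */
final class ResettableEnumeration<E> implements Enumeration<E> {
    private final List<E> items = new ArrayList<>();
    private int next = 0;                          // index of the next element to return

    /** Plays the role of addURL: appends to the list being enumerated. */
    void add(E item) {
        items.add(item);
    }

    @Override
    public boolean hasMoreElements() {
        return next < items.size();
    }

    @Override
    public E nextElement() {
        if (!hasMoreElements()) throw new NoSuchElementException();
        return items.get(next++);
    }

    /** Rewinds the cursor so the same elements can be enumerated again. */
    void reset() {
        next = 0;
    }
}
```

Keeping the URLs in a list and moving only an index makes reset a constant-time operation and leaves the collected URLs intact for a second pass.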