org.archive.extractor.LinkExtractor
All known Subclasses: org.archive.extractor.CharSequenceLinkExtractor,
LinkExtractor
public interface LinkExtractor extends Iterator
LinkExtractor is a general interface for classes which, when given an
InputStream and Charset, can scan for Links and return them via
an Iterator interface.
Implementors may in fact complete all extraction on the first
hasNext(), then trickle Links out from an internal collection,
depending on whether the link-extraction technique used is amenable
to incremental scanning.
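The "extract everything on the first hasNext(), then trickle results out" approach described above might look roughly like the sketch below. It is illustrative only: EagerHrefExtractor, its regex, and its simplified setup(InputStream, Charset) signature are assumptions made for this example, not part of org.archive.extractor, and it yields plain strings rather than Link objects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-in, not a Heritrix class: scans the whole stream on the
 * first hasNext() call, then hands results out from an internal queue.
 */
class EagerHrefExtractor implements Iterator<String> {
    // Deliberately naive pattern: double-quoted href attributes only.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    private final Queue<String> pending = new ArrayDeque<>();
    private InputStream content;
    private Charset charset;
    private boolean scanned;

    /** Simplified setup: discards old state, then remembers the new input. */
    void setup(InputStream content, Charset charset) {
        reset();
        this.content = content;
        this.charset = charset;
    }

    /** Drops queued results and forgets the current input. */
    void reset() {
        pending.clear();
        content = null;
        charset = null;
        scanned = false;
    }

    @Override
    public boolean hasNext() {
        if (!scanned) {
            scanAll();       // all extraction work happens here, exactly once
            scanned = true;
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.remove();
    }

    /** Decodes the entire stream with the given charset and queues every match. */
    private void scanAll() {
        if (content == null) {
            return;          // setup() was never called; nothing to scan
        }
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(content, charset))) {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
            Matcher m = HREF.matcher(text);
            while (m.find()) {
                pending.add(m.group(1));
            }
        } catch (IOException e) {
            // A real implementation would presumably notify its error listener;
            // this sketch simply yields no links.
        }
    }
}
```

An extractor whose technique does support incremental scanning could instead read only as much input as needed to find the next link on each hasNext() call; callers would see the same Iterator behavior either way.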
ROUGH DRAFT IN PROGRESS / incomplete... untested...
author: gojomo
Method Summary
public Link nextLink()
    Alternative to Iterator.next() which returns type Link.
public void reset()
    Discard all state and release any used resources.
public void setup(UURI source, UURI base, InputStream content, Charset charset, ExtractErrorListener listener)
    Set up the LinkExtractor to operate on the given stream and charset, considering the given base URI as the initial 'base' for resolving relative URIs.
public void setup(UURI sourceandbase, InputStream content, Charset charset, ExtractErrorListener listener)
    Convenience version of the above for the common case where source and base are the same.
nextLink
public Link nextLink()
    Alternative to Iterator.next() which returns type Link.
    Returns: a discovered Link
reset
public void reset()
    Discard all state and release any used resources.
setup
public void setup(UURI source, UURI base, InputStream content, Charset charset, ExtractErrorListener listener)
    Set up the LinkExtractor to operate on the given stream and charset, considering the given base URI as the initial 'base' URI for resolving relative URIs.
    May be called to 'reset' a LinkExtractor to start with new input.
    Parameters:
        source - source URI
        base - base URI (usually the source URI) for URI derelativizing
        content - input stream of content to scan for links
        charset - Charset to consult to decode stream to characters
        listener - ExtractErrorListener to notify, rather than raising an exception through the extraction loop
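The roles of the base URI and the error listener can be sketched with standard-library types. In the example below, java.net.URI stands in for UURI, and ErrorSink is a hypothetical callback standing in for ExtractErrorListener, whose actual methods are not shown on this page. BaseResolvingLoop and resolveAll are likewise names invented for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback standing in for ExtractErrorListener. */
interface ErrorSink {
    void noteError(String candidate, Exception cause);
}

class BaseResolvingLoop {
    /**
     * Resolves each raw link string against base. Malformed candidates are
     * reported to the sink and skipped, so one bad link does not abort the loop.
     */
    static List<URI> resolveAll(URI base, List<String> candidates, ErrorSink sink) {
        List<URI> resolved = new ArrayList<>();
        for (String candidate : candidates) {
            try {
                // Relative references are resolved against base;
                // absolute URIs come back unchanged.
                resolved.add(base.resolve(new URI(candidate)));
            } catch (URISyntaxException e) {
                sink.noteError(candidate, e);   // notify instead of rethrowing
            }
        }
        return resolved;
    }
}
```

With a base of http://example.com/dir/page.html, the candidate other.html resolves to http://example.com/dir/other.html and /top resolves to http://example.com/top, while a candidate such as "not a uri" is passed to the sink and the loop continues.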
setup
public void setup(UURI sourceandbase, InputStream content, Charset charset, ExtractErrorListener listener)
    Convenience version of the above for the common case where source and base are the same.
    Parameters:
        sourceandbase - URI to use as source and base for derelativizing
        content - input stream of content to scan for links
        charset - Charset to consult to decode stream to characters
        listener - ExtractErrorListener to notify, rather than raising an exception through the extraction loop