java.lang.Object
    org.archive.extractor.CharSequenceLinkExtractor
All known subclasses: org.archive.extractor.RegexpJSLinkExtractor, org.archive.extractor.RegexpCSSLinkExtractor, org.archive.extractor.RegexpHTMLLinkExtractor
CharSequenceLinkExtractor | abstract public class CharSequenceLinkExtractor implements LinkExtractor | | Abstract superclass providing utility methods for LinkExtractors that
would prefer to work on a CharSequence rather than a stream.
ROUGH DRAFT IN PROGRESS / incomplete... untested...
author: gojomo |
Method Summary | |
protected CharSequence | charSequenceFrom(InputStream content, Charset charset)
protected CharSequence | createCharSequenceFrom(InputStream content, Charset charset)
public static void | extract(CharSequence content, UURI source, UURI base, List<Link> collector, ExtractErrorListener extractErrorListener) | Convenience method to do default extraction.
abstract protected boolean | findNextLink() | Scan to the next link(s), if any, loading it into the next buffer.
public boolean | hasNext()
protected static CharSequenceLinkExtractor | newDefaultInstance()
public Object | next()
public Link | nextLink()
public void | remove()
public void | reset() | Discard all state.
public void | setup(UURI source, UURI base, InputStream content, Charset charset, ExtractErrorListener listener)
public void | setup(UURI source, UURI base, CharSequence content, ExtractErrorListener listener)
public void | setup(UURI sourceandbase, CharSequence content, ExtractErrorListener listener) | Convenience method for when source and base are same.
public void | setup(UURI sourceandbase, InputStream content, Charset charset, ExtractErrorListener listener)
charSequenceFrom | protected CharSequence charSequenceFrom(InputStream content, Charset charset) | |
Parameters: content -
Parameters: charset -
Returns: CharSequence obtained from stream in given charset |
createCharSequenceFrom | protected CharSequence createCharSequenceFrom(InputStream content, Charset charset) | |
Parameters: content -
Parameters: charset -
Returns: CharSequence built over given stream in given charset |
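The two methods above turn a byte stream into a CharSequence using a given charset. The page does not show how they do it, so the following is only a minimal sketch of one way to perform that conversion with standard JDK classes. The class name `StreamDecodeSketch` is hypothetical, and reading the whole stream into memory is an assumption, not a description of the actual implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in class; not part of org.archive.extractor.
class StreamDecodeSketch {

    /** Reads the whole stream and decodes its bytes with the given charset. */
    static CharSequence charSequenceFrom(InputStream content, Charset charset) {
        StringBuilder decoded = new StringBuilder();
        try (Reader reader = new InputStreamReader(content, charset)) {
            char[] chunk = new char[4096];
            int count;
            while ((count = reader.read(chunk)) != -1) {
                decoded.append(chunk, 0, count);
            }
        } catch (IOException e) {
            // Assumption: surface I/O failures as unchecked, since the
            // documented signature declares no checked exception.
            throw new UncheckedIOException(e);
        }
        return decoded;
    }

    public static void main(String[] args) {
        byte[] bytes = "<a href=\"caf\u00e9.html\">".getBytes(StandardCharsets.UTF_8);
        CharSequence text = charSequenceFrom(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

Decoding through a Reader rather than byte by byte keeps multi-byte characters intact, which matters for any later regular-expression matching over the result.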
extract | public static void extract(CharSequence content, UURI source, UURI base, List<Link> collector, ExtractErrorListener extractErrorListener) | | Convenience method to do default extraction.
Parameters: content -
Parameters: source -
Parameters: base -
Parameters: collector -
Parameters: extractErrorListener - |
findNextLink | abstract protected boolean findNextLink() | | Scan to the next link(s), if any, loading it into the next buffer.
Returns: true if any links are found/available, false otherwise |
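The contract of findNextLink() suggests a look-ahead iterator: hasNext() asks the subclass to scan forward and buffer any links it finds, and next() hands them out. The sketch below illustrates that pattern under stated assumptions. `BufferedLinkIteratorSketch`, `HrefExtractorSketch`, the `buffer` field, the single-argument `setup`, and the `href` pattern are all hypothetical, and plain Strings stand in for Link objects. It is not the actual class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the abstract superclass; Strings replace Link objects.
abstract class BufferedLinkIteratorSketch implements Iterator<String> {
    protected CharSequence content;
    protected final Deque<String> buffer = new ArrayDeque<>();

    void setup(CharSequence content) {
        reset();
        this.content = content;
    }

    /** Discard all state; another setup() is required before reuse. */
    void reset() {
        content = null;
        buffer.clear();
    }

    /** Scan ahead, buffering any links found; false once nothing is left. */
    protected abstract boolean findNextLink();

    @Override
    public boolean hasNext() {
        return !buffer.isEmpty() || (content != null && findNextLink());
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.removeFirst();
    }
}

// Hypothetical subclass: finds href="..." values with a simple regular expression.
class HrefExtractorSketch extends BufferedLinkIteratorSketch {
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");
    private Matcher matcher;

    @Override
    void setup(CharSequence content) {
        super.setup(content);
        matcher = HREF.matcher(content);
    }

    @Override
    void reset() {
        super.reset();
        matcher = null;
    }

    @Override
    protected boolean findNextLink() {
        if (matcher != null && matcher.find()) {
            buffer.addLast(matcher.group(1));
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        HrefExtractorSketch extractor = new HrefExtractorSketch();
        extractor.setup("<a href=\"a.html\">A</a> <a href=\"b.html\">B</a>");
        while (extractor.hasNext()) {
            System.out.println(extractor.next());
        }
        extractor.reset(); // state discarded; call setup() again to reuse
    }
}
```

Keeping the scan inside findNextLink() means a subclass supplies only the matching logic, while the buffering and iteration rules live in one place.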
hasNext | public boolean hasNext() | | |
remove | public void remove() | | |
reset | public void reset() | | Discard all state. Another setup() is required to use again. |
setup | public void setup(UURI sourceandbase, CharSequence content, ExtractErrorListener listener) | | Convenience method for when source and base are the same.
Parameters: sourceandbase -
Parameters: content -
Parameters: listener - |