Method Summary
Modifier and Type | Method and Description |
public void | dumpToDatabase(DataOutputStream out) Creates a database containing just this URL. |
void | dumpWords(DataOutputStream out) Dumps the words contained in this URL's content to "out" in database format. |
public void | finalize() Deletes the temporary cache file when this object is garbage-collected. |
public File | getCacheFile() Returns the file that is used to cache the contents of this URL. |
public long | getContentLength() Returns the length of the content, or 0 if it's unknown. |
public LinkExtractor | getLinkExtractor() Returns a LinkExtractor that can handle this URL's MIME type. To add support for a new MIME type, add a LinkExtractor that handles it here and add a matching WordExtractor to the getWordExtractor() method. |
public WordExtractor | getWordExtractor() Returns a WordExtractor that can handle this URL's MIME type. To add support for a new MIME type, add a WordExtractor that handles it here and add a matching LinkExtractor to the getLinkExtractor() method. |
public boolean | loaded() Returns true if and only if this URL was loaded without an error. |
public boolean | mimeTypeUnderstood(String mime_type) Returns true if and only if the given MIME type can be processed. |
public boolean | moved() Returns true if and only if this URL causes a redirection. |
void | pipe(InputStream in, OutputStream out) Pipes "in" to "out" until "in" is exhausted, then closes "in". |
public void | readContent() Downloads the content of this URL and stores it in a temporary cache file. |
void | readGeneric() Provides a fallback to the default Java implementation for protocols that have not been reimplemented. |
void | readHTTP() Downloads a file using the HTTP protocol. |
String | readLine(PushbackInputStream in) A replacement for the line reading in java.io.DataInputStream that, as it should, does not return the line-ending characters. |
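The contract of `pipe` (copy "in" to "out" until "in" is exhausted, then close "in") can be sketched as follows. This is a minimal illustration, not the original implementation: the class name `PipeSketch` and the 4 KB buffer size are assumptions, and "out" is deliberately left open on the assumption that the caller owns it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical sketch of pipe(InputStream, OutputStream). */
final class PipeSketch {
    private PipeSketch() {}

    /** Copies "in" to "out" until "in" is exhausted, then closes "in" only. */
    static void pipe(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096]; // buffer size is an assumption, not from the original
        try {
            int n;
            // read() returns -1 once the stream is exhausted
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } finally {
            in.close(); // "out" is left open for the caller to close
        }
    }
}
```

Leaving "out" open lets a caller pipe several inputs into the same sink and decide for itself when the sink is finished.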
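`readLine` takes a `PushbackInputStream`, which suggests it needs to peek one byte past a carriage return. A hypothetical sketch under these assumptions: "\n", "\r" and "\r\n" all end a line, the terminator is not included in the result, bytes map one-to-one to characters (ISO-8859-1), and `null` signals end of stream. The class name `ReadLineSketch` is illustrative.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

/** Hypothetical sketch of readLine(PushbackInputStream). */
final class ReadLineSketch {
    private ReadLineSketch() {}

    /**
     * Reads one line, accepting "\n", "\r" or "\r\n" as the terminator, and
     * returns it without the terminator; returns null at end of stream.
     * Each byte becomes one char (ISO-8859-1), which is an assumption.
     */
    static String readLine(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c == -1) {
            return null; // nothing left to read
        }
        StringBuilder line = new StringBuilder();
        while (c != -1 && c != '\n' && c != '\r') {
            line.append((char) c);
            c = in.read();
        }
        if (c == '\r') {
            int next = in.read();
            // A lone '\r' ends the line; push back whatever byte followed it
            if (next != '\n' && next != -1) {
                in.unread(next);
            }
        }
        return line.toString();
    }
}
```

A one-byte pushback buffer (the `PushbackInputStream` default) is enough, because at most one byte, the one following a lone "\r", ever has to be returned to the stream.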
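The descriptions of `getLinkExtractor()`, `getWordExtractor()` and `mimeTypeUnderstood()` imply that each supported MIME type needs both a link extractor and a word extractor. One way to keep the two in sync is a single registry keyed by MIME type. In this sketch the nested `LinkExtractor` and `WordExtractor` interfaces and their methods are hypothetical stand-ins (the summary does not show the real ones), the MIME type is passed in explicitly rather than read from the URL, parameters such as `; charset=...` are dropped before lookup, and the regex-based HTML handling is deliberately naive.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of MIME-type dispatch to link and word extractors. */
final class ExtractorRegistrySketch {
    /** Hypothetical stand-in; the real LinkExtractor API is not shown in the summary. */
    interface LinkExtractor { List<String> extractLinks(String content); }
    /** Hypothetical stand-in for WordExtractor. */
    interface WordExtractor { List<String> extractWords(String content); }

    private final Map<String, LinkExtractor> linkExtractors = new HashMap<>();
    private final Map<String, WordExtractor> wordExtractors = new HashMap<>();

    ExtractorRegistrySketch() {
        // Naive href matcher for text/html; a real crawler would use an HTML parser.
        Pattern href = Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
        LinkExtractor htmlLinks = content -> {
            List<String> links = new ArrayList<>();
            Matcher m = href.matcher(content);
            while (m.find()) {
                links.add(m.group(1));
            }
            return links;
        };
        // Splits on non-word characters and lowercases each word.
        WordExtractor plainWords = content -> {
            List<String> words = new ArrayList<>();
            for (String w : content.split("\\W+")) {
                if (!w.isEmpty()) {
                    words.add(w.toLowerCase(Locale.ROOT));
                }
            }
            return words;
        };
        // For HTML, replace tags with spaces before extracting words.
        WordExtractor htmlWords = content -> plainWords.extractWords(content.replaceAll("<[^>]*>", " "));

        // Supporting a MIME type means registering BOTH extractors for it.
        register("text/html", htmlLinks, htmlWords);
        register("text/plain", content -> new ArrayList<>(), plainWords);
    }

    void register(String mimeType, LinkExtractor links, WordExtractor words) {
        String key = normalize(mimeType);
        linkExtractors.put(key, links);
        wordExtractors.put(key, words);
    }

    /** Drops parameters such as "; charset=..." and lowercases (an assumption). */
    static String normalize(String mimeType) {
        int semi = mimeType.indexOf(';');
        String base = semi >= 0 ? mimeType.substring(0, semi) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    /** True only if both kinds of extractor exist for this type. */
    boolean mimeTypeUnderstood(String mimeType) {
        String key = normalize(mimeType);
        return linkExtractors.containsKey(key) && wordExtractors.containsKey(key);
    }

    LinkExtractor getLinkExtractor(String mimeType) { return linkExtractors.get(normalize(mimeType)); }

    WordExtractor getWordExtractor(String mimeType) { return wordExtractors.get(normalize(mimeType)); }
}
```

Registering both extractors in one call makes it impossible to add a type for links but forget it for words, which is the pitfall the two method descriptions warn about.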
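`readContent()`, `getCacheFile()`, `getContentLength()`, `readGeneric()` and `finalize()` together describe a lifecycle: download into a temporary file, expose that file, then delete it. Below is a hypothetical sketch of that lifecycle, assuming the "default Java implementation" used by the generic fallback is `java.net.URLConnection`. Because `Object.finalize()` has been deprecated since Java 9, this version swaps in `AutoCloseable.close()` to delete the file; that substitution is our choice, not the original design. `CachedUrlSketch` and the temp-file prefix are illustrative names.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of the cache-file lifecycle around readContent(). */
final class CachedUrlSketch implements AutoCloseable {
    private final URL url;
    private File cacheFile;
    private long contentLength; // stays 0 (unknown) until readContent() succeeds

    CachedUrlSketch(URL url) {
        this.url = url;
    }

    /** Downloads the URL's content into a temporary cache file. */
    void readContent() throws IOException {
        cacheFile = File.createTempFile("urlcache", ".tmp"); // prefix and suffix are arbitrary
        // Generic fallback: let java.net.URLConnection handle whatever protocol this is.
        URLConnection conn = url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            // REPLACE_EXISTING is needed because createTempFile already created the file
            contentLength = Files.copy(in, cacheFile.toPath(), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    File getCacheFile() {
        return cacheFile;
    }

    long getContentLength() {
        return contentLength;
    }

    /** Replaces finalize(): deletes the temporary file deterministically. */
    @Override
    public void close() throws IOException {
        if (cacheFile != null) {
            Files.deleteIfExists(cacheFile.toPath());
            cacheFile = null;
        }
    }
}
```

Used in a try-with-resources block, the cache file is removed as soon as the block exits, rather than whenever the garbage collector happens to run a finalizer.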
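Since `readGeneric()` is only a fallback for protocols that have not been reimplemented, `readHTTP()` presumably writes the request and parses the response head itself, and `moved()` then reports whether that response was a redirect. The sketch below shows one way to do both under stated assumptions: an HTTP/1.0 GET carrying only a `Host` header, header names stored in lowercase, and "moved" meaning a 3xx status with a `Location` header. `HttpHeadSketch` and its methods are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: HTTP request line, response head parsing, redirect check. */
final class HttpHeadSketch {
    final int status;
    final Map<String, String> headers; // names lowercased (an assumption)

    private HttpHeadSketch(int status, Map<String, String> headers) {
        this.status = status;
        this.headers = headers;
    }

    /** Builds a minimal HTTP/1.0 GET request; getFile() keeps the query string. */
    static byte[] buildRequest(URL url) {
        String path = url.getFile().isEmpty() ? "/" : url.getFile();
        String host = url.getHost() + (url.getPort() == -1 ? "" : ":" + url.getPort());
        String request = "GET " + path + " HTTP/1.0\r\n" + "Host: " + host + "\r\n" + "\r\n";
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    /** Reads the status line and headers, stopping at the blank line before the body. */
    static HttpHeadSketch readHead(InputStream in) throws IOException {
        String statusLine = readHeaderLine(in);
        if (statusLine == null) {
            throw new IOException("empty response");
        }
        String[] parts = statusLine.split(" ", 3); // e.g. "HTTP/1.0 301 Moved Permanently"
        if (parts.length < 2) {
            throw new IOException("bad status line: " + statusLine);
        }
        int code;
        try {
            code = Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            throw new IOException("bad status code: " + parts[1], e);
        }
        Map<String, String> headers = new HashMap<>();
        String line;
        while ((line = readHeaderLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
            }
        }
        return new HttpHeadSketch(code, headers);
    }

    /** True for a 3xx response that names a new location, i.e. a redirection. */
    boolean moved() {
        return status >= 300 && status < 400 && headers.containsKey("location");
    }

    /** Reads bytes up to "\n", dropping a trailing "\r"; returns null at end of stream. */
    private static String readHeaderLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            buf.write(c);
        }
        if (c == -1 && buf.size() == 0) {
            return null;
        }
        String line = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }
}
```

Reading the head one byte at a time leaves the body untouched in the stream, so it can then be copied straight into a cache file.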