net.matuschek.http.HttpDocManager
All known implementing classes: net.matuschek.http.AbstractHttpDocManager, net.matuschek.http.HttpDocCache
HttpDocManager
public interface HttpDocManager
An HttpDocManager handles the HttpDoc objects retrieved by the WebRobot. It stores them and may also process them, detect duplicates, or serve them back as a cache.
See Also: net.matuschek.http.AbstractHttpDocManager
Author: Daniel Matuschek
Version: $Id: HttpDocManager.java,v 1.3 2003/02/27 18:40:19 oliver_schmidt Exp $
Method Summary
String    findDuplicate(HttpDoc doc)      Returns the URL of a stored document with the same content, or null.
void      finish()                        Should be called when the instance is no longer used.
void      processDocument(HttpDoc doc)    "Processes" a document (without storing it).
void      removeDocument(URL url)         Removes a document from the cache.
HttpDoc   retrieveFromCache(URL u)        Returns the cached HttpDoc for a URL if this manager can act as a cache, otherwise null.
void      storeDocument(HttpDoc doc)      Stores a document.
findDuplicate
public String findDuplicate(HttpDoc doc) throws IOException
Returns the URL of a stored document with the same content, or null if there is none.
Parameters: doc - the document to check for a stored duplicate.
Returns: the URL of the duplicate document as a String, or null.
Throws: IOException - if an I/O error occurs.
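The lookup that findDuplicate describes can be sketched by indexing stored content under a hash. This is a minimal, hypothetical sketch, not the library's implementation: ContentHashIndex and its methods are invented for illustration, a document is reduced to its URL string and raw bytes because HttpDoc is not reproduced here, and "same content" is assumed to mean byte-identical content with the same SHA-256 digest.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical duplicate index (not part of net.matuschek.http).
 * A document is reduced to its URL string and raw content bytes.
 */
class ContentHashIndex {
    // Base64-encoded SHA-256 digest of the content -> first URL stored with it.
    private final Map<String, String> urlByDigest = new HashMap<>();

    /** Records a stored document; the first URL seen for a given content wins. */
    void store(String url, byte[] content) {
        urlByDigest.putIfAbsent(digest(content), url);
    }

    /**
     * Returns the URL of a stored document with byte-identical content,
     * or null if there is none. A document is not a duplicate of itself.
     */
    String findDuplicate(String url, byte[] content) {
        String existing = urlByDigest.get(digest(content));
        return (existing == null || existing.equals(url)) ? null : existing;
    }

    private static String digest(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            // Java platforms are required to provide SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

A real implementation would also have to decide whether documents that differ only in whitespace or markup count as duplicates; this sketch compares exact bytes.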
finish
public void finish()
Should be called when the instance is no longer used, so that it can release any resources it holds (for example, open files or database connections).
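Why finish() matters can be shown with a manager that buffers its output. In this hypothetical sketch, UrlLogManager and Crawl are invented names, storeDocument takes a URL string instead of an HttpDoc, and the crawl loop only stands in for the WebRobot. Calling finish() in a finally block ensures the buffered lines reach the target writer even if storing a document fails.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.List;

/** Hypothetical manager that appends the URL of each stored document to a buffered log. */
class UrlLogManager {
    private final BufferedWriter out;

    UrlLogManager(Writer target) {
        this.out = new BufferedWriter(target);
    }

    void storeDocument(String url) {
        try {
            out.write(url);
            out.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Flushes buffered lines and releases the underlying writer. */
    void finish() {
        try {
            out.close(); // BufferedWriter.close() flushes before closing
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Hypothetical crawl loop standing in for the WebRobot. */
final class Crawl {
    private Crawl() {}

    static void run(List<String> urls, UrlLogManager manager) {
        try {
            for (String url : urls) {
                manager.storeDocument(url);
            }
        } finally {
            manager.finish(); // runs even if storeDocument throws
        }
    }
}
```

Until finish() runs, short log lines may sit entirely in the BufferedWriter's buffer, so a caller that forgets it can lose output.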
processDocument
void processDocument(HttpDoc doc) throws DocManagerException
"Processes" a document without storing it. An implementation may process the document immediately or collect its URL and process it later.
Most documents should be stored (so that they are available for reruns), but not all of them need to be processed; for example, you may want to process only PDF documents.
Parameters: doc - the HttpDoc to process; may be null.
Throws: DocManagerException - if an error occurs while processing the document.
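The PDF example in this description can be sketched as a processDocument-style filter that collects URLs for later processing. PdfCollector is a hypothetical name, and the sketch assumes the document's media type is available as a Content-Type string; how HttpDoc actually exposes that value is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical filter: every document may be stored elsewhere, but only
 * PDFs are collected here for later processing.
 */
class PdfCollector {
    private final List<String> pdfUrls = new ArrayList<>();

    /** Collects the URL if the document is a PDF; null documents are ignored. */
    void processDocument(String url, String contentType) {
        if (url == null || contentType == null) {
            return; // the interface allows doc to be null
        }
        // Drop parameters such as "; charset=UTF-8" and compare case-insensitively.
        String mediaType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (mediaType.equals("application/pdf")) {
            pdfUrls.add(url);
        }
    }

    /** URLs collected for later processing, in the order they were seen. */
    List<String> collected() {
        return List.copyOf(pdfUrls);
    }
}
```

Collecting URLs and processing them afterwards corresponds to the "later processing" option; direct processing would do the work inside processDocument instead.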
removeDocument
public void removeDocument(URL url)
Removes a document from the cache.
Parameters: url - the URL of the document to remove.
retrieveFromCache
HttpDoc retrieveFromCache(URL u)
If an HttpDocManager stores complete HttpDocs, it can be used as a cache, and this method gives access to the cached objects. An HttpDocManager that cannot be used as a cache should always return null.
Parameters: u - the URL of the requested document.
Returns: a cached HttpDoc for this URL, or null.
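A manager that keeps complete documents in memory can support retrieveFromCache and removeDocument with a map. This is a hypothetical sketch: InMemoryDocCache and CachedDoc are invented names, and CachedDoc stands in for HttpDoc. The map is keyed by URL.toExternalForm() rather than by URL itself, because java.net.URL.equals and hashCode may resolve host names, which is slow and can make URLs on different hosts compare equal.

```java
import java.net.URL;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory cache of complete documents. */
class InMemoryDocCache {
    /** Minimal stand-in for HttpDoc: the URL plus the raw content. */
    static final class CachedDoc {
        final URL url;
        final byte[] content;

        CachedDoc(URL url, byte[] content) {
            this.url = url;
            this.content = content;
        }
    }

    // Keyed by the URL's string form to avoid URL.equals/hashCode host lookups.
    private final Map<String, CachedDoc> docs = new HashMap<>();

    void storeDocument(CachedDoc doc) {
        if (doc != null) {
            docs.put(doc.url.toExternalForm(), doc);
        }
    }

    /** Returns the cached document for this URL, or null if none is cached. */
    CachedDoc retrieveFromCache(URL u) {
        return docs.get(u.toExternalForm());
    }

    /** Removes the document for this URL; does nothing if it is not cached. */
    void removeDocument(URL url) {
        docs.remove(url.toExternalForm());
    }
}
```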
storeDocument
void storeDocument(HttpDoc doc) throws DocManagerException
Stores a document. Usually this stores the document somewhere, such as the file system or a database. An implementation may also, instead of storing the whole document, extract information from it and process that information.
Most documents should be stored (so that they are available for reruns), but not all of them need to be processed; for example, you may want to process only PDF documents.
Parameters: doc - the HttpDoc to store; may be null.
Throws: DocManagerException - if an error occurs while storing the document.
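Storing documents in the file system needs a mapping from URLs to file paths. The sketch below uses one hypothetical mapping (baseDir/host/path, with index.html for directory URLs). FileSystemStore is an invented name, the URL string and byte array stand in for HttpDoc, and the sketch throws IOException where a real implementation would presumably report a DocManagerException.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical store that writes each document's raw content to baseDir/host/path. */
class FileSystemStore {
    private final Path baseDir;

    FileSystemStore(Path baseDir) {
        this.baseDir = baseDir.toAbsolutePath().normalize();
    }

    /**
     * Maps a URL to a file below baseDir. Directory URLs (empty path or a
     * trailing "/") map to index.html; the query string is ignored.
     */
    Path pathFor(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.isEmpty() || path.endsWith("/")) {
            path = path + "index.html";
        }
        String relative = path.startsWith("/") ? path.substring(1) : path;
        Path hostDir = baseDir.resolve(host);
        Path target = hostDir.resolve(relative).normalize();
        // Reject paths such as "/../../etc/passwd" that would escape the host directory.
        if (!target.startsWith(hostDir)) {
            throw new IllegalArgumentException("path escapes base directory: " + url);
        }
        return target;
    }

    /** Writes the content to the mapped file, creating parent directories as needed. */
    void storeDocument(String url, byte[] content) throws IOException {
        if (url == null || content == null) {
            return; // the interface allows the document to be null
        }
        Path target = pathFor(url);
        Files.createDirectories(target.getParent());
        Files.write(target, content);
    }
}
```

Because the query string is ignored, URLs that differ only in their query overwrite each other's files; a crawler that fetches such URLs would need a richer mapping.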