java.lang.Object
  net.matuschek.spider.WebRobot
Constructor Summary

public | WebRobot()
public | WebRobot(int expectedDocumentCount)
Method Summary

protected void | addTask(RobotTask task)
protected void | addTaskAtStart(RobotTask task)
protected boolean | basicURLCheck(URL currURL) | Basic URL allow check.
protected void | cleanUp()
public void | clearCookies()
public void | finish() | Finishes HttpTool, NoRobots and HttpDocManager.
protected void | finishThreads()
public String | getAgentName()
public boolean | getAllowCaching() | Gets the AllowCaching value.
public boolean | getAllowWholeDomain() | Gets the AllowWholeDomain value.
public boolean | getAllowWholeHost() | Gets the AllowWholeHost value.
public Vector | getAllowedURLs()
public int | getBandwidth()
public String | getContentVisitedURL(HttpDoc doc) | Checks if the content was visited before and retrieves the corresponding URL.
public CookieManager | getCookieManager()
public HttpDocManager | getDocManager()
public boolean | getEnableCookies()
public RobotExceptionHandler | getExceptionHandler() | Gets the exception handler of the robot.
public long | getExpirationAge() | Gets the expiration age of documents in the cache.
public boolean | getFlexibleHostCheck() | Gets the state of flexible host checking (enabled or disabled).
public Vector | getFormHandlers()
public boolean | getIgnoreRobotsTxt()
public int | getMaxDepth()
public long | getMaxDocumentAge()
public int | getMaxRetries()
protected String | getMimeTypeForFilename(String filename) | Gets the MIME type for the given filename.
public NTLMAuthorization | getNtlmAuthorization()
public String | getProxy()
public int | getSleepTime()
public String | getStart() | Gets the start URL as a String.
public String | getStartReferer()
public URL | getStartURL()
public int | getTimeout()
public Vector | getVisitMany()
public boolean | getWalkToOtherHosts()
public Vector | getWasteParameters()
public WebRobotCallback | getWebRobotCallback()
protected void | handleMemoryError(OutOfMemoryError memoryError) | Implements OutOfMemory handling strategies.
protected boolean | isAllowed(URL u)
protected boolean | isProcessingAllowed(HttpDoc doc)
public boolean | isSleeping()
public static void | main(String[] args)
protected byte[] | readFileToByteArray(File file) | Reads a File into a byte array.
public void | registerToDoList(TaskList todo) | Sets the implementation class for the backend task list storage.
public void | registerVisitedList(TaskList visited) | Sets the implementation class for the backend task list storage.
public static String | removeParametersFromString(String urlString, Vector wasteParameters)
public URL | removeWasteParameters(URL url) | Removes wasteParameters from a URL.
public void | retrieveURL(RobotTask task)
public void | run()
public void | setAgentName(String name) | Sets the "User-Agent" name for this robot.
public void | setAllowCaching(boolean allowCaching) | Sets the AllowCaching status.
public void | setAllowWholeDomain(boolean allowWholeDomain) | Sets the AllowWholeDomain status.
public void | setAllowWholeHost(boolean allowWholeHost) | Sets the AllowWholeHost status.
public void | setAllowedURLs(Vector allowed) | Sets the list of allowed URLs.
public void | setBandwidth(int bandwidth)
public void | setContentVisitedURL(HttpDoc doc, String url) | Makes a URL retrievable by its content.
public void | setCookieManager(CookieManager cm) | Sets the CookieManager used by the HttpTool.
public void | setDocManager(HttpDocManager docManager) | Sets the document manager for this robot.
public void | setDownloadRuleSet(DownloadRuleSet rules)
public void | setEnableCookies(boolean enable)
public void | setExceptionHandler(RobotExceptionHandler newExceptionHandler) | Sets the exception handler of the robot.
public void | setExpirationAge(long age) | Sets the expiration age of documents in the cache.
public void | setFilters(FilterChain filters) | Sets a FilterChain.
public void | setFlexibleHostCheck(boolean flexibleHostCheck) | Defines if the host test should be more flexible.
public void | setFormHandlers(Vector handlers)
public void | setFromAddress(String fromAddress) | Sets the From: HTTP header.
public void | setHttpToolCallback(HttpToolCallback callback)
public void | setIgnoreRobotsTxt(boolean ignoreRobotsTxt)
public void | setMaxDepth(int maxDepth)
public void | setMaxDocumentAge(long maxAge)
public void | setMaxRetries(int maxRetries)
public void | setNtlmAuthorization(NTLMAuthorization ntlmAuthorization)
public void | setProxy(String proxyDescr)
public void | setSleep(boolean sleep) | Sets the sleep status for this robot.
public void | setSleepTime(int sleepTime) | Sets the sleep time.
public void | setStart(String startURL) | Sets the start URL.
public void | setStartReferer(String startReferer) | Sets the Referer setting for the first HTTP request.
public void | setStartURL(URL startURL)
public void | setTimeout(int timeout) | Sets the timeout for getting data.
public void | setURLCheck(URLCheck check)
public void | setVisitMany(Vector visitMany)
public void | setWalkToOtherHosts(boolean walkToOtherHosts)
public void | setWasteParameters(Vector wasteParameters)
public void | setWebRobotCallback(WebRobotCallback webRobotCallback)
public void | sleepNow() | Sleeps for sleepTime seconds.
protected synchronized void | spawnThread() | Starts subthreads for spidering.
public void | stopRobot()
protected boolean | taskAddAllowed(RobotTask task)
public void | updateProgressInfo() | Informs about spidering progress.
public void | walkTree()
public void | work()
Field Detail

activatedContentHistory | protected boolean activatedContentHistory | Are visited contents collected? (may depend on memoryLevel)

activatedNewTasks | protected boolean activatedNewTasks | Can new tasks be added? (may depend on memoryLevel)

activatedUrlHistory | protected boolean activatedUrlHistory | Are visited URLs collected? (may depend on memoryLevel)

allowCaching | protected boolean allowCaching | Don't retrieve pages again that are already stored in the DocManager.

allowWholeHost | protected boolean allowWholeHost | Allow travelling the whole host?

allowedURLs | protected Vector allowedURLs | List of allowed URLs (even if walkToOtherHosts is false).

content2UrlMap | protected HashMap content2UrlMap | Remembers visited content here (maps MD5 hash to URL string).

countCache | long countCache | Counter for pages that were found in the cache.

countNoRefresh | long countNoRefresh | Counter for pages that didn't need a refresh.

countRefresh | long countRefresh | Counter for refreshed pages (= cache + web).

countWeb | long countWeb | Counter for pages retrieved from the web.

docManager | protected HttpDocManager docManager | The DocManager will store or process retrieved documents.

duplicateCheck | protected boolean duplicateCheck | Check for documents with the same content.

expectedDocumentCount | protected int expectedDocumentCount | Expected count of documents.

expirationAge | protected long expirationAge | Expiration age of documents in the cache; documents older than expirationAge will be removed, a negative value means no limit.

filters | protected FilterChain filters | FilterChain to filter documents before storing them.

hasFormHandlers | boolean hasFormHandlers | Only true if form handlers are defined.

httpTool | protected HttpTool httpTool | The HttpTool will be used to retrieve documents from a web server.

ignoreRobotsTxt | protected boolean ignoreRobotsTxt | Ignore settings in /robots.txt?

iteration | protected int iteration | Counter for calls of retrieveURL.

log | protected Category log | Log4J category for logging.

maxDepth | protected int maxDepth | Maximal search depth.

maxDocumentAge | protected long maxDocumentAge | Maximum document age in seconds; a negative value means no limit.

maxRetries | protected int maxRetries | Number of allowed retries for document retrieval.

sleep | protected boolean sleep | Should the robot suspend the current walk()?

sleepTime | protected int sleepTime | Sleep this number of seconds after every retrieved document.

startDir | protected String startDir | The host and directory where retrieval started from.

startReferer | protected String startReferer | Referer used to retrieve the first document.

startTime | protected long startTime | Time of WebRobot start in milliseconds.

startURL | protected URL startURL | The URL where the robot walk starts from.

stopIt | protected boolean stopIt | Should robot operation be stopped?

urlCheck | protected URLCheck urlCheck | Used to check if it is allowed to travel to a given URL.

visitMany | protected Vector visitMany | These URLs can be visited more than once.

visited | protected TaskList visited | A list of all URLs already retrieved.

walkToOtherHosts | protected boolean walkToOtherHosts | Is it allowed to walk to hosts other than the starting host?

wasteParameters | protected Vector wasteParameters | List of wasteParameters (which will be removed from URLs).
Constructor Detail

WebRobot | public WebRobot(int expectedDocumentCount) | Initializes the robot with the default implementation of the TaskList interface.
    Parameters: expectedDocumentCount - the expected count of documents

WebRobot | public WebRobot() | Initializes the robot with the default implementation of the TaskList interface.
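Example (a minimal end-to-end sketch; the URL and settings are illustrative and error handling is reduced to a throws clause):

    import java.net.URL;
    import net.matuschek.spider.WebRobot;

    public class CrawlExample {
        public static void main(String[] args) throws Exception {
            WebRobot robot = new WebRobot();
            robot.setStartURL(new URL("http://localhost/test/"));
            robot.setMaxDepth(2);    // follow links at most two levels deep
            robot.setSleepTime(1);   // be polite: wait 1 second between downloads
            robot.setAgentName("ExampleRobot");
            // No document manager is set here, so retrieved documents are
            // simply discarded; see setDocManager below.
            robot.run();             // walks the tree, blocking until done
        }
    }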
Method Detail

addTask | protected void addTask(RobotTask task) | Adds a new task to the task vector, but performs some checks first.
basicURLCheck | protected boolean basicURLCheck(URL currURL) | Basic URL allow check.
    It is allowed to walk to a new URL if one of the following holds
    (see the configuration sketch after this list):
    - WalkToOtherHosts is true. In this case there are no additional tests.
    - The new URL is located below the start URL, e.g. if the start URL is
      http://localhost/test, the URL http://localhost/test/index.html is
      allowed, but http://localhost/ is not.
    - AllowWholeHost is true and the new URL is located on the same host
      as the start URL.
    - FlexibleHostCheck is true and the host part of the current URL is
      equal to the host part of the start URL modulo the prefix "www.".
    - The URL starts with a string in the "AllowedURLs" list.
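Example (a configuration sketch matching these rules; it continues with the robot instance from the constructor example and uses illustrative values):

    robot.setStartURL(new URL("http://localhost/test/"));
    robot.setWalkToOtherHosts(false);  // do not leave the start host
    robot.setAllowWholeHost(true);     // but allow everything on that host
    robot.setFlexibleHostCheck(true);  // treat host and www.host as equal

    // URL prefixes that are allowed even outside the scope defined above:
    Vector allowed = new Vector();
    allowed.add("http://mirror.example.org/test/");  // hypothetical mirror
    robot.setAllowedURLs(allowed);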
cleanUp | protected void cleanUp() | Cleans up temporary data.

clearCookies | public void clearCookies() | Deletes all cookies.

finish | public void finish() | Finishes HttpTool, NoRobots and HttpDocManager.

finishThreads | protected void finishThreads() | Calls webRobotDone and finishes the docManager if executed in the main thread.

getAgentName | public String getAgentName() | Gets the name of the "User-Agent" header that the robot will use.
    Returns: the user agent name

getAllowWholeDomain | public boolean getAllowWholeDomain() | Gets the AllowWholeDomain value.
    Returns: true if the robot is allowed to travel to the whole domain of the start host, false otherwise
    See Also: WebRobot.setAllowWholeDomain(boolean)

getAllowWholeHost | public boolean getAllowWholeHost() | Gets the AllowWholeHost value.
    Returns: true if the robot is allowed to travel to the whole host it started from, false otherwise. If false, it is only allowed to travel to URLs below the start URL.

getBandwidth | public int getBandwidth() | Gets the bandwidth value of the used HttpTool.
    Returns: the bandwidth value

getContentVisitedURL | public String getContentVisitedURL(HttpDoc doc) | Checks if the content was visited before and retrieves the corresponding URL.
    Parameters: doc - the document whose content is checked
    Returns: the found URL, or null if not found

getCookieManager | public CookieManager getCookieManager() | Gets the CookieManager used by the HttpTool.
    Returns: the CookieManager that will be used by the HttpTool

getEnableCookies | public boolean getEnableCookies() | Gets the status of the cookie engine.
    Returns: true if HTTP cookies are enabled, false otherwise

getExceptionHandler | public RobotExceptionHandler getExceptionHandler() | Gets the exception handler of the robot.
    Returns: the RobotExceptionHandler of the robot

getExpirationAge | public long getExpirationAge() | Gets the expiration age of documents in the cache.
    Returns: the expiration age

getFlexibleHostCheck | public boolean getFlexibleHostCheck() | Gets the state of flexible host checking (enabled or disabled).
    To find out if a new URL is on the same host, the robot usually compares the host parts of both. Some web servers have an inconsistent addressing scheme and use both the hostname www.domain.com and domain.com. With flexible host checking enabled, the robot will consider both hosts as equal.
    Returns: true if flexible host checking is enabled

getIgnoreRobotsTxt | public boolean getIgnoreRobotsTxt() | Gets the setting of the IgnoreRobotsTxt property.
    Returns: true if robots.txt will be ignored, false otherwise

getMaxDepth | public int getMaxDepth() |
    Returns: the maximal allowed search depth

getMaxDocumentAge | public long getMaxDocumentAge() | Gets the maximum age of documents to retrieve.
    Returns: the maximum document age (in seconds); a negative value means no limit

getMaxRetries | public int getMaxRetries() | Gets the allowed retries for document retrieval.
    Returns: maxRetries

getMimeTypeForFilename | protected String getMimeTypeForFilename(String filename) | Gets the MIME type for the given filename.
    Parameters: filename - the filename to examine
    Returns: the MIME type

getNtlmAuthorization | public NTLMAuthorization getNtlmAuthorization() | Gets the ntlmAuthorization of the robot.
    Returns: the ntlmAuthorization

getProxy | public String getProxy() |
    Returns: the current proxy setting in the format host:port

getSleepTime | public int getSleepTime() |
    Returns: the sleep time setting

getStart | public String getStart() | Gets the start URL as a String.
    Returns: the start URL

getStartReferer | public String getStartReferer() |
    Returns: the Referer setting for the first HTTP request

getStartURL | public URL getStartURL() |
    Returns: the start URL for this robot

getTimeout | public int getTimeout() | Gets the timeout for getting data (in seconds) of the used HttpTool.
    Returns: the value of socketTimeout
    See Also: WebRobot.setTimeout(int)

getVisitMany | public Vector getVisitMany() | Gets a vector of URLs that can be visited more than once.
    Returns: a Vector containing URLs formatted as Strings

getWalkToOtherHosts | public boolean getWalkToOtherHosts() | Gets the WalkToOtherHosts status.
    Returns: true if the robot is allowed to travel to other hosts than the start host, false otherwise

getWasteParameters | public Vector getWasteParameters() | Gets the list of wasteParameters (which will be removed from URLs).
    Returns: a Vector containing Strings

isAllowed | protected boolean isAllowed(URL u) | Is it allowed to travel to this new URL?
    Parameters: u - the URL to test
    Returns: true if traveling to this URL is allowed, false otherwise

isProcessingAllowed | protected boolean isProcessingAllowed(HttpDoc doc) | Is it allowed to process this document?
    Parameters: doc - the document to test
    Returns: true if processing of this document is allowed

isSleeping | public boolean isSleeping() | Is the robot sleeping?

readFileToByteArray | protected byte[] readFileToByteArray(File file) throws IOException | Reads a File into a byte array.
    Parameters: file - the file to read
    Returns: the file content as byte[]
    Throws: IOException
registerToDoList | public void registerToDoList(TaskList todo) | Sets the implementation class for the backend task list storage. WebRobot uses the TaskList interface to store future tasks. If you want to use your own TaskList implementation, just call this method.
    Parameters: todo - TaskList to be used for the "to do" list

registerVisitedList | public void registerVisitedList(TaskList visited) | Sets the implementation class for the backend task list storage. WebRobot uses the TaskList interface to store URLs that have been retrieved before. If you want to use your own TaskList implementation, just call this method.
    Parameters: visited - TaskList to be used for the list of visited URLs
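Example (a plug-in sketch; MyDiskTaskList is a hypothetical TaskList implementation of your own, e.g. one that keeps both queues on disk instead of in memory):

    TaskList todo = new MyDiskTaskList("todo.db");        // hypothetical class
    TaskList visited = new MyDiskTaskList("visited.db");  // hypothetical class
    robot.registerToDoList(todo);        // queue of tasks still to crawl
    robot.registerVisitedList(visited);  // record of URLs already retrieved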
removeParametersFromString | public static String removeParametersFromString(String urlString, Vector wasteParameters) | Removes the passed parameters from a URL string.
    Parameters: urlString - the URL string to clean
    Parameters: wasteParameters - the parameters to remove
    Returns: the cleaned URL string

removeWasteParameters | public URL removeWasteParameters(URL url) | Removes wasteParameters from a URL (e.g. an ID parameter).
    Parameters: url - the URL to clean
    Returns: the cleaned URL
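Example (a sketch of both calls; the parameter names are illustrative):

    // Parameters that should be stripped from every URL:
    Vector waste = new Vector();
    waste.add("PHPSESSID");
    waste.add("jsessionid");
    robot.setWasteParameters(waste);

    // The static helper can also be used on its own:
    String cleaned = WebRobot.removeParametersFromString(
            "http://localhost/test/page.html?PHPSESSID=abc123&page=2", waste);
    // expected to keep page=2 but drop the PHPSESSID parameter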
retrieveURL | public void retrieveURL(RobotTask task) | Retrieves the next URL, saves it, extracts all included links and adds those links to the task list.
    Parameters: task - the task to retrieve; the method does nothing if this is null

setAgentName | public void setAgentName(String name) | Sets the "User-Agent" name for this robot.
    Parameters: name - a name for this robot (e.g. "Mozilla 4.0 (compatible; Robot)")
setAllowCaching | public void setAllowCaching(boolean allowCaching) | Sets the AllowCaching status.
    Parameters: allowCaching - if true, the robot is allowed to use cached documents. That means it will first try to get the document from the docManager cache and will only retrieve it if it is not found in the cache. If the cache returns a document, the robot will NEVER retrieve it again. Therefore, expiration mechanisms have to be included in the HttpDocManager method retrieveFromCache.
    See Also: net.matuschek.http.HttpDocManager.retrieveFromCache(java.net.URL)
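Example (how the cache-related settings fit together; the values are illustrative, and the unit of the expiration age depends on the HttpDocManager implementation):

    robot.setAllowCaching(true);     // consult the docManager cache first
    robot.setExpirationAge(86400L);  // remove cached documents older than this
    robot.setMaxDocumentAge(3600L);  // maximum document age in seconds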
setAllowWholeDomain | public void setAllowWholeDomain(boolean allowWholeDomain) | Sets the AllowWholeDomain status.
    Parameters: allowWholeDomain - if true, the robot is allowed to travel to all hosts in the same domain as the starting host. E.g. if you start at www.apache.org, it is also allowed to travel to jakarta.apache.org, xml.apache.org, ...

setAllowWholeHost | public void setAllowWholeHost(boolean allowWholeHost) | Sets the AllowWholeHost status.
    Parameters: allowWholeHost - if true, the robot is allowed to travel to the whole host it started from. Otherwise it is only allowed to travel to URLs below the start URL.

setAllowedURLs | public void setAllowedURLs(Vector allowed) | Sets the list of allowed URLs.
    Parameters: allowed - a Vector containing Strings; URLs will be checked on whether they begin with a string in this vector

setBandwidth | public void setBandwidth(int bandwidth) | Sets the bandwidth value of the used HttpTool.
    Parameters: bandwidth - value to assign to bandwidth

setContentVisitedURL | public void setContentVisitedURL(HttpDoc doc, String url) | Makes a URL retrievable by its content by entering it into content2UrlMap.
    Parameters: doc - the document whose content serves as the key
    Parameters: url - the URL to remember for that content
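Example (a sketch of how these two methods support duplicate-content detection; doc is assumed to be the retrieved HttpDoc and currentUrl its java.net.URL):

    // Check whether identical content was already seen under another URL.
    String knownUrl = robot.getContentVisitedURL(doc);
    if (knownUrl == null) {
        // first time this content shows up: remember it under its URL
        robot.setContentVisitedURL(doc, currentUrl.toString());
    } else {
        // the same content was already stored for knownUrl, skip processing
    }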
setCookieManager | public void setCookieManager(CookieManager cm) | Sets the CookieManager used by the HttpTool. By default a MemoryCookieManager will be used, but you can use this method to supply your own CookieManager implementation.
    Parameters: cm - an object that implements the CookieManager interface
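Example (cookie handling in practice; the no-argument MemoryCookieManager constructor is an assumption):

    robot.setEnableCookies(true);
    robot.setCookieManager(new MemoryCookieManager());  // the default implementation
    // ... later, e.g. between two independent crawl runs:
    robot.clearCookies();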
setDocManager | public void setDocManager(HttpDocManager docManager) | Sets the document manager for this robot. Without a document manager, the robot will travel through the web but won't do anything with the retrieved documents (it simply forgets them). A document manager can store them, extract information, or whatever you like. There can be only one document manager, but you are free to combine the functionality of available document managers in a new object (e.g. to store the document and extract meta information).
    Parameters: docManager - the document manager to use
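Example (attaching a document manager; SaveToDiskManager is a hypothetical HttpDocManager implementation named only for illustration, use one shipped with the library or your own):

    HttpDocManager docManager = new SaveToDiskManager("/tmp/crawl-spool");  // hypothetical class
    robot.setDocManager(docManager);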
setDownloadRuleSet | public void setDownloadRuleSet(DownloadRuleSet rules) | Sets the DownloadRuleSet.
    Parameters: rules - the download rule set to use

setEnableCookies | public void setEnableCookies(boolean enable) | Enables/disables cookies.
    Parameters: enable - if true, HTTP cookies will be enabled; if false, the robot will not use cookies

setExceptionHandler | public void setExceptionHandler(RobotExceptionHandler newExceptionHandler) | Sets the exception handler of the robot.
    Parameters: newExceptionHandler - the new exception handler

setExpirationAge | public void setExpirationAge(long age) | Sets the expiration age of documents in the cache. Documents older than expirationAge will be removed; a negative value means no limit.
    Parameters: age - the expiration age
setFilters | public void setFilters(FilterChain filters) | Sets a FilterChain. If the WebRobot uses a FilterChain, it will process every retrieved document with this FilterChain before storing it.
    Parameters: filters - a FilterChain to use for filtering HttpDocs

setFlexibleHostCheck | public void setFlexibleHostCheck(boolean flexibleHostCheck) | Defines if the host test should be more flexible. To find out if a new URL is on the same host, the robot usually compares the host parts of both. Some web servers have an inconsistent addressing scheme and use both the hostname www.domain.com and domain.com. With flexible host checking enabled, the robot will consider both hosts as equal.
    Parameters: flexibleHostCheck - set this to true to enable flexible host checking (disabled by default)
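Example (roughly the comparison that flexible host checking performs; a sketch, not the library's actual code):

    // Treat "www.domain.com" and "domain.com" as the same host.
    static boolean sameHostFlexible(java.net.URL a, java.net.URL b) {
        String hostA = a.getHost().toLowerCase();
        String hostB = b.getHost().toLowerCase();
        if (hostA.startsWith("www.")) hostA = hostA.substring(4);
        if (hostB.startsWith("www.")) hostB = hostB.substring(4);
        return hostA.equals(hostB);
    }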
setFromAddress | public void setFromAddress(String fromAddress) | Sets the From: HTTP header. This should be a valid email address. It is not needed by the robot, but you should use it, because the administrator of the web server can contact you if the robot is doing things they don't want.
    Parameters: fromAddress - an RFC 822 email address

setIgnoreRobotsTxt | public void setIgnoreRobotsTxt(boolean ignoreRobotsTxt) | Should the robot ignore the robots.txt Robot Exclusion protocol?
    Parameters: ignoreRobotsTxt - if set to true, the robot will ignore the settings of the /robots.txt file on the web server. Know what you are doing if you change this setting!

setMaxDepth | public void setMaxDepth(int maxDepth) | Sets the maximal search depth.
    Parameters: maxDepth - the maximal search depth

setMaxDocumentAge | public void setMaxDocumentAge(long maxAge) | Sets the maximum age of documents to retrieve to this number of seconds.
    Parameters: maxAge - the maximum document age (in seconds); a negative value means no limit

setMaxRetries | public void setMaxRetries(int maxRetries) | Sets the allowed retries for document retrieval.
    Parameters: maxRetries - the number of retries

setNtlmAuthorization | public void setNtlmAuthorization(NTLMAuthorization ntlmAuthorization) | Sets an ntlmAuthorization for this robot.
    Parameters: ntlmAuthorization - the NTLM authorization for this robot

setProxy | public void setProxy(String proxyDescr) throws HttpException | Sets a proxy to use.
    Parameters: proxyDescr - the proxy definition in the format host:port
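Example (typical network-related settings together; the values are illustrative):

    robot.setProxy("proxy.example.com:8080");  // host:port, may throw HttpException
    robot.setTimeout(30);    // stop a download after 30 seconds without data
    robot.setMaxRetries(2);  // retry a failed retrieval twice
    robot.setFromAddress("crawler-admin@example.org");  // contact address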
setSleep | public void setSleep(boolean sleep) | Sets the sleep status for this robot. If a WebRobot is set to sleep after run() has started, it will wait after retrieving the current document until setSleep(false) is called.
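Example (pausing and stopping a robot that runs in its own thread; the sketch assumes the robot's run() method is handed to a Thread):

    Thread worker = new Thread(robot::run);
    worker.start();

    robot.setSleep(true);   // pause: finish the current download, then wait
    // ... do other work ...
    robot.setSleep(false);  // resume crawling
    robot.stopRobot();      // finish the current download, then stop for good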
setSleepTime | public void setSleepTime(int sleepTime) | Sets the sleep time. After every retrieved document, the robot will wait this time before getting the next document. This makes it possible to limit the load on the server.
    Parameters: sleepTime - wait time in seconds

setStart | public void setStart(String startURL) | Sets the start URL.
    Parameters: startURL - the start URL as a String

setStartReferer | public void setStartReferer(String startReferer) | Sets the Referer setting for the first HTTP request.
    Parameters: startReferer - a URL (e.g. http://www.matuschek.net)

setStartURL | public void setStartURL(URL startURL) | Sets the start URL for this robot.
    Parameters: startURL - the start URL

setTimeout | public void setTimeout(int timeout) | Sets the timeout for getting data. If HttpTool can't read data from a remote web server after this number of seconds, it will stop the download of the current file.
    Parameters: timeout - timeout in seconds

setURLCheck | public void setURLCheck(URLCheck check) | Sets the URLCheck for this robot.
    Parameters: check - the URLCheck to use

setVisitMany | public void setVisitMany(Vector visitMany) | Sets the vector of URLs that can be visited more than once.

setWalkToOtherHosts | public void setWalkToOtherHosts(boolean walkToOtherHosts) | Sets the WalkToOtherHosts status.
    Parameters: walkToOtherHosts - true if the robot is allowed to travel to other hosts than the start host, false otherwise

setWasteParameters | public void setWasteParameters(Vector wasteParameters) | Sets the list of wasteParameters (which will be removed from URLs).
    Parameters: wasteParameters - a Vector of Strings; parameters will be removed if they begin with a string in this vector

sleepNow | public void sleepNow() | Sleeps for sleepTime seconds.

spawnThread | protected synchronized void spawnThread() | Starts subthreads for spidering.
    WARNING: Should only be implemented and used for local spidering purposes!

stopRobot | public void stopRobot() | Stops the current robot run. Note that this will not abort the current download, but stop after the current download has finished.

taskAddAllowed | protected boolean taskAddAllowed(RobotTask task) | Checks if a task should be added to the task list.
    Parameters: task - the task to check
    Returns: true if this task can be added to the task list, false otherwise

updateProgressInfo | public void updateProgressInfo() | Informs about spidering progress. May use iteration, startTime, countCache, countWeb, countRefresh and countNoRefresh.

walkTree | public void walkTree() | Do your job!

work | public void work() | Do your job: travel through the web using the configured parameters and retrieve documents.