Download parameters. These parameters are limits on
how a Page can download a Link. A Crawler has a
default set of download parameters, but the defaults
can be overridden on individual links by calling
Link.setDownloadParameters().
DownloadParameters is an immutable class (like String).
"Changing" a parameter actually returns a new instance
of the class with only the specified parameter changed.
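The copy-on-change pattern described above can be sketched in a few lines. This is a minimal, self-contained illustration of the idiom, not the WebSPHINX source: the class and field names here (ImmutableParams, crawlTimeout, maxPageSize) are hypothetical, chosen to mirror the parameters documented below.

```java
// Minimal sketch of the immutable "change" pattern: every field is
// final, and each change* method returns a fresh copy with only the
// requested field altered, leaving the original object untouched.
public class ImmutableParams {
    private final int crawlTimeout; // seconds; negative turns the timeout off
    private final int maxPageSize;  // kilobytes; 0 or negative means no limit

    // Defaults matching the conventions described in this document.
    public ImmutableParams() {
        this(-1, 100);
    }

    private ImmutableParams(int crawlTimeout, int maxPageSize) {
        this.crawlTimeout = crawlTimeout;
        this.maxPageSize = maxPageSize;
    }

    // Returns a new instance with only the crawl timeout changed.
    public ImmutableParams changeCrawlTimeout(int timeout) {
        return new ImmutableParams(timeout, maxPageSize);
    }

    // Returns a new instance with only the page-size limit changed.
    public ImmutableParams changeMaxPageSize(int kilobytes) {
        return new ImmutableParams(crawlTimeout, kilobytes);
    }

    public int getCrawlTimeout() { return crawlTimeout; }
    public int getMaxPageSize()  { return maxPageSize; }
}
```

Because each change returns a new object, calls chain naturally, e.g. `params.changeCrawlTimeout(300).changeMaxPageSize(50)`, and the original `params` is unaffected, just as "changing" a String produces a new String.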
changeCrawlTimeout(int timeout) Change timeout value.
Parameters: timeout - maximum length of time (in seconds) that the crawler will run. Use a negative value to turn off the timeout.
changeDownloadTimeout(int timeout) Change download timeout value.
Parameters: timeout - length of time (in seconds) to wait for a page to download. Use a negative value to turn off the timeout.
changeObeyRobotExclusion(boolean f) Change obey-robot-exclusion flag.
Parameters: f - if true, the crawler checks robots.txt on the remote Web site before downloading a page.
getUserAgent() Get User-agent header used in HTTP requests.
Returns: user-agent field used in HTTP requests, or null if the Java library's default user-agent is used.
Change accepted MIME types.
Parameters: types - list of MIME types that can be handled by the crawler. Use null if the crawler can handle anything.
Returns: new DownloadParameters object with the specified parameter changed.
Change timeout value.
Parameters: timeout - maximum length of time (in seconds) that the crawler will run. Use a negative value to turn off the timeout.
Returns: new DownloadParameters object with the specified parameter changed.
Change download timeout value.
Parameters: timeout - length of time (in seconds) to wait for a page to download. Use a negative value to turn off the timeout.
Returns: new DownloadParameters object with the specified parameter changed.
Change interactive flag.
Parameters: f - true if a user is available to respond to dialog boxes.
Returns: new DownloadParameters object with the specified parameter changed.
Change maximum page size. Pages larger than this limit are treated as
leaves in the crawl graph -- neither downloaded nor parsed.
Parameters: maxPageSize - maximum page size in kilobytes.
Returns: new DownloadParameters object with the specified parameter changed.
Set maximum threads.
Parameters: maxthreads - maximum number of background threads used by the crawler.
Returns: new DownloadParameters object with the specified parameter changed.
Change obey-robot-exclusion flag.
Parameters: f - if true, the crawler checks robots.txt on the remote Web site before downloading a page.
Returns: new DownloadParameters object with the specified parameter changed.
Change use-caches flag.
Parameters: f - true if cached pages should be used whenever possible.
Returns: new DownloadParameters object with the specified parameter changed.
Change User-agent field used in HTTP requests.
Parameters: userAgent - user-agent field used in HTTP requests. Pass null to use the Java library's default user-agent field.
Returns: new DownloadParameters object with the specified parameter changed.
Get accepted MIME types.
Returns: list of MIME types that can be handled by the crawler (which are passed as the Accept header in the HTTP request). Default is null.
Get download timeout value.
Returns: length of time (in seconds) that the crawler will wait for a page to download before aborting it. Default is 60 seconds.
Get maximum page size. Pages larger than this limit are neither
downloaded nor parsed.
Default value is 100 (KB). 0 or negative values mean no limit.
Returns: maximum page size in kilobytes.
Get User-agent header used in HTTP requests.
Returns: user-agent field used in HTTP requests, or null if the Java library's default user-agent is used. Default value is null (but for a Crawler, the default DownloadParameters has the Crawler's name as its default user-agent).