| org.archive.crawler.framework.Processor org.archive.crawler.prefetch.PreconditionEnforcer
PreconditionEnforcer | public class PreconditionEnforcer extends Processor implements CoreAttributeConstants, FetchStatusCodes | | Ensures the preconditions for a fetch -- such as DNS lookup
or acquiring and respecting a robots.txt policy -- are
satisfied before a URI is passed to subsequent stages.
author: gojomo |
ATTR_CALCULATE_ROBOTS_ONLY | final public static String ATTR_CALCULATE_ROBOTS_ONLY | | |
ATTR_IP_VALIDITY_DURATION | final public static String ATTR_IP_VALIDITY_DURATION | | Seconds to keep IP information for.
|
ATTR_ROBOTS_VALIDITY_DURATION | final public static String ATTR_ROBOTS_VALIDITY_DURATION | | Seconds to cache robots info.
|
DEFAULT_CALCULATE_ROBOTS_ONLY | final public static Boolean DEFAULT_CALCULATE_ROBOTS_ONLY | | Whether to calculate robots exclusion without applying it.
|
PreconditionEnforcer | public PreconditionEnforcer(String name) | | |
getIPValidityDuration | public long getIPValidityDuration(CrawlURI curi) | | Get the maximum time a DNS record is valid.
Parameters: curi - the URI this time is valid for.
Returns: the maximum time a DNS record is valid, in seconds, or a negative value if the record's TTL should be used. |
getRobotsValidityDuration | public long getRobotsValidityDuration(CrawlURI curi) | | Get the maximum time a robots.txt is valid.
Parameters: curi
Returns: the time a robots.txt is valid, in milliseconds. |
isIpExpired | public boolean isIpExpired(CrawlURI curi) | | Return true if the IP should be looked up.
Parameters: curi - the URI to check.
Returns: true if the IP should be looked up. |
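The following is a minimal sketch of how an IP expiry check could combine a configured validity duration with a record's TTL, based only on the descriptions above. It is not the Heritrix implementation: the class `IpExpirySketch`, its parameters, and the "never looked up" sentinel are hypothetical, and the real method takes a `CrawlURI` and consults crawler state that is not modeled here.

```java
// Hypothetical stand-in for illustration; not part of Heritrix.
final class IpExpirySketch {

    /** Assumed sentinel meaning "no DNS lookup has happened yet". */
    static final long NEVER_LOOKED_UP = -1L;

    /**
     * @param fetchedAtMs    time of the last DNS lookup in ms, or NEVER_LOOKED_UP
     * @param recordTtlSec   TTL carried by the DNS record, in seconds
     * @param maxValiditySec configured validity in seconds; negative means
     *                       "use the record's TTL" (as described for
     *                       getIPValidityDuration)
     * @param nowMs          current time in ms
     * @return true if the IP should be looked up (again)
     */
    static boolean isIpExpired(long fetchedAtMs, long recordTtlSec,
                               long maxValiditySec, long nowMs) {
        if (fetchedAtMs == NEVER_LOOKED_UP) {
            return true; // nothing cached, so a lookup is needed
        }
        // A negative configured duration defers to the record's own TTL.
        long validitySec = maxValiditySec < 0 ? recordTtlSec : maxValiditySec;
        // Durations are in seconds; timestamps are in milliseconds.
        return nowMs > fetchedAtMs + validitySec * 1000L;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(isIpExpired(NEVER_LOOKED_UP, 300, 60, now));
        System.out.println(isIpExpired(900_000L, 300, 60, now));
        System.out.println(isIpExpired(900_000L, 300, -1, now));
    }
}
```

Passing time in as a parameter rather than reading the system clock keeps the check deterministic, which makes the seconds-to-milliseconds conversion easy to verify.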
isRobotsExpired | public boolean isRobotsExpired(CrawlURI curi) | | Check whether the robots policy is expired.
This method also returns true if we haven't yet tried to get the
robots.txt for this server.
Parameters: curi
Returns: true if the robots policy is expired. |
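As an illustration of the behaviour described for isRobotsExpired, the sketch below treats a robots.txt that was never fetched as expired and otherwise compares its age against a validity duration in milliseconds. The class `RobotsExpirySketch`, the `NEVER_FETCHED` sentinel, and the method parameters are assumptions made for this example, not Heritrix APIs.

```java
// Hypothetical stand-in for illustration; not part of Heritrix.
final class RobotsExpirySketch {

    /** Assumed sentinel meaning "robots.txt has not been fetched for this server". */
    static final long NEVER_FETCHED = -1L;

    /**
     * @param robotsFetchedAtMs time robots.txt was last fetched in ms, or NEVER_FETCHED
     * @param validityMs        how long a fetched robots.txt stays valid, in ms
     *                          (the unit getRobotsValidityDuration is documented to return)
     * @param nowMs             current time in ms
     * @return true if the robots policy is expired or was never fetched
     */
    static boolean isRobotsExpired(long robotsFetchedAtMs, long validityMs, long nowMs) {
        if (robotsFetchedAtMs == NEVER_FETCHED) {
            return true; // no policy yet, so it must be fetched first
        }
        return robotsFetchedAtMs + validityMs < nowMs;
    }

    public static void main(String[] args) {
        long oneDayMs = 24L * 60 * 60 * 1000;
        System.out.println(isRobotsExpired(NEVER_FETCHED, oneDayMs, 5_000L));
        System.out.println(isRobotsExpired(0L, 1_000L, 500L));
        System.out.println(isRobotsExpired(0L, 1_000L, 1_500L));
    }
}
```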