| org.archive.crawler.framework.Scoper org.archive.crawler.postprocessor.LinksScoper
LinksScoper | public class LinksScoper extends Scoper implements FetchStatusCodes | | Determine which extracted links are within scope.
TODO: Testing scope currently requires that a Link be converted to
a CandidateURI. Change this so that a CandidateURI need not be created
just to test whether a Link is in scope.
Since this scoper has to create CandidateURIs anyway, there is no sense
in discarding them: later in the processing chain, CandidateURIs rather
than Links are what is needed to schedule extracted links with the
Frontier (Frontier#schedule expects a CandidateURI, not a Link). This class
therefore replaces each Link in the CrawlURI with the CandidateURI that wraps it.
author: gojomo author: stack |
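The replace-in-place behaviour described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the Heritrix implementation: `Link`, `Candidate`, `scopeOutlinks`, and the `Predicate` used as the scope test are hypothetical stand-ins for `Link`, `CandidateURI`, and the crawl scope, and the path-from-seed bookkeeping is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class LinksScoperSketch {

    /** Hypothetical stand-in for an extracted Link. */
    static final class Link {
        final String destination;
        final char hopType;

        Link(String destination, char hopType) {
            this.destination = destination;
            this.hopType = hopType;
        }
    }

    /** Hypothetical stand-in for a CandidateURI that wraps a Link. */
    static final class Candidate {
        final Link link;
        final String pathFromSeed;

        Candidate(Link link, String pathFromSeed) {
            this.link = link;
            this.pathFromSeed = pathFromSeed;
        }
    }

    /**
     * Wraps every extracted link in a candidate, tests the candidate against
     * the scope, and returns only the in-scope candidates. The caller would
     * store the returned list in place of the original links, so later
     * processors can hand the candidates straight to the scheduler.
     */
    static List<Candidate> scopeOutlinks(String parentPathFromSeed,
                                         List<Link> outLinks,
                                         Predicate<Candidate> inScope) {
        List<Candidate> kept = new ArrayList<>();
        for (Link link : outLinks) {
            // The scope test needs a candidate, so one is built per link.
            Candidate candidate =
                new Candidate(link, parentPathFromSeed + link.hopType);
            if (inScope.test(candidate)) {
                kept.add(candidate); // reuse the candidate, do not discard it
            }
        }
        return kept;
    }
}
```

The design point is that the candidate built for the scope test is the same object that is kept for scheduling, so nothing is constructed twice.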
ATTR_PREFERENCE_DEPTH_HOPS | final public static String ATTR_PREFERENCE_DEPTH_HOPS | | |
ATTR_REJECTLOG_DECIDE_RULES | final public static String ATTR_REJECTLOG_DECIDE_RULES | | |
LinksScoper | public LinksScoper(String name) | | Parameters: name - Name of this filter. |
getSchedulingFor | protected int getSchedulingFor(CrawlURI curi, Link wref, int preferenceDepthHops) | | Determine scheduling for the curi.
As with LinksScoper in general, this handles only extracted links.
Seeds do not pass through here, but are given MEDIUM priority.
Imports into the frontier similarly do not pass through here,
but are given NORMAL priority.
|
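One way a depth-based scheduling preference of this kind could work is sketched below. This is an illustration under stated assumptions, not the method's actual body: it assumes that a negative `preferenceDepthHops` disables preferencing, that zero prefers every discovered link, and that a positive value prefers links within that many hops of a seed. The constants `HIGH` and `NORMAL` and the helper `schedulingFor` are hypothetical names with arbitrary values.

```java
final class SchedulingSketch {

    // Hypothetical priority values; only their relative meaning matters here.
    static final int HIGH = 1;
    static final int NORMAL = 3;

    /**
     * Returns a scheduling priority for a link extracted from a page whose
     * hop path from the seed is parentPathFromSeed (one character per hop).
     */
    static int schedulingFor(String parentPathFromSeed, int preferenceDepthHops) {
        if (preferenceDepthHops == 0) {
            // Assumed meaning: prefer every discovered link (depth-first).
            return HIGH;
        }
        // The extracted link is one hop further from the seed than its parent.
        int hopsFromSeed = parentPathFromSeed.length() + 1;
        if (preferenceDepthHops > 0 && hopsFromSeed <= preferenceDepthHops) {
            return HIGH;
        }
        // Negative setting, or link is deeper than the preferred depth.
        return NORMAL;
    }
}
```

For example, with `preferenceDepthHops` set to 1, a link found directly on a seed page (empty parent path) would be scheduled `HIGH`, while a link found two hops down would fall back to `NORMAL`.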
handlePrerequisite | protected void handlePrerequisite(CrawlURI curi) | | The CrawlURI has a prerequisite; apply scoping and update the
Link to a CandidateURI in a manner analogous to outlink handling.
Parameters: curi - CrawlURI with a prerequisite to consider |
Fields inherited from org.archive.crawler.framework.Scoper | final protected static String ATTR_OVERRIDE_LOGGER_ENABLED
|
|
|