org.archive.crawler.deciderules.DecideRule
  org.archive.crawler.deciderules.ConfiguredDecideRule
    org.archive.crawler.deciderules.PredicatedDecideRule
      org.archive.crawler.deciderules.SurtPrefixedDecideRule
All Known Subclasses: org.archive.crawler.deciderules.OnDomainsDecideRule, org.archive.crawler.deciderules.OnHostsDecideRule, org.archive.crawler.deciderules.ScopePlusOneDecideRule, org.archive.crawler.deciderules.NotSurtPrefixedDecideRule
SurtPrefixedDecideRule | public class SurtPrefixedDecideRule extends PredicatedDecideRule implements SeedListener | | Rule that applies the configured decision to any URI that, when
expressed in SURT form, begins with one of the prefixes
in the configured set.
The set can be filled with SURT prefixes implied by or
listed in the seeds file, or in another external file.
The "also-check-via" option, which implements "one hop off"
scoping, derives from a contribution by Shifra Raffel
of the California Digital Library.
Author: gojomo |
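To make the prefix test described above concrete, here is a minimal sketch of a SURT conversion followed by a prefix match. It is not the Heritrix implementation: the class `SurtSketch` and its methods `toSurt` and `isPrefixed` are hypothetical names, and the conversion is deliberately simplified (it assumes an absolute URI with a host, and ignores ports, user info, and query strings).

```java
import java.net.URI;
import java.util.Collection;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Heritrix.
final class SurtSketch {

    // Simplified SURT form: reverse the host labels, keep scheme and path.
    // Example: http://www.archive.org/movies/ -> http://(org,archive,www,)/movies/
    static String toSurt(String uri) {
        URI u = URI.create(uri);
        String[] labels = u.getHost().toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder(u.getScheme()).append("://(");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        sb.append(')');
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        return sb.toString();
    }

    // True if the URI, expressed in SURT form, begins with any prefix in the set.
    static boolean isPrefixed(Collection<String> prefixes, String uri) {
        String surt = toSurt(uri);
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Because the host labels are reversed, a prefix such as `http://(org,archive,` covers every host under archive.org, which is what makes SURT form convenient for scoping by domain.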
ATTR_ALSO_CHECK_VIA | final public static String ATTR_ALSO_CHECK_VIA | | Whether the 'via' of a CrawlURI should also be checked
to see whether it is prefixed by one of the SURT prefixes in the set.
|
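The effect of the also-check-via setting can be sketched as follows. This is an assumption-laden illustration rather than the class's actual `evaluate` logic: `ViaCheckSketch`, `startsWithAny`, and `accepts` are hypothetical names, and both the URI and its 'via' (the URI from which it was discovered) are assumed to be already in SURT form.

```java
import java.util.Collection;

// Hypothetical illustration of "one hop off" scoping; not Heritrix code.
final class ViaCheckSketch {

    // True if the SURT-form string begins with any prefix in the set.
    static boolean startsWithAny(Collection<String> prefixes, String surt) {
        for (String prefix : prefixes) {
            if (surt.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // A URI matches if it is covered by the prefix set itself, or, when
    // alsoCheckVia is enabled, if the URI it was discovered from is covered.
    static boolean accepts(Collection<String> prefixes, String uriSurt,
                           String viaSurt, boolean alsoCheckVia) {
        if (startsWithAny(prefixes, uriSurt)) {
            return true;
        }
        return alsoCheckVia && viaSurt != null && startsWithAny(prefixes, viaSurt);
    }
}
```

With the option enabled, a URI outside the prefix set still matches when it was linked directly from a URI inside the set, so the crawl can reach exactly one hop beyond the configured scope.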
ATTR_REBUILD_ON_RECONFIG | final public static String ATTR_REBUILD_ON_RECONFIG | | Whether every configuration change should trigger a
rebuild of the prefix set.
|
ATTR_SEEDS_AS_SURT_PREFIXES | final public static String ATTR_SEEDS_AS_SURT_PREFIXES | | |
ATTR_SURTS_DUMP_FILE | final public static String ATTR_SURTS_DUMP_FILE | | |
ATTR_SURTS_SOURCE_FILE | final public static String ATTR_SURTS_SOURCE_FILE | | |
DEFAULT_ALSO_CHECK_VIA | final public static Boolean DEFAULT_ALSO_CHECK_VIA | | |
DEFAULT_REBUILD_ON_RECONFIG | final public static Boolean DEFAULT_REBUILD_ON_RECONFIG | | |
SurtPrefixedDecideRule | public SurtPrefixedDecideRule(String name) | | Usual constructor.
Parameters: name - |
buildSurtPrefixSet | protected void buildSurtPrefixSet() | | Construct the set of prefixes to use from the seed list
(which may include both URIs and '+'-prefixed directives).
|
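As a rough illustration of deriving a prefix set from a seed list that mixes plain URIs and '+'-prefixed directives, consider the sketch below. Everything in it is an assumption made for the example, not the behavior of `buildSurtPrefixSet()` itself: the class and method names are hypothetical, the SURT conversion is simplified, lines starting with '#' are treated as comments, a directive containing '(' is taken to be in SURT form already, and the implied prefix is formed by cutting after the last path slash (or, when there is no path, dropping the closing ')' so that subdomains also match).

```java
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; names and truncation rules are illustrative assumptions.
final class SeedPrefixSketch {

    // Simplified SURT form: reversed host labels, scheme and path kept as-is.
    static String toSurt(String uri) {
        URI u = URI.create(uri);
        String[] labels = u.getHost().toLowerCase(Locale.ROOT).split("\\.");
        StringBuilder sb = new StringBuilder(u.getScheme()).append("://(");
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]).append(',');
        }
        sb.append(')');
        if (u.getRawPath() != null) {
            sb.append(u.getRawPath());
        }
        return sb.toString();
    }

    // Assumed rule: keep everything up to the last path slash; with no path,
    // drop the trailing ')' so the prefix also covers subdomains.
    static String impliedPrefix(String surt) {
        int afterScheme = surt.indexOf("://") + 3;
        int lastSlash = surt.lastIndexOf('/');
        if (lastSlash >= afterScheme) {
            return surt.substring(0, lastSlash + 1);
        }
        return surt.endsWith(")") ? surt.substring(0, surt.length() - 1) : surt;
    }

    // Build a sorted prefix set from seed lines.
    static SortedSet<String> build(List<String> seedLines) {
        SortedSet<String> prefixes = new TreeSet<>();
        for (String raw : seedLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // assumed comment syntax
            }
            if (line.startsWith("+")) {
                String directive = line.substring(1).trim();
                // Assumption: a directive containing '(' is already a SURT prefix.
                prefixes.add(directive.contains("(")
                        ? directive
                        : impliedPrefix(toSurt(directive)));
            } else {
                prefixes.add(impliedPrefix(toSurt(line)));
            }
        }
        return prefixes;
    }
}
```

Under these assumptions, a seed of `http://archive.org` yields the prefix `http://(org,archive,`, while `http://www.archive.org/movies/index.html` yields `http://(org,archive,www,)/movies/`.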
dumpSurtPrefixSet | protected void dumpSurtPrefixSet() | | Dump the prefixes currently in use to the configured dump file (if any).
|
evaluate | protected boolean evaluate(Object object) | | Evaluate whether the given object's URI is covered by the SURT prefix set.
Parameters: object - Item to evaluate.
Returns: true if the item, as a SURT-form URI, is prefixed by an item in the set |
getSeedfile | protected File getSeedfile() | | Dig through everything to get the crawl-global seeds file.
Add self as listener while at it.
Returns: Seed list file |
readPrefixes | protected void readPrefixes() | | |