| org.archive.crawler.frontier.AbstractFrontier org.archive.crawler.frontier.WorkQueueFrontier
All known Subclasses: org.archive.crawler.frontier.BdbFrontier
WorkQueueFrontier | abstract public class WorkQueueFrontier extends AbstractFrontier implements FetchStatusCodes, CoreAttributeConstants, HasUriReceiver, Serializable(Code) | | A common Frontier base using several queues to hold pending URIs.
Uses an in-memory map of all known 'queues' inside a single database.
Round-robins between all queues.
author: Gordon Mohr author: Christian Kohlschuetter |
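The round-robin behaviour described above can be pictured with a small stand-alone sketch. It is not the Heritrix implementation: the class `RoundRobinSketch`, its methods `schedule` and `next`, and the use of plain strings for URIs and queue keys are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of round-robin over keyed queues; not Heritrix code. */
public class RoundRobinSketch {
    // In-memory map from a queue key to that queue's pending URIs.
    private final Map<String, Deque<String>> allQueues = new HashMap<>();
    // Keys of queues that currently have something to hand out, in rotation order.
    private final Deque<String> readyKeys = new ArrayDeque<>();

    /** Enqueue a URI under its key; a queue that becomes non-empty joins the rotation. */
    public void schedule(String key, String uri) {
        Deque<String> q = allQueues.computeIfAbsent(key, k -> new ArrayDeque<>());
        boolean wasEmpty = q.isEmpty();
        q.addLast(uri);
        if (wasEmpty) {
            readyKeys.addLast(key);
        }
    }

    /** Take one URI from the head queue, then move that queue to the back of the rotation. */
    public String next() {
        String key = readyKeys.pollFirst();
        if (key == null) {
            return null; // nothing pending in any queue
        }
        Deque<String> q = allQueues.get(key);
        String uri = q.pollFirst();
        if (!q.isEmpty()) {
            readyKeys.addLast(key); // still has work, so it stays in rotation
        }
        return uri;
    }

    public static void main(String[] args) {
        RoundRobinSketch f = new RoundRobinSketch();
        f.schedule("a.example", "http://a.example/1");
        f.schedule("a.example", "http://a.example/2");
        f.schedule("b.example", "http://b.example/1");
        String uri;
        while ((uri = f.next()) != null) {
            System.out.println(uri); // alternates between the two queues
        }
    }
}
```

The sketch leaves out politeness delays, budgets, and persistence; it only shows why no single queue can monopolise the rotation.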
Inner Class :public class WakeTask extends TimerTask | |
ATTR_BALANCE_REPLENISH_AMOUNT | final public static String ATTR_BALANCE_REPLENISH_AMOUNT(Code) | | amount to replenish budget on each activation (duty cycle)
|
ATTR_COST_POLICY | final public static String ATTR_COST_POLICY(Code) | | cost assignment policy to use (by class name)
|
ATTR_ERROR_PENALTY_AMOUNT | final public static String ATTR_ERROR_PENALTY_AMOUNT(Code) | | amount by which to additionally penalize a queue when one of its URIs fails
|
ATTR_HOLD_QUEUES | final public static String ATTR_HOLD_QUEUES(Code) | | whether to hold queues INACTIVE until needed for throughput
|
ATTR_QUEUE_TOTAL_BUDGET | final public static String ATTR_QUEUE_TOTAL_BUDGET(Code) | | total expenditure to allow a queue before 'retiring' it
|
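One way to read the balance-replenish and total-budget settings together is sketched below. This is an assumption-laden illustration, not the actual accounting code: the class `QueueBudgetSketch`, its field and method names, and the rule that a negative total budget means "no limit" are all hypothetical.

```java
/** Hypothetical budget bookkeeping for one queue; not the Heritrix implementation. */
public class QueueBudgetSketch {
    private final int replenishAmount; // cf. ATTR_BALANCE_REPLENISH_AMOUNT
    private final long totalBudget;    // cf. ATTR_QUEUE_TOTAL_BUDGET; negative = unlimited (assumption)
    private int sessionBalance = 0;    // what the queue may still spend in this activation
    private long totalExpenditure = 0; // everything the queue has ever spent

    public QueueBudgetSketch(int replenishAmount, long totalBudget) {
        this.replenishAmount = replenishAmount;
        this.totalBudget = totalBudget;
    }

    /** Called each time the queue enters active rotation. */
    public void activate() {
        sessionBalance += replenishAmount;
    }

    /** Charge the cost of one URI against the session balance and the lifetime total. */
    public void charge(int cost) {
        sessionBalance -= cost;
        totalExpenditure += cost;
    }

    /** True once the current activation's balance is used up (queue should leave rotation). */
    public boolean isSessionExhausted() {
        return sessionBalance <= 0;
    }

    /** True once lifetime spending exceeds the total budget (queue should be retired). */
    public boolean isOverTotalBudget() {
        return totalBudget >= 0 && totalExpenditure > totalBudget;
    }

    public static void main(String[] args) {
        QueueBudgetSketch q = new QueueBudgetSketch(3, 5);
        q.activate();
        q.charge(3);
        System.out.println("exhausted=" + q.isSessionExhausted()
                + " overBudget=" + q.isOverTotalBudget());
    }
}
```

Under this reading, the replenish amount bounds how much a queue does per turn, while the total budget bounds how much it does over the whole crawl.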
ATTR_SNOOZE_DEACTIVATE_MS | final public static String ATTR_SNOOZE_DEACTIVATE_MS(Code) | | When a snooze target for a queue is longer than this amount, and
there are already ready queues, deactivate rather than snooze
the current queue -- so other more responsive sites get a chance
in active rotation. (As a result, queue's next try may be much
further in the future than the snooze target delay.)
|
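The snooze-versus-deactivate rule stated above reduces to a two-part condition. The following is a minimal sketch of that condition only; `SnoozeDecisionSketch`, `Action`, and `decide` are hypothetical names, and the real frontier does considerably more around this decision.

```java
/** Hypothetical restatement of the snooze/deactivate rule; not Heritrix code. */
public class SnoozeDecisionSketch {
    enum Action { SNOOZE, DEACTIVATE }

    /**
     * @param snoozeMs           how long the queue would have to sleep
     * @param snoozeDeactivateMs threshold, cf. ATTR_SNOOZE_DEACTIVATE_MS
     * @param readyQueueCount    number of queues already waiting in the ready state
     */
    static Action decide(long snoozeMs, long snoozeDeactivateMs, int readyQueueCount) {
        // Deactivate only when the wait is long AND other queues can use the slot.
        if (snoozeMs > snoozeDeactivateMs && readyQueueCount > 0) {
            return Action.DEACTIVATE;
        }
        return Action.SNOOZE;
    }

    public static void main(String[] args) {
        System.out.println(decide(600_000L, 300_000L, 4)); // long wait, others ready
        System.out.println(decide(600_000L, 300_000L, 0)); // long wait, nothing else ready
        System.out.println(decide(1_000L, 300_000L, 4));   // short wait
    }
}
```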
ATTR_TARGET_READY_QUEUES_BACKLOG | final public static String ATTR_TARGET_READY_QUEUES_BACKLOG(Code) | | target size of ready queues backlog
|
AVAILABLE_COST_POLICIES | String[] AVAILABLE_COST_POLICIES(Code) | | all policies available to be chosen
|
DEFAULT_BALANCE_REPLENISH_AMOUNT | final protected static Integer DEFAULT_BALANCE_REPLENISH_AMOUNT(Code) | | |
DEFAULT_COST_POLICY | final protected static String DEFAULT_COST_POLICY(Code) | | |
DEFAULT_ERROR_PENALTY_AMOUNT | final protected static Integer DEFAULT_ERROR_PENALTY_AMOUNT(Code) | | |
DEFAULT_HOLD_QUEUES | final protected static Boolean DEFAULT_HOLD_QUEUES(Code) | | |
DEFAULT_QUEUE_TOTAL_BUDGET | final protected static Long DEFAULT_QUEUE_TOTAL_BUDGET(Code) | | |
DEFAULT_SNOOZE_DEACTIVATE_MS | public static Long DEFAULT_SNOOZE_DEACTIVATE_MS(Code) | | |
DEFAULT_TARGET_READY_QUEUES_BACKLOG | final protected static Integer DEFAULT_TARGET_READY_QUEUES_BACKLOG(Code) | | |
alreadyIncluded | protected transient UriUniqFilter alreadyIncluded(Code) | | those UURIs which are already in-process (or processed), and
thus should not be rescheduled
|
inProcessQueues | protected Bag inProcessQueues(Code) | | all per-class queues from which a URI is outstanding
|
inactiveQueues | protected BlockingQueue<String> inactiveQueues(Code) | | All 'inactive' queues, not yet in active rotation.
Linked-list of keys for the queues.
|
nextWake | protected transient WakeTask nextWake(Code) | | Task for next wake
|
readyClassQueues | protected BlockingQueue<String> readyClassQueues(Code) | | All per-class queues whose first item may be handed out.
Linked-list of keys for the queues.
|
retiredQueues | protected BlockingQueue<String> retiredQueues(Code) | | 'retired' queues, no longer considered for activation.
Linked-list of keys for queues.
|
snoozedClassQueues | protected SortedSet<WorkQueue> snoozedClassQueues(Code) | | All per-class queues held in snoozed state, sorted by wake time.
|
targetSizeForReadyQueues | protected int targetSizeForReadyQueues(Code) | | Target (minimum) size to keep readyClassQueues
|
wakeTimer | protected transient Timer wakeTimer(Code) | | Timer for tasks which wake head item of snoozedClassQueues
|
WorkQueueFrontier | public WorkQueueFrontier(String name, String description)(Code) | | Create the WorkQueueFrontier.
Parameters: name - Parameters: description - |
appendQueueReports | protected void appendQueueReports(PrintWriter w, Iterator iterator, int total, int max)(Code) | | Append queue report to general Frontier report.
Parameters: w - PrintWriter to append to. Parameters: iterator - An iterator over Parameters: total - Parameters: max - |
averageDepth | public long averageDepth()(Code) | | |
congestionRatio | public float congestionRatio()(Code) | | |
considerIncluded | public void considerIncluded(UURI u)(Code) | | |
createAlreadyIncluded | abstract protected UriUniqFilter createAlreadyIncluded() throws IOException(Code) | | Create a UriUniqFilter that will serve as a record
of already seen URIs.
Returns: a UriUniqFilter that will serve as a record of already seen URIs. throws: IOException - |
deepestUri | public long deepestUri()(Code) | | |
deleteURIs | public long deleteURIs(String match)(Code) | | Parameters: match - String to match. Returns: number of items deleted. |
forget | protected void forget(CrawlURI curi)(Code) | | Forget the given CrawlURI. This allows a new instance
to be created in the future, if it is reencountered under
different circumstances.
Parameters: curi - The CrawlURI to forget |
getQueueFor | abstract protected WorkQueue getQueueFor(CrawlURI curi)(Code) | | Return the work queue for the given CrawlURI's classKey. URIs
are ordered and politeness-delayed within their 'class'.
If the requested queue is not found, a new instance is created.
Parameters: curi - CrawlURI to base queue on. Returns: the found or created ClassKeyQueue |
getQueueFor | abstract protected WorkQueue getQueueFor(String classKey)(Code) | | Return the work queue for the given classKey, or null
if no such queue exists.
Parameters: classKey - key to look for. Returns: the found WorkQueue |
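The two getQueueFor overloads differ in what happens on a miss: the CrawlURI form creates the queue, the classKey form returns null. A minimal sketch of that contrast follows, assuming a plain in-memory map; `QueueLookupSketch`, `getOrCreate`, and `lookup` are hypothetical names, and strings stand in for URIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical contrast of get-or-create versus plain lookup; not Heritrix code. */
public class QueueLookupSketch {
    private final Map<String, Deque<String>> allQueues = new HashMap<>();

    /** Mirrors the CrawlURI overload: a missing queue is created on demand. */
    Deque<String> getOrCreate(String classKey) {
        return allQueues.computeIfAbsent(classKey, k -> new ArrayDeque<>());
    }

    /** Mirrors the classKey overload: a missing queue yields null. */
    Deque<String> lookup(String classKey) {
        return allQueues.get(classKey);
    }

    public static void main(String[] args) {
        QueueLookupSketch s = new QueueLookupSketch();
        System.out.println(s.lookup("a.example") == null);      // no queue yet
        Deque<String> created = s.getOrCreate("a.example");     // creates it
        System.out.println(s.lookup("a.example") == created);   // same instance afterwards
    }
}
```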
isEmpty | public synchronized boolean isEmpty()(Code) | | |
receive | public void receive(CandidateURI caUri)(Code) | | Accept the given CandidateURI for scheduling, as it has
passed the alreadyIncluded filter.
Choose a per-classKey queue and enqueue it. If this
item has made an unready queue ready, place that
queue on the readyClassQueues queue.
Parameters: caUri - CandidateURI. |
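The hand-off described above, where only URIs that pass the already-included filter reach receive, can be sketched as a filter that calls back into a receiver. Everything here is illustrative: `ReceiveSketch`, `Receiver`, `SeenFilter`, and `Frontier` are hypothetical types, URIs are plain strings, and using the host name as the class key is an assumption.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical filter-then-receive hand-off; not Heritrix code. */
public class ReceiveSketch {
    interface Receiver {
        void receive(String uri);
    }

    /** Forwards a URI to the receiver only the first time it is seen. */
    static final class SeenFilter {
        private final Set<String> seen = new HashSet<>();
        private final Receiver receiver;

        SeenFilter(Receiver receiver) {
            this.receiver = receiver;
        }

        void add(String uri) {
            if (seen.add(uri)) { // Set.add returns false for duplicates
                receiver.receive(uri);
            }
        }
    }

    /** Enqueues received URIs per class key and tracks which queues became ready. */
    static final class Frontier implements Receiver {
        final Map<String, Deque<String>> queues = new HashMap<>();
        final Deque<String> readyKeys = new ArrayDeque<>();

        @Override
        public void receive(String uri) {
            String classKey = URI.create(uri).getHost(); // assumption: one queue per host
            Deque<String> q = queues.computeIfAbsent(classKey, k -> new ArrayDeque<>());
            boolean wasEmpty = q.isEmpty();
            q.addLast(uri);
            if (wasEmpty) {
                readyKeys.addLast(classKey); // this item made an unready queue ready
            }
        }
    }

    public static void main(String[] args) {
        Frontier frontier = new Frontier();
        SeenFilter filter = new SeenFilter(frontier);
        filter.add("http://a.example/1");
        filter.add("http://a.example/1"); // duplicate, never reaches the frontier
        filter.add("http://b.example/1");
        System.out.println(frontier.queues);
    }
}
```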
reportTo | public synchronized void reportTo(String name, PrintWriter writer)(Code) | | This method compiles a human-readable report on the status of the frontier
at the time of the call.
Parameters: name - Name of report. Parameters: writer - Where to write to. |
sendToQueue | protected void sendToQueue(CrawlURI curi)(Code) | | Send a CrawlURI to the appropriate subqueue.
Parameters: curi - |
singleLineReportTo | public void singleLineReportTo(PrintWriter w)(Code) | | Parameters: w - Where to write to. |
wakeQueues | void wakeQueues()(Code) | | Wake any queues sitting in the snoozed queue whose time has come.
|
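Because snoozed queues are kept sorted by wake time, waking the due ones only requires looking at the head of the set. The sketch below illustrates that idea under stated assumptions: `WakeSketch` and `Snoozed` are hypothetical, the current time is passed in as a parameter instead of being driven by a Timer, and ties on wake time are broken by key.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.Deque;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical wake-up pass over a sorted set of snoozed queues; not Heritrix code. */
public class WakeSketch {
    static final class Snoozed {
        final String key;
        final long wakeTime;

        Snoozed(String key, long wakeTime) {
            this.key = key;
            this.wakeTime = wakeTime;
        }
    }

    // Ordered by wake time, then by key so equal wake times do not collide in the set.
    private final SortedSet<Snoozed> snoozed = new TreeSet<>(
            Comparator.comparingLong((Snoozed s) -> s.wakeTime).thenComparing(s -> s.key));
    private final Deque<String> ready = new ArrayDeque<>();

    void snooze(String key, long wakeTime) {
        snoozed.add(new Snoozed(key, wakeTime));
    }

    /** Move every queue whose wake time has passed to the ready list; return how many moved. */
    int wakeQueues(long now) {
        int woken = 0;
        while (!snoozed.isEmpty() && snoozed.first().wakeTime <= now) {
            Snoozed head = snoozed.first();
            snoozed.remove(head);
            ready.addLast(head.key);
            woken++;
        }
        return woken;
    }

    /** Wake time of the earliest still-snoozed queue, or -1 if none remain. */
    long nextWakeTime() {
        return snoozed.isEmpty() ? -1L : snoozed.first().wakeTime;
    }

    public static void main(String[] args) {
        WakeSketch w = new WakeSketch();
        w.snooze("a.example", 100);
        w.snooze("b.example", 200);
        w.snooze("c.example", 300);
        System.out.println("woken: " + w.wakeQueues(250));
        System.out.println("next wake at: " + w.nextWakeTime());
    }
}
```

In a timer-driven design, the value returned by a method like `nextWakeTime` is what the next wake task would be scheduled for.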
workQueueDataOnDisk | abstract protected boolean workQueueDataOnDisk()(Code) | | Returns true if the WorkQueue implementation of this
Frontier stores its workload on disk instead of relying
on serialization mechanisms.
Returns: a constant boolean value for this class/instance |