Method Summary |
public void | acquireContinuePermission() Proceed only if allowed, giving CrawlController a chance to enforce single-thread mode. |
public void | addCrawlStatusListener(CrawlStatusListener cl) Register for CrawlStatus events. |
public void | addCrawlURIDispositionListener(CrawlURIDispositionListener cl) Register for CrawlURIDisposition events. |
public void | addOrderToManifest() Add order file contents to manifest. |
public void | addToManifest(String file, char type, boolean bundle) Add a file to the manifest of files used/generated by the current crawl. TODO: It's possible for a file to be added twice if reports are force-generated mid-crawl. |
public boolean | atFinish() Evaluate if the crawl should stop because it is finished, without actually stopping the crawl. |
public void | beginCrawlStop() Start the process of stopping the crawl. |
public void | checkFinish() Evaluate if the crawl should stop because it is finished. |
void | checkpoint() Run checkpointing. CrawlController takes care of managing the checkpointing/serializing of bdb, the StatisticsTracker, and the CheckpointContext. |
protected void | checkpointBdb(File checkpointDir) Checkpoint bdb. I used to call log cleaning, as suggested in the je-2.0 Javadoc, but it takes way too much time (20 minutes for a crawl of 1 million items). |
protected void | checkpointBigMaps(File cpDir) |
public void | closeLogFiles() Close all log files and remove handlers from loggers. |
synchronized void | completePause() |
protected void | completeStop() Called when the last toethread exits. |
protected FatalConfigurationException | convertToFatalConfigurationException(Exception e) |
protected void | copySettings(File checkpointDir) Copy off the settings. |
public void | fireCrawledURIDisregardEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURIDisregard event that will be broadcast to all listeners that have registered with the CrawlController. |
public void | fireCrawledURIFailureEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURIFailure event that will be broadcast to all listeners that have registered with the CrawlController. |
public void | fireCrawledURINeedRetryEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURINeedRetry event that will be broadcast to all listeners that have registered with the CrawlController. |
public void | fireCrawledURISuccessfulEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURISuccessful event that will be broadcast to all listeners that have registered with the CrawlController. |
public void | freeReserveMemory() |
public int | getActiveToeCount() |
public EnhancedEnvironment | getBdbEnvironment() |
protected String | getBdbLogFileName(long index) |
public Map<K, V> | getBigMap(String dbName, Class<? super K> keyClass, Class<? super V> valueClass) Call this method to get an instance of the crawler BigMap implementation. A "BigMap" is a Map that knows how to manage ever-growing sets of key/value pairs. |
protected boolean | getCheckpointCopyBdbjeLogs() |
public synchronized Checkpoint | getCheckpointRecover() Get the recover checkpoint. Returns null if we're NOT in recover mode. Looks at ATTR_RECOVER_PATH and, if it's a directory, assumes checkpoint recover. |
public static Checkpoint | getCheckpointRecover(CrawlOrder order) |
public File | getCheckpointsDisk() |
public StoredClassCatalog | getClassCatalog() |
public File | getDisk() Get the 'working' directory of the current crawl. |
public ProcessorChain | getFirstProcessorChain() Get the first processor chain. |
public Frontier | getFrontier() |
public File | getLogsDir() |
public CrawlOrder | getOrder() |
public ProcessorChain | getPostprocessorChain() Get the postprocessor chain. |
public ProcessorChainList | getProcessorChainList() Get the list of processor chains. |
public String[] | getReports() |
public CrawlScope | getScope() |
public File | getScratchDisk() |
public ServerCache | getServerCache() |
public File | getSettingsDir(String key) Return the full path to the directory named by key in settings. If the directory does not exist, it and all intermediate directories will be created. Parameters: key - Key to look up in settings. |
public SettingsHandler | getSettingsHandler() |
public Object | getState() |
public File | getStateDisk() |
public StatisticsTracking | getStatistics() |
public int | getToeCount() |
public ToePool | getToePool() |
public void | initialize(SettingsHandler sH) Starting from nothing, set up CrawlController and associated classes to be ready for a first crawl. |
public static boolean | isCheckpointRecover(CrawlOrder order) |
public boolean | isCheckpointRecover() True if we're in checkpoint recover mode. |
public boolean | isCheckpointing() |
public boolean | isPaused() |
public boolean | isPausing() |
public boolean | isRunning() |
public void | kickUpdate() While many settings will update automatically when the SettingsHandler is modified, some settings need to be explicitly changed to reflect new settings. |
public void | killThread(int threadNumber, boolean replace) Kills a thread. |
public void | logProgressStatistics(String msg) Log to the progress statistics log. |
public void | logUriError(URIException e, UURI u, CharSequence l) Log a URIException from deep inside other components to the crawl's shared log. |
public void | multiThreadMode() |
public String | oneLineReportThreads() |
protected void | processBdbLogs(File checkpointDir, String lastBdbCheckpointLog) |
public void | progressStatisticsEvent(EventObject e) Called whenever a progress statistics logging event occurs. |
public void | releaseContinuePermission() Relinquish continue permission at end of processing (allowing another thread to proceed if in single-thread mode). |
protected void | reportManifestTo(PrintWriter writer) |
protected void | reportProcessorsTo(PrintWriter writer) Compiles a human-readable report on the active processors and writes it to the given writer. |
public void | reportTo(PrintWriter writer) |
public void | reportTo(String name, PrintWriter writer) |
public synchronized void | requestCrawlCheckpoint() Request a checkpoint. |
public synchronized void | requestCrawlPause() Stop the crawl temporarily. |
public synchronized void | requestCrawlResume() |
public void | requestCrawlStart() |
public synchronized void | requestCrawlStop() Operator requested the crawl to stop. |
public synchronized void | requestCrawlStop(String message) Operator requested the crawl to stop. |
protected void | restoreStatisticsTracker(MapType loggers, String replaceName) |
protected void | rotateLogFiles(String generationSuffix) |
protected void | runFrontierRecover(String recoverPath) |
protected void | sendCheckpointEvent(File checkpointDir) Send the checkpoint event. |
protected void | sendCrawlStateChangeEvent(Object newState, String message) Send a crawl state change event to all listeners. |
protected void | setBdbjeBkgrdThreads(EnvironmentConfig config, List threads, String setting) |
public void | setOrder(CrawlOrder o) |
protected void | setupCheckpointRecover() Set up checkpoint recovery. |
public String | singleLineLegend() |
public String | singleLineReport() |
public void | singleLineReportTo(PrintWriter writer) |
public void | singleThreadMode() Go to single-thread mode, where only one ToeThread may proceed at a time. |
public synchronized void | toeEnded() Note that a ToeThread ended, possibly completing the crawl-stop. |
public synchronized void | toePaused() Note that a ToeThread reached paused condition, possibly completing the crawl-pause. |
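Four entries in this table describe a gate that each ToeThread passes through: acquireContinuePermission(), releaseContinuePermission(), singleThreadMode() and multiThreadMode(). In multi-thread mode every thread proceeds freely. In single-thread mode only one thread proceeds at a time.

The following is a minimal, hypothetical sketch of such a gate built on `java.util.concurrent.locks.ReentrantLock`. The class name `ContinuePermissionGate`, its fields, and the `isSingleThreadMode()` accessor are assumptions for illustration. It is not CrawlController's actual implementation.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: not Heritrix's CrawlController code.
// Admits all workers in multi-thread mode, one at a time in single-thread mode.
final class ContinuePermissionGate {
    private final ReentrantLock singleThreadLock = new ReentrantLock();
    // volatile so worker threads observe mode changes made by another thread
    private volatile boolean singleThreadMode = false;

    /** Call before a unit of work; blocks only while in single-thread mode. */
    public void acquireContinuePermission() {
        if (singleThreadMode) {
            singleThreadLock.lock();
            if (!singleThreadMode) {
                // Mode switched back to multi-thread while we waited: drop the lock.
                singleThreadLock.unlock();
            }
        }
        // In multi-thread mode permission is granted immediately.
    }

    /** Call after a unit of work so the next waiting worker may proceed. */
    public void releaseContinuePermission() {
        // Only unlock if this thread actually holds the lock; this makes the
        // call safe in multi-thread mode, where no lock was taken.
        while (singleThreadLock.isHeldByCurrentThread()) {
            singleThreadLock.unlock();
        }
    }

    /** Switch to single-thread mode: subsequent acquirers serialize on the lock. */
    public void singleThreadMode() {
        singleThreadMode = true;
    }

    /** Switch back to multi-thread mode: waiters release the lock as they get it. */
    public void multiThreadMode() {
        singleThreadMode = false;
    }

    public boolean isSingleThreadMode() {
        return singleThreadMode;
    }
}
```

A worker would bracket each unit of work with `acquireContinuePermission()` and `releaseContinuePermission()`. It should call the latter in a `finally` block so the lock is released even if processing throws. The sketch checks the mode a second time after acquiring the lock. That way, a thread that was waiting when the operator switched back to multi-thread mode lets go of the lock instead of serializing everyone else.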