Java Doc for CrawlController.java in Web Crawler » heritrix » org.archive.crawler.framework

java.lang.Object

org.archive.crawler.framework.CrawlController

CrawlController
public class CrawlController implements Serializable,Reporter(Code)
	CrawlController collects all the classes which cooperate to perform a crawl and provides a high-level interface to the running crawl. As the "global context" for a crawl, subcomponents will often reach each other through the CrawlController.
	author: Gordon Mohr

Field Summary
final public static String	CURRENT_LOG_SUFFIX
final public static char	MANIFEST_CONFIG_FILE
final public static char	MANIFEST_LOG_FILE
final public static String	MANIFEST_REPORT
final public static char	MANIFEST_REPORT_FILE
final public static String	PROCESSORS_REPORT
final protected static String[]	REPORTS
public transient Logger	localErrors This logger is for job-scoped logging, specifically errors which happen and are handled within a particular processor.
protected transient ArrayList<CrawlURIDispositionListener>	registeredCrawlURIDispositionListeners
public transient Logger	reports Logger to hold job summary report. Large state reports made at infrequent intervals (e.g. job ending) go here.
public transient Logger	runtimeErrors This logger contains unexpected runtime errors.
protected StatisticsTracking	statistics
public transient Logger	uriErrors Special log for URI format problems, wherever they may occur.
public transient Logger	uriProcessing Crawl progress logger. No exceptions.

Constructor Summary
public	CrawlController()

Method Summary
public void	acquireContinuePermission() Proceed only if allowed, giving CrawlController a chance to enforce single-thread mode.
public void	addCrawlStatusListener(CrawlStatusListener cl) Register for CrawlStatus events.
public void	addCrawlURIDispositionListener(CrawlURIDispositionListener cl) Register for CrawlURIDisposition events.
public void	addOrderToManifest() Add order file contents to manifest.
public void	addToManifest(String file, char type, boolean bundle) Add a file to the manifest of files used/generated by the current crawl. TODO: It's possible for a file to be added twice if reports are force-generated mid-crawl.
public boolean	atFinish() Evaluate if the crawl should stop because it is finished, without actually stopping the crawl.
public void	beginCrawlStop() Start the process of stopping the crawl.
public void	checkFinish() Evaluate if the crawl should stop because it is finished.
void	checkpoint() Run checkpointing. CrawlController takes care of managing the checkpointing/serializing of bdb, the StatisticsTracker, and the CheckpointContext.
protected void	checkpointBdb(File checkpointDir) Checkpoint bdb. I used to do a call to log cleaning as suggested in the je-2.0 javadoc, but it takes way too much time (20 minutes for a crawl of 1 million items).
protected void	checkpointBigMaps(File cpDir)
public void	closeLogFiles() Close all log files and remove handlers from loggers.
synchronized void	completePause()
protected void	completeStop() Called when the last ToeThread exits.
protected FatalConfigurationException	convertToFatalConfigurationException(Exception e)
protected void	copySettings(File checkpointDir) Copy off the settings.
public void	fireCrawledURIDisregardEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURIDisregard event that will be broadcast to all listeners that have registered with the CrawlController.
public void	fireCrawledURIFailureEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURIFailure event that will be broadcast to all listeners that have registered with the CrawlController.
public void	fireCrawledURINeedRetryEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURINeedRetry event that will be broadcast to all listeners that have registered with the CrawlController.
public void	fireCrawledURISuccessfulEvent(CrawlURI curi) Allows an external class to raise a CrawlURIDisposition crawledURISuccessful event that will be broadcast to all listeners that have registered with the CrawlController.
public void	freeReserveMemory()
public int	getActiveToeCount()
public EnhancedEnvironment	getBdbEnvironment()
protected String	getBdbLogFileName(long index)
public Map<K, V>	getBigMap(String dbName, Class<? super K> keyClass, Class<? super V> valueClass) Call this method to get instance of the crawler BigMap implementation. A "BigMap" is a Map that knows how to manage ever-growing sets of key/value pairs.
protected boolean	getCheckpointCopyBdbjeLogs()
public synchronized Checkpoint	getCheckpointRecover() Get recover checkpoint. Returns null if we're NOT in recover mode. Looks at ATTR_RECOVER_PATH and if it's a directory, assumes checkpoint recover.
public static Checkpoint	getCheckpointRecover(CrawlOrder order)
public File	getCheckpointsDisk()
public StoredClassCatalog	getClassCatalog()
public File	getDisk() Get the 'working' directory of the current crawl.
public ProcessorChain	getFirstProcessorChain() Get the first processor chain.
public Frontier	getFrontier()
public File	getLogsDir()
public CrawlOrder	getOrder()
public ProcessorChain	getPostprocessorChain() Get the postprocessor chain.
public ProcessorChainList	getProcessorChainList() Get the list of processor chains.
public String[]	getReports()
public CrawlScope	getScope()
public File	getScratchDisk()
public ServerCache	getServerCache()
public File	getSettingsDir(String key) Return full path to the directory named by `key` in settings. If the directory does not exist, it and all intermediary dirs will be created. Parameters: key - Key to use going to settings.
public SettingsHandler	getSettingsHandler()
public Object	getState()
public File	getStateDisk()
public StatisticsTracking	getStatistics()
public int	getToeCount()
public ToePool	getToePool()
public void	initialize(SettingsHandler sH) Starting from nothing, set up CrawlController and associated classes to be ready for a first crawl.
public static boolean	isCheckpointRecover(CrawlOrder order)
public boolean	isCheckpointRecover() True if we're in checkpoint recover mode.
public boolean	isCheckpointing()
public boolean	isPaused()
public boolean	isPausing()
public boolean	isRunning()
public void	kickUpdate() While many settings will update automatically when the SettingsHandler is modified, some settings need to be explicitly changed to reflect new settings.
public void	killThread(int threadNumber, boolean replace) Kills a thread.
public void	logProgressStatistics(String msg) Log to the progress statistics log.
public void	logUriError(URIException e, UURI u, CharSequence l) Log a URIException from deep inside other components to the crawl's shared log.
public void	multiThreadMode()
public String	oneLineReportThreads()
protected void	processBdbLogs(File checkpointDir, String lastBdbCheckpointLog)
public void	progressStatisticsEvent(EventObject e) Called whenever a progress statistics logging event occurs.
public void	releaseContinuePermission() Relinquish continue permission at end of processing (allowing another thread to proceed if in single-thread mode).
protected void	reportManifestTo(PrintWriter writer)
protected void	reportProcessorsTo(PrintWriter writer) Compiles and returns a human readable report on the active processors.
public void	reportTo(PrintWriter writer)
public void	reportTo(String name, PrintWriter writer)
public synchronized void	requestCrawlCheckpoint() Request a checkpoint.
public synchronized void	requestCrawlPause() Stop the crawl temporarily.
public synchronized void	requestCrawlResume()
public void	requestCrawlStart()
public synchronized void	requestCrawlStop() Operator requested for crawl to stop.
public synchronized void	requestCrawlStop(String message) Operator requested for crawl to stop.
protected void	restoreStatisticsTracker(MapType loggers, String replaceName)
protected void	rotateLogFiles(String generationSuffix)
protected void	runFrontierRecover(String recoverPath)
protected void	sendCheckpointEvent(File checkpointDir) Send the checkpoint event.
protected void	sendCrawlStateChangeEvent(Object newState, String message) Send crawl change event to all listeners.
protected void	setBdbjeBkgrdThreads(EnvironmentConfig config, List threads, String setting)
public void	setOrder(CrawlOrder o)
protected void	setupCheckpointRecover() Does setup of checkpoint recover.
public String	singleLineLegend()
public String	singleLineReport()
public void	singleLineReportTo(PrintWriter writer)
public void	singleThreadMode() Go to single thread mode, where only one ToeThread may proceed at a time.
public synchronized void	toeEnded() Note that a ToeThread ended, possibly completing the crawl-stop.
public synchronized void	toePaused() Note that a ToeThread reached paused condition, possibly completing the crawl-pause.

Field Detail

CURRENT_LOG_SUFFIX
final public static String CURRENT_LOG_SUFFIX(Code)
	suffix to use on active logs

MANIFEST_CONFIG_FILE
final public static char MANIFEST_CONFIG_FILE(Code)
	abbreviation label for config files in manifest

MANIFEST_LOG_FILE
final public static char MANIFEST_LOG_FILE(Code)
	abbreviation label for log files in manifest

MANIFEST_REPORT
final public static String MANIFEST_REPORT(Code)

MANIFEST_REPORT_FILE
final public static char MANIFEST_REPORT_FILE(Code)
	abbreviation label for report files in manifest

PROCESSORS_REPORT
final public static String PROCESSORS_REPORT(Code)

REPORTS
final protected static String[] REPORTS(Code)

localErrors
public transient Logger localErrors(Code)
	This logger is for job-scoped logging, specifically errors which happen and are handled within a particular processor. Examples would be socket timeouts, exceptions thrown by extractors, etc.

registeredCrawlURIDispositionListeners
protected transient ArrayList<CrawlURIDispositionListener> registeredCrawlURIDispositionListeners(Code)

reports
public transient Logger reports(Code)
	Logger to hold job summary report. Large state reports made at infrequent intervals (e.g. job ending) go here.

runtimeErrors
public transient Logger runtimeErrors(Code)
	This logger contains unexpected runtime errors. Would contain errors trying to set up a job or failures inside processors that they are not prepared to recover from.

statistics
protected StatisticsTracking statistics(Code)

uriErrors
public transient Logger uriErrors(Code)
	Special log for URI format problems, wherever they may occur.

uriProcessing
public transient Logger uriProcessing(Code)
	Crawl progress logger. No exceptions. Logs summary result of each url processing.

Constructor Detail

CrawlController
public CrawlController()(Code)
	Default constructor

Method Detail

acquireContinuePermission
public void acquireContinuePermission()(Code)
	Proceed only if allowed, giving CrawlController a chance to enforce single-thread mode.

addCrawlStatusListener
public void addCrawlStatusListener(CrawlStatusListener cl)(Code)
	Register for CrawlStatus events. Parameters: cl - a class implementing the CrawlStatusListener interface See Also: CrawlStatusListener

addCrawlURIDispositionListener
public void addCrawlURIDispositionListener(CrawlURIDispositionListener cl)(Code)
	Register for CrawlURIDisposition events. Parameters: cl - a class implementing the CrawlURIDispositionListener interface See Also: CrawlURIDispositionListener

addOrderToManifest
public void addOrderToManifest()(Code)
	Add order file contents to manifest. Write configuration files and any files managed by CrawlController to it. Files managed by other classes, excluding the settings framework, are the responsibility of those classes, which add their files to the manifest themselves by calling addToManifest. Call before writing out reports.

addToManifest
public void addToManifest(String file, char type, boolean bundle)(Code)
	Add a file to the manifest of files used/generated by the current crawl. TODO: It's possible for a file to be added twice if reports are force-generated mid-crawl. Fix. Parameters: file - The filename (with absolute path) of the file to add Parameters: type - The type of the file Parameters: bundle - Should the file be included in a typical bundling of crawler files. See Also: CrawlController.MANIFEST_CONFIG_FILE See Also: CrawlController.MANIFEST_LOG_FILE See Also: CrawlController.MANIFEST_REPORT_FILE
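
The manifest bookkeeping described above can be sketched as follows. This is a minimal illustration, not Heritrix's code: the class name `ManifestSketch`, the label characters `'C'`, `'L'`, `'R'`, and the entry layout (label, then `+` or `-` for the bundle flag, then the path) are all assumptions. It stores entries in a set so that adding the same file twice, the case the TODO mentions, has no effect.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical manifest holder; labels and entry format are assumptions. */
class ManifestSketch {
    static final char CONFIG_FILE = 'C'; // assumed value
    static final char LOG_FILE = 'L';    // assumed value
    static final char REPORT_FILE = 'R'; // assumed value

    // Insertion-ordered set: a repeated entry is silently ignored.
    private final Set<String> entries = new LinkedHashSet<>();

    /** Records one file with its type label and bundle flag. */
    synchronized void addToManifest(String file, char type, boolean bundle) {
        entries.add(type + (bundle ? "+" : "-") + " " + file);
    }

    /** Number of distinct entries recorded so far. */
    synchronized int size() {
        return entries.size();
    }

    /** Renders the manifest, one entry per line. */
    synchronized String report() {
        return String.join("\n", entries);
    }

    public static void main(String[] args) {
        ManifestSketch m = new ManifestSketch();
        m.addToManifest("/crawl/logs/crawl.log", LOG_FILE, false);
        m.addToManifest("/crawl/order.xml", CONFIG_FILE, true);
        System.out.println(m.report());
    }
}
```

A caller that regenerates a report mid-crawl can call `addToManifest` again with the same arguments without growing the manifest.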

atFinish
public boolean atFinish()(Code)
	Evaluate if the crawl should stop because it is finished, without actually stopping the crawl. Returns: true if crawl is at a finish-possible state.

beginCrawlStop
public void beginCrawlStop()(Code)
	Start the process of stopping the crawl.

checkFinish
public void checkFinish()(Code)
	Evaluate if the crawl should stop because it is finished.
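
The difference between atFinish (query only) and checkFinish (query, then act) can be sketched as below. The class `FinishCheckSketch` and its "no pending URIs" finish condition are hypothetical simplifications chosen for illustration; the real controller's finish criteria are not documented on this page.

```java
/** Hypothetical sketch; the finish condition used here is an assumption. */
class FinishCheckSketch {
    private int pendingUris;
    private boolean stopRequested;

    FinishCheckSketch(int pendingUris) {
        this.pendingUris = pendingUris;
    }

    /** Pure query: reports whether the crawl could finish, changing nothing. */
    synchronized boolean atFinish() {
        return pendingUris == 0;
    }

    /** Query plus action: begins the stop only when at a finish state. */
    synchronized void checkFinish() {
        if (atFinish()) {
            beginCrawlStop();
        }
    }

    /** Starts the stop process; here it only records the request. */
    synchronized void beginCrawlStop() {
        stopRequested = true;
    }

    /** Simulates one URI being finished. */
    synchronized void uriProcessed() {
        if (pendingUris > 0) {
            pendingUris--;
        }
    }

    synchronized boolean isStopRequested() {
        return stopRequested;
    }

    public static void main(String[] args) {
        FinishCheckSketch crawl = new FinishCheckSketch(1);
        crawl.uriProcessed();
        crawl.checkFinish();
        System.out.println("stop requested: " + crawl.isStopRequested());
    }
}
```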

checkpoint
void checkpoint() throws Exception(Code)
	Run checkpointing. CrawlController takes care of managing the checkpointing/serializing of bdb, the StatisticsTracker, and the CheckpointContext. Other modules that want to revive themselves on checkpoint recovery need to save state during their CrawlStatusListener.crawlCheckpoint(File) invocation and then, in their #initialize if a module or in their #initialTask if a processor, check with the CrawlController whether it is in checkpoint recovery. If it is, they read in their old state from the pointed-to checkpoint directory. Default access: only to be called by Checkpointer. throws: Exception -
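
The save-then-revive contract described above might look like this for a module with a single counter. Everything here is a stand-in under stated assumptions: `CounterModule`, the state file name, and passing the recovery directory directly to `initialize` (null when not recovering) are hypothetical; a real module would implement CrawlStatusListener and ask the CrawlController for the checkpoint being recovered.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

/** Hypothetical module that persists one counter across a checkpoint. */
class CounterModule {
    private static final String STATE_FILE = "CounterModule.state"; // assumed name
    private long processed;

    void record() {
        processed++;
    }

    long processed() {
        return processed;
    }

    /** Stand-in for the crawlCheckpoint(File) callback: write state out. */
    void crawlCheckpoint(File checkpointDir) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new FileOutputStream(new File(checkpointDir, STATE_FILE)))) {
            out.writeLong(processed);
        }
    }

    /** Stand-in for initialize: recoverDir is null unless recovering. */
    void initialize(File recoverDir) throws IOException {
        if (recoverDir == null) {
            return; // fresh crawl, nothing to restore
        }
        try (DataInputStream in = new DataInputStream(
                new FileInputStream(new File(recoverDir, STATE_FILE)))) {
            processed = in.readLong();
        }
    }

    /** Checkpoints a module after n records and restores a new instance. */
    static long roundTrip(int n) {
        try {
            File dir = Files.createTempDirectory("checkpoint-sketch").toFile();
            CounterModule before = new CounterModule();
            for (int i = 0; i < n; i++) {
                before.record();
            }
            before.crawlCheckpoint(dir);
            CounterModule after = new CounterModule();
            after.initialize(dir);
            return after.processed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("restored count: " + roundTrip(3));
    }
}
```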

checkpointBdb

protected void checkpointBdb(File checkpointDir) throws DatabaseException, IOException, RuntimeException(Code)

Checkpoint bdb. I used to do a call to log cleaning as suggested in the je-2.0 javadoc, but it takes way too much time (20 minutes for a crawl of 1 million items). Assume the cleaner is keeping up. Below was the log cleaning loop:

    int totalCleaned = 0;
    for (int cleaned = 0; (cleaned = this.bdbEnvironment.cleanLog()) != 0;
            totalCleaned += cleaned) {
        LOGGER.fine("Cleaned " + cleaned + " log files.");
    }

I also used to do a sync. But, according to Mark Hayes, sync and checkpoint are effectively the same thing, only sync is not configurable. He suggests doing one or the other:

MS: Reading code, Environment.sync() is a checkpoint. Looks like I don't need to call a checkpoint after calling a sync?

MH: Right, they're almost the same thing -- just do one or the other, not both. With the new API, you'll need to do a checkpoint not a sync, because the sync() method has no config parameter. Don't worry -- it's fine to do a checkpoint even though you're not using.
Parameters:
  checkpointDir - Directory to write checkpoint to.
throws:
  DatabaseException -
throws:
  IOException -
throws:
  RuntimeException - Thrown if failed setup of new bdb environment.

checkpointBigMaps
protected void checkpointBigMaps(File cpDir) throws Exception(Code)

closeLogFiles
public void closeLogFiles()(Code)
	Close all log files and remove handlers from loggers.
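
Closing handlers and detaching them from their loggers can be done with the standard java.util.logging API. The sketch below is an assumption about the general shape of such a routine, not the controller's actual code; `LogCloser` and `closeAndDetach` are hypothetical names.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;

/** Hypothetical helper that closes and removes every handler of the given loggers. */
class LogCloser {
    /** Returns the number of handlers that were closed and detached. */
    static int closeAndDetach(Logger... loggers) {
        int closed = 0;
        for (Logger logger : loggers) {
            // getHandlers() returns a copy, so removing while iterating is safe.
            for (Handler handler : logger.getHandlers()) {
                handler.close();               // flush and release the underlying file or stream
                logger.removeHandler(handler); // later log calls no longer reach it
                closed++;
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        Logger uriErrors = Logger.getLogger("sketch.uriErrors");
        uriErrors.setUseParentHandlers(false);
        uriErrors.addHandler(new ConsoleHandler());
        System.out.println("closed " + closeAndDetach(uriErrors) + " handler(s)");
    }
}
```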

completePause
synchronized void completePause()(Code)

completeStop
protected void completeStop()(Code)
	Called when the last ToeThread exits.

convertToFatalConfigurationException
protected FatalConfigurationException convertToFatalConfigurationException(Exception e)(Code)

copySettings
protected void copySettings(File checkpointDir) throws IOException(Code)
	Copy off the settings. Parameters: checkpointDir - Directory to write checkpoint to. throws: IOException -

fireCrawledURIDisregardEvent
public void fireCrawledURIDisregardEvent(CrawlURI curi)(Code)
	Allows an external class to raise a CrawlURIDisposition crawledURIDisregard event that will be broadcast to all listeners that have registered with the CrawlController. Parameters: curi - The CrawlURI that will be sent with the event notification. See Also: CrawlURIDispositionListener.crawledURIDisregard(CrawlURI)

fireCrawledURIFailureEvent
public void fireCrawledURIFailureEvent(CrawlURI curi)(Code)
	Allows an external class to raise a CrawlURIDisposition crawledURIFailure event that will be broadcast to all listeners that have registered with the CrawlController. Parameters: curi - The CrawlURI that will be sent with the event notification. See Also: CrawlURIDispositionListener.crawledURIFailure(CrawlURI)

fireCrawledURINeedRetryEvent
public void fireCrawledURINeedRetryEvent(CrawlURI curi)(Code)
	Allows an external class to raise a CrawlURIDisposition crawledURINeedRetry event that will be broadcast to all listeners that have registered with the CrawlController. Parameters: curi - The CrawlURI that will be sent with the event notification. See Also: CrawlURIDispositionListener.crawledURINeedRetry(CrawlURI)

fireCrawledURISuccessfulEvent
public void fireCrawledURISuccessfulEvent(CrawlURI curi)(Code)
	Allows an external class to raise a CrawlURIDisposition crawledURISuccessful event that will be broadcast to all listeners that have registered with the CrawlController. Parameters: curi - The CrawlURI that will be sent with the event notification. See Also: CrawlURIDispositionListener.crawledURISuccessful(CrawlURI)
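
The register-then-broadcast pattern behind the four fireCrawledURI...Event methods can be sketched with simplified stand-ins. `DispositionListener`, `DispositionBroadcaster`, and `CountingListener` are hypothetical, and a plain String replaces CrawlURI so the example stays self-contained; only two of the four event kinds are shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, reduced stand-in for CrawlURIDispositionListener. */
interface DispositionListener {
    void crawledURISuccessful(String uri);
    void crawledURIFailure(String uri);
}

/** Sketch of a registry that broadcasts each event to every registered listener. */
class DispositionBroadcaster {
    private final List<DispositionListener> listeners = new ArrayList<>();

    synchronized void addListener(DispositionListener listener) {
        listeners.add(listener);
    }

    void fireSuccessful(String uri) {
        for (DispositionListener l : snapshot()) {
            l.crawledURISuccessful(uri);
        }
    }

    void fireFailure(String uri) {
        for (DispositionListener l : snapshot()) {
            l.crawledURIFailure(uri);
        }
    }

    // Copy under the lock so listeners can be added while an event is delivered.
    private synchronized List<DispositionListener> snapshot() {
        return new ArrayList<>(listeners);
    }
}

/** Example listener that tallies outcomes, as a statistics tracker might. */
class CountingListener implements DispositionListener {
    int successes;
    int failures;

    @Override
    public void crawledURISuccessful(String uri) {
        successes++;
    }

    @Override
    public void crawledURIFailure(String uri) {
        failures++;
    }
}
```

Any external class holding a reference to the broadcaster can raise an event, and every registered listener receives it once.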

freeReserveMemory
public void freeReserveMemory()(Code)

getActiveToeCount
public int getActiveToeCount()(Code)
	Active toe thread count.

getBdbEnvironment
public EnhancedEnvironment getBdbEnvironment()(Code)
	the shared EnhancedEnvironment

getBdbLogFileName
protected String getBdbLogFileName(long index)(Code)

getBigMap
public Map<K, V> getBigMap(String dbName, Class<? super K> keyClass, Class<? super V> valueClass) throws Exception(Code)
	Call this method to get an instance of the crawler BigMap implementation. A "BigMap" is a Map that knows how to manage ever-growing sets of key/value pairs. If we're in a checkpoint recovery, this method will manage reinstantiation of checkpointed bigmaps. Parameters: dbName - Name to give any associated database. Also used as part of the name when serializing out the bigmap. Needs to be unique to a crawl. Parameters: keyClass - Class of keys we'll be using. Parameters: valueClass - Class of values we'll be using. Returns: Map that knows how to carry large sets of key/value pairs or, if none available, an instance of HashMap. throws: Exception -

getCheckpointCopyBdbjeLogs
protected boolean getCheckpointCopyBdbjeLogs()(Code)

getCheckpointRecover
public synchronized Checkpoint getCheckpointRecover()(Code)
	Get recover checkpoint. Returns null if we're NOT in recover mode. Looks at ATTR_RECOVER_PATH and if it's a directory, assumes checkpoint recover. If in checkpoint mode, returns the Checkpoint instance if the checkpoint was VALID (else null). Returns: Checkpoint instance if we're in recover checkpoint mode and the pointed-to checkpoint was valid. See Also: CrawlController.isCheckpointRecover()
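
The decision described above (directory means checkpoint recover, and only a valid checkpoint is returned) can be sketched as follows. This is an illustration under assumptions: `RecoverPathSketch` is a hypothetical class, a plain File stands in for the Checkpoint type, and validity is modelled as the presence of a marker named `valid` inside the directory, which is a guess at the mechanism rather than something this page states.

```java
import java.io.File;

/** Hypothetical sketch of choosing a checkpoint directory to recover from. */
class RecoverPathSketch {
    /** Assumed name of a marker written when a checkpoint completed. */
    static final String VALID_MARKER = "valid";

    /** Returns the directory to recover from, or null if not recovering. */
    static File checkpointToRecover(String recoverPath) {
        if (recoverPath == null || recoverPath.isEmpty()) {
            return null; // no recover path configured
        }
        File candidate = new File(recoverPath);
        if (!candidate.isDirectory()) {
            return null; // not a directory, so not a checkpoint recover
        }
        // Only a checkpoint that finished writing counts as recoverable.
        return new File(candidate, VALID_MARKER).exists() ? candidate : null;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "";
        System.out.println("recover from: " + checkpointToRecover(path));
    }
}
```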

getCheckpointRecover
public static Checkpoint getCheckpointRecover(CrawlOrder order)(Code)

getCheckpointsDisk
public File getCheckpointsDisk()(Code)

getClassCatalog
public StoredClassCatalog getClassCatalog()(Code)

getDisk
public File getDisk()(Code)
	Get the 'working' directory of the current crawl. Returns: the 'working' directory of the current crawl.

getFirstProcessorChain
public ProcessorChain getFirstProcessorChain()(Code)
	Get the first processor chain. Returns: the first processor chain.

getFrontier
public Frontier getFrontier()(Code)
	The frontier.

getLogsDir
public File getLogsDir()(Code)
	The logging directory or null if problem reading the settings.

getOrder
public CrawlOrder getOrder()(Code)
	The order file instance.

getPostprocessorChain
public ProcessorChain getPostprocessorChain()(Code)
	Get the postprocessor chain. Returns: the postprocessor chain.

getProcessorChainList
public ProcessorChainList getProcessorChainList()(Code)
	Get the list of processor chains. Returns: the list of processor chains.

getReports
public String[] getReports()(Code)

getScope
public CrawlScope getScope()(Code)
	This crawl scope.

getScratchDisk
public File getScratchDisk()(Code)
	Scratch disk location.

getServerCache
public ServerCache getServerCache()(Code)
	The server cache instance.

getSettingsDir
public File getSettingsDir(String key) throws AttributeNotFoundException(Code)
	Return full path to the directory named by `key` in settings. If the directory does not exist, it and all intermediary dirs will be created. Parameters: key - Key to use going to settings. Returns: Full path to directory named by `key`. throws: AttributeNotFoundException -
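
A lookup that creates the directory and any missing parents on demand can be sketched with `File.mkdirs()`. The settings are modelled as a plain Map and relative values are resolved against a base directory; both choices, along with the name `SettingsDirSketch` and the use of unchecked exceptions in place of AttributeNotFoundException, are assumptions made to keep the example self-contained.

```java
import java.io.File;
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch of resolving a settings key to an existing directory. */
class SettingsDirSketch {
    /** Looks up key, resolves it against base if relative, and creates it if absent. */
    static File resolveDir(File base, Map<String, String> settings, String key) {
        String value = settings.get(key);
        if (value == null) {
            throw new IllegalArgumentException("No setting named " + key);
        }
        File dir = new File(value);
        if (!dir.isAbsolute()) {
            dir = new File(base, value);
        }
        // mkdirs() also creates every missing intermediate directory.
        if (!dir.exists() && !dir.mkdirs()) {
            throw new IllegalStateException("Could not create " + dir);
        }
        return dir.getAbsoluteFile();
    }

    public static void main(String[] args) {
        File base = new File(System.getProperty("java.io.tmpdir"), "settings-sketch");
        Map<String, String> settings = Collections.singletonMap("logs-path", "job/logs");
        System.out.println(resolveDir(base, settings, "logs-path"));
    }
}
```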

getSettingsHandler
public SettingsHandler getSettingsHandler()(Code)
	The settings handler.

getState
public Object getState()(Code)
	CrawlController state.

getStateDisk
public File getStateDisk()(Code)
	State disk location.

getStatistics
public StatisticsTracking getStatistics()(Code)
	Object this controller is using to track crawl statistics

getToeCount
public int getToeCount()(Code)
	The number of ToeThreads See Also: ToePool.getToeCount

getToePool
public ToePool getToePool()(Code)
	The ToePool

initialize
public void initialize(SettingsHandler sH) throws InitializationException(Code)
	Starting from nothing, set up CrawlController and associated classes to be ready for a first crawl. Parameters: sH - Settings handler. throws: InitializationException -

isCheckpointRecover
public static boolean isCheckpointRecover(CrawlOrder order)(Code)

isCheckpointRecover
public boolean isCheckpointRecover()(Code)
	True if we're in checkpoint recover mode. Call CrawlController.getCheckpointRecover() to get at the Checkpoint instance that has info on the checkpoint directory being recovered from.

isCheckpointing
public boolean isCheckpointing()(Code)
	True if checkpointing.

isPaused
public boolean isPaused()(Code)
	Tell if the controller is paused. Returns: true if paused.

isPausing
public boolean isPausing()(Code)

isRunning
public boolean isRunning()(Code)

kickUpdate
public void kickUpdate()(Code)
	While many settings will update automatically when the SettingsHandler is modified, some settings need to be explicitly changed to reflect new settings. This includes the number of toe threads and the seeds.

killThread
public void killThread(int threadNumber, boolean replace)(Code)
	Kills a thread. For details see ToePool.killThread(int, boolean). Parameters: threadNumber - Thread to kill. Parameters: replace - Should thread be replaced. See Also: org.archive.crawler.framework.ToePool.killThread(int, boolean)

logProgressStatistics
public void logProgressStatistics(String msg)(Code)
	Log to the progress statistics log. Parameters: msg - Message to write to the progress statistics log.

logUriError
public void logUriError(URIException e, UURI u, CharSequence l)(Code)
	Log a URIException from deep inside other components to the crawl's shared log. Parameters: e - URIException encountered Parameters: u - CrawlURI where problem occurred Parameters: l - String which could not be interpreted as URI without exception

multiThreadMode
public void multiThreadMode()(Code)
	Go back to regular multi-thread mode, where all ToeThreads may proceed at once.

oneLineReportThreads
public String oneLineReportThreads()(Code)
	toepool one-line report

processBdbLogs
protected void processBdbLogs(File checkpointDir, String lastBdbCheckpointLog) throws IOException(Code)

progressStatisticsEvent
public void progressStatisticsEvent(EventObject e)(Code)
	Called whenever a progress statistics logging event occurs. Parameters: e - Progress statistics event.

releaseContinuePermission
public void releaseContinuePermission()(Code)
	Relinquish continue permission at end of processing (allowing another thread to proceed if in single-thread mode).

reportManifestTo
protected void reportManifestTo(PrintWriter writer)(Code)
	Parameters: writer - Where to write report to.

reportProcessorsTo
protected void reportProcessorsTo(PrintWriter writer)(Code)
	Compiles and returns a human readable report on the active processors. Parameters: writer - Where to write to. See Also: org.archive.crawler.framework.Processor.report

reportTo
public void reportTo(PrintWriter writer)(Code)

reportTo
public void reportTo(String name, PrintWriter writer)(Code)

requestCrawlCheckpoint
public synchronized void requestCrawlCheckpoint() throws IllegalStateException(Code)
	Request a checkpoint. Sets a checkpointing thread running. throws: IllegalStateException - Thrown if crawl is not in paused state (crawl must first be paused before checkpointing).
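
The pause-before-checkpoint precondition can be sketched as a small state machine. The state names and the class `CrawlStateSketch` are hypothetical, and the sketch changes state synchronously instead of starting a checkpointing thread; it only illustrates when the IllegalStateException is raised.

```java
/** Hypothetical state machine; state names are assumptions for illustration. */
class CrawlStateSketch {
    enum State { RUNNING, PAUSING, PAUSED, CHECKPOINTING }

    private State state = State.RUNNING;

    synchronized void requestCrawlPause() {
        if (state == State.RUNNING) {
            state = State.PAUSING; // worker threads still have to wind down
        }
    }

    /** Called once every worker thread has reached its paused condition. */
    synchronized void completePause() {
        if (state == State.PAUSING) {
            state = State.PAUSED;
        }
    }

    synchronized void requestCrawlCheckpoint() {
        if (state != State.PAUSED) {
            throw new IllegalStateException(
                "Crawl must be paused before checkpointing; state is " + state);
        }
        state = State.CHECKPOINTING;
    }

    /** Called when the checkpoint has been written. */
    synchronized void checkpointDone() {
        if (state == State.CHECKPOINTING) {
            state = State.PAUSED;
        }
    }

    synchronized void requestCrawlResume() {
        if (state == State.PAUSED) {
            state = State.RUNNING;
        }
    }

    synchronized State state() {
        return state;
    }
}
```

An operator sequence under this sketch is: request pause, wait for the pause to complete, request the checkpoint, then resume once it is done.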

requestCrawlPause
public synchronized void requestCrawlPause()(Code)
	Stop the crawl temporarily.

requestCrawlResume
public synchronized void requestCrawlResume()(Code)
	Resume crawl from paused state

requestCrawlStart
public void requestCrawlStart()(Code)
	Operator requested crawl begin

requestCrawlStop
public synchronized void requestCrawlStop()(Code)
	Operator requested for crawl to stop.

requestCrawlStop
public synchronized void requestCrawlStop(String message)(Code)
	Operator requested for crawl to stop. Parameters: message -

restoreStatisticsTracker
protected void restoreStatisticsTracker(MapType loggers, String replaceName) throws FatalConfigurationException(Code)

rotateLogFiles
protected void rotateLogFiles(String generationSuffix) throws IOException(Code)

runFrontierRecover
protected void runFrontierRecover(String recoverPath) throws AttributeNotFoundException, MBeanException, ReflectionException, FatalConfigurationException(Code)

sendCheckpointEvent
protected void sendCheckpointEvent(File checkpointDir) throws Exception(Code)
	Send the checkpoint event. Has its own method apart from CrawlController.sendCrawlStateChangeEvent(Object,String) because checkpointing throws an Exception (Didn't want to have to wrap all of the sendCrawlStateChangeEvent in try/catches). Parameters: checkpointDir - Where to write checkpoint state to. throws: Exception -

sendCrawlStateChangeEvent
protected void sendCrawlStateChangeEvent(Object newState, String message)(Code)
	Send crawl change event to all listeners. Parameters: newState - State change we're to tell listeners about. Parameters: message - Message on state change. See Also: CrawlController.sendCheckpointEvent(File) for special case event sending telling listeners to checkpoint.

setBdbjeBkgrdThreads
protected void setBdbjeBkgrdThreads(EnvironmentConfig config, List threads, String setting)(Code)

setOrder
public void setOrder(CrawlOrder o)(Code)
	Parameters: o -

setupCheckpointRecover
protected void setupCheckpointRecover() throws IOException(Code)
	Does setup of checkpoint recover. Copies bdb log files into state dir. throws: IOException -

singleLineLegend
public String singleLineLegend()(Code)

singleLineReport
public String singleLineReport()(Code)

singleLineReportTo
public void singleLineReportTo(PrintWriter writer)(Code)

singleThreadMode
public void singleThreadMode()(Code)
	Go to single thread mode, where only one ToeThread may proceed at a time. Also acquires the single lock, so no further threads will proceed past an acquireContinuePermission. Caller must be sure to release the lock to allow other threads to proceed one at a time.
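
One way to build such a gate is a reentrant lock that workers take only while single-thread mode is on. The sketch below is an assumption about the mechanism, not a copy of the controller's code; `ContinueGate` and `heldByCaller` are hypothetical names, while the other four method names mirror the ones documented on this page.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate showing how single-thread mode could be enforced. */
class ContinueGate {
    private final ReentrantLock singleThreadLock = new ReentrantLock();
    private volatile boolean singleThreadMode = false;

    /** Enters single-thread mode; the caller ends up holding the lock. */
    void singleThreadMode() {
        singleThreadLock.lock();
        singleThreadMode = true;
    }

    /** Leaves single-thread mode and drops any hold the caller has. */
    void multiThreadMode() {
        singleThreadMode = false;
        while (singleThreadLock.isHeldByCurrentThread()) {
            singleThreadLock.unlock();
        }
    }

    /** Workers call this before each unit of work. */
    void acquireContinuePermission() {
        if (singleThreadMode) {
            singleThreadLock.lock(); // blocks while another thread is working
            if (!singleThreadMode) {
                // Mode was switched off while waiting: the lock is not needed.
                singleThreadLock.unlock();
            }
        }
    }

    /** Workers call this after each unit of work. */
    void releaseContinuePermission() {
        if (singleThreadMode) {
            while (singleThreadLock.isHeldByCurrentThread()) {
                singleThreadLock.unlock();
            }
        }
    }

    /** True if the calling thread currently holds the single-thread lock. */
    boolean heldByCaller() {
        return singleThreadLock.isHeldByCurrentThread();
    }
}
```

In multi-thread mode both permission calls are no-ops, so workers pay only a volatile read per unit of work.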

toeEnded
public synchronized void toeEnded()(Code)
	Note that a ToeThread ended, possibly completing the crawl-stop.

toePaused
public synchronized void toePaused()(Code)
	Note that a ToeThread reached paused condition, possibly completing the crawl-pause.
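
How a per-thread notification can "possibly complete" the pause is sketched below: the pause finishes only when the last active worker reports in. `PauseTracker` is a hypothetical class and the simple counter is an assumption; the real controller consults its ToePool for the active count.

```java
/** Hypothetical tracker: the pause completes when the last worker reports in. */
class PauseTracker {
    private int activeToes;
    private boolean pausing;
    private boolean paused;

    PauseTracker(int toeCount) {
        this.activeToes = toeCount;
    }

    synchronized void requestCrawlPause() {
        pausing = true;
        if (activeToes == 0) {
            completePause(); // nothing is running, so the pause is immediate
        }
    }

    /** Called by each worker thread when it reaches its paused condition. */
    synchronized void toePaused() {
        if (activeToes > 0) {
            activeToes--;
        }
        if (pausing && activeToes == 0) {
            completePause();
        }
    }

    private void completePause() {
        pausing = false;
        paused = true;
    }

    synchronized boolean isPausing() {
        return pausing;
    }

    synchronized boolean isPaused() {
        return paused;
    }
}
```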

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException(Code)(Java Doc)
public boolean equals(Object obj)(Code)(Java Doc)
protected void finalize() throws Throwable(Code)(Java Doc)
final native public Class getClass()(Code)(Java Doc)
native public int hashCode()(Code)(Java Doc)
final native public void notify()(Code)(Java Doc)
final native public void notifyAll()(Code)(Java Doc)
public String toString()(Code)(Java Doc)
final native public void wait(long timeout) throws InterruptedException(Code)(Java Doc)
final public void wait(long timeout, int nanos) throws InterruptedException(Code)(Java Doc)
final public void wait() throws InterruptedException(Code)(Java Doc)
