Java Doc for CrawlController.java in Web Crawler » heritrix » org.archive.crawler.framework

java.lang.Object
   org.archive.crawler.framework.CrawlController

CrawlController
public class CrawlController implements Serializable, Reporter(Code)
CrawlController collects all the classes which cooperate to perform a crawl and provides a high-level interface to the running crawl. As the "global context" for a crawl, subcomponents will often reach each other through the CrawlController.
author:
   Gordon Mohr


Field Summary
final public static String CURRENT_LOG_SUFFIX
    
final public static char MANIFEST_CONFIG_FILE
    
final public static char MANIFEST_LOG_FILE
    
final public static String MANIFEST_REPORT
    
final public static char MANIFEST_REPORT_FILE
    
final public static String PROCESSORS_REPORT
    
final protected static String[] REPORTS
    
public transient Logger localErrors
     This logger is for job-scoped logging, specifically errors which happen and are handled within a particular processor.
protected transient ArrayList<CrawlURIDispositionListener> registeredCrawlURIDispositionListeners
    
public transient Logger reports
     Logger to hold job summary report. Large state reports made at infrequent intervals (e.g. job ending) go here.
public transient Logger runtimeErrors
     This logger contains unexpected runtime errors.
protected StatisticsTracking statistics
    
public transient Logger uriErrors
     Special log for URI format problems, wherever they may occur.
public transient Logger uriProcessing
     Crawl progress logger. No exceptions.

Constructor Summary
public  CrawlController()
    

Method Summary
public void acquireContinuePermission()
     Proceed only if allowed, giving CrawlController a chance to enforce single-thread mode.
public void addCrawlStatusListener(CrawlStatusListener cl)
     Register for CrawlStatus events.
public void addCrawlURIDispositionListener(CrawlURIDispositionListener cl)
     Register for CrawlURIDisposition events.
public void addOrderToManifest()
     Add order file contents to manifest.
public void addToManifest(String file, char type, boolean bundle)
     Add a file to the manifest of files used/generated by the current crawl. TODO: It's possible for a file to be added twice if reports are force-generated mid-crawl.
public boolean atFinish()
     Evaluate if the crawl should stop because it is finished, without actually stopping the crawl.
public void beginCrawlStop()
     Start the process of stopping the crawl.
public void checkFinish()
     Evaluate if the crawl should stop because it is finished.
void checkpoint()
     Run checkpointing. CrawlController takes care of managing the checkpointing/serializing of bdb, the StatisticsTracker, and the CheckpointContext.
protected void checkpointBdb(File checkpointDir)
     Checkpoint bdb. I used to do a call to log cleaning as suggested in the je-2.0 javadoc, but it takes way too much time (20 minutes for a crawl of 1 million items).
protected void checkpointBigMaps(File cpDir)
    
public void closeLogFiles()
     Close all log files and remove handlers from loggers.
synchronized void completePause()
    
protected void completeStop()
     Called when the last ToeThread exits.
protected FatalConfigurationException convertToFatalConfigurationException(Exception e)
    
protected void copySettings(File checkpointDir)
     Copy off the settings.
public void fireCrawledURIDisregardEvent(CrawlURI curi)
     Allows an external class to raise a CrawlURIDisposition crawledURIDisregard event that will be broadcast to all listeners that have registered with the CrawlController.
public void fireCrawledURIFailureEvent(CrawlURI curi)
     Allows an external class to raise a CrawlURIDisposition crawledURIFailure event that will be broadcast to all listeners that have registered with the CrawlController.
public void fireCrawledURINeedRetryEvent(CrawlURI curi)
     Allows an external class to raise a CrawlURIDisposition crawledURINeedRetry event that will be broadcast to all listeners that have registered with the CrawlController.
public void fireCrawledURISuccessfulEvent(CrawlURI curi)
     Allows an external class to raise a CrawlURIDisposition crawledURISuccessful event that will be broadcast to all listeners that have registered with the CrawlController.
public void freeReserveMemory()
    
public int getActiveToeCount()
    
public EnhancedEnvironment getBdbEnvironment()
    
protected String getBdbLogFileName(long index)
    
public Map<K, V> getBigMap(String dbName, Class<? super K> keyClass, Class<? super V> valueClass)
     Call this method to get an instance of the crawler BigMap implementation. A "BigMap" is a Map that knows how to manage ever-growing sets of key/value pairs.
protected boolean getCheckpointCopyBdbjeLogs()
    
public synchronized Checkpoint getCheckpointRecover()
     Get recover checkpoint. Returns null if we're NOT in recover mode. Looks at ATTR_RECOVER_PATH and, if it's a directory, assumes checkpoint recover.
public static Checkpoint getCheckpointRecover(CrawlOrder order)
    
public File getCheckpointsDisk()
    
public StoredClassCatalog getClassCatalog()
    
public File getDisk()
     Get the 'working' directory of the current crawl.
public ProcessorChain getFirstProcessorChain()
     Get the first processor chain.
public Frontier getFrontier()
    
public File getLogsDir()
    
public CrawlOrder getOrder()
    
public ProcessorChain getPostprocessorChain()
     Get the postprocessor chain.
public ProcessorChainList getProcessorChainList()
     Get the list of processor chains.
public String[] getReports()
    
public CrawlScope getScope()
    
public File getScratchDisk()
    
public ServerCache getServerCache()
    
public File getSettingsDir(String key)
     Return full path to the directory named by key in settings. If the directory does not exist, it and all intermediary dirs will be created.
public SettingsHandler getSettingsHandler()
    
public Object getState()
    
public File getStateDisk()
    
public StatisticsTracking getStatistics()
    
public int getToeCount()
    
public ToePool getToePool()
    
public void initialize(SettingsHandler sH)
     Starting from nothing, set up CrawlController and associated classes to be ready for a first crawl.
public static boolean isCheckpointRecover(CrawlOrder order)
    
public boolean isCheckpointRecover()
     True if we're in checkpoint recover mode.
public boolean isCheckpointing()
    
public boolean isPaused()
    
public boolean isPausing()
    
public boolean isRunning()
    
public void kickUpdate()
     While many settings will update automatically when the SettingsHandler is modified, some settings need to be explicitly changed to reflect new settings.
public void killThread(int threadNumber, boolean replace)
     Kills a thread.
public void logProgressStatistics(String msg)
     Log to the progress statistics log.
public void logUriError(URIException e, UURI u, CharSequence l)
     Log a URIException from deep inside other components to the crawl's shared log.
public void multiThreadMode()
    
public String oneLineReportThreads()
    
protected void processBdbLogs(File checkpointDir, String lastBdbCheckpointLog)
    
public void progressStatisticsEvent(EventObject e)
     Called whenever a progress statistics logging event occurs.
public void releaseContinuePermission()
     Relinquish continue permission at end of processing (allowing another thread to proceed if in single-thread mode).
protected void reportManifestTo(PrintWriter writer)
    
protected void reportProcessorsTo(PrintWriter writer)
     Compiles a human-readable report on the active processors and writes it to the given writer.
public void reportTo(PrintWriter writer)
    
public void reportTo(String name, PrintWriter writer)
    
public synchronized void requestCrawlCheckpoint()
     Request a checkpoint.
public synchronized void requestCrawlPause()
     Stop the crawl temporarily.
public synchronized void requestCrawlResume()
    
public void requestCrawlStart()
    
public synchronized void requestCrawlStop()
     Operator requested for crawl to stop.
public synchronized void requestCrawlStop(String message)
     Operator requested for crawl to stop.
protected void restoreStatisticsTracker(MapType loggers, String replaceName)
    
protected void rotateLogFiles(String generationSuffix)
    
protected void runFrontierRecover(String recoverPath)
    
protected void sendCheckpointEvent(File checkpointDir)
     Send the checkpoint event.
protected void sendCrawlStateChangeEvent(Object newState, String message)
     Send crawl change event to all listeners.
protected void setBdbjeBkgrdThreads(EnvironmentConfig config, List threads, String setting)
    
public void setOrder(CrawlOrder o)
    
protected void setupCheckpointRecover()
     Does setup of checkpoint recover.
public String singleLineLegend()
    
public String singleLineReport()
    
public void singleLineReportTo(PrintWriter writer)
    
public void singleThreadMode()
     Go to single thread mode, where only one ToeThread may proceed at a time.
public synchronized void toeEnded()
     Note that a ToeThread ended, possibly completing the crawl-stop.
public synchronized void toePaused()
     Note that a ToeThread reached paused condition, possibly completing the crawl-pause.

Field Detail
CURRENT_LOG_SUFFIX
final public static String CURRENT_LOG_SUFFIX(Code)
suffix to use on active logs



MANIFEST_CONFIG_FILE
final public static char MANIFEST_CONFIG_FILE(Code)
Abbreviation label for config files in manifest.



MANIFEST_LOG_FILE
final public static char MANIFEST_LOG_FILE(Code)
Abbreviation label for log files in manifest.



MANIFEST_REPORT
final public static String MANIFEST_REPORT(Code)



MANIFEST_REPORT_FILE
final public static char MANIFEST_REPORT_FILE(Code)
Abbreviation label for report files in manifest.



PROCESSORS_REPORT
final public static String PROCESSORS_REPORT(Code)



REPORTS
final protected static String[] REPORTS(Code)



localErrors
public transient Logger localErrors(Code)
This logger is for job-scoped logging, specifically errors which happen and are handled within a particular processor. Examples would be socket timeouts, exceptions thrown by extractors, etc.



registeredCrawlURIDispositionListeners
protected transient ArrayList<CrawlURIDispositionListener> registeredCrawlURIDispositionListeners(Code)



reports
public transient Logger reports(Code)
Logger to hold job summary report. Large state reports made at infrequent intervals (e.g. job ending) go here.



runtimeErrors
public transient Logger runtimeErrors(Code)
This logger contains unexpected runtime errors. Would contain errors trying to set up a job or failures inside processors that they are not prepared to recover from.



statistics
protected StatisticsTracking statistics(Code)



uriErrors
public transient Logger uriErrors(Code)
Special log for URI format problems, wherever they may occur.



uriProcessing
public transient Logger uriProcessing(Code)
Crawl progress logger. No exceptions. Logs summary result of each url processing.




Constructor Detail
CrawlController
public CrawlController()(Code)
Default constructor




Method Detail
acquireContinuePermission
public void acquireContinuePermission()(Code)
Proceed only if allowed, giving CrawlController a chance to enforce single-thread mode.



addCrawlStatusListener
public void addCrawlStatusListener(CrawlStatusListener cl)(Code)
Register for CrawlStatus events.
Parameters:
  cl - a class implementing the CrawlStatusListener interface
See Also:   CrawlStatusListener



addCrawlURIDispositionListener
public void addCrawlURIDispositionListener(CrawlURIDispositionListener cl)(Code)
Register for CrawlURIDisposition events.
Parameters:
  cl - a class implementing the CrawlURIDispositionListener interface
See Also:   CrawlURIDispositionListener



addOrderToManifest
public void addOrderToManifest()(Code)
Add order file contents to manifest. Writes configuration files and any files managed by CrawlController to it. Other classes, excluding the settings framework, are responsible for adding their own files to the manifest by calling addToManifest. Call before writing out reports.



addToManifest
public void addToManifest(String file, char type, boolean bundle)(Code)
Add a file to the manifest of files used/generated by the current crawl. TODO: It's possible for a file to be added twice if reports are force-generated mid-crawl. Fix.
Parameters:
  file - The filename (with absolute path) of the file to add.
  type - The type of the file.
  bundle - Whether the file should be included in a typical bundling of crawler files.
See Also:   CrawlController.MANIFEST_CONFIG_FILE
See Also:   CrawlController.MANIFEST_LOG_FILE
See Also:   CrawlController.MANIFEST_REPORT_FILE
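
The TODO above (a file being listed twice) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the label values 'C', 'L' and 'R', and the one-line-per-file format are hypothetical stand-ins, not the values or format used by CrawlController. Keeping entries in an insertion-ordered set is one possible way to make a repeated addToManifest call harmless.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for the manifest-keeping part of CrawlController.
class ManifestSketch {
    // Assumed label values; the real constants' values are not shown in this document.
    static final char CONFIG_FILE = 'C';
    static final char LOG_FILE = 'L';
    static final char REPORT_FILE = 'R';

    // Insertion-ordered set: adding an identical entry twice keeps a single line.
    private final Set<String> entries = new LinkedHashSet<>();

    // Assumed line format: type label, '+' (bundle) or '-' (do not bundle), then the path.
    void addToManifest(String file, char type, boolean bundle) {
        entries.add(type + (bundle ? "+" : "-") + " " + file);
    }

    String render() {
        return String.join("\n", entries);
    }
}

class ManifestDemo {
    static String run() {
        ManifestSketch m = new ManifestSketch();
        m.addToManifest("/crawl/order.xml", ManifestSketch.CONFIG_FILE, true);
        m.addToManifest("/crawl/logs/crawl.log", ManifestSketch.LOG_FILE, false);
        // Simulates a forced mid-crawl report followed by the end-of-crawl report.
        m.addToManifest("/crawl/crawl-report.txt", ManifestSketch.REPORT_FILE, true);
        m.addToManifest("/crawl/crawl-report.txt", ManifestSketch.REPORT_FILE, true);
        return m.render();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the report file appears once in the rendered manifest even though it was added twice.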



atFinish
public boolean atFinish()(Code)
Evaluate if the crawl should stop because it is finished, without actually stopping the crawl. Returns true if the crawl is at a finish-possible state.



beginCrawlStop
public void beginCrawlStop()(Code)
Start the process of stopping the crawl.



checkFinish
public void checkFinish()(Code)
Evaluate if the crawl should stop because it is finished.



checkpoint
void checkpoint() throws Exception(Code)
Run checkpointing. CrawlController takes care of managing the checkpointing/serializing of bdb, the StatisticsTracker, and the CheckpointContext. Other modules that want to revive themselves on checkpoint recovery need to save state during their CrawlStatusListener.crawlCheckpoint(File) invocation. Then, in their #initialize if a module, or in their #initialTask if a processor, they check with the CrawlController whether it is in checkpoint recovery. If it is, they read in their old state from the pointed-to checkpoint directory.

Default access: only to be called by Checkpointer.
throws:
  Exception -
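
The save-then-revive contract described above can be sketched with a self-contained stand-in. The class CountingModule, the file name counter.txt, and the convention that a null directory means "not recovering" are all hypothetical; a real module would implement CrawlStatusListener and ask the CrawlController for the recover checkpoint instead.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical module that keeps one counter and wants to survive a checkpoint.
class CountingModule {
    private long count;

    // Stand-in for #initialize: recoverDir == null means a fresh crawl (assumed convention).
    void initialize(File recoverDir) {
        if (recoverDir == null) {
            return;
        }
        try {
            byte[] raw = Files.readAllBytes(new File(recoverDir, "counter.txt").toPath());
            count = Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void increment() {
        count++;
    }

    long count() {
        return count;
    }

    // Stand-in for CrawlStatusListener.crawlCheckpoint(File): write state into the checkpoint dir.
    void crawlCheckpoint(File checkpointDir) {
        try {
            Files.write(new File(checkpointDir, "counter.txt").toPath(),
                    Long.toString(count).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class CheckpointDemo {
    // Counts to three, checkpoints, then revives a second instance from the same directory.
    static long run() {
        try {
            File dir = Files.createTempDirectory("cp-sketch").toFile();
            CountingModule first = new CountingModule();
            first.initialize(null);
            first.increment();
            first.increment();
            first.increment();
            first.crawlCheckpoint(dir);

            CountingModule revived = new CountingModule();
            revived.initialize(dir);
            return revived.count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```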




checkpointBdb
protected void checkpointBdb(File checkpointDir) throws DatabaseException, IOException, RuntimeException(Code)
Checkpoint bdb. I used to do a call to log cleaning as suggested in the je-2.0 javadoc, but it takes way too much time (20 minutes for a crawl of 1 million items). Assume the cleaner is keeping up. Below was the log cleaning loop:
int totalCleaned = 0;
 for (int cleaned = 0; (cleaned = this.bdbEnvironment.cleanLog()) != 0;
 totalCleaned += cleaned) {
 LOGGER.fine("Cleaned " + cleaned + " log files.");
 }
 

I also used to do a sync. But, from Mark Hayes, sync and checkpoint are effectively the same thing, only sync is not configurable. He suggests doing one or the other:

MS: Reading code, Environment.sync() is a checkpoint. Looks like I don't need to call a checkpoint after calling a sync?

MH: Right, they're almost the same thing -- just do one or the other, not both. With the new API, you'll need to do a checkpoint not a sync, because the sync() method has no config parameter. Don't worry -- it's fine to do a checkpoint even though you're not using.
Parameters:
  checkpointDir - Directory to write checkpoint to.
throws:
  DatabaseException -
  IOException -
  RuntimeException - Thrown if failed setup of new bdb environment.




checkpointBigMaps
protected void checkpointBigMaps(File cpDir) throws Exception(Code)



closeLogFiles
public void closeLogFiles()(Code)
Close all log files and remove handlers from loggers.
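
A minimal sketch of "close all log files and remove handlers", assuming the Logger fields are java.util.logging loggers (the summary above does not name the package). The class names and the logger name sketch.uriProcessing are hypothetical; the calls used (getHandlers, close, removeHandler) are standard java.util.logging API.

```java
import java.io.ByteArrayOutputStream;
import java.util.logging.Handler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

class LogCloser {
    // Close every handler attached to each logger, then detach it from the logger.
    static void closeAndDetach(Logger... loggers) {
        for (Logger logger : loggers) {
            // getHandlers() returns an array snapshot, so removing while iterating is safe.
            for (Handler handler : logger.getHandlers()) {
                handler.close();
                logger.removeHandler(handler);
            }
        }
    }
}

class LogCloserDemo {
    // Returns the number of handlers left on the logger after closing.
    static int run() {
        Logger log = Logger.getLogger("sketch.uriProcessing");
        log.setUseParentHandlers(false);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        log.addHandler(new StreamHandler(sink, new SimpleFormatter()));
        log.info("fetched http://example.com/");
        LogCloser.closeAndDetach(log);
        return log.getHandlers().length;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```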



completePause
synchronized void completePause()(Code)



completeStop
protected void completeStop()(Code)
Called when the last ToeThread exits.



convertToFatalConfigurationException
protected FatalConfigurationException convertToFatalConfigurationException(Exception e)(Code)



copySettings
protected void copySettings(File checkpointDir) throws IOException(Code)
Copy off the settings.
Parameters:
  checkpointDir - Directory to write checkpoint to.
throws:
  IOException -



fireCrawledURIDisregardEvent
public void fireCrawledURIDisregardEvent(CrawlURI curi)(Code)
Allows an external class to raise a CrawlURIDisposition crawledURIDisregard event that will be broadcast to all listeners that have registered with the CrawlController.
Parameters:
  curi - The CrawlURI that will be sent with the event notification.
See Also:   CrawlURIDispositionListener.crawledURIDisregard(CrawlURI)



fireCrawledURIFailureEvent
public void fireCrawledURIFailureEvent(CrawlURI curi)(Code)
Allows an external class to raise a CrawlURIDisposition crawledURIFailure event that will be broadcast to all listeners that have registered with the CrawlController.
Parameters:
  curi - The CrawlURI that will be sent with the event notification.
See Also:   CrawlURIDispositionListener.crawledURIFailure(CrawlURI)



fireCrawledURINeedRetryEvent
public void fireCrawledURINeedRetryEvent(CrawlURI curi)(Code)
Allows an external class to raise a CrawlURIDisposition crawledURINeedRetry event that will be broadcast to all listeners that have registered with the CrawlController.
Parameters:
  curi - The CrawlURI that will be sent with the event notification.
See Also:   CrawlURIDispositionListener.crawledURINeedRetry(CrawlURI)



fireCrawledURISuccessfulEvent
public void fireCrawledURISuccessfulEvent(CrawlURI curi)(Code)
Allows an external class to raise a CrawlURIDisposition crawledURISuccessful event that will be broadcast to all listeners that have registered with the CrawlController.
Parameters:
  curi - The CrawlURI that will be sent with the event notification.
See Also:   CrawlURIDispositionListener.crawledURISuccessful(CrawlURI)
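
The register-then-broadcast pattern behind addCrawlURIDispositionListener and the four fire methods can be sketched as follows. This is a simplified stand-in, not the Heritrix code: DispositionListener and DispositionBroadcaster are hypothetical names, only one of the four callbacks is modelled, and the URI is a plain String rather than a CrawlURI.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, single-callback stand-in for CrawlURIDispositionListener.
interface DispositionListener {
    void crawledURISuccessful(String uri);
}

// Hypothetical stand-in for the listener-handling part of CrawlController.
class DispositionBroadcaster {
    private final List<DispositionListener> listeners = new ArrayList<>();

    // Mirrors addCrawlURIDispositionListener: remember the listener.
    synchronized void addListener(DispositionListener listener) {
        listeners.add(listener);
    }

    // Mirrors fireCrawledURISuccessfulEvent: notify every registered listener in order.
    synchronized void fireSuccessful(String uri) {
        for (DispositionListener listener : listeners) {
            listener.crawledURISuccessful(uri);
        }
    }
}

class DispositionDemo {
    // Registers two listeners, fires one event, and returns what they recorded.
    static List<String> run() {
        List<String> seen = new ArrayList<>();
        DispositionBroadcaster broadcaster = new DispositionBroadcaster();
        broadcaster.addListener(uri -> seen.add("A:" + uri));
        broadcaster.addListener(uri -> seen.add("B:" + uri));
        broadcaster.fireSuccessful("http://example.com/");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```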



freeReserveMemory
public void freeReserveMemory()(Code)



getActiveToeCount
public int getActiveToeCount()(Code)
Active toe thread count.



getBdbEnvironment
public EnhancedEnvironment getBdbEnvironment()(Code)
the shared EnhancedEnvironment



getBdbLogFileName
protected String getBdbLogFileName(long index)(Code)



getBigMap
public Map<K, V> getBigMap(String dbName, Class<? super K> keyClass, Class<? super V> valueClass) throws Exception(Code)
Call this method to get an instance of the crawler BigMap implementation. A "BigMap" is a Map that knows how to manage ever-growing sets of key/value pairs. If we're in a checkpoint recovery, this method will manage reinstantiation of checkpointed bigmaps. Returns a Map that knows how to carry large sets of key/value pairs or, if none is available, an instance of HashMap.
Parameters:
  dbName - Name to give any associated database. Also used as part of the name when serializing out the bigmap. Needs to be unique to a crawl.
  keyClass - Class of keys we'll be using.
  valueClass - Class of values we'll be using.
throws:
  Exception -
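
A sketch of how a caller might use a getBigMap-style factory, covering only the HashMap fallback mentioned above. BigMapFactorySketch is a hypothetical class; rejecting a reused dbName with an exception is an assumption made here to illustrate the "unique to a crawl" requirement, and the class tokens are accepted but unused because no disk-backed map exists in this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical factory with the same method shape as getBigMap.
class BigMapFactorySketch {
    private final Set<String> namesInUse = new HashSet<>();

    <K, V> Map<K, V> getBigMap(String dbName,
            Class<? super K> keyClass, Class<? super V> valueClass) {
        // Assumed behaviour: enforce that each map name is used once per crawl.
        if (!namesInUse.add(dbName)) {
            throw new IllegalArgumentException("Map name already in use: " + dbName);
        }
        // No disk-backed implementation in this sketch, so fall back to HashMap.
        return new HashMap<K, V>();
    }
}

class BigMapDemo {
    // Counts two hits for one host and returns the tally.
    static long run() {
        BigMapFactorySketch factory = new BigMapFactorySketch();
        Map<String, Long> hits = factory.getBigMap("hostHits", String.class, Long.class);
        hits.merge("example.com", 1L, Long::sum);
        hits.merge("example.com", 1L, Long::sum);
        return hits.get("example.com");
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```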



getCheckpointCopyBdbjeLogs
protected boolean getCheckpointCopyBdbjeLogs()(Code)



getCheckpointRecover
public synchronized Checkpoint getCheckpointRecover()(Code)
Get recover checkpoint. Returns null if we're NOT in recover mode. Looks at ATTR_RECOVER_PATH and, if it's a directory, assumes checkpoint recover. In checkpoint recover mode, returns the Checkpoint instance if the pointed-to checkpoint was VALID (else null).
See Also:   CrawlController.isCheckpointRecover()
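
The "is the recover path a directory" test can be sketched in isolation. RecoverPathSketch is a hypothetical helper; it covers only the directory check and does not model the later validity check on the checkpoint contents.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class RecoverPathSketch {
    // Treat the configured recover path as a checkpoint only if it names a directory.
    static boolean looksLikeCheckpointRecover(String recoverPath) {
        if (recoverPath == null || recoverPath.isEmpty()) {
            return false;
        }
        return new File(recoverPath).isDirectory();
    }
}

class RecoverPathDemo {
    // Checks a directory, a plain file, and an empty setting, in that order.
    static boolean[] run() {
        try {
            File dir = Files.createTempDirectory("cp-recover").toFile();
            File plain = File.createTempFile("recover", ".log");
            boolean[] results = {
                RecoverPathSketch.looksLikeCheckpointRecover(dir.getPath()),
                RecoverPathSketch.looksLikeCheckpointRecover(plain.getPath()),
                RecoverPathSketch.looksLikeCheckpointRecover("")
            };
            plain.delete();
            dir.delete();
            return results;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```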



getCheckpointRecover
public static Checkpoint getCheckpointRecover(CrawlOrder order)(Code)



getCheckpointsDisk
public File getCheckpointsDisk()(Code)



getClassCatalog
public StoredClassCatalog getClassCatalog()(Code)



getDisk
public File getDisk()(Code)
Get the 'working' directory of the current crawl.



getFirstProcessorChain
public ProcessorChain getFirstProcessorChain()(Code)
Get the first processor chain.



getFrontier
public Frontier getFrontier()(Code)
The frontier.



getLogsDir
public File getLogsDir()(Code)
The logging directory, or null if there was a problem reading the settings.



getOrder
public CrawlOrder getOrder()(Code)
The order file instance.



getPostprocessorChain
public ProcessorChain getPostprocessorChain()(Code)
Get the postprocessor chain.



getProcessorChainList
public ProcessorChainList getProcessorChainList()(Code)
Get the list of processor chains.



getReports
public String[] getReports()(Code)



getScope
public CrawlScope getScope()(Code)
This crawl scope.



getScratchDisk
public File getScratchDisk()(Code)
Scratch disk location.



getServerCache
public ServerCache getServerCache()(Code)
The server cache instance.



getSettingsDir
public File getSettingsDir(String key) throws AttributeNotFoundException(Code)
Return the full path to the directory named by key in settings. If the directory does not exist, it and all intermediary dirs will be created.
Parameters:
  key - Key to use going to settings.
throws:
  AttributeNotFoundException -
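
The create-if-missing behaviour can be sketched with standard file APIs. SettingsDirSketch, the settings map, and the key name logs-path are hypothetical; a NoSuchElementException stands in for AttributeNotFoundException, and Files.createDirectories is used because it creates intermediate directories and is a no-op when the directory already exists.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Collections;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical resolver: maps a settings key to a directory under a base dir.
class SettingsDirSketch {
    private final File base;
    private final Map<String, String> settings;

    SettingsDirSketch(File base, Map<String, String> settings) {
        this.base = base;
        this.settings = settings;
    }

    File getSettingsDir(String key) {
        String relative = settings.get(key);
        if (relative == null) {
            // Stand-in for AttributeNotFoundException in the real signature.
            throw new NoSuchElementException("No setting named " + key);
        }
        File dir = new File(base, relative);
        try {
            // Creates the directory and any missing parents.
            Files.createDirectories(dir.toPath());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return dir;
    }
}

class SettingsDirDemo {
    // Resolves a nested path under a fresh temp dir and reports whether it now exists.
    static boolean run() {
        try {
            File base = Files.createTempDirectory("settings-sketch").toFile();
            SettingsDirSketch sketch = new SettingsDirSketch(
                    base, Collections.singletonMap("logs-path", "job1/logs"));
            File logs = sketch.getSettingsDir("logs-path");
            return logs.isDirectory() && logs.getParentFile().isDirectory();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```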



getSettingsHandler
public SettingsHandler getSettingsHandler()(Code)
The settings handler.



getState
public Object getState()(Code)
CrawlController state.



getStateDisk
public File getStateDisk()(Code)
State disk location.



getStatistics
public StatisticsTracking getStatistics()(Code)
Object this controller is using to track crawl statistics



getToeCount
public int getToeCount()(Code)
The number of ToeThreads
See Also:   ToePool.getToeCount



getToePool
public ToePool getToePool()(Code)
The ToePool



initialize
public void initialize(SettingsHandler sH) throws InitializationException(Code)
Starting from nothing, set up CrawlController and associated classes to be ready for a first crawl.
Parameters:
  sH - Settings handler.
throws:
  InitializationException -



isCheckpointRecover
public static boolean isCheckpointRecover(CrawlOrder order)(Code)



isCheckpointRecover
public boolean isCheckpointRecover()(Code)
True if we're in checkpoint recover mode. Call CrawlController.getCheckpointRecover() to get at the Checkpoint instance that has info on the checkpoint directory being recovered from.



isCheckpointing
public boolean isCheckpointing()(Code)
True if checkpointing.



isPaused
public boolean isPaused()(Code)
Tell if the controller is paused. Returns true if paused.



isPausing
public boolean isPausing()(Code)



isRunning
public boolean isRunning()(Code)



kickUpdate
public void kickUpdate()(Code)
While many settings will update automatically when the SettingsHandler is modified, some settings need to be explicitly changed to reflect new settings. This includes the number of toe threads and the seeds.



killThread
public void killThread(int threadNumber, boolean replace)(Code)
Kills a thread. For details see ToePool.killThread(int, boolean).
Parameters:
  threadNumber - Thread to kill.
  replace - Should thread be replaced.
See Also:   org.archive.crawler.framework.ToePool.killThread(int, boolean)



logProgressStatistics
public void logProgressStatistics(String msg)(Code)
Log to the progress statistics log.
Parameters:
  msg - Message to write to the progress statistics log.



logUriError
public void logUriError(URIException e, UURI u, CharSequence l)(Code)
Log a URIException from deep inside other components to the crawl's shared log.
Parameters:
  e - URIException encountered
  u - UURI where problem occurred
  l - String which could not be interpreted as URI without exception



multiThreadMode
public void multiThreadMode()(Code)
Go back to regular multi-thread mode, where all ToeThreads may proceed at once.



oneLineReportThreads
public String oneLineReportThreads()(Code)
toepool one-line report



processBdbLogs
protected void processBdbLogs(File checkpointDir, String lastBdbCheckpointLog) throws IOException(Code)



progressStatisticsEvent
public void progressStatisticsEvent(EventObject e)(Code)
Called whenever a progress statistics logging event occurs.
Parameters:
  e - Progress statistics event.



releaseContinuePermission
public void releaseContinuePermission()(Code)
Relinquish continue permission at end of processing (allowing another thread to proceed if in single-thread mode).



reportManifestTo
protected void reportManifestTo(PrintWriter writer)(Code)

Parameters:
  writer - Where to write report to.



reportProcessorsTo
protected void reportProcessorsTo(PrintWriter writer)(Code)
Compiles a human-readable report on the active processors and writes it to the given writer.
Parameters:
  writer - Where to write to.
See Also:   org.archive.crawler.framework.Processor.report



reportTo
public void reportTo(PrintWriter writer)(Code)



reportTo
public void reportTo(String name, PrintWriter writer)(Code)



requestCrawlCheckpoint
public synchronized void requestCrawlCheckpoint() throws IllegalStateException(Code)
Request a checkpoint. Sets a checkpointing thread running.
throws:
  IllegalStateException - Thrown if crawl is not in paused state (crawl must first be paused before checkpointing).
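
The pause-before-checkpoint rule can be sketched as a small state guard. CrawlStateSketch and its State names are hypothetical and chosen for illustration; unlike the real method, this sketch only changes state and does not start a checkpointing thread.

```java
// Hypothetical state holder illustrating the "must be paused first" guard.
class CrawlStateSketch {
    enum State { RUNNING, PAUSING, PAUSED, CHECKPOINTING }

    private State state = State.RUNNING;

    synchronized void requestCrawlPause() {
        if (state == State.RUNNING) {
            state = State.PAUSING;
        }
    }

    // In this sketch the caller decides when the pause is complete.
    synchronized void completePause() {
        if (state == State.PAUSING) {
            state = State.PAUSED;
        }
    }

    synchronized void requestCrawlCheckpoint() {
        if (state != State.PAUSED) {
            throw new IllegalStateException("Pause crawl before requesting a checkpoint.");
        }
        state = State.CHECKPOINTING;
    }

    synchronized State state() {
        return state;
    }
}

class CrawlStateDemo {
    public static void main(String[] args) {
        CrawlStateSketch crawl = new CrawlStateSketch();
        try {
            crawl.requestCrawlCheckpoint(); // still RUNNING, so this is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        crawl.requestCrawlPause();
        crawl.completePause();
        crawl.requestCrawlCheckpoint();
        System.out.println(crawl.state());
    }
}
```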



requestCrawlPause
public synchronized void requestCrawlPause()(Code)
Stop the crawl temporarily.



requestCrawlResume
public synchronized void requestCrawlResume()(Code)
Resume crawl from paused state



requestCrawlStart
public void requestCrawlStart()(Code)
Operator requested crawl begin



requestCrawlStop
public synchronized void requestCrawlStop()(Code)
Operator requested for crawl to stop.



requestCrawlStop
public synchronized void requestCrawlStop(String message)(Code)
Operator requested for crawl to stop.
Parameters:
  message -



restoreStatisticsTracker
protected void restoreStatisticsTracker(MapType loggers, String replaceName) throws FatalConfigurationException(Code)



rotateLogFiles
protected void rotateLogFiles(String generationSuffix) throws IOException(Code)



runFrontierRecover
protected void runFrontierRecover(String recoverPath) throws AttributeNotFoundException, MBeanException, ReflectionException, FatalConfigurationException(Code)



sendCheckpointEvent
protected void sendCheckpointEvent(File checkpointDir) throws Exception(Code)
Send the checkpoint event. Has its own method apart from CrawlController.sendCrawlStateChangeEvent(Object,String) because checkpointing throws an Exception (Didn't want to have to wrap all of the sendCrawlStateChangeEvent in try/catches).
Parameters:
  checkpointDir - Where to write checkpoint state to.
throws:
  Exception -



sendCrawlStateChangeEvent
protected void sendCrawlStateChangeEvent(Object newState, String message)(Code)
Send crawl change event to all listeners.
Parameters:
  newState - State change we're to tell listeners about.
  message - Message on state change.
See Also:   CrawlController.sendCheckpointEvent(File) for special case event sending telling listeners to checkpoint.



setBdbjeBkgrdThreads
protected void setBdbjeBkgrdThreads(EnvironmentConfig config, List threads, String setting)(Code)



setOrder
public void setOrder(CrawlOrder o)(Code)

Parameters:
  o -



setupCheckpointRecover
protected void setupCheckpointRecover() throws IOException(Code)
Does setup of checkpoint recover. Copies bdb log files into state dir.
throws:
  IOException -



singleLineLegend
public String singleLineLegend()(Code)



singleLineReport
public String singleLineReport()(Code)



singleLineReportTo
public void singleLineReportTo(PrintWriter writer)(Code)



singleThreadMode
public void singleThreadMode()(Code)
Go to single thread mode, where only one ToeThread may proceed at a time. Also acquires the single lock, so no further threads will proceed past an acquireContinuePermission. Caller must be sure to release the lock to allow other threads to proceed one at a time.
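
One way to picture how singleThreadMode, acquireContinuePermission, releaseContinuePermission and multiThreadMode fit together is a gate built on a ReentrantLock. This is a sketch under assumptions, not the CrawlController implementation: ContinueGate and heldByCaller are hypothetical names, and the demo runs on a single thread so the lock state can be inspected deterministically.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate: in single-thread mode, workers must hold the lock to proceed.
class ContinueGate {
    private final ReentrantLock lock = new ReentrantLock();
    private volatile boolean singleThread = false;

    // Switch to single-thread mode; the caller ends up holding the lock.
    void singleThreadMode() {
        lock.lock();
        singleThread = true;
    }

    // Switch back; drop any hold the caller has so waiting workers can run.
    void multiThreadMode() {
        singleThread = false;
        releaseAll();
    }

    // In multi-thread mode permission is automatic; otherwise wait for the lock.
    void acquireContinuePermission() {
        if (singleThread) {
            lock.lock();
        }
    }

    // Give up permission at the end of processing.
    void releaseContinuePermission() {
        releaseAll();
    }

    private void releaseAll() {
        while (lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }

    boolean heldByCaller() {
        return lock.isHeldByCurrentThread();
    }
}

class ContinueGateDemo {
    // Records whether the calling thread holds the lock after each step.
    static boolean[] run() {
        ContinueGate gate = new ContinueGate();
        gate.acquireContinuePermission();   // multi-thread mode: nothing to acquire
        boolean afterFreeAcquire = gate.heldByCaller();
        gate.singleThreadMode();            // caller now holds the lock
        boolean afterSingleMode = gate.heldByCaller();
        gate.releaseContinuePermission();   // caller lets one worker at a time proceed
        boolean afterRelease = gate.heldByCaller();
        gate.acquireContinuePermission();   // single-thread mode: lock is taken
        boolean afterGatedAcquire = gate.heldByCaller();
        gate.multiThreadMode();
        boolean afterMultiMode = gate.heldByCaller();
        return new boolean[] {afterFreeAcquire, afterSingleMode, afterRelease,
                afterGatedAcquire, afterMultiMode};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```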



toeEnded
public synchronized void toeEnded()(Code)
Note that a ToeThread ended, possibly completing the crawl-stop.



toePaused
public synchronized void toePaused()(Code)
Note that a ToeThread reached paused condition, possibly completing the crawl-pause.
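
The phrase "possibly completing the crawl-pause" suggests that the pause finishes once every worker has reported in. A minimal sketch of that idea follows; PauseTrackerSketch and its counting rule are assumptions for illustration, not the logic CrawlController actually uses to decide when a pause is complete.

```java
// Hypothetical tracker: the pause completes when all toe threads have reported paused.
class PauseTrackerSketch {
    private final int toeCount;
    private int pausedToes = 0;
    private boolean pausing = false;
    private boolean paused = false;

    PauseTrackerSketch(int toeCount) {
        this.toeCount = toeCount;
    }

    synchronized void requestCrawlPause() {
        pausing = true;
    }

    // Called by each worker when it reaches its paused condition.
    synchronized void toePaused() {
        pausedToes++;
        if (pausing && pausedToes >= toeCount) {
            pausing = false;
            paused = true;
        }
    }

    synchronized boolean isPausing() {
        return pausing;
    }

    synchronized boolean isPaused() {
        return paused;
    }
}

class PauseTrackerDemo {
    public static void main(String[] args) {
        PauseTrackerSketch tracker = new PauseTrackerSketch(3);
        tracker.requestCrawlPause();
        tracker.toePaused();
        tracker.toePaused();
        System.out.println("pausing=" + tracker.isPausing() + " paused=" + tracker.isPaused());
        tracker.toePaused(); // last worker reports in, completing the pause
        System.out.println("pausing=" + tracker.isPausing() + " paused=" + tracker.isPaused());
    }
}
```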



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException(Code)(Java Doc)
public boolean equals(Object obj)(Code)(Java Doc)
protected void finalize() throws Throwable(Code)(Java Doc)
final native public Class getClass()(Code)(Java Doc)
native public int hashCode()(Code)(Java Doc)
final native public void notify()(Code)(Java Doc)
final native public void notifyAll()(Code)(Java Doc)
public String toString()(Code)(Java Doc)
final native public void wait(long timeout) throws InterruptedException(Code)(Java Doc)
final public void wait(long timeout, int nanos) throws InterruptedException(Code)(Java Doc)
final public void wait() throws InterruptedException(Code)(Java Doc)
