Java Doc for CrawlJob.java (heritrix, org.archive.crawler.admin)



java.lang.Object
   javax.management.NotificationBroadcasterSupport
      org.archive.crawler.admin.CrawlJob

CrawlJob
public class CrawlJob extends NotificationBroadcasterSupport implements DynamicMBean, MBeanRegistration, CrawlStatusListener, Serializable
A CrawlJob encapsulates a 'crawl order' together with all the information and methods a CrawlJobHandler needs to accept and execute it.

A given crawl job may also be a 'profile' for a crawl. In that case it should not be executed as a crawl, but it can be edited and used as a template for creating new CrawlJobs.

Only a CrawlJobHandler should construct new CrawlJobs, which is why most of its constructors are protected.
author:
   Kristinn Sigurdsson
See Also:   org.archive.crawler.admin.CrawlJobHandler.newJob(CrawlJob, String, String, String, String, int)
See Also:   org.archive.crawler.admin.CrawlJobHandler.newProfile(CrawlJob, String, String, String)


Inner Class: public class MBeanCrawlController extends CrawlController implements Serializable

Field Summary
final public static int PRIORITY_AVERAGE
final public static int PRIORITY_CRITICAL
final public static int PRIORITY_HIGH
final public static int PRIORITY_LOW
final public static int PRIORITY_MINIMAL
final public static String STATUS_ABORTED
final public static String STATUS_CHECKPOINTING
     Job is being checkpointed.
final public static String STATUS_CREATED
     Initial value.
final public static String STATUS_DELETED
     Job was deleted by the user and will not be displayed in the UI.
final public static String STATUS_FINISHED
     Job finished normally having completed its crawl.
final public static String STATUS_FINISHED_ABNORMAL
final public static String STATUS_FINISHED_DATA_LIMIT
final public static String STATUS_FINISHED_DOCUMENT_LIMIT
     Job finished normally when the specified number of documents had been fetched.
final public static String STATUS_FINISHED_TIME_LIMIT
     Job finished normally when the specified time limit was hit.
final public static String STATUS_MISCONFIGURED
final public static String STATUS_PAUSED
     Job was temporarily stopped.
final public static String STATUS_PENDING
final public static String STATUS_PREPARING
final public static String STATUS_PROFILE
final public static String STATUS_RUNNING
final public static String STATUS_WAITING_FOR_PAUSE
     Job is going to be temporarily stopped after active threads are finished.
protected transient XMLSettingsHandler settingsHandler

Constructor Summary
protected CrawlJob()
     A shutdown constructor.
public CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir)
     A constructor for jobs that are ready to crawl.
protected CrawlJob(String UIDandName, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler)
     A constructor for profiles. Any job created with this constructor will be considered a profile.
public CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir, String status, boolean isProfile, boolean isNew)
protected CrawlJob(File jobFile, CrawlJobErrorHandler errorHandler)
     A constructor for reloading jobs from disk.

Method Summary
protected void addBdbjeAttributes(List<OpenMBeanAttributeInfo> attributes, List<MBeanAttributeInfo> bdbjeAttributes, List<String> bdbjeNamesToAdd)
protected void addBdbjeOperations(List<OpenMBeanOperationInfo> operations, List<MBeanOperationInfo> bdbjeOperations, List<String> bdbjeNamesToAdd)
protected void addCrawlOrderAttributes(ComplexType type, List<OpenMBeanAttributeInfo> attributes)
protected OpenMBeanInfoSupport buildMBeanInfo()
     Build up the MBean info for Heritrix main.
protected void checkpoint()
public void crawlCheckpoint(File checkpointDir)
public void crawlEnded(String sExitMessage)
public void crawlEnding(String sExitMessage)
public void crawlPaused(String statusMessage)
public void crawlPausing(String statusMessage)
public void crawlResuming(String statusMessage)
public void crawlStarted(String message)
protected CrawlController createCrawlController()
public long deleteURIsFromPending(String regexpr)
     Delete every URI in the frontier of the current (paused) job that matches the specified regular expression.
protected void flush()
     If the frontier is a HostQueuesFrontier, it needs to be flushed for its queued URIs to take effect.
public Object getAttribute(String attribute_name)
public AttributeList getAttributes(String[] attributeNames)
public CrawlController getController()
protected Object getCrawlOrderAttribute(String attribute_name)
protected Object getCrawlOrderAttribute(String attribute_name, ComplexType ct)
public String getCrawlStatus()
public File getDirectory()
     Returns the path of the job's base directory.
public String getDisplayName()
     Returns the combination of given name and UID most commonly used in the administrative interface.
public CrawlJobErrorHandler getErrorHandler()
public String getErrorMessage()
     Get the error message associated with this job.
public String getFrontierOneLine()
public String getFrontierReport(String reportName)
Parameters:
  reportName - Name of report to write.
protected Heritrix getHostingHeritrix()
public String getIgnoredSeeds()
     Utility method to get the stored list of ignored seed items (if any) from the last time the seeds were imported to the frontier.
public FrontierMarker getInitialMarker(String regexpr, boolean inCacheOnly)
     Returns a URIFrontierMarker for the current, paused job.
public String getJmxJobName()
public String getJobName()
     Returns this job's 'name'.
public int getJobPriority()
     Get this job's level of priority.
public String getLogPath(String log)
     Returns the absolute path of the specified log.
public MBeanInfo getMBeanInfo()
protected ObjectName getMbeanName()
protected static int getNotificationsSequenceNumber()
public int getNumberOfJournalEntries()
public ArrayList getPendingURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose)
     Returns the frontier's URI list based on the provided marker.
public String getProcessorsReport()
     Get the Processors report for the running crawl.
public String getSettingsDirectory()
     Returns the directory where the configuration files for this job are located.
public XMLSettingsHandler getSettingsHandler()
     Returns the settings handler for this job.
public StatisticsTracking getStatisticsTracking()
public String getStatus()
public String getThreadOneLine()
public String getThreadsReport()
     Get the CrawlController's ToeThreads report for the running crawl.
public String getUID()
     Returns this job's unique ID (UID), which was issued by the CrawlJobHandler when this job was first created.
public void importUri(String uri, boolean forceFetch, boolean isSeed)
     Schedule a URI.
public void importUri(String str, boolean forceFetch, boolean isSeed, boolean isFlush)
     Schedule a URI.
Parameters:
  str - String that can be: 1. a UURI, 2. a snippet of the crawl.log line, or 3. a snippet from the recover log.
public String importUris(String file, String style, String force)
public String importUris(String fileOrUrl, String style, boolean forceRevisit)
public String importUris(String fileOrUrl, String style, boolean forceRevisit, boolean areSeeds)
protected int importUris(InputStream is, String style, boolean forceRevisit)
protected int importUris(InputStream is, String style, boolean forceRevisit, boolean areSeeds)
     Import URIs.
Parameters:
  is - Stream to use as URI source.
Parameters:
  style - Style in which URIs are rendered.
public Object invoke(String operationName, Object[] params, String[] signature)
public boolean isCheckpointing()
public boolean isCrawling()
public boolean isNew()
public boolean isProfile()
public boolean isReadOnly()
public boolean isRunning()
     Returns true if the job is being crawled.
public void kickUpdate()
     Forward a 'kick' update to the current controller, if any.
public void killThread(int threadNumber, boolean replace)
     Kills a thread.
public void mustBeCrawling()
protected void pause()
public void postDeregister()
public void postRegister(Boolean registrationDone)
public void preDeregister()
public ObjectName preRegister(MBeanServer server, ObjectName on)
protected void resume()
public Collection scanCheckpoints()
public void setAttribute(Attribute attribute)
public AttributeList setAttributes(AttributeList attributes)
protected void setCrawlOrderAttribute(String attribute_name, ComplexType ct, Attribute attribute)
public void setErrorMessage(String string)
     Set an error message for this job.
public void setJobPriority(int priority)
     Set this job's level of priority.
public void setNew(boolean b)
     Set whether the job is considered a new job.
public void setNumberOfJournalEntries(int numberOfJournalEntries)
public void setReadOnly()
     Once called, no changes can be made to the settings for this job.
protected void setRunning(boolean b)
     Set whether the job is being crawled.
public void setStatus(String status)
     Set the status of this CrawlJob.
protected CrawlController setupCrawlController()
public void setupForCrawlStart()
public void stopCrawling()
protected void unregisterMBean()
public void writeFrontierReport(String reportName, PrintWriter writer)
public void writeThreadsReport(String reportName, PrintWriter writer)

Field Detail
PRIORITY_AVERAGE
final public static int PRIORITY_AVERAGE
average

PRIORITY_CRITICAL
final public static int PRIORITY_CRITICAL
highest

PRIORITY_HIGH
final public static int PRIORITY_HIGH
high

PRIORITY_LOW
final public static int PRIORITY_LOW
low

PRIORITY_MINIMAL
final public static int PRIORITY_MINIMAL
lowest
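The documentation gives only the relative order of the five priority levels, not their numeric values. The following is a minimal sketch, assuming ascending integer values, of how pending jobs could be released highest priority first, with ties broken by submission order. `PendingJobQueueSketch`, its methods, and the constant values are hypothetical; this is not how `CrawlJobHandler` is known to schedule jobs.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration; not Heritrix's CrawlJobHandler scheduling code.
final class PendingJobQueueSketch {
    // Assumed values: only the ordering (minimal lowest, critical highest)
    // comes from the documentation.
    static final int PRIORITY_MINIMAL = 0;
    static final int PRIORITY_LOW = 1;
    static final int PRIORITY_AVERAGE = 2;
    static final int PRIORITY_HIGH = 3;
    static final int PRIORITY_CRITICAL = 4;

    private static final class Job {
        final String name;
        final int priority;
        final long seq; // submission order, used to break ties

        Job(String name, int priority, long seq) {
            this.name = name;
            this.priority = priority;
            this.seq = seq;
        }
    }

    private long nextSeq = 0;

    // Highest priority first; equal priorities come out in submission order.
    private final PriorityQueue<Job> queue = new PriorityQueue<>(
        Comparator.comparingInt((Job j) -> j.priority).reversed()
                  .thenComparingLong(j -> j.seq));

    void submit(String name, int priority) {
        queue.add(new Job(name, priority, nextSeq++));
    }

    // Returns the name of the next job to run, or null if none is pending.
    String next() {
        Job j = queue.poll();
        return j == null ? null : j.name;
    }

    public static void main(String[] args) {
        PendingJobQueueSketch q = new PendingJobQueueSketch();
        q.submit("nightly", PRIORITY_LOW);
        q.submit("urgent", PRIORITY_CRITICAL);
        q.submit("weekly", PRIORITY_AVERAGE);
        for (String name = q.next(); name != null; name = q.next()) {
            System.out.println(name);
        }
    }
}
```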



STATUS_ABORTED
final public static String STATUS_ABORTED
Job was terminated by user input while crawling.

STATUS_CHECKPOINTING
final public static String STATUS_CHECKPOINTING
Job is being checkpointed. When checkpointing finishes, the job is set back to STATUS_PAUSED (a job must be paused before checkpointing will run).

STATUS_CREATED
final public static String STATUS_CREATED
Initial value. The job may be incomplete or not yet ready to run.

STATUS_DELETED
final public static String STATUS_DELETED
Job was deleted by the user and will not be displayed in the UI.

STATUS_FINISHED
final public static String STATUS_FINISHED
Job finished normally having completed its crawl.

STATUS_FINISHED_ABNORMAL
final public static String STATUS_FINISHED_ABNORMAL
Job finished abnormally: something went very wrong.

STATUS_FINISHED_DATA_LIMIT
final public static String STATUS_FINISHED_DATA_LIMIT
Job finished normally when the specified amount of data (MB) had been downloaded.

STATUS_FINISHED_DOCUMENT_LIMIT
final public static String STATUS_FINISHED_DOCUMENT_LIMIT
Job finished normally when the specified number of documents had been fetched.

STATUS_FINISHED_TIME_LIMIT
final public static String STATUS_FINISHED_TIME_LIMIT
Job finished normally when the specified time limit was hit.

STATUS_MISCONFIGURED
final public static String STATUS_MISCONFIGURED
Job could not be launched due to an InitializationException.

STATUS_PAUSED
final public static String STATUS_PAUSED
Job was temporarily stopped. State is kept so that it can be resumed.

STATUS_PENDING
final public static String STATUS_PENDING
Job has been successfully submitted to a CrawlJobHandler.

STATUS_PREPARING
final public static String STATUS_PREPARING

STATUS_PROFILE
final public static String STATUS_PROFILE
Job is actually a profile.

STATUS_RUNNING
final public static String STATUS_RUNNING
Job is being crawled.

STATUS_WAITING_FOR_PAUSE
final public static String STATUS_WAITING_FOR_PAUSE
Job is going to be temporarily stopped after active threads are finished.

settingsHandler
protected transient XMLSettingsHandler settingsHandler




Constructor Detail
CrawlJob
protected CrawlJob()
A shutdown constructor.

CrawlJob
public CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir)
A constructor for jobs.

Creates jobs that are ready to crawl.
Parameters:
  UID - A unique ID for this job. Typically emitted by the CrawlJobHandler.
Parameters:
  name - The name of the job.
Parameters:
  settingsHandler - The associated settings.
Parameters:
  errorHandler - The crawl job's settings error handler. null means none is set.
Parameters:
  priority - Job priority.
Parameters:
  dir - The directory that is considered this job's working directory.

CrawlJob
protected CrawlJob(String UIDandName, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler)
A constructor for profiles.

Any job created with this constructor will be considered a profile. Profiles are not stored on disk (only their settings files are), because their data is predictable given the settings files.
Parameters:
  UIDandName - A unique ID for this job. For profiles this is the same as the name.
Parameters:
  settingsHandler - The associated settings.
Parameters:
  errorHandler - The crawl job's settings error handler. null means none is set.

CrawlJob
public CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir, String status, boolean isProfile, boolean isNew)

CrawlJob
protected CrawlJob(File jobFile, CrawlJobErrorHandler errorHandler) throws InvalidJobFileException, IOException
A constructor for reloading jobs from disk. Jobs (not profiles) have their data written to persistent storage in the file system; the CrawlJobHandler uses this constructor to load a job from such storage.

Proper structure of a job file (TODO: Maybe one day make this an XML file):
Line 1. UID
Line 2. Job name (string)
Line 3. Job status (string)
Line 4. Is job read only (true/false)
Line 5. Is job running (true/false)
Line 6. Job priority (int)
Line 7. Number of journal entries
Line 8. Settings file (with path)
Line 9. Statistics tracker file (with path)
Line 10-?. Error message (String, empty for null), can be many lines

Parameters:
  jobFile - A file containing information about the job to load.
Parameters:
  errorHandler - The crawl job's settings error handler. null means none is set.
throws:
  InvalidJobFileException - if the specified file does not refer to a valid job file.
throws:
  IOException - if io operations fail
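
The job-file layout above is simple enough to parse line by line. The sketch below is a hypothetical reading of that layout, not the code Heritrix uses: `JobFileSketch` and `JobRecord` are illustrative names, and an `IllegalArgumentException` stands in for `InvalidJobFileException`. It takes the file's lines as a list, so a file on disk could be fed in with `java.nio.file.Files.readAllLines(path)`.

```java
import java.util.List;

// Hypothetical parser for the documented job-file layout; not Heritrix's
// own loading code, and the class names are illustrative.
final class JobFileSketch {
    static final class JobRecord {
        String uid;
        String name;
        String status;
        boolean readOnly;
        boolean running;
        int priority;
        int journalEntries;
        String settingsFile;
        String statisticsFile;
        String errorMessage; // null when no error-message lines are present
    }

    // Expects the file's lines in order: nine fixed lines followed by an
    // optional multi-line error message (lines 10 onward).
    static JobRecord parse(List<String> lines) {
        if (lines.size() < 9) {
            // Stand-in for Heritrix's InvalidJobFileException.
            throw new IllegalArgumentException(
                "job file needs at least 9 lines, found " + lines.size());
        }
        JobRecord r = new JobRecord();
        r.uid = lines.get(0);
        r.name = lines.get(1);
        r.status = lines.get(2);
        r.readOnly = Boolean.parseBoolean(lines.get(3).trim());
        r.running = Boolean.parseBoolean(lines.get(4).trim());
        r.priority = Integer.parseInt(lines.get(5).trim());
        r.journalEntries = Integer.parseInt(lines.get(6).trim());
        r.settingsFile = lines.get(7);
        r.statisticsFile = lines.get(8);
        // Lines 10 and beyond form the error message; empty means null.
        String message = String.join("\n", lines.subList(9, lines.size()));
        r.errorMessage = message.isEmpty() ? null : message;
        return r;
    }

    public static void main(String[] args) {
        // Made-up sample content, for illustration only.
        List<String> sample = List.of(
            "job-0001", "example crawl", "example-status", "false", "false",
            "2", "0", "/jobs/example/order.xml", "/jobs/example/stats");
        JobRecord r = parse(sample);
        System.out.println(r.name + ", priority " + r.priority + ", error " + r.errorMessage);
    }
}
```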





Method Detail
addBdbjeAttributes
protected void addBdbjeAttributes(List<OpenMBeanAttributeInfo> attributes, List<MBeanAttributeInfo> bdbjeAttributes, List<String> bdbjeNamesToAdd)

addBdbjeOperations
protected void addBdbjeOperations(List<OpenMBeanOperationInfo> operations, List<MBeanOperationInfo> bdbjeOperations, List<String> bdbjeNamesToAdd)

addCrawlOrderAttributes
protected void addCrawlOrderAttributes(ComplexType type, List<OpenMBeanAttributeInfo> attributes)

buildMBeanInfo
protected OpenMBeanInfoSupport buildMBeanInfo() throws InitializationException
Build up the MBean info for Heritrix main.
Returns:
  the created MBean info instance.
throws:
  InitializationException -

checkpoint
protected void checkpoint() throws IllegalStateException
throws:
  IllegalStateException - Thrown if crawl is not paused.
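
`checkpoint()` refuses to run unless the crawl is paused, and `STATUS_CHECKPOINTING` is documented as returning to `STATUS_PAUSED` when checkpointing ends. A minimal sketch of that state contract follows. The status strings are placeholders rather than the real `STATUS_*` values, `CheckpointGuardSketch` is a hypothetical name, and restoring the paused state in a `finally` block (so a failed checkpoint also ends paused) is a design choice of this sketch, not documented behavior.

```java
// Hypothetical sketch of the documented checkpoint contract; the status
// strings below are placeholders, not the real STATUS_* values.
final class CheckpointGuardSketch {
    static final String RUNNING = "running";
    static final String PAUSED = "paused";
    static final String CHECKPOINTING = "checkpointing";

    private String status;

    CheckpointGuardSketch(String initialStatus) {
        this.status = initialStatus;
    }

    synchronized String status() {
        return status;
    }

    // Refuses to run unless paused, reports CHECKPOINTING while the work runs,
    // and then returns to PAUSED (also if writing the checkpoint fails).
    synchronized void checkpoint(Runnable writeCheckpoint) {
        if (!PAUSED.equals(status)) {
            throw new IllegalStateException(
                "crawl must be paused to checkpoint; status is " + status);
        }
        status = CHECKPOINTING;
        try {
            writeCheckpoint.run();
        } finally {
            status = PAUSED;
        }
    }

    public static void main(String[] args) {
        CheckpointGuardSketch job = new CheckpointGuardSketch(PAUSED);
        job.checkpoint(() -> System.out.println("during: " + job.status()));
        System.out.println("after: " + job.status());
    }
}
```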



crawlCheckpoint
public void crawlCheckpoint(File checkpointDir) throws Exception

crawlEnded
public void crawlEnded(String sExitMessage)

crawlEnding
public void crawlEnding(String sExitMessage)

crawlPaused
public void crawlPaused(String statusMessage)

crawlPausing
public void crawlPausing(String statusMessage)

crawlResuming
public void crawlResuming(String statusMessage)

crawlStarted
public void crawlStarted(String message)

createCrawlController
protected CrawlController createCrawlController()

deleteURIsFromPending
public long deleteURIsFromPending(String regexpr)
Delete every URI in the frontier of the current (paused) job that matches the specified regular expression. If the current job is not paused (or there is no current job), nothing will be done.
Parameters:
  regexpr - Regular expression to delete URIs by.
Returns:
  the number of URIs deleted.

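`deleteURIsFromPending` deletes by regular expression and reports how many URIs it removed. The sketch below models the pending URIs as a plain `List` (the real method works on the crawl frontier) and assumes the expression has to match the whole URI, as `Matcher.matches()` requires; whether Heritrix uses full or partial matching is not stated here. `PendingDeleteSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical in-memory stand-in for deleteURIsFromPending; the real method
// works on the crawl frontier, not on a List.
final class PendingDeleteSketch {
    // Removes every pending URI that the expression matches in full and returns
    // the number removed. As documented, nothing happens unless the job is paused.
    static long deleteMatching(List<String> pending, String regexpr, boolean paused) {
        if (!paused) {
            return 0;
        }
        Pattern pattern = Pattern.compile(regexpr);
        int before = pending.size();
        pending.removeIf(uri -> pattern.matcher(uri).matches());
        return before - pending.size();
    }

    public static void main(String[] args) {
        List<String> pending = new ArrayList<>(List.of(
            "http://example.com/a.pdf",
            "http://example.com/index.html",
            "http://example.org/b.pdf"));
        long removed = deleteMatching(pending, ".*\\.pdf", true);
        System.out.println(removed + " removed, remaining " + pending);
    }
}
```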


flush
protected void flush()
If the frontier is a HostQueuesFrontier, it needs to be flushed for its queued URIs to take effect.

getAttribute
public Object getAttribute(String attribute_name) throws AttributeNotFoundException

getAttributes
public AttributeList getAttributes(String[] attributeNames)

getController
public CrawlController getController()

getCrawlOrderAttribute
protected Object getCrawlOrderAttribute(String attribute_name)

getCrawlOrderAttribute
protected Object getCrawlOrderAttribute(String attribute_name, ComplexType ct) throws AttributeNotFoundException, MBeanException, ReflectionException

getCrawlStatus
public String getCrawlStatus()
Status of the crawler (used by JMX).

getDirectory
public File getDirectory()
Returns the path of the job's base directory. For profiles this is always equal to new File(getSettingsDirectory()).

getDisplayName
public String getDisplayName()
Returns the combination of given name and UID most commonly used in the administrative interface.
Returns:
  the job's name with UID notation.

getErrorHandler
public CrawlJobErrorHandler getErrorHandler()
Returns the error handler for this crawl job.

getErrorMessage
public String getErrorMessage()
Get the error message associated with this job. Returns null if there is no error message.

getFrontierOneLine
public String getFrontierOneLine()
One-line Frontier report.

getFrontierReport
public String getFrontierReport(String reportName)
Parameters:
  reportName - Name of report to write.
Returns:
  a report of the frontier's status.

getHostingHeritrix
protected Heritrix getHostingHeritrix()
Heritrix that is hosting this job.

getIgnoredSeeds
public String getIgnoredSeeds()
Utility method to get the stored list of ignored seed items (if any) from the last time the seeds were imported to the frontier.
Returns:
  a String of all ignored seed items, or null if none.

getInitialMarker
public FrontierMarker getInitialMarker(String regexpr, boolean inCacheOnly)
Returns a URIFrontierMarker for the current, paused job. If there is no current job, or it is not paused, null will be returned.
Parameters:
  regexpr - A regular expression that each URI must match in order to be considered 'within' the marker.
Parameters:
  inCacheOnly - Limit marker scope to 'cached' URIs.
Returns:
  a URIFrontierMarker for the current job.
See Also:   CrawlJob.getPendingURIsList(FrontierMarker, int, boolean)
See Also:   org.archive.crawler.framework.Frontier.getInitialMarker(String, boolean)
See Also:   org.archive.crawler.framework.FrontierMarker

getJmxJobName
public String getJmxJobName()
Unique name for the job that is safe to use in JMX (like the display name, but without spaces).

getJobName
public String getJobName()
Returns this job's 'name'. The name comes from the settings for this job; it need not be unique and may change. For a unique identifier use CrawlJob.getUID().

The name corresponds to the value of the 'name' tag in the 'meta' section of the settings file.

getJobPriority
public int getJobPriority()
Get this job's level of priority.
See Also:   CrawlJob.setJobPriority(int)
See Also:   CrawlJob.PRIORITY_MINIMAL
See Also:   CrawlJob.PRIORITY_LOW
See Also:   CrawlJob.PRIORITY_AVERAGE
See Also:   CrawlJob.PRIORITY_HIGH
See Also:   CrawlJob.PRIORITY_CRITICAL

getLogPath
public String getLogPath(String log) throws AttributeNotFoundException, MBeanException, ReflectionException
Returns the absolute path of the specified log. Note: if the crawl has not begun, this file may not exist.
Parameters:
  log - The log whose path is requested.
Returns:
  the absolute path for the specified log.
throws:
  AttributeNotFoundException -
throws:
  ReflectionException -
throws:
  MBeanException -

getMBeanInfo
public MBeanInfo getMBeanInfo()
Our MBean info (needed for CrawlJob to qualify as a DynamicMBean).

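Because `CrawlJob` implements `DynamicMBean`, JMX clients discover its attributes and operations through `getMBeanInfo()` and reach them through `getAttribute` and `invoke`. A minimal, self-contained sketch of that pattern, using only the standard `javax.management` API, is shown below. The class name, the `Status` attribute, the `pause` operation, the status strings, and the object name are all hypothetical; the real `buildMBeanInfo()` builds a much richer `OpenMBeanInfoSupport`.

```java
import java.lang.management.ManagementFactory;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanOperationInfo;
import javax.management.MBeanParameterInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

// Minimal DynamicMBean sketch: one read-only "Status" attribute and one
// "pause" operation. Names and status strings are hypothetical.
final class JobMBeanSketch implements DynamicMBean {
    private volatile String status = "running";

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        if ("Status".equals(name)) {
            return status;
        }
        throw new AttributeNotFoundException(name);
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        // Nothing is writable in this sketch.
        throw new AttributeNotFoundException(attribute.getName() + " is not writable");
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList result = new AttributeList();
        for (String name : names) {
            try {
                result.add(new Attribute(name, getAttribute(name)));
            } catch (AttributeNotFoundException e) {
                // Unknown names are simply left out of the result.
            }
        }
        return result;
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // no attribute was set
    }

    @Override
    public Object invoke(String operation, Object[] params, String[] signature)
            throws ReflectionException {
        if ("pause".equals(operation)) {
            status = "paused";
            return null;
        }
        throw new ReflectionException(new NoSuchMethodException(operation));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attributes = {
            new MBeanAttributeInfo("Status", "java.lang.String", "Current job status",
                                   true, false, false)
        };
        MBeanOperationInfo[] operations = {
            new MBeanOperationInfo("pause", "Pause the job", new MBeanParameterInfo[0],
                                   "void", MBeanOperationInfo.ACTION)
        };
        return new MBeanInfo(getClass().getName(), "Sketch of a crawl job MBean",
                             attributes, null, operations, null);
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("sketch:type=CrawlJob,name=example");
        server.registerMBean(new JobMBeanSketch(), name);
        server.invoke(name, "pause", new Object[0], new String[0]);
        System.out.println(server.getAttribute(name, "Status"));
        server.unregisterMBean(name);
    }
}
```

Registering with the platform MBeanServer is what lets tools such as JConsole list the bean; the server simply forwards `getAttribute` and `invoke` calls to the object.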


getMbeanName
protected ObjectName getMbeanName()

getNotificationsSequenceNumber
protected static int getNotificationsSequenceNumber()
Notification sequence number (it increments after each access).

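`getNotificationsSequenceNumber()` hands out a number that increments on each access, which matches how JMX notifications carry a sequence number. The sketch below assumes an `AtomicInteger`, so concurrent callers never receive the same number (the thread-safety of the original is not documented), and broadcasts each status change through `NotificationBroadcasterSupport`. The class name and the notification type string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;

// Hypothetical sketch: a shared counter hands out notification sequence
// numbers, and each status change is broadcast as a JMX Notification.
final class SequencedNotifierSketch extends NotificationBroadcasterSupport {
    // getAndIncrement returns the current value and then increments it
    // atomically, so concurrent callers never see the same number.
    private static final AtomicInteger SEQUENCE = new AtomicInteger();

    static int nextSequenceNumber() {
        return SEQUENCE.getAndIncrement();
    }

    // The notification type string is an illustrative placeholder.
    void announce(String newStatus) {
        sendNotification(new Notification("crawljob.status", this,
                                          nextSequenceNumber(), newStatus));
    }

    public static void main(String[] args) {
        SequencedNotifierSketch notifier = new SequencedNotifierSketch();
        List<String> received = new ArrayList<>();
        // The default NotificationBroadcasterSupport delivers on the sending thread.
        notifier.addNotificationListener(
            (notification, handback) -> received.add(
                notification.getSequenceNumber() + ": " + notification.getMessage()),
            null, null);
        notifier.announce("paused");
        notifier.announce("running");
        System.out.println(received);
    }
}
```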


getNumberOfJournalEntries
public int getNumberOfJournalEntries()
Returns the number of journal entries.

getPendingURIsList
public ArrayList getPendingURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose) throws InvalidFrontierMarkerException
Returns the frontier's URI list based on the provided marker. This method returns null if there is no current job or if the current job is not paused; only when there is a paused current job will it return a URI list.
Parameters:
  marker - URIFrontier marker.
Parameters:
  numberOfMatches - Maximum number of matches to return.
Parameters:
  verbose - Should detailed info be provided on each URI?
throws:
  InvalidFrontierMarkerException - When the marker is inconsistent with the current state of the frontier.
See Also:   CrawlJob.getInitialMarker(String,boolean)
See Also:   org.archive.crawler.framework.FrontierMarker
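
`getInitialMarker` and `getPendingURIsList` work as a pair: the marker records a filter and a position, and each call returns the next batch of matching URIs. The following is a minimal in-memory sketch of that paging idea, assuming the pending URIs are a plain list and the expression must match the whole URI. It is not Heritrix's `FrontierMarker`, `MarkerPagingSketch` is a hypothetical name, and it ignores the `inCacheOnly` and `verbose` options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of marker-based paging; it is not Heritrix's
// FrontierMarker, and it ignores the inCacheOnly and verbose options.
final class MarkerPagingSketch {
    // A marker remembers the filter and where the previous page stopped.
    static final class Marker {
        final Pattern pattern;
        int position = 0;

        Marker(String regexpr) {
            this.pattern = Pattern.compile(regexpr);
        }
    }

    static Marker initialMarker(String regexpr) {
        return new Marker(regexpr);
    }

    // Returns up to numberOfMatches URIs matching the marker's expression,
    // starting where the marker left off, and advances the marker.
    static List<String> nextPage(List<String> pending, Marker marker, int numberOfMatches) {
        List<String> page = new ArrayList<>();
        while (marker.position < pending.size() && page.size() < numberOfMatches) {
            String uri = pending.get(marker.position++);
            if (marker.pattern.matcher(uri).matches()) {
                page.add(uri);
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<String> pending = List.of(
            "http://a.example/1", "http://b.example/1", "http://a.example/2",
            "http://a.example/3", "http://b.example/2");
        Marker marker = initialMarker("http://a\\.example/.*");
        for (List<String> page = nextPage(pending, marker, 2); !page.isEmpty();
             page = nextPage(pending, marker, 2)) {
            System.out.println(page);
        }
    }
}
```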



getProcessorsReport
public String getProcessorsReport()
Get the Processors report for the running crawl.

getSettingsDirectory
public String getSettingsDirectory()
Returns the directory where the configuration files for this job are located.

getSettingsHandler
public XMLSettingsHandler getSettingsHandler()
Returns the settings handler for this job. It will have been initialized.

getStatisticsTracking
public StatisticsTracking getStatisticsTracking()
The statistics tracking instance (or null if none is available yet).

getStatus
public String getStatus()
Get the current status of this CrawlJob (see the constants defined here beginning with STATUS).

getThreadOneLine
public String getThreadOneLine()
One-line threads report.

getThreadsReport
public String getThreadsReport()
Get the CrawlController's ToeThreads report for the running crawl.

getUID
public String getUID()
Returns this job's unique ID (UID), which was issued by the CrawlJobHandler when this job was first created.
See Also:   CrawlJobHandler.getNextJobUID

importUri
public void importUri(String uri, boolean forceFetch, boolean isSeed) throws URIException
Schedule a URI.
Parameters:
  uri - URI to schedule.
Parameters:
  forceFetch - Should it be force-fetched?
Parameters:
  isSeed - True if seed.
throws:
  URIException -

importUri
public void importUri(String str, boolean forceFetch, boolean isSeed, boolean isFlush) throws URIException
Schedule a URI.
Parameters:
  str - String that can be: 1. a UURI, 2. a snippet of the crawl.log line, or 3. a snippet from the recover log. See CrawlJob.importUris(InputStream, String, boolean) for how it subparses the lines from crawl.log and recover.log.
Parameters:
  forceFetch - Should it be force-fetched?
Parameters:
  isSeed - True if seed.
Parameters:
  isFlush - If true, flush the frontier IF it implements flushing.
throws:
  URIException -

importUris
public String importUris(String file, String style, String force)

importUris
public String importUris(String fileOrUrl, String style, boolean forceRevisit)

importUris
public String importUris(String fileOrUrl, String style, boolean forceRevisit, boolean areSeeds)
Parameters:
  fileOrUrl - Name of file with seeds.
Parameters:
  style - What style of seeds: crawl log, recovery journal, or seeds file.
Parameters:
  forceRevisit - Should we revisit even if seen before?
Parameters:
  areSeeds - Is the file exclusively seeds?
Returns:
  a display string that has a count of all added.

importUris
protected int importUris(InputStream is, String style, boolean forceRevisit)

importUris
protected int importUris(InputStream is, String style, boolean forceRevisit, boolean areSeeds)
Import URIs.
Parameters:
  is - Stream to use as URI source.
Parameters:
  style - Style in which URIs are rendered. Currently supports recoveryJournal, crawlLog, and seeds file format (i.e. default), where the default style is a UURI per line (comments allowed).
Parameters:
  forceRevisit - Whether we should revisit this URI even if we've visited it previously.
Parameters:
  areSeeds - Are the imported URIs seeds?
Returns:
  the count of added URIs.

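For the default style (one UURI per line, comments allowed), the `forceRevisit` flag decides whether URIs already seen are scheduled again. The sketch below assumes comments are whole lines starting with `#` and tracks seen URIs in a `Set`; `SeedImportSketch` is a hypothetical name, and the crawl-log and recovery-journal styles are not covered.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the default "one URI per line" import style.
// Assumes comments are whole lines starting with '#'; the crawl-log and
// recovery-journal styles are not handled here.
final class SeedImportSketch {
    // Schedules each URI unless it was seen before and forceRevisit is false.
    // Returns the number of URIs added to the schedule.
    static int importLines(List<String> lines, Set<String> alreadySeen,
                           boolean forceRevisit, List<String> scheduled) {
        int added = 0;
        for (String raw : lines) {
            String uri = raw.trim();
            if (uri.isEmpty() || uri.startsWith("#")) {
                continue; // blank line or comment
            }
            // Set.add returns false when the URI was already present.
            boolean isNew = alreadySeen.add(uri);
            if (isNew || forceRevisit) {
                scheduled.add(uri);
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        List<String> seedsFile = List.of(
            "# example seeds", "http://a.example/", "", "http://b.example/", "http://a.example/");
        List<String> scheduled = new ArrayList<>();
        int added = importLines(seedsFile, new HashSet<>(), false, scheduled);
        System.out.println(added + " added: " + scheduled);
    }
}
```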


invoke
public Object invoke(String operationName, Object[] params, String[] signature) throws ReflectionException

isCheckpointing
public boolean isCheckpointing()
True if checkpointing.

isCrawling
public boolean isCrawling()

isNew
public boolean isNew()
Is this a new job?
Returns:
  true if it is new.

isProfile
public boolean isProfile()
Is this job considered to be a profile?
Returns:
  true if it is a profile.

isReadOnly
public boolean isReadOnly()
Is the job read only? Returns false until setReadOnly has been invoked; after that it returns true.

isRunning
public boolean isRunning()
Returns true if the job is being crawled.

kickUpdate
public void kickUpdate()
Forward a 'kick' update to the current controller, if any.
See Also:   CrawlController.kickUpdate

killThread
public void killThread(int threadNumber, boolean replace)
Kills a thread. For details see org.archive.crawler.framework.ToePool.killThread(int, boolean).
Parameters:
  threadNumber - Thread to kill.
Parameters:
  replace - Should the thread be replaced?
See Also:   org.archive.crawler.framework.ToePool.killThread(int, boolean)

mustBeCrawling
public void mustBeCrawling()

pause
protected void pause()

postDeregister
public void postDeregister()

postRegister
public void postRegister(Boolean registrationDone)

preDeregister
public void preDeregister() throws Exception

preRegister
public ObjectName preRegister(MBeanServer server, ObjectName on) throws Exception

resume
protected void resume()

scanCheckpoints
public Collection scanCheckpoints()
Read all the checkpoints found in the job's checkpoints directory into Checkpoint instances.
Returns:
  a Collection containing a list of all checkpoints.

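`scanCheckpoints()` collects what it finds in the job's checkpoints directory. A minimal sketch follows, assuming each checkpoint occupies its own subdirectory; it returns sorted directory names instead of Heritrix `Checkpoint` instances, and `CheckpointScanSketch` is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch: assumes each checkpoint is stored in its own
// subdirectory of the job's checkpoints directory. It returns directory
// names rather than Heritrix Checkpoint instances.
final class CheckpointScanSketch {
    static Collection<String> scan(File checkpointsDir) {
        List<String> names = new ArrayList<>();
        // listFiles returns null if the directory does not exist or cannot be read.
        File[] dirs = checkpointsDir.listFiles(File::isDirectory);
        if (dirs == null) {
            return names;
        }
        Arrays.sort(dirs); // File compares by path, so siblings sort by name
        for (File dir : dirs) {
            names.add(dir.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "checkpoints");
        System.out.println(scan(dir));
    }
}
```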


setAttribute
public void setAttribute(Attribute attribute) throws AttributeNotFoundException

setAttributes
public AttributeList setAttributes(AttributeList attributes)

setCrawlOrderAttribute
protected void setCrawlOrderAttribute(String attribute_name, ComplexType ct, Attribute attribute) throws AttributeNotFoundException, InvalidAttributeValueException, MBeanException, ReflectionException

setErrorMessage
public void setErrorMessage(String string)
Set an error message for this job. Generally this only occurs if the job is misconfigured.
Parameters:
  string - The error message associated with this job.

setJobPriority
public void setJobPriority(int priority)
Set this job's level of priority.
Parameters:
  priority - The level of priority.
See Also:   CrawlJob.getJobPriority()
See Also:   CrawlJob.PRIORITY_MINIMAL
See Also:   CrawlJob.PRIORITY_LOW
See Also:   CrawlJob.PRIORITY_AVERAGE
See Also:   CrawlJob.PRIORITY_HIGH
See Also:   CrawlJob.PRIORITY_CRITICAL

setNew
public void setNew(boolean b)
Set whether the job is considered a new job.
Parameters:
  b - Is the job considered to be new?

setNumberOfJournalEntries
public void setNumberOfJournalEntries(int numberOfJournalEntries)
Parameters:
  numberOfJournalEntries - The number of journal entries to set.

setReadOnly
public void setReadOnly()
Once called, no changes can be made to the settings for this job. Typically this is done once a crawl is completed and further changes to the crawl order are therefore meaningless.

setRunning
protected void setRunning(boolean b)
Set whether the job is being crawled.
Parameters:
  b - Is the job being crawled?

setStatus
public void setStatus(String status)
Set the status of this CrawlJob.
Parameters:
  status - Current status of the CrawlJob (see the constants defined here beginning with STATUS).

setupCrawlController
protected CrawlController setupCrawlController() throws InitializationException

setupForCrawlStart
public void setupForCrawlStart() throws InitializationException

stopCrawling
public void stopCrawling()

unregisterMBean
protected void unregisterMBean()

writeFrontierReport
public void writeFrontierReport(String reportName, PrintWriter writer)
Write the requested frontier report to the given PrintWriter.
Parameters:
  reportName - Name of report to write.
Parameters:
  writer - Where to write to.

writeThreadsReport
public void writeThreadsReport(String reportName, PrintWriter writer)
Write the requested threads report to the given PrintWriter.
Parameters:
  reportName - Name of report to write.
Parameters:
  writer - Where to write to.
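
Reports are available both as a `String` (`getFrontierReport`, `getThreadsReport`) and written to a `PrintWriter` (`writeFrontierReport`, `writeThreadsReport`). One common way to provide both, shown below as a hypothetical sketch with made-up report content, is to implement the writer form and derive the string form by writing into a `StringWriter`; whether `CrawlJob` does exactly this is not stated here, and `ReportSketch` is an illustrative name.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical sketch: a String-returning report built on a
// PrintWriter-based one. The report content is made up.
final class ReportSketch {
    static void writeReport(String reportName, PrintWriter writer) {
        writer.println("report: " + reportName);
        writer.println("active threads: 0");
    }

    static String getReport(String reportName) {
        StringWriter buffer = new StringWriter();
        // Closing the PrintWriter flushes everything into the buffer.
        try (PrintWriter writer = new PrintWriter(buffer)) {
            writeReport(reportName, writer);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(getReport("summary"));
        // Writing straight to standard output uses the same code path.
        PrintWriter out = new PrintWriter(System.out, true);
        writeReport("summary", out);
        out.flush();
    }
}
```

Keeping the writer form as the primary implementation lets large reports stream to their destination, while the string form stays a thin wrapper.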



Methods inherited from javax.management.NotificationBroadcasterSupport
public void addNotificationListener(NotificationListener listener, NotificationFilter filter, Object handback)
public MBeanNotificationInfo[] getNotificationInfo()
protected void handleNotification(NotificationListener listener, Notification notif, Object handback)
public void removeNotificationListener(NotificationListener listener) throws ListenerNotFoundException
public void removeNotificationListener(NotificationListener listener, NotificationFilter filter, Object handback) throws ListenerNotFoundException
public void sendNotification(Notification notification)

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException

www.java2java.com
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.