Java Doc for CrawlJob.java in Web Crawler » heritrix » org.archive.crawler.admin

java.lang.Object
   javax.management.NotificationBroadcasterSupport
      org.archive.crawler.admin.CrawlJob

CrawlJob

public class CrawlJob extends NotificationBroadcasterSupport implements DynamicMBean, MBeanRegistration, CrawlStatusListener, Serializable

A CrawlJob encapsulates a 'crawl order' with any and all information and methods needed by a CrawlJobHandler to accept and execute it.

A given crawl job may also be a 'profile' for a crawl. In that case it should not be executed as a crawl but can be edited and used as a template for creating new CrawlJobs.

All of its constructors are protected since only a CrawlJobHandler should construct new CrawlJobs.
author:
   Kristinn Sigurdsson
See Also:   org.archive.crawler.admin.CrawlJobHandler.newJob(CrawlJob, String, String, String, String, int)
See Also:   org.archive.crawler.admin.CrawlJobHandler.newProfile(CrawlJob, String, String, String)

Inner Class: public class MBeanCrawlController extends CrawlController implements Serializable

Field Summary
final public static int	PRIORITY_AVERAGE
final public static int	PRIORITY_CRITICAL
final public static int	PRIORITY_HIGH
final public static int	PRIORITY_LOW
final public static int	PRIORITY_MINIMAL
final public static String	STATUS_ABORTED
final public static String	STATUS_CHECKPOINTING Job is being checkpointed.
final public static String	STATUS_CREATED Initial value.
final public static String	STATUS_DELETED Job was deleted by user, will not be displayed in UI.
final public static String	STATUS_FINISHED Job finished normally having completed its crawl.
final public static String	STATUS_FINISHED_ABNORMAL
final public static String	STATUS_FINISHED_DATA_LIMIT
final public static String	STATUS_FINISHED_DOCUMENT_LIMIT Job finished normally when the specified number of documents had been fetched.
final public static String	STATUS_FINISHED_TIME_LIMIT Job finished normally when the specified timelimit was hit.
final public static String	STATUS_MISCONFIGURED
final public static String	STATUS_PAUSED Job was temporarily stopped.
final public static String	STATUS_PENDING
final public static String	STATUS_PREPARING
final public static String	STATUS_PROFILE
final public static String	STATUS_RUNNING
final public static String	STATUS_WAITING_FOR_PAUSE Job is going to be temporarily stopped after active threads are finished.
protected transient XMLSettingsHandler	settingsHandler

Constructor Summary
protected	CrawlJob() A shutdown Constructor.
public	CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir) A constructor for jobs. Creates jobs that are ready to crawl. Parameters: UID - A unique ID for this job.
protected	CrawlJob(String UIDandName, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler) A constructor for profiles. Any job created with this constructor will be considered a profile.
public	CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir, String status, boolean isProfile, boolean isNew)
protected	CrawlJob(File jobFile, CrawlJobErrorHandler errorHandler) A constructor for reloading jobs from disk.

Method Summary
protected void	addBdbjeAttributes(List<OpenMBeanAttributeInfo> attributes, List<MBeanAttributeInfo> bdbjeAttributes, List<String> bdbjeNamesToAdd)
protected void	addBdbjeOperations(List<OpenMBeanOperationInfo> operations, List<MBeanOperationInfo> bdbjeOperations, List<String> bdbjeNamesToAdd)
protected void	addCrawlOrderAttributes(ComplexType type, List<OpenMBeanAttributeInfo> attributes)
protected OpenMBeanInfoSupport	buildMBeanInfo() Build up the MBean info for Heritrix main.
protected void	checkpoint()
public void	crawlCheckpoint(File checkpointDir)
public void	crawlEnded(String sExitMessage)
public void	crawlEnding(String sExitMessage)
public void	crawlPaused(String statusMessage)
public void	crawlPausing(String statusMessage)
public void	crawlResuming(String statusMessage)
public void	crawlStarted(String message)
protected CrawlController	createCrawlController()
public long	deleteURIsFromPending(String regexpr) Delete any URIs from the frontier of the current (paused) job that match the specified regular expression.
protected void	flush() If it's a HostQueuesFrontier, it needs to be flushed for the queued.
public Object	getAttribute(String attribute_name)
public AttributeList	getAttributes(String[] attributeNames)
public CrawlController	getController()
protected Object	getCrawlOrderAttribute(String attribute_name)
protected Object	getCrawlOrderAttribute(String attribute_name, ComplexType ct)
public String	getCrawlStatus()
public File	getDirectory() Returns the path of the job's base directory.
public String	getDisplayName() Return the combination of given name and UID most commonly used in administrative interface.
public CrawlJobErrorHandler	getErrorHandler()
public String	getErrorMessage() Get the error message associated with this job.
public String	getFrontierOneLine()
public String	getFrontierReport(String reportName) Parameters: reportName - Name of report to write.
protected Heritrix	getHostingHeritrix()
public String	getIgnoredSeeds() Utility method to get the stored list of ignored seed items (if any), from the last time the seeds were imported to the frontier.
public FrontierMarker	getInitialMarker(String regexpr, boolean inCacheOnly) Returns a URIFrontierMarker for the current, paused, job.
public String	getJmxJobName()
public String	getJobName() Returns this job's 'name'.
public int	getJobPriority() Get this job's level of priority.
public String	getLogPath(String log) Returns the absolute path of the specified log.
public MBeanInfo	getMBeanInfo()
protected ObjectName	getMbeanName()
protected static int	getNotificationsSequenceNumber()
public int	getNumberOfJournalEntries()
public ArrayList	getPendingURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose) Returns the frontier's URI list based on the provided marker.
public String	getProcessorsReport() Get the Processors report for the running crawl.
public String	getSettingsDirectory() Returns the directory where the configuration files for this job are located.
public XMLSettingsHandler	getSettingsHandler() Returns the settings handler for this job.
public StatisticsTracking	getStatisticsTracking()
public String	getStatus()
public String	getThreadOneLine()
public String	getThreadsReport() Get the CrawlController's ToeThreads report for the running crawl.
public String	getUID() Returns this job's unique ID (UID) that was issued by the CrawlJobHandler when this job was first created.
public void	importUri(String uri, boolean forceFetch, boolean isSeed) Schedule a uri.
public void	importUri(String str, boolean forceFetch, boolean isSeed, boolean isFlush) Schedule a uri. Parameters: str - String that can be: 1.
public String	importUris(String file, String style, String force)
public String	importUris(String fileOrUrl, String style, boolean forceRevisit)
public String	importUris(String fileOrUrl, String style, boolean forceRevisit, boolean areSeeds)
protected int	importUris(InputStream is, String style, boolean forceRevisit)
protected int	importUris(InputStream is, String style, boolean forceRevisit, boolean areSeeds) Import URIs. Parameters: is - Stream to use as URI source. Parameters: style - Style in which URIs are rendered.
public Object	invoke(String operationName, Object[] params, String[] signature)
public boolean	isCheckpointing()
public boolean	isCrawling()
public boolean	isNew()
public boolean	isProfile()
public boolean	isReadOnly()
public boolean	isRunning() Returns true if the job is being crawled.
public void	kickUpdate() Forward a 'kick' update to current controller if any.
public void	killThread(int threadNumber, boolean replace) Kills a thread.
public void	mustBeCrawling()
protected void	pause()
public void	postDeregister()
public void	postRegister(Boolean registrationDone)
public void	preDeregister()
public ObjectName	preRegister(MBeanServer server, ObjectName on)
protected void	resume()
public Collection	scanCheckpoints()
public void	setAttribute(Attribute attribute)
public AttributeList	setAttributes(AttributeList attributes)
protected void	setCrawlOrderAttribute(String attribute_name, ComplexType ct, Attribute attribute)
public void	setErrorMessage(String string) Set an error message for this job.
public void	setJobPriority(int priority) Set this job's level of priority.
public void	setNew(boolean b) Set if the job is considered a new job or not.
public void	setNumberOfJournalEntries(int numberOfJournalEntries)
public void	setReadOnly() Once called no changes can be made to the settings for this job.
protected void	setRunning(boolean b) Set if job is being crawled.
public void	setStatus(String status) Set the status of this CrawlJob.
protected CrawlController	setupCrawlController()
public void	setupForCrawlStart()
public void	stopCrawling()
protected void	unregisterMBean()
public void	writeFrontierReport(String reportName, PrintWriter writer)
public void	writeThreadsReport(String reportName, PrintWriter writer)

Field Detail

PRIORITY_AVERAGE
final public static int PRIORITY_AVERAGE(Code)
	average

PRIORITY_CRITICAL
final public static int PRIORITY_CRITICAL(Code)
	highest

PRIORITY_HIGH
final public static int PRIORITY_HIGH(Code)
	high

PRIORITY_LOW
final public static int PRIORITY_LOW(Code)
	low

PRIORITY_MINIMAL
final public static int PRIORITY_MINIMAL(Code)
	lowest

STATUS_ABORTED
final public static String STATUS_ABORTED(Code)
	Job was terminated by user input while crawling.

STATUS_CHECKPOINTING
final public static String STATUS_CHECKPOINTING(Code)
	Job is being checkpointed. When finished checkpointing, job is set back to STATUS_PAUSED (Job must be first paused before checkpointing will run).

STATUS_CREATED
final public static String STATUS_CREATED(Code)
	Initial value. The job may be incomplete or not ready to run.

STATUS_DELETED
final public static String STATUS_DELETED(Code)
	Job was deleted by user, will not be displayed in UI.

STATUS_FINISHED
final public static String STATUS_FINISHED(Code)
	Job finished normally having completed its crawl.

STATUS_FINISHED_ABNORMAL
final public static String STATUS_FINISHED_ABNORMAL(Code)
	Something went very wrong

STATUS_FINISHED_DATA_LIMIT
final public static String STATUS_FINISHED_DATA_LIMIT(Code)
	Job finished normally when the specified amount of data (MB) had been downloaded.

STATUS_FINISHED_DOCUMENT_LIMIT
final public static String STATUS_FINISHED_DOCUMENT_LIMIT(Code)
	Job finished normally when the specified number of documents had been fetched.

STATUS_FINISHED_TIME_LIMIT
final public static String STATUS_FINISHED_TIME_LIMIT(Code)
	Job finished normally when the specified timelimit was hit.

STATUS_MISCONFIGURED
final public static String STATUS_MISCONFIGURED(Code)
	Job could not be launched due to an InitializationException.

STATUS_PAUSED
final public static String STATUS_PAUSED(Code)
	Job was temporarily stopped. State is kept so it can be resumed.

STATUS_PENDING
final public static String STATUS_PENDING(Code)
	Job has been successfully submitted to a CrawlJobHandler

STATUS_PREPARING
final public static String STATUS_PREPARING(Code)

STATUS_PROFILE
final public static String STATUS_PROFILE(Code)
	Job is actually a profile

STATUS_RUNNING
final public static String STATUS_RUNNING(Code)
	Job is being crawled

STATUS_WAITING_FOR_PAUSE
final public static String STATUS_WAITING_FOR_PAUSE(Code)
	Job is going to be temporarily stopped after active threads are finished.

settingsHandler
protected transient XMLSettingsHandler settingsHandler(Code)

Constructor Detail

CrawlJob
protected CrawlJob()(Code)
	A shutdown Constructor.

CrawlJob
public CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir)(Code)
	A constructor for jobs. Creates jobs that are ready to crawl.
	Parameters:
	   UID - A unique ID for this job. Typically emitted by the CrawlJobHandler.
	   name - The name of the job.
	   settingsHandler - The associated settings.
	   errorHandler - The crawl job's settings error handler. `null` means none is set.
	   priority - Job priority.
	   dir - The directory that is considered this job's working directory.

CrawlJob
protected CrawlJob(String UIDandName, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler)(Code)
	A constructor for profiles. Any job created with this constructor will be considered a profile. Profiles are not stored on disk (only their settings files are stored on disk). This is because their data is predictable given any settings files.
	Parameters:
	   UIDandName - A unique ID for this job. For profiles this is the same as the name.
	   settingsHandler - The associated settings.
	   errorHandler - The crawl job's settings error handler. `null` means none is set.

CrawlJob
public CrawlJob(String UID, String name, XMLSettingsHandler settingsHandler, CrawlJobErrorHandler errorHandler, int priority, File dir, String status, boolean isProfile, boolean isNew)(Code)

CrawlJob
protected CrawlJob(File jobFile, CrawlJobErrorHandler errorHandler) throws InvalidJobFileException, IOException(Code)
	A constructor for reloading jobs from disk. Jobs (not profiles) have their data written to persistent storage in the file system. This method is used to load the job from such storage. This is done by the `CrawlJobHandler`.
	Proper structure of a job file (TODO: Maybe one day make this an XML file):
	   Line 1. UID
	   Line 2. Job name (string)
	   Line 3. Job status (string)
	   Line 4. is job read only (true/false)
	   Line 5. is job running (true/false)
	   Line 6. job priority (int)
	   Line 7. number of journal entries
	   Line 8. setting file (with path)
	   Line 9. statistics tracker file (with path)
	   Line 10-?. error message (String, empty for null), can be many lines
	Parameters:
	   jobFile - a file containing information about the job to load.
	   errorHandler - The crawl job's settings error handler. `null` means none is set.
	throws:
	   InvalidJobFileException - if the specified file does not refer to a valid job file.
	   IOException - if I/O operations fail.
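The line layout of the job file described above can be illustrated with a small stand-alone reader. This is only a sketch under stated assumptions: `JobFileSketch` and its field names are hypothetical, it is not the Heritrix implementation, and it reports malformed input with `IllegalArgumentException` instead of `InvalidJobFileException`.

```java
import java.util.List;

/** Hypothetical holder for the documented job file lines; not a Heritrix class. */
class JobFileSketch {
    String uid;
    String name;
    String status;
    boolean readOnly;
    boolean running;
    int priority;
    int journalEntries;
    String settingsFile;
    String statisticsFile;
    String errorMessage; // null when the file carries no error message

    /** Maps the lines of a job file (line 1 is index 0) onto the fields above. */
    static JobFileSketch parse(List<String> lines) {
        if (lines.size() < 9) {
            throw new IllegalArgumentException(
                "Expected at least 9 lines, got " + lines.size());
        }
        JobFileSketch job = new JobFileSketch();
        job.uid = lines.get(0);
        job.name = lines.get(1);
        job.status = lines.get(2);
        job.readOnly = Boolean.parseBoolean(lines.get(3).trim());
        job.running = Boolean.parseBoolean(lines.get(4).trim());
        // NumberFormatException is a subclass of IllegalArgumentException.
        job.priority = Integer.parseInt(lines.get(5).trim());
        job.journalEntries = Integer.parseInt(lines.get(6).trim());
        job.settingsFile = lines.get(7);
        job.statisticsFile = lines.get(8);
        // Line 10 onwards: the error message, possibly spanning many lines.
        String message = String.join("\n", lines.subList(9, lines.size()));
        job.errorMessage = message.isEmpty() ? null : message;
        return job;
    }
}
```

The lines could come from `java.nio.file.Files.readAllLines(path)`; an empty remainder after line 9 is treated as "no error message", following the "empty for null" note in the layout.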

Method Detail

addBdbjeAttributes
protected void addBdbjeAttributes(List<OpenMBeanAttributeInfo> attributes, List<MBeanAttributeInfo> bdbjeAttributes, List<String> bdbjeNamesToAdd)(Code)

addBdbjeOperations
protected void addBdbjeOperations(List<OpenMBeanOperationInfo> operations, List<MBeanOperationInfo> bdbjeOperations, List<String> bdbjeNamesToAdd)(Code)

addCrawlOrderAttributes
protected void addCrawlOrderAttributes(ComplexType type, List<OpenMBeanAttributeInfo> attributes)(Code)

buildMBeanInfo
protected OpenMBeanInfoSupport buildMBeanInfo() throws InitializationException(Code)
	Build up the MBean info for Heritrix main.
	Returns: the created mbean info instance.
	throws:
	   InitializationException -

checkpoint
protected void checkpoint() throws IllegalStateException(Code)
	throws: IllegalStateException - Thrown if crawl is not paused.
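The paused-only rule for checkpointing, together with the STATUS_CHECKPOINTING note that a job returns to STATUS_PAUSED afterwards, can be sketched as a small guard. The class `CheckpointGuardSketch` and its status labels are hypothetical stand-ins; the actual string values of the `STATUS_*` constants are not given in this page.

```java
/** Hypothetical illustration of the paused -> checkpointing -> paused cycle. */
class CheckpointGuardSketch {
    // Placeholder labels; not the real CrawlJob.STATUS_* values.
    static final String PAUSED = "PAUSED";
    static final String CHECKPOINTING = "CHECKPOINTING";
    static final String RUNNING = "RUNNING";

    private String status;

    CheckpointGuardSketch(String initialStatus) {
        this.status = initialStatus;
    }

    String getStatus() {
        return status;
    }

    /** Runs the given checkpoint work only if the job is currently paused. */
    void checkpoint(Runnable writeCheckpoint) {
        if (!PAUSED.equals(status)) {
            throw new IllegalStateException("Crawl is not paused: " + status);
        }
        status = CHECKPOINTING;
        try {
            writeCheckpoint.run();
        } finally {
            // Return to paused whether or not the checkpoint work succeeded.
            status = PAUSED;
        }
    }
}
```

Restoring the paused status in a `finally` block is a design choice of this sketch; how the real class reacts to a failed checkpoint is not described here.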

crawlCheckpoint
public void crawlCheckpoint(File checkpointDir) throws Exception(Code)

crawlEnded
public void crawlEnded(String sExitMessage)(Code)

crawlEnding
public void crawlEnding(String sExitMessage)(Code)

crawlPaused
public void crawlPaused(String statusMessage)(Code)

crawlPausing
public void crawlPausing(String statusMessage)(Code)

crawlResuming
public void crawlResuming(String statusMessage)(Code)

crawlStarted
public void crawlStarted(String message)(Code)

createCrawlController
protected CrawlController createCrawlController()(Code)

deleteURIsFromPending
public long deleteURIsFromPending(String regexpr)(Code)
	Delete any URIs from the frontier of the current (paused) job that match the specified regular expression. If the current job is not paused (or there is no current job), nothing will be done.
	Parameters:
	   regexpr - Regular expression to delete URIs by.
	Returns: the number of URIs deleted.
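The idea of deleting pending URIs by regular expression and reporting the count can be shown on a plain list. This sketch does not touch a Heritrix frontier: `PendingDeleteSketch` is a hypothetical helper, and it assumes the expression must match the whole URI, which this page does not state.

```java
import java.util.Iterator;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for regex-based deletion from a pending queue. */
class PendingDeleteSketch {
    /** Removes every entry that fully matches regexpr; returns the number removed. */
    static long deleteMatching(List<String> pending, String regexpr) {
        Pattern pattern = Pattern.compile(regexpr);
        long deleted = 0;
        for (Iterator<String> it = pending.iterator(); it.hasNext();) {
            // Assumption: whole-string match (matches), not substring search (find).
            if (pattern.matcher(it.next()).matches()) {
                it.remove();
                deleted++;
            }
        }
        return deleted;
    }
}
```

With whole-string matching, an expression such as `.*\.pdf` removes only URIs that end in `.pdf`.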

flush
protected void flush()(Code)
	If it's a HostQueuesFrontier, it needs to be flushed for the queued.

getAttribute
public Object getAttribute(String attribute_name) throws AttributeNotFoundException(Code)

getAttributes
public AttributeList getAttributes(String[] attributeNames)(Code)

getController
public CrawlController getController()(Code)

getCrawlOrderAttribute
protected Object getCrawlOrderAttribute(String attribute_name)(Code)

getCrawlOrderAttribute
protected Object getCrawlOrderAttribute(String attribute_name, ComplexType ct) throws AttributeNotFoundException, MBeanException, ReflectionException(Code)

getCrawlStatus
public String getCrawlStatus()(Code)
	Status of the crawler (Used by JMX).

getDirectory
public File getDirectory()(Code)
	Returns the path of the job's base directory. For profiles this is always equal to `new File(getSettingsDirectory())`.
	Returns: the path of the job's base directory.

getDisplayName
public String getDisplayName()(Code)
	Return the combination of given name and UID most commonly used in administrative interface.
	Returns: the job's name with UID notation.

getErrorHandler
public CrawlJobErrorHandler getErrorHandler()(Code)
	Returns the error handler for this crawl job

getErrorMessage
public String getErrorMessage()(Code)
	Get the error message associated with this job. Will return null if there is no error message.
	Returns: the error message associated with this job.

getFrontierOneLine
public String getFrontierOneLine()(Code)
	One-line Frontier report.

getFrontierReport
public String getFrontierReport(String reportName)(Code)
	Parameters:
	   reportName - Name of report to write.
	Returns: a report of the frontier's status.

getHostingHeritrix
protected Heritrix getHostingHeritrix()(Code)
	Heritrix that is hosting this job.

getIgnoredSeeds
public String getIgnoredSeeds()(Code)
	Utility method to get the stored list of ignored seed items (if any), from the last time the seeds were imported to the frontier.
	Returns: String of all ignored seed items, or null if none.

getInitialMarker
public FrontierMarker getInitialMarker(String regexpr, boolean inCacheOnly)(Code)
	Returns a URIFrontierMarker for the current, paused, job. If there is no current job or it is not paused, null will be returned.
	Parameters:
	   regexpr - A regular expression that each URI must match in order to be considered 'within' the marker.
	   inCacheOnly - Limit marker scope to 'cached' URIs.
	Returns: a URIFrontierMarker for the current job.
	See Also: CrawlJob.getPendingURIsList(FrontierMarker,int,boolean)
	See Also: org.archive.crawler.framework.Frontier.getInitialMarker(String, boolean)
	See Also: org.archive.crawler.framework.FrontierMarker

getJmxJobName
public String getJmxJobName()(Code)
	Unique name for job that is safe to use in jmx (Like displayname but without spaces).

getJobName
public String getJobName()(Code)
	Returns this job's 'name'. The name comes from the settings for this job, need not be unique and may change. For a unique identifier use CrawlJob.getUID(). The name corresponds to the value of the 'name' tag in the 'meta' section of the settings file.
	Returns: this job's 'name'.

getJobPriority
public int getJobPriority()(Code)
	Get this job's level of priority.
	Returns: this job's priority.
	See Also: CrawlJob.setJobPriority(int)
	See Also: CrawlJob.PRIORITY_MINIMAL
	See Also: CrawlJob.PRIORITY_LOW
	See Also: CrawlJob.PRIORITY_AVERAGE
	See Also: CrawlJob.PRIORITY_HIGH
	See Also: CrawlJob.PRIORITY_CRITICAL

getLogPath
public String getLogPath(String log) throws AttributeNotFoundException, MBeanException, ReflectionException(Code)
	Returns the absolute path of the specified log. Note: If crawl has not begun, this file may not exist.
	Parameters:
	   log - The log whose path is wanted.
	Returns: the absolute path for the specified log.
	throws:
	   AttributeNotFoundException -
	   ReflectionException -
	   MBeanException -

getMBeanInfo
public MBeanInfo getMBeanInfo()(Code)
	Our mbean info (Needed for CrawlJob to qualify as a DynamicMBean).

getMbeanName
protected ObjectName getMbeanName()(Code)

getNotificationsSequenceNumber
protected static int getNotificationsSequenceNumber()(Code)
	Notification sequence number (Does increment after each access).
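A counter that advances on every access is commonly paired with `NotificationBroadcasterSupport.sendNotification`, so that each notification carries a distinct sequence number. The sketch below shows that pairing using only JDK classes. `StatusBroadcasterSketch`, the `announce` method and the notification type string used in the test are hypothetical, and the starting value of the real counter is not documented on this page.

```java
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;

/** Hypothetical broadcaster showing a self-incrementing sequence number. */
class StatusBroadcasterSketch extends NotificationBroadcasterSupport {
    // Shared across instances; the starting value 0 is an assumption.
    private static final AtomicInteger SEQUENCE = new AtomicInteger();

    /** Returns the current number, then increments it for the next caller. */
    static int nextSequenceNumber() {
        return SEQUENCE.getAndIncrement();
    }

    /** Sends one notification stamped with a fresh sequence number. */
    void announce(String type, String message) {
        sendNotification(
            new Notification(type, this, nextSequenceNumber(), message));
    }
}
```

Listeners registered through `addNotificationListener` can then order or de-duplicate events by calling `getSequenceNumber()` on each notification.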

getNumberOfJournalEntries
public int getNumberOfJournalEntries()(Code)
	Returns the number of journal entries.

getPendingURIsList
public ArrayList getPendingURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose) throws InvalidFrontierMarkerException(Code)
	Returns the frontier's URI list based on the provided marker. This method will return null if there is no current job or if the current job is not paused. Only when there is a paused current job will this method return a URI list.
	Parameters:
	   marker - URIFrontier marker
	   numberOfMatches - Maximum number of matches to return
	   verbose - Should detailed info be provided on each URI?
	Returns: the frontier's URI list based on the provided marker.
	throws:
	   InvalidFrontierMarkerException - When marker is inconsistent with the current state of the frontier.
	See Also: CrawlJob.getInitialMarker(String,boolean)
	See Also: org.archive.crawler.framework.FrontierMarker

getProcessorsReport
public String getProcessorsReport()(Code)
	Get the Processors report for the running crawl.
	Returns: the Processors report for the running crawl.

getSettingsDirectory
public String getSettingsDirectory()(Code)
	Returns the directory where the configuration files for this job are located.
	Returns: the directory where the configuration files for this job are located.

getSettingsHandler
public XMLSettingsHandler getSettingsHandler()(Code)
	Returns the settings handler for this job. It will have been initialized.
	Returns: the settings handler for this job.

getStatisticsTracking
public StatisticsTracking getStatisticsTracking()(Code)
	Returns: the statistics tracking instance (or null if none yet available).

getStatus
public String getStatus()(Code)
	Get the current status of this CrawlJob.
	Returns: the current status of this CrawlJob (see constants defined here beginning with STATUS).

getThreadOneLine
public String getThreadOneLine()(Code)
	One-line threads report.

getThreadsReport
public String getThreadsReport()(Code)
	Get the CrawlController's ToeThreads report for the running crawl.
	Returns: the CrawlController's ToeThreads report.

getUID
public String getUID()(Code)
	Returns this job's unique ID (UID) that was issued by the CrawlJobHandler when this job was first created.
	Returns: this job's UID.
	See Also: CrawlJobHandler.getNextJobUID

importUri
public void importUri(String uri, boolean forceFetch, boolean isSeed) throws URIException(Code)
	Schedule a uri.
	Parameters:
	   uri - Uri to schedule.
	   forceFetch - Should it be forcefetched.
	   isSeed - True if seed.
	throws:
	   URIException -

importUri
public void importUri(String str, boolean forceFetch, boolean isSeed, boolean isFlush) throws URIException(Code)
	Schedule a uri.
	Parameters:
	   str - String that can be: 1. a UURI, 2. a snippet of the crawl.log line, or 3. a snippet from recover log. See CrawlJob.importUris(InputStream,String,boolean) for how it subparses the lines from crawl.log and recover.log.
	   forceFetch - Should it be forcefetched.
	   isSeed - True if seed.
	   isFlush - If true, flush the frontier IF it implements flushing.
	throws:
	   URIException -

importUris
public String importUris(String file, String style, String force)(Code)

importUris
public String importUris(String fileOrUrl, String style, boolean forceRevisit)(Code)

importUris
public String importUris(String fileOrUrl, String style, boolean forceRevisit, boolean areSeeds)(Code)
	Parameters:
	   fileOrUrl - Name of file w/ seeds.
	   style - What style of seeds -- crawl log, recovery journal, or seeds file.
	   forceRevisit - Should we revisit even if seen before?
	   areSeeds - Is the file exclusively seeds?
	Returns: a display string that has a count of all added.

importUris
protected int importUris(InputStream is, String style, boolean forceRevisit)(Code)

importUris
protected int importUris(InputStream is, String style, boolean forceRevisit, boolean areSeeds)(Code)
	Import URIs.
	Parameters:
	   is - Stream to use as URI source.
	   style - Style in which URIs are rendered. Currently supported are `recoveryJournal`, `crawlLog`, and the seeds file format (i.e. `default`), where `default` style is a UURI per line (comments allowed).
	   forceRevisit - Whether we should revisit this URI even if we've visited it previously.
	   areSeeds - Are the imported URIs seeds?
	Returns: count of added URIs.
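The `default` style (one URI per line, comments allowed) can be illustrated with a plain stream reader. This is a sketch, not the Heritrix parser: `SeedListSketch` is a hypothetical name, and treating lines that start with `#` as comments, skipping blank lines, and reading the stream as UTF-8 are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for a one-URI-per-line seeds stream. */
class SeedListSketch {
    /** Collects the non-blank, non-comment lines of the stream, trimmed. */
    static List<String> readUris(InputStream is) {
        List<String> uris = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                // Assumption: '#' introduces a comment line.
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue;
                }
                uris.add(trimmed);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return uris;
    }
}
```

The size of the returned list plays the role of the "count of added URIs" in this sketch; scheduling each URI into a frontier is left out.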

invoke
public Object invoke(String operationName, Object[] params, String[] signature) throws ReflectionException(Code)

isCheckpointing
public boolean isCheckpointing()(Code)
	True if checkpointing.

isCrawling
public boolean isCrawling()(Code)

isNew
public boolean isNew()(Code)
	Is this a new job?
	Returns: true if the job is new.

isProfile
public boolean isProfile()(Code)
	Is the job considered to be a profile?
	Returns: true if the job is a profile.

isReadOnly
public boolean isReadOnly()(Code)
	Is the job read only?
	Returns: false until setReadOnly has been invoked; after that it returns true.

isRunning
public boolean isRunning()(Code)
	Returns true if the job is being crawled.
	Returns: true if the job is being crawled.

kickUpdate
public void kickUpdate()(Code)
	Forward a 'kick' update to current controller if any. See Also: CrawlController.kickUpdate

killThread
public void killThread(int threadNumber, boolean replace)(Code)
	Kills a thread. For details see org.archive.crawler.framework.ToePool.killThread(int, boolean).
	Parameters:
	   threadNumber - Thread to kill.
	   replace - Should thread be replaced.
	See Also: org.archive.crawler.framework.ToePool.killThread(int, boolean)

mustBeCrawling
public void mustBeCrawling()(Code)

pause
protected void pause()(Code)

postDeregister
public void postDeregister()(Code)

postRegister
public void postRegister(Boolean registrationDone)(Code)

preDeregister
public void preDeregister() throws Exception(Code)

preRegister
public ObjectName preRegister(MBeanServer server, ObjectName on) throws Exception(Code)

resume
protected void resume()(Code)

scanCheckpoints
public Collection scanCheckpoints()(Code)
	Read all the checkpoints found in the job's checkpoints directory into Checkpoint instances.
	Returns: Collection containing list of all checkpoints.

setAttribute
public void setAttribute(Attribute attribute) throws AttributeNotFoundException(Code)

setAttributes
public AttributeList setAttributes(AttributeList attributes)(Code)

setCrawlOrderAttribute
protected void setCrawlOrderAttribute(String attribute_name, ComplexType ct, Attribute attribute) throws AttributeNotFoundException, InvalidAttributeValueException, MBeanException, ReflectionException(Code)

setErrorMessage
public void setErrorMessage(String string)(Code)
	Set an error message for this job. Generally this only occurs if the job is misconfigured. Parameters: string - the error message associated with this job

setJobPriority
public void setJobPriority(int priority)(Code)
	Set this job's level of priority.
	Parameters:
	   priority - The level of priority
	See Also: CrawlJob.getJobPriority()
	See Also: CrawlJob.PRIORITY_MINIMAL
	See Also: CrawlJob.PRIORITY_LOW
	See Also: CrawlJob.PRIORITY_AVERAGE
	See Also: CrawlJob.PRIORITY_HIGH
	See Also: CrawlJob.PRIORITY_CRITICAL

setNew
public void setNew(boolean b)(Code)
	Set if the job is considered a new job or not. Parameters: b - Is the job considered to be new.

setNumberOfJournalEntries
public void setNumberOfJournalEntries(int numberOfJournalEntries)(Code)
	Parameters: numberOfJournalEntries - The number of journal entries to set.

setReadOnly
public void setReadOnly()(Code)
	Once called, no changes can be made to the settings for this job. Typically this is done once a crawl is completed and further changes to the crawl order are therefore meaningless.

setRunning
protected void setRunning(boolean b)(Code)
	Set if job is being crawled. Parameters: b - Is job being crawled.

setStatus
public void setStatus(String status)(Code)
	Set the status of this CrawlJob.
	Parameters:
	   status - Current status of CrawlJob (see constants defined here beginning with STATUS)

setupCrawlController
protected CrawlController setupCrawlController() throws InitializationException(Code)

setupForCrawlStart
public void setupForCrawlStart() throws InitializationException(Code)

stopCrawling
public void stopCrawling()(Code)

unregisterMBean
protected void unregisterMBean()(Code)

writeFrontierReport
public void writeFrontierReport(String reportName, PrintWriter writer)(Code)
	Write the requested frontier report to the given PrintWriter.
	Parameters:
	   reportName - Name of report to write.
	   writer - Where to write to.

writeThreadsReport
public void writeThreadsReport(String reportName, PrintWriter writer)(Code)
	Write the requested threads report to the given PrintWriter.
	Parameters:
	   reportName - Name of report to write.
	   writer - Where to write to.

Methods inherited from javax.management.NotificationBroadcasterSupport

public void addNotificationListener(NotificationListener listener, NotificationFilter filter, Object handback)
public MBeanNotificationInfo[] getNotificationInfo()
protected void handleNotification(NotificationListener listener, Notification notif, Object handback)
public void removeNotificationListener(NotificationListener listener) throws ListenerNotFoundException
public void removeNotificationListener(NotificationListener listener, NotificationFilter filter, Object handback) throws ListenerNotFoundException
public void sendNotification(Notification notification)

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
