Java Doc for Heritrix.java (org.archive.crawler)



java.lang.Object
   org.archive.crawler.Heritrix

Heritrix
public class Heritrix implements DynamicMBean, MBeanRegistration
Main class for the Heritrix crawler. Heritrix is usually launched by a shell script that backgrounds heritrix and redirects all stdout and stderr emitted by heritrix to a log file. So that startup messages emitted subsequent to the redirection of stdout and stderr show on the console, this class prints usage or startup output, such as where the web UI can be found, to a STARTLOG that the shell script is waiting on. As soon as the shell script sees output in this file, it prints its content and breaks out of its wait. See ${HERITRIX_HOME}/bin/heritrix.

Heritrix can also be embedded or launched by webapp initialization or by JMX bootstrapping. So far I count 4 methods of instantiation:

  1. From this class's main -- the method usually used;
  2. From the Heritrix UI (the local-instances.jsp page);
  3. A creation by a JMX agent at the behest of a remote JMX client; and
  4. A container such as Tomcat or JBoss.

author:
   gojomo
author:
   Kristinn Sigurdsson
author:
   Stack


Field Summary
final public static String DEFAULT_ENCODING
     Default encoding.

Constructor Summary
public  Heritrix()
     Constructor. Does not register the created instance with JMX.
public  Heritrix(boolean jmxregister)
    
public  Heritrix(String name, boolean jmxregister)
     Constructor.
public  Heritrix(String name, boolean jmxregister, CrawlJobHandler cjh)
     Constructor.

Method Summary
public String addCrawlJob(String orderPathOrUrl, String name, String description, String seeds)
     This method is called when we have an order file to hand that we want to base a job on.
protected String addCrawlJob(URL url, HttpURLConnection connection, String name, String description, String seeds)

protected String addCrawlJob(File order, String name, String description, String seeds)

protected CrawlJob addCrawlJob(CrawlJob job)

public String addCrawlJobBasedOn(String jobUidOrProfile, String name, String description, String seeds)

protected CrawlJob addCrawlJobBasedOn(File orderFile, String name, String description, String seeds)

protected String addCrawlJobBasedonJar(File jarFile, String name, String description, String seeds)
     Undo jar file and use as basis for a new job.
protected static ObjectName addGuiPort(ObjectName name)

protected static ObjectName addVitals(ObjectName name)
     Add vital stats to passed in ObjectName.
protected OpenMBeanInfoSupport buildMBeanInfo()
     Build up the MBean info for Heritrix main.
protected String checkForEmptyPlaceHolder(String str)
     If passed str has placeholder for the empty string, return the empty string else return original. Dumb jmx clients can't pass empty string so they'll pass a representation of empty string such as ' ' or '-'.
protected static void configureTrustStore()
     Configure our trust store. If system property is defined, then use it for our truststore.
protected static void containerInitialization()
     Run setup tasks for this 'container'.
protected static CrawlJob createCrawlJob(CrawlJobHandler handler, File crawlOrderFile, String name)

protected CrawlJob createCrawlJobBasedOn(File orderFile, String name, String description, String seeds)

protected static void deregisterJndi(ObjectName name)

public void destroy()
     Do inverse of construction.
protected static String doCmdLineArgs(String[] args)

protected String doOneCrawl(String crawlOrderFile)
     Launch the crawler without a web UI and run the passed crawl only.
protected String doOneCrawl(String crawlOrderFile, CrawlStatusListener listener)
     Launch the crawler without a web UI and run the passed crawl only.
public SinkHandlerLogRecord getAlert(String id)

public Vector getAlerts()

public int getAlertsCount()

public Object getAttribute(String attribute_name)

public AttributeList getAttributes(String[] attributeNames)

public static File getConfdir()
     Get the configuration directory.
public static File getConfdir(boolean fail)
     Get the configuration directory.
protected String getCrawlendReport(String jobUid, String reportName)
     Return named crawl end report for job with passed uid. Crawler makes reports when it's finished its crawl.
protected static File getHeritrixHome()
     Exploit -Dheritrix.home if available to us.
public static String getHeritrixOut()

public static SimpleHttpServer getHttpServer()
     Returns the httpServer.
public static Map getInstances()

public static ObjectName getJmxObjectName()

public static ObjectName getJmxObjectName(String name)

public static ObjectName getJmxObjectName(String name, String type)

protected static ObjectName getJndiContainerName()
     Jndi container name -- the name to use for the 'container' that can host zero or more heritrix instances (returns a JMX ObjectName).
protected static Context getJndiContext()

public CrawlJobHandler getJobHandler()

public static File getJobsdir()
     The directory into which we put jobs.
public MBeanInfo getMBeanInfo()

public ObjectName getMBeanName()

public static MBeanServer getMBeanServer()
     Get MBeanServer. Currently uses first MBeanServer found.
public Vector getNewAlerts()

public int getNewAlertsCount()

protected String getNoJmxName()
    
protected static InputStream getPropertiesInputStream()

protected static Thread getShutdownThread(boolean sysexit, int exitCode, String name)

public static Heritrix getSingleInstance()

public String getStatus()

protected static File getSubDir(String subdirName)
     Get and check for existence of expected subdir. If development flag set, then look for dir under src dir.
protected static File getSubDir(String subdirName, boolean fail)
     Get and optionally check for existence of subdir. If development flag set, then look for dir under src dir.
public static String getVersion()
     Get the heritrix version.
public static File getWarsdir()

public String interrupt(String threadName)

public Object invoke(String operationName, Object[] params, String[] signature)

public static boolean isCommandLine()

protected static boolean isDevelopment()

public static boolean isSingleInstance()

public boolean isStarted()

protected static boolean isValidLoginPasswordString(String str)
     Test whether string is a valid login/password string. A valid login/password string has the login and password joined with a ':' delimiter.
public String launch()
     Launch the crawler for a web UI.
public String launch(String crawlOrderFile, boolean runMode)
     Launch the crawler for a web UI. Crawler hangs around waiting on jobs.
protected static Properties loadProperties()
     Load the heritrix.properties file.
public static void main(String[] args)
     Launch program. Optionally will launch a web server to host UI.
protected TabularData makeJobsTabularData(List jobs)

protected static void patchLogging()
     If the user hasn't altered the default logging parameters, tighten them up somewhat: some of our libraries are way too verbose at the INFO or WARNING levels. This might be a problem running inside someone else's container.
public static void performHeritrixShutDown()
     Exit program.
public static void performHeritrixShutDown(int exitCode)
     Exit program.
public void postDeregister()

public void postRegister(Boolean registrationDone)

public void preDeregister()

public ObjectName preRegister(MBeanServer server, ObjectName name)

public static void prepareHeritrixShutDown()
     Prepares for program shutdown.
public void readAlert(String id)

protected static void registerContainerJndi()

protected static void registerHeritrix(Heritrix h, String name, boolean jmxregister)
     Register Heritrix with JNDI, JMX, and with the static hashtable of all Heritrix instances known to this JVM. If launched from cmdline, register Heritrix MBean if there is an agent to register ourselves with.
protected static void registerJndi(ObjectName name)

public static MBeanServer registerMBean(Object objToRegister, String name, String type)

public static MBeanServer registerMBean(MBeanServer server, Object objToRegister, String name, String type)

public static MBeanServer registerMBean(MBeanServer server, Object objToRegister, ObjectName objName)

public void removeAlert(String id)

public static void resetAuthentication(String newUsername, String newPassword)
     Replace existing administrator login info with new info.
protected static String selftest(String oneSelfTestName, int port)

public void setAttribute(Attribute attribute)

public AttributeList setAttributes(AttributeList attributes)

public static void shutdown(int exitCode)
     Shutdown all running heritrix instances and the JVM.
public static void shutdown()

public void start()
     Start Heritrix. Used by JMX and webapp initialization for starting Heritrix. Not used by the cmdline launched Heritrix.
public void startCrawling()

protected static String startEmbeddedWebserver(int port, boolean lho, String adminLoginPassword)
     Start up the embedded Jetty webserver instance.
protected static String startEmbeddedWebserver(Collection<String> hosts, int port, String adminLoginPassword)
     Start up the embedded Jetty webserver instance.
public void stop()
     Stop Heritrix.
public void stopCrawling()

protected static void unregisterHeritrix(Heritrix h)

public static void unregisterMBean(MBeanServer server, String name, String type)

public static void unregisterMBean(MBeanServer server, ObjectName name)
    

Field Detail
DEFAULT_ENCODING
final public static String DEFAULT_ENCODING
Default encoding. Used for content when fetching if none specified.




Constructor Detail
Heritrix
public Heritrix() throws IOException
Constructor. Does not register the created instance with JMX. This constructor is assumed to be used by, for example, a JMX agent creating an instance of Heritrix at the command of a remote client (in which case Heritrix will be registered by the invoking agent).
throws:
  IOException -



Heritrix
public Heritrix(boolean jmxregister) throws IOException



Heritrix
public Heritrix(String name, boolean jmxregister) throws IOException
Constructor.
Parameters:
  name - If null, we bring up the default Heritrix instance.
Parameters:
  jmxregister - True if we are to register this instance with JMX agent.
throws:
  IOException -



Heritrix
public Heritrix(String name, boolean jmxregister, CrawlJobHandler cjh) throws IOException
Constructor.
Parameters:
  name - If null, we bring up the default Heritrix instance.
Parameters:
  jmxregister - True if we are to register this instance with JMX agent.
Parameters:
  cjh - CrawlJobHandler to use.
throws:
  IOException -




Method Detail
addCrawlJob
public String addCrawlJob(String orderPathOrUrl, String name, String description, String seeds) throws IOException, FatalConfigurationException
This method is called when we have an order file to hand that we want to base a job on. It leaves the order file in place and just starts up a job that uses whatever the order points to for the locations of logs, etc.
Parameters:
  orderPathOrUrl - Path to an order file or to a seeds file.
Parameters:
  name - Name to use for this job.
Parameters:
  description -
Parameters:
  seeds -
Returns:
  A status string.
throws:
  IOException -
throws:
  FatalConfigurationException -



addCrawlJob
protected String addCrawlJob(URL url, HttpURLConnection connection, String name, String description, String seeds) throws IOException, FatalConfigurationException



addCrawlJob
protected String addCrawlJob(File order, String name, String description, String seeds) throws FatalConfigurationException, IOException



addCrawlJob
protected CrawlJob addCrawlJob(CrawlJob job)



addCrawlJobBasedOn
public String addCrawlJobBasedOn(String jobUidOrProfile, String name, String description, String seeds)



addCrawlJobBasedOn
protected CrawlJob addCrawlJobBasedOn(File orderFile, String name, String description, String seeds) throws FatalConfigurationException



addCrawlJobBasedonJar
protected String addCrawlJobBasedonJar(File jarFile, String name, String description, String seeds) throws IOException, FatalConfigurationException
Undo jar file and use as basis for a new job.
Parameters:
  jarFile - Pointer to file that holds jar.
Parameters:
  name - Name to use for new job.
Parameters:
  description -
Parameters:
  seeds -
Returns:
  Message.
throws:
  IOException -
throws:
  FatalConfigurationException -



addGuiPort
protected static ObjectName addGuiPort(ObjectName name) throws MalformedObjectNameException, NullPointerException



addVitals
protected static ObjectName addVitals(ObjectName name) throws UnknownHostException, MalformedObjectNameException, NullPointerException
Add vital stats to passed in ObjectName.
Parameters:
  name - ObjectName to add to.
Returns:
  name with host, guiport, and jmxport added.
throws:
  UnknownHostException -
throws:
  MalformedObjectNameException -
throws:
  NullPointerException -
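
A minimal sketch of what "adding vitals" to an ObjectName could look like, using only the standard javax.management API. The key names host, guiport, and jmxport are taken from the description above, but their exact spelling in Heritrix, the class name VitalsSketch, the parse helper, and the domain org.example.sketch are all assumptions made for illustration; this is not the Heritrix implementation.

```java
import java.util.Hashtable;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class VitalsSketch {
    /** Hypothetical helper: parse a name, turning the checked exception into an unchecked one. */
    static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns a new ObjectName with the same domain and keys plus host, guiport and jmxport. */
    static ObjectName addVitals(ObjectName name, String host, int guiPort, int jmxPort) {
        // getKeyPropertyList() hands back a copy, so the original name is not modified.
        Hashtable<String, String> keys = new Hashtable<>(name.getKeyPropertyList());
        keys.put("host", host);                          // assumed key name
        keys.put("guiport", Integer.toString(guiPort));  // assumed key name
        keys.put("jmxport", Integer.toString(jmxPort));  // assumed key name
        try {
            return new ObjectName(name.getDomain(), keys);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        ObjectName base = parse("org.example.sketch:name=Heritrix,type=CrawlService");
        System.out.println(addVitals(base, "crawler01", 8080, 8849).getCanonicalName());
    }
}
```

Because ObjectName instances are immutable, the sketch builds a new name rather than changing the one passed in.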



buildMBeanInfo
protected OpenMBeanInfoSupport buildMBeanInfo()
Build up the MBean info for Heritrix main.
Returns:
  Created mbean info instance.



checkForEmptyPlaceHolder
protected String checkForEmptyPlaceHolder(String str)
If passed str has placeholder for the empty string, return the empty string else return original. Dumb jmx clients can't pass empty string so they'll pass a representation of empty string such as ' ' or '-'. Convert such strings to empty string.
Parameters:
  str - String to check.
Returns:
  Original str or empty string if str contains a placeholder for the empty-string (e.g. '-', or ' ').
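
The placeholder conversion described above can be sketched as follows. This is an illustration only: the class name is hypothetical, and it assumes that the placeholder must be the whole string and that only the two placeholders named in the description ('-' and ' ') count. The real method may recognise others.

```java
final class EmptyPlaceholderSketch {
    /** Returns "" when str is one of the assumed placeholders, otherwise str unchanged. */
    static String checkForEmptyPlaceHolder(String str) {
        if (str == null) {
            return null; // assumption: null is passed through untouched
        }
        // Assumption: only the two placeholders named in the description are recognised.
        boolean placeholder = str.equals("-") || str.equals(" ");
        return placeholder ? "" : str;
    }

    public static void main(String[] args) {
        System.out.println("[" + checkForEmptyPlaceHolder("-") + "]");             // prints "[]"
        System.out.println("[" + checkForEmptyPlaceHolder("nightly crawl") + "]"); // prints "[nightly crawl]"
    }
}
```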



configureTrustStore
protected static void configureTrustStore()
Configure our trust store. If system property is defined, then use it for our truststore. Otherwise use the heritrix truststore under conf directory if it exists.

If we're not launched from the command-line, we will not be able to find our truststore. The truststore is not normally used, so this should rarely be a problem (in the case where we don't find our trust store, we'll use the 'default' -- either the JVM's or the container's).
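
The fallback order described above (explicit system property first, then a truststore under the conf directory, then the JVM or container default) might look like the sketch below. The description does not name the system property or the truststore file, so the standard JSSE property javax.net.ssl.trustStore and the file name heritrix.cacerts are assumptions here, as are the class and method names.

```java
import java.io.File;

final class TrustStoreSketch {
    /**
     * Returns the truststore path to use, or null to fall back to the
     * JVM or container default.
     */
    static String chooseTrustStore(String configured, File confDir) {
        if (configured != null && !configured.isEmpty()) {
            return configured; // an explicitly supplied value wins
        }
        File bundled = new File(confDir, "heritrix.cacerts"); // hypothetical file name
        return bundled.isFile() ? bundled.getPath() : null;
    }

    public static void main(String[] args) {
        // Assumption: the standard JSSE property is the one being consulted.
        String chosen = chooseTrustStore(System.getProperty("javax.net.ssl.trustStore"), new File("conf"));
        if (chosen != null) {
            System.setProperty("javax.net.ssl.trustStore", chosen);
        }
        System.out.println(chosen == null ? "using default truststore" : "using " + chosen);
    }
}
```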




containerInitialization
protected static void containerInitialization() throws IOException
Run setup tasks for this 'container'. Idempotent.
throws:
  IOException -



createCrawlJob
protected static CrawlJob createCrawlJob(CrawlJobHandler handler, File crawlOrderFile, String name) throws InvalidAttributeValueException



createCrawlJobBasedOn
protected CrawlJob createCrawlJobBasedOn(File orderFile, String name, String description, String seeds) throws FatalConfigurationException



deregisterJndi
protected static void deregisterJndi(ObjectName name) throws NullPointerException, NamingException



destroy
public void destroy()
Do inverse of construction. Used by anyone who does a 'new Heritrix' when they want to clean up the instance. Of note, there may be Heritrix threads still hanging around after the call to destroy completes. They'll eventually go down after they've finished their cleanup routines. In particular, if you are watching Heritrix via JMX, you can see the Heritrix instance JMX bean unregister ahead of the CrawlJob JMX bean that it's hosting.



doCmdLineArgs
protected static String doCmdLineArgs(String[] args) throws Exception



doOneCrawl
protected String doOneCrawl(String crawlOrderFile) throws InitializationException, InvalidAttributeValueException
Launch the crawler without a web UI and run the passed crawl only. Specialized version of Heritrix.launch().
Parameters:
  crawlOrderFile - The crawl order to crawl.
Returns:
  Status string.
throws:
  InitializationException -
throws:
  InvalidAttributeValueException -



doOneCrawl
protected String doOneCrawl(String crawlOrderFile, CrawlStatusListener listener) throws InitializationException, InvalidAttributeValueException
Launch the crawler without a web UI and run the passed crawl only. Specialized version of Heritrix.launch().
Parameters:
  crawlOrderFile - The crawl order to crawl.
Parameters:
  listener - Register this crawl status listener before starting crawl (you can use this listener to notice end-of-crawl).
Returns:
  Status string.
throws:
  InitializationException -
throws:
  InvalidAttributeValueException -



getAlert
public SinkHandlerLogRecord getAlert(String id)



getAlerts
public Vector getAlerts()



getAlertsCount
public int getAlertsCount()



getAttribute
public Object getAttribute(String attribute_name) throws AttributeNotFoundException



getAttributes
public AttributeList getAttributes(String[] attributeNames)



getConfdir
public static File getConfdir() throws IOException
Get the configuration directory.
Returns:
  The conf directory under HERITRIX_HOME or null if none can be found.
throws:
  IOException -



getConfdir
public static File getConfdir(boolean fail) throws IOException
Get the configuration directory.
Parameters:
  fail - Throw IOE if can't find directory if true, else just return null.
Returns:
  The conf directory under HERITRIX_HOME or null (or an IOE) if it can't be found.
throws:
  IOException -



getCrawlendReport
protected String getCrawlendReport(String jobUid, String reportName) throws IOException
Return named crawl end report for job with passed uid. Crawler makes reports when it's finished its crawl. Use this method to get a String version of one of these files.
Parameters:
  jobUid - The unique ID for the job whose reports you want to see (must be a completed job).
Parameters:
  reportName - Name of report minus '.txt' (e.g. crawl-report).
Returns:
  String version of the on-disk report.
throws:
  IOException -



getHeritrixHome
protected static File getHeritrixHome() throws IOException
Exploit -Dheritrix.home if available to us. Is current working dir if no heritrix.home property supplied.
Returns:
  Heritrix home directory.
throws:
  IOException -



getHeritrixOut
public static String getHeritrixOut()
The file we dump stdout and stderr into.



getHttpServer
public static SimpleHttpServer getHttpServer()
Returns the httpServer. May be null if one was not started.



getInstances
public static Map getInstances()
Return all registered instances of Heritrix (rarely is there more than one).



getJmxObjectName
public static ObjectName getJmxObjectName() throws MalformedObjectNameException, NullPointerException



getJmxObjectName
public static ObjectName getJmxObjectName(String name) throws MalformedObjectNameException, NullPointerException



getJmxObjectName
public static ObjectName getJmxObjectName(String name, String type) throws MalformedObjectNameException, NullPointerException



getJndiContainerName
protected static ObjectName getJndiContainerName() throws MalformedObjectNameException, NullPointerException, UnknownHostException
Jndi container name -- the name to use for the 'container' that can host zero or more heritrix instances (Return a JMX ObjectName. We use ObjectName because then we're sync'd with JMX naming and ObjectName has nice parsing).
throws:
  NullPointerException -
throws:
  MalformedObjectNameException -
throws:
  UnknownHostException -



getJndiContext
protected static Context getJndiContext() throws NamingException
Jndi context for the crawler or null if none found.
throws:
  NamingException -



getJobHandler
public CrawlJobHandler getJobHandler()
Get the job handler.
Returns:
  The CrawlJobHandler being used.



getJobsdir
public static File getJobsdir() throws IOException
The directory into which we put jobs. If the system property 'heritrix.jobsdir' is set, we will use its value in place of the default 'jobs' directory in the current working directory.
throws:
  IOException -
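
The lookup rule above can be sketched with plain system properties. The property name heritrix.jobsdir and the default directory name jobs come from the description; the class and method names are hypothetical, and the real method may additionally create or validate the directory, which this sketch does not do.

```java
import java.io.File;

final class JobsDirSketch {
    /** Resolve the jobs directory: the system property if set, else ./jobs. */
    static File jobsDir() {
        String configured = System.getProperty("heritrix.jobsdir");
        if (configured != null && !configured.isEmpty()) {
            return new File(configured);
        }
        // Default: 'jobs' under the current working directory.
        return new File(System.getProperty("user.dir"), "jobs");
    }

    public static void main(String[] args) {
        System.out.println(jobsDir().getPath());
    }
}
```

Run it with, for example, `java -Dheritrix.jobsdir=/data/crawl-jobs JobsDirSketch` to see the override take effect (the path is only an example).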



getMBeanInfo
public MBeanInfo getMBeanInfo()



getMBeanName
public ObjectName getMBeanName()
Name this instance registered in JMX (only available after JMX registration).



getMBeanServer
public static MBeanServer getMBeanServer()
Get MBeanServer. Currently uses first MBeanServer found. This will definitely not always be what's wanted. TODO: Make which server settable. Also, if none, put up our own MBeanServer.
Returns:
  An MBeanServer to register with or null.
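
"First MBeanServer found" can be expressed with the standard MBeanServerFactory API, as in the sketch below. Whether Heritrix uses exactly this call is an assumption; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import javax.management.MBeanServer;
import javax.management.MBeanServerFactory;

final class FirstMBeanServerSketch {
    /** Returns the first MBeanServer known to the factory in this JVM, or null if there is none. */
    static MBeanServer firstMBeanServer() {
        // Passing null asks for every MBeanServer the factory has created and still references.
        List<MBeanServer> servers = MBeanServerFactory.findMBeanServer(null);
        return servers.isEmpty() ? null : servers.get(0);
    }

    public static void main(String[] args) {
        // Touching the platform server first makes sure at least one server exists.
        ManagementFactory.getPlatformMBeanServer();
        MBeanServer server = firstMBeanServer();
        System.out.println(server == null ? "no MBeanServer" : "default domain: " + server.getDefaultDomain());
    }
}
```

When several servers exist in one JVM, for example inside a container, the first one is not necessarily the one a client is watching, which is the caveat the TODO above points at.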



getNewAlerts
public Vector getNewAlerts()



getNewAlertsCount
public int getNewAlertsCount()



getNoJmxName
protected String getNoJmxName()
Name to use when no JMX agent available.



getPropertiesInputStream
protected static InputStream getPropertiesInputStream() throws IOException



getShutdownThread
protected static Thread getShutdownThread(boolean sysexit, int exitCode, String name)



getSingleInstance
public static Heritrix getSingleInstance()
Returns single instance or null if no instance or multiple.



getStatus
public String getStatus()



getSubDir
protected static File getSubDir(String subdirName) throws IOException
Get and check for existence of expected subdir. If development flag set, then look for dir under src dir.
Parameters:
  subdirName - Dir to look for.
Returns:
  The extant subdir. Otherwise null if we're running in a webapp context where there is no conf directory available.
throws:
  IOException - if unable to find expected subdir.



getSubDir
protected static File getSubDir(String subdirName, boolean fail) throws IOException
Get and optionally check for existence of subdir. If development flag set, then look for dir under src dir.
Parameters:
  subdirName - Dir to look for.
Parameters:
  fail - True if we are to fail if directory does not exist; false if we are to return null if the directory does not exist.
Returns:
  The extant subdir. Otherwise null if we're running in a webapp context where there is no subdir directory available.
throws:
  IOException - if unable to find expected subdir.



getVersion
public static String getVersion()
Get the heritrix version.
Returns:
  The heritrix version. May be null.



getWarsdir
public static File getWarsdir() throws IOException
Returns the directory under which reside the WAR files we're to load into the servlet container.
throws:
  IOException -



interrupt
public String interrupt(String threadName)



invoke
public Object invoke(String operationName, Object[] params, String[] signature) throws ReflectionException



isCommandLine
public static boolean isCommandLine()
Returns true if Heritrix was launched from the command line. (When launched from the command line, we do stuff like put up a web server to manage our web interface and we register ourselves with the first available jmx agent.)



isDevelopment
protected static boolean isDevelopment()



isSingleInstance
public static boolean isSingleInstance()
True if only one instance of Heritrix.



isStarted
public boolean isStarted()
True if heritrix has been started.



isValidLoginPasswordString
protected static boolean isValidLoginPasswordString(String str)
Test whether string is a valid login/password string. A valid login/password string has the login and password joined with a ':' delimiter.
Parameters:
  str - String to test.
Returns:
  True if valid password/login string.
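
A minimal sketch of such a check, assuming that both the login and the password must be non-empty and that the first ':' is the delimiter. The description only states that the two parts are joined by ':', so those stricter rules and the class name are assumptions.

```java
final class LoginPasswordSketch {
    /** True if str looks like "login:password" with something on both sides of the first ':'. */
    static boolean isValidLoginPasswordString(String str) {
        if (str == null) {
            return false;
        }
        int colon = str.indexOf(':');
        // colon > 0 means a non-empty login; colon < length - 1 means a non-empty password.
        return colon > 0 && colon < str.length() - 1;
    }

    public static void main(String[] args) {
        System.out.println(isValidLoginPasswordString("admin:letmein")); // prints "true"
        System.out.println(isValidLoginPasswordString("admin"));         // prints "false"
    }
}
```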



launch
public String launch() throws Exception
Launch the crawler for a web UI. Crawler hangs around waiting on jobs.
Returns:
  A status string describing how the launch went.
throws:
  Exception -



launch
public String launch(String crawlOrderFile, boolean runMode) throws Exception
Launch the crawler for a web UI. Crawler hangs around waiting on jobs.
Parameters:
  crawlOrderFile - File to crawl. May be null.
Parameters:
  runMode - Whether crawler should be set to run mode.
Returns:
  A status string describing how the launch went.
throws:
  Exception -



loadProperties
protected static Properties loadProperties() throws IOException
Load the heritrix.properties file. Adds any property that starts with HERITRIX_PROPERTIES_PREFIX or ARCHIVE_PACKAGE into system properties (except logging '.level' directives).
Returns:
  Loaded properties.
throws:
  IOException -
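
The filtering rule described for loadProperties can be sketched as below. The values of HERITRIX_PROPERTIES_PREFIX and ARCHIVE_PACKAGE are not given on this page, so "heritrix." and "org.archive." are stand-in assumptions, as are the class and method names. The sketch copies into a caller-supplied Properties object so it can be tried without touching real system properties.

```java
import java.util.Properties;

final class PropertyPromotionSketch {
    // Assumed stand-ins for HERITRIX_PROPERTIES_PREFIX and ARCHIVE_PACKAGE.
    static final String HERITRIX_PREFIX = "heritrix.";
    static final String ARCHIVE_PACKAGE = "org.archive.";

    /** Copies matching keys from loaded into target and returns how many were copied. */
    static int promote(Properties loaded, Properties target) {
        int copied = 0;
        for (String key : loaded.stringPropertyNames()) {
            boolean wanted = key.startsWith(HERITRIX_PREFIX) || key.startsWith(ARCHIVE_PACKAGE);
            boolean loggingLevel = key.endsWith(".level"); // logging directives are skipped
            if (wanted && !loggingLevel) {
                target.setProperty(key, loaded.getProperty(key));
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Properties loaded = new Properties();
        loaded.setProperty("heritrix.cmdline.port", "8080");     // hypothetical key
        loaded.setProperty("org.archive.crawler.level", "INFO"); // skipped: logging directive
        Properties target = new Properties();
        System.out.println(promote(loaded, target) + " promoted: " + target);
    }
}
```

Passing System.getProperties() as the target would give the "into system properties" behaviour the description mentions.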



main
public static void main(String[] args) throws Exception
Launch program. Optionally will launch a web server to host UI. Will also register Heritrix MBean with first found JMX Agent (usually the 1.5.0 JVM Agent).
Parameters:
  args - Command line arguments.
throws:
  Exception -



makeJobsTabularData
protected TabularData makeJobsTabularData(List jobs) throws OpenDataException



patchLogging
protected static void patchLogging() throws SecurityException, IOException
If the user hasn't altered the default logging parameters, tighten them up somewhat: some of our libraries are way too verbose at the INFO or WARNING levels. This might be a problem running inside someone else's container. Containers seem to prefer commons logging, so we are not interfering with them by doing the below.
throws:
  IOException -
throws:
  SecurityException -
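
One way to "tighten" logging only when the user has not configured it can be sketched with java.util.logging. How Heritrix actually detects user changes is not stated here; this sketch assumes the standard java.util.logging.config.file and java.util.logging.config.class system properties are the signal, and the logger name in main is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggingPatchSketch {
    // Loggers are only weakly referenced by the LogManager, so keep them alive here.
    private static final List<Logger> PINNED = new ArrayList<>();

    /** Raises the named loggers to SEVERE unless a logging config was supplied; returns true if patched. */
    static boolean tighten(String... noisyLoggerNames) {
        // Assumption: a supplied logging config means the user has made their own choices.
        if (System.getProperty("java.util.logging.config.file") != null
                || System.getProperty("java.util.logging.config.class") != null) {
            return false;
        }
        for (String name : noisyLoggerNames) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(Level.SEVERE);
            PINNED.add(logger);
        }
        return true;
    }

    public static void main(String[] args) {
        boolean patched = tighten("org.example.noisy"); // hypothetical library logger
        System.out.println("patched=" + patched);
    }
}
```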



performHeritrixShutDown
public static void performHeritrixShutDown()
Exit program. Recommended that prepareHeritrixShutDown() be invoked prior to this method.



performHeritrixShutDown
public static void performHeritrixShutDown(int exitCode)
Exit program. Recommended that prepareHeritrixShutDown() be invoked prior to this method.
Parameters:
  exitCode - Code to pass System.exit.



postDeregister
public void postDeregister()



postRegister
public void postRegister(Boolean registrationDone)



preDeregister
public void preDeregister() throws Exception



preRegister
public ObjectName preRegister(MBeanServer server, ObjectName name) throws Exception



prepareHeritrixShutDown
public static void prepareHeritrixShutDown()
Prepares for program shutdown. This method does its best to prepare the program so that it can exit normally. It will kill the httpServer and terminate any running job.
It is advisable to wait about 1000 milliseconds after calling this method and before calling performHeritrixShutDown() to allow as many threads as possible to finish what they are doing.
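
The recommended prepare, pause, perform order can be sketched generically. The two Runnable arguments stand in for prepareHeritrixShutDown() and performHeritrixShutDown(); they are placeholders so the sketch runs without Heritrix on the classpath and without actually exiting the JVM. The class and method names are hypothetical, and lambdas assume Java 8 or later.

```java
final class TwoPhaseShutdownSketch {
    /** Runs prepare, waits pauseMillis so worker threads can wind down, then runs perform. */
    static void shutDown(Runnable prepare, Runnable perform, long pauseMillis) {
        prepare.run();
        try {
            Thread.sleep(pauseMillis);
        } catch (InterruptedException e) {
            // Keep the interrupt flag set and carry on with the exit step.
            Thread.currentThread().interrupt();
        }
        perform.run();
    }

    public static void main(String[] args) {
        shutDown(
            () -> System.out.println("prepare: stop web server, terminate running job"),
            () -> System.out.println("perform: exit"),
            1000L);
    }
}
```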



readAlert
public void readAlert(String id)



registerContainerJndi
protected static void registerContainerJndi() throws MalformedObjectNameException, NullPointerException, UnknownHostException, NamingException



registerHeritrix
protected static void registerHeritrix(Heritrix h, String name, boolean jmxregister) throws MalformedObjectNameException, InstanceAlreadyExistsException, MBeanRegistrationException, NotCompliantMBeanException
Register Heritrix with JNDI, JMX, and with the static hashtable of all Heritrix instances known to this JVM. If launched from cmdline, register Heritrix MBean if there is an agent to register ourselves with. Usually this method will only have effect if we're running in a 1.5.0 JDK and command line options such as '-Dcom.sun.management.jmxremote.port=8082 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false' are supplied. See Monitoring and Management Using JMX for more on the command line options and how to connect to the Heritrix bean using the JDK 1.5.0 jconsole tool. We currently register with the first server we find (TODO: Make configurable).

If we register successfully with a JMX agent, then part of the registration will include our registering ourselves with JNDI.

Finally, add the heritrix instance to the hashtable of all the Heritrix instances floating in the current VM. This latter registration happens whether or not there is a JMX agent to register with. This is a list we keep for convenience, so it's easy to iterate over all instances calling stop when the main application is going down.
Parameters:
  h - Instance of heritrix to register.
Parameters:
  name - Name to use for this Heritrix instance.
Parameters:
  jmxregister - True if we are to register this instance with JMX.
throws:
  NullPointerException -
throws:
  MalformedObjectNameException -
throws:
  NotCompliantMBeanException -
throws:
  MBeanRegistrationException -
throws:
  InstanceAlreadyExistsException -
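
The two-part registration described above (optional JMX registration, plus an unconditional entry in a JVM-wide table of instances) can be sketched with standard JMX classes. Everything named here is a stand-in: Crawler and CrawlerMBean are hypothetical, the domain org.example.sketch and the type value are made up, and the platform MBeanServer is used for simplicity. JNDI registration is left out. The single() method mirrors the getSingleInstance() contract of returning null when there are zero or several instances.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.JMException;
import javax.management.ObjectName;

final class RegistrySketch {
    /** Hypothetical management interface, named per the standard MBean convention. */
    public interface CrawlerMBean {
        String getStatus();
    }

    /** Stand-in for a crawler instance. */
    static final class Crawler implements CrawlerMBean {
        @Override
        public String getStatus() {
            return "idle";
        }
    }

    /** All instances known to this JVM, keyed by name. */
    static final Map<String, Crawler> INSTANCES = new ConcurrentHashMap<>();

    /** Registers with JMX if asked, and always records the instance; returns the JMX name or null. */
    static ObjectName register(Crawler crawler, String name, boolean jmxRegister) {
        ObjectName objectName = null;
        if (jmxRegister) {
            try {
                objectName = new ObjectName("org.example.sketch:name=" + name + ",type=CrawlService");
                ManagementFactory.getPlatformMBeanServer().registerMBean(crawler, objectName);
            } catch (JMException e) {
                throw new IllegalStateException("JMX registration failed for " + name, e);
            }
        }
        INSTANCES.put(name, crawler); // happens with or without JMX
        return objectName;
    }

    /** Returns the only instance, or null if there are none or more than one. */
    static Crawler single() {
        return INSTANCES.size() == 1 ? INSTANCES.values().iterator().next() : null;
    }

    public static void main(String[] args) {
        System.out.println("registered as: " + register(new Crawler(), "demo", true));
        System.out.println("single instance present: " + (single() != null));
    }
}
```

Keeping the table independent of JMX means a shutdown routine can still walk every instance when no agent is available.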




registerJndi
protected static void registerJndi(ObjectName name) throws NullPointerException, NamingException



registerMBean
public static MBeanServer registerMBean(Object objToRegister, String name, String type) throws InstanceAlreadyExistsException, MBeanRegistrationException, NotCompliantMBeanException



registerMBean
public static MBeanServer registerMBean(MBeanServer server, Object objToRegister, String name, String type) throws InstanceAlreadyExistsException, MBeanRegistrationException, NotCompliantMBeanException



registerMBean
public static MBeanServer registerMBean(MBeanServer server, Object objToRegister, ObjectName objName) throws InstanceAlreadyExistsException, MBeanRegistrationException, NotCompliantMBeanException



removeAlert
public void removeAlert(String id)



resetAuthentication
public static void resetAuthentication(String newUsername, String newPassword)
Replace existing administrator login info with new info.
Parameters:
  newUsername - new administrator login username
Parameters:
  newPassword - new administrator login password



selftest
protected static String selftest(String oneSelfTestName, int port) throws Exception
Run the selftest.
Parameters:
  oneSelfTestName - Name of a test if we are to run one only rather than the default running all tests.
Parameters:
  port - Port number to use for web UI.
Returns:
  Status of how selftest startup went.
throws:
  Exception -



setAttribute
public void setAttribute(Attribute attribute) throws AttributeNotFoundException



setAttributes
public AttributeList setAttributes(AttributeList attributes)



shutdown
public static void shutdown(int exitCode)
Shutdown all running heritrix instances and the JVM. Assumes stop has already been called.
Parameters:
  exitCode - Exit code to pass system exit.



shutdown
public static void shutdown()



start
public void start()
Start Heritrix. Used by JMX and webapp initialization for starting Heritrix. Not used by the cmdline launched Heritrix. Idempotent. If start is called by JMX, then the new instance of Heritrix is automatically registered with the JMX Agent. If started by webapp, need to register the new Heritrix instance.



startCrawling
public void startCrawling()



startEmbeddedWebserver
protected static String startEmbeddedWebserver(int port, boolean lho, String adminLoginPassword) throws Exception
Start up the embedded Jetty webserver instance. This is done when we're run from the command-line.
Parameters:
  port - Port number to use for web UI.
Parameters:
  adminLoginPassword - Compound of login and password.
Returns:
  Status on webserver startup.
throws:
  Exception -



startEmbeddedWebserver
protected static String startEmbeddedWebserver(Collection<String> hosts, int port, String adminLoginPassword) throws Exception
Start up the embedded Jetty webserver instance. This is done when we're run from the command-line.
Parameters:
  hosts - a list of IP addresses or hostnames to bind to, or an empty collection to bind to all available network interfaces
Parameters:
  port - Port number to use for web UI.
Parameters:
  adminLoginPassword - Compound of login and password.
Returns:
  Status on webserver startup.
throws:
  Exception -



stop
public void stop()
Stop Heritrix. Used by JMX and webapp initialization for stopping Heritrix.



stopCrawling
public void stopCrawling()



unregisterHeritrix
protected static void unregisterHeritrix(Heritrix h) throws InstanceNotFoundException, MBeanRegistrationException, NullPointerException



unregisterMBean
public static void unregisterMBean(MBeanServer server, String name, String type)



unregisterMBean
public static void unregisterMBean(MBeanServer server, ObjectName name)



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
