Java Doc for Processor.java in heritrix, package org.archive.crawler.framework



org.archive.crawler.settings.ModuleType
   org.archive.crawler.framework.Processor

All known Subclasses: org.archive.crawler.processor.recrawl.PersistProcessor, org.archive.crawler.extractor.Extractor, org.archive.crawler.fetcher.FetchHTTP, org.archive.crawler.processor.BeanShellProcessor, org.archive.crawler.framework.WriterPoolProcessor, org.archive.crawler.prefetch.RuntimeLimitEnforcer, org.archive.crawler.postprocessor.FrontierScheduler, org.archive.crawler.prefetch.QuotaEnforcer, org.archive.crawler.extractor.ChangeEvaluator, org.archive.crawler.fetcher.FetchDNS, org.archive.crawler.prefetch.PreconditionEnforcer, org.archive.crawler.postprocessor.LowDiskPauseProcessor, org.archive.crawler.processor.recrawl.FetchHistoryProcessor, org.archive.crawler.writer.MirrorWriterProcessor, org.archive.crawler.processor.CrawlMapper, org.archive.crawler.postprocessor.WaitEvaluator, org.archive.crawler.extractor.HTTPContentDigest, org.archive.crawler.fetcher.FetchFTP, org.archive.crawler.extractor.ExtractorHTTP, org.archive.crawler.writer.Kw3WriterProcessor, org.archive.crawler.postprocessor.CrawlStateUpdater, org.archive.crawler.framework.Scoper
Processor
public class Processor extends ModuleType
Base class for URI processing classes.

Each URI is processed by a user-defined series of processors. This class provides the basic infrastructure for these but does not actually do anything itself. New processors can be created by subclassing this class.

Classes subclassing this one should not trap InterruptedExceptions; these should be allowed to propagate to the ToeThread executing the processor. Subclasses should also exit their main method (innerProcess()) immediately if the interrupted flag is set.
author:
   Gordon Mohr
See Also:   org.archive.crawler.framework.ToeThread
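
The interrupt-handling advice above can be illustrated with a minimal, self-contained sketch. It does not use the Heritrix classes: SketchUri, SketchProcessor, CountingProcessor and InterruptSketch are hypothetical stand-ins, and the body of checkForInterrupt() shown here is an assumption about one reasonable behavior, not the library's confirmed implementation.

```java
// Hypothetical stand-ins for illustration; these are NOT the Heritrix classes.
class SketchUri {
    final String uri;
    SketchUri(String uri) { this.uri = uri; }
}

abstract class SketchProcessor {
    // Mirrors the shape of the documented final process(CrawlURI) entry point.
    public final void process(SketchUri curi) throws InterruptedException {
        innerProcess(curi);
    }

    // Assumption: throw if the current thread's interrupted flag is set.
    protected void checkForInterrupt() throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException("interrupted");
        }
    }

    protected abstract void innerProcess(SketchUri curi) throws InterruptedException;
}

class CountingProcessor extends SketchProcessor {
    int stepsDone = 0;

    @Override
    protected void innerProcess(SketchUri curi) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            // Exit promptly when interrupted; the exception is not caught here,
            // so it propagates to whichever thread called process().
            checkForInterrupt();
            stepsDone++;
        }
    }
}

class InterruptSketch {
    // Runs process() and reports whether an InterruptedException propagated out.
    static boolean propagates(CountingProcessor p, boolean interruptFirst) {
        if (interruptFirst) {
            Thread.currentThread().interrupt(); // simulate a pending interrupt
        }
        try {
            p.process(new SketchUri("http://example.com/"));
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        CountingProcessor normal = new CountingProcessor();
        System.out.println("propagated: " + propagates(normal, false));
        System.out.println("steps done: " + normal.stepsDone);

        CountingProcessor interrupted = new CountingProcessor();
        System.out.println("propagated: " + propagates(interrupted, true));
        System.out.println("steps done: " + interrupted.stepsDone);
    }
}
```

In this sketch only the caller (standing in for the ToeThread) catches the exception; the subclass never does.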



Field Summary
final public static String ATTR_DECIDE_RULES
     Key to use when asking settings for the decide-rules value.
final public static String ATTR_ENABLED
     Key to use when asking settings for the enabled value.
protected String attrDecideRules
    

Constructor Summary
public  Processor(String name, String description)
    

Method Summary
protected void checkForInterrupt()
    
protected void finalTasks()
     Classes subclassing this one should override this method to perform processor-specific actions.
public CrawlController getController()
     Get the controller object.
protected DecideRule getDecideRule(Object o)
    
public Processor getDefaultNextProcessor(CrawlURI curi)
     Returns the next processor for the given CrawlURI in the processor chain.
Parameters:
  curi - The CrawlURI that we want to find the next processor for.
protected void initialTasks()
     Classes subclassing this one should override this method to perform processor-specific actions.
protected void innerProcess(CrawlURI curi)
     Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
protected void innerRejectProcess(CrawlURI curi)
    
protected boolean isContentToProcess(CrawlURI curi)
    
Parameters:
  curi - CrawlURI to examine.
protected boolean isExpectedMimeType(String contentType, String expectedPrefix)
    
Parameters:
  contentType - Found content type.
  expectedPrefix - String to find at start of content type, e.g. text/html.
protected boolean isHttpTransactionContentToProcess(CrawlURI curi)
    
Parameters:
  curi - CrawlURI to examine.
public void kickUpdate()
    
final public void process(CrawlURI curi)
     Perform processing on the given CrawlURI.
public String report()
     Compiles and returns a report (in human-readable form) about the status of the processor.
protected boolean rulesAccept(Object o)
    
protected boolean rulesAccept(DecideRule rule, Object o)
    
public void setDefaultNextProcessor(Processor nextProcessor)
     Set the default next processor in the chain.
public Processor spawn(int serialNum)
    

Field Detail
ATTR_DECIDE_RULES
final public static String ATTR_DECIDE_RULES
Key to use when asking settings for the decide-rules value.



ATTR_ENABLED
final public static String ATTR_ENABLED
Key to use when asking settings for the enabled value.



attrDecideRules
protected String attrDecideRules
Local name for decide-rules.




Constructor Detail
Processor
public Processor(String name, String description)

Parameters:
  name -
  description -




Method Detail
checkForInterrupt
protected void checkForInterrupt() throws InterruptedException



finalTasks
protected void finalTasks()
Classes subclassing this one should override this method to perform processor-specific actions.



getController
public CrawlController getController()
Get the controller object.
Returns:
  the controller object.



getDecideRule
protected DecideRule getDecideRule(Object o)



getDefaultNextProcessor
public Processor getDefaultNextProcessor(CrawlURI curi)
Returns the next processor for the given CrawlURI in the processor chain.
Parameters:
  curi - The CrawlURI that we want to find the next processor for.
Returns:
  The next processor for the given CrawlURI in the processor chain.



initialTasks
protected void initialTasks()
Classes subclassing this one should override this method to perform processor-specific actions.

This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.
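
The ordering guarantee can be pictured with a small stand-in driver. LifecycleProcessorSketch and LifecycleSketch are hypothetical names, not Heritrix classes. The call to initialTasks() before any URI is handled follows the description above; calling finalTasks() once after the last URI is an assumption made for this sketch, since this page does not say when it is invoked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in processor that records the order of its lifecycle calls.
class LifecycleProcessorSketch {
    final List<String> events = new ArrayList<>();

    protected void initialTasks() { events.add("initialTasks"); }

    protected void innerProcess(String uri) { events.add("process " + uri); }

    protected void finalTasks() { events.add("finalTasks"); }
}

class LifecycleSketch {
    // Stand-in for the crawl driver: set up first, then process, then wrap up.
    static List<String> runCrawl(LifecycleProcessorSketch p, List<String> uris) {
        p.initialTasks();            // before any URI processing
        for (String uri : uris) {
            p.innerProcess(uri);
        }
        p.finalTasks();              // assumed to run once the crawl is done
        return p.events;
    }

    public static void main(String[] args) {
        List<String> uris = new ArrayList<>();
        uris.add("a");
        uris.add("b");
        for (String event : runCrawl(new LifecycleProcessorSketch(), uris)) {
            System.out.println(event);
        }
    }
}
```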




innerProcess
protected void innerProcess(CrawlURI curi) throws InterruptedException
Classes subclassing this one should override this method to perform their custom actions on the CrawlURI.
Parameters:
  curi - The CrawlURI being processed.
throws:
  InterruptedException -



innerRejectProcess
protected void innerRejectProcess(CrawlURI curi) throws InterruptedException

Parameters:
  curi - CrawlURI instance.
throws:
  InterruptedException -



isContentToProcess
protected boolean isContentToProcess(CrawlURI curi)

Parameters:
  curi - CrawlURI to examine.
Returns:
  True if there is content to process (content length is > 0) and links have not yet been extracted.



isExpectedMimeType
protected boolean isExpectedMimeType(String contentType, String expectedPrefix)

Parameters:
  contentType - Found content type.
  expectedPrefix - String to find at start of content type, e.g. text/html.
Returns:
  True if the passed content type begins with the expected MIME type.
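
A prefix check of this kind could look like the sketch below. MimeSketch is a hypothetical class, not the Heritrix source. The null guard and the case-insensitive comparison are assumptions added for the example; the description above only states that the content type must begin with the expected prefix.

```java
import java.util.Locale;

class MimeSketch {
    // Sketch of a prefix test on a Content-Type value.
    // Assumptions: a null content type never matches, and case is ignored.
    static boolean isExpectedMimeType(String contentType, String expectedPrefix) {
        if (contentType == null) {
            return false;
        }
        return contentType.toLowerCase(Locale.ROOT)
                .startsWith(expectedPrefix.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isExpectedMimeType("text/html; charset=UTF-8", "text/html"));
        System.out.println(isExpectedMimeType("image/png", "text/html"));
        System.out.println(isExpectedMimeType(null, "text/html"));
    }
}
```

Matching on a prefix rather than on equality lets a value that carries parameters, such as a charset, still count as text/html.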



isHttpTransactionContentToProcess
protected boolean isHttpTransactionContentToProcess(CrawlURI curi)

Parameters:
  curi - CrawlURI to examine.
Returns:
  True if Processor.isContentToProcess(CrawlURI) is true and the CrawlURI represents a successful HTTP transaction.
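
How the two content checks relate can be sketched with a stand-in URI record. FetchedUriSketch and ContentCheckSketch are hypothetical, and the three fields are simplifications chosen for this example; the real CrawlURI exposes this state through its own API, which is not shown on this page.

```java
// Hypothetical stand-in carrying only the state the two checks need.
class FetchedUriSketch {
    final long contentLength;
    final boolean linksExtracted;
    final boolean successfulHttp;

    FetchedUriSketch(long contentLength, boolean linksExtracted, boolean successfulHttp) {
        this.contentLength = contentLength;
        this.linksExtracted = linksExtracted;
        this.successfulHttp = successfulHttp;
    }
}

class ContentCheckSketch {
    // True if there is content (length > 0) and links were not yet extracted.
    static boolean isContentToProcess(FetchedUriSketch curi) {
        return curi.contentLength > 0 && !curi.linksExtracted;
    }

    // Adds the requirement that the fetch was a successful HTTP transaction.
    static boolean isHttpTransactionContentToProcess(FetchedUriSketch curi) {
        return isContentToProcess(curi) && curi.successfulHttp;
    }

    public static void main(String[] args) {
        FetchedUriSketch fresh = new FetchedUriSketch(1024, false, true);
        FetchedUriSketch alreadyExtracted = new FetchedUriSketch(1024, true, true);
        FetchedUriSketch failedFetch = new FetchedUriSketch(1024, false, false);
        System.out.println(isHttpTransactionContentToProcess(fresh));
        System.out.println(isHttpTransactionContentToProcess(alreadyExtracted));
        System.out.println(isHttpTransactionContentToProcess(failedFetch));
    }
}
```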



kickUpdate
public void kickUpdate()



process
final public void process(CrawlURI curi) throws InterruptedException
Perform processing on the given CrawlURI.
Parameters:
  curi -
throws:
  InterruptedException -



report
public String report()
Compiles and returns a report (in human-readable form) about the status of the processor. The processor's name (of the implementing class) should always be included.

Examples of stats declared would include:
Number of CrawlURIs handled.
Number of links extracted (for link extractors).
etc.
Returns:
  A human-readable report on the processor's state.
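
A report along these lines might be assembled as in the sketch below. LinkExtractorSketch is a hypothetical stand-in for a link-extracting processor. The counter names and the exact text layout are assumptions; the description above only asks for the implementing class name and some human-readable statistics.

```java
// Hypothetical stand-in for a link-extracting processor that keeps two counters.
class LinkExtractorSketch {
    private long urisHandled = 0;
    private long linksExtracted = 0;

    // Called once per handled URI with the number of links found in it.
    void record(int linksFound) {
        urisHandled++;
        linksExtracted += linksFound;
    }

    // Builds a plain-text report that always names the implementing class.
    String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("Processor: ").append(getClass().getName()).append('\n');
        sb.append("  CrawlURIs handled: ").append(urisHandled).append('\n');
        sb.append("  Links extracted: ").append(linksExtracted).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        LinkExtractorSketch extractor = new LinkExtractorSketch();
        extractor.record(3);
        extractor.record(2);
        System.out.print(extractor.report());
    }
}
```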




rulesAccept
protected boolean rulesAccept(Object o)



rulesAccept
protected boolean rulesAccept(DecideRule rule, Object o)



setDefaultNextProcessor
public void setDefaultNextProcessor(Processor nextProcessor)
Set the default next processor in the chain.
Parameters:
  nextProcessor - the default next processor in the chain.
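
The default-next links form a simple singly linked chain, which the following sketch walks from its first element. ChainNode and ChainSketch are hypothetical stand-ins, not Heritrix classes. Unlike the documented getDefaultNextProcessor(CrawlURI), the getter here takes no argument, and treating null as the end of the chain is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a processor that knows only its default successor.
class ChainNode {
    private final String name;
    private ChainNode defaultNext; // null marks the end of the chain (assumption)

    ChainNode(String name) { this.name = name; }

    void setDefaultNextProcessor(ChainNode next) { this.defaultNext = next; }

    ChainNode getDefaultNextProcessor() { return defaultNext; }

    String getName() { return name; }
}

class ChainSketch {
    // Follows the default-next links and collects the names in visiting order.
    static List<String> walk(ChainNode first) {
        List<String> visited = new ArrayList<>();
        for (ChainNode p = first; p != null; p = p.getDefaultNextProcessor()) {
            visited.add(p.getName());
        }
        return visited;
    }

    public static void main(String[] args) {
        ChainNode fetch = new ChainNode("fetch");
        ChainNode extract = new ChainNode("extract");
        ChainNode write = new ChainNode("write");
        fetch.setDefaultNextProcessor(extract);
        extract.setDefaultNextProcessor(write);
        System.out.println(walk(fetch));
    }
}
```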



spawn
public Processor spawn(int serialNum)



Methods inherited from org.archive.crawler.settings.ModuleType
public Type addElement(CrawlerSettings settings, Type type) throws InvalidAttributeValueException
protected void listUsedFiles(List<String> list)
