Java Doc for RobotsHonoringPolicy.java (Web Crawler » heritrix » org.archive.crawler.datamodel)



org.archive.crawler.settings.ModuleType
   org.archive.crawler.datamodel.RobotsHonoringPolicy

RobotsHonoringPolicy
public class RobotsHonoringPolicy extends ModuleType (Code)
RobotsHonoringPolicy represents the strategy the crawler uses to determine how robots.txt files are honored. Five kinds of policy exist:
classic:
obey the first set of robots.txt directives that apply to your current user-agent
ignore:
ignore robots.txt directives entirely
custom:
obey a specific operator-entered set of robots.txt directives for a given host
most-favored:
obey the most liberal restrictions offered (if *any* crawler is allowed to get a page, get it)
most-favored-set:
given some set of user-agent patterns, obey the most liberal restriction offered to any of them
The last two policies may adopt a different user-agent to reflect the restrictions the crawler has opted to use.
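The five policy kinds can be illustrated with a small, self-contained sketch. This is not Heritrix code: the class `RobotsPolicySketch`, the `Policy` enum, and the `mayFetch` and `lookup` helpers are hypothetical names introduced only for illustration. It also assumes robots.txt has already been reduced to a map from user-agent name to "may fetch this page", and it matches agents by exact name rather than by pattern.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only; every name in this class is hypothetical, not Heritrix API. */
final class RobotsPolicySketch {

    enum Policy { CLASSIC, IGNORE, CUSTOM, MOST_FAVORED, MOST_FAVORED_SET }

    /**
     * Decide whether one page may be fetched.
     * siteRules:   user-agent -> allowed?, as read from the site's robots.txt
     * customRules: same shape, but entered by the operator for this host
     * agentSet:    user agents considered by the most-favored-set policy
     */
    static boolean mayFetch(Policy policy, String ourAgent,
                            Map<String, Boolean> siteRules,
                            Map<String, Boolean> customRules,
                            List<String> agentSet) {
        switch (policy) {
            case IGNORE:
                return true; // robots.txt is not consulted at all
            case CUSTOM:
                return lookup(customRules, ourAgent);
            case MOST_FAVORED:
                // Allowed if any listed agent is allowed, or if unlisted
                // agents are not restricted by a "*" entry.
                return !siteRules.containsKey("*")
                        || siteRules.containsValue(Boolean.TRUE);
            case MOST_FAVORED_SET:
                for (String agent : agentSet) {
                    if (lookup(siteRules, agent)) {
                        return true;
                    }
                }
                return false;
            case CLASSIC:
            default:
                return lookup(siteRules, ourAgent);
        }
    }

    /** Agent-specific rule first, then "*"; no applicable rule means allowed. */
    static boolean lookup(Map<String, Boolean> rules, String agent) {
        Boolean specific = rules.get(agent);
        if (specific != null) {
            return specific;
        }
        Boolean wildcard = rules.get("*");
        return wildcard == null || wildcard;
    }

    public static void main(String[] args) {
        Map<String, Boolean> site = new HashMap<>();
        site.put("*", false);          // everyone else is refused
        site.put("friendlybot", true); // one agent is welcome
        Map<String, Boolean> custom = new HashMap<>();
        custom.put("*", true);
        List<String> set = Arrays.asList("otherbot", "friendlybot");
        for (Policy p : Policy.values()) {
            System.out.println(p + " -> " + mayFetch(p, "mybot", site, custom, set));
        }
    }
}
```

With the sample rules in `main`, only the classic policy refuses the page, because "mybot" falls under the wildcard entry while the other policies either skip the site's rules or borrow the permission granted to "friendlybot".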
author:
   John Erik Halse


Field Summary
final public static  String ATTR_CUSTOM_ROBOTS
final public static  String ATTR_MASQUERADE
final public static  String ATTR_NAME
final public static  String ATTR_TYPE
final public static  String ATTR_USER_AGENTS
final public static  int CLASSIC
final public static  int CUSTOM
final public static  int IGNORE
final public static  int MOST_FAVORED
final public static  int MOST_FAVORED_SET

Constructor Summary
public  RobotsHonoringPolicy(String name)
     Creates a new instance of RobotsHonoringPolicy.
public  RobotsHonoringPolicy()
    

Method Summary
public  String getCustomRobots(CrawlerSettings settings)
public  int getType(Object context)
     Get the policy-type.
public  StringList getUserAgents(CrawlerSettings settings)
     If the policy-type is most-favored-set, this method gets a list of all user agents in that set.
public  boolean isType(Object o, int type)
     Check if the policy is of a certain type.
Parameters:
  o - An object that can be resolved into a settings object.
  type - the type to check against.
public  boolean shouldMasquerade(CrawlURI curi)
     Returns true if the crawler should masquerade as the user agent whose restrictions it opted to use.

Field Detail
ATTR_CUSTOM_ROBOTS
final public static String ATTR_CUSTOM_ROBOTS

ATTR_MASQUERADE
final public static String ATTR_MASQUERADE

ATTR_NAME
final public static String ATTR_NAME

ATTR_TYPE
final public static String ATTR_TYPE

ATTR_USER_AGENTS
final public static String ATTR_USER_AGENTS

CLASSIC
final public static int CLASSIC

CUSTOM
final public static int CUSTOM

IGNORE
final public static int IGNORE

MOST_FAVORED
final public static int MOST_FAVORED

MOST_FAVORED_SET
final public static int MOST_FAVORED_SET


Constructor Detail
RobotsHonoringPolicy
public RobotsHonoringPolicy(String name)
Creates a new instance of RobotsHonoringPolicy.
Parameters:
  name - the name of the RobotsHonoringPolicy attribute.

RobotsHonoringPolicy
public RobotsHonoringPolicy()




Method Detail
getCustomRobots
public String getCustomRobots(CrawlerSettings settings)
Get the supplied custom robots.txt.
Returns:
  String with the content of the alternate robots.txt

getType
public int getType(Object context)
Get the policy-type.
See Also:   RobotsHonoringPolicy.CLASSIC
See Also:   RobotsHonoringPolicy.IGNORE
See Also:   RobotsHonoringPolicy.CUSTOM
See Also:   RobotsHonoringPolicy.MOST_FAVORED
See Also:   RobotsHonoringPolicy.MOST_FAVORED_SET
Returns:
  policy type

getUserAgents
public StringList getUserAgents(CrawlerSettings settings)
If the policy-type is most-favored-set, this method gets a list of all user agents in that set.
Returns:
  List of Strings with user agents

isType
public boolean isType(Object o, int type)
Check if the policy is of a certain type.
Parameters:
  o - An object that can be resolved into a settings object.
  type - the type to check against.
Returns:
  true if the policy is of the submitted type

shouldMasquerade
public boolean shouldMasquerade(CrawlURI curi)
Returns true if the crawler should masquerade as the user agent whose restrictions it opted to use. (Only relevant for policy-types most-favored and most-favored-set.)
Returns:
  true if we should masquerade
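As a rough illustration of what masquerading means in practice, the sketch below picks the User-Agent value a crawler might send once it has decided to follow another agent's restrictions. It is not taken from Heritrix: `MasqueradeSketch` and `agentToSend` are hypothetical names, and the rule map (user-agent name to "allowed") is an assumed simplification of a parsed robots.txt.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustration only; names and data shapes are assumptions, not Heritrix API. */
final class MasqueradeSketch {

    /**
     * Choose the user agent to present to the server.
     * When masquerading is off, the crawler always identifies as itself.
     * When it is on, the crawler presents the first candidate agent that the
     * site explicitly allows, and falls back to its own name otherwise.
     */
    static String agentToSend(boolean masquerade, String ownAgent,
                              List<String> candidateAgents,
                              Map<String, Boolean> siteRules) {
        if (!masquerade) {
            return ownAgent;
        }
        for (String agent : candidateAgents) {
            if (Boolean.TRUE.equals(siteRules.get(agent))) {
                return agent;
            }
        }
        return ownAgent;
    }

    public static void main(String[] args) {
        Map<String, Boolean> rules = new HashMap<>();
        rules.put("*", false);
        rules.put("friendlybot", true);
        List<String> candidates = Arrays.asList("otherbot", "friendlybot");
        System.out.println(agentToSend(true, "mybot", candidates, rules));
        System.out.println(agentToSend(false, "mybot", candidates, rules));
    }
}
```

Keeping the masquerade decision separate from the permission check mirrors the split in this class between the policy type and the shouldMasquerade flag: a crawler can rely on another agent's permissions without necessarily announcing itself as that agent.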



Methods inherited from org.archive.crawler.settings.ModuleType
public Type addElement(CrawlerSettings settings, Type type) throws InvalidAttributeValueException
protected void listUsedFiles(List<String> list)
