org.archive.crawler.deciderules

Provides classes for a simple decision rules framework.

Each 'step' in a decision rule set which can affect an object's ultimate fate is called a DecideRule. Each DecideRule renders a decision (possibly neutral) on the passed object's fate.

Possible decisions are:

  • ACCEPT means the object is ruled-in for further processing
  • REJECT means the object is ruled-out for further processing
  • PASS means this particular DecideRule has no opinion

Each DecideRule in a rule set is applied in turn; the last one to express a non-PASS preference wins.
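
The "last non-PASS preference wins" evaluation can be sketched in a few lines. This is a minimal illustration only, not the Heritrix implementation: the `SketchDecision` enum, the `LastOpinionWins` class, and the use of `java.util.function.Function` as a rule type are all assumptions made for the example.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for the three decisions described above; not the Heritrix type. */
enum SketchDecision { ACCEPT, REJECT, PASS }

final class LastOpinionWins {
    /**
     * Applies every rule in order and returns the last non-PASS decision.
     * If no rule expresses an opinion, the result stays PASS.
     */
    static <T> SketchDecision decide(List<Function<T, SketchDecision>> rules, T subject) {
        SketchDecision result = SketchDecision.PASS;
        for (Function<T, SketchDecision> rule : rules) {
            SketchDecision opinion = rule.apply(subject);
            if (opinion != SketchDecision.PASS) {
                result = opinion; // a later opinion overrides an earlier one
            }
        }
        return result;
    }
}
```

Because later rules override earlier ones, a rule set typically starts with a broad default and narrows it with more specific rules.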

For example, if the rules are:

  • AcceptDecideRule -- ACCEPTs all (establishing a default)
  • TooManyHopsDecideRule(max-hops=3) -- REJECTs all with hopsPath.length()>3, PASSes otherwise
  • PrerequisiteAcceptDecideRule -- ACCEPTs any with 'P' as last hop, PASSes otherwise (this allows 'LLL's which need a 'LLLP' prerequisite a chance to complete)

Then, you have a crawl that will go 3 hops (of any type) from the seeds, with a special affordance to get prerequisites of 3-hop items (which may be 4 "hops" out).
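
To make the three-rule example concrete, the sketch below models each rule as a plain method over a hopsPath string and combines them with the last-opinion-wins policy. The class `ThreeHopExample`, its nested `Decision` enum, and the method names are hypothetical stand-ins, not the Heritrix classes themselves.

```java
final class ThreeHopExample {
    enum Decision { ACCEPT, REJECT, PASS }

    /** Stand-in for AcceptDecideRule: establishes ACCEPT as the default. */
    static Decision acceptAll(String hopsPath) {
        return Decision.ACCEPT;
    }

    /** Stand-in for TooManyHopsDecideRule: REJECT when the path is longer than maxHops. */
    static Decision tooManyHops(String hopsPath, int maxHops) {
        return hopsPath.length() > maxHops ? Decision.REJECT : Decision.PASS;
    }

    /** Stand-in for PrerequisiteAcceptDecideRule: ACCEPT when the last hop is 'P'. */
    static Decision prerequisiteAccept(String hopsPath) {
        return hopsPath.endsWith("P") ? Decision.ACCEPT : Decision.PASS;
    }

    /** Runs the three rules in order; the last non-PASS opinion is the result. */
    static Decision decide(String hopsPath) {
        Decision[] opinions = {
            acceptAll(hopsPath),
            tooManyHops(hopsPath, 3),
            prerequisiteAccept(hopsPath)
        };
        Decision result = Decision.PASS;
        for (Decision opinion : opinions) {
            if (opinion != Decision.PASS) {
                result = opinion;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String path : new String[] {"", "LLL", "LLLL", "LLLP"}) {
            System.out.println("'" + path + "' -> " + decide(path));
        }
    }
}
```

Under these assumptions a seed (empty path) and 'LLL' are accepted, 'LLLL' is rejected, and 'LLLP' is accepted again because the prerequisite rule speaks last.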

To allow this style of decision processing to be plugged into the existing Filter and Scope slots:

  • There's a DecidingFilter which takes an (ordered) map of DecideRules
  • There's a DecidingScope which takes the same
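
The adapter idea behind plugging rules into a filter slot can be sketched as follows. This is an illustration under stated assumptions, not the DecidingFilter source: `DecidingFilterSketch` and `asFilter` are hypothetical names, `java.util.function.Predicate` stands in for the legacy Filter type, and treating only ACCEPT as "true" is an assumption about how a three-way decision might be mapped onto a yes/no filter.

```java
import java.util.function.Function;
import java.util.function.Predicate;

final class DecidingFilterSketch {
    enum Decision { ACCEPT, REJECT, PASS }

    /**
     * Wraps a decision function as a boolean filter.
     * Assumption: only ACCEPT counts as accepted; REJECT and PASS both yield false.
     */
    static <T> Predicate<T> asFilter(Function<T, Decision> rules) {
        return subject -> rules.apply(subject) == Decision.ACCEPT;
    }
}
```

The point of the adapter is that code expecting a yes/no filter does not need to know that a three-way decision was made underneath.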

See NewScopingModel for background.

Java Source File Name | Type | Comment
AcceptDecideRule.java | Class | Rule which responds ACCEPT to anything passed in.
AddRedirectFromRootServerToScope.java | Class |
BeanShellDecideRule.java | Class | Rule which runs a BeanShell script to make its decision.
ClassKeyMatchesRegExpDecideRule.java | Class | Rule applies configured decision to any CrawlURI class key -- i.e.
ConfiguredDecideRule.java | Class | Rule which can be configured to ACCEPT or REJECT at operator's option.
ConfiguredDecideRuleTest.java | Class |
ContentTypeMatchesRegExpDecideRule.java | Class | DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression.
ContentTypeNotMatchesRegExpDecideRule.java | Class | DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression.
DecideRule.java | Class | Interface for rules which, given an object to evaluate, respond with a decision: DecideRule.ACCEPT, DecideRule.REJECT, or DecideRule.PASS.
DecideRuleSequence.java | Class | RuleSequence represents a series of Rules, which are applied in turn to give the final result.
DecideRuleSequenceTest.java | Class |
DecidingFilter.java | Class | DecidingFilter: a classic Filter which makes its accept/reject decision based on whatever DecideRules have been set up inside it.
DecidingScope.java | Class | DecidingScope: a Scope which makes its accept/reject decision based on whatever DecideRules have been set up inside it.
ExceedsDocumentLengthTresholdDecideRule.java | Class |
ExternalGeoLocationDecideRule.java | Class | A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface.
ExternalGeoLookupInterface.java | Interface | Interface used by ExternalGeoLocationDecideRule.
ExternalImplDecideRule.java | Class | A rule that can be configured to take alternate implementations of the ExternalImplInterface.
ExternalImplInterface.java | Interface | Interface used by ExternalImplDecideRule.
FetchStatusDecideRule.java | Class | Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting.
FetchStatusMatchesRegExpDecideRule.java | Class |
FetchStatusNotMatchesRegExpDecideRule.java | Class |
FilterDecideRule.java | Class | FilterDecideRule wraps a legacy Filter for use in DecideRule contexts.
HasViaDecideRule.java | Class | Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was not a seed or some kinds of mid-crawl adds).
HopsPathMatchesRegExpDecideRule.java | Class | Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regexp.
MatchesFilePatternDecideRule.java | Class | Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches. Several predefined patterns are available for convenience.
MatchesListRegExpDecideRule.java | Class | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexps.
MatchesRegExpDecideRule.java | Class | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexp.
NotExceedsDocumentLengthTresholdDecideRule.java | Class |
NotMatchesFilePatternDecideRule.java | Class | Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regexp.
NotMatchesListRegExpDecideRule.java | Class | Rule applies configured decision to any URIs which do *not* match the supplied regexps.
NotMatchesRegExpDecideRule.java | Class | Rule applies configured decision to any URIs which do *not* match the supplied regexp.
NotOnDomainsDecideRule.java | Class | Rule applies configured decision to any URIs that are *not* on one of the domains in the configured set of domains, filled from the seed set.
NotOnHostsDecideRule.java | Class | Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set.
NotSurtPrefixedDecideRule.java | Class | Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set.
OnDomainsDecideRule.java | Class | Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set.
OnHostsDecideRule.java | Class | Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set.
PathologicalPathDecideRule.java | Class |
PredicatedDecideRule.java | Class | Rule which applies the configured decision only if a test evaluates to true.
PrerequisiteAcceptDecideRule.java | Class | Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position).
RejectDecideRule.java | Class | Rule which answers REJECT to everything evaluated.
ScopePlusOneDecideRule.java | Class | Rule allows one level of discovery beyond configured scope (e.g.
SeedAcceptDecideRule.java | Class | Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true).
SurtPrefixedDecideRule.java | Class | Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set.
TooManyHopsDecideRule.java | Class | Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold.
TooManyPathSegmentsDecideRule.java | Class | Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold.
TransclusionDecideRule.java | Class | Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath' -- see CandidateURI.getPathFromSeed) ends with at least one, but not more than the given number, of non-navlink (non-'L') hops.
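
As an illustration of the path-segment count that TooManyPathSegmentsDecideRule is described as using, the sketch below counts '/' characters after skipping the first '//'. It is one plausible reading of that description, not the Heritrix code; the class `PathSegmentSketch` and its methods are hypothetical.

```java
final class PathSegmentSketch {
    /**
     * Counts '/' characters in a URI string, ignoring everything up to and
     * including the first "//" (normally the scheme separator).
     */
    static int countPathSegments(String uri) {
        int doubleSlash = uri.indexOf("//");
        int start = doubleSlash >= 0 ? doubleSlash + 2 : 0;
        int count = 0;
        for (int i = start; i < uri.length(); i++) {
            if (uri.charAt(i) == '/') {
                count++;
            }
        }
        return count;
    }

    /** True when the count is over the given threshold, i.e. when the rule would REJECT. */
    static boolean exceeds(String uri, int maxSegments) {
        return countPathSegments(uri) > maxSegments;
    }
}
```

For example, under this reading `http://example.com/a/b/c` has three counted slashes, so it is over a threshold of 2 but not over a threshold of 3.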