Java Doc for DomainSensitiveFrontier.java in Web Crawler » heritrix » org.archive.crawler.frontier


org.archive.crawler.frontier.AbstractFrontier
   org.archive.crawler.frontier.WorkQueueFrontier
      org.archive.crawler.frontier.BdbFrontier
         org.archive.crawler.frontier.DomainSensitiveFrontier

DomainSensitiveFrontier
public class DomainSensitiveFrontier extends BdbFrontier implements CrawlURIDispositionListener
Behaves like BdbFrontier (i.e., a basic, mostly breadth-first frontier), but additionally lets you set the number of documents to download on a per-site basis. Useful for frequent revisits of a site that changes frequently.

Choose the number of documents you want to download and specify that count in max-docs. If count-per-host is true (the default), the crawler downloads max-docs documents per host. If you create an override, the overridden max-docs count is used instead, whether it is higher or lower.

If count-per-host is false, max-docs acts like the crawl order's max-docs setting, and the crawler downloads only that total number of documents. An override then downloads max-docs documents in total within the overridden domain.
author:
   Oskar Grenholm
see also:
   BdbFrontier, QuotaEnforcer
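
The quota rules described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the Heritrix implementation: the class MaxDocsQuotaSketch, its methods override and tryDownload, and the use of a bare host string instead of a CrawlURI are all hypothetical. It also assumes that, when count-per-host is false, an overridden host is counted against its own limit rather than against the shared total, which is one reading of the description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the max-docs / count-per-host rules; not Heritrix code. */
class MaxDocsQuotaSketch {
    private final long maxDocs;          // stands in for the max-docs setting
    private final boolean countPerHost;  // stands in for the count-per-host setting
    private final Map<String, Long> overrides = new HashMap<>();  // per-host overridden max-docs
    private final Map<String, Long> perHostCounts = new HashMap<>();
    private long totalCount = 0;

    MaxDocsQuotaSketch(long maxDocs, boolean countPerHost) {
        this.maxDocs = maxDocs;
        this.countPerHost = countPerHost;
    }

    /** Registers an overridden max-docs value for one host (higher or lower than the default). */
    synchronized void override(String host, long overriddenMaxDocs) {
        overrides.put(host, overriddenMaxDocs);
    }

    /** Returns true and records the download if the applicable quota still has room. */
    synchronized boolean tryDownload(String host) {
        Long override = overrides.get(host);
        if (countPerHost || override != null) {
            // Per-host counting, or an overridden host with its own budget (assumption).
            long limit = (override != null) ? override : maxDocs;
            long seen = perHostCounts.getOrDefault(host, 0L);
            if (seen >= limit) {
                return false;
            }
            perHostCounts.put(host, seen + 1);
            return true;
        }
        // count-per-host is false and no override applies: one shared total.
        if (totalCount >= maxDocs) {
            return false;
        }
        totalCount++;
        return true;
    }

    public static void main(String[] args) {
        MaxDocsQuotaSketch perHost = new MaxDocsQuotaSketch(2, true);
        perHost.override("news.example", 1);
        for (String host : new String[] {"a.example", "a.example", "a.example", "news.example", "news.example"}) {
            System.out.println(host + " allowed: " + perHost.tryDownload(host));
        }
    }
}
```

In this sketch a.example is cut off after two documents and news.example after one, because the override replaces the default limit even though it is lower.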



Field Summary
final public static String[] ATTR_AVAILABLE_MODES
final public static String ATTR_COUNTER_MODE
final public static String ATTR_MAX_DOCS
final public static String COUNT_DOMAIN
final public static String COUNT_HOST
final public static String COUNT_OVERRIDE
final public static String DEFAULT_MODE

Constructor Summary
public DomainSensitiveFrontier(String name)

Method Summary
public void crawledURIDisregard(CrawlURI curi)
public void crawledURIFailure(CrawlURI curi)
public void crawledURINeedRetry(CrawlURI curi)
public void crawledURISuccessful(CrawlURI curi)
protected synchronized void incrementHostCounters(CrawlURI curi)
public void initialize(CrawlController c)

Field Detail
ATTR_AVAILABLE_MODES
final public static String[] ATTR_AVAILABLE_MODES

ATTR_COUNTER_MODE
final public static String ATTR_COUNTER_MODE

ATTR_MAX_DOCS
final public static String ATTR_MAX_DOCS

COUNT_DOMAIN
final public static String COUNT_DOMAIN

COUNT_HOST
final public static String COUNT_HOST

COUNT_OVERRIDE
final public static String COUNT_OVERRIDE

DEFAULT_MODE
final public static String DEFAULT_MODE

Constructor Detail
DomainSensitiveFrontier
public DomainSensitiveFrontier(String name)

Method Detail
crawledURIDisregard
public void crawledURIDisregard(CrawlURI curi)

crawledURIFailure
public void crawledURIFailure(CrawlURI curi)

crawledURINeedRetry
public void crawledURINeedRetry(CrawlURI curi)

crawledURISuccessful
public void crawledURISuccessful(CrawlURI curi)

incrementHostCounters
protected synchronized void incrementHostCounters(CrawlURI curi)

initialize
public void initialize(CrawlController c) throws FatalConfigurationException, IOException
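
The page gives no descriptions for these methods, so the following is only a sketch of how disposition callbacks and a synchronized counter method could fit together. Everything in it is an assumption made for illustration: the class HostCounterSketch, the method names onSuccessful, onFailure, onNeedRetry, onDisregard and countFor, the use of a host string in place of CrawlURI, and the choice to count only successful fetches. It shows why a counter update reached from crawl callbacks is a natural candidate for synchronized: several worker threads may report finished URIs at the same time.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of callback-driven, thread-safe per-host counting; not Heritrix code. */
class HostCounterSketch {
    private final Map<String, Long> successesByHost = new HashMap<>();

    // Stand-ins for the four disposition callbacks; a host string replaces CrawlURI.
    void onSuccessful(String host) { incrementHostCounter(host); }
    void onFailure(String host) { /* assumption: failures do not count toward the quota */ }
    void onNeedRetry(String host) { /* assumption: retries do not count toward the quota */ }
    void onDisregard(String host) { /* assumption: disregarded URIs do not count */ }

    /** Synchronized so concurrent callbacks cannot lose an update. */
    protected synchronized void incrementHostCounter(String host) {
        successesByHost.merge(host, 1L, Long::sum);
    }

    synchronized long countFor(String host) {
        return successesByHost.getOrDefault(host, 0L);
    }

    public static void main(String[] args) throws InterruptedException {
        HostCounterSketch sketch = new HostCounterSketch();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            // Each worker reports 1000 successful fetches for the same host.
            workers[i] = new Thread(() -> {
                for (int n = 0; n < 1000; n++) {
                    sketch.onSuccessful("a.example");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println(sketch.countFor("a.example")); // prints 4000
    }
}
```

Without the synchronized keyword the unsynchronized HashMap could drop increments or be corrupted under concurrent writes, so the final count would no longer be reliable.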



Fields inherited from org.archive.crawler.frontier.BdbFrontier
final public static String ATTR_INCLUDED
protected transient BdbMultipleWorkQueues pendingUris

Methods inherited from org.archive.crawler.frontier.BdbFrontier
protected void closeQueue()
public void crawlCheckpoint(File checkpointDir) throws Exception
protected UriUniqFilter createAlreadyIncluded() throws IOException
protected UriUniqFilter deserializeAlreadySeen(Class<? extends UriUniqFilter> cls, File dir) throws FileNotFoundException, IOException
public FrontierMarker getInitialMarker(String regexpr, boolean inCacheOnly)
protected WorkQueue getQueueFor(CrawlURI curi)
protected WorkQueue getQueueFor(String classKey)
public ArrayList<String> getURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose)
protected BdbMultipleWorkQueues getWorkQueues()
protected void initQueue() throws IOException
public void initialize(CrawlController c) throws FatalConfigurationException, IOException
protected boolean workQueueDataOnDisk()

Fields inherited from org.archive.crawler.frontier.WorkQueueFrontier
public static String ALL_NONEMPTY
public static String ALL_QUEUES
final public static String ATTR_BALANCE_REPLENISH_AMOUNT
final public static String ATTR_COST_POLICY
final public static String ATTR_ERROR_PENALTY_AMOUNT
final public static String ATTR_HOLD_QUEUES
final public static String ATTR_QUEUE_TOTAL_BUDGET
final public static String ATTR_SNOOZE_DEACTIVATE_MS
final public static String ATTR_TARGET_READY_QUEUES_BACKLOG
String[] AVAILABLE_COST_POLICIES
final protected static Integer DEFAULT_BALANCE_REPLENISH_AMOUNT
final protected static String DEFAULT_COST_POLICY
final protected static Integer DEFAULT_ERROR_PENALTY_AMOUNT
final protected static Boolean DEFAULT_HOLD_QUEUES
final protected static Long DEFAULT_QUEUE_TOTAL_BUDGET
public static Long DEFAULT_SNOOZE_DEACTIVATE_MS
final protected static Integer DEFAULT_TARGET_READY_QUEUES_BACKLOG
protected static String[] REPORTS
public static String STANDARD_REPORT
protected transient Map<String, WorkQueue> allQueues
protected transient UriUniqFilter alreadyIncluded
protected Bag inProcessQueues
protected BlockingQueue<String> inactiveQueues
protected WorkQueue longestActiveQueue
protected transient WakeTask nextWake
protected BlockingQueue<String> readyClassQueues
protected BlockingQueue<String> retiredQueues
protected SortedSet<WorkQueue> snoozedClassQueues
protected int targetSizeForReadyQueues
protected transient Timer wakeTimer

Methods inherited from org.archive.crawler.frontier.WorkQueueFrontier
protected void appendQueueReports(PrintWriter w, Iterator iterator, int total, int max)
protected CrawlURI asCrawlUri(CandidateURI caUri)
public long averageDepth()
abstract protected void closeQueue() throws IOException
public float congestionRatio()
public void considerIncluded(UURI u)
public void crawlEnded(String sExitMessage)
abstract protected UriUniqFilter createAlreadyIncluded() throws IOException
public long deepestUri()
public long deleteURIs(String match)
public synchronized void deleted(CrawlURI curi)
public long discoveredUriCount()
public void finished(CrawlURI curi)
protected void forget(CrawlURI curi)
public FrontierGroup getGroup(CrawlURI curi)
abstract protected WorkQueue getQueueFor(CrawlURI curi)
abstract protected WorkQueue getQueueFor(String classKey)
public String[] getReports()
abstract protected void initQueue() throws IOException
public void initialize(CrawlController c) throws FatalConfigurationException, IOException
public synchronized boolean isEmpty()
public void kickUpdate()
public CrawlURI next() throws InterruptedException, EndedException
public void receive(CandidateURI caUri)
public synchronized void reportTo(String name, PrintWriter writer)
public void schedule(CandidateURI caUri)
protected void sendToQueue(CrawlURI curi)
public String singleLineLegend()
public void singleLineReportTo(PrintWriter w)
void wakeQueues()
abstract protected boolean workQueueDataOnDisk()

Fields inherited from org.archive.crawler.frontier.AbstractFrontier
final protected static String ACCEPTABLE_FORCE_QUEUE
final public static String ATTR_DELAY_FACTOR
final public static String ATTR_FORCE_QUEUE
final public static String ATTR_MAX_DELAY
final public static String ATTR_MAX_HOST_BANDWIDTH_USAGE
final public static String ATTR_MAX_OVERALL_BANDWIDTH_USAGE
final public static String ATTR_MAX_RETRIES
final public static String ATTR_MIN_DELAY
final public static String ATTR_PAUSE_AT_FINISH
final public static String ATTR_PAUSE_AT_START
final public static String ATTR_PREFERENCE_EMBED_HOPS
final public static String ATTR_QUEUE_ASSIGNMENT_POLICY
final protected static String ATTR_RECOVERY_ENABLED
final public static String ATTR_RETRY_DELAY
final public static String ATTR_SOURCE_TAG_SEEDS
final protected static Boolean DEFAULT_ATTR_RECOVERY_ENABLED
final protected static Float DEFAULT_DELAY_FACTOR
final protected static String DEFAULT_FORCE_QUEUE
final protected static Integer DEFAULT_MAX_DELAY
final protected static Integer DEFAULT_MAX_HOST_BANDWIDTH_USAGE
final protected static Integer DEFAULT_MAX_OVERALL_BANDWIDTH_USAGE
final protected static Integer DEFAULT_MAX_RETRIES
final protected static Integer DEFAULT_MIN_DELAY
final protected static Boolean DEFAULT_PAUSE_AT_FINISH
final protected static Boolean DEFAULT_PAUSE_AT_START
final protected static Integer DEFAULT_PREFERENCE_EMBED_HOPS
final protected static Long DEFAULT_RETRY_DELAY
final protected static Boolean DEFAULT_SOURCE_TAG_SEEDS
final public static String IGNORED_SEEDS_FILENAME
protected transient CrawlController controller
protected long disregardedUriCount
protected long failedFetchCount
protected int lastMaxBandwidthKB
protected AtomicLong nextOrdinal
protected long processedBytesAfterLastEmittedURI
protected transient QueueAssignmentPolicy queueAssignmentPolicy
protected long queuedUriCount
protected boolean shouldPause
protected transient boolean shouldTerminate
protected long succeededFetchCount
protected long totalProcessedBytes

Methods inherited from org.archive.crawler.frontier.AbstractFrontier
protected void applySpecialHandling(CrawlURI curi)
protected CrawlURI asCrawlUri(CandidateURI caUri)
protected String canonicalize(UURI uuri)
protected String canonicalize(CandidateURI cauri)
public void crawlCheckpoint(File checkpointDir) throws Exception
public void crawlEnded(String sExitMessage)
public void crawlEnding(String sExitMessage)
public void crawlPaused(String statusMessage)
public void crawlPausing(String statusMessage)
public void crawlResuming(String statusMessage)
public void crawlStarted(String message)
protected synchronized void decrementQueuedCount(long numberOfDeletes)
public long disregardedUriCount()
protected void doJournalAdded(CrawlURI c)
protected void doJournalEmitted(CrawlURI c)
protected void doJournalFinishedFailure(CrawlURI c)
protected void doJournalFinishedSuccess(CrawlURI c)
protected void doJournalRescheduled(CrawlURI c)
public long failedFetchCount()
public long finishedUriCount()
public String getClassKey(CandidateURI cauri)
public FrontierJournal getFrontierJournal()
protected CrawlServer getServer(CrawlURI curi)
public void importRecoverLog(String pathToLog, boolean retainFailures) throws IOException
protected synchronized void incrementDisregardedUriCount()
protected synchronized void incrementFailedFetchCount()
protected synchronized void incrementQueuedUriCount()
protected synchronized void incrementQueuedUriCount(long increment)
protected synchronized void incrementSucceededFetchCount()
public void initialize(CrawlController c) throws FatalConfigurationException, IOException
protected boolean isDisregarded(CrawlURI curi)
public synchronized boolean isEmpty()
public void kickUpdate()
public void loadSeeds()
protected void log(CrawlURI curi)
protected void logLocalizedErrors(CrawlURI curi)
protected boolean needsRetrying(CrawlURI curi)
protected void noteAboutToEmit(CrawlURI curi, WorkQueue q)
protected boolean overMaxRetries(CrawlURI curi)
public synchronized void pause()
protected long politenessDelayFor(CrawlURI curi)
protected synchronized void preNext(long now) throws InterruptedException, EndedException
public long queuedUriCount()
public void reportTo(PrintWriter writer)
protected long retryDelayFor(CrawlURI curi)
public static void saveIgnoredItems(String ignoredItems, File dir)
protected File scratchDirFor(String key)
public String singleLineReport()
public void start()
public long succeededFetchCount()
public synchronized void terminate()
public long totalBytesWritten()
public synchronized void unpause()
