Java Doc for BdbFrontier.java (Heritrix web crawler, package org.archive.crawler.frontier)



org.archive.crawler.frontier.AbstractFrontier
   org.archive.crawler.frontier.WorkQueueFrontier
      org.archive.crawler.frontier.BdbFrontier

All known Subclasses: org.archive.crawler.frontier.DomainSensitiveFrontier

BdbFrontier
public class BdbFrontier extends WorkQueueFrontier implements Serializable
A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs.
author:
   Gordon Mohr


Field Summary
final public static String ATTR_INCLUDED

protected transient BdbMultipleWorkQueues pendingUris
    

Constructor Summary
public  BdbFrontier(String name)
     Constructor.
public  BdbFrontier(String name, String description)
    

Method Summary
protected void closeQueue()

public void crawlCheckpoint(File checkpointDir)

protected UriUniqFilter createAlreadyIncluded()
     Create a UriUniqFilter that will serve as a record of already seen URIs.
protected UriUniqFilter deserializeAlreadySeen(Class<? extends UriUniqFilter> cls, File dir)

public FrontierMarker getInitialMarker(String regexpr, boolean inCacheOnly)

protected WorkQueue getQueueFor(CrawlURI curi)
     Return the work queue for the given CrawlURI's classKey.
protected WorkQueue getQueueFor(String classKey)
     Return the work queue for the given classKey, or null if no such queue exists.
public ArrayList<String> getURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose)
     Return a list of URIs.
protected BdbMultipleWorkQueues getWorkQueues()

protected void initQueue()

public void initialize(CrawlController c)

protected boolean workQueueDataOnDisk()
    

Field Detail
ATTR_INCLUDED
final public static String ATTR_INCLUDED
The URI-already-included structure to use, specified by class name.
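
Choosing an implementation "by class name" usually means loading the named class reflectively. The following is a minimal sketch of that idea, not the Heritrix implementation: SeenFilter, HashSeenFilter and SeenFilterFactory are hypothetical stand-ins invented for this example, and only standard java.lang reflection calls are used.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical factory: instantiates whichever implementation a setting names.
class SeenFilterFactory {
    /** Loads the named class and creates an instance through its no-arg constructor. */
    static SeenFilter create(String className) throws ReflectiveOperationException {
        Class<? extends SeenFilter> cls =
                Class.forName(className).asSubclass(SeenFilter.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In a real crawl the name would come from configuration (assumption).
        String configured = HashSeenFilter.class.getName();
        SeenFilter filter = create(configured);
        System.out.println(filter.addIfNew("http://example.com/")); // first offer
        System.out.println(filter.addIfNew("http://example.com/")); // repeat offer
    }
}

// Hypothetical stand-in for a URI-already-included structure.
interface SeenFilter {
    /** Records the URI; returns true only the first time it is offered. */
    boolean addIfNew(String uri);
}

// Simple in-memory implementation used as the selectable class.
class HashSeenFilter implements SeenFilter {
    private final Set<String> seen = new HashSet<>();

    @Override
    public boolean addIfNew(String uri) {
        return seen.add(uri);
    }
}
```

Resolving the name with asSubclass fails fast with a ClassCastException if the configured class does not implement the expected interface.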



pendingUris
protected transient BdbMultipleWorkQueues pendingUris
All URIs scheduled to be crawled.




Constructor Detail
BdbFrontier
public BdbFrontier(String name)
Constructor.
Parameters:
  name - Name of this Frontier.



BdbFrontier
public BdbFrontier(String name, String description)
Create the BdbFrontier.
Parameters:
  name - Name of this Frontier.
  description -




Method Detail
closeQueue
protected void closeQueue()



crawlCheckpoint
public void crawlCheckpoint(File checkpointDir) throws Exception



createAlreadyIncluded
protected UriUniqFilter createAlreadyIncluded() throws IOException
Create a UriUniqFilter that will serve as a record of already seen URIs.
Returns:
  A UriUniqFilter that will serve as a record of already seen URIs.
throws:
  IOException -
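
To illustrate what "a record of already seen URIs" does in a frontier, here is a small in-memory sketch. It is an assumption-laden stand-in, not the Heritrix UriUniqFilter interface: the class AlreadySeenSketch and its add, note and count methods are invented for this example, and a HashSet replaces the on-disk BerkeleyDB JE storage.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical already-seen record: forwards each URI to a receiver at most once.
class AlreadySeenSketch {
    private final Set<String> seen = new HashSet<>();
    private final Consumer<String> receiver;

    AlreadySeenSketch(Consumer<String> receiver) {
        this.receiver = receiver;
    }

    /** Forwards the URI to the receiver only the first time it is offered. */
    void add(String uri) {
        if (seen.add(uri)) {
            receiver.accept(uri);
        }
    }

    /** Marks a URI as seen without forwarding it for scheduling. */
    void note(String uri) {
        seen.add(uri);
    }

    /** Number of distinct URIs recorded so far. */
    long count() {
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> scheduled = new ArrayList<>();
        AlreadySeenSketch filter = new AlreadySeenSketch(scheduled::add);
        filter.add("http://example.com/a");
        filter.add("http://example.com/a"); // duplicate, not forwarded again
        filter.note("http://example.com/b"); // recorded but never forwarded
        filter.add("http://example.com/b"); // already recorded, so ignored
        System.out.println(scheduled.size() + " scheduled, " + filter.count() + " seen");
    }
}
```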



deserializeAlreadySeen
protected UriUniqFilter deserializeAlreadySeen(Class<? extends UriUniqFilter> cls, File dir) throws FileNotFoundException, IOException



getInitialMarker
public FrontierMarker getInitialMarker(String regexpr, boolean inCacheOnly)



getQueueFor
protected WorkQueue getQueueFor(CrawlURI curi)
Return the work queue for the given CrawlURI's classKey. URIs are ordered and politeness-delayed within their 'class'.
Parameters:
  curi - CrawlURI to base queue on
Returns:
  the found or created BdbWorkQueue
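
The phrase "ordered and politeness-delayed within their 'class'" can be pictured as a FIFO queue that refuses to hand out its next URI until a delay has elapsed. The sketch below is illustrative only: PoliteQueueSketch is a hypothetical class, the fixed delay is a simplification, and the caller passes the current time explicitly so the behaviour is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical per-class queue: FIFO order plus a fixed politeness delay.
class PoliteQueueSketch {
    private final Deque<String> uris = new ArrayDeque<>();
    private final long delayMs;
    private long wakeTimeMs = 0; // earliest time the next URI may be emitted

    PoliteQueueSketch(long delayMs) {
        this.delayMs = delayMs;
    }

    void enqueue(String uri) {
        uris.addLast(uri);
    }

    /** Returns the next URI in FIFO order, or null if empty or still snoozed. */
    String next(long nowMs) {
        if (uris.isEmpty() || nowMs < wakeTimeMs) {
            return null;
        }
        wakeTimeMs = nowMs + delayMs; // snooze the queue after each emission
        return uris.pollFirst();
    }

    public static void main(String[] args) {
        PoliteQueueSketch queue = new PoliteQueueSketch(2000);
        queue.enqueue("http://example.com/a");
        queue.enqueue("http://example.com/b");
        System.out.println(queue.next(0));    // http://example.com/a
        System.out.println(queue.next(500));  // null (queue is snoozed)
        System.out.println(queue.next(2000)); // http://example.com/b
    }
}
```

The inherited settings ATTR_DELAY_FACTOR, ATTR_MIN_DELAY and ATTR_MAX_DELAY suggest that the real delay is configurable rather than fixed as it is here.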



getQueueFor
protected WorkQueue getQueueFor(String classKey)
Return the work queue for the given classKey, or null if no such queue exists.
Parameters:
  classKey - key to look for
Returns:
  the found WorkQueue
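
The two getQueueFor overloads differ in what happens when no queue exists yet: the CrawlURI variant finds or creates one, while the classKey variant only looks one up and may return null. A minimal sketch of that contrast follows, using a plain HashMap in place of the BerkeleyDB-backed store. The names QueueLookupSketch, queueForUri, queueForKey and classKeyOf are hypothetical, and deriving the class key from the host name is an assumption made for illustration.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry of per-class work queues.
class QueueLookupSketch {
    private final Map<String, Deque<String>> allQueues = new HashMap<>();

    /** Find-or-create: always returns a queue for the URI's class key. */
    Deque<String> queueForUri(String uri) {
        return allQueues.computeIfAbsent(classKeyOf(uri), key -> new ArrayDeque<>());
    }

    /** Lookup only: returns null if no queue exists for the key. */
    Deque<String> queueForKey(String classKey) {
        return allQueues.get(classKey);
    }

    /** Assumption for this sketch: the class key is the URI's host name. */
    static String classKeyOf(String uri) {
        String host = URI.create(uri).getHost();
        return host == null ? "default" : host;
    }

    public static void main(String[] args) {
        QueueLookupSketch frontier = new QueueLookupSketch();
        System.out.println(frontier.queueForKey("example.com")); // null, nothing created yet
        frontier.queueForUri("http://example.com/a").add("http://example.com/a");
        frontier.queueForUri("http://example.com/b").add("http://example.com/b");
        System.out.println(frontier.queueForKey("example.com").size()); // 2, same queue reused
    }
}
```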



getURIsList
public ArrayList<String> getURIsList(FrontierMarker marker, int numberOfMatches, boolean verbose)
Return a list of URIs.
Parameters:
  marker -
  numberOfMatches -
  verbose -
Returns:
  List of URIs (strings).
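
The signatures of getInitialMarker and getURIsList suggest a paging pattern: a marker remembers a regular expression and a resume position, and each call returns at most numberOfMatches entries. The sketch below shows that pattern over an in-memory list. It is not the Heritrix implementation: MarkerPagingSketch and its nested Marker are hypothetical, whole-string regex matching is an assumption, and the meaning given to verbose (prefixing each entry with its position) is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical frontier fragment that pages through pending URIs with a marker.
class MarkerPagingSketch {
    /** Hypothetical marker: a compiled filter plus the position to resume from. */
    static class Marker {
        final Pattern pattern;
        int nextIndex = 0;

        Marker(String regex) {
            this.pattern = Pattern.compile(regex);
        }
    }

    private final List<String> pending = new ArrayList<>();

    void schedule(String uri) {
        pending.add(uri);
    }

    Marker initialMarker(String regex) {
        return new Marker(regex);
    }

    /** Returns up to max matching URIs and advances the marker past every URI examined. */
    ArrayList<String> urisList(Marker marker, int max, boolean verbose) {
        ArrayList<String> matches = new ArrayList<>();
        while (marker.nextIndex < pending.size() && matches.size() < max) {
            int index = marker.nextIndex++;
            String uri = pending.get(index);
            if (marker.pattern.matcher(uri).matches()) {
                // Assumption: verbose output prefixes each URI with its position.
                matches.add(verbose ? index + " " + uri : uri);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        MarkerPagingSketch frontier = new MarkerPagingSketch();
        frontier.schedule("http://example.com/a.html");
        frontier.schedule("http://example.com/b.pdf");
        frontier.schedule("http://example.com/c.html");
        frontier.schedule("http://example.com/d.html");

        Marker marker = frontier.initialMarker(".*\\.html");
        System.out.println(frontier.urisList(marker, 2, false)); // first page: a.html, c.html
        System.out.println(frontier.urisList(marker, 2, false)); // second page: d.html
    }
}
```

Because the marker carries the position, repeated calls with the same marker continue where the previous call stopped instead of starting over.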



getWorkQueues
protected BdbMultipleWorkQueues getWorkQueues()



initQueue
protected void initQueue() throws IOException



initialize
public void initialize(CrawlController c) throws FatalConfigurationException, IOException



workQueueDataOnDisk
protected boolean workQueueDataOnDisk()



Fields inherited from org.archive.crawler.frontier.WorkQueueFrontier
public static String ALL_NONEMPTY
public static String ALL_QUEUES
final public static String ATTR_BALANCE_REPLENISH_AMOUNT
final public static String ATTR_COST_POLICY
final public static String ATTR_ERROR_PENALTY_AMOUNT
final public static String ATTR_HOLD_QUEUES
final public static String ATTR_QUEUE_TOTAL_BUDGET
final public static String ATTR_SNOOZE_DEACTIVATE_MS
final public static String ATTR_TARGET_READY_QUEUES_BACKLOG
String[] AVAILABLE_COST_POLICIES
final protected static Integer DEFAULT_BALANCE_REPLENISH_AMOUNT
final protected static String DEFAULT_COST_POLICY
final protected static Integer DEFAULT_ERROR_PENALTY_AMOUNT
final protected static Boolean DEFAULT_HOLD_QUEUES
final protected static Long DEFAULT_QUEUE_TOTAL_BUDGET
public static Long DEFAULT_SNOOZE_DEACTIVATE_MS
final protected static Integer DEFAULT_TARGET_READY_QUEUES_BACKLOG
protected static String[] REPORTS
public static String STANDARD_REPORT
protected transient Map<String, WorkQueue> allQueues
protected transient UriUniqFilter alreadyIncluded
protected Bag inProcessQueues
protected BlockingQueue<String> inactiveQueues
protected WorkQueue longestActiveQueue
protected transient WakeTask nextWake
protected BlockingQueue<String> readyClassQueues
protected BlockingQueue<String> retiredQueues
protected SortedSet<WorkQueue> snoozedClassQueues
protected int targetSizeForReadyQueues
protected transient Timer wakeTimer

Methods inherited from org.archive.crawler.frontier.WorkQueueFrontier
protected void appendQueueReports(PrintWriter w, Iterator iterator, int total, int max)
protected CrawlURI asCrawlUri(CandidateURI caUri)
public long averageDepth()
abstract protected void closeQueue() throws IOException
public float congestionRatio()
public void considerIncluded(UURI u)
public void crawlEnded(String sExitMessage)
abstract protected UriUniqFilter createAlreadyIncluded() throws IOException
public long deepestUri()
public long deleteURIs(String match)
public synchronized void deleted(CrawlURI curi)
public long discoveredUriCount()
public void finished(CrawlURI curi)
protected void forget(CrawlURI curi)
public FrontierGroup getGroup(CrawlURI curi)
abstract protected WorkQueue getQueueFor(CrawlURI curi)
abstract protected WorkQueue getQueueFor(String classKey)
public String[] getReports()
abstract protected void initQueue() throws IOException
public void initialize(CrawlController c) throws FatalConfigurationException, IOException
public synchronized boolean isEmpty()
public void kickUpdate()
public CrawlURI next() throws InterruptedException, EndedException
public void receive(CandidateURI caUri)
public synchronized void reportTo(String name, PrintWriter writer)
public void schedule(CandidateURI caUri)
protected void sendToQueue(CrawlURI curi)
public String singleLineLegend()
public void singleLineReportTo(PrintWriter w)
void wakeQueues()
abstract protected boolean workQueueDataOnDisk()

Fields inherited from org.archive.crawler.frontier.AbstractFrontier
final protected static String ACCEPTABLE_FORCE_QUEUE
final public static String ATTR_DELAY_FACTOR
final public static String ATTR_FORCE_QUEUE
final public static String ATTR_MAX_DELAY
final public static String ATTR_MAX_HOST_BANDWIDTH_USAGE
final public static String ATTR_MAX_OVERALL_BANDWIDTH_USAGE
final public static String ATTR_MAX_RETRIES
final public static String ATTR_MIN_DELAY
final public static String ATTR_PAUSE_AT_FINISH
final public static String ATTR_PAUSE_AT_START
final public static String ATTR_PREFERENCE_EMBED_HOPS
final public static String ATTR_QUEUE_ASSIGNMENT_POLICY
final protected static String ATTR_RECOVERY_ENABLED
final public static String ATTR_RETRY_DELAY
final public static String ATTR_SOURCE_TAG_SEEDS
final protected static Boolean DEFAULT_ATTR_RECOVERY_ENABLED
final protected static Float DEFAULT_DELAY_FACTOR
final protected static String DEFAULT_FORCE_QUEUE
final protected static Integer DEFAULT_MAX_DELAY
final protected static Integer DEFAULT_MAX_HOST_BANDWIDTH_USAGE
final protected static Integer DEFAULT_MAX_OVERALL_BANDWIDTH_USAGE
final protected static Integer DEFAULT_MAX_RETRIES
final protected static Integer DEFAULT_MIN_DELAY
final protected static Boolean DEFAULT_PAUSE_AT_FINISH
final protected static Boolean DEFAULT_PAUSE_AT_START
final protected static Integer DEFAULT_PREFERENCE_EMBED_HOPS
final protected static Long DEFAULT_RETRY_DELAY
final protected static Boolean DEFAULT_SOURCE_TAG_SEEDS
final public static String IGNORED_SEEDS_FILENAME
protected transient CrawlController controller
protected long disregardedUriCount
protected long failedFetchCount
protected int lastMaxBandwidthKB
protected AtomicLong nextOrdinal
protected long processedBytesAfterLastEmittedURI
protected transient QueueAssignmentPolicy queueAssignmentPolicy
protected long queuedUriCount
protected boolean shouldPause
protected transient boolean shouldTerminate
protected long succeededFetchCount
protected long totalProcessedBytes

Methods inherited from org.archive.crawler.frontier.AbstractFrontier
protected void applySpecialHandling(CrawlURI curi)
protected CrawlURI asCrawlUri(CandidateURI caUri)
protected String canonicalize(UURI uuri)
protected String canonicalize(CandidateURI cauri)
public void crawlCheckpoint(File checkpointDir) throws Exception
public void crawlEnded(String sExitMessage)
public void crawlEnding(String sExitMessage)
public void crawlPaused(String statusMessage)
public void crawlPausing(String statusMessage)
public void crawlResuming(String statusMessage)
public void crawlStarted(String message)
protected synchronized void decrementQueuedCount(long numberOfDeletes)
public long disregardedUriCount()
protected void doJournalAdded(CrawlURI c)
protected void doJournalEmitted(CrawlURI c)
protected void doJournalFinishedFailure(CrawlURI c)
protected void doJournalFinishedSuccess(CrawlURI c)
protected void doJournalRescheduled(CrawlURI c)
public long failedFetchCount()
public long finishedUriCount()
public String getClassKey(CandidateURI cauri)
public FrontierJournal getFrontierJournal()
protected CrawlServer getServer(CrawlURI curi)
public void importRecoverLog(String pathToLog, boolean retainFailures) throws IOException
protected synchronized void incrementDisregardedUriCount()
protected synchronized void incrementFailedFetchCount()
protected synchronized void incrementQueuedUriCount()
protected synchronized void incrementQueuedUriCount(long increment)
protected synchronized void incrementSucceededFetchCount()
public void initialize(CrawlController c) throws FatalConfigurationException, IOException
protected boolean isDisregarded(CrawlURI curi)
public synchronized boolean isEmpty()
public void kickUpdate()
public void loadSeeds()
protected void log(CrawlURI curi)
protected void logLocalizedErrors(CrawlURI curi)
protected boolean needsRetrying(CrawlURI curi)
protected void noteAboutToEmit(CrawlURI curi, WorkQueue q)
protected boolean overMaxRetries(CrawlURI curi)
public synchronized void pause()
protected long politenessDelayFor(CrawlURI curi)
protected synchronized void preNext(long now) throws InterruptedException, EndedException
public long queuedUriCount()
public void reportTo(PrintWriter writer)
protected long retryDelayFor(CrawlURI curi)
public static void saveIgnoredItems(String ignoredItems, File dir)
protected File scratchDirFor(String key)
public String singleLineReport()
public void start()
public long succeededFetchCount()
public synchronized void terminate()
public long totalBytesWritten()
public synchronized void unpause()
