Java Doc for BdbUriUniqFilter.java (heritrix, org.archive.crawler.util)



java.lang.Object
   org.archive.crawler.util.SetBasedUriUniqFilter
      org.archive.crawler.util.BdbUriUniqFilter

BdbUriUniqFilter
public class BdbUriUniqFilter extends SetBasedUriUniqFilter implements Serializable
A Berkeley DB (BDB) implementation of an already-seen list. It performs adequately without blowing out the Java heap, because the set of seen URIs is kept in a BDB database rather than in an in-memory collection. See AlreadySeen.

Makes keys for URIs from the same server sort close to each other. Mercator, and section 2.3.5 'Eliminating Already-Visited URLs' of 'Mining the Web' by Soumen Chakrabarti, describe a two-level key whose first 24 bits are a hash of the host plus port and whose last 40 bits are a hash of the path. Testing showed that adopting such a scheme halved lookup times. (This implementation actually concatenates a 24-bit hash of scheme + host with a 40-bit hash of path + query.)
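To make the two-level layout concrete, here is a minimal sketch of a 64-bit key with a 24-bit hash of scheme + host in the high bits and a 40-bit hash of path + query in the low bits. It assumes a 64-bit FNV-1a hash purely so the example is self-contained; the real class uses its own fingerprinting, which this page does not show, and the names TwoLevelKeySketch and twoLevelKey are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public final class TwoLevelKeySketch {
    // 64-bit FNV-1a parameters: a stand-in hash, NOT the fingerprint Heritrix uses.
    private static final long FNV_OFFSET = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;
    private static final long LOW_40_MASK = (1L << 40) - 1;

    /** 64-bit FNV-1a over the UTF-8 bytes of s. */
    static long fnv1a64(CharSequence s) {
        long h = FNV_OFFSET;
        for (byte b : s.toString().getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= FNV_PRIME;
        }
        return h;
    }

    /** Hypothetical two-level key: 24 high bits from scheme+host, 40 low bits from path+query. */
    static long twoLevelKey(CharSequence uri) {
        String url = uri.toString();
        int schemeEnd = url.indexOf("://");
        // The first '/' after "://" starts the path; -1 means there is no path.
        int pathStart = (schemeEnd < 0) ? -1 : url.indexOf('/', schemeEnd + 3);
        String schemeHost = (pathStart < 0) ? url : url.substring(0, pathStart);
        String pathQuery = (pathStart < 0) ? "" : url.substring(pathStart);
        long high24 = fnv1a64(schemeHost) >>> 40;      // top 24 bits of the host hash
        long low40 = fnv1a64(pathQuery) & LOW_40_MASK; // bottom 40 bits of the path hash
        return (high24 << 40) | low40;
    }

    public static void main(String[] args) {
        long a = twoLevelKey("http://example.com/a?x=1");
        long b = twoLevelKey("http://example.com/b");
        // Same scheme+host, so the high 24 bits match.
        System.out.println((a >>> 40) == (b >>> 40)); // true
    }
}
```

Because every URI on a given scheme + host shares the same 24 high bits, their keys fall into one contiguous numeric range, which is what keeps them near each other in an ordered store.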
author:
   stack
version:
   $Date: 2007-02-21 10:18:39 +0000 (Wed, 21 Feb 2007) $, $Revision: 4927 $



Field Summary
protected static DatabaseEntry ZERO_LENGTH_ENTRY

protected transient Database alreadySeen

protected long count

protected boolean createdEnvironment

protected long lastCacheMiss

protected long lastCacheMissDiff


Constructor Summary
protected BdbUriUniqFilter()
     Default constructor, made protected to shut it off from general use.
public  BdbUriUniqFilter(Environment environment)
     Constructor.
public  BdbUriUniqFilter(File bdbEnv)
     Constructor.
Parameters:
  bdbEnv - The directory that holds the bdb environment.
public  BdbUriUniqFilter(File bdbEnv, int cacheSizePercentage)
     Constructor.
Parameters:
  bdbEnv - The directory that holds the bdb environment.

Method Summary
public synchronized void close()

public static long createKey(CharSequence uri)
     Create fingerprint. Public access so test code can call createKey.
Parameters:
  uri - URI to fingerprint.
public long flush()

public synchronized long getCacheMisses()

protected DatabaseConfig getDatabaseConfig()

public long getLastCacheMissDiff()

protected void initialize(Environment env)
     Method shared by constructors.
protected void open(Environment env, DatabaseConfig dbConfig)

public void reopen(Environment env)
     Call after deserializing an instance of this class.
protected boolean setAdd(CharSequence uri)

protected long setCount()

protected boolean setRemove(CharSequence uri)


Field Detail
ZERO_LENGTH_ENTRY
protected static DatabaseEntry ZERO_LENGTH_ENTRY



alreadySeen
protected transient Database alreadySeen



count
protected long count



createdEnvironment
protected boolean createdEnvironment



lastCacheMiss
protected long lastCacheMiss



lastCacheMissDiff
protected long lastCacheMissDiff




Constructor Detail
BdbUriUniqFilter
protected BdbUriUniqFilter()
Default constructor, made protected to shut it off from general use.



BdbUriUniqFilter
public BdbUriUniqFilter(Environment environment) throws IOException
Constructor.
Parameters:
  environment - A ready-configured BDB environment.
throws:
  IOException -



BdbUriUniqFilter
public BdbUriUniqFilter(File bdbEnv) throws IOException
Constructor.
Parameters:
  bdbEnv - The directory that holds the BDB environment. Creates a database under this directory if one does not already exist; otherwise reopens any existing databases.
throws:
  IOException -



BdbUriUniqFilter
public BdbUriUniqFilter(File bdbEnv, int cacheSizePercentage) throws IOException
Constructor.
Parameters:
  bdbEnv - The directory that holds the BDB environment. Creates a database under this directory if one does not already exist; otherwise reopens any existing databases.
Parameters:
  cacheSizePercentage - Percentage of the JVM's memory that BDB allocates as its cache. Pass -1 to get the default cache size.
throws:
  IOException -
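The cacheSizePercentage parameter expresses the BDB cache as a share of JVM memory, with -1 as a sentinel for "use the default". A minimal sketch of that convention, assuming the base is the JVM's maximum heap as reported by Runtime.maxMemory(); the class CacheSizeSketch and helper cacheBytes are hypothetical, the 1 to 100 range check is this sketch's own choice, and the real constructor presumably hands the percentage to BDB rather than computing bytes itself.

```java
/** Hypothetical helper illustrating the cacheSizePercentage convention. */
public final class CacheSizeSketch {
    static final int DEFAULT = -1; // sentinel: "use the default cache size"

    /**
     * Returns the number of bytes that cacheSizePercentage of maxMemory represents,
     * or DEFAULT when the caller asked for the default cache size.
     */
    static long cacheBytes(int cacheSizePercentage, long maxMemory) {
        if (cacheSizePercentage == DEFAULT) {
            return DEFAULT;
        }
        if (cacheSizePercentage < 1 || cacheSizePercentage > 100) {
            throw new IllegalArgumentException("bad percentage: " + cacheSizePercentage);
        }
        return maxMemory / 100 * cacheSizePercentage;
    }

    public static void main(String[] args) {
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("60% of heap = " + cacheBytes(60, max) + " bytes");
        System.out.println("default -> " + cacheBytes(DEFAULT, max));
    }
}
```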




Method Detail
close
public synchronized void close()



createKey
public static long createKey(CharSequence uri)
Create fingerprint. Public access so test code can call createKey.
Parameters:
  uri - URI to fingerprint.
Returns:
  Fingerprint of the passed URI.
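Fingerprints for the same server only stay close to each other in storage if the key's byte encoding preserves numeric order. A minimal sketch, assuming the 64-bit fingerprint is written as 8 big-endian bytes and compared as unsigned bytes in lexicographic order (BDB's default key comparison); under that assumption keys sharing a 24-bit host prefix sort into one run. KeyOrderSketch and its methods are hypothetical names, and this page does not confirm how the real class encodes its keys.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public final class KeyOrderSketch {
    /** Encode a 64-bit key as 8 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] toBytes(long key) {
        return ByteBuffer.allocate(Long.BYTES).putLong(key).array();
    }

    /** Unsigned, byte-by-byte lexicographic comparison of two keys. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Two hypothetical 24-bit host prefixes placed in the top bits.
        long hostA = 0x00ABCDL << 40, hostB = 0x00ABCEL << 40;
        List<Long> keys = new ArrayList<>(List.of(hostB | 7, hostA | 999, hostB | 1, hostA | 3));
        keys.sort((x, y) -> compareBytes(toBytes(x), toBytes(y)));
        // Keys for hostA print first as one group, then keys for hostB.
        for (long k : keys) System.out.printf("%016x%n", k);
    }
}
```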



flush
public long flush()



getCacheMisses
public synchronized long getCacheMisses() throws DatabaseException



getDatabaseConfig
protected DatabaseConfig getDatabaseConfig()
Returns:
  DatabaseConfig to use.



getLastCacheMissDiff
public long getLastCacheMissDiff()
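getCacheMisses and getLastCacheMissDiff carry no description, but the lastCacheMiss and lastCacheMissDiff fields suggest tracking the change in a cumulative cache-miss counter between samples. A sketch of that bookkeeping, assuming (this page does not confirm it) that each sample records the new cumulative count and stores the difference from the previous one; CacheMissTracker and sample are hypothetical names, and where the real counter comes from is left out.

```java
/** Hypothetical tracker for lastCacheMiss / lastCacheMissDiff style bookkeeping. */
public final class CacheMissTracker {
    private long lastCacheMiss;     // cumulative miss count at the previous sample
    private long lastCacheMissDiff; // misses accrued between the last two samples

    /** Records a new cumulative miss count, updates the delta, and returns the count. */
    synchronized long sample(long cumulativeMisses) {
        lastCacheMissDiff = cumulativeMisses - lastCacheMiss;
        lastCacheMiss = cumulativeMisses;
        return cumulativeMisses;
    }

    synchronized long getLastCacheMissDiff() {
        return lastCacheMissDiff;
    }

    public static void main(String[] args) {
        CacheMissTracker t = new CacheMissTracker();
        t.sample(100);
        t.sample(130);
        System.out.println(t.getLastCacheMissDiff()); // 30
    }
}
```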



initialize
protected void initialize(Environment env) throws DatabaseException
Method shared by constructors.
Parameters:
  env - Environment to use.
throws:
  DatabaseException -



open
protected void open(Environment env, DatabaseConfig dbConfig) throws DatabaseException



reopen
public void reopen(Environment env) throws DatabaseException
Call after deserializing an instance of this class. Opens the already-seen database in the passed environment.
Parameters:
  env - DB environment to use.
throws:
  DatabaseException -
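The alreadySeen field is declared transient, so a deserialized instance comes back without a database handle, which is why reopen must be called. The pattern can be sketched in plain Java; here a StringBuilder stands in for the transient Database handle and a String for the Environment, and ReopenSketch with its methods is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class ReopenSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    long count;                     // ordinary state: survives serialization
    transient StringBuilder handle; // stand-in for a transient Database handle

    ReopenSketch() { reopen("env-1"); }

    /** Re-acquire the transient resource, e.g. after deserialization. */
    void reopen(String env) {
        handle = new StringBuilder("open:").append(env);
    }

    /** Serialize and deserialize an instance in memory. */
    static ReopenSketch roundTrip(ReopenSketch in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream oin = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (ReopenSketch) oin.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ReopenSketch f = new ReopenSketch();
        f.count = 42;
        ReopenSketch copy = roundTrip(f);
        // The constructor is not run on deserialization, so the transient handle is null.
        System.out.println(copy.count + " " + (copy.handle == null)); // 42 true
        copy.reopen("env-2");
        System.out.println(copy.handle); // open:env-2
    }
}
```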



setAdd
protected boolean setAdd(CharSequence uri)



setCount
protected long setCount()



setRemove
protected boolean setRemove(CharSequence uri)



Fields inherited from org.archive.crawler.util.SetBasedUriUniqFilter
protected long duplicateCount
protected long duplicatesAtLastSample
protected PrintWriter profileLog
protected HasUriReceiver receiver

Methods inherited from org.archive.crawler.util.SetBasedUriUniqFilter
public void add(String key, CandidateURI value)
public void addForce(String key, CandidateURI value)
public void addNow(String key, CandidateURI value)
public void close()
public long count()
public void forget(String key, CandidateURI value)
public void note(String key)
public long pending()
protected void profileLog(String key)
public long requestFlush()
abstract protected boolean setAdd(CharSequence key)
abstract protected long setCount()
public void setDestination(HasUriReceiver receiver)
public void setProfileLog(File logfile)
abstract protected boolean setRemove(CharSequence key)
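The inherited add, forget and count methods sit on top of the abstract setAdd, setRemove and setCount hooks, which suggests a template-method split: the base class decides what happens to a URI, and the subclass (here, BDB) supplies the set. A simplified, hypothetical analogue, not the Heritrix classes: a HashSet replaces BDB, a Consumer replaces HasUriReceiver, and it assumes setAdd returns true only for a key not seen before.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical, simplified analogue of a set-based URI filter. */
abstract class SimpleSetFilter {
    private final Consumer<String> receiver; // stand-in for HasUriReceiver
    long duplicateCount;

    SimpleSetFilter(Consumer<String> receiver) { this.receiver = receiver; }

    /** Returns true if the key was newly added, false if it was already present. */
    protected abstract boolean setAdd(CharSequence key);
    protected abstract boolean setRemove(CharSequence key);
    protected abstract long setCount();

    /** Pass the URI on only the first time its key is seen. */
    public void add(String key, String uri) {
        if (setAdd(key)) {
            receiver.accept(uri);
        } else {
            duplicateCount++;
        }
    }

    /** Forget a key so that a later add() passes it through again. */
    public void forget(String key) { setRemove(key); }

    public long count() { return setCount(); }
}

/** In-memory subclass; a BDB-backed one would store fingerprints on disk instead. */
final class InMemoryFilter extends SimpleSetFilter {
    private final Set<String> seen = new HashSet<>();

    InMemoryFilter(Consumer<String> receiver) { super(receiver); }

    @Override protected boolean setAdd(CharSequence key) { return seen.add(key.toString()); }
    @Override protected boolean setRemove(CharSequence key) { return seen.remove(key.toString()); }
    @Override protected long setCount() { return seen.size(); }
}

public final class FilterDemo {
    public static void main(String[] args) {
        List<String> passed = new ArrayList<>();
        InMemoryFilter f = new InMemoryFilter(passed::add);
        f.add("http://a/", "http://a/");
        f.add("http://a/", "http://a/"); // duplicate: not passed on
        f.add("http://b/", "http://b/");
        System.out.println(passed + " " + f.count() + " " + f.duplicateCount); // [http://a/, http://b/] 2 1
    }
}
```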

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
