Java Doc for HttpDocCache.java in JoBo (Web Crawler), package net.matuschek.http



java.lang.Object
   net.matuschek.http.HttpDocCache

HttpDocCache
public class HttpDocCache implements HttpDocManager
Full implementation of the HttpDocManager interface. Caches documents, links and headers in ZIP files. Documents with the same content are detected and share the same content storage.
author:
   Oliver Schmidt
version:
   $Revision: 1.2 $
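
The class description mentions that documents, links and headers are cached in ZIP files. The following is only a minimal sketch of that general idea using java.util.zip, not the JoBo implementation: the class name ZipCacheSketch and the entry names "headers", "links" and "content" are assumptions made for illustration, and the archive is kept in memory rather than in a storage directory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of JoBo.
class ZipCacheSketch {

    /** Writes each named part into its own ZIP entry and returns the archive bytes. */
    static byte[] pack(Map<String, String> parts) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> part : parts.entrySet()) {
                zos.putNextEntry(new ZipEntry(part.getKey()));
                zos.write(part.getValue().getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Reads every entry of the archive back into a name-to-text map. */
    static Map<String, String> unpack(byte[] archive) throws IOException {
        Map<String, String> parts = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                // read() returns -1 at the end of the current entry
                while ((n = zis.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                parts.put(entry.getName(), new String(buf.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("headers", "Content-Type: text/html");   // assumed entry name
        parts.put("links", "http://example.org/next");     // assumed entry name
        parts.put("content", "<html><body>hello</body></html>");

        byte[] archive = pack(parts);
        System.out.println(unpack(archive).equals(parts)); // prints "true"
    }
}
```

Keeping headers, links and content as separate entries lets a reader load only the part it needs, which may be why the class exposes separate read and write methods for each.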


Field Summary
final protected static String CONTENT
    
final protected static String CONTENT_DUPLICATE
    
final protected static String DOCUMENTS
    
final static String LF
    
final protected static String LINKS
    
final static String QUOTE
    
protected static Category log
    
protected int storageDirDepth
     Depth of source set directory.
protected File storageDirectoryFile
    
protected String storagedir
    
public boolean useMD5
    

Constructor Summary
public  HttpDocCache(String storageDirectory)
    

Method Summary
protected File contentFile(String hex, String extension)
     Returns a File with the mapping of this content to its URLs.
protected void finalize()
     Calls finish and super.finalize().
public String findDuplicate(HttpDoc doc)
     Returns URL-String of duplicate content (if found).
public void finish()
     Close storageDirectory File.
protected String generateFilename(String docURI)
     Generate a valid filename for the given docURI.
protected String getDefaultExtension(String contentType)
     Get default extension for given contentType.
public int getStorageDirDepth()
     Returns the directory depth of the source set directory.
public void processDocument(HttpDoc doc)
     Collects URLs (duplicates will be skipped).
protected void readContentFromZipFile(HttpDoc doc, ZipFile contentZip)
     Read content from ZipFile.
protected boolean readHeadersFromZipFile(HttpDoc doc, ZipFile zf)
     Read headers from ZipFile.
protected boolean readLinksFromZipFile(HttpDoc doc, ZipFile zf)
     Read links from ZipFile.
public void removeDocument(URL url)
     Remove document from cache.
public HttpDoc retrieveFromCache(java.net.URL url)
     Retrieves a document from the cache.
public void setStorageDirDepth(int depth)
     Sets the desired directory depth of the source set directory.
protected void storeContent(HttpDoc doc)
     Creates a file with a name created by the content, containing the URL.
public void storeDocument(HttpDoc doc)
     Stores the document to the storage directory.
public String toString()
     List collected URLs.
protected void writeContentToZipFile(HttpDoc doc, ZipOutputStream zos)
     Write content to ZipFile.
protected void writeDirectoryInfo(HttpDoc doc, String filename)
     Write Directory info.
protected ZipEntry writeHeadersToZipFile(HttpDoc doc, ZipOutputStream zos)
     Write headers to ZipFile.
protected void writeLinksToZipFile(List links, ZipOutputStream zs)
     Write links to ZipFile.
protected ZipEntry writeUrlToZipFile(HttpDoc doc, ZipOutputStream zos)
     Write URL to ZipFile.

Field Detail
CONTENT
final protected static String CONTENT
subdirectory name for content



CONTENT_DUPLICATE
final protected static String CONTENT_DUPLICATE
internally used header name to mark duplicates



DOCUMENTS
final protected static String DOCUMENTS
subdirectory name for document information



LF
final static String LF




LINKS
final protected static String LINKS
subdirectory name for links



QUOTE
final static String QUOTE




log
protected static Category log
log4j logging instance



storageDirDepth
protected int storageDirDepth
Depth of the source set directory (depth = number of subdirectory levels used). The first storageDirDepth characters of the file name are used as directories.
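
As a rough illustration of the rule described for storageDirDepth, the sketch below turns the leading characters of a file name into nested directories. It is a guess at the scheme, not the JoBo code: the helper name resolve, the one-character-per-level split and keeping the full file name at the end are all assumptions.

```java
import java.io.File;

// Hypothetical helper, not part of JoBo.
class StorageDirDepthSketch {

    /**
     * Places filename below baseDir, using one directory level per leading
     * character, for at most depth levels (assumed interpretation).
     */
    static File resolve(File baseDir, String filename, int depth) {
        File dir = baseDir;
        int levels = Math.min(Math.max(depth, 0), filename.length());
        for (int i = 0; i < levels; i++) {
            dir = new File(dir, String.valueOf(filename.charAt(i)));
        }
        return new File(dir, filename);
    }

    public static void main(String[] args) {
        // With depth 2 the file ends up two directory levels below "cache";
        // the printed separator depends on the platform.
        System.out.println(resolve(new File("cache"), "3fa9c1.zip", 2).getPath());
    }
}
```

Spreading files over subdirectories this way keeps any single directory from growing very large, which is the usual motivation for such a depth setting.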



storageDirectoryFile
protected File storageDirectoryFile
file that holds directory information



storagedir
protected String storagedir
storage main directory



useMD5
public boolean useMD5
use MD5 encoding for filenames
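
To make the useMD5 option more concrete, here is a small sketch of deriving a file-system-safe name from a URL by hashing it with MD5. How JoBo actually builds its names is not documented on this page; the helper md5Hex and the choice of UTF-8 and lowercase hex are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of JoBo.
class Md5FilenameSketch {

    /** Returns the MD5 digest of text as 32 lowercase hex characters. */
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        // The hex string contains only [0-9a-f], so it is safe as a file name.
        System.out.println(md5Hex("http://example.org/index.html") + ".zip");
    }
}
```

A hashed name has a fixed length and no characters that are illegal in file names, at the cost of no longer being readable.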




Constructor Detail
HttpDocCache
public HttpDocCache(String storageDirectory)
Constructor
Parameters:
  storageDirectory -




Method Detail
contentFile
protected File contentFile(String hex, String extension)
Returns a File with the mapping of this content to its URLs.
Parameters:
  hex -
  extension -



finalize
protected void finalize() throws Throwable
Calls finish and super.finalize().
See Also:   java.lang.Object.finalize



findDuplicate
public String findDuplicate(HttpDoc doc) throws IOException
Returns URL-String of duplicate content (if found).
See Also:   net.matuschek.http.HttpDocManager.findDuplicate(HttpDoc)
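
One common way to detect documents with identical content is to index them by a digest of their bytes. The sketch below shows that technique in isolation; it is not the JoBo implementation, which works on HttpDoc objects and files. The class DuplicateIndexSketch, its in-memory map, the use of MD5 as the digest and returning null when no duplicate exists are all assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a duplicate lookup, not part of JoBo.
class DuplicateIndexSketch {

    // Maps a content digest to the first URL seen with that content.
    private final Map<String, String> firstUrlByDigest = new HashMap<>();

    /**
     * Returns the URL already registered for identical content, or null if
     * this content is new (in which case url is registered for it).
     */
    String findDuplicate(String url, byte[] content) {
        String known = firstUrlByDigest.putIfAbsent(digestKey(content), url);
        return (known == null || known.equals(url)) ? null : known;
    }

    private static String digestKey(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        DuplicateIndexSketch index = new DuplicateIndexSketch();
        byte[] page = "<html>same</html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(index.findDuplicate("http://example.org/a", page)); // prints "null"
        System.out.println(index.findDuplicate("http://example.org/b", page)); // prints "http://example.org/a"
    }
}
```

Comparing digests instead of full bodies keeps the lookup cheap; a stricter variant could compare the bytes as well when two digests match.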



finish
public void finish()
Close storageDirectory File.
See Also:   net.matuschek.http.HttpDocManager.finish



generateFilename
protected String generateFilename(String docURI)
Generate a valid filename for the given docURI.
Parameters:
  docURI -
Returns:
  the generated filename



getDefaultExtension
protected String getDefaultExtension(String contentType)
Get default extension for given contentType.
Parameters:
  contentType -
Returns:
  default extension or null
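
A lookup of this kind is often a small table from MIME type to file extension. The sketch below shows one way it could work; the table contents, the leading dot in the extensions and the stripping of parameters such as "; charset=UTF-8" are assumptions, and only the "extension or null" result is taken from the description above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of JoBo.
class DefaultExtensionSketch {

    // Illustrative table only; the real mapping is not documented here.
    private static final Map<String, String> EXTENSIONS = new HashMap<>();
    static {
        EXTENSIONS.put("text/html", ".html");
        EXTENSIONS.put("text/plain", ".txt");
        EXTENSIONS.put("image/gif", ".gif");
        EXTENSIONS.put("image/jpeg", ".jpg");
    }

    /** Returns the default extension for contentType, or null if none is known. */
    static String defaultExtension(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Drop optional parameters such as "; charset=UTF-8".
        int semicolon = contentType.indexOf(';');
        String mime = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        return EXTENSIONS.get(mime);
    }

    public static void main(String[] args) {
        System.out.println(defaultExtension("text/html; charset=UTF-8")); // prints ".html"
        System.out.println(defaultExtension("application/x-unknown"));    // prints "null"
    }
}
```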



getStorageDirDepth
public int getStorageDirDepth()
Returns the directory depth of the source set directory.



processDocument
public void processDocument(HttpDoc doc) throws DocManagerException
Collects URLs (duplicates will be skipped).
Parameters:
  doc - an HttpDoc object to process. This may also be null.
exception:
  DocManagerException - will be thrown if an error occurs while processing the document.
See Also:   net.matuschek.http.HttpDocManager.processDocument(net.matuschek.http.HttpDoc)



readContentFromZipFile
protected void readContentFromZipFile(HttpDoc doc, ZipFile contentZip) throws IOException
Read content from ZipFile.
Parameters:
  doc -
  contentZip -
throws:
  IOException -



readHeadersFromZipFile
protected boolean readHeadersFromZipFile(HttpDoc doc, ZipFile zf) throws IOException
Read headers from ZipFile.
Parameters:
  doc -
  zf -
Returns:
  boolean
throws:
  IOException -



readLinksFromZipFile
protected boolean readLinksFromZipFile(HttpDoc doc, ZipFile zf) throws IOException
Read links from ZipFile.
Parameters:
  doc -
  zf -
Returns:
  boolean
throws:
  IOException -



removeDocument
public void removeDocument(URL url)
Remove document from cache.
Parameters:
  url -
See Also:   net.matuschek.http.HttpDocManager.removeDocument(URL)



retrieveFromCache
public HttpDoc retrieveFromCache(java.net.URL url)
Retrieves a document from the cache.
Parameters:
  url -
See Also:   net.matuschek.http.HttpDocManager.retrieveFromCache(java.net.URL)



setStorageDirDepth
public void setStorageDirDepth(int depth)
Sets the desired directory depth of the source set directory (depth = number of subdirectory levels used).
Parameters:
  depth - desired depth of source set directory.



storeContent
protected void storeContent(HttpDoc doc) throws IOException
Creates a file with a name created by the content, containing the URL.
Parameters:
  doc -



storeDocument
public void storeDocument(HttpDoc doc) throws DocManagerException
Stores the document to the storage directory.
Parameters:
  doc - the document to be stored
throws:
  DocManagerException - if the document cannot be written to the directory



toString
public String toString()
List collected URLs.
See Also:   java.lang.Object.toString



writeContentToZipFile
protected void writeContentToZipFile(HttpDoc doc, ZipOutputStream zos) throws IOException
Write content to ZipFile.
Parameters:
  doc -
  zos -
throws:
  IOException -



writeDirectoryInfo
protected void writeDirectoryInfo(HttpDoc doc, String filename) throws IOException
Write Directory info.
Parameters:
  doc -
  filename - in cache
throws:
  IOException -



writeHeadersToZipFile
protected ZipEntry writeHeadersToZipFile(HttpDoc doc, ZipOutputStream zos) throws IOException
Write headers to ZipFile.
Parameters:
  doc -
  zos -
Returns:
  ZipEntry
throws:
  IOException -



writeLinksToZipFile
protected void writeLinksToZipFile(List links, ZipOutputStream zs) throws IOException
Write links to ZipFile.
Parameters:
  links -
  zs -



writeUrlToZipFile
protected ZipEntry writeUrlToZipFile(HttpDoc doc, ZipOutputStream zos) throws IOException
Write URL to ZipFile.
Parameters:
  doc -
  zos -
Returns:
  ZipEntry
throws:
  IOException -



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
