Java Doc for DefaultBlockFileSystem.java in heritrix (org.archive.util.ms)



java.lang.Object
   org.archive.util.ms.DefaultBlockFileSystem

DefaultBlockFileSystem
public class DefaultBlockFileSystem implements BlockFileSystem
Default implementation of the Block File System.

The overall structure of a BlockFileSystem file (such as a .doc file) is as follows. The file is divided into blocks, which are of uniform length (512 bytes). The first block (at file pointer 0) is called the header block. It's used to look up other blocks in the file.

Subfiles contained within the .doc file are organized using a Block Allocation Table, or BAT. The BAT is basically a linked list; given a block number, the BAT will tell you the next block number. Note that the header block has no number; block #0 is the first block after the header. Thus, to convert a block number to a file pointer: int filePointer = (blockNumber + 1) * BLOCK_SIZE.
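The block arithmetic and the linked-list behaviour of the BAT can be illustrated with a minimal sketch. It treats the BAT as a plain in-memory int array, which is a simplification of what this class does. The class name BatChainSketch, the method names, and the END_OF_CHAIN value of -2 are assumptions made for illustration and are not taken from the Heritrix source.

```java
import java.util.ArrayList;
import java.util.List;

final class BatChainSketch {
    static final int BLOCK_SIZE = 512;
    // Assumed end-of-chain marker; the documentation above does not name one.
    static final int END_OF_CHAIN = -2;

    // Block #0 is the first block after the header, hence the +1.
    static long filePointer(int blockNumber) {
        return (blockNumber + 1L) * BLOCK_SIZE;
    }

    // Follows the BAT from firstBlock until the end-of-chain marker.
    static List<Integer> chain(int[] bat, int firstBlock) {
        List<Integer> blocks = new ArrayList<>();
        for (int b = firstBlock; b != END_OF_CHAIN; b = bat[b]) {
            blocks.add(b);
            if (blocks.size() > bat.length) {
                throw new IllegalStateException("cycle in BAT chain");
            }
        }
        return blocks;
    }

    public static void main(String[] args) {
        // bat[i] holds the number of the block that follows block i.
        int[] bat = {3, END_OF_CHAIN, END_OF_CHAIN, 1};
        System.out.println(chain(bat, 0));   // prints [0, 3, 1]
        System.out.println(filePointer(3));  // prints 2048
    }
}
```

The sketch returns a long rather than the int used in the formula above, so that large block numbers cannot overflow.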

The BAT itself is discontinuous, however. To find the blocks that comprise the BAT, you have to look in the header block. The header block contains an array of 109 pointers to the blocks that comprise the BAT. If more than 109 BAT blocks are required (in other words, if the .doc file is larger than ~6 megabytes), then something called the XBAT comes into play.

XBAT blocks contain pointers to the 110th BAT block and beyond. The first XBAT block is stored at a file pointer listed in the header. The other XBAT blocks are always stored in order after the first; the XBAT table is continuous. One is inclined to wonder why the BAT itself is not so stored, but oh well.
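To make the 109-pointer limit concrete, here is a rough sketch of how a block number might be mapped to the BAT block that describes it. It assumes each BAT entry is a 4-byte integer, so one 512-byte BAT block holds 128 entries; that entry size is not stated above. The class and method names are hypothetical.

```java
final class BatLookupSketch {
    static final int BLOCK_SIZE = 512;
    static final int HEADER_BAT_POINTERS = 109;
    // Assumption: BAT entries are 4-byte integers, giving 128 per block.
    static final int ENTRIES_PER_BAT_BLOCK = BLOCK_SIZE / 4;

    // Index of the BAT block (0-based) that holds the entry for blockNumber.
    static int batBlockIndex(int blockNumber) {
        return blockNumber / ENTRIES_PER_BAT_BLOCK;
    }

    // Byte offset of that entry inside its BAT block.
    static int offsetInBatBlock(int blockNumber) {
        return (blockNumber % ENTRIES_PER_BAT_BLOCK) * 4;
    }

    // True when the BAT block is not among the 109 listed in the header.
    static boolean needsXbat(int blockNumber) {
        return batBlockIndex(blockNumber) >= HEADER_BAT_POINTERS;
    }

    // Bytes addressable through the header's 109 BAT blocks alone.
    static long addressableBytesWithoutXbat() {
        return (long) HEADER_BAT_POINTERS * ENTRIES_PER_BAT_BLOCK * BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(batBlockIndex(130));            // prints 1
        System.out.println(needsXbat(13952));              // prints true
        System.out.println(addressableBytesWithoutXbat()); // prints 7143424
    }
}
```

Under that assumption the header's pointers cover about 6.8 megabytes, which is in line with the rough figure quoted above.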

The BAT only tells you the next block for a given block. To find the first block for a subfile, you have to look up that subfile's directory entry. Each directory entry is a 128-byte structure in the file, so four of them fit in a block. The number of the first block of the entry list is stored in the header. To find subsequent entry blocks, the BAT must be used.
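The directory lookup described here can be sketched as follows, again using an in-memory int array in place of the real BAT. The class name, the method name, the END_OF_CHAIN value, and the -1 "not found" result are all assumptions for illustration. The actual getEntry method returns an Entry object or null.

```java
final class EntryLocatorSketch {
    static final int BLOCK_SIZE = 512;
    static final int ENTRY_SIZE = 128;
    static final int ENTRIES_PER_BLOCK = BLOCK_SIZE / ENTRY_SIZE; // 4
    // Assumed end-of-chain marker, not named in the documentation above.
    static final int END_OF_CHAIN = -2;

    // Returns the file pointer of the given directory entry, or -1 if the
    // entry-list chain ends before reaching it.
    static long entryFilePointer(int[] bat, int firstEntryBlock, int entryNumber) {
        if (entryNumber < 0) {
            return -1;
        }
        int block = firstEntryBlock;
        // Each hop through the BAT skips four entries.
        for (int i = 0; i < entryNumber / ENTRIES_PER_BLOCK; i++) {
            block = bat[block];
            if (block == END_OF_CHAIN) {
                return -1;
            }
        }
        long blockStart = (block + 1L) * BLOCK_SIZE;
        return blockStart + (long) (entryNumber % ENTRIES_PER_BLOCK) * ENTRY_SIZE;
    }

    public static void main(String[] args) {
        // The entry list occupies block 0, then block 2.
        int[] bat = {2, END_OF_CHAIN, END_OF_CHAIN};
        System.out.println(entryFilePointer(bat, 0, 3)); // prints 896
        System.out.println(entryFilePointer(bat, 0, 5)); // prints 1664
        System.out.println(entryFilePointer(bat, 0, 8)); // prints -1
    }
}
```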

I'm telling you all this so that you understand the caching that this class provides.

First, directory entries are not cached. It's assumed that they will be looked up at the beginning of a lengthy operation, and then forgotten about. This is certainly the case for Doc.getText(BlockFileSystem). If you need to remember directory entries, you can manually store the Entry objects in a map or something, as they don't grow stale.

This class keeps all 512 bytes of the header block in memory at all times. This prevents a potentially expensive file pointer repositioning every time you're trying to figure out what comes next.

BAT and XBAT blocks are stored in a least-recently-used (LRU) cache. The n most recently used BAT and XBAT blocks are remembered, where n is set at construction time. The minimum value of n is 1. For small files, this can prevent file pointer repositioning for BAT lookups.

The BAT/XBAT cache only takes up memory as needed. If the specified cache size is 100 blocks, but the file only has 4 BAT blocks, then only 2048 bytes will be used by the cache.
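One common way to get this behaviour in Java is a LinkedHashMap in access order, sketched below. This is not necessarily how DefaultBlockFileSystem implements its cache. The class name BlockLruCacheSketch is hypothetical, and keying the map by block number is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Maps a block number to that block's 512 bytes, evicting the least
// recently used block once more than maxBlocks are held.
final class BlockLruCacheSketch extends LinkedHashMap<Integer, byte[]> {
    private static final long serialVersionUID = 1L;
    private final int maxBlocks;

    BlockLruCacheSketch(int maxBlocks) {
        super(16, 0.75f, true); // true = iterate in access order
        if (maxBlocks < 1) {
            throw new IllegalArgumentException("cache size must be at least 1");
        }
        this.maxBlocks = maxBlocks;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
        return size() > maxBlocks;
    }

    public static void main(String[] args) {
        BlockLruCacheSketch cache = new BlockLruCacheSketch(2);
        cache.put(0, new byte[512]);
        cache.put(1, new byte[512]);
        cache.get(0);                 // block 0 is now the most recently used
        cache.put(2, new byte[512]);  // evicts block 1
        System.out.println(cache.keySet()); // prints [0, 2]
    }
}
```

Because entries are only created on put, a cache sized for 100 blocks that has only ever seen 4 BAT blocks holds 4 arrays of 512 bytes, matching the 2048-byte figure above.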

Note that this class only caches BAT and XBAT blocks. It does not cache the blocks that actually make up a subfile's contents. It is assumed that those blocks will only be accessed once per operation (again, this is what Doc.getText(BlockFileSystem) typically requires).
author:
   pjack
See Also:    http://jakarta.apache.org/poi/poifs/fileformat.html




Constructor Summary
public  DefaultBlockFileSystem(SeekInputStream input, int batCacheSize)
     Constructor.

Method Summary
 Entry getEntry(int entryNumber)
     Returns the entry with the given number.
public  int getNextBlock(int block)

public  SeekInputStream getRawInput()

public  Entry getRoot()



Constructor Detail
DefaultBlockFileSystem
public DefaultBlockFileSystem(SeekInputStream input, int batCacheSize) throws IOException
Constructor.
Parameters:
  input - the file to read from
  batCacheSize - number of BAT and XBAT blocks to cache
throws:
  IOException - if an IO error occurs




Method Detail
getEntry
Entry getEntry(int entryNumber) throws IOException
Returns the entry with the given number.
Parameters:
  entryNumber - the number of the entry to return
Returns:
  that entry, or null if no such entry exists
throws:
  IOException - if an IO error occurs



getNextBlock
public int getNextBlock(int block) throws IOException



getRawInput
public SeekInputStream getRawInput()



getRoot
public Entry getRoot() throws IOException



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
