The POI Browser is a very simple Swing GUI tool that
displays the internal structure of a Microsoft Office file. It concentrates
on streams in the Horrible Property Set Format (HPSF). To access these
streams, the POI Browser uses the package
org.apache.poi.hpsf.
A file in Microsoft's Office format can be seen as a filesystem within a
file. For example, a Word document like sample.doc is just a
simple file from the operating system's point of view. Internally, however,
it is organized into directories and files. In the simplest case,
sample.doc might consist of just three internal files (or
"streams", as Microsoft calls them) \001CompObj,
\005SummaryInformation, and WordDocument. (In these names
\001 and \005 denote the unprintable characters with the character codes 1
and 5, respectively.) A more complex Word file, such as one with embedded
objects, typically contains a directory named ObjectPool with further
directories and files nested within it.
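Because names like \001CompObj contain unprintable characters, any tool that lists streams has to render them somehow. The following Java sketch shows one way to do it, using the same octal notation as this page. The class StreamNames and its method printable are hypothetical illustrations, not part of POI or the POI Browser.

```java
// Hypothetical helper (not part of POI): makes control characters in
// stream names visible, using three-digit octal escapes like "\005".
final class StreamNames {
    private StreamNames() {}

    static String printable(String name) {
        StringBuilder sb = new StringBuilder(name.length() + 8);
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x20) {
                // Control character: emit a backslash plus its octal code.
                sb.append(String.format("\\%03o", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

In Java source, the string literal "\005SummaryInformation" already contains the raw character with code 5, so printable turns it back into the readable form shown in this document.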
The POI Browser makes these internal structures visible. It takes one or
more Microsoft Office files as command-line arguments and shows their
directories and files as a tree. At the top level, the POI Browser displays the
(operating system) filenames. An internal file (i.e., a "stream" or a
"document") is shown with its name, its size and a hexadecimal dump of its
first bytes.
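A hexadecimal dump of a stream's first bytes can be sketched in plain Java as follows. The class FirstBytes, its method hexDump, and the layout (an offset followed by up to 16 bytes per line) are assumptions made for illustration. The POI Browser's actual output format may differ.

```java
// Hypothetical helper (not the POI Browser's code): formats at most
// `limit` bytes of `data` as hex, 16 bytes per line, each line prefixed
// with its offset.
final class FirstBytes {
    private FirstBytes() {}

    static String hexDump(byte[] data, int limit) {
        int n = Math.min(data.length, limit);
        StringBuilder sb = new StringBuilder();
        for (int row = 0; row < n; row += 16) {
            sb.append(String.format("%08X:", row));           // line offset
            for (int i = row; i < Math.min(row + 16, n); i++) {
                sb.append(String.format(" %02X", data[i] & 0xFF)); // unsigned byte
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

For example, the bytes FE FF 00 00 with a limit of 16 produce the single line "00000000: FE FF 00 00".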
Property Set Streams
The POI Browser pays special attention to property set streams. For
example, the \005SummaryInformation stream contains information
such as the title and author of the document. The POI Browser opens every stream
in a POI filesystem. If it encounters a property set stream, it does not stop at
the first bytes: it parses the whole stream and displays its contents in a
reasonably readable form.
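How can a tool recognize a property set stream? Such a stream starts with a fixed 28-byte header, stored little-endian: a byte-order mark (0xFFFE), a format version, a 4-byte system identifier, a 16-byte class ID and a 4-byte section count. The check below, PropertySetSniffer.looksLikePropertySet, is a hypothetical and simplified sketch of that test. It assumes the version field is 0 or 1, as the [MS-OLEPS] specification allows, and it is not the code HPSF or the POI Browser uses. A real parser must also validate the section offsets that follow the header.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical, simplified heuristic (not HPSF's implementation): checks
// only the fixed header fields of a property set stream.
final class PropertySetSniffer {
    private PropertySetSniffer() {}

    // byte order (2) + version (2) + system id (4) + class ID (16) + section count (4)
    static final int HEADER_LENGTH = 28;

    static boolean looksLikePropertySet(byte[] stream) {
        if (stream.length < HEADER_LENGTH) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int byteOrder = buf.getShort(0) & 0xFFFF;       // bytes FE FF read as 0xFFFE
        int version = buf.getShort(2) & 0xFFFF;         // assumed to be 0 or 1
        long sections = buf.getInt(24) & 0xFFFFFFFFL;   // unsigned section count
        return byteOrder == 0xFFFE
                && (version == 0 || version == 1)
                && sections >= 1;
    }
}
```

A stream such as WordDocument normally fails this test, so a browser can fall back to a plain hex dump for it.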
Running POI Browser
Running the POI Browser requires you to start a Java Virtual Machine
(JVM) and to set up a valid classpath so that the JVM can find all the Java
classes it needs: the main POI classes and the "contrib" POI classes, which
are packaged in two separate jar files.
The following instructions assume that you have set up your Java
environment variables properly, i.e., the variable JAVA_HOME contains the
name of your Java installation directory and the variable PATH includes the
bin subdirectory of the Java installation directory. At the time
of this writing, the current POI version was 2.5.1-final, dated August
4th, 2004. The example commands reflect this version number and
date. Adjust the commands accordingly if you are running a later or earlier
version of the POI Browser.
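Since the jar file names embed both the version number and the release date, adapting the commands to another release mostly means substituting those two values. The hypothetical PoiJars helper below sketches that naming pattern as inferred from the 2.5.1 example. Other POI releases may name or package their jars differently, so check the files in your distribution.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of POI): reproduces the jar naming
// pattern artifact-version-date.jar seen in the 2.5.1 example commands.
final class PoiJars {
    private PoiJars() {}

    // e.g. jarName("poi", "2.5.1-final", "20040804") -> "poi-2.5.1-final-20040804.jar"
    static String jarName(String artifact, String version, String date) {
        return artifact + "-" + version + "-" + date + ".jar";
    }

    // The two jars the POI Browser needs: main POI classes and contrib classes.
    static List<String> browserJars(String version, String date) {
        return Arrays.asList(
                jarName("poi", version, date),
                jarName("poi-contrib", version, date));
    }
}
```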
Running POI Browser on Unix
Suppose you have unpacked the POI 2.5.1 release in the
/opt/local/poi directory of your Unix box. Then the following
command starts the POI Browser and displays the structure of the files
MyWord.doc, MyExcel.xls and
MyPowerpoint.ppt:
java -classpath /opt/local/poi/poi-2.5.1-final-20040804.jar:/opt/local/poi/poi-contrib-2.5.1-final-20040804.jar org.apache.poi.contrib.poibrowser.POIBrowser MyWord.doc MyExcel.xls MyPowerpoint.ppt
Running POI Browser on Windows
Suppose you have unpacked the POI 2.5.1 release in the
C:\Programs\POI directory of your Windows box. Then the following
command starts the POI Browser and displays the structure of the files
MyWord.doc, MyExcel.xls and
MyPowerpoint.ppt:
java -classpath C:\Programs\POI\poi-2.5.1-final-20040804.jar;C:\Programs\POI\poi-contrib-2.5.1-final-20040804.jar org.apache.poi.contrib.poibrowser.POIBrowser MyWord.doc MyExcel.xls MyPowerpoint.ppt
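Apart from the directory names, the Unix and Windows commands differ only in the classpath separator: a colon on Unix and a semicolon on Windows. If you want to assemble the command from Java code, the standard constant File.pathSeparator picks the right separator for the current platform. The BrowserCommand class below is a hypothetical sketch that only builds the argument list; it assumes Java 8 or later (for String.join).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of POI): builds the POI Browser command
// line, joining the jar paths with the platform's classpath separator
// (":" on Unix, ";" on Windows).
final class BrowserCommand {
    private BrowserCommand() {}

    static List<String> build(List<String> jars, List<String> documents) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-classpath");
        cmd.add(String.join(File.pathSeparator, jars));
        cmd.add("org.apache.poi.contrib.poibrowser.POIBrowser");
        cmd.addAll(documents); // the Office files to display
        return cmd;
    }
}
```

The resulting list could be passed to new ProcessBuilder(cmd).inheritIO().start(). Handing the arguments over as a list rather than as one string avoids shell quoting problems with paths that contain spaces, such as many Windows installation directories.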