Processor module that writes the results of successful fetches to
files on disk.
Writes contents of one URI to one file on disk. The files are
arranged in a directory hierarchy based on the URI paths. In that sense
they mirror the file hierarchy that might exist on the servers.
There are a number of issues involved:
1. URIs can have arbitrary length, but file systems have length constraints.
2. URIs can contain characters that file systems prohibit.
3. URI paths are case-sensitive, but some file systems are case-insensitive.
This class tries very hard to map each URI into a file system path that
obeys all file system constraints and yet reasonably represents
the original URI.
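To make the mapping concrete, here is a minimal sketch of one way to turn a URI into a relative file path. It is not this class's actual algorithm. The class name UriPathSketch, the 255-character segment limit, the %XX escaping scheme, and the index.html default are all assumptions chosen for illustration. The sketch ignores the port and query string and does nothing about case-insensitive file systems.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UriPathSketch {
    // Hypothetical per-segment limit; real file systems differ.
    static final int MAX_SEGMENT_LENGTH = 255;
    // Hypothetical file name used when the URI path names a directory.
    static final String DIRECTORY_FILE = "index.html";

    /** Escapes every byte outside a conservative safe set as %XX, then truncates. */
    static String escapeSegment(String segment) {
        StringBuilder sb = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || c == '.' || c == '-' || c == '_';
            if (safe) {
                sb.append((char) c);
            } else {
                sb.append(String.format("%%%02X", c));
            }
        }
        String escaped = sb.toString();
        // "." and ".." have special meaning in most file systems.
        if (escaped.equals(".") || escaped.equals("..")) {
            escaped = escaped.replace(".", "%2E");
        }
        // Naive truncation: distinct long segments can collide after this step.
        return escaped.length() > MAX_SEGMENT_LENGTH
                ? escaped.substring(0, MAX_SEGMENT_LENGTH)
                : escaped;
    }

    /** Builds "host/segment/.../file" from the URI's host and decoded path. */
    static String toRelativePath(URI uri) {
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URI has no host: " + uri);
        }
        List<String> parts = new ArrayList<>();
        parts.add(escapeSegment(uri.getHost()));
        String path = uri.getPath() == null ? "" : uri.getPath();
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                parts.add(escapeSegment(segment));
            }
        }
        // An empty path or a trailing slash names a directory, so add a file name.
        if (path.isEmpty() || path.endsWith("/")) {
            parts.add(DIRECTORY_FILE);
        }
        return String.join("/", parts);
    }
}
```

Under these assumptions, toRelativePath(URI.create("http://example.com/docs/")) yields example.com/docs/index.html, and a path segment such as x:y becomes x%3Ay.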
Normally there is a single instance of this class per Heritrix
instance. This class is thread-safe; any number of threads can be in its
innerProcess method at once. However, conflicts can still arise in the file
system. For example, if several threads try to create the same directory at
the same time, only one can win. Therefore, at most one thread should be
accessing a given server at any one time.
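The directory race described above can be illustrated with a short sketch. This is not how this class handles the problem; it only shows one standard-library approach, assuming Java NIO is available. Files.createDirectories does not throw if the directory already exists, so several threads can request the same directory and all proceed. The class name MirrorDirSketch and its methods are hypothetical. Other conflicts, such as two threads writing the same file, are not addressed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

public class MirrorDirSketch {
    /** Creates any missing parent directories, then writes the file. */
    static void writeMirrored(Path file, byte[] content) throws IOException {
        Path parent = file.getParent();
        if (parent != null) {
            // Succeeds even if another thread created the directory first.
            Files.createDirectories(parent);
        }
        Files.write(file, content);
    }

    /** Writes one file per thread into a shared new directory; returns the file count. */
    static int runDemo(int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Path root = Files.createTempDirectory("mirror-sketch");
            Path dir = root.resolve("example.com").resolve("a").resolve("b");
            List<Future<Void>> pending = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                Path target = dir.resolve("file" + i + ".txt");
                Callable<Void> task = () -> {
                    writeMirrored(target, "x".getBytes(StandardCharsets.UTF_8));
                    return null;
                };
                pending.add(pool.submit(task));
            }
            for (Future<Void> f : pending) {
                f.get(); // rethrows any failure from the worker thread
            }
            try (Stream<Path> files = Files.list(dir)) {
                return (int) files.count();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling MirrorDirSketch.runDemo(8) starts eight threads that all need the same new directory; each should find it present and write its own file there.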
author: Howard Lee Gayle