Provides a (Replay)CharSequence view on recorded streams (a prefix
buffer and overflow backing file) that can handle streams of multibyte
characters.
If possible, use
ByteReplayCharSequence. It performs better even
for the single-byte case (decoding is an expensive process).
Call close on this class when done so that it can clean up resources.
The implementation currently works by checking whether the content to read
fits entirely in the in-memory buffer. If so, we decode it into a CharBuffer
and keep this around for CharSequence operations. This CharBuffer is
discarded on close.
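The in-memory case can be sketched roughly as follows. This is an illustrative sketch only, not this class's actual code: the class name InMemoryDecodeSketch and the method decode are hypothetical, and only standard java.nio calls are used.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named for illustration only.
final class InMemoryDecodeSketch {

    /** Decodes the first {@code length} bytes of {@code buffer} into a CharBuffer. */
    static CharBuffer decode(byte[] buffer, int length, Charset charset) {
        // Charset.decode replaces malformed or unmappable input with the
        // charset's replacement string instead of throwing.
        return charset.decode(ByteBuffer.wrap(buffer, 0, length));
    }

    public static void main(String[] args) {
        byte[] recorded = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        // CharBuffer implements CharSequence, so it can be handed to a regex directly.
        CharSequence cs = decode(recorded, recorded.length, StandardCharsets.UTF_8);
        System.out.println(cs.length() + " chars decoded from " + recorded.length + " bytes");
    }
}
```

Because the result is an ordinary CharBuffer, discarding it on close amounts to dropping the reference.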
If the content length is greater than the in-memory buffer, we decode the
buffer plus backing file into a new file named for the backing file with
a suffix of the encoding we write the file as. We then run with a
memory-mapped CharBuffer against this file to implement CharSequence.
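A minimal sketch of the overflow case described above, under stated assumptions: the on-disk encoding is assumed here to be UTF-16BE (fixed two bytes per char, matching a ByteBuffer's default big-endian order), and the names DecodeToFileSketch, decodeToFile and map are hypothetical rather than taken from this class.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.SequenceInputStream;
import java.io.Writer;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper, named for illustration only.
final class DecodeToFileSketch {
    // Assumption: a fixed-width encoding so the mapped bytes can be viewed as chars.
    static final Charset WRITE_ENCODING = StandardCharsets.UTF_16BE;

    /** Decodes prefix buffer plus backing file into "<backing file>.<encoding>". */
    static Path decodeToFile(byte[] prefix, int prefixLength, Path backingFile,
            Charset source) throws IOException {
        Path decoded = backingFile.resolveSibling(
            backingFile.getFileName() + "." + WRITE_ENCODING.name());
        // Reading both parts as one stream lets the decoder handle a character
        // whose bytes straddle the buffer/file boundary.
        try (InputStream in = new SequenceInputStream(
                 new ByteArrayInputStream(prefix, 0, prefixLength),
                 Files.newInputStream(backingFile));
             Reader reader = new BufferedReader(new InputStreamReader(in, source));
             Writer writer = new BufferedWriter(new OutputStreamWriter(
                 Files.newOutputStream(decoded), WRITE_ENCODING))) {
            char[] chunk = new char[8192];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                writer.write(chunk, 0, n);
            }
        }
        return decoded;
    }

    /** Maps the decoded file and views it as chars; a single mapping cannot exceed Integer.MAX_VALUE bytes. */
    static CharBuffer map(Path decoded) throws IOException {
        try (FileChannel channel = FileChannel.open(decoded, StandardOpenOption.READ)) {
            MappedByteBuffer bytes =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // The mapping remains valid after the channel is closed.
            return bytes.asCharBuffer();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] all = "pr\u00e9fixe-\u65e5\u672c\u8a9e".getBytes(StandardCharsets.UTF_8);
        Path backing = Files.createTempFile("recorded", ".bin");
        // Pretend the first 3 bytes sit in the prefix buffer and the rest overflowed to disk.
        Files.write(backing, Arrays.copyOfRange(all, 3, all.length));
        CharSequence cs = map(decodeToFile(all, 3, backing, StandardCharsets.UTF_8));
        System.out.println(cs.length() + " chars: " + cs);
    }
}
```

Once the decoded file exists, its length in chars is simply its byte length divided by two, which is what makes the mapped CharBuffer a cheap CharSequence.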
The reason for this implementation is that CharSequence must return the
length of the sequence in characters, which for a multibyte encoding is
not known until the content has been decoded.
An obvious optimization would be to keep decodings around, whether the
in-memory decoded buffer or the file of decodings written to disk, but the
general usage pattern when processing URIs is that the decoding is used by
one processor only. Also of note, files usually fit into the in-memory
buffer.
We might also be able to keep three windows that move across the file,
decoding a window at a time and trying to keep one of the buffers just in
front of the regex processing. The length returned would be only the
length from the current position to the end of the current block, or else
the length could be obtained by multiplying the backing file's length by
the decoder's estimate of average character size. This would save us
writing out the decoded file. We'd have to do the latter for files that are
> Integer.MAX_VALUE.
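The length estimate mentioned above might look like the sketch below. It is not part of this class: LengthEstimateSketch and estimateCharLength are hypothetical names, and the sketch assumes that CharsetDecoder.averageCharsPerByte() is the decoder estimate meant here. The result is an approximation, not an exact character count.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, named for illustration only.
final class LengthEstimateSketch {

    /** Estimates how many chars {@code byteLength} bytes decode to, without decoding them. */
    static long estimateCharLength(long byteLength, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder();
        // averageCharsPerByte() is the decoder's own heuristic; the true
        // count for a variable-width encoding can differ from this estimate.
        return (long) Math.ceil(byteLength * (double) decoder.averageCharsPerByte());
    }

    public static void main(String[] args) {
        long bytes = 3L * Integer.MAX_VALUE; // larger than a single mapping allows
        System.out.println(estimateCharLength(bytes, StandardCharsets.UTF_8));
    }
}
```

Note that the estimate is a long, whereas CharSequence.length() returns an int, so a windowed implementation would still need to report lengths per block.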
author: stack
version: $Revision: 4844 $, $Date: 2007-01-10 17:18:34 +0000 (Wed, 10 Jan 2007) $