| java.lang.Object org.apache.xml.utils.FastStringBuffer
FastStringBuffer | public class FastStringBuffer (Code) | | Bare-bones, unsafe, fast string buffer. No thread-safety, no
parameter range checking, exposed fields. Note that in typical
applications, thread-safety of a StringBuffer is a somewhat
dubious concept in any case.
Note that Stree and DTM used a single FastStringBuffer as a string pool,
by recording start and length indices within this single buffer. This
minimizes heap overhead, but of course requires more work when retrieving
the data.
FastStringBuffer operates as a "chunked buffer". Doing so
reduces the need to recopy existing information when an append
exceeds the space available; we just allocate another chunk and
flow across to it. (The array of chunks may need to grow,
admittedly, but that's a much smaller object.) Some excess
recopying may arise when we extract Strings which cross chunk
boundaries; larger chunks make that less frequent.
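The chunked-buffer idea can be illustrated with a minimal sketch. `ChunkedCharBuffer` is a hypothetical class written for illustration, not the FastStringBuffer implementation. It assumes a single fixed chunk size (no m_innerFSB rebundling) and performs no range checking. When the current chunk fills, append() starts a new chunk instead of recopying existing text. Only the small outer array of chunk references ever grows.

```java
import java.util.Arrays;

/** Hypothetical fixed-chunk-size sketch of a chunked char buffer; not Xalan's code. */
final class ChunkedCharBuffer {
    private final int chunkBits;   // log2 of the chunk size
    private final int chunkSize;   // 1 << chunkBits
    private final int chunkMask;   // chunkSize - 1, for shift-and-mask addressing
    private char[][] chunks = new char[4][];
    private int lastChunk = 0;     // chunk currently receiving appends
    private int firstFree = 0;     // next free column in chunks[lastChunk]

    ChunkedCharBuffer(int chunkBits) {
        this.chunkBits = chunkBits;
        this.chunkSize = 1 << chunkBits;
        this.chunkMask = chunkSize - 1;
        chunks[0] = new char[chunkSize];
    }

    void append(char c) {
        if (firstFree == chunkSize) {
            // Current chunk is full: flow into a new chunk; existing text is not recopied.
            lastChunk++;
            if (lastChunk == chunks.length) {
                // Only the array of chunk references is copied, never the characters.
                chunks = Arrays.copyOf(chunks, chunks.length * 2);
            }
            if (chunks[lastChunk] == null) {
                chunks[lastChunk] = new char[chunkSize];
            }
            firstFree = 0;
        }
        chunks[lastChunk][firstFree++] = c;
    }

    void append(String s) {
        for (int i = 0; i < s.length(); i++) {
            append(s.charAt(i));
        }
    }

    int length() {
        return (lastChunk << chunkBits) + firstFree;
    }

    char charAt(int pos) {
        // High-order bits select the chunk; low-order bits select the column.
        return chunks[pos >>> chunkBits][pos & chunkMask];
    }

    @Override
    public String toString() {
        // Strings crossing chunk boundaries require copying each chunk's slice.
        StringBuilder sb = new StringBuilder(length());
        for (int i = 0; i < lastChunk; i++) {
            sb.append(chunks[i], 0, chunkSize);
        }
        sb.append(chunks[lastChunk], 0, firstFree);
        return sb.toString();
    }
}
```

For example, with chunkBits = 2 (4-character chunks), appending "hello world" fills chunks 0 and 1 and places 3 characters in chunk 2. length() is therefore (2 << 2) + 3 = 11.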
The size values are parameterized, to allow tuning this code. In
theory, Result Tree Fragments might want to be tuned differently
from the main document's text.
%REVIEW% An experiment in self-tuning is
included in the code (using nested FastStringBuffers to achieve
variation in chunk sizes), but this implementation has proven to
be problematic when data may be copied from the FSB into itself.
We should either re-architect that to make this safe (if possible)
or remove that code and clean up for performance/maintainability reasons.
|
Field Summary | |
final static boolean | DEBUG_FORCE_FIXED_CHUNKSIZE | final static int | DEBUG_FORCE_INIT_BITS | final static char[] | SINGLE_SPACE | final public static int | SUPPRESS_BOTH Manifest constant: Suppress both leading and trailing whitespace.
This should be used when normalize-to-SAX is called for a complete string.
(I'm not wild about the name of this one.) | final public static int | SUPPRESS_LEADING_WS Manifest constant: Suppress leading whitespace. | final public static int | SUPPRESS_TRAILING_WS Manifest constant: Suppress trailing whitespace. | char[][] | m_array Field m_array holds the string buffer's text contents, using an
array-of-arrays. | int | m_chunkBits Field m_chunkBits sets our chunking strategy, by saying how many
bits of index can be used within a single chunk before flowing over
to the next chunk. | int | m_chunkMask Field m_chunkMask is m_chunkSize-1 -- in other words, m_chunkBits
worth of low-order '1' bits, useful for shift-and-mask addressing
within the chunks. | int | m_chunkSize Field m_chunkSize establishes the maximum size of one chunk of the array
as 2^m_chunkBits characters. | int | m_firstFree Field m_firstFree is an index into m_array[m_lastChunk][], pointing to
the first character in the Chunked Array which is not part of the
FastStringBuffer's current content. | FastStringBuffer | m_innerFSB Field m_innerFSB, when non-null, is a FastStringBuffer whose total
length equals m_chunkSize, and which replaces m_array[0]. | int | m_lastChunk Field m_lastChunk is an index into m_array[], pointing to the last
chunk of the Chunked Array currently in use. | int | m_maxChunkBits Field m_maxChunkBits affects our chunk-growth strategy, by saying what
the largest permissible chunk size is in this particular FastStringBuffer
hierarchy. | int | m_rebundleBits Field m_rebundleBits affects our chunk-growth strategy, by saying how
many chunks should be allocated at one size before we encapsulate them
into the first chunk of the next size up. |
Constructor Summary | |
public | FastStringBuffer(int initChunkBits, int maxChunkBits, int rebundleBits) Construct a FastStringBuffer, with allocation policy as per parameters.
For coding convenience, I've expressed both allocation sizes in terms of
a number of bits. | public | FastStringBuffer(int initChunkBits, int maxChunkBits) Construct a FastStringBuffer, using a default rebundleBits value. | public | FastStringBuffer(int initChunkBits) Construct a FastStringBuffer, using default maxChunkBits and
rebundleBits values. | public | FastStringBuffer() Construct a FastStringBuffer, using a default allocation policy. |
Method Summary | |
final public void | append(char value) Append a single character onto the FastStringBuffer, growing the
storage if necessary. | final public void | append(String value) Append the contents of a String onto the FastStringBuffer,
growing the storage if necessary. | final public void | append(StringBuffer value) Append the contents of a StringBuffer onto the FastStringBuffer,
growing the storage if necessary. | final public void | append(char[] chars, int start, int length) Append part of the contents of a Character Array onto the
FastStringBuffer, growing the storage if necessary. | final public void | append(FastStringBuffer value) Append the contents of another FastStringBuffer onto
this FastStringBuffer, growing the storage if necessary. | public char | charAt(int pos) Get a single character from the string buffer.
Parameters: pos - character position requested. | protected String | getOneChunkString(int startChunk, int startColumn, int length) | public String | getString(int start, int length) Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to retrieve. | StringBuffer | getString(StringBuffer sb, int start, int length) Parameters: sb - StringBuffer to be appended to. Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to append. | StringBuffer | getString(StringBuffer sb, int startChunk, int startColumn, int length) Internal support for toString() and getString().
PLEASE NOTE SIGNATURE CHANGE from earlier versions; it now appends into
and returns a StringBuffer supplied by the caller. | public boolean | isWhitespace(int start, int length) | final public int | length() Get the length of the buffer's content. | final public void | reset() Discard the content of the FastStringBuffer, and most of the memory
that was allocated by it, restoring the initial state. | public int | sendNormalizedSAXcharacters(org.xml.sax.ContentHandler ch, int start, int length) Sends the specified range of characters as one or more SAX characters()
events, normalizing the characters according to XSLT rules.
Parameters: ch - SAX ContentHandler object to receive the event. Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to send. | static int | sendNormalizedSAXcharacters(char[] ch, int start, int length, org.xml.sax.ContentHandler handler, int edgeTreatmentFlags) Internal method to directly normalize and dispatch the character array.
This version is aware of the fact that it may be called several times
in succession if the data is made up of multiple "chunks", and thus
must actively manage the handling of leading and trailing whitespace.
Note: The recursion is due to the possible recursion of inner FSBs.
Parameters: ch - The characters from the XML document. Parameters: start - The start position in the array. Parameters: length - The number of characters to read from the array. Parameters: handler - SAX ContentHandler object to receive the event. Parameters: edgeTreatmentFlags - How leading/trailing spaces should be handled. | public static void | sendNormalizedSAXcharacters(char[] ch, int start, int length, org.xml.sax.ContentHandler handler) Directly normalize and dispatch the character array. | public void | sendSAXComment(org.xml.sax.ext.LexicalHandler ch, int start, int length) Sends the specified range of characters as a SAX comment. | public void | sendSAXcharacters(org.xml.sax.ContentHandler ch, int start, int length) Sends the specified range of characters as one or more SAX characters()
events.
Note that the buffer reference passed to the ContentHandler may be
invalidated if the FastStringBuffer is edited; it's the user's
responsibility to manage access to the FastStringBuffer to prevent this
problem from arising.
Note too that there is no promise that the output will be sent as a
single call. | final public void | setLength(int l) Directly set how much of the FastStringBuffer's storage is to be
considered part of its content. | final public int | size() Get the length of the buffer's content. | final public String | toString() Note that this operation has been somewhat deoptimized by the shift to a
chunked array, as there is no factory method to produce a String object
directly from an array of arrays and hence a double copy is needed. |
DEBUG_FORCE_FIXED_CHUNKSIZE | final static boolean DEBUG_FORCE_FIXED_CHUNKSIZE(Code) | | |
DEBUG_FORCE_INIT_BITS | final static int DEBUG_FORCE_INIT_BITS(Code) | | |
SINGLE_SPACE | final static char[] SINGLE_SPACE(Code) | | |
SUPPRESS_TRAILING_WS | final public static int SUPPRESS_TRAILING_WS(Code) | | Manifest constant: Suppress trailing whitespace.
This should be used when normalize-to-SAX is called for the last chunk of a
multi-chunk output; it may have to be or'ed with SUPPRESS_LEADING_WS.
|
m_array | char[][] m_array(Code) | | Field m_array holds the string buffer's text contents, using an
array-of-arrays. Note that this array, and the arrays it contains, may be
reallocated when necessary in order to allow the buffer to grow;
references to them should be considered to be invalidated after any
append. However, the only time these arrays are directly exposed
is in the sendSAXcharacters call.
|
m_chunkBits | int m_chunkBits(Code) | | Field m_chunkBits sets our chunking strategy, by saying how many
bits of index can be used within a single chunk before flowing over
to the next chunk. For example, if m_chunkBits is set to 15, each
chunk can contain up to 2^15 (32K) characters.
|
m_chunkMask | int m_chunkMask(Code) | | Field m_chunkMask is m_chunkSize-1 -- in other words, m_chunkBits
worth of low-order '1' bits, useful for shift-and-mask addressing
within the chunks.
|
m_chunkSize | int m_chunkSize(Code) | | Field m_chunkSize establishes the maximum size of one chunk of the array
as 2^m_chunkBits characters.
(This may also be the minimum size if we aren't tuning for storage.)
|
m_firstFree | int m_firstFree(Code) | | Field m_firstFree is an index into m_array[m_lastChunk][], pointing to
the first character in the Chunked Array which is not part of the
FastStringBuffer's current content. Since m_array[][] is zero-based,
the length of that content can be calculated as
(m_lastChunk<<m_chunkBits) + m_firstFree. |
m_innerFSB | FastStringBuffer m_innerFSB(Code) | | Field m_innerFSB, when non-null, is a FastStringBuffer whose total
length equals m_chunkSize, and which replaces m_array[0]. This allows
building a hierarchy of FastStringBuffers, where early appends use
a smaller chunkSize (for less wasted memory overhead) but later
ones use a larger chunkSize (for less heap activity overhead).
|
m_lastChunk | int m_lastChunk(Code) | | Field m_lastChunk is an index into m_array[], pointing to the last
chunk of the Chunked Array currently in use. Note that additional
chunks may actually be allocated, e.g. if the FastStringBuffer had
previously been truncated or if someone issued an ensureSpace request.
The insertion point for append operations is addressed by the combination
of m_lastChunk and m_firstFree.
|
m_maxChunkBits | int m_maxChunkBits(Code) | | Field m_maxChunkBits affects our chunk-growth strategy, by saying what
the largest permissible chunk size is in this particular FastStringBuffer
hierarchy.
|
m_rebundleBits | int m_rebundleBits(Code) | | Field m_rebundleBits affects our chunk-growth strategy, by saying how
many chunks should be allocated at one size before we encapsulate them
into the first chunk of the next size up. For example, if m_rebundleBits
is set to 3, then after 8 chunks at a given size we will rebundle
them as the first element of a FastStringBuffer using a chunk size
8 times larger (the chunk size shifted left three bits, that is, m_chunkBits increased by three).
|
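The chunk-size progression described above can be worked through with a small sketch. `RebundleSchedule.chunkSizes` is hypothetical. It assumes that each rebundle step adds rebundleBits to the chunk bits and that steps stop once the next size would exceed 2^maxChunkBits; the real stopping rule may differ. The assertion in the loop records why rebundling works: 2^rebundleBits full chunks of the old size exactly fill one chunk of the new size.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the chunk-size progression under rebundling; not Xalan's code. */
final class RebundleSchedule {
    private RebundleSchedule() {}

    /** Successive chunk sizes, in characters, starting at 2^initChunkBits. */
    static List<Integer> chunkSizes(int initChunkBits, int maxChunkBits, int rebundleBits) {
        if (rebundleBits <= 0) {
            throw new IllegalArgumentException("rebundleBits must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        int bits = initChunkBits;
        sizes.add(1 << bits);
        // Assumption: step up only while the next chunk size stays within 2^maxChunkBits.
        while (bits + rebundleBits <= maxChunkBits) {
            int chunksPerBundle = 1 << rebundleBits; // chunks filled before rebundling
            int oldChunkSize = 1 << bits;
            bits += rebundleBits;                    // chunk size grows by 2^rebundleBits
            // Those chunks exactly fill the first chunk of the new size
            // (checked only when assertions are enabled with -ea).
            assert chunksPerBundle * oldChunkSize == (1 << bits);
            sizes.add(1 << bits);
        }
        return sizes;
    }
}
```

With initChunkBits = 10, maxChunkBits = 15 and rebundleBits = 3, this yields chunk sizes of 1024 and then 8192 characters. Under the stated assumption, a further step to 2^16 would exceed the 2^15 cap, so growth stops there.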
FastStringBuffer | public FastStringBuffer(int initChunkBits, int maxChunkBits, int rebundleBits)(Code) | | Construct a FastStringBuffer, with allocation policy as per parameters.
For coding convenience, I've expressed both allocation sizes in terms of
a number of bits. That's needed for the final size of a chunk,
to permit fast and efficient shift-and-mask addressing. It's less critical
for the initial size, and may be reconsidered.
An alternative would be to accept integer sizes and round to powers of two;
that really doesn't seem to buy us much, if anything.
Parameters: initChunkBits - Length in characters of the initial allocation of a chunk, expressed in log-base-2. (That is, 10 means allocate 1024 characters.) Later chunks will use larger allocation units, to trade off allocation speed of large documents against storage efficiency of small ones. Parameters: maxChunkBits - Number of character-offset bits that should be used for addressing within a chunk. Maximum length of a chunk is 2^maxChunkBits characters. Parameters: rebundleBits - Number of character-offset bits that addressing should advance before we attempt to take a step from initChunkBits to maxChunkBits. |
FastStringBuffer | public FastStringBuffer(int initChunkBits, int maxChunkBits)(Code) | | Construct a FastStringBuffer, using a default rebundleBits value.
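Parameters: initChunkBits - Length in characters of the initial allocation of a chunk, expressed in log-base-2.
Parameters: maxChunkBits - Number of character-offset bits used for addressing within a chunk.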
|
FastStringBuffer | public FastStringBuffer(int initChunkBits)(Code) | | Construct a FastStringBuffer, using default maxChunkBits and
rebundleBits values.
ISSUE: Should this call assert initial size, or fixed size?
Now configured as initial, with a default for fixed.
Parameters: initChunkBits - Length in characters of the initial allocation of a chunk, expressed in log-base-2.
|
FastStringBuffer | public FastStringBuffer()(Code) | | Construct a FastStringBuffer, using a default allocation policy.
|
append | final public void append(char value)(Code) | | Append a single character onto the FastStringBuffer, growing the
storage if necessary.
NOTE THAT after calling append(), previously obtained
references to m_array[][] may no longer be valid,
though in fact they should be in this instance.
Parameters: value - character to be appended. |
append | final public void append(String value)(Code) | | Append the contents of a String onto the FastStringBuffer,
growing the storage if necessary.
NOTE THAT after calling append(), previously obtained
references to m_array[] may no longer be valid.
Parameters: value - String whose contents are to be appended. |
append | final public void append(StringBuffer value)(Code) | | Append the contents of a StringBuffer onto the FastStringBuffer,
growing the storage if necessary.
NOTE THAT after calling append(), previously obtained
references to m_array[] may no longer be valid.
Parameters: value - StringBuffer whose contents are to be appended. |
append | final public void append(char[] chars, int start, int length)(Code) | | Append part of the contents of a Character Array onto the
FastStringBuffer, growing the storage if necessary.
NOTE THAT after calling append(), previously obtained
references to m_array[] may no longer be valid.
Parameters: chars - character array from which data is to be copied. Parameters: start - offset in chars of first character to be copied, zero-based. Parameters: length - number of characters to be copied. |
append | final public void append(FastStringBuffer value)(Code) | | Append the contents of another FastStringBuffer onto
this FastStringBuffer, growing the storage if necessary.
NOTE THAT after calling append(), previously obtained
references to m_array[] may no longer be valid.
Parameters: value - FastStringBuffer whose contents are to be appended. |
charAt | public char charAt(int pos)(Code) | | Get a single character from the string buffer.
Parameters: pos - character position requested. Returns: a character from the requested position. |
getOneChunkString | protected String getOneChunkString(int startChunk, int startColumn, int length)(Code) | | |
getString | public String getString(int start, int length)(Code) | | Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to retrieve. Returns: a new String object initialized from the specified range of characters. |
getString | StringBuffer getString(StringBuffer sb, int start, int length)(Code) | | Parameters: sb - StringBuffer to be appended to. Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to append. Returns: sb, with the requested text appended to it. |
getString | StringBuffer getString(StringBuffer sb, int startChunk, int startColumn, int length)(Code) | | Internal support for toString() and getString().
PLEASE NOTE SIGNATURE CHANGE from earlier versions; it now appends into
and returns a StringBuffer supplied by the caller. This simplifies
m_innerFSB support.
Note that this operation has been somewhat deoptimized by the shift to a
chunked array, as there is no factory method to produce a String object
directly from an array of arrays and hence a double copy is needed.
By presetting length we hope to minimize the heap overhead of building
the intermediate StringBuffer.
(It really is a pity that Java didn't design String as a final subclass
of MutableString, rather than having StringBuffer be a separate hierarchy.
We'd avoid a lot of double-buffering.)
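Parameters: sb - StringBuffer to be appended to. Parameters: startChunk - Index of the chunk containing the first character of the range. Parameters: startColumn - Offset of the first character within that chunk. Parameters: length - Number of characters to append. Returns: sb, with the requested text appended to it. |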
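The chunk-spanning copy behind getString() can be sketched as follows. `GetStringSketch` is hypothetical: it takes the chunk array and chunk size as explicit parameters (assumptions standing in for m_array and m_chunkSize) and ignores m_innerFSB. A flat-offset form such as getString(sb, start, length) could first split start into startChunk = start >>> m_chunkBits and startColumn = start & m_chunkMask.

```java
/** Hypothetical sketch of appending a chunk-spanning range to a caller's StringBuffer. */
final class GetStringSketch {
    private GetStringSketch() {}

    static StringBuffer getString(StringBuffer sb, char[][] chunks, int chunkSize,
                                  int startChunk, int startColumn, int length) {
        sb.ensureCapacity(sb.length() + length); // presize to avoid repeated growth
        int chunk = startChunk;
        int column = startColumn;
        while (length > 0) {
            // Copy what remains of the range within this chunk, then flow into the next one.
            int n = Math.min(length, chunkSize - column);
            sb.append(chunks[chunk], column, n);
            length -= n;
            chunk++;
            column = 0; // every later chunk is read from its first column
        }
        return sb;
    }
}
```

Appending into a caller-supplied StringBuffer lets one buffer collect text from several calls, which is the property the signature change above relies on for m_innerFSB support.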
isWhitespace | public boolean isWhitespace(int start, int length)(Code) | | CURRENTLY DOES NOT CHECK FOR OUT-OF-RANGE. Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to examine. Returns: true if all characters in the specified range are whitespace, as defined by XMLCharacterRecognizer. |
length | final public int length()(Code) | | Get the length of the buffer's content. Synonym for size().
Returns: the number of characters in the FastStringBuffer's content. |
reset | final public void reset()(Code) | | Discard the content of the FastStringBuffer, and most of the memory
that was allocated by it, restoring the initial state. Note that this
may eventually be different from setLength(0), which see.
|
sendNormalizedSAXcharacters | public int sendNormalizedSAXcharacters(org.xml.sax.ContentHandler ch, int start, int length) throws org.xml.sax.SAXException(Code) | | Sends the specified range of characters as one or more SAX characters()
events, normalizing the characters according to XSLT rules.
Parameters: ch - SAX ContentHandler object to receive the event. Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to send. Returns: normalization status to apply to the next chunk (because we may have been called recursively to process an inner FSB):
- 0: if this output did not end in retained whitespace, and thus whitespace at the start of the following chunk (if any) should be converted to a single space.
- SUPPRESS_LEADING_WS: if this output ended in retained whitespace, and thus whitespace at the start of the following chunk (if any) should be completely suppressed.
exception: org.xml.sax.SAXException - may be thrown by handler's characters() method. |
sendNormalizedSAXcharacters | static int sendNormalizedSAXcharacters(char[] ch, int start, int length, org.xml.sax.ContentHandler handler, int edgeTreatmentFlags) throws org.xml.sax.SAXException(Code) | | Internal method to directly normalize and dispatch the character array.
This version is aware of the fact that it may be called several times
in succession if the data is made up of multiple "chunks", and thus
must actively manage the handling of leading and trailing whitespace.
Note: The recursion is due to the possible recursion of inner FSBs.
Parameters: ch - The characters from the XML document. Parameters: start - The start position in the array. Parameters: length - The number of characters to read from the array. Parameters: handler - SAX ContentHandler object to receive the event. Parameters: edgeTreatmentFlags - How leading/trailing spaces should be handled. This is a bitfield containing two flags, bitwise-ORed together:
- SUPPRESS_LEADING_WS: When false, causes leading whitespace to be converted to a single space; when true, causes it to be discarded entirely. Should be set TRUE for the first chunk, and (in multi-chunk output) whenever the previous chunk ended in retained whitespace.
- SUPPRESS_TRAILING_WS: When false, causes trailing whitespace to be converted to a single space; when true, causes it to be discarded entirely. Should be set TRUE for the last or only chunk.
Returns: normalization status, as in the edgeTreatmentFlags parameter:
- 0: if this output did not end in retained whitespace, and thus whitespace at the start of the following chunk (if any) should be converted to a single space.
- SUPPRESS_LEADING_WS: if this output ended in retained whitespace, and thus whitespace at the start of the following chunk (if any) should be completely suppressed.
exception: org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception. |
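The edge-treatment rules above can be sketched as follows. `NormalizeSketch` is hypothetical. The flag values 0x1 and 0x2 are assumptions chosen for illustration, whitespace is taken to be the four XML whitespace characters, and each chunk's normalized text is delivered in one characters() call. The method returns SUPPRESS_LEADING_WS when a chunk ends in a retained space, so the caller can OR that status into the flags for the next chunk.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

/** Hypothetical sketch of per-chunk whitespace normalization; not Xalan's code. */
final class NormalizeSketch {
    // Assumed flag values, chosen for illustration only.
    static final int SUPPRESS_LEADING_WS = 0x1;
    static final int SUPPRESS_TRAILING_WS = 0x2;

    private NormalizeSketch() {}

    /** XML whitespace: space, tab, carriage return and line feed. */
    static boolean isXmlWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Normalizes ch[start, start + length) and returns the status for the next chunk. */
    static int sendNormalized(char[] ch, int start, int length,
                              ContentHandler handler, int edgeTreatmentFlags)
            throws SAXException {
        boolean suppressLeading = (edgeTreatmentFlags & SUPPRESS_LEADING_WS) != 0;
        boolean suppressTrailing = (edgeTreatmentFlags & SUPPRESS_TRAILING_WS) != 0;
        StringBuilder out = new StringBuilder(length);
        boolean atStart = true;       // no non-whitespace seen yet in this chunk
        boolean pendingSpace = false; // inside a whitespace run not yet emitted
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (isXmlWhitespace(c)) {
                pendingSpace = true;
            } else {
                // A whitespace run becomes one space, unless it is suppressed leading whitespace.
                if (pendingSpace && !(atStart && suppressLeading)) {
                    out.append(' ');
                }
                pendingSpace = false;
                atStart = false;
                out.append(c);
            }
        }
        int status;
        if (!pendingSpace) {
            // Ended on text; an empty chunk passes the leading-suppression state through.
            status = atStart ? (edgeTreatmentFlags & SUPPRESS_LEADING_WS) : 0;
        } else if (atStart && suppressLeading) {
            status = SUPPRESS_LEADING_WS; // whitespace-only chunk, still suppressing
        } else if (suppressTrailing) {
            status = 0;                   // trailing run discarded
        } else {
            out.append(' ');              // retained trailing whitespace as one space
            status = SUPPRESS_LEADING_WS; // so the next chunk drops its leading run
        }
        if (out.length() > 0) {
            char[] buf = out.toString().toCharArray();
            handler.characters(buf, 0, buf.length);
        }
        return status;
    }
}
```

For the two-chunk text "a " + " b", the first call uses SUPPRESS_LEADING_WS, emits "a " and returns SUPPRESS_LEADING_WS. The second call uses that status ORed with SUPPRESS_TRAILING_WS and emits "b", giving "a b" overall. Because this sketch emits a retained trailing space eagerly, a final chunk consisting only of whitespace cannot retract it.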
sendNormalizedSAXcharacters | public static void sendNormalizedSAXcharacters(char[] ch, int start, int length, org.xml.sax.ContentHandler handler) throws org.xml.sax.SAXException(Code) | | Directly normalize and dispatch the character array.
Parameters: ch - The characters from the XML document. Parameters: start - The start position in the array. Parameters: length - The number of characters to read from the array. Parameters: handler - SAX ContentHandler object to receive the event. exception: org.xml.sax.SAXException - Any SAX exception, possibly wrapping another exception. |
sendSAXComment | public void sendSAXComment(org.xml.sax.ext.LexicalHandler ch, int start, int length) throws org.xml.sax.SAXException(Code) | | Sends the specified range of characters as a SAX comment.
Note that, unlike sendSAXcharacters, this has to be done as a single
call to LexicalHandler#comment.
Parameters: ch - SAX LexicalHandler object to receive the event. Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to send. exception: org.xml.sax.SAXException - may be thrown by handler's comment() method. |
sendSAXcharacters | public void sendSAXcharacters(org.xml.sax.ContentHandler ch, int start, int length) throws org.xml.sax.SAXException(Code) | | Sends the specified range of characters as one or more SAX characters()
events.
Note that the buffer reference passed to the ContentHandler may be
invalidated if the FastStringBuffer is edited; it's the user's
responsibility to manage access to the FastStringBuffer to prevent this
problem from arising.
Note too that there is no promise that the output will be sent as a
single call. As is always true in SAX, one logical string may be split
across multiple blocks of memory and hence delivered as several
successive events.
Parameters: ch - SAX ContentHandler object to receive the event. Parameters: start - Offset of first character in the range. Parameters: length - Number of characters to send. exception: org.xml.sax.SAXException - may be thrown by handler's characters() method. |
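The contrast between sendSAXcharacters and sendSAXComment can be sketched as follows. `SaxSendSketch` is hypothetical and works on an explicit chunk array (an assumption standing in for m_array). sendCharacters issues one characters() event per chunk touched and passes the chunk arrays themselves, which is why the buffer must not be edited while a handler still holds them. sendComment first copies the range into one array, because LexicalHandler.comment() must receive the whole comment in a single call.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;

/** Hypothetical sketch contrasting per-chunk characters() events with one comment() call. */
final class SaxSendSketch {
    private SaxSendSketch() {}

    /** One characters() event per chunk touched; the handler sees the chunk arrays directly. */
    static void sendCharacters(char[][] chunks, int chunkBits, int start, int length,
                               ContentHandler handler) throws SAXException {
        int chunkSize = 1 << chunkBits;
        int chunk = start >>> chunkBits;
        int column = start & (chunkSize - 1);
        while (length > 0) {
            int n = Math.min(length, chunkSize - column);
            handler.characters(chunks[chunk], column, n); // no copy: exposes internal storage
            length -= n;
            chunk++;
            column = 0;
        }
    }

    /** comment() must be a single call, so the range is first gathered into one array. */
    static void sendComment(char[][] chunks, int chunkBits, int start, int length,
                            LexicalHandler handler) throws SAXException {
        int chunkSize = 1 << chunkBits;
        char[] flat = new char[length];
        int copied = 0;
        int chunk = start >>> chunkBits;
        int column = start & (chunkSize - 1);
        while (copied < length) {
            int n = Math.min(length - copied, chunkSize - column);
            System.arraycopy(chunks[chunk], column, flat, copied, n);
            copied += n;
            chunk++;
            column = 0;
        }
        handler.comment(flat, 0, length);
    }
}
```

With 4-character chunks "abcd", "efgh", "ijkl", sending the range starting at offset 3 with length 6 produces three characters() events ("d", "efgh", "i") but a single comment() call carrying "defghi".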
setLength | final public void setLength(int l)(Code) | | Directly set how much of the FastStringBuffer's storage is to be
considered part of its content. This is a fast but hazardous
operation. It is not protected against negative values, or values
greater than the amount of storage currently available... and even
if additional storage does exist, its contents are unpredictable.
The only safe use for our setLength() is to truncate the FastStringBuffer
to a shorter string.
Parameters: l - New length. If l<0 or l>length(), this operation will not report an error but future operations will almost certainly fail. |
size | final public int size()(Code) | | Get the length of the buffer's content. Synonym for length().
Returns: the number of characters in the FastStringBuffer's content. |
toString | final public String toString()(Code) | | Note that this operation has been somewhat deoptimized by the shift to a
chunked array, as there is no factory method to produce a String object
directly from an array of arrays and hence a double copy is needed.
By using ensureCapacity we hope to minimize the heap overhead of building
the intermediate StringBuffer.
(It really is a pity that Java didn't design String as a final subclass
of MutableString, rather than having StringBuffer be a separate hierarchy.
We'd avoid a lot of double-buffering.)
Returns: the contents of the FastStringBuffer as a standard Java string. |