Interface that defines an API for forward-only iteration
on text objects.
This is a minimal interface for iteration without random access
or backwards iteration. It is especially useful for wrapping
streams with converters into an object for collation or
normalization.
Characters can be accessed in two ways: as code units or as
code points.
Unicode code points are 21-bit integers and are the scalar values
of Unicode characters. ICU uses the type int for them.
Unicode code units are the storage units of a given
Unicode/UCS Transformation Format (a character encoding scheme).
With UTF-16, every code point is represented with either a single
code unit or a pair of code units ("surrogates").
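The difference between the two counts can be seen with standard Java strings, which are stored as UTF-16. The following is a minimal sketch using only java.lang methods; the class name and the sample text are chosen here for illustration and are not part of the interface being described.

```java
// Illustration only: class name and sample text are hypothetical.
final class CodeUnitsVsCodePoints {
    // 'a' fits in one UTF-16 code unit; U+1F600 lies outside the BMP
    // and is therefore stored as a surrogate pair (two code units).
    static final String TEXT = "a\uD83D\uDE00";

    /** Number of UTF-16 code units (storage units) in s. */
    static int codeUnitCount(String s) {
        return s.length();
    }

    /** Number of Unicode code points in s. */
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        System.out.println(codeUnitCount(TEXT));   // prints 3
        System.out.println(codePointCount(TEXT));  // prints 2
        // The code point starting at index 1 combines both surrogates.
        System.out.println(Integer.toHexString(TEXT.codePointAt(1))); // prints 1f600
    }
}
```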
String storage is typically based on code units, while properties
of characters are typically determined using code point values.
Some processes may be designed to work with sequences of code units,
or it may be known that all characters that are important to an
algorithm can be represented with single code units.
Other processes will need to use the code point access functions.
UForwardCharacterIterator provides next() to access
a code unit and advance an internal position into the text object,
similar to return text[position++].
It provides nextCodePoint() to access a code point and advance an internal
position.
nextCodePoint() assumes that the current position is that of
the beginning of a code point, i.e., of its first code unit.
After nextCodePoint(), this will be true again.
In general, access to code units and code points in the same
iteration loop should not be mixed. In UTF-16, if the current position
is on a second code unit (Low Surrogate), then only that code unit
is returned even by nextCodePoint().
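The behavior described above can be sketched with a small String-backed iterator. This is an illustration under stated assumptions, not the library's implementation: ForwardIter is a hypothetical stand-in for the interface (only next(), nextCodePoint() and a DONE sentinel, assumed here to be -1), and StringForwardIter and ForwardIterDemo are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the interface; DONE = -1 is an assumption. */
interface ForwardIter {
    int DONE = -1;
    int next();
    int nextCodePoint();
}

/** Hypothetical String-backed implementation, for illustration only. */
final class StringForwardIter implements ForwardIter {
    private final String text;
    private int position = 0;

    StringForwardIter(String text) {
        this.text = text;
    }

    @Override
    public int next() {
        // Like "return text[position++]", plus an end-of-text check.
        return position < text.length() ? text.charAt(position++) : DONE;
    }

    @Override
    public int nextCodePoint() {
        if (position >= text.length()) {
            return DONE;
        }
        // codePointAt joins a high surrogate with the low surrogate that
        // follows it; an unpaired surrogate, such as a low surrogate at the
        // current position, is returned on its own.
        int cp = text.codePointAt(position);
        position += Character.charCount(cp);
        return cp;
    }
}

final class ForwardIterDemo {
    /** Collects all remaining code points from the iterator. */
    static List<Integer> codePoints(ForwardIter it) {
        List<Integer> out = new ArrayList<>();
        int c;
        while ((c = it.nextCodePoint()) != ForwardIter.DONE) {
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        // Code point access: the surrogate pair is returned as one value.
        System.out.println(codePoints(new StringForwardIter("a\uD83D\uDE00b")));

        // Mixed access: after two next() calls the position is on the low
        // surrogate, so nextCodePoint() returns only that code unit.
        ForwardIter mixed = new StringForwardIter("a\uD83D\uDE00b");
        mixed.next();
        mixed.next();
        System.out.println(Integer.toHexString(mixed.nextCodePoint())); // prints de00
    }
}
```

The mixed case at the end shows why the two access styles should not be combined in one loop: the caller receives half of a surrogate pair instead of the supplementary code point.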
Usage:
    public void function1(UForwardCharacterIterator it) {
        int c;
        while ((c = it.next()) != UForwardCharacterIterator.DONE) {
            // use c
        }
    }