/*
 *******************************************************************************
 * Copyright (C) 1996-2004, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, every code point is represented by either one code unit
 * or two code units (a pair of "surrogates").
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.</p>
 *
 * <p>UForwardCharacterIterator provides next() to access
 * a code unit and advance an internal position into the text object,
 * similar to <code>return text[position++]</code>.<br>
 * It provides nextCodePoint() to access a code point and advance an internal
 * position.</p>
 *
 * <p>nextCodePoint() assumes that the current position is that of
 * the beginning of a code point, i.e., of its first code unit.
 * After nextCodePoint(), this will be true again.
 * In general, access to code units and code points should not be
 * mixed in the same iteration loop. In UTF-16, if the current position
 * is on a second code unit (low surrogate), then only that code unit
 * is returned, even by nextCodePoint().</p>
 *
 * <p>Usage:</p>
 * <pre>
 * public void function1(UForwardCharacterIterator it) {
 *     int c;
 *     while ((c = it.next()) != UForwardCharacterIterator.DONE) {
 *         // use c
 *     }
 * }
 * </pre>
 * @stable ICU 2.4
 *
 */

public interface UForwardCharacterIterator {

    /**
     * Indicator that we have reached the end of the UTF-16 text.
     * @stable ICU 2.4
     */
    public static final int DONE = -1;

    /**
     * Returns the UTF-16 code unit at the current index and advances to
     * the next code unit (post-increment semantics). If the index is out
     * of range, DONE is returned and the iterator is reset to the limit
     * of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the
     * limit of the text.
     * @stable ICU 2.4
     */
    public int next();

    /**
     * Returns the code point at the current index and advances to the next
     * code point (post-increment semantics). If the index does not point
     * to a valid surrogate pair, the behavior is the same as
     * <code>next()</code>. Otherwise the iterator is incremented past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
     * @return the next code point in the text, or DONE if the index is at
     * the limit of the text.
     * @stable ICU 2.4
     */
    public int nextCodePoint();

}
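
/*
 * Usage sketch, not part of the ICU API. The contract documented above
 * (post-increment access, DONE at the limit, surrogate pairs combined only
 * by nextCodePoint()) can be illustrated with a minimal String-backed
 * iterator. Everything named here is an assumption made for illustration:
 * ForwardIter is a hypothetical stand-in that mirrors this interface so the
 * sketch compiles on its own, and StringForwardIter and ForwardIterDemo are
 * hypothetical names. Only java.lang.String and java.lang.Character calls
 * are used.
 *

```java
// Hypothetical stand-in mirroring the interface documented above.
interface ForwardIter {
    int DONE = -1;
    int next();
    int nextCodePoint();
}

// Hypothetical String-backed implementation; not an ICU class.
final class StringForwardIter implements ForwardIter {
    private final String text;
    private int pos = 0;

    StringForwardIter(String text) {
        this.text = text;
    }

    // Returns one UTF-16 code unit and advances by one unit.
    public int next() {
        if (pos >= text.length()) {
            pos = text.length(); // stay at the limit of the text
            return DONE;
        }
        return text.charAt(pos++);
    }

    // Returns one code point and advances past all of its code units.
    public int nextCodePoint() {
        if (pos >= text.length()) {
            pos = text.length();
            return DONE;
        }
        // codePointAt combines a valid surrogate pair; otherwise it
        // returns the single code unit at pos, matching next().
        int cp = text.codePointAt(pos);
        pos += Character.charCount(cp);
        return cp;
    }
}

final class ForwardIterDemo {
    // Counts code units by draining the iterator with next().
    static int countCodeUnits(ForwardIter it) {
        int n = 0;
        while (it.next() != ForwardIter.DONE) {
            n++;
        }
        return n;
    }

    // Counts code points by draining the iterator with nextCodePoint().
    static int countCodePoints(ForwardIter it) {
        int n = 0;
        while (it.nextCodePoint() != ForwardIter.DONE) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // "a", U+1F600 (stored as a surrogate pair), "b"
        String s = "a" + new String(Character.toChars(0x1F600)) + "b";
        System.out.println(countCodeUnits(new StringForwardIter(s)));
        System.out.println(countCodePoints(new StringForwardIter(s)));
    }
}
```

 *
 * The two counts differ whenever the text contains a supplementary
 * character, which is why a single loop should normally use either next()
 * or nextCodePoint(), not both.
 */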