| java.lang.Object com.ibm.icu.text.CollationElementIterator
CollationElementIterator | final public class CollationElementIterator (Code) | | CollationElementIterator is an iterator created by
a RuleBasedCollator to walk through a string. The return result of
each iteration is a 32-bit collation element that defines the
ordering priority of the next character or sequence of characters
in the source string.
For illustration, consider the following in Spanish:
"ca" -> the first collation element is collation_element('c') and second
collation element is collation_element('a').
Since "ch" in Spanish sorts as one entity, the example below returns one
collation element for the two characters 'c' and 'h':
"cha" -> the first collation element is collation_element('ch') and second
collation element is collation_element('a').
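The contraction behaviour can be tried without ICU on the classpath. The sketch below is a stand-in under stated assumptions: it uses the JDK's java.text.RuleBasedCollator and java.text.CollationElementIterator, which offer a similar next()/NULLORDER API, not the com.ibm.icu.text classes documented here. The four-letter rule string is made up for illustration and simply declares "ch" as a single entity; the names ContractionDemo and countElements are hypothetical.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;

public class ContractionDemo {
    // Illustrative rules only: every character used below is listed,
    // and "ch" is declared as one sortable unit (a contraction).
    static final String RULES = "< a < c < ch < h";

    /** Counts the collation elements produced for the given text. */
    static int countElements(String text) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("Illustrative rules failed to parse", e);
        }
        CollationElementIterator it = collator.getCollationElementIterator(text);
        int count = 0;
        // Loop on the raw element, which is NULLORDER at the end of the text.
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("ca  -> " + countElements("ca") + " elements");
        System.out.println("cha -> " + countElements("cha") + " elements");
    }
}
```

Under these rules both strings should yield the same number of elements even though "cha" has one more character, because 'c' and 'h' are consumed together.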
And in German, because the character 'æ' is a composed character of 'a' and 'e', the
iterator returns two collation elements for the single character 'æ':
"æb" -> the first collation element is collation_element('a'), the
second collation element is collation_element('e'), and the
third collation element is collation_element('b').
For collation ordering comparison, the collation element results
cannot be compared simply by using basic arithmetic operators
such as <, == or >; further processing has to be done. Details
can be found in the ICU
user guide. An example of using the CollationElementIterator
for collation ordering comparison is the class
com.ibm.icu.text.StringSearch.
To construct a CollationElementIterator object, users
call the method getCollationElementIterator() on a
RuleBasedCollator that defines the desired sorting order.
Example:
String testString = "This is a test";
// Note: the RuleBasedCollator(String) constructor throws a checked Exception.
RuleBasedCollator rbc = new RuleBasedCollator("&a<b");
CollationElementIterator iterator = rbc.getCollationElementIterator(testString);
int order;
// Loop on the raw element: primaryOrder() never returns NULLORDER,
// so testing the primary order would never end the loop.
while ((order = iterator.next()) != CollationElementIterator.NULLORDER) {
    if (order != CollationElementIterator.IGNORABLE) {
        // order is valid, not ignorable and we have not passed the end
        // of the iteration, so we do something with it
        int primaryOrder = CollationElementIterator.primaryOrder(order);
        System.out.println("Next primary order 0x" +
                           Integer.toHexString(primaryOrder));
    }
}
This class is not subclassable.
See Also: Collator, RuleBasedCollator, StringSearch. Author: Syn Wee Quek |
Method Summary | |
public boolean | equals(Object that) Tests whether the argument object is equal to this CollationElementIterator. |
public int | getMaxExpansion(int ce) Returns the maximum length of any expansion sequence that ends with the specified collation element. |
public int | getOffset() Returns the character offset in the source string corresponding to the next collation element. |
boolean | isInBuffer() Checks if the iterator is in the buffer zone. |
public int | next() Get the next collation element in the source string. This iterator iterates over a sequence of collation elements that were built from the string. |
public int | previous() Get the previous collation element in the source string. This iterator iterates over a sequence of collation elements that were built from the string. |
final public static int | primaryOrder(int ce) Return the primary order of the specified collation element, i.e. the first 16 bits. |
public void | reset() Resets the cursor to the beginning of the string. |
final public static int | secondaryOrder(int ce) Return the secondary order of the specified collation element, i.e. the 16th to 23rd bits, inclusive. |
void | setCollator(RuleBasedCollator collator) Sets the collator used. |
void | setExactOffset(int offset) Sets the iterator to point to the collation element corresponding to the specified character (the parameter is a CHARACTER offset in the original string, not an offset into its corresponding sequence of collation elements). |
public void | setOffset(int offset) Sets the iterator to point to the collation element corresponding to the character at the specified offset. |
public void | setText(String source) Set a new source string for iteration, and reset the offset to the beginning of the text. |
public void | setText(UCharacterIterator source) Set a new source string iterator for iteration, and reset the offset to the beginning of the text. |
public void | setText(CharacterIterator source) Set a new source string iterator for iteration, and reset the offset to the beginning of the text. |
void | setText(UCharacterIterator source, int offset) Sets the iterator to point to the collation element corresponding to the specified character (the parameter is a CHARACTER offset in the original string, not an offset into its corresponding sequence of collation elements). |
final public static int | tertiaryOrder(int ce) Return the tertiary order of the specified collation element, i.e. the last 8 bits. |
CE_CONTRACTION_TAG_ | final static int CE_CONTRACTION_TAG_(Code) | | |
CE_DIGIT_TAG_ | final static int CE_DIGIT_TAG_(Code) | | Collate Digits As Numbers (CODAN) implementation
|
CE_EXPANSION_TAG_ | final static int CE_EXPANSION_TAG_(Code) | | |
CE_NOT_FOUND_ | final static int CE_NOT_FOUND_(Code) | | |
CE_SPEC_PROC_TAG_ | final static int CE_SPEC_PROC_TAG_(Code) | | |
NULLORDER | final public static int NULLORDER(Code) | | This constant is returned by the iterator in the methods
next() and previous() when the end or the beginning of the
source string has been reached, and there are no more valid
collation elements to return.
See class documentation for an example of use.
See Also: CollationElementIterator.next, CollationElementIterator.previous |
m_CEBufferOffset_ | int m_CEBufferOffset_(Code) | | This is the CE from CEs buffer that should be returned.
Initial value is 0.
Forwards iteration will end with m_CEBufferOffset_ == m_CEBufferSize_,
backwards will end with m_CEBufferOffset_ == 0.
The next/previous after we reach the end/beginning of the m_CEBuffer_
will cause this value to be reset to 0.
|
m_CEBufferSize_ | int m_CEBufferSize_(Code) | | This is the position to which we have stored processed CEs.
Initial value is 0.
The next/previous after we reach the end/beginning of the m_CEBuffer_
will cause this value to be reset to 0.
|
m_FCDStart_ | int m_FCDStart_(Code) | | Position in the original string that starts with a non-FCD sequence
|
m_isCodePointHiragana_ | boolean m_isCodePointHiragana_(Code) | | true if current codepoint was Hiragana
|
CollationElementIterator | CollationElementIterator(String source, RuleBasedCollator collator)(Code) | | CollationElementIterator constructor. This takes a source
string and a RuleBasedCollator. The iterator will walk through
the source string based on the rules defined by the
collator. If the source string is empty, NULLORDER will be
returned on the first call to next().
Parameters: source - the source string; collator - the RuleBasedCollator. |
CollationElementIterator | CollationElementIterator(CharacterIterator source, RuleBasedCollator collator)(Code) | | CollationElementIterator constructor. This takes a source
character iterator and a RuleBasedCollator. The iterator will
walk through the source string based on the rules defined by
the collator. If the source string is empty, NULLORDER will be
returned on the first call to next().
Parameters: source - the source string iterator; collator - the RuleBasedCollator. |
CollationElementIterator | CollationElementIterator(UCharacterIterator source, RuleBasedCollator collator)(Code) | | CollationElementIterator constructor. This takes a source
character iterator and a RuleBasedCollator. The iterator will
walk through the source string based on the rules defined by
the collator. If the source string is empty, NULLORDER will be
returned on the first call to next().
Parameters: source - the source string iterator; collator - the RuleBasedCollator. |
equals | public boolean equals(Object that)(Code) | | Tests whether the argument object is equal to this CollationElementIterator.
Iterators are equal if they use the same RuleBasedCollator and
the same source text, and have the same current position in the iteration.
Parameters: that - the object to compare with this CollationElementIterator. |
getMaxExpansion | public int getMaxExpansion(int ce)(Code) | | Returns the maximum length of any expansion sequence that ends with
the specified collation element. If there is no expansion with this
collation element as the last element, returns 1.
Parameters: ce - a collation element returned by previous() or next(). Returns: the maximum length of any expansion sequence ending with the specified collation element. |
getOffset | public int getOffset()(Code) | | Returns the character offset in the source string
corresponding to the next collation element. I.e., getOffset()
returns the position in the source string corresponding to the
collation element that will be returned by the next call to
next(). This value could be any of:
- The index of the first character corresponding to
the next collation element. (This means that if
setOffset(offset) sets the index in the middle of
a contraction, getOffset() returns the index of
the first character in the contraction, which may not be equal
to the original offset that was set. Hence calling getOffset()
immediately after setOffset(offset) does not guarantee that the
original offset set will be returned.)
- If normalization is on, the index of the immediate
subsequent character, or composite character with the first
character, having a combining class of 0.
- The length of the source string, if iteration has reached
the end.
Returns: the character offset in the source string corresponding to the collation element that will be returned by the next call to next(). |
isInBuffer | boolean isInBuffer()(Code) | | Checks if the iterator is in the buffer zone.
Returns: true if the iterator is in the buffer zone, false otherwise. |
next | public int next()(Code) | | Get the next collation element in the source string.
This iterator iterates over a sequence of collation elements
that were built from the string. Because there isn't
necessarily a one-to-one mapping from characters to collation
elements, this doesn't mean the same thing as "return the
collation element [or ordering priority] of the next character
in the string".
This function returns the collation element that the
iterator is currently pointing to, and then updates the
internal pointer to point to the next element. previous()
updates the pointer first, and then returns the element. This
means that when you change direction while iterating (i.e.,
call next() and then call previous(), or call previous() and
then call next()), you'll get back the same element twice.
Returns: the next collation element, or NULLORDER if the end of the iteration has been reached. |
previous | public int previous()(Code) | | Get the previous collation element in the source string.
This iterator iterates over a sequence of collation elements
that were built from the string. Because there isn't
necessarily a one-to-one mapping from characters to collation
elements, this doesn't mean the same thing as "return the
collation element [or ordering priority] of the previous
character in the string".
This function updates the iterator's internal pointer to
point to the collation element preceding the one it's currently
pointing to and then returns that element, while next() returns
the current element and then updates the pointer. This means
that when you change direction while iterating (i.e., call
next() and then call previous(), or call previous() and then
call next()), you'll get back the same element twice.
Returns: the previous collation element, or NULLORDER when the start of the iteration has been reached. |
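The pointer semantics described for next() and previous() can be modelled with a plain array cursor. This is a minimal sketch, not ICU's implementation: ElementCursor is a hypothetical class, the element values are arbitrary, and the only thing it reproduces is the rule that next() returns and then advances, while previous() moves back and then returns.

```java
public final class ElementCursor {
    // Same sentinel value that the iterator returns at either end.
    public static final int NULLORDER = 0xffffffff;

    private final int[] elements;
    private int position; // index of the element next() would return

    public ElementCursor(int... elements) {
        this.elements = elements.clone();
    }

    /** Returns the current element, then advances the pointer. */
    public int next() {
        return position < elements.length ? elements[position++] : NULLORDER;
    }

    /** Moves the pointer back one step, then returns that element. */
    public int previous() {
        return position > 0 ? elements[--position] : NULLORDER;
    }

    public static void main(String[] args) {
        ElementCursor cursor = new ElementCursor(10, 20, 30);
        cursor.next();                 // 10
        int forward = cursor.next();   // 20
        int backward = cursor.previous();
        // Changing direction yields the same element twice.
        System.out.println(forward == backward); // prints "true"
    }
}
```

Code that scans in both directions (for example, a search that backs up after a partial match) has to account for this repeated element when it changes direction.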
primaryOrder | final public static int primaryOrder(int ce)(Code) | | Return the primary order of the specified collation element,
i.e. the first 16 bits. This value is unsigned.
Parameters: ce - the collation element. Returns: the element's 16-bit primary order. |
reset | public void reset()(Code) | | Resets the cursor to the beginning of the string. The next
call to next() or previous() will return the first and last
collation element in the string, respectively.
If the RuleBasedCollator used by this iterator has had its
attributes changed, calling reset() will reinitialize the
iterator to use the new attributes.
|
secondaryOrder | final public static int secondaryOrder(int ce)(Code) | | Return the secondary order of the specified collation element,
i.e. the 16th to 23rd bits, inclusive. This value is unsigned.
Parameters: ce - the collation element. Returns: the element's 8-bit secondary order. |
setCollator | void setCollator(RuleBasedCollator collator)(Code) | | Sets the collator used.
For internal use. All data members will be reset to their default values.
Parameters: collator - to set |
setExactOffset | void setExactOffset(int offset)(Code) | | Sets the iterator to point to the collation element corresponding to
the specified character (the parameter is a CHARACTER offset in the
original string, not an offset into its corresponding sequence of
collation elements). The value returned by the next call to next()
will be the collation element corresponding to the specified position
in the text. Unlike the public method setOffset(int), this method does
not try to readjust the offset to the start of a contracting sequence.
After this call, getOffset() is guaranteed to return the same value that
was passed to this method.
Parameters: offset - new character offset into the original text to set. |
setOffset | public void setOffset(int offset)(Code) | | Sets the iterator to point to the collation element
corresponding to the character at the specified offset. The
value returned by the next call to next() will be the collation
element corresponding to the characters at offset.
If offset is in the middle of a contracting character
sequence, the iterator is adjusted to the start of the
contracting sequence. This means that getOffset() is not
guaranteed to return the same value set by this method.
If the decomposition mode is on, and offset is in the middle
of a decomposable range of source text, the iterator may not
return a correct result for the next forwards or backwards
iteration. The user must ensure that the offset is not in the
middle of a decomposable range.
Parameters: offset - the character offset into the original source string to set. Note that this is not an offset into the corresponding sequence of collation elements. |
setText | public void setText(String source)(Code) | | Set a new source string for iteration, and reset the offset
to the beginning of the text.
Parameters: source - the new source string for iteration. |
setText | public void setText(UCharacterIterator source)(Code) | | Set a new source string iterator for iteration, and reset the
offset to the beginning of the text.
The source iterator's integrity will be preserved since a new copy
will be created for use.
Parameters: source - the new source string iterator for iteration. |
setText | public void setText(CharacterIterator source)(Code) | | Set a new source string iterator for iteration, and reset the
offset to the beginning of the text.
Parameters: source - the new source string iterator for iteration. |
setText | void setText(UCharacterIterator source, int offset)(Code) | | Sets the iterator to point to the collation element corresponding to
the specified character (the parameter is a CHARACTER offset in the
original string, not an offset into its corresponding sequence of
collation elements). The value returned by the next call to next()
will be the collation element corresponding to the specified position
in the text. Unlike the public method setOffset(int), this method does
not try to readjust the offset to the start of a contracting sequence.
getOffset() is guaranteed to return the same value as was passed to a
preceding call to setOffset().
Parameters: source - the new source string iterator for iteration; offset - the offset into the source. |
tertiaryOrder | final public static int tertiaryOrder(int ce)(Code) | | Return the tertiary order of the specified collation element, i.e. the last
8 bits. This value is unsigned.
Parameters: ce - the collation element. Returns: the element's 8-bit tertiary order. |
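The bit layout described for primaryOrder, secondaryOrder and tertiaryOrder can be written out with shifts and masks. The sketch below assumes exactly that layout (first 16 bits, next 8 bits, last 8 bits of a 32-bit element) and is not taken from ICU's source; CollationElementBits and the sample value 0x12345678 are hypothetical.

```java
public class CollationElementBits {
    /** First 16 bits of the element, as an unsigned value. */
    static int primary(int ce) {
        return (ce >>> 16) & 0xFFFF; // unsigned shift keeps the result non-negative
    }

    /** The 8 bits following the primary weight. */
    static int secondary(int ce) {
        return (ce >>> 8) & 0xFF;
    }

    /** Last 8 bits of the element. */
    static int tertiary(int ce) {
        return ce & 0xFF;
    }

    public static void main(String[] args) {
        int ce = 0x12345678; // arbitrary sample element
        System.out.println("primary   0x" + Integer.toHexString(primary(ce)));   // 0x1234
        System.out.println("secondary 0x" + Integer.toHexString(secondary(ce))); // 0x56
        System.out.println("tertiary  0x" + Integer.toHexString(tertiary(ce)));  // 0x78
    }
}
```

Splitting an element this way is the first step of the "further processing" needed for comparison: weights are compared per level rather than by comparing the raw 32-bit integers.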
|
|