| java.lang.Object com.ibm.icu.text.SearchIterator
All known Subclasses: com.ibm.icu.text.StringSearch
SearchIterator | abstract public class SearchIterator | | SearchIterator is an abstract base class that defines a protocol
for text searching. Subclasses provide concrete implementations of
various search algorithms. A concrete subclass, StringSearch, is
provided that implements language-sensitive pattern matching based
on the comparison rules defined in a RuleBasedCollator
object. Instances of SearchIterator maintain a current position and
scan over the target text, returning the indices where a match is
found and the length of each match. Generally, the sequence of forward
matches will be equivalent to the sequence of backward matches. One
case where this may not hold is when non-overlapping mode is in effect
and the pattern repeats continuously in the text.
Consider searching for the pattern "aba" in the text
"ababababa" with overlapping mode off: a forward search produces matches
at offsets 0 and 4, whereas a backward search produces matches at
offsets 6 and 2.
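To see why the two directions can disagree, here is a minimal sketch over plain java.lang.String. It assumes exact character comparison rather than ICU's collation-based matching, and the class NonOverlappingDemo and its methods are hypothetical, not part of ICU. A forward non-overlapping scan resumes just past each match, while a backward scan requires each new match to end at or before the start of the previous one, so the two scans lock onto different copies of "aba".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: exact char matching, not ICU's collation-based search.
final class NonOverlappingDemo {
    private NonOverlappingDemo() {}

    // Forward, non-overlapping: after a match at pos, resume the search at pos + pattern length.
    static List<Integer> forward(String text, String pattern) {
        requireNonEmpty(pattern);
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        int pos;
        while ((pos = text.indexOf(pattern, from)) >= 0) {
            starts.add(pos);
            from = pos + pattern.length();
        }
        return starts;
    }

    // Backward, non-overlapping: each new match must end at or before the start of the previous one.
    static List<Integer> backward(String text, String pattern) {
        requireNonEmpty(pattern);
        List<Integer> starts = new ArrayList<>();
        int limit = text.length(); // exclusive upper bound for the end of the next match
        while (limit >= pattern.length()) {
            // largest start position whose match still ends at or before limit
            int pos = text.lastIndexOf(pattern, limit - pattern.length());
            if (pos < 0) {
                break;
            }
            starts.add(pos);
            limit = pos;
        }
        return starts;
    }

    private static void requireNonEmpty(String pattern) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
    }

    public static void main(String[] args) {
        System.out.println(forward("ababababa", "aba"));  // [0, 4]
        System.out.println(backward("ababababa", "aba")); // [6, 2]
    }
}
```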
If the matches being searched for have boundary restrictions, BreakIterators
can be used to define the valid boundaries of such a match. Once a
BreakIterator is set, potential matches will be tested against the
BreakIterator to determine if the boundaries are valid and that all
characters in the potential match are equivalent to the pattern
searched for. For example, looking for the pattern "fox" in the text
"foxy fox" will produce matches at offsets 0 and 5 with length 3
if no BreakIterator is set. However, if a word BreakIterator is set,
the only match found will be at offset 5. Because
the SearchIterator guarantees that, when a BreakIterator is set, all its
matches match the given pattern exactly, a potential match that
passes the BreakIterator test might still not produce a valid match. For
instance, the pattern "e" will not be found in the string
"\u00e9" (latin small letter e with acute) if a
character BreakIterator is used. Even though "e" is
a part of the character "\u00e9" and the potential match at
offset 0 with length 1 passes the character BreakIterator test, "\u00e9"
is not equivalent to "e", so the SearchIterator rejects the potential
match. By default, the SearchIterator
does not impose any boundary restriction on the matches; it will
return all results that match the pattern. Continuing the
above example, "e" will
be found in the string "\u00e9" if no BreakIterator is
specified.
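The boundary test on the "foxy fox" example can be sketched with the JDK's own java.text.BreakIterator. This is an assumption-laden illustration, not ICU's implementation: BoundaryFilterDemo is a hypothetical name, matching is exact character comparison (the collation-equivalence check described above is omitted), and a candidate is kept only when both its start and its end offsets are boundaries.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not ICU: exact char matching plus a boundary check.
final class BoundaryFilterDemo {
    private BoundaryFilterDemo() {}

    // Returns the start offsets of all occurrences of pattern in text (overlapping
    // candidates included). If breaker is non-null, a candidate is kept only when
    // both its start and its end fall on boundaries reported by that BreakIterator;
    // null disables the check, mirroring the "no boundary restriction" default.
    static List<Integer> find(String text, String pattern, BreakIterator breaker) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        if (breaker != null) {
            breaker.setText(text);
        }
        List<Integer> starts = new ArrayList<>();
        for (int pos = text.indexOf(pattern); pos >= 0; pos = text.indexOf(pattern, pos + 1)) {
            int end = pos + pattern.length();
            if (breaker == null || (breaker.isBoundary(pos) && breaker.isBoundary(end))) {
                starts.add(pos);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(find("foxy fox", "fox", null));                           // [0, 5]
        System.out.println(find("foxy fox", "fox", BreakIterator.getWordInstance())); // [5]
    }
}
```

The match at offset 0 is dropped under the word iterator because offset 3 falls inside the word "foxy" and is therefore not a boundary.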
SearchIterator also provides a means to handle overlapping
matches via the API setOverlapping(boolean). For example, if
overlapping mode is set, searching for the pattern "abab" in the
text "ababab" will match at positions 0 and 2, whereas if
overlapping is not set, SearchIterator will only match at position
0. By default, overlapping mode is not set.
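The effect of the overlapping flag on a forward search comes down to where the next search resumes. The sketch below is hypothetical (OverlapDemo is an invented name, and it uses exact character matching on a String rather than StringSearch): with overlap allowed it resumes one character after the previous match start, otherwise it skips the whole match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of how an overlapping flag changes the resume point of a forward search.
final class OverlapDemo {
    private OverlapDemo() {}

    static List<Integer> forwardMatches(String text, String pattern, boolean overlapping) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        // With overlap, resume one char after the match start; otherwise skip past the match.
        int step = overlapping ? 1 : pattern.length();
        List<Integer> starts = new ArrayList<>();
        for (int pos = text.indexOf(pattern); pos >= 0; pos = text.indexOf(pattern, pos + step)) {
            starts.add(pos);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(forwardMatches("ababab", "abab", true));  // [0, 2]
        System.out.println(forwardMatches("ababab", "abab", false)); // [0]
    }
}
```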
The APIs in SearchIterator are similar to those of other text
iteration classes such as BreakIterator. Using this class, it is
easy to scan through text looking for all occurrences of a
match.
Example of use:
// Requires com.ibm.icu.text.SearchIterator, com.ibm.icu.text.StringSearch
// and java.text.StringCharacterIterator.
String target = "The quick brown fox jumped over the lazy fox";
String pattern = "fox";
StringSearch iter = new StringSearch(pattern, target);
for (int pos = iter.first(); pos != SearchIterator.DONE;
        pos = iter.next()) {
    // prints matches at offsets 16 and 41 with length 3
    System.out.println("Found match at " + pos + ", length is "
            + iter.getMatchLength());
}
target = "ababababa";
pattern = "aba";
// search the new text for the new pattern
iter.setTarget(new StringCharacterIterator(target));
iter.setPattern(pattern);
iter.setOverlapping(false);
System.out.println("Overlapping mode set to false");
System.out.println("Forward matches of pattern " + pattern + " in text "
        + target + ": ");
for (int pos = iter.first(); pos != SearchIterator.DONE;
        pos = iter.next()) {
    // prints matches at offsets 0 and 4 with length 3
    System.out.println("offset " + pos + ", length "
            + iter.getMatchLength());
}
System.out.println("Backward matches of pattern " + pattern + " in text "
        + target + ": ");
for (int pos = iter.last(); pos != SearchIterator.DONE;
        pos = iter.previous()) {
    // prints matches at offsets 6 and 2 with length 3
    System.out.println("offset " + pos + ", length "
            + iter.getMatchLength());
}
System.out.println("Overlapping mode set to true");
iter.setOverlapping(true);
System.out.println("Index set to 2");
iter.setIndex(2);
System.out.println("Forward matches of pattern " + pattern + " in text "
        + target + ": ");
// next() starts from the current index; first() would restart at offset 0
for (int pos = iter.next(); pos != SearchIterator.DONE;
        pos = iter.next()) {
    // prints matches at offsets 2, 4 and 6 with length 3
    System.out.println("offset " + pos + ", length "
            + iter.getMatchLength());
}
System.out.println("Index set to 2");
iter.setIndex(2);
System.out.println("Backward matches of pattern " + pattern + " in text "
        + target + ": ");
// previous() starts from the current index; last() would restart at the end
for (int pos = iter.previous(); pos != SearchIterator.DONE;
        pos = iter.previous()) {
    // prints a match at offset 0 with length 3
    System.out.println("offset " + pos + ", length "
            + iter.getMatchLength());
}
author: Laura Werner, synwee See Also: BreakIterator |
Field Summary | |
final public static int | DONE DONE is returned by previous() and next() after all valid matches have
been returned, and by first() and last() if there are no matches at all. | protected BreakIterator | breakIterator The BreakIterator that defines the boundaries of a logical match. | protected int | matchLength Length of the most recent match in the target text. | protected CharacterIterator | targetText Target text for searching. |
Constructor Summary | |
protected | SearchIterator(CharacterIterator target, BreakIterator breaker) Protected constructor for use by subclasses.
Initializes the iterator with the argument target text for searching
and sets the BreakIterator.
See class documentation for more details on the use of the target text
and BreakIterator.
Parameters: target - The target text to be searched. Parameters: breaker - A BreakIterator that is used to determine the boundaries of a logical match. |
Method Summary | |
final public int | first() Return the index of the first forward match in the target text. | final public int | following(int position) Return the index of the first forward match in target text that
is at or after argument position. | public BreakIterator | getBreakIterator() Returns the BreakIterator that is used to restrict the indexes at which
matches are detected. | abstract public int | getIndex() Return the index in the target text at which the iterator is currently
positioned. | public int | getMatchLength()
Returns the length of the most recent match in the target text. | public int | getMatchStart()
Returns the index of the most recent match in the target text.
This call returns a valid result only after a successful call to
SearchIterator.first,
SearchIterator.next,
SearchIterator.previous, or
SearchIterator.last.
Just after construction, or after a searching method returns
DONE, this method will return DONE.
Use getMatchLength to get the length of the matched text.
getMatchedText will return the subtext in the searched
target text from index getMatchStart() with length getMatchLength(). | public String | getMatchedText() Returns the text that was matched by the most recent call to
SearchIterator.first,
SearchIterator.next,
SearchIterator.previous, or
SearchIterator.last. | public CharacterIterator | getTarget() Return the target text that is being searched. | abstract protected int | handleNext(int start)
Abstract method that subclasses override to provide the mechanism
for finding the next forwards match in the target text. | abstract protected int | handlePrevious(int startAt)
Abstract method which subclasses override to provide the mechanism
for finding the next backwards match in the target text. | public boolean | isOverlapping() Return true if the overlapping property has been set. | final public int | last() Return the index of the first backward match in target text. | public int | next() Search forwards in the target text for the next valid match,
starting the search from the current iterator position. | final public int | preceding(int position) Return the index of the first backwards match in target
text that ends at or before argument position. | public int | previous() Search backwards in the target text for the next valid match,
starting the search from the current iterator position. | public void | reset()
Resets the search iteration. | public void | setBreakIterator(BreakIterator breakiter) Set the BreakIterator that is used to restrict the points at which
matches are detected. | public void | setIndex(int position)
Sets the position in the target text at which the next search will start. | protected void | setMatchLength(int length) Sets the length of the most recent match in the target text. | public void | setOverlapping(boolean allowOverlap)
Determines whether overlapping matches are returned. | public void | setTarget(CharacterIterator text) Set the target text to be searched. |
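As described for getMatchedText(), the matched text is the subtext of the target starting at getMatchStart() with length getMatchLength(). A hypothetical helper over a plain java.text.CharacterIterator can make that relationship concrete; MatchedTextDemo and matchedText are invented names, and this is a sketch of the described behavior, not ICU's code.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Hypothetical helper, not ICU code: reads `length` chars from `target` starting at `start`.
final class MatchedTextDemo {
    private MatchedTextDemo() {}

    static String matchedText(CharacterIterator target, int start, int length) {
        if (length < 0 || start < target.getBeginIndex() || start + length > target.getEndIndex()) {
            throw new IndexOutOfBoundsException("match lies outside the target text");
        }
        int saved = target.getIndex(); // restore the caller's position afterwards
        StringBuilder sb = new StringBuilder(length);
        char c = target.setIndex(start);
        for (int i = 0; i < length; i++) {
            sb.append(c);
            c = target.next();
        }
        target.setIndex(saved);
        return sb.toString();
    }

    public static void main(String[] args) {
        CharacterIterator text =
                new StringCharacterIterator("The quick brown fox jumped over the lazy fox");
        System.out.println(matchedText(text, 16, 3)); // fox
        System.out.println(matchedText(text, 41, 3)); // fox
    }
}
```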
DONE | final public static int DONE | | DONE is returned by previous() and next() after all valid matches have
been returned, and by first() and last() if there are no matches at all.
See Also: SearchIterator.previous See Also: SearchIterator.next |
SearchIterator | protected SearchIterator(CharacterIterator target, BreakIterator breaker) | | Protected constructor for use by subclasses.
Initializes the iterator with the argument target text for searching
and sets the BreakIterator.
See class documentation for more details on the use of the target text
and BreakIterator.
Parameters: target - The target text to be searched. Parameters: breaker - A BreakIterator that is used to determine the boundaries of a logical match. This argument can be null. exception: IllegalArgumentException - thrown when argument target is null, or of length 0. See Also: BreakIterator |
getBreakIterator | public BreakIterator getBreakIterator() | | Returns the BreakIterator that is used to restrict the indexes at which
matches are detected. This will be the same object that was passed to
the constructor or to setBreakIterator.
If the BreakIterator has not been set, null will be returned.
See setBreakIterator for more information.
Returns: the BreakIterator set to restrict logical matches. See Also: SearchIterator.setBreakIterator See Also: BreakIterator |
handleNext | abstract protected int handleNext(int start) | |
Abstract method that subclasses override to provide the mechanism
for finding the next forwards match in the target text. This
allows different subclasses to provide different search algorithms.
If a match is found, this function must call setMatchLength(int) to
set the length of the result match.
The iterator is adjusted so that its current index, as returned by
SearchIterator.getIndex , is the starting position of the match if one was
found. If a match is not found, DONE will be returned.
Parameters: start - index in the target text at which the forwards search should begin. Returns: the starting index of the next forwards match if found, DONE otherwise. See Also: SearchIterator.setMatchLength(int) See Also: SearchIterator.handlePrevious(int) See Also: SearchIterator.DONE |
handlePrevious | abstract protected int handlePrevious(int startAt) | |
Abstract method which subclasses override to provide the mechanism
for finding the next backwards match in the target text.
This allows different
subclasses to provide different search algorithms.
If a match is found, this function must call setMatchLength(int) to
set the length of the result match.
The iterator is adjusted so that its current index, as returned by
SearchIterator.getIndex , is the starting position of the match if one was
found. If a match is not found, DONE will be returned.
Parameters: startAt - index in the target text at which the backwards search should begin. Returns: the starting index of the next backwards match if found, DONE otherwise. See Also: SearchIterator.setMatchLength(int) See Also: SearchIterator.handleNext(int) See Also: SearchIterator.DONE |
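The handleNext/handlePrevious contract (the subclass finds a match, calls setMatchLength, and returns the match start or DONE) is a template-method pattern: the base class drives iteration and the subclass supplies the search algorithm. The sketch below is a hypothetical, heavily simplified stand-in, not ICU's SearchIterator: MiniSearchIterator and NaiveSearch are invented names, the target is a String rather than a CharacterIterator, matching is exact character comparison, iteration is non-overlapping only, and DONE is assumed to be -1.

```java
// Hypothetical miniature of the SearchIterator contract; NOT the ICU class.
abstract class MiniSearchIterator {
    static final int DONE = -1; // assumed sentinel value for this sketch

    protected final String target;
    private int position;    // where the next search in the current direction starts
    private int matchLength; // set by subclasses through setMatchLength

    protected MiniSearchIterator(String target) {
        if (target == null || target.isEmpty()) {
            throw new IllegalArgumentException("target must be non-empty");
        }
        this.target = target;
    }

    // Subclasses: return the match start (after calling setMatchLength) or DONE.
    protected abstract int handleNext(int start);
    protected abstract int handlePrevious(int startAt);

    protected void setMatchLength(int length) { matchLength = length; }
    int getMatchLength() { return matchLength; }

    int first() { position = 0; return next(); }
    int last() { position = target.length(); return previous(); }

    int next() {
        int start = handleNext(position);
        if (start == DONE) { matchLength = 0; position = target.length(); return DONE; }
        position = start + matchLength; // non-overlapping: continue after this match
        return start;
    }

    int previous() {
        int start = handlePrevious(position);
        if (start == DONE) { matchLength = 0; position = 0; return DONE; }
        position = start; // next backward match must end at or before this start
        return start;
    }
}

// Hypothetical subclass supplying a naive exact-match algorithm.
final class NaiveSearch extends MiniSearchIterator {
    private final String pattern;

    NaiveSearch(String pattern, String target) {
        super(target);
        if (pattern == null || pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must be non-empty");
        }
        this.pattern = pattern;
    }

    @Override
    protected int handleNext(int start) {
        int pos = target.indexOf(pattern, start);
        if (pos < 0) return DONE;
        setMatchLength(pattern.length());
        return pos;
    }

    @Override
    protected int handlePrevious(int startAt) {
        int from = startAt - pattern.length(); // latest start whose match ends by startAt
        int pos = from < 0 ? -1 : target.lastIndexOf(pattern, from);
        if (pos < 0) return DONE;
        setMatchLength(pattern.length());
        return pos;
    }

    public static void main(String[] args) {
        NaiveSearch search = new NaiveSearch("aba", "ababababa");
        for (int pos = search.first(); pos != DONE; pos = search.next()) {
            System.out.println("forward match at " + pos + ", length " + search.getMatchLength()); // 0, then 4
        }
        for (int pos = search.last(); pos != DONE; pos = search.previous()) {
            System.out.println("backward match at " + pos + ", length " + search.getMatchLength()); // 6, then 2
        }
    }
}
```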
isOverlapping | public boolean isOverlapping() | | Return true if the overlapping property has been set.
See setOverlapping(boolean) for more information.
Returns: true if the overlapping property has been set, false otherwise. See Also: SearchIterator.setOverlapping |
reset | public void reset() | |
Resets the search iteration. All properties will be reset to their
default values.
If a forward iteration is initiated, the next search will begin at the
start of the target text. Otherwise, if a backwards iteration is initiated,
the next search will begin at the end of the target text.
|
setBreakIterator | public void setBreakIterator(BreakIterator breakiter) | | Set the BreakIterator that is used to restrict the points at which
matches are detected.
Using null as the parameter is legal; it means that break
detection should not be attempted.
See class documentation for more information.
Parameters: breakiter - A BreakIterator that will be used to restrict the points at which matches are detected. See Also: SearchIterator.getBreakIterator See Also: BreakIterator |
setIndex | public void setIndex(int position) | |
Sets the position in the target text at which the next search will start.
This method clears any previous match.
Parameters: position - position from which to start the next search. exception: IndexOutOfBoundsException - thrown if argument position is out of the target text range. See Also: SearchIterator.getIndex |
setMatchLength | protected void setMatchLength(int length) | | Sets the length of the most recent match in the target text.
Subclasses' handleNext() and handlePrevious() methods should call this
after they find a match in the target text.
Parameters: length - new length to set See Also: SearchIterator.handleNext See Also: SearchIterator.handlePrevious |
setOverlapping | public void setOverlapping(boolean allowOverlap) | |
Determines whether overlapping matches are returned. See the class
documentation for more information about overlapping matches.
The default setting of this property is false.
Parameters: allowOverlap - true if overlapping matches are allowed, false otherwise. See Also: SearchIterator.isOverlapping |
setTarget | public void setTarget(CharacterIterator text) | | Set the target text to be searched. Text iteration will then begin at
the start of the text string. This method is useful if you want to
reuse an iterator to search within a different body of text.
Parameters: text - new text iterator in which to look for matches. exception: IllegalArgumentException - thrown when text is null or has 0 length. See Also: SearchIterator.getTarget |
|
|