Java Doc for StringSearch.java in Internationalization-Localization » icu4j » com.ibm.icu.text

java.lang.Object
   com.ibm.icu.text.SearchIterator
      com.ibm.icu.text.StringSearch

StringSearch
final public class StringSearch extends SearchIterator

StringSearch is the concrete subclass of SearchIterator that provides language-sensitive text searching based on the comparison rules defined in a RuleBasedCollator object.

StringSearch uses a version of the fast Boyer-Moore search algorithm that has been adapted to work with the large character set of Unicode. Refer to "Efficient Text Searching in Java", published in the Java Report in February 1999, for further information on the algorithm.
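
To give a feel for the shift-table idea behind Boyer-Moore, here is a minimal sketch of the simpler Horspool variant working on raw char values. It is an illustration only: the class and method names are hypothetical, and it is not how StringSearch is implemented (StringSearch compares collation elements rather than raw characters). A hash map stands in for the shift table because an array indexed by every Unicode code unit would be wasteful for short patterns.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a Boyer-Moore-Horspool search on raw chars.
// This is NOT the ICU implementation; names here are invented for the sketch.
final class HorspoolSketch {

    // For every pattern char except the last, record how far the search window
    // may slide when that char is aligned with the window's last position.
    static Map<Character, Integer> buildShiftTable(String pattern) {
        Map<Character, Integer> shift = new HashMap<>();
        int m = pattern.length();
        for (int i = 0; i < m - 1; i++) {
            shift.put(pattern.charAt(i), m - 1 - i);
        }
        return shift;
    }

    // Returns the index of the first occurrence of pattern in text, or -1.
    static int indexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0 || m > n) {
            return -1;
        }
        Map<Character, Integer> shift = buildShiftTable(pattern);
        int pos = 0;
        while (pos <= n - m) {
            // Compare right to left inside the current window.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) {
                return pos;
            }
            // Chars absent from the table allow a full-pattern-length jump.
            pos += shift.getOrDefault(text.charAt(pos + m - 1), m);
        }
        return -1;
    }
}
```

The table depends only on the pattern, which is why changing the pattern or the comparison rules forces it to be rebuilt.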

Users are also strongly encouraged to read the section on String Search and Collation in the user guide before attempting to use this class.

String searching gets a little complicated when accents are encountered at match boundaries. If a match is found and it has preceding or trailing accents that are not part of the match, the result returned will include the preceding accents up to the first base character if the pattern searched for starts with an accent. Likewise, if the pattern ends with an accent, all trailing accents up to the first base character will be included in the result.

For example, if a match is found in target text "a\u0325\u0300" for the pattern "a\u0325", the result returned by StringSearch will be the index 0 and length 3 <0, 3>. If a match is found in the target "a\u0325\u0300" for the pattern "\u0300", then the result will be index 1 and length 2 <1, 2>.
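
The widening rule can be sketched in plain Java as follows. This is a hypothetical helper, not part of ICU4J: it assumes an "accent" is any character of Unicode category Mn (nonspacing mark), and it takes the raw match <start, length> as given instead of finding it through a collator.

```java
// Hypothetical sketch of the boundary-accent rule; not ICU4J code.
final class BoundaryAccentSketch {

    // Assumption for this sketch: accents are Unicode nonspacing marks (Mn).
    static boolean isAccent(char c) {
        return Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Widens a raw match <start, length> of pattern in text and
    // returns the reported match as {start, length}.
    static int[] expandMatch(String text, String pattern, int start, int length) {
        int end = start + length;
        // Pattern starts with an accent: pull in preceding accents,
        // stopping before the first base character.
        if (isAccent(pattern.charAt(0))) {
            while (start > 0 && isAccent(text.charAt(start - 1))) {
                start--;
            }
        }
        // Pattern ends with an accent: pull in trailing accents,
        // stopping before the next base character.
        if (isAccent(pattern.charAt(pattern.length() - 1))) {
            while (end < text.length() && isAccent(text.charAt(end))) {
                end++;
            }
        }
        return new int[] {start, end - start};
    }
}
```

With the target "a\u0325\u0300", a raw match <0, 2> for the pattern "a\u0325" widens to <0, 3>, and a raw match <2, 1> for the pattern "\u0300" widens to <1, 2>, in line with the two cases described above.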

When the decomposition mode is on for the RuleBasedCollator, every match that starts or ends with an accent will have its result include the preceding or following accents respectively. For example, if pattern "a" is looked for in the target text "á\u0325", the result will be index 0 and length 2 <0, 2>.

The StringSearch class provides two options to handle accent matching described below:

Let S' be the sub-string of a text string S between the offsets start and end <start, end>.
A pattern string P matches a text string S at the offsets <start, length>
if

 
 option 1. P matches some canonical equivalent string of S'. Suppose the RuleBasedCollator used for searching has a collation strength of TERTIARY, so that all accents are non-ignorable. If the pattern "a\u0300" is searched for in the target text "a\u0325\u0300", a match will be found, since the target text is canonically equivalent to "a\u0300\u0325".
 option 2. P matches S' and, if P starts or ends with a combining mark, there exists no non-ignorable combining mark before or after S' in S respectively. Following the example above, the pattern "a\u0300" will not find a match in "a\u0325\u0300", since there exists a non-ignorable accent '\u0325' between 'a' and '\u0300'. Even with a target text of "a\u0300\u0325" a match will not be found, because of the non-ignorable trailing accent \u0325.
 
Option 2 is the default mode for dealing with boundary accents unless changed via the API setCanonical(boolean). One restriction is to be noted for option 1. Currently there are no composite characters that consist of a character with combining class > 0 before a character with combining class == 0. However, if such a character exists in the future, StringSearch may not work correctly with option 1 when such characters are encountered.
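
The canonical equivalence that option 1 relies on can be checked with the JDK alone. The sketch below uses java.text.Normalizer; the class and method names are hypothetical, and comparing NFD forms is only a way to observe the equivalence, not a description of how StringSearch matches internally.

```java
import java.text.Normalizer;

// Hypothetical helper showing canonical equivalence with the JDK Normalizer.
final class CanonicalEquivalenceSketch {

    // Two strings are canonically equivalent when their NFD forms are identical.
    static boolean canonicallyEquivalent(String a, String b) {
        String nfdA = Normalizer.normalize(a, Normalizer.Form.NFD);
        String nfdB = Normalizer.normalize(b, Normalizer.Form.NFD);
        return nfdA.equals(nfdB);
    }
}
```

U+0325 (combining class 220) and U+0300 (combining class 230) may be reordered during normalization, so canonicallyEquivalent("a\u0325\u0300", "a\u0300\u0325") is true. Two marks of the same combining class, such as U+0300 and U+0301, are never reordered, so swapping them yields a string that is not equivalent.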

SearchIterator provides APIs to specify the starting position within the text string to be searched, e.g. setIndex, preceding and following. Since the starting position will be set exactly as specified, please note that there are some dangerous positions from which the search may return incorrect results:

  • The midst of a substring that requires decomposition.
  • If the following match is to be found, the position should not be the second character of a pair that requires swapping with the preceding character. Vice versa, if the preceding match is to be found, the position to search from should not be the first character of a pair that requires swapping with the next character. E.g., certain Thai and Lao characters require swapping.
  • If a following pattern match is to be found, any position within a contracting sequence except the first will fail. Vice versa, if a preceding pattern match is to be found, an invalid starting point would be any character within a contracting sequence except the last.
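
As a partial safeguard, a caller could check that a candidate position at least falls on a user-perceived character boundary before passing it to setIndex. The sketch below is a hypothetical helper built on the JDK's java.text.BreakIterator. It only rejects positions inside a base-plus-combining-mark sequence; it does not detect contracting sequences or the Thai and Lao swapping cases listed above.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical pre-check for a search start position; not part of ICU4J.
final class SafeStartSketch {

    // True when position lies on a character (grapheme) boundary of text.
    // Throws IllegalArgumentException if position is outside the text range.
    static boolean isCharacterBoundary(String text, int position) {
        BreakIterator chars = BreakIterator.getCharacterInstance(Locale.ROOT);
        chars.setText(text);
        return chars.isBoundary(position);
    }
}
```

For the text "a\u0325\u0300b", positions 0 and 3 pass this check, while positions 1 and 2 fall between a base character and its accents and are rejected.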

Though collator attributes will be taken into consideration while performing matches, there are no APIs provided in StringSearch for setting and getting the attributes. These attributes can be set by getting the collator from getCollator and using the APIs in com.ibm.icu.text.Collator. To update StringSearch to the new collator attributes, reset() or setCollator(RuleBasedCollator) has to be called.

Consult the String Search user guide and the SearchIterator documentation for more information and examples of use.

This class is not subclassable


See Also:   SearchIterator
See Also:   RuleBasedCollator
author:
   Laura Werner, synwee



Constructor Summary
public  StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator, BreakIterator breakiter)
     Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text.
public  StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator)
     Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text.
public  StringSearch(String pattern, CharacterIterator target, Locale locale)
     Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text.
public  StringSearch(String pattern, CharacterIterator target, ULocale locale)
     Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text.
public  StringSearch(String pattern, String target)
     Initializes the iterator to use the language-specific rules and break iterator rules defined in the default locale to search for argument pattern in the argument target text.

Method Summary
public  RuleBasedCollator getCollator()
     Gets the RuleBasedCollator used for the language rules.
public  int getIndex()
     Returns the index in the target text where the iterator is currently positioned.
public  String getPattern()
     Returns the pattern that StringSearch is searching for.
protected  int handleNext(int start)
     Concrete method to provide the mechanism for finding the next forwards match in the target text. See super class documentation for its use.
protected  int handlePrevious(int start)
     Concrete method to provide the mechanism for finding the next backwards match in the target text. See super class documentation for its use.
public  boolean isCanonical()
     Determines whether canonical matches (option 1, as described in the class documentation) are enabled.
public  void reset()
     Resets the search iteration.
public  void setCanonical(boolean allowCanonical)
     Sets the canonical match mode.
public  void setCollator(RuleBasedCollator collator)
     Sets the RuleBasedCollator to be used for language-specific searching.
public  void setIndex(int position)
     Sets the position in the target text from which the next search will start.
public  void setPattern(String pattern)
     Sets the pattern to search for.
public  void setTarget(CharacterIterator text)
     Sets the target text to be searched.


Constructor Detail
StringSearch
public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator, BreakIterator breakiter)
Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. The argument breakiter is used to define logical matches. See super class documentation for more details on the use of the target text and BreakIterator.
Parameters:
  pattern - text to look for.
  target - target text to search for pattern.
  collator - RuleBasedCollator that defines the language rules
  breakiter - A BreakIterator that is used to determine the boundaries of a logical match. This argument can be null.
exception:
  IllegalArgumentException - thrown when argument target is null, or of length 0
See Also:   BreakIterator
See Also:   RuleBasedCollator
See Also:   SearchIterator



StringSearch
public StringSearch(String pattern, CharacterIterator target, RuleBasedCollator collator)
Initializes the iterator to use the language-specific rules defined in the argument collator to search for argument pattern in the argument target text. No BreakIterators are set to test for logical matches.
Parameters:
  pattern - text to look for.
  target - target text to search for pattern.
  collator - RuleBasedCollator that defines the language rules
exception:
  IllegalArgumentException - thrown when argument target is null, or of length 0
See Also:   RuleBasedCollator
See Also:   SearchIterator



StringSearch
public StringSearch(String pattern, CharacterIterator target, Locale locale)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. See super class documentation for more details on the use of the target text and BreakIterator.
Parameters:
  pattern - text to look for.
  target - target text to search for pattern.
  locale - locale to use for language and break iterator rules
exception:
  IllegalArgumentException - thrown when argument target is null, or of length 0
exception:
  ClassCastException - thrown if the collator for the specified locale is not a RuleBasedCollator
See Also:   BreakIterator
See Also:   RuleBasedCollator
See Also:   SearchIterator



StringSearch
public StringSearch(String pattern, CharacterIterator target, ULocale locale)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the argument locale to search for argument pattern in the argument target text. See super class documentation for more details on the use of the target text and BreakIterator.
Parameters:
  pattern - text to look for.
  target - target text to search for pattern.
  locale - ulocale to use for language and break iterator rules
exception:
  IllegalArgumentException - thrown when argument target is null, or of length 0
exception:
  ClassCastException - thrown if the collator for the specified locale is not a RuleBasedCollator
See Also:   BreakIterator
See Also:   RuleBasedCollator
See Also:   SearchIterator



StringSearch
public StringSearch(String pattern, String target)
Initializes the iterator to use the language-specific rules and break iterator rules defined in the default locale to search for argument pattern in the argument target text. See super class documentation for more details on the use of the target text and BreakIterator.
Parameters:
  pattern - text to look for.
  target - target text to search for pattern.
exception:
  IllegalArgumentException - thrown when argument target is null, or of length 0
exception:
  ClassCastException - thrown if the collator for the default locale is not a RuleBasedCollator
See Also:   BreakIterator
See Also:   RuleBasedCollator
See Also:   SearchIterator




Method Detail
getCollator
public RuleBasedCollator getCollator()

Gets the RuleBasedCollator used for the language rules.

Since StringSearch depends on the returned RuleBasedCollator, any change made to the returned RuleBasedCollator should be followed by a call to either StringSearch.reset() or StringSearch.setCollator(RuleBasedCollator) to ensure the correct search behaviour.

Returns:
  RuleBasedCollator used by this StringSearch
See Also:   RuleBasedCollator
See Also:   StringSearch.setCollator



getIndex
public int getIndex()
Returns the index in the target text where the iterator is currently positioned. If the iteration has gone past the end of the target text, or past the beginning for a backwards search, StringSearch.DONE is returned.
Returns:
  index in the target text where the iterator is currently positioned



getPattern
public String getPattern()
Returns the pattern that StringSearch is searching for.
Returns:
  the pattern searched for



handleNext
protected int handleNext(int start)

Concrete method to provide the mechanism for finding the next forwards match in the target text. See super class documentation for its use.

Parameters:
  start - index in the target text at which the forwards search should begin.
Returns:
  the starting index of the next forwards match if found, DONE otherwise
See Also:   StringSearch.handlePrevious(int)
See Also:   StringSearch.DONE



handlePrevious
protected int handlePrevious(int start)

Concrete method to provide the mechanism for finding the next backwards match in the target text. See super class documentation for its use.

Parameters:
  start - index in the target text at which the backwards search should begin.
Returns:
  the starting index of the next backwards match if found, DONE otherwise
See Also:   StringSearch.handleNext(int)
See Also:   StringSearch.DONE



isCanonical
public boolean isCanonical()
Determines whether canonical matches (option 1, as described in the class documentation) are enabled. See setCanonical(boolean) for more information.
Returns:
  true if canonical matches are enabled, false otherwise
See Also:   StringSearch.setCanonical



reset
public void reset()

Resets the search iteration. All properties will be reset to their default values.

The search will begin at the start of the target text if a forwards iteration is initiated before a backwards iteration. Otherwise, if a backwards iteration is initiated before a forwards iteration, the search will begin at the end of the target text.

The canonical match option will be reset to false, i.e. an exact match.




setCanonical
public void setCanonical(boolean allowCanonical)

Sets the canonical match mode. See class documentation for details. The default setting for this property is false.

Parameters:
  allowCanonical - flag indicating whether canonical matches are allowed
See Also:   StringSearch.isCanonical



setCollator
public void setCollator(RuleBasedCollator collator)

Sets the RuleBasedCollator to be used for language-specific searching.

This method causes internal data such as Boyer-Moore shift tables to be recalculated, but the iterator's position is unchanged.

Parameters:
  collator - to use for this StringSearch
exception:
  IllegalArgumentException - thrown when collator is null
See Also:   StringSearch.getCollator



setIndex
public void setIndex(int position)

Sets the position in the target text from which the next search will start. This method clears all previous states.

This method takes the argument position and sets the position in the target text accordingly, without checking whether position points to a valid starting point to begin searching.

Search positions that may render incorrect results are highlighted in the class documentation.

Parameters:
  position - index to start next search from.
exception:
  IndexOutOfBoundsException - thrown if argument position is out of the target text range.
See Also:   StringSearch.getIndex



setPattern
public void setPattern(String pattern)

Sets the pattern to search for.

This method causes internal data such as Boyer-Moore shift tables to be recalculated, but the iterator's position is unchanged.

Parameters:
  pattern - for searching
exception:
  IllegalArgumentException - thrown if pattern is null or of length 0
See Also:   StringSearch.getPattern



setTarget
public void setTarget(CharacterIterator text)
Sets the target text to be searched. Text iteration will hence begin at the start of the text string. This method is useful if you want to re-use an iterator to search within a different body of text.
Parameters:
  text - new text iterator to look for match
exception:
  IllegalArgumentException - thrown when text is null or has 0 length
See Also:   StringSearch.getTarget



Fields inherited from com.ibm.icu.text.SearchIterator
final public static int DONE
protected BreakIterator breakIterator
protected int matchLength
protected CharacterIterator targetText

Methods inherited from com.ibm.icu.text.SearchIterator
final public int first()
final public int following(int position)
public BreakIterator getBreakIterator()
abstract public int getIndex()
public int getMatchLength()
public int getMatchStart()
public String getMatchedText()
public CharacterIterator getTarget()
abstract protected int handleNext(int start)
abstract protected int handlePrevious(int startAt)
public boolean isOverlapping()
final public int last()
public int next()
final public int preceding(int position)
public int previous()
public void reset()
public void setBreakIterator(BreakIterator breakiter)
public void setIndex(int position)
protected void setMatchLength(int length)
public void setOverlapping(boolean allowOverlap)
public void setTarget(CharacterIterator text)

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
