Java Doc for Normalizer.java in 6.0 JDK Modules » j2me » sun.text



java.lang.Object
   sun.text.Normalizer

Normalizer
final public class Normalizer implements Cloneable
Normalizer transforms Unicode text into an equivalent composed or decomposed form, allowing for easier sorting and searching of text. Normalizer supports the standard normalization forms described in Unicode Technical Report #15.

Characters with accents or other adornments can be encoded in several different ways in Unicode. For example, take the character "Á" (A-acute). In Unicode, this can be encoded as a single character (the "composed" form):

 00C1    LATIN CAPITAL LETTER A WITH ACUTE
or as two separate characters (the "decomposed" form):
 0041    LATIN CAPITAL LETTER A
 0301    COMBINING ACUTE ACCENT

To a user of your program, however, both of these sequences should be treated as the same "user-level" character "Á". When you are searching or comparing text, you must ensure that these two sequences are treated equivalently. In addition, you must handle characters with more than one accent. Sometimes the order of a character's combining accents is significant, while in other cases accent sequences in different orders are really equivalent.
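The equivalence described above can be sketched as follows. Because sun.text.Normalizer is an internal class, this example assumes the public java.text.Normalizer API as a stand-in; the class and method names (ComposedVsDecomposed, canonicallyEqual) are illustrative and not part of the documented class.

```java
import java.text.Normalizer;

final class ComposedVsDecomposed {
    // Compares two strings by their canonically decomposed (NFD) forms,
    // so composed and decomposed spellings of the same text match.
    static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        String composed = "\u00C1";    // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = "A\u0301"; // A followed by COMBINING ACUTE ACCENT
        System.out.println(composed.equals(decomposed));            // false
        System.out.println(canonicallyEqual(composed, decomposed)); // true
    }
}
```

A plain String.equals call sees two different char sequences; only the normalized comparison treats them as the same user-level character.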

Similarly, the string "ffi" can be encoded as three separate letters:

 0066    LATIN SMALL LETTER F
 0066    LATIN SMALL LETTER F
 0069    LATIN SMALL LETTER I
or as the single character
 FB03    LATIN SMALL LIGATURE FFI

The ffi ligature is not a distinct semantic character, and strictly speaking it shouldn't be in Unicode at all, but it was included for compatibility with existing character sets that already provided it. The Unicode standard identifies such characters by giving them "compatibility" decompositions into the corresponding semantic characters. When sorting and searching, you will often want to use these mappings.
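As a rough illustration of a compatibility decomposition, the sketch below again assumes the public java.text.Normalizer API rather than the class documented here. LigatureFolding and foldCompatibility are hypothetical names chosen for the example.

```java
import java.text.Normalizer;

final class LigatureFolding {
    // Applies compatibility decomposition (NFKD), which maps compatibility
    // characters such as the ffi ligature to their semantic equivalents.
    static String foldCompatibility(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKD);
    }

    public static void main(String[] args) {
        String withLigature = "o\uFB03ce"; // "o" + LATIN SMALL LIGATURE FFI + "ce"
        System.out.println(foldCompatibility(withLigature)); // office
        // Canonical decomposition alone leaves the ligature untouched.
        String canonical = Normalizer.normalize(withLigature, Normalizer.Form.NFD);
        System.out.println(canonical.equals(withLigature));  // true
    }
}
```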

Normalizer helps solve these problems by transforming text into the canonical composed and decomposed forms as shown in the first example above. In addition, you can have it perform compatibility decompositions so that you can treat compatibility characters the same as their equivalents. Finally, Normalizer rearranges accents into the proper canonical order, so that you do not have to worry about accent rearrangement on your own.
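The accent reordering mentioned above can be observed with a small sketch. It assumes the public java.text.Normalizer API as a stand-in for this class, and the names AccentOrdering and nfd are made up for the example. COMBINING DOT BELOW (U+0323) and COMBINING DOT ABOVE (U+0307) have different combining classes, so their relative order is not significant; COMBINING ACUTE ACCENT (U+0301) and COMBINING DIAERESIS (U+0308) share a combining class, so their order is.

```java
import java.text.Normalizer;

final class AccentOrdering {
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        // Dot above then dot below, versus dot below then dot above:
        // both normalize to the same canonical order.
        String aboveBelow = "a\u0307\u0323";
        String belowAbove = "a\u0323\u0307";
        System.out.println(nfd(aboveBelow).equals(nfd(belowAbove))); // true

        // Acute and diaeresis interact visually, so their order is kept.
        String acuteDiaeresis = "a\u0301\u0308";
        String diaeresisAcute = "a\u0308\u0301";
        System.out.println(nfd(acuteDiaeresis).equals(nfd(diaeresisAcute))); // false
    }
}
```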

Normalizer adds one optional behavior, Normalizer.IGNORE_HANGUL, that differs from the standard Unicode Normalization Forms. This option can be passed to the Normalizer constructors and to the static compose and decompose methods. This option, and any that are added in the future, will be turned off by default.

There are three common usage models for Normalizer. In the first, the static normalize() method is used to process an entire input string at once. Second, you can create a Normalizer object and use it to iterate through the normalized form of a string by calling first() and next(). Finally, you can use the setIndex() and getIndex() methods to perform random-access iteration, which is very useful for searching.

Note: Normalizer objects behave like iterators and have methods such as setIndex, next, previous, etc. You should note that while setIndex and getIndex refer to indices in the underlying input text being processed, the next and previous methods iterate through characters in the normalized output. This means that there is not necessarily a one-to-one correspondence between characters returned by next and previous and the indices passed to and returned from setIndex and getIndex. It is for this reason that Normalizer does not implement the CharacterIterator interface.
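To make the index mismatch concrete, the following sketch computes, for each position in the input, the offset at which its decomposed output begins. It is only an illustration built on the public java.text.Normalizer API, not the bookkeeping this class performs internally; InputToOutputOffsets and outputOffsets are hypothetical names. It relies on the fact that canonical reordering permutes characters without changing their count.

```java
import java.text.Normalizer;
import java.util.Arrays;

final class InputToOutputOffsets {
    // offsets[i] is the position in the NFD output where the decomposition
    // of the code point covering input index i starts; the last entry is
    // the total output length.
    static int[] outputOffsets(String input) {
        int[] offsets = new int[input.length() + 1];
        int out = 0;
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            int width = Character.charCount(cp);
            for (int k = 0; k < width; k++) {
                offsets[i + k] = out;
            }
            String piece = new String(Character.toChars(cp));
            out += Normalizer.normalize(piece, Normalizer.Form.NFD).length();
            i += width;
        }
        offsets[input.length()] = out;
        return offsets;
    }

    public static void main(String[] args) {
        // A-acute followed by "b": two input chars, three output chars.
        System.out.println(Arrays.toString(outputOffsets("\u00C1b"))); // [0, 2, 3]
    }
}
```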

Note: Normalizer is currently based on version 3.0 of the Unicode Standard. It will be updated as later versions of Unicode are released. If you are using this class on a JDK that supports an earlier version of Unicode, it is possible that Normalizer may generate composed or decomposed characters for which your JDK's java.lang.Character class does not have any data.


author:
   Laura Werner, Mark Davis


Inner Class: final public static class Mode

Field Summary
final public static Mode COMPOSE
     Canonical decomposition followed by canonical composition.
final public static Mode COMPOSE_COMPAT
     Compatibility decomposition followed by canonical composition.
final public static Mode DECOMP
     Canonical decomposition.
final public static Mode DECOMP_COMPAT
     Compatibility decomposition.
final public static char DONE
     Constant indicating that the end of the iteration has been reached.
final static char HANGUL_BASE

final static char HANGUL_LIMIT

final public static int IGNORE_HANGUL
     Option to disable Hangul/Jamo composition and decomposition. This option applies to Korean text, which can be represented either in the Jamo alphabet or in Hangul characters, which are really just two or three Jamo combined into one visual glyph.
final public static Mode NO_OP
     Null operation for use with the Normalizer constructors and the static normalize method.
final static int STR_INDEX_SHIFT

final static int STR_LENGTH_MASK


Constructor Summary
public  Normalizer(String str, Mode mode)
     Creates a new Normalizer object for iterating over the normalized form of a given string.


Parameters:
  str - The string to be normalized.

public  Normalizer(String str, Mode mode, int opt)
     Creates a new Normalizer object for iterating over the normalized form of a given string.

The options parameter specifies which optional Normalizer features are to be enabled for this object.


Parameters:
  str - The string to be normalized.

public  Normalizer(CharacterIterator iter, Mode mode)
     Creates a new Normalizer object for iterating over the normalized form of the given text.


Parameters:
  iter - The input text to be normalized.

public  Normalizer(CharacterIterator iter, Mode mode, int opt)
     Creates a new Normalizer object for iterating over the normalized form of the given text.


Parameters:
  iter - The input text to be normalized.


Method Summary
public Object clone()
     Clones this Normalizer object.
public static String compose(String source, boolean compat, int options)
     Compose a String.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is Normalizer.IGNORE_HANGUL. If you want the default behavior corresponding to Unicode Normalization Form C or KC, use 0 for this argument.


Parameters:
  source - the string to be composed.
  compat - Perform compatibility decomposition before composition. If this argument is false, only canonical decomposition will be performed.
  options - the optional features to be enabled.

final static int composeAction(int baseIndex, int comIndex)

final static int composeLookup(char ch)

public char current()
     Return the current character in the normalized text.
public static String decompose(String source, boolean compat, int options)
     Static method to decompose a String.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is Normalizer.IGNORE_HANGUL. The desired options should be OR'ed together to determine the value of this argument.

public static String decompose(String source, boolean compat, int options, boolean addSingleQuotation)

final static void explode(StringBuffer target, int index)
    
public char first()
     Return the first character in the normalized text.
final public int getBeginIndex()
     Retrieve the index of the start of the input text.
final public static int getClass(char ch)

final public int getEndIndex()
     Retrieve the index of the end of the input text.
final public int getIndex()
     Retrieve the current iteration position in the input text that is being normalized.
public Mode getMode()

public boolean getOption(int option)
     Determine whether an option is turned on or off.
static int hangulToJamo(char ch, StringBuffer result, int decompLimit)
     Convert a single Hangul syllable into one or more Jamo characters.
final static int jamoAppend(char ch, int limit, StringBuffer dest)

public char last()
     Return the last character in the normalized text.
public char next()
     Return the current character in the normalized text and advance the iteration position by one.
public static String normalize(String str, Mode mode, int options)
     Normalizes a String using the given normalization operation.
public static String normalize(String str, Mode mode, int options, boolean addSingleQuotation)

final static char pairExplode(StringBuffer target, int action)

public char previous()
     Return the previous character in the normalized text and decrement the iteration position by one.
public void reset()

public char setIndex(int index)
     Set the iteration position in the input text that is being normalized and return the first normalized character at that position.


Parameters:
  index - the desired index in the input text.

public void setIndexOnly(int index)

public void setMode(Mode newMode)
     Set the normalization mode for this object.

Note: If the normalization mode is changed while iterating over a string, calls to next() and previous() may return previously buffered characters in the old normalization mode until the iteration is able to re-sync at the next base character. It is safest to call setText(), first(), last(), etc. after calling setMode.

public void setOption(int option, boolean value)
     Set options that affect this Normalizer's operation. Options do not change the basic composition or decomposition operation that is being performed, but they control whether certain optional portions of the operation are done. Currently the only available option is Normalizer.IGNORE_HANGUL.

public void setText(String newText)
     Set the input text over which this Normalizer will iterate.
public void setText(CharacterIterator newText)
     Set the input text over which this Normalizer will iterate.

Field Detail
COMPOSE
final public static Mode COMPOSE
Canonical decomposition followed by canonical composition. Used with the Normalizer constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. Normalizer.IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Normalization Form C.


See Also:   Normalizer.setMode




COMPOSE_COMPAT
final public static Mode COMPOSE_COMPAT
Compatibility decomposition followed by canonical composition. Used with the Normalizer constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. Normalizer.IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Normalization Form KC.


See Also:   Normalizer.setMode




DECOMP
final public static Mode DECOMP
Canonical decomposition. This value is passed to the Normalizer constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. Normalizer.IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Normalization Form D.


See Also:   Normalizer.setMode




DECOMP_COMPAT
final public static Mode DECOMP_COMPAT
Compatibility decomposition. This value is passed to the Normalizer constructors and the static normalize method to determine the operation to be performed.

If all optional features (e.g. Normalizer.IGNORE_HANGUL) are turned off, this operation produces output that is in Unicode Normalization Form KD.


See Also:   Normalizer.setMode




DONE
final public static char DONE
Constant indicating that the end of the iteration has been reached. This is guaranteed to have the same value as CharacterIterator.DONE.
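A sentinel-terminated loop of the kind this constant supports might look like the sketch below. It uses java.text.StringCharacterIterator from the standard library purely to show the pattern; DoneSentinelLoop and collect are illustrative names. Note that, as the description of next() in this document is worded, Normalizer.next() returns the current character before advancing, so a loop over a Normalizer may need a slightly different shape.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

final class DoneSentinelLoop {
    // Walks an iterator from first() until the DONE sentinel is returned,
    // collecting every character seen along the way.
    static String collect(CharacterIterator it) {
        StringBuilder sb = new StringBuilder();
        for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(collect(new StringCharacterIterator("abc"))); // abc
    }
}
```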



HANGUL_BASE
final static char HANGUL_BASE



HANGUL_LIMIT
final static char HANGUL_LIMIT



IGNORE_HANGUL
final public static int IGNORE_HANGUL
Option to disable Hangul/Jamo composition and decomposition. This option applies to Korean text, which can be represented either in the Jamo alphabet or in Hangul characters, which are really just two or three Jamo combined into one visual glyph. Since Jamo takes up more storage space than Hangul, applications that process only Hangul text may wish to turn this option on when decomposing text.

The Unicode standard treats Hangul to Jamo conversion as a canonical decomposition, so this option must be turned off if you wish to transform strings into one of the standard Unicode Normalization Forms.


See Also:   Normalizer.setOption
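The storage cost that motivates this option can be seen in a short sketch. It assumes the public java.text.Normalizer API, which always applies the standard Hangul decomposition and has no equivalent of IGNORE_HANGUL; HangulStorage and decomposedLength are hypothetical names.

```java
import java.text.Normalizer;

final class HangulStorage {
    // Number of chars the text occupies after canonical decomposition.
    static int decomposedLength(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).length();
    }

    public static void main(String[] args) {
        String syllable = "\uD55C"; // one precomposed Hangul syllable
        System.out.println(syllable.length());          // 1
        System.out.println(decomposedLength(syllable)); // 3 (three conjoining Jamo)
        // Composition restores the single precomposed syllable.
        String roundTrip = Normalizer.normalize(
                Normalizer.normalize(syllable, Normalizer.Form.NFD),
                Normalizer.Form.NFC);
        System.out.println(roundTrip.equals(syllable)); // true
    }
}
```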




NO_OP
final public static Mode NO_OP
Null operation for use with the Normalizer constructors and the static normalize method. This value tells the Normalizer to do nothing but return unprocessed characters from the underlying String or CharacterIterator. If you have code which requires raw text at some times and normalized text at others, you can use NO_OP for the cases where you want raw text, rather than having a separate code path that bypasses Normalizer altogether.


See Also:   Normalizer.setMode




STR_INDEX_SHIFT
final static int STR_INDEX_SHIFT



STR_LENGTH_MASK
final static int STR_LENGTH_MASK




Constructor Detail
Normalizer
public Normalizer(String str, Mode mode)
Creates a new Normalizer object for iterating over the normalized form of a given string.


Parameters:
  str - The string to be normalized. The normalization will start at the beginning of the string.
  mode - The normalization mode.




Normalizer
public Normalizer(String str, Mode mode, int opt)
Creates a new Normalizer object for iterating over the normalized form of a given string.

The opt parameter specifies which optional Normalizer features are to be enabled for this object.


Parameters:
  str - The string to be normalized. The normalization will start at the beginning of the string.
  mode - The normalization mode.
  opt - Any optional features to be enabled. Currently the only available option is Normalizer.IGNORE_HANGUL. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.




Normalizer
public Normalizer(CharacterIterator iter, Mode mode)
Creates a new Normalizer object for iterating over the normalized form of the given text.


Parameters:
  iter - The input text to be normalized. The normalization will start at the beginning of the text.
  mode - The normalization mode.




Normalizer
public Normalizer(CharacterIterator iter, Mode mode, int opt)
Creates a new Normalizer object for iterating over the normalized form of the given text.


Parameters:
  iter - The input text to be normalized. The normalization will start at the beginning of the text.
  mode - The normalization mode.
  opt - Any optional features to be enabled. Currently the only available option is Normalizer.IGNORE_HANGUL. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.





Method Detail
clone
public Object clone()
Clones this Normalizer object. All properties of this object are duplicated in the new object, including the cloning of any CharacterIterator that was passed in to the constructor or to setText(CharacterIterator). However, the text storage underlying the CharacterIterator is not duplicated unless the iterator's clone method does so.



compose
public static String compose(String source, boolean compat, int options)
Compose a String.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is Normalizer.IGNORE_HANGUL. If you want the default behavior corresponding to Unicode Normalization Form C or KC, use 0 for this argument.


Parameters:
  source - the string to be composed.
  compat - Perform compatibility decomposition before composition. If this argument is false, only canonical decomposition will be performed.
  options - the optional features to be enabled.
Returns:
  the composed string.
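The effect of the compat flag can be approximated with the public java.text.Normalizer API, as in this sketch. It is an assumption that compose with options set to 0 corresponds to Form C or KC exactly as the text states; ComposeSketch and composeLike are hypothetical names, and the options argument is omitted because the public API has no counterpart for it.

```java
import java.text.Normalizer;

final class ComposeSketch {
    // Stand-in for compose(source, compat, 0): NFKC when compat is true,
    // NFC otherwise.
    static String composeLike(String source, boolean compat) {
        Normalizer.Form form = compat ? Normalizer.Form.NFKC : Normalizer.Form.NFC;
        return Normalizer.normalize(source, form);
    }

    public static void main(String[] args) {
        String text = "A\u0301\uFB03"; // decomposed A-acute, then the ffi ligature
        System.out.println(composeLike(text, false).length()); // 2 (ligature kept)
        System.out.println(composeLike(text, true).length());  // 4 (ligature expanded to "ffi")
    }
}
```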




composeAction
final static int composeAction(int baseIndex, int comIndex)



composeLookup
final static int composeLookup(char ch)



current
public char current()
Return the current character in the normalized text.



decompose
public static String decompose(String source, boolean compat, int options)
Static method to decompose a String.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is Normalizer.IGNORE_HANGUL. The desired options should be OR'ed together to determine the value of this argument. If you want the default behavior corresponding to Unicode Normalization Form D or KD, use 0 for this argument.


Parameters:
  source - the string to be decomposed.
  compat - Perform compatibility decomposition. If this argument is false, only canonical decomposition will be performed.
  options - the optional features to be enabled.
Returns:
  the decomposed string.




decompose
public static String decompose(String source, boolean compat, int options, boolean addSingleQuotation)



explode
final static void explode(StringBuffer target, int index)



first
public char first()
Return the first character in the normalized text. This resets the Normalizer's position to the beginning of the text.



getBeginIndex
final public int getBeginIndex()
Retrieve the index of the start of the input text. This is the begin index of the CharacterIterator or the start (i.e. 0) of the String over which this Normalizer is iterating.



getClass
final public static int getClass(char ch)



getEndIndex
final public int getEndIndex()
Retrieve the index of the end of the input text. This is the end index of the CharacterIterator or the length of the String over which this Normalizer is iterating.



getIndex
final public int getIndex()
Retrieve the current iteration position in the input text that is being normalized. This method is useful in applications such as searching, where you need to be able to determine the position in the input text that corresponds to a given normalized output character.
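A simplified form of normalization-aware searching is sketched below. It assumes the public java.text.Normalizer API and only answers whether a match exists; mapping a match back to a position in the input, which is what getIndex is described as enabling, is not attempted. NormalizedSearch and containsNormalized are hypothetical names.

```java
import java.text.Normalizer;

final class NormalizedSearch {
    // Reports whether needle occurs in haystack once both are brought to
    // the same canonically decomposed form.
    static boolean containsNormalized(String haystack, String needle) {
        String h = Normalizer.normalize(haystack, Normalizer.Form.NFD);
        String n = Normalizer.normalize(needle, Normalizer.Form.NFD);
        return h.contains(n);
    }

    public static void main(String[] args) {
        String haystack = "caf\u00E9 au lait"; // composed e-acute
        String needle = "cafe\u0301";          // decomposed e-acute
        System.out.println(haystack.contains(needle));            // false
        System.out.println(containsNormalized(haystack, needle)); // true
    }
}
```

One caveat of this simple approach: a needle that ends in a base letter can match the start of an accented character in the decomposed haystack, so a production search would also need to check match boundaries.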



getMode
public Mode getMode()
Return the basic operation performed by this Normalizer.
See Also:   Normalizer.setMode



getOption
public boolean getOption(int option)
Determine whether an option is turned on or off.


See Also:   Normalizer.setOption




hangulToJamo
static int hangulToJamo(char ch, StringBuffer result, int decompLimit)
Convert a single Hangul syllable into one or more Jamo characters.
Parameters:
  conjoin - If true, decompose Jamo into conjoining Jamo.
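The arithmetic behind a Hangul-to-Jamo conversion can be sketched using the constants from the Unicode Standard's Hangul syllable decomposition algorithm. This is not the source of hangulToJamo, whose signature and return value differ; HangulDecomposition and decomposeHangul are hypothetical names, and the sketch returns a String instead of appending to a StringBuffer.

```java
final class HangulDecomposition {
    // Constants from the Unicode Standard's Hangul decomposition algorithm.
    static final int S_BASE = 0xAC00, L_BASE = 0x1100, V_BASE = 0x1161, T_BASE = 0x11A7;
    static final int V_COUNT = 21, T_COUNT = 28;
    static final int N_COUNT = V_COUNT * T_COUNT; // 588
    static final int S_COUNT = 19 * N_COUNT;      // 11172

    // Returns the conjoining Jamo for a precomposed Hangul syllable,
    // or the character unchanged if it is not such a syllable.
    static String decomposeHangul(char ch) {
        int sIndex = ch - S_BASE;
        if (sIndex < 0 || sIndex >= S_COUNT) {
            return String.valueOf(ch);
        }
        StringBuilder sb = new StringBuilder(3);
        sb.append((char) (L_BASE + sIndex / N_COUNT));             // leading consonant
        sb.append((char) (V_BASE + (sIndex % N_COUNT) / T_COUNT)); // vowel
        int t = sIndex % T_COUNT;
        if (t != 0) {
            sb.append((char) (T_BASE + t));                        // optional trailing consonant
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decomposeHangul('\uD55C').length()); // 3
        System.out.println(decomposeHangul('\uAC00').length()); // 2
    }
}
```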



jamoAppend
final static int jamoAppend(char ch, int limit, StringBuffer dest)



last
public char last()
Return the last character in the normalized text. This resets the Normalizer's position to be just before the input text corresponding to that normalized character.



next
public char next()
Return the current character in the normalized text and advance the iteration position by one. If the end of the text has already been reached, Normalizer.DONE is returned.



normalize
public static String normalize(String str, Mode mode, int options)
Normalizes a String using the given normalization operation.

The options parameter specifies which optional Normalizer features are to be enabled for this operation. Currently the only available option is Normalizer.IGNORE_HANGUL. If you want the default behavior corresponding to one of the standard Unicode Normalization Forms, use 0 for this argument.


Parameters:
  str - the input string to be normalized.
  mode - the normalization mode.
  options - the optional features to be enabled.
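The relationship between the five modes and the standard forms, as stated in the field descriptions, can be summarized in a sketch built on the public java.text.Normalizer API. The enum below is a hypothetical mirror of the Mode constants (the real Mode is an inner class, not an enum), and ModeMapping is an illustrative name; the options argument is left out.

```java
import java.text.Normalizer;

final class ModeMapping {
    // Hypothetical mirror of the Mode constants documented for this class.
    enum Mode { COMPOSE, COMPOSE_COMPAT, DECOMP, DECOMP_COMPAT, NO_OP }

    // Maps each mode to the standard form it is documented to produce
    // when all optional features are turned off.
    static String normalize(String str, Mode mode) {
        switch (mode) {
            case COMPOSE:        return Normalizer.normalize(str, Normalizer.Form.NFC);
            case COMPOSE_COMPAT: return Normalizer.normalize(str, Normalizer.Form.NFKC);
            case DECOMP:         return Normalizer.normalize(str, Normalizer.Form.NFD);
            case DECOMP_COMPAT:  return Normalizer.normalize(str, Normalizer.Form.NFKD);
            default:             return str; // NO_OP: return the raw text
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("\u00C1", Mode.DECOMP).length());  // 2
        System.out.println(normalize("A\u0301", Mode.COMPOSE).length()); // 1
        System.out.println(normalize("A\u0301", Mode.NO_OP).length());   // 2
    }
}
```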




normalize
public static String normalize(String str, Mode mode, int options, boolean addSingleQuotation)



pairExplode
final static char pairExplode(StringBuffer target, int action)



previous
public char previous()
Return the previous character in the normalized text and decrement the iteration position by one. If the beginning of the text has already been reached, Normalizer.DONE is returned.



reset
public void reset()



setIndex
public char setIndex(int index)
Set the iteration position in the input text that is being normalized and return the first normalized character at that position.


Parameters:
  index - the desired index in the input text.
Returns:
  the first normalized character that is the result of iterating forward starting at the given index.
throws:
  IllegalArgumentException - if the given index is less than Normalizer.getBeginIndex or greater than Normalizer.getEndIndex.




setIndexOnly
public void setIndexOnly(int index)



setMode
public void setMode(Mode newMode)
Set the normalization mode for this object.

Note: If the normalization mode is changed while iterating over a string, calls to next() and previous() may return previously buffered characters in the old normalization mode until the iteration is able to re-sync at the next base character. It is safest to call setText(), first(), last(), etc. after calling setMode.


Parameters:
  newMode - the new mode for this Normalizer. The supported modes are COMPOSE, COMPOSE_COMPAT, DECOMP, DECOMP_COMPAT and NO_OP.


See Also:   Normalizer.getMode



setOption
public void setOption(int option, boolean value)
Set options that affect this Normalizer's operation. Options do not change the basic composition or decomposition operation that is being performed, but they control whether certain optional portions of the operation are done. Currently the only available option is:

  • Normalizer.IGNORE_HANGUL - Do not decompose Hangul syllables into the Jamo alphabet and vice-versa. This option is off by default (i.e. Hangul processing is enabled) since the Unicode standard specifies that Hangul to Jamo is a canonical decomposition. For any of the standard Unicode Normalization Forms, you should leave this option off.


Parameters:
  option - the option whose value is to be set.
Parameters:
  value - the new setting for the option. Use true to turn the option on and false to turn it off.
See Also:   Normalizer.getOption
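Since options are int values that can be OR'ed together, one plausible way to store them is as bits in a single int field. The sketch below shows that pattern only; it is not taken from the class's source, the value 0x0001 for IGNORE_HANGUL is a placeholder assumption, and OptionFlags is a hypothetical name.

```java
final class OptionFlags {
    // Placeholder value: the real constant's value is not given in this document.
    static final int IGNORE_HANGUL = 0x0001;

    private int options; // all options are off by default

    // Turns a single option bit on or off without disturbing the others.
    void setOption(int option, boolean value) {
        if (value) {
            options |= option;
        } else {
            options &= ~option;
        }
    }

    // Reports whether the given option bit is currently set.
    boolean getOption(int option) {
        return (options & option) != 0;
    }

    public static void main(String[] args) {
        OptionFlags flags = new OptionFlags();
        System.out.println(flags.getOption(IGNORE_HANGUL)); // false
        flags.setOption(IGNORE_HANGUL, true);
        System.out.println(flags.getOption(IGNORE_HANGUL)); // true
    }
}
```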




setText
public void setText(String newText)
Set the input text over which this Normalizer will iterate. The iteration position will be reset to the beginning.


Parameters:
  newText - The new string to be normalized.




setText
public void setText(CharacterIterator newText)
Set the input text over which this Normalizer will iterate. The iteration position will be reset to the beginning.


Parameters:
  newText - The new text to be normalized.




Methods inherited from java.lang.Object
public boolean equals(Object obj)
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
