Java Doc for UTF16.java (6.0 JDK Modules sun, package sun.text.normalizer)



java.lang.Object
   sun.text.normalizer.UTF16

UTF16
public final class UTF16

Standalone utility class providing UTF16 character conversions and indexing conversions.

Code that uses strings alone rarely needs modification. By design, UTF-16 does not allow overlap, so searching for strings is a safe operation. Similarly, concatenation is always safe. Substringing is safe if the start and end are both on UTF-32 (code point) boundaries. In normal code, the values for start and end are on those boundaries, since they arose from operations like searching. If not, the nearest UTF-32 boundaries can be determined using bounds().

Examples:

The following examples illustrate use of some of these methods.

 // iteration forwards: Original
 for (int i = 0; i < s.length(); ++i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }
 // iteration forwards: Changes for UTF-32
 int ch;
 for (int i = 0; i < s.length(); i += UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s, i);
     doSomethingWith(ch);
 }
 // iteration backwards: Original
 for (int i = s.length() - 1; i >= 0; --i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }
 // iteration backwards: Changes for UTF-32
 // (the condition is i >= 0 so that the character at offset 0 is not skipped)
 int ch;
 for (int i = s.length() - 1; i >= 0; i -= UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s, i);
     doSomethingWith(ch);
 }
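The snippets above call sun.text.normalizer.UTF16, an internal class that application code usually cannot reference. As a rough equivalent, the following sketch performs the same two iterations with public java.lang APIs only. It assumes that String.codePointAt, String.codePointBefore and Character.charCount are acceptable stand-ins for UTF16.charAt and UTF16.getCharCount; the class name CodePointIterationSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: public java.lang APIs stand in for the internal UTF16 class.
class CodePointIterationSketch {

    // Forward iteration: advance by the number of chars each code point occupies.
    static List<Integer> forward(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int ch = s.codePointAt(i);
            out.add(ch);
            i += Character.charCount(ch);
        }
        return out;
    }

    // Backward iteration: i is the exclusive end offset of the code point being read.
    static List<Integer> backward(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int ch = s.codePointBefore(i);
            out.add(ch);
            i -= Character.charCount(ch);
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD0Bb"; // 'a', U+1D50B (a surrogate pair), 'b'
        System.out.println(forward(s));
        System.out.println(backward(s));
    }
}
```

The backward loop uses an exclusive end offset rather than s.length() - 1, which avoids having to decide whether the starting offset points at a lead or a trail surrogate.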
 
Notes:
  • Naming: For clarity, High and Low surrogates are called Lead and Trail in the API, which gives a better sense of their ordering in a string. offset16 and offset32 are used to distinguish offsets to UTF-16 boundaries vs offsets to UTF-32 boundaries. int char32 is used to contain UTF-32 characters, as opposed to char16, which is a UTF-16 code unit.
  • Roundtripping Offsets: You can always roundtrip from a UTF-32 offset to a UTF-16 offset and back. Because of the difference in structure, you can roundtrip from a UTF-16 offset to a UTF-32 offset and back if and only if bounds(string, offset16) != TRAIL.
  • Exceptions: The error checking will throw an exception if indices are out of bounds. Other than that, all methods will behave reasonably, even if unmatched surrogates or out-of-bounds UTF-32 values are present. UCharacter.isLegal() can be used to check for validity if desired.
  • Unmatched Surrogates: If the string contains unmatched surrogates, then these are counted as one UTF-32 value. This matches their iteration behavior, which is vital. It also matches common display practice as missing glyphs (see the Unicode Standard Section 5.4, 5.5).
  • Optimization: The method implementations may need optimization if the compiler doesn't fold static final methods. Since surrogate pairs will form an exceedingly small percentage of all the text in the world, the singleton case should always be optimized for.
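
The notes on roundtripping offsets and on unmatched surrogates can be illustrated with a small sketch. It does not use the UTF16 class itself; it assumes that String.offsetByCodePoints and String.codePointCount model the offset32-to-offset16 and offset16-to-offset32 conversions described above. The class name OffsetRoundTripSketch and its method names are hypothetical.

```java
// Sketch only: public java.lang.String methods stand in for the internal UTF16 class.
class OffsetRoundTripSketch {

    // Code point offset (offset32) to char offset (offset16).
    static int toOffset16(String s, int offset32) {
        return s.offsetByCodePoints(0, offset32);
    }

    // Char offset (offset16) to code point offset (offset32).
    static int toOffset32(String s, int offset16) {
        return s.codePointCount(0, offset16);
    }

    // An unmatched surrogate is counted as one code point.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD0Bb"; // 'a', U+1D50B (a surrogate pair), 'b'
        // offset32 -> offset16 -> offset32 always comes back to the same value.
        System.out.println(toOffset32(s, toOffset16(s, 2)));
        // offset16 == 2 points at the trail surrogate, so it does not roundtrip.
        System.out.println(toOffset16(s, toOffset32(s, 2)));
        // A lone lead surrogate between two letters still counts as one value.
        System.out.println(countCodePoints("a\uD800b"));
    }
}
```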

author:
   Mark Davis, with help from Markus Scherer


Field Summary
public static final int CODEPOINT_MAX_VALUE
     The highest Unicode code point value (scalar value) according to the Unicode Standard.
public static final int CODEPOINT_MIN_VALUE
     The lowest Unicode code point value.
public static final int LEAD_SURROGATE_MAX_VALUE
     Lead surrogate maximum value.
public static final int LEAD_SURROGATE_MIN_VALUE
     Lead surrogate minimum value.
public static final int SUPPLEMENTARY_MIN_VALUE
     The minimum value for supplementary code points.
public static final int SURROGATE_MIN_VALUE
     Surrogate minimum value.
public static final int TRAIL_SURROGATE_MAX_VALUE
     Trail surrogate maximum value.
public static final int TRAIL_SURROGATE_MIN_VALUE
     Trail surrogate minimum value.


Method Summary
public static StringBuffer append(StringBuffer target, int char32)
     Append a single UTF-32 value to the end of a StringBuffer.
public static int charAt(String source, int offset16)
     Extract a single UTF-32 value from a string.
public static int charAt(char[] source, int start, int limit, int offset16)
     Extract a single UTF-32 value from a substring.
public static int getCharCount(int char32)
     Determines how many chars this char32 requires.
public static char getLeadSurrogate(int char32)
     Returns the lead surrogate.
public static char getTrailSurrogate(int char32)
     Returns the trail surrogate.
public static boolean isLeadSurrogate(char char16)
     Determines whether the character is a lead surrogate.
public static boolean isSurrogate(char char16)
     Determines whether the code value is a surrogate.
public static boolean isTrailSurrogate(char char16)
     Determines whether the character is a trail surrogate.
public static int moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
     Shifts offset16 by the argument number of codepoints within a subarray.
public static String valueOf(int char32)
     Convenience method corresponding to String.valueOf(char).

Field Detail
CODEPOINT_MAX_VALUE
public static final int CODEPOINT_MAX_VALUE
The highest Unicode code point value (scalar value) according to the Unicode Standard.

CODEPOINT_MIN_VALUE
public static final int CODEPOINT_MIN_VALUE
The lowest Unicode code point value.

LEAD_SURROGATE_MAX_VALUE
public static final int LEAD_SURROGATE_MAX_VALUE
Lead surrogate maximum value.

LEAD_SURROGATE_MIN_VALUE
public static final int LEAD_SURROGATE_MIN_VALUE
Lead surrogate minimum value.

SUPPLEMENTARY_MIN_VALUE
public static final int SUPPLEMENTARY_MIN_VALUE
The minimum value for supplementary code points.

SURROGATE_MIN_VALUE
public static final int SURROGATE_MIN_VALUE
Surrogate minimum value.

TRAIL_SURROGATE_MAX_VALUE
public static final int TRAIL_SURROGATE_MAX_VALUE
Trail surrogate maximum value.

TRAIL_SURROGATE_MIN_VALUE
public static final int TRAIL_SURROGATE_MIN_VALUE
Trail surrogate minimum value.
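
This page does not list the numeric values of these fields. The sketch below assumes the values that follow from the surrogate and code point ranges of the Unicode Standard, and compares them with the corresponding public constants in java.lang.Character. The class Utf16ConstantsSketch is hypothetical, and the assignment of SURROGATE_MIN_VALUE to the start of the lead surrogate range is likewise an assumption.

```java
// Sketch only: assumed values for the fields documented above.
class Utf16ConstantsSketch {
    static final int CODEPOINT_MIN_VALUE = 0;
    static final int CODEPOINT_MAX_VALUE = 0x10FFFF;
    static final int SUPPLEMENTARY_MIN_VALUE = 0x10000;
    static final int LEAD_SURROGATE_MIN_VALUE = 0xD800;
    static final int LEAD_SURROGATE_MAX_VALUE = 0xDBFF;
    static final int TRAIL_SURROGATE_MIN_VALUE = 0xDC00;
    static final int TRAIL_SURROGATE_MAX_VALUE = 0xDFFF;
    // Assumption: the surrogate block starts where the lead surrogates start.
    static final int SURROGATE_MIN_VALUE = LEAD_SURROGATE_MIN_VALUE;

    // Compares the assumed values with the public java.lang.Character constants.
    static boolean matchesJavaLangCharacter() {
        return CODEPOINT_MIN_VALUE == Character.MIN_CODE_POINT
                && CODEPOINT_MAX_VALUE == Character.MAX_CODE_POINT
                && SUPPLEMENTARY_MIN_VALUE == Character.MIN_SUPPLEMENTARY_CODE_POINT
                && LEAD_SURROGATE_MIN_VALUE == Character.MIN_HIGH_SURROGATE
                && LEAD_SURROGATE_MAX_VALUE == Character.MAX_HIGH_SURROGATE
                && TRAIL_SURROGATE_MIN_VALUE == Character.MIN_LOW_SURROGATE
                && TRAIL_SURROGATE_MAX_VALUE == Character.MAX_LOW_SURROGATE
                && SURROGATE_MIN_VALUE == Character.MIN_SURROGATE;
    }

    public static void main(String[] args) {
        System.out.println(matchesJavaLangCharacter());
    }
}
```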





Method Detail
append
public static StringBuffer append(StringBuffer target, int char32)
Append a single UTF-32 value to the end of a StringBuffer. If a validity check is required, use isLegal() on char32 before calling.
Parameters:
  target - the buffer to append to
  char32 - value to append.
Returns:
  the updated StringBuffer
exception:
  IllegalArgumentException - thrown when char32 does not lie within the range of the Unicode codepoints
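
For code that cannot call the internal class, a similar effect is available through the public StringBuffer.appendCodePoint method. The following is a minimal sketch under that assumption; AppendCodePointSketch is a hypothetical name, and the sketch does not claim to reproduce how UTF16.append is implemented.

```java
// Sketch only: StringBuffer.appendCodePoint stands in for UTF16.append.
class AppendCodePointSketch {

    // Appends char32 as one or two chars and returns the same buffer.
    static StringBuffer append(StringBuffer target, int char32) {
        // appendCodePoint throws IllegalArgumentException for values
        // outside the Unicode code point range.
        return target.appendCodePoint(char32);
    }

    public static void main(String[] args) {
        StringBuffer buffer = new StringBuffer("x");
        append(buffer, 'y');      // one char
        append(buffer, 0x1D50B);  // supplementary code point, two chars
        System.out.println(buffer.length());
    }
}
```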



charAt
public static int charAt(String source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
Parameters:
  source - string of UTF-16 chars
  offset16 - UTF-16 offset to the start of the character.
Returns:
  UTF-32 value for the UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
exception:
  IndexOutOfBoundsException - thrown if offset16 is out of bounds.
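
The behaviour described here differs from String.codePointAt in one respect: when offset16 points at the trail half of a pair, the whole supplementary character is returned. The sketch below approximates that documented behaviour with public APIs. CharAtSketch is a hypothetical name, and the handling of the trail case is inferred from the description above rather than taken from the UTF16 source.

```java
// Sketch only: approximates the documented UTF16.charAt behaviour.
class CharAtSketch {

    static int charAt(String source, int offset16) {
        // String.charAt throws an IndexOutOfBoundsException subclass if out of range.
        char single = source.charAt(offset16);
        if (Character.isLowSurrogate(single) && offset16 > 0
                && Character.isHighSurrogate(source.charAt(offset16 - 1))) {
            // offset16 points at the trail half: return the whole pair.
            return source.codePointAt(offset16 - 1);
        }
        // BMP char, lead surrogate (paired or not), or unmatched trail surrogate.
        return source.codePointAt(offset16);
    }

    public static void main(String[] args) {
        String s = "a\uD835\uDD0Bb"; // 'a', U+1D50B (a surrogate pair), 'b'
        for (int i = 0; i < s.length(); i++) {
            System.out.println(i + " -> U+" + Integer.toHexString(charAt(s, i)).toUpperCase());
        }
    }
}
```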



charAt
public static int charAt(char[] source, int start, int limit, int offset16)
Extract a single UTF-32 value from a substring. Used when iterating forwards or backwards (with UTF16.getCharCount()), as well as random access. If a validity check is required, use UCharacter.isLegal() on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
Parameters:
  source - array of UTF-16 chars
  start - offset of the start of the substring in the source array to analyze
  limit - offset of the end of the substring in the source array to analyze
  offset16 - UTF-16 offset relative to start
Returns:
  UTF-32 value for the UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in bounds32().
exception:
  IndexOutOfBoundsException - thrown if offset16 is not within the range of start and limit.



getCharCount
public static int getCharCount(int char32)
Determines how many chars this char32 requires. If a validity check is required, use isLegal() on char32 before calling.
Parameters:
  char32 - the input codepoint.
Returns:
  2 if char32 is in supplementary space, otherwise 1.



getLeadSurrogate
public static char getLeadSurrogate(int char32)
Returns the lead surrogate. If a validity check is required, use isLegal() on char32 before calling.
Parameters:
  char32 - the input character.
Returns:
  the lead surrogate if getCharCount(char32) is 2, and 0 otherwise (note: 0 is not a valid lead surrogate).



getTrailSurrogate
public static char getTrailSurrogate(int char32)
Returns the trail surrogate. If a validity check is required, use isLegal() on char32 before calling.
Parameters:
  char32 - the input character.
Returns:
  the trail surrogate if getCharCount(char32) is 2, otherwise the character itself.
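
The return values documented for getCharCount, getLeadSurrogate and getTrailSurrogate follow from the standard UTF-16 encoding arithmetic. The sketch below shows one way to compute them; it is an illustration under that assumption, not the implementation of the UTF16 class, and SurrogateMathSketch is a hypothetical name.

```java
// Sketch only: standard UTF-16 arithmetic matching the documented return values.
class SurrogateMathSketch {

    // 2 for supplementary code points (U+10000 and above), otherwise 1.
    static int getCharCount(int char32) {
        return char32 < 0x10000 ? 1 : 2;
    }

    // Lead surrogate for supplementary code points, 0 otherwise.
    static char getLeadSurrogate(int char32) {
        if (char32 >= 0x10000) {
            // 0xD7C0 is 0xD800 - (0x10000 >> 10): subtracts the 0x10000 bias.
            return (char) (0xD7C0 + (char32 >> 10));
        }
        return 0;
    }

    // Trail surrogate for supplementary code points, the char itself otherwise.
    static char getTrailSurrogate(int char32) {
        if (char32 >= 0x10000) {
            // The low 10 bits select the trail surrogate.
            return (char) (0xDC00 + (char32 & 0x3FF));
        }
        return (char) char32;
    }

    public static void main(String[] args) {
        int cp = 0x1D50B;
        System.out.println(Integer.toHexString(getLeadSurrogate(cp)));
        System.out.println(Integer.toHexString(getTrailSurrogate(cp)));
    }
}
```

For valid supplementary code points the results agree with Character.highSurrogate and Character.lowSurrogate, which are the public counterparts in java.lang.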



isLeadSurrogate
public static boolean isLeadSurrogate(char char16)
Determines whether the character is a lead surrogate.
Parameters:
  char16 - the input character.
Returns:
  true iff the input character is a lead surrogate.



isSurrogate
public static boolean isSurrogate(char char16)
Determines whether the code value is a surrogate.
Parameters:
  char16 - the input character.
Returns:
  true iff the input character is a surrogate.



isTrailSurrogate
public static boolean isTrailSurrogate(char char16)
Determines whether the character is a trail surrogate.
Parameters:
  char16 - the input character.
Returns:
  true iff the input character is a trail surrogate.
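
The three surrogate tests amount to range checks on the code unit. A minimal sketch, assuming the Unicode surrogate ranges D800..DBFF (lead) and DC00..DFFF (trail), is shown below together with a check against the public java.lang.Character predicates. SurrogatePredicateSketch is a hypothetical name.

```java
// Sketch only: range checks assumed to match the documented predicates.
class SurrogatePredicateSketch {

    static boolean isLeadSurrogate(char char16) {
        return char16 >= 0xD800 && char16 <= 0xDBFF;
    }

    static boolean isTrailSurrogate(char char16) {
        return char16 >= 0xDC00 && char16 <= 0xDFFF;
    }

    static boolean isSurrogate(char char16) {
        return char16 >= 0xD800 && char16 <= 0xDFFF;
    }

    // Compares the range checks with java.lang.Character for every char value.
    static boolean agreesWithJavaLangCharacter() {
        for (int c = 0; c <= 0xFFFF; c++) {
            char ch = (char) c;
            if (isLeadSurrogate(ch) != Character.isHighSurrogate(ch)
                    || isTrailSurrogate(ch) != Character.isLowSurrogate(ch)
                    || isSurrogate(ch) != Character.isSurrogate(ch)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreesWithJavaLangCharacter());
    }
}
```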



moveCodePointOffset
public static int moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
Shifts offset16 by the argument number of codepoints within a subarray.
Parameters:
  source - char array
  start - start position of the subarray to operate on
  limit - end position of the subarray to operate on
  offset16 - UTF16 position to shift relative to start
  shift32 - number of codepoints to shift
Returns:
  new shifted offset16 relative to start
exception:
  IndexOutOfBoundsException - if the new offset16 is out of bounds with respect to the subarray or the subarray bounds are out of range.
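
The public method Character.offsetByCodePoints(char[], int, int, int, int) performs a comparable shift, but takes a count instead of a limit and works with absolute indices. The sketch below adapts it to the parameter convention described above. It assumes that limit is an exclusive end position, which this page does not state, and its behaviour may differ from UTF16.moveCodePointOffset when offset16 points into the middle of a surrogate pair. MoveOffsetSketch is a hypothetical name.

```java
// Sketch only: adapts Character.offsetByCodePoints to the start/limit convention.
class MoveOffsetSketch {

    static int moveCodePointOffset(char[] source, int start, int limit,
                                   int offset16, int shift32) {
        // Assumption: limit is exclusive, so the subarray holds limit - start chars.
        int absolute = Character.offsetByCodePoints(
                source, start, limit - start, start + offset16, shift32);
        // Convert the absolute index back to an offset relative to start.
        return absolute - start;
    }

    public static void main(String[] args) {
        // Subarray [1, 5) holds 'a', U+1D50B (two chars), 'b'.
        char[] source = "xa\uD835\uDD0Bby".toCharArray();
        System.out.println(moveCodePointOffset(source, 1, 5, 0, 2));
        System.out.println(moveCodePointOffset(source, 1, 5, 3, -1));
    }
}
```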



valueOf
public static String valueOf(int char32)
Convenience method corresponding to String.valueOf(char). Returns a one or two char string containing the UTF-32 value in UTF16 format. If a validity check is required, use isLegal() on char32 before calling.
Parameters:
  char32 - the input character.
Returns:
  string value of char32 in UTF16 format
exception:
  IllegalArgumentException - thrown if char32 is an invalid codepoint.
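
A comparable conversion can be written with the public Character.toChars method, which also rejects invalid code points with an IllegalArgumentException. The following is a minimal sketch under that assumption; ValueOfSketch is a hypothetical name.

```java
// Sketch only: Character.toChars stands in for UTF16.valueOf.
class ValueOfSketch {

    // Returns a one or two char string holding char32 in UTF-16 form.
    static String valueOf(int char32) {
        // toChars throws IllegalArgumentException for invalid code points.
        return new String(Character.toChars(char32));
    }

    public static void main(String[] args) {
        System.out.println(valueOf('a').length());
        System.out.println(valueOf(0x1D50B).length());
    }
}
```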



Methods inherited from java.lang.Object
protected native Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
public final native Class getClass()
public native int hashCode()
public final native void notify()
public final native void notifyAll()
public String toString()
public final native void wait(long timeout) throws InterruptedException
public final void wait(long timeout, int nanos) throws InterruptedException
public final void wait() throws InterruptedException
