Java Doc for UTF16.java in » Internationalization-Localization » icu4j » com » ibm » icu » text

java.lang.Object

com.ibm.icu.text.UTF16

UTF16

final public class UTF16

Standalone utility class providing UTF16 character conversions and indexing conversions.

Code that uses strings alone rarely needs modification. By design, UTF-16 does not allow overlap, so searching for strings is a safe operation. Similarly, concatenation is always safe. Substringing is safe if the start and end are both on UTF-32 boundaries. In normal code, the values for start and end are on those boundaries, since they arose from operations like searching. If not, the nearest UTF-32 boundaries can be determined using bounds().

Examples:

The following examples illustrate use of some of these methods.

 // iteration forwards: Original
 for (int i = 0; i < s.length(); ++i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }
 // iteration forwards: Changes for UTF-32
 int ch;
 for (int i = 0; i < s.length(); i += UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s, i);
     doSomethingWith(ch);
 }
 // iteration backwards: Original
 for (int i = s.length() - 1; i >= 0; --i) {
     char ch = s.charAt(i);
     doSomethingWith(ch);
 }
 // iteration backwards: Changes for UTF-32
 // (the condition is i >= 0, not i > 0, so the character at offset 0 is also visited)
 int ch;
 for (int i = s.length() - 1; i >= 0; i -= UTF16.getCharCount(ch)) {
     ch = UTF16.charAt(s, i);
     doSomethingWith(ch);
 }
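
For comparison, the same two loops can be sketched with JDK methods only. This is an assumption made so the example runs without ICU4J: String.codePointAt, String.codePointBefore and Character.charCount stand in for UTF16.charAt and UTF16.getCharCount, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; uses JDK stand-ins rather than com.ibm.icu.text.UTF16.
final class CodePointIterationSketch {

    // Forward: read the code point starting at i, then advance by its length in chars.
    static List<Integer> forward(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(cp);
            i += Character.charCount(cp);
        }
        return out;
    }

    // Backward: read the code point ending just before i, then step back by its length.
    static List<Integer> backward(String s) {
        List<Integer> out = new ArrayList<>();
        for (int i = s.length(); i > 0; ) {
            int cp = s.codePointBefore(i);
            out.add(cp);
            i -= Character.charCount(cp);
        }
        return out;
    }
}
```

An unmatched surrogate is returned as a single value by both loops, which matches the behavior described in the notes for this class.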

Notes:

Naming: For clarity, High and Low surrogates are called Lead and Trail in the API, which gives a better sense of their ordering in a string. offset16 and offset32 are used to distinguish offsets to UTF-16 boundaries vs offsets to UTF-32 boundaries. int char32 is used to contain UTF-32 characters, as opposed to char16, which is a UTF-16 code unit.
Roundtripping Offsets: You can always roundtrip from a UTF-32 offset to a UTF-16 offset and back. Because of the difference in structure, you can roundtrip from a UTF-16 offset to a UTF-32 offset and back if and only if bounds(string, offset16) != TRAIL_SURROGATE_BOUNDARY.
Exceptions: The error checking will throw an exception if indices are out of bounds. Other than that, all methods will behave reasonably, even if unmatched surrogates or out-of-bounds UTF-32 values are present. UCharacter.isLegal() can be used to check for validity if desired.
Unmatched Surrogates: If the string contains unmatched surrogates, then these are counted as one UTF-32 value. This matches their iteration behavior, which is vital. It also matches common display practice as missing glyphs (see the Unicode Standard Section 5.4, 5.5).
Optimization: The method implementations may need optimization if the compiler doesn't fold static final methods. Since surrogate pairs will form an exceedingly small percentage of all the text in the world, the singleton case should always be optimized for.
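
The roundtripping and unmatched-surrogate notes above can be illustrated with a small sketch. It assumes JDK methods only (String.codePointCount and String.offsetByCodePoints) in place of the UTF16 offset conversions, and the class name is hypothetical. The value the JDK produces for an offset inside a pair may differ from what ICU4J returns; the sketch only shows that such an offset does not survive the round trip.

```java
// Hypothetical helper; JDK stand-ins, not the ICU4J API.
final class OffsetRoundTripSketch {

    // UTF-16 offset -> code point offset -> UTF-16 offset.
    static int roundTrip(String s, int offset16) {
        int offset32 = s.codePointCount(0, offset16);
        return s.offsetByCodePoints(0, offset32);
    }

    // True when offset16 points at the trail half of a well-formed surrogate pair.
    static boolean isInsidePair(String s, int offset16) {
        return offset16 > 0 && offset16 < s.length()
                && Character.isHighSurrogate(s.charAt(offset16 - 1))
                && Character.isLowSurrogate(s.charAt(offset16));
    }

    // Unmatched surrogates are each counted as one value.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

For the string "a\uD800\uDC00b", offsets 0, 1, 3 and 4 round-trip unchanged, while offset 2 (the trail surrogate) does not.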

author:
Mark Davis, with help from Markus Scherer

Inner Class: final public static class StringComparator implements java.util.Comparator

Field Summary
final public static int	CODEPOINT_MAX_VALUE The highest Unicode code point value (scalar value) according to the Unicode Standard.
final public static int	CODEPOINT_MIN_VALUE The lowest Unicode code point value.
final public static int	LEAD_SURROGATE_MAX_VALUE
final public static int	LEAD_SURROGATE_MIN_VALUE
final public static int	SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY, TRAIL_SURROGATE_BOUNDARY Value returned in `bounds()`.
final public static int	SUPPLEMENTARY_MIN_VALUE
final public static int	SURROGATE_MAX_VALUE
final public static int	SURROGATE_MIN_VALUE
final public static int	TRAIL_SURROGATE_MAX_VALUE
final public static int	TRAIL_SURROGATE_MIN_VALUE

Method Summary
public static StringBuffer	append(StringBuffer target, int char32) Append a single UTF-32 value to the end of a StringBuffer. If a validity check is required, use isLegal() on char32 before calling. Parameters: target - the buffer to append to Parameters: char32 - value to append.
public static int	append(char[] target, int limit, int char32) Adds a codepoint to offset16 position of the argument char array.
public static StringBuffer	appendCodePoint(StringBuffer target, int cp) Cover JDK 1.5 APIs.
public static int	bounds(String source, int offset16) Returns the type of the boundaries around the char at offset16.
public static int	bounds(StringBuffer source, int offset16) Returns the type of the boundaries around the char at offset16.
public static int	bounds(char[] source, int start, int limit, int offset16) Returns the type of the boundaries around the char at offset16.
public static int	charAt(String source, int offset16) Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access.
public static int	charAt(CharSequence source, int offset16) Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access.
public static int	charAt(StringBuffer source, int offset16) Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access.
public static int	charAt(char[] source, int start, int limit, int offset16) Extract a single UTF-32 value from a substring. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access.
public static int	charAt(Replaceable source, int offset16) Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access.
public static int	countCodePoint(String source)
public static int	countCodePoint(StringBuffer source)
public static int	countCodePoint(char[] source, int start, int limit)
public static StringBuffer	delete(StringBuffer target, int offset16) Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).
public static int	delete(char[] target, int limit, int offset16) Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).
public static int	findCodePointOffset(String source, int offset16) Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or after the given UTF-16 offset.
public static int	findCodePointOffset(StringBuffer source, int offset16) Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset.
public static int	findCodePointOffset(char[] source, int start, int limit, int offset16) Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset.
public static int	findOffsetFromCodePoint(String source, int offset32) Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access.
public static int	findOffsetFromCodePoint(StringBuffer source, int offset32) Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access.
public static int	findOffsetFromCodePoint(char[] source, int start, int limit, int offset32) Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access.
public static int	getCharCount(int char32) Determines how many chars this char32 requires. If a validity check is required, use `isLegal()` on char32 before calling. Parameters: char32 - the input codepoint.
public static char	getLeadSurrogate(int char32) Returns the lead surrogate. If a validity check is required, use `isLegal()` on char32 before calling. Parameters: char32 - the input character.
public static char	getTrailSurrogate(int char32) Returns the trail surrogate. If a validity check is required, use `isLegal()` on char32 before calling. Parameters: char32 - the input character.
public static boolean	hasMoreCodePointsThan(String source, int number) Check if the string contains more Unicode code points than a certain number.
public static boolean	hasMoreCodePointsThan(char[] source, int start, int limit, int number) Check if the sub-range of char array, from argument start to limit, contains more Unicode code points than a certain number.
public static boolean	hasMoreCodePointsThan(StringBuffer source, int number) Check if the string buffer contains more Unicode code points than a certain number.
public static int	indexOf(String source, int char32) Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint.
public static int	indexOf(String source, String str) Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str.
public static int	indexOf(String source, int char32, int fromIndex) Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint.
public static int	indexOf(String source, String str, int fromIndex) Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str.
public static StringBuffer	insert(StringBuffer target, int offset16, int char32) Inserts char32 codepoint into target at the argument offset16.
public static int	insert(char[] target, int limit, int offset16, int char32) Inserts char32 codepoint into target at the argument offset16.
public static boolean	isLeadSurrogate(char char16) Determines whether the character is a lead surrogate. Parameters: char16 - the input character.
public static boolean	isSurrogate(char char16) Determines whether the code value is a surrogate. Parameters: char16 - the input character.
public static boolean	isTrailSurrogate(char char16) Determines whether the character is a trail surrogate. Parameters: char16 - the input character.
public static int	lastIndexOf(String source, int char32) Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint.
public static int	lastIndexOf(String source, String str) Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str.
public static int	lastIndexOf(String source, int char32, int fromIndex) Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint, where the result is less than or equal to fromIndex. This method is implemented based on codepoints, hence a single surrogate character will not match a supplementary character. source is searched backwards starting at the specified index. Examples: UTF16.lastIndexOf("abc", 'c', 2) returns 2; UTF16.lastIndexOf("abc", 'c', 1) returns -1; UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 5) returns 3; UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 3) returns 3; UTF16.lastIndexOf("abc\ud800\udc00", 0xd800) returns -1. Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest. Parameters: source - UTF16 format Unicode string that will be searched Parameters: char32 - codepoint to search for Parameters: fromIndex - the index to start the search from.
public static int	lastIndexOf(String source, String str, int fromIndex) Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str, where the result is less than or equal to fromIndex. This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character will not be a valid match.
public static int	moveCodePointOffset(String source, int offset16, int shift32)
public static int	moveCodePointOffset(StringBuffer source, int offset16, int shift32)
public static int	moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32) Shifts offset16 by the argument number of codepoints within a subarray.
public static String	newString(int[] codePoints, int offset, int count) Cover JDK 1.5 API.
public static String	replace(String source, int oldChar32, int newChar32) Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldChar32 in source with newChar32. If the character oldChar32 does not occur in the UTF16 format Unicode string source, then source will be returned.
public static String	replace(String source, String oldStr, String newStr) Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldStr in source with newStr. If the string oldStr does not occur in the UTF16 format Unicode string source, then source will be returned.
public static StringBuffer	reverse(StringBuffer source) Reverses a UTF16 format Unicode string and replaces source's content with it.
public static void	setCharAt(StringBuffer target, int offset16, int char32) Set a code point into a UTF16 position.
public static int	setCharAt(char[] target, int limit, int offset16, int char32) Set a code point into a UTF16 position in a char array. Adjusts target accordingly if we are replacing a non-supplementary codepoint with a supplementary and vice versa. Parameters: target - char array Parameters: limit - number of valid chars in target, which may differ from target.length.
public static String	valueOf(int char32) Convenience method corresponding to String.valueOf(char).
public static String	valueOf(String source, int offset16) Convenience method corresponding to String.valueOf(codepoint at offset16). Returns a one or two char string containing the UTF-32 value in UTF16 format.
public static String	valueOf(StringBuffer source, int offset16) Convenience method corresponding to StringBuffer.valueOf(codepoint at offset16). Returns a one or two char string containing the UTF-32 value in UTF16 format.
public static String	valueOf(char[] source, int start, int limit, int offset16) Convenience method. Returns a one or two char string containing the UTF-32 value in UTF16 format.

Field Detail

CODEPOINT_MAX_VALUE
final public static int CODEPOINT_MAX_VALUE
	The highest Unicode code point value (scalar value) according to the Unicode Standard.

CODEPOINT_MIN_VALUE
final public static int CODEPOINT_MIN_VALUE
	The lowest Unicode code point value.

LEAD_SURROGATE_MAX_VALUE
final public static int LEAD_SURROGATE_MAX_VALUE
	Lead surrogate maximum value

LEAD_SURROGATE_MIN_VALUE
final public static int LEAD_SURROGATE_MIN_VALUE
	Lead surrogate minimum value

SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY, TRAIL_SURROGATE_BOUNDARY
final public static int SINGLE_CHAR_BOUNDARY, LEAD_SURROGATE_BOUNDARY, TRAIL_SURROGATE_BOUNDARY
	Value returned in `bounds()`. These values are chosen specifically so that each one represents the position of the character: [offset16 - (value >> 2), offset16 + (value & 3)]

SUPPLEMENTARY_MIN_VALUE
final public static int SUPPLEMENTARY_MIN_VALUE
	The minimum value for Supplementary code points

SURROGATE_MAX_VALUE
final public static int SURROGATE_MAX_VALUE
	Maximum surrogate value

SURROGATE_MIN_VALUE
final public static int SURROGATE_MIN_VALUE
	Surrogate minimum value

TRAIL_SURROGATE_MAX_VALUE
final public static int TRAIL_SURROGATE_MAX_VALUE
	Trail surrogate maximum value

TRAIL_SURROGATE_MIN_VALUE
final public static int TRAIL_SURROGATE_MIN_VALUE
	Trail surrogate minimum value

Method Detail

append
public static StringBuffer append(StringBuffer target, int char32)
Append a single UTF-32 value to the end of a StringBuffer. If a validity check is required, use isLegal() on char32 before calling.
Parameters:
  target - the buffer to append to
Parameters:
  char32 - value to append.
Returns:
  the updated StringBuffer
exception:
  IllegalArgumentException - thrown when char32 does not lie within the range of the Unicode codepoints

append
public static int append(char[] target, int limit, int char32)
Adds a codepoint to offset16 position of the argument char array.
Parameters:
  target - char array to be appended with the new code point
Parameters:
  limit - UTF16 offset at which the codepoint will be appended.
Parameters:
  char32 - code point to be appended
Returns:
  offset after char32 in the array.
exception:
  IllegalArgumentException - thrown if there is not enough space for the append, or when char32 does not lie within the range of the Unicode codepoints.
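
The contract of the char array append described above (write at limit, return the new limit, reject bad input with IllegalArgumentException) could look roughly like the following. This is a sketch under assumptions, not the ICU4J source: the class name is hypothetical and the JDK method Character.toChars(int, char[], int) does the actual encoding.

```java
// Hypothetical sketch of an append-to-char-array helper; not the ICU4J implementation.
final class AppendSketch {

    static int append(char[] target, int limit, int char32) {
        if (char32 < 0 || char32 > 0x10FFFF) {
            throw new IllegalArgumentException("Illegal codepoint");
        }
        // Supplementary code points need two chars, everything else one.
        int needed = char32 >= 0x10000 ? 2 : 1;
        if (limit < 0 || limit + needed > target.length) {
            throw new IllegalArgumentException("Not enough space");
        }
        // Character.toChars writes 1 or 2 chars at index limit and returns how many it wrote.
        return limit + Character.toChars(char32, target, limit);
    }
}
```

The returned value can be passed back as limit for the next call, so a buffer is filled by chaining calls.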

appendCodePoint
public static StringBuffer appendCodePoint(StringBuffer target, int cp)
Cover JDK 1.5 APIs. Append the code point to the buffer and return the buffer as a convenience.
Parameters:
  target - the buffer to append to
Parameters:
  cp - the code point to append
Returns:
  the updated StringBuffer
throws:
  IllegalArgumentException - if cp is not a valid code point

bounds
public static int bounds(String source, int offset16)
Returns the type of the boundaries around the char at offset16. Used for random access.
Parameters:
  source - text to analyse
Parameters:
  offset16 - UTF-16 offset
Returns:
  SINGLE_CHAR_BOUNDARY : a single char; the bounds are [offset16, offset16 + 1]
  LEAD_SURROGATE_BOUNDARY : a surrogate pair starting at offset16; the bounds are [offset16, offset16 + 2]
  TRAIL_SURROGATE_BOUNDARY : a surrogate pair starting at offset16 - 1; the bounds are [offset16 - 1, offset16 + 1]
  For bit-twiddlers, the return values for these are chosen so that the boundaries can be gotten by: [offset16 - (value >> 2), offset16 + (value & 3)].
exception:
  IndexOutOfBoundsException - if offset16 is out of bounds.

bounds
public static int bounds(StringBuffer source, int offset16)
Returns the type of the boundaries around the char at offset16. Used for random access.
Parameters:
  source - string buffer to analyse
Parameters:
  offset16 - UTF16 offset
Returns:
  SINGLE_CHAR_BOUNDARY : a single char; the bounds are [offset16, offset16 + 1]
  LEAD_SURROGATE_BOUNDARY : a surrogate pair starting at offset16; the bounds are [offset16, offset16 + 2]
  TRAIL_SURROGATE_BOUNDARY : a surrogate pair starting at offset16 - 1; the bounds are [offset16 - 1, offset16 + 1]
  For bit-twiddlers, the return values for these are chosen so that the boundaries can be gotten by: [offset16 - (value >> 2), offset16 + (value & 3)].
exception:
  IndexOutOfBoundsException - if offset16 is out of bounds.

bounds
public static int bounds(char[] source, int start, int limit, int offset16)
Returns the type of the boundaries around the char at offset16. Used for random access. Note that the boundaries are determined with respect to the subarray, hence the char array {0xD800, 0xDC00} has the result SINGLE_CHAR_BOUNDARY for start = offset16 = 0 and limit = 1.
Parameters:
  source - char array to analyse
Parameters:
  start - offset to substring in the source array for analyzing
Parameters:
  limit - offset to substring in the source array for analyzing
Parameters:
  offset16 - UTF16 offset relative to start
Returns:
  SINGLE_CHAR_BOUNDARY : a single char; the bounds are [offset16, offset16 + 1]
  LEAD_SURROGATE_BOUNDARY : a surrogate pair starting at offset16; the bounds are [offset16, offset16 + 2]
  TRAIL_SURROGATE_BOUNDARY : a surrogate pair starting at offset16 - 1; the bounds are [offset16 - 1, offset16 + 1]
  For bit-twiddlers, the boundary values for these are chosen so that the boundaries can be gotten by: [offset16 - (boundvalue >> 2), offset16 + (boundvalue & 3)].
exception:
  IndexOutOfBoundsException - if offset16 is not within the range of start and limit.
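
To make the bit-twiddling formula concrete, here is a minimal sketch of a bounds check over a String. The three constant values (1, 2 and 5) are assumptions picked because they satisfy [offset16 - (value >> 2), offset16 + (value & 3)] for the three cases listed; the class and the range helper are hypothetical, and JDK surrogate tests replace the UTF16 ones.

```java
// Hypothetical sketch; constant values are assumptions that satisfy the documented formula.
final class BoundsSketch {
    static final int SINGLE_CHAR_BOUNDARY = 1;     // 1 >> 2 == 0, 1 & 3 == 1
    static final int LEAD_SURROGATE_BOUNDARY = 2;  // 2 >> 2 == 0, 2 & 3 == 2
    static final int TRAIL_SURROGATE_BOUNDARY = 5; // 5 >> 2 == 1, 5 & 3 == 1

    static int bounds(String source, int offset16) {
        char ch = source.charAt(offset16); // throws IndexOutOfBoundsException when out of range
        if (Character.isHighSurrogate(ch)) {
            if (offset16 + 1 < source.length()
                    && Character.isLowSurrogate(source.charAt(offset16 + 1))) {
                return LEAD_SURROGATE_BOUNDARY;
            }
        } else if (Character.isLowSurrogate(ch)) {
            if (offset16 > 0 && Character.isHighSurrogate(source.charAt(offset16 - 1))) {
                return TRAIL_SURROGATE_BOUNDARY;
            }
        }
        return SINGLE_CHAR_BOUNDARY; // BMP char or unmatched surrogate
    }

    // Returns {start, end} of the code point that contains offset16.
    static int[] range(String source, int offset16) {
        int value = bounds(source, offset16);
        return new int[] { offset16 - (value >> 2), offset16 + (value & 3) };
    }
}
```

With these values, substring(range[0], range[1]) always yields a whole code point, which is the "nearest UTF-32 boundaries" use mentioned in the class description.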

charAt
public static int charAt(String source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access. If a validity check is required, use `UCharacter.isLegal()` on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
Parameters:
  source - array of UTF-16 chars
Parameters:
  offset16 - UTF-16 offset to the start of the character.
Returns:
  the UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in `bounds()`.
exception:
  IndexOutOfBoundsException - thrown if offset16 is out of bounds.

charAt
public static int charAt(CharSequence source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access. If a validity check is required, use `UCharacter.isLegal()` on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
Parameters:
  source - array of UTF-16 chars
Parameters:
  offset16 - UTF-16 offset to the start of the character.
Returns:
  the UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in `bounds()`.
exception:
  IndexOutOfBoundsException - thrown if offset16 is out of bounds.

charAt
public static int charAt(StringBuffer source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access. If a validity check is required, use `UCharacter.isLegal()` on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
Parameters:
  source - UTF-16 chars string buffer
Parameters:
  offset16 - UTF-16 offset to the start of the character.
Returns:
  the UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in `bounds()`.
exception:
  IndexOutOfBoundsException - thrown if offset16 is out of bounds.

charAt
public static int charAt(char[] source, int start, int limit, int offset16)
Extract a single UTF-32 value from a substring. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access. If a validity check is required, use `UCharacter.isLegal()` on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
Parameters:
  source - array of UTF-16 chars
Parameters:
  start - offset to substring in the source array for analyzing
Parameters:
  limit - offset to substring in the source array for analyzing
Parameters:
  offset16 - UTF-16 offset relative to start
Returns:
  the UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in `bounds()`.
exception:
  IndexOutOfBoundsException - thrown if offset16 is not within the range of start and limit.

charAt
public static int charAt(Replaceable source, int offset16)
Extract a single UTF-32 value from a string. Used when iterating forwards or backwards (with `UTF16.getCharCount()`), as well as random access. If a validity check is required, use `UCharacter.isLegal()` on the return value. If the char retrieved is part of a surrogate pair, its supplementary character will be returned. If a complete supplementary character is not found, the incomplete character will be returned.
Parameters:
  source - UTF-16 chars string buffer
Parameters:
  offset16 - UTF-16 offset to the start of the character.
Returns:
  the UTF-32 value that contains the char at offset16. The boundaries of that codepoint are the same as in `bounds()`.
exception:
  IndexOutOfBoundsException - thrown if offset16 is out of bounds.
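
The behavior shared by the charAt overloads (either half of a pair yields the whole supplementary character, an incomplete pair yields the lone surrogate) can be sketched as follows. The class name is hypothetical and the JDK surrogate helpers are used as stand-ins, so this is an illustration of the described contract rather than the ICU4J code.

```java
// Hypothetical sketch of the charAt contract; not the ICU4J implementation.
final class CharAtSketch {

    static int charAt(CharSequence source, int offset16) {
        char single = source.charAt(offset16); // throws IndexOutOfBoundsException when out of range
        if (Character.isHighSurrogate(single)) {
            // Lead surrogate: look ahead for its trail.
            if (offset16 + 1 < source.length()) {
                char trail = source.charAt(offset16 + 1);
                if (Character.isLowSurrogate(trail)) {
                    return Character.toCodePoint(single, trail);
                }
            }
        } else if (Character.isLowSurrogate(single)) {
            // Trail surrogate: look back for its lead.
            if (offset16 > 0) {
                char lead = source.charAt(offset16 - 1);
                if (Character.isHighSurrogate(lead)) {
                    return Character.toCodePoint(lead, single);
                }
            }
        }
        return single; // BMP char or unmatched surrogate
    }
}
```

Unlike String.codePointAt, this sketch also returns the full code point when offset16 points at the trail surrogate, which is what makes the backward iteration loop in the class description work.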

countCodePoint
public static int countCodePoint(String source)
Number of codepoints in a UTF16 String
Parameters:
  source - UTF16 string
Returns:
  number of codepoints in string

countCodePoint
public static int countCodePoint(StringBuffer source)
Number of codepoints in a UTF16 String buffer
Parameters:
  source - UTF16 string buffer
Returns:
  number of codepoints in string

countCodePoint
public static int countCodePoint(char[] source, int start, int limit)
Number of codepoints in a UTF16 char array substring
Parameters:
  source - UTF16 char array
Parameters:
  start - offset of the substring
Parameters:
  limit - offset of the substring
Returns:
  number of codepoints in the substring
exception:
  IndexOutOfBoundsException - if start and limit are not valid.

delete
public static StringBuffer delete(StringBuffer target, int offset16)
Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).
Parameters:
  target - string buffer to remove codepoint from
Parameters:
  offset16 - offset at which the codepoint will be removed
Returns:
  a reference to target
exception:
  IndexOutOfBoundsException - thrown if offset16 is invalid.

delete
public static int delete(char[] target, int limit, int offset16)
Removes the codepoint at the specified position in this target (shortening target by 1 character if the codepoint is a non-supplementary, 2 otherwise).
Parameters:
  target - char array to remove codepoint from
Parameters:
  limit - end index of the char array, limit <= target.length
Parameters:
  offset16 - offset at which the codepoint will be removed
Returns:
  a new limit size
exception:
  IndexOutOfBoundsException - thrown if offset16 is invalid.
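
A minimal sketch of the StringBuffer variant follows, assuming that an offset on either half of a surrogate pair removes the whole pair. The class name is hypothetical and the surrogate tests come from java.lang.Character instead of UTF16.

```java
// Hypothetical sketch of code-point-aware deletion; not the ICU4J implementation.
final class DeleteSketch {

    static StringBuffer delete(StringBuffer target, int offset16) {
        int start = offset16;
        int count = 1;
        char ch = target.charAt(offset16); // throws IndexOutOfBoundsException when invalid
        if (Character.isHighSurrogate(ch)
                && offset16 + 1 < target.length()
                && Character.isLowSurrogate(target.charAt(offset16 + 1))) {
            count = 2;              // offset16 is the lead of a pair
        } else if (Character.isLowSurrogate(ch)
                && offset16 > 0
                && Character.isHighSurrogate(target.charAt(offset16 - 1))) {
            start = offset16 - 1;   // offset16 is the trail of a pair
            count = 2;
        }
        return target.delete(start, start + count);
    }
}
```

Deleting with plain StringBuffer.deleteCharAt at the same offsets would leave an unmatched surrogate behind, which is the case this method exists to avoid.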

findCodePointOffset

public static int findCodePointOffset(String source, int offset16)

Returns the UTF-32 offset corresponding to the first UTF-32 boundary at or after the given UTF-16 offset. Used for random access. See the class description for notes on roundtripping.
Note: If the UTF-16 offset is into the middle of a surrogate pair, then the UTF-32 offset of the lead of the pair is returned.

To find the UTF-32 length of a string, use:

 len32 = countCodePoint(source);

Parameters:
  source - text to analyse
Parameters:
  offset16 - UTF-16 offset < source text length.
Returns:
  UTF-32 offset
exception:
  IndexOutOfBoundsException - if offset16 is out of bounds.
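
One way to obtain the behavior described in the note (an offset in the middle of a surrogate pair maps to the UTF-32 offset of that pair) is sketched below. It is an illustration under assumptions, not the ICU4J source: the class name is hypothetical, JDK surrogate tests are used, and the sketch additionally accepts offset16 == source.length().

```java
// Hypothetical sketch of a UTF-16 offset to code point offset conversion.
final class FindCodePointOffsetSketch {

    static int findCodePointOffset(String source, int offset16) {
        if (offset16 < 0 || offset16 > source.length()) {
            throw new StringIndexOutOfBoundsException(offset16);
        }
        int result = 0;
        boolean hadLead = false;
        for (int i = 0; i < offset16; ++i) {
            char ch = source.charAt(i);
            if (hadLead && Character.isLowSurrogate(ch)) {
                hadLead = false;   // trail that completes a pair adds nothing
            } else {
                hadLead = Character.isHighSurrogate(ch);
                ++result;          // BMP char, lead, or unmatched surrogate
            }
        }
        // offset16 points at the trail of a pair: report the offset of the pair itself.
        if (hadLead && offset16 < source.length()
                && Character.isLowSurrogate(source.charAt(offset16))) {
            --result;
        }
        return result;
    }
}
```

For "a\uD800\uDC00b" the UTF-16 offsets 0, 1, 2, 3 map to the UTF-32 offsets 0, 1, 1, 2 in this sketch.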

findCodePointOffset

public static int findCodePointOffset(StringBuffer source, int offset16)

Returns the UTF-32 offset corresponding to the first UTF-32 boundary at the given UTF-16 offset. Used for random access. See the class description for notes on roundtripping.
Note: If the UTF-16 offset is into the middle of a surrogate pair, then the UTF-32 offset of the lead of the pair is returned.

To find the UTF-32 length of a string, use:

 len32 = countCodePoint(source);

findCodePointOffset

public static int findCodePointOffset(char[] source, int start, int limit, int offset16)

To find the UTF-32 length of a substring, use:

 len32 = countCodePoint(source, start, limit);

Parameters:
  source - text to analyse
Parameters:
  start - offset of the substring
Parameters:
  limit - offset of the substring
Parameters:
  offset16 - UTF-16 relative to start
Returns:
  UTF-32 offset relative to start
exception:
  IndexOutOfBoundsException - if offset16 is not within the range of start and limit.

findOffsetFromCodePoint
public static int findOffsetFromCodePoint(String source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access. See the class description for notes on roundtripping.
Parameters:
  source - the UTF-16 string
Parameters:
  offset32 - UTF-32 offset
Returns:
  UTF-16 offset
exception:
  IndexOutOfBoundsException - if offset32 is out of bounds.

findOffsetFromCodePoint
public static int findOffsetFromCodePoint(StringBuffer source, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access. See the class description for notes on roundtripping.
Parameters:
  source - the UTF-16 string buffer
Parameters:
  offset32 - UTF-32 offset
Returns:
  UTF-16 offset
exception:
  IndexOutOfBoundsException - if offset32 is out of bounds.

findOffsetFromCodePoint
public static int findOffsetFromCodePoint(char[] source, int start, int limit, int offset32)
Returns the UTF-16 offset that corresponds to a UTF-32 offset. Used for random access. See the class description for notes on roundtripping.
Parameters:
  source - the UTF-16 char array whose substring is to be analysed
Parameters:
  start - offset of the substring to be analysed
Parameters:
  limit - offset of the substring to be analysed
Parameters:
  offset32 - UTF-32 offset relative to start
Returns:
  UTF-16 offset relative to start
exception:
  IndexOutOfBoundsException - if offset32 is out of bounds.
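
The reverse conversion can be sketched as a walk over code points from the start of the string. The class name is hypothetical and JDK methods (String.codePointAt, Character.charCount) replace the UTF16 ones, so treat this as an illustration of the idea rather than the library's code.

```java
// Hypothetical sketch of a code point offset to UTF-16 offset conversion.
final class FindOffsetFromCodePointSketch {

    static int findOffsetFromCodePoint(String source, int offset32) {
        if (offset32 < 0) {
            throw new StringIndexOutOfBoundsException(offset32);
        }
        int offset16 = 0;
        int remaining = offset32;
        while (remaining > 0) {
            if (offset16 >= source.length()) {
                throw new StringIndexOutOfBoundsException(offset32);
            }
            // Skip one code point: 2 chars for a surrogate pair, 1 otherwise.
            offset16 += Character.charCount(source.codePointAt(offset16));
            --remaining;
        }
        return offset16;
    }
}
```

The walk is linear in offset32, which is why the class description recommends iterating with getCharCount rather than converting offsets repeatedly. The JDK call source.offsetByCodePoints(0, offset32) gives the same result for in-range offsets.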

getCharCount
public static int getCharCount(int char32)
Determines how many chars this char32 requires. If a validity check is required, use `isLegal()` on char32 before calling.
Parameters:
  char32 - the input codepoint.
Returns:
  2 if is in supplementary space, otherwise 1.

getLeadSurrogate
public static char getLeadSurrogate(int char32)
Returns the lead surrogate. If a validity check is required, use `isLegal()` on char32 before calling.
Parameters:
  char32 - the input character.
Returns:
  lead surrogate if the getCharCount(ch) is 2; and 0 otherwise (note: 0 is not a valid lead surrogate).

getTrailSurrogate
public static char getTrailSurrogate(int char32)
Returns the trail surrogate. If a validity check is required, use `isLegal()` on char32 before calling.
Parameters:
  char32 - the input character.
Returns:
  the trail surrogate if the getCharCount(ch) is 2; otherwise the character itself
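
The arithmetic behind getCharCount, getLeadSurrogate and getTrailSurrogate follows from the UTF-16 encoding itself: subtract 0x10000, put the high 10 bits on 0xD800 and the low 10 bits on 0xDC00. The sketch below uses a hypothetical class name and applies the return conventions stated above (0 or the character itself for non-supplementary input).

```java
// Hypothetical sketch of the surrogate arithmetic defined by the UTF-16 encoding form.
final class SurrogateMathSketch {

    static int getCharCount(int char32) {
        return char32 < 0x10000 ? 1 : 2;
    }

    static char getLeadSurrogate(int char32) {
        if (char32 >= 0x10000) {
            return (char) (0xD800 + ((char32 - 0x10000) >> 10)); // high 10 bits
        }
        return 0; // not a valid lead surrogate
    }

    static char getTrailSurrogate(int char32) {
        if (char32 >= 0x10000) {
            return (char) (0xDC00 + ((char32 - 0x10000) & 0x3FF)); // low 10 bits
        }
        return (char) char32; // the character itself
    }
}
```

For U+1F600 this gives the pair 0xD83D, 0xDE00, the same values the JDK returns from Character.highSurrogate and Character.lowSurrogate.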

hasMoreCodePointsThan
public static boolean hasMoreCodePointsThan(String source, int number)
Check if the string contains more Unicode code points than a certain number. This is more efficient than counting all code points in the entire string and comparing that number with a threshold. This function may not need to scan the string at all if the length is within a certain range, and never needs to count more than 'number + 1' code points. Logically equivalent to (countCodePoint(s) > number). A Unicode code point may occupy either one or two code units.
Parameters:
  source - The input string.
Parameters:
  number - The number of code points in the string is compared against the 'number' parameter.
Returns:
  boolean value for whether the string contains more Unicode code points than 'number'.

hasMoreCodePointsThan
public static boolean hasMoreCodePointsThan(char[] source, int start, int limit, int number)
	Check if the sub-range of the char array, from argument start to limit, contains more Unicode code points than a certain number. This is more efficient than counting all code points in the entire char array range and comparing that number with a threshold. This function may not need to scan the char array at all if start and limit are within a certain range, and never needs to count more than 'number + 1' code points. Logically equivalent to (countCodePoint(source, start, limit) > number). A Unicode code point may occupy either one or two code units. Parameters: source - array of UTF-16 chars Parameters: start - offset to substring in the source array for analyzing Parameters: limit - offset to substring in the source array for analyzing Parameters: number - The number of code points in the sub-range is compared against the 'number' parameter. Returns: boolean value for whether the sub-range contains more Unicode code points than 'number'. exception: IndexOutOfBoundsException - thrown when limit < start

hasMoreCodePointsThan
public static boolean hasMoreCodePointsThan(StringBuffer source, int number)
	Check if the string buffer contains more Unicode code points than a certain number. This is more efficient than counting all code points in the entire string buffer and comparing that number with a threshold. This function may not need to scan the string buffer at all if the length is within a certain range, and never needs to count more than 'number + 1' code points. Logically equivalent to (countCodePoint(source) > number). A Unicode code point may occupy either one or two code units. Parameters: source - The input string buffer. Parameters: number - The number of code points in the string buffer is compared against the 'number' parameter. Returns: boolean value for whether the string buffer contains more Unicode code points than 'number'.
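
The early-exit idea described for the hasMoreCodePointsThan overloads can be sketched as follows. This is an assumption-laden illustration rather than the ICU4J implementation: the class name is hypothetical, it accepts any CharSequence so that String and StringBuffer are both covered, and the exact length thresholds are one plausible reading of "if the length is within a certain range".

```java
// Illustrative sketch only; not the ICU4J source.
final class CodePointThresholdSketch {
    static boolean hasMoreCodePointsThan(CharSequence source, int number) {
        if (number < 0) {
            return true; // every sequence has at least 0 code points
        }
        int length = source.length();
        // Even if every code point used two code units, there would be
        // at least ceil(length / 2) code points.
        if (((length + 1) >> 1) > number) {
            return true;
        }
        // Even if every code point used one code unit, there would be
        // at most 'length' code points.
        if (length <= number) {
            return false;
        }
        // Otherwise count, stopping as soon as the answer is known.
        int remaining = number;
        int i = 0;
        while (i < length) {
            if (remaining == 0) {
                return true; // 'number' counted already and one more starts here
            }
            char c = source.charAt(i++);
            if (Character.isHighSurrogate(c) && i < length
                    && Character.isLowSurrogate(source.charAt(i))) {
                i++; // skip the trail surrogate of a pair
            }
            remaining--;
        }
        return false;
    }
}
```

In this sketch the loop inspects at most number + 1 code points, in line with the bound stated in the descriptions above.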

indexOf

public static int indexOf(String source, int char32)

Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint. I.e., the smallest index i such that

UTF16.charAt(source, i) == char32

is true.

If no such character occurs in this string, then -1 is returned.

Examples:
UTF16.indexOf("abc", 'a') returns 0
UTF16.indexOf("abc\ud800\udc00", 0x10000) returns 3
UTF16.indexOf("abc\ud800\udc00", 0xd800) returns -1

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.
Parameters:
source - UTF16 format Unicode string that will be searched
Parameters:
char32 - codepoint to search for
Returns:
the index of the first occurrence of the codepoint in the argument Unicode string, or -1 if the codepoint does not occur.
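
A codepoint-based search of this kind can be sketched with the JDK's own String.codePointAt and Character.charCount. The class name below is hypothetical and the code is a sketch of the documented behaviour, not the ICU4J source.

```java
// Illustrative sketch only; walks the string one code point at a time.
final class CodePointIndexOfSketch {
    static int indexOf(String source, int char32) {
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);
            if (cp == char32) {
                return i; // index of the first code unit of the match
            }
            i += Character.charCount(cp); // step over one or two code units
        }
        return -1;
    }
}
```

Because a well-formed surrogate pair is read as a single code point, searching for 0xd800 in "abc\ud800\udc00" finds nothing in this sketch, as in the examples above.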

indexOf

public static int indexOf(String source, String str)

Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument string str. This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" pair is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. Vice versa for a lead surrogate that ends str. See the examples below.

If no such string str occurs in this source, then -1 is returned.

Examples:
UTF16.indexOf("abc", "ab") returns 0
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00") returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800") returns -1
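
One way to picture the boundary rule is to run an ordinary code-unit search and reject any match whose start or end falls between the two halves of a surrogate pair in source. The sketch below does exactly that; the class and helper names are hypothetical and this is not claimed to be how ICU4J implements the method.

```java
// Illustrative sketch only; rejects matches that split a surrogate pair.
final class StringIndexOfSketch {
    static int indexOf(String source, String str) {
        int from = 0;
        while (true) {
            int i = source.indexOf(str, from);
            if (i < 0) {
                return -1;
            }
            if (!splitsPair(source, i) && !splitsPair(source, i + str.length())) {
                return i;
            }
            from = i + 1; // try the next candidate position
        }
    }

    // True if 'index' lies between a lead surrogate and its trail surrogate.
    static boolean splitsPair(String s, int index) {
        return index > 0 && index < s.length()
                && Character.isHighSurrogate(s.charAt(index - 1))
                && Character.isLowSurrogate(s.charAt(index));
    }
}
```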

indexOf

public static int indexOf(String source, int char32, int fromIndex)

Returns the index within the argument UTF16 format Unicode string of the first occurrence of the argument codepoint. I.e., the smallest index i such that:
(UTF16.charAt(source, i) == char32 && i >= fromIndex) is true.

If no such character occurs in this string, then -1 is returned.

Examples:
UTF16.indexOf("abc", 'a', 1) returns -1
UTF16.indexOf("abc\ud800\udc00", 0x10000, 1) returns 3
UTF16.indexOf("abc\ud800\udc00", 0xd800, 1) returns -1

indexOf

public static int indexOf(String source, String str, int fromIndex)

If no such string str occurs in this source, then -1 is returned.

Examples:
UTF16.indexOf("abc", "ab", 0) returns 0
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00", 0) returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800\udc00", 2) returns 3
UTF16.indexOf("abc\ud800\udc00", "\ud800", 0) returns -1

insert

public static StringBuffer insert(StringBuffer target, int offset16, int char32)

Inserts char32 codepoint into target at the argument offset16. If the offset16 is in the middle of a supplementary codepoint, char32 will be inserted after the supplementary codepoint. The length of target increases by one if codepoint is non-supplementary, 2 otherwise.

The overall effect is exactly as if the argument were converted to a string by the method valueOf(char) and the characters in that string were then inserted into target at the position indicated by offset16.

The offset argument must be greater than or equal to 0, and less than or equal to the length of target.
Parameters:
  target - string buffer to insert to
Parameters:
  offset16 - offset at which char32 will be inserted
Parameters:
  char32 - codepoint to be inserted
Returns:
  a reference to target
exception:
  IndexOutOfBoundsException - thrown if offset16 is invalid.
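
The adjustment for an offset that falls inside a surrogate pair can be sketched with plain JDK calls. The class name is hypothetical, and the use of Character.toChars (which throws IllegalArgumentException for an invalid code point) is an assumption of this sketch rather than documented ICU4J behaviour.

```java
// Illustrative sketch only; not the ICU4J source.
final class BufferInsertSketch {
    static StringBuffer insert(StringBuffer target, int offset16, int char32) {
        if (offset16 < 0 || offset16 > target.length()) {
            throw new StringIndexOutOfBoundsException(offset16);
        }
        // If the offset points at the trail half of a pair, move past the pair.
        if (offset16 > 0 && offset16 < target.length()
                && Character.isHighSurrogate(target.charAt(offset16 - 1))
                && Character.isLowSurrogate(target.charAt(offset16))) {
            offset16++;
        }
        // One char for a BMP code point, two for a supplementary one.
        target.insert(offset16, Character.toChars(char32));
        return target;
    }
}
```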

insert

public static int insert(char[] target, int limit, int offset16, int char32)

Inserts char32 codepoint into target at the argument offset16. If the offset16 is in the middle of a supplementary codepoint, char32 will be inserted after the supplementary codepoint. Limit increases by one if codepoint is non-supplementary, 2 otherwise.

The offset argument must be greater than or equal to 0, and less than or equal to the limit.
Parameters:
  target - char array to insert to
Parameters:
  limit - end index of the char array, limit <= target.length
Parameters:
  offset16 - offset at which char32 will be inserted
Parameters:
  char32 - codepoint to be inserted
Returns:
  new limit size
exception:
  IndexOutOfBoundsException - thrown if offset16 is invalid.
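
For the char array variant, the same idea can be sketched with System.arraycopy. The class name is hypothetical, and the capacity check (throwing when the array cannot hold the extra chars) is an assumption of this sketch; the description above only states that an invalid offset16 raises IndexOutOfBoundsException.

```java
// Illustrative sketch only; not the ICU4J source.
final class ArrayInsertSketch {
    static int insert(char[] target, int limit, int offset16, int char32) {
        char[] chars = Character.toChars(char32);
        if (offset16 < 0 || offset16 > limit
                || limit + chars.length > target.length) {
            throw new ArrayIndexOutOfBoundsException(offset16);
        }
        // Move past the pair if the offset points at its trail half.
        if (offset16 > 0 && offset16 < limit
                && Character.isHighSurrogate(target[offset16 - 1])
                && Character.isLowSurrogate(target[offset16])) {
            offset16++;
        }
        // Shift the tail right, then copy the new chars into the gap.
        System.arraycopy(target, offset16, target, offset16 + chars.length,
                limit - offset16);
        System.arraycopy(chars, 0, target, offset16, chars.length);
        return limit + chars.length; // the new limit
    }
}
```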

isLeadSurrogate
public static boolean isLeadSurrogate(char char16)
	Determines whether the character is a lead surrogate. Parameters: char16 - the input character. Returns: true iff the input character is a lead surrogate.

isSurrogate
public static boolean isSurrogate(char char16)
	Determines whether the code value is a surrogate. Parameters: char16 - the input character. Returns: true iff the input character is a surrogate.

isTrailSurrogate
public static boolean isTrailSurrogate(char char16)
	Determines whether the character is a trail surrogate. Parameters: char16 - the input character. Returns: true iff the input character is a trail surrogate.

lastIndexOf

public static int lastIndexOf(String source, int char32)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint. I.e., the index returned is the largest value i such that: UTF16.charAt(source, i) == char32 is true.

Examples:
UTF16.lastIndexOf("abc", 'a') returns 0
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0xd800) returns -1

source is searched backwards starting at the last character.

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.
Parameters:
source - UTF16 format Unicode string that will be searched
Parameters:
char32 - codepoint to search for
Returns:
the index of the last occurrence of the codepoint in source, or -1 if the codepoint does not occur.
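
The backwards search can be sketched with the JDK's String.codePointBefore, which reads the code point that ends just before a given index. The class name is hypothetical and this is a sketch of the documented behaviour only.

```java
// Illustrative sketch only; scans from the end one code point at a time.
final class CodePointLastIndexOfSketch {
    static int lastIndexOf(String source, int char32) {
        int i = source.length();
        while (i > 0) {
            int cp = source.codePointBefore(i);
            i -= Character.charCount(cp); // now at the start of that code point
            if (cp == char32) {
                return i;
            }
        }
        return -1;
    }
}
```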

lastIndexOf

public static int lastIndexOf(String source, String str)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str. This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" pair is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. Vice versa for a lead surrogate that ends str. See the examples below.

Examples:
UTF16.lastIndexOf("abc", "a") returns 0
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00") returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800") returns -1

source is searched backwards starting at the last character.

lastIndexOf

public static int lastIndexOf(String source, int char32, int fromIndex)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument codepoint, where the result is less than or equals to fromIndex.

This method is implemented based on codepoints, hence a single surrogate character will not match a supplementary character.

source is searched backwards starting at the specified index.

Examples:
UTF16.lastIndexOf("abc", 'c', 2) returns 2
UTF16.lastIndexOf("abc", 'c', 1) returns -1
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 5) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0x10000, 3) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", 0xd800) returns -1

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.
Parameters:
  source - UTF16 format Unicode string that will be searched
Parameters:
  char32 - codepoint to search for
Parameters:
  fromIndex - the index to start the search from. There is no restriction on the value of fromIndex. If it is greater than or equal to the length of this string, it has the same effect as if it were equal to one less than the length of this string: this entire string may be searched. If it is negative, it has the same effect as if it were -1: -1 is returned.
Returns:
  the index of the last occurrence of the codepoint in source, or -1 if the codepoint does not occur.
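
The fromIndex rules (clamping values past the end, returning -1 for negative values) can be sketched as below. The class name is hypothetical; positions that sit on the trail half of a surrogate pair are skipped so that only whole code points are compared. This is a sketch consistent with the examples above, not the ICU4J source.

```java
// Illustrative sketch only; not the ICU4J source.
final class LastIndexOfFromSketch {
    static int lastIndexOf(String source, int char32, int fromIndex) {
        // Values past the end behave like length - 1; negative values
        // leave the loop body unexecuted, so -1 is returned.
        int i = Math.min(fromIndex, source.length() - 1);
        for (; i >= 0; i--) {
            boolean insidePair = i > 0
                    && Character.isLowSurrogate(source.charAt(i))
                    && Character.isHighSurrogate(source.charAt(i - 1));
            if (!insidePair && source.codePointAt(i) == char32) {
                return i;
            }
        }
        return -1;
    }
}
```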

lastIndexOf

public static int lastIndexOf(String source, String str, int fromIndex)

Returns the index within the argument UTF16 format Unicode string of the last occurrence of the argument string str, where the result is less than or equals to fromIndex.

This method is implemented based on codepoints, hence a "lead surrogate character + trail surrogate character" pair is treated as one entity. Hence if str starts with a trail surrogate character at index 0, an occurrence of str in source that is immediately preceded by a lead surrogate character is not a valid match. Vice versa for a lead surrogate that ends str.

See the examples below.

Examples:
UTF16.lastIndexOf("abc", "c", 2) returns 2
UTF16.lastIndexOf("abc", "c", 1) returns -1
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00", 5) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800\udc00", 3) returns 3
UTF16.lastIndexOf("abc\ud800\udc00", "\ud800", 4) returns -1

source is searched backwards starting at the last character.

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.
Parameters:
  source - UTF16 format Unicode string that will be searched
Parameters:
  str - UTF16 format Unicode string to search for
Parameters:
  fromIndex - the index to start the search from. There is no restriction on the value of fromIndex. If it is greater than or equal to the length of this string, it has the same effect as if it were equal to one less than the length of this string: this entire string may be searched. If it is negative, it has the same effect as if it were -1: -1 is returned.
Returns:
  the index of the last occurrence of str in source, or -1 if str does not occur.

moveCodePointOffset
public static int moveCodePointOffset(String source, int offset16, int shift32)
	Shifts offset16 by the argument number of codepoints. Parameters: source - string Parameters: offset16 - UTF16 position to shift Parameters: shift32 - number of codepoints to shift Returns: new shifted offset16 exception: IndexOutOfBoundsException - if the new offset16 is out of bounds.

moveCodePointOffset
public static int moveCodePointOffset(StringBuffer source, int offset16, int shift32)
	Shifts offset16 by the argument number of codepoints. Parameters: source - string buffer Parameters: offset16 - UTF16 position to shift Parameters: shift32 - number of codepoints to shift Returns: new shifted offset16 exception: IndexOutOfBoundsException - if the new offset16 is out of bounds.

moveCodePointOffset
public static int moveCodePointOffset(char[] source, int start, int limit, int offset16, int shift32)
	Shifts offset16 by the argument number of codepoints within a subarray. Parameters: source - char array Parameters: start - position of the subarray to be performed on Parameters: limit - position of the subarray to be performed on Parameters: offset16 - UTF16 position to shift relative to start Parameters: shift32 - number of codepoints to shift Returns: new shifted offset16 relative to start exception: IndexOutOfBoundsException - if the new offset16 is out of bounds with respect to the subarray or the subarray bounds are out of range.
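
Since JDK 1.5 the standard library offers a comparable operation, Character.offsetByCodePoints, which works on any CharSequence. The wrapper below is a sketch that uses the JDK call as a stand-in (the class name is hypothetical); ICU4J's handling of edge cases, such as an offset16 that starts inside a surrogate pair, may differ from the JDK's.

```java
// Illustrative sketch only; delegates to the JDK rather than ICU4J.
final class OffsetShiftSketch {
    static int moveCodePointOffset(CharSequence source, int offset16, int shift32) {
        // Positive shift32 moves forward, negative moves backward;
        // throws IndexOutOfBoundsException if the result would be out of bounds.
        return Character.offsetByCodePoints(source, offset16, shift32);
    }
}
```

With source "a\ud800\udc00b", shifting offset 0 forward by two code points gives 3, because the supplementary code point occupies two chars.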

newString
public static String newString(int[] codePoints, int offset, int count)
	Cover JDK 1.5 API. Create a String from an array of codePoints. Parameters: codePoints - the code array Parameters: offset - the start of the text in the code point array Parameters: count - the number of code points Returns: a String representing the count code points starting at offset throws: IllegalArgumentException - if an invalid code point is encountered throws: IndexOutOfBoundsException - if the offset or count are out of bounds.
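
The JDK 1.5 API that this method covers is the String(int[] codePoints, int offset, int count) constructor. A short sketch of that constructor in use follows; the wrapper class name is hypothetical.

```java
// Illustrative sketch only; shows the JDK constructor, not ICU4J code.
final class NewStringSketch {
    static String newString(int[] codePoints, int offset, int count) {
        // Throws IllegalArgumentException for an invalid code point and
        // IndexOutOfBoundsException for a bad offset or count.
        return new String(codePoints, offset, count);
    }
}
```

For instance, taking two code points starting at offset 1 from {0x61, 0x10000, 0x62} gives a three-char string: the surrogate pair for 0x10000 followed by 'b'.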

replace
public static String replace(String source, int oldChar32, int newChar32)

Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldChar32 in source with newChar32. If the character oldChar32 does not occur in the UTF16 format Unicode string source, then source will be returned. Otherwise, a new String object is created that represents a codepoint sequence identical to the codepoint sequence represented by source, except that every occurrence of oldChar32 is replaced by an occurrence of newChar32.

Examples:
UTF16.replace("mesquite in your cellar", 'e', 'o'); returns "mosquito in your collar"
UTF16.replace("JonL", 'q', 'x'); returns "JonL" (no change)
UTF16.replace("Supplementary character \ud800\udc00", 0x10000, '!'); returns "Supplementary character !"
UTF16.replace("Supplementary character \ud800\udc00", 0xd800, '!'); returns "Supplementary character \ud800\udc00"

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.
Parameters:
  source - UTF16 format Unicode string which the codepoint replacements will be based on.
Parameters:
  oldChar32 - non-zero old codepoint to be replaced.
Parameters:
  newChar32 - the new codepoint to replace oldChar32
Returns:
  new String derived from source by replacing every occurrence of oldChar32 with newChar32; when no oldChar32 is found in source, source itself is returned.
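
A codepoint-wise replacement of this kind can be sketched with the JDK's codePointAt and StringBuilder.appendCodePoint. The class name is hypothetical and this sketch is not the ICU4J implementation; it does follow the documented rule of returning source itself when nothing was replaced.

```java
// Illustrative sketch only; not the ICU4J source.
final class CodePointReplaceSketch {
    static String replace(String source, int oldChar32, int newChar32) {
        StringBuilder result = new StringBuilder(source.length());
        boolean changed = false;
        for (int i = 0; i < source.length(); ) {
            int cp = source.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == oldChar32) {
                result.appendCodePoint(newChar32);
                changed = true;
            } else {
                result.appendCodePoint(cp);
            }
        }
        // Return the original object when no replacement took place.
        return changed ? result.toString() : source;
    }
}
```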

replace
public static String replace(String source, String oldStr, String newStr)

Returns a new UTF16 format Unicode string resulting from replacing all occurrences of oldStr in source with newStr. If the string oldStr does not occur in the UTF16 format Unicode string source, then source will be returned. Otherwise, a new String object is created that represents a codepoint sequence identical to the codepoint sequence represented by source, except that every occurrence of oldStr is replaced by an occurrence of newStr.

Examples:
UTF16.replace("mesquite in your cellar", "e", "o"); returns "mosquito in your collar"
UTF16.replace("mesquite in your cellar", "mesquite", "cat"); returns "cat in your cellar"
UTF16.replace("JonL", "q", "x"); returns "JonL" (no change)
UTF16.replace("Supplementary character \ud800\udc00", "\ud800\udc00", "!"); returns "Supplementary character !"
UTF16.replace("Supplementary character \ud800\udc00", "\ud800", "!"); returns "Supplementary character \ud800\udc00"

Note this method is provided as support to jdk 1.3, which does not support supplementary characters to its fullest.
Parameters:
  source - UTF16 format Unicode string which the replacements will be based on.
Parameters:
  oldStr - non-zero-length string to be replaced.
Parameters:
  newStr - the new string to replace oldStr
Returns:
  new String derived from source by replacing every occurrence of oldStr with newStr. When no oldStr is found in source, source itself is returned.

reverse
public static StringBuffer reverse(StringBuffer source)

Reverses a UTF16 format Unicode string and replaces source's content with it. This method will reverse surrogate characters correctly, instead of blindly reversing every character.

Examples:
UTF16.reverse(new StringBuffer("Supplementary characters \ud800\udc00\ud801\udc01")) returns "\ud801\udc01\ud800\udc00 sretcarahc yratnemelppuS".

Parameters:
  source - the source StringBuffer that contains the UTF16 format Unicode string to be reversed
Returns:
  a modified source with reversed UTF16 format Unicode string.
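
Reversing by code point rather than by char can be sketched as follows. The class name is hypothetical and the approach (walking backwards with StringBuffer.codePointBefore and rebuilding the content) is one possible illustration, not the ICU4J source.

```java
// Illustrative sketch only; keeps each surrogate pair in lead, trail order.
final class CodePointReverseSketch {
    static StringBuffer reverse(StringBuffer source) {
        StringBuilder reversed = new StringBuilder(source.length());
        for (int i = source.length(); i > 0; ) {
            int cp = source.codePointBefore(i);
            reversed.appendCodePoint(cp);
            i -= Character.charCount(cp);
        }
        // Replace the buffer's content in place and return the same object.
        source.setLength(0);
        source.append(reversed);
        return source;
    }
}
```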

setCharAt
public static void setCharAt(StringBuffer target, int offset16, int char32)
	Sets a code point at a UTF16 position. Adjusts target accordingly if we are replacing a non-supplementary codepoint with a supplementary one, and vice versa. Parameters: target - string buffer Parameters: offset16 - UTF16 position to insert into Parameters: char32 - code point
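
The length adjustment can be pictured as replacing the whole code point that contains offset16 with the UTF-16 form of char32. The sketch below uses StringBuffer.replace for this; the class name is hypothetical and it is not claimed to match the ICU4J source line for line.

```java
// Illustrative sketch only; not the ICU4J source.
final class SetCharAtSketch {
    static void setCharAt(StringBuffer target, int offset16, int char32) {
        int start = offset16;
        int count = 1;
        char c = target.charAt(offset16); // throws if offset16 is out of range
        if (Character.isHighSurrogate(c) && offset16 + 1 < target.length()
                && Character.isLowSurrogate(target.charAt(offset16 + 1))) {
            count = 2; // offset16 is the lead half of a pair
        } else if (Character.isLowSurrogate(c) && offset16 > 0
                && Character.isHighSurrogate(target.charAt(offset16 - 1))) {
            start--;   // offset16 is the trail half of a pair
            count = 2;
        }
        // The buffer grows or shrinks by one char when the widths differ.
        target.replace(start, start + count, new String(Character.toChars(char32)));
    }
}
```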

setCharAt
public static int setCharAt(char[] target, int limit, int offset16, int char32)
	Sets a code point at a UTF16 position in a char array. Adjusts target accordingly if we are replacing a non-supplementary codepoint with a supplementary one, and vice versa. Parameters: target - char array Parameters: limit - number of valid chars in target, different from target.length. limit counts the number of chars in target that represent a string, not the size of array target. Parameters: offset16 - UTF16 position to insert into Parameters: char32 - code point Returns: new number of chars in target that represent a string exception: IndexOutOfBoundsException - if offset16 is out of range

valueOf
public static String valueOf(int char32)
	Convenience method corresponding to String.valueOf(char). Returns a one or two char string containing the UTF-32 value in UTF16 format. If a validity check is required, use isLegal() on char32 before calling. Parameters: char32 - the input character. Returns: string value of char32 in UTF16 format exception: IllegalArgumentException - thrown if char32 is an invalid codepoint.

valueOf
public static String valueOf(String source, int offset16)
	Convenience method corresponding to String.valueOf(codepoint at offset16). Returns a one or two char string containing the UTF-32 value in UTF16 format. If offset16 indexes a surrogate character, the whole supplementary codepoint will be returned. If a validity check is required, use isLegal() on the codepoint at offset16 before calling. The result returned will be a newly created String obtained by calling source.substring(..) with the appropriate indexes. Parameters: source - the input string. Parameters: offset16 - the UTF16 index to the codepoint in source Returns: string value of char32 in UTF16 format
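
Both forms of valueOf described so far can be sketched with JDK calls. The class name is hypothetical; Character.toChars is used here because it also throws IllegalArgumentException for an invalid code point, which is an assumption about equivalence, not a statement about ICU4J internals.

```java
// Illustrative sketch only; not the ICU4J source.
final class ValueOfSketch {
    // One char for a BMP code point, two for a supplementary one.
    static String valueOf(int char32) {
        return new String(Character.toChars(char32));
    }

    // Returns the whole code point that contains offset16.
    static String valueOf(String source, int offset16) {
        int start = offset16;
        if (offset16 > 0
                && Character.isLowSurrogate(source.charAt(offset16))
                && Character.isHighSurrogate(source.charAt(offset16 - 1))) {
            start--; // offset16 is the trail half; back up to the lead
        }
        int cp = source.codePointAt(start);
        return source.substring(start, start + Character.charCount(cp));
    }
}
```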

valueOf
public static String valueOf(StringBuffer source, int offset16)
	Convenience method corresponding to StringBuffer.valueOf(codepoint at offset16). Returns a one or two char string containing the UTF-32 value in UTF16 format. If offset16 indexes a surrogate character, the whole supplementary codepoint will be returned. If a validity check is required, use isLegal() on the codepoint at offset16 before calling. The result returned will be a newly created String obtained by calling source.substring(..) with the appropriate indexes. Parameters: source - the input string buffer. Parameters: offset16 - the UTF16 index to the codepoint in source Returns: string value of char32 in UTF16 format

valueOf
public static String valueOf(char[] source, int start, int limit, int offset16)
	Convenience method. Returns a one or two char string containing the UTF-32 value in UTF16 format. If offset16 indexes a surrogate character, the whole supplementary codepoint will be returned, except when either the leading or trailing surrogate character lies out of the specified subarray. In the latter case, only the surrogate character within bounds will be returned. If a validity check is required, use isLegal() on the codepoint at offset16 before calling. The result returned will be a newly created String containing the relevant characters. Parameters: source - the input char array. Parameters: start - start index of the subarray Parameters: limit - end index of the subarray Parameters: offset16 - the UTF16 index to the codepoint in source relative to start Returns: string value of char32 in UTF16 format

Methods inherited from java.lang.Object

native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
