| java.lang.Object com.ibm.icu.text.UnicodeFilter com.ibm.icu.text.UnicodeSet
UnicodeSet | public class UnicodeSet extends UnicodeFilter implements Freezable | | A mutable set of Unicode characters and multicharacter strings. Objects of this class
represent character classes used in regular expressions.
A character specifies a subset of Unicode code points. Legal
code points are U+0000 to U+10FFFF, inclusive.
The UnicodeSet class is not designed to be subclassed.
UnicodeSet supports two APIs. The first is the
operand API that allows the caller to modify the value of
a UnicodeSet object. It conforms to Java 2's
java.util.Set interface, although
UnicodeSet does not actually implement that
interface. All methods of Set are supported, with the
modification that they take a character range or single character
instead of an Object , and they take a
UnicodeSet instead of a Collection . The
operand API may be thought of in terms of boolean logic: a boolean
OR is implemented by add , a boolean AND is implemented
by retain , a boolean XOR is implemented by
complement taking an argument, and a boolean NOT is
implemented by complement with no argument. In terms
of traditional set theory function names, add is a
union, retain is an intersection, remove
is an asymmetric difference, and complement with no
argument is a set complement with respect to the superset range
MIN_VALUE-MAX_VALUE.
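The boolean-logic reading of the operand API can be sketched with the JDK's java.util.BitSet standing in for UnicodeSet. This is an assumption made so that the example runs without ICU4J on the classpath; the class name and helper names below are hypothetical, and the comments name the UnicodeSet operation each helper models.

```java
import java.util.BitSet;

// A stand-in model: java.util.BitSet plays the role of UnicodeSet (this is not ICU4J).
final class OperandLogicSketch {
    static final int MIN_VALUE = 0x0000;   // assumed to mirror UnicodeSet.MIN_VALUE
    static final int MAX_VALUE = 0x10FFFF; // assumed to mirror UnicodeSet.MAX_VALUE

    // Set holding the inclusive code point range [start, end].
    static BitSet range(int start, int end) {
        BitSet s = new BitSet();
        s.set(start, end + 1); // BitSet's upper bound is exclusive
        return s;
    }

    // add / addAll: boolean OR, set union.
    static BitSet or(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.or(b); return r; }

    // retain / retainAll: boolean AND, set intersection.
    static BitSet and(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.and(b); return r; }

    // remove / removeAll: asymmetric difference.
    static BitSet minus(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.andNot(b); return r; }

    // complement with an argument: boolean XOR.
    static BitSet xor(BitSet a, BitSet b) { BitSet r = (BitSet) a.clone(); r.xor(b); return r; }

    // complement with no argument: boolean NOT over MIN_VALUE..MAX_VALUE.
    static BitSet not(BitSet a) {
        BitSet r = (BitSet) a.clone();
        r.flip(MIN_VALUE, MAX_VALUE + 1);
        return r;
    }

    public static void main(String[] args) {
        BitSet lower = range('a', 'z');
        BitSet hexLetters = range('a', 'f');
        System.out.println("union:        " + or(lower, hexLetters).cardinality());
        System.out.println("intersection: " + and(lower, hexLetters).cardinality());
        System.out.println("difference:   " + minus(lower, hexLetters).cardinality());
        System.out.println("xor:          " + xor(lower, hexLetters).cardinality());
        System.out.println("complement:   " + not(lower).cardinality());
    }
}
```

Each helper returns a fresh copy, whereas the UnicodeSet methods described above modify the receiver; the copy is only there to keep the sketch easy to test.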
The second API is the
applyPattern() /toPattern() API from the
java.text.Format -derived classes. Unlike the
methods that add characters, add categories, and control the logic
of the set, the method applyPattern() sets all
attributes of a UnicodeSet at once, based on a
string pattern.
Pattern syntax
Patterns are accepted by the constructors and the
applyPattern() methods and returned by the
toPattern() method. These patterns follow a syntax
similar to that employed by version 8 regular expression character
classes. Here are some simple examples:
[] | No characters
[a] | The character 'a'
[ae] | The characters 'a' and 'e'
[a-e] | The characters 'a' through 'e' inclusive, in Unicode code point order
[\\u4E01] | The character U+4E01
[a{ab}{ac}] | The character 'a' and the multicharacter strings "ab" and "ac"
[\p{Lu}] | All characters in the general category Uppercase Letter
Any character may be preceded by a backslash in order to remove any special
meaning. White space characters, as defined by UCharacterProperty.isRuleWhiteSpace(), are
ignored, unless they are escaped.
Property patterns specify a set of characters having a certain
property as defined by the Unicode standard. Both the POSIX-like
"[:Lu:]" and the Perl-like syntax "\p{Lu}" are recognized. For a
complete list of supported property patterns, see the User's Guide
for UnicodeSet at
http://icu.sourceforge.net/userguide/unicodeSet.html.
Actual determination of property data is defined by the underlying
Unicode database as implemented by UCharacter.
Patterns specify individual characters, ranges of characters, and
Unicode property sets. When elements are concatenated, they
specify their union. To complement a set, place a '^' immediately
after the opening '['. Property patterns are inverted by modifying
their delimiters: "[:^foo:]" and "\P{foo}". In any other location,
'^' has no special meaning.
Ranges are indicated by placing a '-' between two
characters, as in "a-z". This specifies the range of all
characters from the left to the right, in Unicode order. If the
left character is greater than or equal to the
right character it is a syntax error. If a '-' occurs as the first
character after the opening '[' or '[^', or if it occurs as the
last character before the closing ']', then it is taken as a
literal. Thus "[a\\-b]", "[-ab]", and "[ab-]" all indicate the same
set of three characters, 'a', 'b', and '-'.
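The literal-hyphen rule can be checked with java.util.regex, whose bracket expressions follow the same convention for these three forms. This is an analogy only: the sketch calls the JDK regex engine, not UnicodeSet, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Uses java.util.regex character classes as an analogy for UnicodeSet patterns (not ICU4J).
final class LiteralHyphenSketch {
    // Returns the ASCII characters accepted by a bracket expression, in code point order.
    static String asciiMembers(String charClass) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escaped, leading, and trailing '-' are all literal.
        for (String cls : new String[] {"[a\\-b]", "[-ab]", "[ab-]"}) {
            System.out.println(cls + " -> " + asciiMembers(cls));
        }
        // Between two characters, '-' forms a range.
        System.out.println("[a-e] -> " + asciiMembers("[a-e]"));
    }
}
```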
Sets may be intersected using the '&' operator, and the asymmetric
set difference may be taken using the '-' operator. For example,
"[[:L:]&[\\u0000-\\u0FFF]]" indicates the set of all Unicode letters
with values less than 4096. The operators ('&' and '-') have equal
precedence and bind left-to-right. Thus
"[[:L:]-[a-z]-[\\u0100-\\u01FF]]" is equivalent to
"[[[:L:]-[a-z]]-[\\u0100-\\u01FF]]". This only really matters for
difference; intersection is commutative.
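Why grouping matters for difference can be seen in a small model that uses java.util.TreeSet in place of UnicodeSet (an assumption, so that the example needs only the JDK; all names are hypothetical). It evaluates the hypothetical pattern "[[a-j]-[a-e]-[c-h]]" both with the documented left-to-right binding and with the opposite grouping.

```java
import java.util.TreeSet;

// Models set difference with java.util.TreeSet (a stand-in, not ICU4J).
final class DifferenceOrderSketch {
    // Set holding the inclusive character range [start, end].
    static TreeSet<Character> range(char start, char end) {
        TreeSet<Character> s = new TreeSet<>();
        for (char c = start; c <= end; c++) {
            s.add(c);
        }
        return s;
    }

    // Asymmetric difference: elements of a that are not in b.
    static TreeSet<Character> minus(TreeSet<Character> a, TreeSet<Character> b) {
        TreeSet<Character> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    public static void main(String[] args) {
        TreeSet<Character> a = range('a', 'j');
        TreeSet<Character> b = range('a', 'e');
        TreeSet<Character> c = range('c', 'h');
        // Left-to-right binding, as documented: ([a-j] - [a-e]) - [c-h]
        System.out.println(minus(minus(a, b), c));
        // The other grouping gives a different set: [a-j] - ([a-e] - [c-h])
        System.out.println(minus(a, minus(b, c)));
    }
}
```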
[a] | The set containing 'a'
[a-z] | The set containing 'a' through 'z' and all letters in between, in Unicode order
[^a-z] | The set containing all characters but 'a' through 'z', that is, U+0000 through 'a'-1 and 'z'+1 through U+10FFFF
[[pat1][pat2]] | The union of sets specified by pat1 and pat2
[[pat1]&[pat2]] | The intersection of sets specified by pat1 and pat2
[[pat1]-[pat2]] | The asymmetric difference of sets specified by pat1 and pat2
[:Lu:] or \p{Lu} | The set of characters having the specified Unicode property; in this case, Unicode uppercase letters
[:^Lu:] or \P{Lu} | The set of characters not having the given Unicode property
Warning: you cannot add an empty string ("") to a UnicodeSet.
Formal syntax
pattern := ('[' '^'? item* ']') | property
item := char | (char '-' char) | pattern-expr
pattern-expr := pattern | pattern-expr pattern | pattern-expr op pattern
op := '&' | '-'
special := '[' | ']' | '-'
char := any character that is not special | ('\\' any character) | ('\u' hex hex hex hex)
hex := any character for which Character.digit(c, 16) returns a non-negative result
property := a Unicode property set pattern
Legend:
a := b | a may be replaced by b
a? | zero or one instance of a
a* | zero or more instances of a
a | b | either a or b
'a' | the literal string between the quotes
To iterate over the contents of a UnicodeSet, use the UnicodeSetIterator class.
author: Alan Liu See Also: UnicodeSetIterator |
Inner Class :abstract public static class XSymbolTable implements SymbolTable | |
Field Summary | |
final public static int | ADD_CASE_MAPPINGS | Bitmask for constructor, applyPattern(), and closeOver() indicating letter case.
final public static int | CASE | Bitmask for constructor, applyPattern(), and closeOver() indicating letter case.
final public static int | CASE_INSENSITIVE | Alias for UnicodeSet.CASE, for ease of porting from C++ where ICU4C also has both USET_CASE and USET_CASE_INSENSITIVE (see uset.h).
final public static int | IGNORE_SPACE | Bitmask for constructor and applyPattern() indicating that white space should be ignored.
final public static int | MAX_VALUE | Maximum value that can be stored in a UnicodeSet.
final public static int | MIN_VALUE | Minimum value that can be stored in a UnicodeSet.
final static VersionInfo | NO_VERSION |
TreeSet | strings |
Method Summary | |
public StringBuffer | _generatePattern(StringBuffer result, boolean escapeUnprintable) | Generate and append a string representation of this set to result.
public StringBuffer | _generatePattern(StringBuffer result, boolean escapeUnprintable, boolean includeStrings) | Generate and append a string representation of this set to result.
public UnicodeSet | add(int start, int end) | Adds the specified range to this set if it is not already present.
final public UnicodeSet | add(int c) | Adds the specified character to this set if it is not already present.
final public UnicodeSet | add(String s) | Adds the specified multicharacter to this set if it is not already present.
final public UnicodeSet | addAll(String s) | Adds each of the characters in this string to the set.
public UnicodeSet | addAll(UnicodeSet c) | Adds all of the elements in the specified set to this set if they're not already present.
public void | addAll(Collection source) | Add the contents of the collection (as strings) into this UnicodeSet.
public void | addAllTo(Collection target) | Add the contents of the UnicodeSet (as strings) into a collection.
public void | addMatchSetTo(UnicodeSet toUnionTo) | Implementation of UnicodeMatcher API.
public UnicodeSet | applyIntPropertyValue(int prop, int value) | Modifies this set to contain those code points which have the given value for the given binary or enumerated property, as returned by UCharacter.getIntPropertyValue.
final public UnicodeSet | applyPattern(String pattern) | Modifies this set to represent the set specified by the given pattern.
public UnicodeSet | applyPattern(String pattern, boolean ignoreWhitespace) | Modifies this set to represent the set specified by the given pattern, optionally ignoring whitespace.
public UnicodeSet | applyPattern(String pattern, int options) | Modifies this set to represent the set specified by the given pattern, optionally ignoring whitespace.
UnicodeSet | applyPattern(String pattern, ParsePosition pos, SymbolTable symbols, int options) | Parses the given pattern, starting at the given position.
void | applyPattern(RuleCharacterIterator chars, SymbolTable symbols, StringBuffer rebuiltPat, int options) | Parse the pattern from the given RuleCharacterIterator.
public UnicodeSet | applyPropertyAlias(String propertyAlias, String valueAlias) | Modifies this set to contain those code points which have the given value for the given property.
public UnicodeSet | applyPropertyAlias(String propertyAlias, String valueAlias, SymbolTable symbols) | Modifies this set to contain those code points which have the given value for the given property.
public int | charAt(int index) | Returns the character at the given index within this set, where the set is ordered by ascending code point.
public UnicodeSet | clear() | Removes all of the elements from this set.
public Object | clone() | Return a new set that is equivalent to this one.
public Object | cloneAsThawed() | Clone a thawed version of this class, according to the Freezable interface.
public UnicodeSet | closeOver(int attribute) | Close this set over the given attribute.
public UnicodeSet | compact() | Reallocate this object's internal structures to take up the least possible space, without changing this object's value.
public UnicodeSet | complement(int start, int end) | Complements the specified range in this set.
final public UnicodeSet | complement(int c) | Complements the specified character in this set.
public UnicodeSet | complement() | This is equivalent to complement(MIN_VALUE, MAX_VALUE).
final public UnicodeSet | complement(String s) | Complement the specified string in this set.
final public UnicodeSet | complementAll(String s) | Complement EACH of the characters in this string.
public UnicodeSet | complementAll(UnicodeSet c) | Complements in this set all elements contained in the specified set.
public boolean | contains(int c) | Returns true if this set contains the given character.
public boolean | contains(int start, int end) | Returns true if this set contains every character of the given range.
final public boolean | contains(String s) | Returns true if this set contains the given multicharacter string.
public boolean | containsAll(UnicodeSet c) | Returns true if this set contains all the characters and strings of the given set.
public boolean | containsAll(String s) | Returns true if there is a partition of the string such that this set contains each of the partitioned strings.
public boolean | containsNone(int start, int end) | Returns true if this set contains none of the characters of the given range.
public boolean | containsNone(UnicodeSet c) | Returns true if none of the characters or strings in this UnicodeSet appears in the string.
public boolean | containsNone(String s) | Returns true if this set contains none of the characters of the given string.
final public boolean | containsSome(int start, int end) | Returns true if this set contains one or more of the characters in the given range.
final public boolean | containsSome(UnicodeSet s) | Returns true if this set contains one or more of the characters and strings of the given set.
final public boolean | containsSome(String s) | Returns true if this set contains one or more of the characters of the given string.
public boolean | equals(Object o) | Compares the specified object with this set for equality.
public Object | freeze() | Freeze this class, according to the Freezable interface.
public static UnicodeSet | from(String s) | Makes a set from a multicharacter string.
public static UnicodeSet | fromAll(String s) | Makes a set from each of the characters in the string.
public int | getRangeCount() | Iteration method that returns the number of ranges contained in this set.
public int | getRangeEnd(int index) | Iteration method that returns the last character in the specified range of this set.
public int | getRangeStart(int index) | Iteration method that returns the first character in the specified range of this set.
public String | getRegexEquivalent() |
public int | hashCode() | Returns the hash code value for this set.
public int | indexOf(int c) | Returns the index of the given character within this set, where the set is ordered by ascending code point.
public boolean | isEmpty() | Returns true if this set contains no elements.
public boolean | isFrozen() |
public int | matches(Replaceable text, int[] offset, int limit, boolean incremental) | Implementation of UnicodeMatcher.matches().
public int | matchesAt(CharSequence text, int offset) | Tests whether the text matches at the offset.
public boolean | matchesIndexValue(int v) | Implementation of UnicodeMatcher API.
public UnicodeSet | remove(int start, int end) | Removes the specified range from this set if it is present. The set will not contain the specified range once the call returns.
final public UnicodeSet | remove(int c) | Removes the specified character from this set if it is present.
final public UnicodeSet | remove(String s) | Removes the specified string from this set if it is present.
final public UnicodeSet | removeAll(String s) | Remove EACH of the characters in this string.
public UnicodeSet | removeAll(UnicodeSet c) | Removes from this set all of its elements that are contained in the specified set.
public static boolean | resemblesPattern(String pattern, int pos) | Return true if the given position, in the given pattern, appears to be the start of a UnicodeSet pattern.
public UnicodeSet | retain(int start, int end) | Retain only the elements in this set that are contained in the specified range.
final public UnicodeSet | retain(int c) | Retain the specified character from this set if it is present.
final public UnicodeSet | retain(String s) | Retain the specified string in this set if it is present.
final public UnicodeSet | retainAll(String s) | Retains EACH of the characters in this string.
public UnicodeSet | retainAll(UnicodeSet c) | Retains only the elements in this set that are contained in the specified set.
public UnicodeSet | set(int start, int end) | Make this object represent the range start - end.
public UnicodeSet | set(UnicodeSet other) | Make this object represent the same set as other.
public int | size() | Returns the number of elements in this set (its cardinality). Note that the elements of a set may include both individual codepoints and strings.
public String | toPattern(boolean escapeUnprintable) | Returns a string representation of this set.
public String | toString() | Return a programmer-readable string representation of this object.
ADD_CASE_MAPPINGS | final public static int ADD_CASE_MAPPINGS(Code) | | Bitmask for constructor, applyPattern(), and closeOver()
indicating letter case. This may be ORed together with other
selectors.
Enable case insensitive matching. E.g., "[ab]" with this flag
will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will
match all except 'a', 'A', 'b', and 'B'. This adds the lower-,
title-, and uppercase mappings as well as the case folding
of each existing element in the set.
|
CASE | final public static int CASE(Code) | | Bitmask for constructor, applyPattern(), and closeOver()
indicating letter case. This may be ORed together with other
selectors.
Enable case insensitive matching. E.g., "[ab]" with this flag
will match 'a', 'A', 'b', and 'B'. "[^ab]" with this flag will
match all except 'a', 'A', 'b', and 'B'. This performs a full
closure over case mappings, e.g. U+017F for s.
The resulting set is a superset of the input for the code points but
not for the strings.
It performs a case mapping closure of the code points and adds
full case folding strings for the code points, and reduces strings of
the original set to their full case folding equivalents.
This is designed for case-insensitive matches, for example
in regular expressions. The full code point case closure allows checking of
an input character directly against the closure set.
Strings are matched by comparing the case-folded form from the closure
set with an incremental case folding of the string in question.
The closure set will also contain single code points if the original
set contained case-equivalent strings (like U+00DF for "ss" or "Ss" etc.).
This is not necessary (that is, redundant) for the above matching method
but results in the same closure sets regardless of whether the original
set contained the code point or a string.
|
CASE_INSENSITIVE | final public static int CASE_INSENSITIVE(Code) | | Alias for UnicodeSet.CASE, for ease of porting from C++ where ICU4C
also has both USET_CASE and USET_CASE_INSENSITIVE (see uset.h).
See Also: UnicodeSet.CASE |
IGNORE_SPACE | final public static int IGNORE_SPACE(Code) | | Bitmask for constructor and applyPattern() indicating that
white space should be ignored. If set, ignore characters for
which UCharacterProperty.isRuleWhiteSpace() returns true,
unless they are quoted or escaped. This may be ORed together
with other selectors.
|
MAX_VALUE | final public static int MAX_VALUE(Code) | | Maximum value that can be stored in a UnicodeSet.
|
MIN_VALUE | final public static int MIN_VALUE(Code) | | Minimum value that can be stored in a UnicodeSet.
|
UnicodeSet | public UnicodeSet()(Code) | | Constructs an empty set.
|
UnicodeSet | public UnicodeSet(UnicodeSet other)(Code) | | Constructs a copy of an existing set.
|
UnicodeSet | public UnicodeSet(int start, int end) | | Constructs a set containing the given range. If start >
end then an empty set is created.
Parameters: start - first character, inclusive, of range Parameters: end - last character, inclusive, of range |
UnicodeSet | public UnicodeSet(String pattern)(Code) | | Constructs a set from the given pattern. See the class description
for the syntax of the pattern language. Whitespace is ignored.
Parameters: pattern - a string specifying what characters are in the set exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
UnicodeSet | public UnicodeSet(String pattern, boolean ignoreWhitespace)(Code) | | Constructs a set from the given pattern. See the class description
for the syntax of the pattern language.
Parameters: pattern - a string specifying what characters are in the set Parameters: ignoreWhitespace - if true, ignore characters for which UCharacterProperty.isRuleWhiteSpace() returns true exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
UnicodeSet | public UnicodeSet(String pattern, int options)(Code) | | Constructs a set from the given pattern. See the class description
for the syntax of the pattern language.
Parameters: pattern - a string specifying what characters are in the set Parameters: options - a bitmask indicating which options to apply. Valid options are IGNORE_SPACE and CASE. exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
UnicodeSet | public UnicodeSet(String pattern, ParsePosition pos, SymbolTable symbols)(Code) | | Constructs a set from the given pattern. See the class description
for the syntax of the pattern language.
Parameters: pattern - a string specifying what characters are in the set Parameters: pos - on input, the position in pattern at which to start parsing. On output, the position after the last character parsed. Parameters: symbols - a symbol table mapping variables to char[] arrays and chars to UnicodeSets exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
UnicodeSet | public UnicodeSet(String pattern, ParsePosition pos, SymbolTable symbols, int options)(Code) | | Constructs a set from the given pattern. See the class description
for the syntax of the pattern language.
Parameters: pattern - a string specifying what characters are in the set Parameters: pos - on input, the position in pattern at which to start parsing. On output, the position after the last character parsed. Parameters: symbols - a symbol table mapping variables to char[] arrays and chars to UnicodeSets Parameters: options - a bitmask indicating which options to apply. Valid options are IGNORE_SPACE and CASE. exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
_generatePattern | public StringBuffer _generatePattern(StringBuffer result, boolean escapeUnprintable)(Code) | | Generate and append a string representation of this set to result.
This does not use this.pat, the cleaned up copy of the string
passed to applyPattern().
Parameters: result - the buffer into which to generate the pattern Parameters: escapeUnprintable - escape unprintable characters if true |
_generatePattern | public StringBuffer _generatePattern(StringBuffer result, boolean escapeUnprintable, boolean includeStrings)(Code) | | Generate and append a string representation of this set to result.
This does not use this.pat, the cleaned up copy of the string
passed to applyPattern().
Parameters: includeStrings - if false, doesn't include the strings. |
add | public UnicodeSet add(int start, int end)(Code) | | Adds the specified range to this set if it is not already
present. If this set already contains the specified range,
the call leaves this set unchanged. If start > end
then an empty range is added, leaving the set unchanged.
Parameters: start - first character, inclusive, of range to be added to this set. Parameters: end - last character, inclusive, of range to be added to this set. |
add | final public UnicodeSet add(int c)(Code) | | Adds the specified character to this set if it is not already
present. If this set already contains the specified character,
the call leaves this set unchanged.
|
add | final public UnicodeSet add(String s)(Code) | | Adds the specified multicharacter to this set if it is not already
present. If this set already contains the multicharacter,
the call leaves this set unchanged.
Thus "ch" => {"ch"}
Warning: you cannot add an empty string ("") to a UnicodeSet.
Parameters: s - the source string Returns: this object, for chaining |
addAll | final public UnicodeSet addAll(String s)(Code) | | Adds each of the characters in this string to the set. Thus "ch" => {"c", "h"}
If this set already contains any particular character, it has no effect on that character.
Parameters: s - the source string Returns: this object, for chaining |
addAll | public UnicodeSet addAll(UnicodeSet c)(Code) | | Adds all of the elements in the specified set to this set if
they're not already present. This operation effectively
modifies this set so that its value is the union of the two
sets. The behavior of this operation is unspecified if the specified
collection is modified while the operation is in progress.
Parameters: c - set whose elements are to be added to this set. |
addAll | public void addAll(Collection source)(Code) | | Add the contents of the collection (as strings) into this UnicodeSet.
Parameters: source - the collection to add |
addAllTo | public void addAllTo(Collection target)(Code) | | Add the contents of the UnicodeSet (as strings) into a collection.
Parameters: target - collection to add into |
addMatchSetTo | public void addMatchSetTo(UnicodeSet toUnionTo)(Code) | | Implementation of UnicodeMatcher API. Union the set of all
characters that may be matched by this object into the given
set.
Parameters: toUnionTo - the set into which to union the source characters |
applyIntPropertyValue | public UnicodeSet applyIntPropertyValue(int prop, int value)(Code) | | Modifies this set to contain those code points which have the
given value for the given binary or enumerated property, as
returned by UCharacter.getIntPropertyValue. Prior contents of
this set are lost.
Parameters: prop - a property in the range UProperty.BIN_START..UProperty.BIN_LIMIT-1 or UProperty.INT_START..UProperty.INT_LIMIT-1 or UProperty.MASK_START..UProperty.MASK_LIMIT-1. Parameters: value - a value in the range UCharacter.getIntPropertyMinValue(prop)..UCharacter.getIntPropertyMaxValue(prop), with one exception. If prop is UProperty.GENERAL_CATEGORY_MASK, then value should not be a UCharacter.getType() result, but rather a mask value produced by logically ORing (1 << UCharacter.getType()) values together. This allows grouped categories such as [:L:] to be represented. Returns: a reference to this set |
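The mask construction described for UProperty.GENERAL_CATEGORY_MASK can be sketched with the JDK's java.lang.Character, which also exposes one small integer per general category. This is an analogy under the assumption that the bit-per-category idea carries over; it does not call UCharacter or UnicodeSet, and the class, constant, and method names are hypothetical.

```java
// Builds a grouped-category mask from JDK category constants (an analogy, not ICU4J).
final class CategoryMaskSketch {
    // One bit per letter subcategory, ORed together to model the grouped category L.
    static final int LETTER_MASK =
          (1 << Character.UPPERCASE_LETTER)
        | (1 << Character.LOWERCASE_LETTER)
        | (1 << Character.TITLECASE_LETTER)
        | (1 << Character.MODIFIER_LETTER)
        | (1 << Character.OTHER_LETTER);

    // True if the code point's general category is one of the categories in the mask.
    static boolean inMask(int codePoint, int mask) {
        return (mask & (1 << Character.getType(codePoint))) != 0;
    }

    public static void main(String[] args) {
        System.out.println(inMask('A', LETTER_MASK));     // uppercase letter
        System.out.println(inMask(0x4E01, LETTER_MASK));  // CJK ideograph, an "other letter"
        System.out.println(inMask('1', LETTER_MASK));     // decimal digit
    }
}
```

A single getType() result can name only one category, which is why a grouped category needs the ORed mask form.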
applyPattern | final public UnicodeSet applyPattern(String pattern)(Code) | | Modifies this set to represent the set specified by the given pattern.
See the class description for the syntax of the pattern language.
Whitespace is ignored.
Parameters: pattern - a string specifying what characters are in the set exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
applyPattern | public UnicodeSet applyPattern(String pattern, boolean ignoreWhitespace)(Code) | | Modifies this set to represent the set specified by the given pattern,
optionally ignoring whitespace.
See the class description for the syntax of the pattern language.
Parameters: pattern - a string specifying what characters are in the set Parameters: ignoreWhitespace - if true then characters for which UCharacterProperty.isRuleWhiteSpace() returns true are ignored exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
applyPattern | public UnicodeSet applyPattern(String pattern, int options)(Code) | | Modifies this set to represent the set specified by the given pattern,
optionally ignoring whitespace.
See the class description for the syntax of the pattern language.
Parameters: pattern - a string specifying what characters are in the set Parameters: options - a bitmask indicating which options to apply. Valid options are IGNORE_SPACE and CASE. exception: java.lang.IllegalArgumentException - if the pattern contains a syntax error. |
applyPattern | UnicodeSet applyPattern(String pattern, ParsePosition pos, SymbolTable symbols, int options)(Code) | | Parses the given pattern, starting at the given position. The character
at pattern.charAt(pos.getIndex()) must be '[', or the parse fails.
Parsing continues until the corresponding closing ']'. If a syntax error
is encountered between the opening and closing bracket, the parse fails.
Upon return from a successful parse, the ParsePosition is updated to
point to the character following the closing ']', and an inversion
list for the parsed pattern is returned. This method
calls itself recursively to parse embedded subpatterns.
Parameters: pattern - the string containing the pattern to be parsed. The portion of the string from pos.getIndex(), which must be a '[', to the corresponding closing ']', is parsed. Parameters: pos - upon entry, the position at which to begin parsing. The character at pattern.charAt(pos.getIndex()) must be a '['. Upon return from a successful parse, pos.getIndex() is either the character after the closing ']' of the parsed pattern, or pattern.length() if the closing ']' is the last character of the pattern string. Returns: an inversion list for the parsed substring of pattern exception: java.lang.IllegalArgumentException - if the parse fails. |
applyPattern | void applyPattern(RuleCharacterIterator chars, SymbolTable symbols, StringBuffer rebuiltPat, int options)(Code) | | Parse the pattern from the given RuleCharacterIterator. The
iterator is advanced over the parsed pattern.
Parameters: chars - iterator over the pattern characters. Upon return it will be advanced to the first character after the parsed pattern, or the end of the iteration if all characters are parsed. Parameters: symbols - symbol table to use to parse and dereference variables, or null if none. Parameters: rebuiltPat - the pattern that was parsed, rebuilt or copied from the input pattern, as appropriate. Parameters: options - a bit mask of zero or more of the following: IGNORE_SPACE, CASE. |
applyPropertyAlias | public UnicodeSet applyPropertyAlias(String propertyAlias, String valueAlias)(Code) | | Modifies this set to contain those code points which have the
given value for the given property. Prior contents of this
set are lost.
Parameters: propertyAlias - a property alias, either short or long. The name is matched loosely. See PropertyAliases.txt for names and a description of loose matching. If the value string is empty, then this string is interpreted as either a General_Category value alias, a Script value alias, a binary property alias, or a special ID. Special IDs are matched loosely and correspond to the following sets: "ANY" = [\u0000-\U0010FFFF], "ASCII" = [\u0000-\u007F]. Parameters: valueAlias - a value alias, either short or long. The name is matched loosely. See PropertyValueAliases.txt for names and a description of loose matching. In addition to aliases listed, numeric values and canonical combining classes may be expressed numerically, e.g., ("nv", "0.5") or ("ccc", "220"). The value string may also be empty. Returns: a reference to this set |
applyPropertyAlias | public UnicodeSet applyPropertyAlias(String propertyAlias, String valueAlias, SymbolTable symbols)(Code) | | Modifies this set to contain those code points which have the
given value for the given property. Prior contents of this
set are lost.
Parameters: propertyAlias - Parameters: valueAlias - Parameters: symbols - if not null, then symbols are first called to see if a property is available. If true, then everything else is skipped. Returns: this set |
charAt | public int charAt(int index)(Code) | | Returns the character at the given index within this set, where
the set is ordered by ascending code point. If the index is
out of range, return -1. The inverse of this method is
indexOf() .
Parameters: index - an index from 0..size()-1 Returns: the character at the given index, or -1. |
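The charAt() contract and its inverse can be illustrated with a small model that stores a set as sorted, disjoint, inclusive ranges, in the spirit of getRangeStart() and getRangeEnd(). This is a sketch of the documented behavior, not ICU's implementation; the class name is hypothetical, strings are left out, and returning -1 from indexOf() for an absent character is an assumption.

```java
// Models index lookup over sorted code point ranges (a sketch, not ICU4J's implementation).
final class RangeIndexSketch {
    private final int[][] ranges; // sorted, disjoint, inclusive {start, end} pairs

    RangeIndexSketch(int[][] ranges) {
        this.ranges = ranges;
    }

    // Number of code points across all ranges.
    int size() {
        int n = 0;
        for (int[] r : ranges) {
            n += r[1] - r[0] + 1;
        }
        return n;
    }

    // Code point at the given index in ascending order, or -1 if out of range.
    int charAt(int index) {
        if (index < 0) {
            return -1;
        }
        for (int[] r : ranges) {
            int len = r[1] - r[0] + 1;
            if (index < len) {
                return r[0] + index;
            }
            index -= len; // skip this whole range
        }
        return -1;
    }

    // Index of the given code point, or -1 if absent (assumed convention).
    int indexOf(int c) {
        int before = 0;
        for (int[] r : ranges) {
            if (c < r[0]) {
                break; // ranges are sorted, so c cannot appear later
            }
            if (c <= r[1]) {
                return before + (c - r[0]);
            }
            before += r[1] - r[0] + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        RangeIndexSketch set = new RangeIndexSketch(new int[][] {{'a', 'c'}, {'x', 'z'}});
        for (int i = 0; i < set.size(); i++) {
            int cp = set.charAt(i);
            System.out.println(i + " -> " + (char) cp + " -> " + set.indexOf(cp));
        }
    }
}
```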
clear | public UnicodeSet clear()(Code) | | Removes all of the elements from this set. This set will be
empty after this call returns.
|
clone | public Object clone()(Code) | | Return a new set that is equivalent to this one.
|
cloneAsThawed | public Object cloneAsThawed()(Code) | | Clone a thawed version of this class, according to the Freezable interface.
Returns: this |
closeOver | public UnicodeSet closeOver(int attribute)(Code) | | Close this set over the given attribute. For the attribute
CASE, the result is to modify this set so that:
1. For each character or string 'a' in this set, all strings
'b' such that foldCase(a) == foldCase(b) are added to this set.
(For most 'a' that are single characters, 'b' will have
b.length() == 1.)
2. For each string 'e' in the resulting set, if e !=
foldCase(e), 'e' will be removed.
Example: [aq\u00DF{Bc}{bC}{Fi}] => [aAqQ\u00DF\uFB01{ss}{bc}{fi}]
(Here foldCase(x) refers to the operation
UCharacter.foldCase(x, true), and a == b actually denotes
a.equals(b), not pointer comparison.)
Parameters: attribute - bitmask for attributes to close over. Currently only the CASE bit is supported. Any undefined bits are ignored. Returns: a reference to this set. |
compact | public UnicodeSet compact() | | Reallocate this object's internal structures to take up the least
possible space, without changing this object's value.
|
complement | public UnicodeSet complement(int start, int end)(Code) | | Complements the specified range in this set. Any character in
the range will be removed if it is in this set, or will be
added if it is not in this set. If start > end
then an empty range is complemented, leaving the set unchanged.
Parameters: start - first character, inclusive, of range to be complemented in this set. Parameters: end - last character, inclusive, of range to be complemented in this set. |
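Range complement can be modeled as flipping membership bit by bit. The sketch below uses java.util.BitSet in place of UnicodeSet (an assumption, so that it runs on the JDK alone; the class and method names are hypothetical) and treats a range whose start exceeds its end as empty.

```java
import java.util.BitSet;

// Models complement(start, end) with java.util.BitSet (a stand-in, not ICU4J).
final class ComplementRangeSketch {
    // Flips membership of every code point in the inclusive range [start, end].
    static void complement(BitSet set, int start, int end) {
        if (start > end) {
            return; // empty range: the set is left unchanged
        }
        set.flip(start, end + 1); // BitSet's upper bound is exclusive
    }

    // Members of the set as a string, in ascending code point order.
    static String members(BitSet set) {
        StringBuilder sb = new StringBuilder();
        for (int cp = set.nextSetBit(0); cp >= 0; cp = set.nextSetBit(cp + 1)) {
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet set = new BitSet();
        set.set('a', 'f' + 1);          // start with a through f
        complement(set, 'd', 'i');      // d, e, f leave the set; g, h, i join it
        System.out.println(members(set));
    }
}
```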
complement | final public UnicodeSet complement(int c)(Code) | | Complements the specified character in this set. The character
will be removed if it is in this set, or will be added if it is
not in this set.
|
complement | public UnicodeSet complement()(Code) | | This is equivalent to
complement(MIN_VALUE, MAX_VALUE) .
|
complement | final public UnicodeSet complement(String s)(Code) | | Complement the specified string in this set.
The string will be removed if it is in this set, or will be added
if it is not in this set.
Warning: you cannot add an empty string ("") to a UnicodeSet.
Parameters: s - the string to complement Returns: this object, for chaining |
complementAll | final public UnicodeSet complementAll(String s)(Code) | | Complement EACH of the characters in this string. Note: "ch" == {"c", "h"}
Each character is removed if it is in this set, or added if it is not.
Parameters: s - the source string Returns: this object, for chaining |
complementAll | public UnicodeSet complementAll(UnicodeSet c)(Code) | | Complements in this set all elements contained in the specified
set. Any character in the other set will be removed if it is
in this set, or will be added if it is not in this set.
Parameters: c - set that defines which elements will be complemented in this set. |
contains | public boolean contains(int c)(Code) | | Returns true if this set contains the given character.
Parameters: c - character to be checked for containment Returns: true if the test condition is met |
contains | public boolean contains(int start, int end)(Code) | | Returns true if this set contains every character
of the given range.
Parameters: start - first character, inclusive, of the range Parameters: end - last character, inclusive, of the range Returns: true if the test condition is met |
contains | final public boolean contains(String s)(Code) | | Returns true if this set contains the given
multicharacter string.
Parameters: s - string to be checked for containment Returns: true if this set contains the specified string |
containsAll | public boolean containsAll(UnicodeSet c)(Code) | | Returns true if this set contains all the characters and strings
of the given set.
Parameters: c - set to be checked for containment Returns: true if the test condition is met |
containsAll | public boolean containsAll(String s)(Code) | | Returns true if there is a partition of the string such that this set contains each of the partitioned strings.
For example, for the Unicode set [a{bc}{cd}]
containsAll is true for each of: "a", "bc", "cdbca"
containsAll is false for each of: "acb", "bcda", "bcx"
Parameters: s - string containing characters to be checked for containment Returns: true if the test condition is met |
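The partition test described above can be sketched in plain Java. This is a model of the documented behaviour, not ICU's implementation: the set is represented as a hypothetical `Set<String>` whose elements are the single characters and strings of [a{bc}{cd}], and the class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class PartitionSketch {
    // True if s can be split into consecutive pieces that are all elements of the set.
    public static boolean containsAll(Set<String> elements, String s) {
        // reachable[i] is true when the prefix s[0, i) can be fully partitioned.
        boolean[] reachable = new boolean[s.length() + 1];
        reachable[0] = true;
        for (int i = 0; i < s.length(); i++) {
            if (!reachable[i]) {
                continue;
            }
            for (String e : elements) {
                if (s.startsWith(e, i)) {
                    reachable[i + e.length()] = true;
                }
            }
        }
        return reachable[s.length()];
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("a", "bc", "cd")); // models [a{bc}{cd}]
        System.out.println(containsAll(set, "cdbca")); // true: cd + bc + a
        System.out.println(containsAll(set, "bcda"));  // false: "d" alone is not an element
    }
}
```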
containsNone | public boolean containsNone(int start, int end)(Code) | | Returns true if this set contains none of the characters
of the given range.
Parameters: start - first character, inclusive, of the range Parameters: end - last character, inclusive, of the range Returns: true if the test condition is met |
containsNone | public boolean containsNone(UnicodeSet c)(Code) | | Returns true if none of the characters or strings in this UnicodeSet appears in the string.
For example, for the Unicode set [a{bc}{cd}]
containsNone is true for: "xy", "cb"
containsNone is false for: "a", "bc", "bcd"
Parameters: c - set to be checked for containment Returns: true if the test condition is met |
containsNone | public boolean containsNone(String s)(Code) | | Returns true if this set contains none of the characters
of the given string.
Parameters: s - string containing characters to be checked for containment Returns: true if the test condition is met |
containsSome | final public boolean containsSome(int start, int end)(Code) | | Returns true if this set contains one or more of the characters
in the given range.
Parameters: start - first character, inclusive, of the range Parameters: end - last character, inclusive, of the range Returns: true if the condition is met |
containsSome | final public boolean containsSome(UnicodeSet s)(Code) | | Returns true if this set contains one or more of the characters
and strings of the given set.
Parameters: s - set to be checked for containment Returns: true if the condition is met |
containsSome | final public boolean containsSome(String s)(Code) | | Returns true if this set contains one or more of the characters
of the given string.
Parameters: s - string containing characters to be checked for containment Returns: true if the condition is met |
equals | public boolean equals(Object o)(Code) | | Compares the specified object with this set for equality. Returns
true if the specified object is also a set, the two sets
have the same size, and every member of the specified set is
contained in this set (or equivalently, every member of this set is
contained in the specified set).
Parameters: o - Object to be compared for equality with this set. Returns: true if the specified Object is equal to this set. |
freeze | public Object freeze()(Code) | | Freeze this class, according to the Freezable interface.
Returns: this |
from | public static UnicodeSet from(String s)(Code) | | Makes a set from a multicharacter string. Thus "ch" => {"ch"}
Warning: you cannot add an empty string ("") to a UnicodeSet.
Parameters: s - the source string Returns: a newly created set containing the given string |
fromAll | public static UnicodeSet fromAll(String s)(Code) | | Makes a set from each of the characters in the string. Thus "ch" => {"c", "h"}
Parameters: s - the source string Returns: a newly created set containing the given characters |
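The difference between from and fromAll can be illustrated with a small plain-Java model. This is a sketch only: it represents a set as a `TreeSet<String>` of elements rather than using ICU4J, and the class name is hypothetical.

```java
import java.util.TreeSet;

public class FromSketch {
    // Models from(s): the whole string becomes a single element.
    public static TreeSet<String> from(String s) {
        TreeSet<String> result = new TreeSet<>();
        result.add(s);
        return result;
    }

    // Models fromAll(s): each code point of the string becomes its own element.
    public static TreeSet<String> fromAll(String s) {
        TreeSet<String> result = new TreeSet<>();
        s.codePoints().forEach(cp -> result.add(new String(Character.toChars(cp))));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(from("ch"));    // [ch]
        System.out.println(fromAll("ch")); // [c, h]
    }
}
```

Iterating by code point rather than by `char` keeps a supplementary character together as one element instead of splitting it into two surrogates.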
getRegexEquivalent | public String getRegexEquivalent()(Code) | | Returns a regex pattern equivalent to this UnicodeSet. |
indexOf | public int indexOf(int c)(Code) | | Returns the index of the given character within this set, where
the set is ordered by ascending code point. If the character
is not in this set, return -1. The inverse of this method is
charAt() .
Returns: an index from 0..size()-1, or -1 |
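Because the set is ordered by ascending code point, charAt and indexOf are inverses of each other. The sketch below models a set that holds only characters as a sorted list of inclusive ranges; this is an assumption made for illustration, it does not use ICU4J, and the class name and the `RANGES` table are hypothetical.

```java
public class RangeIndexSketch {
    // Inclusive ranges, sorted ascending and non-overlapping: models [a-cx-z].
    static final int[][] RANGES = { {'a', 'c'}, {'x', 'z'} };

    // Returns the code point at the given index, or -1 if the index is out of range.
    public static int charAt(int index) {
        if (index < 0) {
            return -1;
        }
        for (int[] r : RANGES) {
            int count = r[1] - r[0] + 1;
            if (index < count) {
                return r[0] + index;
            }
            index -= count;
        }
        return -1;
    }

    // Returns the index of the given code point, or -1 if it is not in the set.
    public static int indexOf(int c) {
        int base = 0;
        for (int[] r : RANGES) {
            if (c < r[0]) {
                break; // ranges are sorted, so c cannot appear later
            }
            if (c <= r[1]) {
                return base + (c - r[0]);
            }
            base += r[1] - r[0] + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println((char) charAt(3)); // x
        System.out.println(indexOf('y'));     // 4
        System.out.println(indexOf('m'));     // -1
    }
}
```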
isEmpty | public boolean isEmpty()(Code) | | Returns true if this set contains no elements.
Returns: true if this set contains no elements. |
isFrozen | public boolean isFrozen()(Code) | | Is this frozen, according to the Freezable interface?
Returns: true if this object is frozen |
matches | public int matches(Replaceable text, int[] offset, int limit, boolean incremental)(Code) | | Implementation of UnicodeMatcher.matches(). Always matches the
longest possible multichar string.
|
matchesAt | public int matchesAt(CharSequence text, int offset)(Code) | | Tests whether the text matches at the offset. If so, returns the end of the longest substring that it matches. If not, returns -1. For now, an internal routine.
|
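The longest-match rule used by matches and matchesAt can be sketched as follows. This is a plain-Java model, not the ICU4J routine: the set is represented as a hypothetical list of element strings, and the class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.List;

public class LongestMatchSketch {
    // Returns the end offset of the longest element matching at offset, or -1 if none matches.
    public static int matchesAt(List<String> elements, CharSequence text, int offset) {
        String s = text.toString();
        int best = -1;
        for (String e : elements) {
            if (s.startsWith(e, offset)) {
                best = Math.max(best, offset + e.length());
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> set = Arrays.asList("a", "ab", "abc"); // models [a{ab}{abc}]
        System.out.println(matchesAt(set, "xabcd", 1)); // 4: "abc" is the longest match
        System.out.println(matchesAt(set, "xabcd", 0)); // -1: nothing matches at 'x'
    }
}
```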
matchesIndexValue | public boolean matchesIndexValue(int v)(Code) | | Implementation of UnicodeMatcher API. Returns true if
this set contains any character whose low byte is the given
value. This is used by RuleBasedTransliterator for
indexing.
|
remove | public UnicodeSet remove(int start, int end)(Code) | | Removes the specified range from this set if it is present.
The set will not contain the specified range once the call
returns. If start > end then an empty range is
removed, leaving the set unchanged.
Parameters: start - first character, inclusive, of range to be removed from this set. Parameters: end - last character, inclusive, of range to be removed from this set. |
remove | final public UnicodeSet remove(int c)(Code) | | Removes the specified character from this set if it is present.
The set will not contain the specified character once the call
returns.
Parameters: c - the character to be removed Returns: this object, for chaining |
remove | final public UnicodeSet remove(String s)(Code) | | Removes the specified string from this set if it is present.
The set will not contain the specified string once the call
returns.
Parameters: s - the string to be removed Returns: this object, for chaining |
removeAll | final public UnicodeSet removeAll(String s)(Code) | | Remove EACH of the characters in this string. Note: "ch" == {"c", "h"}
If this set does not contain a particular character, the call has no effect for that character.
Parameters: s - the source string Returns: this object, for chaining |
removeAll | public UnicodeSet removeAll(UnicodeSet c)(Code) | | Removes from this set all of its elements that are contained in the
specified set. This operation effectively modifies this
set so that its value is the asymmetric set difference of
the two sets.
Parameters: c - set that defines which elements will be removed from this set. |
resemblesPattern | public static boolean resemblesPattern(String pattern, int pos)(Code) | | Return true if the given position, in the given pattern, appears
to be the start of a UnicodeSet pattern.
|
retain | public UnicodeSet retain(int start, int end)(Code) | | Retain only the elements in this set that are contained in the
specified range. If start > end then an empty range is
retained, leaving the set empty.
Parameters: start - first character, inclusive, of range to be retained in this set. Parameters: end - last character, inclusive, of range to be retained in this set. |
retain | final public UnicodeSet retain(int c)(Code) | | Retain the specified character from this set if it is present.
Upon return this set will be empty if it did not contain c, or
will only contain c if it did contain c.
Parameters: c - the character to be retained Returns: this object, for chaining |
retain | final public UnicodeSet retain(String s)(Code) | | Retain the specified string in this set if it is present.
Upon return this set will be empty if it did not contain s, or
will only contain s if it did contain s.
Parameters: s - the string to be retained Returns: this object, for chaining |
retainAll | final public UnicodeSet retainAll(String s)(Code) | | Retains EACH of the characters in this string. Note: "ch" == {"c", "h"}
Elements of this set that are not among those characters are removed.
Parameters: s - the source string Returns: this object, for chaining |
retainAll | public UnicodeSet retainAll(UnicodeSet c)(Code) | | Retains only the elements in this set that are contained in the
specified set. In other words, removes from this set all of
its elements that are not contained in the specified set. This
operation effectively modifies this set so that its value is
the intersection of the two sets.
Parameters: c - set that defines which elements this set will retain. |
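The set-theoretic reading of retainAll (intersection) and removeAll (asymmetric difference) can be shown with the matching `java.util.Set` operations. This is a sketch on `TreeSet<Character>` rather than on a UnicodeSet, and the class and helper names are hypothetical.

```java
import java.util.TreeSet;

public class SetLogicSketch {
    // Builds a set from the characters of the given string.
    public static TreeSet<Character> of(String chars) {
        TreeSet<Character> set = new TreeSet<>();
        for (char ch : chars.toCharArray()) {
            set.add(ch);
        }
        return set;
    }

    // retainAll: keep only the elements that are also in other (intersection).
    public static TreeSet<Character> intersection(TreeSet<Character> a, TreeSet<Character> other) {
        TreeSet<Character> result = new TreeSet<>(a);
        result.retainAll(other);
        return result;
    }

    // removeAll: drop every element that is in other (asymmetric difference).
    public static TreeSet<Character> difference(TreeSet<Character> a, TreeSet<Character> other) {
        TreeSet<Character> result = new TreeSet<>(a);
        result.removeAll(other);
        return result;
    }

    public static void main(String[] args) {
        TreeSet<Character> a = of("abcd");
        TreeSet<Character> b = of("cdef");
        System.out.println(intersection(a, b)); // [c, d]
        System.out.println(difference(a, b));   // [a, b]
    }
}
```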
set | public UnicodeSet set(int start, int end)(Code) | | Make this object represent the range start - end .
If start > end then this object is set to
an empty range.
Parameters: start - first character in the set, inclusive Parameters: end - last character in the set, inclusive |
set | public UnicodeSet set(UnicodeSet other)(Code) | | Make this object represent the same set as other .
Parameters: other - a UnicodeSet whose value will be copied to this object |
size | public int size()(Code) | | Returns the number of elements in this set (its cardinality).
Note that the elements of a set may include both individual
code points and strings.
Returns: the number of elements in this set (its cardinality). |
toPattern | public String toPattern(boolean escapeUnprintable)(Code) | | Returns a string representation of this set. If the result of
calling this function is passed to a UnicodeSet constructor, it
will produce another set that is equal to this one.
|
toString | public String toString()(Code) | | Return a programmer-readable string representation of this object.
|
|
|