com.ibm.icu.text |
Extensions and enhancements to java.text to support Unicode transforms, UnicodeSet, surrogate character utilities, UCA collation, normalization, break iteration (rule and dictionary based), enhanced number formatting, international string searching, and Arabic shaping.
- Unicode Transforms (Transliteration) convert between different representations of Unicode text.
- UnicodeSet provides set operations on Unicode characters and strings; sets are representable as compact expressions.
- Surrogate character utilities (UTF16) provide indexing and substring operations on text containing surrogates.
- UCA collation implements the current Unicode Collation Algorithm. Data for many locale-specific collations is provided.
- Normalization supports standard Unicode normalization forms as well as FCD.
- Break iteration supports character, word, line, sentence, and title case breaks.
- Enhanced number format can 'spell out' numbers and provides padding and rounding control.
- Internationally sensitive string searching is built on collation.
- Arabic shaping converts between shaped and unshaped Arabic characters and digits.
|
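As a rough illustration of two of the features listed above (normalization and break iteration), here is a minimal sketch. It deliberately uses only the JDK classes java.text.Normalizer and java.text.BreakIterator, which this package extends, so that it runs without ICU4J on the classpath. The class name TextFeaturesSketch and its helper methods are hypothetical, and the assumption is that the ICU classes of the same names behave comparably for simple cases like these.

```java
import java.text.BreakIterator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; JDK stand-ins are used instead of the ICU classes.
public class TextFeaturesSketch {

    /** Compares two strings after converting both to normalization form NFC. */
    public static boolean canonicallyEqual(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    /** Splits text at word boundaries, keeping only segments that contain a letter or digit. */
    public static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            // Whitespace and punctuation segments are skipped.
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String composed = "\u00E9";     // e with acute accent as a single code point
        String decomposed = "e\u0301";  // e followed by a combining acute accent
        System.out.println(composed.equals(decomposed));            // prints "false"
        System.out.println(canonicallyEqual(composed, decomposed)); // prints "true"
        System.out.println(words("Hello world", Locale.ENGLISH));   // prints "[Hello, world]"
    }
}
```

The two encodings of the accented letter differ as raw strings but compare equal once both are normalized, which is why normalization makes sorting and searching easier.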
Java Source File Name | Type | Comment |
AnyTransliterator.java | Class | A transliterator that translates multiple input scripts to a single
output script. |
ArabicShaping.java | Class | Shape Arabic text on a character basis.
ArabicShaping performs basic operations for "shaping" Arabic text. |
ArabicShapingException.java | Class | Thrown by ArabicShaping when there is a shaping error. |
BreakDictionary.java | Class | This is the class that represents the list of known words used by
DictionaryBasedBreakIterator. |
BreakIterator.java | Class | A class that locates boundaries in text. |
BreakIteratorFactory.java | Class | |
BreakTransliterator.java | Class | Inserts the specified characters at word breaks. |
CanonicalIterator.java | Class | This class allows one to iterate through all the strings that are canonically equivalent to a given
string. |
CharsetDetector.java | Class | CharsetDetector provides a facility for detecting the
charset or encoding of character data in an unknown format.
The input data can either be from an input stream or an array of bytes.
The result of the detection operation is a list of possibly matching
charsets, or, for simple use, you can just ask for a Java Reader that
will work over the input data.
Character set detection is at best an imprecise operation. |
CharsetMatch.java | Class | This class represents a charset that has been identified by a CharsetDetector
as a possible encoding for a set of input data. |
CharsetRecognizer.java | Class | Abstract class for recognizing a single charset.
Part of the implementation of ICU's CharsetDetector.
Each specific charset that can be recognized will have an instance
of some subclass of this class. |
CharsetRecog_2022.java | Class | CharsetRecog_2022 is part of the ICU charset detection implementation. |
CharsetRecog_mbcs.java | Class | CharsetRecognizer implementation for Asian (double- or multi-byte) charsets.
Match is determined mostly by the input data adhering to the
encoding scheme for the charset, and, optionally,
frequency of occurrence of characters.
Instances of this class are singletons, one per encoding
being recognized. |
CharsetRecog_sbcs.java | Class | This class recognizes single-byte encodings. |
CharsetRecog_Unicode.java | Class | This class matches UTF-16 and UTF-32, both big- and little-endian. |
CharsetRecog_UTF8.java | Class | |
ChineseDateFormat.java | Class | A concrete
DateFormat for
com.ibm.icu.util.ChineseCalendar .
This class handles a ChineseCalendar -specific field,
ChineseCalendar.IS_LEAP_MONTH . |
ChineseDateFormatSymbols.java | Class | A subclass of
DateFormatSymbols for
ChineseDateFormat . |
CollationElementIterator.java | Class | CollationElementIterator is an iterator created by
a RuleBasedCollator to walk through a string.
|
CollationKey.java | Class | A CollationKey represents a String
under the rules of a specific Collator
object. |
CollationParsedRuleBuilder.java | Class | Class for building a collator from a list of collation rules. |
CollationRuleParser.java | Class | |
Collator.java | Class | Collator performs locale-sensitive string comparison. |
CollatorReader.java | Class | Internal reader class for ICU data file uca.icu containing
Unicode Collation Algorithm data.
This class simply reads uca.icu, verifies that it is a valid
ICU data file, and splits its contents up into blocks of data for use in
com.ibm.icu.text.Collator. |
CollatorServiceShim.java | Class | |
ComposedCharIter.java | Class | ComposedCharIter is an iterator class that returns all
of the precomposed characters defined in the Unicode standard, along
with their decomposed forms. |
CompoundTransliterator.java | Class | A transliterator that is composed of two or more other
transliterator objects linked together. |
CurrencyFormat.java | Class | Temporary internal concrete subclass of MeasureFormat implementing
parsing and formatting of CurrencyAmount objects. |
DateFormat.java | Class | DateFormat is an abstract class for date/time formatting subclasses which
format and parse dates or times in a language-independent manner.
The date/time formatting subclass, such as SimpleDateFormat, allows for
formatting (i.e., date -> text), parsing (text -> date), and
normalization. |
DateFormatSymbols.java | Class | DateFormatSymbols is a public class for encapsulating
localizable date-time formatting data, such as the names of the
months, the names of the days of the week, and the time zone data.
DateFormat and SimpleDateFormat both use
DateFormatSymbols to encapsulate this information.
Typically you shouldn't use DateFormatSymbols directly.
Rather, you are encouraged to create a date-time formatter with the
DateFormat class's factory methods: getTimeInstance ,
getDateInstance , or getDateTimeInstance .
These methods automatically create a DateFormatSymbols for
the formatter so that you don't have to. |
DateTimePatternGenerator.java | Class | This class provides flexible generation of date format patterns, like "yy-MM-dd". |
DecimalFormat.java | Class | DecimalFormat is a concrete subclass of
NumberFormat that formats decimal numbers. |
DecimalFormatSymbols.java | Class | This class represents the set of symbols (such as the decimal separator, the
grouping separator, and so on) needed by DecimalFormat to format
numbers. |
DecompData.java | Class | |
DictionaryBasedBreakIterator.java | Class | A subclass of RuleBasedBreakIterator that adds the ability to use a dictionary
to further subdivide ranges of text beyond what is possible using just the
state-table-based algorithm. |
DigitList.java | Class | DigitList handles the transcoding between numeric values and
strings of characters. |
EscapeTransliterator.java | Class | A transliterator that converts Unicode characters to an escape
form. |
FunctionReplacer.java | Class | A replacer that calls a transliterator to generate its output text.
The input text to the transliterator is the output of another
UnicodeReplacer object. |
IDNA.java | Class | IDNA API implements the IDNA protocol as defined in the IDNA RFC.
The RFC defines two operations: ToASCII and ToUnicode. |
LowercaseTransliterator.java | Class | A transliterator that performs locale-sensitive toLower()
case mapping. |
MeasureFormat.java | Class | A formatter for Measure objects. |
MessageFormat.java | Class | MessageFormat provides a means to produce concatenated
messages in a language-neutral way. |
NameUnicodeTransliterator.java | Class | A transliterator that performs name to character mapping. |
NFRule.java | Class | A class representing a single rule in a RuleBasedNumberFormat. |
NFRuleSet.java | Class | A collection of rules used by a RuleBasedNumberFormat to format and
parse numbers. |
NFSubstitution.java | Class | An abstract class defining protocol for substitutions. |
NormalizationTransliterator.java | Class | |
Normalizer.java | Class | Unicode normalization API.
normalize transforms Unicode text into an equivalent composed or
decomposed form, allowing for easier sorting and searching of text.
normalize supports the standard normalization forms described in
Unicode Standard Annex #15 — Unicode Normalization Forms.
Characters with accents or other adornments can be encoded in
several different ways in Unicode. |
NullTransliterator.java | Class | A transliterator that leaves text unchanged. |
NumberFormat.java | Class | NumberFormat is the abstract base class for all number
formats. |
NumberFormatServiceShim.java | Class | |
Punycode.java | Class | |
Quantifier.java | Class | |
RawCollationKey.java | Class |
Simple class wrapper to store the internal byte representation of a
CollationKey. |
RBBIDataWrapper.java | Class | Internal class used for Rule Based Break Iterators.
This class provides access to the compiled break rule data, as
it is stored in a .brk file. |
RBBINode.java | Class | This class represents a node in the parse tree created by the RBBI Rule compiler. |
RBBIRuleBuilder.java | Class | |
RBBIRuleParseTable.java | Class | Generated Java File. |
RBBIRuleScanner.java | Class | This class is part of the Rule Based Break Iterator rule compiler. |
RBBISetBuilder.java | Class | |
RBBISymbolTable.java | Class | |
RBBITableBuilder.java | Class | |
RBNFChinesePostProcessor.java | Class | A post-processor for Chinese text. |
RBNFPostProcessor.java | Class | Post processor for RBNF output. |
RemoveTransliterator.java | Class | A transliterator that removes characters. |
Replaceable.java | Interface | Replaceable is an interface representing a
string of characters that supports the replacement of a range of
itself with a new string of characters. |
ReplaceableContextIterator.java | Class | Implementation of UCaseProps.ContextIterator, iterates over a Replaceable. |
ReplaceableString.java | Class | ReplaceableString is an adapter class that implements the
Replaceable API around an ordinary StringBuffer .
Note: This class does not support attributes and is not
intended for general use. |
RuleBasedBreakIterator.java | Class | Rule Based Break Iterator.
This is a port of the C++ class RuleBasedBreakIterator from ICU4C. |
RuleBasedCollator.java | Class | RuleBasedCollator is a concrete subclass of Collator. |
RuleBasedNumberFormat.java | Class | A class that formats numbers according to a set of rules. |
RuleBasedTransliterator.java | Class | RuleBasedTransliterator is a transliterator
that reads a set of rules in order to determine how to perform
translations. |
SCSU.java | Interface | An interface defining constants for the Standard Compression Scheme
for Unicode (SCSU) as outlined in Unicode Technical
Report #6. |
SearchIterator.java | Class | SearchIterator is an abstract base class that defines a protocol
for text searching. |
SimpleDateFormat.java | Class | SimpleDateFormat is a concrete class for formatting and
parsing dates in a locale-sensitive manner. |
StringCharacterIterator.java | Class | StringCharacterIterator implements the
CharacterIterator protocol for a String . |
StringMatcher.java | Class | An object that matches a fixed input string, implementing the
UnicodeMatcher API. |
StringPrep.java | Class | StringPrep API implements the StringPrep framework as described by
RFC 3454.
StringPrep prepares Unicode strings for use in network protocols.
Profiles of StringPrep are sets of rules and data according to which
Unicode strings are prepared. |
StringPrepParseException.java | Class | Exception that signals an error has occurred while parsing the
input to StringPrep or IDNA. |
StringReplacer.java | Class | A replacer that produces static text as its output. |
StringSearch.java | Class |
StringSearch is the concrete subclass of
SearchIterator that provides language-sensitive text searching
based on the comparison rules defined in a
RuleBasedCollator object.
StringSearch uses a version of the fast Boyer-Moore search
algorithm that has been adapted to work with the large character set of
Unicode. |
SymbolTable.java | Interface | An interface that defines both lookup protocol and parsing of
symbolic names.
This interface is used by UnicodeSet to resolve $Variable style
references that appear in set patterns. |
TitlecaseTransliterator.java | Class | A transliterator that converts all letters (as defined by
UCharacter.isLetter() ) to lower case, except for those
letters preceded by non-letters. |
TransformTransliterator.java | Class | |
TransliterationRule.java | Class | A transliteration rule used by
RuleBasedTransliterator .
TransliterationRule is an immutable object.
A rule consists of an input pattern and an output string. |
TransliterationRuleSet.java | Class | A set of rules for a RuleBasedTransliterator . |
Transliterator.java | Class | Transliterator is an abstract class that
transliterates text from one format to another. |
TransliteratorIDParser.java | Class | Parsing component for transliterator IDs. |
TransliteratorParser.java | Class | |
TransliteratorRegistry.java | Class | |
UCharacterIterator.java | Class | Abstract class that defines an API for iteration on text objects. This is an
interface for forward and backward iteration and random access into a text
object. |
UFormat.java | Class | An abstract class that extends
java.text.Format to provide
additional ICU protocol, specifically, the getLocale()
API. |
UForwardCharacterIterator.java | Interface | Interface that defines an API for forward-only iteration
on text objects.
This is a minimal interface for iteration without random access
or backwards iteration. |
UnescapeTransliterator.java | Class | A transliterator that converts Unicode escape forms to the
characters they represent. |
UnicodeCompressor.java | Class | A compression engine implementing the Standard Compression Scheme
for Unicode (SCSU) as outlined in Unicode Technical
Report #6.
The SCSU works by using dynamically positioned windows
consisting of 128 consecutive characters in Unicode. |
UnicodeDecompressor.java | Class | A decompression engine implementing the Standard Compression Scheme
for Unicode (SCSU) as outlined in Unicode Technical
Report #6.
USAGE
The static methods on UnicodeDecompressor may be used in a
straightforward manner to decompress simple strings:
byte [] compressed = ... |
UnicodeFilter.java | Class | UnicodeFilter defines a protocol for selecting a
subset of the full range (U+0000 to U+FFFF) of Unicode characters. |
UnicodeMatcher.java | Interface | UnicodeMatcher defines a protocol for objects that can
match a range of characters in a Replaceable string. |
UnicodeNameTransliterator.java | Class | A transliterator that performs character to name mapping. |
UnicodeReplacer.java | Interface | UnicodeReplacer defines a protocol for objects that
replace a range of characters in a Replaceable string with output
text. |
UnicodeSet.java | Class | A mutable set of Unicode characters and multicharacter strings. |
UnicodeSetIterator.java | Class | UnicodeSetIterator iterates over the contents of a UnicodeSet. |
UppercaseTransliterator.java | Class | A transliterator that performs locale-sensitive toUpper()
case mapping. |
UTF16.java | Class | Standalone utility class providing UTF16 character conversions and
indexing conversions.
Code that uses strings alone rarely needs modification.
By design, UTF-16 does not allow overlap, so searching for strings is a safe
operation. |
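The indexing conversions mentioned for UTF16 can be sketched as follows. This example uses only the JDK's own code point methods on String (codePointCount, offsetByCodePoints, codePointAt) rather than the ICU UTF16 class, so it is a stand-in for illustration and not the ICU API. The class SurrogateSketch and its methods are hypothetical names.

```java
// Hypothetical demo class; JDK String methods stand in for the ICU UTF16 utilities.
public class SurrogateSketch {

    /** Counts code points, so a surrogate pair counts once. */
    public static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Returns the substring covering code point indices [from, to). */
    public static String substringByCodePoints(String s, int from, int to) {
        // Convert code point indices to UTF-16 code unit offsets first.
        int start = s.offsetByCodePoints(0, from);
        int end = s.offsetByCodePoints(start, to - from);
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // "a", then U+1D54F (encoded as the surrogate pair D835 DD4F), then "b"
        String s = "a\uD835\uDD4Fb";
        System.out.println(s.length());                               // prints "4" (code units)
        System.out.println(codePointLength(s));                       // prints "3" (code points)
        System.out.println(Integer.toHexString(s.codePointAt(1)));    // prints "1d54f"
        System.out.println(substringByCodePoints(s, 2, 3));           // prints "b"
        // Plain string search still works: lead and trail surrogates occupy
        // disjoint ranges, so "b" cannot match in the middle of a pair.
        System.out.println(s.indexOf("b"));                           // prints "3" (code unit offset)
    }
}
```

Code that only searches or concatenates whole strings is unaffected by surrogates; the conversions matter when code indexes or slices by character position.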