Java Doc for RuleBasedBreakIterator.java in 6.0 JDK Modules » j2me » java.text

java.lang.Object

java.text.BreakIterator

java.text.RuleBasedBreakIterator

All known Subclasses:   java.text.DictionaryBasedBreakIterator

RuleBasedBreakIterator
class RuleBasedBreakIterator extends BreakIterator

A subclass of BreakIterator whose behavior is specified using a list of rules.

There are two kinds of rules, which are separated by semicolons: substitutions and regular expressions.

A substitution rule defines a name that can be used in place of an expression. It consists of a name, which is a string of characters contained in angle brackets, an equals sign, and an expression. (There can be no whitespace on either side of the equals sign.) To keep its syntactic meaning intact, the expression must be enclosed in parentheses or square brackets. A substitution is visible after its definition, and is filled in using simple textual substitution. Substitution definitions can contain other substitutions, as long as those substitutions have been defined first. Substitutions are generally used to make the regular expressions (which can get quite complex) shorter and easier to read. They typically define either character categories or commonly-used subexpressions.

There is one special substitution. If the description defines a substitution called "<ignore>", the expression must be a [] expression, and the expression defines a set of characters (the "ignore characters") that will be transparent to the BreakIterator. A sequence of characters will break the same way it would if any ignore characters it contains are taken out. Break positions never occur before ignore characters.

A regular expression uses a subset of the normal Unix regular-expression syntax, and defines a sequence of characters to be kept together. With one significant exception, the iterator uses a longest-possible-match algorithm when matching text to regular expressions. The iterator also treats descriptions containing multiple regular expressions as if they were ORed together (i.e., as if they were separated by |).

The special characters recognized by the regular-expression parser are as follows:

* Specifies that the expression preceding the asterisk may occur any number of times (including not at all).

{} Encloses a sequence of characters that is optional.

() Encloses a sequence of characters. If followed by *, the sequence repeats. Otherwise, the parentheses are just a grouping device and a way to delimit the ends of expressions containing |.

| Separates two alternative sequences of characters. Either one sequence or the other, but not both, matches this expression. The | character can only occur inside ().

. Matches any character.

*? Specifies a non-greedy asterisk. *? works the same way as *, except when there is overlap between the last group of characters in the expression preceding the * and the first group of characters following the *. When there is this kind of overlap, * will match the longest sequence of characters that match the expression before the *, and *? will match the shortest sequence of characters matching the expression before the *?. For example, if you have "xxyxyyyxyxyxxyxyxyy" in the text, "x[xy]*x" will match through to the last x (i.e., "xxyxyyyxyxyxxyxyx", leaving only the trailing "yy" unmatched), but "x[xy]*?x" will only match the first two xes ("xx").

[] Specifies a group of alternative characters. A [] expression will match any single character that is specified in the [] expression. For more on the syntax of [] expressions, see below.

/ Specifies where the break position should go if text matches this expression. (e.g., "[a-z]*/[:Zs:]*[0-9]" will match if the iterator sees a run of letters, followed by a run of whitespace, followed by a digit, but the break position will actually go before the whitespace). Expressions that don't contain / put the break position at the end of the matching text.

\ Escape character. The \ itself is ignored, but causes the next character to be treated as a literal character. This has no effect for many characters, but for the characters listed above, this deprives them of their special meaning. (There are no special escape sequences for Unicode characters, or tabs and newlines; these are all handled by a higher-level protocol. In a Java string, "\n" will be converted to a literal newline character by the time the regular-expression parser sees it. Of course, this means that \ sequences that are visible to the regexp parser must be written as \\ when inside a Java string.) All characters in the ASCII range except for letters, digits, and control characters are reserved characters to the parser and must be preceded by \ even if they currently don't mean anything.

! If ! appears at the beginning of a regular expression, it tells the regexp parser that this expression specifies the backwards-iteration behavior of the iterator, and not its normal iteration behavior. This is generally only used in situations where the automatically-generated backwards-iteration behavior doesn't produce satisfactory results and must be supplemented with extra client-specified rules.

(all others) All other characters are treated as literal characters, which must match the corresponding character(s) in the text exactly.
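
The difference between * and *? described above can be tried out with an analogy. The sketch below uses java.util.regex, which is not the parser RuleBasedBreakIterator uses; it is chosen only because its greedy and reluctant quantifiers behave the same way on the example string from the *? entry. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogy only: java.util.regex is a different engine from the rule parser
// described in this class. It is used here because its greedy (*) and
// reluctant (*?) quantifiers show the same contrast on this input.
class GreedyVersusReluctant {

    // Returns the first substring of text matched by regex, or null if none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String text = "xxyxyyyxyxyxxyxyxyy";
        // Greedy: runs through to the last x in the text.
        System.out.println(firstMatch("x[xy]*x", text));
        // Reluctant: stops at the first x that can end the match.
        System.out.println(firstMatch("x[xy]*?x", text));
    }
}
```

The greedy pattern consumes everything up to the final x, while the reluctant one stops after the first two characters.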

Within a [] expression, a number of other special characters can be used to specify groups of characters:

- Specifies a range of matching characters. For example "[a-p]" matches all lowercase Latin letters from a to p (inclusive). The - sign specifies ranges of continuous Unicode numeric values, not ranges of characters in a language's alphabetical order: "[a-z]" doesn't include capital letters, nor does it include accented letters such as a-umlaut.

:: A pair of colons containing a one- or two-letter code matches all characters in the corresponding Unicode category. The two-letter codes are the same as the two-letter codes in the Unicode database (for example, "[:Sc::Sm:]" matches all currency symbols and all math symbols). Specifying a one-letter code is the same as specifying all two-letter codes that begin with that letter (for example, "[:L:]" matches all letters, and is equivalent to "[:Lu::Ll::Lo::Lm::Lt:]"). Anything other than a valid two-letter Unicode category code or a single letter that begins a Unicode category code is illegal within colons.

[] [] expressions can nest. This has no effect, except when used in conjunction with the ^ token.

^ Excludes the character (or the characters in the [] expression) following it from the group of characters. For example, "[a-z^p]" matches all Latin lowercase letters except p. "[:L:^[\u4e00-\u9fff]]" matches all letters except the Han ideographs.

(all others) All other characters are treated as literal characters. (For example, "[aeiou]" specifies just the letters a, e, i, o, and u.)
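
To make the [] syntax above more concrete, here is a minimal sketch of what a few of the example sets mean, written as plain predicates on top of java.lang.Character. It does not parse rule descriptions; the class and method names are hypothetical, and each method simply mirrors one of the example expressions.

```java
// Sketch only: these predicates restate the example [] expressions in terms
// of java.lang.Character. They are not part of RuleBasedBreakIterator.
class CategorySets {

    // Mirrors "[:Sc::Sm:]": currency symbols and math symbols.
    static boolean isCurrencyOrMath(char c) {
        int type = Character.getType(c);
        return type == Character.CURRENCY_SYMBOL || type == Character.MATH_SYMBOL;
    }

    // Mirrors "[:L:]", i.e. "[:Lu::Ll::Lo::Lm::Lt:]".
    static boolean isLetterCategory(char c) {
        switch (Character.getType(c)) {
            case Character.UPPERCASE_LETTER:
            case Character.LOWERCASE_LETTER:
            case Character.OTHER_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.TITLECASE_LETTER:
                return true;
            default:
                return false;
        }
    }

    // Mirrors the "all letters except the Han ideographs" example:
    // the ^ token removes the nested range from the letter set.
    static boolean isNonHanLetter(char c) {
        boolean inHanRange = c >= '\u4e00' && c <= '\u9fff';
        return isLetterCategory(c) && !inHanRange;
    }

    // Mirrors "[a-z]": a range of numeric code values, nothing more.
    static boolean inLowercaseAToZ(char c) {
        return c >= 'a' && c <= 'z';
    }

    public static void main(String[] args) {
        System.out.println(isCurrencyOrMath('$'));
        System.out.println(isLetterCategory('\u00e4'));   // a-umlaut
        System.out.println(inLowercaseAToZ('\u00e4'));    // a-umlaut
        System.out.println(isNonHanLetter('\u4e2d'));     // a Han ideograph
    }
}
```

Note that a-umlaut is a letter by category but falls outside the numeric range a to z, which is the distinction the - entry draws.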

For a more complete explanation, see http://www.ibm.com/java/education/boundaries/boundaries.html. For examples, see the resource data (which is annotated).

author:
   Richard Gillam
version:
   $RCSFile$ $Revision: 1.1 $ $Date: 1998/11/05 19:32:04 $

Inner Class: protected class Builder

Field Summary
final protected static byte IGNORE
     A token used as a character-category value to identify ignore characters.

Constructor Summary
public RuleBasedBreakIterator(String description)
     Constructs a RuleBasedBreakIterator according to the description provided.

Method Summary
final protected static void checkOffset(int offset, CharacterIterator text)
     Throw IllegalArgumentException unless begin <= offset < end.
public Object clone()
     Clones this iterator.
public int current()
     Returns the current iteration position.
public boolean equals(Object that)
     Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text.
public int first()
     Sets the current iteration position to the beginning of the text.
public int following(int offset)
     Sets the iterator to refer to the first boundary position following the specified position.
public CharacterIterator getText()
     Return a CharacterIterator over the text being analyzed.
protected int handleNext()
     This method is the actual implementation of the next() method.
protected int handlePrevious()
     This method backs the iterator back up to a "safe position" in the text. This is a position that we know, without any context, must be a break position. The various calling methods then iterate forward from this safe position to the appropriate position to return.
public int hashCode()
     Computes a hash code for this BreakIterator.
public boolean isBoundary(int offset)
     Returns true if the specified position is a boundary position.
public int last()
     Sets the current iteration position to the end of the text.
protected int lookupBackwardState(int state, int category)
     Given a current state and a character category, looks up the next state to transition to in the backwards state table.
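protected int lookupCategory(char c)
     Looks up a character's category (i.e., its category for breaking purposes, not its Unicode category).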
protected int lookupState(int state, int category)
     Given a current state and a character category, looks up the next state to transition to in the state table.
protected Builder makeBuilder()
     Creates a Builder.
public int next(int n)
     Advances the iterator either forward or backward the specified number of steps. Negative values move backward, and positive values move forward.
public int next()
     Advances the iterator to the next boundary position.
public int preceding(int offset)
     Sets the iterator to refer to the last boundary position before the specified position.
public int previous()
     Advances the iterator backwards, to the last boundary preceding this one.
public void setText(CharacterIterator newText)
     Set the iterator to analyze a new piece of text.
public String toString()
     Returns the description used to create this iterator.

Field Detail
IGNORE
final protected static byte IGNORE
A token used as a character-category value to identify ignore characters.

Constructor Detail
RuleBasedBreakIterator
public RuleBasedBreakIterator(String description)
Constructs a RuleBasedBreakIterator according to the description provided. If the description is malformed, throws an IllegalArgumentException. Normally, instead of constructing a RuleBasedBreakIterator directly, you'll use the factory methods on BreakIterator to create one indirectly from a description in the framework's resource files. You'd use this constructor when you want special behavior not provided by the built-in iterators.
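
As a minimal sketch of the factory route mentioned above, the following obtains a word iterator from BreakIterator rather than constructing one from a rule description, and collects every boundary in a string. The class and method names are illustrative, and Locale.US is an arbitrary choice.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: get an iterator from the BreakIterator factory methods instead of
// calling the RuleBasedBreakIterator constructor with a rule description.
class WordBoundaries {

    // Collects every boundary offset in text, from first() until DONE.
    static List<Integer> boundaries(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        List<Integer> result = new ArrayList<>();
        for (int pos = it.first(); pos != BreakIterator.DONE; pos = it.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundaries("Hello world"));
    }
}
```

The loop shape (first(), then next() until DONE) is the usual way to walk all boundaries in order.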

Method Detail
checkOffset
final protected static void checkOffset(int offset, CharacterIterator text)
Throw IllegalArgumentException unless begin <= offset < end.

clone
public Object clone()
Clones this iterator.
Returns:
  A newly-constructed RuleBasedBreakIterator with the same behavior as this one.

current
public int current()
Returns the current iteration position.
Returns:
  The current iteration position.

equals
public boolean equals(Object that)
Returns true if both BreakIterators are of the same class, have the same rules, and iterate over the same text.

first
public int first()
Sets the current iteration position to the beginning of the text (i.e., the CharacterIterator's starting offset).
Returns:
  The offset of the beginning of the text.

following
public int following(int offset)
Sets the iterator to refer to the first boundary position following the specified position.
Returns:
  The position of the first break after the current position.

getText
public CharacterIterator getText()
Return a CharacterIterator over the text being analyzed. This version of this method returns the actual CharacterIterator we're using internally. Changing the state of this iterator can have undefined consequences. If you need to change it, clone it first.
Returns:
  An iterator over the text being analyzed.

handleNext
protected int handleNext()
This method is the actual implementation of the next() method. All iteration is routed through here. This method initializes the state machine to state 1 and advances through the text character by character until we reach the end of the text or the state machine transitions to state 0. We update our return value every time the state machine passes through a possible end state.
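
The loop described above can be sketched with a toy state machine. Everything below is invented for illustration: the three categories, the five states, and the transition table are hypothetical and far simpler than what the real Builder generates. Only the overall shape follows the description: start in state 1, stop on state 0 or at the end of the text, and remember the position each time an end state is passed.

```java
// Toy sketch only: the categories, states, and table are made up for this
// example and are not the tables RuleBasedBreakIterator actually uses.
class ToyBreakStateMachine {
    static final int STOP = 0;   // state 0: no further match is possible
    static final int START = 1;  // state 1: initial state

    // Hypothetical breaking categories (not Unicode categories).
    static final int OTHER = 0, LETTER = 1, DIGIT = 2;

    // TABLE[state][category] is the next state.
    static final int[][] TABLE = {
        {STOP, STOP, STOP}, // 0: stop
        {4,    2,    3},    // 1: start
        {STOP, 2,    STOP}, // 2: inside a run of letters
        {STOP, STOP, 3},    // 3: inside a run of digits
        {STOP, STOP, STOP}, // 4: consumed a single "other" character
    };

    // States in which the text consumed so far is a valid unit.
    static final boolean[] END_STATE = {false, false, true, true, true};

    static int lookupCategory(char c) {
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    static int lookupState(int state, int category) {
        return TABLE[state][category];
    }

    // Returns the next break position at or after start.
    static int handleNext(String text, int start) {
        int state = START;
        int result = start;
        int pos = start;
        while (pos < text.length()) {
            state = lookupState(state, lookupCategory(text.charAt(pos)));
            if (state == STOP) {
                break;
            }
            pos++;
            if (END_STATE[state]) {
                result = pos; // remember the latest possible break
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "ab12 c";
        int pos = 0;
        while (pos < text.length()) {
            pos = handleNext(text, pos);
            System.out.println(pos);
        }
    }
}
```

With this table, runs of letters and runs of digits are kept together and any other character forms a unit of its own.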

handlePrevious
protected int handlePrevious()
This method backs the iterator back up to a "safe position" in the text. This is a position that we know, without any context, must be a break position. The various calling methods then iterate forward from this safe position to the appropriate position to return. (For more information, see the description of buildBackwardsStateTable() in RuleBasedBreakIterator.Builder.)

hashCode
public int hashCode()
Computes a hash code for this BreakIterator.
Returns:
  A hash code.

isBoundary
public boolean isBoundary(int offset)
Returns true if the specified position is a boundary position. As a side effect, leaves the iterator pointing to the first boundary position at or after "offset".
Parameters:
  offset - the offset to check.
Returns:
  True if "offset" is a boundary position.
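
A short sketch of following() and isBoundary() through the public BreakIterator API, using a word iterator from the factory methods. The class name, the sample text, and the choice of Locale.US are assumptions made for the example.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Sketch: querying boundaries around an arbitrary offset with a word iterator.
class BoundaryQueries {

    static BreakIterator wordIterator(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        return it;
    }

    // First boundary strictly after offset.
    static int followingOf(String text, int offset) {
        return wordIterator(text).following(offset);
    }

    // Whether offset itself is a boundary.
    static boolean isBoundaryAt(String text, int offset) {
        return wordIterator(text).isBoundary(offset);
    }

    public static void main(String[] args) {
        String text = "Hello world";
        System.out.println(followingOf(text, 3));   // offset inside "Hello"
        System.out.println(isBoundaryAt(text, 3));
        System.out.println(isBoundaryAt(text, 5));  // offset right after "Hello"
    }
}
```

Each helper builds a fresh iterator so that the side effect of one call on the iteration position cannot influence the next.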

last
public int last()
Sets the current iteration position to the end of the text (i.e., the CharacterIterator's ending offset).
Returns:
  The text's past-the-end offset.

lookupBackwardState
protected int lookupBackwardState(int state, int category)
Given a current state and a character category, looks up the next state to transition to in the backwards state table.

lookupCategory
protected int lookupCategory(char c)
Looks up a character's category (i.e., its category for breaking purposes, not its Unicode category).

lookupState
protected int lookupState(int state, int category)
Given a current state and a character category, looks up the next state to transition to in the state table.

makeBuilder
protected Builder makeBuilder()
Creates a Builder.

next
public int next(int n)
Advances the iterator either forward or backward the specified number of steps. Negative values move backward, and positive values move forward. This is equivalent to repeatedly calling next() or previous().
Parameters:
  n - The number of steps to move. The sign indicates the direction (negative is backwards, and positive is forwards).
Returns:
  The character offset of the boundary position n boundaries away from the current one.

next
public int next()
Advances the iterator to the next boundary position.
Returns:
  The position of the first boundary after this one.

preceding
public int preceding(int offset)
Sets the iterator to refer to the last boundary position before the specified position.
Returns:
  The position of the last boundary before the starting position.

previous
public int previous()
Advances the iterator backwards, to the last boundary preceding this one.
Returns:
  The position of the last boundary position preceding this one.
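
The stepping methods next(int), next(), preceding(), and previous() can be exercised together. This is a minimal sketch through the public BreakIterator API with a word iterator; the class name, the sample text, and Locale.US are assumptions for the example.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Sketch: moving a word iterator forward and backward by boundaries.
class SteppingDemo {

    // Returns, in order: first(), next(2), previous(), preceding(5),
    // last(), and the result of next() once the end has been reached.
    static int[] walk(String text) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(text);
        int start = it.first();
        int twoAhead = it.next(2);      // same as calling next() twice
        int oneBack = it.previous();
        int before = it.preceding(5);   // last boundary before offset 5
        int end = it.last();
        int pastEnd = it.next();        // nothing left: BreakIterator.DONE
        return new int[] {start, twoAhead, oneBack, before, end, pastEnd};
    }

    public static void main(String[] args) {
        for (int value : walk("Hello world")) {
            System.out.println(value);
        }
    }
}
```

Calling next() when the iterator is already at the end returns BreakIterator.DONE rather than throwing, which is what makes the usual loop over boundaries terminate.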

setText
public void setText(CharacterIterator newText)
Set the iterator to analyze a new piece of text. This function resets the current iteration position to the beginning of the text.
Parameters:
  newText - An iterator over the text to analyze.
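
A small sketch of reusing one iterator across texts with setText(CharacterIterator), together with the advice from getText() to clone the returned CharacterIterator before changing it. It goes through the public BreakIterator API; the class and method names and the sample strings are illustrative assumptions.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Locale;

// Sketch: setText() resets the position, and a cloned CharacterIterator can
// be moved freely without disturbing the break iterator.
class ReuseIterator {

    // Moves to the end of one text, then installs a new text and reports
    // where the iterator is positioned afterwards.
    static int positionAfterSetText() {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(new StringCharacterIterator("first text"));
        it.last();
        it.setText(new StringCharacterIterator("second text"));
        return it.current();
    }

    // Advances one boundary, then moves a clone of the underlying
    // CharacterIterator and reports the break iterator's position.
    static int positionAfterMovingClone() {
        BreakIterator it = BreakIterator.getWordInstance(Locale.US);
        it.setText(new StringCharacterIterator("Hello world"));
        it.next();
        CharacterIterator copy = (CharacterIterator) it.getText().clone();
        copy.setIndex(0); // only the copy moves
        return it.current();
    }

    public static void main(String[] args) {
        System.out.println(positionAfterSetText());
        System.out.println(positionAfterMovingClone());
    }
}
```

Working on the clone keeps the iterator's internal text position untouched, which avoids the undefined consequences that getText() warns about.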

toString
public String toString()
Returns the description used to create this iterator.

Fields inherited from java.text.BreakIterator
final public static int DONE

Methods inherited from java.text.BreakIterator
public Object clone()
abstract public int current()
abstract public int first()
abstract public int following(int offset)
public static synchronized Locale[] getAvailableLocales()
public static BreakIterator getCharacterInstance()
public static BreakIterator getCharacterInstance(Locale where)
public static BreakIterator getLineInstance()
public static BreakIterator getLineInstance(Locale where)
public static BreakIterator getSentenceInstance()
public static BreakIterator getSentenceInstance(Locale where)
abstract public CharacterIterator getText()
public static BreakIterator getWordInstance()
public static BreakIterator getWordInstance(Locale where)
public boolean isBoundary(int offset)
abstract public int last()
abstract public int next(int n)
abstract public int next()
public int preceding(int offset)
abstract public int previous()
public void setText(String newText)
abstract public void setText(CharacterIterator newText)

Methods inherited from java.lang.Object
public boolean equals(Object obj)
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
