Source Code Cross Referenced for RuleCharacterIterator.java (6.0 JDK Modules sun » text » sun.text.normalizer)

/*
 * Portions Copyright 2005 Sun Microsystems, Inc.  All Rights Reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.  Sun designates this
 * particular file as subject to the "Classpath" exception as provided
 * by Sun in the LICENSE file that accompanied this code.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Sun Microsystems, Inc., 4150 Network Circle, Santa Clara,
 * CA 95054 USA or visit www.sun.com if you need additional information or
 * have any questions.
 */

/*
 *******************************************************************************
 * (C) Copyright IBM Corp. 1996-2005 - All Rights Reserved                     *
 *                                                                             *
 * The original version of this source code and documentation is copyrighted   *
 * and owned by IBM, These materials are provided under terms of a License     *
 * Agreement between IBM and Sun. This technology is protected by multiple     *
 * US and International patents. This notice and attribution to IBM may not    *
 * to removed.                                                                 *
 *******************************************************************************
 */

/*
 **********************************************************************
 * Author: Alan Liu
 * Created: September 23 2003
 * Since: ICU 2.8
 **********************************************************************
 */

package sun.text.normalizer;

import java.text.ParsePosition;

/**
 * An iterator that returns 32-bit code points.  This class is deliberately
 * <em>not</em> related to any of the JDK or ICU4J character iterator classes
 * in order to minimize complexity.
 * @author Alan Liu
 * @since ICU 2.8
 */
public class RuleCharacterIterator {

    // TODO: Ideas for later.  (Do not implement if not needed, lest the
    // code coverage numbers go down due to unused methods.)
    // 1. Add a copy constructor, equals() method, clone() method.
    // 2. Rather than return DONE, throw an exception if the end
    // is reached -- this is an alternate usage model, probably not useful.
    // 3. Return isEscaped from next().  If this happens,
    // don't keep an isEscaped member variable.

    /**
     * Text being iterated.
     */
    private String text;

    /**
     * Position of iterator.
     */
    private ParsePosition pos;

    /**
     * Symbol table used to parse and dereference variables.  May be null.
     */
    private SymbolTable sym;

    /**
     * Current variable expansion, or null if none.
     */
    private char[] buf;

    /**
     * Position within buf[].  Meaningless if buf == null.
     */
    private int bufPos;

    /**
     * Flag indicating whether the last character was parsed from an escape.
     */
    private boolean isEscaped;

    /**
     * Value returned when there are no more characters to iterate.
     */
    public static final int DONE = -1;

    /**
     * Bitmask option to enable parsing of variable names.  If (options &
     * PARSE_VARIABLES) != 0, then an embedded variable will be expanded to
     * its value.  Variables are parsed using the SymbolTable API.
     */
    public static final int PARSE_VARIABLES = 1;

    /**
     * Bitmask option to enable parsing of escape sequences.  If (options &
     * PARSE_ESCAPES) != 0, then an embedded escape sequence will be expanded
     * to its value.  Escapes are parsed using Utility.unescapeAt().
     */
    public static final int PARSE_ESCAPES = 2;

    /**
     * Bitmask option to enable skipping of whitespace.  If (options &
     * SKIP_WHITESPACE) != 0, then whitespace characters will be silently
     * skipped, as if they were not present in the input.  Whitespace
     * characters are defined by UCharacterProperty.isRuleWhiteSpace().
     */
    public static final int SKIP_WHITESPACE = 4;

    /**
     * Constructs an iterator over the given text, starting at the given
     * position.
     * @param text the text to be iterated
     * @param sym the symbol table, or null if there is none.  If sym is null,
     * then variables will not be dereferenced, even if the PARSE_VARIABLES
     * option is set.
     * @param pos upon input, the index of the next character to return.  If a
     * variable has been dereferenced, then pos will <em>not</em> increment as
     * characters of the variable value are iterated.
     */
    public RuleCharacterIterator(String text, SymbolTable sym,
            ParsePosition pos) {
        if (text == null || pos.getIndex() > text.length()) {
            throw new IllegalArgumentException();
        }
        this.text = text;
        this.sym = sym;
        this.pos = pos;
        buf = null;
    }

    /**
     * Returns true if this iterator has no more characters to return.
     */
    public boolean atEnd() {
        return buf == null && pos.getIndex() == text.length();
    }

    /**
     * Returns the next character using the given options, or DONE if there
     * are no more characters, and advances the position to the next
     * character.
     * @param options one or more of the following options, bitwise-OR-ed
     * together: PARSE_VARIABLES, PARSE_ESCAPES, SKIP_WHITESPACE.
     * @return the current 32-bit code point, or DONE
     */
    public int next(int options) {
        int c = DONE;
        isEscaped = false;

        for (;;) {
            c = _current();
            _advance(UTF16.getCharCount(c));

            if (c == SymbolTable.SYMBOL_REF && buf == null
                    && (options & PARSE_VARIABLES) != 0 && sym != null) {
                String name = sym.parseReference(text, pos, text.length());
                // If name == null there was an isolated SYMBOL_REF;
                // return it.  Caller must be prepared for this.
                if (name == null) {
                    break;
                }
                bufPos = 0;
                buf = sym.lookup(name);
                if (buf == null) {
                    throw new IllegalArgumentException(
                            "Undefined variable: " + name);
                }
                // Handle empty variable value
                if (buf.length == 0) {
                    buf = null;
                }
                continue;
            }

            if ((options & SKIP_WHITESPACE) != 0
                    && UCharacterProperty.isRuleWhiteSpace(c)) {
                continue;
            }

            if (c == '\\' && (options & PARSE_ESCAPES) != 0) {
                int[] offset = new int[] { 0 };
                c = Utility.unescapeAt(lookahead(), offset);
                jumpahead(offset[0]);
                isEscaped = true;
                if (c < 0) {
                    throw new IllegalArgumentException("Invalid escape");
                }
            }

            break;
        }

        return c;
    }

    /**
     * Returns true if the last character returned by next() was
     * escaped.  This will only be the case if the option passed in to
     * next() included PARSE_ESCAPES and the next character was an
     * escape sequence.
     */
    public boolean isEscaped() {
        return isEscaped;
    }

    /**
     * Returns true if this iterator is currently within a variable expansion.
     */
    public boolean inVariable() {
        return buf != null;
    }

    /**
     * Returns an object which, when later passed to setPos(), will
     * restore this iterator's position.  Usage idiom:
     *
     * RuleCharacterIterator iterator = ...;
     * Object pos = iterator.getPos(null); // allocate position object
     * for (;;) {
     *   pos = iterator.getPos(pos); // reuse position object
     *   int c = iterator.next(...);
     *   ...
     * }
     * iterator.setPos(pos);
     *
     * @param p a position object previously returned by getPos(),
     * or null.  If not null, it will be updated and returned.  If
     * null, a new position object will be allocated and returned.
     * @return a position object which may be passed to setPos(),
     * either p or, if p == null, a newly allocated object
     */
    public Object getPos(Object p) {
        if (p == null) {
            return new Object[] { buf,
                    new int[] { pos.getIndex(), bufPos } };
        }
        Object[] a = (Object[]) p;
        a[0] = buf;
        int[] v = (int[]) a[1];
        v[0] = pos.getIndex();
        v[1] = bufPos;
        return p;
    }

    /**
     * Restores this iterator to the position it had when getPos()
     * returned the given object.
     * @param p a position object previously returned by getPos()
     */
    public void setPos(Object p) {
        Object[] a = (Object[]) p;
        buf = (char[]) a[0];
        int[] v = (int[]) a[1];
        pos.setIndex(v[0]);
        bufPos = v[1];
    }

    /**
     * Skips ahead past any ignored characters, as indicated by the given
     * options.  This is useful in conjunction with the lookahead() method.
     *
     * Currently, this only has an effect for SKIP_WHITESPACE.
     * @param options one or more of the following options, bitwise-OR-ed
     * together: PARSE_VARIABLES, PARSE_ESCAPES, SKIP_WHITESPACE.
     */
    public void skipIgnored(int options) {
        if ((options & SKIP_WHITESPACE) != 0) {
            for (;;) {
                int a = _current();
                if (!UCharacterProperty.isRuleWhiteSpace(a))
                    break;
                _advance(UTF16.getCharCount(a));
            }
        }
    }

    /**
     * Returns a string containing the remainder of the characters to be
     * returned by this iterator, without any option processing.  If the
     * iterator is currently within a variable expansion, this will only
     * extend to the end of the variable expansion.  This method is provided
     * so that iterators may interoperate with string-based APIs.  The typical
     * sequence of calls is to call skipIgnored(), then call lookahead(), then
     * parse the string returned by lookahead(), then call jumpahead() to
     * resynchronize the iterator.
     * @return a string containing the characters to be returned by future
     * calls to next()
     */
    public String lookahead() {
        if (buf != null) {
            return new String(buf, bufPos, buf.length - bufPos);
        } else {
            return text.substring(pos.getIndex());
        }
    }

    /**
     * Advances the position by the given number of 16-bit code units.
     * This is useful in conjunction with the lookahead() method.
     * @param count the number of 16-bit code units to jump over
     */
    public void jumpahead(int count) {
        if (count < 0) {
            throw new IllegalArgumentException();
        }
        if (buf != null) {
            bufPos += count;
            if (bufPos > buf.length) {
                throw new IllegalArgumentException();
            }
            if (bufPos == buf.length) {
                buf = null;
            }
        } else {
            int i = pos.getIndex() + count;
            pos.setIndex(i);
            if (i > text.length()) {
                throw new IllegalArgumentException();
            }
        }
    }

    /**
     * Returns the current 32-bit code point without parsing escapes, parsing
     * variables, or skipping whitespace.
     * @return the current 32-bit code point
     */
    private int _current() {
        if (buf != null) {
            return UTF16.charAt(buf, 0, buf.length, bufPos);
        } else {
            int i = pos.getIndex();
            return (i < text.length()) ? UTF16.charAt(text, i) : DONE;
        }
    }

    /**
     * Advances the position by the given amount.
     * @param count the number of 16-bit code units to advance past
     */
    private void _advance(int count) {
        if (buf != null) {
            bufPos += count;
            if (bufPos == buf.length) {
                buf = null;
            }
        } else {
            pos.setIndex(pos.getIndex() + count);
            if (pos.getIndex() > text.length()) {
                pos.setIndex(text.length());
            }
        }
    }
}
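
Usage sketch. The class above depends on internal sun.text.normalizer helpers (SymbolTable, UTF16, Utility, UCharacterProperty), so it cannot be exercised directly from application code. The following is a minimal, self-contained sketch of the same calling pattern: option bitmasks passed to next(), and the skipIgnored() / lookahead() / jumpahead() sequence described in the lookahead() Javadoc. MiniRuleIterator and its collect() helper are hypothetical names invented for this example. As simplifying assumptions, it has no variable expansion, approximates rule whitespace with Character.isWhitespace(), and understands only escapes of the form \uXXXX.

```java
import java.text.ParsePosition;

/**
 * Hypothetical stand-in used for illustration only. It is NOT
 * sun.text.normalizer.RuleCharacterIterator: there is no SymbolTable,
 * whitespace is approximated with Character.isWhitespace(), and the only
 * escape understood is a backslash, the letter u, and four hex digits.
 */
final class MiniRuleIterator {
    static final int DONE = -1;
    static final int PARSE_ESCAPES = 2;    // same values as in the listing
    static final int SKIP_WHITESPACE = 4;

    private final String text;
    private final ParsePosition pos;
    private boolean escaped;

    MiniRuleIterator(String text, ParsePosition pos) {
        if (text == null || pos == null
                || pos.getIndex() < 0 || pos.getIndex() > text.length()) {
            throw new IllegalArgumentException();
        }
        this.text = text;
        this.pos = pos;
    }

    boolean atEnd() { return pos.getIndex() == text.length(); }

    boolean isEscaped() { return escaped; }

    /** Returns the next code point under the given options, or DONE. */
    int next(int options) {
        escaped = false;
        for (;;) {
            int i = pos.getIndex();
            if (i >= text.length()) {
                return DONE;
            }
            int c = text.codePointAt(i);
            pos.setIndex(i + Character.charCount(c));
            if ((options & SKIP_WHITESPACE) != 0 && Character.isWhitespace(c)) {
                continue; // behave as if the whitespace were not in the input
            }
            if (c == '\\' && (options & PARSE_ESCAPES) != 0) {
                c = parseHexEscape();
                escaped = true;
            }
            return c;
        }
    }

    /** Parses the letter u plus four hex digits at the current position. */
    private int parseHexEscape() {
        int j = pos.getIndex();
        if (j + 5 > text.length() || text.charAt(j) != 'u') {
            throw new IllegalArgumentException("Invalid escape");
        }
        int value = 0;
        for (int k = j + 1; k < j + 5; k++) {
            int digit = Character.digit(text.charAt(k), 16);
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid escape");
            }
            value = value * 16 + digit;
        }
        pos.setIndex(j + 5);
        return value;
    }

    /** Skips whitespace if SKIP_WHITESPACE is set; otherwise does nothing. */
    void skipIgnored(int options) {
        if ((options & SKIP_WHITESPACE) == 0) {
            return;
        }
        while (!atEnd()) {
            int c = text.codePointAt(pos.getIndex());
            if (!Character.isWhitespace(c)) {
                break;
            }
            pos.setIndex(pos.getIndex() + Character.charCount(c));
        }
    }

    /** Returns the unread remainder of the text, with no option processing. */
    String lookahead() { return text.substring(pos.getIndex()); }

    /** Advances by the given number of 16-bit code units. */
    void jumpahead(int count) {
        int i = pos.getIndex() + count;
        if (count < 0 || i > text.length()) {
            throw new IllegalArgumentException();
        }
        pos.setIndex(i);
    }

    /** Hypothetical helper: gathers everything next() returns into a String. */
    static String collect(String text, int options) {
        MiniRuleIterator it = new MiniRuleIterator(text, new ParsePosition(0));
        StringBuilder out = new StringBuilder();
        for (int c = it.next(options); c != DONE; c = it.next(options)) {
            out.appendCodePoint(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        int options = SKIP_WHITESPACE | PARSE_ESCAPES;
        System.out.println(collect("a b\\u0041", options)); // prints "abA"

        // The sequence from the lookahead() Javadoc: skip, look, parse, jump.
        MiniRuleIterator it = new MiniRuleIterator("  [abc] x", new ParsePosition(0));
        it.skipIgnored(SKIP_WHITESPACE);
        String rest = it.lookahead();             // "[abc] x"
        it.jumpahead(rest.indexOf(']') + 1);      // resynchronize past "[abc]"
        System.out.println((char) it.next(SKIP_WHITESPACE)); // prints "x"
    }
}
```

As in the listing, the caller's ParsePosition is updated in place, so code that hands the result of lookahead() to a string-based parser has to report how many code units it consumed and pass that count to jumpahead().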
