Perl5Util.java (Jakarta-ORO, package org.apache.oro.text.perl): Java source listing
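
The listing that follows parses expressions of the form [m]/pattern/[i][m][s][x] with the regular expression held in __matchExpression, then maps the option letters onto compiler masks. As a rough illustration of that parsing step only, here is a minimal sketch that uses java.util.regex instead of Jakarta-ORO so that it runs without the library. The class MatchExpressionSketch and its methods parse() and contains() are hypothetical names invented for this example; they are not part of Jakarta-ORO, and the sketch does no caching.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Jakarta-ORO. */
final class MatchExpressionSketch {
    // Same shape as Perl5Util's __matchExpression: an optional 'm', one
    // non-word delimiter, the regex body, the same delimiter again, and
    // finally the option letters.  DOTALL stands in for SINGLELINE_MASK.
    private static final Pattern EXPRESSION =
            Pattern.compile("m?(\\W)(.*)\\1([imsx]*)", Pattern.DOTALL);

    /** Parses "[m]/regex/[imsx]" into a compiled java.util.regex.Pattern. */
    static Pattern parse(String expression) {
        Matcher m = EXPRESSION.matcher(expression);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid expression: " + expression);
        }

        int flags = 0;
        for (char option : m.group(3).toCharArray()) {
            switch (option) {
            case 'i': flags |= Pattern.CASE_INSENSITIVE; break;
            case 'm': flags |= Pattern.MULTILINE; break;
            case 's': flags |= Pattern.DOTALL; break;
            case 'x': flags |= Pattern.COMMENTS; break;
            default:
                throw new IllegalArgumentException("Invalid options: " + m.group(3));
            }
        }
        return Pattern.compile(m.group(2), flags);
    }

    /** Analogous to Perl's =~ m/pattern/: true if input contains a match. */
    static boolean contains(String expression, String input) {
        return parse(expression).matcher(input).find();
    }

    public static void main(String[] args) {
        // Case-insensitive search with the default slash delimiters.
        System.out.println(contains("/about\\.html/i", "HREF=\"ABOUT.HTML\""));
        // A '#' delimiter avoids escaping the slash inside the pattern.
        System.out.println(contains("m#foo/bar#", "a foo/bar b"));
    }
}
```

Both calls in main() print true. Because the option group only admits the letters i, m, s, and x, an expression such as /abc/q is rejected as a whole rather than reaching the default branch of the switch.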

/*
 * $Id: Perl5Util.java,v 1.19 2003/11/07 20:16:25 dfs Exp $
 *
 * ====================================================================
 * The Apache Software License, Version 1.1
 *
 * Copyright (c) 2000 The Apache Software Foundation.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * 3. The end-user documentation included with the redistribution,
 *    if any, must include the following acknowledgment:
 *       "This product includes software developed by the
 *        Apache Software Foundation (http://www.apache.org/)."
 *    Alternately, this acknowledgment may appear in the software itself,
 *    if and wherever such third-party acknowledgments normally appear.
 *
 * 4. The names "Apache" and "Apache Software Foundation", "Jakarta-Oro"
 *    must not be used to endorse or promote products derived from this
 *    software without prior written permission. For written
 *    permission, please contact apache@apache.org.
 *
 * 5. Products derived from this software may not be called "Apache"
 *    or "Jakarta-Oro", nor may "Apache" or "Jakarta-Oro" appear in their
 *    name, without prior written permission of the Apache Software Foundation.
 *
 * THIS SOFTWARE IS PROVIDED ``AS IS'' AND ANY EXPRESSED OR IMPLIED
 * WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED.  IN NO EVENT SHALL THE APACHE SOFTWARE FOUNDATION OR
 * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF
 * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 * ====================================================================
 *
 * This software consists of voluntary contributions made by many
 * individuals on behalf of the Apache Software Foundation.  For more
 * information on the Apache Software Foundation, please see
 * <http://www.apache.org/>.
 */

package org.apache.oro.text.perl;

import java.util.*;

import org.apache.oro.text.*;
import org.apache.oro.text.regex.*;
import org.apache.oro.util.*;

/**
 * This is a utility class implementing the 3 most common Perl5 operations
 * involving regular expressions:
 * <ul>
 * <li> [m]/pattern/[i][m][s][x],
 * <li> s/pattern/replacement/[g][i][m][o][s][x],
 * <li> and split().
 * </ul>
 * As with Perl, any non-alphanumeric character can be used in lieu of
 * the slashes.
 * <p>
 * The objective of the class is to minimize the amount of code a Java
 * programmer using Jakarta-ORO
 * has to write to achieve the same results as Perl by
 * transparently handling regular expression compilation, caching, and
 * matching.  A second objective is to use the same Perl pattern matching
 * syntax to ease the task of Perl programmers transitioning to Java
 * (this also reduces the number of parameters to a method).
 * All the state-affecting methods are synchronized to avoid
 * the maintenance of explicit locks in multithreaded programs.  This
 * philosophy differs from the
 * {@link org.apache.oro.text.regex} package, where
 * you are expected to either maintain explicit locks or, preferably,
 * create separate compiler and matcher instances for each thread.
 * <p>
 * To use this class, first create an instance using the default constructor
 * or initialize the instance with a PatternCache of your choosing using
 * the alternate constructor.  The default cache used by Perl5Util is a
 * PatternCacheLRU of capacity GenericPatternCache.DEFAULT_CAPACITY.  You may
 * want to create a cache with a different capacity, a different
 * cache replacement policy, or even devise your own PatternCache
 * implementation.  The PatternCacheLRU is probably the best general purpose
 * pattern cache, but your specific application may be better served by
 * a different cache replacement policy.  You should remember that you can
 * front-load a cache with all the patterns you will be using before
 * initializing a Perl5Util instance, or you can just let Perl5Util
 * fill the cache as you use it.
 * <p>
 * You might use the class as follows:
 * <pre>
 * Perl5Util util = new Perl5Util();
 * String line;
 * BufferedReader input;
 * PrintStream output;
 *
 * // Initialization of input and output omitted
 * while((line = input.readLine()) != null) {
 *     // First find the line with the string we want to substitute because
 *     // it is cheaper than blindly substituting each line.
 *     if(util.match("/HREF=\"description1\\.html\"/", line)) {
 *         line = util.substitute("s/description1\\.html/about1.html/", line);
 *     }
 *     output.println(line);
 * }
 * </pre>
 * <p>
 * A couple of things to remember when using this class are that the
 * {@link #match match()} methods have the same meaning as
 * {@link org.apache.oro.text.regex.Perl5Matcher#contains
 *  Perl5Matcher.contains()}
 * and <code>=~ m/pattern/</code> in Perl.  The methods are named match
 * to more closely associate them with Perl and to differentiate them
 * from {@link org.apache.oro.text.regex.Perl5Matcher#matches
 * Perl5Matcher.matches()}.
 * A further thing to keep in mind is that the
 * {@link MalformedPerl5PatternException} class is derived from
 * RuntimeException which means you DON'T have to catch it.  The reasoning
 * behind this is that you will detect your regular expression mistakes
 * as you write and debug your program when a MalformedPerl5PatternException
 * is thrown during a test run.  However, we STRONGLY recommend that you
 * ALWAYS catch MalformedPerl5PatternException whenever you deal with a
 * DYNAMICALLY created pattern.  Relying on a fatal
 * MalformedPerl5PatternException being thrown to detect errors while
 * debugging is only useful for dealing with static patterns, that is, actual
 * pregenerated strings present in your program.  Patterns created from user
 * input or some other dynamic method CANNOT be relied upon to be correct
 * and MUST be handled by catching MalformedPerl5PatternException for your
 * programs to be robust.
 * <p>
 * Finally, as a convenience Perl5Util implements
 * the {@link org.apache.oro.text.regex.MatchResult MatchResult} interface.
 * The methods are merely wrappers which call the corresponding method of
 * the last {@link org.apache.oro.text.regex.MatchResult MatchResult}
 * found (which can be accessed with {@link #getMatch()}) by a match or
 * substitution (or even a split, but this isn't particularly useful).
 * At the moment, the
 * {@link org.apache.oro.text.regex.MatchResult MatchResult} returned
 * by {@link #getMatch()} is not stored in a thread-local variable.  Therefore
 * concurrent calls to {@link #getMatch()} will produce unpredictable
 * results.  So if your concurrent program requires the match results,
 * you must protect the matching and the result retrieval in a critical
 * section.  If you do not need match results, you don't need to do anything
 * special.  If you feel the J2SE implementation of {@link #getMatch()}
 * should use a thread-local variable and obviate the need for a critical
 * section, please express your views on the oro-dev mailing list.
 *
 * @version @version@
 * @since 1.0
 * @see MalformedPerl5PatternException
 * @see org.apache.oro.text.PatternCache
 * @see org.apache.oro.text.PatternCacheLRU
 * @see org.apache.oro.text.regex.MatchResult
 */
public final class Perl5Util implements MatchResult {
    /** The regular expression to use to parse match expression. */
    private static final String __matchExpression = "m?(\\W)(.*)\\1([imsx]*)";

    /** The pattern cache to compile and store patterns */
    private PatternCache __patternCache;
    /** The hashtable to cache higher-level expressions */
    private Cache __expressionCache;
    /** The pattern matcher to perform matching operations. */
    private Perl5Matcher __matcher;
    /** The compiled match expression parsing regular expression. */
    private Pattern __matchPattern;
    /** The last match from a successful call to a matching method. */
    private MatchResult __lastMatch;
    /**
     * A container for temporarily holding the results of a split before
     * deleting trailing empty fields.
     */
    private ArrayList __splitList;

    /**
     * Keeps track of the original input (for postMatch() and preMatch())
     * methods.  This will be discarded if the preMatch() and postMatch()
     * methods are moved into the MatchResult interface.
     */
    private Object __originalInput;

    /**
     * Keeps track of the begin and end offsets of the original input for
     * the postMatch() and preMatch() methods.
     */
    private int __inputBeginOffset, __inputEndOffset;

    /** Used for default return value of post and pre Match() */
    private static final String __nullString = "";

    /**
     * A constant passed to the {@link #split split()} methods indicating
     * that all occurrences of a pattern should be used to split a string.
     */
    public static final int SPLIT_ALL = Util.SPLIT_ALL;

    /**
     * A secondary constructor for Perl5Util.  It initializes the Perl5Matcher
     * used by the class to perform matching operations, but requires the
     * programmer to provide a PatternCache instance for the class
     * to use to compile and store regular expressions.  You would want to
     * use this constructor if you want to change the capacity or policy
     * of the cache used.  Example uses might be:
     * <pre>
     * // We know we're going to use close to 50 expressions a whole lot, so
     * // we create a cache of the proper size.
     * util = new Perl5Util(new PatternCacheLRU(50));
     * </pre>
     * or
     * <pre>
     * // We're only going to use a few expressions and know that second-chance
     * // fifo is best suited to the order in which we are using the patterns.
     * util = new Perl5Util(new PatternCacheFIFO2(10));
     * </pre>
     */
    public Perl5Util(PatternCache cache) {
        __splitList = new ArrayList();
        __matcher = new Perl5Matcher();
        __patternCache = cache;
        __expressionCache = new CacheLRU(cache.capacity());
        __compilePatterns();
    }

    /**
     * Default constructor for Perl5Util.  This initializes the Perl5Matcher
     * used by the class to perform matching operations and creates a
     * default PatternCacheLRU instance to use to compile and cache regular
     * expressions.  The size of this cache is
     * GenericPatternCache.DEFAULT_CAPACITY.
     */
    public Perl5Util() {
        this(new PatternCacheLRU());
    }

    /**
     * Compiles the patterns (currently only the match expression) used to
     * parse Perl5 expressions.  Right now it initializes __matchPattern.
     */
    private void __compilePatterns() {
        Perl5Compiler compiler = new Perl5Compiler();

        try {
            __matchPattern = compiler.compile(__matchExpression,
                    Perl5Compiler.SINGLELINE_MASK);
        } catch (MalformedPatternException e) {
            // This should only happen during debugging.
            //e.printStackTrace();
            throw new RuntimeException(e.getMessage());
        }
    }

    /**
     * Parses a match expression and returns a compiled pattern.
     * First checks the expression cache and if the pattern is not found,
     * then parses the expression and fetches a compiled pattern from the
     * pattern cache.  Otherwise, just uses the pattern found in the
     * expression cache.  __matchPattern is used to parse the expression.
     * <p>
     * @param pattern  The Perl5 match expression to parse.
     * @return The compiled pattern corresponding to the expression.
     * @exception MalformedPerl5PatternException If there is an error parsing
     *            the expression.
     */
    private Pattern __parseMatchExpression(String pattern)
            throws MalformedPerl5PatternException {
        int index, compileOptions;
        String options, regex;
        MatchResult result;
        Object obj;
        Pattern ret;

        obj = __expressionCache.getElement(pattern);

        // Must catch ClassCastException because someone might incorrectly
        // pass an s/// expression.  try block is cheaper than checking
        // instanceof
        try {
            if (obj != null)
                return (Pattern) obj;
        } catch (ClassCastException e) {
            // Fall through and parse expression
        }

        if (!__matcher.matches(pattern, __matchPattern))
            throw new MalformedPerl5PatternException(
                    "Invalid expression: " + pattern);

        result = __matcher.getMatch();

        // Group 2 is the regex between the delimiters; group 3 holds the
        // trailing option letters.
        regex = result.group(2);
        compileOptions = Perl5Compiler.DEFAULT_MASK;

        options = result.group(3);

        if (options != null) {
            index = options.length();

            while (index-- > 0) {
                switch (options.charAt(index)) {
                case 'i':
                    compileOptions |= Perl5Compiler.CASE_INSENSITIVE_MASK;
                    break;
                case 'm':
                    compileOptions |= Perl5Compiler.MULTILINE_MASK;
                    break;
                case 's':
                    compileOptions |= Perl5Compiler.SINGLELINE_MASK;
                    break;
                case 'x':
                    compileOptions |= Perl5Compiler.EXTENDED_MASK;
                    break;
                default:
                    throw new MalformedPerl5PatternException(
                            "Invalid options: " + options);
                }
            }
        }

        ret = __patternCache.getPattern(regex, compileOptions);
        __expressionCache.addElement(pattern, ret);

        return ret;
    }

    /**
     * Searches for the first pattern match somewhere in a character array
     * taking a pattern specified in Perl5 native format:
     * <blockquote><pre>
     * [m]/pattern/[i][m][s][x]
     * </pre></blockquote>
     * The <code>m</code> prefix is optional and the meanings of the optional
     * trailing options are:
     * <dl compact>
     * <dt> i <dd> case insensitive match
     * <dt> m <dd> treat the input as consisting of multiple lines
     * <dt> s <dd> treat the input as consisting of a single line
     * <dt> x <dd> enable extended expression syntax incorporating whitespace
     *             and comments
     * </dl>
     * As with Perl, any non-alphanumeric character can be used in lieu of
     * the slashes.
     * <p>
     * If the input contains the pattern, the
     * {@link org.apache.oro.text.regex.MatchResult MatchResult}
     * can be obtained by calling {@link #getMatch()}.
     * However, Perl5Util implements the MatchResult interface as a wrapper
     * around the last MatchResult found, so you can call its methods to
     * access match information.
     * <p>
     * @param pattern  The pattern to search for.
     * @param input    The char[] input to search.
     * @return True if the input contains the pattern, false otherwise.
     * @exception MalformedPerl5PatternException  If there is an error in
     *            the pattern.  You are not forced to catch this exception
     *            because it is derived from RuntimeException.
     */
    public synchronized boolean match(String pattern, char[] input)
            throws MalformedPerl5PatternException {
        boolean result;

        // Parse (or fetch from the cache) the expression once, then search.
        result = __matcher.contains(input,
                __parseMatchExpression(pattern));

        if (result) {
            __lastMatch = __matcher.getMatch();
            __originalInput = input;
            __inputBeginOffset = 0;
            __inputEndOffset = input.length;
        }

        return result;
    }

    /**
     * Searches for the first pattern match in a String taking
     * a pattern specified in Perl5 native format:
     * <blockquote><pre>
     * [m]/pattern/[i][m][s][x]
     * </pre></blockquote>
     * The <code>m</code> prefix is optional and the meanings of the optional
     * trailing options are:
     * <dl compact>
     * <dt> i <dd> case insensitive match
     * <dt> m <dd> treat the input as consisting of multiple lines
     * <dt> s <dd> treat the input as consisting of a single line
     * <dt> x <dd> enable extended expression syntax incorporating whitespace
     *             and comments
     * </dl>
     * As with Perl, any non-alphanumeric character can be used in lieu of
     * the slashes.
     * <p>
     * If the input contains the pattern, the
     * {@link org.apache.oro.text.regex.MatchResult MatchResult}
     * can be obtained by calling {@link #getMatch()}.
     * However, Perl5Util implements the MatchResult interface as a wrapper
     * around the last MatchResult found, so you can call its methods to
     * access match information.
     * <p>
     * @param pattern  The pattern to search for.
     * @param input    The String input to search.
     * @return True if the input contains the pattern, false otherwise.
     * @exception MalformedPerl5PatternException  If there is an error in
     *            the pattern.  You are not forced to catch this exception
     *            because it is derived from RuntimeException.
     */
    public synchronized boolean match(String pattern, String input)
            throws MalformedPerl5PatternException {
        return match(pattern, input.toCharArray());
    }

    /**
     * Searches for the next pattern match somewhere in a
     * org.apache.oro.text.regex.PatternMatcherInput instance, taking
     * a pattern specified in Perl5 native format:
     * <blockquote><pre>
     * [m]/pattern/[i][m][s][x]
     * </pre></blockquote>
     * The <code>m</code> prefix is optional and the meanings of the optional
     * trailing options are:
     * <dl compact>
     * <dt> i <dd> case insensitive match
     * <dt> m <dd> treat the input as consisting of multiple lines
     * <dt> s <dd> treat the input as consisting of a single line
     * <dt> x <dd> enable extended expression syntax incorporating whitespace
     *             and comments
     * </dl>
     * As with Perl, any non-alphanumeric character can be used in lieu of
     * the slashes.
     * <p>
     * If the input contains the pattern, the
     * {@link org.apache.oro.text.regex.MatchResult MatchResult}
     * can be obtained by calling {@link #getMatch()}.
     * However, Perl5Util implements the MatchResult interface as a wrapper
     * around the last MatchResult found, so you can call its methods to
     * access match information.
     * After the call to this method, the PatternMatcherInput current offset
     * is advanced to the end of the match, so you can use it to repeatedly
     * search for expressions in the entire input using a while loop as
     * explained in the {@link org.apache.oro.text.regex.PatternMatcherInput
     * PatternMatcherInput} documentation.
     * <p>
     * @param pattern  The pattern to search for.
     * @param input    The PatternMatcherInput to search.
     * @return True if the input contains the pattern, false otherwise.
     * @exception MalformedPerl5PatternException  If there is an error in
     *            the pattern.  You are not forced to catch this exception
     *            because it is derived from RuntimeException.
     */
    public synchronized boolean match(String pattern,
            PatternMatcherInput input)
            throws MalformedPerl5PatternException {
        boolean result;

        result = __matcher.contains(input,
                __parseMatchExpression(pattern));

        if (result) {
            __lastMatch = __matcher.getMatch();
            __originalInput = input.getInput();
            __inputBeginOffset = input.getBeginOffset();
            __inputEndOffset = input.getEndOffset();
        }

        return result;
    }

    /**
     * Returns the last match found by a call to a match(), substitute(), or
     * split() method.  This method is intended only for retrieving the match
     * found by the last call to a match() method.  Use it when you want to
     * save MatchResult instances.  Otherwise, for
     * simply accessing match information, it is more convenient to use the
     * Perl5Util methods implementing the MatchResult interface.
     * <p>
     * @return The org.apache.oro.text.regex.MatchResult instance containing the
     *         last match found.
     */
    public synchronized MatchResult getMatch() {
        return __lastMatch;
    }

0495:            /**
0496:             * Substitutes a pattern in a given input with a replacement string.
0497:             * The substitution expression is specified in Perl5 native format:
0498:             * <blockquote><pre>
0499:             * s/pattern/replacement/[g][i][m][o][s][x]
0500:             * </pre></blockquote>
0501:             * The <code>s</code> prefix is mandatory and the meaning of the optional
0502:             * trailing options are:
0503:             * <dl compact> 
0504:             * <dt> g <dd> Substitute all occurrences of pattern with replacement.
0505:             *             The default is to replace only the first occurrence.
0506:             * <dt> i <dd> perform a case insensitive match
0507:             * <dt> m <dd> treat the input as consisting of multiple lines
0508:             * <dt> o <dd> If variable interopolation is used, only evaluate the
0509:             *             interpolation once (the first time).  This is equivalent
0510:             *             to using a numInterpolations argument of 1 in
0511:             * {@link org.apache.oro.text.regex.Util#substitute Util.substitute()}.
0512:             *             The default is to compute each interpolation independently.
0513:             *             See
0514:             * {@link org.apache.oro.text.regex.Util#substitute Util.substitute()}
0515:             * and {@link org.apache.oro.text.regex.Perl5Substitution Perl5Substitution}
0516:             *             for more details on variable interpolation in
0517:             *             substitutions.
0518:             * <dt> s <dd> treat the input as consisting of a single line
0519:             * <dt> x <dd> enable extended expression syntax incorporating whitespace
0520:             *             and comments
0521:             * </dl>
0522:             * As with Perl, any non-alphanumeric character can be used in lieu of
0523:             * the slashes.  This is helpful to avoid backslashing.  For example,
0524:             * using slashes you would have to do:
0525:             * <blockquote><pre>
0526:             * numSubs = util.substitute(result, "s/foo\\/bar/goo\\/\\/baz/", input);
0527:             * </pre></blockquote>
0528:             * when you could more easily write:
0529:             * <blockquote><pre>
0530:             * numSubs = util.substitute(result, "s#foo/bar#goo//baz#", input);
0531:             * </pre></blockquote>
0532:             * where the hashmarks are used instead of slashes.
0533:             * <p>
0534:             * There is a special case of backslashing that you need to pay attention
0535:             * to.  As demonstrated above, to denote a delimiter in the substituted
0536:             * string it must be backslashed.  However, this can be a problem
0537:             * when you want to denote a backslash at the end of the substituted
0538:             * string.  As of PerlTools 1.3, a new means of handling this
0539:             * situation has been implemented.
0540:             * In previous versions, the behavior was that
0541:             * <blockquote>
0542:             * "... a double backslash (quadrupled in the Java String) always
0543:             * represents two backslashes unless the second backslash is followed
0544:             * by the delimiter, in which case it represents a single backslash."
0545:             * </blockquote>
0546:             * <p>
0547:             * The new behavior is that a backslash is always a backslash
0548:             * in the substitution portion of the expression unless it is used to
0549:             * escape a delimiter.  A backslash is considered to escape a delimiter
0550:             * if an even number of contiguous backslashes precede the backslash
0551:             * and the delimiter following the backslash is not the FINAL delimiter
0552:             * in the expression.  Therefore, backslashes preceding final delimiters
0553:             * are never considered to escape the delimiter.  The following, which
0554:             * used to be an invalid expression and require a special-case extra
0555:             * backslash, will now replace all instances of / with \:
0556:             * <blockquote><pre>
0557:             * numSubs = util.substitute(result, "s#/#\\#g", input);
0558:             * </pre></blockquote>
0559:             * <p>
0560:             * @param result     The StringBuffer in which to store the result of the
0561:             *                   substitutions. The buffer is only appended to.
0562:             * @param expression The Perl5 substitution regular expression.
0563:             * @param input      The input on which to perform substitutions.
0564:             * @return The number of substitutions made.
0565:             * @exception MalformedPerl5PatternException  If there is an error in
0566:             *            the expression.  You are not forced to catch this exception
0567:             *            because it is derived from RuntimeException.
0568:             * @since 2.0.6
0569:             */
0570:            // Expression parsing will have to be moved into a separate method if
0571:            // there are going to be variations of this method.
0572:            public synchronized int substitute(StringBuffer result,
0573:                    String expression, String input)
0574:                    throws MalformedPerl5PatternException {
0575:                boolean backslash, finalDelimiter;
0576:                int index, compileOptions, numSubstitutions, numInterpolations;
0577:                int firstOffset, secondOffset, thirdOffset, subCount;
0578:                StringBuffer replacement;
0579:                Pattern compiledPattern;
0580:                char exp[], delimiter;
0581:                ParsedSubstitutionEntry entry;
0582:                Perl5Substitution substitution;
0583:                Object obj;
0584:
0585:                obj = __expressionCache.getElement(expression);
0586:
0587:                __nullTest: if (obj != null) {
0588:                    // Must catch ClassCastException because someone might incorrectly 
0589:                    // pass an m// expression.  try block is cheaper than checking
0590:                    // instanceof.  We want to go ahead with parsing just in case so
0591:                    // we break.
0592:                    try {
0593:                        entry = (ParsedSubstitutionEntry) obj;
0594:                    } catch (ClassCastException e) {
0595:                        break __nullTest;
0596:                    }
0597:
0598:                    subCount = Util.substitute(result, __matcher,
0599:                            entry._pattern, entry._substitution, input,
0600:                            entry._numSubstitutions);
0601:
0602:                    __lastMatch = __matcher.getMatch();
0603:
0604:                    return subCount;
0605:                }
0606:
0607:                exp = expression.toCharArray();
0608:
0609:                // Make sure basic conditions for a valid substitution expression hold.
0610:                if (exp.length < 4 || exp[0] != 's'
0611:                        || Character.isLetterOrDigit(exp[1]) || exp[1] == '-')
0612:                    throw new MalformedPerl5PatternException(
0613:                            "Invalid expression: " + expression);
0614:                delimiter = exp[1];
0615:                firstOffset = 2;
0616:                secondOffset = thirdOffset = -1;
0617:                backslash = false;
0618:
0619:                // Parse pattern
0620:                for (index = firstOffset; index < exp.length; index++) {
0621:                    if (exp[index] == '\\')
0622:                        backslash = !backslash;
0623:                    else if (exp[index] == delimiter && !backslash) {
0624:                        secondOffset = index;
0625:                        break;
0626:                    } else if (backslash)
0627:                        backslash = !backslash;
0628:                }
0629:
0630:                if (secondOffset == -1 || secondOffset == exp.length - 1)
0631:                    throw new MalformedPerl5PatternException(
0632:                            "Invalid expression: " + expression);
0633:
0634:                // Parse replacement string
0635:
0636:                backslash = false;
0637:                finalDelimiter = true;
0638:                replacement = new StringBuffer(exp.length - secondOffset);
0639:                for (index = secondOffset + 1; index < exp.length; index++) {
0640:                    if (exp[index] == '\\') {
0641:                        backslash = !backslash;
0642:
0643:                        // 05/05/99 dfs
0644:                        // We unbackslash backslashed delimiters in the replacement string
0645:                        // only if we're on an odd backslash and there is another occurrence
0646:                        // of a delimiter later in the string.
0647:                        if (backslash
0648:                                && index + 1 < exp.length
0649:                                && exp[index + 1] == delimiter
0650:                                && expression.lastIndexOf(delimiter,
0651:                                        exp.length - 1) != (index + 1)) {
0652:                            finalDelimiter = false;
0653:                            continue;
0654:                        }
0655:                    } else if (exp[index] == delimiter && finalDelimiter) {
0656:                        thirdOffset = index;
0657:                        break;
0658:                    } else {
0659:                        backslash = false;
0660:                        finalDelimiter = true;
0661:                    }
0662:
0663:                    replacement.append(exp[index]);
0664:                }
0665:
0666:                if (thirdOffset == -1)
0667:                    throw new MalformedPerl5PatternException(
0668:                            "Invalid expression: " + expression);
0669:
0670:                compileOptions = Perl5Compiler.DEFAULT_MASK;
0671:                numSubstitutions = 1;
0672:
0673:                // Single quotes cause no interpolations to be performed in replacement
0674:                if (delimiter != '\'')
0675:                    numInterpolations = Perl5Substitution.INTERPOLATE_ALL;
0676:                else
0677:                    numInterpolations = Perl5Substitution.INTERPOLATE_NONE;
0678:
0679:                // Parse options
0680:                for (index = thirdOffset + 1; index < exp.length; index++) {
0681:                    switch (exp[index]) {
0682:                    case 'i':
0683:                        compileOptions |= Perl5Compiler.CASE_INSENSITIVE_MASK;
0684:                        break;
0685:                    case 'm':
0686:                        compileOptions |= Perl5Compiler.MULTILINE_MASK;
0687:                        break;
0688:                    case 's':
0689:                        compileOptions |= Perl5Compiler.SINGLELINE_MASK;
0690:                        break;
0691:                    case 'x':
0692:                        compileOptions |= Perl5Compiler.EXTENDED_MASK;
0693:                        break;
0694:                    case 'g':
0695:                        numSubstitutions = Util.SUBSTITUTE_ALL;
0696:                        break;
0697:                    case 'o':
0698:                        numInterpolations = 1;
0699:                        break;
0700:                    default:
0701:                        throw new MalformedPerl5PatternException(
0702:                                "Invalid option: " + exp[index]);
0703:                    }
0704:                }
0705:
0706:                compiledPattern = __patternCache.getPattern(new String(exp,
0707:                        firstOffset, secondOffset - firstOffset),
0708:                        compileOptions);
0709:                substitution = new Perl5Substitution(replacement.toString(),
0710:                        numInterpolations);
0711:                entry = new ParsedSubstitutionEntry(compiledPattern,
0712:                        substitution, numSubstitutions);
0713:                __expressionCache.addElement(expression, entry);
0714:
0715:                subCount = Util.substitute(result, __matcher, compiledPattern,
0716:                        substitution, input, numSubstitutions);
0717:
0718:                __lastMatch = __matcher.getMatch();
0719:
0720:                return subCount;
0721:            }
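
           // Illustrative sketch, not part of Perl5Util.  It restates the delimiter
           // and backslash rules documented for substitute() as a standalone,
           // JDK-only class so they can be tried without the ORO library.  The
           // class name SubstitutionParseSketch and its parse() method are
           // hypothetical, and the option letters are returned unparsed.

```java
final class SubstitutionParseSketch {

    /** Splits an s/pattern/replacement/options expression into its three parts. */
    static String[] parse(String expression) {
        char[] exp = expression.toCharArray();
        if (exp.length < 4 || exp[0] != 's'
                || Character.isLetterOrDigit(exp[1]) || exp[1] == '-') {
            throw new IllegalArgumentException("Invalid expression: " + expression);
        }
        char delimiter = exp[1];
        int secondOffset = -1;
        int thirdOffset = -1;

        // Pattern part: ends at the first delimiter that is not backslashed.
        boolean backslash = false;
        for (int i = 2; i < exp.length; i++) {
            if (exp[i] == '\\') {
                backslash = !backslash;
            } else if (exp[i] == delimiter && !backslash) {
                secondOffset = i;
                break;
            } else {
                backslash = false;
            }
        }
        if (secondOffset == -1 || secondOffset == exp.length - 1) {
            throw new IllegalArgumentException("Invalid expression: " + expression);
        }

        // Replacement part: a backslash escapes a delimiter only when it is an
        // odd backslash and that delimiter is not the last one in the expression.
        int lastDelimiter = expression.lastIndexOf(delimiter);
        StringBuilder replacement = new StringBuilder();
        boolean finalDelimiter = true;
        backslash = false;
        for (int i = secondOffset + 1; i < exp.length; i++) {
            if (exp[i] == '\\') {
                backslash = !backslash;
                if (backslash && i + 1 < exp.length
                        && exp[i + 1] == delimiter && lastDelimiter != i + 1) {
                    finalDelimiter = false;
                    continue; // drop the escaping backslash itself
                }
            } else if (exp[i] == delimiter && finalDelimiter) {
                thirdOffset = i;
                break;
            } else {
                backslash = false;
                finalDelimiter = true;
            }
            replacement.append(exp[i]);
        }
        if (thirdOffset == -1) {
            throw new IllegalArgumentException("Invalid expression: " + expression);
        }

        return new String[] {
            expression.substring(2, secondOffset),   // pattern, still regex-escaped
            replacement.toString(),                  // replacement, delimiters unescaped
            expression.substring(thirdOffset + 1)    // trailing option letters
        };
    }
}
```

           // With this sketch, parse("s#/#\\#g") yields the pattern "/", a
           // replacement consisting of a single backslash, and the options "g".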
0722:
0723:            /**
0724:             * Substitutes a pattern in a given input with a replacement string.
0725:             * The substitution expression is specified in Perl5 native format.
0726:             * <dl compact>
0727:             *   <dt>Calling this method is the same as:</dt>
0728:             *   <dd>
0729:             *     <blockquote><pre>
0730:             *      String result;
0731:             *      StringBuffer buffer = new StringBuffer();
0732:             *      perl.substitute(buffer, expression, input);
0733:             *      result = buffer.toString();
0734:             *     </pre></blockquote>
0735:             *   </dd>
0736:             * </dl>
0737:             * @param expression The Perl5 substitution regular expression.
0738:             * @param input      The input on which to perform substitutions.
0739:             * @return  The input as a String after substitutions have been performed.
0740:             * @exception MalformedPerl5PatternException  If there is an error in
0741:             *            the expression.  You are not forced to catch this exception
0742:             *            because it is derived from RuntimeException.
0743:             * @since 1.0
0744:             * @see #substitute
0745:             */
0746:            public synchronized String substitute(String expression,
0747:                    String input) throws MalformedPerl5PatternException {
0748:                StringBuffer result = new StringBuffer();
0749:                substitute(result, expression, input);
0750:                return result.toString();
0751:            }
0752:
0753:            /**
0754:             * Splits a String into strings that are appended to a List, but no more
0755:             * than a specified limit.  The String is split using a regular expression
0756:             * as the delimiter.  The regular expression is a pattern specified
0757:             * in Perl5 native format:
0758:             * <blockquote><pre>
0759:             * [m]/pattern/[i][m][s][x]
0760:             * </pre></blockquote>
0761:             * The <code>m</code> prefix is optional and the meanings of the
0762:             * optional trailing options are:
0763:             * <dl compact> 
0764:             * <dt> i <dd> case insensitive match
0765:             * <dt> m <dd> treat the input as consisting of multiple lines
0766:             * <dt> s <dd> treat the input as consisting of a single line
0767:             * <dt> x <dd> enable extended expression syntax incorporating whitespace
0768:             *             and comments
0769:             * </dl>
0770:             * As with Perl, any non-alphanumeric character can be used in lieu of
0771:             * the slashes.
0772:             * <p>
0773:             * The limit parameter causes the string to be split on at most the first
0774:             * <b>limit - 1</b> number of pattern occurrences.
0775:             * <p>
0776:             * Of special note is that this split method behaves EXACTLY like
0777:             * the Perl split() function.  In other words, if the split pattern
0778:             * contains parentheses, additional elements are created from
0779:             * each of the matching subgroups in the pattern.  Using an example
0780:             * similar to the one from the Camel book:
0781:             * <blockquote><pre>
0782:             * split(list, "/([,-])/", "8-12,15,18")
0783:             * </pre></blockquote>
0784:             * appends the following elements to the list:
0785:             * <blockquote><pre>
0786:             * { "8", "-", "12", ",", "15", ",", "18" }
0787:             * </pre></blockquote>
0788:             * Furthermore, the following Perl behavior is observed: "leading empty
0789:             * fields are preserved, and empty trailing ones are deleted."  This
0790:             * has the effect that a split on a zero length string returns an empty
0791:             * list.
0792:             * The {@link org.apache.oro.text.regex.Util#split Util.split()} method
0793:             * does NOT implement these behaviors because it is intended to
0794:             * be a general self-consistent and predictable split function usable
0795:             * with Pattern instances other than Perl5Pattern.
0796:             * <p>
0797:             * @param results 
0798:             *    A <code> Collection </code> to which the substrings of the input
0799:             *    that occur between the regular expression delimiter occurrences
0800:             *    are appended. The input will not be split into any more substrings
0801:             *    than the specified 
0802:             *    limit. A way of thinking of this is that only the first
0803:             *    <b>limit - 1</b>
0804:             *    matches of the delimiting regular expression will be used to split the
0805:             *    input.  The Collection must support the
0806:             *    <code>addAll(Collection)</code> operation.
0807:             * @param pattern The regular expression to use as a split delimiter.
0808:             * @param input The String to split.
0809:             * @param limit The limit on the number of substrings the input is split into.
0810:             *   Values &lt;= 0 produce the same behavior as the SPLIT_ALL constant which
0811:             *   causes the limit to be ignored and splits to be performed on all
0812:             *   occurrences of the pattern.  You should use the SPLIT_ALL constant
0813:             *   to achieve this behavior instead of relying on the default behavior
0814:             *   associated with non-positive limit values.
0815:             * @exception MalformedPerl5PatternException  If there is an error in
0816:             *            the expression.  You are not forced to catch this exception
0817:             *            because it is derived from RuntimeException.
0818:             */
0819:            public synchronized void split(Collection results, String pattern,
0820:                    String input, int limit)
0821:                    throws MalformedPerl5PatternException {
0822:                int beginOffset, groups, index;
0823:                String group;
0824:                MatchResult currentResult = null;
0825:                PatternMatcherInput pinput;
0826:                Pattern compiledPattern;
0827:
0828:                compiledPattern = __parseMatchExpression(pattern);
0829:
0830:                pinput = new PatternMatcherInput(input);
0831:                beginOffset = 0;
0832:
0833:                while (--limit != 0
0834:                        && __matcher.contains(pinput, compiledPattern)) {
0835:                    currentResult = __matcher.getMatch();
0836:
0837:                    __splitList.add(input.substring(beginOffset, currentResult
0838:                            .beginOffset(0)));
0839:
0840:                    if ((groups = currentResult.groups()) > 1) {
0841:                        for (index = 1; index < groups; ++index) {
0842:                            group = currentResult.group(index);
0843:                            if (group != null && group.length() > 0)
0844:                                __splitList.add(group);
0845:                        }
0846:                    }
0847:
0848:                    beginOffset = currentResult.endOffset(0);
0849:                }
0850:
0851:                __splitList.add(input.substring(beginOffset, input.length()));
0852:
0853:                // Remove all trailing empty fields.
0854:                for (int i = __splitList.size() - 1; i >= 0; --i) {
0855:                    String str;
0856:
0857:                    str = (String) __splitList.get(i);
0858:                    if (str.length() == 0)
0859:                        __splitList.remove(i);
0860:                    else
0861:                        break;
0862:                }
0863:
0864:                results.addAll(__splitList);
0865:                __splitList.clear();
0866:
0867:                // Just for the sake of completeness
0868:                __lastMatch = currentResult;
0869:            }
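
           // Illustrative sketch, not part of Perl5Util.  It mirrors the loop above
           // using java.util.regex instead of the ORO matcher, to show the three
           // Perl behaviors in one place: captured groups become elements, the
           // limit bounds the number of splits, and trailing empty fields are
           // dropped.  The class name PerlSplitSketch is hypothetical, the regex
           // is passed without the /.../ delimiters, and java.util.regex may treat
           // zero-length matches differently from ORO.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PerlSplitSketch {

    /** Splits input on regex; a limit of 0 or less means "split on every match". */
    static List<String> split(String regex, String input, int limit) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        int begin = 0;

        // Use at most (limit - 1) matches as split points.
        while (--limit != 0 && m.find()) {
            out.add(input.substring(begin, m.start()));

            // Non-empty captured groups are appended as extra elements.
            for (int g = 1; g <= m.groupCount(); g++) {
                String group = m.group(g);
                if (group != null && group.length() > 0) {
                    out.add(group);
                }
            }
            begin = m.end();
        }
        out.add(input.substring(begin));

        // Leading empty fields are kept; trailing empty fields are removed.
        for (int i = out.size() - 1; i >= 0 && out.get(i).isEmpty(); i--) {
            out.remove(i);
        }
        return out;
    }
}
```

           // Under these assumptions, split("([,-])", "8-12,15,18", 0) gives the
           // seven elements listed in the Javadoc above, and splitting an empty
           // string gives an empty list.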
0870:
0871:            /**
0872:             * This method is identical to calling:
0873:             * <blockquote><pre>
0874:             * split(results, pattern, input, SPLIT_ALL);
0875:             * </pre></blockquote>
0876:             */
0877:            public synchronized void split(Collection results, String pattern,
0878:                    String input) throws MalformedPerl5PatternException {
0879:                split(results, pattern, input, SPLIT_ALL);
0880:            }
0881:
0882:            /**
0883:             * Splits input in the default Perl manner, splitting on all whitespace.
0884:             * This method is identical to calling:
0885:             * <blockquote><pre>
0886:             * split(results, "/\\s+/", input);
0887:             * </pre></blockquote>
0888:             */
0889:            public synchronized void split(Collection results, String input)
0890:                    throws MalformedPerl5PatternException {
0891:                split(results, "/\\s+/", input);
0892:            }
0893:
0894:            /**
0895:             * Splits a String into strings contained in a Vector of size no greater
0896:             * than a specified limit.  The String is split using a regular expression
0897:             * as the delimiter.  The regular expression is a pattern specified
0898:             * in Perl5 native format:
0899:             * <blockquote><pre>
0900:             * [m]/pattern/[i][m][s][x]
0901:             * </pre></blockquote>
0902:             * The <code>m</code> prefix is optional and the meanings of the
0903:             * optional trailing options are:
0904:             * <dl compact> 
0905:             * <dt> i <dd> case insensitive match
0906:             * <dt> m <dd> treat the input as consisting of multiple lines
0907:             * <dt> s <dd> treat the input as consisting of a single line
0908:             * <dt> x <dd> enable extended expression syntax incorporating whitespace
0909:             *             and comments
0910:             * </dl>
0911:             * As with Perl, any non-alphanumeric character can be used in lieu of
0912:             * the slashes.
0913:             * <p>
0914:             * The limit parameter causes the string to be split on at most the first
0915:             * <b>limit - 1</b> number of pattern occurrences.
0916:             * <p>
0917:             * Of special note is that this split method behaves EXACTLY like
0918:             * the Perl split() function.  In other words, if the split pattern
0919:             * contains parentheses, additional Vector elements are created from
0920:             * each of the matching subgroups in the pattern.  Using an example
0921:             * similar to the one from the Camel book:
0922:             * <blockquote><pre>
0923:             * split("/([,-])/", "8-12,15,18")
0924:             * </pre></blockquote>
0925:             * produces the Vector containing:
0926:             * <blockquote><pre>
0927:             * { "8", "-", "12", ",", "15", ",", "18" }
0928:             * </pre></blockquote>
0929:             * The {@link org.apache.oro.text.regex.Util#split Util.split()} method
0930:             * does NOT implement this particular behavior because it is intended to
0931:             * be usable with Pattern instances other than Perl5Pattern.
0932:             * <p>
0933:             * @deprecated Use
0934:             * {@link #split(Collection results, String pattern, String input, int limit)}
0935:             *  instead.
0936:             * @param pattern The regular expression to use as a split delimiter.
0937:             * @param input The String to split.
0938:             * @param limit The limit on the size of the returned <code>Vector</code>.
0939:             *   Values &lt;= 0 produce the same behavior as the SPLIT_ALL constant which
0940:             *   causes the limit to be ignored and splits to be performed on all
0941:             *   occurrences of the pattern.  You should use the SPLIT_ALL constant
0942:             *   to achieve this behavior instead of relying on the default behavior
0943:             *   associated with non-positive limit values.
0944:             * @return A <code> Vector </code> containing the substrings of the input
0945:             *    that occur between the regular expression delimiter occurrences. The
0946:             *    input will not be split into any more substrings than the specified 
0947:             *    limit. A way of thinking of this is that only the first
0948:             *    <b>limit - 1</b>
0949:             *    matches of the delimiting regular expression will be used to split the
0950:             *    input. 
0951:             * @exception MalformedPerl5PatternException  If there is an error in
0952:             *            the expression.  You are not forced to catch this exception
0953:             *            because it is derived from RuntimeException.
0954:             */
0955:            public synchronized Vector split(String pattern, String input,
0956:                    int limit) throws MalformedPerl5PatternException {
0957:                Vector results = new Vector(20);
0958:                split(results, pattern, input, limit);
0959:                return results;
0960:            }
0961:
0962:            /**
0963:             * This method is identical to calling:
0964:             * <blockquote><pre>
0965:             * split(pattern, input, SPLIT_ALL);
0966:             * </pre></blockquote>
0967:             * @deprecated Use
0968:             * {@link #split(Collection results, String pattern, String input)} instead.
0969:             */
0970:            public synchronized Vector split(String pattern, String input)
0971:                    throws MalformedPerl5PatternException {
0972:                return split(pattern, input, SPLIT_ALL);
0973:            }
0974:
0975:            /**
0976:             * Splits input in the default Perl manner, splitting on all whitespace.
0977:             * This method is identical to calling:
0978:             * <blockquote><pre>
0979:             * split("/\\s+/", input);
0980:             * </pre></blockquote>
0981:             * @deprecated Use
0982:             * {@link #split(Collection results, String input)} instead.
0983:             */
0984:            public synchronized Vector split(String input)
0985:                    throws MalformedPerl5PatternException {
0986:                return split("/\\s+/", input);
0987:            }
0988:
0989:            //
0990:            // MatchResult interface methods.
0991:            //
0992:
0993:            /**
0994:             * Returns the length of the last match found.
0995:             * <p>
0996:             * @return The length of the last match found.
0997:             */
0998:            public synchronized int length() {
0999:                return __lastMatch.length();
1000:            }
1001:
1002:            /**
1003:             * @return The number of groups contained in the last match found.
1004:             *         This number includes the 0th group.  In other words, the
1005:             *         result refers to the number of parenthesized subgroups plus
1006:             *         the entire match itself.          
1007:             */
1008:            public synchronized int groups() {
1009:                return __lastMatch.groups();
1010:            }
1011:
1012:            /**
1013:             * Returns the contents of the parenthesized subgroups of the last match
1014:             * found according to the behavior dictated by the MatchResult interface.
1015:             * <p>
1016:             * @param group The pattern subgroup to return.
1017:             * @return A string containing the indicated pattern subgroup.  Group
1018:             *         0 always refers to the entire match.  If a group was never
1019:             *         matched, it returns null.  This is not to be confused with
1020:             *         a group matching the null string, which will return a String
1021:             *         of length 0.
1022:             */
1023:            public synchronized String group(int group) {
1024:                return __lastMatch.group(group);
1025:            }
1026:
1027:            /**
1028:             * Returns the begin offset of the subgroup of the last match found 
1029:             * relative to the beginning of the match.
1030:             * <p>
1031:             * @param group The pattern subgroup.
1032:             * @return The offset into group 0 of the first token in the indicated
1033:             *         pattern subgroup.  If a group was never matched or does
1034:             *         not exist, returns -1.  Be aware that a group that matches
1035:             *         the null string at the end of a match will have an offset
1036:             *         equal to the length of the string, so you shouldn't blindly
1037:             *         use the offset to index an array or String.
1038:             */
1039:            public synchronized int begin(int group) {
1040:                return __lastMatch.begin(group);
1041:            }
1042:
1043:            /**
1044:             * Returns the end offset of the subgroup of the last match found 
1045:             * relative to the beginning of the match.
1046:             * <p>
1047:             * @param group The pattern subgroup.
1048:             * @return One plus the offset into group 0 of the last token in
1049:             *         the indicated pattern subgroup.  If a group was never matched
1050:             *         or does not exist, returns -1.  A group matching the null
1051:             *         string will return its start offset.
1052:             */
1053:            public synchronized int end(int group) {
1054:                return __lastMatch.end(group);
1055:            }
1056:
1057:            /**
1058:             * Returns an offset marking the beginning of the last pattern match
1059:             * found relative to the beginning of the input from which the match
1060:             * was extracted.
1061:             * <p>
1062:             * @param group The pattern subgroup.
1063:             * @return The offset of the first token in the indicated
1064:             *         pattern subgroup.  If a group was never matched or does
1065:             *         not exist, returns -1.          
1066:             */
1067:            public synchronized int beginOffset(int group) {
1068:                return __lastMatch.beginOffset(group);
1069:            }
1070:
1071:            /**
1072:             * Returns an offset marking the end of the last pattern match found
1073:             * relative to the beginning of the input from which the match was
1074:             * extracted.
1075:             * <p>
1076:             * @param group The pattern subgroup.
1077:             * @return One plus the offset of the last token in
1078:             *         the indicated pattern subgroup.  If a group was never matched
1079:             *         or does not exist, returns -1.  A group matching the null
1080:             *         string will return its start offset.
1081:             */
1082:            public synchronized int endOffset(int group) {
1083:                return __lastMatch.endOffset(group);
1084:            }
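
           // Illustrative sketch, not part of Perl5Util.  begin()/end() are relative
           // to the start of the match, while beginOffset()/endOffset() are relative
           // to the start of the input.  The hypothetical MatchOffsetSketch class
           // below expresses that relationship with java.util.regex.Matcher, whose
           // start()/end() are input-relative.  Unlike the methods above, Matcher
           // throws IndexOutOfBoundsException for a group that does not exist.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatchOffsetSketch {

    /** Finds the first match or fails; a helper for the examples only. */
    static Matcher firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return m;
    }

    /** Group start relative to the input, or -1 if the group did not match. */
    static int beginOffset(Matcher m, int group) {
        return m.start(group);
    }

    /** One past the group end relative to the input, or -1 if unmatched. */
    static int endOffset(Matcher m, int group) {
        return m.end(group);
    }

    /** Group start relative to the start of the whole match, or -1 if unmatched. */
    static int begin(Matcher m, int group) {
        int start = m.start(group);
        return start < 0 ? -1 : start - m.start(0);
    }

    /** One past the group end relative to the start of the whole match, or -1. */
    static int end(Matcher m, int group) {
        int end = m.end(group);
        return end < 0 ? -1 : end - m.start(0);
    }
}
```

           // For the input "xx ab12 yy" and the regex "([a-z]+)(\\d+)", the match
           // starts at input offset 3, so group 2 has beginOffset 5 but begin 2.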
1085:
1086:            /**
1087:             * Returns the same as group(0).
1088:             * <p>
1089:             * @return A string containing the entire match.
1090:             */
1091:            public synchronized String toString() {
1092:                if (__lastMatch == null)
1093:                    return null;
1094:                return __lastMatch.toString();
1095:            }
1096:
1097:            /**
1098:             * Returns the part of the input preceding the last match found.
1099:             * <p>
1100:             * @return The part of the input preceding the last match found.
1101:             */
1102:            public synchronized String preMatch() {
1103:                int begin;
1104:
1105:                if (__originalInput == null)
1106:                    return __nullString;
1107:
1108:                begin = __lastMatch.beginOffset(0);
1109:
1110:                if (begin <= 0)
1111:                    return __nullString;
1112:
1113:                if (__originalInput instanceof  char[]) {
1114:                    char[] input;
1115:
1116:                    input = (char[]) __originalInput;
1117:
1118:                    // Just in case we make sure begin offset is in bounds.  It should
1119:                    // be but we're paranoid.
1120:                    if (begin > input.length)
1121:                        begin = input.length;
1122:
1123:                    return new String(input, __inputBeginOffset, begin - __inputBeginOffset);
1124:                } else if (__originalInput instanceof  String) {
1125:                    String input;
1126:
1127:                    input = (String) __originalInput;
1128:
1129:                    // Just in case we make sure begin offset is in bounds.  It should
1130:                    // be but we're paranoid.
1131:                    if (begin > input.length())
1132:                        begin = input.length();
1133:
1134:                    return input.substring(__inputBeginOffset, begin);
1135:                }
1136:
1137:                return __nullString;
1138:            }
1139:
    /**
     * Returns the part of the input following the last match found.
     * <p>
     * @return The part of the input following the last match found.
     */
    public synchronized String postMatch() {
        int end;

        if (__originalInput == null)
            return __nullString;

        end = __lastMatch.endOffset(0);

        if (end < 0)
            return __nullString;

        if (__originalInput instanceof char[]) {
            char[] input;

            input = (char[]) __originalInput;
            // Just in case we make sure end offset is in bounds.  It should
            // be but we're paranoid.
            if (end >= input.length)
                return __nullString;

            return new String(input, end, __inputEndOffset - end);
        } else if (__originalInput instanceof String) {
            String input;

            input = (String) __originalInput;

            // Just in case we make sure end offset is in bounds.  It should
            // be but we're paranoid.
            if (end >= input.length())
                return __nullString;

            return input.substring(end, __inputEndOffset);
        }

        return __nullString;
    }

    /**
     * Returns the part of the input preceding the last match found as a
     * char array.  This method eliminates the extra
     * buffer copying caused by preMatch().toCharArray().
     * <p>
     * @return The part of the input preceding the last match found as a char[].
     *         If the result is of zero length, returns null instead of a zero
     *         length array.
     */
    public synchronized char[] preMatchCharArray() {
        int begin;
        char[] result = null;

        if (__originalInput == null)
            return null;

        begin = __lastMatch.beginOffset(0);

        if (begin <= 0)
            return null;

        if (__originalInput instanceof char[]) {
            char[] input;

            input = (char[]) __originalInput;

            // Just in case we make sure begin offset is in bounds.  It should
            // be but we're paranoid.
            if (begin >= input.length)
                begin = input.length;

            result = new char[begin - __inputBeginOffset];
            System.arraycopy(input, __inputBeginOffset, result, 0,
                    result.length);
        } else if (__originalInput instanceof String) {
            String input;

            input = (String) __originalInput;

            // Just in case we make sure begin offset is in bounds.  It should
            // be but we're paranoid.
            if (begin >= input.length())
                begin = input.length();

            result = new char[begin - __inputBeginOffset];
            input.getChars(__inputBeginOffset, begin, result, 0);
        }

        return result;
    }

    /**
     * Returns the part of the input following the last match found as a char
     * array.  This method eliminates the extra buffer copying caused by
     * postMatch().toCharArray().
     * <p>
     * @return The part of the input following the last match found as a char[].
     *         If the result is of zero length, returns null instead of a zero
     *         length array.
     */
    public synchronized char[] postMatchCharArray() {
        int end;
        char[] result = null;

        if (__originalInput == null)
            return null;

        end = __lastMatch.endOffset(0);

        if (end < 0)
            return null;

        if (__originalInput instanceof char[]) {
            int length;
            char[] input;

            input = (char[]) __originalInput;
            // Just in case we make sure end offset is in bounds.  It should
            // be but we're paranoid.
            if (end >= input.length)
                return null;

            length = __inputEndOffset - end;
            result = new char[length];
            System.arraycopy(input, end, result, 0, length);
        } else if (__originalInput instanceof String) {
            String input;

            input = (String) __originalInput;

            // Just in case we make sure end offset is in bounds.  It should
            // be but we're paranoid.
            if (end >= __inputEndOffset)
                return null;

            result = new char[__inputEndOffset - end];
            input.getChars(end, __inputEndOffset, result, 0);
        }

        return result;
    }

}
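
The offset arithmetic behind preMatch() and postMatch() can be sketched in isolation. The following is a minimal illustration, not part of Perl5Util: the class name PrePostMatchSketch and the parameters regionBegin and regionEnd are hypothetical, and java.util.regex is used as a stand-in for the ORO matcher so that the example runs without extra libraries. It assumes match offsets are absolute indices into the underlying buffer, as the substring() and arraycopy() calls in the listing suggest.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrePostMatchSketch {
    // Text between the start of the searched region and the start of the match.
    // String(char[], offset, count) takes a count, hence the subtraction.
    public static String preMatch(char[] input, int regionBegin, int matchBegin) {
        if (matchBegin <= regionBegin)
            return "";
        return new String(input, regionBegin, matchBegin - regionBegin);
    }

    // Text between the end of the match and the end of the searched region.
    public static String postMatch(char[] input, int matchEnd, int regionEnd) {
        if (matchEnd >= regionEnd)
            return "";
        return new String(input, matchEnd, regionEnd - matchEnd);
    }

    public static void main(String[] args) {
        char[] input = "xx key=value; rest".toCharArray();
        int regionBegin = 3;            // skip the leading "xx "
        int regionEnd = input.length;

        // Restrict the search to [regionBegin, regionEnd); start() and end()
        // still report indices relative to the whole input.
        Matcher m = Pattern.compile("=").matcher(new String(input));
        m.region(regionBegin, regionEnd);

        if (m.find()) {
            System.out.println(preMatch(input, regionBegin, m.start()));  // key
            System.out.println(postMatch(input, m.end(), regionEnd));     // value; rest
        }
    }
}
```

Passing the match offset directly as the count would return too many characters whenever the region does not start at index 0, which is why the count is computed as the difference of the two offsets.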