Java Doc for RE.java (jakarta-regexp-1.5, package org.apache.regexp)



java.lang.Object
   org.apache.regexp.RE

RE
public class RE implements Serializable(Code)
RE is an efficient, lightweight regular expression evaluator/matcher class. Regular expressions are pattern descriptions which enable sophisticated matching of strings. In addition to being able to match a string against a pattern, you can also extract parts of the match. This is especially useful in text parsing! Details on the syntax of regular expression patterns are given below.

To compile a regular expression (RE), you can simply construct an RE matcher object from the string specification of the pattern, like this:

 RE r = new RE("a*b");
 

Once you have done this, you can call either of the RE.match methods to perform matching on a String. For example:

 boolean matched = r.match("aaaab");
 
will cause the boolean matched to be set to true because the pattern "a*b" matches the string "aaaab".

If you were interested in the number of a's which matched the first part of our example expression, you could change the expression to "(a*)b". Then when you compiled the expression and matched it against something like "xaaaab", you would get results like this:

 RE r = new RE("(a*)b");                  // Compile expression
 boolean matched = r.match("xaaaab");     // Match against "xaaaab"
 String wholeExpr = r.getParen(0);        // wholeExpr will be 'aaaab'
 String insideParens = r.getParen(1);     // insideParens will be 'aaaa'
 int startWholeExpr = r.getParenStart(0); // startWholeExpr will be index 1
 int endWholeExpr = r.getParenEnd(0);     // endWholeExpr will be index 6
 int lenWholeExpr = r.getParenLength(0);  // lenWholeExpr will be 5
 int startInside = r.getParenStart(1);    // startInside will be index 1
 int endInside = r.getParenEnd(1);        // endInside will be index 5
 int lenInside = r.getParenLength(1);     // lenInside will be 4
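
The index arithmetic above can be cross-checked without the Jakarta Regexp jar. The sketch below is not the RE API: it uses the JDK's java.util.regex as a stand-in, on the assumption that its find() behaves like RE.match for this pattern (searching anywhere in the input, with exclusive end indices). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenIndices {
    /** Returns {start, end, length} of the given group, or null if the pattern is not found. */
    static int[] span(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {            // find() searches anywhere in the input
            return null;
        }
        int start = m.start(group);
        int end = m.end(group);     // the end index is exclusive
        return new int[] { start, end, end - start };
    }

    public static void main(String[] args) {
        // Group 0 is the whole match, group 1 is the parenthesized part.
        System.out.println(Arrays.toString(span("(a*)b", "xaaaab", 0))); // [1, 6, 5]
        System.out.println(Arrays.toString(span("(a*)b", "xaaaab", 1))); // [1, 5, 4]
    }
}
```

The leading "x" is skipped by the search, which is why the whole match starts at index 1 rather than 0.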
 
You can also refer to the contents of a parenthesized expression within a regular expression itself. This is called a 'backreference'. The first backreference in a regular expression is denoted by \1, the second by \2 and so on. So the expression:
 ([0-9]+)=\1
 
will match any string of the form n=n (like 0=0 or 2=2).
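
A minimal sketch of that backreference, again using java.util.regex as a stand-in for RE (the \1 syntax is the same in both; the class and method names are hypothetical). Because the match is a search rather than a whole-string test, an input only needs to contain a substring of the form n=n.

```java
import java.util.regex.Pattern;

final class BackrefDemo {
    // In Java source the backslash of \1 must itself be escaped.
    private static final Pattern SAME_NUMBER = Pattern.compile("([0-9]+)=\\1");

    /** True if the input contains a substring of the form n=n. */
    static boolean containsRepeat(String input) {
        return SAME_NUMBER.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsRepeat("0=0"));   // true
        System.out.println(containsRepeat("12=12")); // true
        System.out.println(containsRepeat("12=13")); // false
        System.out.println(containsRepeat("21=1"));  // true: "1=1" is found inside
    }
}
```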

The full regular expression syntax accepted by RE is described here:

 Characters
 unicodeChar   Matches any identical unicode character
 \                    Used to quote a meta-character (like '*')
 \\                   Matches a single '\' character
 \0nnn                Matches a given octal character
 \xhh                 Matches a given 8-bit hexadecimal character
 \uhhhh               Matches a given 16-bit hexadecimal character
 \t                   Matches an ASCII tab character
 \n                   Matches an ASCII newline character
 \r                   Matches an ASCII return character
 \f                   Matches an ASCII form feed character
 Character Classes
 [abc]                Simple character class
 [a-zA-Z]             Character class with ranges
 [^abc]               Negated character class
 
NOTE: Incomplete ranges will be interpreted as "starts from zero" or "ends with last character".
I.e. [-a] is the same as [\u0000-a], and [a-] is the same as [a-\uFFFF], [-] means "all characters".
 Standard POSIX Character Classes
 [:alnum:]            Alphanumeric characters.
 [:alpha:]            Alphabetic characters.
 [:blank:]            Space and tab characters.
 [:cntrl:]            Control characters.
 [:digit:]            Numeric characters.
 [:graph:]            Characters that are printable and are also visible.
                      (A space is printable, but not visible, while an
                      `a' is both.)
 [:lower:]            Lower-case alphabetic characters.
 [:print:]            Printable characters (characters that are not
                      control characters.)
 [:punct:]            Punctuation characters (characters that are not letters,
                      digits, control characters, or space characters).
 [:space:]            Space characters (such as space, tab, and formfeed,
                      to name a few).
 [:upper:]            Upper-case alphabetic characters.
 [:xdigit:]           Characters that are hexadecimal digits.
 Non-standard POSIX-style Character Classes
 [:javastart:]        Start of a Java identifier
 [:javapart:]         Part of a Java identifier
 Predefined Classes
 .         Matches any character other than newline
 \w        Matches a "word" character (alphanumeric plus "_")
 \W        Matches a non-word character
 \s        Matches a whitespace character
 \S        Matches a non-whitespace character
 \d        Matches a digit character
 \D        Matches a non-digit character
 Boundary Matchers
 ^         Matches only at the beginning of a line
 $         Matches only at the end of a line
 \b        Matches only at a word boundary
 \B        Matches only at a non-word boundary
 Greedy Closures
 A*        Matches A 0 or more times (greedy)
 A+        Matches A 1 or more times (greedy)
 A?        Matches A 1 or 0 times (greedy)
 A{n}      Matches A exactly n times (greedy)
 A{n,}     Matches A at least n times (greedy)
 A{n,m}    Matches A at least n but not more than m times (greedy)
 Reluctant Closures
 A*?       Matches A 0 or more times (reluctant)
 A+?       Matches A 1 or more times (reluctant)
 A??       Matches A 0 or 1 times (reluctant)
 Logical Operators
 AB        Matches A followed by B
 A|B       Matches either A or B
 (A)       Used for subexpression grouping
 (?:A)     Used for subexpression clustering (just like grouping but
           no backrefs)
 Backreferences
 \1    Backreference to 1st parenthesized subexpression
 \2    Backreference to 2nd parenthesized subexpression
 \3    Backreference to 3rd parenthesized subexpression
 \4    Backreference to 4th parenthesized subexpression
 \5    Backreference to 5th parenthesized subexpression
 \6    Backreference to 6th parenthesized subexpression
 \7    Backreference to 7th parenthesized subexpression
 \8    Backreference to 8th parenthesized subexpression
 \9    Backreference to 9th parenthesized subexpression
 

All closure operators (+, *, ?, {m,n}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. If you want a closure to be reluctant (non-greedy), you can simply follow it with a '?'. A reluctant closure will match as few elements of the string as possible when finding matches. {m,n} closures don't currently support reluctancy.
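
The difference is easiest to see on one input. This is a sketch using java.util.regex as a stand-in for RE, on the assumption that both treat * and *? the same way here; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyVsReluctant {
    /** Returns the first substring matched by regex, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: .* takes as much as it can while still letting "b" match.
        System.out.println(firstMatch("a.*b", "aXbXb"));  // aXbXb
        // Reluctant: .*? takes as little as it can.
        System.out.println(firstMatch("a.*?b", "aXbXb")); // aXb
    }
}
```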

Line terminators
A line terminator is a one- or two-character sequence that marks the end of a line of the input character sequence. The following are recognized as line terminators:

  • A newline (line feed) character ('\n'),
  • A carriage-return character followed immediately by a newline character ("\r\n"),
  • A standalone carriage-return character ('\r'),
  • A next-line character ('\u0085'),
  • A line-separator character ('\u2028'), or
  • A paragraph-separator character ('\u2029').
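
The list above can be read as a small classification routine. The sketch below is plain Java with no regular expression library, and it only illustrates the definition; it is not how RE itself scans input. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class LineTerminators {
    /** Length of the line terminator starting at index i, or 0 if there is none. */
    static int terminatorLength(CharSequence s, int i) {
        char c = s.charAt(i);
        if (c == '\r') {
            // A carriage return followed by a newline counts as one two-character terminator.
            return (i + 1 < s.length() && s.charAt(i + 1) == '\n') ? 2 : 1;
        }
        // Newline, next-line (U+0085), line separator (U+2028), paragraph separator (U+2029).
        if (c == '\n' || c == '\u0085' || c == '\u2028' || c == '\u2029') {
            return 1;
        }
        return 0;
    }

    /** Splits s into lines at every terminator from the list above. */
    static List<String> splitLines(String s) {
        List<String> lines = new ArrayList<>();
        int lineStart = 0;
        int i = 0;
        while (i < s.length()) {
            int len = terminatorLength(s, i);
            if (len == 0) {
                i++;
                continue;
            }
            lines.add(s.substring(lineStart, i));
            i += len;
            lineStart = i;
        }
        lines.add(s.substring(lineStart)); // text after the last terminator (may be empty)
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(splitLines("a\r\nb\rc\u2028d")); // [a, b, c, d]
    }
}
```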

RE runs programs compiled by the RECompiler class. But the RE matcher class does not include the actual regular expression compiler for reasons of efficiency. In fact, if you want to pre-compile one or more regular expressions, the 'recompile' class can be invoked from the command line to produce compiled output like this:

 // Pre-compiled regular expression "a*b"
 char[] re1Instructions =
 {
 0x007c, 0x0000, 0x001a, 0x007c, 0x0000, 0x000d, 0x0041,
 0x0001, 0x0004, 0x0061, 0x007c, 0x0000, 0x0003, 0x0047,
 0x0000, 0xfff6, 0x007c, 0x0000, 0x0003, 0x004e, 0x0000,
 0x0003, 0x0041, 0x0001, 0x0004, 0x0062, 0x0045, 0x0000,
 0x0000,
 };
 REProgram re1 = new REProgram(re1Instructions);
 
You can then construct a regular expression matcher (RE) object from the pre-compiled expression re1 and thus avoid the overhead of compiling the expression at runtime. If you require more dynamic regular expressions, you can construct a single RECompiler object and re-use it to compile each expression. Similarly, you can change the program run by a given matcher object at any time. However, RE and RECompiler are not thread-safe (for efficiency reasons, and because requiring thread safety in this class is deemed to be a rare requirement), so you will need to construct a separate compiler or matcher object for each thread (unless you do thread synchronization yourself). Once an expression has been compiled into an REProgram object, that REProgram can be safely shared across multiple threads and RE objects.
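
The sharing rule (one immutable compiled program, one mutable matcher per thread) can be sketched as follows. This is an analogy rather than RE code: java.util.regex.Pattern stands in for REProgram and Matcher stands in for RE, and the ThreadLocal arrangement is only one possible way to give each thread its own matcher. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PerThreadMatcher {
    // Compiled once and shared: immutable, playing the role of REProgram.
    private static final Pattern SHARED = Pattern.compile("a*b");

    // One mutable matcher per thread, playing the role of a per-thread RE object.
    private static final ThreadLocal<Matcher> LOCAL =
            ThreadLocal.withInitial(() -> SHARED.matcher(""));

    /** True if the input contains a match; safe to call from any thread. */
    static boolean contains(String input) {
        return LOCAL.get().reset(input).find();
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + ": " + contains("xaaaab"));
        Thread t1 = new Thread(task, "worker-1");
        Thread t2 = new Thread(task, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```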


ISSUES:

  • com.weusours.util.re is not currently compatible with all standard POSIX regcomp flags
  • com.weusours.util.re does not support POSIX equivalence classes ([=foo=] syntax) (I18N/locale issue)
  • com.weusours.util.re does not support nested POSIX character classes (definitely should, but not completely trivial)
  • com.weusours.util.re does not support POSIX character collation concepts ([.foo.] syntax) (I18N/locale issue)
  • Should there be different matching styles (simple, POSIX, Perl etc?)
  • Should RE support character iterators (for backwards RE matching!)?
  • Should RE support reluctant {m,n} closures (does anyone care)?
  • Not *all* possibilities are considered for greediness when backreferences are involved (as POSIX suggests should be the case). The POSIX RE "(ac*)c*d[ac]*\1", when matched against "acdacaa" should yield a match of acdacaa where \1 is "a". This is not the case in this RE package, and actually Perl doesn't go to this extent either! Until someone actually complains about this, I'm not sure it's worth "fixing". If it ever is fixed, test #137 in RETest.txt should be updated.

See Also:   recompile
See Also:   RECompiler
author:
   Jonathan Locke
author:
   Tobias Schäfer
version:
   $Id: RE.java 518156 2007-03-14 14:31:26Z vgritsenko $


Field Summary
final static char E_ALNUM
final static char E_BOUND
final static char E_DIGIT
final static char E_NALNUM
final static char E_NBOUND
final static char E_NDIGIT
final static char E_NSPACE
final static char E_SPACE
final public static int MATCH_CASEINDEPENDENT
final public static int MATCH_MULTILINE
final public static int MATCH_NORMAL
     Specifies normal, case-sensitive matching behaviour.
final public static int MATCH_SINGLELINE
     Consider all input a single body of text - newlines are matched by .
final static int MAX_PAREN
final static char OP_ANY
final static char OP_ANYOF
final static char OP_ATOM
final static char OP_BACKREF
final static char OP_BOL
final static char OP_BRANCH
final static char OP_CLOSE
final static char OP_CLOSE_CLUSTER
final static char OP_CONTINUE
final static char OP_END
final static char OP_EOL
final static char OP_ESCAPE
final static char OP_GOTO
final static char OP_MAYBE
final static char OP_NOTHING
final static char OP_OPEN
final static char OP_OPEN_CLUSTER
final static char OP_PLUS
final static char OP_POSIXCLASS
final static char OP_RELUCTANTMAYBE
final static char OP_RELUCTANTPLUS
final static char OP_RELUCTANTSTAR
final static char OP_STAR
final static char POSIX_CLASS_ALNUM
final static char POSIX_CLASS_ALPHA
final static char POSIX_CLASS_BLANK
final static char POSIX_CLASS_CNTRL
final static char POSIX_CLASS_DIGIT
final static char POSIX_CLASS_GRAPH
final static char POSIX_CLASS_JPART
final static char POSIX_CLASS_JSTART
final static char POSIX_CLASS_LOWER
final static char POSIX_CLASS_PRINT
final static char POSIX_CLASS_PUNCT
final static char POSIX_CLASS_SPACE
final static char POSIX_CLASS_UPPER
final static char POSIX_CLASS_XDIGIT
final public static int REPLACE_ALL
     Flag bit that indicates that subst should replace all occurrences of this regular expression.
final public static int REPLACE_BACKREFERENCES
final public static int REPLACE_FIRSTONLY
     Flag bit that indicates that subst should only replace the first occurrence of this regular expression.
transient int end0
transient int end1
transient int end2
transient int[] endBackref
transient int[] endn
int matchFlags
final static int maxNode
int maxParen
final static int nodeSize
final static int offsetNext
final static int offsetOpcode
final static int offsetOpdata
transient int parenCount
REProgram program
transient CharacterIterator search
transient int start0
transient int start1
transient int start2
transient int[] startBackref
transient int[] startn

Constructor Summary
public  RE(String pattern)
     Constructs a regular expression matcher from a String by compiling it using a new instance of RECompiler.
public  RE(String pattern, int matchFlags)
     Constructs a regular expression matcher from a String by compiling it using a new instance of RECompiler.
public  RE(REProgram program, int matchFlags)
     Construct a matcher for a pre-compiled regular expression from program (bytecode) data.
public  RE(REProgram program)
     Construct a matcher for a pre-compiled regular expression from program (bytecode) data.
public  RE()
     Constructs a regular expression matcher with no initial program.

Method Summary
public int getMatchFlags()
     Returns the current match behaviour flags.
public String getParen(int which)
     Gets the contents of a parenthesized subexpression after a successful match.
public int getParenCount()
     Returns the number of parenthesized subexpressions available after a successful match.
final public int getParenEnd(int which)
     Returns the end index of a given paren level.
final public int getParenLength(int which)
     Returns the length of a given paren level.
final public int getParenStart(int which)
     Returns the start index of a given paren level.
public REProgram getProgram()
     Returns the current regular expression program in use by this matcher object.
public String[] grep(Object[] search)
     Returns an array of Strings, whose toString representation matches a regular expression.
protected void internalError(String s)
     Throws an Error representing an internal error condition probably resulting from a bug in the regular expression compiler (or possibly data corruption).
public boolean match(String search, int i)
     Matches the current regular expression program against a character array, starting at a given index.
public boolean match(CharacterIterator search, int i)
     Matches the current regular expression program against a character array, starting at a given index.
public boolean match(String search)
     Matches the current regular expression program against a String.
protected boolean matchAt(int i)
     Match the current regular expression program against the current input string, starting at index i of the input string.
protected int matchNodes(int firstNode, int lastNode, int idxStart)
     Try to match a string against a subset of nodes in the program
Parameters:
  firstNode - Node to start at in program
Parameters:
  lastNode - Last valid node (used for matching a subexpression without matching the rest of the program as well).
Parameters:
  idxStart - Starting position in character array
Returns:
  Final input array index if match succeeded.
public void setMatchFlags(int matchFlags)
     Sets match behaviour flags which alter the way RE does matching.
final protected void setParenEnd(int which, int i)
final protected void setParenStart(int which, int i)
public void setProgram(REProgram program)
     Sets the current regular expression program used by this matcher object.
public static String simplePatternToFullRegularExpression(String pattern)
public String[] split(String s)
     Splits a string into an array of strings on regular expression boundaries. This function works the same way as the Perl function of the same name. Given a regular expression of "[ab]+" and a string to split of "xyzzyababbayyzabbbab123", the result would be the array of Strings "[xyzzy, yyz, 123]".

Please note that the first string in the resulting array may be an empty string.

public String subst(String substituteIn, String substitution)
     Substitutes a string for this regular expression in another string. This method works like the Perl function of the same name. Given a regular expression of "a*b", a String to substituteIn of "aaaabfooaaabgarplyaaabwackyb" and the substitution String "-", the resulting String returned by subst would be "-foo-garply-wacky-".
Parameters:
  substituteIn - String to substitute within
Parameters:
  substitution - String to substitute for all matches of this regular expression.
public String subst(String substituteIn, String substitution, int flags)
     Substitutes a string for this regular expression in another string. This method works like the Perl function of the same name. Given a regular expression of "a*b", a String to substituteIn of "aaaabfooaaabgarplyaaabwackyb" and the substitution String "-", the resulting String returned by subst would be "-foo-garply-wacky-".

It is also possible to reference the contents of a parenthesized expression with $0, $1, ...


Field Detail
E_ALNUM
final static char E_ALNUM(Code)



E_BOUND
final static char E_BOUND(Code)



E_DIGIT
final static char E_DIGIT(Code)



E_NALNUM
final static char E_NALNUM(Code)



E_NBOUND
final static char E_NBOUND(Code)



E_NDIGIT
final static char E_NDIGIT(Code)



E_NSPACE
final static char E_NSPACE(Code)



E_SPACE
final static char E_SPACE(Code)



MATCH_CASEINDEPENDENT
final public static int MATCH_CASEINDEPENDENT(Code)
Flag to indicate that matching should be case-independent (folded)



MATCH_MULTILINE
final public static int MATCH_MULTILINE(Code)
Newlines should match as BOL/EOL (^ and $)



MATCH_NORMAL
final public static int MATCH_NORMAL(Code)
Specifies normal, case-sensitive matching behaviour.



MATCH_SINGLELINE
final public static int MATCH_SINGLELINE(Code)
Consider all input a single body of text - newlines are matched by .



MAX_PAREN
final static int MAX_PAREN(Code)



OP_ANY
final static char OP_ANY(Code)



OP_ANYOF
final static char OP_ANYOF(Code)



OP_ATOM
final static char OP_ATOM(Code)



OP_BACKREF
final static char OP_BACKREF(Code)



OP_BOL
final static char OP_BOL(Code)



OP_BRANCH
final static char OP_BRANCH(Code)



OP_CLOSE
final static char OP_CLOSE(Code)



OP_CLOSE_CLUSTER
final static char OP_CLOSE_CLUSTER(Code)



OP_CONTINUE
final static char OP_CONTINUE(Code)



OP_END
final static char OP_END(Code)
The format of a node in a program is:
  [ OPCODE ] [ OPDATA ] [ OPNEXT ] [ OPERAND ]
  char OPCODE - instruction
  char OPDATA - modifying data
  char OPNEXT - next node (relative offset)
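
As a hedged illustration of that layout, the sketch below walks the pre-compiled "a*b" instruction array listed earlier on this page. It rests on assumptions that this page does not state: that each node header is three chars in the order given, that OPNEXT is read as a signed 16-bit offset relative to the node's own index (so 0xfff6 means -10), and that only nodes with opcode 'A' carry operand chars, with the count held in OPDATA. The opcode values themselves are not listed here, so the dump prints the raw char. The class name NodeDump is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NodeDump {
    // Copied from the pre-compiled "a*b" listing earlier on this page.
    static final char[] PROGRAM = {
        0x007c, 0x0000, 0x001a, 0x007c, 0x0000, 0x000d, 0x0041,
        0x0001, 0x0004, 0x0061, 0x007c, 0x0000, 0x0003, 0x0047,
        0x0000, 0xfff6, 0x007c, 0x0000, 0x0003, 0x004e, 0x0000,
        0x0003, 0x0041, 0x0001, 0x0004, 0x0062, 0x0045, 0x0000,
        0x0000,
    };

    // Assumed header layout: opcode, opdata, next, then any operand chars.
    static final int OFFSET_OPCODE = 0;
    static final int OFFSET_OPDATA = 1;
    static final int OFFSET_NEXT = 2;
    static final int NODE_SIZE = 3;

    /** Returns one descriptive line per node in the program. */
    static List<String> dump(char[] program) {
        List<String> out = new ArrayList<>();
        int node = 0;
        while (node < program.length) {
            char opcode = program[node + OFFSET_OPCODE];
            int opdata = program[node + OFFSET_OPDATA];
            // Read the relative offset as signed so that backward jumps work.
            int relative = (short) program[node + OFFSET_NEXT];
            // Assumption: only 'A' nodes are followed by operand chars.
            int operandLength = (opcode == 'A') ? opdata : 0;
            String line = node + ": " + opcode + " data=" + opdata
                    + " next=" + (node + relative);
            if (operandLength > 0) {
                line += " operand=" + new String(program, node + NODE_SIZE, operandLength);
            }
            out.add(line);
            node += NODE_SIZE + operandLength;
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : dump(PROGRAM)) {
            System.out.println(line);
        }
    }
}
```

Under these assumptions the array decodes into nine nodes, and the node at index 13 points back to index 3, which is consistent with a loop implementing the a* closure.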



OP_EOL
final static char OP_EOL(Code)



OP_ESCAPE
final static char OP_ESCAPE(Code)



OP_GOTO
final static char OP_GOTO(Code)



OP_MAYBE
final static char OP_MAYBE(Code)



OP_NOTHING
final static char OP_NOTHING(Code)



OP_OPEN
final static char OP_OPEN(Code)



OP_OPEN_CLUSTER
final static char OP_OPEN_CLUSTER(Code)



OP_PLUS
final static char OP_PLUS(Code)



OP_POSIXCLASS
final static char OP_POSIXCLASS(Code)



OP_RELUCTANTMAYBE
final static char OP_RELUCTANTMAYBE(Code)



OP_RELUCTANTPLUS
final static char OP_RELUCTANTPLUS(Code)



OP_RELUCTANTSTAR
final static char OP_RELUCTANTSTAR(Code)



OP_STAR
final static char OP_STAR(Code)



POSIX_CLASS_ALNUM
final static char POSIX_CLASS_ALNUM(Code)



POSIX_CLASS_ALPHA
final static char POSIX_CLASS_ALPHA(Code)



POSIX_CLASS_BLANK
final static char POSIX_CLASS_BLANK(Code)



POSIX_CLASS_CNTRL
final static char POSIX_CLASS_CNTRL(Code)



POSIX_CLASS_DIGIT
final static char POSIX_CLASS_DIGIT(Code)



POSIX_CLASS_GRAPH
final static char POSIX_CLASS_GRAPH(Code)



POSIX_CLASS_JPART
final static char POSIX_CLASS_JPART(Code)



POSIX_CLASS_JSTART
final static char POSIX_CLASS_JSTART(Code)



POSIX_CLASS_LOWER
final static char POSIX_CLASS_LOWER(Code)



POSIX_CLASS_PRINT
final static char POSIX_CLASS_PRINT(Code)



POSIX_CLASS_PUNCT
final static char POSIX_CLASS_PUNCT(Code)



POSIX_CLASS_SPACE
final static char POSIX_CLASS_SPACE(Code)



POSIX_CLASS_UPPER
final static char POSIX_CLASS_UPPER(Code)



POSIX_CLASS_XDIGIT
final static char POSIX_CLASS_XDIGIT(Code)



REPLACE_ALL
final public static int REPLACE_ALL(Code)
Flag bit that indicates that subst should replace all occurrences of this regular expression.



REPLACE_BACKREFERENCES
final public static int REPLACE_BACKREFERENCES(Code)
Flag bit that indicates that subst should replace backreferences



REPLACE_FIRSTONLY
final public static int REPLACE_FIRSTONLY(Code)
Flag bit that indicates that subst should only replace the first occurrence of this regular expression.



end0
transient int end0(Code)



end1
transient int end1(Code)



end2
transient int end2(Code)



endBackref
transient int[] endBackref(Code)



endn
transient int[] endn(Code)



matchFlags
int matchFlags(Code)



maxNode
final static int maxNode(Code)



maxParen
int maxParen(Code)



nodeSize
final static int nodeSize(Code)



offsetNext
final static int offsetNext(Code)



offsetOpcode
final static int offsetOpcode(Code)



offsetOpdata
final static int offsetOpdata(Code)



parenCount
transient int parenCount(Code)



program
REProgram program(Code)



search
transient CharacterIterator search(Code)



start0
transient int start0(Code)



start1
transient int start1(Code)



start2
transient int start2(Code)



startBackref
transient int[] startBackref(Code)



startn
transient int[] startn(Code)




Constructor Detail
RE
public RE(String pattern) throws RESyntaxException(Code)
Constructs a regular expression matcher from a String by compiling it using a new instance of RECompiler. If you will be compiling many expressions, you may prefer to use a single RECompiler object instead.
Parameters:
  pattern - The regular expression pattern to compile.
exception:
  RESyntaxException - Thrown if the regular expression has invalid syntax.
See Also:   RECompiler
See Also:   recompile



RE
public RE(String pattern, int matchFlags) throws RESyntaxException(Code)
Constructs a regular expression matcher from a String by compiling it using a new instance of RECompiler. If you will be compiling many expressions, you may prefer to use a single RECompiler object instead.
Parameters:
  pattern - The regular expression pattern to compile.
Parameters:
  matchFlags - The matching style
exception:
  RESyntaxException - Thrown if the regular expression has invalid syntax.
See Also:   RECompiler
See Also:   recompile



RE
public RE(REProgram program, int matchFlags)(Code)
Construct a matcher for a pre-compiled regular expression from program (bytecode) data. Permits special flags to be passed in to modify matching behaviour.
Parameters:
  program - Compiled regular expression program (see RECompiler and/or recompile)
Parameters:
  matchFlags - One or more of the RE match behaviour flags (RE.MATCH_*):
MATCH_NORMAL              // Normal (case-sensitive) matching
MATCH_CASEINDEPENDENT     // Case folded comparisons
MATCH_MULTILINE           // Newline matches as BOL/EOL

See Also:   RECompiler
See Also:   REProgram
See Also:   recompile



RE
public RE(REProgram program)(Code)
Construct a matcher for a pre-compiled regular expression from program (bytecode) data.
Parameters:
  program - Compiled regular expression program
See Also:   RECompiler
See Also:   recompile



RE
public RE()(Code)
Constructs a regular expression matcher with no initial program. This is likely to be an uncommon practice, but is still supported.




Method Detail
getMatchFlags
public int getMatchFlags()(Code)
Returns the current match behaviour flags.
Returns:
  Current match behaviour flags (RE.MATCH_*):
MATCH_NORMAL              // Normal (case-sensitive) matching
MATCH_CASEINDEPENDENT     // Case folded comparisons
MATCH_MULTILINE           // Newline matches as BOL/EOL

See Also:   RE.setMatchFlags



getParen
public String getParen(int which)(Code)
Gets the contents of a parenthesized subexpression after a successful match.
Parameters:
  which - Nesting level of subexpression
Returns:
  String



getParenCount
public int getParenCount()(Code)
Returns the number of parenthesized subexpressions available after a successful match.
Returns:
  Number of available parenthesized subexpressions



getParenEnd
final public int getParenEnd(int which)(Code)
Returns the end index of a given paren level.
Parameters:
  which - Nesting level of subexpression
Returns:
  String index



getParenLength
final public int getParenLength(int which)(Code)
Returns the length of a given paren level.
Parameters:
  which - Nesting level of subexpression
Returns:
  Number of characters in the parenthesized subexpression



getParenStart
final public int getParenStart(int which)(Code)
Returns the start index of a given paren level.
Parameters:
  which - Nesting level of subexpression
Returns:
  String index



getProgram
public REProgram getProgram()(Code)
Returns the current regular expression program in use by this matcher object.
Returns:
  Regular expression program
See Also:   RE.setProgram



grep
public String[] grep(Object[] search)(Code)
Returns an array of Strings, whose toString representation matches a regular expression. This method works like the Perl function of the same name. Given a regular expression of "a*b" and an array of String objects of [foo, aab, zzz, aaaab], the array of Strings returned by grep would be [aab, aaaab].
Parameters:
  search - Array of Objects to search
Returns:
  Array of Strings whose toString() value matches this regular expression.
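
The grep behaviour described above can be sketched in a few lines. This is not the RE implementation: it uses java.util.regex as a stand-in, and skipping null elements is an assumption of this sketch. The class name GrepSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class GrepSketch {
    /** Returns the toString() values of the elements that contain a match for regex. */
    static String[] grep(String regex, Object[] search) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (Object element : search) {
            if (element == null) {
                continue; // assumption of this sketch: null elements are skipped
            }
            String text = element.toString();
            if (pattern.matcher(text).find()) {
                hits.add(text);
            }
        }
        return hits.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Object[] input = { "foo", "aab", "zzz", "aaaab" };
        System.out.println(Arrays.toString(grep("a*b", input))); // [aab, aaaab]
    }
}
```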



internalError
protected void internalError(String s) throws Error(Code)
Throws an Error representing an internal error condition probably resulting from a bug in the regular expression compiler (or possibly data corruption). In practice, this should be very rare.
Parameters:
  s - Error description



match
public boolean match(String search, int i)(Code)
Matches the current regular expression program against a String, starting at a given index.
Parameters:
  search - String to match against
Parameters:
  i - Index to start searching at
Returns:
  True if string matched



match
public boolean match(CharacterIterator search, int i)(Code)
Matches the current regular expression program against a CharacterIterator, starting at a given index.
Parameters:
  search - CharacterIterator to match against
Parameters:
  i - Index to start searching at
Returns:
  True if string matched



match
public boolean match(String search)(Code)
Matches the current regular expression program against a String.
Parameters:
  search - String to match against
Returns:
  True if string matched



matchAt
protected boolean matchAt(int i)(Code)
Match the current regular expression program against the current input string, starting at index i of the input string. This method is only meant for internal use.
Parameters:
  i - The input string index to start matching at
Returns:
  True if the input matched the expression



matchNodes
protected int matchNodes(int firstNode, int lastNode, int idxStart)(Code)
Try to match a string against a subset of nodes in the program
Parameters:
  firstNode - Node to start at in program
Parameters:
  lastNode - Last valid node (used for matching a subexpression without matching the rest of the program as well).
Parameters:
  idxStart - Starting position in character array
Returns:
  Final input array index if match succeeded. -1 if not.



setMatchFlags
public void setMatchFlags(int matchFlags)(Code)
Sets match behaviour flags which alter the way RE does matching.
Parameters:
  matchFlags - One or more of the RE match behaviour flags (RE.MATCH_*):
MATCH_NORMAL              // Normal (case-sensitive) matching
MATCH_CASEINDEPENDENT     // Case folded comparisons
MATCH_MULTILINE           // Newline matches as BOL/EOL



setParenEnd
final protected void setParenEnd(int which, int i)(Code)
Sets the end of a paren level
Parameters:
  which - Which paren level
Parameters:
  i - Index in input array



setParenStart
final protected void setParenStart(int which, int i)(Code)
Sets the start of a paren level
Parameters:
  which - Which paren level
Parameters:
  i - Index in input array



setProgram
public void setProgram(REProgram program)(Code)
Sets the current regular expression program used by this matcher object.
Parameters:
  program - Regular expression program compiled by RECompiler.
See Also:   RECompiler
See Also:   REProgram
See Also:   recompile



simplePatternToFullRegularExpression
public static String simplePatternToFullRegularExpression(String pattern)(Code)
Converts a 'simplified' regular expression to a full regular expression
Parameters:
  pattern - The pattern to convert
Returns:
  The full regular expression



split
public String[] split(String s)(Code)
Splits a string into an array of strings on regular expression boundaries. This function works the same way as the Perl function of the same name. Given a regular expression of "[ab]+" and a string to split of "xyzzyababbayyzabbbab123", the result would be the array of Strings "[xyzzy, yyz, 123]".

Please note that the first string in the resulting array may be an empty string. This happens when the very first character of input string is matched by the pattern.
Parameters:
  s - String to split on this regular expression
Returns:
  Array of strings
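
Both points (the worked example and the possible empty first element) can be reproduced with the JDK's String.split, used here as a stand-in for RE.split. The two methods are not guaranteed to agree on every input, for example in how trailing empty strings are handled, so treat this as a sketch. The class name SplitSketch is hypothetical.

```java
import java.util.Arrays;

final class SplitSketch {
    /** Splits s on matches of regex using the JDK implementation. */
    static String[] split(String regex, String s) {
        return s.split(regex);
    }

    public static void main(String[] args) {
        // The example from the text above.
        System.out.println(Arrays.toString(
                split("[ab]+", "xyzzyababbayyzabbbab123"))); // [xyzzy, yyz, 123]
        // The input starts with a match, so the first element is the empty string.
        System.out.println(Arrays.toString(split("[ab]+", "abxyz"))); // [, xyz]
    }
}
```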




subst
public String subst(String substituteIn, String substitution)(Code)
Substitutes a string for this regular expression in another string. This method works like the Perl function of the same name. Given a regular expression of "a*b", a String to substituteIn of "aaaabfooaaabgarplyaaabwackyb" and the substitution String "-", the resulting String returned by subst would be "-foo-garply-wacky-".
Parameters:
  substituteIn - String to substitute within
Parameters:
  substitution - String to substitute for all matches of this regular expression.
Returns:
  The string substituteIn with zero or more occurrences of the current regular expression replaced with the substitution String (if this regular expression object doesn't match at any position, the original String is returned unchanged).



subst
public String subst(String substituteIn, String substitution, int flags)(Code)
Substitutes a string for this regular expression in another string. This method works like the Perl function of the same name. Given a regular expression of "a*b", a String to substituteIn of "aaaabfooaaabgarplyaaabwackyb" and the substitution String "-", the resulting String returned by subst would be "-foo-garply-wacky-".

It is also possible to reference the contents of a parenthesized expression with $0, $1, ... $9. Given a regular expression of "http://[\\.\\w\\-\\?/~_@&=%]+", a String to substituteIn of "visit us: http://www.apache.org!" and the substitution String "<a href=\"$0\">$0</a>", the resulting String returned by subst would be "visit us: <a href=\"http://www.apache.org\">http://www.apache.org</a>!".

Note: $0 represents the whole match.
Parameters:
  substituteIn - String to substitute within
Parameters:
  substitution - String to substitute for matches of this regular expression
Parameters:
  flags - One or more bitwise flags from REPLACE_*. If the REPLACE_FIRSTONLY flag bit is set, only the first occurrence of this regular expression is replaced. If the bit is not set (REPLACE_ALL), all occurrences of this pattern will be replaced. If the flag REPLACE_BACKREFERENCES is set, all backreferences will be processed.
Returns:
  The string substituteIn with zero or more occurrences of the current regular expression replaced with the substitution String (if this regular expression object doesn't match at any position, the original String is returned unchanged).
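
The substitution examples above can be reproduced with java.util.regex, used here as a stand-in for RE.subst. In this sketch replaceAll plays the role of REPLACE_ALL and replaceFirst the role of REPLACE_FIRSTONLY. One difference to keep in mind: the JDK always interprets $0 to $9 in the replacement, whereas the text above says RE does so only when REPLACE_BACKREFERENCES is set. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class SubstSketch {
    /** Replaces every match of regex in input. */
    static String substAll(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    /** Replaces only the first match of regex in input. */
    static String substFirst(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        String text = "aaaabfooaaabgarplyaaabwackyb";
        System.out.println(substAll("a*b", text, "-"));   // -foo-garply-wacky-
        System.out.println(substFirst("a*b", text, "-")); // -fooaaabgarplyaaabwackyb

        // $0 refers to the whole match.
        System.out.println(substAll(
                "http://[\\.\\w\\-\\?/~_@&=%]+",
                "visit us: http://www.apache.org!",
                "<a href=\"$0\">$0</a>"));
    }
}
```

With the JDK, a replacement string that should be taken literally can be passed through Matcher.quoteReplacement first.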




Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException(Code)(Java Doc)
public boolean equals(Object obj)(Code)(Java Doc)
protected void finalize() throws Throwable(Code)(Java Doc)
final native public Class getClass()(Code)(Java Doc)
native public int hashCode()(Code)(Java Doc)
final native public void notify()(Code)(Java Doc)
final native public void notifyAll()(Code)(Java Doc)
public String toString()(Code)(Java Doc)
final native public void wait(long timeout) throws InterruptedException(Code)(Java Doc)
final public void wait(long timeout, int nanos) throws InterruptedException(Code)(Java Doc)
final public void wait() throws InterruptedException(Code)(Java Doc)
