Java Doc for Regexp.java in sunlabs.brazil.util.regexp (jacl)



java.lang.Object
   sunlabs.brazil.util.regexp.Regexp

Regexp
public class Regexp
The Regexp class can be used to match a pattern against a string and optionally replace the matched parts with new strings.

Regular expressions were implemented by translating Henry Spencer's regular expression package for tcl8.0. Much of the description below is copied verbatim from the tcl8.0 regsub manual entry.


REGULAR EXPRESSIONS

A regular expression is zero or more branches, separated by "|". It matches anything that matches one of the branches.

A branch is zero or more pieces, concatenated. It matches a match for the first piece, followed by a match for the second piece, etc.

A piece is an atom, possibly followed by "*", "+", or "?".

  • An atom followed by "*" matches a sequence of 0 or more matches of the atom.
  • An atom followed by "+" matches a sequence of 1 or more matches of the atom.
  • An atom followed by "?" matches either 0 or 1 matches of the atom.

An atom is

  • a regular expression in parentheses (matching a match for the regular expression)
  • a range (see below)
  • "." (matching any single character)
  • "^" (matching the null string at the beginning of the input string)
  • "$" (matching the null string at the end of the input string)
  • a "\" followed by a single character (matching that character)
  • a single character with no other significance (matching that character).

A range is a sequence of characters enclosed in "[]". The range normally matches any single character from the sequence. If the sequence begins with "^", the range matches any single character not from the rest of the sequence. If two characters in the sequence are separated by "-", this is shorthand for the full list of characters between them (e.g. "[0-9]" matches any decimal digit). To include a literal "]" in the sequence, make it the first character (following a possible "^"). To include a literal "-", make it the first or last character.

In general there may be more than one way to match a regular expression to an input string. For example, consider the command

 String[] match = new String[2];
 new Regexp("(a*)b*").match("aabaaabb", match);
 
Considering only the rules given so far, match[0] and match[1] could end up with the values
  • "aabb" and "aa"
  • "aaab" and "aaa"
  • "ab" and "a"
or any of several other combinations. To resolve this potential ambiguity, Regexp chooses among alternatives using the rule "first then longest". In other words, it considers the possible matches in order working from left to right across the input string and the pattern, and it attempts to match longer pieces of the input string before shorter ones. More specifically, the following rules apply in decreasing order of priority:
  1. If a regular expression could match two different parts of an input string then it will match the one that begins earliest.
  2. If a regular expression contains "|" operators then the leftmost matching sub-expression is chosen.
  3. In "*", "+", and "?" constructs, longer matches are chosen in preference to shorter ones.
  4. In sequences of expression components the components are considered from left to right.

In the example from above, "(a*)b*" therefore matches exactly "aab"; the "(a*)" portion of the pattern is matched first and it consumes the leading "aa", then the "b*" portion of the pattern consumes the next "b". Or, consider the following example:

 String[] match = new String[3];
 new Regexp("(ab|a)(b*)c").match("abc", match);
 
After this command, match[0] will be "abc", match[1] will be "ab", and match[2] will be an empty string. Rule 4 specifies that the "(ab|a)" component gets first shot at the input string and Rule 2 specifies that the "ab" sub-expression is checked before the "a" sub-expression. Thus the "b" has already been claimed before the "(b*)" component is checked and therefore "(b*)" must match an empty string.
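
The two examples above can be checked with a short sketch. It assumes the Brazil Regexp class is not on the classpath and uses the JDK's java.util.regex instead, which resolves these two particular patterns the same way (earliest start, leftmost alternative, greedy repetition). The class name FirstThenLongestDemo and the helper firstMatch are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: java.util.regex stands in for sunlabs.brazil.util.regexp.Regexp.
final class FirstThenLongestDemo {
    /** Returns the whole match followed by each subexpression, or null if nothing matched. */
    static String[] firstMatch(String pattern, String input) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find()) {
            return null;
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        // Rule 1 picks the earliest start; rule 3 lets (a*) take "aa" before b* takes "b".
        System.out.println(String.join("|", firstMatch("(a*)b*", "aabaaabb")));
        // Rule 2 tries "ab" before "a", so (b*) is left with an empty string.
        System.out.println(String.join("|", firstMatch("(ab|a)(b*)c", "abc")));
    }
}
```

The first line printed is the whole match and first subexpression for "(a*)b*"; the second shows that "(b*)" ends up empty because "(ab|a)" already claimed the "b".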
REGULAR EXPRESSION SUBSTITUTION

Regular expression substitution matches a string against a regular expression, transforming the string by replacing the matched region(s) with new substring(s).

What gets substituted into the result is controlled by a subspec. The subspec is a formatting string that specifies what portions of the matched region should be substituted into the result.

  • "&" or "\0" is replaced with a copy of the entire matched region.
  • "\n", where n is a digit from 1 to 9, is replaced with a copy of the nth subexpression.
  • "\&" or "\\" are replaced with just "&" or "\" to escape their special meaning.
  • any other character is passed through.
In the above, strings like "\2" represent the two characters backslash and "2", not the Unicode character 0002.
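
The subspec rules listed above can be sketched as a small expander. This is not the library's applySubspec; it is an illustrative reimplementation on top of java.util.regex, and the names SubspecSketch, expand, and subFirst are hypothetical. Two behaviours are assumptions rather than documented facts: a backslash followed by any other character emits that character, and a reference to a subexpression that did not match (or does not exist) contributes nothing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: mimics the documented subspec rules using java.util.regex.
final class SubspecSketch {
    /** Expands subspec against the current match held by m. */
    static String expand(Matcher m, String subspec) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < subspec.length(); i++) {
            char c = subspec.charAt(i);
            if (c == '&') {
                sb.append(m.group());                 // "&" -> entire matched region
            } else if (c == '\\' && i + 1 < subspec.length()) {
                char next = subspec.charAt(++i);
                if (next >= '0' && next <= '9') {     // "\0".."\9" -> nth subexpression
                    int n = next - '0';
                    String g = (n <= m.groupCount()) ? m.group(n) : null;
                    if (g != null) {
                        sb.append(g);                 // assumption: unmatched groups add nothing
                    }
                } else {
                    sb.append(next);                  // "\&" -> "&", "\\" -> "\" (others: assumption)
                }
            } else {
                sb.append(c);                         // any other character passes through
            }
        }
        return sb.toString();
    }

    /** Replaces the first match only; returns null when there is no match. */
    static String subFirst(String pattern, String input, String subspec) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find()) {
            return null;
        }
        return input.substring(0, m.start()) + expand(m, subspec) + input.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(subFirst("^([^,]+),([^,]+),([^,]+),(.*)",
                                    "abc,def,ghi,klm,nop,pqr", "\\3,\\1,\\4"));
    }
}
```

Note that in Java source each backslash of the subspec is doubled, so "\\3" is the two-character subspec \3.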
Here is an example of how to use Regexp:

 public static void
 main(String[] args)
     throws Exception
 {
     Regexp re;
     String[] matches;
     String s;

     /*
      * A regular expression to match the first line of an HTTP request.
      *
      * 1. ^               - starting at the beginning of the line
      * 2. ([A-Z]+)        - match and remember some upper case characters
      * 3. [ \t]+          - skip blank space
      * 4. ([^ \t]+)       - match and remember up to the next blank space
      * 5. [ \t]+          - skip more blank space
      * 6. (HTTP/1\\.[01]) - match and remember HTTP/1.0 or HTTP/1.1
      * 7. $               - end of string - no chars left.
      */
     s = "GET http://a.b.com:1234/index.html HTTP/1.1";
     re = new Regexp("^([A-Z]+)[ \t]+([^ \t]+)[ \t]+(HTTP/1\\.[01])$");
     matches = new String[4];
     if (re.match(s, matches)) {
         System.out.println("METHOD  " + matches[1]);
         System.out.println("URL     " + matches[2]);
         System.out.println("VERSION " + matches[3]);
     }

     /*
      * A regular expression to extract some simple comma-separated data,
      * reorder some of the columns, and discard column 2.
      */
     s = "abc,def,ghi,klm,nop,pqr";
     re = new Regexp("^([^,]+),([^,]+),([^,]+),(.*)");
     System.out.println(re.sub(s, "\\3,\\1,\\4"));
 }

author:
   Colin Stevens (colin.stevens@sun.com)
version:
   1.7, 99/10/14
See Also:   Regsub

Inner Class: public interface Filter
Inner Class: static class Compiler
Inner Class: static class Match

Field Summary
final static char ANY
final static char ANYBUT
final static char ANYOF
final static char BACK
final static char BOL
final static char BRANCH
final static char CLOSE
final static char END
final static char EOL
final static char EXACTLY
final static char NOTHING
final static int NSUBEXP
final static char OPEN
final static char PLUS
final static char STAR
boolean anchored
     true if the pattern must match the beginning of the string, so we don't have to waste time matching against all possible starting locations in the string.
boolean ignoreCase
     Whether the regexp matching should be case insensitive.
String must
int npar
     The number of parenthesized subexpressions in the regexp pattern, plus 1 for the match of the whole pattern itself.
final static String[] opnames
char[] program
     The bytecodes making up the regexp program.
int startChar

Constructor Summary
public Regexp(String pat)
     Compiles a new Regexp object from the given regular expression pattern.

It takes a certain amount of time to parse and validate a regular expression pattern before it can be used to perform matches or substitutions.

public Regexp(String pat, boolean ignoreCase)
     Compiles a new Regexp object from the given regular expression pattern.
Parameters:
  pat - The string holding the regular expression pattern.
  ignoreCase - If true then this regular expression will do case-insensitive matching.

Method Summary
public static void applySubspec(Regsub rs, String subspec, StringBuffer sb)
     Utility method to give access to the standard substitution algorithm used by sub and subAll.
Match exec(String str, int start, int off)
public static void main(String[] args)
public String match(String str)
     Matches the given string against this regular expression.
public boolean match(String str, String[] substrs)
     Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions.

substrs[0] is set to the range of str that matched the entire regular expression.

substrs[1] is set to the range of str that matched the first (leftmost) parenthesized subexpression. substrs[n] is set to the range that matched the nth subexpression, and so on.

If subexpression n did not match, then substrs[n] is set to null.

public boolean match(String str, int[] indices)
     Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions.

For the indices specified below, the range extends from the character at the starting index up to, but not including, the character at the ending index.

indices[0] and indices[1] are set to the starting and ending indices of the range of str that matched the entire regular expression.

indices[2] and indices[3] are set to the starting and ending indices of the range of str that matched the first (leftmost) parenthesized subexpression. indices[n * 2] and indices[n * 2 + 1] are set to the range that matched the nth subexpression, and so on.

If subexpression n did not match, then indices[n * 2] and indices[n * 2 + 1] are both set to -1.

The length that the caller should use when allocating the indices array is twice the return value of Regexp.subspecs.

public String sub(String str, String subspec)
     Matches a string against a regular expression and replaces the first match with the string generated from the substitution parameter.
Parameters:
  str - The string to match against this regular expression.
  subspec - The substitution parameter, described in REGULAR EXPRESSION SUBSTITUTION.
Returns:
  The string formed by replacing the first match in str with the string generated from subspec.
public String sub(String str, Filter rf)
public String subAll(String str, String subspec)
     Matches a string against a regular expression and replaces all matches with the string generated from the substitution parameter. After each substitution is done, the portions of the string already examined, including the newly substituted region, are not checked again for new matches -- only the rest of the string is examined.
Parameters:
  str - The string to match against this regular expression.
  subspec - The substitution parameter, described in REGULAR EXPRESSION SUBSTITUTION.
Returns:
  The string formed by replacing all the matches in str with the strings generated from subspec.
public int subspecs()
     Returns the number of parenthesized subexpressions in this regular expression, plus one more for this expression itself.
public String toString()
     Returns a string representation of this compiled regular expression.

Field Detail
ANY
final static char ANY

ANYBUT
final static char ANYBUT

ANYOF
final static char ANYOF

BACK
final static char BACK

BOL
final static char BOL

BRANCH
final static char BRANCH

CLOSE
final static char CLOSE

END
final static char END

EOL
final static char EOL

EXACTLY
final static char EXACTLY

NOTHING
final static char NOTHING

NSUBEXP
final static int NSUBEXP

OPEN
final static char OPEN

PLUS
final static char PLUS

STAR
final static char STAR



anchored
boolean anchored
true if the pattern must match the beginning of the string, so we don't have to waste time matching against all possible starting locations in the string.

ignoreCase
boolean ignoreCase
Whether the regexp matching should be case insensitive.

must
String must

npar
int npar
The number of parenthesized subexpressions in the regexp pattern, plus 1 for the match of the whole pattern itself.

opnames
final static String[] opnames

program
char[] program
The bytecodes making up the regexp program.

startChar
int startChar




Constructor Detail
Regexp
public Regexp(String pat) throws IllegalArgumentException
Compiles a new Regexp object from the given regular expression pattern.

It takes a certain amount of time to parse and validate a regular expression pattern before it can be used to perform matches or substitutions. If the caller caches the new Regexp object, that parsing time will be saved because the same Regexp can be used against many different strings.
Parameters:
  pat - The string holding the regular expression pattern.
throws:
  IllegalArgumentException - if the pattern is malformed. The detail message for the exception will be set to a string indicating how the pattern was malformed.
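
The caching advice above can be sketched as follows. Because the Brazil classes are not assumed to be available, the sketch uses java.util.regex.Pattern as a stand-in for Regexp; the idea of compiling once into a static field and reusing it is the same. CachedPatternSketch, REQUEST_LINE, and methodOf are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: Pattern plays the role of a cached Regexp object.
final class CachedPatternSketch {
    // Compiled once when the class is loaded, then shared by every call.
    private static final Pattern REQUEST_LINE =
        Pattern.compile("^([A-Z]+)[ \t]+([^ \t]+)[ \t]+(HTTP/1\\.[01])$");

    /** Returns the HTTP method of a request line, or null if the line does not match. */
    static String methodOf(String line) {
        Matcher m = REQUEST_LINE.matcher(line);   // cheap: no re-parsing of the pattern
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(methodOf("GET http://a.b.com:1234/index.html HTTP/1.1"));
        System.out.println(methodOf("POST /form HTTP/1.0"));
    }
}
```

With the Brazil class, the equivalent would presumably be a static final Regexp field constructed once, with match called on it for each input.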




Regexp
public Regexp(String pat, boolean ignoreCase) throws IllegalArgumentException
Compiles a new Regexp object from the given regular expression pattern.
Parameters:
  pat - The string holding the regular expression pattern.
  ignoreCase - If true then this regular expression will do case-insensitive matching. If false, then the matches are case-sensitive. Regular expressions generated by Regexp(String) are case-sensitive.
throws:
  IllegalArgumentException - if the pattern is malformed. The detail message for the exception will be set to a string indicating how the pattern was malformed.




Method Detail
applySubspec
public static void applySubspec(Regsub rs, String subspec, StringBuffer sb)
Utility method to give access to the standard substitution algorithm used by sub and subAll. Appends to the string buffer the string generated by applying the substitution parameter to the matched region.
Parameters:
  rs - Information about the matched region.
  subspec - The substitution parameter.
  sb - StringBuffer to which the generated string is appended.



exec
Match exec(String str, int start, int off)

main
public static void main(String[] args) throws Exception



match
public String match(String str)
Matches the given string against this regular expression.
Parameters:
  str - The string to match.
Returns:
  The substring of str that matched the entire regular expression, or null if the string did not match this regular expression.



match
public boolean match(String str, String[] substrs)
Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions.

substrs[0] is set to the range of str that matched the entire regular expression.

substrs[1] is set to the range of str that matched the first (leftmost) parenthesized subexpression. substrs[n] is set to the range that matched the nth subexpression, and so on.

If subexpression n did not match, then substrs[n] is set to null. This is not to be confused with "", which is a valid value for a subexpression that matched 0 characters.

The length that the caller should use when allocating the substrs array is the return value of Regexp.subspecs. The array can be shorter (in which case not all the information will be returned), or longer (in which case the remainder of the elements are initialized to null), or null (to ignore the subexpressions).
Parameters:
  str - The string to match.
  substrs - An array of strings allocated by the caller, and filled in with information about the portions of str that matched the regular expression. May be null.
Returns:
  true if str matched this regular expression, false otherwise. If false is returned, then the contents of substrs are unchanged.
See Also:   Regexp.subspecs
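
The difference between null and "" described above, and the handling of an over-long array, can be illustrated with a sketch built on java.util.regex, whose Matcher.group makes the same null/empty distinction. NullVersusEmptySketch and its match helper are hypothetical stand-ins written to follow the contract in the text, not the Brazil implementation.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: follows the documented contract using java.util.regex.
final class NullVersusEmptySketch {
    /** Fills substrs on success; leaves it untouched and returns false on failure. */
    static boolean match(String pattern, String input, String[] substrs) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find()) {
            return false;
        }
        if (substrs != null) {
            for (int n = 0; n < substrs.length; n++) {
                // Slots beyond the last subexpression are set to null.
                substrs[n] = (n <= m.groupCount()) ? m.group(n) : null;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] substrs = new String[4];          // one slot longer than needed
        match("(a)?(b*)c", "c", substrs);
        // substrs[1] is null: (a) took no part in the match.
        // substrs[2] is "":   (b*) matched, but matched zero characters.
        System.out.println(Arrays.toString(substrs));
    }
}
```

A caller that only tests for an empty string would treat both cases alike, so checking for null first is the safer habit.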




match
public boolean match(String str, int[] indices)
Matches the given string against this regular expression, and computes the set of substrings that matched the parenthesized subexpressions.

For the indices specified below, the range extends from the character at the starting index up to, but not including, the character at the ending index.

indices[0] and indices[1] are set to the starting and ending indices of the range of str that matched the entire regular expression.

indices[2] and indices[3] are set to the starting and ending indices of the range of str that matched the first (leftmost) parenthesized subexpression. indices[n * 2] and indices[n * 2 + 1] are set to the range that matched the nth subexpression, and so on.

If subexpression n did not match, then indices[n * 2] and indices[n * 2 + 1] are both set to -1.

The length that the caller should use when allocating the indices array is twice the return value of Regexp.subspecs. The array can be shorter (in which case not all the information will be returned), or longer (in which case the remainder of the elements are initialized to -1), or null (to ignore the subexpressions).
Parameters:
  str - The string to match.
  indices - An array of integers allocated by the caller, and filled in with information about the portions of str that matched all the parts of the regular expression. May be null.
Returns:
  true if the string matched the regular expression, false otherwise. If false is returned, then the contents of indices are unchanged.
See Also:   Regexp.subspecs
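
The index layout described above (pairs at n * 2 and n * 2 + 1, half-open ranges, -1 for subexpressions that did not match) can be reproduced with java.util.regex, whose Matcher.start and Matcher.end also report -1 for a group that did not participate. IndicesLayoutSketch and matchIndices are hypothetical names; the array is sized as twice the number of subexpressions plus one, mirroring the "twice subspecs()" rule.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: builds the documented indices layout from a java.util.regex match.
final class IndicesLayoutSketch {
    /** Returns {start0, end0, start1, end1, ...} or null if there is no match. */
    static int[] matchIndices(String pattern, String input) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find()) {
            return null;
        }
        int[] indices = new int[2 * (m.groupCount() + 1)];
        for (int n = 0; n <= m.groupCount(); n++) {
            indices[n * 2] = m.start(n);       // inclusive start, or -1
            indices[n * 2 + 1] = m.end(n);     // exclusive end, or -1
        }
        return indices;
    }

    public static void main(String[] args) {
        // Only the second alternative matches, so the first pair after the
        // whole-match pair is -1, -1.
        System.out.println(Arrays.toString(matchIndices("(a)|(b)", "xb")));
        // (b*) matches zero characters: its start and end are equal, not -1.
        System.out.println(Arrays.toString(matchIndices("(ab|a)(b*)c", "abc")));
    }
}
```

Because the ranges are half-open, input.substring(indices[n * 2], indices[n * 2 + 1]) recovers the text of subexpression n whenever the pair is not -1.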




sub
public String sub(String str, String subspec)
Matches a string against a regular expression and replaces the first match with the string generated from the substitution parameter.
Parameters:
  str - The string to match against this regular expression.
  subspec - The substitution parameter, described in REGULAR EXPRESSION SUBSTITUTION.
Returns:
  The string formed by replacing the first match in str with the string generated from subspec. If no matches were found, then the return value is null.



sub
public String sub(String str, Filter rf)



subAll
public String subAll(String str, String subspec)
Matches a string against a regular expression and replaces all matches with the string generated from the substitution parameter. After each substitution is done, the portions of the string already examined, including the newly substituted region, are not checked again for new matches -- only the rest of the string is examined.
Parameters:
  str - The string to match against this regular expression.
  subspec - The substitution parameter, described in REGULAR EXPRESSION SUBSTITUTION.
Returns:
  The string formed by replacing all the matches in str with the strings generated from subspec. If no matches were found, then the return value is a copy of str.
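
The "already examined text is not rescanned" rule above is easiest to see in a loop that walks the original string once. The sketch below uses java.util.regex and a literal replacement (no subspec expansion) to keep it short; SubAllSketch and subAllLiteral are hypothetical names, not part of the Brazil package.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: single left-to-right pass with a literal replacement string.
final class SubAllSketch {
    static String subAllLiteral(String pattern, String input, String replacement) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;                                  // end of the previous match
        while (m.find()) {                             // always searches the ORIGINAL input
            out.append(input, last, m.start());        // untouched text before the match
            out.append(replacement);                   // never re-examined afterwards
            last = m.end();
        }
        out.append(input, last, input.length());       // tail after the final match
        return out.toString();                         // equals input when nothing matched
    }

    public static void main(String[] args) {
        // Each "a" becomes "aa"; the inserted text is not matched again.
        System.out.println(subAllLiteral("a", "aaa", "aa"));
        // The result contains "ab" again, but it is not replaced a second time.
        System.out.println(subAllLiteral("ab", "aab", "b"));
    }
}
```

If replaced text were rescanned, the first call would never terminate and the second would collapse further; scanning only the remainder of the original string avoids both.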



subspecs
public int subspecs()
Returns the number of parenthesized subexpressions in this regular expression, plus one more for this expression itself.
Returns:
  The number.



toString
public String toString()
Returns a string representation of this compiled regular expression. The format of the string representation is a symbolic dump of the bytecodes.
Returns:
  A string representation of this regular expression.



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
