org.apache.oro.text.regex
This package was formerly the OROMatcher library. It provides both
generic regular expression interfaces and Perl5-compatible regular
expression implementation classes.
Note: The following information will be moved into the user's guide.
Perl5 regular expressions
Here we summarize the Perl5.003 regular expression syntax, all of
which is supported by the Perl5 classes in this package. For a
definitive reference, however, consult the
perlre man page
that accompanies the Perl5 distribution, as well as the book
Programming Perl, 2nd Edition, from O'Reilly & Associates.
We are working toward implementing the features added after Perl5.003,
up to and including Perl 5.6. Please remember that version 2.0
guarantees support only for Perl5.003 expressions.
- Alternatives separated by |
- Quantified atoms
  - {n,m}: Match at least n but not more than m times.
  - {n,}: Match at least n times.
  - {n}: Match exactly n times.
  - *: Match 0 or more times.
  - +: Match 1 or more times.
  - ?: Match 0 or 1 times.
- Atoms
  - A regular expression within parentheses
  - A . matches any character except \n
  - A ^ is a null token matching the beginning of a string or line
    (i.e., the very start of the string, or the position right after
    a newline)
  - A $ is a null token matching the end of a string or line
    (i.e., the very end of the string, or the position right before
    a newline)
  - Character classes (e.g., [abcd]) and ranges (e.g., [a-z])
    - Special backslashed characters work within a character
      class (except for backreferences and boundaries).
    - \b is backspace inside a character class
  - Special backslashed characters
    - \b: null token matching a word boundary (\w on one side
      and \W on the other)
    - \B: null token matching a position that is not a
      word boundary
    - \A: Match only at the beginning of the string
    - \Z: Match only at the end of the string (or before a newline
      at the end)
    - \n: newline
    - \r: carriage return
    - \t: tab
    - \f: formfeed
    - \d: digit [0-9]
    - \D: non-digit [^0-9]
    - \w: word character [0-9a-z_A-Z]
    - \W: non-word character [^0-9a-z_A-Z]
    - \s: whitespace character [ \t\n\r\f]
    - \S: non-whitespace character [^ \t\n\r\f]
    - \xnn: character given by its hexadecimal code nn
    - \cD: the corresponding control character (here Control-D;
      \c may be followed by other letters as well)
    - \nn or \nnn: character given by its octal code, unless it is
      interpreted as a backreference
    - \1, \2, \3, etc.: match whatever the first, second,
      third, etc. parenthesized group matched. This is called a
      backreference. If there is no corresponding group, the
      number is interpreted as the octal representation of a character.
    - \0: the null character
  - Any other backslashed character matches itself
- Expressions within parentheses are matched as subpattern groups
  and saved, so they can be referred to by backreferences and
  retrieved from a MatchResult (e.g., with its group() method).
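The constructs listed above can be tried out with a small, runnable sketch. So that it compiles with only the JDK, it uses java.util.regex as a stand-in, not this package's Perl5Compiler and Perl5Matcher. java.util.regex accepts the same Perl-style syntax for the constructs shown here, but it differs from Perl5.003 in some details. For example, its octal escapes must begin with \0. Treat the results as an illustration of the syntax, not of ORO's exact behavior. The names SyntaxDemo, firstMatch and group are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch: uses the JDK's java.util.regex, NOT org.apache.oro.text.regex.
// SyntaxDemo, firstMatch and group are hypothetical names chosen for illustration.
class SyntaxDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Returns the text saved by parenthesized group n in the first match, or null.
    static String group(String regex, String input, int n) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(n) : null;
    }

    public static void main(String[] args) {
        // Alternatives separated by |
        System.out.println(firstMatch("cat|dog", "hotdog"));             // dog
        // {n,m}: at least 2, at most 3 'a's; greedy, so 3 are taken
        System.out.println(firstMatch("a{2,3}", "aaaa"));                // aaa
        // \d+: one or more digits (backslashes are doubled in Java string literals)
        System.out.println(firstMatch("\\d+", "abc123def"));             // 123
        // \xnn: hexadecimal 41 is 'A'
        System.out.println(firstMatch("\\x41+", "BAAB"));                // AA
        // \b: "cat" only as a whole word, so nothing inside "concatenate" matches
        System.out.println(firstMatch("\\bcat\\b", "concatenate"));      // null
        // \A: anchored at the start of the string, so "xabc" does not match
        System.out.println(firstMatch("\\Aabc", "xabc"));                // null
        // \1: backreference to group 1, used here to find a doubled word
        System.out.println(group("\\b(\\w+) \\1\\b", "this is is it", 1)); // is
    }
}
```

The backreference case works because \1 must repeat exactly the text group 1 captured. Here, that text is "is".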
By default, a quantified subpattern is greedy.
In other words, it matches as many times as possible while still
allowing the rest of the pattern to match. To make a quantifier
match the minimum number of times possible (again, while still
allowing the rest of the pattern to match), place
a "?" right after the quantifier.
- *?: Match 0 or more times
- +?: Match 1 or more times
- ??: Match 0 or 1 time
- {n}?: Match exactly n times
- {n,}?: Match at least n times
- {n,m}?: Match at least n but not more than m times
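The sketch below contrasts greedy and minimal quantifiers. It again uses java.util.regex as a stand-in for Perl5Matcher, because java.util.regex gives the trailing "?" the same meaning; its documentation calls these "reluctant" quantifiers. GreedyDemo and firstMatch are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not ORO; GreedyDemo and firstMatch are hypothetical.
class GreedyDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>bold</b>";
        // Greedy .+ runs to the end, then backs off only as far as the last '>'
        System.out.println(firstMatch("<.+>", html));        // <b>bold</b>
        // Minimal .+? stops at the first '>' that lets the whole pattern match
        System.out.println(firstMatch("<.+?>", html));       // <b>
        // {n,m} versus {n,m}?
        System.out.println(firstMatch("a{2,4}", "aaaaa"));   // aaaa
        System.out.println(firstMatch("a{2,4}?", "aaaaa"));  // aa
        // ?? prefers zero x's, but takes one when the rest of the pattern needs it
        System.out.println(firstMatch("x??y", "xy"));        // xy
    }
}
```

The last case shows the "without causing the rest of the pattern not to match" rule in action. x?? first tries to match nothing, then fails to find y, and so falls back to matching the x.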
The following Perl5 extended regular expression constructs are fully supported:
- (?#text): An embedded comment; text is ignored.
- (?:regexp): Groups things like "()" but doesn't save the
  group's match.
- (?=regexp): A zero-width positive lookahead assertion. For
  example, \w+(?=\s) matches a word followed by
  whitespace, without including the whitespace in the
  MatchResult.
- (?!regexp): A zero-width negative lookahead assertion. For
  example, foo(?!bar) matches any occurrence of
  "foo" that isn't followed by "bar". Remember
  that this is a zero-width assertion, so
  a(?!b)d matches "ad": the a is followed
  by a character that is not b (the d), and a d
  follows the zero-width assertion.
- (?imsx): One or more embedded pattern-match modifiers.
  i enables case insensitivity, m enables multiline
  treatment of the input, s enables single-line treatment
  of the input (so . also matches \n), and x enables extended
  syntax, in which whitespace in the pattern is ignored and
  # starts a comment.
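Here is a runnable sketch of these extended constructs. It again uses java.util.regex as a stand-in for ORO. (?#text) is left out because java.util.regex does not support that inline-comment form. The two libraries may also differ in how ^, $ and . behave when no modifier is given. ExtendedDemo, firstMatch, found and savedGroups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not ORO.
// ExtendedDemo and its helper methods are hypothetical names.
class ExtendedDemo {
    // Returns the first substring of input matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // True if regex matches anywhere in input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Number of capturing (saved) groups in regex.
    static int savedGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // (?=regexp): the word is matched, the following whitespace is not
        System.out.println(firstMatch("\\w+(?=\\s)", "hello world"));        // hello
        // (?!regexp): skips "foobar", matches the "foo" followed by "baz"
        System.out.println(firstMatch("foo(?!bar)\\w*", "foobar foobaz"));  // foobaz
        // Zero-width: a(?!b)d still matches "ad"
        System.out.println(firstMatch("a(?!b)d", "ad"));                    // ad
        // (?:regexp) groups without saving, so only (c) is counted
        System.out.println(savedGroups("(?:ab)+(c)"));                      // 1
        // (?i): case-insensitive matching
        System.out.println(firstMatch("(?i)hello", "Say HELLO"));           // HELLO
        // (?m): ^ may also match right after a newline
        System.out.println(found("^b", "a\nb") + " " + found("(?m)^b", "a\nb"));   // false true
        // (?s): . also matches \n
        System.out.println(found("a.b", "a\nb") + " " + found("(?s)a.b", "a\nb")); // false true
        // (?x): whitespace inside the pattern is ignored
        System.out.println(firstMatch("(?x) \\d+ - \\d+", "tel 555-1234"));   // 555-1234
    }
}
```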
| Java Source File Name | Type | Comment |
|---|---|---|
| CharStringPointer.java | Class | The CharStringPointer class is used to facilitate traversal of a char[] in the way pointers are used to traverse strings in C/C++. |
| MalformedPatternException.java | Class | A class used to signify the occurrence of a syntax error in a regular expression that is being compiled. |
| MatchResult.java | Interface | The MatchResult interface allows PatternMatcher implementors to return results storing match information in whatever format they like, while presenting a consistent way of accessing that information. |
| OpCode.java | Class | The OpCode class should not be instantiated. |
| Pattern.java | Interface | The Pattern interface allows multiple representations of a regular expression to be defined. |
| PatternCompiler.java | Interface | The PatternCompiler interface defines the operations a regular expression compiler must implement. |
| PatternMatcher.java | Interface | The PatternMatcher interface defines the operations a regular expression matcher must implement. |
| PatternMatcherInput.java | Class | The PatternMatcherInput class is used to preserve state across calls to the contains() methods of PatternMatcher instances. It is also used to specify that only a subregion of a string should be used as input when looking for a pattern match. |
| Perl5Compiler.java | Class | The Perl5Compiler class is used to create compiled regular expressions conforming to the Perl5 regular expression syntax. |
| Perl5Debug.java | Class | The Perl5Debug class is not intended for general use and should not be instantiated, but is provided because some users may find the output of its single method useful. The Perl5Compiler class generates a representation of a regular expression identical to that of Perl5 in the abstract, but not in terms of actual data structures. |
| Perl5Matcher.java | Class | The Perl5Matcher class is used to match regular expressions (conforming to the Perl5 regular expression syntax) generated by Perl5Compiler. Perl5Compiler and Perl5Matcher are designed with the intent that you use a separate instance of each per thread to avoid the overhead of both synchronization and concurrent access (e.g., a match that takes a long time in one thread would block the progress of another thread with a shorter match). |
| Perl5MatchResult.java | Class | A class used to store and access the results of a Perl5Pattern match. |
| Perl5Pattern.java | Class | An implementation of the Pattern interface for Perl5 regular expressions. This class is compatible with the Perl5Compiler and Perl5Matcher classes. |
| Perl5Repetition.java | Class | Perl5Repetition is a support class for Perl5Matcher. |
| Perl5Substitution.java | Class | Perl5Substitution implements a Substitution consisting of a literal string, but allowing Perl5 variable interpolation (such as $1) that references saved groups in a match. |
| StringSubstitution.java | Class | StringSubstitution implements a Substitution consisting of a simple literal string. |
| Substitution.java | Interface | The Substitution interface provides a means for you to control how a substitution is performed when using the Util.substitute() method. |
| Util.java | Class | The Util class is a holder for useful static utility methods that can be generically applied to Pattern and PatternMatcher instances. This class cannot be, and is not meant to be, instantiated. The Util class currently contains versions of the split() and substitute() methods inspired by Perl's split function and s/// operator, respectively, although they are implemented so as not to rely on the Perl5 implementations of the package's regular expression interfaces. |
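To give a feel for what PatternMatcherInput, Util.substitute() with a Perl5Substitution, and Util.split() are for, here is a sketch of the same three tasks. It is written against java.util.regex as a stand-in, not against this package's API:

- A find() loop plays the role of repeated contains() calls on one PatternMatcherInput.
- Matcher.replaceAll() interpolates $1-style group references, much as Perl5Substitution does.
- Pattern.split() splits input around matches.

UtilDemo and its methods are hypothetical. Edge-case behavior, such as how empty fields are handled, may differ from ORO's Util class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in sketch using java.util.regex, not org.apache.oro.text.regex.
// UtilDemo and its methods are hypothetical names for illustration.
class UtilDemo {
    // Collects every match; each find() resumes where the previous match ended,
    // which is the kind of state PatternMatcherInput preserves across contains() calls.
    static List<String> allMatches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Replaces every match, interpolating saved groups with $1, $2, ...
    // (comparable in spirit to Util.substitute with a Perl5Substitution).
    static String substituteAll(String regex, String replacement, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    // Splits input around matches of regex (comparable in spirit to Util.split).
    static List<String> split(String regex, String input) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        System.out.println(allMatches("\\d+", "a1 b22 c333"));                      // [1, 22, 333]
        System.out.println(substituteAll("(\\w+)@(\\w+)", "$2 at $1", "joe@host")); // host at joe
        System.out.println(split("\\s*,\\s*", "a , b,c"));                          // [a, b, c]
    }
}
```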