java.lang.Object
   org.apache.oro.text.awk.AwkCompiler

AwkCompiler
public final class AwkCompiler implements PatternCompiler

The AwkCompiler class is used to create compiled regular expressions
conforming to the Awk regular expression syntax. It generates
AwkPattern instances upon compilation to be used in conjunction
with an AwkMatcher instance. AwkMatcher finds true leftmost-longest
matches, so you must take care with how you formulate your regular
expression to avoid matching more than you really want.
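To make the leftmost-longest rule concrete, the following is a minimal sketch that does not use AwkMatcher at all. It contrasts the leftmost-first alternation of the standard java.util.regex engine with a hand-written leftmost-longest search over literal alternatives. The class LeftmostLongestSketch and both helper methods are hypothetical names introduced only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostLongestSketch {
    // Leftmost-longest over literal alternatives: at the earliest start
    // offset where any alternative matches, return the longest one.
    // This is an illustration of the rule, not the library's DFA.
    static String leftmostLongest(String input, String... alternatives) {
        for (int start = 0; start < input.length(); start++) {
            String best = null;
            for (String alt : alternatives) {
                if (input.startsWith(alt, start)
                        && (best == null || alt.length() > best.length())) {
                    best = alt;
                }
            }
            if (best != null) {
                return best;
            }
        }
        return null; // no alternative occurs anywhere in the input
    }

    // Leftmost-first: java.util.regex tries alternatives in written order
    // and stops at the first one that succeeds.
    static String leftmostFirst(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "a foobar";
        System.out.println(leftmostFirst("foo|foobar", input));      // prints foo
        System.out.println(leftmostLongest(input, "foo", "foobar")); // prints foobar
    }
}
```

Under leftmost-longest semantics the order of alternatives does not matter, so an expression such as foo|foobar consumes the longer text whenever it is present.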
The supported regular expression syntax is a superset of traditional AWK,
but is NOT to be confused with GNU AWK or other AWK variants. Additionally,
this AWK implementation is DFA-based and supports only 8-bit ASCII.
Consequently, these classes can perform very fast pattern matches in
most cases.
This is the traditional Awk syntax that is supported:
- Alternatives separated by |
- Quantified atoms
  - *: Match 0 or more times.
  - +: Match 1 or more times.
  - ?: Match 0 or 1 times.
- Atoms
  - regular expression within parentheses
  - a . matches everything including newline
  - a ^ is a null token matching the beginning of a string
    but has no relation to newlines (and is only valid at the
    beginning of a regex; this differs from traditional awk
    for the sake of efficiency in Java).
  - a $ is a null token matching the end of a string but has
    no relation to newlines (and is only valid at the
    end of a regex; this differs from traditional awk for the
    sake of efficiency in Java).
  - Character classes (e.g., [abcd]) and ranges (e.g., [a-z])
    - Special backslashed characters work within a character class
  - Special backslashed characters
    - \b: backspace
    - \n: newline
    - \r: carriage return
    - \t: tab
    - \f: formfeed
    - \xnn: hexadecimal representation of character
    - \nn or \nnn: octal representation of character
    - Any other backslashed character matches itself
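The backslash escapes listed above can be read as a small decoding table. The sketch below maps one escape sequence to the single character it denotes. It is not the parser used by AwkCompiler; the class AwkEscapeSketch and its decode method are hypothetical, and the sketch covers only the traditional escapes in this list, not the extended class shorthands such as \d or \w.

```java
class AwkEscapeSketch {
    // Decodes a single escape such as "\n", "\x41", or "\101" (written in
    // Java source as "\\n", "\\x41", "\\101") into the character it denotes.
    static char decode(String escape) {
        if (escape.length() < 2 || escape.charAt(0) != '\\') {
            throw new IllegalArgumentException("not an escape: " + escape);
        }
        char c = escape.charAt(1);
        switch (c) {
            case 'b': return '\b'; // backspace
            case 'n': return '\n'; // newline
            case 'r': return '\r'; // carriage return
            case 't': return '\t'; // tab
            case 'f': return '\f'; // formfeed
            case 'x':
                // \xnn: the remaining characters are hexadecimal digits
                return (char) Integer.parseInt(escape.substring(2), 16);
            default:
                if (c >= '0' && c <= '7') {
                    // \nn or \nnn: everything after the backslash is octal
                    return (char) Integer.parseInt(escape.substring(1), 8);
                }
                return c; // any other backslashed character matches itself
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("\\x41")); // prints A
        System.out.println(decode("\\101")); // prints A
        System.out.println(decode("\\."));   // prints .
    }
}
```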
This is the extended syntax that is supported:
- Quantified atoms
  - {n,m}: Match at least n but not more than m times.
  - {n,}: Match at least n times.
  - {n}: Match exactly n times.
- Atoms
  - Special backslashed characters
    - \d: a digit [0-9]
    - \D: a non-digit [^0-9]
    - \w: a word character [0-9a-z_A-Z]
    - \W: a non-word character [^0-9a-z_A-Z]
    - \s: a whitespace character [ \t\n\r\f]
    - \S: a non-whitespace character [^ \t\n\r\f]
    - \cD: matches the corresponding control character
    - \0: matches the null character
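The class shorthands above are defined by explicit ASCII sets, which is narrower than the Unicode-aware tests in java.lang.Character. As a rough sketch of those sets, the predicates below restate \d, \w, and \s exactly as documented. The class AwkShorthandSketch and its method names are hypothetical and are not part of the library.

```java
class AwkShorthandSketch {
    // \d is documented as [0-9]
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // \w is documented as [0-9a-z_A-Z]
    static boolean isWord(char c) {
        return isDigit(c)
                || (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || c == '_';
    }

    // \s is documented as [ \t\n\r\f]
    static boolean isSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f';
    }

    public static void main(String[] args) {
        System.out.println(isWord('_'));       // prints true
        System.out.println(isWord('\u00e9'));  // prints false (accented e is outside the set)
        System.out.println(isSpace((char) 11)); // prints false (vertical tab is not listed)
    }
}
```

The negated forms \D, \W, and \S are simply the complements of these predicates.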
Since: 1.0
See Also: org.apache.oro.text.regex.PatternCompiler, org.apache.oro.text.regex.MalformedPatternException, AwkPattern, AwkMatcher

Method Summary
static boolean _isLowerCase(char token)
static boolean _isUpperCase(char token)
static boolean _isWordCharacter(char token)
SyntaxNode _newTokenNode(char token, int position)
SyntaxTree _parse(char[] expression)
static char _toggleCase(char token)
public Pattern compile(char[] pattern, int options)
   Compiles an Awk regular expression into an AwkPattern instance that
   can be used by an AwkMatcher object to perform pattern matching.
public Pattern compile(String pattern, int options)
   Compiles an Awk regular expression into an AwkPattern instance that
   can be used by an AwkMatcher object to perform pattern matching.
public Pattern compile(char[] pattern)
   Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK);
public Pattern compile(String pattern)
   Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK);

CASE_INSENSITIVE_MASK
public static final int CASE_INSENSITIVE_MASK

A mask passed as an option to the AwkCompiler.compile methods
to indicate a compiled regular expression should be case insensitive.

DEFAULT_MASK
public static final int DEFAULT_MASK

The default mask for the AwkCompiler.compile methods.
It is equal to 0 and indicates no special options are active.

MULTILINE_MASK
public static final int MULTILINE_MASK

A mask passed as an option to the AwkCompiler.compile methods
to indicate a compiled regular expression should treat input as having
multiple lines. This option affects the interpretation of
the . metacharacter. When this mask is used,
the . metacharacter will not match newlines. The default
behavior is for . to match newlines.

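The options argument is described as a set of flags, which conventionally means the masks are combined with bitwise OR and tested with bitwise AND. The sketch below shows that convention with stand-in constants. Only DEFAULT_MASK being 0 comes from the text above; the other two numeric values are placeholders chosen for illustration, and real code should reference the AwkCompiler constants rather than literal numbers. The class MaskSketch and the isSet helper are hypothetical.

```java
class MaskSketch {
    static final int DEFAULT_MASK = 0;               // stated in the documentation
    static final int CASE_INSENSITIVE_MASK = 0x0001; // assumed value, illustration only
    static final int MULTILINE_MASK = 0x0002;        // assumed value, illustration only

    // True when every bit of `mask` is present in `options`.
    static boolean isSet(int options, int mask) {
        return mask != 0 && (options & mask) == mask;
    }

    public static void main(String[] args) {
        // Combine two options into the single int passed as `options`.
        int options = CASE_INSENSITIVE_MASK | MULTILINE_MASK;
        System.out.println(isSet(options, CASE_INSENSITIVE_MASK));      // prints true
        System.out.println(isSet(options, MULTILINE_MASK));             // prints true
        System.out.println(isSet(DEFAULT_MASK, CASE_INSENSITIVE_MASK)); // prints false
    }
}
```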
_END_OF_INPUT
static final char _END_OF_INPUT

_isLowerCase
static boolean _isLowerCase(char token)

_isUpperCase
static boolean _isUpperCase(char token)

_isWordCharacter
static boolean _isWordCharacter(char token)

_toggleCase
static char _toggleCase(char token)

compile
public Pattern compile(char[] pattern, int options) throws MalformedPatternException

Compiles an Awk regular expression into an AwkPattern instance that
can be used by an AwkMatcher object to perform pattern matching.
Parameters:
   pattern - An Awk regular expression to compile.
   options - A set of flags giving the compiler instructions on how to treat the regular expression. Currently the only meaningful flag is AwkCompiler.CASE_INSENSITIVE_MASK.
Returns: A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can be reliably cast to an AwkPattern.
Throws: MalformedPatternException - If the compiled expression is not a valid Awk regular expression.

compile
public Pattern compile(String pattern, int options) throws MalformedPatternException

Compiles an Awk regular expression into an AwkPattern instance that
can be used by an AwkMatcher object to perform pattern matching.
Parameters:
   pattern - An Awk regular expression to compile.
   options - A set of flags giving the compiler instructions on how to treat the regular expression. Currently the only meaningful flag is AwkCompiler.CASE_INSENSITIVE_MASK.
Returns: A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can be reliably cast to an AwkPattern.
Throws: MalformedPatternException - If the compiled expression is not a valid Awk regular expression.

compile
public Pattern compile(char[] pattern) throws MalformedPatternException

Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK);
Parameters:
   pattern - A regular expression to compile.
Returns: A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can be reliably cast to an AwkPattern.
Throws: MalformedPatternException - If the compiled expression is not a valid Awk regular expression.

compile
public Pattern compile(String pattern) throws MalformedPatternException

Same as calling compile(pattern, AwkCompiler.DEFAULT_MASK);
Parameters:
   pattern - A regular expression to compile.
Returns: A Pattern instance constituting the compiled regular expression. This instance will always be an AwkPattern and can be reliably cast to an AwkPattern.
Throws: MalformedPatternException - If the compiled expression is not a valid Awk regular expression.