xtc.parser

Java Source Code / Java Documentation

1.	6.0 JDK Core
2.	6.0 JDK Modules
3.	6.0 JDK Modules com.sun
4.	6.0 JDK Modules com.sun.java
5.	6.0 JDK Modules sun
6.	6.0 JDK Platform
7.	Ajax
8.	Apache Harmony Java SE
9.	Aspect oriented
10.	Authentication Authorization
11.	Blogger System
12.	Build
13.	Byte Code
14.	Cache
15.	Chart
16.	Chat
17.	Code Analyzer
18.	Collaboration
19.	Content Management System
20.	Database Client
21.	Database DBMS
22.	Database JDBC Connection Pool
23.	Database ORM
24.	Development
25.	EJB Server geronimo
26.	EJB Server GlassFish
27.	EJB Server JBoss 4.2.1
28.	EJB Server resin 3.1.5
29.	ERP CRM Financial
30.	ESB
31.	Forum
32.	GIS
33.	Graphic Library
34.	Groupware
35.	HTML Parser
36.	IDE
37.	IDE Eclipse
38.	IDE Netbeans
39.	Installer
40.	Internationalization Localization
41.	Inversion of Control
42.	Issue Tracking
43.	J2EE
44.	JBoss
45.	JMS
46.	JMX
47.	Library
48.	Mail Clients
49.	Net
50.	Parser
51.	PDF
52.	Portal
53.	Profiler
54.	Project Management
55.	Report
56.	RSS RDF
57.	Rule Engine
58.	Science
59.	Scripting
60.	Search Engine
61.	Security
62.	Sevlet Container
63.	Source Control
64.	Swing Library
65.	Template Engine
66.	Test Coverage
67.	Testing
68.	UML
69.	Web Crawler
70.	Web Framework
71.	Web Mail
72.	Web Server
73.	Web Services
74.	Web Services apache cxf 2.0.1
75.	Web Services AXIS2
76.	Wiki Engine
77.	Workflow Engines
78.	XML
79.	XML UI

Java

Java Tutorial

Illustrator Tutorials

GIMP Tutorials

C# / C Sharp

C# / CSharp Tutorial

C# / CSharp Open Source

SQL Server / T-SQL Tutorial

Oracle PL / SQL

Oracle PL/SQL Tutorial

Flash / Flex / ActionScript

VBA / Excel / Access / Word

XML

XML Tutorial

Microsoft Office PowerPoint 2007 Tutorial

Microsoft Office Excel 2007 Tutorial

Microsoft Office Word 2007 Tutorial

Java Source Code / Java Documentation » Parser » Rats Parser Generators » xtc.parser

xtc.parser
Rats!, a parser generator supporting extensible syntax. Grammars for Rats! build on the Parsing Expression Grammar (PEG) formalism described in Brian Ford's Parsing Expression Grammars paper. However, since Rats! produces working parsers, the syntax of Rats! grammars is somewhat different from and more expressive than the PEG syntax described in the paper. Additionally, to make grammars more easily reusable and extensible, Rats organizes grammar fragments into modules. A good starting point for learning how to use Rats!, in addition to this introduction, are Rats!' own grammar in package `xtc.parser` and the C and Java grammars in package `xtc.lang`. The rest of this document covers the following topics: Syntax Overview of Expressions and Operators Grammar Modules Productions and Semantic Values Passing the Semantic Value Through Options, Repetitions, and Nested Choices Void and Text-Only Productions Generic Productions List-Valued Productions Left-Recursive Productions Node Markers in Generic Productions Modules, Name Spaces, and Parameters Module Modifications Memoization and Transient Productions Grammar and Production Attributes Parser Actions Syntax Here is the syntax of Rats!' grammar modules, expressed in PEG syntax: Module <- Spacing Intro Production* EOF Intro <- ModuleDecl Dependency* Header? Body? Footer? Option? ModuleDecl <- "module" FSpacing ModuleRef SEMICOLON Dependency <- Modification / Instantiation / Import Modification <- "modify" FSpacing ModuleRef ModuleTarget? SEMICOLON Instantiation <- "instantiate" FSpacing ModuleRef ModuleTarget? SEMICOLON Import <- "import" FSpacing ModuleRef ModuleTarget? SEMICOLON ModuleRef <- QName ModuleParams? ModuleParams <- OPEN ( QName (COMMA QName)* )? CLOSE ModuleTarget <- "as" FSpacing QName Header <- "header" Spacing Action Body <- "body" Spacing Action Footer <- "footer" Spacing Action Option <- "option" FSpacing Attribute (COMMA Attribute)* SEMICOLON Production <- Full / Addition / Removal / Override Full <- PAttributes QName Identifier EQUAL Choice SEMICOLON Addition <- QName Identifier PLUSEQUAL ( SName ELLIPSIS SLASH Choice SEMICOLON / Choice SLASH SName ELLIPSIS SEMICOLON ) Removal <- QName Identifier MINUSEQUAL SName ( COMMA SName )* SEMICOLON Override <- QName Identifier COLONEQUAL Choice SEMICOLON / QName Identifier COLONEQUAL ELLIPSIS SLASH Choice SEMICOLON / QName Identifier COLONEQUAL Choice SLASH ELLIPSIS SEMICOLON / PAttributes QName Identifier COLONEQUAL ELLIPSIS SEMICOLON PAttributes <- &(QName Identifier EQUAL) / Attribute PAttributes Choice <- Sequence (SLASH Sequence)* Sequence <- !(SName ELLIPSIS / ELLIPSIS) SName? Voided* Voided <- ("void" Spacing COLON)? Prefix Prefix <- (AND / NOT / CARET / Identifier COLON / StringLit Spacing COLON)? Suffix Suffix <- Primary (QUESTION / STAR / PLUS)? Primary <- NullLit / QName / Literal / NodeMarker / Action / OPEN Choice CLOSE NullLit <- "null" Spacing NodeMarker <- '@' Id Spacing Action <- '{' ActionBody* '}' Spacing ActionBody <- Action / CharLit / StringLit / MLComment / SLComment / !'}' . Attribute <- Identifier (OPEN AttValue CLOSE)? AttValue <- Integer / QName / StringLit Spacing QName <- Id ('.' Id)* Spacing SName <- LESS Id GREATER Identifier <- Id Spacing Id <- [a-zA-Z] [a-zA-Z0-9]* Literal <- ('_' / CharLit / StringLit / Class) Spacing CharLit <- ['] (Escape / !['\\] .) ['] StringLit <- ["] (Escape / !["\\] .)* ["] Class <- '[' (Char '-' Char / Char)* ']' Char <- Escape / ![-\]\\] . Escape <- '\\' [btnfr"'\[\\\]-] / '\\' 'u' HexQuad / OctalEscape OctalEscape <- '\\' ([0-3] OctDigit OctDigit / OctDigit OctDigit / OctDigit) Integer <- (HexNumber / OctNumber / Number) Spacing HexNumber <- '0' [xX] HexDigit+ HexQuad <- HexDigit HexDigit HexDigit HexDigit HexDigit <- [0-9a-fA-F] Number <- '0' / NZDigit Digit* NZDigit <- [1-9] Digit <- [0-9] OctNumber <- '0' OctDigit+ OctDigit <- [0-7] ELLIPSIS <- "..." Spacing PLUSEQUAL <- "+=" Spacing MINUSEQUAL <- "-=" Spacing COLONEQUAL <- ":=" Spacing COMMA <- ',' Spacing EQUAL <- '=' Spacing SLASH <- '/' ![/] Spacing AND <- '&' Spacing NOT <- '!' Spacing CARET <- '^' Spacing COLON <- ':' Spacing QUESTION <- '?' Spacing STAR <- '' Spacing PLUS <- '+' Spacing OPEN <- '(' Spacing CLOSE <- ')' Spacing SEMICOLON <- ';' Spacing LESS <- '<' GREATER >- '>' Spacing Spacing <- (Space / SLComment / MLComment)* FSpacing <- (Space / SLComment / MLComment)+ Space <- ' ' / '\t' / '\f' / EOL SLComment <- "//" (![\n\r] .)* EOL MLComment <- "/" ('' !'/' / !'' .) "/" EOL <- '\r' '\n' / '\r' / '\n' EOF <- !. Note that `QName` stands for "qualified name," `SName` stands for "sequence name," `PAttributes` for "production attributes," `AttValue` for "attribute value," `NZDigit` for "non-zero digit," `FSpacing` for "forced spacing," `SLComment` for "single-line comment," and `MLComment` for "multi-line comment." Overview of Expressions and Operators The biggest difference between parsing expression grammars and the more familiar context-free grammars (CFGs) is the use of ordered* choices instead of symmetric choices. Hence, the different options in a PEG choice (and also a Rats! choice) are separated by a slash `/` instead of a vertical bar `\|`. Furthermore, to emphasize that PEGs define how to parse a language instead of how to generate a language, PEGs use a left arrow `<-` instead of the right arrow `->` used in CFGs. Note that Rats! uses an equal sign instead. Otherwise, PEGs have many of the same expressions and operators as other syntax and grammar formalisms. For example, the any character constant `.` (`_` for Rats!) matches any character in the input, and a character class, as defined by the `[]` operator, matches any of the characters in the class. The option operator `?` makes an expression optional, and the star `` and plus `+` operators indicate zero-or-more and one-or-more repetitions, respectively. Somewhat less common are the and `&` and not `!` operators, which denote syntactic predicates. Expressions in a syntactic predicate must match (for the and `&` operator) or not match (for the not `!` operator) the input, though the corresponding text in the input is not* consumed. Rats! grammars differ from PEGs in that they have additional expressions and operators necessary for generating actual parsers. Most importantly, Rats! grammars include actions that generate semantic values. Actions are surrounded by curly brackets `{}`, just like blocks of code in C, C++, or Java. To access such semantic values, Rats! grammars can also include bindings. Bindings first specify the variable name, followed by a colon `:`, and then the expression to be bound. Additionally, Rats! grammars support semantic predicates, which are denoted by the and `&` operator directly followed by an action (which must be an expression evaluating to a boolean value). Furthermore, Rats! grammars support text matching expressions, which first specify the text to be matched as a string literal, followed by a colon `:`, and then the expression to be matched. A text matching expression `"`text`":`expr is equivalent to the expression fresh-variable`:`expr `&{ "`text`".equals(`fresh-variable`) }` but implemented more efficiently. Rats! grammars also support a voiding operator, which is specified as "`void:`" followed by an expression, node markers, which are specified as an at sign '`@`' immediately followed by an identifier, and parser actions, which are actions prefixed by a caret '`^`'. Voided expressions and node markers help with the automatic generation of abstract syntax trees and are explained here. Parser actions provide a low-level interface for extending parsers generated by Rats! and are explained here. Other differences between PEGs and Rats! grammars include that, as already mentioned Rats! uses an underscore '`_`' as the any character constant, while PEGs use a dot '`.`'. Furthermore, Rats!, just like C/C++/Java, requires that single-quoted literals specify exactly one character, while double-quoted literals may specify an arbitrary number of characters. Escape sequences include basic C/C++/Java escapes such as '`\n`' and '`\\`', Java Unicode escapes such as '`\u00ff`', and '`\[`', '`\-`', and '`\]`' for character class specifications. Rats! grammars also use standard C/C++/Java comments. Note that, while Rats! supports Unicode, it only supports 16 bit characters in the basic multilingual plane (i.e., with code points between U+0000 and U+FFFF). Expressions recognizing supplementary characters (i.e., with code points between U+10000 and U+10FFFF) need to use Java's encoding as a pair of `char` values in the surrogates range. Grammar Modules A Rats! grammar consists of one top-level module, which is the module specified on the command line when invoking Rats!, and zero or more dependent modules. Each module starts with several declarations, whose syntax follows from `Intro` in the Rats! syntax specification above: The `module` declaration specifies the fully qualified name of a grammar module. Optionally, the module may have one or more parameters, which are treated as module names and are replaced with the actual arguments throughout the module, notably in module dependency declarations and in qualified nonterminals (a qualified nonterminal consists of the module name, followed by a dot '`.`', followed by the unqualified nonterminal). Module parameters are discussed in more detail here. Zero or more module dependency declarations specify how the current module depends on other modules. Rats! supports three types of such dependency declarations. First, import dependencies make another module's nonterminals referenceable from within the current module. Second, modification dependencies make another module's productions available for modification in the current module. Each module can have at most one modification dependency. Finally, instantiation dependencies instantiate other modules (typically, modules that require arguments) and make their names available for use in other dependencies. The effects of import and instantiation declarations are discussed in more detail here; module modifications are discussed in detail here. The action code following an optional `header` declaration is copied into the parser just before the parser class declaration. If a grammar spans several modules, all unique header actions are copied into the parser class. The action code following an optional `body` declaration is copied into the body of the parser class. Again, if a grammar spans several modules, all unique body actions are copied into the parser class. The action code following an optional `footer` declaration is copied just after the parser class declaration. As for header and body declarations, if a grammar spans several modules, all unique footer actions are copied into the parser class. An optional `option` declaration specifies parser options. Each option is expressed as an attribute, which is an identifier specifying the attribute's name followed by an optional integer literal, qualified name, or string within parentheses `()` specifying the attribute's value. The currently recognized options are discussed here. A grammar's top-level module (i.e., the module specified on the command line when invoking Rats!) is treated somewhat differently from its dependent modules: By default, the Java package and class names of the generated parser are automatically deduced from the fully qualified name of the top-level module. For example, the default package and class names for module `xtc.parser.PGrammar` would be `xtc.parser` and `PGrammar`, respectively. However, this default can be overriden through the `parser` option (and is, in fact, overriden by that module). Option declarations are outside the module system, and only the top-level module's options are observed by the parser generator. The exceptions are the following options. First, the `stateful` option specifies a global state object and must have the same value across all of a grammar's modules (if it appears at all). Second, the `setOfString` option specifies a global set of strings and has the corresponding field's name as its value; each unique value results in its own global field. Finally the `flag` option specifies a global boolean field and has the corresponding field's name as its value; each unique value results in its own global field. The top-level module must have at least one production that is marked as `public`, which indicates that that production is a top-level production and thus visible outside the parser class. Other modules' public productions are not treated as top-level production for the grammar. Finally, because the top-level module is directly instantiated, it cannot have any parameters. Only dependent modules may have parameters; the corresponding arguments are specified when importing, instantiating, or modifying the module. Similar to the Java compiler automatically loading dependent Java classes, Rats! automatically loads dependent modules from the file system. For example, when looking for the dependent module `xtc.util.Spacing`, Rats! searches for the file `xtc/util/Spacing.rats` on Unix-like systems. One or more root directories for this search can be specified through the `-in` command line option. A grammar's modules can be visualized by using Rats!' `-html` command line option combined with either the `-instantiated`, `-applied`, or `-processed` command line option to specify the stage of Rats!' processing. In all three cases, Rats! creates a hyperlinked version of the grammar at that processing stage. Make sure that the `grammar.css` stylesheet is in the same directory as the generated HTML file(s). The output directory can be controlled with the `-out` command line option. Productions and Semantic Values After the module declarations follow the productions, with each production defining how to parse a nonterminal. Regular modules may only contain full productions, whose syntax follows from `Full` in the Rats! syntax specification above. Module modifications may also contain alternative additions, alternative removals, and production overrides, which are discussed below. Each full production first lists zero or more attributes, then declares the type of the semantic value it returns, and then specifies the name of the nonterminal it defines (which must be unqualified). The definition follows after the equals sign `=` and is terminated by a semicolon `;`. Each full production can either be public, protected, or private, as indicated by the corresponding attribute of the same name. A public production is a top-level production for the grammar if it appears in the top-level module. Otherwise, it is treated as a protected production, which is the default, can be omitted, and makes the production referenceable from other modules. Finally, a private production can only be referenced from within the same module. Note that the use of the `transient` attribute is explained here. All per-production attributes supported by Rats! are discussed here. Further note that types cannot be primitive types, such as `int`; though, as discussed here, `void` is allowed. A production's semantic value is usually defined in an action by setting the `yyValue` variable (so named in deference to the Yacc parser generator) to the desired value. The action code can use characters or strings matched by a literal as well as semantic values returned by nested nonterminals through bindings. As specified above, each binding first declares the variable, followed by a colon, and then the bound expression. For example, this is the production for the `FullProduction` nonterminal from Rats!' own grammar: FullProduction FullProduction = atts:ProductionAttributes type:Name nt:UnqualifiedNonTerminal "=":Symbol choice:Choice ";":Symbol { yyValue = new FullProduction(atts.list(), type, nt, choice); } ; The production first declares the type of the semantic value it returns to be `FullProduction` (which only coincidentally has the same name as the production). In the definition, the semantic value of recognizing the production's attributes is bound to the variable `atts`, the semantic value returned by the `Name` production is bound to the variable `type`, the semantic value returned by the `UnqualifiedNonTerminal` production is bound to the variable `nt`, and the semantic value returned by the `Choice` production is bound to the variable `choice`. The action code then uses these variables to create a new `FullProduction` object and assigns that object to `yyValue`, thus making the newly created object the production's semantic value. In the generated parser, Rats! declares the types of variables as following: Variables bound to the any character constant `_`, to a character literal, or a character class are declared as `char`. Variables bound to a string literal are declared as `String`. Variables bound to a nonterminal are declared to be of the type of the corresponding production. For example, the declared type of the `Name` production from the above example is `String` and, consequently, `type` is declared to be a `String` as well. Similarly, `yyValue` is declared to be of the type of the corresponding production. Passing the Semantic Value Through Sometimes, the semantic value of a production is the same as the semantic value of one of the expressions appearing in that production. To avoid first binding that expression to some variable and then assigning that variable to `yyValue` , it is possible to directly bind `yyValue`. For example, the production for the `Header` nonterminal from Rats!' own grammar makes use of this feature: Action Header = "header":Word yyValue:Action ; An equivalent, but more verbose definition might look like this: Action Header = "header":Word a:Action { yyValue = a; } ; Options, Repetitions, and Nested Choices In general, Rats! lifts options, repetitions, and nested choices into their own, newly generated productions. It also desugars options and repetitions into equivalent productions without the option or repetition operators. Note that nested choices are not lifted if they are the last element in a sequence. Further note that repetitions are not desugared if they appear in a transient production. Finally, options are not desugared if they are not bound or if Rats can automatically deduce the value of the bound optional expression. This lifting and desugaring raises several questions: What is the semantic value of an expression before applying the option or repetition operator? Similarly, what is the semantic value of a nested choice? For options and repetitions, what is the semantic value after applying the option or repetition operator? Finally, for each of these expressions, what is the type of the newly generated production? We now answer these questions in turn. To determine the semantic value of an expression before applying the option or repetition operator and that of a nested choice, Rats! generally requires that each expression has its own, embedded action that defines the corresponding value. As a result, the definition of a production may contain several actions that all assign values to `yyValue`, even in nested expressions. However, for some expressions, Rats! is able to automatically deduce their semantic values. Notably, if a nonterminal is optional or repeated, the semantic value of the expression before applying the option or repetition operator simply is the semantic value of the nonterminal's production. Similarly, if a nonterminal is the only expression appearing in an option, the semantic value for that option simply is the semantic value of the nonterminal's production. In these cases, no embedded actions are necessary. For options, the semantic value after applying the option operator is the semantic value of the expression, if that expression can be matched in the input. Otherwise, the overall semantic value is simply `null`. For repetitions, the semantic value after applying the repetition operator is a {@link xtc.util.Pair}. Pairs are used to implement singly-linked lists and contain the semantic values of the repeated expressions. In the case of zero matches, the pair is the special {@link xtc.util.Pair#EMPTY empty pair}, which is also accessible through the type-safe {@link xtc.util.Pair#empty()} method. Rats! uses pairs as they allow for efficient addition of an element to the front of a sequence of pairs. Note that pairs can easily be converted into a Java Collections Framework list by calling {@link xtc.util.Pair#list()} (as illustrated in the production for Rats! productions shown above). The type of a newly generated production representing a desugared option generally is `Object`. However, if Rats! can automatically determine the semantic value of an optional expression, it uses the more specific type (which, in the case of desugared optional nonterminals, is the type of the corresponding production). The type of a newly generated production representing a desugared repetition always is `xtc.util.Pair`. Finally, the type of a newly generated production representing a lifted choice always is `Object`. To illustrate the general case of repetitions and nested choices, consider the following snippet from Rats!' own grammar, which is taken from the `Terminal` production and parses character class specifications: / '[' l:( c1:ClassChar '-' c2:ClassChar { yyValue = new CharRange(Utilities.unescape(c1).charAt(0), Utilities.unescape(c2).charAt(0)); } / c1:ClassChar { yyValue = new CharRange(Utilities.unescape(c1).charAt(0)); } )* ']' Spacing { yyValue = new CharClass(l.list()); } The nested choice has its own actions, which create character ranges as their semantic values. The parser collects these character ranges into a sequence of pairs, which is then bound to the `l` variable. The outer action uses this sequence of pairs to create a new character class as the overall semantic value (converting the sequence of pairs into a Java Collections Framework list along the way). To illustrate the automatic deduction of a semantic value, consider the production for sequences from Rats!' own grammar: Sequence Sequence = !Ellipsis n:SequenceName? l:Voided* { yyValue = new Sequence(n, l.list()); } ; A sequence essentially consists of zero or more repetitions of the `Voided` nonterminal; no embedded action is necessary for the repetition. Rats! automatically collects the individual semantic values into a sequence of pairs. The optional sequence name preceding the repeated `Voided` nonterminal simply is an identifier between less-than `<` and greater-than `>` signs and is used when modifying productions. It can also be used for documentation, as Rats! tries to preserve it throughout its internal processing, so that it will be included in the generated parser (within a comment). The corresponding optional expression does not need an embedded action either. The syntactic predicate `!Ellipsis` excludes sequence names followed by an ellipsis `...` or an ellipsis by itself from sequences, as they are used to modify productions. Void and Text-Only Productions Adding actions to create a production's semantic value can seem tedious and may make grammars unnecessarily verbose. To reduce the need for explicit actions, Rats! supports special types of productions, which need no semantic actions at all. We now discuss void and text-only productions, which are especially useful for recognizing lexical syntax (such as identifiers, punctuation, numbers, and so on). A void production is a production that declares the type of its semantic value to be `void`. Such a production does not require any actions, but it may not be bound to an identifier either. A void production is typically used for ignored spacing or punctuation elements. Void productions also improve the accuracy of Rats!' deduction of a compound expression's semantic value. If the compound expression only contains a single non-void nonterminal, that nonterminal's semantic value is used as the semantic value of the entire expression. Here is an example void production from Rats!' own grammar: transient void Spacing = (Space / LineTerminator / TraditionalComment/ EndOfLineComment)* ; This production matches space characters, line terminators, and comments. Note that the `transient` keyword is explained here. The fact the `Spacing` is declared as `void` is then used in the production for `Symbol`: String Symbol = SymbolCharacters Spacing ; No action is necessary because only the `SymbolCharacters` nonterminal produces a semantic value. Rats! automatically detects this case and passes the value of `SymbolCharacters` through. In general, Rats! tries to deduce a compound expression's value, even if it is nested within another expression, ignoring nonterminals referencing a void production or explicitly voided expressions. Furthermore, for repetitions, options, and nested choices that do not appear as the last expression in a sequence, it treats the entire repetition, option, or nested choice as void, if all component expressions are nonterminals referencing void productions, explicitly voided expressions, or predicates. For example, if `nt` references a void production, then the expression "`nt`" is equivalent to "`void:(nt)`" (note that the parentheses are not strictly necessary). A text-only production is a production that declares the type of its semantic value to be a `String`, does not contain any bindings to `yyValue` nor actions that reference `yyValue` and references only other text-only productions (if any). The semantic value of a text-only production is the text matched in the input, represented as a `String`. Note that the above `Symbol` production is not text-only because it references a void production. Also note that bindings (besides to `yyValue`) and semantic predicates may appear in text-only productions. To illustrate text-only productions, consider the following productions from Rats!' own grammar: String StringLiteral = ["] ( EscapeSequence / !["\\] _ )* ["] ; transient String EscapeSequence = '\\' ( [btnfr\"\'\-\[\\\]] / 'u' HexNumber ) ; transient String HexNumber = HexDigit HexDigit HexDigit HexDigit ; transient String HexDigit = [0-9a-fA-F] ; The `StringLiteral` production only references terminals and other productions that are also text-only. As a result, its semantic value is the text matched in the input, exactly what we expect from a production that matches string literals. Note that the `transient` keyword is explained here. A note on bindings within text-only productions: When binding to a character terminal (the any character constant, a character class, or a character literal), the bound variable is a `char` (just as for all other productions). However, for all other expressions (including options, repetitions, and nested choices), the value of the bound variable always is a string representing the text matched by that expression in the input. For options that cannot be matched in the input, the value is the empty string (and not `null`). Generic Productions Void and text-only productions typically help with recognizing lexical syntax; though, they are of limited use for productions that recognize hierarchical syntax, such as the productions for `FullProduction`, `Header`, or `Sequence` shown above. Generic productions help simplify productions for hierarchical syntax. Just as for void and text-only productions, no explicit semantic actions are required. A generic production is a production that declares `generic` as its type and returns a semantic value that is a {@link xtc.tree.GNode generic node}. Rats! automatically creates the production's semantic value: its name is the name of the production, and its children are the semantic values of the component expressions in the recognized sequence, with the exception of any voided expressions (using the `void:` operator), void nonterminals, and character terminals, which are ignored. For example, we could rewrite the production for `FullProduction` as a generic production: generic FullProduction = ProductionAttributes Name UnqualifiedNonTerminal void:"=":Symbol Choice void:";":Symbol ; ; The rewritten production has a generic node as its semantic value; the grammar author thus does not need to define a `FullProduction` class to represent the semantic value. Furthermore, the rewritten production is equivalent to the following production with an explicit semantic action: Node FullProduction = v1:ProductionAttributes v2:Name v3:UnqualifiedNonTerminal "=":Symbol v4:Choice ";":Symbol { yyValue = GNode.create("Production", v1, v2, v3, v4); } ; The two symbol references are not bound because they are explicitly voided. Options, repetitions, and nested choices in generic productions are treated just like the corresponding expressions in regular productions. In other words, they may be desugared and lifted into their own productions. As a result, they require explicit actions to create their semantic values, unless, of course, Rats! can automatically determine the corresponding values. List-Valued Productions If a grammar has the `flatten` attribute, pairs resulting from a repeated expression are treated somewhat differently in generic productions: the values on the list are added to the production's generic node as individual children. For example, consider this production for recognizing parameter lists in C: generic ParameterList = ParameterDeclaration (void:",":Symbol ParameterDeclaration)* ; Because the semantic values of the embedded repetition are added as invidividual children to the production's generic node, that node's children are all parameter declarations, which is exactly the desired behavior. Alternatively, Rats! can automatically deduce the semantic value of productions with a `Pair` type. The production's value is a list of all of a sequence's component expressions. For example, consider this rewritten production for recognizing parameter lists: Pair<Node> ParameterList = ParameterDeclaration (void:",":Symbol ParameterDeclaration)* ; Beacuse the declared type is `Pair<Node>`, Rats! automatically deduces the semantic value of the production. It is a list containing the value of the first parameter declaration followed by the value of the repetition. The production is equivalent to the following production using an explicit semantic action: Pair<Node> ParameterList = head:ParamterDeclaration tail:(void:",":Symbol ParameterDeclaration)* { yyValue = new Pair<Node>(head, tail); } ; As illustrated by this version, Rats! recognizes when the last component expression in a list-valued production already has a list value and turns that value into the overall list's tail. If the last component expression does not have a list value, the overall list's tail is the empty list. Whether to (1) use the `flatten` attribute so that list elements are added as individual children to generic nodes or (2) use list-valued productions to create lists and preserve them as generic nodes' children is a trade-off. With the `flatten` attribute, ASTs are more uniform, typically consisting of only generic nodes and strings. In contrast, without the `flatten` attribute, generic nodes may also contain lists of, say, nodes or strings. However, flattening lists may result in the loss of information. For example, consider this production for Java compilation units: public generic CompilationUnit = Spacing PackageDeclaration? ImportDeclaration* Declaration* EndOfFile ; When lists are flattened, there is no way to distinguish import declarations from regular declarations without looking at the actual children. In contrast, when lists are not flattened, the generic node for compilation units has exactly three children, corresponding to package, import, and regular declarations respectively. Additionally, the preservation of lists in generic nodes enables more exact typing of ASTs through Rats!' `-ast` option. Consequently, the recommended practice is not to use the `flatten` attribute. Left-Recursive Productions To simplify the specification of productions that recognize expressions at different precedence levels, void, text-only, and generic productions may contain direct left-recursions; though arbitrary left-recursions are illegal in Rats! grammars just as for PEGs. Directly left-recursive productions are automatically transformed into equivalent right-iterative productions, while preserving the left-recursive structure of the original production's semantic value. As an example, consider the following generic production for recognizing logical and expressions in C: generic LogicalAndExpression = LogicalAndExpression void:"&&":Symbol BitwiseOrExpression / BitwiseOrExpression ; This directly left-recursive production is equivalent to the following two productions, which leverage so-called actions (i.e., promises) to create the semantic values: Node LogicalAndExpression = seed:BitwiseOrExpression actions:LogicalAndExpressionTail* { yyValue = apply(actions, seed); } ; Action<Node> LogicalAndExpressionTail = "&&":Symbol right:BitwiseOrExpression { yyValue = new Action<Node>() { public Node run(Node left) { Node result = GNode.create("LogicalAndExpression", left, right); result.setLocation(location(yyStart)); return result; }}; } ; The `LogicalAndExpressionTail` expression creates a list of {@link xtc.util.Action actions}, with each action on the list creating a generic node that is annotated with the source location corresponding to the start of the production. The `LogicalAndExpression` production uses {@link xtc.parser.ParserBase#apply(Pair,Object)} to apply each action in the list onto the result of the previous action, starting with the `seed` value for the first action. If the list is empty, the seed value is simply passed through, which is the desired behavior. By using actions and thus delaying the creation of generic nodes, the rewritten productions ensure that the resulting abstract syntax tree preserves left-associativity. Note that actions can also be used outside of generic productions to create left-associative abstract syntax trees. In the previous example, the use of a list of actions results in the correct treatment of the base case: no new generic node is created, rather the semantic value of the bitwise or expression is simply passed through. When specifying productions that do not use actions, grammar writers can achieve similar results by explicitly binding `yyValue`. For alternatives in a generic production that assign `yyValue` either through a binding or a semantic action, Rats!* does not create a new generic node but rather uses the explicitly specified semantic value. Node Markers in Generic Productions Also in the previous example, there is only one recursive alternative; consequently, the production's name, i.e., "`LogicalAndExpression`", provides a meaningful name for the newly created generic node. However, languages such as C contain more than one left-recursive operator at the same precedence level. Using the features described so far, grammar developers can either write one directly left-recursive generic production for all operators, thus using the same name for all operators' generic nodes, or they can write several right-iterative productions that create the generic nodes through explicit semantic actions. Neither option is particularly attractive and node markers provide a more elegant solution. They are written as an at sign '`@`' immediately followed by an identifier and specify a generic node's name for nodes generated by the sequence they appear in. As an example, consider the following generic production for recognizing postfix expressions in C: generic PostfixExpression = <Subscript> PostfixExpression void:"[":Symbol Expression void:"]":Symbol @SubscriptExpression / <DirectSelection> PostfixExpression void:".":Symbol Identifier @DirectComponentSelection / <IndirectSelection> PostfixExpression void:"->":Symbol Identifier @IndirectComponentSelection / <FunctionCall> PostfixExpression void:"(":Symbol ExpressionList? void:")":Symbol @FunctionCall / <Increment> PostfixExpression void:"++":Symbol @PostincrementExpression / <Decrement> PostfixExpression void:"--":Symbol @PostdecrementExpression / <Compound> CompoundLiteral / <Primary> PrimaryExpression ; In a single directly left-recursive production, the example specifies all of C's postfix expressions. Yet, each recursive alternative contains a distinct node marker, thus resulting in a differently named generic node. Internally (and as described above), Rats! converts the directly left-recursive production into the corresponding right-iterative version, which uses actions to create left-recursive AST nodes. Note that the last two alternatives do not require node markers (nor bindings to `yyValue`) because they provide the base cases for the recursion. Modules, Name Spaces, and Parameters The intent behind Rats!' module system is to facilitate the re-use and extensibility of grammar specifications. Even without parameters and modifications, basic modules help, as they allow a grammar writer to break up a grammar into several units. All modules exist in the same global name space. Without parameters and modifications, this name space consists simply of the module names as specified by the corresponding `module` declarations. With parameters and modifications, this name space consists of the module names of all instantiated modules (with instantiation being the process of providing arguments to parameterized modules). In contrast to module names, nonterminals are only meaningful in relation to a specific module. Without any `import` or `modify` declarations, a module can only reference the nonterminals defined in that module. In the presence of `import` or `modify` declarations, a module can also reference the public and protected nonterminals defined by imported modules and all nonterminals defined by modified modules. Nonterminals may be ambiguous, e.g., the same nonterminal may be defined within a module and an imported module or in several imported modules. Rats! gives precedence to nonterminals defined in the same module as the nonterminal reference. For example, if module `Foo` defines nonterminal `Name` and imports module `Bar`, which also defines nonterminal `Name`, then a reference to `Name` appearing in one of `Foo`'s productions is interpreted to mean `Foo`'s `Name`. `Bar`'s `Name` can still be referenced by using the qualified notation `Bar.Name`. Furthermore, if module `Foo` does not define `Name` but imports `Bar` and `Baz`, which both define `Name`, any reference to `Name` in `Foo` must be qualified, writing `Bar.Name` and `Baz.Name` respectively. Note that, while qualified nonterminals are globally unique and thus always unambiguous, a nonterminal, whether qualified or not, can only be used if the corresponding production's module is the same module, imported by the referencing module, or modified by the referencing module. Parameterized modules improve re-use when compared to basic modules, as the same parameterized module can be used with different actual dependencies. For example, module `xtc.util.Symbol` takes one parameter for a module defining spacing: module xtc.util.Symbol(Spacing); import Spacing; String Symbol = SymbolCharacters Spacing ; transient String SymbolCharacters = ..elided.. ; As a result, this module can be instantiated with different forms of spacing, without changing `xtc.util.Symbol`. At the same time, an instantiation must provide a module that specifies a void nonterminal `Spacing` for the resulting module to be meaningful. A parameterized module is instantiated by specifying the corresponding arguments in the corresponding `import` declaration: import xtc.util.Symbol(xtc.util.Spacing); This `import` declaration without a target name is equivalent to the following declaration with a target name: import xtc.util.Symbol(xtc.util.Spacing) as xtc.util.Symbol; In either case, all occurrences of the parameter module name `Spacing` are replaced by the specified argument module name `xtc.util.Spacing`. Furthermore, all occurrences of the module name itself are replaced by the specified target name (which, in this example, is the same as the module name). Finally, the instantiated module becomes the sole module of that name in the global module name space. No other instantiation with the same target name is possible, unless it instantiates exactly the same module with the same arguments. Note that the nonterminal `Spacing` in the production for `Symbol` is not renamed during instantiation, as it denotes a nonterminal and not a module. However, if the nonterminal was written as `Spacing.Spacing`, it would be renamed to `xtc.util.Spacing.Spacing`. Further note that `xtc.util.Symbol` could be instantiated with a different argument within the same grammar, as long as that instantiation's target name is different. Finally, note that once instantiated, other modules can reference the instantiated module through its target name. For example, after module `xtc.util.Symbol` has been instantiated through either of the above `import` declarations, it can also be imported as following: import xtc.util.Symbol; This import declaration, when appearing in a dependent module, references the same instantiated module. While module parameters and arguments can only be module names and never more complex expressions, sometimes more complex arguments are desirable. For example, a grammar writer may want to instantiate `xtc.util.Symbol` with a more complex module `my.Spacing`, which requires its own argument `my.Argument`. In this case, the grammar writer can use an `instantiate` declaration, which instantiates a module but does not make its nonterminals accessible from within the instantiating module. In the example, the grammar writer would use the following declarations: instantiate my.Spacing(my.Argument) as Spacing; import xtc.util.Symbol(Spacing); Module dependencies are resolved through a breadth-first search starting with the top-level module. In other words, a dependent module's dependencies are only processed after all the dependencies of the depending module have been resolved. As a result, the order of `import`, `instantiate`, and `modify` declarations in a module does not matter and mutually dependent modules can be instantiated within the same module. For example, `xtc.lang.CSpacing` has a parameter for a constant module and `xtc.lang.CConstant` has a parameter for a spacing module, with each module importing the parameterized module. To instantiate these two mutually dependent modules, module `xtc.lang.C` simply declares (with some intermediate declarations omitted and ignoring the additional `xtc.lang.CState` argument): instantiate xtc.lang.CConstant(xtc.lang.CSpacing); instantiate xtc.lang.CSpacing(xtc.lang.CState, xtc.lang.CConstant); Because it uses breadth-first search, Rats! processes these two `instantiate` declarations before processing the corresponding `import` declarations in `xtc.lang.CConstant` and `xtc.lang.CSpacing`. By the time Rats reaches the `import` declarations in the dependent modules, the modules have been instantiated and no arguments are necessary. Module Modifications To further facilitate re-use and extensibility, Rats! supports module modifications, which concisely express extensions to a grammar and, in addition to full productions, can also contain alternative additions, which add new alternatives to a production's top-level choice, alternative removals, which remove alternatives from a production's top-level choice, and production overrides, which can override an entire production, specific alternatives, or a production's attributes. All three types of partial production, whose syntax is specified above and illustrated below, depend on the different alternatives in a production's top-level choice having sequence names. Consequently, it is good practice to always name your sequences! A module modification must contain a single `modify` declaration, which specifies the module to be modified. For example, the following declaration specifies that module `xtc.lang.JavaSymbol` modifies module `Symbol`: module xtc.lang.JavaSymbol(Symbol); modify Symbol; The example uses a module parameter to ensure that module `xtc.lang.JavaSymbol` can be applied to different versions of `Symbol`. The resulting module contains all full productions appearing in both the modifying and modified modules (in the example, `xtc.lang.JavaSymbol` and `Symbol`, respectively). Furthermore, all productions in the resulting module are modified as specified by the alternative additions, alternative removals, and production overrides in the modifying module. Modifications are applied in the same order as they appear in the modifying module (in the example, `xtc.lang.JavaSymbol`). The resulting productions are the only versions; i.e., all nonterminals, whether they originally appear in the modifying module or the modified module reference the modified productions. The resulting module's options are the options of the modifying module, with exception of any `stateful`, `setOfString`, and `flag` options, which are preserved if they appear in the modified module's options. Furthermore, the modifying module's header, body, and footer actions are combined with the modified module's actions (if they exist). An alternative addition adds an expression before or after an existing sequence. For example, the following addition adds sequence `s` after sequence `<Bar>` in production `Foo`: Type Foo += <Bar> ... / <NewSequence> s ; Similarly, the following addition adds the new sequence before sequence `<Bar>`: Type Foo += <NewSequence> s / <Bar> ... ; Note that the new expression may actually consist of several sequences and not only one as suggested by the examples. Further note that the type and nonterminal must both be specified and match an existing full production. An alternative removal removes one or more sequences from a production. For example, the following removal eliminates sequences `<Bar>` and `<Baz>` from production `Foo`: Type Foo -= <Bar>, <Baz> ; As before, the type and nonterminal must both be specified and match an existing full production. A production override can replace all alternatives, only specific alternatives, or a production's attributes. For example, the following override replaces all alternatives of production `Foo` with expression `e`: Type Foo := e ; In contrast, this override only replaces alternatives `<Bar>` and `<Baz<`: Type Foo := ... / <Bar> s1 / <Baz> s2 ; Finally, this override replaces `Foo`'s attributes with the `public` attribute: public Type Foo := ... ; Note that the list of attributes may be empty, thus removing all attributes from the full production. Memoization and Transient Productions Packrat parsers process their input in time linear to the size of the input, even if they need to backtrack. The reason they can do this is that they store, or memoize, the result of trying each nonterminal at every input position – hence the name "packrat" parser. The benefits of memoization are best illustrated with a production from Rats!' own grammar: Element Suffix = ( p:Primary "?":Symbol { yyValue = new Option(p); } / p:Primary "":Symbol { yyValue = new Repetition(false, p); } / p:Primary "+":Symbol { yyValue = new Repetition(true, p); } / Primary ) ; The `Suffix` production matches possible suffix expressions in the input, with each of the four alternatives in the choice first trying to parse a primary expression. Assume that the current input contains a primary expression without* a suffix (i.e., the fourth alternative is the alternative that will succeed). If the parser did not memoize its results, the primary expression would be parsed four times, once for each alternative, resulting in suboptimal performance. However, because parsers generated by Rats! memoize their results, the primary expression is parsed exactly once. Each subsequent alternative simply looks up the result of the first parse. Note that it is possible to rewrite the production for `Suffix` to not require backtracking. An alternative expression, without semantic actions, might look like this: Primary ( Question / Star / Plus / /* Empty / ) Further note that other top-down parsers, such as JavaCC and ANTLR, may not require backtracking, even with a comparable production to the one used in Rats!' own grammar as they support (limited) lookahead. However, the original form clearly and concisely expresses the intent behind this production and, because parsers generated by Rats!* memoize their results, does not require any fiddling with lookahead (as would be necessary for other top-down parsers). In theory, packrat parsers create a two-dimensional array, with characters in the input stream along the x-axis and nonterminals along the y-axis. In practice, not all values in this array are calculated (many of them represent mismatched nonterminals anyway) and the array is sparse. To avoid allocating memory for the entire array, parsers generated by Rats! break each column into several chunks. Memory for the different chunks is allocated independently and on demand (i.e., if a parsing result needs to be stored). Furthermore, for productions that are only referenced once in the entire grammar the corresponding row is never allocated. Grammar writers have further control over memory allocation by declaring productions to be `transient`: if a production is transient, the corresponding row is not allocated either. The transient keyword can thus reduce a parser's memory footprint and also improve its performance. However, it is also dangerous because, if overused, the resulting parser may need to reparse the same input for the same nonterminal several times and thus perform in time superlinear to the size of the input. So, which productions should be declared transient? In other parsers, the parser processes tokens generated by a separate lexer instead of accessing the character stream directly. Packrat parser grammars thus need to include productions that build up tokens. Examples include the `Escape`, `HexNumber`, and `HexDigit` productions in Rats!' own grammar (shown above). Such productions can be declared transient, as the parser typically does not backtrack at the level of individual characters. Furthermore, productions that cover ignored input characters, notably all productions that cover white space and comments, can be declared transient as well. However, when in doubt, it is best to not declare a production as transient, or to perform performance studies on several inputs to see how such declarations impact the parser's memory footprint and performance. Grammar and Production Attributes Rats! supports a number of options, which are either specified as a grammar-wide option in the top-level module's introduction or as per-production attributes: `public`, `protected`, and `private` instruct Rats! to make a production a top-level production, referenceable from other modules, and referenceable only from the same module, respectively. These attributes do not have values. Furthermore, these attributes can only be specified as per-production attributes. The default is `protected` and can be omitted from productions. `transient` instructs Rats! not to memoize the corresponding production. The attribute does not have a value and can only be specified as a per-production attribute. `memoized` instructs Rats! to always memoize the corresponding production. The attribute does not have a value and can only be specified as a per-production attribute. `inline` instructs Rats! not to memoize the corresponding production. For productions that are neither void nor text-only, it also instructs Rats! to inline the production if the corresponding nonterminal appears as the only element in an ordered choice's alternative. The attribute does not have a value and can only be specified as a per-production attribute. `noinline` instructs Rats! not to inline the production, even if it is (recognized as) transient. The attribute does not have a value and can only be specified as a per-production attribute. `constant` instructs Rats! to make all bindings constant and thus unmodifiable. The attribute does not have a value. This attribute can be specified either as a grammar-wide attribute or as a per-production attribute. `withLocation` instructs Rats! to annotate all semantic values that are instances of {@link xtc.tree.Node} with their locations in the source file. The attribute does not have a value. This attribute can be specified either as a grammar-wide attribute or as a per-production attribute. `stateful` instructs Rats! to include code for managing global parsing state through a {@link xtc.util.State state object}. This attribute is used both as a grammar-wide attribute and a per-production attribute. The grammar-wide attribute value specifies the class managing the global state (which must have a no-argument constructor). Rats! includes a static final field `yyState`, which references an instance of that class, in the generated parser. For a production with this attribute, the attribute does not have a value and Rats! includes the appropriate calls to {@link xtc.util.State#start()}, {@link xtc.util.State#commit()}, and {@link xtc.util.State#abort()}. Note that the state object must be reset explicitly with the per-production `resetting` attribute. Further note that, if a grammar spans several modules, all the module's stateful options must specify the same class name. `resetting` instructs Rats! to include code to {@link xtc.util.State#reset(String) reset} the global state object. The attribut does not have a value. This attribute can only be specified as a per-production attribute and requires a grammar-wide `stateful` attribute. `ignoringCase` instructs Rats! to perform comparisons for string matches in a case-insensitive manner. The attribute does not have a value. This attribute can be specified either as a grammar-wide attribute or as a per-production attribute. `flatten` instructs Rats! to add the elements of list values (i.e., instances of {@link xtc.util.Pair}) to generic nodes instead of adding the list values themselves. This attribute does not have a value and can only be specified as a grammar-wide attribute. `variant` instructs Rats! that all generic nodes returned by the production as a semantic value are variants of the same type. This attribute does not have a value and can only be specified as a per-production attribute. `withParseTree` instructs Rats! to generate a parse tree instead of an abstract syntax tree. This attribute is specified as a grammar-wide attribute and does not have a value. For a grammar with this attribute, Rats! rewrites all generic, text-only, and void productions as well as productions that pass the semantic value through to preserve the input in the form of {@link xtc.tree.Formatting} annotations. The embedded AST generally has the same structure as for the grammr without the attribute. The exception are strings, which are represented as instances of {@link xtc.tree.Token}. Furthermore, generic nodes include additional children consisting of `Formatting` annotating `null` values if voided expressions or void nonterminals appear between two non-void repetitions. `verbose` instructs Rats! to print debugging information to the console for each parser method invocation. The attribute does not have a value. This attribute can be specified either as a grammar-wide attribute or as a per-production attribute. `nowarn` instructs Rats! to suppress warnings during parser generation. The attribute does not have a value; it can be specified either as a grammar-wide attribute or as a per-production attribute. `parser` instructs Rats! to use the fully qualified class name specified by the attribute's value as the Java class name for the generated parser, instead of deducing that name from the module's name. This attribute can only be specified as a grammar-wide attribute. `factory` instructs Rats! to use the (optionally qualified) class name specified by the attribute's value as the factory for creating generic nodes instead of using `xtc.tree.GNode`. This attribute can only be specified as a grammar-wide attribute. `visibility` instructs Rats! to make the generated parser class either public or package private, depending on the attribute's value (`public` or `packagePrivate`). This attribute can only be specified as a grammar-wide attribute. The default for modules without this attribute is public. `rawTypes` instructs Rats! to use raw types instead of generic types. Besides a "`@SuppressWarnings("unchecked")`" annotation for the parser class, the generated code is compatible with Java 1.4. The attribute does not have a value and can only be specified as a grammar-wide attribute. `main` instructs Rats! to include a static main method that parses one or more input files and prints the corresponding results. The value of this attribute must be the name of the top-level nonterminal to parse. This attribute can only be specified as a grammar-wide attribute. `printer` instructs Rats! to use the visitor specified by the attribute's value for printing the semantic value returned by a successful parse. This attribute can only be specified as a grammar-wide attribute and requires that the grammar also has the `main` attribute. `setOfString` instructs Rats! to include a static final set of strings with the attribute's value as its name. Rats! also includes a convience method for filling sets `add(Set,T[])`. This attribute can only be specified as a grammar-wide attribute. `flag` instructs Rats! to include a public static final boolean field with the attribute's value as its name. The field's value is `true`. This attribute can only be specified as a grammar-wide attribute. `genericAsVoid` instructs Rats! to treat generic productions as void productions. The attribute does not have a value and can only be specified as a grammar-wide attribute. `dump` instructs Rats! to include a method for dumping a plain-text representation of the parser's memoization table to a printer: `dump({@link xtc.tree.Printer})`. The attribute does not have a value and can only be specified as a grammar-wide attribute. A note on licensing: Parsers generated with the `main`, `printer`, and `dump` attributes link with code licensed under the GNU GPL version 2. As a result, parsers generated with these attributes may not be used in software that is not compatible with the GPL version 2. To generate parsers that are not restricted by the GPL, use the `-lgpl` option when running Rats!. Parser Actions Some languages cannot be expressed by PEGs or Rats! grammars. For example, the on-the-wire format for SPKI/SDSI represents byte strings as one or more digits, followed by a colon, followed by the number of characters specified by the sequence of digits (when interpreted as a decimal integer); but parsing a specific number of characters cannot be expressed by PEGs. Parser actions allow grammar writers to still use Rats! for such languages by providing a low-level extensibility mechanism. Parser actions work as follows. Before executing the code in a parser action, the variable `yyBase` is assigned the index of the current parser position. The parser action can then use this index together with the parser's methods to further consume the input, parsing the expression not expressible by PEGs. When finished, it assigns the corresponding result to `yyResult`, which must either be a {@link xtc.parser.SemanticValue} object indicating a successful parse or a {@link xtc.parser.ParseError} object indicating a parse error. In the case of a semantic value object, that object's semantic value becomes the semantic value of the production, unless another action after the parser action changes it. The following grammar illustrates the use of a parser action for parsing byte strings as used by SPKI/SDSI. Note that the `yyStart` reference in the parser action is the parser position at the beginning of the production and is used for indicating the position of an error. module ByteString; body { Result parseChars(String number, int start, int base) throws IOException { int n; try { n = Integer.parseInt(number); } catch (NumberFormatException x) { return new ParseError("Malformed length", start); } StringBuilder buf = new StringBuilder(n); for (int i=0; i<n; i++) { int c = character(base + i); if (c != -1) { buf.append((char)c); } else { return new ParseError("Unexpected end of bytestring", base + i); } } return new SemanticValue(buf.toString(), base + n); } } option main(ByteString); public String ByteString = n:Integer ':' ^{ yyResult = parseChars(n, yyStart, yyBase); } ; String Integer = [0-9]+ ;
Java Source File Name	Type	Comment
Action.java	Class	A semantic action.
ActionBaseValue.java	Class	Element to set the semantic value to the result of applying a list of xtc.util.Action actions .
AlternativeAddition.java	Class	An alternative addition.
AlternativeRemoval.java	Class	An alternative removal.
Analyzer.java	Class	Utility for analyzing and modifying grammars.
Annotator.java	Class	Visitor to inject source code annotations into a grammar. This visitor distinguishes between bindable elements, i.e., elements that would contribute to the value of a generic production, and valuable element, i.e., elements that can be rewritten to have a semantic value.
AnyChar.java	Class	The any character element.
Binding.java	Class	A binding of a grammar element to a variable.
BindingValue.java	Class	Element to set the semantic value to a binding.
CharCase.java	Class	A case within a character switch.
CharClass.java	Class	A character class terminal.
CharLiteral.java	Class	A character literal.
CharRange.java	Class	A character range for a character classs.
CharSwitch.java	Class	A character switch terminal.
CharTerminal.java	Class	The superclass of all character terminals.
Checker.java	Class	Visitor to check a grammar for consistency.
ChoiceExpander.java	Class	Visitor to expand choices by inlining productions.
CodeGenerator.java	Class	The code generator.
Column.java	Class	The superclass of memoization table columns.
Copier.java	Class	Visitor to copy grammar nodes.
CostEstimator.java	Class	Visitor to provide a conservative estimate for the cost of parsing a production.
DeadProductionEliminator.java	Class	Visitor to eliminate dead productions.
DirectLeftRecurser.java	Class	Visitor to transform direct left-recursions into equivalent right-recursions.
DuplicateProductionFolder.java	Class	Visitor to fold duplicate productions into a single production.
Element.java	Class	The superclass of all grammar elements.
ElementVoider.java	Class	Visitor to void ordered choices, repetitions, and options.
EmptyListValue.java	Class	Element to set the semantic value to a new empty list.
EquivalenceTester.java	Class	Visitor to test productions and elements for equivalence.
FollowedBy.java	Class	A followed-by predicate.
FullProduction.java	Class	A complete production.
GenericActionValue.java	Class	Element to set the semantic value to an xtc.util.Actionaction .
GenericNodeValue.java	Class	Element to set the semantic value to a xtc.tree.GNodegeneric node .
GenericRecursionValue.java	Class	Element to set the semantic value to a list of xtc.util.Action actions .
GenericValue.java	Class	The superclass of all value elements specifying a generic node as the semantic value.
GenericVoider.java	Class	Visitor to turn generic productions into void productions.
Generifier.java	Class	Visitor to add generic nodes as semantic values. For any production with pseudotype "`generic`" that does not contain any direct left-recursions (which is also called a generic node production), this visitor adds the appropriate GenericNodeValue generic node value elements, which create a xtc.tree.GNode generic node as the productions' semantic value.
Grammar.java	Class	A grammar represented as a collection of modules.
GrammarVisitor.java	Class	The parent class of all grammar module visitors.
HtmlPrinter.java	Class	A grammar pretty printer producing HTML.
Inliner.java	Class	Visitor to inline productions.
InternalElement.java	Interface	The marker interface for internal grammar elements.
LeftRecurser.java	Class	Visitor to detect left-recursion in a grammar.
ListMaker.java	Class	Visitor to add lists as semantic values.
ListValue.java	Class	The superclass of all value elements specifying a list as the semantic value.
MetaData.java	Class	The meta-data for a production necessary for code generation.
MetaDataCreator.java	Class	Visitor to create the production meta-data.
MetaDataSetter.java	Class	Visitor to fill in the production meta-data.
Module.java	Class	A grammar module.
ModuleDependency.java	Class	A grammar module dependency.
ModuleImport.java	Class	A grammar module import.
ModuleInstantiation.java	Class	A grammar module instantiation.
ModuleList.java	Class	A list of module names as a node.
ModuleMap.java	Class	A mapping between module names.
ModuleModification.java	Class	A grammar module modification.
ModuleName.java	Class	A module name as a node.
Name.java	Class	The superclass of all names.
NodeMarker.java	Class	Element to set a generic node's name.
NonTerminal.java	Class	A nonterminal.
NotFollowedBy.java	Class	A not-followed-by predicate.
NullLiteral.java	Class	A null literal representing a bindable null value.
NullValue.java	Class	Element to set the semantic value to `null`.
Option.java	Class	An optional grammar element.
OrderedChoice.java	Class	An ordered choice of grammar elements.
PackratParser.java	Class	The superclass of all packrat parsers.
ParseError.java	Class	An erroneous parse.
ParseException.java	Class	An exception representing a ParseError parse error .
ParserAction.java	Class	A parser action.
ParserBase.java	Class	The base class for packrat parsers.
ParseTreeNode.java	Class	Element to create a parse tree node capturing formatting.
PartialProduction.java	Class	A partial production.
PParser.java	Class	Packrat parser for grammar `xtc.parser.PGrammar`.
PParserState.java	Class	Parser state for parsing Rats! grammars.
Predicate.java	Class	The superclass of all predicates.
PrefixFolder.java	Class	Visitor to combine common prefixes into a single sequence with an embedded choice.
PrettyPrinter.java	Class	The grammar pretty printer.
Production.java	Class	The superclass of full and partial productions.
ProductionOverride.java	Class	A production override.
ProductionVoider.java	Class	Visitor to turn the semantic value of a production to void.
ProperListValue.java	Class	Element to set the semantic value to a list.
Properties.java	Class	Definition of node property names.
Quantification.java	Class	The superclass of repetitions and options.
Rats.java	Class	The command line interface to Rats!, the packrat parser generator for Java.
ReachabilityChecker.java	Class	Visitor to ensure that every alternative is reachable.
ReferenceCounter.java	Class	Visitor to fill in meta-data reference counts.
Renamer.java	Class	Visitor to rename nonterminals.
Repetition.java	Class	A repeated grammar element.
Resolver.java	Class	Visitor to resolve grammar module dependencies. Note that this visitor TextTester marks text-only productions as such.
Result.java	Class	The superclass of all parser results.
RootFinder.java	Class	Visitor to identify a grammar's real root.
SemanticPredicate.java	Class	A semantic predicate.
SemanticValue.java	Class	A semantic value.
Sequence.java	Class	A sequence of grammar elements. A sequence may have an optional name.
SequenceName.java	Class	A sequence name.
Simplifier.java	Class	Visitor to simplify a grammar.
StringLiteral.java	Class	A literal string terminal.
StringMatch.java	Class	A match of a string to a grammar element.
StringValue.java	Class	Element to set the semantic value to a string.
Terminal.java	Class	The superclass of all terminals.
TerminalOptimizer.java	Class	Visitor to optimize the recognition of terminals by introducing character switches.
TextTester.java	Class	Visitor to identify text-only productions.
Tokenizer.java	Class	Visitor to recognize token-level productions. This visitor recognizes the boundary between hierarchical and lexical syntax: This visitor recognizes all lexical productions.
TokenValue.java	Class	Element to collect a production's text as a token.
Transformer.java	Class	Visitor to transform productions.
TransientMarker.java	Class	Visitor to detect productions that can be treated as transient. Note that this visitor requires that a grammar's productions have been ReferenceCounter reference counted .
TreeExtractor.java	Class	Visitor to pear down a grammar to the structure of its abstract syntax tree.
TreeTyper.java	Class	Visitor to determine a grammar's tree structure.
UnaryOperator.java	Class	The superclass of all unary operators.
ValueChecker.java	Class	Visitor to ensure that every alternative has a semantic value.
ValueElement.java	Class	The superclass of all value elements.
VariantSorter.java	Class	Visitor to infer a grammar's variants. This visitor assumes that the entire grammar is contained in a single module.
VoidedElement.java	Class	A voided element.

www.java2java.com | Contact Us

All other trademarks are property of their respective owners.

xtc.parser

Syntax

Overview of Expressions and Operators

Grammar Modules

Productions and Semantic Values

Passing the Semantic Value Through

Options, Repetitions, and Nested Choices

Void and Text-Only Productions

Generic Productions

List-Valued Productions

Left-Recursive Productions

Node Markers in Generic Productions

Modules, Name Spaces, and Parameters

Module Modifications

Memoization and Transient Productions

Grammar and Production Attributes

Parser Actions