The XForm query and transformation language.
This package provides a querying and transformation facility for abstract syntax
trees (ASTs). The language, XForm, loosely follows the syntax and semantics
of the XPath 2.0 XML query language, but with added support for AST transformations.
However, there are some key differences between the XForm language and XPath,
so a careful reading of this document is essential.
Also note that example XForm queries can be found here.
Additional examples can be found in the xform/samples subdirectory.
The Language
Program Structure
XForm is a declarative language that can be used to query or transform an AST.
An XForm script, or query, is composed of a series of one or more comma-separated
expressions, each returning a sequence of tree items. If the query is composed
of a single expression, its result is the value of the expression. However,
if a query is composed of more than one expression, its result will be a list
of the values of each expression (a list of lists). Note that internally, the
query engine implicitly flattens all lists of lists.
It is also important to note that the result of each expression in a list
of expressions provides the focus (discussed below) for the expression that
follows it
Path Expressions
All XForm expressions generate ordered sets of tree items, called sequences
A tree item may be a node within the AST, a string, or null . Nested
sequences are not permitted.
The most basic expression in XForm is the path expression. A path expression
produces a sequence of all items in an AST that conform to a specified template.
The format of a path expression template is:
Path expression<- "/" RelativePathExpression?
/ "inside_out"? "//" RelativePathExpression
/ RelativePathExpression
RelativePathExpression <- RelativePathExpression ("/" / "//") StepExpression
/ StepExpression
StepExpression <- ItemTest PredicateList / ".."
ItemTest <- PrimaryExpression / Identifier / "*"
PredicateList <- Predicate*
Predicate <- "[" (IntegerLiteral / Expression / FunctionCall ) "]"
PrimaryExpression <- "." / StringLiteral / VariableReference / FunctionCall
/ ParenthesizedExpression
The template begins with an optional path specifier - the initial focus of
an expression. The focus of an expression can be thought of as the environment
that an expression is evaluated in - it represents the context for each step
of an expression. So in E1/E2 or E1 [E2] , the value
of E1 is computed, which then acts as the focus for computing the
value of E2 .
If the specifier is / , then the search will take place relative
to the AST root. For example, the template/FOO/BAR will return
all BAR nodes whose parent is a FOO node and whose
grandparent is the AST root node. If the specifier is // , then
all items that satisfy the template will be considered (so //FOO
will collect all FOO nodes in the AST). If the keyword inside_out
is prepended to a path expression using // , items will be chosen
based on a bottom-up, breadth-first traversal of the AST. If the path specifier
is omitted, then the nodes collected will have to be relative to the first step
expression in the template (so $x/FOO will collect all FOO
children of the nodes in the sequence $x ). If a step expressions
are separated by// , the latter expression will be matched against
all descendents of the former, rather than just its children. (So, //FOO//BAR
will collect all FOO descendents of all BAR nodes).
The individual steps of a path expression are known as step expressions. There
are two different kinds of step expressions -forward and reverse. A forward
step expression filters the children or the descendents of the current focus,
whereas the reverse step expression, .. , moves back up the tree,
to the level of the focus' parents. (E.g.. //FOO/.. would collect
the parents of every FOO node in the AST).
Of the forward step expressions, tests include matching against the names of
an item (as in /FOO/BAR or/FOO/"j" ), matching all
nodes (/FOO/* ), or matching against a primary expression.
A primary expression represents a value. There are five kinds.
- The context item,
. , which represents the current tree item
being processed (so //FOO/. would resolve to all FOO
nodes in the tree).
- A string literal, such as
"baz" .
- A variable reference, such as
$stat . Variable references begin
with a dollar-sign, and represent a previously bound sequence of items.
- A function call, such as
test($a,$b) , which can be used to
call an external function, and whose value and parameters are all sequences.
Instructions on how to add external functions programmed in Java are described
later.
- A parenthesized expression, such as
(//CompoundStatement) whose
value is that of the expressions it contains, concatenated together.
A step expression may contain one or more predicates. A predicate takes the
form of [ Expression ] , and represents a sequence. Each predicate
will intersect its sequence with the current focus, producing a new focus for
the evaluation of any subsequent predicates. The value of the final intersection
is the value of the step expression.
A predicate containing an integer literal (beginning at one) represents an
index into the current focus. So //FOO [1] would return the first
FOO node in the AST.
When using predicates, it is important to note that within a predicate, the
focus becomes that of the current step expression. So, for example, the
value of //FOO/BAR [BAZ] , is a sequence containing all BAR
nodes with FOO parents and BAZ children.
Also note that tree traversals are done in a depth-first, in-order manner.
If you would like to do an "inner" traversal, that is, traverse the tree in
a bottom-up manner, you may use the inner keyword. For example,
replace inner //ForStatement with FOO<>
will replace all ForStatement nodes in the AST with FOO
nodes, but the order of replacement will be inside-out.
In addition to searching an AST, XForm provides facilities for manipulating
sequences. These facilities include binding variables, looping through sequences,
conditionals, creating new items and evaluating logical operations over sequences.
Binding Variables
To bind a sequence to a variable for later use, one can use a let
expression. The syntax of a let expression is as follows:
LetExpression <- "let" VariableBindingList "return" SingleExpression
VariableBindingList <- VariableBinding ("," VariableBinding)*
VariableBinding <- VariableReference "be" SingleExpression
A let expression binds one or more sequences to variables for the duration
of its single expression. For example,
let $f be //ForStatement, $cs be //CompoundStatement return ($f, $cs)
would return a sequence composed of all the for statements and all the compound
statements in an AST (a parenthesized expression concatenates the results of the
expressions within).
Looping
Two options exist to iterate through sequences - for and cfor .
Their formats are:
ForExpression <- "for" IterativeBindingList "return" SingleExpression
CForExpression <- "cfor" IterativeBindingList "return" SingleExpression
IterativeBindingList <- IterativeBinding ("," IterativeBinding)*
IterativeBinding <- VariableReference "in" SingleExpression
A for statement iterates through one or more sequences, binding the resultant
singletons to the specified variables (the bindings hold for the duration of
the expression's single expression) and evaluating its single expression. If
more than one sequence is specified to iterate through, the iterations will
be implicitly nested. So,
for $x in //FOO, $y in //BAR return $x union $y
is equivalent to
for $x in //FOO return for $y in //BAR return $x union $y
A cfor statement acts like a for statement, except that it iterates each of
its bound variables concurrently, halting when one of them reaches its end.
In either case, the resulting sequence of a looping expression is the concatenation
of each sequence that it returns.
Conditionals
Conditional expressions in XForm take the following form:
IfExpression <- "if" SingleExpression "then" SingleExpression "else" SingleExpression
Conditional expressions evaluate the first single expression. If that expression
is a non-empty sequence, the second single expression is evaluated and its value
is returned. Otherwise, the third single expression is evaluated and its value
is returned.
Set Operations
Set operations XForm take the following form:
UnionExpression <- UnionExpression "union" IntersectionExpression
IntersectionExpression <- IntersectionExpression "intersect" PathExpression
DifferExpression <- DifferExpression "differ" LogicalExpression
A union expression returns the union of two sequences, whereas an intersection
expression returns their intersection and the differ expression returns their
difference. Note that these operations are based on identity (so union is not
a concatenation). If you wish to concatenate two single expressions, wrap them
in a parenthesized expression
Logical Operations
There are two supported logical operation expressions, or and .
An or expression allows you to group together a series of pathexpressions
and select the value of the first path expression that returns a non-null sequence.
For example, the query:
//Foo or //Bar
will select the sequence returned by //Foo if that sequence is non-null.
Should the sequence be null, the results of //Bar will be used.
An and expression also lets you group together a series of pathexpressions.
If the value of each path expression in the series is non-null, all of the values
are concatenated together into a single sequence and then returned. Otherwise,
a null value is returned. For example, the query:
//Foo and //Bar
will return each Foo and Bar node in the tree, should the tree contain both Foo and Bar nodes. Otherwise,
it will return a null value.
Note that these logical operations can be embedded inside of a pathexpression
by way of parenthesized expressions. For example,
//Foo/(Bar or Zoo)
will return all Bar children of Foo nodes, should any
exist. Otherwise, it will attempt to return any Zoo children of Foo
nodes. Whereas,
//Foo/(Bar and Zoo)
will return any Bar nodes with Foo parents and Zoo
siblings, and any Zoo nodes with Foo parents and Bar
siblings, should both sequences be non-null.
Generating New Items
New tree items may be generated with a new item expression. A newitem expression
creates a singleton sequence containing a new tree item. It takes the following
form:
NewItemExpression <- "null" / StringLiteral / NewNodeExpression
NewNodeExpression <- Identifier "<" Children ">"
Children <- Child ("," Child)*
Child <- "null" / NewNodeExpression / SingleExpression / /* empty */
Note that the last type of child is blank, which means that the newly created
node will have no children (which is not the same as having a null child).
So FOO<> would create a FOO node with no children,
whereas FOO<null> would create a FOO node with
a single null child.
Transforming ASTs
Modifications to abstract syntax trees are done using replacement expressions,
removal expressions, addition expressions, or insertion expressions.
Users transform ASTs in one of
two ways. The first is to query the AST for items to replace and then replacing
them with new or existing items. The second way is to query the AST for insertion
position markers and then inserting new or existing items.
The format of a replacement expression is:
ReplacementExpression <- "replace" SingleExpression "with" SingleExpression
The value of a replacement expression is a sequence containing the items that
have been inserted (or moved) within the tree, otherwise, if no replacements
have been made, it's an empty sequence. For example,
for $f in //ForStatement replace $f with CompoundStatement<>
would replace all of the for statements in an AST with new compoundstatements,
whereas
for $f in //ForStatement replace $f with //CompoundStatement [1]
would replace each for statement in the tree with one the tree's first compound
statement. Note that a sequence need not be bound to a variable for its items
to be replaced, so
replace //BAR with FOO<>
would replace any BAR nodes in the AST with FOO nodes.
Note that in the case of a replacement expression, in the context of a for
expression, you can omit the "return".
The format of a remove Expression is:
"remove" SingleExpression
For example, remove //IfStatement removes all IfStatements from
the AST.
The format of an add Expression is:
"add" SingleExpression "to" SingleExpression
For example, add Foo<> to //Bar adds a Foo node (as the last child)
to all Bar nodes.
The format of an insert expression is:
insert SingleExpression before SingleExpression
insert SingleExpression after SingleExpression
For example, insert Foo<> before //Bar<> would create
and insert new Foo<> items before every Bar node in the AST.
The value of an insertion operation is a sequence containing the items inserted,
or an empty sequence if no insertions are made. insert expression operate only on
none empty sequences. To illustrate, insert Foo<> before //Block/*[1]
adds a Foo<> node before the first child in a Block node. It will not, however, add
a Foo<> node to a an empty Block. This behaviour, if desired, can be implemented with
addtional commands; for example, replace empty(//Block<>) with Block< Foo<> >
Extending XForm
A user may add functionality to XForm by way of external functions. To add
an external function to XForm, one must do the following:
Example Queries
Example0: Using Xform queries to find empty IfStatements
1. //example0.java
2. class example1{
3. public void bar(){
4. Boolean foo = true;
5. if( foo ){
6. //empty!
7. }
8. int i = 0;
9. if( foo ) //no braces!
10. i++; //not empty
11. if( !foo ); //empty!!
12. }
13. }
In example0.java above, the IfStatements on lines 5 and
6 are empty. To find them using xform, we must first identify the AST structure
of empty IfStatement s. Using the xform driver's -preAST
function, we deduce that empty IfStatements come in two flavours:
either an IfStatement item with an empty Block child or an
IfStatement item with an EmptyStatement child. The query
1. #find all the emptyIfStatements
2. empty(//IfStatement/Block)/.. union //IfStatement[ EmptyStatement ]
will return all emptyIfStatement . The XForm library function empty
returns sequence items that have no children so empty(//IfStatement/Block)
will return all empty blocks belong to IfStatement . Adding /..
will return the parents of those blocks. //IfStatement[ EmptyStatement
] returns all IfStatement items that have an EmptyStatement
child. The union of the two expressions gives all Statements in the program.
For additional feedback, the library function lines() can be used to report
the line and column information of the items identified. Executing the with
query
1. #example0.xform
2. #find all the emptyIfStatements
3. lines( empty(//IfStatement/Block)/.. union //IfStatement[ EmptyStatement ] )
with java xtc.xform.Driver -java example0.xform example0.java would display:
example0.java: 5:7
example0.java: 11:7
Example1: Using XForm transformations to ensure that all IfStatements in a Java program have braces.
1. //example1.java
2. class example1{
3. public void bar(){
4. Boolean foo = true;
5 if( foo ){
6. //this one is good
7. }
8. int i = 0;
9. if( foo ) //no braces!
10. i++;
11. }
12. }
In example1.java above the IfStatement on line 8 violates the coding
convention that states all if statements should have curly braces. To correct
this using xform transformations, we must first query the program's AST to find
all braceless IfStatement . Using the xform Driver's preAST command
line option (or alternately figuring it out from the grammar), we realize that
the AST representation of a braceless if Statement does not have a Block enclosing
it's body.
The IfStatement on line 5 is represented as
IfStatement<
PrimaryIdentifier<"foo">,
Block<>
>
while the one on line 8 is represented as
IfStatement<
PrimaryIdentifier<"foo">,
ExpressionStatement<
PostfixExpression<
PrimaryIdentifier<"i">,
PostincrementTail<>
>
>
>
We can find braceless IfStatements with the XForm query
//IfStatement differ IfStatement[ Block ]
The complete query to add missing braces is
1. #example1.xform
2 #add missing braces toIfStatement
3. for $i in //IfStatement differ //IfStatement[ Block ] return
4. replace $i with IfStatement< $i/*[1], Block< $i/*[2] > >
Line 3 in the addifbrace.xform iterates over each Blockless if statement
in the AST Line 4 creates a new IfStatement item with two children and replaces
the original. The first child in the new item is the same as the first child
of the original IfStatement - meaning the if( foo ) part of the
statement is rewritten/preserved as if( foo ) . The second child
is a new Block item containing the second child of the original IfStatement.
The command java xtc.xform.Driver -java -printJava example1.xform
example1.java produces the output
1. class example1{
2. public void bar(){
3. Boolean foo = true;
4 if ( foo ){
5. }
6. int i = 0;
7. if ( foo ){
8. i++;
9. }
10. }
11. }
Example2: Transforming for loops to enforce 'Must Have Braces' rule.
The procedure for transforming ForStatement s similar to the IfStatement
transformation in Example1. The difference, however, is that the first 3 children
of each Blockless ForStatement must be preserved in the rewrite.
This can be done with the query
1. #add missing forStatement braces
2. for $f in //ForStatement differ //ForStatement[ Block ] return
3. replace $f with ForStatement< $f/*[1], $f/*[2], $f/*[3], Block< $f/*[4]> >
or by using XForms subsequence() library function in the query
1. #example2.xform
2. for $f in //ForStatement differ //ForStatement[ Block ] return
3. replace $f with ForStatement< subsequence( $f/*, 1, 3 ), Block< $f/*[4] > >
1. //example2.java
2. class example2{3. public void bar(){
4. int j = 0;
5. for( int i = 0; i < 10; i++ )
6. j++;
7.
8. for(;;)
9. j--;
10. }
11.}
On the file example2.java (shown above), the query example2.xform
produces the following transformation.
1. class example2{
2. public void bar(){
3. int j = 0;
4. for( int i = 0; i < 10; i++ ){
5. j++;
6. }
7. for (;;){
8. j--;
9. }
10. }
11.}
Example3: Using XTC and Xform to create an simple extension to Java
(JavaProperty), that adds C# like properties. Note that this example does not
have the full set of C# property features. Note also that all code in this example
can be found in xform/samples/javaproperty
In our extension, we wish to write property declarations in the form "property
int foo". A desugaring transformation convert the JavaProperty code to
Java code that contains generate accessors methods for this variable in this
declaration. For example the JavaProperty code:
1. //sample.jprop2. class sample{3. property int foo;4. }
should desugared to the following Java code
1. //sample.java2. class sample{
3. private int foo;
4. public void setfoo( int val ){
5. foo = val;
6. }
7. public int getfoo(){
8. return foo;
9. }
10. }
First we use xtc to define a grammar for the extension. This involves adding
"property" as a keyword and as a Modifer.
The JavaPropertyKW.Rats file below adds 'property' as a keyword to the existing
list of Java keywords.
1. //JavaPropertyKW.rats
2. module xtc.xform.samples.javaproperty.JavaPropertyKW;
3. import xtc.lang.JavaIdentifier(xtc.lang.JavaSymbol, xtc.util.Spacing);
4.
5. body {
6. static{
7. add(JAVA_KEYWORDS, new Object[] { "property" });
8. }
9. }
The JavaProperty.Rats file below modifies the existing Java core grammar (JavaCore)
to add 'property' as a Modifier. For more details on grammar modification see:
Rats!
1. //JavaProperty.Rats
2. module xtc.xform.samples.javaproperty.JavaProperty;
3. instantiate xtc.lang.JavaType(xtc.lang.JavaIdentifier, xtc.lang.JavaSymbol);
4. instantiate xtc.lang.JavaConstant(xtc.lang.JavaIdentifier, xtc.util.Spacing);
5. instantiate xtc.lang.JavaIdentifier(xtc.lang.JavaSymbol, xtc.util.Spacing);
6. instantiate xtc.util.Symbol(xtc.util.Spacing);
7. instantiate xtc.lang.JavaSymbol(xtc.util.Symbol);
8. instantiate xtc.util.Spacing;
9. import xtc.xform.samples.javaproperty.JavaPropertyKW;
10.
11. modify xtc.lang.JavaCore(xtc.lang.JavaType, xtc.lang.JavaConstant, xtc.lang.JavaIdentifier, xtc.lang.JavaSymbol,xtc.util.Spacing, xtc.util.Null );
12. option withLocation, constant, parser(xtc.xform.JavaPropertyParser);
13.
14. String Modifier += <Strictfp> ... / "property":Word;
We can now generate a parser for our JavaProperty grammar by typing 'make'
in xform/samples/javaproperty to execute the following commands:
java xtc.parser.Rats -in ~/java/src JavaProperty.rats
javac -source 1.4 -d ~/java/classes -sourcepath ~java/src JavaPropertyParser.java
Finally, using the Xform query JPTrans.xform shown below, we can transform
JavaProperty code to Java code with the command:
java xtc.xform.Driver -printJava -parser xtc.xform.samples.javaproperty.JavaPropertyParser
JPTrans.xform sample.jprop
This command line instructs the XForm engine to 1. parse sample.jprop
using the JavaPropertyParser 2. Execute the query JPTrans.xform
3. Pretty print the resulting code using xtc's Java PrettyPrinter
JPTrans.xform is explained as follows: Line 5 finds all property declarations.
Lines 6-18 inserts getter methods after each property declaration. Line 7 makes
the method public. Line 8 sets the methods return type to be the same as the
field type. Line 9 sets the method name as the concatenation of "get"
and the field's name. Lines 14-16 add a return statement with the field name
as an identifier. Lines 21-45 add a setter method after each property declaration.
This is similar to the xform code for adding getter methods, the main differences
are the return types and an assignment statement instead of a Return statement.
Last, lines 49-55 rewrite all property declarations to private field declarations
with the same name and type.
1. #JPTrans.xform
2. #XFrom desugaring from JavaProperty to pure Java.
3.
4. #add a getter method for the field
5. for $i in //FieldDeclaration/Modifiers/"property"/../.. return
6. insert MethodDeclaration<7. Modifiers<"public">,
8. $i/*[2],
9. concat( "get", $i/Declarators/Declarator/*[1] ),
10. FormalParameters<>,
11. null,
12. null,
13. Block<
14. ReturnStatement<
15. PrimaryIdentifier<$i/Declarators/Declarator/*[1]>
16. >
17. >
18. > after $i,
19.
20. #add a setter method for the field
21. for $i in //FieldDeclaration/Modifiers/"property"/../.. return
22. insert MethodDeclaration<
23. Modifiers<"public">,
24. VoidTypeSpecifier<>,
25. concat( "set", $i/Declarators/Declarator/*[1] ) ,
26. FormalParameters<
27. FormalParameter<
28. null,
29. $i/*[2],
30. "val",
31. null
32. >
33. >,
34. null,
35. null,
36. Block<
37. ExpressionStatement<
38. Expression<
39. PrimaryIdentifier<$i/Declarators/Declarator/*[1]>,
40. "=",
41. PrimaryIdentifier<"val">
42. >
43. >
44. >
45. > after $i,
46.
47. #replace the property declaration with a private field declaration
48. for $i in //FieldDeclaration/Modifiers/"property"/../.. return
49. replace $i with FieldDeclaration<
50. Modifiers<"private">,
51. $i/*[2],
52. Declarators<
53. Declarator< $i/Declarators/Declarator/*[1], null, null >
54. >
55. >
56.
|