Text Driver for Scriptella.
It allows querying a text file using regular expressions. The text driver
can also be used as a lightweight replacement for Velocity to produce
simple output with properties substitution.
The Text driver does not depend on additional libraries and is generally faster than the CSV or Velocity drivers.
Note: The driver doesn't use SQL syntax.
General information
Driver class: | scriptella.driver.text.Driver |
URL: | Text file URL. URIs are resolved relative to the script file directory.
If url has no value, the content is read from or printed to the console (System.out for output). |
Runtime dependencies: | None |
Driver Specific Properties
Name | Description | Required |
encoding | Specifies the charset encoding of text files. | No, the system default encoding is used. |
eol | End-Of-Line suffix. Only valid for <script> elements. | No, the default value is \n . |
trim | Value of true specifies that leading and trailing whitespace in text file lines should be omitted. | No, the default value is true . |
flush | Value of true specifies that the output content should be flushed immediately when the <script> element completes. | No, the default value is false . |
skip_lines | The number of lines to skip before reading starts. | No, the default value is 0 (no lines are skipped). |
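As an illustration of how these properties might be set, the sketch below assumes the usual Scriptella convention of listing driver properties as key=value lines inside the connection element. The connection ids and the file names server.log and summary.txt are hypothetical.

```xml
<!-- Input: read a UTF-8 log file, skip its first line, keep surrounding whitespace -->
<connection id="log" driver="text" url="server.log">
    encoding=UTF-8
    skip_lines=1
    trim=false
</connection>

<!-- Output: flush the content whenever a script element completes -->
<connection id="summary" driver="text" url="summary.txt">
    encoding=UTF-8
    flush=true
</connection>
```

Properties that are left out keep the defaults listed in the table.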
Query Syntax
The Text driver supports regular expression syntax to query text files.
The file is read line-by-line from the location specified by the URL connection property and each line is matched
against the regex pattern.
If a line, or a part of it, matches the pattern, the match produces a virtual row in the result set.
The column names in the virtual result set correspond to the matched regex groups.
For example, the query foo(.*) matches the line foobar, and the produced
result set row contains two columns (groups): 0-foobar, 1-bar. These columns
can be referenced in child script or query elements by a numeric name or by a string name columnN .
It is also possible to specify more than one regular expression to match file content.
Specify each regular expression on a separate line to combine them with an OR condition.
The Text driver uses the java.util.regex implementation for pattern matching. See the java.util.regex.Pattern
Javadoc for supported syntax.
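A minimal sketch of the foo(.*) example, assuming a hypothetical input file words.txt and a second text connection without a url, so that output goes to the console. The child script refers to the matched groups by their columnN names.

```xml
<connection id="in" driver="text" url="words.txt"/>
<connection id="console" driver="text"/>

<!-- Each line of words.txt that contains foo... produces one virtual row -->
<query connection-id="in">
    foo(.*)
    <script connection-id="console">
        match=$column0 suffix=$column1
    </script>
</query>
```

For an input line foobar, the script should print match=foobar suffix=bar, following the group numbering described above.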
Additional notes:
- Regular expression matching is case-insensitive.
- An empty query selects all lines from the input file.
- The 0 (zero) column in the produced result set contains the matched line.
- Leading and trailing whitespace in the query element and in input file lines is trimmed by default.
- Use the ^ and $ boundary matchers to match the whole line.
Example:
<query>
^ERROR: (.*)
WARNING: (.*Failed.*)
([\d]+) errors?
</query>
This query consists of 3 regular expressions:
- selects lines starting with the ERROR: prefix
- selects WARNING lines containing the Failed substring
- selects lines containing a number of errors, e.g. "Found 5 errors".
The query selects any line satisfying at least one of these 3 regular expressions.
Suppose the input file has the following content:
Log file started...
INFO: INIT
WARNING: CPU is slow
WARNING: Failed to increase heap size
ERROR: Process interrupted
Operation completed with 1 error.
As a result of query execution, the following set of rows is produced:
0 | 1 |
WARNING: Failed to increase heap size | Failed to increase heap size |
ERROR: Process interrupted | Process interrupted |
1 error | 1 |
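The same query can drive a child script that writes one output line per matched row. The sketch below assumes hypothetical file names app.log and problems.txt and hypothetical connection ids; the separator in the script line is an arbitrary choice.

```xml
<connection id="log" driver="text" url="app.log"/>
<connection id="out" driver="text" url="problems.txt"/>

<!-- The nested script runs once for every row produced by the query -->
<query connection-id="log">
    ^ERROR: (.*)
    WARNING: (.*Failed.*)
    ([\d]+) errors?
    <script connection-id="out">
        $column0;$column1
    </script>
</query>
```

With the sample log, each row of the result table should become one line of problems.txt, for example ERROR: Process interrupted;Process interrupted.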
Script Syntax
The <script> element content is read line-by-line. For each line,
properties are expanded and the output is sent to the file specified by the url connection attribute.
Additional notes:
- Lines in the output file are separated by an EOL string specified by the eol connection property.
- Leading and trailing whitespace in the output file lines is trimmed by default.
- No escaping is performed when properties are expanded. Use String.replace or other escaping techniques to
achieve output similar to CSV etc.
- If a script is executed multiple times (e.g. inside a parent query) the output is appended to the file content.
Example:
<script>
Inserted a record with ID=$id. Table=${table}
</script>
For id=1 and table=system this script produces the following output:
Inserted a record with ID=1. Table=system
Properties substitution
In text script and query elements, the ${property} or $property syntax is used for properties/variables substitution.
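A small sketch of the two forms, assuming hypothetical properties named id and table and an output connection with the id out. It also assumes, as in most template syntaxes of this kind, that the braces mark where the property name ends, which matters when the reference is immediately followed by other word characters.

```xml
<script connection-id="out">
    Record $id stored in $table
    Backup table: ${table}_backup
</script>
```

Under that assumption, writing $table_backup instead would refer to a different property named table_backup.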
Examples
<connection id="in" driver="text" url="data.csv">
</connection>
<connection id="out" driver="text" url="report.csv">
</connection>
<script connection-id="out">
ID;Priority;Summary;Status
</script>
<query connection-id="in">
<script connection-id="out">
$rownum;$column0;$column1;$column2
</script>
</query>
Copies rows from the data.csv file to report.csv and adds an ID column.
The resulting file is semicolon-separated.