org.apache.derby.iapi.types
Derby Type System
The Derby type system is mainly contained in the org.apache.derby.iapi.types
package. The two main types are the DataValueDescriptor interface and the
DataTypeDescriptor class.
DataValueDescriptor
Values in Derby are always represented by instances of org.apache.derby.iapi.types.DataValueDescriptor,
which might have been better named DataValue. DataValueDescriptor, or DVD for short, is
mainly used to represent SQL data values, though it is also used for other
internal types. DataValueDescriptor is a Java interface, and in general all values
are manipulated through interfaces rather than through the Java class
implementations such as SQLInteger. DVDs are mutable (their value can change) and
can represent NULL or a valid value. Note that SQL NULL is represented by a
DataValueDescriptor with a state of NULL, not by a null Java reference to a
DataValueDescriptor.
Generally the Derby engine works upon an array of DVDs that represents a row, which can
correspond to a row in a table, a row in a ResultSet to be returned to
the application, or an intermediate row in a query. The DVDs within
this array are re-used for each row processed, which is why they are
mutable. For example, when reading rows from the store a single DVD
is used to read a column's value for all the rows processed. This
benefits performance: in a table scan of one million rows Derby
does not create one million objects, which would be the case if the
type system were immutable, like the Java object wrappers such as
java.lang.Integer.
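The reuse described above can be sketched as follows. This is a minimal illustration, not Derby code: MutableIntValue and ScanDemo are hypothetical names, and the int array stands in for a column read from the store.

```java
import java.util.Arrays;

// Hypothetical stand-in for a mutable integer DVD; not Derby's SQLInteger.
final class MutableIntValue {
    private int value;
    private boolean isNull = true;   // SQL NULL is a state of the holder, not a null reference

    void setValue(int v) { value = v; isNull = false; }
    void setToNull() { isNull = true; }
    boolean isNull() { return isNull; }
    int getInt() { return isNull ? 0 : value; }
}

final class ScanDemo {
    // Sums a column by pushing every stored value through one reused holder.
    static long sumColumn(int[] storedColumn, MutableIntValue holder) {
        long sum = 0;
        for (int raw : storedColumn) {
            holder.setValue(raw);    // overwrite in place, no allocation per row
            sum += holder.getInt();
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] column = new int[1_000_000];
        Arrays.fill(column, 2);
        MutableIntValue holder = new MutableIntValue(); // the only value object created
        System.out.println(sumColumn(column, holder));
    }
}
```

An immutable design would instead allocate one wrapper per row inside the loop, which is the cost the mutable holder avoids.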
The methods in DataValueDescriptor can be broken into these groups:
- getXXX methods to help implement java.sql.ResultSet.getXXX
methods. Thus a ResultSet.getInt() corresponds to the
DataValueDescriptor.getInt() method.
- setValue methods to help implement java.sql.PreparedStatement.setXXX
methods. Thus a PreparedStatement.setInt() corresponds to the
DataValueDescriptor.setValue(int) method. These methods perform overflow
and other range checks, e.g. setting a long into a SQLInteger checks
that the value is within the range of an int; if it is not,
an exception is thrown.
These methods are also used to implement casts and other data type
conversions, thus ensuring consistent type conversions for Derby within
SQL and JDBC.
- Methods to support SQL operators, e.g. isNull.
- Methods to read and write to disk, or strictly to convert the
value into a byte representation, e.g. writeExternal.
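The range check performed by the setValue methods might look roughly like this. IntHolder is a hypothetical class, and ArithmeticException is only a stand-in for whatever exception type the engine actually raises.

```java
// Hypothetical integer holder illustrating a range-checked setValue(long).
final class IntHolder {
    private int value;
    private boolean isNull = true;

    void setValue(int v) {
        value = v;
        isNull = false;
    }

    // Accepts a long but rejects anything outside the range of an int.
    void setValue(long v) {
        if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
            // Stand-in exception type; the holder is left unchanged.
            throw new ArithmeticException("value out of range for INTEGER: " + v);
        }
        setValue((int) v);
    }

    boolean isNull() { return isNull; }
    int getInt() { return value; }
}
```

Because casts and JDBC setters would both go through the same method in this sketch, the conversion rules are written once, which is the consistency benefit the list above mentions.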
Type Specific Interfaces
To support operators specific to a type, or set of types, Java
interfaces that extend DataValueDescriptor exist:
- NumberDataValue - Methods for operators on numeric types, such as
INTEGER, DECIMAL, REAL, e.g. plus for the SQL + operator.
- StringDataValue - Methods for operators on character types, such as
CHAR, VARCHAR.
- BitDataValue - Methods for operators on binary types, such as
CHAR FOR BIT DATA, BLOB.
- DateTimeDataValue - Methods for operators on date and time types, such as
DATE, TIME, TIMESTAMP.
- BooleanDataValue - Methods for operators on the BOOLEAN type.
Language Compilation
Much of the generated code for the language layer involves the type system. E.g.
SQL operators are converted to method calls on interfaces within the
type system, such as DataValueDescriptor or NumberDataValue. Thus all
of this generated code makes its calls through interface methods.
The language has a policy/style of generating fields with holder
objects for the result of any operation. This holder DataValueDescriptor is
then re-used for all the operations within that
query execution, thus saving object creation when the operation is
called on multiple rows. The generated code does not create the initial
value for the field; instead the operator method or DataValueFactory
methods create an instance the first time that the result is passed
in as null. The approximate Java code for this would be as follows (note
that the generator generates byte code directly):
// instance field to hold the result of the minus
private NumberDataValue f7;
...
// code within a generated method
f7 = value.minus(f7);
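A self-contained sketch of the same result holder pattern is shown below. IntValueSketch and GeneratedExpressionSketch are hypothetical names, and the two-argument minus is an assumption made for illustration rather than the signature of NumberDataValue.

```java
// Hypothetical numeric value with an operator that writes into a reusable result holder.
final class IntValueSketch {
    private int value;

    IntValueSketch() { }
    IntValueSketch(int v) { value = v; }

    void setValue(int v) { value = v; }
    int getInt() { return value; }

    // Computes this - right into result; allocates only when result is null.
    IntValueSketch minus(IntValueSketch right, IntValueSketch result) {
        if (result == null) {
            result = new IntValueSketch();   // happens on the first call only
        }
        result.value = Math.subtractExact(this.value, right.value);
        return result;
    }
}

// Stand-in for a generated class: one field per operation result.
final class GeneratedExpressionSketch {
    private IntValueSketch f7;   // result holder, initially null

    IntValueSketch evaluate(IntValueSketch left, IntValueSketch right) {
        f7 = left.minus(right, f7);   // same holder is handed back in on every row
        return f7;
    }
}
```

Calling evaluate once per row allocates a single result object for the whole query execution in this sketch.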
Interaction with Store
The store knows little about how values represent themselves in bytes;
all that knowledge is contained within the DVD implementation.
The exception is SQL NULL handling: the store handles NULL values
consistently, as a null bit in the status byte for a field. Thus
readExternal and writeExternal are never called for a
DataValueDescriptor that is NULL.
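The division of labour can be sketched as follows. The field layout here (a single status byte whose lowest bit marks NULL) is an assumption chosen for illustration and is not Derby's actual on-disk format; IntField and FieldCodec are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical value type; only it knows the byte format of a non-NULL value.
final class IntField {
    int value;
    boolean isNull = true;

    void writeExternal(DataOutputStream out) throws IOException { out.writeInt(value); }
    void readExternal(DataInputStream in) throws IOException { value = in.readInt(); isNull = false; }
}

// Stand-in for the store: it owns the status byte and the NULL decision.
final class FieldCodec {
    static final int NULL_BIT = 0x01;   // assumed bit position, for illustration only

    static byte[] write(IntField f) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            if (f.isNull) {
                out.writeByte(NULL_BIT);   // status byte only, writeExternal is skipped
            } else {
                out.writeByte(0);
                f.writeExternal(out);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void read(byte[] stored, IntField target) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(stored));
            int status = in.readUnsignedByte();
            if ((status & NULL_BIT) != 0) {
                target.isNull = true;      // readExternal is skipped
            } else {
                target.readExternal(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```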
Delayed Object Creation
When a value reads itself from its byte representation it is required
that the least amount of work is performed to obtain a useful
representation of a value. This is because the value being read from
disk may never be returned to the application, or returned but never
used by the application. The first case can occur when a qualification
in the SQL statement is executed at the language layer and not pushed
down to the store, thus the row is fetched from the store but filtered
out at the language layer. Taking SQLDecimal as an example, the byte
format is a representation of a java.math.BigInteger instance
along with a scale. Taking the simple approach that SQLDecimal would
always use a java.math.BigDecimal, these are the steps that would
occur when reading a DECIMAL column:
1. Read the BigInteger format into a byte array, read the scale
2. New BigInteger instance from the byte array - 2 object creations
and a byte array copy
3. New BigDecimal instance from the BigInteger and scale - 1 object
creation
Now think about a million row table scan with a DECIMAL column that
returns 1% of the rows to the application, filtering at the language
layer. This simple SQLDecimal implementation will create 3 million objects and
do 1 million byte array copies.
The smart (and current) implementation of SQLDecimal delays steps 2
and 3 until there is an actual need for a BigDecimal object, e.g. when
the application calls ResultSet.getBigDecimal. So assuming the
application calls getBigDecimal for every row it receives, then, since
only 1% of the rows are returned, 30,000 objects are created and 10,000
byte array copies are made, thus saving 2,970,000 object creations and
990,000 byte array copies, along with the garbage collection overhead of those
short lived objects.
This delayed object creation increases the complexity of the
DataValueDescriptor implementation, but the performance benefit is well
worth it. The complexity comes from the implementation maintaining dual
state; in SQLDecimal's case the value is represented by either the raw
value or a BigDecimal object. Care is taken in the implementation
to always access the value through methods, and not the fields
directly. String based values such as SQLChar also perform this
delayed object creation to String, as creating a String object requires two
object creations and a char array copy. In the case of SQLChar though,
the raw value is maintained as a char array and not a byte array,
because the char[] can be used as-is as the value, e.g. in string
comparisons.
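A minimal sketch of the dual state idea, assuming the raw form is the two's-complement byte array of the unscaled value plus a scale. LazyDecimal is a hypothetical class, not SQLDecimal, and the materializations counter exists only to make the delay observable.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical decimal holder that keeps the raw bytes until a BigDecimal is really needed.
final class LazyDecimal {
    private byte[] rawUnscaled;        // bytes as read from disk, kept as-is
    private int rawScale;
    private BigDecimal materialized;   // null until first requested
    int materializations;              // demo counter, not part of the technique

    // Reading only stores references; no BigInteger or BigDecimal is created here.
    void readFrom(byte[] unscaledBytes, int scale) {
        rawUnscaled = unscaledBytes;
        rawScale = scale;
        materialized = null;
    }

    // All access goes through this method so the two states cannot drift apart.
    BigDecimal getBigDecimal() {
        if (materialized == null && rawUnscaled != null) {
            materialized = new BigDecimal(new BigInteger(rawUnscaled), rawScale);
            materializations++;
        }
        return materialized;
    }
}
```

Rows that are filtered out before anyone calls getBigDecimal never pay for the BigInteger and BigDecimal allocations in this sketch.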
DataValueFactory
Specific instances of DataValueDescriptor are mostly created through
the DataValueFactory interface. This hides the implementation of types
from the JDBC, language and store layers. This interface includes
methods to:
- generate new NULL values for specific SQL types.
- generate specific types from Java primitives or Java objects (such
as String). The returned type corresponds to the JDBC mapping for the
Java type, e.g. SQLInteger for int. Where the Java type can map to
multiple SQL types there are specific methods such as getChar,
getVarchar.
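The shape of such a factory can be sketched as below. All names here (SketchValue, SketchValueFactory and its methods) are hypothetical and do not reproduce the real DataValueFactory signatures; the SQL type is kept as a plain string for brevity.

```java
// Hypothetical value holder; a null payload means the holder is in the SQL NULL state.
final class SketchValue {
    final String sqlType;
    private final Object value;

    SketchValue(String sqlType, Object value) {
        this.sqlType = sqlType;
        this.value = value;
    }

    boolean isNull() { return value == null; }
    Object getObject() { return value; }
}

// Callers depend only on this interface, never on concrete holder classes.
interface SketchValueFactory {
    SketchValue getNull(String sqlType);      // new NULL value of a given SQL type
    SketchValue getDataValue(int v);          // int maps to INTEGER
    SketchValue getCharValue(String s);       // String is ambiguous, so the caller
    SketchValue getVarcharValue(String s);    // picks the SQL type by method name
}

final class DefaultSketchValueFactory implements SketchValueFactory {
    public SketchValue getNull(String sqlType) { return new SketchValue(sqlType, null); }
    public SketchValue getDataValue(int v) { return new SketchValue("INTEGER", v); }
    public SketchValue getCharValue(String s) { return new SketchValue("CHAR", s); }
    public SketchValue getVarcharValue(String s) { return new SketchValue("VARCHAR", s); }
}
```

Swapping in a different implementation of one type would then only require a different factory class, with no change to the calling layers.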
DataTypeDescriptor
The SQL type of a column, value or expression is represented by an
instance of org.apache.derby.iapi.types.DataTypeDescriptor. DataTypeDescriptor contains three key
pieces of information:
- The fundamental SQL type, e.g. INTEGER, DECIMAL, represented by an
org.apache.derby.iapi.types.TypeId.
- Any length, precision or scale attributes, e.g. length for CHAR, precision & scale for
DECIMAL.
- Whether the type is nullable.
Note that a DataValueDescriptor is not tied to any DataTypeDescriptor,
thus setting a value into a DataValueDescriptor
that does not conform to the intended DataTypeDescriptor is allowed.
The value is checked in an explicit normalization phase. As an example,
an application can use setBigDecimal() to set 199.0 to a
parameter that is marked as being DECIMAL(4,2).
Only in the execute phase will the out of range exception be raised.
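The DECIMAL(4,2) example can be sketched with java.math.BigDecimal. DecimalTypeSketch is a hypothetical class, truncating extra fractional digits is an assumption, and ArithmeticException stands in for the engine's own exception.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical descriptor for DECIMAL(precision, scale); not Derby's DataTypeDescriptor.
final class DecimalTypeSketch {
    private final int precision;
    private final int scale;

    DecimalTypeSketch(int precision, int scale) {
        this.precision = precision;
        this.scale = scale;
    }

    // Normalization: force the declared scale, then check the total digit count.
    BigDecimal normalize(BigDecimal value) {
        BigDecimal scaled = value.setScale(scale, RoundingMode.DOWN); // truncation is an assumption
        if (scaled.precision() > precision) {
            throw new ArithmeticException(
                value + " does not fit DECIMAL(" + precision + "," + scale + ")");
        }
        return scaled;
    }
}

final class NormalizationDemo {
    public static void main(String[] args) {
        DecimalTypeSketch type = new DecimalTypeSketch(4, 2);
        BigDecimal parameter = new BigDecimal("199.0"); // accepted: the holder is not tied to the type
        System.out.println(type.normalize(new BigDecimal("12.3")));
        try {
            type.normalize(parameter);                   // the range check happens only here
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

199.0 becomes 199.00 at scale 2, which needs five digits and so exceeds the declared precision of four.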
Issues
Interfaces or Classes
Matching the interface type hierarchy is an implementation (class)
hierarchy complete with abstract types; for example DataType (again
badly named) is the abstract root for all implementations of
DataValueDescriptor, and NumberDataType for NumberDataValue. Code would
be smaller and faster if the interfaces were removed and the official
API became the public methods of these abstract classes. The work
involved here is fixing the code generation involving types: regular
Java code would be compiled correctly with any change, but the
generated code needs to be changed by hand, to change interface calls to
method calls. Any change like this should probably rename the abstract
classes to short descriptive names, like DataValue and NumberValue.
DataValueFactory
There is a demonstrated need to hide the implementation of DECIMAL, as
J2ME, J2SE and J2SE5 require different versions; thus a type
implementation factory is required. However, it seems too generic
to have the ability to support different implementations of INTEGER,
BIGINT and some other fundamental types. Thus maybe the code could be
simplified to allow use of SQLInteger, SQLLongint and others directly, at
least for the SQL types that are implemented using Java primitives.
Result Holder Generation
The dynamic creation of result holders (see language section) means
that all operators have to check whether the result reference passed
in is null, and if so create a new instance of the desired type.
This check seems inefficient as it will be performed once per
operation; again, imagine the million row query. In addition, the field
that holds the result holder in the generated code is assigned the same
value each time, which is also inefficient. It seems that the code using the
type system, generated or coded, could set up the result holder at
initialization time, thus removing the need for the check and the field
assignment, leading to faster, smaller code.
NULL and operators
The operators typically have to check for incoming NULL values and
assign the result to be NULL if any of the inputs are NULL. This,
combined with the result holder generation issue, leads to a lot of
duplicate code checking to see if the inputs are NULL. It is currently
hard to do this in a single method as the code needs to determine if
the inputs are NULL, generate a result holder and return two values (is
the result NULL and what is the result holder). Splitting the operator
methods into two would help, as at least the NULL checks could be in the
super-class for all the types, rather than in each implementation. In
addition this would lead to the ability to generate a more efficient
operator call if the inputs are not nullable. E.g. for the + operator there
could be plus() and plusNotNull() methods, the plus() being implemented in the
NumberDataType class, handling NULL inputs and calling plusNotNull(),
with the plusNotNull() implemented in the specific type.
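The proposed split might look like this. It is only a sketch of the suggestion above: NumberValueSketch and IntNumberSketch are hypothetical names, and the method shapes are assumptions rather than existing Derby signatures.

```java
// Hypothetical abstract root: NULL handling for plus() is written once here.
abstract class NumberValueSketch {
    protected boolean isNull = true;

    boolean isNull() { return isNull; }
    void setToNull() { isNull = true; }

    // Handles result holder creation and NULL inputs, then delegates.
    final NumberValueSketch plus(NumberValueSketch right, NumberValueSketch result) {
        if (result == null) {
            result = newInstance();
        }
        if (this.isNull || right.isNull) {
            result.setToNull();
            return result;
        }
        plusNotNull(right, result);
        return result;
    }

    protected abstract NumberValueSketch newInstance();

    // Type-specific arithmetic; may assume neither input is NULL.
    protected abstract void plusNotNull(NumberValueSketch right, NumberValueSketch result);
}

final class IntNumberSketch extends NumberValueSketch {
    private int value;

    IntNumberSketch() { }
    IntNumberSketch(int v) { value = v; isNull = false; }

    int getInt() { return value; }

    protected NumberValueSketch newInstance() { return new IntNumberSketch(); }

    protected void plusNotNull(NumberValueSketch right, NumberValueSketch result) {
        IntNumberSketch r = (IntNumberSketch) result;
        r.value = Math.addExact(value, ((IntNumberSketch) right).value);
        r.isNull = false;
    }
}
```

When the compiler knows both inputs are not nullable, generated code could in principle target plusNotNull directly and skip the checks.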
Operators and self
It seems the operator methods should almost always be acting on their
own value, e.g. the plus() method should only take one input and the
result is the value of the receiver (self) added to the input.
Currently the plus() method takes two inputs and probably in most if not all
cases the left input is the receiver. The result would be smaller code
and possibly faster code, as the method calls on self would not be through
an interface.
Java Source File Name | Type | Comment |
BigIntegerDecimal.java | Class | DECIMAL support using the immutable java.math.BigInteger to perform arithmetic
and conversions. |
BinaryDecimal.java | Class | SQL DECIMAL using raw data. |
BitDataValue.java | Interface | |
BooleanDataValue.java | Interface | |
CDCDataValueFactory.java | Class | DataValueFactory implementation for J2ME/CDC/Foundation. |
CloneableObject.java | Interface | This is a simple interface that is used by the
sorter for cloning input rows. |
ConcatableDataValue.java | Interface | The ConcatableDataValue interface corresponds to the
SQL 92 string value data type. |
DataType.java | Class | DataType is the superclass for all data types. |
DataTypeDescriptor.java | Class | This is an implementation of DataTypeDescriptor from the generic language
datatype module interface. |
DataTypeUtilities.java | Class | A set of static utility methods for data types. |
DataValueDescriptor.java | Interface | The DataValueDescriptor interface provides methods to get the data from
a column returned by a statement. |
DataValueFactory.java | Interface | This interface is how we get constant data values of different types. |
DataValueFactoryImpl.java | Class | Core implementation of DataValueFactory. |
DateTimeDataValue.java | Interface | |
DateTimeParser.java | Class | |
DTSClassInfo.java | Class | |
J2SEDataValueFactory.java | Class | DataValueFactory implementation for J2SE. |
JSQLType.java | Class | Type descriptor which wraps all 3 kinds of types supported in Cloudscape's
JSQL language: SQL types, Java primitives, Java classes. |
Like.java | Class | Like matching algorithm. |
NumberDataType.java | Class | NumberDataType is the superclass for all exact and approximate
numeric data types. |
NumberDataValue.java | Interface | |
Orderable.java | Interface | The Orderable interface represents a value that can
be linearly ordered.
Currently only supports linear (<, =, <=) operations.
Eventually we may want to do other types of orderings,
in which case there would probably be a number of interfaces
for each "class" of ordering.
The implementation must handle the comparison of null
values. |
RawToBinaryFormatStream.java | Class | Stream that takes a raw input stream and converts it
to the on-disk format of the binary types by prepending the
length of the value.
If the length of the stream is known then it is encoded
as the first bytes in the stream in the defined format.
If the length is unknown then the first four bytes will
be zero, indicating unknown length.
Note: This stream cannot be re-used. |
ReaderToUTF8Stream.java | Class | Converts a java.io.Reader to the on-disk UTF8 format used by Derby
for character types. |
RefDataValue.java | Interface | |
Resetable.java | Interface | This is a simple interface that is used by
streams that can initialize and reset themselves. |
RowLocation.java | Interface | Holds the location of a row within a given conglomerate.
A row location is not valid except in the conglomerate
from which it was obtained. |
SQLBinary.java | Class | SQLBinary is the abstract class for the binary datatypes.
- CHAR FOR BIT DATA
- VARCHAR FOR BIT DATA
- LONG VARCHAR FOR BIT DATA
- BLOB
Format : encoded length followed by the raw data.
Length is encoded to support Cloudscape 5.x databases where the length was stored as the number of bits.
The first bit of the first byte indicates if the format is an old (Cloudscape 5.x) style or a new Derby style.
Derby then uses the next two bits to indicate how the length is encoded.
The encoded length is one of N styles.
- (5.x format zero) 4 byte Java format integer value 0 - the raw data is either 0 bytes/bits or an unknown number of bytes.
- (5.x format bits) 4 byte Java format integer value >0 (positive) - number of bits in the raw data; the number of bytes in the raw data
is the minimum number of bytes required to store the number of bits.
- (Derby format) 1 byte encoded length (0 <= L <= 31) - number of bytes of raw data - encoded = 0x80 | L
- (Derby format) 3 byte encoded length (32 <= L < 64k) - number of bytes of raw data - encoded = 0xA0
- (Derby format) 5 byte encoded length (64k <= L < 2G) - number of bytes of raw data - encoded = 0xC0
- (future) to be determined L >= 2G - encoded 0xE0
(0xE0 is an escape to allow any number of arbitrary encodings in the future).
When the value was written from a byte array the Derby encoded byte
length format was always used from Derby 10.0 onwards (ie. |
SQLBit.java | Class | |
SQLBlob.java | Class | SQLBlob satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLBoolean.java | Class | SQLBoolean satisfies the DataValueDescriptor
interfaces (i.e., DataType). |
SQLChar.java | Class | SQLChar satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLClob.java | Class | SQLClob uses SQLVarchar by inheritance.
It satisfies the DataValueDescriptor interfaces (i.e., OrderableDataType). |
SQLDate.java | Class | This contains an instance of a SQL Date.
The date is stored as an int: (year << 16) + (month << 8) + day.
Null is represented by an encodedDate value of 0.
Some of the static methods in this class are also used by SQLTime and SQLTimestamp,
so check those classes if you change the date encoding.
PERFORMANCE OPTIMIZATION:
The java.sql.Date object is only instantiated when needed
due to the overhead of Date.valueOf(), etc. |
SQLDecimal.java | Class | SQLDecimal satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLDouble.java | Class | SQLDouble satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLInteger.java | Class | SQLInteger satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLLongint.java | Class | SQLLongint satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLLongVarbit.java | Class | SQLLongVarbit represents the SQL type LONG VARCHAR FOR BIT DATA
It is an extension of SQLVarbit and is virtually indistinguishable
other than normalization. |
SQLLongvarchar.java | Class | SQLLongvarchar satisfies the DataValueDescriptor interfaces (i.e., OrderableDataType). |
SQLNationalChar.java | Class | SQLNationalChar satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLNationalLongvarchar.java | Class | SQLNationalLongvarchar satisfies the DataValueDescriptor interfaces (i.e., OrderableDataType). |
SQLNationalVarchar.java | Class | SQLNationalVarchar satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLNClob.java | Class | SQLNClob satisfies the DataValueDescriptor interfaces (i.e., OrderableDataType). |
SQLReal.java | Class | SQLReal satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLRef.java | Class | |
SQLSmallint.java | Class | SQLSmallint satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLTime.java | Class | This contains an instance of a SQL Time
Our current implementation doesn't implement time precision so the fractional
seconds portion of the time is always 0. |
SQLTimestamp.java | Class | This contains an instance of a SQL Timestamp object. |
SQLTinyint.java | Class | SQLTinyint satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SQLVarbit.java | Class | SQLVarbit represents the SQL type VARCHAR FOR BIT DATA
It is an extension of SQLBit and is virtually indistinguishable
other than normalization. |
SQLVarchar.java | Class | SQLVarchar satisfies the DataValueDescriptor
interfaces (i.e., OrderableDataType). |
SqlXmlUtil.java | Class | This class contains "utility" methods that work with XML-specific
objects that are only available if JAXP and/or Xalan are in
the classpath.
NOTE: This class is only compiled with JDK 1.4 and higher since
the XML-related classes that it uses (JAXP and Xalan) are not
part of earlier JDKs.
Having a separate class for this functionality is beneficial
for two reasons:
1. |
StringDataValue.java | Interface | |
TypeId.java | Class | The TypeId interface provides methods to get information about datatype ids. |
UserDataValue.java | Interface | |
UserType.java | Class | This contains an instance of a user-defined type, that is, a java object. |
VariableSizeDataValue.java | Interface | The VariableSizeDataValue interface corresponds to
Datatypes that have adjustable width. |
XML.java | Class | This type implements the XMLDataValue interface and thus is
the type on which all XML related operations are executed.
The first and simplest XML store implementation is a UTF-8
based one--all XML data is stored on disk as a UTF-8 string,
just like the other Derby string types. |
XMLDataValue.java | Interface | |
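The date encoding described in the SQLDate entry above can be sketched as follows. DateEncodingSketch is a hypothetical helper, not a Derby class. Note that in Java the + operator binds tighter than <<, so the shifts need explicit parentheses.

```java
// Hypothetical helper packing a date into one int: year in the high 16 bits,
// month in the next 8 bits, day in the low 8 bits.
final class DateEncodingSketch {
    static int encode(int year, int month, int day) {
        return (year << 16) + (month << 8) + day;
    }

    static int year(int encoded)  { return encoded >>> 16; }
    static int month(int encoded) { return (encoded >>> 8) & 0xFF; }
    static int day(int encoded)   { return encoded & 0xFF; }

    public static void main(String[] args) {
        int encoded = encode(2005, 3, 15);
        System.out.println(year(encoded) + "-" + month(encoded) + "-" + day(encoded));
    }
}
```

Since every valid date has a non-zero month and day, an encoded value of 0 is free to represent NULL, and two encoded dates compare in calendar order as plain ints.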