Query Planner used in select and sub-select Query. I use Rule-based analysis and
decision-making – rules expressing proven, efficient statement execution methods
determine how operations and their attributes are built and combined.
One of the important components is the Query Optimizer. Its goal, for every SQL
statement, is to answer this question: What is the quickest way to execute the
statement, to produce exactly the data requested by the statement in the shortest
possible time?
The query processor makes extensive use of a relational algebra tree representation to
model and manipulate SQL queries. These trees can be thought of as a series of pipes,
valves, and other components through which data flows, entering through the bottom from
one or more tables, and leaving through the top as a result set. At various points
within the tree, the data are operated on as needed to produce the desired result. Each
operation is represented as a node in the tree. Nodes may have one or more expressions
associated with them to specify columns, conditions, and calculations associated with
the operation. In Axion in most cases these trees are represented as RowIterators.
Some of the operators that may be present in the tree are:
Restrict – reduces the number of output rows by eliminating those that fail to
satisfy some condition applied to the input. Restrict operators appear in the tree from
WHERE clauses and JOINs. example: FilteringRowIterator
Join – combines two input tables into a single output table that contains some
combination of rows from the inputs. Joins appear in the tree from the use of FROM
clauses and from JOIN clauses. example: NestedLoopJoinRowIterator.
Sort – changes the ordering of rows in an input table to produce an output table
in the desired order. example: SortedRowIterator
Table – represents a Table scan or an Index scan, reading data from a given table
by either its default index (table scan) or a specific index (index scan). Leaf nodes
of the tree are always references to database tables.
Current Axion SQL parser does not build a formal query tree, instead it builds a
AxionQueryContext object, which hold the query tree, that includes selectables, where
node, from node, oder by and group by node. FromNode is nested and represet a from
tree, the subqueries are nested in the context object. Then I apply rule bases analysis
to optimize the query.
The Axion query optimizer divides the optimization process into several phases. Some of
the phases deal with very significant rule based cost factors and are straightforward
to understand. Each phase, listed below, addresses a specific type of optimization:
Pushing restrictions as far down the tree as possible
Derive restrictions
Using indexes for restrictions
Join optimization
Sort optimization
Using index for NULL Check
Count Rows Optimization
Pushing Restrict Operations Close to the Data Origin: This optimization stage
consists of moving restrict operators as far down the query tree as possible. This
reduces the number of tuples moving up the tree for further processing and minimizes
the amount of data handled. When restrict operations on a join node cannot be moved
below the join node, they are set as join conditions. When multiple predicates are
moved down the tree to the same relative position, they are reassembled into a single
Restrict operation. Consider the SQL clause:
SELECT EMP.NAME FROM EMP, DEPT
WHERE EMP.SAL > 4000
AND EMP.SAL <= 6000
AND EMP.DEPTNO = DEPT.DEPTNO
The optimizer apply restrictions EMP.SAL > 4000 and
EMP.SAL <= 6000 are moved down the tree, below the join node, since they
apply to a single table. The restriction EMP.DEPTNO = DEPT.DEPTNO, applying to two
tables, stays in the join node.
Derive restrictions: This optimization stage consists of deriving restrict
operators based on the current current restriction and join condition. Consider the
following SQL clause:
SELECT EMP.NAME FROM EMP, DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO AND EMP.DEPTNO = 10
In this stage, the Optimizer can derive DEPT.DEPTNO = 10 and push that
down the tree, below the join node, since they apply to a single table. The restriction
EMP.DEPTNO = DEPT.DEPTNO, applying to two tables, stays in the join node.
Using Indexes for Restrictions: This optimization phase consists of recognizing
those cases where an existing index can be used to evaluate a restriction and
converting a table scan into an index bracket or set of contiguous index entries. An
index bracket is extremely effective in limiting the number of rows that must be
processed. Consider the following SQL clause:
SELECT EMP.NAME FROM EMP WHERE EMP.SAL > 6000 AND EMP.DEPTNO = 10
Instead of a separate restrict operation, the output tree uses the index EmpIdx on the
DEPTNO column table EMP to evaluate the restriction DEPTNO = 10.
Choosing the Join Algorithm: Currently Axion support Augmented nested loop
(ANL) or Index Nested Loop join and Nested loop join.
The Index Nested Loop Join or Augmented Nested Loop Join (ANL) is by far the
most common join method and is the classic Axion join method. An augmented nested loop
join is performed by doing a scan over the left subtree and for each row in it,
performing an index bracket scan on a portion of the right subtree. The right subtree
is read as many times as there are rows in the left subtree. To be a candidate for an
ANL join, the subtree pair for a join node must meet the following criteria:
There must be an index(es) defined on the join column(s) for the table in the
right subtree.
No other scan on that index has already been set.
When there is an index defined on the left subtree’s table instead of on the right, the
optimizer swaps the subtrees to make an ANL join possible. When neither subtree’s table
has an index defined on the join column, the optimizer creats a dynamic index on one of
the subtree.
A Nested Loop Join is performed by doing a scan over the left subtree and for
each row in it performing a full scan of the right subtree. This is the default join
algorithm, which can be used for any join. However, it is usually less efficient than
the other methods. Usually, either an existing index, or a dynamic index, used in an
ANL join, will cost much less. Occasionally, when subtree cardinalities are very low
(possibly because of index bracketing), nested loop will be the method with the least
cost.
Count Rows Optimization: If we are coutinig the leaf nodes, axion will use
table row count instead of table scan. If index was used to scan the table, count will
use the index count. e.g select count(*) from emp will get the row count
from table emp, instead of making a full table scan, but
select count(*) from emp where emp.id > 100 require a full table scan
unless optimizer can take advantage of index, if index is scanned then, index count
will be used.
Sort Optimization: The optimizer/planner performs two separate optimizations
designed to avoid sort operations whenever possible: eliminating redundant sorts and
converting table scans followed by sorts into index bracket scans.
Eliminating Redundant Sorts: The optimizer/planner checks whether the
query tree contains unnecessary sort nodes. For example, when an SQL statement contains
both a GROUP BY and an ORDER BY clause that refers to the same column, at most one sort
is needed. A sort node is also redundant when the immediate descendant node of the sort
node is an index bracket scan on the sort column. That is, the sort is redundant when
the data input to the sort was read using an index with the needed sort order.
Converting Table Scans to Index Bracket Scans: When a leaf node of a
subtree is a table scan, the optimizer/planner checks whether any indexes that exist on
the table match the sort column(s). If so, the optimizer converts the table scan to the
index bracket scan and removes the sort node.
version: $Revision: 1.61 $ $Date: 2006/01/10 21:02:37 $ author: Morgan Delagrange author: Rodney Waldhoff author: Chuck Burdick author: Amrish Lal author: Dave Pekarek Krohn author: Rahul Dwivedi author: Ahimanikya Satapathy author: Ritesh Adval |