Query Processing
Query processing is the entire set of activities involved in answering a query:
translating it into low-level instructions, optimizing it to save resources,
estimating the cost of evaluating it, and extracting the data from the database.
It is the step-by-step process of breaking a high-level language down into a
low-level form that the machine can understand, so that it can perform the action
the user requested. The query processor in the DBMS performs this task.
Let us consider the following two relations as the example tables for our
discussion:
Employee (Eno, Ename, Phone)
Proj_Assigned (Eno, Proj_No, Role, DOP)
where,
Eno is Employee number,
Ename is Employee name,
Proj_No is Project Number in which an employee is assigned,
Role is the role of an employee in a project,
DOP is duration of the project in months.
The following query lists all employees who are working on a project whose
duration is more than 10 months.
SELECT Ename
FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
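The query can be tried out with a minimal sketch like the one below, which uses Python's built-in sqlite3 module. The sample rows (names, phone numbers, project numbers, roles) are invented purely for illustration and are not part of the original example.

```python
import sqlite3

# In-memory database holding the two example relations.
# All rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Eno INTEGER PRIMARY KEY, Ename TEXT, Phone TEXT);
    CREATE TABLE Proj_Assigned (Eno INTEGER, Proj_No TEXT, Role TEXT, DOP INTEGER);
    INSERT INTO Employee VALUES
        (1, 'Asha', '555-0101'), (2, 'Ravi', '555-0102'), (3, 'Meena', '555-0103');
    INSERT INTO Proj_Assigned VALUES
        (1, 'P1', 'Developer', 12), (2, 'P2', 'Tester', 6), (3, 'P1', 'Manager', 12);
""")

query = """
    SELECT Ename
    FROM Employee, Proj_Assigned
    WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10
"""

# Sort the names so the output does not depend on the join order SQLite picks.
names = sorted(row[0] for row in conn.execute(query))
print(names)  # ['Asha', 'Meena']
```

Only the two employees assigned to the 12-month project are returned; the employee on the 6-month project is filtered out by DOP > 10.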
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the
query, the user’s privileges to execute the query, and the validity of the table
names, attribute names, etc. The correct table names, attribute names, and user
privileges are obtained from the system catalog (data dictionary).
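As a rough illustration of these checks, the sketch below submits a few statements to SQLite through Python's sqlite3 module and records which ones are rejected. It assumes SQLite as a stand-in DBMS; SQLite has no user privilege system, so only the syntax and catalog checks are shown. The helper name check is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Eno INTEGER, Ename TEXT, Phone TEXT)")

# The catalog records the attribute names of each table.
# PRAGMA table_info returns one row per column; index 1 is the column name.
catalog_columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]

def check(sql):
    """Return None if the statement is accepted, otherwise the error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)

syntax_error = check("SELEC Ename FROM Employee")    # misspelled keyword
unknown_column = check("SELECT Enam FROM Employee")  # attribute not in the catalog
unknown_table = check("SELECT Ename FROM Employes")  # table not in the catalog
valid = check("SELECT Ename FROM Employee")          # passes all checks

for label, result in [("syntax", syntax_error), ("column", unknown_column),
                      ("table", unknown_table), ("valid", valid)]:
    print(label, "->", result)
```

The three faulty statements are rejected before any data is read, while the valid one goes on to the later steps.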
Step 2: Translation
If the query is valid, it is converted from the high-level language SQL into a
low-level representation in relational algebra.
For example, our SQL query can be converted into the following relational
algebra equivalent:
πEname (σDOP>10 ∧ Employee.Eno=Proj_Assigned.Eno (Employee × Proj_Assigned))
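To make the expression concrete, the following sketch evaluates it literally, operator by operator, on plain Python lists: cross product first, then selection, then projection. The sample tuples are invented for illustration, and a real DBMS would not materialize the cross product this way.

```python
from itertools import product

# Made-up sample tuples.
# Employee: (Eno, Ename, Phone); Proj_Assigned: (Eno, Proj_No, Role, DOP)
employee = [(1, "Asha", "555-0101"), (2, "Ravi", "555-0102"), (3, "Meena", "555-0103")]
proj_assigned = [(1, "P1", "Developer", 12), (2, "P2", "Tester", 6), (3, "P1", "Manager", 12)]

# Cross product: every Employee tuple concatenated with every Proj_Assigned tuple.
cross = [e + p for e, p in product(employee, proj_assigned)]

# Selection: in a concatenated tuple, Employee.Eno is at index 0,
# Proj_Assigned.Eno at index 3, and DOP at index 6.
selected = [t for t in cross if t[6] > 10 and t[0] == t[3]]

# Projection on Ename (index 1).
plan1_result = [t[1] for t in selected]

print(len(cross))    # 9
print(plan1_result)  # ['Asha', 'Meena']
```

With 3 tuples in each relation, the cross product already holds 9 tuples, of which only 2 survive the selection.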
Step 3: Optimizer
The optimizer uses the statistical data stored as part of the data dictionary.
These statistics include the size of each table, the length of its records, the
indexes created on it, etc. The optimizer also examines the conditions that appear
in the query and the attributes those conditions refer to.
Step 4: Execution Plan
At this stage, the query processor module uses the information collected in Step 3
to find different relational algebra expressions that are equivalent to the one we
have already written, that is, expressions that return the same result.
For our example, the query written in relational algebra can also be written
as follows:
πEname (Employee ⋈Eno (σDOP>10 (Proj_Assigned)))
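The sketch below evaluates this alternative expression on plain Python lists: the selection on Proj_Assigned runs first, and only the surviving tuples take part in the join. The sample tuples are invented, and the simple nested-loop join is an assumption made for illustration, not a statement about how a particular DBMS implements joins.

```python
# Made-up sample tuples.
# Employee: (Eno, Ename, Phone); Proj_Assigned: (Eno, Proj_No, Role, DOP)
employee = [(1, "Asha", "555-0101"), (2, "Ravi", "555-0102"), (3, "Meena", "555-0103")]
proj_assigned = [(1, "P1", "Developer", 12), (2, "P2", "Tester", 6), (3, "P1", "Manager", 12)]

# Selection first: keep only assignments with DOP (index 3) greater than 10.
long_projects = [p for p in proj_assigned if p[3] > 10]

# Join on Eno with a nested loop over Employee and the filtered tuples only.
joined = [e + p for e in employee for p in long_projects if e[0] == p[0]]

# Projection on Ename (index 1).
plan2_result = [t[1] for t in joined]

print(len(long_projects))  # 2
print(plan2_result)        # ['Asha', 'Meena']
```

The result is the same set of names, but the join only has to consider 2 Proj_Assigned tuples instead of all 3.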
Step 5: Evaluation
Many execution plans can be constructed using the statistical data. Although they
return the same result, they differ in the time consumed or the space required to
execute the query. Hence, it is necessary to choose the plan with the lowest
estimated cost.
At this stage, we choose one execution plan. This execution plan accesses data
from the database to give the final result.
In our example, the second plan is likely to be better. In the first plan, we
join the two relations (a costly operation) and then apply the condition
(conditions act as filters) to the joined relation. This consumes more time
as well as more space.
In the second plan, we first filter one of the tables (Proj_Assigned) and then
join the result with the Employee table. This join may need to compare a
smaller number of records. Hence, the second plan is the better choice given
the information we have, although this will not always be the case.
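A back-of-the-envelope comparison of the two plans can be sketched as follows. The cost model is a deliberate simplification (it counts tuple pairings only and assumes nested-loop evaluation with no indexes), and the table sizes are hypothetical numbers chosen for illustration.

```python
def plan1_comparisons(n_employee, n_assigned):
    # Cross product first: every Employee tuple is paired with every
    # Proj_Assigned tuple before the condition is applied.
    return n_employee * n_assigned

def plan2_comparisons(n_employee, n_assigned, n_passing_filter):
    # One scan of Proj_Assigned to apply DOP > 10, then a nested-loop
    # join between Employee and the tuples that passed the filter.
    return n_assigned + n_employee * n_passing_filter

# Hypothetical sizes: 1000 employees, 5000 assignments.
# Case 1: the filter is selective (500 assignments have DOP > 10).
selective = (plan1_comparisons(1000, 5000), plan2_comparisons(1000, 5000, 500))
# Case 2: the filter removes nothing (all 5000 assignments have DOP > 10).
unselective = (plan1_comparisons(1000, 5000), plan2_comparisons(1000, 5000, 5000))

print(selective)    # (5000000, 505000)
print(unselective)  # (5000000, 5005000)
```

When the filter is selective, the second plan does roughly a tenth of the work. When nearly every tuple passes the filter, the extra scan buys nothing, which is why the optimizer relies on statistics rather than assuming the second plan always wins.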
Query Optimization:
A single query can be executed through different algorithms or rewritten in
different forms and structures. Hence, the question of query optimization comes
into the picture: which of these forms or pathways is the most efficient? The
query optimizer attempts to determine the most efficient way to execute a given
query by considering the possible query plans.
Importance: The goal of query optimization is to reduce the system resources
required to fulfill a query, and ultimately provide the user with the correct result set
faster.
First, it provides the user with faster results, which makes the application
seem faster to the user.
Secondly, it allows the system to service more queries in the same amount of
time, because each request takes less time than it would without optimization.
Thirdly, query optimization ultimately reduces the amount of wear on the
hardware (e.g. disk drives), and allows the server to run more efficiently
(e.g. lower power consumption, less memory usage).
The query optimizer uses the following techniques to determine which plan or
expression to use for evaluating the query.
Dynamic programming
Left Deep Trees
Interesting sort orders
Heuristic Optimization (Logical)
This method is also known as rule-based optimization. It is based on the
equivalence rules for relational expressions, which reduces the number of query
combinations that have to be considered and, in turn, the cost of the query.
Some of the common heuristic rules are:
Perform select and project operations before join operations. This is done by
moving the select and project operations down the query tree. This reduces
the number of tuples available for the join.
Perform the most restrictive select/project operations first, before the other
operations.
Avoid cross-product operations, since they result in very large
intermediate tables.
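The first rule above, moving a select operation down the query tree, can be sketched as a small tree rewrite. This is a simplified illustration under stated assumptions, not how any particular optimizer is implemented: the node classes Table, Select, and Join and the function push_select_down are hypothetical names, and each selection is assumed to mention attributes of exactly one table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Table:
    name: str

@dataclass(frozen=True)
class Select:
    table: str      # the single table whose attributes the condition mentions
    condition: str
    child: object

@dataclass(frozen=True)
class Join:
    left: object
    right: object

def tables_in(node):
    """Return the set of table names found under a node."""
    if isinstance(node, Table):
        return {node.name}
    if isinstance(node, Select):
        return tables_in(node.child)
    return tables_in(node.left) | tables_in(node.right)

def push_select_down(node):
    """Move each selection below joins, onto the side that holds its table."""
    if isinstance(node, Table):
        return node
    if isinstance(node, Join):
        return Join(push_select_down(node.left), push_select_down(node.right))
    child = push_select_down(node.child)
    if isinstance(child, Join):
        pushed = Select(node.table, node.condition, None)
        if node.table in tables_in(child.left):
            left = push_select_down(Select(pushed.table, pushed.condition, child.left))
            return Join(left, child.right)
        right = push_select_down(Select(pushed.table, pushed.condition, child.right))
        return Join(child.left, right)
    return Select(node.table, node.condition, child)

# Selection sitting above the join, as in the first plan of the running example.
before = Select("Proj_Assigned", "DOP > 10",
                Join(Table("Employee"), Table("Proj_Assigned")))
after = push_select_down(before)
print(after)
```

After the rewrite, the selection sits directly on Proj_Assigned and the join receives only the filtered tuples, which mirrors the second plan from the running example.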
EMPLOYEE
DEPARTMENT