
Query Processing

The document discusses the process of query processing in a database management system. It involves 5 main steps: 1) parsing the query syntax and checking privileges, 2) translating the query from SQL to relational algebra, 3) optimizing the query using statistical data, 4) generating an execution plan, and 5) evaluating and executing the plan to retrieve the query results from the database in an efficient manner. The goal is to choose the most optimal execution plan that minimizes the time and resources needed to run the query.

Uploaded by

anon_189503955
Copyright
© All Rights Reserved

Query Processing

Query processing is the entire activity of translating a query into low-level
instructions, optimizing it to save resources, estimating the cost of evaluating
it, and extracting the requested data from the database.
It is a step-by-step process that breaks a query written in a high-level language
down into a low-level form that the machine can understand and execute on the
user's behalf. The query processor in the DBMS performs this task.

Let us consider the following two relations as the example tables for our
discussion:
Employee (Eno, Ename, Phone)
Proj_Assigned (Eno, Proj_No, Role, DOP)
where,
• Eno is the employee number,
• Ename is the employee name,
• Proj_No is the number of the project to which an employee is assigned,
• Role is the role of an employee in a project,
• DOP is the duration of the project in months.
Here we write a query to list all employees who are assigned to a project whose
duration (DOP) is more than 10 months.
SELECT Ename
FROM Employee, Proj_Assigned
WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10;
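To make the example concrete, the query can be run against a small in-memory database. The following is a minimal sketch using Python's built-in sqlite3 module; the sample rows (names, phone numbers, project numbers and durations) are invented for illustration, and only the schema comes from the text.

```python
import sqlite3

# In-memory database with the two example relations.
# All sample rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Eno INTEGER PRIMARY KEY, Ename TEXT, Phone TEXT);
    CREATE TABLE Proj_Assigned (Eno INTEGER, Proj_No TEXT, Role TEXT, DOP INTEGER);
    INSERT INTO Employee VALUES
        (1, 'Asha', '555-0101'), (2, 'Ravi', '555-0102'), (3, 'Meena', '555-0103');
    INSERT INTO Proj_Assigned VALUES
        (1, 'P1', 'Developer', 12), (2, 'P2', 'Tester', 6), (3, 'P3', 'Manager', 18);
""")

query = """
    SELECT Ename
    FROM Employee, Proj_Assigned
    WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10
"""

# Sort the names so the result does not depend on the join order SQLite picks.
names = sorted(row[0] for row in conn.execute(query))
print(names)  # ['Asha', 'Meena']
```

With this sample data, only the employees on the 12-month and 18-month projects satisfy DOP > 10.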
Step 1: Parsing
In this step, the parser of the query processor module checks the syntax of the
query, the user's privileges to execute the query, and the validity of the table
names and attribute names. The correct table names, attribute names and user
privileges are obtained from the system catalog (data dictionary).
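The catalog lookups described in this step can be sketched in a few lines of Python. This is a toy illustration, not a real SQL parser: the `CATALOG` and `PRIVILEGES` dictionaries, the user names and the `check_query` function are all hypothetical, and the table and attribute lists are assumed to have been extracted from the query already.

```python
# Toy system catalog: table name -> attribute names (from the example schema).
CATALOG = {
    "Employee": {"Eno", "Ename", "Phone"},
    "Proj_Assigned": {"Eno", "Proj_No", "Role", "DOP"},
}

# Hypothetical privilege table: user -> tables the user is allowed to read.
PRIVILEGES = {
    "alice": {"Employee", "Proj_Assigned"},
    "bob": {"Employee"},
}


def check_query(user, tables, attributes):
    """Return a list of error messages; an empty list means all checks passed."""
    errors = []
    for table in tables:
        if table not in CATALOG:
            errors.append(f"unknown table: {table}")
        elif table not in PRIVILEGES.get(user, set()):
            errors.append(f"{user} may not read {table}")

    # Every referenced attribute must belong to one of the known tables.
    known = set().union(*(CATALOG.get(t, set()) for t in tables))
    for attr in attributes:
        if attr not in known:
            errors.append(f"unknown attribute: {attr}")
    return errors


tables = ["Employee", "Proj_Assigned"]
print(check_query("alice", tables, ["Ename", "Eno", "DOP"]))  # []
print(check_query("bob", tables, ["Ename", "DOP"]))
```

A real parser would also build a parse tree from the SQL text; here only the name and privilege checks are shown.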

Step 2: Translation
If the query is valid, it is converted from the high-level language SQL into a
low-level representation in relational algebra.
For example, our SQL query can be converted into the following relational algebra
equivalent:
πEname(σDOP>10 ∧ Employee.Eno=Proj_Assigned.Eno (Employee × Proj_Assigned))
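The relational algebra expression can be read inside out: form the cross product, apply the selection, then project. The sketch below mirrors that reading in Python, with relations modelled as lists of dictionaries. The helper names `cross`, `select` and `project` and the sample rows are assumptions made for illustration; unlike strict relational algebra, this `project` keeps duplicates, as SQL does.

```python
from itertools import product

# Hypothetical sample data for the two relations.
employee = [
    {"Eno": 1, "Ename": "Asha"},
    {"Eno": 2, "Ename": "Ravi"},
    {"Eno": 3, "Ename": "Meena"},
]
proj_assigned = [
    {"Eno": 1, "Proj_No": "P1", "DOP": 12},
    {"Eno": 2, "Proj_No": "P2", "DOP": 6},
    {"Eno": 3, "Proj_No": "P3", "DOP": 18},
]


def cross(r, r_name, s, s_name):
    """Cross product; attribute names are qualified to avoid clashes on Eno."""
    return [
        {**{f"{r_name}.{k}": v for k, v in a.items()},
         **{f"{s_name}.{k}": v for k, v in b.items()}}
        for a, b in product(r, s)
    ]


def select(rows, predicate):
    """Selection (sigma): keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """Projection (pi): keep only the listed attributes (duplicates retained)."""
    return [{a: row[a] for a in attrs} for row in rows]


result = project(
    select(
        cross(employee, "Employee", proj_assigned, "Proj_Assigned"),
        lambda t: t["Proj_Assigned.DOP"] > 10
        and t["Employee.Eno"] == t["Proj_Assigned.Eno"],
    ),
    ["Employee.Ename"],
)
print(result)  # [{'Employee.Ename': 'Asha'}, {'Employee.Ename': 'Meena'}]
```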
Step 3: Optimizer
The optimizer uses the statistical data stored as part of the data dictionary.
These statistics include the size of each table, the length of its records, the
indexes created on it, etc. The optimizer also examines the conditions in the
query and the attributes those conditions refer to.
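As an illustration of how such statistics might feed an estimate, here is a minimal Python sketch. The `TableStats` class, the 4096-byte block size, the sample numbers and the uniform-distribution assumption behind `range_selectivity` are all illustrative assumptions rather than details given in the text.

```python
import math
from dataclasses import dataclass


@dataclass
class TableStats:
    """Per-table statistics of the kind a data dictionary might hold."""
    num_records: int
    record_length: int  # bytes per record


def num_blocks(stats, block_size=4096):
    """Estimate how many disk blocks the table occupies."""
    records_per_block = block_size // stats.record_length
    return math.ceil(stats.num_records / records_per_block)


def range_selectivity(lo, hi, threshold):
    """Estimated fraction of rows with value > threshold, assuming the
    values are spread uniformly between lo and hi."""
    if threshold >= hi:
        return 0.0
    if threshold <= lo:
        return 1.0
    return (hi - threshold) / (hi - lo)


# Hypothetical figures: 10,000 assignments of 100 bytes each, DOP between 0 and 40.
proj_stats = TableStats(num_records=10_000, record_length=100)
blocks = num_blocks(proj_stats)                 # 40 records per block -> 250 blocks
fraction = range_selectivity(0, 40, 10)         # 0.75
estimated_rows = proj_stats.num_records * fraction
print(blocks, fraction, estimated_rows)
```

Under these assumptions the condition DOP > 10 is estimated to keep about three quarters of the rows, which is the kind of figure an optimizer compares across candidate plans.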
Step 4: Execution Plan
At this stage, the query processor module uses the information collected in
step 3 to find different relational algebra expressions that are equivalent to
the one we have already written, that is, expressions that return the same result.
For our example, the query in relational algebra can also be written as follows:
πEname (Employee ⋈Eno (σDOP>10 (Proj_Assigned)))

Step 5: Evaluation
Many execution plans can be constructed using the statistical data. Although they
return the same result, they differ in the time or the space required to execute
the query. Hence, we must choose the plan with the lowest estimated cost.
At this stage, we choose one execution plan. This execution plan accesses data
from the database to give the final result.
• In our example, the second plan is likely to be better. In the first plan, we
join the two relations (a costly operation) and then apply the condition
(conditions act as filters) to the joined relation. This consumes more time
as well as more space.
• In the second plan, we first filter one of the tables (Proj_Assigned) and then
join the result with the Employee table. This join may need to compare fewer
records. Hence, the second plan is the better choice given the information
known here, though this is not always the case.
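The difference between the two plans can be made visible by counting how many row comparisons each one performs. The sketch below assumes a simple nested-loop join and generated sample data (100 employees, one assignment each, with DOP set to `eno % 20`); both assumptions are for illustration only, and a real system would use better join algorithms and a richer cost measure.

```python
# Hypothetical data: 100 employees, one project assignment each.
employee = [(eno, f"E{eno}") for eno in range(1, 101)]
proj_assigned = [(eno, f"P{eno}", "Member", eno % 20) for eno in range(1, 101)]


def plan_one():
    """Cross product first, then apply the join and DOP conditions."""
    comparisons = 0
    names = []
    for eno, ename in employee:
        for p_eno, _proj_no, _role, dop in proj_assigned:
            comparisons += 1
            if eno == p_eno and dop > 10:
                names.append(ename)
    return names, comparisons


def plan_two():
    """Filter Proj_Assigned on DOP first, then join with Employee."""
    filtered = [row for row in proj_assigned if row[3] > 10]
    comparisons = len(proj_assigned)  # one DOP test per Proj_Assigned row
    names = []
    for eno, ename in employee:
        for p_eno, *_rest in filtered:
            comparisons += 1
            if eno == p_eno:
                names.append(ename)
    return names, comparisons


names_one, cost_one = plan_one()
names_two, cost_two = plan_two()
print(names_one == names_two)  # True: both plans return the same rows
print(cost_one, cost_two)      # 10000 4600
```

Here 45 of the 100 assignments survive the filter, so the second plan does 100 filter tests plus 100 × 45 join comparisons. If the filter kept almost every row, the advantage would shrink, which is why the choice depends on the statistics.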

Query Optimization:
A single query can be executed using different algorithms or rewritten in
different forms and structures. Hence the question of query optimization arises:
which of these forms or pathways is the most efficient? The query optimizer
attempts to determine the most efficient way to execute a given query by
considering the possible query plans.
Importance: The goal of query optimization is to reduce the system resources
required to fulfill a query, and ultimately to provide the user with the correct
result set faster.
• First, it provides the user with faster results, which makes the application
seem faster to the user.
• Secondly, it allows the system to service more queries in the same amount of
time, because each optimized request takes less time than an unoptimized one.
• Thirdly, query optimization ultimately reduces the amount of wear on the
hardware (e.g. disk drives), and allows the server to run more efficiently
(e.g. lower power consumption, less memory usage).
The query optimizer uses the following two techniques to determine which process
or expression to consider for evaluating the query.

Cost based Optimization


This technique is based on the estimated cost of the query. The query can follow
different paths depending on indexes, constraints, sorting methods, etc. This
method mainly uses statistics such as record size, number of records, number of
records per block, number of blocks, table size, whether the whole table fits in
a block, organization of tables, uniqueness of column values, size of columns, etc.

Techniques associated with cost-based optimization include:
• Dynamic programming
• Left-deep trees
• Interesting sort orders
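The first two items in this list can be illustrated together: dynamic programming over subsets of tables, restricted to left-deep join orders (each join adds one base table to the result so far). The following Python sketch is a simplification, not a real optimizer's algorithm. The table names A, B and C, their cardinalities, the join selectivities and the cost model (the sum of the result sizes produced along the way) are all made-up assumptions.

```python
from itertools import combinations

# Hypothetical statistics: rows per table and selectivity of each pairwise join.
CARD = {"A": 1000, "B": 100, "C": 10}
SEL = {
    frozenset("AB"): 0.01,
    frozenset("BC"): 0.1,
    frozenset("AC"): 1.0,  # no join condition: effectively a cross product
}


def result_size(tables):
    """Estimated number of rows after joining all the given tables."""
    size = 1.0
    for t in sorted(tables):
        size *= CARD[t]
    for a, b in combinations(sorted(tables), 2):
        size *= SEL[frozenset((a, b))]
    return size


def best_left_deep(tables):
    """Return (cost, join order) of the cheapest left-deep plan.

    best[S] holds the cheapest plan found for the set of tables S; it is
    built from the best plan for S minus one table, plus that table."""
    best = {frozenset([t]): (0.0, (t,)) for t in tables}
    for k in range(2, len(tables) + 1):
        for subset in combinations(tables, k):
            s = frozenset(subset)
            candidates = []
            for t in subset:
                cost, order = best[s - {t}]
                candidates.append((cost + result_size(s), order + (t,)))
            best[s] = min(candidates)
    return best[frozenset(tables)]


best_cost, best_order = best_left_deep(["A", "B", "C"])
print(best_order, best_cost)  # joins B with C first, then adds A
```

With these numbers, joining the two small tables B and C first keeps the intermediate result at about 100 rows, so that order is preferred over starting with the large table A.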
Heuristic Optimization (Logical)
This method is also known as rule-based optimization. It is based on the
equivalence rules for relational expressions, which reduce the number of
alternative query forms that have to be considered and therefore also the cost
of the query.
Some of the common heuristic rules are:
• Perform select and project operations before join operations. This is done by
moving the select and project operations down the query tree. This reduces
the number of tuples available for the join.
• Perform the most restrictive select/project operations first, before the
other operations.
• Avoid cross-product operations, since they result in very large
intermediate tables.
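The first rule, moving a selection down the query tree, can be sketched as a small tree rewrite. The node classes `Table`, `Select` and `Join` and the function `push_down` below are hypothetical names chosen for this illustration; attribute names are written in qualified form (for example `Proj_Assigned.DOP`) so that the rewrite can tell which side of a join a condition belongs to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    attrs: frozenset  # qualified attribute names, e.g. "Employee.Eno"


@dataclass(frozen=True)
class Select:
    condition: str    # human-readable condition, e.g. "DOP > 10"
    uses: frozenset   # attributes the condition refers to
    child: object


@dataclass(frozen=True)
class Join:
    left: object
    right: object


def attrs_of(node):
    """Attributes available in the output of a node."""
    if isinstance(node, Table):
        return node.attrs
    if isinstance(node, Select):
        return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)


def push_down(node):
    """Move each selection below a join when one input supplies all its attributes."""
    if isinstance(node, Select) and isinstance(node.child, Join):
        join = node.child
        if node.uses <= attrs_of(join.left):
            pushed = Select(node.condition, node.uses, join.left)
            return Join(push_down(pushed), push_down(join.right))
        if node.uses <= attrs_of(join.right):
            pushed = Select(node.condition, node.uses, join.right)
            return Join(push_down(join.left), push_down(pushed))
    if isinstance(node, Select):
        return Select(node.condition, node.uses, push_down(node.child))
    if isinstance(node, Join):
        return Join(push_down(node.left), push_down(node.right))
    return node


emp = Table("Employee", frozenset({"Employee.Eno", "Employee.Ename", "Employee.Phone"}))
proj = Table("Proj_Assigned", frozenset({"Proj_Assigned.Eno", "Proj_Assigned.DOP"}))
dop_filter = frozenset({"Proj_Assigned.DOP"})

before = Select("DOP > 10", dop_filter, Join(emp, proj))
after = push_down(before)
print(after == Join(emp, Select("DOP > 10", dop_filter, proj)))  # True
```

A condition that refers to attributes from both inputs is left above the join, since neither side alone can evaluate it.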

Steps for Query Optimization


Query optimization involves three steps, namely query tree generation, plan
generation, and query plan code generation.

Step 1 − Query Tree Generation


A query tree is a tree data structure representing a relational algebra expression. The
tables of the query are represented as leaf nodes. The relational algebra operations are
represented as the internal nodes. The root represents the query as a whole.
During execution, an internal node is executed whenever its operand tables are
available. The node is then replaced by the result table. This process continues for all
internal nodes until the root node is executed and replaced by the result table.
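The bottom-up execution described above can be sketched as a recursive evaluation in which every internal node is replaced by its result table once its operands are available. In this Python sketch the tuple-based node encoding, the `evaluate` function and the sample rows are assumptions made for illustration; the tables reuse the Employee and Proj_Assigned relations from the earlier discussion.

```python
# Leaf tables, with hypothetical sample rows.
TABLES = {
    "Employee": [
        {"Eno": 1, "Ename": "Asha"},
        {"Eno": 2, "Ename": "Ravi"},
    ],
    "Proj_Assigned": [
        {"Eno": 1, "DOP": 12},
        {"Eno": 2, "DOP": 6},
    ],
}


def evaluate(node):
    """Evaluate a query tree; operands are evaluated before their parent."""
    op = node[0]
    if op == "table":                      # leaf node: a stored table
        return TABLES[node[1]]
    if op == "select":
        _, predicate, child = node
        return [row for row in evaluate(child) if predicate(row)]
    if op == "project":
        _, attrs, child = node
        return [{a: row[a] for a in attrs} for row in evaluate(child)]
    if op == "join":                       # equi-join on one shared attribute
        _, attr, left, right = node
        right_rows = evaluate(right)
        return [
            {**l, **r}
            for l in evaluate(left)
            for r in right_rows
            if l[attr] == r[attr]
        ]
    raise ValueError(f"unknown operation: {op}")


# Root: project Ename; below it a join on Eno; leaves are the two tables,
# with a selection on DOP sitting directly above Proj_Assigned.
tree = (
    "project", ["Ename"],
    ("join", "Eno",
     ("table", "Employee"),
     ("select", lambda row: row["DOP"] > 10, ("table", "Proj_Assigned"))),
)

result = evaluate(tree)
print(result)  # [{'Ename': 'Asha'}]
```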

For example, let us consider the following schemas −

EMPLOYEE (EmpID, EName, Salary, DeptNo, DateOfJoining)

DEPARTMENT (DNo, DName, Location)

Step 2 − Query Plan Generation


After the query tree is generated, a query plan is made. A query plan is an extended
query tree that includes access paths for all operations in the query tree. Access paths
specify how the relational operations in the tree should be performed. For example, a
selection operation can have an access path that gives details about the use of a
B+ tree index for the selection.
Besides, a query plan also states how the intermediate tables should be passed from
one operator to the next, how temporary tables should be used and how operations
should be pipelined/combined.
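Many database systems let you inspect the plan they have chosen. As one concrete illustration, SQLite offers the EXPLAIN QUERY PLAN statement, which reports the access path (table scan or index search) picked for each table. The sketch below uses Python's built-in sqlite3 module with the Employee and Proj_Assigned relations from the earlier discussion; the index name `idx_dop` is a hypothetical choice, and the exact wording of the plan text varies between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Eno INTEGER PRIMARY KEY, Ename TEXT, Phone TEXT);
    CREATE TABLE Proj_Assigned (Eno INTEGER, Proj_No TEXT, Role TEXT, DOP INTEGER);
    -- Hypothetical index, giving the planner an alternative access path for DOP.
    CREATE INDEX idx_dop ON Proj_Assigned (DOP);
""")

query = """
    SELECT Ename
    FROM Employee, Proj_Assigned
    WHERE Employee.Eno = Proj_Assigned.Eno AND DOP > 10
"""

# Each row describes one step of the chosen plan; the fourth column holds
# the human-readable description of the access path.
plan_rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_details = [row[3] for row in plan_rows]
for detail in plan_details:
    print(detail)
```

The printed lines show, for each table, whether SQLite intends to scan it or search it through an index, which corresponds to the access-path information a query plan adds to the query tree.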

Step 3 − Code Generation


Code generation is the final step in query optimization. It produces the executable
form of the query, whose form depends upon the type of the underlying operating system.
Once the query code is generated, the Execution Manager runs it and produces the results.
