Flash Cards

Database Management System (DBMS) is software that facilitates the creation and maintenance of databases. It allows users to interact with databases through queries, which access data, and transactions, which may read and update values. The DBMS catalog stores metadata about the database structure. Program-data independence separates data access programs from physical storage details. Data abstraction presents users with a conceptual view of data rather than storage details. The relational model represents data as mathematical relations with rows and columns. Database design aims to eliminate anomalies such as insertion, deletion, and update anomalies. Normalization divides relations to eliminate functional dependencies that are not related to primary keys. Structured Query Language (SQL) is used to define, manipulate, and control access to data in a relational database.


Mini-world

some part of the real world about which data is stored in a database

Database Management System (DBMS)


a software package/system to facilitate the creation and maintenance of a computerized
database

Queries vs. Transactions


Applications interact with a database by generating:

Queries: access different parts of data and formulate the result of the request

Transactions: may read some data and 'update' certain values or generate new data and store
that in the database

DBMS catalog
stores the description of a particular database (e.g. data structures, types and constraints)

Description is called meta-data

Program-Data independence
Structure of data files is stored in DBMS catalog separately from access programs

Allows changing data structures and storage organization without having to change the DBMS
access programs

Data abstraction
a data model is used to hide storage details and present the users with a conceptual view of the
database

Programs refer to the data model constructs rather than data storage details

Actors on the scene


database users who actually use and control the database content, and those who design,
develop and maintain database applications

Workers behind the scene


Those who design and develop the DBMS software and related tools, and the computer systems
operators

Big Data
High-volume, high-velocity, and/or high-variety information assets that require innovative forms
of information processing for enhanced insight and decision making.

Data Model
A set of concepts to describe the structure of a database, the operations for manipulating these
structures, and certain constraints that the database should obey.

Categories of Data Models


Conceptual (high-level, semantic)

Logical (implementation, representational)

Physical (low-level, internal)

Self-Describing

A database system is self-describing: it contains not only the data but also a complete definition
of the database structure and constraints (the meta-data stored in the DBMS catalog)

Three-Schema Architecture
External Schema (end users)
Conceptual Schema (conceptual and logical data models)
Internal Schema (physical data model)

Supports the characteristics of program-data independence and multiple views of the data

Logical Data Independence


The capacity to change the conceptual schema without having to change the external schemas
and their associated application programs.

Physical Data Independence


The capacity to change the internal schema without having to change the conceptual schema.

Types of database constraints


Inherent or Implicit
Schema-based or Explicit

Application Based or Semantic

Inherent/Implicit constraints
based on the data model itself (e.g. relational data model does not allow a list as a value for any
attribute)

Schema-based/Explicit constraints
expressed in the schema by using the facilities provided by the model (ex. max cardinality ratio)

Application based/semantic constraints


beyond the expressive power of the model and must be specified and enforced by application
programs

Design guidelines for relational databases


1. informally, each tuple in a relation should represent one entity or relationship instance

2. design a schema that does not suffer from insertion, deletion and update anomalies

3. relations should be designed such that their tuples will have as few NULL values as possible

4. the relation should be designed to satisfy the lossless join condition

Lossless Join Condition


No spurious tuples generated by doing a natural join of any relations

Functional Dependencies
A FD holds if whenever 2 tuples have the same value for X they MUST have the same value for Y
i.e. If t1[X]=t2[X] then t1[Y]=t2[Y]

Given an instance of a relation, one can only conclude that an FD may exist or does not exist; one
cannot know for sure that the condition holds in all cases
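
ex. in a hypothetical EMPLOYEE(Ssn, Ename, Salary) relation, Ssn -> Ename holds (each SSN
determines one name), while Ename -> Ssn generally does not (two employees may share a name)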

First Normal Form (1NF)


Disallows: composite attributes, multivalued attributes and nested relations (i.e. attribute values
cannot be lists)

considered to be part of the definition of a relation

1NF normalization
Move the attributes violating 1NF to a new relation and associate the relations via keys (FK, PK)
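
A minimal sketch of this step (hypothetical tables; Phone is the multivalued attribute being moved):

-- the 1NF-violating attribute leaves the original relation
CREATE TABLE employee (
    ssn  CHAR(9) PRIMARY KEY,
    name VARCHAR(50)
);
-- one row per (employee, phone) pair, linked back via a foreign key
CREATE TABLE employee_phone (
    ssn   CHAR(9) REFERENCES employee(ssn),
    phone VARCHAR(15),
    PRIMARY KEY (ssn, phone)
);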

Second Normal Form (2NF)


Every non-prime attribute A in R is fully functionally dependent on every key of R

i.e. full dependency: for a FD Y -> Z, removal of any attribute from Y means the FD no longer holds

Prime attribute
An attribute that is a member of a (candidate) key K

2NF Normalization
Move the attribute involved in a 2NF violation to a new relation, maintain in the original relation
the LHS attributes and associate them to the new relation via keys
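
An illustrative sketch, assuming a hypothetical EMP_PROJ(ssn, pnumber, hours, pname) whose key is
(ssn, pnumber) but where pname depends on pnumber alone (a 2NF violation):

CREATE TABLE project (
    pnumber INT PRIMARY KEY,
    pname   VARCHAR(30)        -- depends only on pnumber, so it moves here
);
CREATE TABLE emp_proj (
    ssn     CHAR(9),
    pnumber INT REFERENCES project(pnumber),
    hours   DECIMAL(4,1),      -- fully dependent on the whole key (ssn, pnumber)
    PRIMARY KEY (ssn, pnumber)
);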

Third Normal Form (3NF)


2NF + no non-prime attribute A in R is transitively dependent on the primary key

normalization process same as 2NF

Transitive functional dependency


a FD X -> Z that can be derived from 2 FDs X -> Y and Y -> Z
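
An illustrative 3NF decomposition (hypothetical schema): if ssn -> dnumber and dnumber -> dmgr_ssn,
then ssn -> dmgr_ssn is transitive, so dmgr_ssn moves to its own relation:

CREATE TABLE department (
    dnumber  INT PRIMARY KEY,
    dmgr_ssn CHAR(9)           -- now directly dependent on the key dnumber
);
CREATE TABLE employee (
    ssn     CHAR(9) PRIMARY KEY,
    dnumber INT REFERENCES department(dnumber)
);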

Creating a Schema (SQL)


CREATE SCHEMA <schema_name> AUTHORIZATION <value>;

Creating a data type (SQL)


CREATE DOMAIN <type_name> AS <data type>;

ex. CREATE DOMAIN SSN_TYPE AS CHAR(9);


Creating a relation (SQL)
CREATE TABLE <[schema_name.]table_name>
(<attribute_name><data_type><attribute_constraints>, ..., <table_constraints>, ...);

Primary key constraint (SQL)


on attribute: <attribute_name><data_type> PRIMARY KEY

on table: CONSTRAINT <constraint_name> PRIMARY KEY (<attribute_name>)

Secondary key constraint (SQL)


on attribute: <attribute_name><data_type> UNIQUE

on table: CONSTRAINT <constraint_name> UNIQUE(<attribute_name>)

Foreign key constraint (SQL)


on attribute: <attribute_name> REFERENCES <table_name>(<referenced_attribute_name>)

on table: CONSTRAINT <constraint_name> FOREIGN KEY(<attribute_name>) REFERENCES


<table_name>(<referenced_attribute_name>) ON DELETE <delete_instructions>

Default operations (ON DELETE/ ON UPDATE)


SET NULL

CASCADE (suitable for relationship relations)

SET DEFAULT

Additional constraints using check (SQL)


on attribute: <attribute_name><data_type> CHECK (<constraint>)

on table: CONSTRAINT <constraint_name> CHECK (<constraint>)

ex. CHECK (Salary >= 0 AND Salary <= 100000)
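
A sketch pulling the constraint forms above together (table and attribute names are hypothetical):

CREATE TABLE company.employee (
    ssn    CHAR(9) PRIMARY KEY,
    email  VARCHAR(60) UNIQUE,                    -- secondary key
    salary DECIMAL(10,2) CHECK (salary >= 0),     -- attribute-level check
    dno    INT,
    CONSTRAINT fk_employee_dept FOREIGN KEY (dno)
        REFERENCES company.department(dnumber) ON DELETE SET NULL
);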

The DROP command


used to drop named schema elements such as tables, domains or constraints
ex. DROP TABLE company.employee CASCADE;

DROP SCHEMA company CASCADE;

The ALTER TABLE command


actions include: adding/dropping a column (attribute), changing a column definition,
adding/dropping constraints

ex. ALTER TABLE company.employee ADD COLUMN job VARCHAR (12);

ALTER TABLE company.employee DROP CONSTRAINT fk_employee_supervisor;

ALTER TABLE department ALTER COLUMN mgr_ssn SET DEFAULT '123456789'; (can also drop
default or set not null)

Commands for modifying the database


INSERT (for inserting tuples into a relation)

UPDATE (for updating tuples that satisfy the condition)

DELETE (removes tuples that satisfy a condition)

The INSERT command


INSERT INTO <table_name> (<attribute1_name>, ..., <attributen_name>) VALUES
(<attribute1_value>, ..., <attributen_value>)

attributes that are not listed are set to their default values

a nested SELECT statement can supply the inserted values
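
ex. (hypothetical tables):

INSERT INTO employee (ssn, fname, lname) VALUES ('123456789', 'John', 'Smith');

-- a nested SELECT as the source of the inserted rows
INSERT INTO works_on_backup (essn, pno, hours)
SELECT essn, pno, hours FROM works_on WHERE pno = 10;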

Specifying Sequences
CREATE SEQUENCE <sequence_name> START <start_value> INCREMENT <increment_value>;

to set value as next number in sequence: nextval(<sequence_name>)
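
ex. (PostgreSQL-style; exact sequence syntax varies by DBMS):

CREATE SEQUENCE emp_id_seq START 1000 INCREMENT 1;
INSERT INTO employee (id, name) VALUES (nextval('emp_id_seq'), 'Ann');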

SELECT statement
SELECT <attribute and function list> FROM <table> [WHERE <condition>] [GROUP BY <grouping
attributes>] [HAVING <group condition>] [ORDER BY <attribute list>];
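
ex. using every optional clause (hypothetical schema):

SELECT dno, COUNT(*) AS emps, AVG(salary) AS avg_sal
FROM employee
WHERE salary > 30000
GROUP BY dno
HAVING COUNT(*) >= 2
ORDER BY avg_sal DESC;
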
Comparison operators
=, >, >=, <, <=, <>

BETWEEN operator
BETWEEN <value1> AND <value2>

ex. WHERE salary BETWEEN 30000 AND 40000

LIKE comparison operator


% replaces an arbitrary number of characters, _ replaces a single character

ex. WHERE Address LIKE '%Houston%'

WHERE SSN LIKE '__1__8901'

Aliasing
Used to shorten query when referring to same table more than once

Ex. SELECT e.name FROM Employee AS e ...;

Can also rename attributes Ex. SELECT ... FROM Employee AS e(fn, mi, ln)...;

Join condition
<table1> JOIN <table2> ON <condition>

can nest join conditions

ex. SELECT e.* FROM employee AS e JOIN department AS d ON e.dno = d.dnumber

Aggregate functions
Can be used in SELECT or HAVING clause

COUNT, SUM, MAX, MIN, AVG, STDDEV_POP

nulls are discarded


ORDER BY clause
Keywords: DESC, ASC

Ex. SELECT ... ORDER BY d.dname DESC, e.lname ASC;

Eliminating duplicate tuples in query results


Use the keyword DISTINCT in the SELECT clause i.e. SELECT DISTINCT e.ssn ...;

Comparison operators for nested queries


IN (R)
value theta ANY (R)
value theta ALL (R) (where theta stands for any comparison operator)
EXISTS
NOT EXISTS
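
ex. (hypothetical schema):

-- EXISTS: employees who work on at least one project
SELECT e.lname FROM employee AS e
WHERE EXISTS (SELECT * FROM works_on AS w WHERE w.essn = e.ssn);

-- ALL: employees paid more than everyone in department 5
SELECT lname FROM employee
WHERE salary > ALL (SELECT salary FROM employee WHERE dno = 5);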

Comparisons involving NULL


operations involving NULL return NULL (ex. NULL + 1 yields NULL)

comparisons involving NULL return UNKNOWN

the CASE statement


allows conditional instructions
CASE
WHEN 'cond1' THEN 'result1'
...
[ELSE 'resultN']
END ...;

ex. SELECT fname, CASE sex WHEN 'M' THEN 'Male' ELSE 'Female' END FROM employee;

UPDATE employee SET salary = (CASE WHEN dno = 5 THEN salary * 1.15 ELSE salary * 1.3 END);

The DELETE clause


DELETE FROM <table> WHERE <condition>;

to delete all: DELETE FROM <table>;


the UPDATE clause
UPDATE <table> SET <attribute> = <value or function> WHERE <condition>

Views in SQL
single table derived from other tables

CREATE VIEW <view_name>(<attribute_list>) AS SELECT ...;

DROP VIEW disposes of a view

Once defined, can be referenced as a table in queries
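
ex. (hypothetical schema):

CREATE VIEW dept_stats(dname, emp_count, total_sal) AS
SELECT d.dname, COUNT(*), SUM(e.salary)
FROM department AS d JOIN employee AS e ON d.dnumber = e.dno
GROUP BY d.dname;

SELECT dname FROM dept_stats WHERE emp_count > 3;   -- queried like a table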

Computed Views
the DBMS stores the view definition and executes the view query every time the view is used
(always up to date; updated automatically when parent tables are updated)

zero maintenance but does not increase performance

Materialized view
CREATE MATERIALIZED VIEW...

the DBMS stores the view definition, executes the query and stores the result as system
controlled table

increases performance but requires system to update view to reflect updates in base tables
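
ex. (PostgreSQL-style; the manual REFRESH shown is one form of periodic maintenance):

CREATE MATERIALIZED VIEW dept_salaries AS
SELECT dno, SUM(salary) AS total_sal FROM employee GROUP BY dno;

REFRESH MATERIALIZED VIEW dept_salaries;   -- re-runs the stored query on demand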

Maintenance of materialized views


Immediate update: updates view as soon as base tables are changed

Lazy Update: updates view when needed by a view query

Periodic update

Derived Tables
Used for bulk-loading tuples that satisfy a condition into a new table

CREATE TABLE <table_name> AS (SELECT ...) WITH DATA;


does not maintain association with base tables

Assertions
CREATE ASSERTION <assertion name> CHECK (<condition, can include SELECT statement>);
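
ex. no employee may earn more than their department's manager (standard SQL allows this, though
few DBMSs actually implement CREATE ASSERTION):

CREATE ASSERTION salary_constraint CHECK (
    NOT EXISTS (SELECT * FROM employee e, employee m, department d
                WHERE e.salary > m.salary
                  AND e.dno = d.dnumber AND d.mgr_ssn = m.ssn)
);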

Triggers
CREATE TRIGGER <name> {BEFORE | AFTER} [event [OR ...]] ON <table_name> [FOR [EACH] {ROW
| STATEMENT}] [WHEN (condition)] EXECUTE (function);
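
ex. (PostgreSQL-style, which spells the action EXECUTE FUNCTION; check_salary() is a hypothetical
stored function that raises an error on violation):

CREATE TRIGGER salary_check
BEFORE INSERT OR UPDATE ON employee
FOR EACH ROW
EXECUTE FUNCTION check_salary();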

Main programming approaches


Programming into the DBMS Server

Using embedded SQL

Using a library of database functions

Using a persistence framework

Impedance Mismatch
Differences between database model and programming language model

Server-side programming
Allows implementing complex operations using a programming language supported by the DBMS,
operations executed in the DBMS server, can store procedures

Libraries of database functions


use libraries or APIs provided for the host language to access database

JDBC (Java Database Connectivity) - Driver


Allows a Java program to connect to several different databases

JDBC (Java Database Connectivity) - connection object


encapsulates a database connection

JDBC (Java Database Connectivity) - statement object


used to interact with the database through an opened connection

JDBC (Java Database Connectivity) - ResultSet object


holds results of query

Database programming Approach (Pros/Cons)


P: does not suffer from the impedance mismatch problem

C: programmers must learn a new language

Embedded SQL approach (P/C)


P: query text checked for syntax and validated against database schema at compile time

C: for complex applications where queries have to be generated at runtime, function call
approach more suitable

Library of function calls approach (P/C)


P: more flexibility

C: more complex programming; no checking of query syntax done at compile time

Persistence frameworks approach (P/C)


P: encapsulates database operations and implement ORM functionalities

C: can be limited or deliver poor performance for data-intensive or complex operations

Double Buffering
Used to read continuous stream of blocks.

Uses 2 buffers, A and B, for reading from disk. While one buffer is being filled, the other is
consumed

Buffer management information


Pin count (to pin a frame in the buffer pool)

Dirty bit (to indicate a modified block)


Buffer replacement strategies
To free frames in the buffer pool. Strategies:
Least recently used (LRU)
First-in-first-out (FIFO)

Record (placing records in files)


collection of related data values or items. Values correspond to record fields (ex. a row in a table)

Allocated to disk blocks

Binary large objects (BLOBs) (placing records in files)


Unstructured objects ex. image file

Allocated to disk blocks

Unspanned records
records that fit in a block; not allowed to cross block boundaries

Spanned records
records larger than a single block; pointer at end of first block points to block containing
remainder of record

Retrieval operations on files


no change to file data; blocks are loaded in the buffer pool

ex. Open, scan, find, read, FindNext, Close

Update operations on files


the file changes by insertion, deletion or modification.

Updates are made in the buffer pool, setting the dirty bit to true, and are stored on disk after the
transaction commits
ex. Delete, Insert

Heap (or pile) file


(file organization)
records placed in file order of insertion

inserting a new record is very efficient, searching for a record requires linear search

Ordered (sequential) file


(file organization)
records sorted by ordering field

reading records in order of ordering key value is extremely efficient, binary search technique,
updates require reorganizing file

Indexing
Used to speed up record retrieval. Index structures provide secondary access paths.

Multiple indexes can be constructed, may be unique or non-unique

Clustering Index
specified on the ordering key field of ordered file records

One clustering index per data file

Creating a Clustering Index


CREATE [UNIQUE] CLUSTERING INDEX <index_name> ON table(attribute list);

Sorts the data file and maintains it ordered

Secondary index
can be specified on any nonordering field. A data file can have several secondary indexes

CREATE [UNIQUE] INDEX <index_name> ON table(attribute list);

The B+ -Tree Dynamic Multilevel Index


Disk-based search tree. Nodes have the size of a block.

Given 2 search keys, Ki-1 < Ki, the elements stored in the corresponding subtree are Ki-1 < X <= Ki

Tree pointers are pointers to blocks: data file + block


Record pointers (rids) have the structure: data file + block + record

B+ Tree Index - Balanced Tree


reorganized at each insert or delete using split and unsplit of nodes. Nodes are also at least half
full, except the root node

Bottom-up construction

B+ tree Index - Leaf Nodes


store search keys and data pointers (rids). For a non-unique search field, the pointer points to list
of pointers to the data file records. Store pointers to the next and to the previous leaf nodes

B+ Tree Index - Internal Nodes


store search keys and tree pointers. Some search field values from the leaf nodes repeated to
guide search

B+ Tree Insertion
start a search based on the key of the entry being inserted from the root until reach a leaf node.
If there is room in the leaf node for the new entry, add the entry and end the insert. If the leaf
node is full, execute a split

B+ Tree Insertion - Split


Create a new node
Divide the entries between the current node and the new node
Promote the median element to the parent node
The split can be propagated up to the root node, increasing the height of the tree

B+ Tree Search
if the search is an interval, set the search key as the lower bound of the interval (for a point query,
lower and upper bounds are equal)

Start a search based on the search key from the root until a leaf node. Retrieve the record
pointers of the entries from the search key (lower bound) to the upper bound, traversing the
leaves via the pointer to the next leaf

B+ Tree Deletion
Start a search based on the key to be deleted from the root until a leaf node. Delete from the leaf
node the entry that matches the search key.

If the leaf node becomes less than half full, try to borrow an entry from a sibling leaf node;
otherwise execute an unsplit

B+ Tree Deletion- Unsplit


Merge the entries of 2 sibling nodes in one of these nodes
Delete the other node
Propagate the operation to remove from the parent node the entry to the deleted node
The unsplit can be propagated up to the root node, decreasing the height of the tree

Hash Indexes
Based on a hashing function. Uses hashing on a search key other than the one used for the
primary data file organization. Suitable for point queries, not range queries

Domain-Specific Indexes
Spatial: for spatial based queries (ex. queries on maps) ex. R-tree, kd-tree

Full-text search: for keyword-based search in text attributes ex. inverted indexes

Transaction
Describes a logical unit of database processing. An executing process that includes one or more
database access operations.

Characteristic operations: Reads, writes

OLTP (online transaction processing)


Large multi-user database systems supporting thousands of concurrent transactions per minute.
Require high availability and fast response time

Transaction processing model - basic operations


Granularity (size) of each data item is immaterial

Basic operations:
read_item(X) - reads a database item X into a program variable named X

write_item(X) - writes the value of program variable X into the database item named X
Read operation steps
Find the address of the disk block that contains item X
Copy that disk block into a buffer in main memory (if that disk block is not already in some main
memory buffer)
Copy item X from the buffer to the program variable named X

Write operation steps


Find the address of the disk block that contains item X
Copy that disk block into a buffer in main memory (if it is not already in a main memory buffer)
Copy item X from the program variable named X into its correct location in the buffer
Store the updated block from the buffer back to disk (either immediately or at some later point in
time)

Transaction boundaries
Begin_transaction and End_transaction

Application program may include specification of several transactions separated by Begin and End
transaction boundaries
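
In SQL, the boundaries look like this (hypothetical account table; BEGIN is the PostgreSQL spelling
of START TRANSACTION):

BEGIN;                                                    -- Begin_transaction
UPDATE account SET balance = balance - 100 WHERE id = 1;
UPDATE account SET balance = balance + 100 WHERE id = 2;
COMMIT;                                                   -- End_transaction (commit)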

Transaction end states


Commit: transaction successfully completes and its results are committed (made permanent)

Abort: transaction does not complete and none of its actions are reflected in the database

Transaction Notation
Ti specifies a unique transaction identifier

wi(Y) means transaction Ti writes out the value for data item Y

ri(Y) means transaction Ti reads the value for data item Y

ci means transaction Ti committed

ai means transaction Ti aborted
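
ex. an interleaved schedule of two transactions in this notation:
S: r1(X) w1(X) r2(X) w2(X) r1(Y) w1(Y) c1 c2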

Interleaved Processing (modes of concurrency)


concurrent execution of processes is interleaved on a single CPU
Parallel Processing (modes of concurrency)
Processes are concurrently executed on multiple CPUs

Schedule
Sequence of interleaved operations from several transactions

ACID Properties
Atomicity, Consistency, Isolation, Durability

Atomicity
A transaction is an atomic unit of processing; it is either performed in its entirety or not
performed at all

Ensured by the recovery system

Consistency
A correct execution of the transaction must take the database from one consistent state to
another

Responsibility of the database constraint system

Isolation
Even though transactions are executing concurrently, they should appear to be executed in
isolation i.e. their final effect should be as if each transaction was executed in isolation from start
to finish

Responsibility of the concurrency control mechanism

Durability
Once a transaction is committed, its changes (writes) applied to the database must never be lost
because of subsequent failure

Ensured by the recovery system

Lost Update Problem


occurs when 2 transactions update the same data item, but both read the same original value
before update
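
ex. S: r1(X) r2(X) w1(X) w2(X) - both read the original X, so w2(X) overwrites and loses T1's update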

Dirty Read Problem


Occurs when a transaction T2 reads a database item that was updated by another uncommitted
transaction T1 but then T1 aborts, invalidating the value that T2 read
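
ex. S: w1(X) r2(X) a1 - T2 has read a value of X that is rolled back and never committed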

Inconsistent Read Problem


A concurrency problem where the data is read in the middle of an update and is incorrect or not
current

Unrepeatable read problem


Occurs when one transaction updates a database item, which is read by another transaction both
before and after the update

Serial Schedule
A schedule S is serial if there is no interleaving of operations from several transactions (i.e. for
every transaction T, all of its operations are executed consecutively)

Any serial schedule will produce a correct result

Problems with serial schedules


Long transactions force other transactions to wait; while a transaction is waiting for disk I/O or
any other event, the system cannot switch to another transaction

Solution: allow some interleaving, without sacrificing correctness

Serializable schedule
A schedule equivalent to some serial schedule. It will leave the database in a consistent state.

Interleaving such that: transactions see data as if they were serially executed, transactions leave
DB state as if they were serially executed, efficiently achievable through concurrent execution

There are n! serial schedules for n transactions

Conflict Serializability
A schedule S with n transactions is conflict serializable if it is conflict equivalent to some serial
schedule of the same n transactions i.e. relative order of any 2 conflicting operations is the same
in both schedules

Conflicting Operations
Two operations are conflicting in a schedule if:
They belong to different transactions, they access the same item X and at least one of the
operations is a write_item(X)
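
ex. r1(X) and w2(X) conflict, as do w1(X) and w2(X); r1(X) and r2(X) do not conflict (both are
reads), and neither do w1(X) and w2(Y) (different items)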

Concurrency control protocols - two-phase locking protocols


lock data items to prevent concurrent access

Concurrency control protocols - timestamp ordering protocols


assign a unique identifier (timestamp) to each transaction, and apply rules to control how
transactions access items according to the timestamps

Concurrency control protocols - multiversion techniques


keep several versions of an item; accept some read operations that would be rejected in other
techniques by reading an older version of the item, while maintaining serializability

Concurrency control protocols - optimistic techniques


perform no checking while the transaction is executing, execute a validation phase to check
whether any of the transactions updates violate serializability, and commit or abort transactions
based on result

Database Locks
variable associated with a data item describing status for operations that can be applied. One
lock for each item in the database

Locking operations
read_lock(X) - shared lock, required for reading
write_lock(X) - exclusive lock, required for writing
unlock(X)

Two-phase locking protocol (2PL) phases


All locking operations precede the first unlock operation in the transaction
Two phases:
1. expanding (growing): new locks can be acquired but none can be released. Lock conversion
upgrades (read_lock(X) -> write_lock(X)) must be done during this phase
2. Shrinking phase: existing locks can be released but none can be acquired. downgrades
(write_lock(X) -> read_lock(X)) must be done during this phase
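
ex. a transaction obeying 2PL: read_lock(Y); read_item(Y); write_lock(X); unlock(Y); read_item(X);
write_item(X); unlock(X) - every lock is acquired before the first unlock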

2PL potential problems


Deadlock: each transaction is waiting for some item locked by some other transaction (solution:
detect deadlock and select one of the transactions involved to abort)

Starvation: occurs if a transaction cannot proceed for an indefinite period of time while other
transactions continue normally (solution: first come first serve queue)

Conservative 2PL
requires a transaction to lock all the items it accesses before the transaction begins

Deadlock free protocol

Strict 2PL
transaction does not release any of its exclusive locks (write) until after it commits or aborts

prevents dirty reads

Rigorous 2PL
transaction does not release any of its locks (read or write) until after it commits or aborts

prevents dirty reads

Purpose of database recovery


to bring the database into the most recent consistent state that existed prior to a failure

The UNDO-REDO approach


After a crash:
1. UNDO transactions that had not committed to ensure Atomicity (partial results discarded)
2. REDO committed transactions to ensure durability

uses a system log with write-ahead logging policy

Append-only file (System log)


keeps track of all operations of all transactions in the order in which the operations occurred

Stored-on disk (System log)


persistent except for disk or catastrophic failure; periodically backed up to guard against disk
and catastrophic failures

Main memory buffer (System log)


holds records being appended; occasionally the whole buffer is appended to the end of the log on disk

System log entries


[start_transaction, T]
[write_item, T, X, old_value, new_value] - T has changed X from old value to new
[commit, T]
[abort, T]

Write-Ahead Logging (WAL)


used to ensure that the log is consistent with the database and that the log can be used to
recover the database to a consistent state

WAL rules
the log record for a page must be written before the corresponding page is flushed to disk (for
atomicity, so each operation is known and can be undone)

all log records must be written before commit (for durability so the effect of a committed
transaction is known)

WAL Commit Point


A transaction is said to be committed when: all of its operations are executed and all its log
records are flushed to disk

UNDO process
Scan the log from tail to head (backward in time)
Create a list of committed transactions
Create a list of rolled-back transactions
Undo updates of active transactions
Restore the before image
Append an [undo] record to the log (in case of a crash during recovery)

REDO process
Scan the log from head to tail (forward in time)
Redo updates of committed transactions
Use the after image for new values

Required only for log records after checkpoint record

Query Parsing
Scanner identifies query tokens
Parser checks the query syntax
Validation checks all attribute and relation names

Query Tree
represents a specific order of operations for executing a query

Leaves are the input relations, nodes are the operators, bottom-up execution

Algorithms for query operators


query operators have several alternative algorithms to execute the operator. Optimization
involves selecting the cheapest algorithm (cost associated with the number of disk accesses or
number of blocks accessed)

SELECT algorithms
Sequential scan: cost = b (number of blocks of the data file)

Index scan: cost = x+s (x = number of levels of the index, s = cardinality of the selection i.e.
number of tuples that satisfy the select condition)

PROJECT Algorithm
Scan the input and generate projected tuples as output, cost = b

ORDER BY and DISTINCT Algorithm


sort (and eliminate duplicates), cost = b log b

Aggregates and GROUP BY Algorithm


Scan the input relation, build the groups if needed and compute the aggregates, cost = b log b

JOIN Algorithms - Nested Loops Join


Iterate over all tuples of the 2 input relations and produce as output tuples that are the
concatenation of the input tuples satisfying the join condition, cost = bR * bS

The inner relation may be accessed through an index on the join attributes if such an index exists,
cost = bR + rR * (cost of the index search on relation S), where rR is the number of records of the
outer relation R

JOIN Algorithms - Merge Join


Sort the 2 input relations each according to the join attribute and merge the tuples satisfying the
join condition, cost = bR + bS + (cost for sorting both R and S)

JOIN Algorithms - Hash Join


Hash the 2 relations into buckets using the same hash functions over the respective join
attributes. Join the tuples of corresponding buckets using the nested loops join algorithm, Cost =
3 (bR + bS)
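
An illustrative cost comparison using the formulas above (numbers are made up): with bR = 100 and
bS = 400 blocks, nested loops costs about bR * bS = 40,000 block accesses, hash join costs
3 * (bR + bS) = 1,500, and merge join costs bR + bS = 500 plus the cost of sorting both inputs
(nothing if they are already sorted on the join attribute)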

Materialization
Creating, storing and passing temporary results. Required for pipeline-blocking operators when
the temporary input relations do not fit in the available memory buffers

Pipelining
combines several operations into one, avoids writing temporary files

Logical Optimization (Query Optimization)


Query rewrite: reorganization of the query operators using rewrite rules to generate equivalent
query trees

Physical optimization (query optimization)


selection of the algorithms for the query operators that produce the smallest tree cost based on
the query estimates

General Optimization Approach (Algebraic)


1. split select operators with a composite condition into a cascade of selects
2. push down selections
3. convert cartesian products followed by selections into joins
4. define an efficient join order
5. introduce projections

80-20 rule
80% of processing accounted for by only 20% of queries and transactions

Tuning Indexes when Queries take too long to run


index attributes frequently used in search conditions

index primary key and foreign key attributes


create an additional index on associative tables

Tuning indexes when transactions take too long


drop indexes that may not get utilized in queries

drop indexes that may undergo too much updating if based on an attribute that undergoes
frequent changes
