Flash Cards
some part of the real world about which data is stored in a database
Queries: access different parts of data and formulate the result of the request
Transactions: may read some data and 'update' certain values or generate new data and store
that in the database
DBMS catalog
stores the description of a particular database (e.g. data structures, types and constraints)
Program-Data independence
Structure of data files is stored in DBMS catalog separately from access programs
Allows changing data structures and storage organization without having to change the DBMS
access programs
Data abstraction
a data model is used to hide storage details and present the users with a conceptual view of the
database
Programs refer to the data model constructs rather than data storage details
Big Data
High-volume, high-velocity, and/or high-variety information assets that require innovative forms
of information processing for enhanced insight and decision making.
Data Model
A set of concepts to describe the structure of a database, the operations for manipulating these
structures, and certain constraints that the database should obey.
Self-Describing
Three-Schema Architecture
External Schema (end users)
Conceptual Schema (conceptual and logical data models)
Internal Schema (physical data model)
Inherent/Implicit constraints
based on the data model itself (e.g. relational data model does not allow a list as a value for any
attribute)
Schema-based/Explicit constraints
expressed in the schema by using the facilities provided by the model (ex. max cardinality ratio)
2. design a schema that does not suffer from insertion, deletion and update anomalies
3. relations should be designed such that their tuples will have as few NULL values as possible
Functional Dependencies
A FD holds if whenever 2 tuples have the same value for X they MUST have the same value for Y
i.e. If t1[X]=t2[X] then t1[Y]=t2[Y]
Given only an instance of a relation, we can conclude that an FD may hold or that it definitely
does not hold; we cannot know for sure that it holds in all cases
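A minimal Python sketch of the instance-level check described above. The function name violates_fd and the sample tuples are hypothetical. Finding a counterexample proves the FD does not hold; finding none only means the FD may hold.

```python
def violates_fd(rows, X, Y):
    """Return a pair of tuples that agree on X but differ on Y, or None.

    rows: list of dicts (one relation instance); X, Y: lists of attribute names.
    """
    seen = {}  # X-value -> (Y-value, first row seen with that X-value)
    for row in rows:
        x_val = tuple(row[a] for a in X)
        y_val = tuple(row[a] for a in Y)
        if x_val in seen and seen[x_val][0] != y_val:
            return seen[x_val][1], row  # counterexample: the FD does not hold
        seen.setdefault(x_val, (y_val, row))
    return None  # no counterexample in THIS instance: the FD may hold


# Hypothetical instance, for illustration only.
employees = [
    {"ssn": "1", "dno": 5, "dname": "Research"},
    {"ssn": "2", "dno": 5, "dname": "Research"},
    {"ssn": "3", "dno": 4, "dname": "Admin"},
]

may_hold = violates_fd(employees, ["dno"], ["dname"])      # None
does_not_hold = violates_fd(employees, ["dname"], ["ssn"])  # a pair of rows
```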
1NF normalization
Move the attributes violating 1NF to a new relation and associate the relations via keys (FK, PK)
i.e. a FD Y -> Z where removal of any attribute from Y means the FD does not hold anymore
Prime attribute
An attribute that is a member of a (candidate) key K
2NF Normalization
Move the attribute involved in a 2NF violation to a new relation, maintain in the original relation
the LHS attributes and associate them to the new relation via keys
SET DEFAULT
ALTER TABLE department ALTER COLUMN mgr_ssn SET DEFAULT '123456789'; (can also drop
default or set not null)
Specifying Sequences
CREATE SEQUENCE <sequence_name> START <start_value> INCREMENT <increment_value>;
SELECT statement
SELECT <attribute and function list> FROM <table> [WHERE <condition>] [GROUP BY <grouping
attributes>] [HAVING <group condition>] [ORDER BY <attribute list>];
Comparison operators
=, >, >=, <, <=, <>
BETWEEN operator
BETWEEN <value1> AND <value2>
Aliasing
Used to shorten query when referring to same table more than once
Can also rename attributes Ex. SELECT ... FROM Employee AS e(fn, mi, ln)...;
Join condition
<table1> JOIN <table2> ON <condition>
Aggregate functions
Can be used in SELECT or HAVING clause
ex. SELECT fname, CASE (sex) WHEN 'M' THEN 'Male' ELSE 'Female' END FROM employee;
UPDATE employee SET salary = (CASE WHEN dno=5 THEN salary * 1.15 ELSE salary * 1.3 END);
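The two CASE statements above can be tried out with SQLite through Python's sqlite3 module. This is only a sketch: the table contents are made up, and it assumes the UPDATE is meant to multiply salary by 1.15 or 1.3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, sex TEXT, dno INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("John", "M", 5, 100.0), ("Alicia", "F", 4, 100.0)],  # hypothetical rows
)

# Simple CASE: compare one expression against constant values.
labels = conn.execute(
    "SELECT fname, CASE sex WHEN 'M' THEN 'Male' ELSE 'Female' END "
    "FROM employee ORDER BY fname"
).fetchall()

# Searched CASE inside UPDATE: pick the raise factor per row.
conn.execute(
    "UPDATE employee SET salary = "
    "(CASE WHEN dno = 5 THEN salary * 1.15 ELSE salary * 1.3 END)"
)
salaries = dict(conn.execute("SELECT fname, salary FROM employee").fetchall())
```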
Views in SQL
single table derived from other tables
Computed Views
the DBMS stores the view definition and executes the view query every time the view is used
(always up to date; updated automatically when parent tables are updated)
Materialized view
CREATE MATERIALIZED VIEW...
the DBMS stores the view definition, executes the query and stores the result as system
controlled table
increases performance but requires system to update view to reflect updates in base tables
Periodic update
Derived Tables
Used for bulk-loading of several tuples into a table that satisfy a condition
Assertions
CREATE ASSERTION <assertion name> CHECK (<condition, can include SELECT statement>);
Triggers
CREATE TRIGGER <name> {BEFORE | AFTER} [event [OR ...]] ON <table_name> [FOR [EACH] {ROW
| STATEMENT}] [WHEN (condition)] EXECUTE (function);
Impedance Mismatch
Differences between database model and programming language model
Server-side programming
Allows implementing complex operations using a programming language supported by the DBMS,
operations executed in the DBMS server, can store procedures
C: for complex applications where queries have to be generated at runtime, function call
approach more suitable
Double Buffering
Used to read continuous stream of blocks.
Use of 2 buffers, A and B, for reading from disk. While one buffer is being filled, the other is
consumed
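A rough Python sketch of the double-buffering idea: a background worker fills the next buffer while the caller consumes the current one. The names read_block and process_stream are hypothetical, and the "disk" is just a list of blocks held in memory.

```python
from concurrent.futures import ThreadPoolExecutor


def read_block(blocks, i):
    # Stand-in for a disk read: copy block i into a fresh buffer.
    return list(blocks[i])


def process_stream(blocks, consume):
    """Consume blocks in order while the next block is prefetched."""
    if not blocks:
        return
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, blocks, 0)  # start filling buffer A
        for i in range(len(blocks)):
            current = pending.result()              # wait for the filled buffer
            if i + 1 < len(blocks):
                # Start filling the other buffer before consuming this one.
                pending = io.submit(read_block, blocks, i + 1)
            consume(current)


consumed = []
process_stream([[1, 2], [3], [4, 5, 6]], consumed.append)
```

At most two buffers are live at any time: the one being consumed and the one being filled.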
Unspanned records
records that fit in a block; not allowed to cross block boundaries
Spanned records
records larger than a single block; pointer at end of first block points to block containing
remainder of record
Updates are made in the buffer pool, setting the dirty bit to true, and stored on disk after the
transaction commits
ex. Delete, Insert
inserting a new record is very efficient, searching for a record requires linear search
reading records in order of ordering key value is extremely efficient, binary search technique,
updates require reorganizing file
Indexing
Used to speed up record retrieval. Index structures provide secondary access paths.
Clustering Index
specified on the ordering field of ordered file records when that field is not a key (an index on
the ordering key field is a primary index)
Secondary index
can be specified on any nonordering field. A data file can have several secondary indexes
Given 2 search keys, K_{i-1} < K_i, the elements X stored in the corresponding subtree satisfy
K_{i-1} < X <= K_i
Bottom-up construction
B+ Tree Insertion
Start a search based on the key of the entry being inserted from the root until a leaf node is reached.
If there is room in the leaf node for the new entry, add the entry and end the insert. If the leaf
node is full, execute a split
B+ Tree Search
if the search is an interval, set the search key as the lower bound of the interval (for a point query,
lower and upper bounds are equal)
Start a search based on the search key from the root until a leaf node. Retrieve the record
pointers of the entries from the search key (lower bound) to the upper bound, traversing the
leaves via the pointer to the next leaf
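The search steps above can be sketched in Python as follows. The Leaf and Internal classes and the tiny hand-built tree are hypothetical; the sketch assumes child i of an internal node holds keys X with keys[i-1] < X <= keys[i].

```python
from bisect import bisect_left


class Leaf:
    def __init__(self, entries, next_leaf=None):
        self.entries = entries  # sorted list of (key, record_pointer)
        self.next = next_leaf   # pointer to the next leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # sorted separator keys
        self.children = children  # len(children) == len(keys) + 1


def range_search(root, low, high):
    """Return record pointers for all keys in [low, high]."""
    node = root
    while isinstance(node, Internal):
        # Descend using the lower bound as the search key.
        node = node.children[bisect_left(node.keys, low)]
    result = []
    while node is not None:
        for key, ptr in node.entries:
            if key > high:
                return result        # passed the upper bound: done
            if key >= low:
                result.append(ptr)
        node = node.next             # follow the pointer to the next leaf
    return result


# Hypothetical three-leaf tree.
leaf3 = Leaf([(25, "r25"), (30, "r30")])
leaf2 = Leaf([(15, "r15"), (20, "r20")], leaf3)
leaf1 = Leaf([(5, "r5"), (10, "r10")], leaf2)
root = Internal([10, 20], [leaf1, leaf2, leaf3])

interval = range_search(root, 10, 25)
point = range_search(root, 20, 20)  # point query: both bounds equal
```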
B+ Tree Deletion
Start a search based on the key to be deleted from the root until a leaf node. Delete from the leaf
node the entry that matches the search key.
If the leaf node becomes less than half full, try to borrow an entry from a sibling leaf node;
otherwise execute an unsplit (merge with a sibling)
Hash Indexes
Based on a hashing function. Uses hashing on a search key other than the one used for the
primary data file organization. Suitable for point queries, not range queries
Domain-Specific Indexes
Spatial: for spatial based queries (ex. queries on maps) ex. R-tree, kd-tree
Full-text search: for keyword-based search in text attributes ex. inverted indexes
Transaction
Describes a logical unit of database processing. An executing process that includes one or more
database access operations.
Basic operations:
read_item(X) - reads a database item X into a program variable named X
write_item(X) - writes the value of program variable X into the database item named X
Read operation steps
Find the address of the disk block that contains item X
Copy that disk block into a buffer in main memory (if that disk block is not already in some main
memory buffer)
Copy item X from the buffer to the program variable named X
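A toy Python sketch of the three read steps. The dictionaries disk, item_location and buffer_pool are hypothetical stand-ins for the disk, the address lookup and the main-memory buffers.

```python
disk = {0: {"A": 10, "B": 20}, 1: {"X": 80, "Y": 90}}  # block address -> block contents
item_location = {"A": 0, "B": 0, "X": 1, "Y": 1}        # item -> block address
buffer_pool = {}                                        # blocks currently in memory
disk_reads = []                                         # trace of blocks fetched from disk


def read_item(name):
    block_addr = item_location[name]              # step 1: find the block address
    if block_addr not in buffer_pool:             # step 2: fetch only if not buffered
        buffer_pool[block_addr] = dict(disk[block_addr])
        disk_reads.append(block_addr)
    return buffer_pool[block_addr][name]          # step 3: copy item to the program variable


X = read_item("X")
Y = read_item("Y")  # same block: served from the buffer, no second disk read
```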
Transaction boundaries
Begin_transaction and End_transaction
Application program may include specification of several transactions separated by Begin and End
transaction boundaries
Abort: transaction does not complete and none of its actions are reflected in the database
Transaction Notation
Ti specifies a unique transaction identifier
wi(Y) means transaction Ti writes out the value for data item Y
Schedule
Sequence of interleaved operations from several transactions
ACID Properties
Atomicity, Consistency, Isolation, Durability
Atomicity
A transaction is an atomic unit of processing; it is either performed in its entirety or not
performed at all
Consistency
A correct execution of the transaction must take the database from one consistent state to
another
Isolation
Even though transactions are executing concurrently, they should appear to be executed in
isolation i.e. their final effect should be as if each transaction was executed in isolation from start
to finish
Durability
Once a transaction is committed, its changes (writes) applied to the database must never be lost
because of subsequent failure
Serial Schedule
A schedule S is serial if there is no interleaving of operations from several transactions (i.e. for every
transaction T, all the operations are executed consecutively)
Serializable schedule
A schedule equivalent to some serial schedule. It will leave the database in a consistent state.
Interleaving such that: transactions see data as if they were serially executed, transactions leave
DB state as if they were serially executed, efficiently achievable through concurrent execution
Conflict Serializability
A schedule S with n transactions is conflict serializable if it is conflict equivalent to some serial
schedule of the same n transactions i.e. relative order of any 2 conflicting operations is the same
in both schedules
Conflicting Operations
Two operations are conflicting in a schedule if:
They belong to different transactions, they access the same item X and at least one of the
operations is a write_item(X)
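One common way to test conflict serializability is the precedence-graph test, which these cards do not describe: add an edge Ti -> Tj for every pair of conflicting operations where Ti's operation comes first, then check that the graph has no cycle. A minimal Python sketch, with a hypothetical schedule encoding of (transaction, "r" or "w", item):

```python
def precedence_edges(schedule):
    """Edges (Ti, Tj) for each conflicting pair where Ti's operation is first."""
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and x_i == x_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges


def is_conflict_serializable(schedule):
    edges = precedence_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # Transactions with no incoming edge from a remaining transaction.
        free = {n for n in nodes
                if not any(v == n and u in nodes for u, v in edges)}
        if not free:
            return False  # every remaining node has a predecessor: cycle
        nodes -= free
    return True


interleaved = [(1, "r", "X"), (2, "r", "X"), (1, "w", "X"), (2, "w", "X")]
serial = [(1, "r", "X"), (1, "w", "X"), (2, "r", "X"), (2, "w", "X")]
```

Here interleaved yields edges in both directions between T1 and T2, so it is not conflict serializable, while serial yields only T1 -> T2.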
Database Locks
variable associated with a data item describing status for operations that can be applied. One
lock for each item in the database
Locking operations
read_lock(X) - shared lock, required for reading
write_lock(X) - exclusive lock, required for writing
unlock(X)
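A small Python sketch of the three locking operations as a lock table. The LockTable class is hypothetical. As simplifying assumptions, a request that cannot be granted returns False instead of making the transaction wait, and a transaction that is the only reader of an item may upgrade to a write lock.

```python
class LockTable:
    def __init__(self):
        self.readers = {}  # item -> set of transactions holding a shared lock
        self.writer = {}   # item -> transaction holding the exclusive lock

    def read_lock(self, txn, item):
        holder = self.writer.get(item)
        if holder is not None and holder != txn:
            return False  # another transaction holds the exclusive lock
        self.readers.setdefault(item, set()).add(txn)
        return True

    def write_lock(self, txn, item):
        holder = self.writer.get(item)
        other_readers = self.readers.get(item, set()) - {txn}
        if (holder is not None and holder != txn) or other_readers:
            return False  # exclusive lock is incompatible with any other lock
        self.writer[item] = txn
        return True

    def unlock(self, txn, item):
        self.readers.get(item, set()).discard(txn)
        if self.writer.get(item) == txn:
            del self.writer[item]


locks = LockTable()
shared_1 = locks.read_lock(1, "X")     # granted
shared_2 = locks.read_lock(2, "X")     # granted: shared locks are compatible
blocked = locks.write_lock(1, "X")     # refused while T2 still reads X
locks.unlock(2, "X")
upgraded = locks.write_lock(1, "X")    # granted once T1 is the only holder
```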
Starvation: occurs if a transaction cannot proceed for an indefinite period of time while other
transactions continue normally (solution: first come first serve queue)
Conservative 2PL
requires a transaction to lock all the items it accesses before the transaction begins
Strict 2PL
transaction does not release any of its exclusive locks (write) until after it commits or aborts
Rigorous 2PL
transaction does not release any of its locks (read or write) until after it commits or aborts
WAL rules
log record for a page must be written before the corresponding page is flushed to disk (for atomicity
so each operation is known and can be undone)
all log records must be written before commit (for durability so the effect of a committed
transaction is known)
UNDO process
Scan log from tail to head (backward in time)
Create a list of committed transactions
create a list of rolled-back transactions
undo updates of active transactions
restore before image
append an [undo] record to the log (in case of a crash during recovery)
REDO process
scan the log from head to tail (forward in time)
redo updates of committed transactions
use after image for new values
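The UNDO and REDO passes above can be sketched in Python as follows. The log record layout is hypothetical: ("start", T), ("write", T, item, before_image, after_image), ("commit", T), ("abort", T). The sketch also assumes that a transaction with an abort record was already rolled back before the crash.

```python
def recover(log, db):
    committed, rolled_back = set(), set()
    undo_records = []

    # UNDO pass: scan from tail to head (backward in time).
    for rec in reversed(log):
        kind, txn = rec[0], rec[1]
        if kind == "commit":
            committed.add(txn)
        elif kind == "abort":
            rolled_back.add(txn)
        elif kind == "write" and txn not in committed and txn not in rolled_back:
            _, _, item, before, _after = rec
            db[item] = before                      # restore the before image
            undo_records.append(("undo", txn, item, before))

    # REDO pass: scan from head to tail (forward in time).
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _before, after = rec
            db[item] = after                       # reapply the after image

    log.extend(undo_records)  # record the undo work in case recovery itself crashes
    return db


# Hypothetical crash state: T1 committed but A was never flushed;
# T2 was still active but its update to B reached disk.
log = [("start", 1), ("write", 1, "A", 10, 20), ("commit", 1),
       ("start", 2), ("write", 2, "B", 5, 7)]
db = recover(log, {"A": 10, "B": 7})
```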
Query Parsing
Scanner identifies query tokens
Parser checks the query syntax
validation checks all attribute and relation names
Query Tree
represents a specific order of operations for executing a query
Leaves are the input relations, nodes are the operators, bottom-up execution
SELECT algorithms
Sequential scan: cost = b (number of blocks of the data file)
Index scan: cost = x+s (x = number of levels of the index, s = cardinality of the selection i.e.
number of tuples that satisfy the select condition)
PROJECT Algorithm
Scan the input and generate projected tuples as output, cost = b
The inner relation may be accessed through an index on the join attributes if such an index exists,
cost = bR + rR * (cost of the index search on relation S), where rR is the number of records of the
outer relation R
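The cost formulas on these cards can be written as small Python functions. The function names and the sample numbers are hypothetical; costs are counted in block accesses as on the cards.

```python
def sequential_scan_cost(b):
    # b = number of blocks of the data file
    return b


def index_scan_cost(x, s):
    # x = number of index levels, s = number of tuples satisfying the condition
    return x + s


def index_join_cost(b_R, r_R, index_search_cost):
    # b_R = blocks of outer relation R, r_R = records of R,
    # index_search_cost = cost of one index search on inner relation S
    return b_R + r_R * index_search_cost


# Made-up statistics, for illustration only.
scan = sequential_scan_cost(2000)
lookup = index_scan_cost(x=3, s=50)
join = index_join_cost(b_R=10, r_R=50, index_search_cost=index_scan_cost(x=3, s=1))
```

With these numbers the index scan is much cheaper than the sequential scan because few tuples qualify; as s grows toward the file size, the advantage disappears.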
Materialization
Creating, storing and passing temporary results. Required for pipeline-blocking operators when
the temporary input relations do not fit in the available memory buffers
Pipelining
combines several operations into one, avoids writing temporary files
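Python generators give a simple picture of pipelining: each operator pulls one tuple at a time from the operator below it, so no temporary relation is written between steps. The operator functions and the sample relation here are hypothetical.

```python
def scan(relation):
    for row in relation:
        yield row


def select(rows, predicate):
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attrs):
    for row in rows:
        yield {a: row[a] for a in attrs}


# Made-up relation, for illustration only.
employee = [
    {"fname": "John", "dno": 5, "salary": 30000},
    {"fname": "Alicia", "dno": 4, "salary": 25000},
    {"fname": "Joyce", "dno": 5, "salary": 25000},
]

# SELECT and PROJECT are combined into one pass over the input.
pipeline = project(select(scan(employee), lambda r: r["dno"] == 5), ["fname"])
result = list(pipeline)
```

A materialized plan would instead build the full output of select as a list before project starts.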
80-20 rule
80% of processing accounted for by only 20% of queries and transactions
drop indexes on attributes that change frequently, since such indexes require too much
updating