Introduction to DBMS
A Primer on Databases
On the Verge of a Disruptive Century: Breakthroughs
• Gene Sequencing and Biotechnology
• Ubiquitous Computing
• Smaller, Faster, Cheaper Sensors
• Faster Communication
A Common Theme is Data
The amount of data is only growing…
1.2 Zettabytes (1 ZB = 10^21 B, or 1 billion TB) in 2010
We Live in a World of Data
Nearly 500 Exabytes per day are generated by the Large Hadron
Collider experiments (not all recorded!)
… and more!
Mobile Devices
Computers
We also want to access, share and process our data from all of our devices,
anytime, anywhere!
Data is Becoming Critical to Our Lives
Domains of Data:
• Health
• Science
• Education
• Work
• Environment
• Finance
• … and more
Why Study Databases?
Data is everywhere and is critical to our lives
Course Objectives
In this course we aim at studying:
• How to design and implement databases from ‘cradle-to-grave’
• How to query and manipulate databases
• How to refine and speed up data retrieval and manipulation
• How to construct buffer and disk space managers, query optimizers, and
concurrency and crash recovery managers for DBMSs
• Big Data, Hadoop, BigTable, parallel and distributed DBMSs, NoSQL and
NewSQL databases
Learning Outcomes
After finishing this course you will be able to:
1. Describe a wide range of data involved in real-world organizations using the
entity-relationship (ER) data model
3. Analyze and apply a formal query language, relational calculus and algebra
4. Indicate how SQL builds upon relational calculus and algebra and effectively apply
SQL to create, query and manipulate relational databases
5. Design and develop multi-tiered, full-fledged standalone and web-based
applications with back-end databases
6. Appreciate how DBMSs create, manipulate and manage files of fixed-length and
variable-length records on disks
7. Create and operate various static and dynamic tree-based (e.g., ISAM and B+ trees)
and hash-based (e.g., extendable and linear hashing) indexing schemes
8. Explain and evaluate various algorithms for relational operations (e.g., join) using
techniques such as iteration, indexing and partitioning
9. Analyze and apply different query evaluation plans and describe the various tasks of
a typical relational query optimizer
10. Describe how transactions can be interleaved correctly, and indicate how a DBMS
can ensure atomicity and durability when systems fail or entirely crash
11. Identify alternative architectures for distributed databases, and describe how data
can be partitioned and distributed across networked nodes of a DBMS
12. Appreciate the scale of Big Data, discuss some popular analytics engines for Big
Data processing and denote the applicability of NoSQL databases for Big Data storage
Teaching Methods, Assignments and Projects
26 Lectures
• Motivate learning
• Provide a framework or roadmap to organize the information of the course
• Explain subjects and reinforce the critical big ideas
14 Recitations
• Get you to reveal what you do not understand, so we can help you
• Allow you to practice skills you will need to become an expert
5 Assignments
• We will have 5 assignments which involve problem solving and span most of the
topics that we discuss in the class
3 Projects
• We will have 3 projects which involve using Postgres, SQL, Python, and Django
Some Rules on the Projects
For all the projects (except the final one), the following rules
apply:
If you submit one day late, 25% will be deducted from your
project score
The project will not be graded (and you will receive a zero
score) if you submit more than two days late
Type                                             #     Weight
Projects                                         3     40%
Exams                                            2     30%
Problem Solving Assignments                      5     15%
Quizzes                                          2     10%
Class/Recitation Participation and Attendance    42    5%
Target Audience, Prerequisites and Textbook
Target Audience:
Juniors and Seniors
Prerequisites:
15-121 and 15-213
Students should have a basic knowledge of data
structures, algorithms, computer systems and
programming languages like C, C++ and Python
Textbook:
Raghu Ramakrishnan and Johannes Gehrke, "Database
Management Systems", Third Edition, McGraw-Hill, 2002
Outline
• Motivation
• Course Overview and Administrivia
• A Primer on Databases
A Motivating Scenario
Qatar Foundation (QF) has a “large” collection of data (say 500 GB) on
employees, students, universities, research centers, etc.
Correctness (Consistency)
Changes made to the data by different users must be applied consistently
Correctness (Security)
Access to certain parts of data (e.g., salaries) must be restricted
Correctness (Durability and Atomicity)
This data should survive system crashes/failures
Managing Data using File Systems
What about managing QF data using local file systems?
Files of fixed-length and variable-length records as well as formats
Main memory vs. disk
Computer systems with 32-bit addressing vs. 64-bit addressing schemes
Special programs (e.g., C++ and Python programs) for answering user questions
Special measures to maintain atomicity
Special measures to maintain consistency of data
Special measures to maintain data isolation
Special measures to offer software and hardware fault-tolerance
Special measures to enforce security policies in which different users are
granted different permissions to access diverse subsets of data
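To make the file-system approach above concrete, here is a minimal sketch of what such a "special program" might look like for fixed-length records. The record layout (10-byte sid, 20-byte name, 8-byte gpa), the helper names and the sample students are all assumptions made for illustration; they are not part of the course material.

```python
import struct

# Hypothetical fixed-length layout for one student record:
# 10-byte sid, 20-byte name, 8-byte double gpa (little-endian, no padding).
RECORD = struct.Struct("<10s20sd")

def pack_student(sid, name, gpa):
    """Encode one record; struct pads short strings with NUL bytes."""
    return RECORD.pack(sid.encode("ascii"), name.encode("ascii"), gpa)

def unpack_student(raw):
    """Decode one record and strip the NUL padding from the strings."""
    sid, name, gpa = RECORD.unpack(raw)
    return sid.rstrip(b"\0").decode("ascii"), name.rstrip(b"\0").decode("ascii"), gpa

def read_record(data, i):
    """Fixed length makes record i addressable by simple arithmetic."""
    start = i * RECORD.size
    return unpack_student(data[start:start + RECORD.size])

def find_by_gpa(data, min_gpa):
    """A hand-written 'query': scan every record and filter on gpa."""
    n = len(data) // RECORD.size
    return [r for r in (read_record(data, i) for i in range(n)) if r[2] >= min_gpa]

# Two made-up records stored back to back, as they would be in a file.
data = b"".join([pack_student("s1", "Ali", 3.4), pack_student("s2", "Sara", 3.9)])
high_gpa = find_by_gpa(data, 3.5)
```

Every new question needs another function like find_by_gpa, and nothing here addresses atomicity, isolation, fault-tolerance or security, which is the motivation for using a DBMS instead.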
Students(sid: string, name: string, login: string, dob: string, gpa: real)
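As an illustration, the Students schema above could be declared and queried roughly as follows. This sketch uses Python's built-in sqlite3 module as a stand-in for the Postgres setup used in the projects; the choice of sid as primary key and the sample rows are assumptions, not something the schema line states.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Column names and types follow the Students schema; making sid the
# primary key is an assumption for this example.
conn.execute("""
    CREATE TABLE Students (
        sid   TEXT PRIMARY KEY,
        name  TEXT,
        login TEXT,
        dob   TEXT,
        gpa   REAL
    )
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", "1995-01-01", 3.4),
        ("53688", "Smith", "smith@ee", "1996-02-02", 3.8),
    ],
)
conn.commit()

# A declarative query replaces a hand-written scanning program.
rows = conn.execute(
    "SELECT name FROM Students WHERE gpa > ? ORDER BY name", (3.5,)
).fetchall()
```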
The DBMS ensures that conflicts do not arise by using a locking protocol
Shared vs. Exclusive locks
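The difference between the two lock modes can be sketched with a toy lock table: several transactions may hold a shared (S) lock on the same item, while an exclusive (X) lock is compatible with nothing held by another transaction. The LockTable class below is hypothetical and greatly simplified; a real DBMS lock manager also queues waiting requests and deals with deadlocks.

```python
class LockTable:
    """Toy lock table: item -> (mode, set of transactions holding it)."""

    def __init__(self):
        self.locks = {}

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if the request
        conflicts (a real system would make the caller wait)."""
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, holders = held
        if mode == "S" and held_mode == "S":
            # Shared locks are compatible with each other.
            holders.add(txn)
            return True
        if holders == {txn}:
            # Sole holder: a repeated request or an S-to-X upgrade is safe.
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self.locks[item] = (new_mode, holders)
            return True
        # Any combination involving X and another transaction conflicts.
        return False

    def release(self, txn, item):
        """Drop txn's lock on item; forget the item once nobody holds it."""
        held = self.locks.get(item)
        if held and txn in held[1]:
            held[1].discard(txn)
            if not held[1]:
                del self.locks[item]
```

For example, two readers T1 and T2 can both acquire "S" on the same item, but a writer T3 asking for "X" is refused until both have released.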
Ensuring Atomicity
Transactions can be interrupted before running to completion for a
variety of reasons (e.g., due to a system crash)
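A small sketch of what atomicity means in practice, again using sqlite3 as a stand-in: either both updates of a transfer take effect or neither does. The Accounts table, the transfer helper and the fail_midway flag are hypothetical, and the raised exception only simulates an interruption; it is not a real system crash.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (id TEXT PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as a single transaction."""
    try:
        # The connection as a context manager commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated interruption")
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
    except RuntimeError:
        pass  # the partial debit has already been rolled back

def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM Accounts"))

transfer(conn, "A", "B", 30.0, fail_midway=True)
after_failure = balances(conn)   # interrupted transfer leaves no trace

transfer(conn, "A", "B", 30.0)
after_success = balances(conn)   # completed transfer applies both updates
```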
SQL Commands