DATABASE SYSTEM
AND DATA
MODELING CONCEPT
Abrahem P. Anqui
DATABASE SYSTEM
Learning Objectives
Database System
◦ Explain the fundamentals of DBMS
◦ Describe how the database plays an important role in every computer system
◦ Identify the characteristics of data
Data Modeling Concept
◦ Learn about data modeling and why data models are important
◦ Illustrate the basic building blocks of data modeling
◦ Explain what a business rule is and how business rules influence database design
◦ Write a database model based on a given specification using different database models
Rationale
◦ Good decisions require good information that is derived from raw facts.
These raw facts are known as data. Data are likely to be managed most
efficiently when they are stored in a database. In this chapter, you will
learn what a database is, what it does, and why it yields better results than
other data management methods. You will also learn about various types of
databases and why database design is so important.
WHY DATABASES?
◦ Imagine trying to operate a business without knowing who your customers are, what products you
are selling, who is working for you, who owes you money, and whom you owe money. All businesses
have to keep this type of data and much more; and just as importantly, they must have those data
available to decision makers when they need them. It can be argued that the ultimate purpose of all
business information systems is to help businesses use information as an organizational resource. At
the heart of all of these systems are the collection, storage, aggregation, manipulation,
dissemination, and management of data, and as time marches on, data volumes double or even triple.
◦ Depending on the type of information system and the characteristics of the business, these data
could vary from a few megabytes on just one or two topics to terabytes covering hundreds of topics
within the business’s internal and external environment.
◦ How can these businesses process this much data? How can they store it all, and then quickly
retrieve just the facts that decision makers want to know, just when they want to know it? The
answer is that they use databases.
Data vs. Information
To understand what drives database
design, you must understand the
difference between data and information.
Data are raw facts. The word raw
indicates that the facts have not yet
been processed to reveal their meaning.
Information is the result of processing
raw data to reveal its meaning. Data
processing can be as simple as
organizing data to reveal patterns or as
complex as making forecasts or drawing
inferences using statistical modeling. To
reveal meaning, information requires
context.
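The step from data to information can be sketched in a few lines of Python. This is a minimal illustration, not part of the original material: the survey scores and the variable names are made up for the example. The raw list says little on its own; the computed average and share give it meaning.

```python
from statistics import mean

# Raw data: hypothetical survey answers on a 1-5 satisfaction scale.
raw_scores = [4, 5, 3, 4, 2, 5, 4]

# Processing the raw facts yields information a decision maker can use.
average_score = mean(raw_scores)
share_satisfied = sum(score >= 4 for score in raw_scores) / len(raw_scores)

print(f"Average satisfaction: {average_score:.2f} out of 5")
print(f"Share of respondents scoring 4 or higher: {share_satisfied:.0%}")
```

The context ("satisfaction on a 1-5 scale") is what turns the two numbers into information rather than more raw figures.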
In this “information age,” production of accurate, relevant, and
timely information is the key to good decision making. In turn,
good decision making is the key to business survival in a global
market.
Data are the foundation of information, which is the bedrock of
knowledge—that is, the body of information and facts about a
specific subject. Knowledge implies familiarity, awareness, and
understanding of information as it applies to an environment. A
key characteristic of knowledge is that “new” knowledge can be
derived from “old” knowledge.
Let’s summarize some key points:
◦ Data constitute the building blocks of information.
◦ Information is produced by processing data.
◦ Information is used to reveal the meaning of data.
◦ Accurate, relevant, and timely information is the key to good decision making.
◦ Good decision making is the key to organizational survival in a global
environment.
Data management is a discipline that focuses on the proper
generation, storage, and retrieval of data. Given the crucial role
that data play, it should not surprise you that data management is
a core activity for any business, government agency, service
organization, or charity.
Database
Efficient data management typically requires the use of a computer database. A database is a
shared, integrated computer structure that stores a collection of:
◦ End-user data, that is, raw facts of interest to the end user.
◦ Metadata, or data about data, through which the end-user data are integrated and managed. Metadata
describe the data contained in something like a web page, document, or
file. Another way to think of metadata is as a short explanation or summary of what the data are.
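The difference between end-user data and metadata can be seen in a small sketch that uses Python's built-in sqlite3 module. The customer table, its column names, and the sample row are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE customer ("
    "cus_code INTEGER PRIMARY KEY, cus_lname TEXT NOT NULL, cus_phone TEXT)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Ramas', '555-0101')")

# End-user data: the raw facts of interest to the end user.
end_user_data = conn.execute(
    "SELECT cus_code, cus_lname, cus_phone FROM customer"
).fetchall()

# Metadata: data about the data. PRAGMA table_info returns one row per
# column in the form (cid, name, type, notnull, dflt_value, pk).
metadata = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print("End-user data:", end_user_data)
print("Metadata:", metadata)
conn.close()
```

The first query returns what the user stored; the second returns the column names and types that the DBMS keeps in order to manage that data.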
A database management system (DBMS) is a collection of programs that manages the
database structure and controls access to the data stored in the database. In a sense, a database
resembles a very well-organized electronic filing cabinet in which powerful software, known as a
database management system, helps manage the cabinet’s contents.
◦ Some DBMS examples include MySQL, PostgreSQL, Microsoft Access, SQL Server,
FileMaker, Oracle, dBASE, Clipper, and FoxPro.
Role and Advantages of DBMS
The DBMS serves as the intermediary between the user and the database. The database
structure itself is stored as a collection of files, and the only way to access the data in those files is
through the DBMS.
Having a DBMS between the end user’s applications and the database offers some important
advantages. First, the DBMS enables the data in the database to be shared among multiple
applications or users. Second, the DBMS integrates the many different users’ views of the data
into a single all-encompassing data repository.
In particular, a DBMS provides advantages such as:
◦ Improved data sharing
◦ Improved data security
◦ Better data integration
◦ Minimized data inconsistency
◦ Improved data access
◦ Improved decision making
◦ Increased end-user productivity
Types of Database
A DBMS can support many different types of databases. Databases can be
classified according to the number of users, the database location(s), and
the expected type and extent of use.
◦ Single-user database - supports only one user at a time.
◦ Multi-user database - supports multiple users at the same time.
◦ When the multiuser database supports a relatively small number of users
(usually fewer than 50) or a specific department within an organization, it is
called a workgroup database. When the database is used by the entire
organization and supports many users (more than 50, usually hundreds) across
many departments, the database is known as an enterprise database.
Location might also be used to classify the database.
◦ Centralized database - a database that supports data located at a single site.
◦ Distributed database - a database that supports data distributed across several different sites.
The most popular way of classifying databases today, however, is based on how they will be used
and on the time sensitivity of the information gathered from them.
◦ Operational database (sometimes referred to as a transactional or production database)
- a database that is designed primarily to support a company’s day-to-day operations.
◦ Data warehouse - focuses primarily on storing data used to generate information required to
make tactical or strategic decisions. Such decisions typically require extensive “data massaging”
(data manipulation) to extract information to formulate pricing decisions, sales forecasts, market
positioning, and so on. Most decision support data are based on data obtained from operational
databases over time and stored in data warehouses. Additionally, the data warehouse can store
data derived from many sources. To make it easier to retrieve such data, the data warehouse
structure is quite different from that of an operational or transactional database.
Why database design is important
Database design refers to the activities that focus on the design of the database
structure that will be used to store and manage end-user data. A database that meets
all user requirements does not just happen; its structure must be designed carefully. In
fact, database design is such a crucial aspect of working with databases that most of
this book is dedicated to the development of good database design techniques. Even a
good DBMS will perform poorly with a badly designed database.
Evolution of File System Data Processing
◦ Manual File System – Paper-Pencil System
◦ Computerized File System - computer files within the file system were similar to the manual files.
Problems with file system data processing
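◦ Structural Dependency
◦ means that access to a file depends on its structure, so any change in the file structure requires changes in
every program that uses the file. The opposite, structural independence, exists when it is possible to make
changes in the file structure without affecting the application program’s ability to access the data.
◦ Data Dependency
◦ means that data access changes when the data storage characteristics change, for example when a field’s
data type changes. The opposite, data independence, exists when it is possible to make changes in the data
storage characteristics without affecting the application program’s ability to access the data.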
◦ Data Redundancy
◦ exists when the same data are stored unnecessarily at different places
◦ Uncontrolled data redundancy sets the stage for
◦ Poor data security.
◦ Data inconsistency
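How uncontrolled redundancy leads to data inconsistency can be sketched in Python. The two departmental "files", the customer code, and the phone numbers below are hypothetical; the point is only that the same fact stored in two places can drift apart.

```python
# Two departmental "files" each keep their own copy of the customer's phone.
sales_file = {"C001": {"name": "Ramas", "phone": "555-0101"}}
billing_file = {"C001": {"name": "Ramas", "phone": "555-0101"}}

# The customer reports a new number, but only the sales file is updated.
sales_file["C001"]["phone"] = "555-0199"


def find_inconsistencies(file_a, file_b):
    """Return the keys whose records differ between the two files."""
    shared_keys = file_a.keys() & file_b.keys()
    return sorted(key for key in shared_keys if file_a[key] != file_b[key])


conflicts = find_inconsistencies(sales_file, billing_file)
print("Customers with conflicting records:", conflicts)
```

With a single shared copy of the customer record there would be nothing to compare, and therefore nothing to disagree.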
Database System
The problems inherent in file
systems make using a database
system very desirable. Unlike the
file system, with its many separate
and unrelated files, the database
system consists of logically related
data stored in a single logical data
repository.
Database System Environment
The term database system
refers to an organization of
components that define and
regulate the collection,
storage, management, and
use of data within a
database environment. Its components are:
◦ Hardware
◦ Software
◦ People
◦ Procedures
◦ Data
DBMS Function
◦ A DBMS performs several important functions that guarantee the integrity and
consistency of the data in the database. Most of those functions are transparent to end
users, and most can be achieved only through the use of a DBMS. They include
◦ data dictionary management,
◦ data storage management,
◦ data transformation and presentation,
◦ security management,
◦ multiuser access control,
◦ backup and recovery management,
◦ data integrity management,
◦ database access languages and application programming interfaces,
◦ and database communication interfaces.
Assessment Task
1. Define each of the following terms based on what you understand
a. Data
b. Information
c. Database
d. Database System
e. Database Management System
2. What is data redundancy, and which characteristics of the file system can lead to it?
3. What is data independence, and why is it lacking in file systems?
4. What is a DBMS, and what are its functions?
5. Explain the difference between data and information.
6. What is the role of a DBMS, and what are its advantages?
7. What are the main components of a database system?
8. List and describe the different types of databases.
9. What are metadata?
10. Explain why database design is important.
DATA MODELS
Data modeling and data models
Database design focuses on how the database structure will be used to store
and manage end-user data. Data modeling, the first step in designing a
database, refers to the process of creating a specific data model for a
determined problem domain. (A problem domain is a clearly defined area within
the real-world environment, with well-defined scope and boundaries, that is to
be systematically addressed.)
Data Model
A data model is a relatively simple representation, usually graphical, of more complex
real-world data structures. In general terms, a model is an abstraction of a more
complex real-world object or event. A model’s main function is to help you understand
the complexities of the real-world environment. Within the database environment, a
data model represents data structures and their characteristics, relations, constraints,
transformations, and other constructs with the purpose of supporting a specific problem
domain.
Data Modeling
Data modeling is an iterative, progressive process. You start with a simple understanding
of the problem domain, and as your understanding of the problem domain increases, so
does the level of detail of the data model. Done properly, the final data model is in effect a
“blueprint” containing all the instructions to build a database that will meet all end-user
requirements. This blueprint is narrative and graphical in nature, meaning that it contains
both text descriptions in plain, unambiguous language and clear, useful diagrams
depicting the main data elements.
Note: The terms data model and database model are often used interchangeably. In this
book, the term database model is used to refer to the implementation of a data model in a
specific database system.
Importance of Data Models
Data models can facilitate interaction among the designer, the applications programmer, and the end
user. A well-developed data model can even foster improved understanding of the organization for
which the database design is developed. In short, data models are a communication tool. This
important aspect of data modeling was summed up neatly by a client whose reaction was as follows:
“I created this business, I worked with this business for years, and this is the first time I’ve really
understood how all the pieces really fit together.”
Keep in mind that a house blueprint is an abstraction; you cannot live in the blueprint. Similarly, the
data model is an abstraction; you cannot draw the required data out of the data model. Just as you are
not likely to build a good house without a blueprint, you are equally unlikely to create a good database
without first creating an appropriate data model.
Data Model Building Blocks
◦ Entity - is anything (such as a person, a place, a thing, or an event) about which data are to be collected
and stored.
◦ Attribute - is a characteristic of an entity. For example, a CUSTOMER entity would be described by attributes
such as customer last name, customer first name, customer phone, customer address, and customer credit
limit. Attributes are the equivalent of fields in file systems.
◦ Relationship - describes an association among entities. For example, a relationship exists between
customers and agents that can be described as follows: an agent can serve many customers, and each
customer may be served by one agent. Data models use three types of relationships: one-to-many, many-to-
many, and one-to-one.
◦ One-to-many (1:M or 1..*) relationship. A painter paints many different paintings, but each one of them is painted by
only one painter.
◦ Many-to-many (M:N or *..*) relationship. An employee may learn many job skills, and each job skill may be learned by
many employees.
◦ One-to-one (1:1 or 1..1) relationship. A retail company’s management structure may require that each of its stores be
managed by a single employee.
◦ Constraint - is a restriction placed on the data. Constraints are important because they help to ensure data
integrity. Constraints are normally expressed in the form of rules. For example:
◦ An employee’s salary must have values that are between 6,000 and 350,000.
◦ A student’s GPA must be between 0.00 and 4.00.
◦ Each class must have one and only one teacher.
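The three example constraints above can be expressed as rules that a DBMS enforces. The sketch below uses Python's built-in sqlite3 module; the table and column names are assumptions for illustration, and the "one and only one teacher" rule is approximated by a single teacher column that may not be empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        emp_num    INTEGER PRIMARY KEY,
        emp_salary REAL NOT NULL CHECK (emp_salary BETWEEN 6000 AND 350000)
    );
    CREATE TABLE student (
        stu_num INTEGER PRIMARY KEY,
        stu_gpa REAL NOT NULL CHECK (stu_gpa BETWEEN 0.00 AND 4.00)
    );
    -- One teacher column that cannot be NULL: exactly one teacher per class.
    CREATE TABLE class (
        class_code  TEXT PRIMARY KEY,
        teacher_num INTEGER NOT NULL
    );
""")


def try_insert(sql):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


valid_salary = try_insert("INSERT INTO employee VALUES (1, 45000)")
salary_too_high = try_insert("INSERT INTO employee VALUES (2, 500000)")
gpa_too_high = try_insert("INSERT INTO student VALUES (1, 4.75)")
class_without_teacher = try_insert("INSERT INTO class VALUES ('DB101', NULL)")

print(valid_salary, salary_too_high, gpa_too_high, class_without_teacher)
```

Only the first insert is accepted; the other three violate a rule and are rejected before they can damage data integrity.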
Business Rule
◦ When database designers go about selecting or determining the entities, attributes, and relationships
that will be used to build a data model, they might start by gaining a thorough understanding of what
types of data are in an organization, how the data are used, and in what time frames they are used.
But such data and information do not, by themselves, yield the required understanding of the total
business. From a database point of view, the collection of data becomes meaningful only when it
reflects properly defined business rules. A business rule is a brief, precise, and unambiguous
description of a policy, procedure, or principle within a specific organization.
◦ Business rules, derived from a detailed description of an organization’s operations, help to create and
enforce actions within that organization’s environment. Business rules must be rendered in writing and
updated to reflect any change in the organization’s operational environment.
◦ Properly written business rules are used to define entities, attributes, relationships, and constraints.
Hierarchical and Network Model
◦ The hierarchical model was developed in the 1960s to manage large
amounts of data for complex manufacturing projects such as the Apollo
rocket that landed on the moon in 1969. Its basic logical structure is
represented by an upside-down tree. The hierarchical structure contains
levels, or segments. A segment is the equivalent of a file system’s record
type. Within the hierarchy, a higher layer is perceived as the parent of the
segment directly beneath it, which is called the child. The hierarchical
model depicts a set of one-to-many (1:M) relationships between a parent
and its children segments. (Each parent can have many children, but each
child has only one parent.)
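The parent and child structure of the hierarchical model can be sketched as a simple Python mapping in which every segment records exactly one parent. The segment names are hypothetical, and this is only an illustration of the 1:M tree shape, not of how a hierarchical DBMS stores data.

```python
# Each segment records exactly one parent (None marks the root segment).
parent_of = {
    "Final assembly": None,
    "Component A": "Final assembly",
    "Component B": "Final assembly",
    "Part A1": "Component A",
    "Part A2": "Component A",
}


def children_of(segment):
    """A parent can have many children (the 'many' side of 1:M)."""
    return [child for child, parent in parent_of.items() if parent == segment]


def path_from_root(segment):
    """Follow the single parent link upward, then report the path top-down."""
    path = []
    while segment is not None:
        path.append(segment)
        segment = parent_of[segment]
    return list(reversed(path))


print(children_of("Final assembly"))
print(path_from_root("Part A2"))
```

Because each key maps to a single parent, a child with two parents cannot be represented here, which is the limitation the network model was created to remove.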
◦ The network model was created to represent complex data relationships more effectively than
the hierarchical model, to improve database performance, and to impose a database standard.
In the network model, the user perceives the network database as a collection of records in 1:M
relationships. However, unlike the hierarchical model, the network model allows a record to have
more than one parent. While the network database model is generally not used today, the
definitions of standard database concepts that emerged with the network model are still used by
modern data models. Some important concepts that were defined at this time are:
◦ The schema, which is the conceptual organization of the entire database as viewed by the database
administrator.
◦ The subschema, which defines the portion of the database “seen” by the application programs that
actually produce the desired information from the data contained within the database.
◦ A data management language (DML), which defines the environment in which data can be managed
and is used to work with the data in the database.
◦ A schema data definition language (DDL), which enables the database administrator to define the
schema components.
The Relational Model
◦ The relational model was introduced in 1970 by E. F. Codd (of IBM) in his landmark paper “A
Relational Model of Data for Large Shared Data Banks” (Communications of the ACM, June 1970,
pp. 377−387). The relational model represented a major breakthrough for both users and
designers. To use an analogy, the relational model produced an “automatic transmission” database
to replace the “standard transmission” databases that preceded it. Its conceptual simplicity set the
stage for a genuine database revolution.
◦ The relational model foundation is a mathematical concept known as a relation. To avoid the
complexity of abstract mathematical theory, you can think of a relation (sometimes called a table)
as a matrix composed of intersecting rows and columns. Each row in a relation is called a tuple.
Each column represents an attribute. The relational model also describes a precise set of data
manipulation constructs based on advanced mathematical concepts.
◦ The relational data model is implemented through a very sophisticated relational database
management system (RDBMS). The RDBMS performs the same basic functions provided by the
hierarchical and network DBMS systems, in addition to a host of other functions that make the
relational data model easier to understand and implement.
The relationship type (1:1, 1:M, or M:N) is
often shown in a relational schema, an
example of which is shown in Figure 2.2. A
relational diagram is a representation of
the relational database’s entities, the
attributes within those entities, and the
relationships between those entities.
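A 1:M relationship between two relational tables can be sketched with Python's built-in sqlite3 module, using the PAINTER and PAINTING example from the building blocks. The column names and sample rows are assumptions, and the link is made with a foreign key, a term this section does not itself introduce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute(
    "CREATE TABLE painter (painter_num INTEGER PRIMARY KEY, painter_lname TEXT NOT NULL)"
)
conn.execute("""
    CREATE TABLE painting (
        painting_num   INTEGER PRIMARY KEY,
        painting_title TEXT NOT NULL,
        painter_num    INTEGER NOT NULL REFERENCES painter (painter_num)
    )
""")

# One painter (one row, or tuple) is related to many paintings.
conn.execute("INSERT INTO painter VALUES (1, 'Ross')")
conn.executemany(
    "INSERT INTO painting VALUES (?, ?, ?)",
    [(10, "Dawn", 1), (11, "Dusk", 1)],
)

rows = conn.execute("""
    SELECT painter.painter_lname, painting.painting_title
    FROM painter
    JOIN painting ON painter.painter_num = painting.painter_num
    ORDER BY painting.painting_num
""").fetchall()
print(rows)

# A painting that points at a painter who does not exist is rejected.
try:
    conn.execute("INSERT INTO painting VALUES (12, 'Noon', 99)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
print("Orphan painting accepted:", orphan_accepted)
```

The shared painter_num column is what relates the two tables; each painting row carries exactly one painter number, while a painter number may appear in many painting rows.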
Entity Relationship Model
The entity relationship (ER) model, or ERM, has become a widely accepted standard
for data modeling.
Peter Chen first introduced the ER data model in 1976; it was the graphical
representation of entities and their relationships in a database structure that quickly
became popular because it complemented the relational data model concepts. The
relational data model and ERM combined to provide the foundation for tightly structured
database design. ER models are normally represented in an entity relationship
diagram (ERD), which uses graphical representations to model database components.
The ER model is based on the following
components:
◦ Entity. Earlier in this chapter, an entity was defined as anything about which data are to be
collected and stored. An entity is represented in the ERD by a rectangle, also known as an entity
box. The name of the entity, a noun, is written in the center of the rectangle. The entity name is
generally written in capital letters and is written in the singular form: PAINTER rather than
PAINTERS, and EMPLOYEE rather than EMPLOYEES. Usually, when applying the ERD to the
relational model, an entity is mapped to a relational table. Each row in the relational table is known
as an entity instance or entity occurrence in the ER model. Each entity is described by a set of
attributes that describes particular characteristics of the entity. For example, the entity EMPLOYEE
will have attributes such as a Social Security number, a last name, and a first name. (Chapter 4
explains how attributes are included in the ERD.)
◦ Relationships. Relationships describe associations among data. Most relationships describe
associations between two entities. When the basic data model components were introduced, three
types of relationships among data were illustrated: one-to-many (1:M), many-to-many (M:N), and
one-to-one (1:1). The ER model uses the term connectivity to label the relationship types. The
name of the relationship is usually an active or passive verb. For example, a PAINTER paints many
PAINTINGs; an EMPLOYEE learns many SKILLs; an EMPLOYEE manages a STORE.
Figure 2.3 shows the different types of relationships using two ER notations: the
original Chen notation and the more current Crow’s Foot notation.
Object-oriented Model
Increasingly complex real-world problems demonstrated a need for a data model that more
closely represented the real world. In the object-oriented data model (OODM), both data
and their relationships are contained in a single structure known as an object. In turn, the
OODM is the basis for the object-oriented database management system (OODBMS).
The OO data model is based on the following components:
◦ An object is an abstraction of a real-world entity. In general terms, an object may be considered
equivalent to an ER model’s entity. More precisely, an object represents only one occurrence of an entity.
◦ Attributes describe the properties of an object. For example, a PERSON object includes the attributes
Name, Social Security Number, and Date of Birth.
◦ Objects that share similar characteristics are grouped in classes. A class is a collection of similar
objects with shared structure (attributes) and behavior (methods).
◦ A class’s method represents a real-world action such as finding a selected PERSON’s name, changing a
PERSON’s name, or printing a PERSON’s address.
◦ Inheritance is the ability of an object within the class hierarchy to inherit the attributes and methods
of the classes above it.
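The OO components listed above map closely onto an ordinary Python class. The sketch below is an illustration under assumptions: the Employee subclass, the job_title attribute, and the sample values are invented for the example, and only two of the PERSON attributes are shown.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Person:
    # Attributes describe the properties of each Person object.
    name: str
    date_of_birth: date

    # Methods represent real-world actions on the object.
    def change_name(self, new_name: str) -> None:
        self.name = new_name

    def describe(self) -> str:
        return f"{self.name}, born {self.date_of_birth.isoformat()}"


@dataclass
class Employee(Person):
    # Employee inherits name, date_of_birth, and both methods from Person.
    job_title: str = "Unassigned"


# An object is one occurrence of its class.
worker = Employee("Ana Cruz", date(1990, 5, 17), job_title="Analyst")
worker.change_name("Ana Reyes")  # method inherited from Person
print(worker.describe())
print(worker.job_title)
```

Here the data (attributes) and the behavior that works on them (methods) live in one structure, which is the defining idea of the OODM.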
Object-oriented data models are typically depicted using Unified Modeling Language (UML) class diagrams.
Unified Modeling Language (UML) is a language based on OO concepts that describes a set of diagrams and
symbols that can be used to graphically model a system. UML class diagrams are used to represent data and
their relationships within the larger UML object-oriented system’s modeling language. For a more complete
description of UML see Appendix H, Unified Modeling Language (UML).
Assessment Task
◦ Using the table below, create the following models:
1. Relational Model
2. Entity Relationship Model
a. Chen’s Notation
b. Crow’s Foot Notation
3. UML Class Diagram