
COMPUTER SCIENCE STUDY NOTES

JAMES REID, IB 2020

TABLE OF CONTENTS

System Fundamentals – Topic 1

1.1 Systems in Organisations
Context for planning a system – 1.1.1
Compatibility issues 🡪 legacy systems, business mergers – 1.1.3
Different systems implementation
Alternative installation processes
Data migration – problems
Testing
User documentation
User Training Methods
Data Loss
Methods to prevent data loss

Computer Architecture – Topic 2

CPU components
Memory
Operating Systems
Software applications

Networks – Topic 3
Computer networks
Intranet vs internet
Extranet
Open Systems Interconnection (OSI)
Data packets
Protocols
Hardware for Wireless

Algorithms – Topic 4
Computer language
Fundamental operations of computers

Option Topic: Databases

Data vs Information
Information Systems
Types of Information System
Durability
States of Transaction
Database Transaction
ACID
What Is A Database

Database Anatomy
General
Join (only inner join in syllabus)
Redundant data (orphan data)
Database Normalisation
1st Normal Form
2nd Normal Form
3rd Normal Form
Referential Integrity
DBMS
Relational Database Management System (RDBMS)
RDBMS vs DBMS
Database Administrators
Query Functions
Update Functions
Data Validation
Data Verification
Schema
Levels of Schema
Data Definition Language
Data Modelling
Redundant Data
Referential Integrity
Integrity Rules
End-Users
Database Recovery
Recovery
Recovery with Concurrent Transactions
Checkpoint
Database Integration
Database Protection Act
Data Matching and Data Mining

SYSTEM FUNDAMENTALS – TOPIC 1

1.1 SYSTEMS IN ORGANISATIONS

CONTEXT FOR PLANNING A SYSTEM – 1.1.1

A new system is created to replace a system that is inefficient, no longer suitable for its original purpose, redundant, or outdated.

Purpose of a new system:

● Increase productivity
● Improve quality of output
● Minimise cost

Potential organisational issues:

● Lack of stakeholder and end-user participation


● Lack of end-user ‘ownership’ of system
● Lack of attention to required training
● Lack of attention to the design of tasks and jobs, allocation of information system tasks and the overall
usability of the system.

Feasibility using TOLES:

● Technical feasibility 🡪 is the existing technology sufficient to implement the proposed system?


● Economic feasibility 🡪 is the proposed system cost effective?
● Legal feasibility 🡪 are there any conflicts between the proposed system and law/regulation?
● Operational feasibility 🡪 are the existing organisational practices and procedures sufficient to support the maintenance and operation of the new system?
● Schedule feasibility 🡪 how long will the system take to develop and deliver?

COMPATIBILITY ISSUES 🡪 LEGACY SYSTEMS, BUSINESS MERGERS – 1.1.3

Some legacy systems are important in organisations because their data cannot be converted to newer formats, or their applications cannot be upgraded.

Business merger: the combining of two or more business entities

● Usually to reduce costs

Strategies for integration:

● Keep both systems, develop them to have the same functionality (high maintenance cost)
● Replace both systems with a new one (increased initial cost)
● Select best IT systems from each company and combine them (can be difficult for employees to work with
systems from another company)
● Choose one system and drop the other (policy problems)

Software incompatibility: when different software entities or systems cannot operate satisfactorily, cooperatively or independently on the same computer, or on different computers linked by a computer network.

DIFFERENT SYSTEMS IMPLEMENTATION

Locally hosted 🡪 appropriate for larger, complex systems

Remotely hosted 🡪 appropriate when there is no necessary hardware equipment in place, or when admin wishes to
outsource responsibilities for maintenance, support, backups, security, etc.

SaaS (Software-as-a-Service): allows software and data to be hosted and managed centrally in a remote data centre; users pay to access the service on a subscription basis. SaaS applications reside in the cloud and need a web browser and broadband internet connection to be accessed.

Positives of SaaS

● Less expensive – low initial costs, few investments in installation, maintenance and upgrading, only have
to pay for subscription (cheaper in short-medium term)
● Supported by wide range of desktop, portable and mobile devices
● Fewer IT personnel required

Negatives of SaaS

● Possibility of data loss if provider goes out of business


● Performance of a web browser-based application hosted in a distant data centre and accessed via the internet is low compared to software running on a local machine or over the company’s local network

ALTERNATIVE INSTALLATION PROCESSES

Changeover: the process of putting the new system online and retiring the old one

Parallel changeover

● Both systems work in parallel for a short period of time


● Limited risk 🡪 outputs of both systems can be compared to ensure new system is running properly
● Extra cost + workload of running two systems concurrently
● Not efficient if systems have different processing tasks, functions, inputs, outputs

Big Bang or Direct

● High risk since company plugs in new system and unplugs the old one at the same time
● Preferred when the system is not critical
● All users need to be trained before the switch takes place
Pilot

● large organisations that have multiple sites


● Low risk
● The first group that adopts the system is called the pilot site or pilot group
● After system proves successful at pilot site, it is implemented into the rest of the company using a
changeover method

Phased

● One module of the system is implemented at a time

● Takes longer
● Installation is often done by department

DATA MIGRATION – PROBLEMS

Data migration: the transfer of data between different formats, storage types and computer systems

Problems include:

● Incompatibility
● Non-recognisable data structures
● Data may be lost or not transferred due to an incomplete data transfer or error
● Data can be misinterpreted, caused by different conventions of each country regarding date, time and
measurement units

TESTING

Functional testing tests individual commands, text input, menu functions, etc., and confirms that they perform and function correctly according to the design specifications

Data testing is when normal, abnormal and extreme data is put into the system

Alpha testing → before available to general public, internally tested in company in laboratory type environment

Beta testing → comments and suggestions of the users, feedback is used to fix defects and errors that were
missed, Users outside company are involved in testing

Dry-run testing → is conducted using pen-and-paper by programmer, mentally runs algorithm and examines
source code to decide what the output should be

Unit testing → individual parts of system are tested separately

Integration testing → the components of the system are combined and tested together to verify that they all work with one another

User acceptance testing → determines if the system satisfies the customer’s needs

Debugging → systematic process of finding and correcting bugs (errors) in a computer program

Validation is the process of evaluating whether data input follows appropriate specifications and is within reasonable limits

Verification is the process of ensuring that the data input is the same as the original source data, e.g. double entry
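The contrast between the two can be sketched in Python (a hedged illustration only; the function names and the age rule are invented, not from the syllabus):

```python
# Illustrative sketch: "is_valid_age" and "double_entry_match" are invented
# names; the age limit of 120 is an assumed business rule.

def is_valid_age(value: str) -> bool:
    """Validation: is the input reasonable (type check + range check)?"""
    if not value.isdigit():          # type check: digits only
        return False
    return 0 < int(value) <= 120     # range check: plausible human age

def double_entry_match(first: str, second: str) -> bool:
    """Verification: does the re-typed entry match the first entry?"""
    return first == second

# A value can pass validation yet still be wrong; only verification
# catches a mistyped-but-plausible entry:
print(is_valid_age("34"))               # True  - sensible input
print(is_valid_age("-5"))               # False - fails the type/range check
print(double_entry_match("34", "43"))   # False - typo caught by re-entry
```

The point of the sketch: validation checks reasonableness against rules, while verification checks agreement with the original source data.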

USER DOCUMENTATION

User documentation is the document that explains the working of the system to the user

Simple user documentation → faster system implementation (less training)

Only need to know how to use system, not how it works

Types:

● Manual → always available, helps when installing system, can be lost, may not be updated when system is
updated, no search facility
● Email support
● Embedded assistance or integrated user assistance 🡪 accessible at any time when using program, can
help solve some major errors, only used after system is installed, no help when installing system, not very
specific, lacks search capability
● FAQ
● Live chat session
● Online portals → extensive compared to help files, continuously revised by system developer, often have
live support, search capabilities, useless if user loses internet connection, live support ineffective with
users unfamiliar with computers
● Remote desk connection

USER TRAINING METHODS

Self-instruction

● Advantages: cheaper, users can go through it at any time


● Disadvantages: no assistance if they become stuck, can be hard to become an expert in the system, only suitable for experienced computer users
● Uses manual/learn as you go

Formal classes

● Instructor teaches and explains how to use system


● Useful for large numbers of staff, effective/cheap

Remote/online/personal Training

● Instructor training a single user


● Most effective form of training, can be suited to the user’s needs and abilities, very expensive compared to other training methods

DATA LOSS

Sources of data loss

● Accidental user mistakes


● Poor storage
● Power outage
● Defective hardware or drives
● System crashes
● Malicious activity by employees or outsiders
● Natural disasters
● Data corruption (data changed to incorrect values)

METHODS TO PREVENT DATA LOSS

● Regular backups
● Copy sensitive information on to another device
● Store in another building (remote)
● Making hard copies
● Printing/photocopying
● Antivirus protection
● Redundancy (backup copies of a file)
● Incremental backups (files you changed) or autosave
● Compression
● Firewalls

COMPUTER ARCHITECTURE – TOPIC 2

CPU COMPONENTS

CPU = brain of the computer; controls peripherals and processes inside the computer and carries out calculations on data.

Control Unit 🡪 controls operation of CPU, responsible for retrieval and execution of instructions, controls input and
output devices.

Arithmetic Logic Unit 🡪 performs arithmetic and logical operations.

Registers – small very fast circuits that store intermediate values from calculations or instructions inside CPU.

● Hold address of locations in memory to store data.

MEMORY

Primary – directly accessed by CPU, smaller, faster

● RAM 🡪 read and written from, volatile, data and instructions


o DRAM – dynamic (cheaper, slower, main memory)
o SRAM – static (faster, expensive, cache)
● ROM 🡪 only read from, non-volatile, BIOS

Secondary – stores data after programs turned off, large memory, slower access

● CDs, hard drives, etc.

Cache memory is random access memory that a computer microprocessor can access more quickly than regular
primary memory (RAM).

Fetch-execute cycle: Fetching, Decoding, Executing, Storing

OPERATING SYSTEMS

Peripheral communication

● Input and output systems
● Peripherals controlled by device drivers = software which allows HW to be used by OS (translator)

Memory management

● Keeps track of storage devices and controls application access to RAM


● Memory can be read, modified and written by OS
● Ensures that applications do not interfere with memory for another application

Resource monitoring and multitasking

● Coordinates execution of instructions/applications by allocating CPU time – by time and priority

Networking

● Manages connections/interactions with other networked systems

Disk access and data management

● Keeps track of which file is being used by which application


● Coordinates transfer of data from disk file to memory

Security

● Prevents unauthorised access


● Protects files

SOFTWARE APPLICATIONS

Applications focus on using the computer to solve specific real-world problems.

NETWORKS – TOPIC 3

COMPUTER NETWORKS

A computer network is a collection of computers and devices connected together via communication devices and
transmission media.

Server: computer system or a software application that provides a service to the other computer systems
connected to the same network.

Client: computer system or a software application that requests a service from a server connected to the same
network;

Transmission media

● WIFI
o Small range, inexpensive, decent reliability, poor security
● Metal conductor
o Work up to 100m, expensive, highly reliable, good security
● Fibre optic

o Faster (light signals), bandwidth is greater, more expensive, security is very good, large range

INTRANET VS INTERNET

Internet = an open, public space

Intranet = private space, may be accessible from the internet but protected by a password and accessible only to
employees or other authorised users

EXTRANET

Uses the internet to allow controlled access by specific users to a specific WAN or LAN.

Inaccessible to the public

OPEN SYSTEMS INTERCONNECTION (OSI)

Standard layers with specific responsibilities 🡪 possible to implement communication with interchangeable
modules

Layers of OSI:

1. Physical 🡪 cabling components


2. Data link 🡪 NIC
3. Network 🡪 routing
4. Transport 🡪 transmission-error detection
5. Session 🡪 retransmission of data if it is not received by a device
6. Presentation 🡪 encryption and decryption of message
7. Application 🡪 electronic mail

DATA PACKETS

Data packet 🡪 data is broken up into segments and each segment is put into a packet

● A packet contains the data and the information that identifies the type of data, where it comes from and
where it is going
● Used in internet protocol transmissions for data that is transmitted

Packet switching is the transportation process in which data is broken into suitable pieces or blocks for fast and efficient transfer via different network devices/paths
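The idea can be sketched in Python (a simplified illustration, not a real protocol; the field names "src", "dst" and "seq" are invented):

```python
# Minimal sketch of packet switching: break a message into fixed-size
# segments and wrap each one in a packet carrying addressing metadata.

def packetise(data: bytes, src: str, dst: str, size: int = 4) -> list[dict]:
    packets = []
    for seq, start in enumerate(range(0, len(data), size)):
        packets.append({
            "src": src,              # where the data comes from
            "dst": dst,              # where it is going
            "seq": seq,              # lets the receiver reorder packets
            "payload": data[start:start + size],
        })
    return packets

def reassemble(packets: list[dict]) -> bytes:
    # Packets may arrive out of order over different paths: sort by sequence.
    return b"".join(p["payload"] for p in sorted(packets, key=lambda p: p["seq"]))

msg = b"HELLO, NETWORK"
pkts = packetise(msg, src="10.0.0.1", dst="10.0.0.2")
out_of_order = list(reversed(pkts))      # simulate arrival over different paths
print(reassemble(out_of_order) == msg)   # True
```

The sequence number is what allows the receiver to rebuild the original message even though individual packets took different routes.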

PROTOCOLS

Protocols are a set of rules that control how communication happens in networked computers.

Handshaking 🡪 when two devices start communicating, they must agree on the transmission speed that will be
used.

Functions of protocols:

● Data integrity
● Flow control
● Congestion management
● Deadlock
● Error checking
● Speed of transmission
● Data compression

HARDWARE FOR WIRELESS

Wireless router – a router sends info between your network and the internet/controls the transfer of data in a
network

Access point – responsible for sending and receiving data

Network Interface Card (NIC) – Interface cards determine the infrastructure of a local area network (LAN); and
allow all of the computers to connect to the network;

MAC ADDRESS

Each (wireless network) adapter has a unique label called a MAC address;

Routers use these addresses to identify/authenticate computers (routers include an option to whitelist or blacklist certain devices based on MAC addresses, so access can be restricted to any device that is not on the whitelist);

One disadvantage is that the whitelist must be amended any time a new device is purchased or when guests need to be granted access;

Also, this method is useless against hackers who use programs that intercept data passing through the network and report the MAC address of any device communicating on the network;

ALGORITHMS – TOPIC 4

COMPUTER LANGUAGE

Fixed vocabulary, unambiguous meaning, consistent grammar and syntax

Compiler 🡪 entire program, searching for syntax errors, stored in binary, very fast to execute, secure

Interpreter 🡪 line by line, machine independent, suited to teaching programming, least efficient

FUNDAMENTAL OPERATIONS OF COMPUTERS

Add: accumulator + 1

Store: accumulator into memory

Compare: A > B

Retrieve: Get A
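These fundamental operations can be simulated with a toy single-accumulator machine (a sketch only; the instruction names and dictionary-based memory are illustrative assumptions, not a real instruction set):

```python
# Toy sketch of the fundamental operations (retrieve, add, store, compare)
# on a single-accumulator machine.

def run(program: list[tuple], memory: dict) -> dict:
    acc = 0  # the accumulator register
    for op, *args in program:
        if op == "RETRIEVE":        # load a value from memory into the accumulator
            acc = memory[args[0]]
        elif op == "ADD":           # add a constant to the accumulator
            acc += args[0]
        elif op == "STORE":         # write the accumulator back to memory
            memory[args[0]] = acc
        elif op == "COMPARE":       # is accumulator > value at an address?
            memory["flag"] = acc > memory[args[0]]
    return memory

mem = run([("RETRIEVE", "A"), ("ADD", 1), ("STORE", "B"), ("COMPARE", "A")],
          {"A": 5})
print(mem)   # {'A': 5, 'B': 6, 'flag': True}
```

Every higher-level statement ultimately compiles down to sequences of simple operations like these.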

ABSTRACTION
Abstraction allows us to create a general idea of what the problem is and how to solve it;

Abstraction removes all specific detail, and any patterns that will not help in solving a problem. This helps in
forming a “model” (If designers don’t abstract they may end up with the wrong solution to the problem they are
trying to solve);

Abstraction is widely used because there exist a number of “patterns” in programming that keep repeating in every application/program;

The pattern corresponding to an issue can be found, then the abstract solution to it can be found and
implemented, and the problem is solved;
Most programming languages provide some built-in abstract patterns, which are easy to use (some APIs provide more advanced patterns);

OPTION TOPIC: DATABASES

DATA VS INFORMATION

Data consists of raw, unorganised facts that need to be processed.

When data is processed, organized, structured or presented in a given context so as to make it useful, it is called
information

INFORMATION SYSTEMS

Information systems convert the data into information.

At the core of an information system is a database (raw data)

TYPES OF INFORMATION SYSTEM

Management Information Systems (MIS)

● A computer-based system that supports databases required by different levels of users (stakeholders) and
management is called a management information system
● As the name implies, it is primarily a tool for the management of a company

Transaction Processing Systems (TPS)

Decision Support Systems (DSS)

Executive Information Systems (EIS)

Expert Systems (ES)

DURABILITY

The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution

Ensuring durability is the responsibility of a component of the database system called the recovery-management
component

STATES OF TRANSACTION

Active: the starting state, the transaction stays in this state whilst the code for this state is being run

Partially committed: until the final line of the code for this state has finished running

Failed: when the normal running of the code has failed to take place

Aborted: due to a failure in the process, the database is not updated but has been rolled back to its original state prior to the start of the transaction

Committed: after successful running of the full code and update of the correct records

DATABASE TRANSACTION

A transaction in the context of a database is a sequence of operations performed as a single logical unit of work

ACID

Atomicity

● A transaction is said to be atomic if it either executes all its actions in one go or executes none of them at all
● Either all or none of the steps are performed

Consistency

● Data should be valid according to all defined rules


● No violation of integrity

Isolation

● Ensures that different transactions have nothing to do with each other


● If several transactions are executed concurrently (or in parallel), then each transaction must behave as if it were executed in isolation
● A solution to the problem of concurrently executing transactions is to execute each transaction serially
● Concurrency control is the process which prevents such transactions from interfering with each other

Durability

● Committed data would not be lost even after power failure

● Once a transaction commits, the system must guarantee that the result of its operations will never be lost,
in spite of subsequent failures

ACID is the set of properties that guarantees database transactions are processed reliably
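Atomicity and durability can be demonstrated with Python’s built-in sqlite3 module (a hedged sketch; the table and account names are invented for illustration):

```python
# Sketch of atomicity with sqlite3: either the whole transfer commits,
# or the rollback leaves the database unchanged.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("UPDATE account SET balance = balance - 50 WHERE name='alice'")
        raise RuntimeError("simulated crash mid-transaction")
except RuntimeError:
    pass

# Atomicity: the half-finished debit was rolled back.
balance = conn.execute("SELECT balance FROM account WHERE name='alice'").fetchone()[0]
print(balance)   # 100
```

Using the connection as a context manager is what maps the "all or nothing" rule onto commit/rollback.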

WHAT IS A DATABASE

An organised collection of data generally stored and accessed electronically from a computer system

Stored in fields and records

● Field = column, record = row


● Need for Databases:
o Access data
o Search data
o Share data
o Transfer data
o Migrate data

DATABASE ANATOMY

GENERAL

A database is a collection of one or more database files

A file is a collection of related information (records)

A record is the information relating to one person, product or event

A field is a discrete chunk of information in a record

Primary key is a field which uniquely identifies the records of a given table

Secondary key is a field in a database which can be used for searching; such a field is called ‘indexed’

Foreign key is a primary key which has been imported into another table

Super key or composite key is one or more fields which collectively identify a record in a table

An entity is some unit of data that can be classified and has stated relationships to other data units.

JOIN (ONLY INNER JOIN IN SYLLABUS)

In databases a join refers to the linking of two or more tables using primary keys which may/may not be
composite in nature

An inner join returns only the records whose key value in one table matches the corresponding key in the other table
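A minimal inner-join example with sqlite3 (table and column names are invented for illustration):

```python
# INNER JOIN keeps only rows whose key matches in BOTH tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrolment (student_id INTEGER, subject TEXT);
INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO enrolment VALUES (1, 'CS'), (1, 'Maths'), (3, 'Art');
""")

# Ben has no enrolments and enrolment 3 has no matching student,
# so neither appears in the joined result.
rows = conn.execute("""
    SELECT s.name, e.subject
    FROM student s
    INNER JOIN enrolment e ON s.student_id = e.student_id
    ORDER BY e.subject
""").fetchall()
print(rows)   # [('Ana', 'CS'), ('Ana', 'Maths')]
```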

REDUNDANT DATA (ORPHAN DATA)

Rogue fields of data in a table which have no source or origin are referred to as orphan or redundant data

DATABASE NORMALISATION

The process of organising data to minimise duplication

● Usually involves dividing a database into two or more tables and defining relationships between the
tables

Advantages:

● Reduce total storage


● Update is easier
● Easier to protect sensitive information through the use of keys
● Atomic values in attributes give uniformity in writing queries
● More complex operations can be implemented more easily by queries on rows/columns

1ST NORMAL FORM

A database is in first normal form if it satisfies the following conditions:

● Contains only atomic values


● Each field should be unique
● There are no repeating groups
● Each row/record is unique and has a primary key

2ND NORMAL FORM

A database is in second normal form if it satisfies the following conditions:

● The table must be in 1NF


● All non-key attributes must depend on every part of the primary key

3RD NORMAL FORM

A database is in third normal form if it satisfies the following conditions:

● The table must be in 2NF


● There are no non-key attributes that depend on another non-key attribute
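Normalisation can be sketched with sqlite3 by splitting repeated customer details out of a flat orders table (a hedged illustration; the table design and names are invented):

```python
# Moving a flat (unnormalised) orders table toward normal form by storing
# each customer fact once and referencing it by key.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Unnormalised shape (customer details repeat on every order row):
-- order_id | customer_name | customer_city | item
--    1     |     Ana       |    Leeds      | pen
--    2     |     Ana       |    Leeds      | ink

-- Normalised: each fact stored once; orders reference the customer key.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY, customer_id INTEGER,
                       item TEXT,
                       FOREIGN KEY (customer_id) REFERENCES customer(customer_id));
INSERT INTO customer VALUES (1, 'Ana', 'Leeds');
INSERT INTO orders   VALUES (1, 1, 'pen'), (2, 1, 'ink');
""")

# An update now touches one row instead of every order:
conn.execute("UPDATE customer SET city = 'York' WHERE customer_id = 1")
rows = conn.execute("""
    SELECT o.item, c.city FROM orders o
    JOIN customer c ON o.customer_id = c.customer_id ORDER BY o.order_id
""").fetchall()
print(rows)   # [('pen', 'York'), ('ink', 'York')]
```

This is the "update is easier" advantage in practice: one change propagates everywhere through the relationship.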

REFERENTIAL INTEGRITY

Referential integrity refers to the accuracy and consistency of data within a relationship. In relationships, data is
linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a
primary key value (in the primary – or parent – table)

DBMS

A database management system is a set of programs that allows users to read, store, change and extract data in a database

● Allows user to manipulate data and create databases as per their requirements
● Provides data security and data protection
● Examples - MySQL, MS Access, Oracle

Advantages of DBMS 🡪 speed, accuracy and accessibility

Functions and tools of DBMS

● Management functions and tools 🡪 focus on creation, manipulation and interrogation of a database
● Uses files rather than linked tables to hold data
● Persistent storage management
● Transaction management
● Resiliency: recovery from crashes
● Separation between logical and physical views of the data
● High level query and data manipulation language
● Efficient query processing
● Interface with programming languages

Data Security - DBMS

● Features involving data validation, access rights and data locking


● Locks and timestamps can be used to provide an environment in which concurrent transactions can
preserve their consistency and isolation properties

Data locking - DBMS

● A lock is a mechanism that tells the DBMS whether a particular data item is being used by any transaction for read/write purposes 🡪 it can’t be edited by multiple people at the same time

RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS)

A RDBMS stores data in tables and maintains the relationships between the data

Advantages of RDBMS

● Data is stored only once, and hence multiple record changes are not required
● Deletion and modification of data becomes simpler and storage efficiency is very high
● Complex queries can be carried out using the Structured Query Language (SQL) - ‘insert’, ‘update’, ‘delete’, ‘create’, ‘drop’
● Better security is offered by the creation of tables and customised level of data protection - users can set
access barriers to limit access to the available content
● Provision for future requirements as new data can easily be added and appended to existing tables and
made consistent with previously available content

Disadvantages of RDBMS

● Cost of execution - special software, setting up of data
● Security of data can be threatened during data migration
● Complex images, numbers and designs are not easy to categorise into tables - presents a problem
● Some fields have a character limit
● Isolated databases can be created - such large volumes of data are not easy to connect

RDBMS VS DBMS

DBMS stores data as files whereas RDBMS stores data in a tabular arrangement

RDBMS allows for normalisation of data

RDBMS maintains a relation between the data stored in its tables; a normal DBMS simply stores its data in files with no link

Structured approach of RDBMS supports a distributed database unlike a normal DBMS

DATABASE ADMINISTRATORS

Database administrators use specialised software to store and organise data

Tasks/roles

● Sets access levels and passwords


● Manages back-up procedures
● Establishes a recovery plan for the database in disaster case
● Capacity to plan, install and configure → also maintenance and upgrades
● Database design
● Data migration
● Performance monitoring
● Security
● Troubleshooting

QUERY FUNCTIONS

Query functions - searching the stored data for required results

A query function is designed to retrieve specific results from a database

UPDATE FUNCTIONS

Update functions update the data and records stored in a database

Can be done by the user/stakeholder/database administrator

DATA VALIDATION

Validation 🡪 trying to reject bad data at the entry level rather than once it has entered the system

Data validation is the process of comparing data with a set of rules to find out if data is reasonable:

● Format check - checks data is in the right format (dd/mm/yyyy)


● Presence check - checks that data has been entered into a field
● Range check - checks that a value falls within the specified range
● Type check - ensures the correct data type has been entered
● Length check

Validation is the process that tries to ensure that data is sensible, reasonable, complete and within acceptable
boundaries
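The checks listed above can be combined in a single sketch (illustrative only; the field, date format and year range are assumed rules, not from the syllabus):

```python
# Sketch of validation: format, presence, range, type and length checks
# applied to a date-of-birth field in dd/mm/yyyy form.
from datetime import datetime

def validate_dob(value: str) -> bool:
    if value == "":                       # presence check
        return False
    if len(value) != 10:                  # length check (dd/mm/yyyy)
        return False
    try:                                  # format + type check
        dob = datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return False
    return 1900 <= dob.year <= 2025       # range check (assumed bounds)

print(validate_dob("29/02/2000"))   # True  - valid leap-day date
print(validate_dob("31/02/2000"))   # False - fails the format check
print(validate_dob(""))             # False - fails the presence check
```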

DATA VERIFICATION

Data verification is the process of checking that the data entered exactly matches the original source to find out if
data is accurate

Methods include

● Double entry - entering data twice


● Proofreading data - someone checks the data entered against the original document
● Echo - system repeats the data being entered

SCHEMA

The term ‘schema’ refers to the organisation of data as a blueprint of how the database is constructed

● Includes access, view for different users, etc.

LEVELS OF SCHEMA

Conceptual

● Structure of the whole database for the community of users. Hides information about the physical storage
structures and focuses on describing data types, entities, relationships, etc.

Logical

● Logical constraints that need to be applied on the data stored. It defines tables, views and integrity
constraints

Physical

● Pertains to the actual storage of data and its form of storage, such as files, indices, etc. It defines how the data will be stored in secondary storage.

Mappings among schema levels are needed to transform requests and data

DATA DEFINITION LANGUAGE

We can implement our data model using a powerful database language such as SQL, which makes it possible to create or alter tables directly

To do this directly we need Data Definition Language (DDL) statements which are used to define the database
structure or schema

Some examples of DDL statements are:

● CREATE: this creates a new table in the database


● DROP: deletes a table from the database
● ALTER: to modify the design of a table
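The three statements can be run through sqlite3 (a sketch; the table and column names are invented):

```python
# CREATE, ALTER and DROP: defining and changing the database schema.
import sqlite3

conn = sqlite3.connect(":memory:")

# CREATE: define the table, including domain constraints.
conn.execute("""CREATE TABLE pupil (
    pupil_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    year     INTEGER CHECK (year BETWEEN 7 AND 13)
)""")

# ALTER: modify the design of an existing table.
conn.execute("ALTER TABLE pupil ADD COLUMN house TEXT DEFAULT 'none'")

cols = [row[1] for row in conn.execute("PRAGMA table_info(pupil)")]
print(cols)   # ['pupil_id', 'name', 'year', 'house']

# DROP: delete the table from the database.
conn.execute("DROP TABLE pupil")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall()
print(tables)   # []
```

Note that DDL changes the schema (the blueprint), not the records; query and update functions operate on the data itself.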

DATA MODELLING

A data model can be defined as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organisation

Data model components

● Structural part → consisting of a set of rules according to which databases can be constructed
● Manipulative part → defining the types of operation that are allowed on the data
● Integrity rules → which ensure that the data is accurate

The purpose of a data model is to represent data and to make the data understandable

Physical data models → used for data at the internal level

● Describe how data is stored in the computer, representing information such as record structures, record
ordering, index, B-tree, hashing and access paths

Record based data models → used to describe data at the conceptual level

● Used to specify the overall logical structure of the database and to provide a higher-level description of
the implementation
● Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length
● Three record-based data models
o Hierarchical model
o Network model
o Relational model → most common in recent models

Object based data models → used to describe data at the conceptual level

● Entity Relationship (E-R) model has emerged as one of the main techniques for modelling database design
and forms the basis for the database design methodology
● Object oriented data model extends the definition of an entity to include, not only the attributes that
describe the state of the object but also the actions that are associated with the object, that is, its
behaviour. The object is said to encapsulate both state and behaviour
● Semantic systems model represents the equivalent of a record in a relational system or an object in an OO system, but they do not include behaviour. They are abstractions ‘used to represent real world or conceptual objects’

REDUNDANT DATA

Redundancy means having multiple copies of the same data in the database

● This problem arises when a database is not normalised

Causes of redundant data

● Insertion anomaly
o This problem occurs when the insertion of a data record is not possible without adding some
additional unrelated data to the record
● Deletion anomaly
o Occurs when deletion of a data record results in losing some unrelated information that was
stored as part of the record that was deleted
● Update anomaly
o If an update does not occur in all places, then the database will be inconsistent

REFERENTIAL INTEGRITY

Ensures that the link between two tables is linking the correct fields and not drawing any incorrect data from
another table

The preservation of the integrity of a database system is concerned with the maintenance of the correctness and
consistency of the data

Integrity violations may arise from many different sources (all of which result in data corruption), such as:

● Typing errors by data entry clerks


● Logical errors in application programs
● Errors in system software

Many commercial DBMS have an integrity subsystem which is responsible for:

● Monitoring transactions
● Updating the database
● Detecting integrity violations

In the event of an integrity violation, the system then takes appropriate action, which should involve rejecting the
operation, reporting the violation, and if necessary returning the database to a consistent state

INTEGRITY RULES

Domain integrity rules

● A domain defines the possible values of an attribute/field


● In a database, the domain integrity is defined by:
o datatype and length
o NULL value acceptance
o allowable values, through techniques like constraints or rules

o default value

Entity integrity rules

● Entity integrity rules relate to the correctness of relationships among attributes of the same relation and
to the preservation of key uniqueness
● Requirement of entity integrity rules: all entries are unique and non-null entries in primary key
● Purpose of entity integrity rules: guarantees that each entity will have a unique primary key

Referential integrity rules

● Referential integrity rules are concerned with maintaining the correctness and consistency of
relationships between relations
● Foreign key must have either a null entry or an entry that matches the primary key value in a table to
which it is related
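The foreign-key rule above (a matching primary key value or NULL, nothing else) can be demonstrated with sqlite3; note that SQLite only enforces foreign keys once the pragma is switched on:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
con.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
    )
""")
con.execute("INSERT INTO department VALUES (1, 'Sales')")
con.execute("INSERT INTO employee VALUES (10, 'Ana', 1)")     # matches a primary key: OK
con.execute("INSERT INTO employee VALUES (11, 'Ben', NULL)")  # NULL entry: also OK

try:
    # No department 99 exists, so this violates referential integrity
    con.execute("INSERT INTO employee VALUES (12, 'Cara', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```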

END-USERS

The end-user is most likely to be using a given set of pre-defined queries on a regular basis, so that raw data can be
protected at all times from accidental damage

The interaction of an end-user with a database is pre-defined by their role in the company. Based on their role they
will have a given ‘view’ of the database
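A role-based 'view' is typically a stored, pre-defined query that exposes only the columns a given end-user is allowed to see. A minimal sketch (data and names invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE staff (id INTEGER, name TEXT, salary REAL)")
con.executemany("INSERT INTO staff VALUES (?,?,?)",
                [(1, "Ana", 52000.0), (2, "Ben", 48000.0)])

# The view hides the salary column; an end-user querying the view
# cannot see or accidentally damage the raw salary data
con.execute("CREATE VIEW staff_directory AS SELECT id, name FROM staff")
print(con.execute("SELECT * FROM staff_directory").fetchall())
```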

DATABASE RECOVERY

Issues relating to the cost of implementing such systems are weighed against the importance of the data

Crash Recovery

● The durability and robustness of a DBMS depends on its complex architecture and its underlying hardware
and system software
● If it fails or crashes in the middle of transactions, it is expected that the system would follow some sort of
algorithms or techniques to recover lost data

We generalise a failure into various categories:

● Transaction failure - a transaction has to abort when it fails to execute or when it reaches a point from
where it can’t go any further. Causes:
o Logical errors - when a transaction cannot complete because it has some code error or any
internal error condition
o System errors - where the database system itself terminates an active transaction because the
DBMS is not able to execute it, or it has to stop because of some system conditions
● System crash - there are problems that are external to the system that may cause the system to stop
abruptly and cause the system to crash
o E.g. interruptions to power supply, hardware failure, software failure
● Disk failure - in the early days of computing, hard-disk drives and other storage drives failed
frequently. Disk failures include:
o Formation of bad sectors

o Inability to reach the disk
o Disk head crash
o Any other failure which destroys all or part of disk storage

RECOVERY

When a DBMS recovers from a crash, it should maintain the following:

● Check the states of all the transactions which were being executed
● A DBMS must ensure the atomicity of the transactions
● No transactions would be allowed to leave the DBMS in an inconsistent state

There are two types of techniques which can help a DBMS in recovering as well as maintaining the atomicity of a
transaction:

● Maintaining the logs of each transaction, and writing them onto some stable storage before actually
modifying the database
● Maintaining shadow paging where the changes are done on a volatile memory, and later, the actual
database is updated
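The first technique, logging before modifying, can be illustrated with a deliberately simplified write-ahead-log sketch (this is a toy model, not how a real DBMS is implemented). Because every change is logged before the database is touched, recovery can redo only the committed transactions and discard the rest, preserving atomicity:

```python
# Toy write-ahead-log sketch: log each change before applying it,
# then recover by replaying only the entries of committed transactions
log = []          # would live on stable storage in a real system
database = {}

def write(txn_id, key, value):
    log.append(("WRITE", txn_id, key, value))   # log first...
    database[key] = value                       # ...then modify the database

def commit(txn_id):
    log.append(("COMMIT", txn_id))

def recover(log):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    db = {}
    for rec in log:
        if rec[0] == "WRITE" and rec[1] in committed:  # redo committed writes only
            db[rec[2]] = rec[3]
    return db

write("T1", "x", 5); commit("T1")
write("T2", "y", 9)            # T2 never commits (simulated crash here)
print(recover(log))            # {'x': 5} -- T2's partial work is discarded
```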

RECOVERY WITH CONCURRENT TRANSACTIONS

When more than one transaction is being executed in parallel the logs are interleaved

● At the time of recovery, it would become hard for the recovery system to backtrack all logs, and then start
recovering

To ease this situation, most modern DBMS use the concept of ‘checkpoints’

CHECKPOINT

Keeping and maintaining logs in real time in a live environment may fill all the available storage space

● As time passes, the log file may grow too big to be handled at all

Checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently in a
storage disk

A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed

DATABASE INTEGRATION

Database integration refers to getting data from one database to another

The integration could be in ‘real time’ or happen on a periodic basis: nightly, weekly, bi-weekly, monthly, quarterly
or annually

Data integration involves combining data from various sources providing users with a unified view

For example:

● Stock control
● Police records
● Health records
● Employee data
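A periodic (e.g. nightly) integration job can be sketched as copying rows added to a source database since the last run into a target database. This example uses sqlite3 with invented table names; real integrations would also handle updates, deletions and conflicts:

```python
import sqlite3

# Two separate databases: a source system and a target (e.g. reporting) system
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute("CREATE TABLE stock (item_id INTEGER PRIMARY KEY, qty INTEGER)")

source.executemany("INSERT INTO stock VALUES (?,?)", [(1, 40), (2, 15)])

def sync(src, dst):
    # Copy only rows newer than what the target already holds
    last_id = dst.execute("SELECT COALESCE(MAX(item_id), 0) FROM stock").fetchone()[0]
    new_rows = src.execute("SELECT item_id, qty FROM stock WHERE item_id > ?",
                           (last_id,)).fetchall()
    dst.executemany("INSERT INTO stock VALUES (?,?)", new_rows)

sync(source, target)  # in practice run by a scheduler: nightly, weekly, etc.
print(target.execute("SELECT COUNT(*) FROM stock").fetchone()[0])  # 2
```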

DATA PROTECTION ACT

The Data Protection Act controls how your personal information is used by organisations, businesses or the
government

Everyone responsible for using data has to follow strict rules so that data is:

● Used fairly and lawfully
● Used for limited, specifically stated purposes only
● Used in a way that is adequate, relevant and not excessive
● Not kept for longer than is absolutely necessary
● Handled according to people's data protection rights
● Kept accurate, safe and secure

DATA MATCHING AND DATA MINING

Data is collected and kept on a large scale by various companies in the world

Data mining is the set of techniques used to look for patterns which would otherwise go undetected

Enables organisations to create successful and effective sales campaigns

● Targeted marketing plans

It also enables security agencies to analyse calling patterns and hence detect terrorist activities
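One classic data-mining technique is counting which items frequently occur together in transactions (association analysis). A toy sketch with invented shopping data:

```python
from collections import Counter
from itertools import combinations

# Toy transaction data (invented): each set is one customer's basket
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

# Count how often each pair of items is bought together
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is a pattern that might drive a targeted promotion
print(pair_counts.most_common(1))  # [(('bread', 'milk'), 3)]
```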
