Computer Science Study Notes SL - HL
TABLE OF CONTENTS
Networks – Topic 3
Computer networks
Intranet vs internet
Extranet
Open Systems Interconnection (OSI)
Data packets
Protocols
Hardware for Wireless
Algorithms – Topic 4
Computer language
Fundamental operations of computers
Database Anatomy
General
Join (only inner join in syllabus)
Redundant data (orphan data)
Database Normalisation
1st Normal Form
2nd Normal Form
3rd Normal Form
Referential Integrity
DBMS
Relational DataBase Management System (RDBMS)
RDBMS vs DBMS
Database Administrators
Query Functions
Update Functions
Data Validation
Data Verification
Schema
Levels of Schema
Data Definition Language
Data Modelling
Redundant Data
Referential Integrity
Integrity Rules
End-Users
Database Recovery
Recovery
Recovery with Concurrent Transactions
Checkpoint
Database Integration
Data Protection Act
Data Matching and Data Mining
CONTEXT FOR PLANNING A SYSTEM – 1.1.1
A new system is created to replace a system that is inefficient, no longer suitable for its original purpose, redundant or outdated, in order to:
● Increase productivity
● Improve quality of output
● Minimise cost
Some legacy systems are important in organisations because their data cannot be converted to newer formats, or their applications cannot be upgraded.
When two organisations merge, there are several options for combining their systems:
● Keep both systems, develop them to have the same functionality (high maintenance cost)
● Replace both systems with a new one (increased initial cost)
● Select best IT systems from each company and combine them (can be difficult for employees to work with
systems from another company)
● Choose one system and drop the other (policy problems)
Software incompatibility: when different software entities or systems cannot operate satisfactorily, cooperatively or independently, on the same computer or on different computers linked by a computer network.
Locally hosted 🡪 appropriate for larger, complex systems
Remotely hosted 🡪 appropriate when there is no necessary hardware equipment in place, or when admin wishes to
outsource responsibilities for maintenance, support, backups, security, etc.
SaaS (Software-as-a-Service): allows software and data to be hosted and managed centrally in a remote data centre; users pay to access the service on a subscription basis; the software resides in the cloud and needs only a web browser and broadband internet connection to be accessed.
Positives of SaaS
● Less expensive – low initial costs, few investments in installation, maintenance and upgrading, only have
to pay for subscription (cheaper in short-medium term)
● Supported by wide range of desktop, portable and mobile devices
● Fewer IT personnel required
Negatives of SaaS
● Dependent on a reliable broadband connection – unusable offline
● Data is hosted off-site, which raises security and privacy concerns
● Subscription fees can exceed a one-off licence cost in the long term
Changeover: the process of putting the new system online and retiring the old one
Parallel changeover
● Old and new systems run at the same time until the new one proves reliable; low risk, because the old system acts as a backup, but running both is costly
Direct changeover
● High risk since company plugs in new system and unplugs the old one at the same time
● Preferred when the system is not critical
● All users need to be trained before the switch takes place
Pilot
● New system is trialled with a small group of users or a single site before a full rollout
Phased
● Takes longer
● Installation is often done by department
Data migration: the transfer of data between different formats, storage types and computer systems
Problems include:
● Incompatibility
● Non-recognisable data structures
● Data may be lost or not transferred due to an incomplete data transfer or error
● Data can be misinterpreted, caused by different conventions of each country regarding date, time and
measurement units
TESTING
Functional testing tests individual commands, text input, menu functions, etc., and confirms that they perform and function correctly according to the design specifications
Data testing is when normal, abnormal and extreme data is put into the system
Alpha testing → before available to general public, internally tested in company in laboratory type environment
Beta testing → users outside the company are involved in testing; their comments and suggestions provide feedback that is used to fix defects and errors that were missed
Dry-run testing → is conducted using pen-and-paper by programmer, mentally runs algorithm and examines
source code to decide what the output should be
Integration testing → the whole system is tested at once to verify that all components work together
Debugging → the systematic process of finding and correcting bugs (errors) in a computer program
Validation is the process of evaluating whether data input follows appropriate specifications and is within reasonable limits
Verification is the process of ensuring that the data input is the same as the original source data, e.g. double entry
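A minimal Python sketch of these ideas (the grading function and its limits are invented for illustration): validation rejects unreasonable input, and data testing exercises the routine with normal, extreme and abnormal values.

def percentage_to_grade(score: int) -> str:
    """Map an exam percentage (0-100) to a pass/fail grade."""
    # Validation: reject data outside the appropriate specification.
    if not isinstance(score, int) or not 0 <= score <= 100:
        raise ValueError(f"invalid score: {score!r}")
    return "pass" if score >= 50 else "fail"

# Normal data: typical values the system should handle.
assert percentage_to_grade(72) == "pass"
# Extreme data: values on the boundaries of the acceptable range.
assert percentage_to_grade(0) == "fail" and percentage_to_grade(100) == "pass"
# Abnormal data: values that must be rejected, not processed.
for bad in (-5, 101, "seventy"):
    try:
        percentage_to_grade(bad)
    except ValueError:
        pass  # expected outcome for abnormal input
    else:
        raise AssertionError(f"abnormal input {bad!r} was accepted")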
USER DOCUMENTATION
User documentation is the document that explains the working of the system to the user
Types:
● Manual → always available, helps when installing system, can be lost, may not be updated when system is
updated, no search facility
● Email support
● Embedded assistance or integrated user assistance 🡪 accessible at any time when using program, can
help solve some major errors, only used after system is installed, no help when installing system, not very
specific, lacks search capability
● FAQ
● Live chat session
● Online portals → extensive compared to help files, continuously revised by system developer, often have
live support, search capabilities, useless if user loses internet connection, live support ineffective with
users unfamiliar with computers
● Remote desktop connection
Training methods:
● Self-instruction
● Formal classes
● Remote/online/personal training
DATA LOSS
Measures to prevent or mitigate data loss:
● Regular backups
● Copy sensitive information on to another device
● Store in another building (remote)
● Making hard copies
● Printing/photocopying
● Antivirus protection
● Redundancy (backup copies of a file)
● Incremental backups (files you changed) or autosave
● Compression
● Firewalls
CPU COMPONENTS
CPU = brain of the computer; controls peripherals and processes inside the computer and carries out calculations on data.
Control Unit 🡪 controls operation of CPU, responsible for retrieval and execution of instructions, controls input and
output devices.
Registers – small very fast circuits that store intermediate values from calculations or instructions inside CPU.
MEMORY
Primary – holds the programs and data currently in use; fast access, volatile (contents lost when power is off)
Secondary – retains data after the computer is turned off; large capacity, slower access
Cache memory is random access memory that a computer microprocessor can access more quickly than regular
primary memory (RAM).
OPERATING SYSTEMS
Peripheral communication
● Input and output systems
● Peripherals controlled by device drivers = software which allows HW to be used by OS (translator)
Memory management
Networking
Security
SOFTWARE APPLICATIONS
NETWORKS – TOPIC 3
COMPUTER NETWORKS
A computer network is a collection of computers and devices connected together via communication devices and
transmission media.
Server: computer system or a software application that provides a service to the other computer systems
connected to the same network.
Client: computer system or a software application that requests a service from a server connected to the same network.
Transmission media
● WIFI
o Small range, inexpensive, decent reliability, poor security
● Metal conductor
o Works up to 100 m, expensive, highly reliable, good security
● Fibre optic
o Faster (light signals), bandwidth is greater, more expensive, security is very good, large range
INTRANET VS INTERNET
Intranet = private space, may be accessible from the internet but protected by a password and accessible only to
employees or other authorised users
EXTRANET
Uses the internet to allow controlled access by specific users to a specific WAN or LAN.
OPEN SYSTEMS INTERCONNECTION (OSI)
Standard layers with specific responsibilities 🡪 make it possible to implement communication with interchangeable modules
Layers of OSI: Physical, Data Link, Network, Transport, Session, Presentation, Application
DATA PACKETS
Data packet 🡪 data is broken up into segments and each segment is put into a packet
● A packet contains the data and the information that identifies the type of data, where it comes from and
where it is going
● Used in Internet Protocol (IP) transmissions for data that is transmitted
Packet switching is the transportation process in which data is broken into suitable pieces or blocks for fast and efficient transfer via different network devices/paths
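A minimal Python sketch of packetisation (the four-field header here is illustrative, not a real protocol's format): each packet carries the data plus information identifying where it comes from and where it is going.

from dataclasses import dataclass

@dataclass
class Packet:
    source: str        # where the packet comes from
    destination: str   # where it is going
    sequence: int      # position, so the receiver can reassemble in order
    payload: bytes     # the segment of the original data

def packetise(data: bytes, src: str, dst: str, size: int = 4) -> list[Packet]:
    """Break data into fixed-size segments, each wrapped in a packet."""
    return [Packet(src, dst, i, data[i * size:(i + 1) * size])
            for i in range((len(data) + size - 1) // size)]

packets = packetise(b"hello, network", "10.0.0.1", "10.0.0.2")
# Packets may travel via different paths and arrive out of order; the
# sequence numbers let the receiver reassemble the original data.
reassembled = b"".join(p.payload for p in sorted(packets, key=lambda p: p.sequence))
assert reassembled == b"hello, network"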
PROTOCOLS
Protocols are a set of rules that control how communication happens in networked computers.
Handshaking 🡪 when two devices start communicating, they must agree on the transmission speed that will be
used.
Functions of protocols:
● Data integrity
● Flow control
● Congestion management
● Deadlock prevention
● Error checking
● Speed of transmission
● Data compression
HARDWARE FOR WIRELESS
Wireless router – a router sends information between your network and the internet and controls the transfer of data in a network
Network Interface Card (NIC) – interface cards determine the infrastructure of a local area network (LAN) and allow all of the computers to connect to the network
MAC ADDRESS
Each (wireless network) adapter has a unique label called a MAC address.
Routers use these addresses to identify/authenticate computers (routers include an option to whitelist or blacklist certain devices based on MAC addresses, so access can be restricted to any device not in the whitelist).
One disadvantage is that the whitelist must be amended any time a new device is purchased, or when access should be granted to guests.
This method is also useless against hackers who use programs which intercept data passing through the network and report the MAC address of any device communicating on the network.
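As a minimal Python sketch of the whitelisting idea (the addresses below are made up):

# Sketch of MAC-address whitelisting as described above.
WHITELIST = {"3C:22:FB:12:34:56", "A4:83:E7:AB:CD:EF"}

def admit(mac: str) -> bool:
    """Allow a device onto the network only if its MAC is whitelisted."""
    return mac.upper() in WHITELIST

assert admit("3c:22:fb:12:34:56")        # known device is admitted
assert not admit("DE:AD:BE:EF:00:01")    # unknown device is refused
# Limitation noted above: a new device works only once the whitelist is
# amended, and a spoofed MAC copied off the air defeats the check entirely.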
ALGORITHMS – TOPIC 4
COMPUTER LANGUAGE
Compiler 🡪 translates the entire program in one go, searching for syntax errors; the output is stored in binary, very fast to execute, secure
Interpreter 🡪 translates line by line, machine independent, suited to teaching programming, least efficient to execute
FUNDAMENTAL OPERATIONS OF COMPUTERS
Add: accumulator + 1
Compare: A > B
Retrieve: Get A
ABSTRACTION
Abstraction allows us to create a general idea of what the problem is and how to solve it;
Abstraction removes all specific detail, and any patterns that will not help in solving a problem. This helps in
forming a “model” (If designers don’t abstract they may end up with the wrong solution to the problem they are
trying to solve);
Abstraction is widely used because there exist a number of “patterns” in programming that keep repeating in every application/program;
The pattern corresponding to an issue can be found, then the abstract solution to it can be found and implemented, and the problem is solved;
Most programming languages provide some built-in abstract patterns, which are easy to use (some APIs provide more advanced patterns);
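A short Python sketch of the idea (the Shape hierarchy is an invented example): the abstract class captures the general pattern and hides the specific detail of each concrete kind.

from abc import ABC, abstractmethod
import math

class Shape(ABC):                       # the abstract "model" of the problem
    @abstractmethod
    def area(self) -> float: ...

class Circle(Shape):                    # specific detail lives in subclasses
    def __init__(self, r: float): self.r = r
    def area(self) -> float: return math.pi * self.r ** 2

class Square(Shape):
    def __init__(self, side: float): self.side = side
    def area(self) -> float: return self.side ** 2

# Code written against the abstraction works for every concrete shape,
# so the same pattern need not be re-solved in every application.
total = sum(s.area() for s in [Circle(1.0), Square(2.0)])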
DATA VS INFORMATION
When data is processed, organized, structured or presented in a given context so as to make it useful, it is called
information
INFORMATION SYSTEMS
● A computer-based system that supports databases required by different levels of users (stakeholders) and
management is called a management information system
● As the name implies, it is primarily a tool for the management of any company
DURABILITY
The durability property guarantees that, once a transaction completes successfully, all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution
Ensuring durability is the responsibility of a component of the database system called the recovery-management
component
STATES OF TRANSACTION
Active: the starting state, the transaction stays in this state whilst the code for this state is being run
Partially committed: after the final statement of the transaction has been executed, but before the changes are made permanent
Failed: when the normal running of the code has failed to take place
Aborted: due to a failure in the process the database is not updated, but has been rolled back to its original state prior to the start of the transaction
Committed: after successful running of the full code and update of the correct records
DATABASE TRANSACTION
A transaction in the context of a database is a sequence of operations performed as a single logical unit of work
ACID
Atomicity
● A transaction is said to be atomic if it either executes all its actions in one go or executes no actions at all
● Either all or none of the steps are performed
Consistency
● A transaction must take the database from one consistent state to another consistent state
Isolation
● Concurrently executing transactions must not interfere with each other; each behaves as if it were running alone
Durability
● Once a transaction commits, the system must guarantee that the result of its operations will never be lost,
in spite of subsequent failures
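A minimal sketch of atomicity using Python's built-in sqlite3 module (the account table and values are invented): either both halves of a transfer commit, or neither does.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
con.commit()

try:
    with con:  # opens a transaction: commits on success, rolls back on error
        con.execute("UPDATE account SET balance = balance - 70 WHERE id = 1")
        raise RuntimeError("simulated crash mid-transfer")
        # the matching credit below is never reached
        con.execute("UPDATE account SET balance = balance + 70 WHERE id = 2")
except RuntimeError:
    pass

# Atomicity: the half-finished transfer was rolled back entirely.
assert con.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0] == 100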
WHAT IS A DATABASE
An organised collection of data generally stored and accessed electronically from a computer system
DATABASE ANATOMY
GENERAL
Primary key is a field which uniquely identifies the records of a given table
Secondary key is a field in a database which can be used for searching; such a field is called ‘indexed’
Foreign key is a primary key which has been imported into another table
Super key or composite key is one or more fields which collectively identify a record in a table
An entity is some unit of data that can be classified and has stated relationships to other data units.
In databases a join refers to the linking of two or more tables using primary keys, which may or may not be composite in nature
An inner join returns only those records where the key value in one table matches the corresponding key value in the other table
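A minimal sketch of an inner join using Python's sqlite3 (the tables and rows are invented): only records whose key matches in both tables are returned.

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE loan (loan_id INTEGER PRIMARY KEY, student_id INTEGER, book TEXT);
    INSERT INTO student VALUES (1, 'Ada'), (2, 'Alan');
    INSERT INTO loan VALUES (10, 1, 'SQL Basics'), (11, 3, 'Orphan Row');
""")
rows = con.execute("""
    SELECT student.name, loan.book
    FROM student
    INNER JOIN loan ON loan.student_id = student.student_id
""").fetchall()
# Only Ada's loan matches; loan 11 references a non-existent student
# (orphan data, see below) and is excluded by the inner join.
assert rows == [('Ada', 'SQL Basics')]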
Rogue fields of data in a table which have no source or origin are referred to as orphan or redundant data
DATABASE NORMALISATION
● Usually involves dividing a database into two or more tables and defining relationships between the
tables
1st Normal Form: every field holds a single, atomic value and there are no repeating groups
2nd Normal Form: in 1NF, and every non-key field depends on the whole of the primary key (no partial dependencies)
3rd Normal Form: in 2NF, and no non-key field depends on another non-key field (no transitive dependencies)
Advantages:
● Reduces redundant data, so each fact is stored only once
● Avoids insertion, deletion and update anomalies
● Keeps data consistent and saves storage space
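A minimal Python sketch of the dividing-into-tables idea (the course/teacher data is invented): repeated details are moved out into their own table and linked back by a key.

unnormalised = [
    # (course, teacher, teacher_email) – the teacher details repeat per course
    ("Maths HL", "Mr Green", "green@school.org"),
    ("Maths SL", "Mr Green", "green@school.org"),
    ("CS HL",    "Ms Blue",  "blue@school.org"),
]

teachers = {}   # teacher table: one row per teacher, keyed by name
courses = []    # course table: references a teacher by its key
for course, teacher, email in unnormalised:
    teachers.setdefault(teacher, {"email": email})
    courses.append({"course": course, "teacher": teacher})

# Changing an email is now one update, not one per course (no update anomaly).
teachers["Mr Green"]["email"] = "mr.green@school.org"
assert all(c["teacher"] in teachers for c in courses)   # links stay intact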
REFERENTIAL INTEGRITY
Referential integrity refers to the accuracy and consistency of data within a relationship. In relationships, data is
linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a
primary key value (in the primary – or parent – table)
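A minimal sketch with Python's sqlite3 (the schema is invented); note that SQLite only enforces foreign keys once the PRAGMA is switched on.

import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales');
""")
con.execute("INSERT INTO employee VALUES (100, 1)")      # matches a parent row: OK
try:
    con.execute("INSERT INTO employee VALUES (101, 9)")  # no department 9
except sqlite3.IntegrityError:
    print("rejected: foreign key must reference an existing primary key")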
DBMS
A database management system is a set of programs that allows users to read, store, change and extract data in a database
● Allows user to manipulate data and create databases as per their requirements
● Provides data security and data protection
● Examples - MySQL, MS Access, Oracle
● Management functions and tools 🡪 focus on creation, manipulation and interrogation of a database
● Uses files rather than linked tables to hold data
● Persistent storage management
● Transaction management
● Resiliency: recovery from crashes
● Separation between logical and physical views of the data
● High level query and data manipulation language
● Efficient query processing
● Interface with programming languages
● A lock is a mechanism that tells the DBMS whether a particular data item is being used by any transaction for read/write purposes 🡪 data can’t be edited by multiple people at the same time
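A toy sketch of the locking idea in Python, with threading.Lock standing in for a DBMS lock on a single data item (the "record" here is just a dict): two writers cannot edit the same item at the same time.

import threading

record = {"balance": 100}
lock = threading.Lock()          # the DBMS would hold one lock per data item

def withdraw(amount: int) -> None:
    with lock:                   # acquire before read/write, release after
        current = record["balance"]
        record["balance"] = current - amount   # no other writer can interleave

threads = [threading.Thread(target=withdraw, args=(10,)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
assert record["balance"] == 0    # all ten withdrawals applied exactly once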
RELATIONAL DATABASE MANAGEMENT SYSTEM (RDBMS)
An RDBMS stores data in tables and maintains the relationships between the data
Advantages of RDBMS
● Data is stored only once, and hence multiple record changes are not required
● Deletion and modification of data becomes simpler and storage efficiency is very high
● Complex queries can be carried out using the Structured Query Language (SQL) - ‘insert’, ‘update’, ‘delete’, ‘create’, ‘drop’
● Better security is offered by the creation of tables and customised level of data protection - users can set
access barriers to limit access to the available content
● Provision for future requirements as new data can easily be added and appended to existing tables and
made consistent with previously available content
Disadvantages of RDBMS
● Cost of execution - special software, setting up of data
● Security of data can be threatened during data migration
● Complex images, numbers and designs are not easy to categorise into tables - presents a problem
● Some fields have a character limit
● Isolated databases can be created - such large volumes of data are not easy to connect
RDBMS VS DBMS
DBMS stores data as files whereas RDBMS stores data in a tabular arrangement
RDBMS maintains relationships between the data stored in its tables; a normal DBMS simply stores its data in files with no links between them
DATABASE ADMINISTRATORS
Tasks/roles
QUERY FUNCTIONS
Query functions - searching the stored data for required results
UPDATE FUNCTIONS
Update functions - changing the data held in the database, e.g. inserting, modifying and deleting records
DATA VALIDATION
Validation 🡪 trying to reject bad data at the entry level rather than once it has entered the system
Data validation is the process of comparing data with a set of rules to find out whether it is reasonable: it tries to ensure that data is sensible, complete and within acceptable boundaries
DATA VERIFICATION
Data verification is the process of checking that the data entered exactly matches the original source to find out if
data is accurate
Methods include double entry (typing the data in twice and comparing the two versions) and proofreading (visually checking the entry against the original source)
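A minimal Python sketch of both processes (the field rules are invented): validation compares a record against rules, verification compares a re-typed entry against the first.

import re

def validate(record: dict) -> list[str]:
    """Compare a record against a set of rules; return the rules it breaks."""
    errors = []
    if not record.get("name"):                       # presence check
        errors.append("name is required")
    if not 0 <= record.get("age", -1) <= 120:        # range check
        errors.append("age out of range")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+", record.get("email", "")):
        errors.append("email format invalid")        # format check
    return errors

def verify(first_entry: str, second_entry: str) -> bool:
    """Double entry: the data is typed twice and must match exactly."""
    return first_entry == second_entry

assert validate({"name": "Ada", "age": 36, "email": "ada@example.org"}) == []
assert not verify("ada@example.org", "ada@exmaple.org")  # typo caught on re-entry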
SCHEMA
The term ‘schema’ refers to the organisation of data as a blueprint of how the database is constructed
LEVELS OF SCHEMA
Conceptual
● Structure of the whole database for the community of users. Hides information about the physical storage
structures and focuses on describing data types, entities, relationships, etc.
Logical
● Logical constraints that need to be applied on the data stored. It defines tables, views and integrity
constraints
Physical
● Pertains to the actual storage of data and its form of storage, like files, indices, etc. It defines how the data will be stored in secondary storage.
Mappings among schema levels are needed to transform requests and data
DATA DEFINITION LANGUAGE
We can implement our data model using a powerful database language such as SQL, which makes it possible to create or alter tables directly
To do this we use Data Definition Language (DDL) statements, which define the database structure or schema
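A minimal sketch of DDL statements, run here through Python's sqlite3 (the member table is invented):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- CREATE defines the structure (schema) of a new table
    CREATE TABLE member (
        member_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        joined    DATE
    );
    -- ALTER changes an existing table's structure
    ALTER TABLE member ADD COLUMN email TEXT;
    -- DROP removes the table definition (and its data) entirely
    DROP TABLE member;
""")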
DATA MODELLING
A data model can be defined as an integrated collection of concepts for describing and manipulating data, relationships between data, and constraints on the data in an organisation
● Structural part → consisting of a set of rules according to which databases can be constructed
● Manipulative part → defining the types of operation that are allowed on the data
● Integrity rules → which ensure that the data is accurate
The purpose of a data model is to represent data and to make the data understandable
Physical data models → describe how data is stored in the computer, representing information such as record structures, record ordering, indexes, B-trees, hashing and access paths
Record based data models → used to describe data at the conceptual level
● Used to specify the overall logical structure of the database and to provide a higher-level description of
the implementation
● Each record type defines a fixed number of fields, or attributes, and each field is usually of a fixed length
● Three record-based data models
o Hierarchical model
o Network model
o Relational model → the most common in modern systems
Object based data models → used to describe data at the conceptual level
● Entity Relationship (E-R) model has emerged as one of the main techniques for modelling database design
and forms the basis for the database design methodology
● Object oriented data model extends the definition of an entity to include, not only the attributes that
describe the state of the object but also the actions that are associated with the object, that is, its
behaviour. The object is said to encapsulate both state and behaviour
● Semantic data models represent the equivalent of a record in a relational system or an object in an OO system, but they do not include behaviour. They are abstractions ‘used to represent real world or conceptual objects’
REDUNDANT DATA
Redundancy means having multiple copies of the same data in the database
● Insertion anomaly
o This problem occurs when the insertion of a data record is not possible without adding some
additional unrelated data to the record
● Deletion anomaly
o Occurs when deletion of a data record results in losing some unrelated information that was
stored as part of the record that was deleted
● Update anomaly
o If an update is not made in every place where the data is duplicated, the database becomes inconsistent
REFERENTIAL INTEGRITY
Ensures that the link between two tables is linking the correct fields and not drawing any incorrect data from
another table
The preservation of the integrity of a database system is concerned with the maintenance of the correctness and
consistency of the data
Integrity violations may arise from many different sources, and all result in data corruption. The system maintains integrity by:
● Monitoring transactions
● Controlling updates to the database
● Detecting integrity violations
In the event of an integrity violation, the system then takes appropriate action, which should involve rejecting the
operation, reporting the violation, and if necessary returning the database to a consistent state
INTEGRITY RULES
● Domain integrity rules restrict the values an attribute may take, e.g. to a particular data type, a range, or a default value
● Entity integrity rules relate to the correctness of relationships among attributes of the same relation and
to the preservation of key uniqueness
● Requirement of entity integrity rules: all entries in the primary key are unique and non-null
● Purpose of entity integrity rules: guarantees that each entity will have a unique primary key
● Referential integrity rules are concerned with maintaining the correctness and consistency of
relationships between relations
● A foreign key must have either a null entry or an entry that matches the primary key value in the table to which it is related
END-USERS
The end-user is most likely to be using a given set of pre-defined queries on a regular basis, so that raw data can be
protected at all times from accidental damage
The interaction of an end-user with a database is pre-defined by their role in the company. Based on their role they
will have a given ‘view’ of the database
DATABASE RECOVERY
Issues relating to the cost of implementing such systems are weighed against the importance of the data
Crash Recovery
● The durability and robustness of a DBMS depends on its complex architecture and its underlying hardware
and system software
● If it fails or crashes in the middle of transactions, it is expected that the system would follow some sort of
algorithms or techniques to recover lost data
● Transaction failure - a transaction has to abort when it fails to execute or when it reaches a point from
where it can’t go any further. Causes:
o Logical errors - when a transaction cannot complete because it has some code error or any
internal error condition
o System errors - where the database system itself terminates an active transaction because the
DBMS is not able to execute it, or it has to stop because of some system conditions
● System crash - problems external to the system may cause it to stop abruptly and crash
o E.g. interruptions to power supply, hardware failure, software failure
● Disk failure - in the early days of tech evolution it was a common problem where hard-disk drives or
storage drives used to fail frequently. Disk failures include:
o Formation of bad sectors
o Unreachability of the disk
o Disk head crash
o Any other failure which destroys all or part of disk storage
RECOVERY
● Check the states of all the transactions which were being executed
● A DBMS must ensure the atomicity of the transactions
● No transactions would be allowed to leave the DBMS in an inconsistent state
There are two types of techniques which can help a DBMS in recovering as well as maintaining the atomicity of a
transaction:
● Maintaining the logs of each transaction, and writing them onto some stable storage before actually
modifying the database
● Maintaining shadow paging where the changes are done on a volatile memory, and later, the actual
database is updated
RECOVERY WITH CONCURRENT TRANSACTIONS
When more than one transaction is being executed in parallel, the logs are interleaved
● At the time of recovery, it would become hard for the recovery system to backtrack all the logs and then start recovering
To ease this situation, most modern DBMS use the concept of ‘checkpoints’
CHECKPOINT
Keeping and maintaining logs in real time and in a real environment may fill up all the available storage space
● As time passes, the log file may grow too big to be handled at all
Checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently in a
storage disk
Checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed
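A toy Python sketch of log-based recovery with a checkpoint (the log format is invented): changes are logged before the "database" dict is modified, and a checkpoint marks the point from which recovery must replay.

log: list[tuple] = []             # write-ahead log of changes since the checkpoint
snapshot: dict[str, int] = {}     # database state saved at the last checkpoint
db: dict[str, int] = {}           # the live "database"

def write(key: str, value: int) -> None:
    log.append(("SET", key, value))   # log first (write-ahead) ...
    db[key] = value                   # ... then modify the database

def checkpoint() -> None:
    global snapshot, log
    snapshot = dict(db)   # all logged changes are now safely stored
    log = []              # earlier log records can be archived/discarded

def recover() -> dict[str, int]:
    """After a crash, start from the checkpoint and replay only the recent log."""
    state = dict(snapshot)
    for _op, key, value in log:
        state[key] = value
    return state

write("x", 1)
checkpoint()
write("y", 2)                     # suppose the system crashes after this write
assert recover() == {"x": 1, "y": 2}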
DATABASE INTEGRATION
Data integration involves combining data from various sources, providing users with a unified view
The integration could be in ‘real time’ or happen on a periodic basis: nightly, weekly, bi-weekly, monthly, quarterly or annually
For example:
● Stock control
● Police records
● Health records
● Employee data
DATA PROTECTION ACT
The Data Protection Act controls how your personal information is used by organisations, businesses or the government
Everyone responsible for using data has to follow strict rules so that data is:
● Used fairly and lawfully
● Used for limited, specifically stated purposes only
● Used in a way that is adequate, relevant and not excessive
● Kept for no longer than is absolutely necessary
● Handled according to people's data protection rights
● Kept accurate, safe and secure
DATA MATCHING AND DATA MINING
Data is collected and kept on a large scale by various companies around the world
Data mining is the set of techniques used to look for patterns which would otherwise go undetected, e.g. security agencies analysing calling patterns to detect terrorist activities