The Incubator Hub - SQL for Data Analysis (2)
The Incubator Hub - SQL for Data Analysis (2)
Femi Ayodele
Data Analyst || BI Developer || Analytics Engineer
1st
Week of SQL for Data Analysis
➢ Introduction to SQL
➢ What is the Databases
➢ What are the advantage of Databases
➢ How to Database Store Data
➢ Why SQL (Why Data Analyst use SQL)
➢ Types of Databases (Microsoft Access, SQL Server, MySQL, PostgreSQL,
Oracle, etc)
SQL Commands
➢ Introduction to SQL Command
➢ Types of SQL Commands : DDL, DML, DCL, DQL, and TCL
Writing SQL Queries
➢ Introduction to SQL Queries ( Create first Database)
➢ SQL - Data Types
➢ SQL - Create table
➢ SQL - Keys: Unique keys, Primary Keys, Foreign Keys.
SQL for DataAnalysis
➢ SQL - Insert into Table, select Table Course breakdown
➢ SQL - Drop, Delete, Truncate, Rename
➢ SQL – Alter tables, Update tables, Drop tables, Delete tables, Truncate
tables
TREY
research
2
2nd Week of SQL for Data Analysis
Structure Queries Language (SLQ) Cont’. (SQL Operators and Clauses)
SQL Aggregation Part 1
➢ SQL – Clause: where, order by, Group by, Having, Top.
➢ SQL – Operators: Like, AND, OR, Between, Not Between, IN, NOT IN.
➢ SQL - UNION Vs UNION ALL, INTERSECT.
➢ SQL- Wild Card
SQL Joins
➢ Introduction to Join
➢ Type of Join 1: Inner Join
➢ Type of Join 2: Outer Join
➢ Type of Join 3: Right Join
➢ Type of Join 4: Left Join
➢ Type of Join 5: Full join
SQL for DataAnalysis
Course breakdown
TREY
research
3
3rd Week of SQL for Data Analysis
SQL Aggregation Part 2
➢ Introduction to SQL Aggregation
➢ Introduction to Null
➢ Aggregation: COUNT, SUM, AVERAGE, MIN, MAX etc
SQL Views
➢ Introduction to SQL Views
➢ SQL – Create views
➢ SQL - update views
➢ SQL – Drops
➢ SQL – Rename views
TREY
research
4
Introduction to SQL
• SQL stands for Structured Query Language.
• It is used for storing and managing data in Relational Database
Management System (RDBMS).
• It is a standard language for Relational Database System. It enables
a user to relate databases and tables.
• All the RDBMS like MySQL, PostgreSQL, Oracle, MS Access, and
SQL Server use SQL as their standard database language.
• SQL allows users to query the database in a number of ways, using
English-like statements
A database is an organized collection of data that is stored and managed in a structured way to allow for easy access,
retrieval, and manipulation.
Advantage of Databases:
➢ Data Integrity: Ensures accuracy and consistency of data.
➢ Security: Protects data from unauthorized access and breaches.
➢ Backup and Recovery: Allows data to be recovered in case of loss or corruption.
➢ Concurrency: Supports multiple users accessing the database simultaneously.
➢ Scalability: Ability to handle increasing amounts of data and users without performance degradation.
➢ Efficient Data Management: Databases are optimized for quick access to large volumes of data, enabling fast retrieval
of information through queries.
TREY
research
6
How Databases Store Data
Databases store data in a structured format using tables, which are composed of rows and columns. Each table
represents a specific type of data, and each row (or record) in the table represents a single entry, such as a customer or
transaction. The columns (or fields) define the attributes of the data, such as a customer’s name, age, or account
number.
✓ If you have used Excel, you should already be familiar with tables
✓ Tables have rows and columns just like Excel.
✓ Database tables for instance are organized by column
✓ Each column must have a unique name,
✓ You will notice, that some columns contain numbers, while other contain texts. In a spreadsheet, each cell can have
its own data types.
✓ But in databases tables, all the data in a column must be of the same type.
✓ This makes performing analysis on database tables pretty simples, while the data type must be consistent.
TREY
research
7
How Databases Store Data
A few Key points about stored in SQL Databases:
1. Data in Databases is stored in tables that can be thought of just like Excel spreadsheets.
2. All the data in the same column must match in terms of data type.
3. Consistent columns types are one of the main reasons working with databases is fast.: Often databases hold a LOT
of data. So, knowing that the columns are all of the same type of data means that obtaining data from a database
can still be fast
Why SQL
There are some major advantages to using traditional relational databases, which we interact with using SQL.
The five most apparent are:
TREY
research
8
Basic SQL Command
SQL Commands
✓ SQL commands are instructions. It is used to communicate with the database.
✓ It is also used to perform specific tasks, functions, and queries of data.
✓ SQL can perform various tasks like create a table, add data to tables, drop the table, modify the table, set permission
for users.
TREY
research
9
TREY
research
10
Data Definition Language (DDL)
DDL changes the structure of the table like creating a table, deleting a table, altering a table, etc.
All the command of DDL are auto-committed that means it permanently save all the changes in the database
• Create: It is used to create a new table in • Here are some commands that
the database. come under DDL:
• ➢ CREATE
• Drop: It is used to delete both the
structure and record stored in the table. • ➢ ALTER
• ➢ DROP
• Alter: It is used to alter the structure of • ➢ TRUNCATE
the database. This change could be
either to modify the characteristics of an
existing attribute or probably to add a
new attribute.
Insert: The INSERT statement is a SQL query. It is used to insert data into the row of a
table.
Update: This command is used to update or modify the value of a column in the table.
TREY
research
12
Data Control Language (DCL)
DCL commands are used to GRANT and TAKE BACK authority from any database user.
Here are some commands that come under DCL:
➢Grant
➢Revoke
Grant: It is used to give user access privileges to a database.
TREY
research
13
Transaction Control Language (TCL)
TCL commands are used to manage transactions in the database. These are used to manage the changes made DML
Statement (INSERT, DELETE and UPDATE only). It also allows statements to be grouped into logical transactions
Here are some commands that come under TCL:
➢ COMMIT
➢ ROLLBACK
➢ SAVEPOINT
COMMIT .
Commit command is used to permanently save any transaction.
Commit command is used to save all the transactions to the database.
Example:
DELETE FROM CUSTOMERS WHERE AGE = 25
COMMIT;
Rollback:
This command restores the database to last committed state
Rollback command is used to undo transactions that have not already been saved to the
database.
Example: DELETE FROM CUSTOMERS WHERE AGE = 25;
ROLLBACK;
Savepoint: It is used to roll the transaction back to a certain point without rolling back the entire transaction.
Savepoint command is used to temporarily save a transaction so that you can rollback to that point whenever necessary.
TREY
research
14
SQL - Data Types
✓ SQL Datatype is used to define the values that a column can contain.
✓ Every column is required to have a name and data type in the database table.
SQL Datatype
Numeric Datatype
String Datatype Date Datatype
Binary Datatype E.g:
Bit, tinyint, smallint, int, bigint, decimal,
E.g: E.g:
E.g: Char, Varchar, nchar, Varbinary,
Yes/No, True/False money, float, real etc Date, Datetime, Timestamp.
nvarchar
✓ String Datatype allow us to store a group of characters, enclosed in single quotes, in a record of a table column. These characters can
be; numerals, letters, symbols etc.
✓ Numeric Datatype: are one of the most widely used data type in SQL. They are used to store numeric values only
✓ Date Datatype are used in SQL for values to be stored in date.
TREY
research
15
SQL - Keys
In SQL, Keys are special fields in a table that help:
✓ Create relationships between tables,
✓ Maintain uniqueness
✓ Ensure data is consistent and valid.
Database Keys
Composite Key
Alternate Key
Primary Key
TREY
research
16
Primary Key:
A special type of key that uniquely identifies each record in a table. Each table can have only one primary key.
Example: Employee_id in the Employee table.
Foreign Key:
A field in one table that uniquely identifies a row of another table, creating a relationship between the two tables.
Example: Employee_id in the Salary table is a foreign key (FK) that references the Employee_id in the Employee table
(PK)
Surrogate Key:
A surrogate key is a unique identifier for each record in a table, typically created by the database itself (e.g an auto-
incrementing integer)
TREY
research
17
Candidate Key:
Candidate key is a key of a table which can be selected as primary key. A table can have multiple candidate keys, out of
which one can be selected as a primary key.
Example: Employee_id, License_Number, and Passport_Number
Alternate key:
Alternate key is a candidate key, currently not selected as a primary key of the table
Example: License_Number and Passport_Number
TREY
research
18
TREY
research
19
Practical Exercise
TREY
research
20
SQL Operators
An SQL operator is a reserved words or a character used primarily in an SQL statement's WHERE clause to perform
operation(s), such as comparisons and arithmetic operations. These Operators are used to specify conditions in an SQL
statement and to serve as conjunctions for multiple conditions in a statement
SQL Operator
SQL clauses are essential components of SQL (Structured Query Language) that define how queries interact with the
database. They are used to specify conditions, modify data, and control how results are returned. Here’s an overview of
some of the most common SQL clauses
SQL Clause
TREY
research
22
Group BY Clause
SQL GROUP BY statement is used to arrange identical data into groups.
• The GROUP BY statement is used with the SQL SELECT statement.
• The GROUP BY statement follows the WHERE clause in a SELECT statement and precedes the ORDER BY clause.
• The GROUP BY statement is used with aggregation function
Having Clause
• HAVING clause is used to specify a search condition for a group or an aggregate.
• Having is used in a GROUP BY clause. If you are not using GROUP BY clause then you can use HAVING function like a
WHERE clause
Order BY
• The ORDER BY clause sorts the result-set in ascending or descending order.
• It sorts the records in ascending order by default. DESC keyword is used to sort the records in descending order.
TREY
research
23
SQL Aggregate Functions
SQL aggregate functions are powerful tools used to perform calculations on a set of values, returning a single value that
summarizes the data. They are commonly used in conjunction with the GROUP BY clause to group the data by one or more
columns before applying the aggregate function
SQL Aggregation
Function
TREY
research
24
1. COUNT()Purpose:
Counts the number of rows in a result set or the number of non-null values in a column.
Example: SELECT COUNT(*) FROM employees;
SELECT COUNT(department) FROM employees;
Use: Useful for finding the total number of records or counting non-null values in a column.
2. SUM()Purpose:
Calculates the total sum of a numeric column.
Example: SELECT SUM(salary) FROM employees;
Use: Adds up all the values in a numeric column.
3. AVG()Purpose:
Calculates the average value of a numeric column.
Example: SELECT AVG(salary) FROM employees;
Use: Returns the mean of the values in a numeric column.
4. MIN()Purpose:
Returns the smallest (minimum) value in a column.
Example: SELECT MIN(salary) FROM employees;
Use: Finds the lowest value in a column.
5. MAX()Purpose:
Returns the largest (maximum) value in a column.
Example: SELECT MAX(salary) FROM employees;
Use: Finds the highest value in a column.
TREY
research
25
SQL Joins
SQL JOIN means "to combine two or more tables". In SQL, JOIN clause is used to combine the records from two or more
tables in a database.
SQL Joins
TREY
research
26
INNER JOIN
In SQL, INNER JOIN selects records that have matching values in both tables as long as the condition is satisfied. It returns the combination of all rows from
both the tables where the condition satisfies.
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE
INNER JOIN
PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
LEFT JOIN
The SQL left join returns all the values from left table and the matching values from the right table. If there is no matching join value, it
will return NULL.
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE
LEFT JOIN
PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
RIGHT JOIN
In SQL, RIGHT JOIN returns all the values from the values from the rows of right table and the matched values from the left table. If
there is no matching in both tables, it will return NULL.
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE
RIGHT JOIN
PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
FULL JOIN
In SQL, FULL JOIN is the result of a combination of both left and right outerjoin. Join tables have all the records from both tables. It puts
NULL on the place of matches not found.
SELECT EMPLOYEE.EMP_NAME, PROJECT.DEPARTMENT FROM EMPLOYEE
FULL JOIN
PROJECT ON PROJECT.EMP_ID = EMPLOYEE.EMP_ID
TREY
research
27
SQL Set Operation
SQL set operations allow you to combine the results of two or more SELECT queries. These operations treat the result sets
of each query as mathematical sets, enabling you to perform set operations like union, intersection, and difference on them.
SQL provides the following set operations:
TREY
research
28
Union Operation
• The SQL Union operation is used to combine the result of two or more SQL SELECT queries.
• In the union operation, all the number of datatype and columns must be same in both the tables on which UNION operation is being
applied.
• The union operation eliminates the duplicate rows from its result set.
SELECT column1, column2 FROM table1
UNION
SELECT column1, column2 FROM table2;
Union All
Combines the results of two or more SELECT queries but does not remove duplicates.
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2;
Intersect Operation
• It is used to combine two SELECT statements. The Intersect operation returns the common rows from the SELECT statements.
• In the Intersect operation, the number of datatype and columns must be the same.
• It has no duplicates, and it arranges the data in ascending order by default.
SELECT name FROM employees
INTERSECT
SELECT name FROM clients;
TREY
research
29
View in SQL
✓ An SQL View is a virtual table that is created based on the result set of a SQL query. Unlike a regular table, a view does not
store data itself; instead, it dynamically retrieves data from one or more underlying tables whenever the view is queried.
✓ A view behaves like a table in SQL, allowing you to select, update, insert, and delete data (with some limitations).
✓ The data in a view is not stored physically; it is generated dynamically when the view is accessed.
✓ A view is defined using a SELECT statement that can join multiple tables, filter rows, and select specific columns.
Security
✓ Views can be used to restrict access to certain data in a table by exposing only specific columns or rows to the user.
✓ For example, you can create a view that only shows certain fields of a sensitive table, hiding the rest from the user.
Simplicity
✓ Views simplify complex queries. Instead of writing a complex query repeatedly, you can create a view and use it as a
simple table.
TREY
research
30
SQL CASE WHEN STATEMENT
The CASE WHEN statement in SQL is a conditional expression that allows you to create different outputs based on certain
conditions. It is similar to the IF-THEN-ELSE logic in programming languages.
This statement is often used to implement logic directly within SQL queries, making them more dynamic and adaptable to
different scenarios.
SELECT
product_name,
CASE category_id
WHEN 1 THEN 'Electronics'
WHEN 2 THEN 'Furniture'
WHEN 3 THEN 'Clothing'
ELSE 'Other'
END AS category_name
FROM products;
In this example, the CASE statement checks the category_id of each product and returns the corresponding category
name.
If category_id is 1, it returns 'Electronics’;
if 2, it returns 'Furniture’;
and so on.
TREY
research
31
Key Points:
✓ Flexibility:
CASE can be used in SELECT, UPDATE, INSERT, and ORDER BY clauses, making it very versatile..
✓ Readability:
CASE statements can make your SQL queries more readable by replacing nested IF statements.
✓ Efficiency:
Even though CASE adds some complexity to your queries, it often improves efficiency by reducing the need for multiple
queries or conditional logic in your application code.
TREY
research
32
ThankYou
Femi Ayodele
+2349135067747
[email protected]
(26) Femi Ayodele (MCT) | LinkedIn
TREY
research
33