Data Warehouse: Bilal Hussain
• Course Outline:
1. Introduction & Background.
2. De-Normalization.
3. OLAP & Dimensional Modeling.
4. ETL and Data Quality Management (DQM).
5. Database Performance (Parallelism, Partitioning).
6. ETL Implementation using ODI.
7. Data Visualization using OBIEE.
8. Project (design a data warehouse for any organization using any
ETL and BI tool).
Course Plan:
Week #  Date         Assignment #  Quiz No.   Exam
1
2
3                    Assign #1     Quiz #1
4
5
6                    Assign #2     Quiz #2
7
8
9                                             Mid-Term
10      23-12-2021   Assign #3     Quiz #3
11      30-12-2021
12      06-01-2022   Assign #4     Quiz #4
13      13-01-2022
14      20-01-2022
15      27-01-2022
16      03-02-2022                            Final Exam
How to improve response time in DWH.
• Indexes
• Partitioning
• Parallelism
• Compression
• Minimize bottlenecks.
Types of Queries
• Point Query.
• Select count(*) from emp where empno=1;
• Full Table Scan.
• Select count(*) from emp;
• Range Query.
• Select count(*) from emp where hiredate between :firstdate and :seconddate;
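The access path chosen for each query type can be checked with an execution plan. The following is a minimal sketch, assuming an Oracle database and the emp table used in the queries above; the date literals are illustrative values, not taken from the course material.

```sql
-- Point query: the plan typically shows an index lookup if empno is indexed.
EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM emp WHERE empno = 1;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Range query: the date literals below are assumed example values.
EXPLAIN PLAN FOR
  SELECT COUNT(*)
  FROM   emp
  WHERE  hiredate BETWEEN DATE '1981-01-01' AND DATE '1981-12-31';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Comparing the two plans shows whether the optimizer reads the whole table or only the rows that match.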
What is Index
• An index is a database structure/segment that provides quick lookup
of data in a column or columns of a table.
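As a rough illustration, an index on the hiredate column could support the range query shown earlier. This is only a sketch: the index name emp_hiredate_ix and the date values are assumptions made for the example.

```sql
-- Assumed index name; emp and hiredate come from the query examples.
CREATE INDEX emp_hiredate_ix ON emp (hiredate);

-- With the index in place, the optimizer may use it instead of a full table scan.
SELECT COUNT(*)
FROM   emp
WHERE  hiredate BETWEEN DATE '1981-01-01' AND DATE '1981-12-31';
```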
Where an Index Can Be Used.
• How many customers do I have in Islamabad?
• What is the total sales amount in January?
• How many students are enrolled in MS-CS?
• Relieving an I/O bottleneck.
Types of Indexes.
• B-Tree
• Bitmap
• Function-Based Index.
• Partitioned Index.
• Clustered Index.
• Index-Organized Table (IOT).
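Several of these index types can be sketched in Oracle SQL as follows. All index names, the ename column, and the period_lookup table are hypothetical and used only for illustration; sales and emp are the tables that appear elsewhere in these slides.

```sql
-- B-tree index (the default type), suited to high-cardinality columns.
CREATE INDEX sales_invnbr_ix ON sales (invnbr);

-- Bitmap index, suited to low-cardinality columns in read-mostly fact tables.
CREATE BITMAP INDEX sales_prodkey_bix ON sales (prodkey);

-- Function-based index: indexes the result of an expression.
-- The ename column is an assumption (it exists in the common emp sample table).
CREATE INDEX emp_upper_ename_ix ON emp (UPPER(ename));

-- Index-organized table: rows are stored inside the primary key index itself.
-- period_lookup is a hypothetical table.
CREATE TABLE period_lookup
( periodkey   NUMBER(10,0) PRIMARY KEY,
  period_name VARCHAR2(30)
)
ORGANIZATION INDEX;
```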
What is Table Partitioning?
• Partitioning enables tables and indexes to be subdivided into individual
smaller pieces. Each piece of the database object is called a partition. A
partition has its own name, and may optionally have its own storage
characteristics. From the perspective of a database administrator, a
partitioned object has multiple pieces that can be managed either
collectively or individually. This gives the administrator considerable
flexibility in managing a partitioned object. However, from the perspective
of the application, a partitioned table is identical to a non-partitioned table;
no modifications are necessary when accessing a partitioned table using
SQL DML commands. Logically, it is still only one table, and any application
can access it exactly as it would a non-partitioned table.
Types of Partitioning
• List
• Range
• Hash
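List and hash partitioning can be sketched as below. The table names, the city values, and the partition count are assumptions chosen for the example, not part of the course data set.

```sql
-- List partitioning: each partition holds an explicit set of key values.
CREATE TABLE customers_by_city
( custkey NUMBER(5,0),
  city    VARCHAR2(30)
)
PARTITION BY LIST (city)
( PARTITION p_islamabad VALUES ('Islamabad'),
  PARTITION p_lahore    VALUES ('Lahore'),
  PARTITION p_other     VALUES (DEFAULT)   -- catch-all for any other city
);

-- Hash partitioning: rows are spread across partitions by a hash of the key.
CREATE TABLE sales_hash
( prodkey  NUMBER(5,0),
  custkey  NUMBER(5,0),
  unitshpd NUMBER(10,0)
)
PARTITION BY HASH (custkey) PARTITIONS 4;
```

List partitioning fits columns with a small set of known values, while hash partitioning is a common choice when no natural range or list exists and an even spread is the goal.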
Partition Pruning.
• Partition pruning is an essential performance feature for data
warehouses. In partition pruning, the optimizer analyzes the FROM and
WHERE clauses of a SQL statement and skips the partitions that cannot
contain matching rows.
Example
  -- Range-partitioned copy of the sales table, split on prodkey.
  CREATE TABLE Sales_part
  ( "PRODKEY" NUMBER(5,0), "PERIODKEY" NUMBER(10,0),
    "INVNBR" NUMBER(10,0), "CUSTKEY" NUMBER(5,0),
    "DWACOSTEXTND" FLOAT(126), "REPCOSTEXTND" FLOAT(126),
    "ACTLEXTND" FLOAT(126), "UNITSHPD" NUMBER(10,0),
    "UNITORDD" NUMBER(10,0), "NETWGHTSHPD" FLOAT(126),
    "CMDOLRS" FLOAT(126), "NULL_FIELD" NUMBER(10,0)
  )
  PARTITION BY RANGE (prodkey)
  (
    PARTITION p01 VALUES LESS THAN (1094),
    -- Rows with prodkey >= 9999 are rejected (ORA-14400);
    -- a MAXVALUE partition would catch them.
    PARTITION p02 VALUES LESS THAN (9999)
  );
  -- Load the partitioned table from the original sales table.
  INSERT INTO sales_part SELECT * FROM sales;
  COMMIT;
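To see pruning at work on the Sales_part table, a query can filter on the partition key and its plan can be inspected. This is a sketch under the assumption that the table was created and loaded as shown in the example.

```sql
-- List the partitions and their upper bounds.
SELECT partition_name, high_value
FROM   user_tab_partitions
WHERE  table_name = 'SALES_PART';

-- The predicate on prodkey lets the optimizer skip partition p02.
EXPLAIN PLAN FOR
  SELECT COUNT(*) FROM sales_part WHERE prodkey < 1094;

-- The Pstart and Pstop columns of the plan show which partitions are read.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- A single partition can also be addressed explicitly by name.
SELECT COUNT(*) FROM sales_part PARTITION (p01);
```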
What is Parallelism?
• Parallelism is the idea of breaking down a task so that, instead of one
process doing all of the work in a query, many processes do part of
the work at the same time. An example of this is when 12 processes
handle 12 different months in a year instead of one process handling
all 12 months by itself. The improvement in performance can be quite
high.
Parallelism Advantages.
Parallel execution improves processing for
• Large Table scans and joins.
• Creation of large indexes.
• Partitioned index scans.
• Bulk inserts, updates, and deletes.
• Aggregations and copying.
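Parallel execution for some of these operations can be requested as sketched below. The degree of parallelism (4) and the index name are assumptions; a suitable degree depends on the available CPUs and I/O capacity.

```sql
-- Parallel full table scan with an assumed degree of 4.
SELECT /*+ PARALLEL(s, 4) */ COUNT(*)
FROM   sales s;

-- Parallel bulk insert: parallel DML must be enabled in the session first.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 4) */ INTO sales_part t
SELECT /*+ PARALLEL(s, 4) */ *
FROM   sales s;

-- The table cannot be queried in this session until the insert is committed.
COMMIT;

-- Parallel creation of a large index (index name is hypothetical).
CREATE INDEX sales_periodkey_ix ON sales (periodkey) PARALLEL 4;
```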
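Compression Example (Run-Length Encoding):
• Original bit string: 111000000111110011111 (21 bits)
• Encoded as value and run-length pairs: 13#06#15#02#15
• Read as: three 1s, six 0s, five 1s, two 0s, five 1s.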
Example:
  -- Same structure as the sales table, stored with OLTP compression.
  CREATE TABLE Sales_Compressed
  ( "PRODKEY" NUMBER(5,0), "PERIODKEY" NUMBER(10,0),
    "INVNBR" NUMBER(10,0), "CUSTKEY" NUMBER(5,0),
    "DWACOSTEXTND" FLOAT(126), "REPCOSTEXTND" FLOAT(126),
    "ACTLEXTND" FLOAT(126), "UNITSHPD" NUMBER(10,0),
    "UNITORDD" NUMBER(10,0), "NETWGHTSHPD" FLOAT(126),
    "CMDOLRS" FLOAT(126), "NULL_FIELD" NUMBER(10,0)
  )
  -- Oracle 11g syntax; later releases spell this ROW STORE COMPRESS ADVANCED.
  COMPRESS FOR OLTP;
Space and Query Speed Comparison.
Uncompressed table:
  SET AUTOTRACE ON;
  SELECT COUNT(*)
  FROM sales s
  INNER JOIN d1_products p
    ON s.prodkey = p.productkey
  WHERE suppliercode = 2300;
• Result: 66 sec, 216
Compressed table:
  SET AUTOTRACE ON;
  SELECT COUNT(*)
  FROM sales_compressed s
  INNER JOIN d1_products p
    ON s.prodkey = p.productkey
  WHERE suppliercode = 2300;
• Result: 9 sec, 168, 13%
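The space side of the comparison can be measured from the data dictionary. The sketch below assumes that Sales_Compressed is loaded from sales in the same way as the partitioned table was; the load step itself is not shown in the slides.

```sql
-- Assumed load step: copy all rows into the compressed table.
INSERT INTO sales_compressed SELECT * FROM sales;
COMMIT;

-- Compare the allocated segment sizes of the two tables in megabytes.
SELECT segment_name,
       ROUND(bytes / 1024 / 1024) AS size_mb
FROM   user_segments
WHERE  segment_name IN ('SALES', 'SALES_COMPRESSED');
```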
End