
Data Warehouse: Bilal Hussain

The document provides an overview of key concepts for designing and optimizing a data warehouse including dimensional modeling, ETL processes, indexing, partitioning, parallelism, compression and query optimization techniques. It outlines a course plan covering these topics and provides examples of how to implement indexing, partitioning, parallelism and compression to improve query performance and reduce the physical storage requirements in a data warehouse.

Uploaded by

Daneil Radcliffe

Data Warehouse

Bilal Hussain
• Course Outlines:
1. Introduction & Background.
2. De-Normalization.
3. OLAP & Dimensional Modeling.
4. ETL and Data Quality Management (DQM).
5. Database Performance (Parallelism, Partitioning).
6. ETL Implementation using ODI.
7. Data Visualization using OBIEE.
8. Project (Design a data warehouse for any organization using any ETL and BI tool).
Course plan:
Week #   Date         Assignment #   Quiz No
1
2
3                     Assign #1      Quiz # 1
4
5
6                     Assign #2      Quiz # 2
7
8
9                     Mid-Term
10       23-12-2021   Assign #3      Quiz # 3
11       30-12-2021
12       06-01-2022   Assign #4      Quiz # 4
13       13-01-2022
14       20-01-2022
15       27-01-2022
16       03-02-2022   Final Exam
How to improve response time in DWH.
• Indexes
• Partitioning
• Parallelism
• Compression
• Minimize bottleneck.
Types of Queries
• Point Query.
• Select count(*) from emp where empno=1;
• Full Table Scan.
• Select count(*) from emp;
• Range.
• Select count(*) from emp where hiredate between :firstdate and :seconddate;
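The three query shapes above can be tried on a small in-memory table. This is a minimal sketch using Python's built-in sqlite3 module rather than the Oracle database the slides assume; the sample rows and the date range are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, hiredate TEXT)")
# Hypothetical sample rows; ISO-8601 dates compare correctly as text.
rows = [(1, "2020-01-15"), (2, "2020-06-01"), (3, "2021-03-10"), (4, "2022-11-30")]
conn.executemany("INSERT INTO emp VALUES (?, ?)", rows)

# Point query: at most one row qualifies.
point = conn.execute("SELECT count(*) FROM emp WHERE empno = 1").fetchone()[0]

# Full table scan: every row is read.
full = conn.execute("SELECT count(*) FROM emp").fetchone()[0]

# Range query: a contiguous slice of hiredate values qualifies.
rng = conn.execute(
    "SELECT count(*) FROM emp WHERE hiredate BETWEEN ? AND ?",
    ("2020-01-01", "2020-12-31"),
).fetchone()[0]

print(point, full, rng)  # 1 4 2
```

The point and range queries are the ones that benefit most from the techniques on the following slides; the full scan has to touch every row regardless.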
What is Index
• An index is a database structure/segment that provides quick lookup
of data in a column or columns of a table.
Where an index can be used.
• How many customers do I have in Islamabad?
• What is the total sale amount in January?
• How many students are enrolled in MS-CS?
• Queries limited by an I/O bottleneck.
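The "customers in Islamabad" question can illustrate what an index changes. The sketch below uses Python's sqlite3 module, not Oracle; the customers table, its columns, and the index name idx_customers_city are assumptions made for this example. It asks the engine for its query plan before and after the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (custkey INTEGER, city TEXT)")
# Hypothetical data: every fourth customer lives in Islamabad.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, "Islamabad" if i % 4 == 0 else "Lahore") for i in range(1000)],
)

query = "SELECT count(*) FROM customers WHERE city = 'Islamabad'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # describes a scan of the whole table
conn.execute("CREATE INDEX idx_customers_city ON customers (city)")
after = plan(query)   # describes a search that uses idx_customers_city

count = conn.execute(query).fetchone()[0]
print(before)
print(after)
print(count)
```

The exact plan wording differs between SQLite versions, so only the presence of the index name in the second plan should be relied on.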
Types of Indexes.
• B-Tree
• Bitmap
• Function Based Index.
• Partitioned Index.
• Clustered Index.
• Index organized Tables.
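Of the index types listed, the bitmap index is the one most associated with data warehouses, because it suits low-cardinality columns and combines filters cheaply. The following is a conceptual sketch in plain Python, not how any database stores its bitmaps: each distinct value gets an integer whose bit i is set when row i holds that value. The column data is invented.

```python
from collections import defaultdict

# Hypothetical column values, one per row (row id = list position).
cities = ["Islamabad", "Lahore", "Islamabad", "Karachi", "Lahore", "Islamabad"]
programs = ["MS-CS", "MS-CS", "BS-CS", "MS-CS", "BS-CS", "MS-CS"]

def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)

city_idx = build_bitmap_index(cities)
program_idx = build_bitmap_index(programs)

# WHERE city = 'Islamabad' AND program = 'MS-CS' becomes one bitwise AND.
matches = city_idx["Islamabad"] & program_idx["MS-CS"]
count = bin(matches).count("1")
row_ids = [i for i in range(len(cities)) if (matches >> i) & 1]

print(count, row_ids)  # 2 [0, 5]
```

Counting the set bits answers a count(*) query without touching the table rows at all.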
What is Table Partitioning?
• Partitioning enables tables and indexes to be subdivided into individual
smaller pieces. Each piece of the database object is called a partition. A
partition has its own name, and may optionally have its own storage
characteristics. From the perspective of a database administrator, a
partitioned object has multiple pieces that can be managed either
collectively or individually. This gives the administrator considerable
flexibility in managing a partitioned object. However, from the perspective
of the application, a partitioned table is identical to a non-partitioned table;
no modifications are necessary when accessing a partitioned table using
SQL DML commands. Logically, it is still only one table and any application
can access this one table as they do for a non-partitioned table.
Types of Partitioning
• List
• Range
• Hash
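The three schemes differ only in how a row's partition key is mapped to a partition. The sketch below is a simplified illustration in Python; the function names, bounds, and value lists are hypothetical, and the modulo in hash_partition is a stand-in for whatever hash function a real database applies.

```python
import bisect

def list_partition(value, value_lists):
    """Return the index of the partition whose explicit value list contains value."""
    for idx, values in enumerate(value_lists):
        if value in values:
            return idx
    raise ValueError(f"{value!r} is not listed in any partition")

def range_partition(key, upper_bounds):
    """Return the index of the first partition whose exclusive upper bound exceeds key."""
    idx = bisect.bisect_right(upper_bounds, key)
    if idx == len(upper_bounds):
        raise ValueError(f"{key!r} is beyond the last partition bound")
    return idx

def hash_partition(key, n_partitions):
    """Spread integer keys evenly; modulo stands in for a real hash function."""
    return key % n_partitions

print(list_partition("Lahore", [{"Islamabad", "Rawalpindi"}, {"Lahore", "Karachi"}]))  # 1
print(range_partition(99, [100, 200]), range_partition(100, [100, 200]))  # 0 1
print(hash_partition(10, 4))  # 2
```

List and range partitioning let related rows be placed together deliberately, while hash partitioning aims only at an even spread.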
Partition Pruning.
• Partition pruning is an essential performance feature for data
warehouses. In partition pruning, the optimizer analyzes the FROM and
WHERE clauses of a SQL statement and skips the partitions that cannot
contain matching rows.
Example
• CREATE TABLE Sales_part
• ( "PRODKEY" NUMBER(5,0), "PERIODKEY" NUMBER(10,0),
• "INVNBR" NUMBER(10,0), "CUSTKEY" NUMBER(5,0),
• "DWACOSTEXTND" FLOAT(126),"REPCOSTEXTND" FLOAT(126),
• "ACTLEXTND" FLOAT(126), "UNITSHPD" NUMBER(10,0),
• "UNITORDD" NUMBER(10,0), "NETWGHTSHPD" FLOAT(126),
• "CMDOLRS" FLOAT(126), "NULL_FIELD" NUMBER(10,0)
• )
• partition by range (prodkey)
• (
• partition p01 values less than (1094),
• partition p02 values less than (9999)
• );
• -- Rows with prodkey >= 9999 map to no partition and are rejected by the insert below.
• insert into sales_part select * from sales;
• commit;
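Partition pruning on a range-partitioned table like sales_part amounts to an interval overlap test. The sketch below mimics that decision in Python using the bounds 1094 and 9999 from the example; the prune function and the PARTITIONS mapping are illustrative only and are not how the Oracle optimizer is implemented.

```python
# Partition name -> (inclusive lower bound, exclusive upper bound) on prodkey,
# derived from the VALUES LESS THAN clauses of sales_part.
PARTITIONS = {
    "p01": (float("-inf"), 1094),
    "p02": (1094, 9999),
}

def prune(low, high):
    """Return the partitions that may hold prodkey values in [low, high]."""
    return [
        name
        for name, (lower, upper) in PARTITIONS.items()
        if low < upper and high >= lower
    ]

print(prune(500, 500))    # ['p01']
print(prune(1094, 1094))  # ['p02']
print(prune(1000, 2000))  # ['p01', 'p02']
```

A predicate such as prodkey = 500 therefore needs to read only p01, while a range that straddles 1094 still has to read both partitions.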
What is Parallelism?
• Parallelism is the idea of breaking down a task so that, instead of one
process doing all of the work in a query, many processes do part of
the work at the same time. An example of this is when 12 processes
handle 12 different months in a year instead of one process handling
all 12 months by itself. The improvement in performance can be quite
high.
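The 12-months example can be sketched as a divide-and-combine computation. The code below uses Python's ThreadPoolExecutor with invented sales rows; it shows the structure of the idea (one worker per month, partial results combined at the end), not the database's parallel execution servers. In CPython, threads will not actually speed up this CPU-bound sum.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales rows as (month, amount) pairs, 28 rows per month.
sales = [(month, 10 * month + day) for month in range(1, 13) for day in range(1, 29)]

def month_total(month):
    """Work done by one worker: aggregate a single month's rows."""
    return sum(amount for m, amount in sales if m == month)

# Serial plan: one process handles all 12 months in turn.
serial = sum(month_total(m) for m in range(1, 13))

# Parallel plan: 12 workers each handle one month, then the partials are combined.
with ThreadPoolExecutor(max_workers=12) as pool:
    partials = list(pool.map(month_total, range(1, 13)))
parallel = sum(partials)

print(serial == parallel, len(partials))  # True 12
```

The final combine step is cheap compared with the per-month scans, which is why this pattern scales well for large scans and aggregations.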
Parallelism Advantages.
Parallel execution improves processing for
• Large Table scans and joins.
• Creation of large indexes.
• Partitioned index scans.
• Bulk inserts, updates, and deletes.
• Aggregations and copying.
Query Example:
• set autotrace on;
• select /*+ PARALLEL(5) */
• count(*)
• from sales_compressed s
• inner join d1_products p
• on s.prodkey=p.productkey
• where suppliercode=2300;
What is Compression?
• Database compression is a set of techniques that reorganizes
database content to save on physical storage space and improve
performance.

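• Example (run-length encoding): the bit string 111000000111110011111
• is stored as 13#06#15#02#15, where each pair is a bit value followed by its run length.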
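The pair of lines above reads as run-length encoding: each run of identical bits is written as the bit followed by its length, with "#" between runs. A minimal Python sketch of that scheme follows; the function names are hypothetical, and real database compression uses more elaborate block-level techniques.

```python
from itertools import groupby

def rle_encode(bits):
    """Encode a string as value+run-length tokens joined by '#'."""
    return "#".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(bits))

def rle_decode(encoded):
    """Invert rle_encode: the first character is the value, the rest is the count."""
    return "".join(
        token[0] * int(token[1:]) for token in encoded.split("#") if token
    )

original = "111000000111110011111"
encoded = rle_encode(original)
print(encoded)                          # 13#06#15#02#15
print(rle_decode(encoded) == original)  # True
```

The saving depends on how long the runs are: this 21-character string shrinks to 14 characters, but data with very short runs could grow instead.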
Example:
• CREATE TABLE Sales_Compressed
• ( "PRODKEY" NUMBER(5,0), "PERIODKEY" NUMBER(10,0),
• "INVNBR" NUMBER(10,0), "CUSTKEY" NUMBER(5,0),
• "DWACOSTEXTND" FLOAT(126),"REPCOSTEXTND" FLOAT(126),
• "ACTLEXTND" FLOAT(126), "UNITSHPD" NUMBER(10,0),
• "UNITORDD" NUMBER(10,0), "NETWGHTSHPD" FLOAT(126),
• "CMDOLRS" FLOAT(126), "NULL_FIELD" NUMBER(10,0)
• )
• COMPRESS FOR OLTP; -- Oracle 11g syntax; 12c and later name this ROW STORE COMPRESS ADVANCED
Space and Query Speed Comparison.
• set autotrace on;
• select count(*)
• from sales s
• inner join d1_products p
• on s.prodkey=p.productkey
• where suppliercode=2300;
• 66sec – 216
• Set autotrace on;
• select count(*)
• from sales_compressed s
• inner join d1_products p
• on s.prodkey=p.productkey
• where suppliercode=2300;
• 9sec –168 – 13%
End
