Nipuna DWH
A data warehouse is a relational database used for query, analysis, and reporting. By definition, a data
warehouse is subject-oriented, integrated, non-volatile, and time-variant.
Integrated: data collected from multiple sources is integrated into a single, consistent, user-readable format.
A data warehouse maintains data for the entire organization and may feed multiple data marts,
whereas a data mart maintains data for only a particular subject area.
OLTP is online transaction processing. It maintains current transactional data, which means inserts,
updates, and deletes must be fast.
5) Explain ODS?
An operational data store (ODS) is part of the data warehouse architecture and maintains only current
transactional data. An ODS is subject-oriented, integrated, volatile, and holds current data.
PowerCenter provides all product functionality, including the ability to register multiple servers, share metadata
across repositories, and partition data; one repository can serve multiple Informatica servers. PowerMart
provides all features except registering multiple servers and partitioning data.
A staging area is a temporary storage area used for data integration rather than transaction
processing; whenever data is put into the data warehouse, it first needs to be cleaned and processed there.
A surrogate key is a series of sequential numbers assigned as the primary key of a table.
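A minimal Oracle sketch of generating surrogate keys from a sequence (the CUSTOMER_DIM table, its columns, and the sequence name are illustrative):
CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;
-- Assign the next surrogate key value while loading the dimension
INSERT INTO customer_dim (customer_key, customer_id, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'C100', 'Acme Ltd');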
--- A star schema consists of one or more fact tables related to one or more dimension tables through
foreign keys.
--- Dimension tables are de-normalized; fact tables are normalized.
The star schema simplifies queries, because the dimension data is grouped into one large table per dimension.
In a snowflake schema, by contrast, both dimension and fact tables are normalized.
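A typical star-join query, sketched with illustrative table and column names; the fact table joins to each dimension through its foreign key:
SELECT d.year, p.product_name, SUM(f.sales_amount) AS total_sales
FROM   sales_fact f
JOIN   date_dim d ON f.date_key = d.date_key
JOIN   product_dim p ON f.product_key = p.product_key
GROUP BY d.year, p.product_name;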
If two data marts use the same dimension, it is called a conformed dimension; that is, a dimension
that can be shared by multiple fact tables is a conformed dimension.
Slowly growing dimensions are dimensions whose data grows without updates to existing dimension
rows; new rows are simply appended to the existing dimension.
Slowly changing dimensions are dimensions whose data grows and also requires updates to existing
dimension rows.
Type 1: rows containing changes to existing dimensions are updated in the target by overwriting
the existing dimension. In the Type 1 Dimension mapping, all rows contain current dimension data.
Use the Type 1 Dimension mapping to update a slowly changing dimension table when you do
not need to keep any previous versions of dimensions in the table.
Type 2: the Type 2 Dimension mapping inserts both new and changed dimensions into the
target. Changes are tracked in the target table by versioning the primary key and creating a
version number for each dimension row in the table.
Use the Type 2 Dimension/Version Data mapping to update a slowly changing dimension when
you want to keep a full history of dimension data in the table; version numbers and versioned
primary keys track the order of changes to each dimension.
Type 3: the Type 3 Dimension mapping filters source rows based on user-defined comparisons
and inserts only those found to be new dimensions into the target. Rows containing changes to
existing dimensions are updated in the target. When updating an existing dimension, the
Informatica server saves the existing data in different columns of the same row and replaces the
existing data with the update.
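A minimal SQL sketch of the Type 1 (overwrite) and Type 2 (versioned insert) patterns described above; the CUSTOMER_DIM table, its columns, and the sequence are illustrative:
-- Type 1: overwrite the changed attribute in the existing dimension row
UPDATE customer_dim
SET    customer_name = 'Acme Limited'
WHERE  customer_id = 'C100';
-- Type 2: keep the old row and insert a new row with an incremented version number
INSERT INTO customer_dim (customer_key, customer_id, customer_name, version)
VALUES (customer_dim_seq.NEXTVAL, 'C100', 'Acme Limited', 2);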
If your target table is also the lookup table, use a dynamic cache. In a dynamic cache,
multiple matches return an error, and only the = operator can be used in the lookup condition.
A lookup SQL override replaces the default SQL statement; you can join multiple sources using a lookup
override. By default, the Informatica server adds the ORDER BY clause.
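A sketch of a lookup SQL override joining two sources (table and column names are illustrative, and the selected columns are assumed to match the lookup ports):
SELECT c.customer_id, c.customer_name, r.region_name
FROM   customers c, regions r
WHERE  c.region_id = r.region_id
ORDER BY c.customer_id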
You specify the target load order based on the source qualifiers in a mapping. If you have multiple
source qualifiers connected to multiple targets, you can designate the order in which the
Informatica server loads data into the targets.
23) What are the differences between the joiner transformation and the source qualifier
transformation?
You can join heterogeneous data sources in a Joiner transformation, which you cannot achieve in a
Source Qualifier transformation.
You need matching keys to join two relational sources in a Source Qualifier transformation,
whereas the Joiner transformation does not need matching keys to join two sources.
In a Source Qualifier, the two relational sources must come from the same data source, whereas the
Joiner transformation can also join relational sources coming from different data sources.
Whenever you create a target table, you decide whether to store historical data or only current
transactional data in the target table.
Data driven.
The Informatica server follows instructions coded into Update Strategy transformations within the
session mapping to determine how to flag records for insert, update, delete, or reject. If you do not choose
the data driven option, the Informatica server ignores all Update Strategy transformations in the
mapping.
28) What are the options in the target session of an update strategy transformation?
Insert
Delete
Update
Update as update
Update as insert
Truncate table.
A source filter filters data only from relational sources, whereas a Filter transformation can filter
data from any type of source.
-- Can you connect multiple ports from one group to multiple transformations?
Yes
31) Can you connect more than one group to the same target or transformation?
NO
Two methods:
1) Design it in the Transformation Developer.
2) Promote a standard transformation from the Mapping Designer. After you add a
transformation to a mapping, you can promote it to the status of a reusable transformation.
Once you promote a standard transformation to reusable status, you cannot demote it back to a
standard transformation.
A mapping parameter represents a constant value that you can define before running a
session. A mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or
mapplet, then define the value of the parameter in a parameter file for the session.
Unlike a mapping parameter, a mapping variable represents a value that can change
throughout the session. The Informatica server saves the value of a mapping variable to the
repository at the end of the session run and uses that value the next time you run the session.
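A minimal parameter file sketch, assuming illustrative folder, workflow, session, and parameter names:
[MyFolder.WF:wf_load_sales.ST:s_m_load_sales]
$$LAST_RUN_DATE=2004-01-01
$$COUNTRY_CODE=US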
34) Can you use the mapping parameters or variables created in one mapping in
another mapping?
No. We can use mapping parameters or variables only in the transformations of the same
mapping or mapplet in which the mapping parameters or variables were created.
35) Can the mapping parameters or variables created in one mapping be used in any
other reusable transformation?
Yes, because a reusable transformation is not contained within any mapplet or mapping.
36) How does the Informatica server sort string values in a Rank transformation?
When the Informatica server runs in ASCII data movement mode, it sorts session data
using a binary sort order. If you configure the session to use a binary sort order, the
Informatica server calculates the binary value of each string and returns the specified
number of rows with the highest binary values for the string.
The Designer automatically creates a RANKINDEX port for each Rank transformation.
The Informatica server uses the RANKINDEX port to store the ranking position for each
record in a group. For example, if you create a Rank transformation that ranks the top 5
salespersons for each quarter, the rank index numbers the salespeople from 1 to 5.
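The same top-5-per-quarter ranking, sketched in plain SQL with an analytic function (table and column names are illustrative):
SELECT *
FROM  (SELECT quarter, salesperson, sales_amount,
              RANK() OVER (PARTITION BY quarter ORDER BY sales_amount DESC) AS rank_index
       FROM   sales)
WHERE rank_index <= 5;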
A mapplet is a set of transformations that you build in the Mapplet Designer and can reuse
in multiple mappings.
WORKFLOW MANAGER
The PowerCenter server moves data from sources to targets based on workflow and
mapping metadata stored in a repository.
A workflow is a set of instructions that describes how and when to run tasks related to
extracting, transforming, and loading data.
-- What is a session?
A session is a set of instructions that describes how to move data from source to target using a
mapping.
Use the Workflow Monitor to monitor workflows and to stop the PowerCenter server.
The power center server uses both process memory and system shared memory to perform
these tasks.
Load Manager process: locks the workflow and its tasks, and starts the DTM process to run the
sessions.
Data Transformation Manager (DTM) process: performs session validations, creates threads to initialize
the session, reads, writes, and transforms data, and handles pre- and post-session operations.
Mapping thread.
Transformation thread.
Reader thread.
Writer thread.
1) Task Developer.
2) Workflow Designer.
3) Worklet Designer.
You can schedule a workflow to run continuously, repeat at a given time or interval, or you can
manually start a workflow. By default, the workflow runs on demand.
If the PowerCenter server is executing a session task when you issue the stop command, it
stops reading data but continues processing, writing, and committing data to the targets.
If the PowerCenter server cannot finish processing and committing data, you issue the abort
command.
You can also abort a session by using the ABORT() function in the mapping logic.
A worklet is an object that represents a set of tasks. It can contain any task available in the
Workflow Manager. You can run worklets inside a workflow, and you can also nest a worklet
in another worklet. The Workflow Manager does not provide a parameter file for worklets.
The PowerCenter server writes information about worklet execution in the workflow log.
A commit interval is the interval at which the PowerCenter server commits data to targets
during a session. The commit interval is the number of rows you want to use as the basis for
the commit point.
Target-based commit: the PowerCenter server commits data based on the number of
target rows and the key constraints on the target table. The commit point also depends on
the buffer block size and the commit interval.
You can use bulk loading to improve the performance of a session that inserts a large amount
of data into a DB2, Sybase, Oracle, or Microsoft SQL Server database.
When bulk loading, the PowerCenter server bypasses the database log, which speeds
performance.
Without writing to the database log, however, the target database cannot perform
rollback. As a result, you may not be able to perform recovery.
When you select this option, the PowerCenter server orders the target load on a row-by-row
basis only.
If a session is configured for constraint-based loading and a target table receives rows from
different sources, the PowerCenter server reverts to normal loading for those tables, but
loads all other targets in the session using constraint-based loading when possible, loading
the primary key table first and then the foreign key tables.
Use constraint-based loading only when the session option Treat Source Rows As is set to Insert.
Constraint-based load ordering allows developers to read the source
once and populate parent and child tables in a single process.
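A sketch of why the primary key (parent) table must load first; the table names are illustrative, and the foreign key on the fact table rejects any row whose parent key does not yet exist:
CREATE TABLE customer_dim (
    customer_key  NUMBER PRIMARY KEY,
    customer_name VARCHAR2(100));
CREATE TABLE sales_fact (
    sale_id       NUMBER PRIMARY KEY,
    customer_key  NUMBER REFERENCES customer_dim (customer_key),
    sales_amount  NUMBER);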
When using incremental aggregation, you apply captured changes in the source to
aggregate calculations in a session. If the source changes only incrementally and you can
capture those changes, you can configure the session to process only the changes. This allows
the PowerCenter server to update your target incrementally, rather than forcing it to
process the entire source and recalculate the same data each time you run the session.
You can capture new source data: use incremental aggregation when you can capture
new source data each time you run the session. Use a stored procedure or filter
transformation to process only the new data.
Incremental changes do not significantly change the target: use incremental aggregation
when the changes do not significantly change the target. If processing the incrementally
changed source alters more than half the existing target, the session may not benefit from
using incremental aggregation. In this case, drop the table and recreate the target with the
complete source data.
The first time you run an incremental aggregation session, the PowerCenter server processes
the entire source. At the end of the session, the PowerCenter server stores aggregate data
from the session run in two files, the index file and the data file. The PowerCenter server
creates these files in a local directory.
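A sketch of a source filter that captures only rows changed since the last run, assuming an illustrative LAST_MODIFIED column and a $$LAST_RUN_DATE mapping parameter supplied from a parameter file:
SELECT *
FROM   sales
WHERE  last_modified > TO_DATE('$$LAST_RUN_DATE', 'YYYY-MM-DD');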
Transformations.
2 types:
1) Active
2) Passive.
An active transformation can change the number of rows that pass through it; the number of output
rows can be less than or equal to the number of input rows.
A passive transformation does not change the number of rows; the number of output rows always
equals the number of input rows.
55) Difference between the filter and router transformations.
A Filter transformation filters data based on a single condition and drops the rows that do not meet the
condition.
Dropped rows are not stored anywhere, not even in the session log file.
A Router transformation filters data based on multiple conditions and gives you the option
to route rows that do not match any condition to a default group.
An Expression transformation calculates single-row values before writing to the target.
An Expression transformation is executed on a row-by-row basis.
An Aggregator transformation allows you to perform aggregate calculations such as MAX, MIN,
and AVG.
The Aggregator stores data in the aggregate cache until it completes the aggregate calculations.
When you run a session that uses an Aggregator transformation, the Informatica server creates
index and data caches in memory to process the transformation. If the Informatica server
requires more space, it stores overflow values in cache files.
A Joiner transformation joins two related heterogeneous sources residing in different locations
or file systems.
Normal
Master outer
Detail outer
Full outer
61) Difference between connected and unconnected transformations.
Both input pipelines originate from the same source qualifier transformation.
63) What are the settings that u use to configure the joiner transformation?
Type of join
A Lookup transformation can be used to look up data in a table or view based on a lookup condition;
by default, the lookup is a left outer join.
Get a related value: for example, if your source includes an employee ID but you want the target to
include the employee name.
Perform a calculation: many normalized tables include values used in a calculation, such as gross
sales per invoice or sales tax, but not the calculated value (such as net sales).
Update slowly changing dimension tables: you can use a Lookup transformation to determine
whether records already exist in the target.
Connected Lookup:
Receives input values directly from the pipeline.
Cache includes all lookup columns used in the mapping (that is, lookup table columns included in
the lookup condition and lookup table columns linked as output ports to other transformations).
Can return multiple columns from the same row or insert into the dynamic lookup cache.
If there is no match for the lookup condition, the Informatica server returns the default value for all
output ports. If you configure dynamic caching, the Informatica server inserts rows into the cache.
Unconnected Lookup:
Receives input values from the result of a :LKP expression in another transformation.
Cache includes all lookup/output ports in the lookup condition and the lookup/return port.
Designate one return port (R); returns one column from each row.
If there is no match for the lookup condition, the Informatica server returns NULL.
The Informatica server stores condition values in the index cache and output values in the
data cache.
Persistent cache: you can save the lookup cache files and reuse them the next time the
Informatica server processes a Lookup transformation configured to use the cache.
Static cache: you can configure a static, or read-only, lookup cache. By default, the Informatica server
creates a static cache. It caches the lookup table and looks up values in the cache for each row
that comes into the transformation. When the lookup condition is true, the Informatica server
does not update the cache while it processes the Lookup transformation.
Dynamic cache: if you want to cache the target table and insert new rows into the cache and the
target, you can configure the Lookup transformation to use a dynamic cache. The Informatica server
dynamically inserts data into the target table.
Shared cache: you can share the lookup cache between multiple transformations. You can
share an unnamed cache between transformations in the same mapping.
Static cache:
You cannot insert or update the cache.
The Informatica server returns a value from the lookup table or cache when the condition is true.
When the condition is not true, it returns the default value for connected transformations.
Dynamic cache:
You can insert rows into the cache as you pass rows to the target.
The Informatica server inserts rows into the cache when the condition is false. This indicates that
the row is not in the cache or target table; you can pass these rows to the target table.
ORACLE:
The set of redo log files for a database is collectively known as the database's redo log.
A database contains one or more rollback segments to temporarily store undo information. Rollback
segments are used to generate read-consistent database information, during database recovery, and to roll back
uncommitted transactions for users.
A database is divided into logical storage units called tablespaces. A tablespace is used to group
related logical structures together.
-- How to delete the duplicate records.
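A common Oracle approach keeps one row from each group of duplicates; here duplicates are assumed to share the same EMPNO, so adjust the GROUP BY to whatever defines a duplicate:
DELETE FROM emp
WHERE  ROWID NOT IN (SELECT MIN(ROWID) FROM emp GROUP BY empno);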
One of which rows that don’t match those in the common column of another table.
-- Returns employees with the top 5 salaries:
SELECT * FROM emp e WHERE 5 > (SELECT COUNT(*) FROM emp WHERE sal > e.sal);
A function returns a value. A procedure does not return a value directly (but it can return values
through OUT and IN OUT parameters).
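A minimal Oracle sketch of the difference (names are illustrative): the function returns a value directly, while the procedure passes its result back through an OUT parameter.
CREATE OR REPLACE FUNCTION get_bonus (p_sal NUMBER) RETURN NUMBER IS
BEGIN
    RETURN p_sal * 0.10;        -- a function returns a value
END;
/
CREATE OR REPLACE PROCEDURE calc_bonus (p_sal IN NUMBER, p_bonus OUT NUMBER) IS
BEGIN
    p_bonus := p_sal * 0.10;    -- a procedure returns values only through OUT / IN OUT parameters
END;
/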
-- Returns the Nth highest salary (&n is a substitution variable for N):
SELECT DISTINCT a.sal FROM emp a
WHERE &n = (SELECT COUNT(DISTINCT b.sal) FROM emp b WHERE a.sal <= b.sal);
-- Returns the odd-numbered rows of EMP:
SELECT * FROM emp WHERE (ROWID, 1) IN (SELECT ROWID, MOD(ROWNUM, 2) FROM emp);