Ab Initio - V1.4

The document describes various components that can be used in dataflow graphs, such as Sort, Join, Replicate, and Filter by Expression. It also discusses parallel-processing techniques: component-level parallelism, pipeline parallelism, and data parallelism. Additionally, it covers partition and departition components, multifile systems, sandboxes, and deploying graphs.


Sample Components

 Sort
 Dedup
 Join
 Replicate
 Rollup
 Filter by Expression
 Merge
 Lookup
 Reformat, etc.
Creating Graph – Sort Component
 Sort: The Sort component reorders data. It has two parameters: key and max-core. (A sketch of their roles follows below.)
 Key: The key parameter names the fields on which to sort and describes the collation order.
 Max-core: The max-core parameter limits how much memory the Sort component may use; when the limit is reached, the component dumps sorted data from memory to disk.
[Screenshot: specifying the Key for the Sort]
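As an illustration only, and not Ab Initio's actual implementation, the following Python sketch mimics what key and max-core govern: records accumulate in memory, are sorted by the key, and are cut into sorted runs whenever the batch exceeds a limit; the runs are then merged back into one ordered flow. All names here are invented for the example.

    import heapq

    def external_sort(records, key, max_in_memory=3):
        """Sort records by `key`, cutting a new sorted run whenever the
        in-memory batch exceeds `max_in_memory` (a stand-in for the
        max-core limit). A real implementation would write each run to
        disk; the runs are kept as lists here for clarity."""
        runs, batch = [], []
        for rec in records:
            batch.append(rec)
            if len(batch) >= max_in_memory:
                runs.append(sorted(batch, key=key))  # "dump" a sorted run
                batch = []
        if batch:
            runs.append(sorted(batch, key=key))
        return list(heapq.merge(*runs, key=key))  # merge runs in key order

    rows = [{"id": 5}, {"id": 2}, {"id": 9}, {"id": 1}, {"id": 7}]
    print(external_sort(rows, key=lambda r: r["id"]))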
Creating Graph – Dedup Component
 The Dedup component removes duplicate records from each group of records that share a key.
 The Dedup criterion is one of unique-only, First, or Last: First keeps the first record of each key group, Last keeps the last, and unique-only keeps only those groups containing a single record. (See the sketch below.)
[Screenshot: selecting the Dedup criteria]
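A minimal Python sketch of the three criteria, assuming the input is already sorted on the key (as Dedup expects):

    from itertools import groupby

    def dedup(records, key, keep="first"):
        """Apply a Dedup criterion ("first", "last", or "unique-only")
        to records already sorted on `key`."""
        out = []
        for _, grp in groupby(records, key=key):
            grp = list(grp)
            if keep == "first":
                out.append(grp[0])
            elif keep == "last":
                out.append(grp[-1])
            elif keep == "unique-only" and len(grp) == 1:
                out.append(grp[0])
        return out

    rows = [{"k": 1}, {"k": 1}, {"k": 2}, {"k": 3}, {"k": 3}]
    print(dedup(rows, key=lambda r: r["k"], keep="unique-only"))  # [{'k': 2}]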
Creating Graph – Replicate Component
 Replicate combines the data records from the inputs into one flow and writes a copy of that flow to each of its output ports.
 Use Replicate to support component parallelism. (A one-line analogue appears below.)
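As a loose analogue (not an Ab Initio API), Python's itertools.tee gives each downstream consumer its own independent copy of one stream:

    from itertools import tee

    # One input flow, copied once per downstream "output port".
    source = iter([10, 20, 30])
    for_rollup, for_filter = tee(source, 2)

    print(list(for_rollup))  # [10, 20, 30]
    print(list(for_filter))  # [10, 20, 30]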
Creating Graph – Join Component
 Join matches records from its input flows on a key and combines them into output records. (A hash-join sketch follows below.)
 Specify the key for the Join.
 Specify the type of Join.
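For illustration, a simple hash join in Python (field names are hypothetical), the classic way to match two inputs on a key:

    from collections import defaultdict

    def inner_join(left, right, key):
        """Hash join: index one input by key, then probe with the other."""
        index = defaultdict(list)
        for r in right:
            index[key(r)].append(r)
        return [{**l, **r} for l in left for r in index[key(l)]]

    customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]
    orders = [{"id": 1, "amt": 10}, {"id": 1, "amt": 7}]
    print(inner_join(orders, customers, key=lambda r: r["id"]))
    # [{'id': 1, 'amt': 10, 'name': 'Ann'}, {'id': 1, 'amt': 7, 'name': 'Ann'}]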
Database Configuration (.dbc)
 A file with a .dbc extension provides the GDE with the information it needs to connect to a database. A configuration file contains the following information:
– The name and version number of the database to which you want to connect.
– The name of the computer on which the database instance or server to which you want to connect runs, or on which the database remote-access software is installed.
– The name of the database instance, server, or provider to which you want to connect.
 You generate a configuration file by using the Properties dialog box for one of the Database components.
Creating Parallel Applications
 Types of Parallel Processing
– Component-level parallelism: an application with multiple components running simultaneously on separate data uses component parallelism.
– Pipeline parallelism: an application with multiple components running simultaneously on the same data uses pipeline parallelism. (The sketch below imitates this.)
– Data parallelism: an application whose data is divided into segments that are operated on simultaneously uses data parallelism.
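A single-process imitation of pipeline parallelism using Python generators: each stage begins consuming records before the upstream stage has finished producing them, much as connected components do in a graph. The stage names are invented for the example.

    def read(records):          # stage 1: produce records
        for rec in records:
            yield rec

    def scale(flow):            # stage 2: transform records as they arrive
        for rec in flow:
            yield rec * 10

    def keep_positive(flow):    # stage 3: filter records as they arrive
        for rec in flow:
            if rec > 0:
                yield rec

    pipeline = keep_positive(scale(read([-1, 2, 3])))
    print(list(pipeline))  # [20, 30]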
Partition Components
 Partition by Expression: dividing data according to a DML expression.
 Partition by Key: grouping data by a key.
 Partition with Load Balance: dynamic load balancing.
 Partition by Percentage: distributing data so that the output is proportional to fractions of 100.
 Partition by Range: dividing data evenly among nodes, based on a key and a set of partitioning ranges.
 Partition by Round-robin: distributing data evenly, in blocksize chunks, across the output partitions. (Two of these schemes are sketched below.)
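Minimal Python sketches of two of these schemes (the hashing and chunking details are illustrative, not Ab Initio's):

    def partition_by_key(records, key, n):
        """Send every record with the same key to the same partition."""
        parts = [[] for _ in range(n)]
        for rec in records:
            parts[hash(key(rec)) % n].append(rec)
        return parts

    def partition_round_robin(records, n, blocksize=1):
        """Deal records across n partitions in blocksize chunks."""
        parts = [[] for _ in range(n)]
        for i, rec in enumerate(records):
            parts[(i // blocksize) % n].append(rec)
        return parts

    print(partition_by_key(range(10), key=lambda r: r % 3, n=3))
    print(partition_round_robin(range(10), n=3, blocksize=2))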
Departition Components
 Concatenate: produces a single output flow that contains all the records from the first input partition, then all the records from the second input partition, and so on.
 Gather: collects inputs from multiple partitions in an arbitrary order and produces a single output flow; it does not maintain sort order.
 Interleave: collects records from many sources in round-robin fashion.
 Merge: collects inputs from multiple sorted partitions and maintains the sort order. (Three of these are sketched below.)
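Python one-liners for three of the four (Gather's arbitrary, arrival-order behavior requires real concurrency, so it is omitted):

    import heapq
    from itertools import chain

    parts = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]  # three sorted partitions

    # Concatenate: partition 1's records, then partition 2's, and so on.
    print(list(chain(*parts)))        # [1, 4, 7, 2, 5, 8, 3, 6, 9]

    # Interleave: one record from each partition per round-robin turn.
    print([r for grp in zip(*parts) for r in grp])  # [1, 2, 3, 4, 5, ...]

    # Merge: combine sorted partitions while preserving the sort order.
    print(list(heapq.merge(*parts)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]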
Multifile systems
 A multifile system is a specially created set of directories, possibly on different machines, that have an identical substructure.
 Each directory is a partition of the multifile system. When a multifile is placed in a multifile system, its partitions are files within each of the partitions of the multifile system.
 A multifile system gives better performance than a flat file system because it can divide your data among multiple disks or CPUs.
 Typically (an SMP machine is the exception), a multifile system is created with the control partition on one node and the data partitions on other nodes, to distribute the work and improve performance. To do this, use full Internet URLs that specify file and directory names and locations on the remote machines. (The layout is sketched below.)
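A loose sketch of the "identical substructure" idea, with entirely hypothetical paths (a real multifile system is created with Ab Initio's own tools, not like this):

    import os

    # Hypothetical layout: one control partition plus three data partitions.
    # In a real multifile system these directories live on different nodes.
    control = "/u/mfs/control"
    data_parts = ["/u/node1/mfs_p0", "/u/node2/mfs_p1", "/u/node3/mfs_p2"]

    # Every partition gets the same substructure, so a multifile such as
    # in_data/customer_info.dat has one partition file in each directory.
    for root in [control] + data_parts:
        os.makedirs(os.path.join(root, "in_data"), exist_ok=True)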
Multifile
[Diagram slide]
SANDBOX
 A sandbox is a collection of graphs and related files that are stored in a single directory tree and treated as a group for purposes of version control, navigation, and migration.
 A sandbox can be a file-system copy of a datastore project.
 In a graph, instead of specifying the entire path for a file location, we specify only a sandbox parameter variable, for example $AI_IN_DATA/customer_info.dat, where $AI_IN_DATA contains the entire path, defined with reference to the sandbox's $AI_HOME variable.
 The actual in_data directory is $AI_HOME/in_data in the sandbox. (Resolution is sketched below.)
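A small sketch of how such a parameter resolves (the paths are hypothetical):

    import os

    # Hypothetical sandbox parameters: $AI_IN_DATA is defined relative to
    # $AI_HOME, so relocating the sandbox means changing one variable.
    params = {"AI_HOME": "/u/dev/my_sandbox"}
    params["AI_IN_DATA"] = os.path.join(params["AI_HOME"], "in_data")

    # $AI_IN_DATA/customer_info.dat then resolves to the full path:
    print(os.path.join(params["AI_IN_DATA"], "customer_info.dat"))
    # /u/dev/my_sandbox/in_data/customer_info.dat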
SANDBOX
 The sandbox provides an excellent mechanism for maintaining uniqueness while moving from the development environment to the production environment, by means of switch parameters.
 We can define parameters in the sandbox that can be used across all the graphs pertaining to that sandbox.
 The topmost variable, $PROJECT_DIR, contains the path of the home directory.
Deploying
 Every graph, after validation and testing, has to be deployed as a .ksh file into the run directory on UNIX.
 This .ksh file is an executable file that is the backbone of the entire automation/wrapper process.
 The wrapper automation consists of the .run and .env files, the dependency list, the job list, etc.
 For a detailed description of the wrapper and the different directories and files, please refer to the documentation on the wrapper / the UNIX presentation.
