Lab 2
Contents
1. Data Profiling
2. File System Task
3. Bulk Insert Task
4. Execute SQL Task
5. Transfer Database Task
6. Data Flow Task
7. Source
8. Destination
DM&DW Lab. 2
1. Data Profiling
Data profiling is the process of examining data and collecting metadata about
its quality: the frequency of statistical patterns, interdependencies,
uniqueness, and redundancy.
This type of analytical activity is important for the overall quality and health of
an operational data store (ODS) or data warehouse.
The Data Profiling Task is located in the SSIS Toolbox, but you probably
shouldn’t attempt to use the results to make an automated workflow decision in
the SSIS package Control Flow.
The profiler can only report on statistics in the data; you still need to make
judgments about these statistics. For example, a column may contain an
overwhelming amount of NULL values, but the profiler doesn’t know whether
this reflects a valid business scenario.
Select the destination file, or create a new one in any location, click OK,
and then execute the package.
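To make the kind of statistics a profiler reports more concrete, here is a minimal Python sketch. It is not how the Data Profiling Task is implemented; the function name `profile_column` and the sample column are hypothetical, chosen only to illustrate a NULL ratio, a distinct count, and a uniqueness check.

```python
from collections import Counter

def profile_column(values):
    """Return simple profile statistics for one column of values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": total,
        # Share of rows where the column is NULL (None in Python).
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        # Number of different non-NULL values.
        "distinct": len(counts),
        # True when no non-NULL value repeats (a candidate key).
        "is_unique": len(counts) == len(non_null),
        # Most frequent non-NULL value and its count, if any.
        "most_common": counts.most_common(1)[0] if counts else None,
    }

# Hypothetical sample: a "middle_name" column that is mostly NULL.
middle_names = [None, None, "Ali", None, "Ali", None]
stats = profile_column(middle_names)
print(stats)
```

The sketch only reports the numbers. Whether a high NULL ratio is a data quality problem or a valid business scenario is still a judgment you have to make.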
2. File System Task
Set the source connection to the directory whose contents you want to move,
set the destination connection to the target folder, click OK, and then
execute the package.
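As a rough analogue of what this step does on disk, the following Python sketch moves every entry from a source folder into a destination folder. It is an illustration under stated assumptions, not the File System Task's implementation: the helper `move_directory_content` and the folder and file names are hypothetical, and the demo runs in a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

def move_directory_content(source_dir, destination_dir):
    """Move every entry in source_dir into destination_dir; return moved names."""
    source = Path(source_dir)
    destination = Path(destination_dir)
    destination.mkdir(parents=True, exist_ok=True)
    moved = []
    # sorted() materializes the listing first, so moving entries is safe.
    for entry in sorted(source.iterdir()):
        shutil.move(str(entry), str(destination / entry.name))
        moved.append(entry.name)
    return moved

# Demo with throwaway folders (hypothetical names).
with tempfile.TemporaryDirectory() as root:
    src = Path(root) / "incoming"
    dst = Path(root) / "archive"
    src.mkdir()
    (src / "sales.csv").write_text("id,amount\n1,10\n")
    (src / "notes.txt").write_text("demo\n")
    moved_names = move_directory_content(src, dst)
    remaining = list(src.iterdir())
    archived = sorted(p.name for p in dst.iterdir())
    print(moved_names)
```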
4. Execute SQL Task
Set the database connection in the Connection property, then set the
SQLStatement property to EXEC sp_dbremove 'Test' to remove the Test database.
5. Transfer Database Task
The Action property controls whether the task should copy or move the
source database. The Method property controls whether the database should
be copied while the source database is kept online, using SQL Server
Management Objects (SMO), or by detaching the database, moving the files,
and then reattaching the database. The DestinationOverwrite property
controls whether the creation of the destination database is allowed to
overwrite an existing database, which includes deleting the database in the
destination if it is found. This is
useful in cases where you want to copy a database from production into a
quality-control or production test environment, and the new database should
replace any existing similar database. The last property is
ReattachSourceDatabase, which specifies what action should be taken upon
failure of the copy. Use this property if you have a package running on a
schedule that takes a production database offline to copy it, and you need to
guarantee that the database goes back online even if the copy fails.
6. Data Flow Task
The Data Flow Task moves data from one or more sources, lets you
cleanse and reshape it for its new purpose, and loads it into one or more
destinations. The Data Flow does its work primarily in memory, which gives
SSIS its strength and generally lets the Data Flow perform faster than
ELT-style packages.
One of the toughest concepts to understand for a new SSIS developer is the
difference between the Control Flow and the Data Flow tabs.
The Control Flow tab controls the workflow of the package and the order in
which each task will execute. Each task in the Control Flow has a user
interface to configure the task, with the exception of the Data Flow Task.
The Data Flow Task is configured in the Data Flow tab. Once you drag a Data
Flow Task onto the Control Flow tab and double-click it to configure it,
you’re immediately taken to the Data Flow tab.
Data viewers are a very important feature in SSIS for debugging your Data
Flow pipeline. They enable you to view data at points in time at runtime. If
you place a data viewer before and after the Aggregate Transformation, for
example, you can see the data flowing into the transformation at runtime and
what it looks like after the transformation happens.
To place a data viewer in your pipeline, right-click one of the paths (red or
blue arrows leaving a transformation or source) and select Enable Data
Viewer.
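The idea of an in-memory pipeline with data viewers attached to its paths can be sketched in plain Python. This is only a conceptual model, not SSIS code: the stage names (`source`, `cleanse`, `aggregate`, `data_viewer`) and the sample rows are hypothetical. Each stage is a generator, and the viewer passes rows through unchanged while recording a copy, much like watching the data before and after an Aggregate Transformation.

```python
def source():
    """Source: yield rows (hypothetical sample data)."""
    yield {"city": " Baghdad ", "amount": "10"}
    yield {"city": "Basra", "amount": "5"}
    yield {"city": "baghdad", "amount": "7"}

def cleanse(rows):
    """Transformation: trim and normalize the city, convert amount to int."""
    for row in rows:
        yield {"city": row["city"].strip().title(), "amount": int(row["amount"])}

def data_viewer(rows, label, log):
    """Pass rows through unchanged while recording a copy for inspection."""
    for row in rows:
        log.append((label, dict(row)))
        yield row

def aggregate(rows):
    """Transformation: sum amount per city (it must read all rows first)."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0) + row["amount"]
    for city, total in totals.items():
        yield {"city": city, "total": total}

def run_pipeline():
    viewer_log = []
    destination = []  # stands in for a destination component
    rows = cleanse(source())
    rows = data_viewer(rows, "before aggregate", viewer_log)
    rows = aggregate(rows)
    rows = data_viewer(rows, "after aggregate", viewer_log)
    destination.extend(rows)  # pulling rows here drives the whole pipeline
    return destination, viewer_log

destination, viewer_log = run_pipeline()
for label, row in viewer_log:
    print(label, row)
```

Nothing is written to disk between stages. Rows flow from one generator to the next in memory, which is the property the Data Flow relies on.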
7. Source
o A source in the SSIS Data Flow is where you specify the location of your
source data. Most sources will point to a Connection Manager in SSIS. By
pointing to a Connection Manager, you can reuse connections throughout your
package, because you need only change the connection in one place.
o The Source Assistant and Destination Assistant are two components designed
to remove the complexity of configuring a source or a destination in the Data
Flow.
o OLE DB Source
o The OLE DB Source is the most common type of source, and it can point
to any OLE DB–compliant Data Source such as SQL Server, Oracle, or
DB2. To configure the OLE DB Source, double-click the source once you
have added it to the design pane in the Data Flow tab. In the Connection
Manager page of the OLE DB Source Editor, select the Connection
Manager of your OLE DB Source from the OLE DB Connection Manager
dropdown box. You can also add a new Connection Manager in the editor
by clicking the New button.
o As with most sources, you can go to the Columns page to set columns that
you wish to output to the Data Flow, as shown below. Simply check the
columns you wish to output, and you can then assign the name you want to
send down the Data Flow in the Output column. Select only the columns
that you want to use, because the smaller the data set, the better the
performance you will get.
o Optionally, you can go to the Error Output page (shown in Figure 4-4) and
specify how you wish to handle rows that have errors. For example, you
may wish to output any rows that have a data type conversion issue to a
different path in the Data Flow. On each column, you can specify that if an
error occurs, you wish the row to be ignored, be redirected, or fail. If you
choose to ignore failures, the column for that row will be set to NULL. If
you redirect the row, the row will be sent down the red path in the Data
Flow coming out of the OLE DB Source.
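The three error dispositions described above (ignore, redirect, fail) can be illustrated with a short Python sketch. It is a simplified model rather than the behavior of the OLE DB Source itself: the function `convert_rows`, the `qty` column, and the sample rows are hypothetical, and the "error" here is a failed conversion to an integer.

```python
def convert_rows(rows, column, disposition="fail"):
    """Convert `column` to int, handling bad values per the chosen disposition.

    disposition is one of:
      "ignore"   - keep the row and set the column to None (NULL)
      "redirect" - send the row to a separate error output
      "fail"     - raise the error and stop processing
    """
    if disposition not in ("ignore", "redirect", "fail"):
        raise ValueError(f"unknown disposition: {disposition}")
    good, errors = [], []
    for row in rows:
        try:
            good.append({**row, column: int(row[column])})
        except (TypeError, ValueError):
            if disposition == "ignore":
                good.append({**row, column: None})
            elif disposition == "redirect":
                errors.append(row)  # the "red path" in this sketch
            else:
                raise
    return good, errors

# Hypothetical input with one value that cannot be converted.
sample_rows = [
    {"id": 1, "qty": "3"},
    {"id": 2, "qty": "n/a"},
    {"id": 3, "qty": "8"},
]

good_rows, error_rows = convert_rows(sample_rows, "qty", "redirect")
print(good_rows)
print(error_rows)
```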
o Excel Source
o The Excel Source is a source component that points to an Excel
spreadsheet, just like it sounds. Once you point to an Excel Connection
Manager, you can select the sheet from the “Name of the Excel sheet”
dropdown box, or you can run a query by changing the Data Access Mode.
This source treats Excel just like a database, where an Excel sheet is the
table and the workbook is the database. If you do not see a list of sheets in
the dropdown box, you may have a 64-bit machine that needs the ACE
driver installed or you need to run the package in 32-bit mode.
o Flat File Source
o You specify a Flat File Source in much the same way that you
specify an OLE DB Source. Once you add it to your Data Flow pane, you
point it to a Connection Manager connection that is a flat file or a multi-flat
file. Next, from the Columns tab, you specify which columns you want to
be presented to the Data Flow. All the specifications for the flat file, such
as delimiter type, were previously set in the Flat File Connection Manager.
8. Destination
Inside the Data Flow, destinations accept the data from the Data Sources and
from the transformations. The architecture can send the data to nearly any
OLE DB–compliant Data Source, a flat file, or Analysis Services, to name
just a few. Like sources, destinations are managed through Connection
Managers. The configuration difference between sources and destinations is
that in destinations, you have a Mappings page (shown in the figure below),
where you specify how the input data from the Data Flow maps to the
destination. As shown in the Mappings page in this figure, the columns are
automatically mapped based on column names, but they don’t necessarily
have to be exactly lined up. You can also choose to ignore given columns,
such as when you’re inserting into a table that has an identity column, and
you don’t want to inherit the value from the source table.
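The mapping idea can be sketched in Python with an in-memory SQLite table standing in for the destination. This is an assumption-laden illustration, not the Mappings page logic: the helpers `auto_map` and `load`, the `customer` table, and the sample rows are all hypothetical. Columns are paired by name regardless of their order, and the `id` column is ignored so the destination generates its own identity values.

```python
import sqlite3

def auto_map(source_columns, destination_columns, ignore=()):
    """Pair source and destination columns that share a name, skipping ignored ones."""
    destination_set = set(destination_columns)
    return {
        name: name
        for name in source_columns
        if name in destination_set and name not in ignore
    }

def load(conn, table, rows, mapping):
    """Insert rows into table using the source-to-destination column mapping."""
    dest_cols = list(mapping.values())
    placeholders = ", ".join("?" for _ in dest_cols)
    # Table and column names are trusted constants in this demo.
    sql = f"INSERT INTO {table} ({', '.join(dest_cols)}) VALUES ({placeholders})"
    conn.executemany(sql, [tuple(row[src] for src in mapping) for row in rows])

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, city TEXT)"
)

# Source columns arrive in a different order and carry their own id values.
source_rows = [
    {"id": 501, "city": "Basra", "name": "Huda"},
    {"id": 502, "city": "Erbil", "name": "Omar"},
]
mapping = auto_map(["id", "city", "name"], ["id", "name", "city"], ignore=("id",))
load(conn, "customer", source_rows, mapping)
loaded = conn.execute("SELECT id, name, city FROM customer ORDER BY id").fetchall()
print(mapping)
print(loaded)
```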
Excel Destination
o The Excel Destination is identical to the Excel Source except that it accepts
data rather than sends data. To use it, first select the Excel Connection
Manager from the Connection Manager page, and then specify the
worksheet into which you wish to load data.
Flat File Destination
o The commonly used Flat File Destination sends data to a flat file, and it can
be fixed-width or delimited based on the Connection Manager. The
destination uses a Flat File Connection Manager.
o You can also add a custom header to the file by typing it into the Header
option in the Connection Manager page. Lastly, you can specify on this page
that the destination file should be overwritten each time the Data Flow is
run.
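The header and overwrite options can be pictured with a small Python sketch that writes a delimited file. It is a loose analogue only: the function `write_flat_file`, its parameters, and the file name are hypothetical and do not correspond to SSIS property names.

```python
import csv
import tempfile
from pathlib import Path

def write_flat_file(path, rows, columns, header=None, overwrite=True, delimiter=","):
    """Write rows to a delimited file, optionally with a custom header line."""
    mode = "w" if overwrite else "a"  # "w" replaces the file, "a" appends to it
    with open(path, mode, newline="", encoding="utf-8") as handle:
        if header:
            handle.write(header + "\n")
        writer = csv.writer(handle, delimiter=delimiter, lineterminator="\n")
        for row in rows:
            writer.writerow([row[name] for name in columns])

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "customers.txt"  # hypothetical file name
    first_run = [{"id": 1, "city": "Basra"}, {"id": 2, "city": "Erbil"}]
    second_run = [{"id": 3, "city": "Mosul"}]
    write_flat_file(target, first_run, ["id", "city"], header="Lab 2 export")
    # Running again with overwrite=True replaces the previous contents.
    write_flat_file(target, second_run, ["id", "city"], header="Lab 2 export")
    overwritten = target.read_text(encoding="utf-8")
    # With overwrite=False the new rows are appended instead.
    write_flat_file(target, first_run, ["id", "city"], overwrite=False)
    appended = target.read_text(encoding="utf-8")
    print(appended)
```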
OLE DB Destination
o Your most commonly used destination will probably be the OLE DB
Destination (see the figure below).