MSBI (SSIS/SSRS/SSAS) Online Training

Sunday, July 26, 2009

SSIS - Adventure Approach - Best Practice

Integration Services Tutorial
Microsoft SQL Server 2005 Integration Services (SSIS) is a platform for building high performance data integration solutions, including the extraction, transformation, and load (ETL) packages for data warehousing. Integration Services includes graphical tools and wizards for building and debugging packages; tasks for performing workflow functions such as FTP operations, executing SQL statements, and sending e-mail messages; data sources and destinations for extracting and loading data; transformations for cleaning, aggregating, merging, and copying data; a management service, Integration Services Service, for administering Integration Services; and application programming interfaces (APIs) for programming the Integration Services object model.
In this tutorial, you will learn how to use SSIS Designer to create a simple Microsoft SQL Server 2005 Integration Services package. The package that you create takes data from a flat file, reformats the data, and then inserts the reformatted data into a fact table. In following lessons, the package will be expanded to demonstrate looping, package configurations, logging and error flow.
What You Will Learn
The best way to become acquainted with the new tools, controls and features available in Microsoft SQL Server 2005 Integration Services is to use them. This tutorial walks you through SSIS Designer to create a simple ETL package that includes looping, configurations, error flow logic and logging.
Lessons
Description
Lesson 1: Creating a Simple ETL Package
In this lesson, you will create a simple ETL package that extracts data from a single flat file, transforms the data using lookup transformations and finally loads the result into a fact table destination.
Lesson 2: Adding Looping
In this lesson, you will expand the package you created in Lesson 1 to take advantage of new looping features to extract multiple flat files into a single data flow process.
Lesson 3: Adding Package Configurations
In this lesson, you will expand the package you created in Lesson 2 to take advantage of new package configuration options.
Lesson 4: Adding Logging
In this lesson, you will expand the package you created in Lesson 3 to take advantage of new logging features.
Lesson 5: Adding Error Flow Redirection
In this lesson, you will expand the package you created in lesson 4 to take advantage of new error output configurations.
Requirements
This tutorial is intended for users familiar with fundamental database operations, but who have limited exposure to the new features available in SQL Server 2005 Integration Services.
To use this tutorial, your system must have the following components installed:
· SQL Server 2005 with the AdventureWorksDW database. To enhance security, the sample databases are not installed by default. To install the sample databases, see Running Setup to Install AdventureWorks Sample Databases and Samples.
· This tutorial also requires sample data. The sample data is installed together with the samples. If you cannot find the sample data, return to the procedure above and complete installation as described.

In this lesson, you will create a simple ETL package that extracts data from a single flat file source, transforms the data using two lookup transformation components, and writes that data to the FactCurrencyRate fact table in AdventureWorksDW. As part of this lesson, you will learn how to create new packages, add and configure data source and destination connections, and work with new control flow and data flow components.
Understanding the Package Requirements
Before creating a package, you need a good understanding of the formatting used in both the source data and the destination. Once you understand both of these data formats, you are then able to define the transformations necessary to map the source data to the destination.
Looking at the Source
For this tutorial, the source data is a set of historical currency data contained in the flat file, SampleCurrencyData.txt. The source data has the following four columns: the average rate of the currency, a currency key, a date key, and the end-of-day rate.
Here is an example of the source data contained in the SampleCurrencyData.txt file:
1.00010001 ARS 9/3/2001 0:00 0.99960016
1.00010001 ARS 9/4/2001 0:00 1.001001001
1.00020004 ARS 9/5/2001 0:00 0.99990001
1.00020004 ARS 9/6/2001 0:00 1.00040016
1.00050025 ARS 9/7/2001 0:00 0.99990001
1.00050025 ARS 9/8/2001 0:00 1.001001001
1.00050025 ARS 9/9/2001 0:00 1
1.00010001 ARS 9/10/2001 0:00 1.00040016
1.00020004 ARS 9/11/2001 0:00 0.99990001
1.00020004 ARS 9/12/2001 0:00 1.001101211

When working with flat file source data, it is important to understand how the Flat File connection manager interprets the flat file data. If the flat file source is Unicode, the Flat File connection manager defines all columns as [DT_WSTR] with a default column width of 50. If the flat file source is ANSI-encoded, the columns are defined as [DT_STR] with a column width of 50. You will probably have to change these defaults to make the string column types more appropriate for your data. To do this, you will need to look at the data type of the destination where the data will be written to and then choose the correct type within the Flat File connection manager.
Looking at the Destination
The ultimate destination for the source data is the FactCurrencyRate fact table in AdventureWorksDW. The FactCurrencyRate fact table has four columns, and has relationships to two dimension tables, as shown in the following table.
Column Name
Data Type
Lookup Table
Lookup Column
AverageRate
float
None
None
CurrencyKey
int (FK)
DimCurrency
CurrencyKey (PK)
TimeKey
Int (FK)
DimTime
TimeKey (PK)
EndOfDayRate
float
None
None
Mapping Source Data to be Compatible with the Destination
Analysis of the source and destination data formats indicates that lookups will be necessary for the CurrencyKey and TimeKey values. The transformations that will perform these lookups will obtain the CurrencyKey and TimeKey values by using the alternate keys from DimCurrency and DimTime dimension tables.
Flat File Column
Table Name
Column Name
Data Type
0
FactCurrencyRate
AverageRate
Float
1
DimCurrency
CurrencyAlternateKey
nchar (3)
2
DimTime
FullDateAlternateKey
Datetime
3
FactCurrencyRate
EndOfDayRate
Float

The first step in creating a package in Microsoft SQL Server 2005 Integration Services (SSIS) is to create an Integration Services project. This project includes the templates for the objects — data sources, data source views, and packages — that you use in a data transformation solution.
To create a new Integration Services project
1. On the Start menu, point to All Programs, point to Microsoft SQL Server 2005, and click SQL Server Business Intelligence Development Studio.
2. On the File menu, point to New, and click Project to create a new Integration Services project.
3. In the New Project dialog box, select Integration Services Project in the Templates pane.
4. In the Name box, change the default name to SSIS Tutorial.
5. Click OK.
By default, an empty package, titled Package.dtsx, will be created and added to your project.
6. In the Solution Explorer toolbar, right-click Package.dtsx, click Rename, and rename the default package to Lesson 1.dtsx.
7. (Optional) When prompted to rename the package object, click Yes.
In this task, you add a Flat File connection manager to the package that you just created. A Flat File connection manager enables a package to extract data from a flat file. Using the Flat File connection manager, you can specify the file name and location, the locale and code page, and the file format, including column delimiters, to apply when the package extracts data from the flat file. In addition, you can manually specify the data type for the individual columns, or use the Suggest Column Types dialog box to automatically map the columns of extracted data to Integration Services data types.
You must create a new Flat File connection manager for each file format that you work with. Because this tutorial extracts data from multiple flat files that have exactly the same data format, you will need to add configure only one Flat File connection manager to your package.
For this tutorial, you will configure the following properties in your Flat File connection manager:
· Column names Because the flat file does not have column names, the Flat File connection manager creates default column names. These default names are not useful for identifying what each column represents. To make these default names more useful, you need to change the default names to names that match the fact table into which the flat file data is to be loaded.
· Data mappings The data type mappings that you specify for the Flat File connection manager will be used by all flat file data source components that reference the connection manager. You can either manually map the data types by using the Flat File connection manager, or you can use the Suggest Column Types dialog box. In this tutorial, you will view the mappings suggested in the Suggest Column Types dialog box and then manually make the necessary mappings in the Flat File Connection Manager Editor dialog box.
To add a Flat File connection manager
1. Right-click anywhere in the Connection Managers area, and then click New Flat File Connection.
2. In the Flat File Connection Manager Editor dialog box, for Connection manager name, type Sample Flat File Source Data.
3. Click Browse.
4. In the Open dialog box, browse to the sample data folder and open the SampleCurrencyData.txt file. By default, the tutorial sample data is installed to the c:\Program Files\Microsoft SQL Server\90\Samples\Integration Services\Tutorial\Creating a Simple ETL Package\Sample Data folder.
To rename columns in the Flat File connection manager
1. In the Flat File Connection Manager Editor dialog box, click Advanced.
2. In the property pane, makes the following changes:
· Change the Column 0 name property to AverageRate.
· Change the Column 1 name property to CurrencyID.
· Change the Column 2 name property to CurrencyDate.
· Change the Column 3 name property to EndOfDayRate.
Note:
By default, all four of the columns are initially set to a string data type [DT_STR] with an OutputColumnWidth of 50.
To remap column data types
1. In the Flat File Connection Manager Editor dialog box, click Suggest Types.
Integration Services automatically suggests data types based on the first 100 rows of data, but you can change the suggestion options to sample more or less data, specify the default data type for integer or Boolean data, or add spaces added as padding to string columns.
For now, make no changes, and click Cancel to return to the Advanced pane of the Flat File connection Manager Editor dialog box and view the suggested column data types.
In this tutorial, Integration Services suggests the data types shown in the second column of the table below for the data from the SampleCurrencyData.txt file. However, the data types that are required for the columns in the destination, which will be defined in a later step, are shown in the last column of the following table.
Flat File Column
Suggested Type
Destination Column
Destination Type
AverageRate
Float [DT_R4]
FactCurrencyRate.AverageRate
Float
CurrencyID
String [DT_STR]
DimCurrency,CurrencyAlternateKey
nchar(3)
CurrencyDate
Date [DT_DATE]
DimTime.FullDateAlternateKey
datetime
EndOfDayRate
Float [DT_R4]
FactCurrencyRate.EndOfDayRate
Float
The data types suggested for the CurrencyID and CurrencyDate columns are not compatible with the data types of the fields in the destination table. Because the data type of DimCurrency.CurrencyAlternateKey is nchar (3), CurrencyID needs to be changed from string [DT_STR] to string [DT_WSTR]. Additionally, the field DimTime.FullDateAlternateKey is defined as a DateTime data type; therefore, CurrencyDate needs to be changed from date [DT_Date] to database timestamp [DT_DBTIMESTAMP].
2. In the property pane, change the data type of column CurrencyID from string [DT_STR] to Unicode string [DT_WSTR].
3. In the property pane, change the data type of column CurrencyDate from date [DT_DATE] to database timestamp [DT_DBTIMESTAMP].
4. Click OK.


After you have added a Flat File connection manager to connect to the data source, the next task is to add an OLE DB connection manager to connect to the destination. An OLE DB connection manager enables a package to extract data from or load data into any OLE DB–compliant data source. Using the OLE DB Connection manager, you can specify the server, the authentication method, and the default database for the connection.
In this lesson, you will create an OLE DB connection manager that uses Windows Authentication to connect to the local instance of AdventureWorksDB. The OLE DB connection manager that you create will also be referenced by other components that you will create later in this tutorial, such as the Lookup transformation and the OLE DB destination.
To add and configure an OLE DB Connection Manager
1. Right-click anywhere in the Connection Managers area, and then click New OLE DB Connection.
2. In the Configure OLE DB Connection Manager dialog box, click New.
3. For Server name, enter localhost.
When you specify localhost as the server name, the connection manager connects to the default instance of Microsoft SQL Server 2005 on the local computer. To use a remote instance of SQL Server 2005, replace localhost with the name of the server to which you want to connect.
4. In the Log on to the server group, verify that Use Windows Authentication is selected.
5. In the Connect to a database group, in the Select or enter a database name box, type or select AdventureWorksDW.
6. Click Test Connection to verify that the connection settings you have specified are valid.
7. Click OK.
8. Click OK.
9. In the Data Connections pane of the Configure OLE DB Connection Manager dialog box, verify that localhost.AdventureWorksDW is selected.
10. Click OK.
After you have created the connection managers for the source and destination data, the next task is to add a Data Flow task to your package. The Data Flow task encapsulates the data flow engine that moves data between sources and destinations, and provides the functionality for transforming, cleaning, and modifying data as it is moved. The Data Flow task is where most of the work of an extract, transform, and load (ETL) process occurs.
Note:
Microsoft SQL Server 2005 Integration Services separates data flow from control flow. This separation of data flow from control flow is one of the dramatic differences between Integration Services and Microsoft SQL Server 2000 Data Transformation Services.
To add a Data Flow task
1. Click the Control Flow tab.
2. In the Toolbox, expand Control Flow Items, and drag a Data Flow Task onto the design surface of the Control Flow tab.
3. On the Control Flow design surface, right-click the newly added Data Flow Task, click Rename, and change the name to Extract Sample Currency Data.
It is good practice to provide unique names to all components that you add to a design surface. For ease of use and maintainability, the names should describe the function that each component performs. Following these naming guidelines allows your Integration Services packages to be self-documenting. Another way to document your packages is by using annotations. For more information about annotations, see Using Annotations in Packages.
In this task, you will add and configure a Flat File source to your package. A Flat File source is a data flow component that uses metadata defined by a Flat File connection manager to specify the format and structure of the data to be extracted from the flat file by a transform process. The Flat File source can be configured to extract data from a single flat file by using the file format definition provided by the Flat File connection manager.
For this tutorial, you will configure the Flat File source to use the Sample Flat File Source Data connection manager that you previously created.
To add a Flat File Source component
1. Open the Data Flow designer, either by double-clicking the Extract Sample Currency Data data flow task or by clicking the Data Flow tab.
2. In the Toolbox, expand Data Flow Sources, and then drag a Flat File Source onto the design surface of the Data Flow tab.
3. On the Data Flow design surface, right-click the newly added Flat File Source, click Rename, and change the name to Extract Sample Currency Data.
4. Right-click the Flat File source to open the Flat File Source Editor dialog box.
5. In the Flat file connection manager box, type or select Sample Flat File Data.
6. Click Columns and verify that the names of the columns are correct.
7. Click OK.
After you have configured the Flat File source to extract data from the source file, the next task is to define the Lookup transformations needed to obtain the values for the CurrencyKey and TimeKey. A Lookup transformation performs a lookup by joining data in the specified input column to a column in a reference dataset. The reference dataset can be an existing table or view, a new table, or the result of an SQL statement. The Lookup transformation uses an OLE DB connection manager to connect to the database that contains the data that is the source of the reference dataset.
For this tutorial, you will add and configure the following two Lookup transformation components to the package:
· One transformation to perform a lookup of values from the CurrencyKey column of the DimCurrency dimension table based on matching CurrencyID column values from the flat file.
· One transformation to perform a lookup of values from the TimeKey column of the DimTime dimension table based on matching CurrencyDate column values from the flat file.
In both cases, the Lookup transformations will utilize the OLE DB connection manager that you previously created.
To add and configure the CurrencyKey Lookup transformation
1. In the Toolbox, expand Data Flow Transformations, and then drag Lookup onto the design surface of the Data Flow tab.
2. Click the Extract Sample Currency Data flat file source and drag the green arrow onto the newly added Lookup transformation to connect the two components.
3. On the Data Flow design surface, right-click the newly added Lookup transformation, click Rename, and change the name to Lookup Currency Key.
4. Double-click the Lookup Currency Key transformation.
5. In the Lookup Transformation Editor dialog box, in the OLE DB connection manager box, ensure that localhost.AdventureWorksDW is displayed.
6. In the Use a table or view box, type or select [dbo].[DimCurrency].
7. Click the Columns tab.
8. In the Available Input Columns panel, drag CurrencyID to the Available Lookup Columns panel and drop it on CurrencyAlternateKey.
9. Select CurrencyKey.
10. Click OK.
To add and configure the DateKey Lookup transformation
1. In the Toolbox, drag Lookup onto the Data Flow design surface.
2. Click the Lookup Currency Key transformation and drag the green arrow onto the newly added Lookup transformation to connect the two components.
3. On the Data Flow design surface, right-click the newly added Lookup transformation, click Rename, and change the name to Lookup Date Key.
4. Double-click the Lookup Date Key transformation.
5. In the Lookup Transformation Editor dialog box, in the OLE DB connection manager box, ensure that localhost.AdventureWorksDW is displayed.
6. In the Use a table or view box, type or select [dbo].[DimTime].
7. Click the Columns tab.
8. In the Available Input Columns panel, drag CurrencyDate to the Available Lookup Columns panel and drop it on FullDateAlternateKey.
9. Select TimeKey.
10. Click OK.
Your package now can extract data from the flat file source and transform that data into a format that is compatible with the destination. The next task is to actually load the transformed data into the destination. To load the data, you must add an OLE DB destination to the data flow. The OLE DB destination can use a database table, view, or an SQL command to load data into a variety of OLE DB-compliant databases.
In this procedure, you add and configure an OLE DB destination to use the OLE DB connection manager that you previously created.
To add and configure the Sample OLE DB destination
1. In the Toolbox, expand Data Flow Destinations, and drag OLE DB Destination onto the design surface of the Data Flow tab.
2. Click the Lookup Date Key transformation and drag the green arrow over to the newly added OLE DB Destination to connect the two components together.
3. On the Data Flow design surface, right-click the newly added OLE DB Destination component, click Rename, and change the name to Sample OLE DB Destination.
4. Double-click Sample OLE DB Destination.
5. In the OLE DB Destination Editor dialog box, ensure that localhost.AdventureWorksDW is selected in the OLE DB Connection manager box.
6. In the Name of the table or the view box, type or select [dbo].[FactCurrencyRate].
7. Click Mappings.
8. Verify that the input columns are mapped correctly to the destination columns.
9. Click OK.
In this lesson, you have done the following tasks:
· Created a new Microsoft SQL Server 2005 Integration Services (SSIS) project.
· Configured the connection managers that the package needs to connect to the source and destination data.
· Added a data flow that takes the data from a flat file source, performs the necessary Lookup transformations on the data, and configures the data for the destination.
Your package is now complete! It is time to test your package.
To run the Lesson 1 tutorial package
1. On the Debug menu, click Start Debugging.
The package will run, resulting in 1097 rows successfully added into the FactCurrency fact table in AdventureWorksDW.
2. After the package has completed running, on the Debug menu, click Stop Debugging

2 comments:

Diwakar kumar said...

Hi
i am new at ssis and and using multiple .dtsx xml file by for each loop container as my source and joining multiple outputs of xml nodes in a target. my problem is that when all node exists in xml source the package works fine but if in any of the source one particular node does not exist then pipeline of that node does not get proceed and join operation is not taking place for that.

I want null value for the node which is not existing.
Please help me with proper solution if any of you have.
Thanks,
Diwakar

soumya said...

Very Nice Blog you have shared with us. Thanks To provide it check once MSBI Online Training Hyderabad to get a clarity on it.