A while back I started keeping a list of all the things I learned to make working with SSIS less painful and I thought I’d share. These are small things: tips, tricks and hints. Just stuff I wish I had known when I first started working with SSIS.
There isn’t much here in terms of an overarching philosophy. I’m still working on that. In general though, I would say SSIS development and management is better when:
- Packages are focused on a tight set of functions rather than doing a lot of different things.
- Packages are easy to test.
- Configuration is done using variables or better yet external configuration.
- Packages can move through environments without connection managers or other components being modified (which means your configuration is done through variables or external configuration).
I’ve only worked in small and medium environments. Some of these ideas may not be applicable in larger environments.
Configuration
- Behavior and configuration should be separate where possible. Ideally you should be able to move a package through different environments (DEV -> TEST -> PROD) simply by altering configuration, never by opening up and altering connections, components and tasks. This approach makes packages easier to test (which means packages get tested more, improving quality) and simplifies deployment.
- EncryptSensitiveWithUserKey should not be used for ProtectionLevel.
- Handle configuration with variables or configuration files.
- If you are going to store sensitive information inside connection managers, at the very least choose EncryptSensitiveWithPassword for the ProtectionLevel. This way other team members can access the package (you’ll need to give them the password) without losing connection information and the team can deploy packages without having to reconfigure them on the target deployment machine. Don’t rely on the package password for security as much as for convenience.
Connections
- Don’t use explicit server names in connection managers. This becomes confusing when you are moving a package between environments. Using a more granular name (the name of a database, the name of a file) that is not tied to a physical location is better. Example: Using SQLETLPRD01.Datamart is bad. Using just Datamart is better.
- When you create flat file connections, don’t try to define the schema by hand in BIDS using the connection manager. Instead, go outside of BIDS, create a sample file using Notepad or Excel that is in the final flat file format and put some sample data it it. Then go back inside BIDS and point the connection manager to that file. BIDS will build all the metadata it needs to define that flat file’s schema. You’ll need to review it but you’ll save a ton of work.
Containers
- Break control flow tasks up into units of work using Sequence Containers.
- Containers can help readability of a package. The can also be used to group units of work that can easily be tested or rerun together.
Deployment
- (Painful. File system deployment seems like a clear winner in terms of deployment and maintenance but may not be as secure?)
Documentation
- Don’t forget that you can include documentation inside a package with annotations.
- Documentation should include notes on deployment.
- Documentation should note whether a package can be rerun after failure or whether it needs intervention by an operator, a package developer or a process owner.
Expressions
- To use a backslash in an expression, you must escape it. For example, C:\temp\ won’t work in an expression. You have to write it as C:\\temp\\.
- Building a date tag is a pain. This snippet can help. It produces the current date in the form of DDMMYYYY:
RIGHT("0" + (DT_WSTR,2)MONTH(GETDATE()), 2) + RIGHT("0" + (DT_WSTR,2)DAY( GETDATE()), 2) + (DT_WSTR,4)YEAR(GETDATE())
Error Handling
- A package should never fail a SQL Agent job for non-exceptional reasons. Example. Every Wednesday a job runs an SSIS package that downloads a remote file, transforms a few fields and writes the data to a database table. The SSIS package should not fail the job just because this Wednesday the file can’t be found on the remote server. This is a non-exceptional state for the process, meaning it’s not something that should be considered outside normal operation. It happens all the time in the real world. The SSIS package should be able to handle it gracefully and notify the process owner via email or some other method.
File Operations
- Using wildcards (ex: /*.dat) does not work in the file system task.
- Using a script task is often a better way of handling bulk file operations (like deletes) than using the foreach loop task. You’ll need to reference the .NET System.IO namespace. (Remember that you can access package variables inside script tasks.)
- But use a foreach loop container if you need to execute other SSIS tasks each time through the loop.
- It is easier to delete a directory full of files than to try to delete a large number of files using scripts or file system tasks (especially if you need a foreach loop container). So if you need to create a large number of files and then delete them later, put them all in a folder. Then just create/delete the folder using file system tasks.
FTP
- Using wildcards (ex: /*.dat) does work in remote FTP paths.
Packages
- Give packages a unique name instead of Package.dtsx.
- A good package name might take the form of Project-Module-Function.dtsx.
- Packages should not have too many functions. Smaller packages are easier to test and deploy. They minimize risk of breaking functionality when making changes and redeploying. If you need to use SSIS to provide multiple related functions in the same module of a project and run them on the same schedule, it may be better to create separate packages and put them in separate steps of a single SQL Agent job. That is often preferable to developing and managing a single monolithic package.
- If functionality in a package is deprecated, take it out. If it needs to stay there for reference, clearly mark it as deprecated.
Logging
- Log to a text file instead of a SQL table. A SQL table is convenient but the file system is a more robust subsystem than SQL in the event of database problems.
- Be careful about the amount of information you include in logging. Production packages should generally only log errors (SQL Agent logs steps and run times). In other environments, more verbose output is ok.
Variables
- Create as few variables as possible.
- Using the package variable scope is usually just fine (vs. getting more granular).
- Using variable expressions is an easy way to set variable values before the control flow starts. An alternative is to set variables at the start of the control flow using a script task. The drawback to that though is that if you want to only test part of a package you can’t just execute a single container to test it—you must disable all containers/components you don’t want to execute and then run the entire package to make sure those variables are set.
- It’s a good practice to distinguish between directories and paths. A path is a directory + a filename. For example, \\fileserver01\inbound\ is a directory while \\fileserver01\inbound\2011-09-01.zip is a path.
- If you need to reference a path or a directory more than once in a package, put it in a variable.
- If you only reference a path or directory once but it is dependant on some other variable (such as the home directory for a process, which changes from environment to environment), set it in that connection’s or component’s Expressions collection.
Please leave a comment with any improvements and additions to this list. I’ll keep the list updated. Also, take a look at Jamie Thompson’s list of SSIS: Suggested Best Practices and naming conventions.