This is a follow up to my previous post How to Convert Taxware Tax Exempt Certificates to Vertex Tax Exempt Certificates. In that post, I described the four steps I used to get Taxware exempt certificate information into the Vertex system’s database:
- Parse the Taxware exemption certificates and convert the data into a tab-separated file using a C# console application.
- Load the Taxware data into SQL Server staging tables and scrub/prepare it using T-SQL.
- Export the data to a fixed-width flat file from SQL Server using an SSIS package.
- Load the data file into Vertex using Vertex’s Import/Export utility.
This post will explain how I accomplished step # 1 of this process.
Here is a sample of what the data inside Taxware tax exempt certificates report file looks like:
COMPANY ID:MYCO NAME:ABC COMPANY INC NUMBER:CERT123456 CUSTOMER ID: 001111-1111 ADDR1:(123/456-7890 CALL TO NOTIFY) PHYS. LOCATION: JUR STATE:AK(ALASKA) ADDR2:PO BOX 1231233 REASON: (UNKNOWN) JURIS LEVEL:STATE CITY: SOMETOWN EFFECTIVE DATE: 01/21/2000 JOB NUMBER: ST: AK(ALASKA) EXPIRATION DATE: 01/21/2010 JURIS LOC: ZIP: 12333- DATE RECEIVED: 01/21/2000 OCCURRENCE NO:00000 CERTIFICATE COMPLETE/ACTIVE APPLY CERT TO ALL LEVELS DESCRIPTION:
The rest of the file consists of whitespace and lines that don’t hold data I need. The goal is to take this report file and turn it into a clean CSV, like:
MYCO,ABC COMPANY INC,CERT123456,001111-1111,... MYCO,ZXY COMPANY LLC,CERT434343,233444-0001,...
The report file record above contains 22 fields. The field offsets are well defined. The labels and flags will all be the same for each record. The field length on all the right-most fields varies from record to record however. Some fields are in the form of LABEL: (COMPANY ID:MYCO). Others simply have (APPLY CERT TO ALL LEVELS). So field types can be “data” and “flag”. I also note that some fields start a record (COMPANY ID:) and some fields end a record (DESCRIPTION:).
After deciding to write a C# console app to parse the ADP Taxware tax exempt certificates report, I boiled the process down to the following:
1. Read file into List.
2. Configure a List<> of line definition objects. These line def objects contain definitions for fields (offset, label, type (data | flag), field mode (starts line | ends line).
3. Loop through the raw lines.
4. Foreach raw line, loop through the line definitions.
5. Foreach def, call a method to identify if the current line matches the current line def.
6. If the current line def matches the current raw line, parse the line using the line def into a List<> of parsed fields and break the loop of line profiles.
7. If the line definition doesn’t identify/match, it keeps looping through available line defs until it does find a match or it sends the raw line to a List<> of discarded lines with a line number.
8. If a field has a mode of EndRecord, it takes all the fields that have accumulated, formats them as CSV and then writes them to the List<> of outLines.
9. Once the foreach loops have run through, the output and discard files are written.
I still think there is a better solution out there somewhere but this is the approach I went with.
I won’t walk through all of the source code here. I documented it so you could see my intent for doing certain things. Warning:
Don’t use my coding as any sort of reference. I only code here and there at this point, mostly for utilities and tools I need. Some of the code is decent, some of it is cringe-inducing. I know I will look at it 2 months from now and shudder but that happens to everybody, right?
Anyway…
Using an app.config file might seem like a strange choice for a command line app. I agree. Given the timeframe, I opted not to handle command line args. Although this is single-use tool, I did want more flexibility than hardcoding argument values. The app.config was an acceptable compromise and works just fine. I may end up changing this as I continue to play with the code. I’d like to use command line args like this:
stepconv -in:filename -out:filename -bad:filename [-tab|-csv] [-qi] [-help]
For now we’ll stick with app.config. (But if you know of a good command line argument parser let me know.)
Here is a run down of other parts of the app:
ConfigHelper loads the configuration values as an object so I could touch the config file just and then call parameters as needed.
FileHelper is a wrapper that provides simplified access to reading and writing text files.
FormattingHelper provides a collection of methods format strings and List objects.
The other classes perform the heavy lifting of how to process multiple lines in the text file and turn them into a complete record to be written out into the output file. See the code for details.
Program.cs is the main program. It loads the config, reads/writes files and holds the main processing loops.
The download package includes the source and binary files (src, bin respectively). It also contains a sample Taxware tax exempt certificate report I created using customer information from AdventureWorks. You should be able to run the exe and generate out.txt and bad.txt from the in.txt file.
The code was authored in Visual Studio 2005 and targets .NET 2.0. I have only tested in Windows XP Pro.
At this point we took a file that we couldn’t really do much with and put it into a standard format we can actually work with.
This code is licensed under the GNU General Public License.