This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
The challenge: Parsing a fixed width text file dynamically.
Columns of data are contained in a fixed width text file with an unknown quantity of fields. These fields are laid out neatly for printing on state-of-the-art green-bar paper (note: insert sarcasm here). @danrh enlisted the help of the Alteryx Community to solve this "doozy" of a challenge to create a macro that can parse any input file of this structure type. I responded "Yes, it can be done." I took the time to respond with a 3-part video along with the macro itself. Following that effort, I suggested to my friend @JoeM that he make this post into a weeklychallenge. Now I find myself turning this into a KB article. The purpose of the article is to demonstrate the art of thinking.
Explain it to your Mom. I mean no disrespect in that statement. My Mom was very proud of my accomplishments but on most occasions, she really didn't understand what I did.
In this post Dan lists the key to parsing the incoming data as:
The key to knowing the column widths is the second row with the dashes. Two blank spaces separate each field.
He then asks the question:
Is there a way to take the format of a single row and parse the rest of the data accordingly?
He thinks of two approaches:
I've poked at it with Regex, but no dice yet.
The only idea I've had thus far (and it ain't a good one) is to parse out every character onto a separate row, use a Multi-Row to figure out where the breaks are, and Concatenate them back together. But, the size of the file is quickly going to become an issue. 200 characters across 200 records would give 40,000 rows --- this gets out of hand very quickly, not exactly a viable solution.
Facing a wall, he adds the following:
Assume no absolutes with the column naming conventions. Could have letters, numbers, punctuation, etc. I haven't built iterative marcros and while I want to learn, I'm not convinced that I want to learn today. Looking to do this in a single pass if possible.
Looking at the picture above, do you need instructions from me to interpret the data? Can you figure out how to parse the data visually? Would you explain how to parse the data and lose your audience? While I can't guarantee that my Mom would understand it, I do believe that she'd understand the challenge of receiving a FAX and having to retype the data into a spreadsheet. She'd get that the first row of data needs to become the column headers and that she'd skip the dashes. Each row of data would need to be broken by their visual breaks and the words would be put into the proper column alignments.
I have an advantage here over readers that are not familiar with the CReW macros. I can envisage a solution where I can use a substring() function for each field. There is a "dynamic formula" tool that can be used to create a formula for each of the dynamic columns/fields in the incoming data. At Inspire18, come join @JoeM and myself in an elective course on these macros. But for now, let's explain the solution to my Mom.
When the FAX comes in, maybe you should take out a red pen and a ruler. Starting at the left of the first column of data draw a vertical line. Continue this effort for each column that you see. Now in terms of the position of these lines, if you put an imaginary guide at the top of the data, you might see this:
1 2 3 4 5 6 7 8 9 0
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 |PART NUMBER |PART TYPE |ACTIVATE DATE |------------ |----------- |-------------
The first field begins in position 0 and has a length of 11. I can tell that because the PIPE is in position 0 and there are 11 dashes. The next field starts in position 14 and has a length of 11. I wouldn't explain that finding to my Mom. But my Mom isn't Alteryx. I've got to come up with a way of explaining that pattern to Alteryx. I invite you to watch the following videos to see the solution to the challenge. I also invite you to solve this in a different manner as @david_fetters did.
Thinking through the challenge: 8 min
Creating a workflow/macro:
Building the macro: 30 min
Completing the macro: 8 min
I hope that this post finds you well. Please do feel free to use this or the original post to comment with your feedback/solutions.