community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.
New Data Science Blog

Check out the latest post: All Models Are Wrong

READ MORE

Ready. FIRE! Aim. An ACE's Guide to Thinking

Alteryx Certified Partner
Alteryx Certified Partner
PRODUCT: Alteryx Designer 
VERSION: 2018.1 
LAST UPDATE: 05/09/2018 

ATTACHMENT: AutoParse.yxzp 


The challenge:  Parsing a fixed width text file dynamically.

 

Columns of data are contained in a fixed width text file with an unknown quantity of fields. These fields are laid out neatly for printing on state-of-the-art green-bar paper (note: insert sarcasm here). @danrh enlisted the help of the Alteryx Community to solve this "doozy" of a challenge to create a macro that can parse any input file of this structure type.  I responded "Yes, it can be done." I took the time to respond with a 3-part video along with the macro itself. Following that effort, I suggested to my friend @JoeM that he make this post into a weeklychallenge. Now I find myself turning this into a KB article. The purpose of the article is to demonstrate the art of thinking.

 

Explain it to your Mom. I mean no disrespect in that statement. My Mom was very proud of my accomplishments but on most occasions, she really didn't understand what I did.

 

 

 

In this post Dan lists the key to parsing the incoming data as:

The key to knowing the column widths is the second row with the dashes. Two blank spaces separate each field. 

He then asks the question:

Is there a way to take the format of a single row and parse the rest of the data accordingly? 

He thinks of two approaches:

  1. I've poked at it with Regex, but no dice yet.
  2. The only idea I've had thus far (and it ain't a good one) is to parse out every character onto a separate row, use a Multi-Row to figure out where the breaks are, and Concatenate them back together. But, the size of the file is quickly going to become an issue. 200 characters across 200 records would give 40,000 rows --- this gets out of hand very quickly, not exactly a viable solution.

Facing a wall, he adds the following:

Assume no absolutes with the column naming conventions.  Could have letters, numbers, punctuation, etc.  I haven't built iterative marcros and while I want to learn, I'm not convinced that I want to learn today.  Looking to do this in a single pass if possible.

 

image.png

 

Looking at the picture above, do you need instructions from me to interpret the data? Can you figure out how to parse the data visually? Would you explain how to parse the data and lose your audience? While I can't guarantee that my Mom would understand it, I do believe that she'd understand the challenge of receiving a FAX and having to retype the data into a spreadsheet. She'd get that the first row of data needs to become the column headers and that she'd skip the dashes. Each row of data would need to be broken by their visual breaks and the words would be put into the proper column alignments.

 

I have an advantage here over readers that are not familiar with the CReW macros. I can envisage a solution where I can use a substring() function for each field. There is a "dynamic formula" tool that can be used to create a formula for each of the dynamic columns/fields in the incoming data. At Inspire18, come join @JoeM and myself in an elective course on these macros. But for now, let's explain the solution to my Mom.

 

When the FAX comes in, maybe you should take out a red pen and a ruler. Starting at the left of the first column of data draw a vertical line. Continue this effort for each column that you see.  Now in terms of the position of these lines, if you put an imaginary guide at the top of the data, you might see this:

 

           1        2         3         4         5         6         7         8         9         0
01234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789
|PART NUMBER |PART TYPE |ACTIVATE DATE
|------------ |----------- |-------------

The first field begins in position 0 and has a length of 11. I can tell that because the PIPE is in position 0 and there are 11 dashes. The next field starts in position 14 and has a length of 11.  I wouldn't explain that finding to my Mom. But my Mom isn't Alteryx.  I've got to come up with a way of explaining that pattern to Alteryx. I invite you to watch the following videos to see the solution to the challenge.  I also invite you to solve this in a different manner as @david_fetters did.

 

Thinking through the challenge:  8 min

Creating a workflow/macro:

 

Building the macro:  30 min

 

 

Completing the macro:  8 min

 

 

I hope that this post finds you well.  Please do feel free to use this or the original post to comment with your feedback/solutions.

 

Cheers,

Mark

 

 

Contributors