Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to Input this Text File?

hydrogurl01
8 - Asteroid

Hi All,

 

I'm having trouble inputting my text file into workable columns into Alteryx. I have used text files in the past, but I guess they were a lot cleaner than what I'm working with now. I have attached an example of my text file formatting to this message. I've tried parsing with the text to columns and just inputting as a text file in the input and starting my input at line 31 which skips the proper rows I want it to, but I'm still having a hard time organizing the columns in the way I'd like. I see that when inputting my data in, I can choose to input with fixed width. However, I can't seem to use fixed width while also skipping lines of my input data. I'm not able to accurately denote what my fixed width would be in the screen since I can't see the actual columns I would like to separate out. Any help would be greatly appreciated.

2 REPLIES 2
Joe_Mako
12 - Quasar

There a few ways to approach files like this. Generally my approach is to first identify and separate the data by the type of content, in your case, you have:
header data, rows 1-29
field names, rows 30-31
field data, rows 33-79
total data, rows 81-90
then we look for ways it dynamically identify these groups, since it is not feasible to identify these rows manually.
field names is the two rows above the row with a bunch of dashes, we can identify that record with an expression like:
StartsWith([Field_1],"---------------------------------------------------------")
header data is then the record before the field names
total data can be identified starting at the record that matches:
StartsWith([Field_1],"Company Total:")
and then field data is the remaining records

 

next, we can identify columns by looking for 2 or more consecutive spaces.
For the field names we can also keep track of the character position so the two records can be concatenated.
For the field data, we can identify the type of data record based on how many values there are, and then shift the data to the correct column. There is more sophistication that can be done if you have a hierarchy of data contents.

 

There could be other things that need to happen to this data like cleaning or aggregation, and in this tall format, you could apply this additional logic, or reshape any way you want.
I am not sure what you want to do with the total data, so I just unioned it in with the field data, crosstabbed it all, and renamed the fields.

 

The three browse tools have the data pulled out of the file. Some of the header data could use some more parsing, but this should get you most of the way there.

 

parse text.png

hydrogurl01
8 - Asteroid

Thanks so much, Joe! This worked perfectly in cleaning up the entire file. I wasn't familiar with using the RegEx in doing this at all, so that was a great thing to learn on top of this! Appreciate you putting this together so quickly! =)

Labels