I receive fixed width text files that I currently parse out one by one. I want to create a macro that auto-parses them. The issue is that the number of columns and the width of individual columns changes from report to report. The one I'm currently looking at comes like so:
The key to knowing the column widths is the second row with the dashes. Two blank spaces separate each field. Is there a way to take the format of a single row and parse the rest of the data accordingly? I've poked at it with Regex, but no dice yet.
The only idea I've had thus far (and it ain't a good one) is to parse out every character onto a separate row, use a Multi-Row Formula to figure out where the breaks are, and concatenate them back together. But the size of the file quickly becomes an issue: 200 characters across 200 records would give 40,000 rows, which gets out of hand very quickly; not exactly a viable solution.
Assume no absolutes with the column naming conventions: they could contain letters, numbers, punctuation, etc. I haven't built iterative macros, and while I want to learn, I'm not convinced I want to learn today. Looking to do this in a single pass if possible.
This is a gem of a problem! I'm onto something here, but I want to know: can I assume that the row containing the field names (as well as the whole document) will always have at least two whitespace characters between columns, and at most one whitespace character between words within a field name?
I'm working on something that creates a regex parse statement based on the number of dashes between white space characters, then injects that formula into a parse tool contained in a macro (since you can only update a tool based on workflow data if you're in the layer above it).
Give me a minute to run with things and I'll see if I can't get you a small workflow.
Ok, there's a bunch wrapped up in this macro. Two key problems: 1) building a regex that knows how many characters need to be in each column, and 2) building the XML code that specifies the right number of columns so that the inner regex parser creates the right number. See if you can work through it with the comments I added. I went through and included some extra tools to trim whitespace from the fields.
I've tested this on a variable number of columns using the dummy text and it seems to work if I remove a column. Let me know what happens if you add to it.
The key to inputting your text file is to set it as delimited using the "\0" delimiter (which means no delimiter at all) and to disable field names. Then be sure to set the max field width wide enough to hold the entire row (I used 10000).
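For anyone curious how the dash-counting regex idea works, here's a minimal Python sketch. The sample text and the two-space separator are assumptions based on the description above; the macro itself does this with Alteryx tools rather than Python. Each run of dashes becomes a fixed-size capture group, and the resulting pattern is applied to every data row:

```python
import re

# Hypothetical sample in the layout described: header, dash row, data.
raw = """Name       Qty    City
---------  -----  ----------
Alice      12     Chicago
Bob        7      Des Moines"""

lines = raw.splitlines()
dash_row = lines[1]

# One run of dashes per column; the run's length is the column width.
widths = [len(run) for run in dash_row.split("  ")]

# Fixed-size capture groups joined by the two-space separator.
# The last column may come up short on a given row, so allow 1..w chars.
groups = [f"(.{{{w}}})" for w in widths[:-1]] + [f"(.{{1,{widths[-1]}}})"]
pattern = "  ".join(groups)

# Apply the generated pattern to each data row and trim the fields.
rows = [
    [field.strip() for field in m.groups()]
    for m in (re.match(pattern, line) for line in lines[2:])
    if m
]
```

This mirrors the two key problems above: the group sizes encode how many characters belong to each column, and the number of groups encodes how many columns the parser should create.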
EDIT: @MarqueeCrew is hosting a WebEx and beat me to the punch. You make it hard to earn stars :)
I encrypted the macro because I'm sending the post to @JoeM as a potential weekly challenge. Hopefully @danrh can verify that it works for his data varieties. I can YouTube the macro. It was fun to create.
Essentially, I use the second row to calculate the maximum field lengths. Then I dynamically substring each record into fields, and finally rename the fields and Auto Field the types.
Substring was my friend. Feel free to PM me and I can send you the macro, or we can review it together. Maybe tomorrow I can create a video.
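For comparison, the substring approach can be sketched like this in Python (the sample data is hypothetical, and the Auto Field typing step is Alteryx-specific and not shown): find each run of dashes in the second row, record its start/end offsets, and slice every record at those offsets.

```python
import re

# Hypothetical sample in the layout described: header, dash row, data.
raw = """Name       Qty    City
---------  -----  ----------
Alice      12     Chicago
Bob        7      Des Moines"""

header, dash_row, *records = raw.splitlines()

# Each run of dashes marks one field's start and end offsets.
spans = [m.span() for m in re.finditer(r"-+", dash_row)]

def split_record(line):
    # Slicing past the end of a short line just yields a shorter field,
    # so ragged last columns are handled for free.
    return [line[start:end].strip() for start, end in spans]

names = split_record(header)  # field names come from the first row
table = [dict(zip(names, split_record(r))) for r in records]
```

One nice property of the substring route over the regex route: a row whose last field runs short (or a trailing field that's entirely blank) never fails to match, because slicing is tolerant of short lines.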
Alteryx ACE & Top Community Contributor