Hello Team,
I want to import a set of files from a folder, where a new file is added every week. The format of the file is as attached.
The .txt batch file has junk/log info at the top, then multiple data blocks that all start with
|SGrp||Submission| (i.e., the header line). When I try to import the file, everything comes in as one column, and the header splits across two rows - here rows 13 and 14 are both part of the header. The data rows below them split in two the same way.
I want to import this set of files so that the data points line up under each header, with no log lines or special characters. I have just started using Alteryx and have not faced this kind of file before; any help will be appreciated.
I would start with one file and read the whole file in as a CSV with \0 as the delimiter. This should put the whole file into a single cell. Parse it to rows using Text to Columns with a newline (\n) delimiter. Filter out the garbage rows, then parse to columns using the pipe delimiter. Once you have that file parsed out, it can be turned into a batch macro.
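If it helps to see that same logic outside Designer, here's a rough Python sketch of the approach - purely illustrative, with the file name and the "junk rows contain no pipes" test being assumptions about your data:

```python
# Rough Python equivalent: whole file in, split to rows, filter junk,
# then split each surviving row on the pipe delimiter.
rows = []
with open("weekly_batch.txt", encoding="utf-8") as f:  # hypothetical file name
    for line in f.read().split("\n"):                  # parse to rows on \n
        if "|" not in line:                            # assumed: junk/log rows contain no pipes
            continue
        rows.append(line.split("|"))                   # parse to columns on |
```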
https://knowledge.alteryx.com/index/s/article/Getting-Started-with-Batch-Macros-1583461640393
https://help.alteryx.com/current/en/designer/apps-and-macros/macros/batch-macro.html
That was a fun challenge! Here is my take on it, but it's not dynamic, so if the structure of the file (or any of the assumptions made about filtering etc.) changes, it will break:
First, import it as a CSV with no delimiter, starting the data import on line 13, and extend the Field Length so nothing gets cut off.
You then want to filter out any rows that are not required, based on the patterns you can identify in them.
Then generate a row number: if the row starts with a pipe it's 1, else it's 2. Split on that row number with a Filter tool, then join the two streams back together by record position.
Using a Formula tool, combine Field_1 and Right_Field_1, then use a Text to Columns tool to split on | into 66 columns (this is where it will break if the number of columns differs between files).
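For anyone curious, the row-number-and-join trick boils down to pairing each pipe-starting line with the continuation line after it. A Python sketch of the same idea, assuming (as in the sample file) every record is split across exactly two lines:

```python
# Stitch each record back together: a line starting with a pipe is
# "row 1" (start of a record); anything else is "row 2" (its continuation).
def stitch(lines):
    records = []
    for line in lines:
        if line.startswith("|"):       # row number 1: new record
            records.append(line)
        elif records:                  # row number 2: append to the previous record
            records[-1] += line
    return [r.split("|") for r in records]

sample = ["|SGrp||Submission|ColA", "|ColB", "|1||abc|2", "|3"]  # made-up lines
print(stitch(sample))
```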
Take the headers from the first row of data using a Dynamic Rename tool.
Remove any columns not required, run data cleansing to remove the duplicate whitespace etc., and finally filter out any rows that start with SGrp to remove the repeated headers.
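Those last steps look roughly like this in Python terms - again just a sketch, assuming `rows` is the list of split records from the earlier sketch and that repeated header rows carry a literal SGrp cell:

```python
import re

def clean(cell):
    # collapse duplicate whitespace, mirroring the data cleansing step
    return re.sub(r"\s+", " ", cell).strip()

header = [clean(c) for c in rows[0]]        # promote the first record to headers
data = [[clean(c) for c in r] for r in rows[1:]
        if "SGrp" not in r[:2]]             # drop rows that are repeated headers
```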
I'd double check the output to make sure it aligns!
Hello davidskaife,
Thank you for the solution. Though it is not dynamic, it solves 90% of the problem. I can work on making it dynamic, but your solution helped with the main problem.
You're very welcome. If you need help making it dynamic please do share the details and the Community can rise to the challenge!
As my workflow helped you, would you mind accepting it as the solution so others may find and use it?
Great help! I have a folder with multiple files like this, and every week a new file is added to it. While importing, I want to make this dynamic so that I do not need to specify the "Start Data Import on Line" value, as it might not be the same for every file.
Can you share the input file again?
Thanks! I've updated the workflow - it now starts importing on line 1, and I have added additional clauses to the Filter tool to remove any of the preceding rows that are not required.
If other files have different data in the initial rows then you would need to add those in as filter conditions too, so there is still an element of manual updating (if it's needed).
If you want it truly dynamic, I've also included another way, utilising macros (the second stream coming from the Input tool in the screenshot). Essentially, the first macro identifies the row number where the first instance of |SGrp appears, and passes that to a second macro which updates the 'Start Data Import on Line' setting and then carries on as normal.
Inside both macros (to open one, right-click on it and select Open Macro) you will need to update the Macro Input and Input tools to point to your specific folder location, as they currently point to my Downloads folder. I've used a wildcard instead of the file name so it will pick up any text file in the folder you specify.
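What that first macro computes is, in effect, what this little Python sketch does (hypothetical file name; it reports a 1-based line number, which is what the 'Start Data Import on Line' setting expects):

```python
# Report the 1-based line number of the first |SGrp header, i.e. the
# value the "Start Data Import on Line" setting needs.
def first_header_line(path):
    with open(path, encoding="utf-8") as f:
        for n, line in enumerate(f, start=1):
            if line.startswith("|SGrp"):
                return n
    return 1  # no header found: fall back to line 1

print(first_header_line("weekly_batch.txt"))  # hypothetical file name
```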
Hope this helps!
