on 09-28-2015 03:31 PM - edited on 09-19-2023 11:40 AM by mwong
You have a delimited file that you need to read into Alteryx but there are an unknown number of data fields. What do you do? (Aside from opening the file and counting out the delimiters by hand)
In this example all the data is coming through in one column of data and it needs to be parsed out to individual columns.
Pretending we didn’t know there are 4 fields we can follow simple workflow to dynamically parse the data into the appropriate number of fields.
Start by adding a Record ID tool to the workflow, then add a Text to Columns tool. Rather than configuring the tool to parse to columns, we’re going split to rows using the delimiter, in this case the ‘|’.
This will take all the data from the one row and put it in individual rows, notice that the RecordID groups everything from the same row together, which will come in handy a little later.
Add Formula tool and create a field call count and set it equal to 1. Now a Running Total tool and configure it to group by RecordID and Create Running Total on the Count Field. This will essentially give us the number of columns in our data.
Two more tools and we are good to go.
Now add a Cross Tab tool, this is where the ‘magic’ happens. Configure to the tool to group by RecordID, set the Head Field to RunTot_Count, Data Field should be set to F1 and then select First under Methodologies.
The Final step is to add a Dynamic Rename tool and configure it to ‘Take Field Names from First Row of Data’. Run the workflow and you’re done!
Happy Parsing! (this example workflow - created in v. 10.0 - is attached)
The "run total" trick! Heavily used in Content Engineering. Great post!!
Great post, I have to deliver a job really similar on this but the content in my column "F1" has many delimiters, some time there is "," or "//" or "--" and I am lost on this, can you guys please help me on this?
Thanks.
Hi @Edu_XGomes
One way to to deal with multiple delimeters would be to use a Formula Tool to replace all the other delimeters with one delimeter so we only need to deal with one particular delimeter character.
In the Formula tool use the below function to update your 'F1' field.
Replace([F1], '//', '|')
In the example above wherever // is found it will be replaced with a | You would just need to set up one formula for each of the delimeters you would need to replace.
Here is example Input data.
Insert a Formula tool after the Input.
Configure the Formula tool to account for all the varried delimeters.
How the data looks coming out of the Formula tool. Everything has the same delimeter.
The bottom workflow is how the updated workflow would look.
Hope this Helps.
This is amazing... alteryx has really good features..
Thanks peter for writing this nice article..
Fantastic! Learned about Running Total Tool
Create Running Total doesn't see Count field from previous step. How do you setup Record ID part? Should starting value be defined or remain as 1?
Hi @mszpot89,
When configuring your Formula tool to create the count field, be sure to set the Data type to a number type like a Double. When creating a new formula the default data type is a V_WString which will not be recognized by the Running Total tool.
Be sure to download the example workflow to see how it is configured.
Thanks for sharing. I've been struggling with same kind of source. Now figured it out.
Beautiful explanation @PeterS. Learning a lot from this post.
Alteryx:
This article has great information but it needs quite a few edits.
A few examples:
there are an unknown ( ?)
we’re going split to rows using the delimiter, in this case the ‘|’. (The comma after the word delimiter creates confusion, although it is a pause, it is also the default delimiter as well as the commas surrounding the pipe) we’re going to split.
Pretending we didn’t know there are 4 fields (there is actually only one field that needs to split into four)
Add Formula tool and create a field call count and set it equal to 1 (Add a Formula tool/create a field called Count)
Configure to the tool to group by RecordID, set the Head Field (Remove the word “to” and change Head to Header)
select First under Methodologies (Under Methodologies select First).
I would also remove the commas from the word magic.
Thank you so much! This was EXTREMELY helpful and the post was well written, detailed, and easy to follow.