I have a dataset in Alteryx that has 81 fields. In most of the fields, there are concatenated values separated by a pipe. How do I clean the data so that each field keeps only its distinct values and the duplicates are removed?
I attached a dummy example for one customer. Do I need a macro or a RegEx expression to remove duplicate values in a field? My official customer list has 60,000 customers, so the type of data in the fields will vary. I just need to know the most efficient process to remove the duplicates and the pipes from within each field.
Any help you can provide would be TREMENDOUSLY appreciated!
Thank you.
Oh, this is a fun problem to solve! I think you're going to get a variety of ideas from the community.
I'll kick us off and recommend a macro that parses, transposes, groups up (distinct), and then if you need it back in one cell, crosstabs. Loop through each field doing the same.
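To picture the end result outside of Alteryx, here is a minimal pandas sketch that applies the same split / dedupe / rejoin logic cell by cell. The field names and the comma output delimiter are just assumptions for the example, not anything from the attached data:

```python
import pandas as pd

# Dummy data in the same shape as the question: pipe-delimited values per field
df = pd.DataFrame({
    "Customer": ["Acme"],
    "Field1": ["A|B|A|C"],
    "Field2": ["X|X|Y"],
})

def dedupe_pipes(value):
    """Split on '|', keep the first occurrence of each value, rejoin with a comma."""
    if pd.isna(value):
        return value
    parts = [p.strip() for p in str(value).split("|")]
    unique = list(dict.fromkeys(parts))  # dict preserves insertion order
    return ", ".join(unique)

fields = [c for c in df.columns if c != "Customer"]
df[fields] = df[fields].apply(lambda col: col.map(dedupe_pipes))
print(df)
```

The macro approaches below do the same thing with native Alteryx tools.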
@jrlindem I don't have much experience with creating macros. Is there a basic one you can show me using my dataset?
Here is a macro you can use as a starting point. It's an iterative macro that can process any number of fields you have to parse, so if you increase the count of fields beyond 81 you won't have to edit the macro.
Before sending the data to the macro you need to...
1. Transpose it.
2. Give each transposed field its own RecordID column.
In the macro itself, you'll do the following.
1. Use a filter tool to return only one RecordID/field for processing.
2. Use a Text to Columns tool to split the values into rows, using a "|" as your delimiter.
3. Use a unique tool to return unique values from the Value column.
4. Use a Summarize tool to combine unique values for the RecordID and Field grouping. I concatenated the values with a comma.
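For reference, here is a rough pandas equivalent of those prep and macro steps, assuming dummy field names and a comma as the output delimiter. It processes every RecordID at once rather than one per iteration, so treat it as a sketch of the logic rather than what the attached macro runs:

```python
import pandas as pd

# Dummy data for one customer; the field names are placeholders
df = pd.DataFrame({
    "Field1": ["A|B|A|C"],
    "Field2": ["X|X|Y"],
})

# Prep 1: Transpose so every field becomes a Name/Value row
long = df.melt(var_name="Name", value_name="Value")
# Prep 2: give each transposed row its own RecordID
long.insert(0, "RecordID", list(range(1, len(long) + 1)))

# Step 2: Text to Columns, split to rows on "|"
long = long.assign(Value=long["Value"].str.split("|")).explode("Value")

# Step 3: Unique on RecordID + Value
long = long.drop_duplicates(subset=["RecordID", "Value"])

# Step 4: Summarize, grouping by RecordID + Name and concatenating with a comma
summarized = (long.groupby(["RecordID", "Name"])["Value"]
                  .agg(", ".join)
                  .reset_index())
print(summarized)
```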
In the Interface Designer window pane, you'll have to make sure the Iteration Input is the Macro Input that takes in the data from your workflow. The Iteration Output will be the data that doesn't get processed for the current iteration. You'll have to determine what you want to do when your max iteration is hit. In this case it is set to 100.
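If it helps to reason about the loop itself, here is a loose Python analogy (just an illustration, not Alteryx code) of how the iteration behaves: each pass filters one RecordID, processes it, and everything else flows out the Iteration Output to come back in on the next pass, until nothing is left or the maximum iteration count is reached:

```python
import pandas as pd

def process_one(batch: pd.DataFrame) -> pd.DataFrame:
    """Split on '|', keep unique values, concatenate back with a comma."""
    batch = batch.assign(Value=batch["Value"].str.split("|")).explode("Value")
    batch = batch.drop_duplicates(subset=["RecordID", "Value"])
    return (batch.groupby(["RecordID", "Name"])["Value"]
                 .agg(", ".join)
                 .reset_index())

def run_iterative(transposed: pd.DataFrame, max_iterations: int = 100) -> pd.DataFrame:
    remaining = transposed                      # the macro's Iteration Input
    results = []
    for _ in range(max_iterations):
        if remaining.empty:
            break
        current_id = remaining["RecordID"].iloc[0]
        # Filter: only this RecordID is processed on this pass
        results.append(process_one(remaining[remaining["RecordID"] == current_id]))
        # Everything else goes to the Iteration Output and comes back next pass
        remaining = remaining[remaining["RecordID"] != current_id]
    return pd.concat(results, ignore_index=True)
```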
Here is the macro output.
@RCern -
Here's a Batch Macro that gets you there:
The workflow itself flips everything back to horizontal for you too:
Both workflow files are attached. Hope this, and/or @Hsandness's iterative macro version, helps!
Cheers, -Jay
Thank you both! I am going to try both approaches!!!
@Hsandness when I use the iterative macro in my workflow, I get an error that says "Iteration limit of 100 was reached". Is there a way to increase the limit?
@Hsandness - You can increase the number of iterations in the Interface Designer, under "Maximum Number of Iterations" shown here:
Also, regarding @binu_acs's non-macro solution, which is great in and of itself, there is one change I would recommend: move the Record ID tool in front of the first Transpose tool so that the workflow can handle multiple rows of data. In your example, assuming only one row, the way @binu_acs's workflow is configured works fine, but if you have multiple rows, you need the Record ID ahead of the other transform tools. I also threw a Crosstab tool at the end to put the records back into their respective rows:
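If you ever want to sanity-check that non-macro flow outside of Alteryx, here is a hedged pandas sketch of the same sequence (Record ID first, then transpose, split, unique, and a crosstab back to one row per record). The field names and the comma delimiter are assumptions for the example:

```python
import pandas as pd

# Two rows of dummy data, to show why the RecordID has to come first
df = pd.DataFrame({
    "Field1": ["A|B|A", "C|C"],
    "Field2": ["X|X|Y", "Z"],
})
df.insert(0, "RecordID", list(range(1, len(df) + 1)))   # Record ID ahead of the Transpose

# Transpose, split to rows on "|", and keep unique values per record/field
long = df.melt(id_vars="RecordID", var_name="Name", value_name="Value")
long = long.assign(Value=long["Value"].str.split("|")).explode("Value")
long = long.drop_duplicates(subset=["RecordID", "Name", "Value"])

# Crosstab back: one row per RecordID, one column per original field
wide = (long.groupby(["RecordID", "Name"])["Value"]
            .agg(", ".join)
            .unstack("Name")
            .reset_index())
print(wide)
```

Without the RecordID in front, the rows would lose their identity after the transpose, which is exactly what the tool reordering avoids.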
So, between the batch macro, the iterative macro, and the non-macro approach, I think you should be set to pick the one you like best and move forward with your work!
Hope all of this helps. If you get a chance, mark all three as solutions as they all directly address your request. Cheers, -Jay
@Hsandness I have the macro working... partly. The problem I am seeing is that I have a customer with 33 rows. When I run this customer "within" the macro, it produces 26 rows (Screenshot 1 below). However, when I add the macro to my workflow, it produces 25 rows (Screenshot 2). Why would one of the rows drop? Are there any settings I need to adjust?
Also, all 33 fields are going in but not all 33 are coming out in either scenario above. Any idea why this would be?
