Hi all, I'm hoping someone can help me solve my issue. I want to identify rows in my massive dataset that have the same values in just 6 of the columns. I have attached a before and after so you can hopefully see what I want to return. After I identify which lines are duplicates, I then need to remove the duplicate line or lines from my data while keeping the lines that are not duplicates. What tool can I use to get rid of them? If I just filtered by 'yes', it would still bring back the duplicate lines. Any help would be much appreciated :)
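For readers who think better in code, the goal can be sketched in plain Python. This sketch assumes each row is a dictionary and that "removing duplicates" means keeping the first copy of each six-column combination. The attached before/after isn't visible here, so that reading is an assumption. The names `KEY_COLUMNS`, `col1`…`col6` and `drop_duplicates` are hypothetical.

```python
# Hypothetical column names: replace with the six columns you want to compare.
KEY_COLUMNS = ["col1", "col2", "col3", "col4", "col5", "col6"]


def drop_duplicates(rows, key_columns):
    """Keep the first row for each combination of key_columns values and drop later repeats.

    Columns outside key_columns (such as "id") are ignored when deciding what is a duplicate.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Illustrative data only: rows 1 and 2 match on all six key columns.
rows = [
    {"id": 1, "col1": "a", "col2": 1, "col3": 2, "col4": 3, "col5": 4, "col6": 5},
    {"id": 2, "col1": "a", "col2": 1, "col3": 2, "col4": 3, "col5": 4, "col6": 5},
    {"id": 3, "col1": "b", "col2": 1, "col3": 2, "col4": 3, "col5": 4, "col6": 5},
]

print([row["id"] for row in drop_duplicates(rows, KEY_COLUMNS)])  # [1, 3]
```

If instead every copy of a duplicated row should go, you would count each key first and keep only the rows whose key appears once.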
You can use the Multi-Row Formula tool to check. Configure it to create a new field and use an IF statement. In this case, you can use IF [Column] = [Row+1:Column] AND [Column_2] = [Row+1:Column_2] AND … THEN "Duplicate" ELSE "Not duplicate" ENDIF, where the AND … stands for the remaining columns you want to include in the check.
Make sure to group by a unique code that ties the rows together.
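The Multi-Row idea above can be mimicked in plain Python to show what the flag ends up looking like. This is a sketch, not the Alteryx tool itself: it assumes the rows are already sorted so each group is contiguous, and that a row with no next row in its group fails the comparison. The grouping field `code`, the columns `c1`/`c2` (standing in for the six checked columns) and `flag_against_next_row` are hypothetical.

```python
from itertools import groupby


def flag_against_next_row(rows, group_col, check_cols):
    """Mimic IF [Col] = [Row+1:Col] AND ... THEN "Duplicate" ELSE "Not duplicate" ENDIF
    with Group By on group_col: each row is compared only with the next row in its group.

    Assumes rows are already sorted so that each group's rows are contiguous.
    The last row of a group has no next row, so it is flagged "Not duplicate".
    """
    flags = []
    for _, group_iter in groupby(rows, key=lambda r: r[group_col]):
        group = list(group_iter)
        for i, row in enumerate(group):
            has_next = i + 1 < len(group)
            if has_next and all(row[c] == group[i + 1][c] for c in check_cols):
                flags.append("Duplicate")
            else:
                flags.append("Not duplicate")
    return flags


# Illustrative data: "code" ties the rows together; c1 and c2 stand in for the checked columns.
rows = [
    {"code": "A", "c1": 1, "c2": 2},
    {"code": "A", "c1": 1, "c2": 2},
    {"code": "B", "c1": 1, "c2": 2},
    {"code": "B", "c1": 5, "c2": 2},
]

print(flag_against_next_row(rows, "code", ["c1", "c2"]))
# ['Duplicate', 'Not duplicate', 'Not duplicate', 'Not duplicate']
```

In this sketch a run of identical rows always ends in a "Not duplicate" row, so keeping only the "Not duplicate" rows leaves one copy of each run (the last three rows above).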
Alternatively, if there are only two rows in each group, you could pivot the data and then transpose it based on a Mod count (1, 2) of the rows in each group. Then you can either use a Select tool to split them and join them again, OR use a Formula tool to compare them directly plus a Summarize tool to group and concatenate the status (Duplicate/Not duplicate). If "Not duplicate" appears anywhere in a group's concatenated status, that row is not a duplicate; if only "Duplicate" appears, the row is a duplicate.
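The compare-then-concatenate logic above can be sketched in plain Python as well. It assumes exactly two rows per group and treats the pivot step as simply pairing the two rows side by side. The helper names `pair_status` and `classify_groups` and the fields `code`, `c1` and `c2` are hypothetical.

```python
from collections import defaultdict


def pair_status(pair, check_cols):
    """Compare the two rows of one group column by column (as after the pivot,
    when record 1 and record 2 sit side by side), then combine the per-column
    statuses the way a concatenating Summarize step would."""
    if len(pair) != 2:
        raise ValueError("this approach assumes exactly two rows per group")
    first, second = pair
    statuses = ["Duplicate" if first[c] == second[c] else "Not duplicate" for c in check_cols]
    combined = ",".join(statuses)  # e.g. "Duplicate,Not duplicate"
    # One "Not duplicate" anywhere means the pair differs, so it is not a duplicate.
    return "Not duplicate" if "Not duplicate" in combined else "Duplicate"


def classify_groups(rows, group_col, check_cols):
    """Group rows by group_col (the unique code tying them together) and classify each pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    return {code: pair_status(pair, check_cols) for code, pair in groups.items()}


rows = [
    {"code": "A", "c1": 1, "c2": 2},
    {"code": "A", "c1": 1, "c2": 2},
    {"code": "B", "c1": 1, "c2": 2},
    {"code": "B", "c1": 5, "c2": 2},
]

print(classify_groups(rows, "code", ["c1", "c2"]))
# {'A': 'Duplicate', 'B': 'Not duplicate'}
```

Raising an error for groups that are not pairs is a deliberate choice in this sketch, since the two-rows-per-group assumption is what makes the Mod count (1, 2) approach work.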
In addition, though it may differ from your request, you can also use the Cross Tab and Transpose tools to see how many duplicates there are in the columns, whether full or partial duplication. I'm not sure if that's useful, but it's something worth considering and trying. I hope my comments help! I'm currently on my phone, so I can't build something for you.
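The counting idea can be illustrated in plain Python. This is an analogue of what the Cross Tab/Transpose combination would tell you, not a reproduction of those tools. "Full" duplication counts rows that share every checked column, and "partial" counts rows that share a value in one column at a time. The names `duplicate_counts`, `c1` and `c2` are hypothetical.

```python
from collections import Counter


def duplicate_counts(rows, check_cols):
    """Full duplication: how many rows share each complete combination of check_cols.
    Partial duplication: for each column on its own, how many rows share each value."""
    full = Counter(tuple(row[c] for c in check_cols) for row in rows)
    partial = {c: Counter(row[c] for row in rows) for c in check_cols}
    return full, partial


rows = [
    {"c1": 1, "c2": 2},
    {"c1": 1, "c2": 2},
    {"c1": 1, "c2": 5},
]

full, partial = duplicate_counts(rows, ["c1", "c2"])
print([key for key, n in full.items() if n > 1])  # [(1, 2)]
print(partial["c1"][1], partial["c2"][2])  # 3 2
```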
Thanks for taking the time to reply. As I'm really new to Alteryx, I haven't used all the tools you mention. Would you mind attaching a workflow so I can see and use the formula? Thank you in advance!
I've just seen your comment about being on your phone! I'll try my best to build it, but I think I'll get a bit stuck, as sadly I haven't used some of these tools before. Thank you again.
I won't be back for some time as I'm going home now, but if you look up the Multi-Row Formula tool, you can try the first method I mentioned. Otherwise, if you don't mind waiting, I can build something tomorrow.
Someone else could also build on my idea to help you!
@binsell
One way of doing this: find the workflow attached.
Please mark it as done if it solves your problem.
@binsell Here is one way of doing this with a batch macro.
Thank you, it wasn't actually as hard as I thought! Thanks again.