Let's say I have this for an example:
Item description    Invoice #
5 lbs Rice          101
5 lbs Rice          101
5 lbs Rice          101
5 lbs Rice          201
5 lbs Rice          201
5 lbs Rice          201
In our data, the first 3 rows that say "5 lbs Rice" from Invoice 101 are not necessarily duplicates, because they could be white rice, brown rice or yellow rice. But when a later invoice (#201) comes in with the same lines, that likely IS a duplicate, and we want that invoice marked as a dup for further review. I'm still relatively new to Alteryx. I've tried grouping, uniquing and summarizing, but I haven't come up with the proper combination yet. As I'm currently doing it, Alteryx is marking rows 2, 3, 5, and 6 as duplicates, which isn't what I want.
Any ideas? Thanks!
My first thought is to Summarize, grouping by Invoice # and concatenating the item description:
101 -> 5 lbs Rice, 5 lbs Rice, 5 lbs Rice
201 -> 5 lbs Rice, 5 lbs Rice, 5 lbs Rice
Now you can unique on the concatenated field and look up the duplicates.
This would find invoices with the exact same content (order too). Of course it would also find any order of a single item many times. I guess seeing some real data would help.
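For anyone who wants to see the logic outside of a workflow, here is a minimal Python sketch of the same idea: build one concatenated string per invoice, then treat any invoice whose string already appeared on a lower-numbered invoice as a possible dup. The function names and the sample `rows` list are made up for illustration, and it assumes invoice numbers increase over time.

```python
from collections import defaultdict

# Sample rows from the question: (item description, invoice number).
rows = [
    ("5 lbs Rice", 101),
    ("5 lbs Rice", 101),
    ("5 lbs Rice", 101),
    ("5 lbs Rice", 201),
    ("5 lbs Rice", 201),
    ("5 lbs Rice", 201),
]


def invoice_signatures(rows):
    """Group rows by invoice and concatenate the descriptions in row order."""
    grouped = defaultdict(list)
    for description, invoice in rows:
        grouped[invoice].append(description)
    return {invoice: ", ".join(items) for invoice, items in grouped.items()}


def duplicate_invoices(rows):
    """Return invoices whose concatenated content matches an earlier invoice."""
    seen = set()
    dups = []
    # Walk invoices in ascending order so the first occurrence is kept as the original.
    for invoice, signature in sorted(invoice_signatures(rows).items()):
        if signature in seen:
            dups.append(invoice)
        else:
            seen.add(signature)
    return dups


print(duplicate_invoices(rows))
```

Because the comparison happens at the invoice level, the three identical lines inside invoice 101 are never compared with each other.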
Cheers,
Mark
Thank you. I will try that and let you know... or post up what new challenge I have found. :-)
Appreciate the quick response! 
Rohit, I gave that example to try to simplify the kinds of duplicates I'm seeing in my data, so your question is valid, but my situation is actually medical billing data which sometimes gets billed again at a later date. MarqueeCrew's answer actually did work for me, so I'm all good now. Thanks for the response!
Thank you, MarqueeCrew. That worked. The only additional thing I had to do was sort ascending by the invoice number to make sure 101 was okay, but 102, 103, etc. would be flagged as a possible dup.
Good point! Thanks!
Actually, I was wrong. It is still treating my invoice 101 as if the 2nd and 3rd Rice items are dups when I actually only want invoice 201 to be dups. I think this might need a multi-row formula?
Plus, if I start out with 6 rows, I need to have 6 in my output as well.
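One way to picture the requirement above (6 rows in, 6 rows out, only the later invoice flagged) is to decide duplicates per invoice and then attach that flag back onto every original row. This is only a sketch in Python, not an Alteryx recipe: `flag_rows` and the sample data are hypothetical, and it assumes an invoice is a dup when its list of descriptions, in row order, matches a lower-numbered invoice.

```python
def flag_rows(rows):
    """Return every input row with a True/False possible-dup flag appended."""
    # Collect the descriptions for each invoice, preserving row order.
    contents = {}
    for description, invoice in rows:
        contents.setdefault(invoice, []).append(description)

    # Walk invoices in ascending order; the first invoice with a given
    # content is treated as the original, later matches as possible dups.
    seen = set()
    dup_invoices = set()
    for invoice in sorted(contents):
        key = tuple(contents[invoice])
        if key in seen:
            dup_invoices.add(invoice)
        else:
            seen.add(key)

    # Attach the invoice-level flag to each row, so no rows are dropped.
    return [(desc, inv, inv in dup_invoices) for desc, inv in rows]


sample = [("5 lbs Rice", 101)] * 3 + [("5 lbs Rice", 201)] * 3
for row in flag_rows(sample):
    print(row)
```

In workflow terms this resembles working out the dup invoices on a summarized copy of the data and then joining that result back to the original rows, rather than running a unique on the rows themselves.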