Hi Everyone,
I am dealing with a huge file (more than 300,000 rows and more than 200 fields).
I need to filter the rows that have the same customer, same division and same year.
Here is a sample file with the 4 fields that need to be filtered.
Could anyone help me with it?
thank you
Esme
Use the Unique tool to find unique values on the Customer, Year, and Division fields. Then use the Duplicate output to rejoin to the original data on those same fields, deselecting the duplicate fields in the Join output. There were no duplicates in the dataset you provided, so I had to create one.
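For readers outside Alteryx, the same "keep only rows whose key combination occurs more than once" logic can be sketched in pandas. This is a minimal sketch with a made-up toy dataset; only the field names Customer, Division, and Year come from the thread.

```python
import pandas as pd

# Toy stand-in for the real file (which has 300,000+ rows and 200+ fields).
df = pd.DataFrame({
    "RecordID": [1, 2, 3, 4, 5],
    "Customer": ["A", "A", "B", "C", "C"],
    "Division": ["X", "X", "Y", "Z", "Z"],
    "Year":     [2020, 2020, 2021, 2022, 2022],
})

# keep=False marks EVERY row whose (Customer, Division, Year) combination
# appears more than once -- the rows the original poster wants to pull out.
dupes = df[df.duplicated(subset=["Customer", "Division", "Year"], keep=False)]
```

Here `dupes` keeps RecordIDs 1, 2, 4, and 5 (the two duplicated combinations) and drops RecordID 3, which is unique.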
Good evening Mathieu,
Thank you for your help.
One good thing I learned: I did not know I could use a file in both the left and right sides of a Join with the Filter tool! Good to know.
Unfortunately, though, the formula [recordID]!=[right_recordID] returns no matching IDs.
One more thing I forgot to mention: add a Group By on the Customer, Year, and Division fields between the Unique and Join tools before rejoining with the original data. This creates a list of the unique duplicated key combinations present in the data, in case there are more than two instances of the same set of values.
Hi NB,
Thank you for this additional comment. I do not understand what this Group By adds, though. Could you please explain?
Hello @Esmeralda_In_Paris 🙂
Can you provide a simple sample and tag the lines you want in the output?
That would help.
Thanks
If there are more than two instances of a given Customer/Year/Division combination in your dataset (i.e., more than one duplication), the additional Group By gives you one row per duplicated combination. Rejoining on that list yields a clean list of the record IDs that are duplicates. If you rejoin without the Group By, the join uses ALL of the duplicate rows against the original dataset, blowing the result up to roughly (N*(N+1)) rows. You can still collapse the result with another Group By on all fields afterwards, but slimming the data down before the join will make your workflow much faster. See attached.