Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Filter records given a certain context

Esmeralda_In_Paris
8 - Asteroid

Hi Everyone, 

 

I am dealing with a huge file (more that 300000 rows and more than 200 fields). 

I need to filter the rows that have the same customer, same division and same year.

 

here is a sample file with the 4 fields that need to be filtered. 

Could anyone help me with it? 

thank you 

Esme

8 REPLIES 8
mathieuf
Alteryx
Alteryx

Hi @Esmeralda_In_Paris,

 

I'm not 100% of your need. Is that what you are looking for:

 

mathieuf_0-1639149556820.png

 

_NB
6 - Meteoroid

Use the Unique tool to find unique values from the Customer, Year, and Division fields. Use the Duplicate output to rejoin on the Customer, Year, and Division fields and deselecting the duplicate fields from the join output. There were no duplications in your provided dataset, so I had to create one.

Duplications.jpg

Esmeralda_In_Paris
8 - Asteroid

bonsoir Mathieu

merci pour votre aide

 

good thing I learnt: I did not know I could use a file in Left and right join with the filter tool !  Good to know

 

but unfortunately, the formula [recordID]!=[right_recordID] brings no common ID. 

 

 

 

_NB
6 - Meteoroid

There was one more thing I forgot to mention that I just thought of. Add a GROUP BY on the Customer, Year, and Division fields between the Unique and Join tools before rejoining with the original data. This will create a list of unique duplicates present in the data in case there are more than two instances of this set of data.

Esmeralda_In_Paris
8 - Asteroid

Hi NB, 

thank you for this additional comment. I do not understand what this Group By brings in. Could you please explain? 

mathieuf
Alteryx
Alteryx

Bonjour @Esmeralda_In_Paris 🙂

 

Can you provide a simple sample to tag the lines you want to output?

That would help.

 

Thanks

_NB
6 - Meteoroid

If you have more than two instances of this set of data (more than one duplication) in your dataset, the additional Group By will give you a list of the unique duplicates for the Customer, Year, Division combo. Using this to rejoin will provide a list of unique record IDs that are duplicate. If you rejoin without a Group By, the joined result will use ALL the duplicate results to join on the original dataset making the results (N*(N+1)) large. You can still use another Group By on all fields after the fact, but it will make your workflow much faster if you slim down your data before the join. See attached.

Duplications 2.jpg

 

Esmeralda_In_Paris
8 - Asteroid

Hi @_NB 

thank you so much to explain that clearly the Group By action! 

Esme

 

Labels