The Alteryx Community is a finalist in three 2026 CMX Awards! Help us win Customer Support Community, Most Engaged Community, and User Group Program of the Year - vote now! (it only takes about 2 minutes) before January 9.
ACT NOW: The Alteryx team will be retiring support for Community account recovery and Community email-change requests Early 2026. Make sure to check your account preferences in my.alteryx.com to make sure you have filled out your security questions. Learn more here
Start Free Trial

Alteryx Designer Cloud Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Cloud.
SOLVED

Recipe for deleting duplicate rows based on 1 column (PrimaryKey)

TrifactaUsers
10 - Fireball

We have a dataset with a PrimaryKey, called AddressUuid, see http://prntscr.com/kdcxzo. A job runs every night with new statusses of the same AddressUuid. I want to dedouplicate based on AddressUuid and a date field. I found some documentation but was not able to https://cloud.google.com/dataprep/docs/html/Deduplicate-Data_57344590. Can you give me the correct recipe? 

 

Row 1

AddressUuid xxx

Date 8-1-2018

 

Row 2

AddressUuid xxx

Date 7-31-2018

 

I only want to keep row 1, row 2 should be deleted..

2 REPLIES 2
Trifacta_Alumni
Alteryx Alumni (Retired)

Hi John,

 

it would take two steps to do this. The first is to create a ranking column, the second is to delete values where ranking column not equal to 1. This basically indicates your primary addressUuid (based on most recent date).

 

1) create a new column indicating the rank of each row for each AddressUuid. We will use the rownumber() function for this.

derive value: ROWNUMBER() group: AddressUuid order: -Date

the -Date ensures the column is descending, so more recent date first.

 

2) filter out the duplicates (values greater than 1)

filter type: greaterThan col: Accepts_CreditCards greaterThan: 1 action: Delete

 

Hope this helps!

Trifacta_Alumni
Alteryx Alumni (Retired)

Updated with the following how-to article based on this question: https://community.trifacta.com/s/article/Remove-duplicate-rows-based-on-a-primary-key