Alteryx Designer Cloud Discussions

4a449ebf5e0b5283b640 · ‎12-08-2017

Trifacta_Alumni · ‎12-20-2017

For a quick check for duplicate values, select a value in the column with your cursor. All matching values are then highlighted, so you can spot duplicates by scrolling up and down the dataset.

To find out how many duplicate values there are in each row:

1. Follow the instructions above to highlight the value.
2. Select the "countpattern" suggestion card that matches your requirement, then click "Add".
3. The new column shows how many matching values there are in each row.

For the total number of duplicates:

1. Select the column newly generated by the "countpattern" transform.
2. Select the "Aggregate" suggestion card that calculates the total number of duplicates.
3. The result is a single number, which is the total number of duplicates.
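
For anyone who wants to sanity-check the numbers outside the tool, the two steps above (a per-row count, then a total) can be sketched in plain Python. This is only an illustration under assumptions: the sample rows, the `count_pattern` helper, and the value "ABC" are made up for the example, and it does not reproduce how the countpattern transform is implemented.

```python
import re

def count_pattern(rows, value):
    """Return, for each row, how many times `value` occurs in it (literal match)."""
    pattern = re.compile(re.escape(value))
    return [len(pattern.findall(row)) for row in rows]

# Hypothetical sample data, not taken from the thread.
rows = ["ABC,DEF,ABC", "DEF", "ABC"]

per_row = count_pattern(rows, "ABC")  # plays the role of the new per-row count column
total = sum(per_row)                  # plays the role of the aggregate step

print(per_row)
print(total)
```

Note that the total here is the number of matches of one chosen value, which is what the steps above produce; it is not a count of every repeated value in the column.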

For more information, see:

https://docs.trifacta.com/display/PE/Deduplicate+Data

4a449ebf5e0b5283b640 · ‎01-03-2018

I'm not sure if this will work for string data. So I basically have a column of customer IDs and I want to make sure that every customer ID only occurs once. I'm not sure how the countpattern transform will accomplish this for me. How do I just see if customer ID "ABC" occurs once, customer ID "DEF" occurs once, etc?

TrifactaUsers · ‎07-31-2018

I have the same question. Did you get a response?

Trifacta_Alumni · ‎08-10-2018

Gina answered the original question, which was how to "tell" or "see" if there is duplicate data in a column. Always remember that the data visible in Wrangler is a sample and may not represent your entire data set, depending on how large that is.
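
For the follow-up question (does each customer ID occur exactly once?), the check itself is simple to express. Here is a minimal sketch in plain Python, assuming the IDs have been exported to a list; the `customer_ids` values are hypothetical, and this is not a Trifacta feature, just a way to see the logic.

```python
from collections import Counter

# Hypothetical sample column; in practice this would be the full exported column,
# not only the sample shown in the tool.
customer_ids = ["ABC", "DEF", "ABC", "GHI"]

counts = Counter(customer_ids)                               # occurrences per ID
duplicates = {cid: n for cid, n in counts.items() if n > 1}  # IDs seen more than once
all_unique = not duplicates                                  # True only if every ID occurs once

print(duplicates)
print(all_unique)
```

Running the check on the full column rather than a sample matters for the reason given above: a sample can look clean while the full data set still contains repeats.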

Actually enforcing uniqueness is also possible: see the Deduplicate Data page in the docs (https://docs.trifacta.com/display/SS/Deduplicate+Data). In the simplest case, whole rows may be duplicated; see the Deduplicate Transform section. More likely, the data will contain multiple, differing rows with the same primary key (e.g., customer ID); see the Deduplicate Rows Based on a Primary Key section. Note that you will probably have to do some normalization and/or sorting of the relevant column(s) first. And of course, under this approach the row with the first instance of a given primary key value wins.
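
To make the "first instance wins" behavior concrete, here is a rough sketch in plain Python. It is an assumption-laden illustration, not the docs' recipe: the `dedupe_by_key` helper, the sample records, and the choice of normalization (trim whitespace, uppercase) are all hypothetical.

```python
def dedupe_by_key(records, key):
    """Keep the first record seen for each normalized key value."""
    seen = set()
    kept = []
    for rec in records:
        # Assumed normalization: trim whitespace and ignore case.
        norm = rec[key].strip().upper()
        if norm not in seen:
            seen.add(norm)
            kept.append(rec)
    return kept

# Hypothetical sample rows: two differing rows share the same customer ID.
records = [
    {"customer_id": "ABC", "city": "Oslo"},
    {"customer_id": "abc ", "city": "Bergen"},
    {"customer_id": "DEF", "city": "Turku"},
]

unique = dedupe_by_key(records, "customer_id")
print(unique)
```

Because the first row for each key is the one kept, the order of the rows decides which version survives. That is why sorting beforehand (for example, newest first) is worth considering.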

TrifactaUsers · ‎08-10-2018

Thanks. We did it like this, and it works.

TrifactaUsers · ‎08-10-2018

image
