Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Complex removing duplicates

m_hohaus
6 - Meteoroid

Hello Community, 

 

here's a hard one for you: I have a dataset with two columns with account numbers. Whenever either of the account numbers is the same for two rows, I need to establish a charging rule. Accounts are charged -1 if they are inactive for each record but they are charged only once if they are active (no matter how many rows there are that fulfill this). If the records are both mixed between active and inactive, the account is charged -1 for each inactive, and just 1 for each active. 

 

Do you have any easy way to do this? Ideally, I'd have a new column "charge code" with 1 or -1 for each records, obviously excluding doubles as stated above..

 

Thank you!

7 REPLIES 7
wdavis
Alteryx
Alteryx

Hi @m_hohaus 

 

Would you be able to share some sample data of what the inputs for these would look like, then we can build a workflow to replicate your challenge.

 

In terms of the active/inactive status, is there anything in the dataset to determine this? Or is it based on the number of rows with the same account number?

 

Thanks

Will

LukeM
Moderator
Moderator

Hi @m_hohaus 

 

The first step I would suggest is to download the Crew Macro pack as there is a tool in their called Only Unique. This will filter and provide you for the Account Numbers where you have duplicates.

 

I'd then create a column that flags if they are active. Then you can Summarize and count how many are active for that account number and then build you logic off that, using the formula tool.

 

As my esteemed colleague @wdavis says, it may be easier if you post a dummy example if you are still stuggling.

 

Luke

m_hohaus
6 - Meteoroid

Thank you Will! I have shared a sample of how the data would look like. The active/inactive status is based on the column "status" - this is attributed to each e-mail in the system. Multiple e-mails can be linked to one account (which is why we want to charge only 1 to the account, and not for each e-mail address). 

 

Thank you for your help!

m_hohaus
6 - Meteoroid

Hello, am downloading this now and will let you know! Thank you for your support!

wdavis
Alteryx
Alteryx

Hi @m_hohaus 

 

I have started a workflow to try and replicate this.

 

The formula tool is used to assign an initial charge code based on their status using an IF statement

 

The Unique Tool is then used to split the data into Uniques and Duplicates based on the Acc_number 1 field and Status

- if the values are Unique they are left

- if the values are Duplicates, the charge code is changed to 0

 

Does this start to achieve what you are looking to do?

 

I did have some questions around the difference in Acc number 1 and 2, do these have to both be unique when comparing the data?

 

Thanks

Will

m_hohaus
6 - Meteoroid
Hi Will, thank you for that, that makes a lot of sense!

Yes, if either of the account number is double it should not be charged. I
tried using two unique tools to test first for one account and then the
other, the problem is that sometimes one of the account numbers is [null]
and therefore with this method all the nulls land up in the duplicates
bucket. Should I use an IF-Formula before the unique tool or is there
another way to change the settings of the unique tool?
wdavis
Alteryx
Alteryx

No problem, I have attached a workflow that hopefully solves the two issues you mentioned.

 

The Tranpose tool puts all of your account numbers into one field so that you only need to use one Unique tool. If the data needs to be back in the format of the 2 different fields, then we can use a cross-tab tool to achieve this.

 

For the Null values, I have added some into the dataset in the workflow and then used a Filter tool to remove these out, so they are not included in any unique calculation.

 

Let me know if this works for you.

 

Thanks

Will

Labels