Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Compare equal fields & other fields to filter spreadsheet

MKM
7 - Meteor

Hello! I'm new to Alteryx, so apologies if this is a repeat/dumb question!

 

I have an Excel spreadsheet that contains fields that include date, names, and statuses. The spreadsheet is pretty messy. It has multiple entries that are identical, but it also has nearly identical entries where a date field or a status could be different. What I'm trying to do is filter out those that are either identical dupes (keeping just one) AND filter out those that are slightly different and only keep the relevant one (be it on status and//or date, since both could be the driver of the difference, or together they both identify which record to keep). 

 

For example, one person could have two entries with the same name, date, but one entry could be "pending" and the other "approved." I'd like to be able to filter out "pending" since I'd want to keep approved when the information is the same. 

 

Is this type of filtering possible to do in Alteryx? I've mainly been trying in Excel, but it is super time consuming for a spreadsheet we receive and need to do this for every day, so I was hoping this could be done in Alteryx and save the expression for continual use

 

Thanks!

6 REPLIES 6
Joe_Mako
12 - Quasar

pri.png

 

One option is to use the Formula too to add a field, lets call it "Priority", with the expression:

Switch([Status],0,
"Pending",1,
"In Progress",2,
"Approved",3)

 

This will assign a number to each "Status" value (larger values are higher priority and will be kept before lower priority values), then a Sort tool with Priority sorted Descending, and a Sample tool to select the first record per Date-Name combination Group.

MKM
7 - Meteor

Thanks Joe_Mako! I'm pretty sure this is working how I wanted, though I'm not sure I'm doing the last part with the "Sample" tool right. I selected the "First N Records" when N =1, but I'm still not sure which combo of fields to pick in the "Sample" section. If I have records, based on ID number that appear multiple times and I want to pick to only keep the record with the most recent date & (thanks to this new field) the higher priority number, which fields would I select in the "Sample" section? 

 

So far, my selecting options either removes almost every row or seemingly keeps nearly every row, so I know I'm doing something wrong still. 

 

Thank you for the help!!!

Joe_Mako
12 - Quasar

Depends on what is more important to you, Date or Priority. For example, with this sample data, what should be returned?:

DateNameStatusPriority
2017-01-02BPending1
2017-01-04BIn Progress2
2017-01-03BApproved3

 

The attached has two routes depending on what is more important.

MKM
7 - Meteor

Date would be more important in the below example. We want the most recent status, but if the dates were the same, then the priority would be next important. 

 

I couldn't open either attachment, I guess my version is too old. Would you mind including a screenshot? Sorry for the inconvenience, especially considering how much you're helping.

 

Also, since you obviously know Alteryx. If i install the newer version (11.5 it seems, I have 11.0), will my existing workflow still work fine?  

Joe_Mako
12 - Quasar

You can edit the file as a text file, changing the version umber in the XML. see https://community.alteryx.com/t5/Alteryx-Knowledge-Base/Adjusting-Alteryx-Files-for-Different-Versio...

 

This is done in the attached.

MKM
7 - Meteor

Ah thanks for that link. And I think it worked!! Once I cleared out some of the blank spaces in the IDs, it removed all the dupes. There were more unique than I expected, which was unexpected but I guess that's the point of clean data :)

 

Thank you for the help Joe_Mako!!

Labels