I have a dataset similar to the following:
ID | Region | Name | Order Count |
1234 | West | Bob Jordan | 12 |
1234 | West | Bob M. Jordan | 9 |
1234 | North | Bobby Jordan | 5 |
1526 | North | Jim Smith | 22 |
1526 | North | James Smith | 15 |
1055 | West | John Doe | 28 |
As you can see, we have the same ID for someone, but slightly different iterations of their name.
What I am trying to do is find the 2 unique Regions with the highest order counts by ID. So, for ID 1234, I should get a result of West region and North region with their corresponding order counts of 12 and 5 respectively.
Currently, I am sorting on ID and then Order Count and then using the Sample tool to give me the first 2 rows by ID, but since there are different spellings/iterations of the names, it is giving me West region twice.
Essentially, I would like to use something like the Unique tool, but tell it to give me the first 2 unique instances of Region by ID. But, I can't figure out how to do that.
Solved! Go to Solution.
Hey @arnettmh, it sounds like you're on the right track with the sorting/sampling method. For your unique configuration, you could just tick the following 2 fields in order to give you unique ID-Region pairs:
@arnettmh another way of doing this
1. add summarise tool, group by ID, region, take max order count
2. Add join tool, join with ID, Region, order count with max order count
Thanks! That worked. I didn't realize you could select multiple fields in the Unique Tool.
A secondary ask if I may. Now that I have the top 2 regions for each ID, I want to label each row with an extra field/column. So, for the top one, it would be labeled as Primary and for the second most sales, it would have a label of Secondary. What is the best way to go about that?
Example:
ID | Region | Name | Order Count | Label |
1234 | West | Bob Jordan | 12 | Primary |
1234 | North | Bobby Jordan | 5 | Secondary |
1055 | West | John Doe | 28 | Primary |
Hey @arnettmh, glad to hear that worked! For your latest ask, I'd go about it like this:
We group by ID and because there'll only be 2 records in each ID, we can tell Alteryx to look back one row - for the first entry, there won't be a previous row, hence we set 'Values for Rows that don't Exist' to NULL and then assign 'Primary' based on the previous row being null. If there is a previous row in that ID, that means we assign 'Secondary'. Hope this helps!
DataNath, thank you so much. That worked beautifully.