I need to remove duplicate company names, but I need to control which duplicate is kept. For example, for the company below, which shows up 4 times, I need to make sure the Customer domain record is the one kept; VAR would be the next priority, and then Active Leads. How would I do this?
One way to do this is to create a lookup table that lists your domains with a "rank" reflecting their priority (1 for the domain you most want to keep). Join it to your data on the Domain name, then use a Summarize tool to group by "Name" and take the "min" of the Domain Rank. Finally, join the result back to your lookup table to get the Domain associated with that rank.
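For anyone who finds it easier to read as code, here is a rough Python sketch of the same join / group-by-min / join-back logic. It is only an illustration of the idea, not the Alteryx workflow itself; the company names, the sample rows, and the function name dedupe_by_min_rank are made up, and the ranks assume Customer = 1, VAR = 2, Active Leads = 3.

```python
# Hypothetical lookup table: a lower rank means a higher priority.
DOMAIN_RANK = {"Customer": 1, "VAR": 2, "Active Leads": 3}

# Hypothetical sample data as (Name, Domain) pairs.
records = [
    ("Acme", "Active Leads"),
    ("Acme", "VAR"),
    ("Acme", "Customer"),
    ("Acme", "VAR"),
    ("Globex", "Active Leads"),
    ("Globex", "VAR"),
]


def dedupe_by_min_rank(rows, rank_lookup):
    """Return {name: domain}, keeping the best-ranked domain per name."""
    best_rank = {}
    for name, domain in rows:
        rank = rank_lookup[domain]  # "join" the rank onto the record
        # "summarize": group by name and keep the minimum rank
        if name not in best_rank or rank < best_rank[name]:
            best_rank[name] = rank
    # "join back": translate the winning rank into its domain
    rank_to_domain = {rank: domain for domain, rank in rank_lookup.items()}
    return {name: rank_to_domain[rank] for name, rank in best_rank.items()}


result = dedupe_by_min_rank(records, DOMAIN_RANK)
print(result)
```

Note that this returns only the name and the winning domain, which matches what the group-by step produces; any other columns would need to be joined back on afterwards.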
Another approach, a bit similar to @acastelazo's, is to use a priority lookup table, join in the priority field, and then sort on the company name (ascending or descending) and on the priority (ascending if you give the highest priority the value 1, descending if you give it the largest number). Then use the Sample tool to pull only the first record, grouping by company name.
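The sort-then-take-first idea can be sketched in Python as well. Again, this is just an illustration under assumed data: the rows, the record IDs, and the function name first_per_company are hypothetical, and the priorities assume Customer = 1 is the highest.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical priority lookup: 1 is the highest priority.
PRIORITY = {"Customer": 1, "VAR": 2, "Active Leads": 3}

# Hypothetical sample data as (Name, Domain, RecordID) rows.
rows = [
    ("Globex", "VAR", 5),
    ("Acme", "Active Leads", 1),
    ("Acme", "VAR", 2),
    ("Acme", "Customer", 3),
    ("Acme", "VAR", 4),
    ("Globex", "Active Leads", 6),
]


def first_per_company(rows, priority):
    """Sort by name, then priority, and keep the first row of each name."""
    ordered = sorted(rows, key=lambda r: (r[0], priority[r[1]]))
    # groupby needs sorted input; next() pulls the first row of each group
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


kept = first_per_company(rows, PRIORITY)
print(kept)
```

Unlike the group-by-min version, this keeps the whole winning record (here including the record ID), so no join back is needed. Python's sort is stable, so ties on priority keep their original order.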
I like this solution, too! According to The Periodic Table of Alteryx Tools, the Summarize and Sort tools are both blocking tools, which means no records are passed downstream until the tool has processed all of its incoming records. If the data set is larger than what your computer can hold in memory, the engine will have to write to disk before moving on. So there may not be an efficiency gain from one method over the other, but I love that there is always a different way to get to where you need to go with Alteryx!