Let’s talk Alteryx Copilot. Join the live AMA event to connect with the Alteryx team, ask questions, and hear how others are exploring what Copilot can do. Have Copilot questions? Ask here!
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Join Without Duplicates

jackyy
5 - Atom

Hello,

I am very new to Alteryx and am struggling with this task. I need to join two files without duplicating the data for each organization. I can't share the full files for privacy reasons, but any help is appreciated. I really just need all of the columns as they are in the second image, with a new column added that includes the CONVERTED (TB) values for each org. So, ideally, this client should still have 3 rows for each of its orgs. How can I do this? 

 

This is one of the files I want to join (L). One account can have multiple orgs (subsidiary if you will), but each data is needed.

Data 1 - sub organizations.png

Here is my second set from a different report. I need to add each CONVERTED (TB) value from the first set to the corresponding org/column of the second.

Data 2.png

 

This is what I get when I join by the "SalesForce Name" fields. There are too many duplicates, I just need the 3 rows for each of the orgs with their CONVERTED (TB) value.

Data 3.png

I appreciate any and all feedback!

5 REPLIES 5
jirikrecek
8 - Asteroid

Jacky,

Join tool will by default add records. Similar to SQL outer join may do.

There is a super easy fix to this!
After your Join tool just add a Unique tool and be sure to set it to show only unique records for SalesForceName, Org and Grafana columns.

  • U output anchor of the Unique tool will show you only the unique records - will pick the first record from the set of duplicates
  • D output anchor of the Unique tool will shove all the remaining duplicates (this is the anchor you will ignore.

Any time I use a Join tool I always put a Unique tool right behind it just as a precaution and then verify how many records flowed into the join tool, how many flowed out of it and ensure my U anchor of the Unique tool matches the records coming into the join (usually my Left input)

 

jirikrecek
8 - Asteroid

One more thing:
After closer inspection of your data I need to ask what is the key your are joining by? It appears you may be joining by all three columns together, but pay attention to your Grafana column - the spellings differ!!!

So to amend my suggestion:

  • First (and this is critical) decide what you will be joining by (which column, or combination of columns) is your primary key?
    • You may need to do some data clean up in the right table - some words have spaces, some do not
    • When I looked at this, I'd use the three columns as a single key to find a unique match (SFName, Org and Grafana)
  • Second, once the join works, then add the Unique tool to remove duplicates - the columns there should be the same as those you used in the primary key of the join
jirikrecek
8 - Asteroid

Here is a live example of your problem:

  • Starts with 7 records in left table
  • notice how the join's J anchor shows 18 records coming out, after bringing data from right table
  • Notice how the U anchor of Unique tool brings it back to 7 original records
  • This is why it is a good idea to always use a Unique tool after a join - you never know if the right table will start showing duplicate values for your primary key in the future and while you may not see duplicates now coming out of the join, they may start appearing in the future, without you even touching the workflow. The Unique tool is more of a tollgate to prevent any duplicating of records after a join. For peace of mind.

After Unique tool

Screenshot 2025-02-24 155059.png

Before unique tool and after join

Screenshot 2025-02-24 155045.png

Flow itself

Screenshot 2025-02-24 155034.png

jackyy
5 - Atom

Hi @jirikrecek, thank you for the detailed suggestions, I appreciate it. I tried to set the unique records for SalesForce Name, Org, and Grafana but the results did not include the values for the rest of my numerical columns (which are needed), how can I avoid this?

 

Data 4.png

jirikrecek
8 - Asteroid

@jackyy Jacky, this is what I was mentioning. Look at your column values in the right table - your left table says "ajcdawgnation" in the left table and your right table has "ajc dawgnation".
So, the left table is pulling three rows, but right table has no matching record and backfills it with zeroes.
So you have 2 issues going on at the same time

  • you are seeing duplicates, because right table has more than one matching record
  • you are seeing no data, because your data in right and left tables is dirty and the values do not 100% match

in my previous post I suggested you look at your data BEFORE You do any joining and be sure to clean it up - you can do this in Excel and just look at your Org names. Looks like your Grafana values always match the Org value, so if you do a join you can purely use SFName + Org column combination as your primary key, but.....
you have to clean up the data first, because the extra spaces in your right table is compromising the join = not finding matching records

 

If I were doing this clean up, I'd put the right and left table into Excel, merge the SFName and Org columns into a single string and do VLOOKUPs both ways, from left to right and right to left, looking for any #N/A values in the vlooup where it should have found a match, but didn't. You can do the same in Alteryx, but in Excel it is much much faster to spot the data integrity issues.
Your data is dirty, and that is based on my seeing only a dozen records, so I am assuming there is a pervasive problem with the Org column

Labels
Top Solution Authors