Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Join/union duplicating records

J054305
6 - Meteoroid

Hello.

hoping someone can help.

im trying to connect a username table to a target data table,

joining by the username. The hierarchy has different variations of the username with their IDs. Example:

Mike Smith   #22309

Michael Smith #22309

 

with some having as many as 5 different variations of their name. The problem I’m having is I have the target table that has about 300,000 records and out of those only 120,000 have that username to match up to, the other 180,000 are just blank and I want the username and ID to remain blank for those 180,000. So I want to just append to those 120,000 records by matching by the username to get the user IDs. end up with 300,000 records when all done.

When I try to use the join and union function, or even add in the unique function I end up with over 2million records. 

any help is greatly appreciated!!

6 REPLIES 6
Qiu
20 - Arcturus
20 - Arcturus

@J054305 
I think you are talking about 1VS many Join, and it will produce times of orginal record.

Make sure the unique field is selected in the unique tool.

A sample data set would help us to understand the issue better.

J054305
6 - Meteoroid

I attached an excel file of sample data - with three tabs - the username tab, data tab, and the final results tab what I want to achieve. also second attachment is a snapshot of how I'm making the connection right now. but its making my file much larger.

thank you!

AbhilashR
15 - Aurora
15 - Aurora

Hi @J054305, I might be oversimplifying your ask but is the approach in my solution address your problem statement?

AbhilashR_0-1611635374767.png

I join the two datasets and then apply a Union tool to bring back the records that remained un-joined.

 

mhey01
7 - Meteor

Hey @J054305,

 

Looking at your image, it looks like you need to include the 'Unique' tool or a 'Summarise' tool after your first join (illustration below).

 

join.png

Would it be possible for you to share your workflow with us?

echuong1
Alteryx Alumni (Retired)

An increase in records means there is a many-to-many relationship, which will cause what's called a cartesian join. It will essentially output all combinations, which is why you'll get more records than you started with.

 

I recommend using fuzzy matching, since you can have name variations in both of your datasets and this may lead to a mismatch. In the example below, I started by creating a list of every username in both tables with a Union. I then used a Fuzzy Match and Make Groups to get every name in the list, and the variations of it. The Key corresponds to the variations, and the Group is what we want them to be normalized to. I then used a Find and Replace on both datasets to look for variations of the keys, and replace the value with the normalized group. To ensure I didn't have the variation issue, I used a Unique on the ID value. I then used a Join to bring over the ID values.

 

echuong1_1-1611700103876.png

 

echuong1_0-1611699922975.png

 

brandonstilley
5 - Atom

This helped me a ton! I was performing a join and union to add back in the record that fell out, and this took care of it easily for me. In some cases I had 5-7 duplicates and it was super frustrating!

Labels