Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Community is experiencing an influx of spam. As we work toward a solution, please use the 'Notify Moderator' option on the ellipsis menu to flag inappropriate posts.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Row replicating many times after a join??

ewall
7 - Meteor

I'm currently doing a join based on a customer ID. There is 1 row of Customer #1 on the left input, and 1 row of Customer #1 on the right input. I am using customer ID to join. However, somehow on the JOIN output, it creates 8 records for Customer #1, and only 1 of those rows is accurate (the rest are filled with zeroes).

 

Why/how is this happening? And how do I fix??

11 REPLIES 11
atcodedog05
22 - Nova
22 - Nova

Hi @ewall 

 

Can you provide sample data or snapshot of whats happening?

Jean-Balteryx
16 - Nebula
16 - Nebula

Hi @ewall ,

 

Can you share your workflow or your input files ?

apathetichell
19 - Altair

is it possible that there are actually 8 instances of customer 1 in one of your joins - perhaps with nulls() or something in the rest of the field? It would seem quite odd otherwise. Can you maybe do a count of that value prior to the join? The most logical explanation is that one of your data sources as 8 entries, or one has 4 and the other has 2.

ewall
7 - Meteor

@atcodedog05 @Jean-Balteryx I can't share the data...but setup is as follows:

 

Summarize tool gives me Customer ID, Customer Name, Category, and 3 amount buckets. I then do a cross tab to create columns for the 2 possible categories within each $ bucket. I then do a multiple join to bring them back together. So, at this point I now have 1 record for Customer #1, with 6 columns with $ amounts (let's call them Customer Amount A, Customer Amount B, Customer Billing A, Customer Billing B, Customer Quoted A, Customer Quoted B) for the LEFT input.

 

The RIGHT input is Customer ID, Customer Name, and one amount bucket (not split out by category). There is ONE row for Customer #1 coming into the join.

 

I then join by Customer ID, and it spits out 8 rows.

ewall
7 - Meteor

@apathetichell I've already confirmed there is only 1 row for the customer ID on each side of the join.

Jean-Balteryx
16 - Nebula
16 - Nebula

Have you tried adding a unique tools after each join input to test if it produce a different result ?

apathetichell
19 - Altair

something is going on with  your multiple join and you are creating additional records. Sorry I know you think I am wrong and that this isn't the case but too many records coming out of join are a symptom of multiple records from one of the streams with whatever you are joining on. Without seeing your data I can't pinpoint it, but I'd suggest dropping the multiple join and using a common join and monitoring what is coming in in the output...

danilang
19 - Altair
19 - Altair

Hi @ewall 

 

What you're describing here is very unusual.  In standard cases Join cases, when you have only one record on either side you only get one one matched record coming out of the join.  Since this is very unexpected behavour, the only way that we will be able to debug it is to be able to reproduce the error on our side.  I understand that you may have confidentiality restrictions around posting your original data, but can you post a couple of dummy data table that produces the same result?

 

Dan  

ewall
7 - Meteor

@apathetichell good idea. I'll try using multiple common joins to see if I can isolate anything. 

 

In the meantime, leveraging an indexed column and the unique tool, I seem to have band-aided the problem for my immediate need but will work on this suggestion!

Labels