Row replicating many times after a join??

I'm currently doing a join based on a customer ID. There is 1 row of Customer #1 on the left input, and 1 row of Customer #1 on the right input. I am using customer ID to join. However, somehow on the JOIN output, it creates 8 records for Customer #1, and only 1 of those rows is accurate (the rest are filled with zeroes).

Why/how is this happening? And how do I fix??

Help

Join

Accepted answers

All comments

atcodedog05

Hi @ewall

Can you provide sample data or snapshot of whats happening?

Jean-Balteryx

Hi @ewall ,

Can you share your workflow or your input files ?

apathetichell

is it possible that there are actually 8 instances of customer 1 in one of your joins - perhaps with nulls() or something in the rest of the field? It would seem quite odd otherwise. Can you maybe do a count of that value prior to the join? The most logical explanation is that one of your data sources as 8 entries, or one has 4 and the other has 2.

ewall

@atcodedog05 @Jean-Balteryx I can't share the data...but setup is as follows:

Summarize tool gives me Customer ID, Customer Name, Category, and 3 amount buckets. I then do a cross tab to create columns for the 2 possible categories within each $ bucket. I then do a multiple join to bring them back together. So, at this point I now have 1 record for Customer #1, with 6 columns with $ amounts (let's call them Customer Amount A, Customer Amount B, Customer Billing A, Customer Billing B, Customer Quoted A, Customer Quoted for the LEFT input.

The RIGHT input is Customer ID, Customer Name, and one amount bucket (not split out by category). There is ONE row for Customer #1 coming into the join.

I then join by Customer ID, and it spits out 8 rows.

ewall

@apathetichell I've already confirmed there is only 1 row for the customer ID on each side of the join.

Jean-Balteryx

Have you tried adding a unique tools after each join input to test if it produce a different result ?

apathetichell

something is going on with your multiple join and you are creating additional records. Sorry I know you think I am wrong and that this isn't the case but too many records coming out of join are a symptom of multiple records from one of the streams with whatever you are joining on. Without seeing your data I can't pinpoint it, but I'd suggest dropping the multiple join and using a common join and monitoring what is coming in in the output...

danilang

Hi @ewall

What you're describing here is very unusual. In standard cases Join cases, when you have only one record on either side you only get one one matched record coming out of the join. Since this is very unexpected behavour, the only way that we will be able to debug it is to be able to reproduce the error on our side. I understand that you may have confidentiality restrictions around posting your original data, but can you post a couple of dummy data table that produces the same result?

Dan

ewall

@apathetichell good idea. I'll try using multiple common joins to see if I can isolate anything.

In the meantime, leveraging an indexed column and the unique tool, I seem to have band-aided the problem for my immediate need but will work on this suggestion!

ewall

@danilang I'll see if I can reproduce it using generic data!

apathetichell

I'll reiterate that join multiple does some weird stuff - keep in mind that it always creates the largest number of matching records in a datasource... So if 150 records match on one data source and 1 on another and 1 on another etc -- you'll end up with 150 records and a lot of nulls.

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15866

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703