Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Issues with a Join

Rosaile
7 - Meteor

Good Morning!

 

I'm having a brain freeze trying to join some data.  Essentially, I have a concatenate that I was using as a unique ID, but I found a handful of instances where my data in the concatenates can change from year to year.  So, my join created additional rows for those instances, but I need them on a single row.  My example is below, and I was wondering if anyone could think of a way I could join this outlier data.  I thought about trying to sort and then have alteryx create a unique id and join on that, but I'm not sure what the best approaches would be.  Could I use Fuzzy Match?

 

Data Stream 1 (Left Join) 
    
YearClient #NameConcatenate
20181John Doe1John Doe
    
Data Stream 2 (Right Join) 
    
Year

Client # (Number has changed for X Reason, but the data is related to left join)

NameConcatenate
20192John Doe2John Doe

 

 

8 REPLIES 8
danilang
19 - Altair
19 - Altair

hi @Rosaile 

 

I'm not quite sure what the output of your join would be.  Are you saying that the 2 records for John Doe should be joined into one row?  I assume that joining on the just the name gives you too many matches.  

 

How do you know that "1John Doe" and "2John Doe" are actually the same record?  If you can quantify what additional information implies that result, you should be able add it to the join.  If it's something like address, then maybe the Fuzzy Match tool can help you.

 

Dan 

Rosaile
7 - Meteor

Sorry, maybe this is a better example.  Thinking of a single business paying property taxes to a specific county each year.  That business may have its account number changed by the taxing jurisdiction from one year to another, and likewise, the business may change its own unique identifier from one year to another or the business may move addresses etc. 

 

Either way, I have to keep track of the current year tax paid and the prior year tax paid, but it has to be reported on a single line both PY and CY.  For most of my report, I can just create a unique ID through concats to satisfy this requirement, but instances where one of my fields used to create the concat change, I end up with more than one line.

 

Data Stream 1 (Left Join)      
        
YearBusiness NameBusiness IDTax ID    
2018John's Business123456    
        
Data Stream 2 (Right Join)      
        
YearBusiness NameBusiness IDTax ID (Changed in New Year)    
2019John's Business123678    
        
        
What I want to see on my report     
        
YearBusiness NameBusiness IDTax ID (Changed in New Year)Input #2 YearInput #2 Business NameInput #2 Business IDInput #2 Tax ID (Changed in New Year)
2018John's Business1234562019John's Business123678
Rosaile
7 - Meteor

As an addendum, both the Business ID or the Tax ID can change, so their really isn't a good unique ID to use.  That's why I was thinking maybe Fuzzy Match or something might be a good tool to use for the outliers.  Unfortunately, I'm not sure how to do that with the Fuzzy Match.

Rosaile
7 - Meteor

Thanks for the assistance @danilang in getting me to think a little more on this. 

 

I decided to try and "waterfall" the joins, if i'm using that term correctly, and join first by one of the fields, and then by next one...............until i pick up all the outliers.

Thableaus
17 - Castor
17 - Castor

Hi @Rosaile 

 

Is the Full Name a good thing to join?

I was thinking about an Append Fields Tool, but if there are many records on both data streams, this wouldn't work well.

 

But appending is a good way to pair data and use other tools to compare columns with each other.

 

Cheers,

Rosaile
7 - Meteor

Thank you for the help @Thableaus .  I do have a lot of records, but when I get down the remaining items I'm having to join I will definitely look into the append tool, as that's one I have not tried out yet.

ThizViz
11 - Bolide

I think you're on the right track with the waterfall joins.

 

One approach we've taken before is to create a crosswalk table. Typically in cases like this, there is some individual who knows for sure what the old an new numbers are - they would probably have to do that in accounting, for instance.

 

So we would make a table, very much like your output example, putting the old and new numbers in the same row, but you only have to do it for the changed items.

If you know for sure what's changed, you can match it up at the beginning instead of the end using a text input and a find/replace.

 

Then when your data proceeds through the rest of your workflow, you can join on the old numbers instead of trying to match old and new.

 

Another benefit of this method is that any additional companies you find at the end that "fall out" of the join can be added to the text input and gradually your unmatched records are reduced as you run the workflow, add your exceptions to the text input, then run again.

 

 

@thizviz aka cbridges, Bolide
http://community.alteryx.com/t5/user/viewprofilepage/user-id/2328
Rosaile
7 - Meteor

Thank you for your feedback @ThizViz .  I'm going to utilize your suggestion and add the "old" numbers in the same row with the new at the beginning of the workflow, so they can be matched up later on.

Labels