Bring your best ideas to the AI Use Case Contest! Enter to win 40 hours of expert engineering support and bring your vision to life using the powerful combination of Alteryx + AI. Learn more now, or go straight to the submission form.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Joining or Matching non-identical fields

MattR79
8 - Asteroid

I have two input data documents will over 10,000 lines each of which is generated by a separate program and am trying to join the two by reference numbers.  However there are slight differences in the reference numbers between the two systems due to how they are input.  For example Program A might have FGRT102 and Program B might have FGRT102$ with the only difference being the $ at the end.  To add to the difficulty, some reference numbers are 9 digits while others might be only 5 or up to 12.  Is there any way to join these?

22 REPLIES 22
MattR79
8 - Asteroid

I will give this a shot and see what I come up with.  Thank you.

apathetichell
20 - Arcturus

Here's an alternative version... This is the usual id match strategy I use. I haven't seen @Jean-Balteryx 's workflow - but I would expect that his works perfectly for the data you have provided. Obviously with the limited data we can't tell what other variations may occur in the full set.

ncrlelia
11 - Bolide

Hi @MattR79 

 

My suggestion is based on the assumption that the main difference is the trailing characters. That is, one alias is always a subset (from the left) of the other.

The main idea is to use the shorter form (regardless in which Program Alias) to be the key value to lookup.

 

1. a. Find within Program B Alias for values matching Program A Alias, based on the beginning of the field.

        And output the Program A Alias - this is Program A Alias2 in the output (screenshot below)

    b. Keep records whose Program A Alias2 is not null

    c. Use select tool to exclude (original) Program A Alias and Rename Program A Alias2 to Program A Alias.

        ncrlelia_1-1627977399872.png

 

 

2. Repeat step 1, using Program B as the reference.

    a. Find within Program A Alias for values matching Program B Alias, based on the beginning of the field.

        And output the Program B Alias - this is Program B Alias2 in the output

    b. Keep records whose Program B Alias2 is not null

    c. Use select tool to exclude (original) Program B Alias and Rename Program B Alias2 to Program B Alias.

 

3. Union the 2 outputs from Step 1 & 2.

    Create ID. You can then add the ID to each the 2 datasets, and subsequently join the 2 datasets based on this ID. 

    ncrlelia_2-1627977629690.png

 

Hope this helps.

 

Cheers,

Lelia

 

 

 

Labels
Top Solution Authors