I have two input data documents will over 10,000 lines each of which is generated by a separate program and am trying to join the two by reference numbers. However there are slight differences in the reference numbers between the two systems due to how they are input. For example Program A might have FGRT102 and Program B might have FGRT102$ with the only difference being the $ at the end. To add to the difficulty, some reference numbers are 9 digits while others might be only 5 or up to 12. Is there any way to join these?
I will give this a shot and see what I come up with. Thank you.
Here's an alternative version... This is the usual id match strategy I use. I haven't seen @Jean-Balteryx 's workflow - but I would expect that his works perfectly for the data you have provided. Obviously with the limited data we can't tell what other variations may occur in the full set.
Hi @MattR79
My suggestion is based on the assumption that the main difference is the trailing characters. That is, one alias is always a subset (from the left) of the other.
The main idea is to use the shorter form (regardless in which Program Alias) to be the key value to lookup.
1. a. Find within Program B Alias for values matching Program A Alias, based on the beginning of the field.
And output the Program A Alias - this is Program A Alias2 in the output (screenshot below)
b. Keep records whose Program A Alias2 is not null
c. Use select tool to exclude (original) Program A Alias and Rename Program A Alias2 to Program A Alias.
2. Repeat step 1, using Program B as the reference.
a. Find within Program A Alias for values matching Program B Alias, based on the beginning of the field.
And output the Program B Alias - this is Program B Alias2 in the output
b. Keep records whose Program B Alias2 is not null
c. Use select tool to exclude (original) Program B Alias and Rename Program B Alias2 to Program B Alias.
3. Union the 2 outputs from Step 1 & 2.
Create ID. You can then add the ID to each the 2 datasets, and subsequently join the 2 datasets based on this ID.
Hope this helps.
Cheers,
Lelia
User | Count |
---|---|
107 | |
82 | |
69 | |
54 | |
40 |