I have two input data documents will over 10,000 lines each of which is generated by a separate program and am trying to join the two by reference numbers. However there are slight differences in the reference numbers between the two systems due to how they are input. For example Program A might have FGRT102 and Program B might have FGRT102$ with the only difference being the $ at the end. To add to the difficulty, some reference numbers are 9 digits while others might be only 5 or up to 12. Is there any way to join these?
Hi @MattR79
Find and replace might be of help. It can search in between strings and match. You can also maybe use data cleanse tool to clear special characters.
https://community.alteryx.com/t5/Interactive-Lessons/VLookUps-with-Designer/ta-p/80201
Hope this helps 🙂
you can try fuzzy match - but honestly that's so much better for NL than for ids. Id's vary by 1 digit in 100 and are totally different. words less so.
the best way is to try to figure out the way to edit your primary keys is to have some underlying matching subset of identifiers... Post some sample data and I can explain it - but there has to be a base primary key between some of the two data sets digits.
I tried using find and replace and for some reason it replaced all of my data not just the reference IDs I was trying to do.
Hi @MattR79 ,
It may be tough but you can try the Fuzzy Match tool. Please see attached as an example.
Thanks,
Mike
Apologies, if the workflow is messy, I'm very new to this. I am trying to find the Aliases coming out of the union that may not match exactly with the secondary JPM reference numbers from the top input report and replace those so I can then join them. The top input has the clean reference numbers. I need to ensure that all items between the files match as this is going to be use for balance reconciliation between two systems.
Hi @MattR79 - that's just screen grab of your workflow. Can you post sample data??? I'd really need to look at the formats of the two match columns... If there is a reconciliation you can describe, it can be implemented in Alteryx.
Hi @MattR79
Can you uncheck replace mutiple found.
Can you show us output of the find & replace tool. And sample of alias and reference number.
Here is some sample data similar to what I might have between the two programs. The issue that I have is that the aliases vary in length and the amount of trailing characters.
Program A Alias | Program B Alias |
TTTS-345623 | TTTS-345623$ |
NUSCGS005431 | NUSCGS005431EXT |
P-234571A | P-234571 |
AHKHKI115694 | AHKHKI115694ZZ |
CPCS-03248010 | CPCS-03248 |
1000054274 | 1000054274X |
User | Count |
---|---|
107 | |
82 | |
69 | |
54 | |
40 |