Bring your best ideas to the AI Use Case Contest! Enter to win 40 hours of expert engineering support and bring your vision to life using the powerful combination of Alteryx + AI. Learn more now, or go straight to the submission form.
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Joining or Matching non-identical fields

MattR79
8 - Asteroid

I have two input data documents will over 10,000 lines each of which is generated by a separate program and am trying to join the two by reference numbers.  However there are slight differences in the reference numbers between the two systems due to how they are input.  For example Program A might have FGRT102 and Program B might have FGRT102$ with the only difference being the $ at the end.  To add to the difficulty, some reference numbers are 9 digits while others might be only 5 or up to 12.  Is there any way to join these?

22 REPLIES 22
apathetichell
20 - Arcturus

would it make sense to try to just match the first 5 or 6 characters from each?

apathetichell
20 - Arcturus

Also if Program A is always included in Program B there are some ways you can do a wildcard non-join match... You can have both in a macro where he Program A is the macro Input and the B is a static input. It can then search if value A+wildcard is is Program B Alias....

MattR79
8 - Asteroid

Unfortunately due to the varying length and the fact that there may be 10 or 15 lines all of which have the same first few digits that is not an option here.

MattR79
8 - Asteroid

Is that something you could show me?  I am pretty new to this and would have no idea how to set that up.

MattR79
8 - Asteroid

Is that something that you could show me an example of?

Jean-Balteryx
16 - Nebula
16 - Nebula

Hi @MattR79 ,

 

Can your values have more than 1 trailing characters ?

apathetichell
20 - Arcturus

Post some sample data and an example of the output you would want. You also kind of need an arbitrary number of characters where you deem the ID the same...  The way how your sample data is it's impossible for me to know if there is some other entry with the first 7 or 8 or 10 characters which are identical. One datasource has to have a base id- no?

MattR79
8 - Asteroid

See below for the sample data, I need a way to join the two data sets based on the alias, I could assume that the aliases in Program A would be correct so I need to see if there is a way to do that.  The issue is that the data below is representative of the string lengths I work with and I cannot trim them down to a set number of characters as some items like the AHKHKI are sequential and trimming down the characters would eliminate those.

 

Program A AliasProgram B Alias
TTTS-345623TTTS-345623$
NUSCGS005431NUSCGS005431EXT
P-234571P-234571A
AHKHKI115694AHKHKI115694ZZ
CPCS-032480CPCS-032480!
10000542741000054274X
MattR79
8 - Asteroid

No, generally not.  There is typically only one trailing.

Jean-Balteryx
16 - Nebula
16 - Nebula

Hi @MattR79 ,

 

Here is a first solution ! It may be improved but it works !

Labels
Top Solution Authors