02-03-2017 12:37 PM - edited 07-15-2021 03:12 PM
One common reason fewer records than expected come out of ajoincan be due to mixed cases within the data of the join fields between the two datasets. For example, and as shown below, "Orange" from the first dataset is not going to join to "orange" in the second dataset. And this does not just apply to the leading character. Another common reason can be due to the existence of trailing or leading spaces. As also shown in the example below, "Plum " is not going to join with "Plum". The little red triangle in the upper right-hand corner of "Plum" indicates that there are leading and/or trailing spaces.
A simple method to get around this is to attach a formula tool after each Input tool and use the following formula for, in this example, the field "Fruit":
TRIM(UPPERCASE([Fruit]))
This will convert all records for this field in each dataset to upper case, while also removing any leading or trailing spaces, allowing for a more successful join.If you are joining by multiple fields, then simply use the Multi-Field Formula Tool to convert all of the fields.
In a related issue, you might find yourself getting too manyrecords than you were expecting. If this is the case, check out this Community article about inadvertentlygetting duplicates when joining.
Thank you!
Very helpful!
I tried this in a workflow that I had created and I was able to join more records. However, there are still a few that should be joined, based on name field, but I am not sure what tool to use to join (vlookup) the following inputs:
Scenario 1:
Input 1: Alexandra Smith
Input 2: Alex Smith
The "J" anchor is not listing Alexandra, however, it should since it is the same person. Similar situation happens for the following scenarios:
Scenario 2:
Input 1: Ron McDonald
Input 2: Ron Mc Donald
Scenario 3:
Input 1: Lee-Va Tran
Input 2: Lee Va Tran
Scenario 4:
Input 1: Hellen Garcia de la Guardia
Input 2: Hellen Garcia DLG
Scenario 5:
Input 1: Lili Smith
Input 2: Lilia Smith
Scenario 6:
Input 1: Lauren Smith
Input 2: Lauren Smith-Gonzalez
Scenario 7:
Input 1: Maria de Dolores Cruz
Input 2: Maria Dolores Cruz
Scenario 8:
Input 1: Allison Brown
Input 2: Ally Brown
Could you please advise what is the best way to perfom this vlookup when the fields are not exactly identical between two sources?
Hey iludeiro, I will send you some follow up resources. But for the benefit of others who might have the same question, the tool you would be looking for is the Fuzzy Match tool. The Fuzzy Match gives you the ability to join fields together when their names/values are different. It’s kind of an art and science, in the sense that you configure the tool to try and get you the best match rate based on the different matching algorithms in the tool. Here is a link to a Community article about it!
https://community.alteryx.com/t5/Alteryx-Designer-Knowledge-Base/Tool-Mastery-Fuzzy-Match/ta-p/45485
This was quite helpful thank you.