hi all
I have 2 datasets where in one the data are with german special characters like äöü.
In the other dataset the same word (name) with the international "translation"
Examples:
Müller vs Mueller
Küng vs Kueng
König vs Koenig
Käse vs Kaese
Is there a way to (for example) join the 2 datasets even I have these differences ?
Or in my case I need to verify the 2 name fields for inconsistences and this should not come up with an error.
I dont want to find/replace the ü with ue ....
Thanks
Steffen
Solved! Go to Solution.
Hi @chvizda
I think you can use Fuzzy Match in this one:
- Add a Formula Tool to identify Source
- Union both datasets
- Use Merge Mode by Source in Fuzzy Match
- Use "Name" option
- After all you can join your datasets with the FuzzyMatch result
If the Threshold is not getting the results you are looking for, maybe you will need to make some adjustments on your Fuzzy Match Style
Cheers,
Hi @chvizda
@Thableaus presented you with a great solution, as always
Now, I realize that you specifically said that "I dont want to find/replace the ü with ue ....", but is this because you want to use the German spelling in the rest of the workflow and replacing the characters would lose that? If that's the case, consider this work flow.
The characters to be converted are stored in the Lookup Characters list along with their equivalent "International" replacements. The bottom half of the workflow builds an international equivalent of the accented name and adds it as a new field. The last join uses this "international" to match with the international list. From this point one, both the accented and international names are available for each record. You can drop the international name completely from this point on
Note that the 2 "...Value" fields are data I added into the input tables to show how their now both available on the records.
The beauty of this technique is that you can expand the look-up list as needed, i.e. é=e, ç=c as required without having to manually change a Switch or IF in a formula tool
Dan