I'm trying to work out how to handle input data where last names can contain more than one capital letter, sometimes with a space before the second capital letter and sometimes without.
Proper casing might seem like a good way to standardize things, but it doesn't help when I have last name data like this:
LaSage
McDonald
Mc Donald
MacDonald
Van Horn
VanBoxtel
D'Acquisto
Smith Edwards
German-Edwards
Any best practices for dealing with this so I'm not left with Lasage, Mcdonald, Macdonald, etc.?
This is a nice challenge. You could share it with the Alteryx team so they can add it to the weekly challenges.
@DavidP has a suggestion on this post for how to handle it. You add a space after the "Mc" before "Donald" so it becomes "Mc Donald", etc. Then Title Case will handle it properly and then you can remove the space. "LaSage" is a bit of a weird one, but the rest should work with what David described.
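For anyone who wants to prototype the idea outside Alteryx, here is a minimal Python sketch of the same trick. It is not the workflow from the linked post: the prefix list, the `|` marker, and the function name `proper_case` are all assumptions made for illustration. It uses a marker character instead of a real space so that names that already contain a space, like "Mc Donald", keep it.

```python
import re

# Hypothetical prefix list; extend it to suit your own data.
PREFIXES = ("mc", "mac")
# Assumed marker: any non-letter character that never appears in a name.
MARKER = "|"

# Match a prefix at the start of a word, followed by at least three letters.
_PREFIX_RE = re.compile(
    r"\b(" + "|".join(PREFIXES) + r")(?=[a-z]{3,})", re.IGNORECASE
)


def proper_case(name: str) -> str:
    """Title-case a last name, keeping the capital after a known prefix."""
    # Step 1: put a marker after the prefix ("MCDONALD" -> "MC|DONALD").
    marked = _PREFIX_RE.sub(lambda m: m.group(1) + MARKER, name.strip())
    # Step 2: title-case; the marker acts as a word break ("Mc|Donald").
    # Step 3: remove the marker again ("McDonald").
    return marked.title().replace(MARKER, "")


for raw in ["MCDONALD", "mc donald", "macdonald", "D'ACQUISTO", "german-edwards"]:
    print(raw, "->", proper_case(raw))
```

Python's `str.title()` already treats apostrophes and hyphens as word breaks, so "D'Acquisto" and "German-Edwards" come out right without extra work. The "Mac" rule will over-match some names (for example, "Machado" would become "MacHado"), and names like "LaSage" are not covered by any prefix rule here.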
It looks like you built a mapping file of names to get this to work.
In my case, there are over 40,000 records/rows a month, so maintaining a mapping by hand would take a lot of time, and there are a lot of variations to cover. How would people do this with even larger data sets?
@czello
I somewhat agree with @nagakavyasri on this one.
Given those exceptions, it might be a good idea to keep a library of last names and keep improving that list over time.
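A rough Python sketch of what such a library could look like, assuming you can collect a list of spellings you already trust. The names `KNOWN_GOOD`, `build_library`, and `standardize` are hypothetical, and the sample entries are taken from the names in the original question. Anything not in the library falls back to plain title casing.

```python
# Hypothetical seed list of spellings that are known to be correct.
KNOWN_GOOD = ["LaSage", "VanBoxtel", "McDonald", "MacDonald"]


def build_library(names):
    """Map a case-insensitive key to the trusted spelling of each name."""
    return {name.casefold(): name for name in names}


def standardize(name, library):
    """Return the trusted spelling if known, otherwise plain title case."""
    cleaned = name.strip()
    return library.get(cleaned.casefold(), cleaned.title())


library = build_library(KNOWN_GOOD)

for raw in ["LASAGE", " vanboxtel ", "smith edwards", "MCDONALD"]:
    print(repr(raw), "->", standardize(raw, library))
```

Because the lookup is a dictionary access per row, the row count matters much less than the number of distinct exception names you need to maintain. New exceptions get added to the seed list as they turn up.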