Hello,
I am trying to use RegEx to parse out a column into First Name, Middle Name, & Last Name. I used the following in the RegEx tool:
(^[[:alpha:]]+)\s*([[:alpha:]]*)\s+([[:alpha:]]+$)
This works great for all except for the ones with a name such as Frances L Guida Dileo. I just get [Null] in my output fields with names such as this.
Does anyone have any suggestions on how I can change/update the expression above to put the remaining characters after the 2nd space into the Last Name column? Any help would be much appreciated.
Solved! Go to Solution.
@kvoelker, I think I was able to do what you needed using the text to columns tool delimiting on "\s" and leaving left overs in column three. I then just had to do a formula to swap the middle and last name for those that did not have a middle name listed. I've attached the workflow to show you what I came up with. Hope this helps.
Hi @kvoelker
This Regex expression might work:
^(\w+)\s*(\w*)\s+(.*$)
or adapted to yours:
(^[[:alpha:]]+)\s*([[:alpha:]]*)\s+(.*$)
Instead of forcing the expression to have the last column LAST NAME with only alphabet characters, you change it to a dot, so it can also accept spaces.
Cheers,
Hi @Thableaus
These both worked well but it's still giving me incorrect results when I have a name with a suffix, such as John Doe Sr. Then my output has the correct first name, but the middle name is Doe & the last name is Sr. Any suggestions on how to ensure that doesn't happen? Thanks so much!
Hi @kvoelker
Because of different formats that names can come in, I don't think that parsing can be handled solely in a regex statement. Since the various Regex formulas are using spaces as the delimiters, adding extra words to the front, back, middle, or indeed not providing all three parts will throw things off. "Dr. Jose Fernandez", "Jose Fernandez Jr.", "Jose Fernandez" and "Jose Pablo Manuel Fernandez" all will all cause the regex to place components into the incorrect field.
I would attack this using a series of tools
1. For the prefix/suffix problem, I would build up a list of common ones and strip those off using a Find Replace tool, placing them into a prefix/suffix column.
2. From what's left, the first word is probably the given name and the last one is probably the surname name, unless there's a comma as in "Fernandez, Jose Pablo Manuel"
3. What's left after you take out the given and surnames, is probably the middle names, unless you have a situation like "Enrico da Silva" in which case the surname is "da Silva" and not just "Silva"
Of course all bets are off if the name is not from a western language. In this case the first word may be the surname with the given name following after that, unless the person has reversed the order to adjust to local customs
Good luck
Dan
If your Middle Name is composed by a single character (an abbreviation, let's say), then you could use this:
(^[[:alpha:]]+\b)\s*([[:alpha:]]?\b)\s+(.*$)
But it should follow this rule.
Cheers,
Hi All,
Thank you so much for all of the suggestions & help. I was able to combine @Thableaus suggestion with @JoBen suggestion to get the output that I need. Please see the screen shots below.