Hi all. I have a string that I am trying to break into two parts based on a hypen. However, the complication is that there are situations where I don't want the hypen to count. Example:
1: ThisIsSampleText-ThisIsSampleName
2: This - IsSampleText-ThisIsSampleName
In #1, I want two pieces: ThisISSampleText and ThisIsSampleName
In #2, I want two pieces: This - IsSampleText and ThisIsSampleName
There are, of course, variations of this where I could have something like:
This- IsSample...
This -IsSample...
ThisIsSampleText-ThisIs-SampleName...
I basically want to look for the first hypen without any spaces before and/or after and break the string at that point.
I would also like to count the number of hypens in the overall string to help me create error processing around the other possible complications.
Could someone assist with the RegEx or formula that will allow me to do this? Thank you.
Solved! Go to Solution.
Try this RegEx string: (.*?\S)-(\S.*)
Should separate your field based on the first hyphen it finds that is not surrounded by spaces (\S indicated a non-whitespace character before & after the hypen).
As for counting hyphens, you could use a formula REGEX_CountMatches([Field1],"-") in a Formula tool to identify those records that have more than 1 hyphen.
If you want to try out some other combinations, I find that Regex101.com is a helpful site for testing your RegEx strings on sample entries 🙂 Hope that helps!
Cheers,
NJ
@NicoleJohnson Thank you - this worked. However, now I have a second question (refinement of the first question actually).
In the regex, to account for situations where I have multiple instances of a hypen surrounded by other characters, how would I adjust the regex so that it split on the 2nd hyphen? So as an example, in the below:
North-Technical Operations-Smith, John Joseph
your expression is correctly splitting on North. But is it possible to have it adjust to split into
North-Technical Operations and Smith, John Joseph
Also, could the count of hyphens RegEx be modified to look for hyphens surrounded by other characters only?
Thank you.
OK new strategy 🙂
I'd use a Regex_Replace in a Formula tool with the following formula to split with a pipe delimiter (or some other delimiter that you are sure will not be present somewhere else in that field):
REGEX_Replace([Field1],"(\S)(-)(\S*,\s.*)","$1|$3")
That will turn North-Technical Operations-Smith, John Joseph into North-Technical Operations|Smith, John Joseph, at which point you could use a Text to Columns tool to split the field on the pipe delimiter. I've attached a workflow with a bunch of examples that appear to work with this logic, including hyphens on either side of the split.
As for the count of matches, just add the \S on either side of the hyphen to get the count of only those scenarios that do not have whitespace surrounding the hyphen: REGEX_Countmatches([Field1],"(\S-\S)")
RegEx problems are always a mind-bender!
NJ
@NicoleJohnson Thank you, this is coming so close. From what I can tell, the scenario it's failing on the most is this one:
Region-Marketing-Smith,John
Which is weird because your expression works just fine on:
University-Midwest Division-Doe, Jane
Any ideas on why those combinations are triggering false breaks?
I believe it's because there isn't a space after Smith in the one that's failing? You will probably end up with a few exceptions regardless of the expression you choose... perhaps you can add a filter after this step to check for null values in the second column, and then you might need to apply different expression or manually review those exceptions.
NJ
@NicoleJohnson Thank you for your help!