Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Formula or Regex string that will help me identify a hypen without surrounding white space

jay_chang
8 - Asteroid

Hi all.  I have a string that I am trying to break into two parts based on a hypen.  However, the complication is that there are situations where I don't want the hypen to count.  Example:

 

1:  ThisIsSampleText-ThisIsSampleName

 

2:  This - IsSampleText-ThisIsSampleName

 

In #1, I want two pieces: ThisISSampleText and ThisIsSampleName

 

In #2, I want two pieces: This - IsSampleText and ThisIsSampleName

 

There are, of course, variations of this where I could have something like:

    This- IsSample...

    This -IsSample...

    ThisIsSampleText-ThisIs-SampleName...

 

I basically want to look for the first hypen without any spaces before and/or after and break the string at that point.

 

I would also like to count the number of hypens in the overall string to help me create error processing around the other possible complications.

 

Could someone assist with the RegEx or formula that will allow me to do this?  Thank you. 

7 REPLIES 7
NicoleJohnson
ACE Emeritus
ACE Emeritus

Try this RegEx string: (.*?\S)-(\S.*)

Should separate your field based on the first hyphen it finds that is not surrounded by spaces (\S indicated a non-whitespace character before & after the hypen).

 

As for counting hyphens, you could use a formula REGEX_CountMatches([Field1],"-") in a Formula tool to identify those records that have more than 1 hyphen.

 

If you want to try out some other combinations, I find that Regex101.com is a helpful site for testing your RegEx strings on sample entries 🙂 Hope that helps!

 

Cheers,

NJ

benakesh
12 - Quasar

Hi @jay_chang , 

Text to columns or regex  can split  and  tokenize can count  hyphen.

clipboard_image_0.png

 

 

jay_chang
8 - Asteroid

@NicoleJohnson Thank you - this worked.  However, now I have a second question (refinement of the first question actually).

 

In the regex, to account for situations where I have multiple instances of a hypen surrounded by other characters, how would I adjust the regex so that it split on the 2nd hyphen?  So as an example, in the below:

 

        North-Technical Operations-Smith, John Joseph

 

your expression is correctly splitting on North.  But is it possible to have it adjust to split into 

 

        North-Technical Operations and Smith, John Joseph

 

Also, could the count of hyphens RegEx be modified to look for hyphens surrounded by other characters only?

 

Thank you.

 

NicoleJohnson
ACE Emeritus
ACE Emeritus

OK new strategy 🙂 

 

I'd use a Regex_Replace in a Formula tool with the following formula to split with a pipe delimiter (or some other delimiter that you are sure will not be present somewhere else in that field):

REGEX_Replace([Field1],"(\S)(-)(\S*,\s.*)","$1|$3")

 

That will turn North-Technical Operations-Smith, John Joseph into North-Technical Operations|Smith, John Joseph, at which point you could use a Text to Columns tool to split the field on the pipe delimiter. I've attached a workflow with a bunch of examples that appear to work with this logic, including hyphens on either side of the split.

 

As for the count of matches, just add the \S on either side of the hyphen to get the count of only those scenarios that do not have whitespace surrounding the hyphen: REGEX_Countmatches([Field1],"(\S-\S)")

 

RegEx problems are always a mind-bender!

 

NJ

jay_chang
8 - Asteroid

@NicoleJohnson Thank you, this is coming so close.  From what I can tell, the scenario it's failing on the most is this one:

 

Region-Marketing-Smith,John

 

 

Which is weird because your expression works just fine on:

 

 

University-Midwest Division-Doe, Jane

 

 

 

Any ideas on why those combinations are triggering false breaks?

NicoleJohnson
ACE Emeritus
ACE Emeritus

I believe it's because there isn't a space after Smith in the one that's failing? You will probably end up with a few exceptions regardless of the expression you choose... perhaps you can add a filter after this step to check for null values in the second column, and then you might need to apply different expression or manually review those exceptions.

 

NJ

jay_chang
8 - Asteroid

@NicoleJohnson Thank you for your help!

Labels