Insert delimiter between uppercase and lowercase characters in string
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi All,
I'm trying to split this string in a column into two columns based on the Case - I've been looking for a while to figure out how to do this? I figured add a delimiter checking for case and use a text to columns function - but now I'm not sure how to add the delimiter based on Case. Here's a sample of what I want
Col 1 | Col 2 | col 3 |
ABCaaa | ABC | aaa |
AHCNaaarty | AHCN | aaarty |
BGhnf | BG | hnf |
Thanks - appreciate your help in advance!
Solved! Go to Solution.
- Labels:
- Parse
- Preparation
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
\u is uppercase
\l is lowercase.
\u+ is 1 or more uppercase letters
\l+ is 1 or more lowercase letters.
(\u+)(\l+) is 2 groups
Cheers,
Mark
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks @MarqueeCrew
Not sure I follow though - (\u+)(\l+) as a REGEX input? I'm trying to insert a delimiter to split the columns - What do I need to add to check for case and then split the columns?
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Regex_replace([string], "(\u+)(\l+)",'$1|$2',0)
Should work. Else switch to ,1. I forget which way the case sensitivity works.
Cheers,
Mark
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@MarqueeCrew Works perfect - thanks so much!!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Now that I am in front of my computer I can more easily reply. When you requested the delimiter, I wanted to explain that you can use the RegEx tool to complete the parsing for you. If you configure the tool as shown below, all parsing is completed in one tool. You can identify the groups of letters and parse them into their two component fields as shown:
The regular expression used was:
(\u+)(\l+)
This puts the uppercase (1+) characters into the first group and the lowercase (1+) characters into the second group. You then De-Select the "case insensitive" option and then rename the groups inside of the output fields configuration.
This results in 2 output fields being created from you original input field.
Cheers,
Mark
P.S. Even when I'm in front of my computer, speling errors can happen.
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@MarqueeCrewHaha! Thanks for your nifty reply from your phone standing in line at a mall!! That was awesome! :)
I ran into a interesting situation that I didn't realize previously with my data -
Expected Input - ABCabc, ADJChasggh
Actual Input - ABCAbce, ADJCHasGgh
To give you some context I basically want to split this into two strings but the problem is my first string I know is all CAPS - then 2nd half of the string essentially sometimes has two words/Caps but I want to split that as one string into a column.
Let me give you an example with actual data
Input Output
HOMERoot - HOME + Root
SETTINGSSettingsRootView - SETTINGS +SettingsRootView
Thanks in advance!
I knew I should've looked at the data before but I make mistakes at my computer all the time ;)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
That is quite a challenge. The string "SETTINGSSettingsRootView" looks like CAPS followed by anything where one would not immediately know if the pair of "S" characters are part of the same word/values. I was afraid that I could not solve the problem for you with Alteryx. I have however conquered that fear by breaking down the problem.
Pattern #1 we have already identified: A set of CAPS followed by lowercase. Now Pattern #2 is CAPS followed by MixedCase.
We can use a REGEX_MATCH() function to see which pattern we are dealing with and write different expressions for each case.
Here is a solution formula:
IF REGEX_Match([Field1], "\u+\l+\u+.*",0) THEN REGEX_Replace([Field1], "(\u+)(\u\l+.*)", '$1+$2',0) ELSE REGEX_Replace([Field1], "(\u+)(\l+)", '$1+$2',0) ENDIF
We check to see if the pattern is: 1+ Uppercase followed by 1+ Lowercase followed by 1+ Uppercase followed by anything (with case sensitivity set). Then we apply a replacement formula where we take the first Uppercase group of letters (prior to the first uppercase followed by a lowercase) and assign that to group #1. We add a + sign (as you used it for a delimiter in your example) and place the balance of the expression into group #2.
Otherwise, we use our proven expression.
I love Alteryx and am very fond of RegEx.
Cheers,
Mark
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
This is awesome @MarqueeCrew!!! Thank you so much - This is some next level stuff!!
