Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing Uppercase Words From a String That Optionally Includes Spaces or Other Characters

taxtechfl
7 - Meteor

I have concatenated strings from which I need to parse out words in all uppercase.  Multiple words can occur and they could be separated by a space or other non-lowercase letter characters.  Other words in the string can have the first letter only in uppercase (title case), but those words should not be included.

 

I have provided a text file of sample data as an attachment, including the desired outputs from the parsing.  I have tried using REGEX, but I am struggling with the complexity of the pattern and the presence of title case words that I do not want included.  Any assistance with a solution is greatly appreciated.  Thank you in advance!

 

Note:  Ideally the uppercase words should be at the beginning only.  In other words, I would prefer that those occurring later in the string would not be parsed (see Record 5 for an example).  I would welcome an alternate solution as well that would parse them anywhere in the string, however, in the event I need to do so with a future dataset.

 

Excerpt Sample:

 

OriginalDesired1Desired2
THIS is a first example 202201.pdfTHISis a first example 202201.pdf
UPPER PENINSULA Sample Data LineUPPER PENINSULASample Data Line
Word This Prepayment Form[Null]Word This Prepayment Form
Retailer_s More Other DA-1S[Null]Retailer_s More Other DA-1S
Monthly Word More OTHER A-1246[Null]Monthly Word More OTHER A-1246

 

Screen Shot (see txt attachment):

taxtechfl_0-1647620381967.png

 

 

Regards,

taxtechfl

5 REPLIES 5
MarqueeCrew
20 - Arcturus
20 - Arcturus

Hey @taxtechfl ,

 

Try this:

REGEX_Replace([Field1], "\b\u\l+\b|\b\l+\b", '',0)

 

It will look for words that are all lowercase or titlecase and will zap them.  It will fail miserably on "aBCDe".

 

I wrote another expression in case you end up with extra spaces:

 

Regex_Replace(REGEX_Replace([Field1], "\b\u\l+\b|\b\l+\b", '',0),"\s{2,}",' ')

 

To be sure of all uppercase words I am going to explain the workflow that YOU must build.

 

1.  If you don't have a recordID (unique key), put one on the data.

2. TEXT to COLUMNS tool with space as a delimiter - SET output to ROWS

3. FILTER as:  REGEX_Match([Field1], "\u+",0) 

4. SUMMARIZE as:  Group By Record ID  + String (Concatenate Function) with SPACE as a delimiter

5. JOIN the data back to your incoming data

 

Try this out and see the magic.

 

Cheers,

 

Mark

 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
taxtechfl
7 - Meteor

Thank you so much for the response and information.  Although I was able to build the alternate workflow to parse all uppercase words, I was not able to get the REGEX solution for beginning uppercase words to work as expected for all records.  I have attached the workflow I built.  Thank you again for your assistance.

taxtechfl
7 - Meteor

I accidentally replied to my main thread instead of your response, and it won't let me delete it.  I'm replying so you get the notification.  Thanks.

danilang
19 - Altair
19 - Altair

Hi @taxtechfl 

 

Here's one way to do it.

 

danilang_0-1647692364518.png

Once the words are split to rows, number them, check if they're all upper case and then mark the strings of uppercase words from the start only.  Concatenate the two sets of words and then join them back on record id and then join this to your original data set, giving you

 

danilang_1-1647692633824.png

 

Note that I modified the regex to check if the word is all upper case to [\u+\-*]* to catch TWO-OR-MORE hyphenated words at the start of the sentence

 

Dan

taxtechfl
7 - Meteor

Thank you for the assistance.  That gets the desired results without pulling uppercase words later in the string.  This Community is awesome. 

 

Labels