Alteryx Designer Desktop Discussions

taxtechfl · ‎03-18-2022

I have concatenated strings from which I need to parse out words in all uppercase. Multiple words can occur and they could be separated by a space or other non-lowercase letter characters. Other words in the string can have the first letter only in uppercase (title case), but those words should not be included.

I have provided a text file of sample data as an attachment, including the desired outputs from the parsing. I have tried using REGEX, but I am struggling with the complexity of the pattern and the presence of title case words that I do not want included. Any assistance with a solution is greatly appreciated. Thank you in advance!

Note: Ideally the uppercase words should be at the beginning only. In other words, I would prefer that those occurring later in the string would not be parsed (see Record 5 for an example). I would welcome an alternate solution as well that would parse them anywhere in the string, however, in the event I need to do so with a future dataset.

Excerpt Sample:

Original	Desired1	Desired2
THIS is a first example 202201.pdf	THIS	is a first example 202201.pdf
UPPER PENINSULA Sample Data Line	UPPER PENINSULA	Sample Data Line
Word This Prepayment Form	[Null]	Word This Prepayment Form
Retailer_s More Other DA-1S	[Null]	Retailer_s More Other DA-1S
Monthly Word More OTHER A-1246	[Null]	Monthly Word More OTHER A-1246

Screen Shot (see txt attachment):

Regards,

taxtechfl

MarqueeCrew · ‎03-18-2022

Hey @taxtechfl ,

Try this:

REGEX_Replace([Field1], "\b\u\l+\b|\b\l+\b", '',0)

It will look for words that are all lowercase or titlecase and will zap them. It will fail miserably on "aBCDe".

I wrote another expression in case you end up with extra spaces:

Regex_Replace(REGEX_Replace([Field1], "\b\u\l+\b|\b\l+\b", '',0),"\s{2,}",' ')

To be sure of all uppercase words I am going to explain the workflow that YOU must build.

1. If you don't have a recordID (unique key), put one on the data.

2. TEXT to COLUMNS tool with space as a delimiter - SET output to ROWS

3. FILTER as: REGEX_Match([Field1], "\u+",0)

4. SUMMARIZE as: Group By Record ID + String (Concatenate Function) with SPACE as a delimiter

5. JOIN the data back to your incoming data

Try this out and see the magic.

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

taxtechfl · ‎03-18-2022

Thank you so much for the response and information. Although I was able to build the alternate workflow to parse all uppercase words, I was not able to get the REGEX solution for beginning uppercase words to work as expected for all records. I have attached the workflow I built. Thank you again for your assistance.

taxtechfl · ‎03-18-2022

I accidentally replied to my main thread instead of your response, and it won't let me delete it. I'm replying so you get the notification. Thanks.

danilang · ‎03-19-2022

Hi @taxtechfl

Here's one way to do it.

Once the words are split to rows, number them, check if they're all upper case and then mark the strings of uppercase words from the start only. Concatenate the two sets of words and then join them back on record id and then join this to your original data set, giving you

Note that I modified the regex to check if the word is all upper case to [\u+\-*]* to catch TWO-OR-MORE hyphenated words at the start of the sentence

Dan

taxtechfl · ‎03-21-2022

Thank you for the assistance. That gets the desired results without pulling uppercase words later in the string. This Community is awesome.

Alteryx Designer Desktop Discussions

Parsing Uppercase Words From a String That Optionally Includes Spaces or Other Characters