Weekly Challenges

kelly_gilbert · ‎03-16-2021

@hellyars, regarding your question HERE about not shortening specific words made me think of a database naming convention we had at a previous job... If each "word" in the fieldname was fewer than five letters, then you use it as-is, otherwise, you remove vowels. So, STORE_SKU would remain STORE_SKU, but STORE_ELEVATION would become STORE_ELVTN.

One way to do it would be to separate the input strings into tokens and then evaluate each token individually (i.e. if the token is longer than 5 chars, take out the vowels). This is sort of a mash-up of the regex solution and the solution in @MarqueeCrew 's video.

Using the Regex Parse tool with the Tokenize method splits the strings into tokens.

(.+?) = match any character one or more times (lazy/not greedy)
(?:\b|$) = this is a non-capturing group, meaning match it, but don't return it. Either a word boundary or the end of the string ($) indicates the end of a token.

Then, you can handle each token separately to decide whether or not to disemvowel it. Instead of using length to decide whether or not to disemvowel the token, you could also join to a text input (or file) that contains words you never want to shorten (such as the company name).

Then, you can use concatenate (with nothing in the separator) to put the strings back together again, grouping by the RecordID.

Now, the field name example I mentioned above gets tricky with regex, because we have to break the tokens at underscores. We want to split it into three tokens: STORE, _, and ELEVATION, so we can evaluate the lengths separately.

Underscores are actually included in "word" character class, so \b (word boundary) wouldn't create a break between STORE and _. The only way I know how to handle that (with regex) is to use lookarounds to identify the boundary.

Regex to tokenize with underscores: (.+?)(?:(?=\W|_)(?<!\W|_)|(?<=\W|_)(?!\W|_)|$)

(.+?) = match any character one or more times, ? means lazy/not greedy
(?:...) a non-capturing group (match it, but don't return it). A match in this group marks the end of a token.

Inside the non-capturing group:
(?=\W|_)(?<!\W|_) = the NEXT char is either a nonword or underscore, but the PREVIOUS char is not
| = OR
(?<=\W|_)(?!\W|_) = the PREVIOUS char is a nonword or underscore, but the NEXT char is not
| = OR
$ = end of string

That would correctly break STORE_ELEVATION into STORE, _, ELEVATION, so we can apply the desired logic to STORE and ELEVATION separately.

mbogusz · ‎03-16-2021

Spoiler

Challenge #259 in the can

mpennington · ‎03-16-2021

Might make me weird, but I do love me some RegEx.

Spoiler

RegEx for the win.

DXC · ‎03-16-2021

REGEX_Replace([Input Data], '(a|e|i|o|u|y)', '',0)

estherb47 · ‎03-16-2021

@MarqueeCrew always love your challenges!!

Took a non-RegEx approach at first, which worked perfectly. Find Replace solved this very quickly. A simple text input listing the vowels, all in lower case, and a formula tool to assign the null value for replacing.

This works because the vowels to be replaced are all lowercase. You could decompose the word to cut off the first letter, use a case insensitive find replace on the rest of the word, and concatenate the word back together.

RegEx approach will work to replace lower and uppercase vowels, excepting those that start the word.

KaneG · ‎03-16-2021

I left both methods that I tried. Neither are as succinct as using the boundary flag though.

Spoiler

At first the quickest solution I could see was taking the first letter off, replacing the vowels and then stitching back together. (Regex, Regex, Formula..). I then saw that I could do that in one formula pretty easily:

Left([Input Data], 1) + ReplaceChar(Right([Input Data],Length([Input Data])-1),'aeiouy','')

This didn't take into account if the first letter of the second word was a vowel and so I had to add a couple of tools around that (Record ID, T2C to Split to Rows on space, Summarize to bring it back together)

At first the quickest solution I could see was taking the first letter off, replacing the vowels and then stitching back together. (Regex, Regex, Formula..). I then saw that I could do that in one formula pretty easily:Left([Input Data], 1) + ReplaceChar(Right([Input Data],Length([Input Data])-1),'aeiouy','') This didn't take into account if the first letter of the second word was a vowel and so I had to add a couple of tools around that (Record ID, T2C to Split to Rows on space, Summarize to bring it back together)

ChrisMason_tpabay · ‎03-16-2021

Well, the top formula only works because the first letter is CAPS in the test data, but the regex replace works (It just took me a while to figure out how to make it work).

Eilean_ · ‎03-16-2021

Used Find Replace tool and make it case sensitive so the vowels which are the first letter in a word would not be excluded.

ruudweersink · ‎03-17-2021

Formula tool:-)

Dan_Z · ‎03-17-2021

Very cool seeing everyone's regex solutions! So many different ways to accomplish and here I am solving with just one formula...

Spoiler

Weekly Challenges

IDEAS WANTED

Challenge #259: Disenvowel