Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語
IDEAS WANTED

Want to get involved? We're always looking for ideas and content for Weekly Challenges.

SUBMIT YOUR IDEA

Challenge #259: Disenvowel

kelly_gilbert
13 - Pulsar

@hellyars, regarding your question HERE about not shortening specific words made me think of a database naming convention we had at a previous job... If each "word" in the fieldname was fewer than five letters, then you use it as-is, otherwise, you remove vowels. So, STORE_SKU would remain STORE_SKU, but STORE_ELEVATION would become STORE_ELVTN.

 

One way to do it would be to separate the input strings into tokens and then evaluate each token individually (i.e. if the token is longer than 5 chars, take out the vowels). This is sort of a mash-up of the regex solution and the solution in @MarqueeCrew 's video.

Using the Regex Parse tool with the Tokenize method splits the strings into tokens.

  • (.+?) = match any character one or more times (lazy/not greedy)
  • (?:\b|$) = this is a non-capturing group, meaning match it, but don't return it. Either a word boundary or the end of the string ($) indicates the end of a token.

kelly_gilbert_1-1615925767395.png

 

 

Then, you can handle each token separately to decide whether or not to disemvowel it. Instead of using length to decide whether or not to disemvowel the token, you could also join to a text input (or file) that contains words you never want to shorten (such as the company name).

kelly_gilbert_2-1615926000460.png

 

 

Then, you can use concatenate (with nothing in the separator) to put the strings back together again, grouping by the RecordID.

 

kelly_gilbert_3-1615926142517.png

 

 

Now, the field name example I mentioned above gets tricky with regex, because we have to break the tokens at underscores. We want to split it into three tokens: STORE, _, and ELEVATION, so we can evaluate the lengths separately.

 

Underscores are actually included in "word" character class, so \b (word boundary) wouldn't create a break between STORE and _. The only way I know how to handle that (with regex) is to use lookarounds to identify the boundary.

Regex to tokenize with underscores: (.+?)(?:(?=\W|_)(?<!\W|_)|(?<=\W|_)(?!\W|_)|$)

 

(.+?) = match any character one or more times, ? means lazy/not greedy
(?:...) a non-capturing group (match it, but don't return it). A match in this group marks the end of a token.

 

Inside the non-capturing group:
(?=\W|_)(?<!\W|_)  =  the NEXT char is either a nonword or underscore, but the PREVIOUS char is not
|  =  OR
(?<=\W|_)(?!\W|_)  =  the PREVIOUS char is a nonword or underscore, but the NEXT char is not
|  =  OR
$  =  end of string

 

That would correctly break STORE_ELEVATION into STORE, _, ELEVATION, so we can apply the desired logic to STORE and ELEVATION separately.

mbogusz
9 - Comet
Spoiler
Challenge #259 in the canChallenge #259 in the can
mpennington
11 - Bolide

Might make me weird, but I do love me some RegEx. 

 

Spoiler
RegEx for the win.RegEx for the win.

 

 

DXC
6 - Meteoroid

REGEX_Replace([Input Data], '(a|e|i|o|u|y)', '',0)

estherb47
15 - Aurora
15 - Aurora

@MarqueeCrew  always love your challenges!!

 

Took a non-RegEx approach at first, which worked perfectly. Find Replace solved this very quickly. A simple text input listing the vowels, all in lower case, and a formula tool to assign the null value for replacing.

 

This works because the vowels to be replaced are all lowercase. You could decompose the word to cut off the first letter, use a case insensitive find replace on the rest of the word, and concatenate the word back together.

 

RegEx approach will work to replace lower and uppercase vowels, excepting those that start the word. 

EstherB47_0-1615940857731.png

 

KaneG
Alteryx Alumni (Retired)

I left both methods that I tried. Neither are as succinct as using the boundary flag though.

 

Spoiler

At first the quickest solution I could see was taking the first letter off, replacing the vowels and then stitching back together. (Regex, Regex, Formula..). I then saw that I could do that in one formula pretty easily:

Left([Input Data], 1) + ReplaceChar(Right([Input Data],Length([Input Data])-1),'aeiouy','')

 

This didn't take into account if the first letter of the second word was a vowel and so I had to add a couple of tools around that (Record ID, T2C to Split to Rows on space, Summarize to bring it back together)


KaneG_0-1615945825334.png

 

ChrisMason_tpabay
6 - Meteoroid

Well, the top formula only works because the first letter is CAPS in the test data, but the regex replace works (It just took me a while to figure out how to make it work).

Eilean_
8 - Asteroid

Used Find Replace tool and make it case sensitive so the vowels which are the first letter in a word would not be excluded. 

ruudweersink
7 - Meteor

Formula tool:-)

Dan_Z
7 - Meteor

Very cool seeing everyone's regex solutions! So many different ways to accomplish and here I am solving with just one formula...

 

Spoiler
solution.PNG