Challenge #259: Disenvowel
- Suscribirse a un feed RSS
- Marcar tema como nuevo
- Marcar tema como leído
- Flotar este Tema para el usuario actual
- Favorito
- Suscribir
- Silenciar
- Página de impresión sencilla
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
@hellyars, regarding your question HERE about not shortening specific words made me think of a database naming convention we had at a previous job... If each "word" in the fieldname was fewer than five letters, then you use it as-is, otherwise, you remove vowels. So, STORE_SKU would remain STORE_SKU, but STORE_ELEVATION would become STORE_ELVTN.
One way to do it would be to separate the input strings into tokens and then evaluate each token individually (i.e. if the token is longer than 5 chars, take out the vowels). This is sort of a mash-up of the regex solution and the solution in @MarqueeCrew 's video.
Using the Regex Parse tool with the Tokenize method splits the strings into tokens.
- (.+?) = match any character one or more times (lazy/not greedy)
- (?:\b|$) = this is a non-capturing group, meaning match it, but don't return it. Either a word boundary or the end of the string ($) indicates the end of a token.
Then, you can handle each token separately to decide whether or not to disemvowel it. Instead of using length to decide whether or not to disemvowel the token, you could also join to a text input (or file) that contains words you never want to shorten (such as the company name).
Then, you can use concatenate (with nothing in the separator) to put the strings back together again, grouping by the RecordID.
Now, the field name example I mentioned above gets tricky with regex, because we have to break the tokens at underscores. We want to split it into three tokens: STORE, _, and ELEVATION, so we can evaluate the lengths separately.
Underscores are actually included in "word" character class, so \b (word boundary) wouldn't create a break between STORE and _. The only way I know how to handle that (with regex) is to use lookarounds to identify the boundary.
Regex to tokenize with underscores: (.+?)(?:(?=\W|_)(?<!\W|_)|(?<=\W|_)(?!\W|_)|$)
(.+?) = match any character one or more times, ? means lazy/not greedy
(?:...) a non-capturing group (match it, but don't return it). A match in this group marks the end of a token.
Inside the non-capturing group:
(?=\W|_)(?<!\W|_) = the NEXT char is either a nonword or underscore, but the PREVIOUS char is not
| = OR
(?<=\W|_)(?!\W|_) = the PREVIOUS char is a nonword or underscore, but the NEXT char is not
| = OR
$ = end of string
That would correctly break STORE_ELEVATION into STORE, _, ELEVATION, so we can apply the desired logic to STORE and ELEVATION separately.
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
@MarqueeCrew always love your challenges!!
Took a non-RegEx approach at first, which worked perfectly. Find Replace solved this very quickly. A simple text input listing the vowels, all in lower case, and a formula tool to assign the null value for replacing.
This works because the vowels to be replaced are all lowercase. You could decompose the word to cut off the first letter, use a case insensitive find replace on the rest of the word, and concatenate the word back together.
RegEx approach will work to replace lower and uppercase vowels, excepting those that start the word.
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
I left both methods that I tried. Neither are as succinct as using the boundary flag though.
At first the quickest solution I could see was taking the first letter off, replacing the vowels and then stitching back together. (Regex, Regex, Formula..). I then saw that I could do that in one formula pretty easily:
Left([Input Data], 1) + ReplaceChar(Right([Input Data],Length([Input Data])-1),'aeiouy','')
This didn't take into account if the first letter of the second word was a vowel and so I had to add a couple of tools around that (Record ID, T2C to Split to Rows on space, Summarize to bring it back together)
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
- Marcar como nuevo
- Favorito
- Suscribir
- Silenciar
- Suscribirse a un feed RSS
- Resaltar
- Imprimir
- Notificar al moderador
Very cool seeing everyone's regex solutions! So many different ways to accomplish and here I am solving with just one formula...