Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
MarqueeCrew
20 - Arcturus
20 - Arcturus

Editor’s note: @NeilR (Neil) and @MarqueeCrew (Mark) got to talking recently about how some accepted solutions on community are amazing in how they helpfully and concisely answer the OP’s immediate need, but may not fully teach the concepts underlying the solution in order that they may be equipped to solve the next similar problem on their own. We discussed a recent solution Mark provided about how to remove all vowels from a string unless it is the first letter of a word. Mark followed up the solution with a YouTube video detailing an alternate methodology. What follows is a synopsis of our conversation. 

 

 

[Neil] You already answered the question that the OP marked as an accepted solution, what compelled you to create the above video? 

  

[Mark] Neil, providing a solution to the community gives us an opportunity to help someone get unstuck today and hopefully tomorrow, too. This specific post gave me an opportunity to demonstrate how to find your own solution path. Besides my YouTube video, I sent this solved YXMD to the Alteryx Academy for a weekly challenge. This puzzle gives beginners a challenge to use string functions and advanced users a chance to exercise their regex and error handling skills.   

  

Wait a second, the solution you presented in the video is different than the one you shared in the thread. What happened? 

  

Regex was handy from my iPhone. I could readily respond to the post without direct access to my Alteryx desktop. I wanted to help community members think about how to solve. But if you watched the video (past the thank you slide) the regex solution gets explained. This just goes to show you that there isn’t just one way to solve a challenge. 

  

ReplaceChar? Where did that come from? Where do I find all these string functions? 

  

Have you ever reviewed the functions inside of the Formula tool? I have. I also review the help documentation for excitement. OK, that's a stretch. I've had plenty of opportunities to clean dirty data. Knowing string functions has saved me lots of trial and error. The functions have been updated many times since I started using Alteryx. Keep your saw sharp and light bulbs bright by exploring. In the good old days there was no Contains, StartsWith or EndsWith.  

  

So which one is better - regex or not regex? 

  

Regex wins! Generally, I apply regex to unstructured data. When you can use a combination of string functions, I will opt for those instead. Regex consumes more resources than string functions and isn’t so friendly to untrained users. As a consultant, I must make workflows easily explained and maintained for my clients. In this specific example, I wondered about the true complexity of the text data. The regex approach left little room for errors. I am aware of the defect of the YouTube solution.  

  

Input Data 

Expected Results 

Beginner Results 

Regex Results 

Forty-acre 

Frt-acr 

Frt-cr 

Frty-acr 

Apples,Avocados & Apricots 

Appls,Avcds & Aprcts 

Appls,vcds & Aprcts 

Appls,Avcds & Aprcts 

                            

When the “beginner” approach is used, the assumption is that a “word” is preceded by a space. The rules do not specify how to handle examples like hyphenated words. Similarly, a comma might not have a space after it and will cause the first letter of the following word to delete a vowel. The regex expression handles words by boundaries. In simple terms, if the preceding character is not {0-9, A-Z a-z}, then it is a boundary (along with start of field).  

  

REGEX_Replace([Input Data], "\B[aeiouAEIOU]", '')

 

\B = Not a boundary character 

[…] = Set of characters 

  

Explanation: The first character of a word preceded by any character not in the set of {0-9, A-Z a-z} is skipped. All other vowels (as listed) are identified by the regular expression and Alteryx will replace them with ‘’ (nothing). 

  

In this case, regex will outperform the string functions because of the parse and summarize tool expense. In order to make the “beginner” solution work, you’ll need to do some messy work. Experience has taught me to take data into consideration and to Keep It SIMPLE. I wonder if @hellyars considered the data examples and could comment on what the expected results would be. It is possible that he’d want to ignore the punctuation/number and delete the vowel. In that case, SIMPLE wins! 

  

I see you used a Message tool – is that APA?

  

The Message tool is mind-blowing. Don’t underestimate the power of the tool when you are preparing yourself for Analytic Processing Automation (APA). It is a fundamental tool for defensive-configuration. I’m coining the phrase APA Readiness as the ability to repeatedly run workflows with confidence, given a fundamental lack in confidence that the data won’t become trashy in the future.

  

A simple use case for the Message tool and why it is so important is when you read CSV files. All of your input comes to you as V_WString. If you get an amount of $123,456.78 and place a SELECT tool after the input to change the type to DOUBLE, you might get a warning message. If a date of 2021-02-31 comes in you might get a warning message. I say might because you’ve probably got a limit of 10 warnings per occurrence. You might also get it, but if you’re running the workflow on your Server and you don’t check your logs, you might not even know about the problem until you provide your data or report to your boss or client. Enter the Message tool. You can check your data and ERROR, stop or save your @$$ from upstream issues. 

  

For this example, I wanted to demonstrate how the tool works and how you can configure it to both ERROR and display useful messages. If you don’t use a Message or a Test tool in your workflow, you won’t get an A in my virtual class. When you run a workflow on your desktop and you’ve got lots of Browses to visually check for data anomalies, you have a fairly low APA readiness score. You’re spending time doing what Alteryx does faster. 

  

Anything else? 

  

"Anything else?" That’s .* in regex 😉 I’ll keep my eye on community posts for ideas for future deep-dive adventures. I’d really like to encourage readers to comment and to suggest areas where “How To” examples can be helpful. Why did you read this post? If there is desired content, I’d like to contribute what I can to get you involved in Community. 

Comments