This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
RegEx….it can be tough, but extremely useful when looking for ways of extracting information from a string. Regular Expressions are basically a code you can write to match a specific set of characters (it's a pattern matching syntax). It could be something as simple as finding the three-digit number in the string “Eample123Eample” or something more complicated like using hexadecimal to select a certain range of characters. RegEx is something that can be used in Alteryx via the RegEx tool in the Parse toolset and some of the main uses for RegEx are replacing, matching, and parsing specific characters within a string.
Replacing text is a very handy process when working with string fields, as it can be used to replace virtually any set of characters. In the attached example module you will find the following data in Text Input Tool:
What I would like to do for this example is replace the portion of the string “file” and only that portion. The easiest way to do this using the RegEx tool is to follow these steps:
Make sure that the correct field is selected in the Field to Parse drop down.
Enter “file” as my RegEx in the Regular Expression text box.
Make sure that your Output Method is set to Replace.
Finally, enter a replacement text in the Replacement Text text box.
In this example, I choose “FileNumber_”. Once a Browse tool is connected and the module is run, you can see that the output has replaced all of the character matching “file” with “FileNumber_”.
Using RegEx to match is another process that can be very helpful for your data preparation or transformation process. Using the Match output method will assign a True or False value for you data based on the RegEx that you have written. I’m going to use the file initial data that was used above, but this time I was to match only the records that are “file1…”, “file2…”, and “file3…”. To do this, I have written a regular expression of: file.+. The breakdown of the RegEx script is as follows:
First, I am searching for “file” much like I did in the replace example.
Next, I am using brackets “”, to only look for the numbers 1, 2, or 3. Square brackets are used to match specific characters, just like an OR statement in a Formula Expression.
And lastly, “.+”. The period will look for any character and the plus will look for one or more additional characters. This covers the file extension in our data.
As you can see in the below output, only records that had a file name of file1, file2, or file3 have been returned as a True match, while file4 returns False:
Parsing with the RegEx tool can be a little bit more complicated, but again it is an extremely useful process that will help in your data prep and transformation stages. The nice thing about parsing with the RegEx tool is that it also allows you to alter the fields upon output. Once you have a valid regular expression for parsing, you will be able to change the field name, the data type, and the size of the field. The trick to parsing using RegEx is that you have to create a group, and this is done by using parenthesis “()” to identify each group. You can use multiple groups parse into multiple field, but for this example, there will just be one group.
Using the same initial file name field, we will use a regular expression to parse just the file name without an extension. For this, I have used the following expression: ^(.+)\..+$ . Just like before, the breakdown is as follows:
Using a “^” in RegEx means to look at the start of the string. This portion “(.+)” is looking for my group. In this case, any number of characters. And since I added a “^” to the beginning, it will look at the beginning of the string.
The "\.." portion will look for the period in between the file name and the extension. Using a slash before the period will tell RegEx that you are actually looking for a period (the slash is called an Escape character). Otherwise, a period is a wild card, meaning it will return any character.
The "+$" is the last portion where we look for the extension. Using “.+” will look for any characters, which will account for all of the extensions, while the dollar sign is making sure that this is looking at the end of the string.
Here is the outcome of using the parse method to find just the file name:
I hope this helps you get started with regular expressions! As always, if you have any questions please don't hesitate to email firstname.lastname@example.org. Special thanks to Mike Akey for putting together this post, until next time!