community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx Knowledge Base

Definitive answers from Designer experts.

Using Find and Replace to Simplify Parsing

Community Content Engineer
Community Content Engineer

Between the RegEx, Text To Columns, and XML Parse Tools, the Alteryx data artisan already has an exceptionally robust selection of tools to help parse uniquely delimited data. However, there are still some data sets so entangled in formatting that it’s labor intensive to parse even for them. Enter the Find and Replace Tool, which captures the ability to find your nightmarish parsing workflows and replace them with sweet color by number pictures. Just kidding, it finds bad jokes and replaces them with good ones. Seriously, though, you could do both if you wanted to because this tool has the capability to look up a table of any number of specified targets to find in your data and will replace them with a table of specified sources. With the help of a few quick configuration steps, this tool can simplify some parsing use cases significantly.

 

Reduce Multi-Character Delimiters for Text to Columns:

Probably the most intuitive of use cases, the Find and Replace Tool can be used to simplify multi-character delimiters. This comes in handy because the Text to Columns can only split fields by single characters; putting multiple characters in the “Delimiters” configuration option will split on each character, not the combination of them. Try reducing those delimiters to a single character with Find Replace:

 

1.png

2.png

 

Circumvent Complex Regex Expressions:

Having trouble parsing with Regex? Me too. It’s hard. You can email me at support@alteryx.com so we can work through that expression together, or you can try to work around it by specifying as many patterns as you can identify as the Find field and replacing them with a delimiter of your choice. This isn’t always applicable, but can often help you work around if time is of the essence and you can’t get your RegEx working with one of these handy RegEx expression testers.  

 

Nested HTML:

Frequently when parsing out HTML there will be layering of the formatting tags for you to unravel before capturing your records (e.g. a table inside of a table). You can parse through this iteratively by removing layer after layer of tags or you can try to find each hierarchical layer of tags and replace them with unique delimiters. This especially comes in handy when the tags might vary, even within the same hierarchical layer:

 

3.png

 

Semistructured, Consistent, Data without Delimiters:

Wanting to parse something without delimiters? Try identifying patterns that give the data semistructure, like the example below parsing out addresses fields. Address records often have multi-word addresses, cities, or states and thus make it impossible to parse an address field into address, city, state, and zip using only whitespace or numeric characters to parse. There are only so many address endings (St, Street, Rd, Road, Ave, Avenue, etc), however, and identifying these in your data can allow you to replace them with the same ending but with a delimiter. Rinse and repeat for state names with a delimiter on either side. After parsing you should now have your address field, your city, you state, and zip code in their own separate fields even if they are multiple words!

 

4.png

 

Bonus example (regex replace): whitespace delimiters

Although this example technically doesn’t even need to use the Find Replace tool, it relies on a similar technique to simplify parsing data structures with formatted (whitespace) delimiters using the function REGEX_REPLACE(). Let’s put it this way, if the Find Replace Tool was a quarterback, REGEX_REPLACE() would be Tom Brady. Picture the following:

 

5.png

 

That’s one ugly field, right? You can turn that ugly duckling into 13 swans just by applying this function in a formula tool: REGEX_REPLACE([DownloadData], '[\s]{2,}' , '|'). Translating RegEx to English, this is essentially saying “Take my DownloadData field, find the pattern \s (space) when it occurs {2,} (two or more) times, and replace it with |.” I like to think that once all the whitespace characters have been replaced by a pipe, we’ve successfully “deflated” the data to something that can be easily parsed into the fields we need. Just send your data into a Text to Columns Tool using | as your delimiter, and then tack on a Dynamic Rename to “Take Field Names from First Row of Data.” Touchdown.

 

Looking for more information on the Find Replace Tool? Everything you need to know can be found in the product help linked above!

Comments
Asteroid

I just started using a similar procedure.  Works great

 

I use REGEX_Replace with the following  <[!fiohldpMNmcsbtua][^>]*>|<\/[^t][^>]*>|<\/title>|<\/table>|&[a-z]+;

Looks crazy but it gets rid of most tags leaving the </tr> and </td>

 

The only difference is this will keep /tr and /td replacing with ~ and | respectively.

 

Some straggling text will likely require a bit more parsing (e.g. script)

 

After this it is simply a matter of Text to rows and (optionally) text to columns

 

One little nut I can't stress enough - create your regex string library.

I don't know how many times I go digging into last months workflow to copy/paste a regex string

 

I use http://www.regexr.com/ for validating regex strings.

One feature is a mouseover event to describe how the regex will parse the data.

 

Have fun parsing!

Meteoroid

Hi,

 

I am trying to do similar parsing, except I want to replace the whole string (in my Source table field) with the value from lookup table(target).

For Ex: SourceField('xxx123') Target(123). If any part of the SourceField contains '123', I want to replace value '123' from Target  into SourceField.

 

My lookup table has 60 values. So I don't want to hard code using IF THEN ELSIF condition. Is there any way or tool to do this in Alteryx?

Alteryx
Alteryx

Hi @syatham1,

 

You may want to take a look at the dynamic rename tool for that particular use case. It is more complicated to set up than the Find and Replace tool but it will allow you to do essentially a "Find and Replace with a Formula".

Meteoroid

Thanks for sharing this nugget!