Early bird tickets for Inspire 2023 are now available! Discounted pricing closes on January 31st. Save your spot!

Alteryx Community Resources

Get up-and-running on the Community and with Alteryx in record time.

Text Parsing Guide

MeganDibble
Alteryx Community Team
Alteryx Community Team
Created

Banner parsing (1).png

 

Heading 1 - Red.png

Getting Started 

 

Text parsing is a common data cleaning task. You have a text field (like a column of addresses, for example) where you want to extract certain parts of the field (like the city) for further analysis. But how do you do that? This guide will walk you through several tools and many examples to help you with your text parsing questions.  

 

Something to keep in mind is that often, no one will have the exact same data format and question as you. But you can learn strategies and tools from other Community members on this topic. 

 

If you are new to text parsing or have little practice, then go ahead and watch the Interactive Lessons on Parsing Data. There are nine lessons in this category, and they cover several tools. 

 

Screen Shot 2022-11-01 at 12.37.10 PM.png

 

When working on a text parsing problem, a good strategy is to start with the simplest solution and work your way up in complexity if the most straightforward option does not work for your data. This guide is structured accordingly, starting with the simplest solutions and moving toward the more complex methods. 

 

The most complex (and versatile) option for parsing strings is a Regular Expression (RegEx). There are advantages and disadvantages to jumping straight to RegEx because you know it can handle any problem. Check out this blog on whether to RegEx or Not to RegEx to understand how the same problem can have multiple solutions and the trade-offs that occur with each choice. 

 

We will go in-depth into RegEx in the final section of the guide. But now, let’s begin our text parsing journey with the Text to Columns tool! 

 

Heading 2 - Red.png

Parsing With the Text to Columns Tool 

 

MeganDibble_2-1669672787746.png

 

The text to Columns tool allows you to split a text field into multiple fields or rows by specifying a delimiter to split on. Below are some delimiters you can specify to split the field: 

 

Screen Shot 2022-11-02 at 5.04.04 PM.png

 

You can also specify the delimiter as any single character (like a period or a dash).  The data below is from the tool example in Designer. The Address field is selected along with “,” as the delimiter. You can specify how many output columns you would like, and voila! The address field is parsed into its components. 

 

Screen Shot 2022-11-03 at 11.00.32 AM.png

 

To understand what you can do with the Text to Columns tool, check out the other tool configurations in the tool example and the Tool Mastery article. 

 

Splitting to Columns Examples 

 

Here are some examples where the Text to Column tool was used to parse data on different delimiters: 

 

In this discussion forum, you can see two solutions: one using RegEx and another using the Text to Columns tool. Both solutions allow you to parse data separated by spaces into separate columns. 

 

It’s important to note that the Text to Columns tool will only work for String fields. See this discussion where the recommendation was to use a Formula tool since the data type to parse was a Date field. For more on parsing DateTime fields, refer to the DateTime Guide. 

 

Splitting to Rows Examples 

 

The Text to Columns tool can also split data from one field into multiple rows. Here are some examples from the Designer discussion forum: 

 

If the text data you want to parse does not have a consistent delimiter, then it’s time to move on to your next options! 

 

Heading 3 - Red.png

Parsing With the Formula Tool 

 

MeganDibble_6-1669672787750.png

 

As you may be aware, the Formula tool can do a lot. This includes parsing strings! The documentation on String Functions is a comprehensive list of all functions for working with strings, and many of them can be used for parsing.  

 

Screen Shot 2022-11-03 at 2.18.06 PM.png 

You might notice that there are three RegEx functions in there: REGEX_CountMatches, REGEX_Match, and REGEX_Replace. We will cover these in the RegEx section. 

 

Extracting Text 

 

Take a look at these examples for extracting parts of text fields: 

 

Additionally, this discussion thread shows how Substring() can be used to get the text when it is always in the same position in a field and how to make the solution dynamic with REGEX_Replace() if needed. 

 

Replacing Text 

 

Here are some examples of replacing text using the formula tool: 

 

This section of the guide is focused on the Formula tool, but please note that you can use the Find Replace tool for replacing text as well. You can view the Find Replace tool mastery article here and another helpful KB on parsing with this tool here. 

 

Heading 4 - Red.png

Structured Text Parsing Tools 

 

Sometimes, you will have a column of text that has a distinct format due to the origin of the data. It might be web-scraped data and contain XML tags, or it could be data from an API in JSON format. For these cases, you can leverage tools to parse the data into rows and columns format that is easier to work with in Alteryx. 

 

XML Parse 

 

MeganDibble_9-1669672787754.png

 

If your data is in XML format, you can use the XML Parse tool to parse the data into individual fields. Below is what the data looks like before and after the XML parse tool in the tool example: 

 

1.png

 

This discussion thread solution shows how parsing XML can be an iterative process as you move from element to element. And as you can see from this solution, these problems can require quite a few XML Parse tools: 


w.png

  

JSON Parse 

 

MeganDibble_12-1669672787757.png

 

You can use the JSON Parse tool to separate JavaScript Object Notation (JSON) text into a table schema for downstream processing. Here are the input and output from the tool in the example workflow, so you can get a sense of what you can accomplish: 

 

2.png

 

Check out this discussion solution, which shows how you can use the Cross Tab tool(s) to get the output of the JSON Parse tool into a more workable format. And for a more complex JSON parsing problem, review the solution to this discussion thread. 

 

Heading 5 - Red.png

Parsing with RegEx 

 

What is RegEx? It’s short for Regular Expressions; these are text patterns used to find, match, replace and extract characters within text fields. RegEx is our most powerful tool for text parsing, as it offers the most flexibility. 

 

Screen Shot 2022-11-14 at 1.55.43 PM.png

 

RegEx is a tough topic—it is covered in the Alteryx Advanced Certification. If you have trouble with the RegEx tool, you are not alone! The rest of this guide is dedicated to walking through the tool, examples, and strategies to use when you are solving your most challenging text parsing problems. 

 

Overview of RegEx in Alteryx 

 

MeganDibble_16-1669672787763.png

 

For shorter lessons on learning how to use Regular Expressions in Alteryx, go through these interactive lessons: 

 

For a longer (1 hour) instructor-led training, watch the Introduction to Regular Expressions video.  

 

Once you have watched these videos, review the help documentation for the RegEx tool, and don’t forget to check out the Tool Example file in Designer. 

 

Screen Shot 2022-11-14 at 3.01.07 PM.png

 

We also have the RegEx Tool Mastery article, which breaks down some examples for each configuration of the tool. Additionally, there is a video about 20 Token RegEx Expressions For The Newbie to help you learn common expressions. 

 

External RegEx Resources 

 

Regex101.com is a great website for testing and troubleshooting your regular expressions. It also explains the expression and has a quick reference section for common tokens and more. 

 

Screen Shot 2022-11-14 at 2.59.32 PM.png

 

Another external resource linked in the help documentation for RegEx is the RegEx Coach. This is an application that you can download to experiment with regular expressions interactively.  

 

Examples 

 

Below are RegEx problem solutions from the Designer discussion forum. Review these before posting your RegEx question, as someone may have asked a similar one already! 

 

RegEx Replace 

 

The replace configuration for the RegEx tool will replace the expression you searched for with a second expression. Here are some examples: 

 

RegEx Parse 

 

Parsing separates the expression into new columns. You can then set the Name, Type, and Size of the new columns. A new column is created in the Output Columns table. Below are some parsing examples: 

 

RegEx Match 

 

The match option appends a column containing a number: 1 if the expression matched what you specified, 0 if it did not. Here are some discussion threads with RegEx matching: 

 

RegEx Tokenize 

 

The tokenize option splits the incoming data using a regular expression. This option works similarly to the Text To Columns tool, except instead of matching and removing what you do not want, you match for what you want to keep. You want to match to the whole token, and if you have a marked group, only that part is returned. To see this option in action, review this discussion thread: Tokenize Group of Words to Column. 

 

RegEx Blogs 

 

Additionally, we have some blogs that dive deeper into RegEx problems. For a financial use case, you can read Verifying Tax IDs with REGEX. And for a tutorial on cleaning XML data, there’s this article: RegEx for the Win: XML Tag Clean-Up. 

 

Heading 6 - Red.pngTroubleshooting Common Errors 

 

Stuck at an error in your parsing problem? Learn from the solutions to these errors that other users saw: 

 

Do you have a discussion forum thread, blog, or other content on Community that has helped you when parsing text? Drop it in the comments below.  

Comments
Wojtek_Dz
8 - Asteroid

Great article thanks @MeganDibble ! Knowledge about text parsing never enough 👨‍🎓

ShankerV
13 - Pulsar

@MeganDibble Thanks a ton.

All at one place, I have bookmarked this page for my future reference.

Would recommend everyone to go through in detail.

 

SeanAdams
17 - Castor
17 - Castor

This really is an incredible resource @MeganDibble  - thank you for the care and time that went into this.