Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

General Discussions has some can't miss conversations going on right now! From conversations about automation to sharing your favorite Alteryx memes, there's something for everyone. Make it part of your community routine!

SOLVED

Data clean td HTML, REPLACECHAR

LindaD
7 - Meteor

Sure I'm sorry, does this help?

echuong1
Alteryx
Alteryx

You can actually make this even simpler using the text to columns. Use both < and > as delimiters.

 

echuong1_0-1606945745032.png

 

joshbennett
10 - Fireball

@echuong1 has already given some great recommendations, so while I may not be able to add much here, I figured I could share a quick method I use to get rid of such unwanted characters with a single formula expression. It may seem more complex initially than it actually is, so I will try to add some contextual explanation as well - but it is a very useful method that you can rinse and repeat quickly on other datasets as applicable once you've used it once or twice.

 

Using the Multi-Field Formula, you can use the REGEX_REPLACE and REPLACE functions (see here for syntax help) to accomplish what you need quickly and easily. After selecting each of your target fields within the tool's configuration, you can chain these expressions together to accomplish what you want with a single formula.

 

  1. First, use REGEX_REPLACE with '[[:punct:]]' specified as the pattern parameter, as shown below. '[[:punct:]]' is what is referred to in RegEx as a "POSIX Bracket Expression", which you can read more about here if desired (though not required for your purpose). You can essentially think of this as a variable or placeholder that looks for any punctuation characters, all of which you are trying to remove in your use case. Thus, you can start with the following expression:
regex_replace(tostring([_CurrentField_], '[[:punct:]]', '')

 

In the expression above, 

  • The first parameter [_CurrentField_] is the variable used by the Multi-Field Formula tool reference each of your selected fields against which it applies the expression. We have to encapsulate it in a simple TOSTRING function to let the Multi-Tool Formula know to treat the data as a text string.
  • The second parameter '[[:punct:]]' tells the tool to look for any punctuation characters in the selected fields.
  • The third parameter '' just signifies an empty quoted string, indicating you want to replace the punctuation characters with nothing (effectively removing the punctuation)

 

With your particular dataset, this formula by itself would still leave your "td" characters, so we can wrap the above formula in a REPLACE function, with the initial formula specified as the first parameter, like so:

 

replace(regex_replace(tostring([_CurrentField_]), '[[:punct:]]', ''),'td', '')

 

Doing so removes all of your unwanted characters, leaving only numeric characters which you can convert to a numeric data type with a subsequent Select tool or simply by using the "Change Output Type to" setting in the Multi-Field Formula (which would just require wrapping the above formula in a TONUMBER function, similar to how the first formula was wrapped in the second in the above steps).

 

joshbennett_0-1606961949178.png

 

 

Let me know if that helps or if you have any questions.

 

LindaD
7 - Meteor

OMG is that all I had to do. Thanks so much I hope to become a great Analyst I enjoy what I am learning.

LindaD
7 - Meteor

Thanks for the detail explanation. I will read about this and from my understanding this is advance but I am up for the challenge. I have a long way to go but I am moving forward. Thanks again!

Labels