Can anyone help on how to get rid of these <td>. I am not familiar with Regex how would I use that. But will take anything that will help. I AM NEW to all of this..UPDATE I added the workflow
Solved! Go to Solution.
@echuong1 has already given some great recommendations, so while I may not be able to add much here, I figured I could share a quick method I use to get rid of such unwanted characters with a single formula expression. It may seem more complex initially than it actually is, so I will try to add some contextual explanation as well - but it is a very useful method that you can rinse and repeat quickly on other datasets as applicable once you've used it once or twice.
Using the Multi-Field Formula, you can use the REGEX_REPLACE and REPLACE functions (see here for syntax help) to accomplish what you need quickly and easily. After selecting each of your target fields within the tool's configuration, you can chain these expressions together to accomplish what you want with a single formula.
regex_replace(tostring([_CurrentField_], '[[:punct:]]', '')
In the expression above,
With your particular dataset, this formula by itself would still leave your "td" characters, so we can wrap the above formula in a REPLACE function, with the initial formula specified as the first parameter, like so:
replace(regex_replace(tostring([_CurrentField_]), '[[:punct:]]', ''),'td', '')
Doing so removes all of your unwanted characters, leaving only numeric characters which you can convert to a numeric data type with a subsequent Select tool or simply by using the "Change Output Type to" setting in the Multi-Field Formula (which would just require wrapping the above formula in a TONUMBER function, similar to how the first formula was wrapped in the second in the above steps).
Let me know if that helps or if you have any questions.
OMG is that all I had to do. Thanks so much I hope to become a great Analyst I enjoy what I am learning.
Thanks for the detail explanation. I will read about this and from my understanding this is advance but I am up for the challenge. I have a long way to go but I am moving forward. Thanks again!
Thanks @ joshbennett for your detailed answer! I'm working on the same problem, however after using the regex replace function I still end up with unwanted characters in two cells (see picture) starting with sup..
Could you please advise how to solve this?
Thanks Thomas
Hi @Thomas
Chaining another REGEX_REPLACE should do it, this time removing any non-digit character:
REGEX_REPLACE(tostring([_CurrentField_], '[^\d]', '')
So, chaining together with the previous two would be something like:
REPLACE(REGEX_REPLACE(REGEX_REPLACE(tostring([_CurrentField_]), '[[:punct:]]', ''), '[^\d]', ''),'td', '')
Example attached - hope that helps.
Josh