Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
The Expert Exam is now live online! Read about the specifics and what it took to bring it to life in the blog by our very own Elizabeth Bonnell!

Replace HTML code with ASCII equivalent

Highlighted
11 - Bolide
11 - Bolide
This is not a question but simply sharing an expression that it just took me a few minutes to create and I thought I'd share.

Business Problem:
I often download web data using Alteryx but find that sometimes the text contains HTML codes that look ugly. I needed a way to replace them with the ASCII equivalent.

Example:
Copyright © 2015 needs to read Copyright © 2015

Methodology:
Here is a page (http://www.ascii.cl/htmlcodes.htm) with a full list of the HTML codes. I found that the numbers used in the HTML codes are the same as the ASCII numbers. Therefore all I had to do was search for the prefix &#, ensure that what followed was a number, and then if it was, use that number with the CharFromInt() function to create the code.
One cool trick I found is that the ToNumber() function automatically drops any non-numeric characters from the end of the number. That took care of the trailing semicolon so I didn't have to.

Syntax:
To use the syntax below in the Formula tool, simply replace [data] with [yourfieldname] where yourfieldname is the name of the field you need to modify.

iif(
   ToNumber(GetWord(Substring([data],FindString([data],'&#')+2),0))>0,
   Replace([data],GetWord(Substring([data],FindString([data],'&#')),0),CharFromInt(ToNumber(GetWord(Substring([data],FindString([data],'&#')+2),0)))),
    [data]
)
John Hollingsworth
Clear Channel Outdoor
Highlighted
Alteryx Alumni (Retired)

@Hollingsworth

 

Thanks for sharing, great tip! 🙂

Andy Cooper
Senior Solutions Engineer - EMEA
Alteryx
Highlighted
5 - Atom

I think this a better approach:

 

IF
FindString([html], '&#') >= 0 THEN

REGEX_Replace([html], '&#(.*?);',
CharFromInt(REGEX_Replace(
REGEX_Replace([html], '&#(.*?);', '\1'), '[A-Za-z]', '')))

 ELSE [html] ENDIF

 

also here is a related macro attached

 

 

Highlighted
11 - Bolide
11 - Bolide

Thanks for sharing @Hollingsworth !

Labels