Alteryx Designer Desktop Discussions

OscarGeorge · 08-27-2020

Hi All,

I'm trying to bring in an ODBC data source that contains emails from a CRM system, but the body of each email is saved as HTML. How can I read or manipulate this particular field (Description)?

Attached is a screenshot to give you an idea of what's happening. Any help is greatly appreciated.

Many thanks

Oscar

DavidP · 08-27-2020

Hi @OscarGeorge

I would use a Text to Columns tool on [description], set to split to rows on the delimiter \n

Then use this formula in a Formula tool: REGEX_Replace([description],'<[!fiohldpMNmcsbtua][^>]*>|<\/[^t][^>]*>|<\/title>|<\/table>|&[a-z]+;','')

It does a decent job of removing html tags in many cases.

Also try using the regex formula first and then do the split to rows - I'm not sure which will work best.

ChrisTX · 08-27-2020

Depending on the complexity of your HTML, the Python post below may help.

Parsing HTML with Python Tool

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Parsing-HTML-with-Python-Tool/td-p/353...

Cleaning the Text in a column

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/Cleaning-the-Text-in-a-column/m-p/4171...

List of HTML Codes useful for Website Scraping

https://community.alteryx.com/t5/Alteryx-Designer-Discussions/List-of-HTML-Codes-useful-for-Website-...

Chris
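
As a rough idea of what the Python route can look like, here is a minimal sketch using only the standard library's html.parser module. It is not taken from the linked posts; the class name TextExtractor, the helper html_to_text, and the choice of which tags start a new line are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should start a new line in the output.
    LINE_BREAKS = {"br", "p", "div", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &nbsp;, &amp;, ...
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.LINE_BREAKS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Unlike a regex, the parser decodes entities instead of deleting them and copes with tags the pattern does not list, which can matter for the messier email bodies.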

OscarGeorge · 08-27-2020

Thanks David, the first method worked quite well. I had to use a filter and manipulate it a little after but that's definitely a handy formula.
