Hi All,
I'm trying to bring in an ODBC data source containing emails from a CRM system, but the body of each email is stored as HTML. How can I read or manipulate this particular field (Description)?
Attached is a screenshot to give you an idea of what's happening. Any help is greatly appreciated.
Many thanks
Oscar
Hi @OscarGeorge
I would use a Text to Columns tool on [description], set to split to rows on the delimiter \n.
Then use this formula in a Formula tool: REGEX_Replace([description],'<[!fiohldpMNmcsbtua][^>]*>|<\/[^t][^>]*>|<\/title>|<\/table>|&[a-z]+;','')
It does a decent job of removing HTML tags in many cases.
You could also try applying the regex formula first and then splitting to rows; I'm not sure which order will work best.
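If it helps to compare the two orders outside Alteryx, here is a rough Python sketch of both steps. The sample email body and the function names `strip_html` and `split_to_rows` are hypothetical, and `re.IGNORECASE` is an assumption meant to mirror REGEX_Replace's default case-insensitive matching. It runs the same pattern both ways on an opening tag that wraps across a line break:

```python
import re

# The REGEX_Replace pattern from the reply above, unchanged.
# re.IGNORECASE is an assumption: it is meant to mirror REGEX_Replace ignoring case by default.
TAG_PATTERN = re.compile(
    r"<[!fiohldpMNmcsbtua][^>]*>|<\/[^t][^>]*>|<\/title>|<\/table>|&[a-z]+;",
    re.IGNORECASE,
)


def strip_html(text):
    """Delete every match of TAG_PATTERN, like REGEX_Replace([field], pattern, '')."""
    return TAG_PATTERN.sub("", text)


def split_to_rows(text, delimiter="\n"):
    """Split one field value into a list of rows, like Text to Columns in split-to-rows mode."""
    return text.split(delimiter)


# Hypothetical email body whose first <p> tag wraps across a line break.
body = '<p class="greeting"\nstyle="margin:0">Hi team,</p>\n<p>Meeting moved to 3pm.</p>'

# Order 1: split to rows first, then strip tags row by row.
split_first = [strip_html(row) for row in split_to_rows(body)]

# Order 2: strip tags from the whole field first, then split to rows.
regex_first = split_to_rows(strip_html(body))

print(split_first)
print(regex_first)
```

Because `[^>]*` also matches newlines, the regex-first order removes the wrapped tag completely, while the split-first order leaves its two halves behind as stray text. Note too that the pattern as written deletes entities such as `&nbsp;` rather than turning them into spaces, and leaves closing tags that start with `t` other than `</title>` and `</table>` (for example `</td>` and `</tr>`) in place.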
Depending on the complexity of your HTML, the Python post below may help.
Parsing HTML with Python Tool
Cleaning the Text in a column
List of HTML Codes useful for Website Scraping
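For messier HTML, a parser usually copes better than a regex. As a minimal sketch (not necessarily what the linked posts do), Python's standard-library `html.parser` can pull out the text, decode entities such as `&amp;` and `&nbsp;`, skip `<style>`/`<script>` content, and start a new line at a few block-level tags. The `TextExtractor` class, the `html_to_text` function and the chosen set of line-break tags are hypothetical, and the Python tool's input/output wiring is left out:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML fragment (hypothetical helper)."""

    SKIP_TAGS = {"script", "style"}              # text inside these is dropped
    BREAK_TAGS = {"br", "p", "div", "tr", "li"}  # these start a new line (assumed set)

    def __init__(self):
        # convert_charrefs=True is the default, so handle_data receives
        # &amp;, &nbsp; etc. already decoded.
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of `html` as non-empty lines with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # str.split() with no argument also splits on the non-breaking space from &nbsp;.
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


print(html_to_text("<p>Hello <b>world</b></p><p>Tom &amp; Jerry</p>"))
```

Because the output keeps one line per block, it can still be split to rows on `\n` afterwards if that suits the downstream workflow.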
Chris
Thanks David, the first method worked quite well. I had to add a filter and tidy the results up a little afterwards, but that's definitely a handy formula.