Alteryx Designer Cloud Discussions

44619e163d12cf5f0d39 · ‎01-12-2018

I have several HTML pages I want to wrangle. I don't care much about the formatting, since the data is already there, but I am having trouble extracting the raw text. Removing the tags through wrangling is a pain. Is there a recommended approach?

44619e163d12cf5f0d39 · ‎01-12-2018

By the way, I am on a Mac, so if there is a utility that can do the conversion, I could write a script around it if needed. TIA

Trifacta_Alumni · ‎01-12-2018

There are several ways to convert HTML to plain text, depending on your platform.

macOS

You can use textutil to convert all HTML pages in the current folder to txt files:

textutil -convert txt ./*.html

Linux

You could use unoconv to convert between any of the formats LibreOffice supports, including HTML to txt. More details and examples are in the man page: https://linux.die.net/man/1/unoconv
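
Any platform

If neither tool is available, a small script can do the same job. The following is only a minimal sketch using Python's standard library html.parser module, not something built into either tool above. The names TextExtractor, html_to_text and convert_folder are made up for this example, and it assumes the pages are UTF-8 and that the contents of script and style tags should be dropped.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text nodes and ignores anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join chunks with a space so adjacent elements do not run together,
    # then collapse every run of whitespace into a single space.
    return " ".join(" ".join(parser.parts).split())


def convert_folder(folder="."):
    """Write a .txt file next to every .html file in the folder (assumes UTF-8)."""
    for path in Path(folder).glob("*.html"):
        html = path.read_text(encoding="utf-8", errors="replace")
        path.with_suffix(".txt").write_text(html_to_text(html) + "\n", encoding="utf-8")
```

Calling convert_folder(".") from the folder that holds the pages mirrors the textutil command above. Note that this sketch flattens everything onto one line per page, so if paragraph breaks matter for your wrangling, textutil or unoconv will likely give better results.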
