Tool to Parse Tables in HTML

When web scraping, we currently use Regex and Text to Columns to parse raw HTML (read in as text) into the appropriate format. A tool that could at least parse tables would be hugely beneficial.

This functionality exists within Qlik, so it would be nice to have it replicated in Alteryx.

Obviously, we need to retain the ability to scrape raw HTML, but automatically parsing data using the <td>, <th> and <tr> tags would be nice.
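As a rough illustration of what parsing on the `<tr>`, `<th>` and `<td>` tags involves, here is a minimal Python sketch using only the standard library's `html.parser`. `TableParser` and `parse_tables` are hypothetical names, not part of Alteryx or Qlik. The sketch assumes reasonably well-formed HTML with explicit closing tags and does not handle nested tables, `colspan` or `rowspan`, all of which a real tool would need to cope with.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text.

    Sketch only: assumes explicit </td>, </th> and </tr> closing tags and
    does not handle nested tables, colspan or rowspan.
    """

    def __init__(self):
        super().__init__()  # default convert_charrefs=True decodes &amp; etc.
        self.tables = []    # one entry per <table>; each entry is a list of rows
        self._row = None    # cells of the <tr> currently open, if any
        self._cell = None   # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Collapse runs of whitespace (including line breaks) to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained no cells
                self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        # Only text inside an open cell is kept; text outside cells
        # (whitespace between tags, captions, etc.) is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html):
    """Return a list of tables, each a list of rows, each a list of strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables


if __name__ == "__main__":
    page = """
    <table>
      <tr><th>State</th><th>Abbreviation</th></tr>
      <tr><td>Texas</td><td>TX</td></tr>
    </table>
    """
    for table in parse_tables(page):
        for row in table:
            print(row)
```

Run directly, the demo prints `['State', 'Abbreviation']` followed by `['Texas', 'TX']`, i.e. one list per `<tr>`, which is the tabular shape a dedicated tool would hand back instead of raw text.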

On the following page, there is a table showing the states and territories of the US:

[Screenshot: table of US states and territories on the web page]

With Qlik, you can input the URL and it will return the available tables in tabular format:

[Screenshot: the same table as returned by Qlik]

As this functionality exists elsewhere, it would be nice to incorporate it into Alteryx.

+1. Power Query also has this functionality. I can point it at a Wikipedia page or similar and it automatically scrapes the data. It even handles table-like data.

Any updates on this?


The number one third-party data source seems to be web pages, static or dynamic, especially for competitive price comparisons.

Any chance Alteryx could achieve a capability like

Excel also has this!

Hi all, thanks @patrick_digan, @adrianloong, @Atabarezz, @SeanAdams, @KOBoyle et al. for your comments on this. In the meantime, I've built a macro that parses tables into CSV format. It's the best I can manage for now, but it gets the fiddly HTML part done more quickly than our current approach. You can download the macro from the public gallery here: !app/HTML-Table-Parse/5d495b280462d70db4d5012b Let me know your thoughts, and as always, any improvements are more than welcome! M.
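For readers curious about the final step, turning extracted table rows into CSV, a minimal Python sketch using the standard `csv` module is shown below. This is an illustration only, not the macro's implementation; `rows_to_csv` is a hypothetical helper, and it assumes the rows have already been pulled out of the HTML as lists of strings. Letting a CSV writer handle quoting avoids the usual pitfall of naive comma splitting when a cell itself contains a comma.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise a parsed table (a list of rows of cell strings) as CSV text.

    csv.writer quotes any cell containing a comma, quote or newline, so a
    value such as "Austin, Texas" stays in a single column.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")  # "\n" instead of the default "\r\n"
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    table = [
        ["Location", "Code"],
        ["Austin, Texas", "TX"],
    ]
    print(rows_to_csv(table), end="")
```

The demo prints `Location,Code` and then `"Austin, Texas",TX`: the comma inside the first cell is protected by quotes rather than creating an extra column.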