Alteryx Designer Ideas

Tool to Parse Tables in HTML

When web scraping, we currently use Regex and Text to Columns to parse raw HTML into the appropriate format. A tool that could at least parse tables would be hugely beneficial.

This functionality exists within Qlik, so it would be nice to see it replicated in Alteryx.

Obviously, we still need to be able to scrape raw HTML, but automatically parsing table data using the <tr>, <th> and <td> tags would be very useful.
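To make the request concrete, here is a minimal Python sketch of tag-driven table parsing, using only the standard-library `html.parser` module. It is an illustration, not how Alteryx or Qlik work internally, and the names `TableParser` and `parse_tables` are hypothetical. It assumes fairly simple markup: omitted `</td>`/`</tr>` closing tags are tolerated, but nested tables, `colspan` and `rowspan` are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings.

    Hypothetical sketch: assumes no nested tables and ignores colspan/rowspan.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables: list of list of rows
        self._row = None   # cells of the <tr> currently being read
        self._cell = None  # text fragments of the <td>/<th> currently being read

    def _end_cell(self):
        # Close the open cell (if any), collapsing runs of whitespace.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _end_row(self):
        # Close the open row (if any) and attach it to the current table.
        self._end_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._end_row()
            self.tables.append([])
        elif tag == "tr":
            self._end_row()  # tolerate a missing </tr>
            if self.tables:
                self._row = []
        elif tag in ("td", "th"):
            self._end_cell()  # tolerate a missing </td> or </th>
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._end_cell()
        elif tag in ("tr", "table"):
            self._end_row()

    def handle_data(self, data):
        # Only text inside a cell is kept; everything else is ignored.
        if self._cell is not None:
            self._cell.append(data)


def parse_tables(html):
    """Return every table found in the HTML string as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.tables
```

For example, `parse_tables("<table><tr><th>State</th><th>Capital</th></tr><tr><td>Alabama</td><td>Montgomery</td></tr></table>")` returns `[[['State', 'Capital'], ['Alabama', 'Montgomery']]]`. Real pages such as Wikipedia often use `rowspan`/`colspan`, which a production tool would need to expand.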

The following page contains a table showing the states and territories of the US:

[Screenshot: States.PNG]

With Qlik, you can input the URL and it returns the available tables in tabular format:

[Screenshot: States - Qlik.PNG]

As this functionality already exists elsewhere, it would be good to incorporate it into Alteryx.

17 Comments
Meteor
+1
Alteryx Partner

+1

+1. Power Query already has this functionality too. I can point it at a Wikipedia page or similar and it automatically scrapes the data. It even handles 'table-like' data.

Alteryx Partner

Any updates on this?

By the way, the number one third-party data source seems to be web pages, static or dynamic, especially for competitive price comparisons and the like.

Any chance Alteryx could offer a capability like

https://www.parsehub.com/

Nebula

Excel also has this!

+1

Alteryx Certified Partner
Hi all, thanks @patrick_digan, @adrianloong, @Atabarezz, @SeanAdams, @KOBoyle et al. for your comments on this.

In the meantime, I've built a macro that parses tables into .csv format. It's the best I can do for now, but it gets the fiddly HTML work done faster than our current approach. You can download the macro from the public gallery here: https://gallery.alteryx.com/#!app/HTML-Table-Parse/5d495b280462d70db4d5012b

Let me know your thoughts, and as always, any improvements are more than welcome! M.
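The macro itself isn't reproduced here. As a rough sketch of the final step it describes, assuming the table rows have already been extracted as lists of strings, Python's standard `csv` module can write them out with correct quoting. `rows_to_csv` is a hypothetical helper, not part of the macro.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise a list of rows (lists of strings) to CSV text.

    Hypothetical helper: fields containing commas or quotes are quoted
    automatically by csv.writer (QUOTE_MINIMAL is the default).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Pad short rows with empty fields so every line has the same number
    # of columns, which keeps the output rectangular for downstream tools.
    width = max((len(row) for row in rows), default=0)
    for row in rows:
        writer.writerow(list(row) + [""] * (width - len(row)))
    return buffer.getvalue()
```

For example, `rows_to_csv([["Name", "Note"], ["Washington, D.C.", "federal district"]])` returns `'Name,Note\n"Washington, D.C.",federal district\n'`, with the embedded comma protected by quotes.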
Atom

Thanks