Alteryx Designer Desktop Ideas

Tool to Parse Tables in HTML

When web scraping, we currently use RegEx and Text To Columns to parse raw HTML (read in as text) into the appropriate format. A tool that could at least parse tables would be hugely beneficial.

This functionality exists within Qlik so it would be nice to have this replicated in Alteryx.

Obviously, we need to retain the ability to scrape raw HTML, but automatically parsing data using the <td>, <th> and <tr> tags would be nice.
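To illustrate the kind of tag-based parsing meant here, below is a minimal sketch in Python using only the standard library. It is not how Qlik or any Alteryx tool works internally; the names `TableParser` and `parse_tables` are made up for this example, and it assumes reasonably well-formed HTML with explicit closing tags and no nested tables.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings.

    Hypothetical helper for illustration: assumes explicit closing tags
    and does not handle nested tables, rowspan or colspan.
    """

    def __init__(self):
        super().__init__()
        self.tables = []     # finished tables
        self._table = None   # rows of the table currently being read
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._table.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table = None
            self._row = None
            self._cell = None


def parse_tables(html):
    """Return every table found in the HTML text, in document order."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables


sample = """
<table>
  <tr><th>State</th><th>Capital</th></tr>
  <tr><td>Alabama</td><td>Montgomery</td></tr>
  <tr><td>Alaska</td><td>Juneau</td></tr>
</table>
"""
for row in parse_tables(sample)[0]:
    print(row)
```

The raw HTML stays available as text, so this kind of parsing would sit alongside the existing scraping approach rather than replace it.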

In the following page there is a table showing the states and territories of the US:

[Image: States.PNG]

With Qlik, you can input the URL and it will return the available tables in tabular format:

[Image: States - Qlik.PNG]

As this functionality exists elsewhere it would be nice to incorporate this into Alteryx.

17 Comments
Jchantnicki
7 - Meteor
+1
Atabarezz
13 - Pulsar

+1

OldDogNewTricks
10 - Fireball

+1 - PowerQuery also already has this functionality. I can point it at a Wikipedia page or similar and it just automatically scrapes the data. It even does 'table like' data.

Atabarezz
13 - Pulsar

Any updates on this?

By the way, the number one third-party data source seems to be web pages, static or dynamic, especially for competitive price comparisons etc.

Any chance Alteryx could achieve a capability like

https://www.parsehub.com/

 


 

 

 

SeanAdams
17 - Castor

Excel also has this!

+1

mceleavey
17 - Castor
Hi all, thanks @patrick_digan, @adrianloong, @Atabarezz, @SeanAdams, @KOBoyle et al. for your comments on this.

In the meantime, I've built a macro which parses tables into a .csv format. It's the best I can do for now, but it gets the fiddly HTML bit done quicker than the way we're doing it at the moment.

You can download the macro in the public gallery here: https://gallery.alteryx.com/#!app/HTML-Table-Parse/5d495b280462d70db4d5012b

Let me know your thoughts, and as always, any improvements are more than welcome!

M.
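For anyone curious what the table-to-.csv step might look like outside Alteryx, here is a rough Python sketch. It is not the macro's implementation; `rows_to_csv` is a made-up name, and it assumes the table has already been extracted into a list of rows (each row a list of cell strings).

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise one parsed table (a list of rows) as CSV text.

    Hypothetical helper for illustration only.
    """
    # Widest row decides the column count; 0 if the table is empty.
    width = max((len(row) for row in rows), default=0)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Pad short rows so every line has the same number of fields.
        writer.writerow(list(row) + [""] * (width - len(row)))
    return buffer.getvalue()


table = [
    ["State", "Capital"],
    ["Alaska", "Juneau"],
    ["Guam"],  # ragged row, padded with an empty field on output
]
print(rows_to_csv(table))
```

Going through the `csv` module rather than joining on commas means cells that themselves contain commas or quotes are quoted correctly.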
PeteSimp
5 - Atom

Thanks