Alteryx Designer Desktop Ideas

Tool to Parse Tables in HTML

When web scraping, we currently use Regex and Text to Columns to parse raw HTML, read in as text, into the appropriate format. A tool that could at least parse tables would be hugely beneficial.

This functionality exists within Qlik so it would be nice to have this replicated in Alteryx.

Obviously, we need to retain the ability to scrape raw HTML, but automatically parsing data using the <td>, <th> and <tr> tags would be nice.
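To make the request concrete, here is a minimal Python sketch of what tag-based table parsing could look like. It is not how Alteryx or Qlik do it; the names `TableParser` and `parse_tables` are made up for illustration, and it assumes tidy HTML with explicit closing tags, no nested tables and no rowspan/colspan.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects every <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table> seen so far
        self._row = None   # cells of the row being read, or None outside a <tr>
        self._cell = None  # text fragments of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def parse_tables(html):
    """Hypothetical helper: return all tables found in an HTML string."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.tables


# Made-up sample input, not taken from the page discussed in this post.
sample = """
<table>
  <tr><th>State</th><th>Capital</th></tr>
  <tr><td>Alabama</td><td>Montgomery</td></tr>
  <tr><td>Alaska</td><td>Juneau</td></tr>
</table>
"""
tables = parse_tables(sample)
```

Each table comes back as plain rows and columns, with the `<th>` row first, which is the shape a downstream tool would need to treat the first row as field names.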

On the following page there is a table showing the states and territories of the US:

States.PNG

With Qlik, you can input the URL and it will return the available tables in tabular format:

 

States - Qlik.PNG

 

As this functionality exists elsewhere it would be nice to incorporate this into Alteryx.

17 Comments
patrick_digan
17 - Castor

+1. I think a macro could be created and utilized in the meantime, but an out of the box tool would be even better.

mceleavey
17 - Castor

Yeah, macros are great for repetitive ad hoc tasks that are pretty much unique to the situation, but repetitive tasks that are generic across all users are something that I feel should be developed as part of the core functionality. I mean, who doesn't parse HTML tables?

BenG
Alteryx Alumni (Retired)

Thanks for the request.  This is something that we have seen a need for both from customer requests as well as internal use of Alteryx.  Some work has been done to try and create a tool for this, but it still needs more work in order to finish it up.  There are a lot of edge cases with HTML tables that are taking some work.  We will continue to look into it.  

 

Best Regards,
Ben

adrianloong
11 - Bolide

+1

Atabarezz
13 - Pulsar

+1

 

ARich
Alteryx Alumni (Retired)
Status changed to: Under Review
 
rdoptis
11 - Bolide

+1

SeanAdams
17 - Castor

Hey @mceleavey,

 

This is one of a few areas where I think we can improve the Download tool - the other is to add native support within Alteryx for HTML and XML.

We talked about this with @Ned and @AdamR_AYX and @NickJ at Inspire.   Essentially the idea would be to implement a new type within Alteryx for XML / HTML - and this would allow you to parse this kind of data using an object model.

 

One of the common functions in parsing HTML is to spot a table, and then pull this out into data - as you say above - and this would be one of the first capabilities that we could look to implement on this new type.

 

Fully support your thinking here - trying to unpick tables out of a text field in a data stream is more pain than it needs to be currently.

mceleavey
17 - Castor

Cheers @SeanAdams

 

It's one of those things that should be fairly straightforward to implement (the quality of the HTML notwithstanding), and I think it is aligned with removing the need for technical intervention if users don't have the Regex skills required.

 

KOBoyle
11 - Bolide

+1

 

In the short term I would suggest implementing functionality similar to the ImportHTML function in Google Sheets, and dealing with fringe cases at a later date, if ever.

 

-Ken
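
For anyone unfamiliar with it, the Google Sheets function takes a URL, a query of "table" or "list", and a 1-based index. A rough Python sketch of that interface is below. The names `import_html` and `_Collector` are hypothetical, the function takes HTML text rather than a URL so it runs offline, and it assumes well-formed markup with no nesting and no fringe cases.

```python
from html.parser import HTMLParser


class _Collector(HTMLParser):
    """Gathers tables (rows of cells) and lists (one single-cell row per item)."""

    def __init__(self):
        super().__init__()
        self.found = {"table": [], "list": []}
        self._row = None   # current table row, or None outside a <tr>
        self._text = None  # text fragments of the current cell or item

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.found["table"].append([])
        elif tag in ("ul", "ol"):
            self.found["list"].append([])
        elif tag == "tr" and self.found["table"]:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._text = []
        elif tag == "li" and self.found["list"]:
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def _flush(self):
        # Join the fragments, collapse whitespace, and reset the buffer.
        text = " ".join("".join(self._text).split())
        self._text = None
        return text

    def handle_endtag(self, tag):
        if self._text is not None and tag in ("td", "th") and self._row is not None:
            self._row.append(self._flush())
        elif self._text is not None and tag == "li":
            self.found["list"][-1].append([self._flush()])
        elif tag == "tr" and self._row is not None:
            self.found["table"][-1].append(self._row)
            self._row = None


def import_html(html, query, index):
    """Return the index-th (1-based) table or list in an HTML string."""
    if query not in ("table", "list"):
        raise ValueError('query must be "table" or "list"')
    collector = _Collector()
    collector.feed(html)
    collector.close()
    matches = collector.found[query]
    if not 1 <= index <= len(matches):
        raise IndexError(f"found {len(matches)} match(es) for {query!r}; index {index} is out of range")
    return matches[index - 1]


# Made-up page content for illustration only.
page = """
<ul><li>Home</li><li>About</li></ul>
<table><tr><th>Code</th><th>State</th></tr><tr><td>AL</td><td>Alabama</td></tr></table>
<table><tr><th>Code</th><th>Territory</th></tr><tr><td>GU</td><td>Guam</td></tr></table>
"""
territories = import_html(page, "table", 2)
menu = import_html(page, "list", 1)
```

Letting the user pick the element by type and position keeps the interface small; anything the simple parser cannot handle would still fall back to raw HTML and Regex.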