Alteryx Designer Ideas

Share your Designer product ideas - we're listening!

Tool to Parse Tables in HTML

When web scraping, we currently use Regex and Text To Columns to parse raw HTML text into the appropriate format. A tool that could at least parse tables would be hugely beneficial.

This functionality exists within Qlik so it would be nice to have this replicated in Alteryx.

Obviously, we need to retain the ability to scrape raw HTML, but automatically parsing data using the <td>, <th> and <tr> tags would be nice.
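To make the request concrete, here is a minimal sketch in Python of what "parsing using the <td>, <th> and <tr> tags" could look like. It is not Alteryx or Qlik functionality; the names `TableRowParser`, `parse_table_rows` and `SAMPLE` are made up for illustration, and it assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr> into a list of rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def _flush_cell(self):
        # Store the current cell (if any) in the current row.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _flush_row(self):
        # Store the current row (if any) in the result.
        if self._row is not None:
            self._flush_cell()
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr":
            self._flush_row()


def parse_table_rows(html):
    """Return every table row found in the HTML text as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Assumed sample input; a real page would be much messier.
SAMPLE = """
<table>
  <tr><th>State</th><th>Capital</th></tr>
  <tr><td>Alabama</td><td>Montgomery</td></tr>
  <tr><td>Alaska</td><td>Juneau</td></tr>
</table>
"""

rows = parse_table_rows(SAMPLE)
for row in rows:
    print(row)
```

The first row holds the <th> header cells and the remaining rows hold the data, which is roughly the tabular shape the idea asks for.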

On the following page there is a table showing the states and territories of the US:

[Image: States.PNG]

With Qlik, you can enter the URL and it returns the available tables in tabular format:

[Image: States - Qlik.PNG]


As this functionality exists elsewhere it would be nice to incorporate this into Alteryx.

14 Comments

+1. I think a macro could be created and used in the meantime, but an out-of-the-box tool would be even better.

Alteryx Certified Partner

Yeah, macros are great for repetitive ad hoc tasks that are pretty much unique to the situation, but repetitive tasks that are generic across all users are something that I feel should be developed as part of the core functionality. I mean, who doesn't parse HTML tables?

Alteryx Alumni (Retired)

Thanks for the request. This is something that we have seen a need for, both from customer requests and from internal use of Alteryx. Some work has been done to try to create a tool for this, but it still needs more work to finish. HTML tables have a lot of edge cases, and handling them is taking some work. We will continue to look into it.

 

Best Regards,
Ben

Alteryx Certified Partner

+1

Alteryx Partner

+1

 

Alteryx
Status changed to: Under Review
 
Fireball

+1

Aurora

Hey @mceleavey,

 

This is one of a few areas where I think we can improve the Download tool - the other is to add native support within Alteryx for HTML and for XML.

We talked about this with @Ned, @AdamR and @NickJ at Inspire. Essentially, the idea would be to implement a new type within Alteryx for XML / HTML - and this would allow you to parse this kind of data using an object model.

 

One of the common functions in parsing HTML is to spot a table, and then pull this out into data - as you say above - and this would be one of the first capabilities that we could look to implement on this new type.

 

Fully support your thinking here - trying to unpick tables out of a text field in a data stream is more pain than it needs to be currently.
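As a rough illustration of the object-model idea (not a proposal for how Alteryx would implement it), here is a Python sketch that loads markup into a tree, spots each table and pulls it out as records. It assumes the markup is well-formed XML, which real-world HTML often is not; `tables_as_records` and `SAMPLE` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Assumed sample: well-formed markup so that an XML parser can load it.
SAMPLE = """<html><body>
<p>States and territories</p>
<table id="states">
  <tr><th>State</th><th>Abbreviation</th></tr>
  <tr><td>Alabama</td><td>AL</td></tr>
  <tr><td>Alaska</td><td>AK</td></tr>
</table>
</body></html>"""


def tables_as_records(markup):
    """Return one list of {header: value} dicts per <table> in the markup.

    Simplifications: the first row is treated as the header, and rows of
    nested tables are not separated from their parent table.
    """
    root = ET.fromstring(markup)
    tables = []
    for table in root.iter("table"):
        rows = [
            ["".join(cell.itertext()).strip()
             for cell in tr if cell.tag in ("td", "th")]
            for tr in table.iter("tr")
        ]
        if not rows:
            continue
        header, body = rows[0], rows[1:]
        tables.append([dict(zip(header, row)) for row in body])
    return tables


records = tables_as_records(SAMPLE)
for record in records[0]:
    print(record)
```

With a tree in hand, "spot a table" becomes a query against the document rather than a Regex over a text field.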

Alteryx Certified Partner

Cheers @SeanAdams

 

It's one of those things that should be fairly straightforward to implement (the quality of the HTML notwithstanding), and I think it is aligned with removing the need for technical intervention when users don't have the required Regex skills.

 

Alteryx Certified Partner

+1

 

In the short term I would suggest implementing functionality similar to the ImportHTML function in Google Sheets, and dealing with fringe cases at a later date if ever.

 

-Ken
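
For reference, the Google Sheets function takes a URL, a query of "table" or "list", and a 1-based index. A minimal Python sketch of that interface might look like the following. To keep it self-contained it takes the HTML text instead of a URL and only supports "table"; `import_html`, `_AllTablesParser` and `PAGE` are hypothetical names, and nested tables and other fringe cases are deliberately ignored.

```python
from html.parser import HTMLParser


class _AllTablesParser(HTMLParser):
    """Collect every <table> as a list of rows (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self.tables[-1][-1].append("".join(self._cell).strip())
            self._cell = None


def import_html(html, query="table", index=1):
    """Return the index-th table (1-based) found in the HTML text."""
    if query != "table":
        raise ValueError("this sketch only supports query='table'")
    parser = _AllTablesParser()
    parser.feed(html)
    parser.close()
    if not 1 <= index <= len(parser.tables):
        raise IndexError(f"page has {len(parser.tables)} table(s), got index {index}")
    return parser.tables[index - 1]


# Assumed sample page with two tables.
PAGE = """
<h1>United States</h1>
<table><tr><th>State</th><th>Capital</th></tr>
<tr><td>Alaska</td><td>Juneau</td></tr></table>
<p>Territories follow.</p>
<table><tr><th>Territory</th><th>Capital</th></tr>
<tr><td>Puerto Rico</td><td>San Juan</td></tr></table>
"""

states = import_html(PAGE, "table", 1)
territories = import_html(PAGE, "table", 2)
print(states)
print(territories)
```

Selecting a table by position keeps the interface small, at the cost of breaking when the page layout changes.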