Extract Table from HTML File

MRPP1982
5 - Atom

Dear Community!

I am trying to extract a table from an HTML file (trust me, it is HUGE); a subset of it is attached here.

This form doesn't allow .html files to be attached, so I have pasted the code into a Notepad text file. Please open it in a browser.

 

Any help will be greatly appreciated.

 

Cheers.


[Attachment: HTML2Table.png]

Yoshiro_Fujimori
15 - Aurora

Hi @MRPP1982 ,

 

Could you attach the file?

Yoshiro_Fujimori
15 - Aurora

Just as a starting point, I created a simple HTML file containing a table as below.

 

RecordID  Name    Value
1         Apple   2.0
2         Orange  1.5

 

Though I am not familiar with the HTML format, I guess the basic idea is to extract the contents of each <tr> tag (a row) and each <td> tag (a cell).

So I made a workflow as attached.
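
For readers who cannot open the attached workflow, the same idea (pull out each <tr>, then each <td> inside it) can be sketched outside Alteryx. The Python below is only an illustration and is not the attached workflow: it uses the standard library's html.parser, and the `TableRowParser` class, the `parse_rows` helper, and the `SAMPLE_HTML` markup are all hypothetical names and data made up to mirror the small table above. It also assumes a single, non-nested table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by its enclosing <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that had no <td> cells
                self.rows.append(self._row)
            self._row = None


def parse_rows(html):
    """Return a list of rows, each a list of <td> cell texts."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup mirroring the sample table; the real file may differ.
SAMPLE_HTML = """
<table>
  <tr><td>RecordID</td><td>Name</td><td>Value</td></tr>
  <tr><td>1</td><td>Apple</td><td>2.0</td></tr>
  <tr><td>2</td><td>Orange</td><td>1.5</td></tr>
</table>
"""

for row in parse_rows(SAMPLE_HTML):
    print(row)
```

Every value comes back as a string, so numeric columns such as Value would still need a type conversion afterwards.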

 

Workflow

[Image: html_table_parse.png]

 

I hope this helps.

MRPP1982
5 - Atom

Hello @Yoshiro_Fujimori,

 

Thanks for your response. Please forgive my ignorance of HTML tags; I am not able to modify this flow or its tools to suit my purpose. I have made a 'short' version of the actual file and am attaching it here. Could you please take another look and help?

 

P.S.: I have attached a subset of my HTML as a .7z file; please unzip it.

Many thanks in advance!

 

Cheers!
- Sai

 

 

Yoshiro_Fujimori
15 - Aurora

Hi @MRPP1982 

 

Attached is the revised version, with an additional flow to deal with <th> tags.

You may need to modify the workflow to apply it to the larger data set,

but basically you can extract the rows and columns (and headers) by understanding the HTML tags,

and then you can put the contents into table format yourself in Alteryx.
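
As a rough illustration of the header step (not a transcription of the attached workflow), here is a small Python sketch that takes the pattern-matching route: one regular expression finds each <tr> block, a second finds the <th> or <td> cells inside it, and a row made up only of <th> cells is treated as the header. The function name `table_to_records`, the `Column1`-style fallback names, and the `SAMPLE_HTML` markup are assumptions for the example. Regular expressions are fragile on nested tables or malformed markup, so treat this as a starting point only.

```python
import re

# One <tr>...</tr> block; DOTALL lets a row span several lines.
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
# One <th> or <td> cell; \1 requires the closing tag to match the opening one.
CELL_RE = re.compile(r"<(th|td)\b[^>]*>(.*?)</\1>", re.IGNORECASE | re.DOTALL)
# Any leftover inline tag inside a cell, e.g. <b> or <span>.
TAG_RE = re.compile(r"<[^>]+>")


def table_to_records(html):
    """Return (headers, records); records are dicts keyed by header name."""
    headers = []
    records = []
    for row_html in ROW_RE.findall(html):
        cells = CELL_RE.findall(row_html)  # list of (tag, inner_html) pairs
        texts = [TAG_RE.sub("", inner).strip() for _, inner in cells]
        if cells and all(tag.lower() == "th" for tag, _ in cells):
            headers = texts  # a row made only of <th> cells is the header
        elif texts:
            # Fall back to generic names if no header row has been seen yet.
            keys = headers or [f"Column{i + 1}" for i in range(len(texts))]
            records.append(dict(zip(keys, texts)))
    return headers, records


# Hypothetical markup; the real file may use attributes or nesting.
SAMPLE_HTML = """
<table>
  <tr><th>RecordID</th><th>Name</th><th>Value</th></tr>
  <tr><td>1</td><td>Apple</td><td>2.0</td></tr>
  <tr><td>2</td><td>Orange</td><td>1.5</td></tr>
</table>
"""

headers, records = table_to_records(SAMPLE_HTML)
print(headers)
for record in records:
    print(record)
```

If a data row has more cells than there are headers, `zip` silently drops the extra cells, so it is worth checking the cell counts on a larger file.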

Good luck.

 

Workflow

[Image: htme_table_parse2_1.png]
