Dear Community!
I am trying to extract a table from an HTML file (trust me, it is HUGE); a subset of it is attached here.
This forum doesn't allow .html files to be attached, so I have pasted the code into a Notepad file. Please open it in a browser.
Any help will be greatly appreciated.
Cheers.
Just as a starting point, I created a simple HTML file containing a table as below.
RecordID | Name | Value |
1 | Apple | 2.0 |
2 | Orange | 1.5 |
Though I am not familiar with the HTML format, I guess the basic idea is to extract the contents of the <tr> tags (rows) and the <td> tags (cells).
So I made the attached workflow.
Workflow
I hope this helps.
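For anyone who wants to see the <tr>/<td> idea outside of Alteryx, here is a minimal Python sketch of the same logic. It is not the attached workflow, just an illustration under some assumptions: the sample HTML string is my guess at what the small table above looks like as markup, the function name extract_rows is made up for this example, and the regular expressions only cope with simple, non-nested tables.

```python
import re

# Hypothetical markup for the small sample table above (an assumption,
# not the actual attached file).
html = """
<table>
  <tr><td>RecordID</td><td>Name</td><td>Value</td></tr>
  <tr><td>1</td><td>Apple</td><td>2.0</td></tr>
  <tr><td>2</td><td>Orange</td><td>1.5</td></tr>
</table>
"""

# Step 1: capture everything between <tr ...> and </tr> (one match per row).
ROW_RE = re.compile(r"<tr\b[^>]*>(.*?)</tr>", re.IGNORECASE | re.DOTALL)
# Step 2: inside a row, capture everything between <td ...> and </td>.
CELL_RE = re.compile(r"<td\b[^>]*>(.*?)</td>", re.IGNORECASE | re.DOTALL)


def extract_rows(html_text):
    """Return a list of rows, each row being a list of trimmed cell strings."""
    rows = []
    for row_html in ROW_RE.findall(html_text):
        cells = [cell.strip() for cell in CELL_RE.findall(row_html)]
        rows.append(cells)
    return rows


rows = extract_rows(html)
for row in rows:
    print(" | ".join(row))
```

The two-step split (rows first, then cells within each row) is the part that carries over to a workflow. Regular expressions like these will likely break on nested tables or unusual markup, so treat this as a starting point only.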
Hello @Yoshiro_Fujimori,
Thanks for your response. Please forgive my ignorance of HTML tags; I am not able to modify this flow and its tools to suit my purpose. I have made a 'short' version of the actual file and am attaching it here. Could you please take another look and help?
P.S.: I have attached a subset of my HTML as a .7z file; please unzip it.
Many thanks in advance!
Cheers!
- Sai
Hi @MRPP1982
Attached is the revised version, with an additional flow to deal with <th> tags.
You may need to modify the workflow to apply it to the larger data set,
but basically you can extract the rows, columns, and headers once you understand the HTML tags,
and then you can arrange the contents into a table yourself in Alteryx.
Good luck.
Workflow
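To illustrate what handling <th> tags in addition to <td> tags means, here is a rough Python sketch using the standard library's html.parser module. It is not a translation of the attached workflow. The class name TableParser and the sample markup are assumptions made up for this example, and it assumes a single table whose header cells use <th> and whose data cells use <td>.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th> text into headers and <td> text into data rows."""

    def __init__(self):
        super().__init__()
        self.headers = []   # text of every <th> cell, in document order
        self.rows = []      # one list of <td> texts per <tr>
        self._row = None    # cells of the <tr> currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <td> or <th>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            text = "".join(self._cell).strip()
            if tag == "th":
                self.headers.append(text)
            else:
                self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that contained only header cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample; the real file will have more columns and attributes.
sample = """
<table>
  <tr><th>RecordID</th><th>Name</th><th>Value</th></tr>
  <tr><td>1</td><td>Apple</td><td>2.0</td></tr>
  <tr><td>2</td><td>Orange</td><td>1.5</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample)

# Pair each data row with the header names to get one record per row.
records = [dict(zip(parser.headers, row)) for row in parser.rows]
print(records)
```

If the larger file contains several tables, the headers from all of them would end up in one list here, so the parser would need to be reset or extended per table.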