Extract Table from HTML File
Dear Community!
I am trying to extract a table from an HTML file (trust me, it is HUGE); a subset of it is attached here.
This form doesn't allow .html files to be attached, so I have pasted the code into a Notepad text file; please open it in a browser.
Any help will be greatly appreciated.
Cheers.
- Labels:
- Data Investigation
Just as a starting point, I created a simple HTML file containing the table below.
RecordID | Name | Value |
1 | Apple | 2.0 |
2 | Orange | 1.5 |
Though I am not familiar with the HTML format, I think the basic idea is to extract the contents of the <tr> (row) and <td> (cell) tags.
So I made the attached workflow.
Workflow
I hope this helps.
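For readers who want to see the <tr>/<td> idea outside Alteryx, here is a minimal Python sketch using only the standard library. It is not the attached workflow. The sample HTML is an assumption that mirrors the small table above, and the names `TableRowParser` and `extract_rows` are made up for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by its <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside an open <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table contents as a list of rows (lists of strings)."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Assumed sample input, modelled on the small table in this post.
sample_html = """
<table>
  <tr><td>RecordID</td><td>Name</td><td>Value</td></tr>
  <tr><td>1</td><td>Apple</td><td>2.0</td></tr>
  <tr><td>2</td><td>Orange</td><td>1.5</td></tr>
</table>
"""

rows = extract_rows(sample_html)
for row in rows:
    print(" | ".join(row))
```

This sketch ignores <th> cells and nested tables, so it only fits simple tables like the one shown here.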
Hello @Yoshiro_Fujimori,
Thanks for your response. Please forgive my ignorance of HTML tags; I have not been able to modify this workflow and its tools to suit my purpose. I have made a 'short' version of the actual file and am attaching it here. Could you please take another look and help?
P.S.: I have attached a subset of my HTML as a .7z file; please unzip it.
Many thanks in advance!
Cheers!
- Sai
Hi @MRPP1982
Attached is the revised version, with an additional flow to deal with <th> tags.
You may need to modify the workflow to apply it to the larger data set,
but basically you can extract the rows, columns, and headers by understanding the HTML tags,
and then assemble the contents into a table yourself in Alteryx.
Good luck.
Workflow
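As a rough illustration of handling <th> header cells alongside <td> data cells, the following Python sketch uses only the standard library. It is not the attached workflow. The sample HTML is assumed, the names `HeaderAwareTableParser` and `table_to_records` are hypothetical, and any row containing a <th> is treated as the header row, which is a simplification.

```python
from html.parser import HTMLParser


class HeaderAwareTableParser(HTMLParser):
    """Separates a <th> header row from the <td> data rows of a table."""

    def __init__(self):
        super().__init__()
        self.headers = []            # cell texts of the header row
        self.rows = []               # data rows, each a list of strings
        self._row = None             # cells of the row being read
        self._cell = None            # text fragments of the cell being read
        self._row_is_header = False  # True once the row has a <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._row_is_header = False
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            if tag == "th":
                self._row_is_header = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row_is_header:
                self.headers = self._row
            else:
                self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Return one dict per data row, keyed by the header cell texts."""
    parser = HeaderAwareTableParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(parser.headers, row)) for row in parser.rows]


# Assumed sample input; the real file in this thread is much larger.
sample_html = """
<table>
  <tr><th>RecordID</th><th>Name</th><th>Value</th></tr>
  <tr><td>1</td><td>Apple</td><td>2.0</td></tr>
  <tr><td>2</td><td>Orange</td><td>1.5</td></tr>
</table>
"""

records = table_to_records(sample_html)
for record in records:
    print(record)
```

Tables that use <th> for the first cell of each data row, or files containing several tables, would need extra logic beyond this sketch.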
