Challenge #13: HTML Table Parsing
Love a bit of regex
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
Solution attached
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
This one was ugly, and I think I committed a mortal sin by forcing regex onto HTML :P
Solution attached - thanks!
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
parse/formula/rinse/repeat.
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
Challenge Completed
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
Here is number 13!
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
If I ever get a new dog and the family gives me naming rights, I'm naming it Regex because Regex is freaking awesome.
https://regex101.com/ is an outstanding resource for learning and testing out regex.
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
Nice and straightforward, but I still long for the day when Alteryx automatically parses HTML tables, so we can all spend those extra few minutes dancing, staring into space, painting butterflies, etc.
Anyway...
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
Solution attached. For the first step, I used the Formula tool (FindString) to isolate the second, nested table. I had to do some googling to find the regex to parse the <tr> and <td> tag contents. This approach allowed me to complete it in four steps.
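For readers outside Alteryx, the same two-stage idea (locate the nested table by string position, then pull out `<tr>` and `<td>` contents with regex) can be sketched in Python. This is a minimal sketch, not the attached workflow: the sample HTML, the helper names `isolate_second_table` and `parse_rows`, and the exact regex patterns are all hypothetical. `str.find` plays the role of `FindString` (both return -1 when the text is absent). It assumes the inner table contains no further nested tables.

```python
import re

# Hypothetical sample input: an outer table with a second table nested inside it.
SAMPLE_HTML = """
<table id="outer"><tr><td>
  <table id="inner">
    <tr><th>Name</th><th>Score</th></tr>
    <tr><td>Alice</td><td>90</td></tr>
    <tr><td>Bob</td><td>85</td></tr>
  </table>
</td></tr></table>
"""

# Match <tr>/<td>/<th> with optional attributes, but not tags like <thead> or <track>.
ROW_RE = re.compile(r"<tr(?:\s[^>]*)?>(.*?)</tr>", re.S | re.I)
CELL_RE = re.compile(r"<t[dh](?:\s[^>]*)?>(.*?)</t[dh]>", re.S | re.I)
TAG_RE = re.compile(r"<[^>]+>")


def isolate_second_table(html: str) -> str:
    """Return the text from the second '<table' up to the first '</table>' after it."""
    first = html.find("<table")
    second = html.find("<table", first + 1) if first != -1 else -1
    if second == -1:
        raise ValueError("fewer than two <table> tags found")
    end = html.find("</table>", second)
    return html[second:end + len("</table>")]


def parse_rows(table_html: str) -> list[list[str]]:
    """Split a table into rows, then cells; strip any leftover tags and whitespace."""
    rows = []
    for row_html in ROW_RE.findall(table_html):
        cells = [TAG_RE.sub("", cell).strip() for cell in CELL_RE.findall(row_html)]
        rows.append(cells)
    return rows


if __name__ == "__main__":
    for row in parse_rows(isolate_second_table(SAMPLE_HTML)):
        print(row)
```

The non-greedy `(.*?)` with `re.S` is what lets each match stop at the nearest closing tag across line breaks; it is also exactly why this approach breaks down if tables nest more deeply than the sketch assumes.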
After seeing others' comments and the product enhancement request logged in 2016 (https://community.alteryx.com/t5/Alteryx-Product-Ideas/Tool-to-Parse-Tables-in-HTML/idi-p/39400), I was disappointed that an HTML table parser is still not available. I love Alteryx, but I think this use case was too painful and time-consuming. The same task can be accomplished in Google Sheets with a single function call (ImportHTML) and in Excel from the Data tab with a couple of button clicks. Besides being much quicker and easier, neither option requires any inspection of the HTML source.
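To illustrate what a dedicated table parser buys over regex, here is a minimal sketch built on Python's standard-library `html.parser.HTMLParser`. It is not how ImportHTML, Excel, or any Alteryx tool works internally. The class name `TableCollector`, the helper `parse_tables`, and the sample HTML are hypothetical. A stack of open tables means nested tables are handled without hand-locating them first; tables are recorded in the order their closing tags appear, so an inner table comes before its outer one.

```python
from html.parser import HTMLParser

# Hypothetical sample input: an outer table with a second table nested inside it.
SAMPLE_HTML = """
<table id="outer"><tr><td>
  <table id="inner">
    <tr><th>Name</th><th>Score</th></tr>
    <tr><td>Alice</td><td>90</td></tr>
    <tr><td>Bob</td><td>85</td></tr>
  </table>
</td></tr></table>
"""


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows of cell text, using a stack for nesting."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables, in the order their </table> is seen
        self._stack = []   # open tables: {"rows": [...], "row": list|None, "cell": list|None}

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append({"rows": [], "row": None, "cell": None})
        elif not self._stack:
            return  # ignore tags outside any table
        elif tag == "tr":
            self._stack[-1]["row"] = []
        elif tag in ("td", "th"):
            self._stack[-1]["cell"] = []

    def handle_endtag(self, tag):
        if not self._stack:
            return
        top = self._stack[-1]  # events always belong to the innermost open table
        if tag in ("td", "th") and top["cell"] is not None and top["row"] is not None:
            top["row"].append("".join(top["cell"]).strip())
            top["cell"] = None
        elif tag == "tr" and top["row"] is not None:
            top["rows"].append(top["row"])
            top["row"] = None
        elif tag == "table":
            self.tables.append(self._stack.pop()["rows"])

    def handle_data(self, data):
        # Text only counts when a cell of the innermost table is open.
        if self._stack and self._stack[-1]["cell"] is not None:
            self._stack[-1]["cell"].append(data)


def parse_tables(html: str) -> list[list[list[str]]]:
    collector = TableCollector()
    collector.feed(html)
    collector.close()  # flush any buffered text
    return collector.tables


if __name__ == "__main__":
    for table in parse_tables(SAMPLE_HTML):
        print(table)
```

For the sample, the inner table comes back as the header row plus two data rows, and the outer table as a single row whose only cell is empty, because its content was the nested table rather than text.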
-Ken
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.