Love a bit of regex
This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.
This one was ugly and I think I committed a mortal sin by forcing regex on html :P
parse/formula/rinse/repeat.
If I ever get a new dog and the family gives me naming rights, I'm naming it Regex because Regex is freaking awesome.
https://regex101.com/ is an outstanding resource for learning and testing out regex.
Nice and straightforward, but I still long for the day when Alteryx will automatically parse HTML tables, and we can all spend those extra few minutes dancing, staring into space, painting butterflies, etc.
Anyway...
Solution attached. For the first step, I used the Formula tool (FindString) to isolate the second, nested table. I had to do some googling to find the regex to parse the <tr> and <td> tag contents. This approach allowed me to complete it in four steps.
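For anyone who wants to see the same idea outside Alteryx, here is a rough Python sketch of the approach described above: find where the second `<table` starts (the job FindString did), cut out that fragment, then use regex to pull the `<tr>` rows and `<td>`/`<th>` cells. The function name `parse_nested_table` and the sample HTML are made up for illustration and are not the challenge data; the patterns assume simple, well-formed markup with no tables nested inside the target table.

```python
import re


def parse_nested_table(html: str) -> list[list[str]]:
    """Return the rows of the second <table> in html as lists of cell text."""
    # Positions of every opening <table tag (hypothetical helper logic,
    # standing in for the FindString step).
    starts = [m.start() for m in re.finditer(r"<table\b", html, flags=re.I)]
    if len(starts) < 2:
        return []

    # Cut from the second <table to its first closing tag.
    rest = html[starts[1]:]
    close = re.search(r"</table>", rest, flags=re.I)
    fragment = rest[: close.start()] if close else rest

    rows = []
    # Non-greedy match of each row, then each cell inside the row.
    for row_html in re.findall(r"<tr\b[^>]*>(.*?)</tr>", fragment, flags=re.I | re.S):
        cells = re.findall(r"<t[dh]\b[^>]*>(.*?)</t[dh]>", row_html, flags=re.I | re.S)
        # Strip any leftover inline tags such as <b> from the cell text.
        rows.append([re.sub(r"<[^>]+>", "", cell).strip() for cell in cells])
    return rows


# Made-up sample: an outer layout table wrapping the table we want.
sample = """
<table><tr><td>
  <table>
    <tr><th>Name</th><th>Score</th></tr>
    <tr><td>Ann</td><td>10</td></tr>
    <tr><td>Bob</td><td><b>7</b></td></tr>
  </table>
</td></tr></table>
"""

rows = parse_nested_table(sample)
for row in rows:
    print(row)
```

Regex on HTML is fragile, as others in this thread have joked; for anything beyond a one-off, a real HTML parser is the safer choice.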
After seeing others' comments and the product enhancement request logged in 2016 (https://community.alteryx.com/t5/Alteryx-Product-Ideas/Tool-to-Parse-Tables-in-HTML/idi-p/39400), I was disappointed that an HTML table parser is still not available. I love Alteryx, but I think this use case was too painful and time-consuming. This task can also be accomplished in Google Sheets with a single function call (ImportHTML) and in Excel on the Data tab with a couple of button clicks. In addition to being much quicker and easier, neither of these options requires any inspection of the HTML source.
-Ken