Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語
IDEAS WANTED

Want to get involved? We're always looking for ideas and content for Weekly Challenges.

SUBMIT YOUR IDEA

Challenge #13: HTML Table Parsing

GeneR
Alteryx Alumni (Retired)

The link to the solution for last challenge #12 is HERE.

 

For this challenge let’s look at creating a multi-level hierarchy from employee-manager data. As always there are several ways to do this challenge, I have designated it as an advanced challenge because some of the more complex functions like RegEx can be used, but it is not absolutely necessary. 

 

The use case:

 

We have HTML data that is in a single field, the HTML contains an HTML Table.

 

The input contains a series of name/value pairs within the description field. The description field has a HTML table that contains 14 name/value contained within <td> tags. Each pairing can be found on a different row (designated by the <tr> tag).

 

The objective is to produce a table containing the 14 name/value pairs.

 

Good luck, I look forward to your feedback.

 

Update: As of 9/20/19, the start and solution files were updated.  Your solution may not match those posted by Community members prior to this date. 

TaraM
Alteryx Alumni (Retired)

The solution has been uploaded.

Tara McCoy
alex
11 - Bolide

I'm not familiar with HTML so I'm not sure how well my solution would work on other HTML examples. My bonus was that I got to use one tools in the labratory menu!

Spoiler

Week13Image.PNG
I added the second record to make sure the make columns would work with more than 1 record.
Week13dataImage.PNG
GeneR
Alteryx Alumni (Retired)

I have never seen the 'Make Columns" tool in use before.   Nice Job!

Aguisande
15 - Aurora
15 - Aurora

Good work Alex!

 

Another proof that there are a lot of ways to accomplish a goal in Alteryx!

 

markp201
8 - Asteroid
Spoiler
I'm enjoying these weekly challenges.  Applying quite a bit on my projects - one of which is parsing html tables.

Capture.JPG

First formula line does large part of the work then two more formula to identify rows/columns

REGEX_Replace(Description,'<[t][^>]*>\W{0,1}|<\/[^t][^>]*>\W{0,1}|Null>','') will remove all but /td and /tr

REGEX_Replace(Description,'</tr>','~')

REGEX_Replace(Description,'</td>','|')

 

 

Naledi
7 - Meteor
Spoiler

I made this alternative solution. Largely driven by trying to use the xml parse. This was quite a challenge as I'm new to Alteryx and don't know much about HTML so it took me a while to figure out. I explain each step in more detail in this blog post

Numbered solution.png

 

MarqueeCrew
20 - Arcturus
20 - Arcturus

I'm not overly proud of this one @JoeM 😞

 

 


This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
SeanAdams
17 - Castor
17 - Castor

Kinda half-way between the provided solution from @GeneR, and the solution from @MarqueeCrew

 

Spoiler

This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.

NicoleJohnson
ACE Emeritus
ACE Emeritus

My solution. I might bookmark this one to practice again down the line when I feel a bit more comfortable with my XML Parsing & RegEx skills... 

 

PS. @alex, the "Make Columns" tool has made my day. This will be SO useful for some of my work applications!! I went back & tested it out with my workflow on this challenge after I'd solved it with my original method, and it's so much cleaner, allows for fewer tools... Very cool tool!

 

 

 


This post has been edited by Community Moderation to redact sensitive attachments. The original attachment has been replaced by post_placeholder.txt.