
Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!


Challenge #40: Parsing a HTML File

hanykowska
11 - Bolide

Done

 

I've noticed there's an error in the output file for the 649th row: the practice value is in the city column.

AidanBramel
8 - Asteroid

all hail the regex tool

Spoiler
TonyA
Alteryx Alumni (Retired)

Here's my solution

DavidThorpe
Alteryx Alumni (Retired)

My solution:

 

Spoiler
Used find and replace for HTML tags to identify Address, Practice, Physician
Then replaced the <br/> tag in address with a | to split address and city
Replaced all remaining tags using regex replace (<.*?>)

This process puts the 'practice' value for record 649 into the correct field, rather than city
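A rough Python sketch of the tag-replacement steps above, for anyone who wants to try the idea outside Alteryx. The sample fragment and the `split_address` helper are hypothetical; the real challenge markup may be laid out differently.

```python
import re

# Hypothetical address fragment (assumption, not the real challenge file).
sample = "<p>123 Main St<br/>Springfield, IL 62701</p>"


def split_address(fragment: str) -> list[str]:
    """Split an address fragment into [address, city] using the <br/> tag."""
    # Swap the line-break tag for a delimiter before any tags are stripped.
    marked = fragment.replace("<br/>", "|")
    # The non-greedy pattern removes each remaining tag individually.
    text = re.sub(r"<.*?>", "", marked)
    return [part.strip() for part in text.split("|")]


print(split_address(sample))
```

The order matters: if the tags were stripped first, the `<br/>` boundary between address and city would be lost.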
rachelgatto
8 - Asteroid

Solved. 

mceleavey
17 - Castor

Nice challenge. I find I do a lot of HTML parsing so this is a good intro.

 

Spoiler
I started by using the Regex tool on the Downloaddata field and used the following expression:
<h3>.*?<li></li>
This instructs Alteryx to parse everything between the opening <h3> tag and the closing <li></li> tag, which identifies the beginning and end of each section. NOTE: It is important to use <li></li> as the close. You can't use the </h5> tag because not all records have the final piece of information (Practice). When that tag is missing, the parser won't find the close of the range until the close of the next record, so the next record is skipped.
Once this is done, you should have 1068 records.
Then simply use text to columns to split the records into their component parts:

[Screenshot: text to columns]
Now you have separate columns with some leftover HTML tags which I simply removed using a formula. I then applied a recordID and dropped the extra fields:
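To illustrate why the closing delimiter matters, here is a small Python sketch using the same expression. The three-record `page` string is made up for the example (the second record deliberately has no `<h5>` line); only the pattern `<h3>.*?<li></li>` comes from the post.

```python
import re

# Hypothetical markup (assumption): the second record has no <h5> practice line.
page = (
    "<h3>Dr. A</h3><p>1 Elm St<br/>Town A</p><h5>Practice A</h5><li></li>"
    "<h3>Dr. B</h3><p>2 Oak St<br/>Town B</p><li></li>"
    "<h3>Dr. C</h3><p>3 Ash St<br/>Town C</p><h5>Practice C</h5><li></li>"
)

# Closing each range on <li></li> yields one match per record.
by_li = re.findall(r"<h3>.*?<li></li>", page, flags=re.DOTALL)

# Closing on </h5> lets the Dr. B range run on into the Dr. C record.
by_h5 = re.findall(r"<h3>.*?</h5>", page, flags=re.DOTALL)

print(len(by_li))  # 3
print(len(by_h5))  # 2
```

With `</h5>` as the close, the second match starts at Dr. B and only ends at Practice C, so Dr. C never gets a row of its own.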

[Screenshots: Workflow, Results]
Bulien

bkclaw113
9 - Comet
 
Verakso
11 - Bolide

Took me a while to figure out that it is not possible to match the output 100%, but this is as close as I can get.

Spoiler
I spent a great deal of time on this, and I even downloaded the official solution. Since that result does not match the provided result 100% either, I am happy with mine.
[Screenshot: My Workflow]
One of the perks of nicely formatted HTML is that you can use the XML parser for this as well.
Since I had just done another weekly challenge with the XML parser, it came in handier than trying to do RegEx.
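As a rough illustration of the XML-parser idea outside Alteryx, here is a Python sketch using the standard library's `xml.etree.ElementTree`. The `doc` markup and its element layout are assumptions made up for the example, and this only works when the HTML is well formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup (assumption, not the real challenge file).
doc = (
    "<ul>"
    "<li><h3>Dr. A</h3><p>1 Elm St<br/>Town A</p><h5>Practice A</h5></li>"
    "<li><h3>Dr. B</h3><p>2 Oak St<br/>Town B</p></li>"
    "</ul>"
)

root = ET.fromstring(doc)
rows = []
for item in root.findall("li"):
    para = item.find("p")
    br = para.find("br")
    rows.append({
        "physician": item.findtext("h3"),
        "address": para.text,                          # text before <br/>
        "city": br.tail if br is not None else None,   # text after <br/>
        "practice": item.findtext("h5"),               # None when the tag is absent
    })

for row in rows:
    print(row)
```

Because each field is looked up by tag name, a record without a practice simply gets an empty practice value instead of shifting another field into the wrong column.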

But as I wrote, I spent some time on this trying to match the provided result 100%, which was hard when the provided result has its flaws.
Here the provided result has Practice in the City field:
[Screenshot: Provided Result]
And here my result seems to be parsed correctly:
[Screenshot: My Result]
But still, if you compare the two results, they are not 100% identical.

Still Climbing
/Verakso

Vidya26
8 - Asteroid

My Solution

RobertoEstrada
8 - Asteroid

Here is my solution. This one was tricky, as you need to review the HTML page first to identify the data you need.