Weekly Challenges

hanykowska · ‎04-06-2019

Done

I've noticed there's an error in the output file for the 649th row: the practice is in the city column.

AidanBramel · ‎04-09-2019

all hail the regex tool

Spoiler

TonyA · ‎04-09-2019

Here's my solution

DavidThorpe · ‎04-30-2019

My solution:

Spoiler

Used find and replace for HTML tags to identify Address, Practice, Physician
Then replaced the <br/> tag in address with a | to split address and city
Replaced all remaining tags using regex replace (<.*?>)

This process puts the 'practice' value for record 649 into the correct field, rather than city
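For readers who want to try the same idea outside Alteryx, here is a minimal Python sketch of the last two steps (swap `<br/>` for a pipe, then strip the remaining tags with `<.*?>`). The sample fragment and the `split_address` helper are made up for illustration; the real challenge markup may look different.

```python
import re

# Hypothetical address fragment; the real challenge markup may differ.
address_html = "<p>123 Main St<br/>Denver, CO 80202</p>"

def split_address(fragment):
    """Return (street, city) from an address fragment."""
    # Mark the line break with a pipe before any tags are stripped.
    marked = fragment.replace("<br/>", "|")
    # The non-greedy pattern removes each remaining tag one at a time.
    cleaned = re.sub(r"<.*?>", "", marked)
    # Split on the first pipe; city is empty if there was no <br/>.
    street, _, city = cleaned.partition("|")
    return street.strip(), city.strip()

street, city = split_address(address_html)
```

Replacing `<br/>` first matters: if all tags were stripped in one pass, the boundary between address and city would be lost.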


rachelgatto · ‎05-08-2019

Solved.

mceleavey · ‎05-20-2019

Nice challenge. I find I do a lot of HTML parsing so this is a good intro.

Spoiler

I started by using the Regex tool on the Downloaddata field and used the following expression:
<h3>.*?<li></li>
This instructs Alteryx to parse everything between the opening <h3> tag and the closing <li></li> tag, which identifies the beginning and end of each section. NOTE: It is important to take the <li></li> as the close. You can't use the </h5> tag, because not all records have the final piece of information (Practice). With that tag missing, the next record is skipped, as the parser won't find the close of the range until the close of the next range.
Once this is done, you should have 1068 records.
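The effect of choosing the closing tag can be reproduced in a small Python sketch. The three-record `page` string below is invented for illustration (the second record deliberately has no `<h5>` line); it is not the challenge data.

```python
import re

# Hypothetical markup: the second record has no <h5> (Practice) line.
page = (
    "<h3>Dr. A</h3><p>1 Elm St<br/>Town A</p><h5>Practice A</h5><li></li>"
    "<h3>Dr. B</h3><p>2 Oak St<br/>Town B</p><li></li>"
    "<h3>Dr. C</h3><p>3 Ash St<br/>Town C</p><h5>Practice C</h5><li></li>"
)

# Closing on <li></li>: every record is matched separately.
by_li = re.findall(r"<h3>.*?<li></li>", page, flags=re.DOTALL)

# Closing on </h5>: the match that starts at Dr. B runs on until
# Dr. C's </h5>, so two records end up fused into one.
by_h5 = re.findall(r"<h3>.*?</h5>", page, flags=re.DOTALL)

print(len(by_li), len(by_h5))
```

Even with a non-greedy `.*?`, the pattern has to reach the next available closing tag, which is why a record without `</h5>` swallows its neighbour.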
Then simply use text to columns to split the records into their component parts:

text to columns.PNG

Now you have separate columns with some leftover HTML tags which I simply removed using a formula. I then applied a recordID and dropped the extra fields:


bkclaw113 · ‎05-20-2019

Verakso · ‎06-04-2019

It took me a while to figure out that it is not possible to match the output 100%, but this is as close as I could get.

Spoiler

I spent a great deal of time on this, and I even downloaded the official solution. Since that result does not match the provided result 100% either, I am happy with mine.

My Workflow

One of the perks of nicely formatted HTML is that you can use the XML parser for this as well.
Since I had just done another weekly challenge with the XML parser, it came in handier than trying to do RegEx.
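As a rough illustration of the same idea outside Alteryx, well-formed HTML can be read with a plain XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the `snippet` markup, the field names, and the `parse_records` helper are assumptions made for this example, not the actual challenge file or the XML Parse tool's configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment; the second record has no <h5>.
snippet = """
<ul>
  <li><h3>Dr. A</h3><p>1 Elm St<br/>Town A</p><h5>Practice A</h5></li>
  <li><h3>Dr. B</h3><p>2 Oak St<br/>Town B</p></li>
</ul>
"""

def parse_records(xml_text):
    """Return one dict per <li> element in the fragment."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for item in root.iter("li"):
        p = item.find("p")
        br = p.find("br") if p is not None else None
        rows.append({
            "Physician": item.findtext("h3", default=""),
            # Text before <br/> is the street address.
            "Address": (p.text or "").strip() if p is not None else "",
            # Text after <br/> is stored as the tail of the <br/> element.
            "City": (br.tail or "").strip() if br is not None else "",
            # A missing <h5> yields an empty Practice instead of shifting fields.
            "Practice": item.findtext("h5", default=""),
        })
    return rows

rows = parse_records(snippet)
```

Because each field is looked up by element name, a record without a Practice line simply gets an empty value rather than pushing another field into the wrong column. This only works when the markup is well formed; a strict XML parser will raise an error on unclosed tags.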

But as I wrote, I spent some time on this trying to match the provided result 100%, which was hard when the provided result has its flaws.
Here the provided result has Practice in the City field:

Provided Result

And here my result seems to be parsed correctly:

My Result

But still, if you compare the two results, they are not 100% identical.


Still Clmbing
/Verakso

Vidya26 · ‎06-11-2019

My Solution

RobertoEstrada · ‎06-12-2019

Here is my solution. This one was tricky, as you need to review the HTML page first to identify the data you need.


Challenge #40: Parsing a HTML File