Dublin, IRL

Welcome to the Dublin User Group

Weekly Exercise #9

Carlos_A
8 - Asteroid

Hi everyone,

 

Hope you enjoyed last week's exercise (solutions). Apps can be very useful for automating tasks through Alteryx. This week's exercise will be a bit tougher:

 

Use case:  5280 Magazine in Denver published a list of the best doctors in the Denver metro area, and you need to get that list into database form. (Note: the raw HTML has been provided in the workflow.)

 

Objective:  Parse the HTML into a database format containing fields for the ID, Physician, Address, City and Practice.

 

This is a tough one, so don't worry if you don't get the full solution. Just give it a go and learn some new things! Good luck.

2 REPLIES
Joe_Mako
12 - Quasar

Attached is my 10-step solution.

 

I pull out the Outer XML for each <li class="row collapse">, add a record ID, and parse out the <h3> (Physician), <h4> (Address & City) and <h5> (Practice) tags. Inside the <h4> there is a <span> that contains the address. I then summarize to concatenate Practice, join the pieces back together on ID, and clean up the empty strings and trailing spaces.
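
For anyone who wants to compare against a scripted approach, here is a rough Python sketch of the same idea using only the standard library. It is not the attached workflow. The sample HTML, the class name DoctorListParser and the function parse_doctors are all made up for illustration, and the sketch assumes the city is the text in the <h4> outside the <span> and that a physician with several practices has several <h5> tags. The real markup in the workflow may differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real HTML in the workflow may differ.
SAMPLE_HTML = """
<ul>
<li class="row collapse">
  <h3>Sam O&#039;Example</h3>
  <h4><span>1 Main St., Suite 2</span> Denver</h4>
  <h5>Cardiology</h5>
  <h5>Internal Medicine</h5>
</li>
<li class="row collapse">
  <h3>Pat Sample</h3>
  <h4><span>9 Elm Ave.</span> Aurora</h4>
  <h5>Dermatology</h5>
</li>
</ul>
"""

# Which output field receives the text found directly inside each tag.
FIELD_FOR_TAG = {"h3": "Physician", "h4": "City", "span": "Address"}


class DoctorListParser(HTMLParser):
    """Collects one row per <li class="row collapse"> element."""

    def __init__(self):
        super().__init__()  # character references are decoded by default
        self.rows = []
        self.record = None   # the row currently being built, if any
        self.open_tags = []  # tags of interest that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "li" and dict(attrs).get("class") == "row collapse":
            self.record = {"Physician": "", "Address": "", "City": "", "Practice": []}
            self.open_tags = []
        elif self.record is None:
            return
        elif tag in ("h3", "h4", "h5"):
            self.open_tags.append(tag)
            if tag == "h5":
                self.record["Practice"].append("")  # start a new practice
        elif tag == "span" and self.open_tags[-1:] == ["h4"]:
            self.open_tags.append("span")  # only spans inside <h4> matter

    def handle_endtag(self, tag):
        if self.record is None:
            return
        if tag == "li":
            # Drop empty practices, trim whitespace, join the rest.
            practices = [p.strip() for p in self.record["Practice"] if p.strip()]
            self.rows.append({
                "ID": len(self.rows) + 1,
                "Physician": self.record["Physician"].strip(),
                "Address": self.record["Address"].strip(),
                "City": self.record["City"].strip(),
                "Practice": ", ".join(practices),
            })
            self.record = None
        elif self.open_tags[-1:] == [tag]:
            self.open_tags.pop()

    def handle_data(self, data):
        if self.record is None or not self.open_tags:
            return
        tag = self.open_tags[-1]
        if tag == "h5":
            self.record["Practice"][-1] += data
        else:
            self.record[FIELD_FOR_TAG[tag]] += data


def parse_doctors(html_text):
    parser = DoctorListParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows


for row in parse_doctors(SAMPLE_HTML):
    print(row)
```

Keeping the practices in a list until the closing </li> plays the role of the summarize-and-join steps: each physician ends up as a single row with a comma-separated Practice field.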

 

XML Parse

 

Here are three points on the differences between your output and what I came up with:

 

1. You have a character-encoding issue in your output: the apostrophe comes through as the HTML entity &#039;.

My Results:
493 Yuko Kitahama-D'Ambrosia Denver 4500 E. Ninth Ave., Suite 200 Obstetrics and Gynecology

Your Results:
493 Yuko Kitahama-D&#039;Ambrosia 4500 E. Ninth Ave., Suite 200 Denver Obstetrics and Gynecology


2. For Jesse Mills, the <span> tag contains the "(..)" text, whereas for everyone else the <span> tag contains the address. Your output has that text in the City field, and then the Practice in the City.

My Results:
649 Jesse Mills (No longer practicing in the Denver area) Reproductive Endocrinology and Infertility [Null]

Your Results:
649 Jesse Mills [Null] (No longer practicing in the Denver area) Reproductive Endocrinology and Infertility


3. 51 physicians have multiple practices (for example, Reginald Bell). Your results kept only the first. I output them as a comma-separated list in the field.
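
Points 1 and 3 can also be illustrated outside Alteryx. The Python sketch below is only an illustration under assumed inputs: the tidy function and the "Example Physician" rows are hypothetical, and it assumes the parsed data arrives as one (ID, Physician, Practice) row per practice. The first row reuses the name from point 1.

```python
import html
from collections import defaultdict

# One row per practice. The second physician is made-up illustration data.
rows = [
    (493, "Yuko Kitahama-D&#039;Ambrosia", "Obstetrics and Gynecology"),
    (2, "Example Physician", "Practice A"),
    (2, "Example Physician", "Practice B"),
]


def tidy(rows):
    """Decode HTML entities in names and combine practices per ID."""
    names = {}
    practices = defaultdict(list)
    for rec_id, name, practice in rows:
        names[rec_id] = html.unescape(name).strip()  # &#039; becomes '
        practices[rec_id].append(practice.strip())
    # Dicts keep insertion order, so IDs come out in first-seen order.
    return [(rec_id, names[rec_id], ", ".join(practices[rec_id])) for rec_id in names]


for row in tidy(rows):
    print(row)
```

html.unescape is the standard-library call for turning entities such as &#039; back into characters, and grouping on ID before joining is what keeps every practice rather than only the first.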

Carlos_A
8 - Asteroid

Hi Joe,

 

Nice solution! I've uploaded the official solution now. Unfortunately, I didn't have a chance to finish the exercise this week.