Alteryx Designer Desktop Discussions

SOLVED

Help with HTML Parsing Workflow

ivesbr
7 - Meteor

Hi:

 

I've created an HTML parsing workflow that pulls data from a FF website and then uses the formula function to replace the </tr> and </td> html tags with ~ and | symbols so it can then be broken into rows and columns. 

 

I noticed, however, that one of the desired column headers (Yds /Comp) does not come through the download tool with a </td> tag (workflow attached).  As a result, the workflow is mashing up two columns (Yds /Comp and TDs) into one. 

 

Any suggestions on potential solutions for this would be appreciated.  Thanks!  

danilang
19 - Altair

Hi @ivesbr 

 

Since the <td> open tag is always there, use this in the formula:

 

Replace(Replace([DownloadData], '</tr>','~'),'<td>', '|<td>')

And continue splitting on "|"

 

Dan
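
For readers outside Alteryx, the idea in Dan's formula can be sketched in Python. This is only an illustration under assumptions: the sample HTML, the player name, and the helper names (strip_tags, split_on_close_tag, split_on_open_tag) are hypothetical and do not come from the attached workflow. It compares splitting on the closing </td> tag, which merges the two columns when the tag is missing, with splitting on the opening <td> tag.

```python
import re

# Hypothetical sample: the "Yds /Comp" header cell has no closing </td> tag.
html = (
    "<tr><td>Player</td><td>Yds /Comp<td>TDs</td></tr>"
    "<tr><td>A. Example</td><td>11.2</td><td>3</td></tr>"
)

def strip_tags(text):
    """Remove any remaining HTML tags and surrounding whitespace."""
    return re.sub(r"<[^>]+>", "", text).strip()

def split_on_close_tag(text):
    """Original approach: '</tr>' -> '~' and '</td>' -> '|'."""
    marked = text.replace("</tr>", "~").replace("</td>", "|")
    rows = [r for r in marked.split("~") if r]
    # Drop the empty piece left after the last '|' in each row.
    return [[strip_tags(c) for c in r.split("|") if strip_tags(c)] for r in rows]

def split_on_open_tag(text):
    """Suggested approach: '</tr>' -> '~' and '<td>' -> '|<td>'."""
    marked = text.replace("</tr>", "~").replace("<td>", "|<td>")
    rows = [r for r in marked.split("~") if r]
    # The piece before the first '|' is just '<tr>', so skip it.
    return [[strip_tags(c) for c in r.split("|")[1:]] for r in rows]

print(split_on_close_tag(html)[0])  # header row with two cells merged
print(split_on_open_tag(html)[0])   # header row with three separate cells
```

In this sketch the open-tag split works because every cell, including the malformed one, still begins with <td>.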

 

PhilipMannering
16 - Nebula

If @danilang hasn't sorted it (which I doubt),

 

This might help you on your way,

 

[Image: parse xml.jpg]

ivesbr
7 - Meteor

Awesome!  Thank you @PhilipMannering and @danilang 

 

Follow up question for @danilang.  How does the formula know to parse out that one column when you write the expression like this - Replace(Replace([DownloadData], '</tr>','~'),'<td>', '|<td>')?  Just for my own edification.  

 

@PhilipMannering ... I'm going to pore over your much more advanced workflow to see if I can pick up some new learnings.  

 

Thanks again!

 

 

danilang
19 - Altair

Hi @ivesbr 

 

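The formula Replace(Replace([DownloadData], '</tr>','~'),'<td>', '|<td>') is just a nested version of the Replace function.  The Alteryx engine works from the inside out in these cases.  The inner Replace([DownloadData], '</tr>','~') is performed first, with the string in [DownloadData] being modified.  The modified string is then used as the input to the outer Replace function, which puts a | in front of every <td>.  Because every cell, including Yds /Comp, starts with <td>, each one gets its own | delimiter even when its </td> is missing.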

 

Dan  
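
The inside-out evaluation described above can be shown step by step in Python. This is a sketch, not Alteryx code: the replace helper is a hypothetical stand-in for the Alteryx Replace function, and download_data is a made-up sample of what [DownloadData] might contain.

```python
def replace(text, target, replacement):
    """Hypothetical stand-in for Alteryx's Replace(String, Target, Replacement)."""
    return text.replace(target, replacement)

# Made-up sample value for [DownloadData].
download_data = "<tr><td>Yds /Comp<td>TDs</td></tr>"

# Nested form, mirroring the formula in the thread.
nested = replace(replace(download_data, "</tr>", "~"), "<td>", "|<td>")

# The same work written as two explicit steps.
inner = replace(download_data, "</tr>", "~")  # step 1: inner call runs first
outer = replace(inner, "<td>", "|<td>")       # step 2: outer call uses its result

print(inner)
print(outer)
print(nested == outer)
```

Writing the nested call as two named steps gives the same string; the nesting just avoids the intermediate variable.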

ivesbr
7 - Meteor

Got it ... thank you @danilang!
