SOLVED

HTML Regex Parse

Slushercw
8 - Asteroid

I have data coming to me in the body of an e-mail (six of them, actually) and have been unable to get it scheduled in any format other than the existing one, which is HTML (I think). I've got the Outlook input tool from @rpaugh configured to pull in the data from the body of the e-mail. Now I'm trying to extract the data I want, but the regex tutorials I've seen aren't helping me much (I don't really have any regex experience). I'm just trying to parse out the highlighted data below from the HTML data in the attached file:

 

CSlusher_0-1574184863713.png

 

I saw someone mention using the RegEx tool in Parse mode to remove everything inside <>, but that only got me partway there and left a bunch of \r, \n, and \p in my remaining data. I tried to be clever and use a Formula tool to remove the "\r"s (for instance), and it ended up removing every r from the data. I'm sure there's an easier way to do this, but I feel like I'm spinning my wheels at this point. Any help would be greatly appreciated.
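
To make the problem concrete outside of Alteryx, here is a rough Python sketch of what I'm after. It is only an illustration under assumptions: the sample string and the `strip_html` helper are made up for this post, not taken from my actual e-mails or workflow. The idea is to drop everything inside <>, then remove \r, \n, and \p as whole two-character sequences (backslash plus letter) rather than as individual characters, which is how every "r" could end up deleted.

```python
import re
from html import unescape


def strip_html(raw: str) -> str:
    """Rough tag stripper (hypothetical helper, not a full HTML parser)."""
    # Replace anything between < and > with a space so cell values stay separated.
    text = re.sub(r"<[^>]*>", " ", raw)
    # Turn entities such as &amp; or &nbsp; back into plain characters.
    text = unescape(text)
    # Remove LITERAL escape text (a backslash followed by a letter) as whole
    # substrings, so ordinary letters like the "r" in "Carrier" are untouched.
    for token in ("\\r", "\\n", "\\p"):
        text = text.replace(token, " ")
    # Collapse real whitespace (including actual CR/LF characters) to one space.
    return re.sub(r"\s+", " ", text).strip()


# Hypothetical sample: the \r\n here is literal backslash text, not real line breaks.
sample = (
    "<tr><td>Order</td><td>1234</td></tr>\\r\\n"
    "<tr><td>Carrier</td><td>Parker</td></tr>"
)
print(strip_html(sample))  # Order 1234 Carrier Parker
```

The same function also handles real carriage returns and line feeds, since those are caught by the final whitespace collapse. It would mangle text that legitimately contains a backslash followed by r, n, or p (a Windows path, for example), so treat it as a starting point only.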

 

Thanks!

2 REPLIES
rpaugh
11 - Bolide

Not super elegant, but see attached example that outputs the following:

 

Alteryx HTML Parse Example.png

Slushercw
8 - Asteroid

I didn't use the exact flow you sent, but it put me on the right path. The main thing I was missing was the JSON Parse tool. I had to make a few changes because the data doesn't always come through with the same number of rows to pull, but your example was very helpful. Thanks @rpaugh!

 

Not the cleanest, but here was the general final result:

CSlusher_0-1574287057458.png
