
Alteryx Designer Desktop Discussions


Parsing HTML REGEX

cmartin5
6 - Meteoroid

I am new to Alteryx and trying to figure out how to parse HTML data. I have a number of txt files containing HTML and would like to extract information from all of the files in a directory. The structure of the HTML within each txt file looks like this:

 

"
^ class=""employee"">
<h2>

<a href=""/employee/name/bob-jackson"">bob jackson</a>
</h2>

<p>
2020 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5555 </p>
</div>
^ class=""employee"">
<h2>

<a href=""/employee/name/sal-roberts"">sal roberts</a>
</h2>

<p>
2021 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5556 </p>
</div>
"

I can extract the full name from the link by using a regular expression like:

<a href.*?>(.*?)<\/a>

 

I am struggling to get anything else, such as the address or phone number, to show up with my expression.
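One way to pull the remaining fields is to extend the same pattern past the closing `</a>` and capture the text between each `<br/>`. The sketch below uses Python's `re` module as a stand-in for the RegEx tool, so whether the pattern carries over to Alteryx unchanged is an assumption. The group names (`name`, `street`, `city`, `phone`) and the helper `parse_employees` are hypothetical, and the sample is copied as posted, including the leading `^` and the doubled quotes.

```python
import re

# Sample copied from the post; the "^" and doubled quotes are kept as posted.
SAMPLE = '''^ class=""employee"">
<h2>

<a href=""/employee/name/bob-jackson"">bob jackson</a>
</h2>

<p>
2020 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5555 </p>
</div>
^ class=""employee"">
<h2>

<a href=""/employee/name/sal-roberts"">sal roberts</a>
</h2>

<p>
2021 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5556 </p>
</div>
'''

# Hypothetical pattern: one match per employee block, one group per field.
EMPLOYEE = re.compile(
    r'<a href[^>]*>(?P<name>.*?)</a>'   # link text, as in the original pattern
    r'\s*</h2>\s*<p>\s*'                # skip the markup between name and address
    r'(?P<street>.*?)\s*<br/>\s*'       # first line of the <p> block
    r'(?P<city>.*?)\s*<br/>\s*'         # second line of the <p> block
    r'(?P<phone>.*?)\s*</p>',           # last line, up to the closing </p>
    re.DOTALL,                          # let "." and \s span line breaks
)

def parse_employees(html):
    """Return one dict per employee block found in the HTML text."""
    return [m.groupdict() for m in EMPLOYEE.finditer(html)]

for row in parse_employees(SAMPLE):
    print(row)
```

The pattern only works if the whole block is available as a single string, because the fields sit on different lines. It is also tied to this exact layout; a proper HTML parser would be more robust if the markup varies between files.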

 

Note: I pulled the txt files into my workspace by doing the following:

1. Using the Input Data tool

2. Keeping the defaults, except for changing the delimiter to \0
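For comparison, the same read step can be sketched outside Alteryx. This assumes the `\0` delimiter leaves each line as a single unsplit field (one record per line), which is my understanding of the setting rather than something confirmed in this thread. The helpers `read_lines` and `join_per_file` are hypothetical names; the second one glues the lines of each file back together so that a multi-line pattern can see a whole employee block at once.

```python
from pathlib import Path

def read_lines(directory, pattern="*.txt"):
    """Yield (file name, line) pairs, one per line, with no column splitting."""
    for path in sorted(Path(directory).glob(pattern)):
        for line in path.read_text(encoding="utf-8").splitlines():
            yield path.name, line

def join_per_file(records):
    """Concatenate the lines of each file back into one string per file."""
    joined = {}
    for name, line in records:
        joined[name] = joined.get(name, "") + line + "\n"
    return joined
```

Calling `join_per_file(read_lines("some_directory"))` would give one string per txt file, keyed by file name; `some_directory` is a placeholder path.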

 

I am not sure what the best practice is for this. Thanks for the help!

10 REPLIES
cmartin5
6 - Meteoroid

Thanks again. I was able to get this sorted just like I wanted. I really appreciate your example; it helped me figure everything out.
