Alteryx Designer Desktop Discussions

Parsing HTML REGEX

cmartin5

I am new to Alteryx and trying to figure out how to parse HTML data. I have a number of .txt files in a directory that contain HTML, and I would like to extract information from all of them. The HTML in each .txt file is structured like this:

"
<div class=""employee"">
<h2>

<a href=""/employee/name/bob-jackson"">bob jackson</a>
</h2>

<p>
2020 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5555 </p>
</div>
<div class=""employee"">
<h2>

<a href=""/employee/name/sal-roberts"">sal roberts</a>
</h2>

<p>
2021 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5556 </p>
</div>
"

I can extract the full name from the link with a regex such as:

<a href.*?>(.*?)<\/a>

I am struggling to capture anything else, such as the address and phone number, with my expression.
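A minimal sketch of one way to extend the pattern to the other fields, written in Python rather than Alteryx so it can be run on its own. It assumes the whole employee block sits in one string (line breaks included) and that every block follows the same layout as the sample: a name link, then a <p> holding the street, the city/state/ZIP line and the phone number separated by <br/>. The names EMPLOYEE_RE and parse_employees are hypothetical, and the (?P<...>) group names are Python syntax; the parenthesised groups themselves are the part that carries over, one group per field you want as a column.

```python
import re

# Hypothetical pattern for the employee markup shown above (assumes a fixed layout).
EMPLOYEE_RE = re.compile(
    r'<a href[^>]*>(?P<name>[^<]*)</a>'  # person's name inside the link
    r'.*?<p>\s*'                         # skip ahead to the address paragraph
    r'(?P<street>[^<]*?)\s*<br/>'        # first line: street
    r'(?P<city>[^<]*?)\s*<br/>\s*'       # second line: city, state ZIP
    r'(?P<phone>[^<]*?)\s*</p>',         # last line: phone number
    re.DOTALL,                           # let .*? cross line breaks
)


def parse_employees(html: str) -> list[dict[str, str]]:
    """Return one dict of named fields per employee block found in html."""
    return [m.groupdict() for m in EMPLOYEE_RE.finditer(html)]
```

Because `.*?<p>` simply jumps to the next paragraph, a block that is missing its <p> would pick up the following employee's address, so this approach is only as reliable as the markup is regular.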

Note: I brought the .txt files into my workflow as follows:

1. Used the Input Data tool.

2. Kept the defaults, except that I changed the delimiter to \0 (no delimiter).
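One thing worth checking with this setup: as far as I can tell, reading with \0 as the delimiter still splits each file into one row per line, so a pattern that spans several lines never sees a whole employee block in a single field. The sketch below is a Python analogue (not Alteryx) of reading every .txt file in a folder as one whole string and then applying the regex from the question to each file. The function names_by_file and its default "*.txt" pattern are hypothetical; in Alteryx the equivalent would presumably mean joining each file's rows back together before parsing, which is an assumption rather than something confirmed in this thread.

```python
import re
from pathlib import Path

# The regex from the question: captures the text inside each <a href ...>...</a> link.
NAME_RE = re.compile(r'<a href.*?>(.*?)</a>')


def names_by_file(directory: str, pattern: str = "*.txt") -> dict[str, list[str]]:
    """Read every matching file whole and list the link names found in each."""
    results: dict[str, list[str]] = {}
    for path in sorted(Path(directory).glob(pattern)):
        # Whole file as one string, line breaks included, so multi-line patterns can match.
        text = path.read_text(encoding="utf-8")
        results[path.name] = NAME_RE.findall(text)
    return results
```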

I am not sure whether this is the best practice. Thanks for the help!

cmartin5

Thanks again, I was able to get this sorted out just as I wanted. I really appreciate your example; it helped me figure everything out.
