Alteryx Designer Desktop Discussions
Parsing HTML REGEX

cmartin5

I am new to Alteryx and trying to figure out how to parse HTML data. I have a number of txt files in a directory, each containing HTML, and I would like to extract information from all of them. The structure of the HTML within each txt file looks like this:

"
<div class=""employee"">
<h2>

<a href=""/employee/name/bob-jackson"">bob jackson</a>
</h2>

<p>
2020 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5555 </p>
</div>
<div class=""employee"">
<h2>

<a href=""/employee/name/sal-roberts"">sal roberts</a>
</h2>

<p>
2021 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5556 </p>
</div>
"

I can extract the full name from the link text with a regular expression like:

<a href.*?>(.*?)<\/a>

 

I am struggling to get my expression to capture anything else.
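
For anyone who wants to check a fuller pattern outside Designer, here is a minimal Python sketch that extends the expression above to also capture the three lines inside the `<p>` block. The group names (`name`, `street`, `city_line`, `phone`), the function name `parse_employees`, and the opening `<div` in the sample are assumptions for illustration, and the pattern assumes every record has exactly the layout shown in the post. The regex itself may need adjusting before it is pasted into a RegEx tool.

```python
import re

# Sample based on the post; the doubled quotes are kept as they appear in the txt files.
# The opening "<div" is an assumption about what each record starts with.
SAMPLE = '''<div class=""employee"">
<h2>

<a href=""/employee/name/bob-jackson"">bob jackson</a>
</h2>

<p>
2020 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5555 </p>
</div>
<div class=""employee"">
<h2>

<a href=""/employee/name/sal-roberts"">sal roberts</a>
</h2>

<p>
2021 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5556 </p>
</div>
'''

# One match per employee: link text first, then the three pieces of the <p> block.
# re.DOTALL lets "." cross line breaks, since each record spans several lines.
EMPLOYEE = re.compile(
    r'<a href[^>]*>(?P<name>.*?)</a>'
    r'.*?<p>\s*(?P<street>.*?)\s*<br/>'
    r'\s*(?P<city_line>.*?)\s*<br/>'
    r'\s*(?P<phone>.*?)\s*</p>',
    re.DOTALL,
)


def parse_employees(html):
    """Return one dict of captured fields per employee record found in html."""
    return [m.groupdict() for m in EMPLOYEE.finditer(html)]


rows = parse_employees(SAMPLE)
for row in rows:
    print(row)
```

The lazy quantifiers (`.*?`) matter here: with multi-line matching enabled, a greedy `.*` would run to the last `</p>` in the file and merge all records into one match.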

 

Note: I pulled the txt files into my workspace by doing the following:

1. Using the Input Data tool

2. Keeping the defaults, except changing the delimiter to \0
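
The effect of the steps above is that each file arrives as one unsplit block of text, which is what a multi-line pattern needs. As a rough illustration of the same idea outside Designer, the Python sketch below reads every txt file in a directory into a single string per file. The function name `read_whole_files` and the demo file names are hypothetical, and the temporary directory only stands in for the real folder.

```python
import tempfile
from pathlib import Path


def read_whole_files(directory, pattern="*.txt"):
    """Map each matching file name to its full contents as one string."""
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob(pattern))
    }


# Demo with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("<h2>\n<a>bob jackson</a>\n</h2>", encoding="utf-8")
    Path(tmp, "b.txt").write_text("<h2>\n<a>sal roberts</a>\n</h2>", encoding="utf-8")
    contents = read_whole_files(tmp)

for name, text in contents.items():
    print(name, len(text.splitlines()), "lines")
```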

 

I am not sure what the best practice is for this. Thanks for the help!

cmartin5

Thanks again. I was able to get this sorted just like I wanted. I really appreciate your example; it helped me figure everything out.
