Alteryx Designer Desktop Discussions

cmartin5 · 07-20-2022

I am new to Alteryx and trying to figure out how to parse HTML data. I have a number of txt files containing HTML, and I would like to extract information from all of the files in a directory. The structure of the HTML within each txt file looks like this:

"
<div class=""employee"">
<h2>

<a href=""/employee/name/bob-jackson"">bob jackson</a>
</h2>

<p>
2020 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5555 </p>
</div>
<div class=""employee"">
<h2>

<a href=""/employee/name/sal-roberts"">sal roberts</a>
</h2>

<p>
2021 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5556 </p>
</div>
"

I can extract the full name from the href link with a regular expression like:

I am struggling to get any of the other fields, such as the address and phone number, out of my expression.
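One way to get every field out at once, rather than just the name, is a single pattern with one capture group per field, applied to each employee block. The sketch below expresses that idea with Python's `re` module. It assumes the markup follows the sample above exactly (the street on its own line, then `City, ST 12345`, then a `(555) 555-5555` phone number); `EMPLOYEE_RE`, `parse_employees`, and `SAMPLE` are hypothetical names used only for illustration. The `"{1,2}` pieces let it match either plain quotes or the doubled `""` quotes shown in the post.

```python
import re

# One capture group per field. re.S lets ".*?" run across line breaks, and
# "{1,2} accepts either plain quotes or the doubled "" quotes in the post.
EMPLOYEE_RE = re.compile(
    r'<a href="{1,2}/employee/name/[^"]*"{1,2}>(?P<name>[^<]+)</a>'  # name = link text
    r'.*?<p>\s*(?P<street>[^<]+?)\s*'             # street line after <p>
    r'<br/>\s*(?P<city>[^,<]+),\s*'               # city, up to the comma
    r'(?P<state>[A-Z]{2})\s+(?P<zip>\d{5})\s*'    # two-letter state and ZIP
    r'<br/>\s*(?P<phone>\(\d{3}\)\s*\d{3}-\d{4})',  # phone like (555) 555-5555
    re.S,
)


def parse_employees(html: str) -> list[dict[str, str]]:
    """Return one dict of fields per employee block found in html."""
    return [m.groupdict() for m in EMPLOYEE_RE.finditer(html)]


# Hypothetical test input: the two blocks from the post, with plain quotes.
SAMPLE = """<div class="employee">
<h2>

<a href="/employee/name/bob-jackson">bob jackson</a>
</h2>

<p>
2020 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5555 </p>
</div>
<div class="employee">
<h2>

<a href="/employee/name/sal-roberts">sal roberts</a>
</h2>

<p>
2021 right street
<br/>Somewhere, US 30030
<br/>
(555) 555-5556 </p>
</div>"""

for row in parse_employees(SAMPLE):
    print(row["name"], "|", row["street"], "|",
          row["city"], row["state"], row["zip"], "|", row["phone"])
# bob jackson | 2020 right street | Somewhere US 30030 | (555) 555-5555
# sal roberts | 2021 right street | Somewhere US 30030 | (555) 555-5556
```

In Designer, the same groups could probably be reused in the RegEx tool with its Parse output method, which writes each parenthesized group to its own column. The Python-specific pieces (the `?P<name>` labels and the `re.S` flag) would need adapting, so treat that translation as an assumption to test. The pattern also assumes a whole block sits in one field; if each HTML line arrives as a separate row, the lines have to be combined first.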

Note: I brought the txt files into my workflow as follows:

1. Using the Input Data tool

2. Keeping the defaults, except for changing the delimiter to \0

I am not sure what the best practice is for this. Thanks for the help!
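On the import question, one approach (a sketch, not an established best practice) is to keep each file's text together before parsing, so that blocks spanning several lines are not split across rows. The Python below assumes the files are UTF-8 text and that each block is wrapped in `<div class="employee">` ... `</div>` with no nested `<div>` inside; `BLOCK_RE`, `split_blocks`, `load_employee_blocks`, and the `*.txt` default are hypothetical names chosen for illustration.

```python
import re
from pathlib import Path

# One employee block: from the opening <div class="employee"> to its </div>.
# "{1,2} accepts plain or doubled ("") quotes; re.S lets .*? span line breaks.
# The lazy .*? stops at the first </div>, so nested <div>s are not supported.
BLOCK_RE = re.compile(r'<div class="{1,2}employee"{1,2}>.*?</div>', re.S)


def split_blocks(text: str) -> list[str]:
    """Return the HTML of each employee block found in one file's text."""
    return [m.group(0) for m in BLOCK_RE.finditer(text)]


def load_employee_blocks(folder: str, pattern: str = "*.txt") -> list[tuple[str, str]]:
    """Return (file name, block HTML) pairs for every matching file in folder."""
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        # Read the whole file as one string so multi-line blocks stay together,
        # instead of arriving one row per line.
        text = path.read_text(encoding="utf-8")
        rows.extend((path.name, block) for block in split_blocks(text))
    return rows
```

Each returned block can then be run through a field-level pattern on its own, with the file name kept alongside it. In Designer, a roughly comparable chain might be an Input Data tool pointed at a wildcard path such as `*.txt` with the Output File Name as Field option turned on, followed by a Summarize tool that concatenates the lines per file name; that mapping is untested and worth checking against your own data.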

cmartin5 · 07-20-2022

Thanks again! I was able to get this sorted just as I wanted. I really appreciate your example; it helped me figure everything out.
