
Alteryx Designer Desktop Discussions

SOLVED

How to parse a specific element out of an HTML string when the element might show up 0 to N times

rfoster7
11 - Bolide

Hello, 

 

I am streaming a large HTML document into the workflow as a string value.

 

The HTML string has an unknown number of a specific element embedded in it. Every element I want to find will look like this: 

 

<div>Endorsements: <span class='gen-ai-file-name'>XXXXXX</span></div>

 

The XXXXXX will not always be the same length. It might be a few characters, a long sentence, or another embedded div or span.

What I really want to get out of this is the XXXXXX value, but I'd settle for getting the entire <div></div> substring.

 

This could appear 1 time in the HTML, or it might appear 100 times. It might not show up in the HTML at all.

 

Ideally, what I'd like to return out of the parse is 1 row per occurrence with either the XXXXXX value or the entire "<div>Endorsements: <span class='gen-ai-file-name'>XXXXXX</span></div>" value. So if it shows up 1 time there will be 1 row, if it shows up 100 times there will be 100 rows, and if it's not in there, there won't be any rows.

 

I'm sure I can use the XML Parse tool to do this, but I'm not very skilled with it. And this particular <div> element may be a parent element, a child element, or buried seventeen layers deep in stacked divs, spans, and whatnot.
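For the "seventeen layers deep" concern, one option outside the XML Parse tool is a small event-based HTML parser. This is only a sketch in Python using the standard library's `html.parser`; the class name `FileNameSpans` and the helper `extract_rows` are made up for illustration, and it assumes the `gen-ai-file-name` class appears only on the spans of interest. It returns the text inside each matching span (nested tags are dropped, their text is kept), one list entry per occurrence, and an empty list when there are none.

```python
from html.parser import HTMLParser


class FileNameSpans(HTMLParser):
    """Collect the text of every <span class='gen-ai-file-name'>, at any depth."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one entry per matching span
        self._depth = 0     # > 0 while inside a matching span
        self._buf = []      # text pieces of the span being read

    def handle_starttag(self, tag, attrs):
        if self._depth:
            # Already inside a target span: track nested spans so the
            # matching close tag is found correctly.
            if tag == "span":
                self._depth += 1
        elif tag == "span" and dict(attrs).get("class") == "gen-ai-file-name":
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self._depth and tag == "span":
            self._depth -= 1
            if self._depth == 0:
                self.rows.append("".join(self._buf))

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def extract_rows(html):
    """Return one string per occurrence; an empty list if there are none."""
    parser = FileNameSpans()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Because the parser reacts to tags as it meets them, the position of the element in the document tree does not matter. It does not check for the "Endorsements: " prefix, so that would need an extra condition if other elements share the class.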

 

It's proprietary, so I can't post a sample, but hopefully I've been clear enough that someone who does understand parsing and text mining can help.

 

Thanks in advance. 

3 REPLIES
rfoster7
11 - Bolide

Never mind. I figured it out using a simple RegEx Tokenize.

[Attached image: rfoster7_0-1754002888185.png]

There doesn't seem to be a delete option, so my shame will live here forever.

apathetichell
20 - Arcturus

@rfoster7 --- we've all been there. 

Yoshiro_Fujimori
15 - Aurora

@rfoster7 

 

I would use the RegEx tool as below.

You may want to try the expression with your data (it may need a few more tweaks).

I hope this helps. Good luck.

 

[Attached image: rfoster7.png]

Input Data

Field1

abc<div>Endorsements: <span class='gen-ai-file-name'>abc</span></div>xyz

abcdefg

abc<div>Endorsements: <span class='gen-ai-file-name'>def</span></div>xyz

abcdefg

abc<div>Endorsements: <span class='gen-ai-file-name'>ghi</span></div>xyz

abcdefg

abc<div>Endorsements: <span class='gen-ai-file-name'>jkl</span></div>xyz

abcdefg

abc<div>Endorsements: <span class='gen-ai-file-name'>mno</span></div>xyz

abcdefg

 

RegEx Tool configuration

Regular Expression: <div>Endorsements: <span class='gen-ai-file-name'>.*?</span></div>

Output Method: Tokenize, Split to Rows

 

Output Data

Field1

<div>Endorsements: <span class='gen-ai-file-name'>abc</span></div>

<div>Endorsements: <span class='gen-ai-file-name'>def</span></div>

<div>Endorsements: <span class='gen-ai-file-name'>ghi</span></div>

<div>Endorsements: <span class='gen-ai-file-name'>jkl</span></div>

<div>Endorsements: <span class='gen-ai-file-name'>mno</span></div>
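For anyone who wants to try the expression outside Designer, here is a rough Python equivalent of the Tokenize / Split to Rows behavior shown above. It is a sketch, not the RegEx tool itself: the function names `tokenize_rows` and `tokenize_values` are made up for illustration, the added parentheses around `.*?` are my own so that the XXXXXX value can be pulled out on its own, and `re.DOTALL` is an assumption so that a value spanning several lines still matches.

```python
import re

# Same expression as in the RegEx tool configuration, plus a capture group
# around the non-greedy part (an addition for this sketch).
PATTERN = re.compile(
    r"<div>Endorsements: <span class='gen-ai-file-name'>(.*?)</span></div>",
    re.DOTALL,
)


def tokenize_rows(html):
    """One entry per occurrence, holding the whole <div>...</div> substring."""
    return [m.group(0) for m in PATTERN.finditer(html)]


def tokenize_values(html):
    """One entry per occurrence, holding only the XXXXXX value."""
    return [m.group(1) for m in PATTERN.finditer(html)]


if __name__ == "__main__":
    sample = (
        "abc<div>Endorsements: <span class='gen-ai-file-name'>abc</span></div>xyz"
        "abcdefg"
        "abc<div>Endorsements: <span class='gen-ai-file-name'>def</span></div>xyz"
    )
    for row in tokenize_rows(sample):
        print(row)
    print(tokenize_values(sample))
    print(tokenize_rows("abcdefg"))  # no occurrences, so no rows
```

One thing to watch: the non-greedy `.*?` stops at the first `</span></div>` it meets, so if the XXXXXX value itself contains that exact sequence (for example an embedded div that ends with a span), the match will be cut short. That is probably the kind of tweak to test against real data.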

 
