Alteryx Designer Desktop Discussions

SOLVED

Extract Img tag from HTML document in a table

tjamal1
8 - Asteroid

Hi,

I have a table with two columns, WebID and HTMLDocument.

I want to extract every full img tag from each HTML document.

Sample Data

Web ID: 1
HTMLDoc:
<!DOCTYPE html>
<html>
<body>

<h2>HTML Image</h2>
<img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
<img src="image" alt="image" width="500" height="600">

</body>
</html>

Web ID: 2
HTMLDoc:
<!DOCTYPE html>
<html>
<body>

<h2>HTML Image</h2>
<img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
<img src="image" alt="image" width="500" height="600">

</body>
</html>

Expected Output 

WebID | Img
1 | <img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
1 | <img src="image" alt="image" width="500" height="600">
2 | <img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
2 | <img src="image" alt="image" width="500" height="600">

Note: The HTML document in each row can contain one or more images.

5 REPLIES
Qiu
21 - Polaris

@tjamal1 
Hope this is what you need.

0116-tjamal1.PNG

danilang
19 - Altair

Hello @tjamal1,

You can also do this with a single RegEx tool configured to Split to Rows.

c.png

The regular expression matches every tag that starts with "<img" and ends with ">".
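For readers outside Designer, the same idea can be sketched in Python. The exact expression used in the workflow is only visible in the screenshot, so the pattern below is an assumption that fits the description ("<img", then anything up to the next ">"); `split_to_rows` and `sample` are hypothetical names used for illustration.

```python
import re

# Assumed pattern: "<img", then any run of characters other than ">", then ">".
# This will cut a tag short if an attribute value itself contains ">".
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def split_to_rows(rows):
    """Yield one (web_id, img_tag) pair per img tag found in each document."""
    for web_id, html_doc in rows:
        for tag in IMG_TAG.findall(html_doc):
            yield web_id, tag


# Hypothetical sample: the first img tag shares a line with a <p> tag.
sample = [
    (1, '<h2>HTML Image</h2>\n'
        '<p>text <img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600"></p>\n'
        '<img src="image" alt="image" width="500" height="600">'),
]

for web_id, tag in split_to_rows(sample):
    print(web_id, tag)
```

Because the match is anchored on the tag itself rather than on line breaks, it should not matter whether other tags share a line with the img tag.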

 

r.png

 

Dan

tjamal1
8 - Asteroid

@Qiu Thanks for the solution.

This also works when each full image tag in the HTML document sits on its own line, separate from the other tags.

Some of my documents have img tags on the same line as paragraph tags and other tags, so it extracts those other tags too.

tjamal1
8 - Asteroid

@danilang Thank you for the workflow.

This works perfectly in my case. Can I create a separate column for the images instead of writing the tokenized image tags to the same column?

danilang
19 - Altair

Hi @tjamal1,

To get the image tag as a new column, join the output of the RegEx tool back to the input data set.
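As a rough Python analogue of that join, assuming WebID is the join key and is unique per input row: tokenize each document into (WebID, tag) rows, then look the original row up again by WebID so the tag lands in its own column. The pattern and the names `join_images_to_input` and `Img` are illustrative assumptions, not taken from the workflow.

```python
import re

# Assumed pattern, matching the description "starts with <img and ends with >".
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def join_images_to_input(rows):
    """Return one dict per img tag, keeping the original columns alongside it."""
    # Step 1: split to rows, carrying only the key and the matched tag.
    tokens = [(web_id, tag)
              for web_id, html_doc in rows
              for tag in IMG_TAG.findall(html_doc)]
    # Step 2: join back to the input on WebID (assumed unique per input row).
    by_id = {web_id: html_doc for web_id, html_doc in rows}
    return [{"WebID": web_id, "HTMLDoc": by_id[web_id], "Img": tag}
            for web_id, tag in tokens]


docs = [
    (1, '<body><img src="a.jpg" alt="a"><img src="b.jpg" alt="b"></body>'),
    (2, '<body><p>text</p><img src="c.jpg" alt="c"></body>'),
]

for record in join_images_to_input(docs):
    print(record["WebID"], record["Img"])
```

Like an inner join, this drops documents that contain no img tag; a left join would be needed to keep them with an empty Img value.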

 

danilang_0-1610807481953.png

 

BTW: Please use @mentions, as in @tjamal1, in your replies. Since the posts aren't threaded, it can be difficult to tell which post you're referring to.

 

Dan 

 
