
Alteryx Designer Desktop Discussions

SOLVED

Extract Img tag from HTML document in a table

tjamal1
8 - Asteroid

Hi

 

I have a table with two columns, WebID and HTMLDocument.

I want to extract every full img tag from each HTML document.

 

Sample Data

WebID | HTMLDoc
1 | <!DOCTYPE html>
<html>
<body>

<h2>HTML Image</h2>
<img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
<img src="image" alt="image" width="500" height="600">

</body>
</html>
2 | <!DOCTYPE html>
<html>
<body>

<h2>HTML Image</h2>
<img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
<img src="image" alt="image" width="500" height="600">

</body>
</html>

 

 

Expected Output 

WebID | Img
1 | <img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
1 | <img src="image" alt="image" width="500" height="600">
2 | <img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600">
2 | <img src="image" alt="image" width="500" height="600">

 

Note: The HTML document in each row can contain one or more images.

Qiu
20 - Arcturus

@tjamal1 
Hope this is what you need.

0116-tjamal1.PNG

danilang
19 - Altair

Hello @tjamal1 

 

You can also do this with a single RegEx tool configured to Split to Rows.

 

c.png

 

The regular expression matches every tag that starts with "<img" and ends with ">".
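For anyone reading without the screenshots, here is a rough Python sketch of the same idea outside Alteryx. The exact expression used in the workflow is only visible in the image, so the pattern below (`<img\b[^>]*>`) is an assumption that fits the description, and the function name `split_img_tags_to_rows` is made up for illustration. It emits one output row per img tag, which is roughly what Split to Rows does.

```python
import re

# Assumed pattern: "<img", then any run of characters that are not ">", then ">".
# This is a guess at an expression matching the description, not the one in the screenshot.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def split_img_tags_to_rows(rows):
    """Return one (web_id, img_tag) pair for every <img ...> tag in each document."""
    out = []
    for web_id, html_doc in rows:
        # findall returns each full match because the pattern has no capture groups.
        for tag in IMG_TAG.findall(html_doc):
            out.append((web_id, tag))
    return out


# Sample input with an img tag nested inside a paragraph tag on the same line.
doc = (
    "<h2>HTML Image</h2>\n"
    '<p>Intro <img src="img_girl.jpg" alt="Girl in a jacket" width="500" height="600"></p>\n'
    '<img src="image" alt="image" width="500" height="600">'
)
sample = [(1, doc), (2, doc)]

for web_id, tag in split_img_tags_to_rows(sample):
    print(web_id, tag)
```

Because the pattern stops at the first ">", it picks up only the img tag even when other tags share the line. It would cut a tag short if an attribute value itself contained ">", so treat it as a sketch rather than a general HTML parser.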

 

r.png

 

Dan

tjamal1
8 - Asteroid

@Qiu Thanks for the solution. 

 

This also works when the HTML document has each full img tag on its own line, separate from other tags.

Some of my documents have paragraph tags and other tags on the same line as the img tag, so it extracts those other tags too.

tjamal1
8 - Asteroid

@danilang Thank you for the workflow.

This works perfectly in my case. Can I create another column for the images instead of writing the tokenized img tags to the same column?

danilang
19 - Altair

Hi @tjamal1 

 

To get the img tag as a new column, join the output of the RegEx tool back to the input data set.
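The same join can be sketched in Python for readers who want to see the logic spelled out. This is only an illustration under some assumptions: the join key is taken to be WebID (the actual key is in the screenshot), WebID is assumed unique per input row, and the function name `add_img_column` and the pattern `<img\b[^>]*>` are hypothetical choices, not taken from the workflow.

```python
import re

# Assumed pattern for a full img tag; the workflow's actual expression may differ.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)


def add_img_column(rows):
    """Keep HTMLDoc intact and add each extracted tag in a separate Img column."""
    # Step 1: split to rows, one record per img tag, carrying the WebID along.
    tags = [
        {"WebID": row["WebID"], "Img": tag}
        for row in rows
        for tag in IMG_TAG.findall(row["HTMLDoc"])
    ]
    # Step 2: join back to the input on WebID (assumed unique per input row).
    by_id = {row["WebID"]: row for row in rows}
    return [
        {"WebID": t["WebID"], "HTMLDoc": by_id[t["WebID"]]["HTMLDoc"], "Img": t["Img"]}
        for t in tags
    ]


rows = [
    {"WebID": 1, "HTMLDoc": '<p><img src="img_girl.jpg" alt="Girl in a jacket"></p><img src="image" alt="image">'},
    {"WebID": 2, "HTMLDoc": '<img src="image" alt="image">'},
]

for record in add_img_column(rows):
    print(record["WebID"], record["Img"])
```

Each output record keeps the original HTMLDoc and gains an Img column, so a document with two images produces two records.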

 

danilang_0-1610807481953.png

 

BTW: please use @mentions, as in @tjamal1, in your replies. Since the posts aren't threaded, it can be difficult to determine which post you're referring to.

 

Dan 

 
