Hi Everyone,
I have HTML in a single column, one snippet per row. I want to extract the text between the <p> tags (the HTML also contains inline CSS, so the tags carry class references):
Column 1
<p class="abc">Something to extract</p>
<p class="xyz">Something extra </p>
Plain text - want to ignore this
<DOCTYPE!.....><p>Something</p>
I tried RegEx, but I'm a beginner with it and couldn't get anywhere.
I'd appreciate any help.
Regards
Hi @Ahmad_S
You can try something like this:
REGEX_Replace([Field], ".*<p[^>]*>(.*)</p>", "$1")
Cheers,
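To see what that formula does on the sample rows, here is a rough Python equivalent. This is a sketch under an assumption: Python's `re` module stands in for Alteryx's regex engine, and Python writes the capture-group reference as `\1` where REGEX_Replace uses `$1`. Rows with no `<p` tag don't match, so they come back unchanged. Because the leading `.*` is greedy, only the last paragraph survives when a row holds several.

```python
import re

# Same pattern as the REGEX_Replace formula above.
# Python's re.sub uses \1 for the first capture group where Alteryx uses $1.
PATTERN = r".*<p[^>]*>(.*)</p>"


def extract_p(field: str) -> str:
    """Replace the matched span with the text captured between <p ...> and </p>.

    Rows with no <p> tag do not match, so re.sub returns them unchanged.
    """
    return re.sub(PATTERN, r"\1", field)


rows = [
    '<p class="abc">Something to extract</p>',
    '<p class="xyz">Something extra </p>',
    "Plain text - want to ignore this",
    "<DOCTYPE!.....><p>Something</p>",
]
for row in rows:
    print(repr(extract_p(row)))
# 'Something to extract'
# 'Something extra '
# 'Plain text - want to ignore this'
# 'Something'
```

With two paragraphs in one row, e.g. `<p>A</p><p>B</p>`, the greedy `.*` skips ahead to the last `<p>`, so the result is just `B`.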
Hi @Ahmad_S
If you want to achieve this without using RegEx, you could use a Filter tool to pull just the rows you need, with the condition [Field] Contains "p class=".
Then use a Text to Columns tool with '>' as the delimiter; this will then separate out the text you are looking to parse out.
Let me know if that makes sense and works for you!
Thanks
Will
Hi @wdavis
Unfortunately, each row contains too many HTML tags; if I use Text to Columns, I'm pretty sure I would easily end up with 20+ columns to deal with.
Also, I want to keep the rows that contain no HTML tags as they are. If I use a Filter, those rows will be excluded.
Regards
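Here is one possible way to handle both points (many `<p>` tags per row, and plain-text rows kept as-is), sketched in Python rather than as an Alteryx workflow; the function name `extract_all_p` and the ` | ` separator are hypothetical choices. It finds every `<p ...>...</p>` pair with a non-greedy pattern, joins the extracted texts, and returns the row untouched when there is no match, so nothing gets dropped the way a Filter step would drop it.

```python
import re

# Non-greedy (.*?) stops at the first closing </p> after each opening tag,
# and \b keeps "<p" from also matching tags such as <pre> or <param>.
# IGNORECASE accepts <P>; DOTALL lets a paragraph span line breaks.
P_TAG = re.compile(r"<p\b[^>]*>(.*?)</p>", re.IGNORECASE | re.DOTALL)


def extract_all_p(field: str, sep: str = " | ") -> str:
    """Return the text of every <p>...</p> in the row, stripped and joined by sep.

    Rows without any <p> tag (e.g. plain text) are returned unchanged.
    Tags nested inside a paragraph, such as <b>, are not removed.
    """
    matches = P_TAG.findall(field)
    if not matches:
        return field
    return sep.join(m.strip() for m in matches)


print(extract_all_p('<p class="abc">A</p><div>x</div><p class="xyz">B </p>'))
print(extract_all_p("Plain text - want to ignore this"))
```

The first call prints `A | B` and the second prints the plain text unchanged. Joining into one string keeps a single output column; splitting the matches into separate rows instead would be the other natural design.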