Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Extract URLs from string

craigja
8 - Asteroid

Hi, I have a long string with URLs in it, and I want to use the RegEx tool to extract the URLs into separate columns. I'm trying to cobble together an expression that finds strings starting with "http and ending with ", where the http can be either lower or upper case. Man, I hate regular expressions!

 

<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>
15 REPLIES
craigja
8 - Asteroid

2018.3.5.52487

yalmar_m
11 - Bolide

Try this one!

craigja
8 - Asteroid

Hi, that's not really what I need. I want 1 extra column with the 2 URLs separated by a comma. Some of the data has up to 9 URLs in that one section.
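For anyone who wants to test the idea outside Alteryx, here is a minimal Python sketch of the pattern described above: a double quote, then http in either case, then everything up to the next double quote. The function name `urls_to_column` and the example.com URLs are made up for illustration (the real URLs in the sample are truncated), and this is only one way to do it, not a Designer workflow.

```python
import re

def urls_to_column(text):
    """Return every quoted value that starts with http/HTTP, joined by commas."""
    # "(http[^"]*)" = opening quote, http (any case via IGNORECASE),
    # then any run of non-quote characters, then the closing quote.
    urls = re.findall(r'"(http[^"]*)"', text, flags=re.IGNORECASE)
    return ",".join(urls)

# Hypothetical sample shaped like the HTML in the question.
sample = ('<p><a href="https://example.com/a" target="_blank">A</a> and '
          '<a href="HTTP://example.com/b" target="_blank">B</a></p>')

print(urls_to_column(sample))
```

Because the pattern insists on http right after the quote, values like "_blank" are skipped, and the same call works whether the string holds 2 URLs or 9.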

yalmar_m
11 - Bolide

If you want to concatenate the URLs, the following workflow might be a solution.

afv2688
16 - Nebula

You can do a regex replace with this pattern: href="(.+)" .+ href="(.+)" .+ and then join the captured URLs with a Formula tool:

[regex_replace1] + ", " + [regex_replace2]

If the data has more URLs, keep adding href="(.+)" .+ groups to the pattern.
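The one-group-per-URL idea can be sketched in Python like this. It is an illustration only: `build_pattern` and `extract_fixed` are hypothetical helper names, the example.com URLs are placeholders, and the groups use lazy `(.+?)` rather than the greedy `(.+)` above so each capture stops at the first closing quote.

```python
import re

def build_pattern(n):
    """Build a pattern with one href="(.+?)" group per expected URL."""
    # The groups are separated by a lazy filler that skips the text between links.
    return ".+?".join(['href="(.+?)"'] * n)

def extract_fixed(text, n):
    """Capture exactly n href values and join them with ', ' ('' if fewer exist)."""
    match = re.search(build_pattern(n), text, flags=re.IGNORECASE | re.DOTALL)
    return ", ".join(match.groups()) if match else ""

# Hypothetical sample shaped like the HTML in the question.
html = ('<p>On Feb. 5, <a href="https://example.com/first" target="_blank">ESMA</a> '
        'issued <a href="https://example.com/second" target="_blank">statement</a> '
        'on UK data in no-deal Brexit.</p>')

print(extract_fixed(html, 2))
```

Note that the pattern only matches when the string has at least n links, so rows with a varying number of URLs would each need their own count.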

craigja
8 - Asteroid

I'll give the above a try in a minute, but for now I have a very dirty solution!

Use "(.*?)" in the RegEx tool, then run a regex_replace to remove "_blank" (replace it with nothing), then another regex_replace to remove the " characters the same way.
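Roughly the same dirty approach, sketched in Python to show why the cleanup step is needed: the quoted-value pattern picks up every attribute value, including the target. The name `dirty_extract` and the example.com URLs are made up for illustration, and filtering the list stands in for the regex_replace cleanup.

```python
import re

def dirty_extract(text):
    """Grab every double-quoted value, then drop the "_blank" targets."""
    # "(.*?)" is lazy, so each match ends at the very next double quote.
    quoted = re.findall(r'"(.*?)"', text)
    # Case-insensitive comparison so "_Blank" and "_blank" are both removed.
    return [q for q in quoted if q.lower() != "_blank"]

# Hypothetical sample shaped like the HTML in the question.
sample = ('<a href="https://example.com/a" target="_blank">A</a> '
          '<a href="https://example.com/b" target="_Blank">B</a>')

print(",".join(dirty_extract(sample)))
```

This would also keep any other quoted attribute values (a class or id, say), which is what makes it dirty.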
