
SOLVED

Extract URLs from string

craigja
8 - Asteroid

Hi, I have a long string that has URLs in it. I want to use the regular expression tool to extract the URLs into separate columns, so I'm trying to cobble together an expression that looks for strings starting with "http and ending with ", where the http can be either lower or upper case. Man, I hate regular expressions!

 

<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>
craigja
8 - Asteroid

I have this so far:   "([^"]*)"

But I want it to only pick up quoted strings that start with either http or HTTP.
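
For reference, one way to express that restriction is to require `http` immediately after the opening quote and match case-insensitively. The sketch below uses Python's `re` module only to illustrate the pattern. The sample string and the function name `extract_quoted_urls` are made up for the example, and whether the same pattern and case-insensitive setting carry over unchanged to the Alteryx RegEx tool is an assumption worth testing.

```python
import re

# Require "http" right after the opening quote; IGNORECASE also accepts "HTTP".
QUOTED_URL = re.compile(r'"(http[^"]*)"', re.IGNORECASE)


def extract_quoted_urls(text):
    """Return every double-quoted value that starts with http (any case)."""
    return QUOTED_URL.findall(text)


# Hypothetical sample, shaped like the HTML in the question.
sample = '<a href="HTTP://example.com/a" target="_blank">A</a> <a href="https://example.com/b">B</a>'
print(extract_quoted_urls(sample))
```

Quoted values that do not start with `http`, such as `"_blank"`, are skipped because the literal `http` has to follow the opening quote directly.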

yalmar_m
11 - Bolide

Hi @craigja,

 

Did you try using the XML Parse tool?

This might help you as well.


Best,

Yalmar

afv2688
16 - Nebula

Use this: href="(.+)"

If there is more than one URL, repeat the pattern as many times as there are URLs in the string.
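
One caveat with a pattern like `href="(.+)"`: `.+` is greedy, so on a line with several links it tends to run through to the last quote instead of stopping at the end of the first URL. The Python sketch below illustrates the difference on a made-up input (the `a.example` and `b.example` addresses are placeholders). It assumes a Perl-style regex engine, so behaviour in the Alteryx RegEx tool should be checked rather than taken for granted.

```python
import re

# Hypothetical input with two links on one line.
text = (
    '<a href="http://a.example/1" target="_blank">one</a> '
    '<a href="http://b.example/2" target="_blank">two</a>'
)

# Greedy: .+ runs to the last quote on the line, so both links collapse into one match.
greedy = re.findall(r'href="(.+)"', text)

# Excluding quotes from the capture stops each match at the end of its own href value.
bounded = re.findall(r'href="([^"]+)"', text)

print(len(greedy), len(bounded))
print(bounded)
```

Because `findall` returns every match, the bounded pattern does not need to be repeated once per URL, which helps when the number of URLs is unknown.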

craigja
8 - Asteroid

But I don't know how many URLs there might be, which also makes me wonder: when I tokenize it, how do I know how many extra columns to create?

yalmar_m
11 - Bolide

@craigja can you send an example of your workflow? This might help us solve your problem.

craigja
8 - Asteroid

 

There is nothing in the workflow yet! I bring in an XML file and one of the columns contains the data below. I want to extract the URLs and create new columns, or, better still, just one column with all the URLs separated by commas. So for the input below:

 

<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>

 

I would like to get a new column with:

"https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no...", "https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..."
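
To make the desired output concrete, here is a minimal Python sketch of that transformation: collect every `href` value, however many there are, and join them into a single quoted, comma-separated string. The function name `urls_as_one_field` and the `example.com` row are hypothetical. It only illustrates the logic and is not an Alteryx workflow.

```python
import re


def urls_as_one_field(html):
    """Collect every href value and return them quoted and comma-separated."""
    urls = re.findall(r'href="(http[^"]*)"', html, flags=re.IGNORECASE)
    return ", ".join(f'"{u}"' for u in urls)


# Hypothetical row value; the real column would hold the HTML shown above.
row = (
    '<p>See <a href="https://example.com/one" target="_blank">one</a> and '
    '<a href="https://example.com/two" target="_blank">two</a>.</p>'
)
print(urls_as_one_field(row))
```

A row with no links yields an empty string, so the number of URLs never has to be known in advance.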

yalmar_m
11 - Bolide

Is this a suitable solution?

Just using the XML Parse tool returns both URLs, and more if the string contains more.
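
For anyone who cannot open the attachment, the parser-based idea looks roughly like this outside Alteryx. This is a Python sketch using the standard library's `xml.etree.ElementTree`, not the XML Parse tool itself. The function name `hrefs_from_fragment` and the `example.com` URLs are made up, and it assumes the field holds a well-formed fragment.

```python
import xml.etree.ElementTree as ET


def hrefs_from_fragment(fragment):
    """Parse a well-formed XML/XHTML fragment and return the href of every <a> element."""
    root = ET.fromstring(fragment)
    return [a.get("href") for a in root.iter("a") if a.get("href") is not None]


# Hypothetical fragment with the same shape as the field in the question.
fragment = (
    '<p>On Feb. 5, <a href="https://example.com/one" target="_blank">ESMA</a> issued '
    '<a href="https://example.com/two" target="_blank">statement</a>.</p>'
)
print(hrefs_from_fragment(fragment))
```

Parsing avoids hand-written regular expressions, but `ET.fromstring` raises `ParseError` on markup that is not well-formed, so a regex fallback may still be needed for messy HTML.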

craigja
8 - Asteroid

It won't open, as I'm not on the latest version of Alteryx!

yalmar_m
11 - Bolide

What version are you using? I can edit it to a previous one!
