Alteryx Designer Discussions

SOLVED

Extract URL's from string

8 - Asteroid

Hi, I have a long string with URLs in it, and I want to use the Regular Expression tool to extract the URLs into separate columns. I'm trying to cobble together an expression that looks for strings starting with "http and ending with ", where the http can be either lowercase or uppercase. Man, I hate regular expressions!

 

<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>
8 - Asteroid

I have this so far:   "([^"]*)"

But I want it to match only when the text inside the quotes starts with either http or HTTP.
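
One way to express that restriction, sketched in Python rather than in the Alteryx RegEx tool: keep the quote-delimited pattern but require the captured text to begin with http, and match case-insensitively. The sample string and variable names are made up for illustration.

```python
import re

# Hypothetical sample; the real data has longer URLs.
sample = (
    '<a href="HTTP://example.com/a" target="_blank">x</a> '
    '<a href="https://example.com/b" target="_blank">y</a>'
)

# A double quote, then text starting with "http" (any case),
# then everything up to the next double quote.
pattern = re.compile(r'"(http[^"]*)"', re.IGNORECASE)

# findall returns only the captured group, one entry per match.
urls = pattern.findall(sample)
print(urls)
```

The "_blank" values are skipped because they do not start with http, so only the two link targets come back.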

Alteryx Certified Partner

Hi @craigja,

 

Did you try using the XML parse?

This might help you as well.


Best,

Yalmar

Alteryx Certified Partner

Use this: href="([^"]+)"

 

If there is more than one URL, repeat the pattern once for each URL in the string.

8 - Asteroid

But I don't know how many URLs there might be, which also makes me wonder: when I tokenize it, how do I know how many extra columns to create?

Alteryx Certified Partner

@craigja can you send an example of your workflow? This might help us solve your problem.

8 - Asteroid

 

There is nothing in the workflow yet! I bring in an XML file, and one of the columns contains the data below. I want to extract the URLs and create new columns, or even just one column with all the URLs separated by commas. Actually, that would be better. So for the input below:

 

<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>

 

I would like to get a new column with:

"https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no...", "https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..."
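
A rough Python sketch of that single-column result, assuming the links always appear as href="..." attributes. It collects however many URLs the string holds, so the count does not need to be known in advance. The function name join_urls and the sample row are hypothetical.

```python
import re

# Capture the value of each href attribute that starts with http (any case).
HREF_URL = re.compile(r'href="(http[^"]*)"', re.IGNORECASE)


def join_urls(html: str) -> str:
    """Return every href URL in html, each quoted, separated by ', '."""
    urls = HREF_URL.findall(html)
    return ", ".join(f'"{u}"' for u in urls)


# Hypothetical row value with two links.
row = (
    '<p>See <a href="https://example.com/a" target="_blank">A</a> and '
    '<a href="https://example.com/b" target="_blank">B</a>.</p>'
)
combined = join_urls(row)
print(combined)
```

A row with no links gives an empty string, and a row with ten links gives ten quoted URLs in the same single column.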

Alteryx Certified Partner

Is this a suitable solution?

Just using the XML Parse tool returns both URLs, and more if the string contains more.
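
For comparison, the same parse-instead-of-regex idea can be sketched with Python's standard-library HTML parser. This is not the Alteryx XML Parse tool, only an illustration of reading href attributes from parsed tags; the class name HrefCollector and the sample markup are assumptions.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Hypothetical markup shaped like the sample in this thread.
collector = HrefCollector()
collector.feed(
    '<p>On Feb. 5, <a href="https://example.com/a" target="_blank">ESMA</a> '
    'issued <a href="https://example.com/b" target="_blank">statement</a>.</p>'
)
collector.close()
print(collector.hrefs)
```

Because the parser reads attributes directly, there is no need to reason about where a quoted value starts or ends.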

8 - Asteroid

It won't open, as I'm not on the latest version of Alteryx!

Alteryx Certified Partner

What version are you using? I can edit it to a previous one!
