Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
SOLVED

Extract URLs from string

Asteroid

Hi, I have a long string with URLs in it, and I want to use the RegEx tool to extract the URLs into separate columns. I'm trying to cobble together an expression that matches strings starting with "http and ending with ", where the http can be either lowercase or uppercase. Man, I hate regular expressions!

 

<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>
Asteroid

I have this so far:   "([^"]*)"

But I want it to pick up only quoted strings that start with either http or HTTP.
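One way to restrict the match to quoted values that begin with http in either case is a case-insensitive pattern. Below is a minimal sketch in Python rather than in the Alteryx RegEx tool; the sample string and its example.com URLs are made up for illustration, and it is only an assumption that the same pattern with a case-insensitive option behaves the same way in other regex engines.

```python
import re

# Hypothetical sample; the URLs are placeholders, not the ones from the post.
sample = (
    '<p>See <a href="https://example.com/a" target="_blank">one</a> and '
    '<a href="HTTP://example.com/b" target="_blank">two</a>.</p>'
)

# Capture the text between double quotes only when it starts with "http",
# ignoring case, so quoted values such as "_blank" are skipped.
quoted_http = re.compile(r'"(http[^"]*)"', re.IGNORECASE)

urls = quoted_http.findall(sample)
print(urls)
```

Because `[^"]*` stops at the next double quote, each match ends at the closing quote of its own attribute instead of running on to a later one.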

Alteryx Partner

Hi @craigja,

 

Did you try using the XML Parse tool?

This might help you as well.


Best,

Yalmar

Alteryx Certified Partner

Use this: href="([^"]+)"

 

If there is more than one URL, repeat the pattern once for each URL in the string.

Asteroid

But I don't know how many URLs there might be, which also makes me wonder: when I tokenize it, how do I know how many extra columns to create?

Alteryx Partner

@craigja can you send an example of your workflow? This might help us solve your problem.

Asteroid

 

There is nothing in the workflow yet! I bring in an XML file, and one of the columns contains the data below. I want to extract the URLs and create new columns, or even just one column with all the URLs separated by commas. Actually, that would be better. So for the input below:

 

<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>

 

I would like to get a new column with:

"https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no...", "https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..."
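The single-column output described above avoids having to know the number of URLs in advance: collect every match, then join them. Here is a rough sketch in Python, not an Alteryx workflow; the function name `urls_as_csv` and the example.com URLs are hypothetical.

```python
import re

def urls_as_csv(html: str) -> str:
    """Return every quoted http(s) value in html as one comma-separated string."""
    # findall returns all matches, however many there are.
    found = re.findall(r'"(http[^"]*)"', html, flags=re.IGNORECASE)
    # Re-wrap each URL in double quotes and separate them with ", ".
    return ", ".join(f'"{url}"' for url in found)

# Hypothetical sample; the URLs are placeholders, not the ones from the post.
sample = (
    '<p>On Feb. 5, <a href="https://example.com/a" target="_blank">ESMA</a> '
    'issued <a href="https://example.com/b" target="_blank">statement</a>.</p>'
)

print(urls_as_csv(sample))
```

A row with no links yields an empty string, so the new column stays well defined for every record.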

Alteryx Partner

Is this a suitable solution?

Just using the XML Parse tool returns both URLs, and more if there are multiple.
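For readers outside Alteryx, the same idea (parse the markup instead of pattern-matching it) can be sketched with Python's standard-library HTML parser. This is an analogy under stated assumptions, not a description of how the XML Parse tool works internally; the class name `HrefCollector` and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser

class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

# Hypothetical sample; the URLs are placeholders, not the ones from the post.
sample = (
    '<p>On Feb. 5, <a href="https://example.com/a" target="_blank">ESMA</a> '
    'issued <a href="https://example.com/b" target="_blank">statement</a>.</p>'
)

collector = HrefCollector()
collector.feed(sample)
collector.close()
print(collector.hrefs)
```

Parsing picks up only `href` values, so other quoted attributes such as `target` never need to be filtered out.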

Asteroid

It won't open, as I'm not on the latest version of Alteryx!

Alteryx Partner

What version are you using? I can edit it to a previous one!
