Hi, I have a long string that has URLs in it. I want to use the Regular Expression tool to extract the URLs into separate columns, so I'm trying to cobble together an expression that looks for strings starting with "http and ending with ", where the http can be either lowercase or uppercase. Man, I hate regular expressions!
<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>
I have this so far: "([^"]*)"
But I want it to pick up only the quoted strings whose contents start with either http or HTTP.
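For anyone who wants to try the idea outside Alteryx, here is a minimal Python sketch of one way to narrow that expression: require `http` right after the opening quote and turn on case-insensitive matching. The function name and the sample string are made up for illustration, and the same pattern may need small adjustments in the Alteryx RegEx tool.

```python
import re

# Contents of double-quoted strings that begin with "http", in any letter case.
QUOTED_HTTP = re.compile(r'"(http[^"]*)"', re.IGNORECASE)


def quoted_http_strings(text):
    """Return the contents of every double-quoted string starting with http/HTTP."""
    return QUOTED_HTTP.findall(text)


# Hypothetical sample, not the real data from the XML file.
sample = (
    '<a href="HTTP://example.com/a" target="_blank">x</a> '
    '<a href="https://example.com/b" target="_blank">y</a>'
)

print(quoted_http_strings(sample))
```

Quoted values such as `_blank` are skipped because they do not start with `http`.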
Use this: href="(.+?)"
If there is more than one URL, repeat the expression once for each URL in the string.
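One thing worth checking with this kind of pattern is whether the quantifier is greedy or lazy. The following Python sketch (not Alteryx itself, and using a made-up sample string) compares a greedy `.+` with a lazy `.+?` on a string that contains two href attributes.

```python
import re

# Hypothetical sample with two links, not the real data.
sample = (
    '<a href="https://example.com/a" target="_blank">x</a> '
    '<a href="https://example.com/b" target="_blank">y</a>'
)

# Greedy: .+ runs on to the LAST double quote in the string.
greedy = re.findall(r'href="(.+)"', sample)

# Lazy: .+? stops at the FIRST double quote after each href=".
lazy = re.findall(r'href="(.+?)"', sample)

print(len(greedy))  # a single, overlong match
print(lazy)         # one entry per link
```

The greedy version swallows everything between the first `href="` and the final quote, so the lazy form (or `[^"]*`) is usually the safer choice when several URLs share one string.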
But I don't know how many URLs there might be, which also makes me wonder: when I tokenize it, how do I know how many extra columns to create?
@craigja can you send an example of your workflow? This might help us solve your problem.
There is nothing in the workflow yet! I bring in an XML file, and one of the columns contains the data below. I want to extract the URLs and create new columns, or even just one column with all the URLs separated by commas. Actually, that would be better. So for the input below:
<p>On Feb. 5, <a href="https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no..." target="_blank">ESMA</a> issued <a href="https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..." target="_blank">statement</a> on UK data in no-deal Brexit.</p>
I would like to get a new column with:
"https://www.esma.europa.eu/press-news/esma-news/esma-sets-out-use-uk-data-in-esma-databases-under-no...", "https://www.esma.europa.eu/sites/default/files/library/esma_70-155-7026_use_of_uk_data_in_esma_datab..."
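As a rough illustration of that target output, here is a small Python sketch that collects every href URL, however many there are, and joins them into one quoted, comma-separated field. It is only a sketch of the logic, not an Alteryx workflow; the function name and sample string are hypothetical.

```python
import re

# href values that start with http/HTTP; [^"]* stops at the closing quote.
HREF_URL = re.compile(r'href="(http[^"]*)"', re.IGNORECASE)


def urls_as_one_field(html):
    """Return all href URLs as one string: each URL quoted, separated by ', '."""
    urls = HREF_URL.findall(html)  # works for zero, one, or many URLs
    return ", ".join(f'"{url}"' for url in urls)


# Hypothetical sample, not the real data.
sample = (
    '<p>See <a href="https://example.com/a" target="_blank">A</a> and '
    '<a href="https://example.com/b" target="_blank">B</a>.</p>'
)

print(urls_as_one_field(sample))
```

Because the matches come back as a list, the number of URLs does not need to be known in advance; a string with no links simply yields an empty field.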
It won't open, as I'm not on the latest version of Alteryx!
What version are you using? I can save it for an earlier version!