Hello! I am trying to extract URLs that are hyperlinked within excel. Thanks to a couple of resources (here, here, here) I have been able to replicate extracting URLs from excel hyperlinks. HOWEVER, the specific URLs I am trying to extract contain the number sign "#" in the middle of the URL. Thus when reading in Alteryx, the everything starting at the # is left off. Does anyone know of a solve or workaround for this issue? Workflow and input file attached that replicates the issue. Thanks!
Could you provide and URL example?
Usually when there is a # in the middle of the URL it refers to a "row" into a larger page. If it is the case, you may ignore what's after # (including #), read the full page and parse.
But I'm not sure this is the case. An example would help.
The URL string after the # does appear. The # is dropped as it is invalid and irrelevant.
Thanks for the replies!
@AZuc Unfortunately I can't provide the exact hyperlinks as it is a proprietary site developed by the company I work for. I know for certain I cannot ignore the # and what comes after because if I try to access the URL that way I receive an error. Sorry I wish there was a better way to troubleshoot this.
@HomesickSurfer The URL string after the # does not appear and the # is relevant for my use case. If you look at the hyperlinks in the underlying excel file, one has the # before /test and one does not. I am trying to get Alteryx to return "http://www.fake.com/#/test" and "http://www.fake.com/test"
The HyperLink is broken up into 2 pieces in the Excel file. The Site address(everything before the #) is stored in xl\worksheets\_rels\sheet1.xml.rels.
The fragment(everything after the #) is stored in xl\worksheets\sheet1.xml in the location attribute
You'll need another Input tool to grab the contents of Sheet1.xml, parse out the id from the r:id attrib and join this with the sheet1.xml.rels data.
Note that your sample URL http://www.fake.com/#/test/ isn't a valid URL. The # can only occur after all the "/" as in http://www.fake.com/test#Fragment. Excel can't properly parse this and puts /test in both the rel and the location
Dan