Alteryx Designer Discussions

How do I pull a dynamic hyperlink from the source code?

8 - Asteroid

I'm trying to get Alteryx to pull a certain hyperlink from this source code view-source: (specifically line 484).


I think I could download the source code as text (although I'm not sure how), split it with Text to Columns, and then filter down to the hyperlink I need.


Is it possible to point Alteryx directly at that location, so that the data stays up to date even if the hyperlink changes? If not, can Alteryx automatically pull the text from the source code?

17 - Castor

Hi @helenjin1 


Here's a workflow that (eventually) gets your PDF. It demonstrates the basics of web scraping.




It starts by building the absolute path from the site address and the relative path. The Download tool gets the HTML from this path, and the Text to Columns tool splits it into rows on the newline character. After a Record ID is added, the Filter tool pulls out line 484 and the RegEx tool (in Parse mode) extracts the relative path.
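For anyone who wants to see the same steps outside of Alteryx, here is a rough Python sketch of that first stage. It is not the workflow itself, just an approximation under some assumptions: the function names `fetch_html` and `link_on_line` and the href pattern are made up for illustration, and the real page may need a different pattern.

```python
import re
from typing import Optional
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumed pattern: the first quoted href attribute on the line.
HREF_PATTERN = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text (the Download tool step)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def link_on_line(html: str, line_number: int, base_url: str) -> Optional[str]:
    """Return the absolute URL of the first href on a 1-based line number.

    Mirrors Text to Columns (split to rows) + Record ID + Filter + RegEx.
    Returns None if the line does not exist or holds no href.
    """
    lines = html.splitlines()  # one row per line of source
    if not 1 <= line_number <= len(lines):
        return None
    match = HREF_PATTERN.search(lines[line_number - 1])
    if match is None:
        return None
    # Combine the site address with the relative path found on the line.
    return urljoin(base_url, match.group(1))
```

Usage would be something like `link_on_line(fetch_html(home_url), 484, home_url)`, where `home_url` is the page address from the original question.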


Since there's a Patents page between the home page and the file you're looking for, the process repeats for that page.


Finally, in the PDF container, the PDF file is downloaded and saved to disk in the same directory as the workflow.
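The download-and-save step could look roughly like this in Python. This is only a sketch: `filename_from_url`, `save_pdf`, `download_pdf`, and the fallback name `download.pdf` are hypothetical choices, and the target directory is whatever you pass in rather than the workflow directory.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def filename_from_url(url: str, default: str = "download.pdf") -> str:
    """Use the last path segment of the URL as the file name."""
    name = PurePosixPath(unquote(urlparse(url).path)).name
    return name or default  # fall back when the URL has no file segment


def save_pdf(data: bytes, url: str, directory: Path) -> Path:
    """Write the downloaded bytes into the given directory."""
    target = directory / filename_from_url(url)
    target.write_bytes(data)
    return target


def download_pdf(url: str, directory: Path) -> Path:
    """Fetch the PDF at url and save it next to the other outputs."""
    with urlopen(url) as response:
        return save_pdf(response.read(), url, directory)
```

Calling `download_pdf(pdf_url, Path("."))` would save the file to the current working directory, where `pdf_url` is the absolute link found in the previous step.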


This isn't the most robust approach, since the workflow relies on line numbers to find the links. You'll want to modify it to search for the href attributes of the links instead of using line numbers. That will future-proof the workflow against any HTML changes that move the lines around.
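To illustrate the idea of searching for links instead of counting lines, here is a minimal Python sketch using the standard library's HTML parser. The names `LinkCollector` and `find_links`, and the two filter options (link text and href suffix), are assumptions for the example; which filter fits depends on how the real page labels its links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_links(html, base_url, text_contains=None, href_suffix=None):
    """Return absolute URLs of links matching the optional filters."""
    parser = LinkCollector()
    parser.feed(html)
    results = []
    for href, text in parser.links:
        if text_contains and text_contains.lower() not in text.lower():
            continue
        if href_suffix and not href.lower().endswith(href_suffix.lower()):
            continue
        results.append(urljoin(base_url, href))
    return results
```

Because the match is on the link text or the file extension, the result stays the same if lines are added or removed above the link. It would still break if the site renames the link or changes the file type.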



8 - Asteroid

Thank you! That was really helpful