How do I pull a dynamic hyperlink from the source code?

8 - Asteroid

I'm trying to get Alteryx to pull a specific hyperlink from the page source at view-source:https://www.fda.gov/drugs/drug-approvals-and-databases/approved-drug-products-therapeuti... (specifically, line 484).


I think I could download the source code as text (although I'm not sure how), split the text into columns, and then filter down to the hyperlink I need.


I'm wondering whether it's possible to point Alteryx directly at that location so the data stays up to date even if the hyperlink changes. If not, is it possible for Alteryx to pull the text of the source code automatically?

18 - Pollux

Hi @helenjin1 


Here's a workflow that (eventually) gets your PDF. It demonstrates the basics of web scraping.




It starts by building the absolute URL from the site address and the relative path. The Download tool fetches the HTML from this URL, and the Text To Columns tool splits it into rows on the newline character. After a Record ID is added, the Filter tool pulls out line 484, and the RegEx tool (Parse mode) extracts the relative path.
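For anyone following along outside Alteryx, here is a rough Python sketch of the same steps. It assumes the page is UTF-8 HTML and that the wanted link is the first href="..." on the target line; the function names and the regex are illustrative only, not part of the workflow, and line 484 only holds for the page as it looked when this was posted.

```python
import re
from typing import Optional
from urllib.parse import urljoin
from urllib.request import urlopen

# Illustrative pattern: captures the first href="..." value on a line.
HREF_RE = re.compile(r'href="([^"]+)"')


def build_url(site: str, relative_path: str) -> str:
    """Combine the site address and a relative path into an absolute URL."""
    return urljoin(site, relative_path)


def fetch_html(url: str) -> str:
    """Download the page HTML (roughly what the Download tool does)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def link_on_line(html: str, line_number: int) -> Optional[str]:
    """Split to rows on newline, pick one row by 1-based number, then regex out the href."""
    rows = html.split("\n")  # like Text To Columns, split to rows on \n
    # Record IDs start at 1, so line N lives at index N - 1.
    if not 1 <= line_number <= len(rows):
        return None
    match = HREF_RE.search(rows[line_number - 1])
    return match.group(1) if match else None
```

Calling link_on_line(fetch_html(build_url(site, path)), 484) would mirror the Download, Text To Columns, Record ID, Filter, and RegEx steps in one pass; site and path are placeholders for the addresses used in the workflow.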


Since there's a Patents page between the home page and the file you're looking for, the process repeats for that page.
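The repeat-per-page idea can be sketched as a loop: fetch a page, extract the next relative link, resolve it against that page, and move on. This is a hypothetical Python helper, not an Alteryx tool; the fetch function is passed in so the chain can be tried against canned HTML before hitting the live site.

```python
from typing import Callable, Iterable
from urllib.parse import urljoin

# A fetcher takes an absolute URL and returns that page's HTML.
Fetcher = Callable[[str], str]
# An extractor takes a page's HTML and returns the relative link to follow next.
Extractor = Callable[[str], str]


def follow_links(start_url: str, extractors: Iterable[Extractor], fetch: Fetcher) -> str:
    """Run fetch -> extract -> resolve once per hop and return the final absolute URL."""
    url = start_url
    for extract in extractors:
        html = fetch(url)
        # Resolve the extracted relative link against the page it came from.
        url = urljoin(url, extract(html))
    return url
```

For this thread there would be two extractors, one for the home page and one for the Patents page, each doing whatever extraction that page needs.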


Finally, in the PDF container, the PDF file is downloaded and saved to disk in the same directory as the workflow.
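A minimal Python sketch of this last step, assuming the final URL points straight at a PDF. The file name is taken from the URL path, and the fallback name download.pdf is a hypothetical choice. The response is written as raw bytes, since decoding a PDF as text would corrupt it.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlopen


def pdf_filename(url: str) -> str:
    """Use the last path segment of the URL as the file name (query string ignored)."""
    name = posixpath.basename(urlparse(url).path)
    return name or "download.pdf"  # hypothetical fallback when the path has no file name


def save_bytes(data: bytes, url: str, directory: str) -> str:
    """Write the downloaded bytes into the given directory and return the full path."""
    path = os.path.join(directory, pdf_filename(url))
    with open(path, "wb") as fh:
        fh.write(data)
    return path


def download_pdf(url: str, directory: str) -> str:
    """Fetch the PDF as raw bytes (no text decoding) and save it."""
    with urlopen(url) as resp:
        return save_bytes(resp.read(), url, directory)
```

To mimic "same directory as the workflow", a script could pass os.path.dirname(os.path.abspath(__file__)) as the directory.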


This isn't the most robust approach, since it relies on line numbers to find the links. You'll want to modify it to search for the relevant HREF attributes (for example, by link text or file name) instead of using line numbers. That will make the workflow much more resilient to HTML changes that move lines around.
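One way to search for the link instead of counting lines, sketched with Python's standard html.parser module: collect every <a> tag's href and link text, then pick the first one that matches a condition. The matching criteria (link text containing a word, href ending in .pdf) are assumptions about what stays stable on the page, not something checked against the FDA site.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag, wherever it sits in the file."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only gather text while inside an <a> that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(html: str, text_contains: str = "", href_endswith: str = "") -> Optional[str]:
    """Return the first href whose link text and href satisfy the (illustrative) criteria."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    for href, text in parser.links:
        if text_contains.lower() in text.lower() and href.endswith(href_endswith):
            return href
    return None
```

Because the match is on the link itself, the same call keeps working if the link moves to a different line.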



8 - Asteroid

Thank you! That was really helpful.