
SOLVED

How do I pull a dynamic hyperlink from the source code?

helenjin1
8 - Asteroid

I'm trying to get Alteryx to pull a certain hyperlink from the source code of this page: view-source:https://www.fda.gov/drugs/drug-approvals-and-databases/approved-drug-products-therapeuti... (specifically line 484).

 

I think I could download the source code as text (although I'm not sure how) and change the text to columns and from there filter to the hyperlink I need. 

 

I'm wondering if it's possible to direct Alteryx to pull the link straight from that location, so that the data stays up to date even if the hyperlink changes. If not, can Alteryx automatically pull the text of the source code?

2 REPLIES
danilang
19 - Altair

Hi @helenjin1 

 

Here's a WF that (eventually) gets your PDF. It demonstrates the basics of web scraping.

 

WF.png

 

It starts by building the absolute path from the site address and the relative path. The Download tool gets the HTML from this path, and the Text to Columns tool splits it to rows on the newline character. After a Record ID is added, the Filter pulls out line 484 and the RegEx tool (in Parse mode) extracts the relative path.
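
For anyone who wants to see the same logic outside the workflow, here is a rough Python sketch of the steps described above: split the HTML into lines, pick one line by number, pull the href out with a regex, and join it to the site address. This is not what the workflow runs internally. The function name `link_on_line`, the regex, and the example.com URLs are made up for illustration, and the real page's markup may need a different pattern.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Assumed pattern: a double-quoted href attribute. The real page may differ.
HREF_RE = re.compile(r'href="([^"]+)"', re.IGNORECASE)


def link_on_line(html: str, line_number: int, base_url: str) -> Optional[str]:
    """Return the absolute URL of the first href on a 1-based line, or None."""
    lines = html.splitlines()              # like Text to Columns, split to rows
    if not 1 <= line_number <= len(lines):
        return None                        # the page is shorter than expected
    match = HREF_RE.search(lines[line_number - 1])  # like Filter + RegEx parse
    if match is None:
        return None                        # no link on that line
    # Combine the site address with the relative path found in the HTML.
    return urljoin(base_url, match.group(1))


# Hypothetical stand-in for the downloaded page source.
sample = '<html>\n<body>\n<a href="/media/example/download">Patents</a>\n</body>'
print(link_on_line(sample, 3, "https://www.example.com/drugs/page"))
```

Like the workflow, this depends on the link staying on the same line number, so it returns `None` as soon as the page layout shifts.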

 

Since there's a Patents page between the home page and the file you're looking for, the process repeats for this page.

 

Finally, in the PDF container, the PDF file is downloaded and saved to disk in the same directory as the workflow.

 

This isn't the best way to find the file, since the process relies on line numbers to locate the links. You'll want to modify it to search for the href attributes of the anchor tags instead. That will future-proof the workflow against any HTML changes that move the lines around.
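
As one possible take on the suggestion above, the sketch below looks for a link by its visible text instead of by line number, using Python's standard `html.parser`. The names `LinkCollector` and `find_link`, the search text, and the example.com URLs are all hypothetical. Matching on link text is an assumption about what stays stable on the real page; matching on part of the href would work the same way.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> element."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None   # href of the <a> currently open
        self._text: List[str] = []         # text pieces seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_link(html: str, base_url: str, text_contains: str) -> Optional[str]:
    """Return the absolute URL of the first link whose text contains the phrase."""
    parser = LinkCollector()
    parser.feed(html)
    needle = text_contains.lower()
    for href, text in parser.links:
        if needle in text.lower():
            return urljoin(base_url, href)
    return None


# Hypothetical page: the target link can sit on any line, even split across lines.
page = ('<p>\n\n<a href="/about">About</a>\n'
        '<a\n href="/media/1/download">Patent and Exclusivity (PDF)</a></p>')
print(find_link(page, "https://www.example.com/drugs/", "patent"))
```

Because the parser works on tags rather than rows, adding or removing lines in the page does not change the result as long as the link text still matches.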

 

Dan

helenjin1
8 - Asteroid

Thank you! That was really helpful
