Alteryx Designer Desktop Discussions

SOLVED

Web Scraping off Webpage

cmiller9115
5 - Atom

I'm newer to Alteryx, so any responses are appreciated. I'm trying to parse an .xlsx file from a webpage and am having issues getting to the actual file when using the Download tool. Below is the link to the webpage, along with a screenshot of where the .xlsx file is embedded. I have also attached an example workflow. Any suggestions on how to use the Download tool (or other means within Alteryx) to download the .xlsx file and get the data into a usable format?

https://www.fedex.com/en-us/service-alerts.html 

Embedded XLSX file from within HTML

FedEx_Service_Update_2019_09_04_PM_Hurricane_Dorian_vf_955560470.xlsx

SamDesk
11 - Bolide

Hello @cmiller9115,

Firstly, your input URL was missing the "h" in "https".

Secondly, if you download the file to your filename field, you can then call that same field in your Dynamic Input tool. You will, however, have to append a sheet name to the filename so Alteryx knows which sheet of the spreadsheet to load, e.g.:

[Filename]+"|||FedEx Custom Critical$"
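Outside Alteryx, the same two-step idea (save the response to a named file, then point the reader at a specific sheet) might look like the Python sketch below. It is only an illustration under stated assumptions, not the posted workflow: `download_xlsx` and `alteryx_sheet_path` are hypothetical helper names, the download uses the standard library's `urllib.request.urlretrieve`, and the `|||<sheet>$` suffix simply mirrors the formula above.

```python
import urllib.request
from pathlib import Path


def alteryx_sheet_path(filename: str, sheet: str) -> str:
    """Build the same string as the Alteryx formula [Filename]+"|||<sheet>$"."""
    return f"{filename}|||{sheet}$"


def download_xlsx(url: str, dest_dir: str = ".") -> str:
    """Save the file at `url` into `dest_dir` under its original name (hypothetical helper).

    Returns the local path, which plays the role of the [Filename] field.
    """
    name = url.rsplit("/", 1)[-1]                # last path segment, e.g. "...xlsx"
    dest = Path(dest_dir) / name
    urllib.request.urlretrieve(url, str(dest))   # writes the response body to dest
    return str(dest)
```

With a real file URL, `alteryx_sheet_path(download_xlsx(url), "FedEx Custom Critical")` returns the local path followed by `|||FedEx Custom Critical$`. If you are reading the sheet in Python rather than handing the path to Alteryx, `pandas.read_excel(path, sheet_name="FedEx Custom Critical")` (with `openpyxl` installed) is the usual counterpart of the Dynamic Input step.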


Sam 🙂


geraldo
13 - Pulsar

Hi,

Below is your workflow, reconfigured in a simpler way to download the file.

cmiller9115
5 - Atom

Thank you both for your responses; they worked great.

cmiller9115
5 - Atom

Any idea how best to handle the file name changing from day to day, so that the workflow always pulls in the latest updated data from the webpage? For example, yesterday's file naming convention looks like https://www.fedex.com/content/dam/fedex/us-united-states/Service-Alerts/images/2020/Q2/FedEx_Service... compared with the original from the post.

https://www.fedex.com/content/dam/fedex/us-united-states/Service-Alerts/images/2020/Q2/FedEx_Service...

mceleavey
17 - Castor

Hi @cmiller9115,

Rather than hard-coding the URL, if the actual URL is going to be dynamic, you can scrape the raw HTML of the webpage that contains the link and then parse out the URL using the RegEx tool. That URL is what you feed into your Download tool, so it is determined dynamically each time the workflow runs.
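A rough Python equivalent of that scrape-then-parse approach is sketched below, assuming the page's raw HTML contains the file link in an `href` or `src` attribute. If the link is only injected by JavaScript after the page loads, it will not appear in the raw HTML at all, and this approach would need a different source. The names `fetch_html`, `find_xlsx_links` and `latest_link` are hypothetical, and `latest_link` further assumes the file names keep a `YYYY_MM_DD` date stamp like the 2019 file in the original post; the newer, truncated names may not follow that pattern. A pattern along the lines of `XLSX_LINK` could also be tried in the RegEx tool's Parse mode.

```python
import re
import urllib.request
from typing import List, Optional
from urllib.parse import urljoin

PAGE_URL = "https://www.fedex.com/en-us/service-alerts.html"

# Any href/src attribute whose value ends in .xlsx (case-insensitive).
XLSX_LINK = re.compile(r'''(?:href|src)\s*=\s*["']([^"']+?\.xlsx)["']''', re.IGNORECASE)

# Assumed naming convention: a YYYY_MM_DD date stamp somewhere in the file name.
DATE_STAMP = re.compile(r"(\d{4})_(\d{2})_(\d{2})")


def fetch_html(url: str = PAGE_URL) -> str:
    """Download the raw HTML of the page (the Download tool's job in Alteryx)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def find_xlsx_links(html: str, base_url: str = PAGE_URL) -> List[str]:
    """Return absolute URLs of all .xlsx links, in page order, without duplicates."""
    links: List[str] = []
    for match in XLSX_LINK.finditer(html):
        # urljoin resolves site-relative paths such as /content/dam/...
        absolute = urljoin(base_url, match.group(1))
        if absolute not in links:
            links.append(absolute)
    return links


def latest_link(links: List[str]) -> Optional[str]:
    """Pick the link whose file name has the most recent date stamp, or None if none match."""
    dated = []
    for link in links:
        m = DATE_STAMP.search(link.rsplit("/", 1)[-1])
        if m:
            # Zero-padded year/month/day strings compare correctly as a tuple.
            dated.append((m.groups(), link))
    return max(dated)[1] if dated else None
```

Chained together, `latest_link(find_xlsx_links(fetch_html()))` would give the URL to hand to the download step. Some sites reject requests without a browser-like `User-Agent` header; if `urlopen` is refused, passing a `urllib.request.Request` with that header set is one option to try.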


M.