Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Web Scraping - Importing data from the internet.

Mamamia
6 - Meteoroid

Hi,

I'm trying to import currency exchange data from this website: https://www.gov.uk/government/publications/hmrc-exchange-rates-for-2023-monthly

I want to import the monthly data directly from the website into Alteryx. I'll love to see the data in rows and columns like this. I'll appreciate any help I can get.

 

Kadmia_0-1675782159081.png

 

Regards,

Kadmia

 

4 REPLIES 4
Luke_C
17 - Castor
17 - Castor

hi @Mamamia 

 

You can pass the URL of the download file link through a download tool, then just parse the data out. I had to rename the fields manually because there's a special character that prevented the dynamic rename tool from reading the first row as headers.

 

You could probably go a step forward and dynamically generate the URL under the assumption that the URL is consistent month over month.

 

Luke_C_0-1675783092742.png

 

 

mceleavey
17 - Castor
17 - Castor

Hi @Mamamia ,

 

The best way to do this is to go to the primary URL and dynamically scrape the URLs from this page:

 

mceleavey_3-1675784132665.png

 

 

The URL that you need will lead to a download asset and will be labelled as such in the URL itself. To parse these, use the following regex in the Regex tool:

(https://assets.*?csv)

This will extract the full URL which contains the word asset and leads to a csv file.

From there, create the fullpath of the temp file location:

 

mceleavey_1-1675784010427.png

 

This allows you to reference the temp file location in the same flow.

If you wish to only take the latest file, use the sample tool to pull the first url and push this into the next download tool. Configure the second download tool as follows:

mceleavey_2-1675784086448.png

this allows you to dynamically determine the import location for each temp file.

 

I have then included a tool which allows you to load in all csv files without getting a schema error.

I have attached the workflow and the macro you will need.

 

If you need more information about web scraping you should try the following resources:

 

https://www.thedataschool.co.uk/jeremy-kneebone/easily-download-data-internet-alteryx

https://www.thedataschool.co.uk/manuela-marolla/web-scraping-101-a-guided-example-in-alteryx-part-1

 

I hope this helps,

 

M.

 

 



Bulien

Mamamia
6 - Meteoroid

Thanks. This was very helpful

Mamamia
6 - Meteoroid

Thanks. This was very helpful 

Labels