Alteryx Designer Desktop Discussions

SOLVED

Web scraping to download csv files

danespoors
8 - Asteroid

Good morning from merry old England,

 

I am hoping that someone a lot more savvy with Alteryx than I am can help with my issue. I am attempting to download the entire collection of CSV files from a website using Alteryx. The problem is that, in order to download them, you need to manually click buttons to apply filters, and I want EVERY possible table on this webpage.

 

I have spoken to a colleague and he has suggested web scraping to solve my issue. I have no clue about web scraping unfortunately and he is busy on another project so I am at a loss.

 

The website I am looking at is the following: https://www.hesa.ac.uk/data-and-analysis/performance-indicators/widening-participation

 

If you look at the website, you will see what I mean about all the different iterations of tables and their filters. Ideally, I only want to download the "England" tables, which you can select with the filter at the top.

 

I appreciate that this is a large ask but even the smallest amount of help down this road would be outstanding!

 

Thank you in advance,

 

Dane.

5 REPLIES
danilang
19 - Altair

Hi @danespoors 

 

Check out the link under each table called "Download source data (csv)".


I believe, though I haven't verified it for all the tables, that the source data table contains all the values needed to display every variation of the given table. Download all 8 of these and you should have all the data to build all the variants.

 

Dan

danespoors
8 - Asteroid

Awesome stuff, this is a great start!

 

You wouldn't happen to know how to get Alteryx to download these colossal tables, would you? Now that only one source table per table needs downloading, it should be a much easier task. I do need to download all of these large source tables, so all 8 or however many there are, but at least the filtering nonsense isn't needed.

 

I can identify the HTML using the "Inspect" tool in Chrome but I have no idea how to use it.

 

Any help would be outstanding,

 

Thanks again,

 

Dane. 

danespoors
8 - Asteroid

I have found a way to solve this now.

 

I have worked out the exact URLs of the source table CSV files. If I perform a download and then follow basic web scraping procedures (that I've hastily read about), I can collect all of the data from the website. I do need to reconstruct the tables, since the download piles everything together into one insanely large cell, but now it's just a matter of time.
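
For anyone doing the same step in a script rather than in Alteryx, here is a minimal Python sketch of rebuilding a table from one large block of CSV text. It is not the workflow used in this thread: the function name `split_csv_cell` and the sample data are made up for illustration, and it assumes the downloaded text is ordinary comma-separated CSV with a header row.

```python
import csv
import io


def split_csv_cell(cell_text):
    """Split one large string of CSV text into a list of rows (lists of fields)."""
    # newline="" leaves line endings alone so the csv module can handle them,
    # including line breaks inside quoted fields.
    reader = csv.reader(io.StringIO(cell_text, newline=""))
    # Drop completely empty lines, which often trail a download.
    return [row for row in reader if row]


# Made-up sample standing in for the contents of the single large cell.
sample = 'Provider,Country,Value\r\n"Example, University",England,12.5\r\n\r\n'

rows = split_csv_cell(sample)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

Using a CSV parser instead of splitting on commas and newlines by hand keeps quoted fields that contain commas in one piece.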

 

Thanks to @danilang for your help, you definitely helped a lot.

 

I'll mark this and your answer as a solution since this has solved my issue of sourcing the data from this website.

 

Many thanks,

 

Dane.

danilang
19 - Altair

You're welcome @danespoors 

 

I'll just leave this here since I already had it built. 

 


This downloads the HTML for the page, finds the tables and their associated URLs, and downloads each one to the current directory as a separate file. This should save you the trouble of parsing the giant single cell.
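
The workflow itself was posted as a screenshot, so its exact tools are not shown here. As a rough equivalent of the same idea, the Python sketch below finds the CSV links in a page's HTML and saves each one as its own file. The names `CsvLinkFinder`, `find_csv_links` and `download_all` are hypothetical, the sample HTML and `example.org` URLs are made up, and it assumes the download links are plain `<a href>` links whose path ends in `.csv`, which has not been checked against the real page.

```python
import os
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class CsvLinkFinder(HTMLParser):
    """Collect absolute URLs of <a href="..."> links whose path ends in .csv."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL.
        absolute = urljoin(self.base_url, href)
        is_csv = urlparse(absolute).path.lower().endswith(".csv")
        if is_csv and absolute not in self.links:
            self.links.append(absolute)


def find_csv_links(html, base_url):
    """Return the unique CSV links found in html, in page order."""
    finder = CsvLinkFinder(base_url)
    finder.feed(html)
    return finder.links


def download_all(urls, folder="."):
    """Save each URL into folder, named after the last part of its path."""
    for url in urls:
        filename = os.path.basename(urlparse(url).path)
        urllib.request.urlretrieve(url, os.path.join(folder, filename))


# Made-up HTML standing in for the downloaded page.
sample_html = """
<a href="/files/table-1.csv">Download source data (csv)</a>
<a href="/about">About</a>
<a href="https://example.org/files/table-2.CSV?download=1">Download source data (csv)</a>
"""
links = find_csv_links(sample_html, "https://example.org/data/page")
print(links)
```

With a real page you would fetch the HTML first, pass it to `find_csv_links` along with the page URL, and then hand the result to `download_all`.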

 

Dan 

 

 

danespoors
8 - Asteroid

You are a star! This is superb! I was going to manually create a text input file with the download urls so this is excellent!

 

Thank you so much for this!

 

Dane.
