Alteryx Designer Desktop Discussions

Scraping university websites and, maybe, descriptions

MZ900605
8 - Asteroid

Hello, I need help scraping universities from this website, which is really complicated for me: https://whed.net/home.php

I need the names and some of their fields of study, or at least their websites, but these only show up if I click on the map or select a description. Is there any way this can be done efficiently, please?

Thanks in advance.

IraWatt
17 - Castor

Hey @MZ900605,

Is this the information you're interested in?

[Screenshot]

or this:

[Screenshot]

Can you take a screenshot and highlight the specific information you're interested in?

Thanks,

Ira

smoskowitz
12 - Quasar

Hi @MZ900605 --

I took a quick look at the site and I think it's doable. Personally, I think it would be somewhat easier using Python and BeautifulSoup than Alteryx, but the Python script can be put into the Python tool so the results are available for any downstream processing.

I don't have the time to code this out (my coding skills are weak). The biggest challenge I see is figuring out the URL for each country's or state's page. Once you crack that, you can:

  • Write each country URL to a list.
  • Loop through each country, get all the school URLs, and write all of those to a list.
  • Then loop through each school URL, collect the relevant data, and maybe write that to a pandas DataFrame.
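The three steps above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested scraper for whed.net: it swaps BeautifulSoup for the standard library's `html.parser` so it runs with no installs, and it assumes the country and school links appear as ordinary `<a href>` tags. The function names, the `fetch` parameter, the `school.php` marker in the demo, and the school-page layout (name in the first `<h1>`, website as the first off-site link) are all hypothetical; check the real page source and adjust.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects every <a href> on a page as an absolute URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.links.append(urljoin(self.base_url, href))


class SchoolParser(HTMLParser):
    """Hypothetical layout: name in the first <h1>, website = first off-site link."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_host = urlparse(page_url).netloc
        self.name = ""
        self.website = ""
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.name:
            self._in_h1 = True
        elif tag == "a" and not self.website:
            href = dict(attrs).get("href") or ""
            host = urlparse(href).netloc
            # Relative links and links back to the same site are skipped.
            if host and host != self.page_host:
                self.website = href

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.name += data


def links_containing(html: str, base_url: str, marker: str) -> List[str]:
    """Unique absolute links whose URL contains `marker`, in page order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return list(dict.fromkeys(u for u in collector.links if marker in u))


def scrape(start_url: str, fetch: Callable[[str], str],
           country_marker: str, school_marker: str) -> List[Dict[str, str]]:
    # Step 1: write each country/state URL to a list.
    countries = links_containing(fetch(start_url), start_url, country_marker)
    # Step 2: loop through each country and collect all school URLs.
    schools: List[str] = []
    for country_url in countries:
        for url in links_containing(fetch(country_url), country_url, school_marker):
            if url not in schools:
                schools.append(url)
    # Step 3: loop through each school URL and collect the relevant data.
    rows = []
    for url in schools:
        parser = SchoolParser(url)
        parser.feed(fetch(url))
        parser.close()
        rows.append({"name": parser.name.strip(), "website": parser.website, "url": url})
    return rows


# Demo on made-up HTML (no network): swap in a real fetch function later.
fake_pages = {
    "https://whed.net/results.php":
        '<a href="detail_system.php?AAA">Canada - Ontario</a><a href="/home.php">Home</a>',
    "https://whed.net/detail_system.php?AAA":
        '<a href="school.php?1">Example U</a><a href="school.php?1">again</a>',
    "https://whed.net/school.php?1":
        '<h1>Example University</h1><a href="/home.php">Home</a>'
        '<a href="https://www.example.edu/">Website</a>',
}
rows = scrape("https://whed.net/results.php", fake_pages.__getitem__,
              "detail_system.php", "school.php")
print(rows)
```

`pandas.DataFrame(rows)` turns the list of dicts into the DataFrame mentioned in the last step; inside the Alteryx Python tool, the `ayx` package's `Alteryx.write(df, 1)` sends it to output anchor 1. With a real `fetch`, it would be polite to pause between requests.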

Hopefully that provides some guidance.

Thanks,

Seth

MZ900605
8 - Asteroid

The actual names and websites, if we can.

MZ900605
8 - Asteroid

Would love Python too.

MZ900605
8 - Asteroid

@IraWatt I hope this helps. Thanks.
@smoskowitz That's really interesting; I'll give it a try if Alteryx doesn't work out.

IraWatt
17 - Castor

Hey @MZ900605,

Just looking at your Canada example on the map, when you click Canada, then:

[Screenshot]

This is the request which generates the page:

[Screenshots: the request that generates the page]

This is the view-source of that page (Search Results – WHED – IAU's World Higher Education Database):

[Screenshot: page source]

The link for each box is stored in the page source. For example, the popup for "Canada - Northwest Territories" has the address https://whed.net/detail_system.php?JTo2MF0tIzRgCmAK which you can request from Alteryx with the Download tool.
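Because the popup addresses sit in the page source, they can be pulled out with a regular expression instead of being copied by hand. A minimal Python sketch, assuming each address appears as `detail_system.php?` followed by a query string with no quotes, spaces, or angle brackets, and that the pages live at the site root like the Northwest Territories example. The `onclick="popup(...)"` markup in the demo is made up, to show that the pattern does not care whether the link sits in an `href` or in JavaScript.

```python
import html
import re
from typing import List
from urllib.parse import urljoin

BASE_URL = "https://whed.net/"

# detail_system.php plus its query string, wherever it appears in the source.
# Assumption: the query string never contains quotes, whitespace, or < >.
DETAIL_LINK = re.compile(r"detail_system\.php\?[^\"'\s<>]+")


def detail_links(page_source: str, base_url: str = BASE_URL) -> List[str]:
    """Unique absolute detail_system.php URLs, in the order they appear."""
    urls = (
        # Undo HTML escaping such as &amp; before building the absolute URL.
        urljoin(base_url, html.unescape(match.group(0)))
        for match in DETAIL_LINK.finditer(page_source)
    )
    return list(dict.fromkeys(urls))


sample = (
    '<a href="detail_system.php?JTo2MF0tIzRgCmAK">Canada - Northwest Territories</a>\n'
    "<div onclick=\"popup('detail_system.php?JTo2MF0tIzRgCmAK')\"></div>"
)
print(detail_links(sample))
```

Both occurrences in the sample point at the same page, so the result is a single URL, which could then be written out as a URL column for the Download tool.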

I think you would need to replicate these requests in Alteryx or Python to download all the countries' information.
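On the Python side, replicating one of these GET requests needs only the standard library. A minimal sketch, assuming the pages are plain GET requests like the detail_system.php address above; if the search-results request in the screenshot is a POST with form fields, those fields would have to be copied from the browser's developer tools, and they are not reproduced here. The User-Agent string and the one-second delay are arbitrary choices, not anything the site specifies.

```python
import time
import urllib.request
from typing import Dict, Iterable

# Arbitrary, hypothetical User-Agent; some sites reject Python's default one.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; whed-research-script)"}


def make_request(url: str) -> urllib.request.Request:
    """Build a GET request for one WHED page."""
    return urllib.request.Request(url, headers=HEADERS)


def fetch(url: str, timeout: float = 30.0) -> str:
    """Download one page and decode it using the charset the server reports."""
    with urllib.request.urlopen(make_request(url), timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(urls: Iterable[str], delay_seconds: float = 1.0) -> Dict[str, str]:
    """Fetch each URL in turn, pausing between requests to go easy on the server."""
    pages: Dict[str, str] = {}
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)  # pause between requests, not before the first
        pages[url] = fetch(url)
    return pages
```

For example, `fetch("https://whed.net/detail_system.php?JTo2MF0tIzRgCmAK")` should return the HTML behind the Northwest Territories popup, which can then be searched for the individual school links.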

MZ900605
8 - Asteroid

@IraWatt So what I need to do is copy each link and connect it to the Download tool?

[Screenshot]
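Copying links by hand shouldn't be necessary: the Download tool takes its URLs from a field in the incoming records, so the whole list can go in as rows. One possible route, sketched in Python under the assumption that the links have already been collected into a list, is to write them to a CSV with a single `URL` column (the column name is arbitrary), read that file with an Input Data tool, and point the Download tool's URL field at that column.

```python
import csv
import io
from typing import Iterable, TextIO


def write_url_csv(urls: Iterable[str], handle: TextIO, field: str = "URL") -> int:
    """Write a one-column CSV (header plus one URL per row); return the row count."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([field])
    count = 0
    for url in urls:
        writer.writerow([url])
        count += 1
    return count


# Demo into an in-memory buffer; use a real file for the Input Data tool.
buffer = io.StringIO()
write_url_csv(["https://whed.net/detail_system.php?JTo2MF0tIzRgCmAK"], buffer)
print(buffer.getvalue())
```

To produce the file for Alteryx, open it with `open("whed_links.csv", "w", newline="", encoding="utf-8")` (a hypothetical file name) and pass that handle instead of the buffer.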

IraWatt
17 - Castor

@MZ900605 Here is an example workflow to get this page:

[Screenshots: example workflow]

I also updated my initial look at the problem above.

MZ900605
8 - Asteroid

@IraWatt This will take ages :)

I'm hoping to find a way to collect university names and the courses they offer, but this was the only website I found :(
