Hello, I need help scraping universities from this website, which is really complicated for me: https://whed.net/home.php
I need the names and some of their fields of study, or at least their websites, but these only show if I click on the map or select a description. Can this be done efficiently, please?
Thanks in advance.
Hey @MZ900605,
Is it this information you're interested in?
or this:
Can you screenshot and highlight what information specifically you're interested in?
Thanks,
Ira
Hi @MZ900605 --
I took a quick look at the site and I think it's doable. Personally, I think it would be somewhat easier using Python and BeautifulSoup than Alteryx, but the Python script can be put into the Python tool for any downstream processing.
I don't have the time to try to code this out (and my coding skills are weak). The biggest challenge I see is figuring out the URL for each country or state page. Once you crack that, then you can:
Hopefully that provides some guidance.
Thanks,
Seth
The actual names and websites, if we can.
Would love Python also.
@IraWatt I hope this helps. Thanks.
@smoskowitz That's really interesting. I will give it a try if Alteryx does not work out.
Hey @MZ900605,
Just looking at your Canada example: on the map, when you click Canada, then
This is the request which generates the page:
(this is the view source of that page Search Results – WHED – IAU's World Higher Education Database):
The link to each box is stored on the page here. E.g. the popup for "Canada - Northwest Territories" has the address: https://whed.net/detail_system.php?JTo2MF0tIzRgCmAK which you can request from Alteryx with the Download tool.
I think you would need to replicate these requests in Alteryx or Python to download the information for all the countries.
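If you go the Python route, here is a rough sketch of pulling those popup links out of a page's source. It uses only the standard library (BeautifulSoup would work just as well). The assumption here, which I have not checked against every page, is that each box is linked through an ordinary `<a href="detail_system.php?...">` tag; the names `DetailLinkParser`, `extract_detail_links` and `fetch` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class DetailLinkParser(HTMLParser):
    """Collect href values that point at detail_system.php pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Assumption: every popup is linked as detail_system.php?<token>
        if href and "detail_system.php" in href:
            url = urljoin(self.base_url, href)
            if url not in self.links:  # keep first occurrence, skip repeats
                self.links.append(url)


def extract_detail_links(html, base_url="https://whed.net/"):
    """Return absolute detail_system.php URLs found in the page source."""
    parser = DetailLinkParser(base_url)
    parser.feed(html)
    return parser.links


def fetch(url):
    """Plain GET. The real results page may need a POST with form fields."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Once you have the source of a results page as a string, `extract_detail_links(page_source)` should give you the list of addresses to request next. If the links turn out to be built by JavaScript rather than plain anchors, this parser will return an empty list and the matching rule would need adjusting.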
@IraWatt So what I need is to copy each link and connect it to the Download tool?
@MZ900605 Here is an example workflow to get this page:
I also updated my initial look at the problem above^
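You should not need to copy each link by hand. In Alteryx you can feed a column of URLs into the Download tool, and in Python the same idea is a loop. Below is a minimal sketch, not a tested scraper for this site: `download_all` and `fetch` are names I made up, the one second pause is just a guess at a polite delay, and it assumes each detail page answers a plain GET.

```python
import time
from urllib.request import Request, urlopen


def fetch(url):
    """Plain GET returning the page source as text."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def download_all(urls, fetch_page=fetch, delay_seconds=1.0):
    """Download every URL in turn and return {url: page_source}."""
    pages = {}
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause between requests
        try:
            pages[url] = fetch_page(url)
        except OSError as exc:  # urllib's URLError and timeouts are OSErrors
            print(f"skipped {url}: {exc}")
    return pages
```

For example, `download_all(["https://whed.net/detail_system.php?JTo2MF0tIzRgCmAK"])` would request the Northwest Territories popup mentioned above. Passing your own `fetch_page` function lets you try the loop without hitting the site.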
@IraWatt this will take ages :)
I hope to find a way to collect universities' names and the courses they offer, but this was the only website I found :(