Web Scraping from a Password-Protected Site
Hi Alteryx Community,
I am trying to design a workflow that extracts credit ratings from a website such as Standard & Poor's. These sites, however, require the user to log in before they display any data.
I have been able to execute a POST request to log in to the S&P website by following the instructions at https://www.thedataschool.co.uk/joe-carr/webscraping-through-alteryx-as-if-you-are-logged-in/. However, I cannot figure out how to stay logged in when I execute the GET request that starts my web scraping, even though I retain the cookie returned by the POST request.
I have already discussed this with an Alteryx Solutions Engineer, but we could not resolve it. I am hoping somebody in the community can point me in the right direction. I am not very savvy with web programming, so I may also be structuring my requests incorrectly.
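For anyone who wants to see the POST-then-GET flow described above outside of Alteryx, here is a minimal sketch in Python using only the standard library. It is not the method from the linked article and it is not specific to the S&P site: the URLs and form field names in the usage comment are placeholders, and `make_session` and `login_and_fetch` are hypothetical helper names. The point it illustrates is that the cookie set by the login response has to be sent back on every later request, which a cookie jar attached to the opener does automatically.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores cookies and resends them automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Some sites reject the default Python user agent; this value is a placeholder.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener, jar


def login_and_fetch(opener, login_url, form_fields, data_url):
    """POST the login form, then GET a protected page with the same opener."""
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    # Supplying data makes this a POST; any Set-Cookie headers land in the jar.
    with opener.open(login_url, data=body, timeout=30) as resp:
        resp.read()
    # The same opener attaches the stored cookies to this GET request.
    with opener.open(data_url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Usage (placeholder URLs and field names, replace with the real ones):
# opener, jar = make_session()
# html = login_and_fetch(
#     opener,
#     "https://example.com/login",
#     {"username": "me", "password": "secret"},
#     "https://example.com/ratings",
# )
```

If the GET still returns the login page with this kind of setup, the site may rely on extra hidden form fields, tokens, or JavaScript during login, which a plain POST will not reproduce.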
Do you have access to their API? This would be a much more direct route: https://www.spglobal.com/marketintelligence/en/documents/spciq_api_v2.pdf
This will likely depend on the licensing that your company has arranged with theirs.
Hi Brandon,
I am trying to avoid APIs, as using one would complicate the approval process for the tool I am building. If I cannot find another solution, I will be sure to use the link you provided. Thank you!
Hi @RogerWyllie ,
I solved a similar challenge with the Alteryx Python tool and the Selenium package, which lets you automate (remote-control) your browser.
@DavidM wrote a great article about that:
Best regards
Phil
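As a rough illustration of the Selenium approach mentioned above, here is a minimal sketch. It is not taken from the referenced article. It assumes Selenium 4 with Chrome available, and the form field names, URLs, and the helper names `cookie_header` and `fetch_after_login` are all hypothetical; a real login form would need to be inspected to find the right fields.

```python
def cookie_header(cookies):
    """Join Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def fetch_after_login(login_url, data_url, username, password,
                      user_field="username", pass_field="password"):
    """Log in through a real browser, then return the protected page's HTML
    together with the session cookies formatted as a Cookie header value.

    The default field names are placeholders, not those of any real site.
    """
    # Imported here so cookie_header can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver exist
    try:
        driver.get(login_url)
        driver.find_element(By.NAME, user_field).send_keys(username)
        pw = driver.find_element(By.NAME, pass_field)
        pw.send_keys(password)
        pw.submit()  # submits the form that contains the password field
        # Wait up to 30 seconds for the browser to leave the login page.
        WebDriverWait(driver, 30).until(lambda d: d.current_url != login_url)
        # The browser keeps its own cookies, so this request stays logged in.
        driver.get(data_url)
        return driver.page_source, cookie_header(driver.get_cookies())
    finally:
        driver.quit()
```

Because the browser handles cookies, redirects, and JavaScript itself, this tends to work on sites where a hand-built POST and GET pair does not. The returned header string could, in principle, be passed as a `Cookie` header on later plain HTTP requests.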
Hi Philip,
Thank you for this, I will be sure to give it a go!
