Webscraping - HTTP/1.1 403 Forbidden Error
Hi All - I've been doing some webscraping in Alteryx for several years but have a question about getting the HTTP/1.1 403 Forbidden error.
What's interesting is that if I try to scrape the same page using Power BI or Power Automate, I can get the data without any errors.
I'd much rather use Designer to set up a process to scrape several pages on this particular site, though.
Does anyone have any tips, besides adding a user agent like "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"? That doesn't help. It seems weird that the Microsoft products are able to get to the data without errors.
Thanks!
Try the request in Postman and see which other headers Postman adds. It could also be a followed redirect or something similar.
Thanks for the reply. I checked but that didn't help. I decided to go with a combination of Power Automate and Alteryx to get the desired output.
Hi,
I suspect a big reason for this is how the requests are set up. Websites often check whether a request is coming from a bot or a real person, and a User-Agent string on its own might not be enough. Sites can also look at other headers such as Referer and Accept-Language, and at cookies used to track sessions. Tools like Power BI or Power Automate might include these details automatically, but Alteryx might not. To fix this, capture the traffic with Fiddler, or load the page in Chrome and inspect the request in DevTools, to see which headers a successful request sends. Then try to match those in Alteryx with the Download tool.
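As a rough illustration of matching headers, here is a minimal Python sketch using only the standard library. It is not how Alteryx or Power BI build their requests; the header values, the example.com Referer, and the helper names `build_request` and `fetch` are all assumptions for illustration. Replace the values with whatever you actually see in DevTools or Fiddler.

```python
import urllib.request

# Hypothetical browser-like headers. Replace these with the values you
# observe in DevTools or Fiddler for a request that succeeds.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",  # placeholder, not the real site
}


def build_request(url, extra_headers=None):
    """Build a GET request that carries the browser-like headers."""
    headers = dict(BROWSER_HEADERS)
    if extra_headers:
        headers.update(extra_headers)  # e.g. a Cookie header from a session
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch(url, timeout=30):
    """Send the request and return (status code, body bytes)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.status, resp.read()
```

If the 403 goes away with a fuller header set, you can remove headers one at a time to find which one the site actually checks, then add only those to the Download tool.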
Another issue could be that the site blocks certain IP addresses or otherwise tries to stop scraping. Power BI and Power Automate might get around this by spacing out their requests or by keeping a session. If you can, add delays between your requests and carry cookies from one request to the next in Alteryx. Finally, if the site requires a login, request the login page first to get the session cookies. Using the request method the site expects (GET vs POST) and behaving like a real user can also help solve the problem.
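The delay and cookie ideas can be sketched in Python as well, again with the standard library only. The names `make_session_opener` and `fetch_all` and the two-second default delay are assumptions for illustration, not something the site or Alteryx prescribes.

```python
import time
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_all(urls, fetch_one, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch_one(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # no pause before the first request
        results.append(fetch_one(url))
    return results
```

A typical use would be to call `opener.open(...)` on the login or landing page first so the jar picks up the session cookies, then pass something like `lambda u: opener.open(u).read()` as `fetch_one` so every later page is requested with those cookies and a pause in between.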
