Alteryx Designer Desktop Discussions

Webscraping - HTTP/1.1 403 Forbidden Error

sjm
8 - Asteroid

Hi All - I've been webscraping in Alteryx for several years, but I have a question about an HTTP/1.1 403 Forbidden error I'm getting.

What's interesting is that if I try to scrape the same page using Power BI or Power Automate, I can get the data without any errors.

I'd much rather use Designer to set up a process to scrape several pages on this particular site, though.

Does anyone have any tips? I've already tried adding a user agent like "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36", which doesn't help. It seems weird that the Microsoft products are able to get to the data without errors.

Thanks!

3 REPLIES
apathetichell
19 - Altair

Scrape it in Postman -> see what other headers Postman adds in. Could be a follow redirect or something.

sjm
8 - Asteroid

Thanks for the reply. I checked but that didn't help. I decided to go with a combination of Power Automate and Alteryx to get the desired output.

Ivan_F
6 - Meteoroid

Hi,

I suspect a big reason for this is how the requests are set up. Websites often check whether a request is coming from a bot or a real person. Even if you've added a User-Agent string, that might not be enough: sites can also look at other headers such as Referer and Accept-Language, and at the cookies used to track sessions. Tools like Power BI or Power Automate might include all these details automatically, but Alteryx might not. To narrow it down, inspect the requests with Chrome DevTools (to see what a browser sends) or Fiddler (which can also capture traffic from desktop applications), then try to match those headers in Alteryx with the Download tool.
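
To make the header-matching idea concrete, here is a minimal Python sketch (standard library only) of a request that carries a few browser-like headers. The header values and the example.com URLs are placeholders I made up, not values known to work for any particular site; you would replace them with whatever you capture from a request that succeeds. In Designer, the same name/value pairs would be entered as headers on the Download tool.

```python
import urllib.request

# Illustrative values only (assumption): copy the real ones from a request
# that succeeds, e.g. one captured in Chrome DevTools or Fiddler.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, referer=None, extra_headers=None):
    """Return a GET request carrying browser-like headers (nothing is sent yet)."""
    headers = dict(BROWSER_HEADERS)
    if referer:
        headers["Referer"] = referer
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # Hypothetical URLs, used only to show the headers that would be sent.
    req = build_request("https://example.com/data", referer="https://example.com/")
    for name, value in req.header_items():
        print(f"{name}: {value}")
    # To actually send it: urllib.request.urlopen(req, timeout=30)
```

If the page still returns 403 with matching headers, the block is probably based on something other than headers, such as cookies or the IP address.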

Another issue could be that the site blocks certain IPs or otherwise tries to stop scraping. Power BI and Power Automate might get around this by spacing out their requests or by maintaining sessions. If you can, add delays between your requests and carry cookies from one request to the next in Alteryx. Finally, if the site requires a login, start by requesting the login page to get the session cookies. Using the right request method (GET vs POST) and behaving like a real user can also help solve the problem.
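
The delay and cookie suggestions can be sketched the same way. This is only an outline in standard-library Python, under the assumption that the site sets ordinary session cookies; the function names, the two-second default delay, and the URLs in the comments are my own placeholders.

```python
import time
import urllib.request
from http.cookiejar import CookieJar


def make_session_opener():
    """Return (opener, jar); cookies set by the site are stored in jar and re-sent."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch_with_opener(opener, url, timeout=30):
    """Download one page through the cookie-aware opener and return its bytes."""
    with opener.open(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetch, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(url) for each URL in order, pausing between consecutive requests."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # no pause before the first request
        results[url] = fetch(url)
    return results


# Example wiring (hypothetical URLs, not executed here):
# opener, jar = make_session_opener()
# fetch_with_opener(opener, "https://example.com/login")  # picks up session cookies
# pages = fetch_all(
#     ["https://example.com/page1", "https://example.com/page2"],
#     fetch=lambda url: fetch_with_opener(opener, url),
# )
```

Passing the fetch function in as a parameter keeps the pacing logic separate from the actual download, so the delay can be tuned or tested without touching the site.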
