Alteryx Designer Desktop Discussions

Ian_Nicholls · ‎02-05-2025

I am trying to download the html from a page, find the links to zips in it, and download those zips. This is a job that a person currently has to do every couple of weeks by browsing to the page and saving the files to our network.
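For readers who want to see the "find the links to zips" step outside of Alteryx, here is a minimal Python sketch using only the standard library. It is not the workflow used in this thread; the names `ZipLinkParser` and `find_zip_links` and the example URLs are hypothetical, and it assumes the zips are linked from ordinary `<a href="...">` tags.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ZipLinkParser(HTMLParser):
    """Collects <a href> targets whose URL path ends in .zip (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.zip_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Check only the path, so query strings do not hide the extension.
                if urlparse(absolute).path.lower().endswith(".zip"):
                    self.zip_links.append(absolute)


def find_zip_links(html, base_url):
    """Return absolute URLs of all .zip links found in the given HTML text."""
    parser = ZipLinkParser(base_url)
    parser.feed(html)
    return parser.zip_links
```

Each returned URL could then be downloaded in a second pass, which mirrors the two-stage approach described above.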

I already successfully do this for a half dozen other websites, but now I am stuck with a page where instead of downloading the html that is rendered via a browser I am ending up with the code for a challenge page. It contains things like 'challenge-error-text' and 'Enable JavaScript and cookies to continue' and does not contain the info I need to get.
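One way to make a scheduled job fail loudly instead of silently saving a challenge page is to check the downloaded body for the marker strings quoted above. This is only a sketch: the function name `looks_like_challenge_page` is hypothetical, and the marker list is taken from this post, not from any official list.

```python
# Marker strings quoted in this post; other challenge pages may use different text.
CHALLENGE_MARKERS = (
    "challenge-error-text",
    "Enable JavaScript and cookies to continue",
)


def looks_like_challenge_page(html):
    """Return True if the HTML contains any of the known challenge markers."""
    return any(marker in html for marker in CHALLENGE_MARKERS)
```

A check like this does not get past the challenge; it only tells the workflow that the expected page was not returned.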

The only header I am using in the download tool is User-Agent

User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36

Is there anything more/different I could/should be using here to get around the challenge page and make this site believe I am using a browser? These are the returned download headers.

I am new to this but that reads as if Cloudflare can tell I am scraping and doesn't want to allow it
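The returned headers themselves are not reproduced in this thread, so the following is only a sketch of how such headers might be inspected programmatically. It assumes the response carries `Server: cloudflare` and, on challenge responses, a `cf-mitigated: challenge` header, which is how Cloudflare is understood to flag these pages; the function name and the status codes checked are assumptions.

```python
def is_cloudflare_challenge(status, headers):
    """Guess whether a response is a Cloudflare challenge (assumed header names).

    status  -- HTTP status code as an int
    headers -- mapping of response header names to values
    """
    # Header names are case-insensitive, so normalise before comparing.
    normalized = {name.lower(): value.lower() for name, value in headers.items()}
    served_by_cloudflare = normalized.get("server") == "cloudflare"
    mitigated = normalized.get("cf-mitigated") == "challenge"
    # Assumption: challenges usually arrive with a 403 (or 503) status.
    return mitigated or (served_by_cloudflare and status in (403, 503))
```

If a check like this returns True, changing the User-Agent alone is unlikely to help, because the challenge expects JavaScript to run in the client.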

(EDIT: I should add that the data is public and the body in question knows I want to be able to scrape their website. They have made an allowance for my IP in their firewalls.)

Thanks,

Ian

DavidSkaife · ‎02-06-2025

Hi @Ian_Nicholls

I'm no expert on this, but from what I've read Cloudflare uses bot detection to protect the website from scraping, and the connection is failing because there is one of those 'prove you're a human' challenges on the page, if I'm not mistaken. Given this, I don't think you're going to solve this using the Download tool. Others with far more knowledge may correct me though.

An alternative option is trying Python. There seem to be a few ideas available if you search the web for 'Cloudflare web scraping', but I suspect this would be a LOT of trial and error with no guarantee it would work.

Ian_Nicholls · ‎02-07-2025

Thanks @DavidSkaife - I suspected that might be the case. Fortunately this isn't anything sketchy, and the people whose website it is are trying to amend their bot rules so that my usual approach works here too.

I really just wanted to know if there was anything more I could do with the header section in the event of seeing these sorts of messages. But I guess stopping what I am doing is exactly what Cloudflare is meant to do...

apathetichell · ‎02-07-2025

If this is a serious business need for you, check out: https://www.zenrows.com/blog/selenium-cloudflare-bypass#undetected-chromedriver
