Hi all,
I previously created a topic where BrandonB helped me a lot by teaching me how to download several PDF files from a website.
However, I am now facing an issue: when I download multiple PDF files simultaneously, not all of them are fully downloaded.
Some of the downloaded files are corrupted and cannot be opened (they are only 1 KB in size).
Can anybody advise why that is?
I attach the workflow. You can just change the output path from my drive to yours and you will be able to run the workflow yourselves.
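For anyone who wants to list the broken files quickly, here is a minimal Python sketch. It assumes the downloads sit in one folder with a `.pdf` extension, and it relies on the fact that PDF files normally begin with the bytes `%PDF-`. The function name `find_suspect_pdfs` and the 2048-byte threshold are my own choices for illustration, not something from the workflow.

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # PDF files normally begin with these bytes


def find_suspect_pdfs(folder, min_bytes=2048):
    """Return (path, reason) pairs for files that are probably not valid PDFs.

    min_bytes is an assumed threshold; adjust it to your smallest real PDF.
    """
    suspects = []
    for path in sorted(Path(folder).glob("*.pdf")):
        size = path.stat().st_size
        with path.open("rb") as handle:
            head = handle.read(len(PDF_MAGIC))
        if head != PDF_MAGIC:
            # Typically an error page or login page saved under a .pdf name.
            suspects.append((path, "missing %PDF- header"))
        elif size < min_bytes:
            suspects.append((path, f"only {size} bytes"))
    return suspects
```

Calling `find_suspect_pdfs(r"C:\Downloads")` (the path is a placeholder) gives you the list of files worth re-downloading.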
All the best,
Radek
Hi @Ray_Pospisil,
I think there is a simple fix: change the configuration of the Download-Tool. I am not sure whether both changes are needed, but I'd change the following:
1) Change the Connection-Settings
I reduced it to only 1 connection and set a higher timeout. This was simply trial and error: I first tried increasing only the timeout, but it only started working once I also reduced the connections to 1.
2) Add user-agent
This is probably optional at this point, but if you download more pages, the site might detect you as a bot and block you. It's therefore a good idea to add a user-agent. I simply used the one from Google Chrome.
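For readers who think in code, the same two ideas (one connection at a time with a generous timeout, plus a browser-like user-agent) can be sketched in plain Python. This is not what the Download-Tool does internally, just an illustration. The user-agent string, the timeout value, the file naming, and the function names are all assumptions made up for the example.

```python
import time
import urllib.request
from pathlib import Path

# Assumed values for illustration; neither appears in the thread.
USER_AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
TIMEOUT_SECONDS = 600


def build_request(url, user_agent=USER_AGENT):
    """Attach a browser-like User-Agent header to the request."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download_sequentially(urls, out_dir, timeout=TIMEOUT_SECONDS,
                          opener=urllib.request.urlopen, pause=0.0):
    """Download each URL in turn, so only one connection is open at a time.

    Returns a dict mapping each URL to the path that was written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = {}
    for index, url in enumerate(urls):
        target = out_dir / f"file_{index:04d}.pdf"
        with opener(build_request(url), timeout=timeout) as response:
            target.write_bytes(response.read())
        saved[url] = target
        time.sleep(pause)  # optional gap between requests
    return saved
```

The `opener` parameter is only there so the function can be tried without network access; in normal use you would leave it at its default.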
Result:
Let me know if this helped or if we need to dig further into it.
Best
Alex
Hi Alex,
The solution seems to be working.
Thanks very much for your support.
I did not know how to use these features of the download tool.
Radek
Hi, I have a similar issue. I configured the tool with the settings you mentioned, but I still get an error when I open a PDF.
With a public URL there isn't an issue, but with the internal company URL it doesn't work.
ERROR -
Can you please help?
Thanks,
Karishma
@grossal, can you please help?
Hi @velisetty,
sorry for the late response. I overlooked the mail during the week.
Two questions:
1) Can you show me how the download folder looks?
2) Did it ever work? I see that you used a username/password in the connection tab. How does the login on the website look/work?
Best
Alex
Hello @grossal !
1) The output folder looks like this. I have tested multiple file locations, but the PDF error is the same. I even tried downloading to a temporary file, but got the same error.
2) It is basically a Salesforce site. It works with SSO, and whether or not I enter the username and password, the PDF file is created and the download header is HTTP/1.1 200 Connection established.
I have thousands of URLs, and Alteryx would be the best approach for me if it works. Can you please help?
Thanks,
Karishma
Hi @velisetty,
I think the SSO is the issue. There is an idea to add SSO possibilities to the download tool:
https://community.alteryx.com/t5/Alteryx-Designer-Ideas/SSO-feature-in-Download-tool/idi-p/626121
I personally would work around it with the Python-Tool and the Selenium library. It's not too hard if you are a bit familiar with Python, and obviously the Python-Tool has to be allowed in your organization.
The issue here is that I cannot describe a generic way for you. To make the login work, you need to do some basic web scraping. I could show you how it's done in a quick 15-minute Zoom/Teams call if you want.
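To give a rough idea of the shape of such a workaround, here is a minimal sketch. Because the login is site-specific, it sidesteps the scraping part: a person completes the SSO login by hand in the browser window that Selenium opens, and the session cookies are then reused for the downloads. It assumes that those cookies alone are enough to authorize the PDF requests, which may not hold for every site. All function names are hypothetical, and Selenium plus a matching Chrome driver must be installed.

```python
import urllib.request


def cookie_header(cookies):
    """Turn Selenium-style cookie dicts into a single Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def collect_cookies_after_manual_login(login_url):
    """Open a browser, let a person finish the SSO login, return the cookies."""
    from selenium import webdriver  # third-party; imported only when needed

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        input("Finish the login in the browser window, then press Enter... ")
        return driver.get_cookies()
    finally:
        driver.quit()


def download_with_cookies(url, target_path, cookies, timeout=120,
                          opener=urllib.request.urlopen):
    """Fetch one URL with the session cookies and save it if it is a PDF."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(cookies)})
    with opener(request, timeout=timeout) as response:
        data = response.read()
    if not data.startswith(b"%PDF-"):
        # Assumption: a non-PDF body usually means we got a login page.
        raise ValueError(f"{url} did not return a PDF")
    with open(target_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

You would call `collect_cookies_after_manual_login` once, then loop over your URLs with `download_with_cookies`. The `%PDF-` check makes a failed login show up as an error instead of a file that cannot be opened.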
Maybe someone else knows a non-Python way to do it.
Best
Alex