SOLVED

PDF download from website (some files have errors and cannot be opened)

Ray_Pospisil
8 - Asteroid

Hi all,

I have already created a topic where BrandonB helped me a lot by teaching me how to download several PDF files from a website.

However, I am now facing an issue: when I download multiple PDF files simultaneously, not all of them are fully downloaded.

Some of the downloaded files are corrupted and cannot be opened (they are only 1 KB in size).
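A quick way to confirm which downloads are broken is to inspect each file's bytes. A real PDF normally starts with `%PDF-` and ends with an `%%EOF` marker, whereas a 1 KB "PDF" can turn out to be an HTML error or login page saved under a `.pdf` name. The following is a minimal Python sketch, assuming all files sit in one local folder. It is a heuristic, not a full PDF validator, and the function names are hypothetical:

```python
from __future__ import annotations

from pathlib import Path

PDF_HEADER = b"%PDF-"   # signature a PDF file normally begins with
PDF_TRAILER = b"%%EOF"  # end-of-file marker expected near the end of a complete PDF


def looks_like_complete_pdf(path: Path, tail_size: int = 1024) -> bool:
    """Heuristic: True if the file starts with %PDF- and has %%EOF in its last bytes."""
    data = path.read_bytes()
    if not data.startswith(PDF_HEADER):
        return False  # e.g. an HTML error or login page saved as .pdf
    return PDF_TRAILER in data[-tail_size:]  # a missing marker suggests truncation


def find_broken_pdfs(folder: Path) -> list[Path]:
    """Return every *.pdf in `folder` that fails the heuristic check, sorted by path."""
    return sorted(p for p in folder.glob("*.pdf") if not looks_like_complete_pdf(p))
```

Call `find_broken_pdfs(Path(...))` with your output folder to get the list of files worth re-downloading.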

Can anybody advise why this happens?

I have attached the workflow. Just change the output path from my drive to yours and you will be able to run the workflow yourselves.

All the best,

Radek

grossal
15 - Aurora

Hi @Ray_Pospisil,

I think there is a simple fix: change the configuration of the Download tool. I am not sure whether both changes are needed, but I'd change the following:

1) Change the connection settings

[Screenshot: Download tool connection settings]

I reduced it to a single connection but with a higher timeout. This was simply trial and error: I first tried only increasing the timeout, but it only started working once I also reduced the connections to 1.
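Outside Alteryx, the same idea (one request at a time, each with a generous timeout) can be sketched in Python with the standard library's `urllib.request`. This is only an analogy to the Download tool's connection settings, not a description of what the tool does internally. The 120-second default timeout and the `file_<n>.pdf` naming are assumptions:

```python
from __future__ import annotations

import urllib.request
from pathlib import Path


def download_sequentially(urls: list[str], out_dir: Path, timeout: float = 120.0) -> list[Path]:
    """Download each URL in turn (one connection at a time) and save it to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # `timeout` (seconds) applies to blocking operations such as connecting and reading.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()  # read the whole body before starting the next URL
        target = out_dir / f"file_{index}.pdf"  # hypothetical naming scheme
        target.write_bytes(data)
        saved.append(target)
    return saved
```

Downloading sequentially is slower than running several connections in parallel, but it avoids overloading a server that cuts off concurrent requests, which matches what Alex observed by trial and error.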

2) Add a user agent

[Screenshot: adding a User-Agent]

This is probably optional at this point, but if you download more pages, the site might detect you as a bot and block you. Therefore it's a good idea to add a user agent. I simply used the one from Google Chrome.
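In Python terms, adding a user agent just means sending a `User-Agent` request header. Here is a minimal sketch with `urllib.request`. The Chrome string is only an example (an assumption); copy the exact value your own browser sends:

```python
import urllib.request

# Example browser User-Agent string (assumption; replace with your own browser's value).
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
)


def build_request(url: str, user_agent: str = BROWSER_USER_AGENT) -> urllib.request.Request:
    """Create a request that identifies itself with a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Pass the returned request to `urllib.request.urlopen(...)` in place of the bare URL. `urllib` stores header names via `str.capitalize()`, so the header is read back with `request.get_header("User-agent")`.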

[Screenshot]

Result:

[Screenshot of the result]

Let me know if this helped or if we need to dig further into it.

Best

Alex

Ray_Pospisil
8 - Asteroid

Hi Alex,

The solution seems to be working.

Thanks very much for your support.

I did not know how to use these features of the download tool.

Radek

velisetty
6 - Meteoroid

Hi, I have a similar issue. I tried configuring the tool with the settings you mentioned, but I still get an error when I open a PDF.

When I use a public URL there is no issue, but with our internal company URL it doesn't work.

[Screenshots of my Download tool configuration]

Error:

[Screenshot of the error message]

Can you please help?

Thanks,

Karishma

velisetty
6 - Meteoroid

@grossal, can you please help?

grossal
15 - Aurora

Hi @velisetty,

Sorry for the late response; I overlooked the email during the week.

Two questions:

1) Can you show me what the download folder looks like?

2) Did it ever work? I see that you used a username/password in the Connection tab. How does the login on the website look and work?

Best

Alex

velisetty
6 - Meteoroid

Hello @grossal!

1) The output folder looks like this. I have tested multiple file locations, but the PDF error is the same. I even tried downloading to a temporary file, with the same error.

[Screenshot of the output folder]

2) It is basically a Salesforce site. It uses SSO, and whether or not I enter the username and password, the PDF file is created and the download headers show HTTP/1.1 200 Connection established.
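One way to check what the server actually returned is to look at the final response's `Content-Type` and the first bytes of the saved file. Note that `HTTP/1.1 200 Connection established` is typically a proxy server's reply when it opens a tunnel, so on its own it does not show that the PDF request succeeded. Below is a minimal Python sketch. The returned labels are hypothetical, and reading an HTML body as a likely login/SSO page is an assumption based on the symptoms described here:

```python
def classify_download(content_type: str, body: bytes) -> str:
    """Guess what a 'PDF' download really contains from its Content-Type and first bytes."""
    media_type = content_type.split(";")[0].strip().lower()  # drop parameters like charset
    head = body[:1024].lstrip().lower()  # leading bytes, ignoring whitespace and case
    if body.startswith(b"%PDF-"):
        return "pdf"
    if media_type == "text/html" or head.startswith((b"<!doctype html", b"<html")):
        return "html (possibly a login/SSO page)"
    return "unknown"
```

If the "PDFs" from the internal URL come back as HTML, the request most likely never got past the SSO login.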

I have thousands of URLs, and Alteryx would be the best approach for me if it works. Can you please help?

Thanks,

Karishma

grossal
15 - Aurora

Hi @velisetty,

I think SSO is the issue. There is an idea to add SSO support to the Download tool:

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/SSO-feature-in-Download-tool/idi-p/626121

Personally, I would work around it with the Python tool and the Selenium library. It's not too hard if you are a bit familiar with Python, although the Python tool obviously has to be allowed in your organization.
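Here is a minimal sketch of that workaround. It rests on several assumptions: the third-party `selenium` package and a Chrome driver are installed, you complete the SSO login by hand in the browser window it opens (so it must run interactively), and the site accepts the browser's session cookies on a plain download request. Real SSO flows vary, so treat this as a starting point rather than a working solution. All function names are hypothetical:

```python
from __future__ import annotations

import urllib.request
from pathlib import Path


def login_and_get_cookies(login_url: str) -> list[dict]:
    """Open Chrome so the user can finish the SSO login, then return the browser's cookies."""
    from selenium import webdriver  # third-party; imported here so the helpers below work without it

    driver = webdriver.Chrome()
    try:
        driver.get(login_url)
        input("Finish the login in the browser window, then press Enter...")
        return driver.get_cookies()  # list of dicts, each with "name" and "value" keys
    finally:
        driver.quit()


def cookie_header(cookies: list[dict]) -> str:
    """Join Selenium-style cookie dicts into one Cookie header value (ignores domain/path)."""
    return "; ".join(f"{c['name']}={c['value']}" for c in cookies)


def download_with_cookies(url: str, cookies: list[dict], target: Path, timeout: float = 120.0) -> int:
    """Download `url` with the session cookies attached, save it, and return the byte count."""
    request = urllib.request.Request(url, headers={"Cookie": cookie_header(cookies)})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    target.write_bytes(data)
    return len(data)
```

Selenium's `get_cookies()` only returns cookies visible to the current page. This sketch sends all of them without checking domain or path, which is a simplification.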

The issue here is that I cannot describe a generic way for you. To make the login work, you need to do some basic web scraping. I could show you how it's done in a quick 15-minute Zoom/Teams call if you want.

Maybe someone else knows a non-Python way to do it.

Best

Alex
