I'm trying to use the download tool to download data from AWS S3 using a presigned URL, but I keep getting the following error: Error transferring data: Failure when receiving data from the peer. What am I doing wrong?
There's not much info to go on here. My first guess is encoding, since you're trying to use a shortcut (a presigned URL) in a programmatic process.
Break the problem into parts to track down the issue:
I will try to answer your questions to the best of my ability, with the caveat that I cannot give every bit of info due to company confidentiality. But here goes:
I hope that this answers most if not all of your questions. Let me know if you have more.
This is not correct; what you are describing is wrong.
Presigned URL = the credentials of the person who created the presigned URL. The credentials are embedded in the link itself; there are no secondary credentials. They are included in the URL via key fields like X-Amz-Signature=...
S3 download = temporary credentials. Share what you are sending to your Download tool. If you need to download a file using a role/federated access -> you should authenticate outside of Alteryx via the CLI and run AWS CLI commands via the Run Command tool -> or use boto3 with a config file to set up your credentials.
You need to share some of your Download tool's configuration so we can troubleshoot this. Can you check in Postman? Can you see the content-type? Can you add it to the Download tool?
Here's an example of the format of a presigned URL:
https://{{bucket}}/{{object}}?response-content-disposition=inline&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEBAaCXVzLWVhc3QtMiJHMEUCIEwpZuxkEL6Zjm4xvyp61GLGQDNxY9nA6XO/Fkr/Po8mAiEA7NfAs/RGKmN5G3AyaV9RGrY3XcZFil3M9B3e6I2fVMIq1AMI+f//////////ARABGgw2MTA0ODYzODE2OTIiDOMTQSV5rA4Ad/SNUiqoAwVBjye2iariFt2wtyK+83i7LLGfoPdHNhpiw1VVRwuVPPqov21gjUJHk5lrLuHhzoaplkpcWxUy3wc3WtOP4CX9G6GAdsRfSQUFYmZ4R737jCRrt0c9wLTFggY5g4IsK4cCP7ZliTBeIvYRVLaApl3WpAlshGes35fbPffKj3XnBUFnZtgTMLGiVYjIidVIL6Y/2HfXkL6GOObofY9f9CD9/mwImgqFWMYfQ1mXHlyooxyfOKr5kQWN/1bblGh3rEB6KVDbuaQYXTBoQygnQOuypoyX21epZ0szJw+MnTx3VANp8VwxE8vRxd9536e3TWw8qJjv49OHDVNKr3am+EOEr8SATxIEqFL9RQzx50dklLklD7dg9t3EUezTiboWBiLczEWL1mIwiBxSwcpH/DswKeBAr43RQPDsf/yR01pvmhxrsTsmkJ7UkOpIz1ZJY19sV9OtMn4nEJkPR+UrABZLnJarausBfOXLOuENoakqNSpnlHCZpyh0iV9U2noehuaL0ilurlT0LAvK+gZ4a9DMYDmYahns9toWJNRSBGZS0U67s2hSiBww0tmUvAY65AIzdokskPaEROENWuqly7i91RY33G1UHYAoU7oEeXuoJboIonp/J9cqOYmt1fAqxAcaR6KHrFbXoGVGVNvj1oyOJcAGin6REpgxjpOfS2UEBLudSnqGLpoLHBD4cXL/BDeucq3k0ASf9Lx3ogi4kKb9M0HoXdnmMDxMNNtJORkDJJLSFh+JEtzEufgZTJK48ZkCV0qyY+/ludhVXZmRb+K5oAxOwGPHE0dibbcog/SjCJJ0fs1f2lfkx+C642Fx1P7Y7Qk0q7ecRL2bxenvYmFpvMLVLc3FKQ2rn+hTQ9vO6Z35y9ecDLNXFCiUJV1nR2Buzt3XuQx6BY8KjLFvNaONGBNB+KNKtgVDzoO1KTuqEShKIA9th/Z/2fxzVXoFR/z6CEGb+9O3Vn+XP0af8iywR1ZlvezsYd29gkjABjGAX/FEdnkKXKBUp3xYrUeWQdGVpKEV9ewaT75ZtcDbrH0NF6cCdw==&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAY4I6S5R6HL52JJKI/20250114/us-east-2/s3/aws4_request&X-Amz-Date=20250114T001853Z&X-Amz-Expires=60&X-Amz-SignedHeaders=host&X-Amz-Signature=9dadaa2424abce8f8d21c1b0120d6d9ce4b68d06a27851ef65dbfccb71042433
Let's not skip past the easiest explanation to investigate: double encoding. Troubleshooting something like this requires learning some elementary conventions, such as URL encoding and what the credentials/bucket/object etc. are (@apathetichell has laid out most of this part; I advise getting comfortable with that terminology).
URL encoding changes the parts that won't work in a URL because they are either invalid or reserved. It is not some big algorithm that's hard to understand, e.g. replace " " (the space character) with "%20".
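A quick stdlib illustration of what encoding does to a value, and the double-encoding trap that breaks presigned signatures:

```python
from urllib.parse import quote, unquote

# Reserved/invalid characters become %XX escapes.
print(quote("my file.csv"))             # my%20file.csv
print(quote("a/b+c", safe=""))          # a%2Fb%2Bc

# Double encoding is the classic failure mode: the "%" itself gets
# re-encoded, so %20 becomes %2520 and the signature no longer matches.
print(quote("my%20file.csv", safe=""))  # my%2520file.csv
print(unquote("my%2520file.csv"))       # my%20file.csv
```

If a tool encodes a URL that is already encoded (as presigned URLs are), every `%` doubles up and S3 rejects or mangles the request.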
I assume the following:
Now, on the download tool, is the box "Encode URL Text" ticked? Does the result change with that toggled?
Unfortunately, the error message is the same regardless of whether or not that box is ticked.
I am 90% sure this isn't an encoding situation. ->but yes - it should be encoded.
-> it's more likely:
1) Your field is not large enough to accommodate the presigned URL. Use a Select tool -> see if it's truncated. This usually happens when the link lives in a Text Input field with a fixed size -> widen the field to fit the full link.
2) Content-type. I don't see this set in Postman.
3) You are using the wrong HTTP action. Use GET.
4) You are not putting the entire presigned URL into the URL field of the Download tool. Reminder ->>>>> this is the only thing you need. You do not authenticate a presigned link.
Having said that -> without knowing what your error is -> all I can really do is shrug. Perhaps post your error? What response code are you getting?
Okay, I've included as many screenshots as I can while omitting any URLs for security reasons. I've used a select tool to make my Download URL a V_WString with a max size of 1,000,000 characters, just to be safe. Still seeing errors however.
Post your error? What response code are you getting? Any of your earlier presigned URLs should be timed out by now -> can you post one in the format I did above, with the bucket/object keys omitted?
Okay here's the error with the bucket and object omitted:
Error: Download (6): Error transferring data: Failure when receiving data from the peer for https://{{Bucket_Name}}.s3.amazonaws.com/{{Object_Name}}?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIASYNCP6NI6M2AIZR7%2F20250114%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250114T140718Z&X-Amz-Expires=5000&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEB8aCXVzLWVhc3QtMSJHMEUCICMalkuupJE06XKLZLbE3m6qfk6v0DJZZyRNHMgUAdxbAiEAz0tyd86uKJ1RhgkHirOkHY7J0mHePtLpsz62rxqzOIUq1gIIFxADGgwxODk4NTYyMTU4ODkiDA50ceg%2FoOB%2FJTjDAyqzAj%2BWH7BZNfpuWxeb2VACrGJ1OebiW4omwI2gJs7SYA3sCnTqVvPOdmFxLFPTrFyPyx7PJXOAKp2QOUNT90MnJpgqxWzeo51n8O8PiesKnsh5BojylYJOnwKGwfc58h4b1nJX5lHgq9YSXBKY4KcklIdH12X468cEsRwXF9vF4Ys5mSeGI%2FxB4G61FM1eNKOcLGgHCgKBt3A6lch3e6X%2BOAHpUMMPd%2BNn8mjt11tCrOa9SNGPFQUDBIFbeffVgJcSQTTYVER%2BhPVCB8SRf1atu0%2FqY%2FslMgPGEHzQ1OgERRNhcLvLXVxuPBRl6uwdVGyiGmGjxWwx%2BY354%2BREkm%2Fg7OKuTfdzURoyE0B9O5A0hZH%2B9uwIKU4Q%2F7RVj141rS7u70uLKOxwPSrbW41RySLhRW9%2Ft0AwkN%2BZvAY6ogH2%2FJe3Jl9NY5qWCQjdttYErzSdSEzYkykjj%2FdgLhpfSxZnF97xla2eikUWG3ma%2FAI1UUKt5ntL9fU8U3lcVivgh%2Ft2W1pPNEQG82BHqJQTcYe0Wk7LkD%2Bt859xOL0GxiYEEKkqbvfCzRxkTzRsJ9vZhERPgk%2Ba7NlLVo%2FWbdF36OCYaba0YvZfJ%2FoQOQBrC2BeOY%2F4cANBvU0SPCSna9PLi4Y%3D&X-Amz-Signature=685b1192d3a5d497e077590f00030b3413c8c138322e8f15a0d7837f1d938361
This sounds like a VPN issue. Any chance your bucket is only reachable from inside a VPN and you are not on it?
Could be. All I know is that Alteryx is the only place where this presigned URL is failing. I tried running the same code in my IDE, and then I tried opening the URL in a web browser, and it worked both times. I know the bucket I'm trying to access is protected by KMS, but that's about all I know.
EDIT: I could probably get the download to work if I had the python tool do it, but I'm using this workflow to download some large files (~5 GB), and I don't trust the python tool to do that download efficiently.
Ok... so there are a few things that it could be with that info.
1) Can you try with a smaller file? -> let me know if that works.
2) Can you toggle from download-to-temp to download-to-a-specific-file (and give it a filename)?
3) Can you confirm that that exact link (or any exact link which fails in Alteryx) runs in Postman?
4) If you have the AWS CLI -> can you try to download it using aws s3 cp ... commands?
Also ---> try adding this header (a name/value pair) to your request:
User-Agent
Okay, I've generated a new presigned URL for a smaller data file (~83 KB), and the URL is still failing in Alteryx with the same error. Interestingly, I tried running the URL in Postman too, and the request ended up timing out and failing there too. Not sure what this means though.
ok -> that's a good sign -> it means this isn't Alteryx-specific, which would be a bit of a black box. If you are on VPN -> go off VPN. If the bucket uses KMS -> make sure you have access to both the KMS key and the S3 bucket when you are creating the presigned URL. If you can -> extend the timeout. Are you creating the presigned URL in boto3 or in the console?
I'm using boto3 in order to create the presigned URL in the workflow. This is the work that the python tool is doing in my workflow. I could have the python tool do everything if I wanted to, but it would result in very poor performance.
Okay I think I've narrowed it down further. I was finally able to get my presigned URL to work in postman, and the fix was to change my proxy settings to utilize the company's proxy server. It makes sense why my downloads were fine using the Python tool, because I could use os.environ to update the environment variables to include those proxy settings. I'm not sure how to do that with the download tool, though. Any suggestions?
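For context, here's roughly what I'm doing in the Python tool (the proxy host/port shown are placeholders for our actual company proxy):

```python
import os

# Hypothetical proxy host/port; substitute your company's values.
proxy = "http://proxy.example.com:8080"
os.environ["HTTP_PROXY"] = proxy
os.environ["HTTPS_PROXY"] = proxy
os.environ["NO_PROXY"] = "localhost,127.0.0.1"  # hosts that bypass the proxy

# requests/urllib3 (and boto3's HTTP layer) read these variables
# automatically, which is why the download works from the Python tool.
```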
ok -> so 1) for the proxy you can go to Options > User Settings > Edit User Settings.
2) This may prevent your boto3 from connecting to AWS -> you may need the proxy set there too.
3) We talk about bad performance with Python when you are pushing massive dataframes through it -> you are (presumably) executing a few lines of code. I do not believe you would see significantly worse performance in boto3 than in the Download tool -> and since you can use a config to set concurrency -> I am (high 90s) percent sure you would see better performance for huge files in boto3 vs the Download tool with a presigned URL. I can take a look at timing a 9 GB concurrent download in boto3 outside of Alteryx; I don't think Alteryx is adding much here.
Okay that was actually pretty revealing. It makes sense that massive dataframes would be the bottleneck in the tool. If that's the case, I don't even need a presigned URL at this point. I'll just use boto3 to download to a temp file, then return the path of that temp file in the dataframe. Then I can use a dynamic input tool to read the data into Alteryx. Would that work?
EDIT: Would it also be possible to use a python library like tqdm to track the progress of the python tool as it's running?
hmm-> not sure. I'd recommend setting up a transfer config as shown here:
https://boto3.amazonaws.com/v1/documentation/api/1.9.42/guide/s3.html
1) you can set up multi-parts for larger files.
2) you can increase concurrency.
This-> I'll just use boto3 to download to a temp file, then return the path of that temp file in the dataframe. Then I can use a dynamic input tool to read the data into Alteryx. Would that work?
Would be my recommendation with two caveats->
1) I might use run command/AWS CLI vs python/boto3. I just like the CLI.
2) I'd use a batch macro vs dynamic input. I am the most anti dynamic input person on community so this might just be me.