
Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Amazon S3 Zip file download issue

DBarenburg
7 - Meteor

I have some reports being sent to a specific folder in my S3 bucket. The file names include the date, and each is a zipped file containing a single CSV file.

 

I would like to use Alteryx to download the most recent file daily, unzip it, read the CSV, clean up the data, and append it to a table in Snowflake.

 

I have tried using the Amazon S3 Download tool as well as the Input Data tool to do this. I also tried setting up a macro using a Text Input. However, everything I have tried has failed to get the data into Alteryx.

 

I have an existing Amazon Upload tool macro that works with no issues.

Has anyone else had this issue and found a clear way to do what I need? Does anyone have an existing workflow that can be “cleaned up” so that I can recycle it to get my downloads in motion?

 

I would appreciate any assistance you can provide! Thanks!!

OllieClarke
15 - Aurora

Hi @DBarenburg 

 

What didn't work with the S3 download tool?

DBarenburg
7 - Meteor

I can see and "grab" a single file, but I need it to be more automated so that it can fetch the file on a daily basis. Selecting the file manually within the tool gives me access, but it doesn't solve the problem of choosing the correct daily file when the workflow runs by itself on the server on a schedule.

OllieClarke
15 - Aurora

@DBarenburg 

The AWS CLI is the way to go here. There's a KB article about using it to do a 'wildcard input' with a batch macro:
https://knowledge.alteryx.com/index/s/article/Amazon-S3-Download-Use-Wildcards-to-Select-a-Single-or...

 

This command 

cd\
cd "Program Files\Amazon\AWSCLI"
aws s3 ls s3://alteryxtest > c:\users\\awslist.txt

changes directory to the AWS CLI install folder, then uses the ls command to list the contents of the S3 bucket and writes the listing to a text file. The listing includes each file's timestamp, so you can sort on that and keep only the most recent file.

 

the ls command is documented here: https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html 

 

Hope that helps,

 

Ollie

DBarenburg
7 - Meteor

I did attempt that solution, but I couldn't get the .BAT file or the macro set up and working. After 3 or 4 hours of trying to get it working, I got frustrated and stopped. I was hoping there was another set of instructions that was a little clearer on those steps.

apathetichell
19 - Altair

@DBarenburg --- Hey --- you are trying to do a couple of things that do not have out-of-the-box support:

1) wildcard/.zip downloads from an S3 bucket.

2) wildcard/.zip extraction.

 

You will need to code this. It can be in Python or as a .bat file, but you will need to code this.
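To make the Python route concrete, here's a minimal sketch using boto3 (the AWS SDK for Python). The bucket and prefix names are placeholders, not the poster's real values, and it assumes credentials have already been set up with `aws configure`. The "pick the newest object" step is split out as a plain function so it can be reasoned about separately from the S3 calls.

```python
import io
import zipfile
from datetime import datetime, timezone

def latest_key(contents, suffix=".zip"):
    """Pick the most recently modified key from a list_objects_v2
    'Contents' list, keeping only keys with the given suffix."""
    candidates = [o for o in contents if o["Key"].endswith(suffix)]
    if not candidates:
        return None
    return max(candidates, key=lambda o: o["LastModified"])["Key"]

def download_and_extract_csv(bucket, prefix):
    """List the bucket, download the newest zip, and return the bytes
    of the CSV inside it (each zip holds a single CSV)."""
    import boto3  # requires configured AWS credentials
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    key = latest_key(resp.get("Contents", []))
    buf = io.BytesIO()
    s3.download_fileobj(bucket, key, buf)
    with zipfile.ZipFile(buf) as zf:
        inner = zf.namelist()[0]  # single CSV per zip
        return zf.read(inner)
```

A script like this can be dropped into a Python tool (or run via the Run Command tool) ahead of the rest of the workflow.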

OllieClarke
15 - Aurora

Hey @DBarenburg 

Hopefully this helps.

 

This workflow uses the aws cli (v2) to find the most recent file in a specified s3 bucket:

The command is as follows: run command prompt (cmd) and pass it these arguments

/C aws s3 ls s3://nhs-prescriptions-data >> %engine.tempfilepath%output.txt

The "/C" switch tells cmd to carry out the command that follows (so you can write your commands as arguments) and then terminate.

 

In this case I'm outputting the contents of the s3 bucket "s3://nhs-prescriptions-data"

 

We then write this output to the file output.txt in the temp directory used by this workflow.

 

I set the working directory of this instance of command prompt to the directory where the AWS CLI executable is installed.

 


 

I then read the file written to the temp directory as an undelimited text file.


 

Then it's a relatively simple process of parsing this data and then only keeping the most recent record.
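In Alteryx this parsing is done with the usual parse tools, but the logic is simple enough to sketch in Python. `aws s3 ls` prints one line per object in the form "date  time  size  name" (plus "PRE folder/" lines for folders); the sample lines in the usage below are invented for illustration.

```python
def most_recent_file(ls_output):
    """Parse `aws s3 ls` text output and return the name of the most
    recently modified object, or None if no files are listed."""
    best = None
    for line in ls_output.splitlines():
        parts = line.split(None, 3)  # date, time, size, name
        if len(parts) != 4:
            continue  # skips "PRE folder/" lines and blanks
        date, time, _size, name = parts
        stamp = f"{date} {time}"
        # ISO-style "YYYY-MM-DD HH:MM:SS" strings sort correctly as text
        if best is None or stamp > best[0]:
            best = (stamp, name)
    return best[1] if best else None
```

The file name this returns is what you would then feed into the batch macro's control parameter.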

 

I did have to configure my AWS CLI before I could use it. You set this up with the

aws configure

command, which will create a config and credentials file in your user drive. Once that's there, the workflow should run fine.
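For reference, `aws configure` writes two plain-text files under your user profile (on Windows, %UserProfile%\.aws\). The values below are placeholders:

credentials:
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

config:
[default]
region = eu-west-2
output = json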


I've not included the batch macro with the configured s3 download tool, but hopefully this covers the harder part of the process.

 

Ollie

DBarenburg
7 - Meteor

@OllieClarke Thanks, I will check that out when I get back from vacation!

DBarenburg
7 - Meteor

OK, I did that and changed a few things, and I got the output to give me the most recent file name. But when I push that to the macro to find and download the file with the S3 Download tool, I can't get the Object Name field to populate with the correct folder and that exact file name, so I can't create the output I need to process. In the macro, I have the Control Parameter and Action tool set up and connected to the S3 Download tool, but the "Object" (file name) I send to the macro never updates the Action tool, whether I use Update Value or Update Value with Formula (using the #1 connection from the control parameter). What am I missing?

 

apathetichell
19 - Altair

Your control parameter and Action tool are set up wrong. Don't use [#1]; use the Action tool in "Replace a specific string" mode instead. Replace the part that needs to change, and keep the static part by leaving it out of the replace-specific-string box.
