I have some reports being sent to a specific folder in my S3 bucket. The file names include the date, and each file is a zip archive containing a single CSV file.
I would like to use Alteryx to download the most recent file daily, unzip it, read the CSV, clean up the data, and append it to a table on my Snowflake server.
I have tried using the Amazon S3 Download tool, as well as the Input Data tool. I also tried setting up a macro using a Text Input. However, everything I have tried has failed to get the data into Alteryx.
I have an existing Amazon Upload tool macro that works with no issues.
Has anyone else had these issues and found a clear way to do what I need? Does anyone have an existing workflow that can be “cleaned up” so that I can recycle it to get my downloads in motion?
I would appreciate any assistance you can provide! Thanks!!
I can see and "grab" a single file, but I need this to be more automated so that it can go and get the file on a daily basis. Selecting the file manually within the tool gives me access, but it doesn't solve the problem of choosing the correct daily file when the workflow is running on the server by itself on a schedule.
@DBarenburg
The AWS CLI is the way to go here. There's a KB article about using it to do a 'wildcard input' with a batch macro:
https://knowledge.alteryx.com/index/s/article/Amazon-S3-Download-Use-Wildcards-to-Select-a-Single-or...
These commands
cd\
cd program files\amazon\awscli
aws s3 ls s3://alteryxtest > c:\users\\awslist.txt
change directory to the folder of the AWS CLI, then use the ls command to list the contents of the S3 bucket and write that listing to a text file. The listing includes the timestamp of each file/bucket, so you can sort by it and keep only the most recent file.
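For reference, each line of that listing follows the format "date time size key", so the contents of awslist.txt look something like this (file names made up for illustration):

2023-05-01 09:15:23     204800 report_2023-05-01.zip
2023-05-02 09:14:57     198652 report_2023-05-02.zip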
The ls command is documented here: https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
Hope that helps,
Ollie
I did attempt that solution, but I couldn't get the .BAT file or the macro set up and working. After 3 or 4 hours of trying to get it working, I got frustrated and stopped. I was hoping there was another set of instructions that was a little clearer on those steps.
@DBarenburg --- Hey --- you are trying to do a few things that do not have out-of-the-box support:
1) wildcard/.zip downloads from an S3 bucket.
2) wildcard/.zip extraction.
You will need to code this. It can be in Python or as a .bat file --- but you will need to code this.
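If you go the Python route, a minimal sketch using boto3 (the AWS SDK for Python) could look like the below. The bucket name, folder prefix, and local paths are placeholders to swap for your own:

import zipfile
import boto3

BUCKET = "my-report-bucket"   # placeholder bucket name
PREFIX = "reports/"           # placeholder folder within the bucket

s3 = boto3.client("s3")

# List the objects under the folder and keep only the .zip files.
# (list_objects_v2 returns up to 1000 keys per call; paginate if you have more.)
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
zips = [o for o in resp.get("Contents", []) if o["Key"].endswith(".zip")]

# Pick the most recently modified one.
latest = max(zips, key=lambda o: o["LastModified"])

# Download it and extract the single CSV inside.
s3.download_file(BUCKET, latest["Key"], "latest_report.zip")
with zipfile.ZipFile("latest_report.zip") as zf:
    zf.extractall("extracted")   # the CSV lands in the "extracted" folder

You could run a script like this from the Run Command tool (or the Python tool) at the start of the workflow, then point an Input Data tool at the extracted CSV.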
Hey @DBarenburg
Hopefully this helps.
This workflow uses the AWS CLI (v2) to find the most recent file in a specified S3 bucket:
The command is set up as follows: use Command Prompt (cmd) as the program to run, and then pass in the following as the command arguments:
/C aws s3 ls s3://nhs-prescriptions-data >> %engine.tempfilepath%output.txt
The "/C" lets you write your commands as arguments and pass them into command prompt.
In this case I'm listing the contents of the S3 bucket "s3://nhs-prescriptions-data".
We then write this output to the file output.txt in the temp directory used by this workflow.
I set the working directory of this instance of Command Prompt to be the directory where the AWS CLI .exe is installed.
I then read the file written to the temp directory into the workflow as an undelimited text file.
Then it's a relatively simple process of parsing this data and keeping only the most recent record.
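In case it helps to see that parsing logic written out, here's a rough Python equivalent of those steps, assuming output.txt was written by aws s3 ls as above:

# Rough Python equivalent of the parse-and-filter steps.
# aws s3 ls lines look like: "2023-05-01 09:15:23   204800 report.zip"
rows = []
with open("output.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 4:               # skips blanks and "PRE folder/" lines
            timestamp = parts[0] + " " + parts[1]
            key = " ".join(parts[3:])     # file names can contain spaces
            rows.append((timestamp, key))

# The timestamps are ISO-formatted, so a plain lexical sort works.
most_recent_file = max(rows)[1]
print(most_recent_file)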
I did have to configure my AWS CLI before I could use it. You set this up with the
aws configure
command, which will create a config file and a credentials file in your user directory. Once those are there, the workflow should run fine.
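For reference, aws configure writes two small files under your user profile (%UserProfile%\.aws on Windows); with placeholder keys and an example region, they look roughly like this:

%UserProfile%\.aws\credentials:
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

%UserProfile%\.aws\config:
[default]
region = eu-west-2
output = json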
I've not included the batch macro with the configured S3 Download tool, but hopefully this covers the harder part of the process.
Ollie
@OllieClarke Thanks, I will check that out when I get back from vacation!
OK, I did that and changed a few things, and I got the output to give me the most recent file name. But when I push that to the macro to find and download the file using the S3 Download tool, I can't get the Object Name field to populate with the correct folder and exact file name so that I can create the output I need to process. In the macro, I have the Control Parameter and Action tool set up and connected to the S3 Download tool, but the "Object" (file) name I send to the macro never updates the Action tool, whether I use "Update Value" or "Update Value with Formula" (using the #1 connection from the Control Parameter). What am I missing?
Your Control Parameter and Action tool are set up wrong. Don't use [#1] ---> use the Action tool in "Replace a specific string" mode. Replace what you need to change, and keep what needs to stay static by not including it in the replace-specific-string box. For example (hypothetical names): if the Object Name is reports/2023-05-01_report.zip, put only 2023-05-01_report.zip in the replace-specific-string box, so the reports/ folder prefix stays fixed while the file name gets swapped in on each run.