This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Can I read in an Excel file located in a zipped archive file from Amazon S3?
Unfortunately, this is not an option within the Amazon S3 Download Tool, as it only allows you to choose between CSV, DBF and YXDB files. However, this is possible within Alteryx with the use of a simple workflow utilizing a three line batch file, the Run Command Tool (master it here), and the AWS Command Line Interface (CLI).
In order to use the CLI, you must first download it and configure its settings. Please visit this page for information on how to do that. Once that is setup, you simply need to setup the batch file and configure the Run Command Tool.
In the first step, you will use a Text Input Tool to write the batch file code. This code will use the CLI to copy the ZIP file from the S3 bucket to a locally accessible drive. Configure the Text Input Tool as follows:
Make sure that line 2 points to where your CLI is installed.
In line 3, replace "alteryxtest" with the name of your bucket, "ExcelTest.zip" with the name of your ZIP file and enter in the correct location to copy the file to.
In the second step, you will use the Run Command Tool to do the following:
Write out the batch file ("Write Source")
Run the batch file created in the previous step ("Run External Program")
Read the file into the workflow ("Read Results")
When entering in the "Read Results" section, your ZIP file will not exist at this point so you cannot simply navigate to and select the file. So, you have two options:
Click on the "Input" button and enter in the full path of where you are copying the ZIP file (found on line 3 of the Text Input tool) along with the file name, a pipe character, and then in brackets, the sheet name. For Example:
Run the workflow once without the "Read Results" section completed in order to copy the ZIP file from the S3 bucket. Then, click on the "Read Results" button and navigate to the ZIP file and choose the Archive file to read it.
This same workflow can be used to read other archived files as well. However, you will have to make slight adjustments to the "Read Results" section of the Run Command tool. For example, if reading in a CSV file, you would simply include the archived file name. Since a CSV file does not have "sheets", the bracketed sheet name is not needed.
I plan to create a simple macro with a user interface that will do the same thing. Once complete, I will post it in the reply section.
Thanks for reading!
Amazon offers many different services that can be accessed from the AWS Command Line Interface (aka the AWS CLI). Alteryx workflows can take advantage of these services using the AWS CLI from the Run Command tool.
Some of the common uses of the Amazon CLI within Alteryx include:
Using S3 with IAM roles
Uploading/Downloading file types not currently supported by the S3 Upload / Download tools
Interacting with Athena
As an example of using the AWS CLI, let’s build a process that can take a local file and copy to S3, using an AWS IAM profile. Here are the steps involved:
Review the documentation for the service. In this example, we're using the "S3" service, but Amazon has many different AWS services.
Install the AWS CLI on any machines that will need to execute the AWS functions
From Windows Command Line, create a working command
(Optional step to make calling the CLI easier in the future) – Add reference to the AWS CLI executable to Window Environment variables via My Computer > Right Click > Properties > Advanced System Settings > Environment Variables > System Variables > Select “Path” > Edit > (Add the path to the directory of your installed “aws.exe” file, separated by a semicolon. Do not delete/modify any pre-existing values in the Path variable.) > OK > OK
Per the AWS documentation, create a local profile using your IAM role. The local profile references your credentials so that you don't have to pass the credentials in every command.
Test the following command in your Windows command prompt, making sure to use a valid local file to copy, and a valid destination location on your S3, and a valid profile name:
s3 cp "C:\temp\mylocalfile.csv" "s3://mylocation/mydestinationfilename.csv" --profile MyIAMUser
Once you've validated your command in the command prompt, transfer it to an Alteryx workflow
In a new workflow, pull in a “Run Command” tool from the “Developer” category
Configure the Run Command tool using the working command:
Test the above
As an optional step, we can wrap this functionality into a macro to make it easy to update, and to support batch uploads
Add a Control Parameter for “Local File Path” and configure its action to update the specific string value “C:\temp\myfile.csv”
Add a Control Parameter for “Destination File Path” and configure its action to update the specific string value “s3://mybucket/myfiletest1.csv”
Add a Text Box and configure its action to update the command value “aws” in case the user hasn’t placed the CLI exe in their PATH variables
The example above is attached in the 11.3 workflow, SaveToS3_usage.yxzp.
As currently designed, the Amazon S3 Download tool only allows one file, or object, to be read in at a time. This article explains how to create a workflow and batch macro that will read in the list of objects in a bucket and allow you to filter for the file(s) you want using wildcards!