Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Knowledge Base

Definitive answers from Designer Desktop experts.

Download a Zipped Excel File from an Amazon S3 Bucket

DanC
Moderator
Moderator
Created

Question

Can I read in an Excel file located ina zipped archive file from Amazon S3?

Answer

Unfortunately, this is not an option withinthe Amazon S3 Download Tool, as it only allows you to choose between CSV, DBF and YXDB files. However, this is possible within Alteryx with the use of a simple workflow utilizing a three line batch file,the Run Command Tool (master it here), and the AWS Command Line Interface (CLI).

In order to use the CLI, you must first download it and configure its settings. Please visit this page for information on how to do that. Once that is setup, you simply need to setup the batch file and configure the Run Command Tool.

Step 1

In the first step, you will use a Text Input Tool to write the batch file code. This code will use the CLI to copy the ZIP file from the S3 bucket to a locally accessibledrive. Configure the Text Input Tool as follows:

TextInput.png

Important:

  • Make sure that line 2 points to where your CLI is installed.
  • In line 3, replace "alteryxtest" with the name of your bucket, "ExcelTest.zip" with the name of your ZIP file and enter in the correct location to copy the file to.

Step 2

In the second step, you will use the Run Command Tool to do the following:

  1. Write out the batch file ("Write Source")
  2. Run the batch file created in the previous step ("Run External Program")
  3. Read the file into the workflow ("Read Results")

RunCommand.png

Important:

When entering in the "Read Results" section, your ZIP file will not exist at this point so you cannot simply navigate to and select the file. So, you have two options:

  1. Click on the "Input" button and enter in the full path of where you are copying the ZIP file (found on line 3 of the Text Input tool) along with the file name, a pipe character, and then in brackets, the sheet name. For Example:

    C:\Users\dchapman\ExcelTest.zip|ExcelZIP.xlsx[sheet1] 
  2. Run the workflow once without the "Read Results" section completed in order to copy the ZIP file from the S3 bucket. Then, click on the "Read Results" button and navigate to the ZIP file andchoose the Archive file to read it.

This same workflow can be used to read other archived files as well. However, you will have to make slight adjustments to the "Read Results" section of the Run Command tool. For example, if reading in a CSV file, you would simply include the archived file name. Since a CSV file does not have "sheets", thebracketed sheet name is not needed.

I plan to create a simple macro with a user interfacethat will do the same thing. Once complete, I will post it in the reply section.

Thanks for reading!

Comments
adrianloong
11 - Bolide

It would be good to offer this functionality natively in the product as most users will have zipped files etc that they wish to pull from s3

DanC
Moderator
Moderator

Hi @adrianloong,

 

Thanks for your comment. I agree 100%. If you wouldn't mind, you could always add that suggestion to the Product Ideas section of the Community. If your idea gets a lot of attention, then it will get the attention of our developers.

 

Thanks!

NandoYx
5 - Atom

Hello,

 

It would be interesting to increase the range of file types that the connector can read, for example the Excel file itself.

pankajsharmaswa
5 - Atom

I am trying to use the workaround explained in the post. I am getting the following error about .bat not being valid format.

Error: Run Command (5): The file "C:\Users\XXXX\Desktop\alteryx_cp_gz_file.bat" is not a known format. (2)

 

Is it possible that my my admins have disabled the use of .bat files?alteryx_issues.JPG

 

TeePee
8 - Asteroid

Thank you so much for this!  Using your post and the post to which you reference, I'm attempting something similar: downloading many individually zipped .csv files from a bucket, each with the file format " *.csv.gz

 

I have a few questions, and would be very grateful if someone knows the answer to any of them as I'm getting nowhere slowly :)

  1. Does the approach outlined above work with .gz files as well as .zip files?
  2. Does it matter if there are multiple file types in the bucket?  The bucket contains both unzipped .csv files and zipped .csv.gz files.  In the Read Results pane, if I specify .csv as the file type, I see all the (unzipped) .csv files contained in the bucket when I hit the Refresh button. However, if I specify .gz, I'm not seeing anything at all when I hit Refresh.  (Please see images below to illustrate better what I mean.)
  3. As I need to extract multiple files, I'm trying to use ls (list files) instead of cp (copy).  Am I going down the wrong route?
  4. May I see the configurations for the "Write Source" > "Output..." and "Read Results" > "Input..." screens please?
  5. In the "Read Results" > "Input...", for this example, should it say "ExcelTest.zip" instead of "ExcelZip.zip"?  If not, could you explain what this file is please?
  6. Any other tips on this for a newbie?

Many thanks in advance again for any help at all.

 

csv_ok.pnggz_empty.png

 

 

DanC
Moderator
Moderator

Hi @TeePee!

 

Thanks for checking out my post! Please see my answers below. Thanks!

 

I have a few questions, and would be very grateful if someone knows the answer to any of them as I'm getting nowhere slowly Smiley Happy

  1. Does the approach outlined above work with .gz files as well as .zip files?
    • It should work with .gz files.
  2. Does it matter if there are multiple file types in the bucket?  The bucket contains both unzipped .csv files and zipped .csv.gz files.  In the Read Results pane, if I specify .csv as the file type, I see all the (unzipped) .csv files contained in the bucket when I hit the Refresh button. However, if I specify .gz, I'm not seeing anything at all when I hit Refresh.  (Please see images below to illustrate better what I mean.)
    • It does matter because the Input option under Read Results in the Run Command tool has to be configured to read a specific file type.
  3. As I need to extract multiple files, I'm trying to use ls (list files) instead of cp (copy) in the batch macro.  Am I going down the wrong route?
    • Yes because you are simply listing the files instead of copying the files locally. You could possibly write a batch file that downloads the zip file, unzips it and then using a batch macro, feed the list of file names to an Input tool. You'd likely need to parse the file name so that you could separate the file extension in order to feed that into the batch macro to change the file type as needed for each file. So, this can get a little complicated.
  4. May I see the configurations for the "Write Source" > "Output..." and "Read Results" > "Input..." screens please?
    • 6-11-2019 11-31-18 AM.jpg
  5. In the "Read Results" > "Input...", for this example, should it say "ExcelTest.zip" instead of "ExcelZip.zip"?  If not, could you explain what this file is please?
    • Yes! You are correct!
  6. Any other tips on this for a newbie?

Many thanks in advance again for any help at all.

 

You're welcome! Hope this helps!

TeePee
8 - Asteroid

Thank you so much @DanC for the speedy reply.  I will attempt to download a single .csv.gz and then try batching!  I will let you know if when I get there.

 

 

TeePee
8 - Asteroid

Hi @DanC .  With your kind help I have managed to successfully download a single .csv.gz from an S3 bucket so thank you!  Now I need to work on iterating...

DanC
Moderator
Moderator

Awesome, @TeePee. Good luck with the rest!

batiyota
5 - Atom

Below is what I get when I run the workflow

 

batiyota_1-1668637474762.png

 

batiyota
5 - Atom

@Danc Below is what I get when I run the workflow

batiyota_0-1668637630090.png