
Alteryx Designer Desktop Discussions

SOLVED

Loading data from S3 File to Redshift Table

Amra
6 - Meteoroid

Hello!

 

I'm working on a process in which I have a file (.csv) available in S3, and I have to load the file data into an existing Redshift table. I looked into a few resources and was able to read the data from the S3 file using the "Amazon S3 Download" tool. As a next step, I'm trying to load the data into the Redshift table through the Output Data tool.

 

I noticed there are two options for me here: single file load and bulk load. When I use single file load, the workflow keeps running indefinitely. When I use bulk load, it takes some time but loads the data into the table. Since I'm reading a single file from S3, I was expecting single file load to work. As a newbie, I may be doing something wrong here and would like to know the correct way to load S3 file data into a Redshift table.

 

Thanks!

 

 

5 REPLIES
michael_treadwell
ACE Emeritus

When you download data from S3 using the S3 Download tool, you are grabbing data from S3 and bringing it into Alteryx for processing. Then, when you load that data into Redshift, you are writing it from Alteryx back into Redshift. From here you have two options:

 

  1. Create a standard connection to Redshift, which is what I assume you mean when you say 'Single File Load'. If I am incorrect, please correct me.
  2. Write to Redshift using the Bulk Connection.

Option 1 will write data from Alteryx into your Redshift table using an INSERT command for each row. That is why your workflow keeps running and running, especially if you have a lot of data. Option 2 will use COPY from S3 to Redshift, which is why the Redshift Bulk Connection requires AWS credentials and an S3 bucket in its configuration. Option 2 takes the data from Alteryx, writes it to S3, and then loads it into Redshift. Unless you are dealing with a small amount of data, the bulk connection should be more efficient.
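
To make the difference concrete, here is a rough sketch of what the two options boil down to on the Redshift side. The table, bucket, and role names below are placeholders, and the exact statements Alteryx generates under the hood may differ:

    -- Option 1 (standard connection): effectively one INSERT per record
    INSERT INTO my_schema.my_table (col_a, col_b) VALUES ('value', 1);
    -- ...repeated for every row, which is why large loads crawl

    -- Option 2 (bulk connection): data is staged to S3, then loaded in a single COPY
    COPY my_schema.my_table
    FROM 's3://your-staging-bucket/alteryx_staging.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/YourRedshiftRole'  -- or CREDENTIALS with access keys
    CSV;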

 

Does that answer your question?

 

If you are using Alteryx for the sole purpose of moving data from S3 to Redshift, you might want to have a look at the COPY documentation, since Alteryx is an unnecessary middleman in this specific case: http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html

If you still want to use Alteryx for this job, you might look into using the Run Command Tool to accomplish your COPY.
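
For example, a script the Run Command Tool could hand to a SQL client (psql or another command-line client that can reach your cluster) might contain a single COPY statement along these lines; the table name, bucket path, credentials, and CSV options are placeholders to adapt to your setup:

    -- load_from_s3.sql: hypothetical script executed by a SQL client launched from the Run Command Tool
    COPY my_schema.my_target_table
    FROM 's3://my-bucket/incoming/data.csv'
    CREDENTIALS 'aws_access_key_id=<YOUR_KEY>;aws_secret_access_key=<YOUR_SECRET>'
    CSV
    IGNOREHEADER 1       -- skip the header row in the .csv
    REGION 'us-east-1';  -- region where the source bucket lives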

Amra
6 - Meteoroid

Thank you!

 

You understood the problem correctly, and you explained the bulk load nicely.

 

For my problem, option #2 is the better choice, as record counts will be high in my case. I tried this option and it worked.

 

 

Nitish_Salve
7 - Meteor

Hi Michael,

 

It would be very helpful if you could provide more details on how to accomplish this using the Run Command Tool.

raghavenderallam
5 - Atom

In step 2, if I use the bulk connection, it saves to S3 and not to Redshift.

How does it move the data into Redshift after saving it to S3?

apathetichell
19 - Altair

That mapping should be set up by your Redshift admin (i.e., your Redshift cluster should live on S3 data, and you should be updating that data). I looked at this a while ago, and my memory is that there is no way to upload to Redshift in batches without an S3 bucket; otherwise it goes line by line. This was an AWS situation, not an Alteryx situation. It was a huge pain for me.
