Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
ned_blog

A variety of questions have come up from Alteryx users that all have the same answer:

 

  • The Alteryx Gallery will only allow me to upload about 100MB of data, how do I upload more?
  • I need to change my data daily/hourly/monthly for my Gallery module, how do I do that without re-uploading my entire app every time?
  • How do I persist data from run to run in a Gallery module?
  • How do I share data in Alteryx Desktop with coworkers/clients/partners without sending huge files around?

 

The answer to all of these questions is to use the Amazon S3 tools. What is S3?  Amazon says it is:

Amazon S3 is storage for the Internet. It is designed to make web-scale computing easier for developers.

Amazon S3 provides a simple web-services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web.

 

In short, it is an always-available, very fast, and relatively cheap place to put your data.  I use S3 for various things on my blog and for my job, and my bill has never been more than $3 in a month.  I think that not many people use S3 in Alteryx because they think it is difficult and expensive.  I am going to answer all the above questions, but first, let’s get started with S3.

 

Getting started with S3

I am going to make it super easy to evaluate S3…  I am going to give my readers my S3 login to use within Alteryx. You can download the modules associated with this article here.  Please read the caveats below.

 

Creating your own account is not very hard.  Go to S3 and sign up for an account.  Once set up, go to Security Credentials.  Presuming you are the only user of this account, click “Continue to Security Credentials” on the scary-looking warning.  Now go to “Access Keys” and “Create New Access Key.”  You will want to “Show Access Key” and copy/paste it somewhere for reference, since it is difficult to get back later.  That is it: put those two long strings of text (the Access Key and the Secret Key) into the S3 tools and you are off and running.

 

Using S3 within Alteryx

Once you have your AWS Access Key & AWS Secret Key entered into the S3 upload or download tool, it acts just like a regular input or output tool.  The list of formats is limited (for technical reasons), but in reality, YXDB is the best format to use here anyway.  It is compressed, so it will reduce the network traffic to/from Amazon, and it has no record or field type limits.  There is no problem writing multi-GB files: I have tested up to 10GB, and I am certain that it will handle much larger files, limited of course by your bandwidth.  One of the cool things about it is that it is streaming.  When it reads the 1st record, it pushes it to the next tool in the chain while continuing to download in the background; it doesn’t wait for the whole file to finish downloading.  Similarly on writing, it uploads in background threads, so the upload can start before the module finishes.  When using S3 in modules published to the Gallery you get fantastic speeds, because the Gallery is also hosted by Amazon.  There are also no bandwidth charges going to the Gallery, since it is all within Amazon’s network.
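
The streaming behavior described above is essentially a producer/consumer pattern.  Here is a minimal Python sketch of that idea.  It is not how the Alteryx tools are actually implemented: a background thread stands in for the download and hands records to the consumer through a bounded queue, so downstream processing starts before the "download" ends.  The names fake_download and stream_records are hypothetical, and the download is simulated with an in-memory list rather than a real S3 request.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def fake_download(records, out_queue):
    """Simulated download (assumption: no real S3 call is made here)."""
    for record in records:
        out_queue.put(record)  # blocks if the consumer falls too far behind
    out_queue.put(_DONE)


def stream_records(records, buffer_size=2):
    """Yield records as they arrive while the 'download' runs in the background."""
    q = queue.Queue(maxsize=buffer_size)
    worker = threading.Thread(target=fake_download, args=(records, q), daemon=True)
    worker.start()
    while True:
        item = q.get()
        if item is _DONE:
            break
        yield item
    worker.join()


# Downstream "tool": handles each record as soon as it is available.
processed = [r.upper() for r in stream_records(["a", "b", "c"])]
print(processed)
```

The bounded queue is the design choice worth noting: it keeps the background thread from running arbitrarily far ahead of the tool that consumes the records.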

 

The Alteryx Gallery will only allow me to upload about 100MB of data, how do I upload more?

Simply run a module on your own machine that writes data to S3 using the Amazon S3 Upload tool.  The module you are uploading to the Gallery will have an Amazon S3 Download tool.  Since the data is not in your package, it doesn’t count against the 100MB limit.

 

I need to change my data daily/hourly/monthly for my Gallery module, how do I do that without re-uploading my entire app every time?

This is super easy.  Simply schedule a module to run on your own computer updating that file on S3 at whatever interval you want.  The Gallery will always read the latest version of it.

 

How do I persist data from run to run in a Gallery module?

You can read from and write to the same file within S3 in the same module.  As with any file, make sure to use a Block Until Done tool when reading & writing the same file.  Be careful not to mess up your data though: there aren’t any automatic backups.
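
The point of Block Until Done is ordering: the read has to finish completely before the write starts overwriting the same file.  A rough Python sketch of that ordering follows, using a local temporary file as a stand-in for the S3 object.  The update_counter helper and the JSON layout are made up for illustration and are not part of Alteryx or S3.

```python
import json
import os
import tempfile


def update_counter(path):
    """Read the previous state fully, then overwrite the same file with new state."""
    # Step 1: finish reading before any write begins (the Block Until Done step).
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            state = json.load(f)
    else:
        state = {"runs": 0}  # first run: nothing persisted yet

    # Step 2: only now overwrite the file with the updated state.
    state["runs"] += 1
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    return state


# Two consecutive "runs" against the same (local, stand-in) file.
with tempfile.TemporaryDirectory() as tmp:
    state_file = os.path.join(tmp, "state.json")
    first = update_counter(state_file)
    second = update_counter(state_file)
    print(first["runs"], second["runs"])
```

Each run picks up where the previous one left off, which is all that persisting data from run to run means here.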

 

How do I share data in Alteryx Desktop with coworkers/clients/partners without sending huge files around?

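At this point, this is the easy one.  Write the data to S3 once with the Amazon S3 Upload tool, and anyone who has the credentials can read it with the Amazon S3 Download tool.  This also makes it so you don’t have to bug your IT department for a file server or backups, etc…  Let Amazon handle all that for you.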

 

Where do I find that module to try it out?

You can download it here and use my account, but read the caveats below.

 

Using Ned’s S3 Account

Obviously S3 has the potential to cost some real money and I don’t want this to be abused, so there are a few caveats.  Inside you will find Amazon S3 tools that have my credentials in them.  You can copy and paste these tools all you want.  I am not giving the secret key out other than this, so you will only be able to access this data from within Alteryx.  If you browse the buckets, you will see a few, but these credentials will only get you access to the bucket named AlteryxNed_Public.  Anything you write to this bucket will be readable and over-writable by any of the other readers of this blog.  If you have something super secret, you will need to get your own account.  I might delete data that has been around for a while if it looks like no one is using it.  And finally, if it starts costing too much, I reserve the right to cancel this login.  Caveats aside, this should make it super easy to start evaluating S3 for use within Alteryx.

 

Thanks for reading,

 

ned.

 

This post originally appeared at http://inspiringingenuity.net/2014/01/16/alteryx-amazon-s3-gallery/

Comments
rstanton

Hi Ned,

 

Given that the S3 connector is streaming, does that mean that there are multiple GET or PUT requests being executed, and thus uploading/downloading a single file will not equate to 1 GET or PUT request?

 

I'm curious because I've uploaded about 3 files to my bucket but my number of PUT requests is 44.

 

Thanks,

Ryan