Alteryx Designer Desktop Discussions

SOLVED

Alteryx seems to be greatly inflating the size of a table.

dasph21
5 - Atom

Hello.

I'm using Alteryx Designer for a workflow that sends a Snowflake table to an S3 bucket as a single file. (I know Snowflake can copy into an S3 bucket directly, but that route has a 5 GB file-size limit, beyond which it splits the output, and I need everything in one file.)
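
For context, the direct unload I'm trying to avoid would look roughly like the sketch below (the stage, database, and table names are just placeholders). SINGLE = TRUE produces one output file, but MAX_FILE_SIZE on S3 tops out at 5 GB, which is the limit I keep hitting:

    COPY INTO @my_s3_stage/export/full_table.csv.gz   -- placeholder stage and path
      FROM my_db.my_schema.my_table
      FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
      SINGLE = TRUE                                    -- one file instead of split parts
      MAX_FILE_SIZE = 5368709120;                      -- 5 GB, the maximum allowed for S3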

The table itself is around 50 GB when compressed in Snowflake, but I know that its uncompressed size is approximately 150-200 GB.

 

However, the workflow I have in Alteryx seems to be greatly inflating that size. The Data Stream Out step currently says it has processed 702 GB.

 

Where is this number coming from? The file is definitely not that size, and the number keeps growing: it has been processing for around 5 hours now, the upload is still at 0%, and the reported stream size is still increasing. How does Alteryx Designer handle these streams? Is it staging the data in a local cache? My current drive only has about 760 GB free, so I'm worried it won't be able to keep up with the reported size.

 

Edit: As pointed out by @Luke_C and @mceleavey, the issue was that some of the fields in the DB had a massive size overhead, resulting in over 500 GB being added to the table size.

7 REPLIES
Luke_C
17 - Castor

Hi @dasph21 

 

What are the sizes and data types of the fields? Try using an Auto Field tool to see if that reduces the size. I've seen this happen when string fields are much longer than needed.

mceleavey
17 - Castor

Hi @dasph21 ,

 

In addition to what @Luke_C mentioned, can you confirm you are extracting a raw table and not a joined combination of tables?

 

M.




dasph21
5 - Atom

I currently can't check the field sizes because the workflow is running, and it's a really simple workflow: just a DB connector > Data Stream Out > Upload to S3.

 

Since everything is being done in-DB except for the S3 step, I don't know whether Alteryx would actually be resizing the fields. If it isn't resizing them, the fields should all be varchar with an optimized length.

dasph21
5 - Atom

It's a view, but it's just a simple SELECT * FROM [path to view].

 

The view is not a combination of tables; it just selects all the fields, except for an ID field that is selected as a hash.

mceleavey
17 - Castor

@dasph21 ,

 

One thing you can try is selecting the individual columns, then going to the SQL editor, which will allow you to put in AS clauses to restrict the data size before it is even extracted.
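
For example, something along these lines in the SQL editor (the column names and lengths here are purely illustrative; size them to your actual data):

    SELECT
        CAST(ID_HASH       AS VARCHAR(64))   AS ID_HASH,        -- hash output is fixed-width
        CAST(CUSTOMER_NAME AS VARCHAR(255))  AS CUSTOMER_NAME,
        CAST(COMMENTS      AS VARCHAR(4000)) AS COMMENTS
    FROM MY_DB.MY_SCHEMA.MY_VIEW;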

 

However, as @Luke_C pointed out, when streaming into Alteryx you can run the data through an Auto Field tool, which restricts each field's length to the length of the longest value it contains.

This tool does have a considerable performance overhead on large datasets, but it will achieve what you're looking for, so if data size is more important than processing time, go for it.

Try it anyway and see whether the time impact is serious.

 

M.




dasph21
5 - Atom

I just checked the field types.

Yup, the connection between the Connect In-DB tool and the Data Stream Out says that every field is V_WString... and on top of that, the first 2 fields have a size of 16777216! No wonder it wasn't working!

 

My question now is: how do I select different data types within Alteryx? Auto Field/Select won't work while the data is in Snowflake, and it seems it would take forever if I actually had to stage the files locally just to change their data types.

 

This is also very odd, since the view in Snowflake doesn't declare those field sizes, and everything is varchar.

mceleavey
17 - Castor

@dasph21 ,

 

Can you connect using the In-DB tools?

If you can do that, then you can amend the sizes of the fields before the Data Stream Out.
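
As a rough sketch (the view and column names below are made up): first check what lengths the view actually declares. A plain VARCHAR in Snowflake defaults to VARCHAR(16777216), which is exactly the size Alteryx is reporting. Then cast the columns down in the Connect In-DB query so the Data Stream Out only carries realistic widths:

    -- check the declared column types and lengths
    DESCRIBE VIEW MY_DB.MY_SCHEMA.MY_VIEW;

    -- then use a query like this in the Connect In-DB tool
    -- (the lengths are guesses; fit them to your data)
    SELECT CAST(FIELD_1 AS VARCHAR(100)) AS FIELD_1,
           CAST(FIELD_2 AS VARCHAR(100)) AS FIELD_2
    FROM MY_DB.MY_SCHEMA.MY_VIEW;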

 

M.



