
Alteryx Designer Desktop Discussions


Best practices for processing a flat file?

lnguyen
8 - Asteroid

Hi, I wonder if anyone has good suggestions for processing flat files. Every quarter I process a large flat file from a data vendor. The file is over 40 GB, with repeating text values in certain columns. There are 50+ columns, and about a third of them contain data. I can't upload the workflow or the data here for various reasons, but I will try to explain my process in as much detail as I can.

 

My process is painfully slow and I am looking for a way to speed it up. Normally it takes me 3 hours to "clean" the data before I can start processing it. In the cleaning process, I input the entire file, replace dimension fields with numeric values (using a reference table), remove as many string fields as I can, apply Auto Field to the remaining text columns (mostly unique IDs), and convert value fields from text to numeric before outputting everything to an Alteryx database (.yxdb) that waits for the next step. This staging step is necessary because restarting the whole process would take a large amount of resources and time.
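For anyone who wants to see the same cleaning steps outside Alteryx, here is a minimal Python sketch (not an Alteryx workflow). It assumes a pipe-delimited file with a header row; the delimiter, the column names, the `codes` reference-table dict, and `clean_rows` are all hypothetical. It streams one row at a time, so the whole 40 GB file is never held in memory, and it drops unneeded string columns simply by not copying them.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# Hypothetical column lists; the real vendor file has 50+ columns.
DIMENSION_COLS = ["region"]     # repeating text values -> numeric codes
KEEP_TEXT_COLS = ["record_id"]  # unique IDs kept as text
VALUE_COLS = ["amount"]         # numbers stored as text -> float


def clean_rows(reader: Iterable[Dict[str, str]],
               codes: Dict[str, Dict[str, int]]) -> Iterator[Dict[str, object]]:
    """Stream rows one at a time; columns not listed above are dropped."""
    for row in reader:
        out: Dict[str, object] = {}
        for col in KEEP_TEXT_COLS:
            out[col] = row[col]
        for col in DIMENSION_COLS:
            # Look up the numeric code in the reference table;
            # values missing from the table become None.
            out[col] = codes[col].get(row[col])
        for col in VALUE_COLS:
            # Empty strings become None instead of raising ValueError.
            out[col] = float(row[col]) if row[col] else None
        yield out


# Usage with a tiny in-memory sample (a real run would open the file instead).
ref = {"region": {"North": 1, "South": 2}}
sample = "record_id|region|comment|amount\nA1|North|x|10.5\nA2|South|y|\n"
rows = list(clean_rows(csv.DictReader(io.StringIO(sample), delimiter="|"), ref))
print(rows)
```

Because `clean_rows` is a generator, the output side (a database insert or a compact file writer) can consume rows as they are produced, which keeps memory flat regardless of file size.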

 

Once the cleanup process is done and the first temp file (the Alteryx database) is created, I have a few other workflows that process the data and normalize it before storing it in its final location. The data is then used to create reports, Tableau data sources, etc. in the next process.

 

A screenshot of my current workflow is shown below for reference. I appreciate any suggestions you may have, and thank you all in advance for pitching in.

 

Best,

 

LT

 

[Image: workflow.jpg, screenshot of the current workflow]

 

JohnJPS
15 - Aurora

Hi @lnguyen,

 

One potential speed-up would be to get rid of the Auto Field tool. If you know what the field types are, just use a Select tool to force them to the correct types.

 

(Auto Field is great for small files and for general analysis, but by the time you get to production, you hopefully know the required type for each field, in which case there is no need to have Alteryx analyze a large file.)
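To make the trade-off concrete, here is a minimal Python sketch (not Alteryx code) contrasting the two approaches. `SCHEMA`, `apply_schema`, `infer_type`, and the column names are hypothetical, and `infer_type` is only a simplified model of type detection, not Alteryx's actual Auto Field logic. The point it illustrates: an auto-detect step has to examine every value in a column before it can choose a type, which on a 40 GB file typically means an extra pass over (or buffering of) the whole dataset, while a fixed schema converts each value once as it streams past.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical production schema: types are known up front, so each
# value is converted once as the row streams past (the "Select" approach).
SCHEMA: Dict[str, Callable[[str], object]] = {
    "record_id": str,
    "region_code": int,
    "amount": float,
}


def apply_schema(row: Dict[str, str]) -> Dict[str, Optional[object]]:
    """Cast each column with its known type; empty strings become None."""
    return {col: (cast(row[col]) if row[col] != "" else None)
            for col, cast in SCHEMA.items()}


def infer_type(values: List[str]) -> Callable[[str], object]:
    """Simplified auto-detect: test every non-empty value of a column
    before picking the narrowest type (int, then float, then str)."""
    def all_parse(cast: Callable[[str], object]) -> bool:
        try:
            for v in values:
                if v != "":
                    cast(v)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return int
    if all_parse(float):
        return float
    return str


print(apply_schema({"record_id": "A1", "region_code": "3", "amount": "2.5"}))
print(infer_type(["1", "2", ""]).__name__, infer_type(["1", "2.5"]).__name__)
```

Note that `infer_type` cannot return anything until it has seen the last value, which is why detection tends to be the expensive part on large inputs.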

 

Hope that helps, at least a little.

 

 - John

lnguyen
8 - Asteroid

Thanks. I suspected that, but I thought that reducing the field sizes with Auto Field would speed things up. Evidently, it does not work out in this case. Thanks again.

 

LT
