

Importing large files into Alteryx

ssubramanian
8 - Asteroid

Hi,

 

I have an input file of size 37 GB. The import was only 38% complete after 9 hours, so I killed the process. Is there a more efficient way of importing, or a way to split the file before importing?

ups366
8 - Asteroid

Hi @ssubramanian,

 

1- What type of file is it? A database, CSV, TXT, etc.?

2- If you can, reduce the volume of data before the file is saved.

 

ssubramanian
8 - Asteroid

Hi @ups366,

 

It's a pipe-delimited text file, extracted from the client's environment.

ups366
8 - Asteroid

@ssubramanian,

 

1- You can try the batch script below from cmd (set n to the number of lines you want in each output file):

 

rem Windows 7 SP1 and above. Save as a .bat file; the %% escaping below only works inside a batch script.
rem Set n to the number of lines per output file.
set n=1000000
rem Streams the file line by line into 1.txt, 2.txt, ..., starting a new file every %n% lines.
rem Note: >> reopens the output file for each line, so this will be slow on a 37 GB file.
powershell -c "$n=1;$m=1;gc 'your_pipefilename.txt'|%%{$f=''+$m+'.txt';$_>>$f;if($n%%%n% -eq 0){$m++};$n++}"
pause

 

2- Or use the 'split' command, if you have a Unix-style toolset such as Git Bash or Cygwin available (split is not a native cmd command):

split -b 3000m your_filename.txt

Note that -b splits by byte count and can cut a record in half; 'split -l <lines>' splits on line boundaries instead, which is safer for a delimited file.

KaneG
Alteryx Alumni (Retired)

Hi @ssubramanian,

 

That read time still seems slow. Is the file local or on a network drive?

 

You can try bringing in part of the file at a time using the Input Data tool's "Record Limit" and "Start Data Import on Line" options:

 

[Screenshot: Chunk_delimited_file.png - Input Data tool configuration showing the Record Limit and Start Data Import on Line options]

 

 

ssubramanian
8 - Asteroid

Thanks @ups366. I will try this and let you know.

ssubramanian
8 - Asteroid

Hi @KaneG,

 

Thank you for the suggestion. I am importing the file from a remote server into Alteryx (installed on my local desktop). I do not know how many records the file contains, only that it is 37 GB, so I am not sure how many runs it would take to read it in parts. I will try this method with the first 50 million records.

Claje
14 - Magnetar

Since this file is being pulled from a remote server, I could definitely see a file of this size taking longer to read, although 9+ hours is still a long time.


Two quick notes:

You could potentially build an iterative macro that loops through your file, reading a batch of records on each pass until the whole file is read, using the configuration mentioned by @KaneG.

 

You could also see some performance improvement by copying the file in its entirety to a local drive before reading it in. At the very least, this would show you how quickly this volume of data reads locally, which separates network transfer time from parse time.
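
If you want to script that local copy, here is a minimal PowerShell sketch (the UNC path and destination below are hypothetical placeholders, not from this thread):

# Copy the 37 GB extract to a local drive before reading it in Alteryx.
# Both paths are hypothetical - substitute your own.
Copy-Item -Path '\\remoteserver\extracts\your_pipefilename.txt' -Destination 'C:\Temp\'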

ssubramanian
8 - Asteroid

Thank you for all the suggestions. I tried @KaneG's method, but the workflow fails on a missing end quote around the 5 millionth record. I need a way to split the file without parsing it, since that record with the missing quote breaks the read.
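
One way to do that split outside Alteryx is a variant of @ups366's script that uses .NET stream readers/writers, so the file is streamed once and nothing is parsed for quotes or delimiters. This is a minimal sketch, not a tested solution: the file paths and the 5-million-line chunk size are placeholders, and full paths are used because .NET resolves relative paths against the process directory rather than the PowerShell location.

# Split on line boundaries only - no quote or delimiter parsing,
# so the record with the missing end quote cannot break the split.
$linesPerChunk = 5000000
$reader = New-Object System.IO.StreamReader('C:\Temp\your_pipefilename.txt')
$chunk  = 1
$count  = 0
$writer = New-Object System.IO.StreamWriter("C:\Temp\chunk_$chunk.txt")
while ($null -ne ($line = $reader.ReadLine())) {
    $writer.WriteLine($line)
    $count++
    if ($count % $linesPerChunk -eq 0) {
        # Close the current chunk and start a new output file.
        $writer.Close()
        $chunk++
        $writer = New-Object System.IO.StreamWriter("C:\Temp\chunk_$chunk.txt")
    }
}
$writer.Close()
$reader.Close()

Each chunk ends on a line boundary, so whichever chunk contains the bad record can be isolated and repaired on its own.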

KaneG
Alteryx Alumni (Retired)
You could read it with no delimiter (\0), so each record comes in as a single text field and the missing quote cannot cause a parse error, and then split it inside Alteryx using the Text to Columns tool.