Hi,
I have an input file of size 37 GB. The import was 38% complete in 9 hours and I killed the process. Is there a more efficient way of importing or is there a way we can split the files before importing?
Dear ssubramanian, hello!
1- What type of file is it? A database export, CSV, TXT, etc.?
2- If you can, split the file into smaller pieces before saving it.
1- You can try the batch script below, which calls PowerShell from cmd (set n to the number of lines you want in each output file):
rem Requires Windows 7 SP1 or later; save and run as a .bat file
rem n = number of lines per output file
set n=1000000
powershell -c "$n=1;$m=1;gc 'your_pipefilename.txt'|%%{$f=''+$m+'.txt';$_>>$f;if($n%%%n% -eq 0){$m++};$n++}"
pause
2- Or use the Unix 'split' command (available on Windows via Git Bash, Cygwin, or WSL):
split -b 3000m your_filename.txt
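One caveat: `-b 3000m` splits on raw byte boundaries, so a record can be cut in half mid-line. If the downstream tool expects whole records, splitting by line count with `-l` is safer. A minimal sketch on a hypothetical sample file (swap in your real file name and a much larger line count, e.g. 1000000):

```shell
# Create a small sample file standing in for the real 37 GB input.
printf 'rec1\nrec2\nrec3\nrec4\nrec5\n' > sample.txt
# -l splits on whole lines, so no record is cut in half;
# output files are named part_aa, part_ab, part_ac, ...
split -l 2 sample.txt part_
# Show the line counts of the resulting chunks.
wc -l part_*
```

Because `split` only counts lines, the chunks concatenate back to the original file byte-for-byte.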
Hi @ssubramanian,
That read time still seems slow. Is the file local or on a network drive?
You can try bringing in part of the file at a time using the "Record Limit" and "Start Data Import on Line X" options.
Thanks @ups366 . I will try this and let you know.
Hi @KaneG,
Thank you for the suggestion. I am trying to import the file from a remote server into Alteryx (installed on my local desktop). I do not know how many records are in the file, only that it is 37 GB, so I am not sure how many times I would have to run the workflow in parts. I will try this method with the first 50 million records.
Since this file is being pulled from a remote server, I could definitely see a file of this size taking longer to read, although 9+ hours is still a long time.
Two quick notes:
You could potentially build an iterative macro that loops through your file, reading batches of records until the whole file is consumed, using the configuration mentioned by @KaneG.
You could also see some performance improvement by copying the file locally in its entirety before reading it in. At the very least, this would let you isolate how much of the 9 hours was network transfer versus local read time.
Thank you for all the suggestions. I tried @KaneG's method, but the workflow fails because there is a missing end quote around the 5 millionth record. I need to split the file without parsing it, since that malformed record breaks the import.
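A line-level split (such as `split -l` mentioned above) ignores quoting entirely, so the malformed record only affects whichever chunk contains it; the rest of the chunks import cleanly. To repair that one chunk, you can locate lines with an unbalanced quote count. A hedged sketch, using a hypothetical sample.csv where line 2 is missing its closing quote:

```shell
# Hypothetical sample: line 2 is missing its closing quote.
printf '1,"ok"\n2,"broken\n3,"ok"\n' > sample.csv
# With -F'"', NF is (number of double quotes on the line + 1),
# so an even NF means an odd, i.e. unbalanced, quote count.
awk -F'"' 'NF % 2 == 0 {print NR": "$0}' sample.csv
```

On the real file this scans every line once without loading it into memory, so it should cope with 37 GB; it reports false positives only if legitimate records can contain lone quotes.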