

Reading large files with Python tool

dschneider1
7 - Meteor

How should one go about reading a large file via the Python tool?

Using Alteryx.read("#1") results in a memory error. Normally, I would read the file line by line to avoid this, but I am not sure how to do that when reading from the Alteryx connection object.


Thanks,

-David

6 REPLIES
OldDogNewTricks
10 - Fireball

How large is the file?

What is the actual error that you receive?

Here is a 'hacky' solution: split the data into multiple chunks.


PythonDFMergeExample.jpg
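
The screenshot is not reproduced here, so what follows is only a guess at the idea: a minimal sketch, assuming the workflow splits the data upstream and wires each piece to its own Python tool input ("#1", "#2", ...). The names `combine_connections` and `read_connection` are hypothetical. Inside the Python tool you would pass `Alteryx.read` as `read_connection`; a stand-in reader is used below so the snippet runs outside Alteryx.

```python
import pandas as pd


def combine_connections(read_connection, names):
    """Read each named input separately, then stack the pieces into one DataFrame.

    read_connection is any callable that maps a connection name to a DataFrame.
    Assumption: inside the Alteryx Python tool this would be Alteryx.read, with
    one input connection per chunk.
    """
    frames = [read_connection(name) for name in names]
    return pd.concat(frames, ignore_index=True)


# Stand-in for Alteryx.read so the sketch runs anywhere (hypothetical data).
fake_inputs = {
    "#1": pd.DataFrame({"id": [1, 2], "value": [10.0, 20.0]}),
    "#2": pd.DataFrame({"id": [3], "value": [30.0]}),
}
combined = combine_connections(fake_inputs.__getitem__, ["#1", "#2"])
print(len(combined))
```

Note that the combined frame still has to fit in memory; splitting only keeps each individual read smaller.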

dschneider1
7 - Meteor

About 12GB. That seems like the best solution as of now.

OldDogNewTricks
10 - Fireball

Have you tried the proposed solution?

Did it work for you?

I still propose that you share more about the actual errors and create a ticket with Alteryx so they know about the limitation/error.

I'm not sure whether the error comes from the Python virtual environment running out of space, Jupyter notebooks, Alteryx, or somewhere in between. It also depends on your machine: if you only have 8 GB of RAM, then that is obviously a problem.

dschneider1
7 - Meteor

Yes, I am using something similar to the proposed solution, in that I am batching out the data and reading it in via separate connections. The issue is that the layout of Alteryx forces me to read the whole file into memory at once, unless there is a way to index connection objects that I am not aware of. I would run into the same issue if I did the same thing in any other Python environment; it is simply bad practice. Normally, I would avoid this by reading the file line by line, but given that I can only work with the single connection object, I am not sure how to do that within Alteryx. At this point I will likely just write Python code to do it correctly and execute it via Run Command.
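
For the standalone route mentioned at the end of this post, here is a minimal sketch of line-by-line processing using only the standard library. The function name `filter_rows`, the column names, and the sample file are illustrative assumptions, not part of the original workflow; a script like this would be launched from the Run Command tool like any other Python script.

```python
import csv
import os
import tempfile


def filter_rows(src_path, dst_path, column, keep):
    """Copy rows whose `column` value passes `keep`, one row at a time.

    Only the current row is held in memory, so the input size does not matter.
    Returns the number of rows written.
    """
    written = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if keep(row[column]):
                writer.writerow(row)
                written += 1
    return written


# Small demonstration file (hypothetical data standing in for the large input).
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, "input.csv")
dst_file = os.path.join(workdir, "output.csv")
with open(src_file, "w", newline="") as f:
    f.write("id,amount\n1,5\n2,50\n3,500\n")

rows_kept = filter_rows(src_file, dst_file, "amount", lambda v: int(v) >= 50)
print(rows_kept)
```

The filtered output file can then be picked up by a regular Input Data tool, so the full dataset never has to pass through a single in-memory object.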

dschneider1
7 - Meteor

Here is how I would do it in pandas, since that is most closely aligned with how Alteryx handles data:

import pandas as pd
# Read the file in 1,000,000-row chunks, then concatenate them into one DataFrame
reader = pd.read_csv("LARGEFILE", sep=",", chunksize=1000000)
master = pd.concat(chunk for chunk in reader)
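
Concatenating every chunk still ends with the full table in memory. When the end goal is an aggregate or a filtered subset, each chunk can be reduced before the next one is read. This is a minimal sketch, assuming a CSV with hypothetical `region` and `sales` columns; an in-memory `StringIO` stands in for the large file, and the tiny `chunksize` is only for demonstration.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the large file.
csv_data = io.StringIO(
    "region,sales\n"
    "east,10\n"
    "west,20\n"
    "east,30\n"
    "west,40\n"
    "east,50\n"
)

# Reduce each chunk to a small per-region subtotal before reading the next one.
partials = []
for chunk in pd.read_csv(csv_data, chunksize=2):
    partials.append(chunk.groupby("region")["sales"].sum())

# Combine the subtotals; only these small Series are kept, not the raw rows.
totals = pd.concat(partials).groupby(level=0).sum()
print(totals.to_dict())
```

With a real file, replace `csv_data` with the file path and pick a `chunksize` that comfortably fits in the available RAM.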

vijaysuryav93
7 - Meteor

Is there any solution to this memory issue? I face a similar issue with my 8M records. Sometimes the workflow runs fine on Alteryx Server/Gallery, and sometimes it fails with a memory error. I believe it has something to do with the RAM of the Alteryx Server machine, but I wanted to ask whether anyone here has found a solution.
