Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Reading large files with Python tool

Meteor

How should one go about reading a large file via the Python tool?

 

Using Alteryx.read("#1") results in a memory error. Normally, I would read the file line by line to avoid this, but I am not sure how to do so within the syntax of reading from the Alteryx object.

 

Thanks,

-David

How large is the file?

 

What is the actual error that you receive?

 

Here is a 'hacky' solution: split the data into multiple chunks.

 

[Attached image: PythonDFMergeExample.jpg]
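In code, the chunked approach from the screenshot might look roughly like the sketch below. In the Python tool, each chunk would come from `Alteryx.read("#1")`, `Alteryx.read("#2")`, and so on; here a plain dictionary stands in for those connections so the sketch runs outside Designer, and the splitting upstream is assumed.

```python
import pandas as pd

def read_chunks(read_fn, connections):
    """Read each incoming connection as a DataFrame and stack them.

    Inside the Python tool, read_fn would be Alteryx.read and
    connections would be the tool's input anchors, e.g. ["#1", "#2"].
    """
    frames = [read_fn(name) for name in connections]
    return pd.concat(frames, ignore_index=True)

# Stand-in for Alteryx.read so the sketch runs outside Designer.
fake_inputs = {
    "#1": pd.DataFrame({"id": [1, 2]}),
    "#2": pd.DataFrame({"id": [3, 4]}),
}
merged = read_chunks(fake_inputs.__getitem__, ["#1", "#2"])
print(len(merged))  # 4
```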

Meteor

About 12GB. That seems like the best solution as of now.

Have you tried the proposed solution?

 

Did it work for you?

 

I still propose that you share more about the actual errors and create a ticket with Alteryx so they know about the limitation/error.

 

I'm not sure if the error is a result of the Python virtual environment running out of space, Jupyter notebooks, Alteryx, or somewhere in between. It also would depend on your machine: if you only have 8 GB of RAM, then that is obviously a problem.

Meteor

Yes, I am using something similar to the proposed solution, in that I am batching out data to read in via separate connections. The issue is that I am trying to read the whole file into memory at once given the layout of Alteryx, unless there is a way to index connection objects that I am not aware of. I would run into the same issue doing this in any other Python environment; it is simply bad practice. Normally, I would avoid it by reading the file in line by line, but since I am only able to work with the single connection object, I am not sure how to do that within Alteryx. At this point I will likely just write the code in Python to do it correctly and execute it via Run Command.
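For reference, the line-by-line pattern described above can be sketched with the standard-library csv module. An in-memory stream stands in for the real file handle; the per-row work (here, counting rows and summing one illustrative column) would be whatever reduction the workflow actually needs.

```python
import csv
import io

# Process a large delimited file one row at a time instead of loading
# it all at once; only the running totals stay in memory.
sample = io.StringIO("id,value\n1,10\n2,20\n3,30\n")  # stand-in for open("LARGEFILE")
reader = csv.DictReader(sample)
row_count = 0
total = 0
for row in reader:
    row_count += 1
    total += int(row["value"])
print(row_count, total)  # 3 60
```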

Meteor

Here is how I would do it in pandas, since that is most closely aligned with how Alteryx handles data:

 

import pandas as pd

# Read the file in 1,000,000-row chunks, then stack them into one frame.
reader = pd.read_table("LARGEFILE", sep=',', chunksize=1000000)
master = pd.concat(chunk for chunk in reader)
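One caveat: `pd.concat` still assembles the full result in memory, so for a 12 GB file the chunked reader only helps if each chunk is reduced before combining. A hedged sketch of per-chunk aggregation, using an in-memory stream and illustrative column names in place of the real file:

```python
import io

import pandas as pd

# Aggregate each chunk as it streams in, keeping only the small
# per-chunk results; combine those instead of the raw rows.
csv_data = io.StringIO("key,value\na,1\nb,2\na,3\nb,4\n")  # stand-in for LARGEFILE
reader = pd.read_csv(csv_data, chunksize=2)
partials = [chunk.groupby("key")["value"].sum() for chunk in reader]
totals = pd.concat(partials).groupby(level=0).sum()
print(totals.to_dict())  # {'a': 4, 'b': 6}
```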
