I have a file with 250 million rows and 60 columns, and it takes a while to load the entire thing. Fortunately, I only need about 6 of those columns, so my first tool is a Select tool to cut out all the unnecessary data. However, is there a way to tell Alteryx that I don't even want it to read the other columns? The file in question is comma-delimited.
Here's my thinking: this is probably not possible, because Alteryx has to parse the file by the comma delimiter, so it needs to read everything anyway to know which elements belong to which columns. Only at that point can I say to keep just these columns (Select tool).
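The questioner's intuition can be illustrated outside Alteryx too. A minimal Python sketch (with a hypothetical in-memory CSV standing in for the big file) shows that a delimited reader still parses every line in full, and column selection only controls what is kept afterwards:

```python
import csv
import io

# Hypothetical sample standing in for the large comma-delimited file.
data = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")

wanted = ["a", "c"]  # the handful of columns actually needed
reader = csv.DictReader(data)
# Every line is still parsed in full; only the wanted columns are
# materialized, which is the same effect as a downstream Select tool.
rows = [{k: row[k] for k in wanted} for row in reader]
print(rows)  # [{'a': '1', 'c': '3'}, {'a': '5', 'c': '7'}]
```

The parsing cost is paid either way; what selection saves is memory and downstream processing, not the read itself.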
It depends: where are the six columns? The CSV input gives you the option to limit a record to, say, 254 bytes. If the six columns sit at the front of the record and their maximum combined length is within, say, 500 bytes, you can set the field length to 500, change the delimiter to null (\0), and set the first row to be read as data. Each record then comes in as a single truncated field: find the Nth comma and delete from there onward, then use a Text to Columns tool, and finally a Dynamic Rename tool to take the first row as headers.
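The truncate-then-split trick above can be sketched in Python. This is a hedged illustration, not the Alteryx tools themselves: it assumes the 6 needed columns are the first 6 on each line and fit within the 500-byte limit, and it uses naive comma-finding (quoted fields containing commas would break it, just as they would break the Alteryx formula version):

```python
MAX_LEN = 500  # emulates the field-length limit on the CSV input
N_COLS = 6     # number of leading columns we actually want

def first_n_fields(line: str) -> list[str]:
    """Keep only the first N_COLS comma-separated fields of a line."""
    line = line[:MAX_LEN]              # truncate like the 500-byte field
    pos = -1
    for _ in range(N_COLS):            # find the Nth comma
        pos = line.find(",", pos + 1)
        if pos == -1:                  # line has N_COLS fields or fewer
            return line.split(",")
    return line[:pos].split(",")       # "Text to Columns" on what remains

row = "id,name,x,y,z,w,junk1,junk2,junk3"
print(first_n_fields(row))  # ['id', 'name', 'x', 'y', 'z', 'w']
```

The point of the trick is that the parser never has to tokenize the trailing 54 columns at all, which is where the time goes on a 250-million-row file.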
But it would be much easier to run a one-time job, even one that takes hours, to convert the data to a .yxdb file and select the needed fields from that. The file only has to be reformatted once, and you avoid the tedium of a slow CSV read on every subsequent run.
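The "convert once, query fast" idea is not specific to Alteryx. As an analogy only (yxdb is Alteryx's own format and is not written here), the same pattern can be sketched with Python's standard-library SQLite: pay the parsing cost once, then later reads pull just the needed columns from the binary store:

```python
import csv
import io
import sqlite3

# Hypothetical small CSV standing in for the big file.
src = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")

# One-time conversion: parse the CSV once into a binary store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c TEXT)")
reader = csv.reader(src)
next(reader)  # skip the header row
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", reader)

# Every later read selects only the needed columns; no CSV
# parsing happens again.
rows = conn.execute("SELECT a, c FROM t").fetchall()
print(rows)  # [('1', '3'), ('4', '6')]
```

The trade-off is the same one the post describes: a long one-time conversion in exchange for fast repeated reads of a small column subset.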
I regularly work with files this size and larger. If you read them often and select smaller sets of data, it's worth reading up on Calgary, Alteryx's indexed database format.
Alteryx ACE & Top Community Contributor
Chaos reigns within. Repent, reflect and reboot. Order shall return.