Hello,
I'm reading a huge yxdb file (20 GB) with hundred or so columns.
The issue i'm facing is that I only need 2 columns form this entire file. It would be inefficiënt to read all the data and later drop it again.
Is it possible from the start to only read those 2 columns instead of the entire file and NOT use a select tool to deselect the other columns?
Many thanks
Solved! Go to Solution.
I imagine some other process is generating this .yxdb file.
What about having this other process output a yxdb with only these 2 columns? Either that, or you could use .sqlite extension instead of an .yxdb, so you could actually query the file.
Cheers,
what do you mean by use .sqlite extension?
I've never worked with that, so can you explain a bit more?
@MartWClimber a .sqlite file simulates a database file. So, if any other process is outputting to this .yxdb file, you could ask it to change to a .sqlite file, so you would be able to query this file preventing that huge load of data into your wokflow.
Have in mind that sqlite files are slower than yxdb ones.
@MartWClimber it is not possible to read columns from a file.
A file is a structure the resides on a disk where you have access thru a driver. The driver needs to read the full row of data.
On the other hand there are structures like databases or systems where there's a software layer between you and how the data is stored. Usually these technologies uses more complex approaches to store the data in order to send you only the data you want.
In between these 2 approaches I know 2 techniques that can mimic a database behavior with a file. The first was brilliantly brought by @Thableaus that is to use a special file format named sqlite (originally sqlite was a database made to run in mobile phones). This structure accepts SQL language and will bring only the columns you need. The second is to use python libraries that brings only columns you want from a file.
Since both are ways to mimic a database behavior but still having the file stored row by row in your disk, I'm not sure it will make your process faster. It will mostly save some Alteryx memory avoiding unwanted columns to come to your workflow. But it worth trying.
Testing sqlite file format is much easier than python. Then, if you test it, please let us know the results.
@AZuc Can I ask, why doesn't Alteryx let you select data within the inputtool.
part of optimized workflows is that data that isn't used is tossed out as soon as possible. it would be logical for them to add an function within the input tool where you can select what kind of field you want so it doesn't load all the data before tossing them again.
@MartWClimber , you can suggest it in our Ideas forum.
https://community.alteryx.com/t5/Alteryx-Designer-Ideas/idb-p/product-ideas
Alteryx strongly evaluate the most voted ones to include in future versions.