Within Alteryx we have the YXDB format, which is lightning fast 😀
But I wonder what would be faster in the situation below?
Situation:
CSV file 1: 6M+ rows with around 30 columns.
CSV file 2: 20M+ rows with around 15 columns.
What would be faster: loading the CSV files into Alteryx, joining them (these files need to be joined), and then saving the result as a YXDB file,
or converting the CSV files to YXDB first in a separate workflow and then using those as inputs?
In my opinion the second option should be faster, but I'm not sure, because converting to YXDB before running the join workflow is an extra step (extra time).
Also, could the AMP Engine help improve speed?
I'd suggest converting each CSV to YXDB. You'll inevitably end up running your workflow with the joins more than once, so you'll save time by converting (you could cache the inputs, but the cache is lost if you close Alteryx).
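Outside Alteryx, the same trade-off applies to any tool that can stage CSVs into a binary format. Below is a minimal pandas sketch of the "convert once, join many times" pattern, with Parquet standing in for YXDB; it assumes pyarrow is installed, and the file names and join key are hypothetical.

```python
# Analogy only: pandas + Parquet standing in for Alteryx + YXDB.
# File names and the join key below are hypothetical.
import pandas as pd

# Workflow 1 (run once): pay the slow CSV parse a single time.
for name in ("big_20m_rows", "wide_6m_rows"):
    pd.read_csv(f"{name}.csv").to_parquet(f"{name}.parquet")

# Workflow 2 (run many times): read the fast binary files and join.
left = pd.read_parquet("big_20m_rows.parquet")
right = pd.read_parquet("wide_6m_rows.parquet")
joined = left.merge(right, on="key", how="inner")  # hypothetical key column
joined.to_parquet("joined.parquet")
```

The conversion cost is paid once up front, so every later run of the join workflow skips the slow CSV parse, which is exactly why converting first wins when the workflow runs repeatedly.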
I'd use AMP and write to YXDB after the join. The I/O cost of the extra write and the overhead of creating multiple workflows wouldn't be justified by any time saved. I would expect this to still run fast. You can add a Select tool after each Input and update the default 254-size V_String field types to make the join more efficient.
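For readers outside Alteryx, the same field-narrowing idea looks roughly like this in pandas. A minimal sketch only; the column names are made up.

```python
# Analogy only: shrinking oversized default column types before a join,
# much like a Select tool trimming the 254-size V_String defaults.
# Column names are hypothetical.
import pandas as pd

df = pd.read_csv("wide_6m_rows.csv")  # text columns arrive as generic object dtype

# Cast low-cardinality text to category and downcast numerics; the
# smaller in-memory footprint makes the subsequent join cheaper.
df["status"] = df["status"].astype("category")
df["key"] = pd.to_numeric(df["key"], downcast="integer")
```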
cheers,
mark