Hey,
I'm currently working with a large amount of data that originally comes from SQL and that I prepare myself in many steps, down to around 370k rows with 120 columns. At the end I want to write the data to an .xlsx file for later use. I heard that .xlsb is also useful for big data because it reduces memory usage, so I tried both.
I've now experienced the following:
Can anyone explain why Alteryx has such problems with .xlsb, and are there any workarounds? Using an Analytic App is a must-have, but a runtime of this magnitude is not acceptable.
Thanks & best regards,
Mathias
Hey @MathiasL,
I don't think the .xlsb file is your problem. Your write times are already extremely high for a relatively small dataset.
For me, writing an exemplary 175 MB dataset to .xlsx (and running the rest of the workflow) takes only 12 seconds.
Are you writing the files to a network drive or something similar? Your write speed may be limited there.
Hey @FrederikE ,
thanks for replying.
Perhaps I was a little imprecise in the description. The times refer to the runtime of the entire workflow.
All the methods I tried and mentioned above were equally fast until the data was written to the files. From this I deduced that the problem must be the file format.
Hey @MathiasL,
I see.
The problem might be that Alteryx uses the Microsoft Access Driver to access .xlsb files. That's not a great fit, since this driver is meant for accessing databases, which is why Alteryx is quite slow at writing .xlsb files.
Two suggestions from my side:
1. Split the workflow into two parts. First write your normal flow to a .yxdb, then use that file to write the .xlsb. This way Alteryx doesn't try to write while it is still processing the data. (A Block Until Done tool before the output should achieve the same thing.)
2. Use a different file type. Excel files are sadly not very compatible with anything outside the Microsoft environment (and often not even within it), and .xlsb is even worse. A .csv will be written much faster, and end users can still open it in Excel.
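To illustrate the second point: a .csv is just a flat text stream, so there is no workbook structure to build in memory, which is one reason it writes so much faster than .xlsx/.xlsb (zipped XML or binary workbooks). This is not Alteryx-specific; here is a minimal stdlib Python sketch with made-up column names and sample rows standing in for your 370k x 120 dataset:

```python
import csv
import tempfile
import time
from pathlib import Path

def write_csv(path, header, rows):
    """Stream rows straight to disk as plain text -- no workbook to assemble."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)  # consumes the iterable row by row

# Hypothetical stand-in for the dataset from the post (smaller sample here).
header = [f"col_{i}" for i in range(120)]
rows = ([i] * 120 for i in range(10_000))  # generator: nothing held in memory

out = Path(tempfile.gettempdir()) / "export_demo.csv"
start = time.perf_counter()
write_csv(out, header, rows)
print(f"wrote {out.name} in {time.perf_counter() - start:.2f}s")
```

An .xlsx writer, by contrast, has to build worksheet XML and zip it before anything reaches disk, which is where the extra time goes.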
Thanks for your tips. I already have a Block Until Done tool in my workflow before the data is written, because the data goes into two separate tabs of the file; without it, I get an error from writing two different streams to one file at the same time.
A .csv file is not an option, because formatting the data is also a must-have, and .csv doesn't store formatting information.
Have you considered exporting to .csv and using Power Query to bring the data into Excel? The data would be stored in the .csv, while the formatting remains in an Excel template.
Yes, but in my opinion this is not a good solution, because the user then needs an extra step before seeing the final product.
And in my case I would need two .csv files, because I currently have two tabs of data.