Hello,
I have a workflow that generates an output with more than 2M records. I used "RecordID" and then a formula to create "sheets" in Excel based on record count, in this case 1M per sheet. This report takes 29 min to execute; if I remove the splitting functionality, it executes in 2 min. Do you know any other way to split the data based on the number of records that would run faster?
Any help would be much appreciated.
Thanks,
Sara
Hi @saralisset
Excel has a hard limit of 1,048,576 rows per sheet.
You can try using a Block Until Done tool before writing each 1M-record sheet to see whether it affects performance.
You can also try enabling the AMP Engine to see if performance improves.
Hope this helps.
As a reminder - Excel is not a database; it's a spreadsheet. There is never a case for outputting 2M rows to Excel.
Having said that: add a Record ID tool, then create a partition field with a formula like ceil([RecordID]/1000000). Add this column to your filename so that you have a fully formed filename like "myincrediblelargeexcelfile_"+ToString([partition])+".xlsx|sheet1", and use this field in your Output Data tool under "change entire path". Your error may be in your selection in the Output Data tool - your file naming convention seems reasonable.
NOTE - I'd use a number below 1,000,000, say 700,000, because performance really starts to drag on larger Excel files.
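For reference, here is a minimal sketch of the same partition-and-split logic outside Alteryx, using pandas. The input filename, output filenames, and the 700,000 chunk size are just illustrative placeholders:

```python
import pandas as pd

CHUNK = 700_000  # rows per output file; stay under Excel's 1,048,576-row sheet limit

df = pd.read_csv("source_data.csv")                   # hypothetical input
df["RecordID"] = range(1, len(df) + 1)                # what the Record ID tool adds
df["partition"] = (df["RecordID"] - 1) // CHUNK + 1   # same result as ceil([RecordID]/CHUNK)

for part, chunk in df.groupby("partition"):
    # one file per partition, each with a single sheet (requires openpyxl)
    chunk.drop(columns=["partition"]).to_excel(
        f"myincrediblelargeexcelfile_{part}.xlsx", sheet_name="sheet1", index=False
    )
```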
Did I mention that Excel is not a database and you should not be storing your data there? Perhaps you can look at a nice CSV file - or JSON?
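If CSV works for whoever consumes the output, the same split is both simpler and much faster to write - a rough sketch, again with placeholder filenames:

```python
import pandas as pd

CHUNK = 1_000_000
df = pd.read_csv("source_data.csv")  # hypothetical input
for part, start in enumerate(range(0, len(df), CHUNK), start=1):
    df.iloc[start:start + CHUNK].to_csv(f"output_part_{part}.csv", index=False)
```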
Also, your screen grab shows a SQL error. You need a data source named in the ODBC 64-bit administrator and a configured driver. Configure and test it in ODBC 64 and then add it to Alteryx.
Thanks for answering. Your solution is exactly what I did… I'll try with 700k, but I don't think that will help. I debugged it and the performance issue is caused by RecordID. Any other ideas?
Thanks anyway!
Hi - can you share the error message? Assuming you have a normal amount of RAM (16 GB or greater), I highly doubt the issue is RecordID. If performance profiling points at it, turn off AMP. I just generated 100,000,000 rows, ran a Record ID tool, and created a partition - it took 21.6 seconds.
Can you confirm there is only one record in the text input you are appending? Can you share more of your workflow? Are you by any chance running it on a Mac in a VM?
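As a rough, non-Alteryx sanity check of how cheap the record-id-plus-partition step itself is, something like the following runs in a few seconds on an ordinary machine (timings are obviously machine-dependent):

```python
import time
import numpy as np

t0 = time.perf_counter()
record_id = np.arange(1, 100_000_001, dtype=np.int64)  # 100,000,000 row ids
partition = (record_id - 1) // 1_000_000 + 1            # ceil(RecordID / 1,000,000)
print(f"{time.perf_counter() - t0:.1f} s for {record_id.size:,} rows")
```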
Oh no, I misstated - I think the issue is the splitting of data into different tabs. How did you output those 100M records? A CSV file?
Questions:
RAM in my personal computer? Or the server? What would be the minimum required for a report like this?
What is AMP? Where should I turn it off?
How do you create a partition?
Yes, the only thing replaced is the location where the output will be saved.
Running in a Windows VDI.
They should not be different tabs - they must be different files. With file.xlsx|sheet1 ... file.xlsx|sheetn everything goes into one workbook, where each sheet is still capped at the ~1M row limit and the file gets very slow to write - you want file1.xlsx|sheet1 ... filen.xlsx|sheet1 instead.
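To make the naming concrete, a tiny sketch of the path-per-partition idea (the path|sheet string mirrors the convention used above; the names and chunk size are illustrative):

```python
CHUNK = 1_000_000

def output_path(record_id: int) -> str:
    # the partition number goes into the file name; every file gets a single sheet1
    partition = (record_id - 1) // CHUNK + 1
    return f"file{partition}.xlsx|sheet1"

print(output_path(1))          # file1.xlsx|sheet1
print(output_path(2_000_000))  # file2.xlsx|sheet1
```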
Memory and performance in a VDI can be significantly different from a native Windows system - and I'm not sure about your question on Server. Is this being run on a Server or on Desktop? If it is run on a VDI on Desktop, please increase the memory you are allocating to your Windows environment to 16 GB.
For Server, if you mean an EC2/compute instance, that would be the same as a Windows system for this purpose.