How to output 2 million records in Excel
Hello,
I have a workflow that generates an output with more than 2M records. I used a RecordID tool and then a formula to create "sheets" in the Excel file based on record count, in this case 1M records per sheet. This report takes 29 minutes to execute; if I remove the splitting functionality, it executes in 2 minutes. Do you know any other way to split the data based on record count that would run faster?
Any help would be much appreciated.
Thanks,
Sara
Labels: Dynamic Processing, Workflow
---
Hi @saralisset
Excel has a hard limit of 1,048,576 rows per sheet.
You can try using a Block Until Done tool on the 1M-records-per-sheet writes to see whether performance improves.
You can also try enabling the AMP engine to see if performance improves.
Hope this helps.
---
As a reminder - Excel is not a database. It's a spreadsheet. There is never a good case for outputting 2M rows to Excel.
Having said that: add a Record ID tool, then create a partition using something like CEIL([RecordID]/1000000). Add this column to your file name so that you have a fully formed path like "myincrediblelargeexcelfile_" + ToString([partition]) + ".xlsx|sheet1", then use this field in your Output Data tool under "Change Entire File Path". Your error may be in that selection in your Output Data tool; your file-naming convention itself seems reasonable.
NOTE - I'd use a number below 1,000,000, say 700,000, because performance really starts to drag on larger Excel files.
Did I mention that Excel is not a database and you should not be storing your data there? Perhaps you can look at a nice CSV file - or JSON?
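For readers outside Alteryx, here is a minimal pandas sketch of the same partition-and-write idea. The DataFrame contents and file names are placeholders, not taken from the original workflow:

```python
import pandas as pd

# Toy stand-in for the workflow's 2M-record output; column names are illustrative.
df = pd.DataFrame({"RecordID": range(1, 2_000_001)})

rows_per_file = 700_000  # stay well below the 1,048,576-row sheet cap, per the note above

# Write each partition to its own workbook:
# myincrediblelargeexcelfile_1.xlsx, myincrediblelargeexcelfile_2.xlsx, ...
for i, start in enumerate(range(0, len(df), rows_per_file), start=1):
    chunk = df.iloc[start:start + rows_per_file]
    chunk.to_excel(f"myincrediblelargeexcelfile_{i}.xlsx",
                   sheet_name="sheet1", index=False)
```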
Also, your screen grab shows a SQL error. You need a data source named in ODBC64 and a configured driver: configure and test it in ODBC 64, then add it to Alteryx.
---
Thanks for answering. Your solution is exactly what I did… I'll try with 700k, but I don't think that will help. I debugged it, and the performance issue comes from RecordID. Any other ideas?
Thanks anyway!
---
Hi - can you share the error message? Assuming you have a normal amount of RAM (16 GB or more), I highly doubt the issue is RecordID. If performance profiling says it is, turn off AMP. I just generated 100,000,000 rows, ran Record ID, and created a partition; it took 21.6 seconds.
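As a rough analogue of that test outside Alteryx (array size and timing here are illustrative, not the poster's actual benchmark), numbering rows and computing the partition is cheap:

```python
import time
import numpy as np

t0 = time.perf_counter()
record_id = np.arange(1, 100_000_001, dtype=np.int64)  # 100M sequential row IDs
partition = (record_id - 1) // 1_000_000 + 1           # integer form of CEIL(id / 1,000,000)
print(f"partitioned {len(record_id):,} rows in {time.perf_counter() - t0:.1f} s")
```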
---
Can you confirm there is only one record in the text input you are appending? Can you share more of your workflow? Are you by any chance running it on a Mac in a VM?
---
Oh no, I misstated: I think the issue is the splitting of the data into different tabs. How do you output those 100M records? A CSV file?
Questions:
- RAM in my personal computer, or the server? What would be the minimum required for a report like this?
- What is AMP? Where should I turn it off?
- How do you create a partition?
---
Yes, that part is just replaced by the location where the output will be saved.
---
Running in a Windows VDI.
---
They should not be different tabs - they must be different files. With file.xlsx|sheet1 ... file.xlsx|sheetN everything lands in one workbook, where each sheet is capped at roughly 1M rows and performance degrades badly. You want file1.xlsx|sheet1 ... fileN.xlsx|sheet1 instead.
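To make the distinction concrete, here is a pandas sketch of the two patterns; the file names and data are placeholders:

```python
import pandas as pd

# Illustrative partitions; in practice these come from splitting the big dataset.
chunks = [pd.DataFrame({"x": range(0, 3)}), pd.DataFrame({"x": range(3, 6)})]

# One workbook, many sheets (file.xlsx|sheet1 ... file.xlsx|sheetN):
# everything lands in a single workbook, which is what to avoid here.
with pd.ExcelWriter("file.xlsx") as writer:
    for i, chunk in enumerate(chunks, start=1):
        chunk.to_excel(writer, sheet_name=f"sheet{i}", index=False)

# Many workbooks, one sheet each (file1.xlsx|sheet1 ... fileN.xlsx|sheet1):
# the pattern recommended above.
for i, chunk in enumerate(chunks, start=1):
    chunk.to_excel(f"file{i}.xlsx", sheet_name="sheet1", index=False)
```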
Memory and performance in a VDI can differ significantly from a native Windows system. I'm not sure about your question on Server - is this being run on Alteryx Server or on Designer? If it's running in a VDI on the desktop, increase the memory you allocate to your Windows environment to 16 GB.
If by Server you mean an EC2/Compute instance, treat it the same as a native Windows system for this purpose.
