I have a yxdb input for my workflow; there are 9 million plus rows with 20-plus columns, of which I only need about 10. Is there a tool or sequence of tools that would make the workflow run faster? It takes about 40 minutes to run, and I am looking to speed up this automated process.
What is the process that creates the yxdb file? Can it be simplified or altered to create a file with just the records you need? This would simplify it in the sense you wouldn't be writing and reading the large file, just creating a separate file while the large file is also being created.
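The idea above — write a slim companion file with only the needed columns while the big extract is being produced — can be sketched in plain Python. This is a hypothetical illustration, not the actual extract process: the column names, the CSV format, and the in-memory "file" are all placeholders.

```python
# Hypothetical sketch: while producing the large extract, also emit a
# slim file containing only the ~10 columns the workflow actually uses.
# Column names and the CSV format are invented for illustration.
import csv
import io

NEEDED = ["id", "date", "amount"]           # the subset actually used

source_rows = [
    {"id": 1, "date": "2024-01-01", "amount": 10, "extra": "x", "note": "n"},
    {"id": 2, "date": "2024-01-02", "amount": 20, "extra": "y", "note": "m"},
]

slim = io.StringIO()                         # stands in for the separate file
writer = csv.DictWriter(slim, fieldnames=NEEDED, extrasaction="ignore")
writer.writeheader()
writer.writerows(source_rows)                # columns outside NEEDED are dropped

print(slim.getvalue().splitlines()[0])       # header: id,date,amount
```

The point is only that dropping unused columns at write time means the downstream workflow never pays to read them.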
Unfortunately, the yxdb file is created from a restricted direct connection that only one person has access to, and they no longer have the capacity to make modifications since moving to a different business unit.
Unless you're reading across a VERY slow network, opening a yxdb with 9M rows should only take a few seconds. I created one with 9M rows and 20 columns, stored it on my network drive 300 miles away (8 ms latency), and it still opens within 5 seconds.
The 40 minute run time is more likely to be dependent on what your workflow does once the data is loaded. Any chance you can share your workflow? To echo @aguisande, we can't optimize what we can't see.
Well, we got the picture. Can't read any of it, but I'm sure that's the Community server's fault and not yours.
It looks pretty straightforward from what I can see. Joins and summaries can take a long time, so make sure you filter out as much as you can beforehand.
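"Filter before you join" is worth seeing concretely. A minimal Python sketch with made-up data — the lists, keys, and the EU/US filter are all invented for illustration; in the workflow this would be a Filter tool placed upstream of the Join:

```python
# Hypothetical illustration: filtering rows *before* a join keeps the
# join's working set small. Data and column names are made up.
orders = [
    {"id": 1, "region": "EU", "amount": 100},
    {"id": 2, "region": "US", "amount": 250},
    {"id": 3, "region": "EU", "amount": 75},
]
customers = {1: "Alice", 2: "Bob", 3: "Carol"}

# Filter first: only the EU rows ever reach the join step.
eu_orders = [o for o in orders if o["region"] == "EU"]

# Then join (here a simple dict lookup) on the reduced set.
joined = [{**o, "name": customers[o["id"]]} for o in eu_orders]
print(len(joined))  # 2 rows joined instead of 3
```

At 9M rows, cutting the input before the expensive tools is where most of the runtime usually goes.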
There's one avenue you might consider, but it only works if your data and workflow support it. I know from your comments that your big data file contains time series data. If your historical data is static and your workflow doesn't do any large-time-frame averaging (e.g. comparing the current month with the average of all data), you can pre-calculate and store the results for your historical data. That way your workflow becomes:
1. Process the current data, i.e. this week, month, year, or whatever is new.
2. Union with the pre-calculated results for prior data.
3. Continue from there.
Of course, you'll have to schedule something to periodically reprocess the old data, folding in new data as it becomes available, but that can be done offline.
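The three steps above can be sketched as a small Python example. Everything here is invented for illustration — the `summarize` function stands in for the heavy workflow logic, and the historical summary would really come from a stored file rebuilt on a schedule:

```python
# Hypothetical sketch of the incremental pattern: historical results are
# pre-aggregated once and stored; each run processes only the new period,
# then unions the two. All names and data are invented.

def summarize(rows):
    """Aggregate amounts per key (stands in for the heavy workflow logic)."""
    out = {}
    for key, amount in rows:
        out[key] = out.get(key, 0) + amount
    return out

# Step 1: process only the current period's data.
current_rows = [("A", 10), ("B", 5)]
current_summary = summarize(current_rows)

# Step 2: load the pre-calculated historical summary (rebuilt offline
# on whatever schedule suits the data).
historical_summary = {"A": 100, "C": 40}   # would come from a stored file

# Step 3: union / combine and continue from there.
combined = dict(historical_summary)
for key, amount in current_summary.items():
    combined[key] = combined.get(key, 0) + amount
print(combined)  # {'A': 110, 'C': 40, 'B': 5}
```

The win is that the expensive aggregation only ever runs over the small new slice, not all 9M rows.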
I have also found that on certain datasets, adding an Auto Field tool early on reduces file size, since it chooses more appropriate field types than those your DB may have stored the data as. This might be helpful in addition to the other suggestions here.
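The Auto Field idea — inferring the narrowest type that fits the data — can be shown with a toy Python comparison. The column values and sizes below are fabricated for illustration only; Auto Field does the equivalent type-narrowing inside the workflow:

```python
# Hypothetical illustration of the Auto Field idea: storing a numeric
# column in a narrow fixed-width type instead of per-object strings
# shrinks the data substantially. Values are made up.
from array import array
import sys

values = [str(i % 100) for i in range(10_000)]      # column arriving as strings
as_strings = sum(sys.getsizeof(v) for v in values)  # per-object string cost

# "Auto Field" equivalent: every value fits in a signed 16-bit integer.
as_int16 = array("h", (int(v) for v in values))

print(as_int16.itemsize)                            # 2 bytes per value
print(as_int16.itemsize * len(as_int16) < as_strings)
```

Smaller field widths mean less data moving through every downstream tool, which is why it can help runtime as well as file size.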