
Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Optimization

Meteoroid

I have a .yxdb input for my workflow. It has 9 million plus rows and 20 plus columns, of which I only need about 10. Is there a tool or sequence of tools that will make the workflow run faster? It currently takes about 40 minutes, and I am looking to speed up this automated process.

Bolide

Hello @CAHkdg24,

 

What is the process that creates the yxdb file? Can it be simplified or altered to create a file with just the records you need? That would help in the sense that you wouldn't be writing and then reading the large file; you would just create a separate, smaller file at the same time the large file is being created.

Alteryx Certified Partner

Hi @CAHkdg24 ,

 

I'd start by using the Select tool to eliminate all the fields you don't need.
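
If it helps to see the idea outside of Designer, here is a minimal pandas sketch of the same column-pruning principle. It assumes the data were exported to a CSV (pandas can't read .yxdb directly) and uses made-up file and column names:

```python
import pandas as pd

# Hypothetical export and column names, just to illustrate the principle:
# read only the ~10 columns you actually use, the way a Select tool placed
# right after the Input tool drops everything else before any heavy work.
NEEDED_COLS = ["order_id", "order_date", "region", "amount"]

df = pd.read_csv("big_extract.csv", usecols=NEEDED_COLS)
```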

If you post a copy of the workflow, I can take a deeper look and maybe we can help.

Best

Meteoroid

Unfortunately, the yxdb file is created from a restricted direct connection that only one person has access to, and that person no longer has the capacity to make modifications since they have moved on to a different business unit.

Alteryx Certified Partner

Hi @CAHkdg24 

We can still work with the actual YXDB.

What I asked for was a copy of your workflow (the one that uses that .yxdb as an input), so we can review whether there's something we can do to shorten the processing time.

Meteoroid

I tried adding a Select tool, and also filtered the data down to just 2019.

Nebula

Hi @CAHkdg24,

 

Unless you're reading across a VERY slow network, opening a db with 9M rows should only take a few seconds. I created one with 9M rows and 20 columns, stored it on my network drive, which is 300 miles away (8 ms of lag), and it still opens within 5 seconds.

 

The 40-minute run time is more likely to depend on what your workflow does once the data is loaded. Any chance you can share your workflow? To echo @aguisande, we can't optimize what we can't see.

 

Dan

Meteoroid

Below is a screen grab of the workflow

 

 

 

WF Snip.PNG

Nebula

Hi @CAHkdg24 

 

Well, we got the picture. I can't read any of it, but I'm sure that's the Community server's fault and not yours.

 

It looks pretty straightforward from what I can see. Joins and summaries can take a long time, so make sure you filter out as much as you can beforehand.
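
For example, filtering before a join means the join only has to touch the rows that survive. Here's a rough pandas sketch of that ordering, with invented file and column names since I can't see your actual data:

```python
import pandas as pd

# Hypothetical inputs, for illustration only.
facts = pd.read_parquet("facts.parquet")        # the big 9M-row extract
lookup = pd.read_parquet("customers.parquet")   # a small dimension table

# Filter FIRST (e.g. down to 2019), then join: the merge now only processes
# the surviving rows instead of all 9 million. Assumes order_date is a
# datetime column.
facts_2019 = facts[facts["order_date"].dt.year == 2019]
joined = facts_2019.merge(lookup, on="customer_id", how="left")
```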

 

There's one avenue you might consider, but it only works if your data and workflow support it. I know from your comments that your big data file contains time series data. If your historical data is static and your workflow doesn't do any large-time-frame averaging (e.g. comparing the current month with the average of all data), you can pre-calculate and store your historical results. That way your workflow becomes (see the sketch below):

 

1. Process the current data, i.e. this week, month, year, or whatever is new.

2. Union it with the pre-calculated results for the prior data.

3. Continue from there.

 

Of course you'll have to schedule something to periodically reprocess the old data, adding in new data as it becomes available, but that can be done offline.
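
In rough terms (a pandas sketch with invented file and column names, not your actual workflow), the pattern looks like this:

```python
import pandas as pd

# Hypothetical files: the historical summary is produced once, offline,
# and only the current period is recomputed on every scheduled run.
history_summary = pd.read_parquet("historical_summary.parquet")

current = pd.read_parquet("current_month.parquet")
current_summary = current.groupby("customer_id", as_index=False)["amount"].sum()

# Union the fresh slice with the stored history and continue from there.
result = pd.concat([history_summary, current_summary], ignore_index=True)
```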

 

 

Dan

 


Asteroid

I have also found that on certain datasets, adding an Auto Field tool early on optimizes file size, since it chooses more appropriate field types than the ones your database may have stored the data as. This might be helpful in addition to the other suggestions here.
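
Outside of Designer, the analogous trick in pandas (just a sketch with an invented file name, since Auto Field itself is an Alteryx tool) is to downcast numeric columns and convert low-cardinality text columns to categoricals:

```python
import pandas as pd

df = pd.read_csv("big_extract.csv")  # hypothetical export of the .yxdb

# Roughly the same idea as Auto Field: pick tighter types than the source
# gave you, which shrinks memory use and downstream processing.
for col in df.select_dtypes(include="number").columns:
    if pd.api.types.is_integer_dtype(df[col]):
        df[col] = pd.to_numeric(df[col], downcast="integer")
    else:
        df[col] = pd.to_numeric(df[col], downcast="float")

for col in df.select_dtypes(include="object").columns:
    if df[col].nunique() < 0.5 * len(df):  # only low-cardinality text
        df[col] = df[col].astype("category")
```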
