Alert: There is a planned Community maintenance outage October 16th from approximately 10 - 11 PM PST. During this time the Alteryx Community will be inaccessible. Thank you for your understanding!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Slow Performance from Calgary Database

schaferc
7 - Meteor

Hi, I'd like to learn under better under what circumstances a Calgary database is expected to outperform a .yxdb.

 

I am working with a large dataset of 64 million records with several dozen columns. I'm getting inconsistent performance.

 

In this first example my Calgary database loads slower than a .yxdb despite applying a pre-filter (on Brand Name, which has about 1,250 distinct values) that eliminates about 97% of the records. This is more surprising given that my Calgary database was created using a subset of columns from the .yxdb. Run times:

Calgary: 2:59 minutes

.YXDB: 1:13 minutes

 

calgary diadora.PNG

yxdb diadora.PNG

However, in my second example, my Calgary database outperforms when I apply a pre-filter to a different field (Canadian province, about 10 distinct values).

Calgary: 0:17 minutes

.YXDB: 1:24 minutes

 

 

calgary ab.PNG

yxdb ab.PNG

Any tips and advice appreciated, thanks!

3 REPLIES 3
CharlieS
17 - Castor
17 - Castor

Hi @schaferc 

 

I think the biggest component here is the indices the Calgary file was built with. When the Calgary file is created with the Calgary Loader tool, there are settings to control which fields and what types of indices are created. These field indices are what make the selective loading/joining of the Calgary records possible. Fun fact, yxdb files support field indices indices, but only on the spatial fields (and are created by default). That's why performing a spatial match with a yxdb file as the universe is so fast.

 

If you don't have access to the workflow that created the indices, you can browse to the Calgary file folder and see what .cyidx files are present. I assume there isn't an index on the Brand Name field. Recreate the Calgary with that index and let us know how that goes. 

schaferc
7 - Meteor

Hi Charlie,

 

I do have an index for Brand Name field. 

 

schaferc_0-1648564877801.png

 

 

In a bizarre twist, I am getting better performance loading my entire Calgary file compared to applying the pre-filter. Loading the entire Calgary file (i.e., no pre-filters) had a load time of 0:53 min compared to 1:59 min when I set Brand Name is 'Diadora' (~3% of records). I will recreate the Calgary database, as well as set the index to high sensitivity (currently Auto). Will share the results shortly.  

 

schaferc_1-1648564902052.png

schaferc_2-1648565226133.png

 

Thanks.

schaferc
7 - Meteor

Update. No improvements from recreating the Calgary database and selecting high sensitivity for Brand Name. I do see very good performance from applying pre-filters to other fields like Province. Perhaps the performance is better because Province is being treated as a list to choose from, whereas Brand Name is string-based without a selectable list. 

 

But the fact that I get worse performance applying a Calgary pre-filter compared to loading the whole file bewilders me.

schaferc_3-1648568174497.png

 

Labels