Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer and Intelligence Suite.
SOLVED

Calgary Limitations?

macd279
7 - Meteor

Hello,

 

I'm on a team setting up a data warehouse / BI. Full disclosure - I'm the analyst writing queries in alteryx and am not a DBA. We will be using alteryx to both ETL from a transactional database and writing queries / reports. Data size will be around 100 GB consisting of dozens of tables with millions of records.

 

Having used calgary successfully on a smaller scale, I'm wondering if it is a viable option to use as our database? What would be the limitations of calgary vs a redshift / hadoop / oracle setup? I like calgary, because I'm trying to find a database solution that is more along the lines of self-service, instead of developer-heavy.

 

Thanks for your help.

5 REPLIES 5
MarqueeCrew
20 - Arcturus
20 - Arcturus

@macd279,

 

I have used calgary data files and can offer you some insights.  Here are a few quick bullets, for more insights you can PM me and I'll be glad to chat more in detail.

 

  • Calgary files are STATIC.  You can't insert, update or delete records.  Typically, you'll have a large data build once a period.
  • When loading a calgary file, if you have high selectivity fields indexed that you don't need then don't build them.
  • Calgary joins or inputs work fastest on smaller sets of data.  If you intend to lookup large percentages of your data (say 20% or more), then you might find that using a standard input/join is faster.
  • Calgary tends to use more memory.
  • Calgary data needs to be cleansed so that you don't index on fields with non-displayable ascii characters (e.g. Chinese).
  • I've seen some issues with advanced indexes where I've used "contains" logic in the calgary join.
  • With many fields of data, naming conventions are very important.

With great power comes great responsibility.

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
macd279
7 - Meteor

Thanks Mark! This clears up a lot.

Craig_Curran
6 - Meteoroid

Hi,

 

I have just started to use Calgary files due to the large rows of data it can hold.

 

I have created it so it takes a snapshot of data from one of our databases, then add today's date in a field, then each time you run it it appends to it, so we can see what changes etc. seems to be working okay would you not recommend to use a calgary file this way? if so why? and would you recommend any other formats I could use? looking at 700k rows of data each day, which in turn by end of the project could end up 200mill rows+ 

 

Have attached screenshot of how I am doing the append by using the Union tool 

AndrewPowell
7 - Meteor

Hi Craig,

 

I'd been looking at a very similar solution myself, wanted to use Calgary for storing a dataset of ~3 million records retrieved from an external source, but then running a delta query on that source and appending the results to the Calgary DB in a very similar way.

 

I'd be interested to know how you got on with this solution long term and did you encounter any issues?

 

Kind regards,

Andy

Craig_Curran
6 - Meteoroid

Hi Andrew,

 

yes seem to work fine with no major issues, final rows was around 120 Million, file was about 26GB.

 

still not sure if its the best way of doing it, but it did the job!  you also have Alteryx database file format, .yxdb which you can append data in the same way, that should be able to handle 3 million records with no problem. 

 

Thanks

Craig 

Labels