This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I'm on a team setting up a data warehouse / BI. Full disclosure - I'm the analyst writing queries in alteryx and am not a DBA. We will be using alteryx to both ETL from a transactional database and writing queries / reports. Data size will be around 100 GB consisting of dozens of tables with millions of records.
Having used calgary successfully on a smaller scale, I'm wondering if it is a viable option to use as our database? What would be the limitations of calgary vs a redshift / hadoop / oracle setup? I like calgary, because I'm trying to find a database solution that is more along the lines of self-service, instead of developer-heavy.
I have just started to use Calgary files due to the large rows of data it can hold.
I have created it so it takes a snapshot of data from one of our databases, then add today's date in a field, then each time you run it it appends to it, so we can see what changes etc. seems to be working okay would you not recommend to use a calgary file this way? if so why? and would you recommend any other formats I could use? looking at 700k rows of data each day, which in turn by end of the project could end up 200mill rows+
Have attached screenshot of how I am doing the append by using the Union tool
I'd been looking at a very similar solution myself, wanted to use Calgary for storing a dataset of ~3 million records retrieved from an external source, but then running a delta query on that source and appending the results to the Calgary DB in a very similar way.
I'd be interested to know how you got on with this solution long term and did you encounter any issues?
yes seem to work fine with no major issues, final rows was around 120 Million, file was about 26GB.
still not sure if its the best way of doing it, but it did the job! you also have Alteryx database file format, .yxdb which you can append data in the same way, that should be able to handle 3 million records with no problem.