Hi,
I am a fairly advanced Designer Desktop user, and my organisation has started the journey of migrating, over time, to Designer Cloud. At the moment we are lacking some key tools and functionality for the migration, so we will work in a hybrid setup with Cloud and Server until Cloud is fully developed.
All the learning paths focus on the Trifacta side of things rather than Designer Cloud.
My question is: what is the recommended way of working in Designer Cloud? Normally I build my Designer Desktop flows step by step, run the flow to see the outcome of each step, and decide which tool I need next based on that outcome. The difference in Designer Cloud is that it works instantly on sample data and shows values after each tool, but only for part of the data. To fully run the flow I need to attach an Output tool. I can't really wrap my head around how I am supposed to work.
For example, say I want to look at all the categories in a product dimension table. Since I work in an explorative way, I would normally add a Summarize tool and group the data on the category column, but now I only get a few of the categories, because the summary is computed on the sample rather than the full dataset.
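Just to make the problem concrete, here is a rough pandas sketch (the table and numbers are made up, it is only an analogy for what the sampled preview does) of how a random sample can hide categories in a group-by:

```python
import pandas as pd

# Hypothetical product dimension table: three common categories
# and one rare one that a small sample will probably miss.
products = pd.DataFrame({
    "product_id": range(1000),
    "category": ["Hardware"] * 450 + ["Software"] * 450
                + ["Services"] * 95 + ["Licenses"] * 5,
})

# Grouping the FULL table finds all four categories.
print(products.groupby("category").size())

# Grouping a 5% random sample (roughly what a sampled preview does)
# will usually miss the rare "Licenses" category.
sample = products.sample(frac=0.05, random_state=1)
print(sample.groupby("category").size())
```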
Has anyone who understands my question figured out a good method of building out flows in Designer Cloud?
Looking forward to the community's input on this,
JD
Hi @Jensdanerhall86,
Really good question. As a disclaimer: I come from the Trifacta side of things, with very little experience in Designer Desktop. So I look forward to the additional comments of wiser users!
Running the entire flow after every step (or change to a step!) may not scale well with respect to Cloud compute costs, especially for large datasets. Therefore, Designer Cloud uses the in-browser Photon client to dynamically update an up-to-10-MB chunk of the data with the result of each transformation (Note: if the dataset is <= 10 MB, then the entire dataset is shown).
In the Trifacta paradigm, the solution to the problem of exploring data >10 MB is to periodically collect and load a new sample. This is a proper job, but the result is cached so that the in-browser updates from that point on remain performant. There are various sample types, including Random, Filtered, and Stratified (the last of which sounds like it would capture your use case). It looks like there is a comparable approach in Designer Cloud (Designer Experience) called the Sample tool (see the attached screenshot). While the sampling types are not yet(?) as rich as in Designer Cloud (Trifacta Classic), perhaps varying the parameters in this tool will enable you to satisfactorily explore your data as you go.
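To illustrate what a stratified-style sample buys you, here is a hypothetical pandas sketch (not the actual Designer Cloud implementation, just the idea): by sampling within each category, every category is guaranteed to show up in the preview, even the rare ones.

```python
import pandas as pd

# Made-up table with one rare category ("Licenses") that a plain
# random sample of the same overall size would probably drop.
products = pd.DataFrame({
    "category": ["Hardware"] * 450 + ["Software"] * 450
                + ["Services"] * 95 + ["Licenses"] * 5,
})

# Stratified-style sampling: take up to 10 rows from EVERY category,
# so rare categories are guaranteed to survive into the preview.
parts = [grp.sample(n=min(10, len(grp)), random_state=0)
         for _, grp in products.groupby("category")]
preview = pd.concat(parts)

print(preview["category"].value_counts())  # all four categories present
```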
Other options that come to mind are to...
I hope that this is helpful.
Put another way - Designer allows the developer to see results from the universe of their data. The underlying assumption is that the data is fairly small (i.e. less than a few million rows) and that processing time is limited first and foremost by the physical limitations of your system (RAM/processor) and your allocation. Designer Cloud operates on the assumption that the data you are building your process for is huge and/or changes often. It is not stateful, and you do not pay for a dedicated warehouse/machine that would allow in-memory processing of whole files. This is similar to competitors/semi-competitors which operate via a workflow/recipe model: after each step, you would need to write a new file if you want to see the output.
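For a sense of that recipe-style pattern, here is a rough pandas sketch (the file names are made up for illustration): each step is a transformation, and you write an intermediate file whenever you want to see the output.

```python
import pandas as pd

# Recipe-style pattern: no dedicated machine holds the whole dataset
# in memory, so each step's result is materialized to a file when you
# want to inspect it.
raw = pd.read_csv("products.csv")                     # hypothetical input

step1 = raw.dropna(subset=["category"])               # step 1: clean
step1.to_csv("step1_clean.csv", index=False)          # write to inspect

step2 = step1.groupby("category").size().reset_index(name="row_count")
step2.to_csv("step2_summary.csv", index=False)        # write to inspect
```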
I personally think Designer Cloud should try to be more like Alteryx and less like Trifacta - and require dedicated compute resources.