Why are changes made in early stages/steps of a flow not evaluated right away for use later in the flow when I go to wrangle a later stage? What are the best ways to design flows so users can successfully "refactor" a wrangle at an early stage and see the evaluated results immediately in later steps of the flow? Are there certain scales of data and data types that make evaluation across the steps of a flow work more efficiently?
Hi Tom,
Changes to a recipe do propagate immediately to downstream recipe steps, so they should be reflected when you go to edit downstream steps or recipes.
There are, however, a few transformations that affect your dataset's schema and depend on the data present in the sample when you add them. Pivot, header, and values to columns are all "data dependent" in this way. Changes to recipe steps before these steps won't change the columns they produce; these steps need to be edited and re-saved, or deleted and re-added, for the changes to take effect. This behavior prevents breaking schema changes caused by changing data or samples, though in some cases it can require undesired manual intervention to resolve. We are thinking about ways to solve this in upcoming releases.
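As a rough illustration of why these steps behave this way, the sketch below models a pivot step that captures its output columns from the sample at the moment it is added. It is a hypothetical model under that assumption, not Trifacta's implementation; the names `PivotStep`, `add_to_recipe`, and `frozen_columns` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PivotStep:
    """Hypothetical model of a 'data dependent' pivot step (not Trifacta code).

    Output columns are captured from the sample when the step is added and
    are NOT recomputed when upstream data changes.
    """
    key: str
    pivot_col: str
    value_col: str
    frozen_columns: list = field(default_factory=list)

    def add_to_recipe(self, sample):
        # Snapshot the distinct pivot values present in the sample right now.
        self.frozen_columns = sorted({row[self.pivot_col] for row in sample})

    def apply(self, rows):
        out = {}
        for row in rows:
            target = out.setdefault(row[self.key], {self.key: row[self.key]})
            # Only values seen when the step was added become columns; others are dropped.
            if row[self.pivot_col] in self.frozen_columns:
                target[row[self.pivot_col]] = row[self.value_col]
        # Every output row gets exactly the frozen schema (missing cells become None).
        return [
            {self.key: r[self.key], **{c: r.get(c) for c in self.frozen_columns}}
            for r in out.values()
        ]


sample = [{"id": 1, "month": "jan", "amt": 10}, {"id": 1, "month": "feb", "amt": 20}]
step = PivotStep("id", "month", "amt")
step.add_to_recipe(sample)

# An upstream edit introduces a new pivot value, "mar".
changed = sample + [{"id": 1, "month": "mar", "amt": 30}]
stale = step.apply(changed)      # [{'id': 1, 'feb': 20, 'jan': 10}]

# "Re-saving" (re-adding) the step re-captures the schema from current data.
step.add_to_recipe(changed)
refreshed = step.apply(changed)  # [{'id': 1, 'feb': 20, 'jan': 10, 'mar': 30}]
```

The design choice being modeled is a trade-off: freezing the columns keeps downstream steps from breaking when a new sample happens to contain different values, at the cost of requiring a manual re-save when you actually want the schema to change.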
If you're seeing other cases where changes are not reflected, that is unexpected, so please send a note with details to support@trifacta.com so we can investigate further.
Hi Tom, changes to earlier stages of a flow will require you to collect new samples in the later stages; the changes will be picked up in the new samples. You can also generate an output at any recipe in the flow to validate it, and then use that output statically as a dataset going forward.
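To make the sampling point concrete, here is a minimal sketch assuming the editor shows a cached snapshot of rows that is only refreshed when a new sample is collected. The `Recipe` class and its `run`/`collect_sample` methods are hypothetical and do not represent Trifacta's API.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Step = Callable[[List[Row]], List[Row]]


class Recipe:
    """Hypothetical model: a recipe applies its steps to its parent's output,
    but the editor displays a cached sample collected at a point in time."""

    def __init__(self, steps: List[Step], parent: Optional["Recipe"] = None,
                 source: Optional[List[Row]] = None):
        self.steps = steps
        self.parent = parent
        self.source = source
        self.sample: Optional[List[Row]] = None

    def run(self) -> List[Row]:
        # Full evaluation: recomputes from the source through every upstream recipe.
        rows = self.parent.run() if self.parent else list(self.source)
        for step in self.steps:
            rows = step(rows)
        return rows

    def collect_sample(self, n: int = 2) -> List[Row]:
        # Snapshot of the first n rows; it is NOT refreshed automatically.
        self.sample = self.run()[:n]
        return self.sample


source = [{"amt": 5}, {"amt": 15}, {"amt": 25}]
upstream = Recipe([lambda rows: rows], source=source)
downstream = Recipe([lambda rows: [r for r in rows if r["amt"] > 10]], parent=upstream)
first = downstream.collect_sample()   # [{'amt': 15}, {'amt': 25}]

# Edit an early step: double every amount.
upstream.steps = [lambda rows: [{"amt": r["amt"] * 2} for r in rows]]
stale = downstream.sample             # still [{'amt': 15}, {'amt': 25}]
fresh = downstream.collect_sample()   # [{'amt': 30}, {'amt': 50}]
```

In this model the recipe logic itself always sees the upstream edit when it is run in full; only the cached sample shown in the editor lags behind until it is recollected.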
Athena
Thanks, all. This is good background. Can you point me to a specific example (e.g. a video) of how to handle schema changes and resampling while refactoring a flow?
Tom,
For schema changes, I raised a similar issue. Please see the link below for details.
https://community.trifacta.com/s/question/0D51L000058pyfiSAA/trifacta-source-dataset-do-not-get-update-after-datatype-change-or-column-addition-to-the-source-table
Suggestions:
I could add more points, but let me know if this is what you are looking for. Thank you.