When I schedule a Dataprep job for a certain flow and import the dataset from a GCS path, does each run of the job pick up just the new file added to GCS and apply the recipe to it, or does it take all of the files in that GCS folder? If the latter is true, is there a way to schedule a job so that it executes only when a new file is added to the source folder?
I can't find a clear explanation of this in the documentation, so I would appreciate a clarification!
Thank you!
Hi @Marija Stojkovska,
When you schedule a Dataprep job for a certain flow, the behavior depends on how the imported dataset is defined: if it points to a folder, or uses a parameterized or wildcard path, each scheduled run applies the recipe to all files matching that path at run time, not only the newly added ones.
Alternatively, if you want a job to execute only when a new file is added to the source folder, I'd advise you to read Victor's blog post, "How to Automate a Cloud Dataprep Pipeline When a File Arrives".
After reading this article, you will be able to drag and drop a file in a folder, get your entire data pipeline executed and loaded in your data warehouse, and have up-to-date data in your reports and dashboards with a few simple clicks.
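The pattern described in that article can be sketched as a Cloud Function with a `google.storage.object.finalize` trigger that calls the Dataprep API to launch the job when a file lands. This is a minimal sketch, not the article's exact code: the recipe ID and access token are placeholders you obtain from your Dataprep project, and you should check the endpoint details against the current Dataprep API documentation.

```python
# Hedged sketch: launch a Cloud Dataprep job from a GCS-triggered
# Cloud Function. RECIPE_ID and the token are hypothetical placeholders.
import json
import urllib.request

DATAPREP_API = "https://api.clouddataprep.com/v4/jobGroups"


def build_job_request(recipe_id: int) -> dict:
    # Request body for the "run job" endpoint: point it at the
    # wrangled dataset (the recipe) you want to execute.
    return {"wrangledDataset": {"id": recipe_id}}


def run_dataprep_job(event, context, token="YOUR_ACCESS_TOKEN", recipe_id=12345):
    """Entry point for a google.storage.object.finalize trigger.

    `event` carries the bucket and object name of the newly created file.
    """
    print(f"New file arrived: gs://{event['bucket']}/{event['name']}")
    req = urllib.request.Request(
        DATAPREP_API,
        data=json.dumps(build_job_request(recipe_id)).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With this deployed, dropping a file into the watched bucket fires the function, which in turn starts the Dataprep job, so the pipeline runs per file rather than on a fixed schedule.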
Hope this helps,
Amit.
Thank you @Amit Miller. I'm trying the last option you suggested.
My pleasure.