Hello. I have a fairly complex workflow that runs in about 8 minutes. At the end it produces a dataset with 9 columns and over 8,500 rows, so nothing terribly huge. I can easily have it write a flat file, but I really need it to upload to a Hadoop/Impala environment.
I'm using a .indbc in-DB connection file. The first step, Data Stream In, creates a temporary table, and the second step does an Overwrite Table (Drop) of the view specified in Hadoop. As the screenshot below shows, uploading just the flat file via this method works like a charm.
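For reference, my understanding is that those two steps boil down to roughly the SQL below. This is a minimal sketch in Python with the impyla client; the host, port, and table names are placeholders I made up, not what Alteryx actually generates:

```python
# Rough sketch of what I believe the two In-DB steps issue on the Impala
# side. Uses the impyla client; host, port, and table names are placeholders.
from impala.dbapi import connect

conn = connect(host="impala-host.example.com", port=21050)
cur = conn.cursor()

# Step 1: Data Stream In -- create a temporary table and stream the rows in.
cur.execute("CREATE TABLE tmp_upload (col1 STRING, col2 STRING /* ...9 cols */)")
cur.execute("INSERT INTO tmp_upload VALUES ('a', 'b')")  # repeated per row/batch

# Step 2: Overwrite Table (Drop) -- drop the target and recreate it from the
# temporary table, then clean up.
cur.execute("DROP TABLE IF EXISTS target_table")
cur.execute("CREATE TABLE target_table AS SELECT * FROM tmp_upload")
cur.execute("DROP TABLE tmp_upload")
conn.close()
```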
But when I add those In-DB tools to the end of the complex workflow, it runs through the data wrangling for several minutes just fine; then, when it gets to the In-DB steps, it just spins forever and never completes. Can anyone advise on what might be going on here? Why would the upload work fine from a flat file but not via the other route, when it's the same dataset?
Hello @Adam_Dooley
I would recommend starting with this KB article. It should help make sure your data writes out faster (which looks like it is the issue!)
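In case it's useful: Impala tends to handle many small row-by-row INSERTs poorly (each INSERT writes its own tiny file), so the general fix is to bulk-load instead. A minimal sketch of that pattern, assuming the hdfs CLI and the impyla client are available; every path and table name here is a placeholder:

```python
# Bulk-load pattern: stage the flat file in HDFS, then load it in one
# statement instead of one INSERT per row. Assumes the hdfs CLI is on PATH
# and the impyla package is installed; all paths/names are placeholders.
import subprocess
from impala.dbapi import connect

# Push the locally written flat file into an HDFS staging directory.
subprocess.run(
    ["hdfs", "dfs", "-put", "-f", "/tmp/output.csv", "/staging/output.csv"],
    check=True,
)

conn = connect(host="impala-host.example.com", port=21050)
cur = conn.cursor()

# Assumes target_table is a text-format table created with
# FIELDS TERMINATED BY ',' so its layout matches the CSV file.
cur.execute(
    "LOAD DATA INPATH '/staging/output.csv' OVERWRITE INTO TABLE target_table"
)
conn.close()
```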
I hope this helps!
TrevorS
This is helpful, thank you for posting. We actually have another method that uses a macro for the upload, and it has worked well. The Data Stream In tool seems to work well sometimes, so I will post an update here if we ever figure out the cause of the problem definitively.
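In the meantime, one thing we've been checking after a hung run is whether orphaned temporary tables pile up on the Impala side. A quick sketch, again assuming the impyla client; the name pattern below is a guess, so adjust it to whatever temp names you actually see:

```python
# Quick check for temporary tables left behind by hung runs. Uses the
# impyla client; the LIKE pattern is a guess -- adjust it to match the
# temp-table names your In-DB connection actually creates.
from impala.dbapi import connect

conn = connect(host="impala-host.example.com", port=21050)
cur = conn.cursor()
cur.execute("SHOW TABLES LIKE '*tmp*'")
for (table_name,) in cur.fetchall():
    print(table_name)
conn.close()
```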