Aloha Alteryx Community,
I am working on an issue and I am wondering if there's a more efficient way to solve it.
I have a directory with 5847 gzip-compressed CSV files. Currently, I am using a Dynamic Input tool to read all of these files, apply a filter based on the state, and then write the filtered data to YXDB format. I am generating one output file per input file, resulting in 5847 YXDB files.
The problem with my current approach is that the Dynamic Input tool reads all the files and applies the filter before any output is written, so the workflow is strictly linear: nothing is written until every input has been read.
I am looking for a solution to run this workflow concurrently or iteratively, i.e., read one file, apply the filter, write the output, and then move on to the next file.
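To make the pattern concrete, here is roughly what I'm after, sketched in Python (a minimal sketch, not my actual workflow: the directory paths, the "State" column, and the filter value are placeholders, and Parquet stands in for YXDB since plain Python can't write YXDB):

```python
# One-file-at-a-time pattern: read a file, filter it, write the
# result, then move to the next file. Paths, the "State" column,
# and the filter value are placeholders; Parquet stands in for YXDB.
from pathlib import Path
import pandas as pd

SRC = Path("input_gz")         # hypothetical source directory
DST = Path("output_filtered")  # hypothetical destination directory
DST.mkdir(exist_ok=True)

for gz_file in sorted(SRC.glob("*.csv.gz")):
    df = pd.read_csv(gz_file, compression="gzip")  # read ONE file
    filtered = df[df["State"] == "HI"]             # apply the filter
    out_name = gz_file.name.removesuffix(".csv.gz") + ".parquet"
    filtered.to_parquet(DST / out_name)            # write it immediately
```

Each output file would appear as soon as its input is processed, rather than after all 5847 reads have finished.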
I have tried using a batch macro to handle this, but the results were the same. I have been struggling to build an iterative macro that can handle this one-file-at-a-time process.
The end goal of my project is to transfer the filtered data to SQL Server. Given the volume of the data (~3.5 billion rows), waiting for all the files to be read before the writing process begins is highly inefficient.
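For what it's worth, the same per-file loop could in principle append each filtered chunk straight into SQL Server and skip the intermediate YXDB files altogether. A hedged sketch, assuming SQLAlchemy with the pyodbc driver; the connection string and table name are made up:

```python
# Hypothetical variation on the sketch above: append each filtered
# chunk directly to a SQL Server table instead of writing files.
# The connection string and table name are assumptions.
from pathlib import Path
import pandas as pd
from sqlalchemy import create_engine

SRC = Path("input_gz")  # hypothetical source directory
engine = create_engine(
    "mssql+pyodbc://user:password@myserver/mydb"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

for gz_file in sorted(SRC.glob("*.csv.gz")):
    df = pd.read_csv(gz_file, compression="gzip")
    filtered = df[df["State"] == "HI"]
    filtered.to_sql("filtered_by_state", engine, if_exists="append",
                    index=False, chunksize=10_000)
```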
Does anyone know of an existing macro in the gallery that can help with this? Or would building an analytical app be a viable solution for this issue?
Any help or suggestions on how to optimize this process would be greatly appreciated.
Thanks in advance!
PK
Good day. I found it a little confusing what you are asking.
First, if you put a Block Until Done tool with a Browse on output 1, it will process all the files. On output 2 you have the 5000 files available and can apply the filter.
I don't understand whether you need 5000 separate files, or whether you could append everything into a single file instead.
(screenshot example attached)
Hi Geraldo - thanks for taking the time to reply.
A Block Until Done tool would further slow down the writing process.
I am looking to write the output files as the inputs are read and filtered. Currently, the workflow waits to read and filter all 5847 files first, and only then starts writing the output files (the output is also another 5847 files).
I am asking whether there is a way for the workflow to start writing the files that have been processed, instead of waiting for all the files to be read first.