Aloha Alteryx Community,
I am working on an issue and I am wondering if there's a more efficient way to solve it.
I have a directory with 5,847 gzipped CSV files (.csv.gz). Currently, I use a Dynamic Input tool to read all of these files, apply a filter on a state field, and write the filtered data to YXDB format, producing one output file per input file (5,847 YXDB files in total).
The problem with this approach is that the workflow reads every file and applies the filter before any writing begins: the Dynamic Input tool has to finish reading all 5,847 files before the first output can be written, which makes the whole workflow one long sequential pass.
I am looking for a way to run this workflow concurrently or iteratively, i.e., read one file, apply the filter, write the output, and then move on to the next file.
I have tried using a batch macro, but the results were the same, and I've been struggling to build an iterative macro that handles this one-file-at-a-time process.
The end goal of my project is to load the filtered data into SQL Server. Given the volume of data (~3.5 billion rows), waiting for all the files to be read before any writing begins is highly inefficient.
Does anyone know of an existing macro in the gallery that can help with this? Or would building an analytical app be a viable solution for this issue?
Any help or suggestions on how to optimize this process would be greatly appreciated.
Thanks in advance!
PK