Hello
I have CSV files in two different folders. The files have the same names in both folders; only the suffix is different.
The files in Folder 1 are named like this:
20160530-Google.csv
20160531-Google.csv
and so on, for each date.
The files in Folder 2 are named like this:
20160530--Apple.csv
20160531--Apple.csv
and so on, for each date.
I need to append the data from both files into a single file, so that one file will have the data for:
20160530-Google.csv
20160530-Apple.csv
and the other file will have the data for:
20160531-Google.csv
20160531-Apple.csv
Basically, right now I have two files for each date and I want to have just one file for each date.
I have to do this for the thousands of files present in Folder 1 (Google) and Folder 2 (Apple).
The data schema is identical for the files in both folders.
Can someone please suggest the best way to do this in Alteryx? Please remember that I need to repeat this process for 1000+ files, and the data from 20160531-Google.csv should be appended only to the data from 20160531-Apple.csv. I do not want to end up mixing the data of different dates with each other.
I have attached a sample workflow with sample data.
Any ideas are welcome.
Thanks a lot
Thanks for the reply @gabrielvilella
I am sorry, but this is not practical in my situation, because every single CSV file has millions of records in it and I have to do this for thousands of CSV files. Doing one union over that much data at once is not practical on my machine.
That is why I am looking for a solution in which I can do the union one pair at a time, based on the respective file names, since the first part of the file name is identical for the two files that need to be unioned. Only their suffix is different. If this is not possible, I am also open to first renaming the files so that they have identical names, or to keeping the files from both folders in one place if that is needed. But I need to perform this step one pair at a time and repeat it through something like a Batch Macro.
I prepared very tiny dummy data files by manually typing a few rows, just to give an idea of the data schema. The real files are huge. Sorry for not being clear about this in the original post.
Any ideas regarding how this could be done are welcome.
Best Regards.
You are correct: if you have a large dataset, this approach is not the recommended one. The solution would be to first list all file names from each folder using the Directory tool, extract just the date from each file name, and build a list of the unique dates. That list is then used as the control parameter of a batch macro, which takes in the full paths of all the files, reads only the ones whose name matches the current date, unions them, and writes the output. See attached.
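For anyone who wants to see the pairing logic spelled out outside Alteryx, here is a minimal Python sketch of the same idea. It is not the attached workflow, only an illustration. It assumes that every file name starts with an 8-digit date followed by a hyphen and that each CSV has exactly one header row. The function names and the `<date>-Combined.csv` output name are made up for this example.

```python
import re
from pathlib import Path

# Assumption: every file name starts with an 8-digit date and a hyphen,
# e.g. 20160530-Google.csv or 20160530--Apple.csv.
DATE_PREFIX = re.compile(r"^(\d{8})-")


def index_by_date(folder):
    """Map each date prefix to the CSV file in `folder` that carries it."""
    index = {}
    for path in sorted(Path(folder).glob("*.csv")):
        match = DATE_PREFIX.match(path.name)
        if match:
            index[match.group(1)] = path
    return index


def append_pair(first, second, target):
    """Stream both files into `target`, keeping only the first header row.

    Assumption: each input file has exactly one header row.
    Lines are copied one at a time, so a huge file is never fully in memory.
    """
    with open(target, "w", newline="") as out:
        for position, source in enumerate((first, second)):
            with open(source, newline="") as src:
                header = src.readline()
                lines = [header] if position == 0 else []
                for chunk in (lines, src):
                    for line in chunk:
                        # Guard against a file whose last line has no newline.
                        out.write(line if line.endswith("\n") else line + "\n")


def merge_by_date(folder_1, folder_2, output_folder):
    """Write one combined file per date that exists in both folders."""
    first = index_by_date(folder_1)
    second = index_by_date(folder_2)
    output = Path(output_folder)
    output.mkdir(parents=True, exist_ok=True)
    merged = []
    # The sorted set of shared dates plays the role of the control parameter.
    for date in sorted(first.keys() & second.keys()):
        target = output / f"{date}-Combined.csv"  # hypothetical output name
        append_pair(first[date], second[date], target)
        merged.append(target.name)
    return merged
```

Calling `merge_by_date("Folder 1", "Folder 2", "Combined")` would process one date at a time, which mirrors what the batch macro does for each control parameter value. Dates that appear in only one folder are skipped in this sketch.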
@gabrielvilella Thank you so much for the updated workflow. It works perfectly.
I must say that I was a bit overwhelmed at first by the number of tools used in the workflow, because I had assumed this was an easy task that could be done with minimal tools. Since I am new to Alteryx, I was not familiar with a few of the tools used in this workflow, and therefore I had to read about them first in order to make complete sense of it.
Thank you so much for making me learn these new tools and methods. Your help is greatly appreciated.
Best Regards