Hello
I have got a thousand-plus CSV files in a single folder. All the files have exactly the same schema: the same columns and data structure. There is one file for each date, containing all the data for that date, and every day a new CSV file is copied to the folder containing the data for the new date.
The CSV files are named in the format YYYYMMDD.CSV, and the FileName is also saved as a column in the CSV files.
I have designed a workflow in Alteryx that does all the data cleaning and modification steps according to my requirements, and I have saved it as a BATCH MACRO.
Whenever I have to add a new column to these CSV files or make any other kind of modification, I simply run the batch macro and it applies the new changes to every single CSV file without any problems.
My current problem is that I have decided to move from the CSV file format to the SQLite file format because of its advantages. My CSV files are really big, containing many millions of rows each, so I need to save them in SQLite format. I cannot use Alteryx .yxdb because that format does not support any kind of APPEND JOB.
Can someone please suggest the approach I should use in my case, i.e. what settings I should configure in the Output tool, so that I am able to run my batch macros (which are already working fine for my CSV files) on the SQLite files?
Which of these options will work best for my current requirement of batch macros?
1. A separate SQLite database for each CSV file's output.
2. The same SQLite database, but a separate table for each CSV file's output.
3. The same SQLite database and the same table for every CSV file's output (this would act like an append job; see the sketch below).
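For reference, here is a minimal sketch of what option 3 would look like in plain SQLite terms outside Alteryx, assuming a hypothetical database file daily_data.sqlite, a table daily_data that already exists, and a new day's file 20240115.CSV: each new day's rows are simply inserted into the same table, which is the append-job behaviour I am after.

```python
import csv
import sqlite3

con = sqlite3.connect("daily_data.sqlite")            # hypothetical database file

with open("20240115.CSV", newline="") as f:           # the new day's file (hypothetical name)
    reader = csv.reader(f)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    # Appending is just inserting the new day's rows into the existing table.
    con.executemany(
        f"INSERT INTO daily_data ({cols}) VALUES ({placeholders})",
        reader,
    )

con.commit()
con.close()
```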
I am totally new to the SQLite file format, so any tips or tricks for this format are most welcome. Please share any links that could help me use it more efficiently with Alteryx workflows.
I have attached sample CSV data files containing a very small number of rows; the actual files have millions of rows each. If any more information is needed from my side, please let me know.
Thanks and Best Regards
Please note that my MAIN CONCERN is that I should be able to run batch macros on these SQLite files without any issues, just as I am currently able to do with the CSV files.
I need to make a lot of changes to these thousand-plus CSV files on a regular basis, and by using batch macros I am able to apply the new modifications to all of them very easily.
Any input is welcome, guys. Please suggest something.
Are there not many users of SQLite files in Alteryx?
I thought this would be a pretty common scenario, and that many people would be using one of the three options I have mentioned above.
Or, if there is no possible way to use batch macros with the SQLite file format, then I will have to drop the whole idea and go back to CSV files.
Please suggest.
I am not sure whether my question is unclear, or whether this is something extremely difficult to do in Alteryx, as no one has given a single reply yet.
Since this step is essential for me, and the whole project is stuck without this decision, I have created a small sample batch macro example, which is working perfectly fine. My requirement is: how do I convert this batch macro so that it works with SQLite data files instead of CSV files?
Any suggestions or ideas regarding how this could be done will be greatly appreciated.
@Aakash55
I think it is working now.
I moved the Action tool to the Formula tool to change the path for the output.
In the Output tool, the append File/Table option works only for CSV, so we have to provide the full path.
Hi @Aakash55
The main concern with this use case is that it is not completely clear.
1. My first question is: each time you run a batch, is it an append operation or an overwrite operation? If it is an append and new columns are coming in, the append won't work, since the existing table doesn't have the columns to accommodate them. So the option is to use overwrite instead.
2. If it's an overwrite, you can easily use .yxdb or just do a SQLite overwrite of the table. If it's an append, then the same concern applies again; the possibility is to read all of the old/existing data into Alteryx, use the Append tool to append the newly modified data (old data + new data), and then perform an overwrite of the file. This should be possible with either .yxdb or SQLite. (See the sketch below for how this difference plays out in plain SQLite terms.)
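As a rough illustration of that difference outside Alteryx (hypothetical table and column names): an INSERT that carries a column the existing table does not have is rejected, whereas an overwrite drops the table and recreates it with the wider schema before reloading all of the data.

```python
import sqlite3

con = sqlite3.connect(":memory:")    # in-memory database, just for illustration

# Existing table created before the new column was introduced.
con.execute('CREATE TABLE daily_data ("Date" TEXT, "Value" REAL)')

# Append: a row carrying a new column is rejected, because the existing
# table has nowhere to put it.
try:
    con.execute(
        'INSERT INTO daily_data ("Date", "Value", "NewColumn") VALUES (?, ?, ?)',
        ("20240115", 42.0, "x"),
    )
except sqlite3.OperationalError as exc:
    print("Append failed:", exc)     # table daily_data has no column named NewColumn

# Overwrite: drop the old table, recreate it with the wider schema, and then
# reload all of the data (old + new) into it.
con.execute("DROP TABLE daily_data")
con.execute('CREATE TABLE daily_data ("Date" TEXT, "Value" REAL, "NewColumn" TEXT)')
con.close()
```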
Let me know your thoughts on this.
Hope this helps : )
Thank you so much, Qiu. You have successfully converted the CSV-based batch macro into a SQLite-based one.
Deeply Appreciated.
As atcodedog05 mentioned, my requirement might not have been expressed clearly. I will try to explain it with an example and post a detailed reply in a few minutes.
Best Regards
Thank you so much for your reply, @atcodedog05. I am sorry for not being totally clear about my use case.
I have tried to explain my use case in a new way, as follows:
Suppose my current CSV files have 10 columns each. On a daily basis, I simply have to APPEND the new data from each new CSV file to the database table.
But after a few days, a new requirement arises for which I have to create 2 more columns in the CSV files, so the total becomes 12 columns. Because of this, I have to recreate the data for these 12 columns for all the older CSV files as well and load all of this 12-column data into the table. Then, on a daily basis, I will simply APPEND the new data to this table.
Then again, after a month, some new requirement arises for which I have to create 3 more columns, so the total becomes 15 columns. Because of this, I have to recreate the data for the 15 columns for all the older CSV files as well and load all of this 15-column data into the table.
This way, as time passes, the number of columns in my database table keeps increasing, because this is a work in progress and new columns keep getting added to the original data on a regular basis. It is going to remain like this for at least the next few months.
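To make this concrete, here is a rough sketch (with a hypothetical folder csv_folder, database file daily_data.sqlite, and table daily_data) of what that rebuild step would amount to in plain SQLite terms: when the column count changes, the single table is recreated from the new, wider header, every historical daily file is reloaded, and then the normal daily appends resume.

```python
import csv
import sqlite3
from pathlib import Path

con = sqlite3.connect("daily_data.sqlite")           # hypothetical database file
files = sorted(Path("csv_folder").glob("*.CSV"))     # hypothetical folder of daily files

# Take the new, wider header from the first regenerated file and recreate
# the single table with it.
with open(files[0], newline="") as f:
    header = next(csv.reader(f))
cols = ", ".join(f'"{c}"' for c in header)
placeholders = ", ".join("?" for _ in header)

con.execute("DROP TABLE IF EXISTS daily_data")
con.execute(f"CREATE TABLE daily_data ({cols})")

# Reload every historical daily file into the rebuilt table.
for csv_path in files:
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)                                 # skip the header row
        con.executemany(
            f"INSERT INTO daily_data ({cols}) VALUES ({placeholders})",
            reader,
        )

con.commit()
con.close()
```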
Now, under such a scenario, which of these options will work best for my current requirement of batch macros?
1. A separate SQLite database for each CSV file's output.
2. The same SQLite database, but a separate table for each CSV file's output.
3. The same SQLite database and the same table for every CSV file's output (this would act like an append job).
The current solution provided by Qiu uses option 1, as it generates a new SQLite file for each CSV file. I am running various experiments with it to figure out how I can adapt it to my need, in which new columns get added after a few days.
Going forward, I will have to use these newly created SQLite files in the Input tool.
If you could share any insights on this, that would be very helpful.
Thanks a lot.
Hi @Aakash55
When you say 2 more columns are added, are these columns added with data in them or as blanks? If they are added with data in them, how exactly are they mapped? You can do something like the below. Let me know your thoughts on this.
Hope this helps : )
Thank you so much for the inputs.
When I say new columns are added, I mean that the new columns are created from the already existing columns of the original CSV files, just as we create new calculations with the Formula tool, for example change, % change, etc.
No new file is being used to create these new columns; they are all based on the existing data columns.
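For example, this kind of derived column could even be expressed directly in SQLite itself; here is a sketch assuming a hypothetical table daily_data with a Value column (SQLite 3.25 or later is needed for window functions such as LAG):

```python
import sqlite3

con = sqlite3.connect("daily_data.sqlite")            # hypothetical database file

# Day-over-day change and % change, derived purely from the existing
# (assumed) "Value" column, ordered by the FileName (YYYYMMDD) column.
rows = con.execute("""
    SELECT FileName,
           Value,
           Value - LAG(Value) OVER (ORDER BY FileName)            AS Change,
           100.0 * (Value - LAG(Value) OVER (ORDER BY FileName))
                 / LAG(Value) OVER (ORDER BY FileName)            AS PctChange
    FROM daily_data
""").fetchall()

con.close()
```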
I am planning to switch over to the SQLite file format completely, so going forward both my input and output files will be in SQLite format, if everything goes smoothly.
If any more information is needed, please let me know.