This seems like a trivial problem, but I was wondering what some best practices are.
I'm doing a bulk ingest from a single folder with CSV files into SQL Server 2012. Each CSV file contains the fields (Key, Value, DateIssued). I'm using the SQL Server Bulk Loader (ssvb) because the CSV files are massive. New CSV files are put in the folder every day, and I run the bulk ingest job once a week.
1) Upsert. Newer CSV supersede data in older files. I sort the directory by Create time and process the files in time order. In Alteryx's bulk loader, how do I specify that if the current DateIssued field (non-primary key) value is newer, overwrite the existing row in the SQL database, else insert a new row? (there isn't an update-if-newer in the ssvb tool)
2) Best practices for keeping track of files already processed. How do I keep track of the files I have already processed so I can skip them the next time the Alteryx job is run? Can I store the processed filenames in a table and use Alteryx to somehow skip all filenames in that table?
Thanks!
Solved! Go to Solution.
Hi @wcpx
I'm not sure about the upsert'ing question, but maybe one of our users with more SQL know-how could chime in regarding this. However, glacing at this documentation, I'm not sure if an Upsert process like that can be done using the bulk loader.
What may be possible here is using something for a primary key that will be consistent across files with different dates, and the "Update, Insert if New" option without the bulk loader.. That is, if each set of csv's has 100 rows, each with a PK id number, and these are the same the next day just with updated data, then the Update option would replace the older data. If a new PK id is introduced, it would append it to the table.
If this is not possible for your data, the only other approach I can think of is an In-DB process that allows you to filter the stuff to update or insert.
There are some great resources on this here in the Community, and In-Database also has a nice video up in our Virtual Training section.
Cheers,
Alex
Customer Support
support@alteryx.com
Thanks @AlexKo. That makes sense. I worked around it by bulk loading the data into a staging table, and then running a MERGE SQL to upsert the records into the final table.
Yes, the normal ODBC connection does support "Update, Insert if New" but it is too slow for my application (in my case, CSVs can have 100,000s rows, and there are thousands of CSVs).
An In-DB approach sounds like it could work -- I'm going to look into this more. I appreciate your time.