I am joining data from 2 different sources and I need to use the 2nd source to update the first. The first data source is some database data and the second data source is a csv with items that need to be updated within the database data (these updates could be new rows, changes in value to existing rows, etc.). The first source has 50+ columns and the second source will have varying amounts of columns.
I need to join/merge these two data sources and overwrite values from the database with the values in the csv when the column names are the same and add in any data that is missing
Ex:
DB
key | col1 | col2 | col3 | col4 | col5 |
1 | 0 | 300 | string1 | null | null |
2 | 1 | 250 | string2 | null | null |
3 | 1 | 200 | string3 | null | null |
4 | 0 | 225 | string4 | null | null |
csv
key | col2 | col3 | new col |
2 | 75 | string two | b |
3 | 25 | string three | c |
5 | 50 | string five | e |
output
key | col1 | col2 | col3 | col4 | col5 | new col |
1 | 0 | 300 | string1 | null | null | null |
2 | 1 | 75 | string two | null | null | b |
3 | 1 | 25 | string three | null | null | c |
4 | 0 | 225 | string4 | null | null | null |
5 | null | 50 | string five | null | null | e |
I need for this to be pretty dynamic since the columns that are updated will vary and also because of the amount of columns in the db. If anyone has any thoughts on this or can provide any insight on how I could do this it would be greatly appreciated.
Solved! Go to Solution.
Hi @iwilliams
Use join, union and dynamic select tool combination like below. It should dynamically accommodate new columns. If it doesn't let me know.
Workflow:
Hope this helps : )
Hello @atcodedog05 thanks for the quick reply. Your solution is not quite what I am looking for. In the final output the col2 should have the updated values for 2 and 3 (250->75 and 200->25) and in col3 there should be updates for 2 and 3 (string2->string two and string3 -> string three).
The last thing I tried to do try fix my issue was to take columns that have the "right_" or "input #2" prefix and use a formula to update the corresponding column without the prefix - but I struggled with finding a dynamic way to do this
So just as an example, after the join we would have some duplicate columns and I could try and take two columns like this:
col2 | Right_col2 |
300 | |
250 | 75 |
200 | 25 |
225 | |
50 |
Then use if-then formula to update col2 and after that I can discard Right_col2
col2 | Right_col2 |
300 | |
75 | 75 |
25 | 25 |
225 | |
50 |
But as I said before this would have to be done dynamically for each column with a corresponding "Right_" or "Input #2" match and I haven't figured out how to do so far
reversing the order of the inputs gave the desired output, thanks for your help
Happy to help : ) @iwilliams
Cheers and have a nice day!