append output with existing records

Question

Hi All,

Let's say I have two input CSV files, INPUT A and INPUT B, which contain records for the last 6 months. I apply some transformation steps to both files, do an inner join, and get an output file: OUTPUT Z.

Next month, I'll again receive INPUT A and INPUT B files, containing the last 6 months of records as of next month's date.

What I want is that when I load next month's INPUT files, the status gets updated for the existing records and the records for the new month get appended to the last 6 months' records. Overall, I should have 7 months of data with the latest status.

Request your assistance in the resolution.

Prometheus · Answer

@shikhar6339 Please let me know if you need more screenshots to help you flesh out the solution.

The first screenshot is just a representation of what you already do in your workflow to create OUTPUT Z, which becomes your historical file.

In the second screenshot, the top left container is again your normal process with the two input files, except that now its result isn't written out as OUTPUT Z but is instead joined to the existing OUTPUT Z. The records that come out of the J output anchor on the second Join are in both the new data and the old data; I matched on them so that I don't get duplicates in my history. The records that come out of the L anchor are new records. You can union them with the J and R outputs to rebuild your history with the new records included, and then overwrite your historical file. With only a few thousand records, you can use Excel as your output, but if you're going to be doing this with millions of records, use YXDB.
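For readers who can't open the workflow, the J/L/R logic described above can be sketched outside Alteryx in plain Python. This is only an illustration: the `record_id`, `month`, and `status` field names and the `split_join` helper are hypothetical, and it assumes the new data is the left input and that the J output keeps the new data's fields (in Alteryx that depends on how the Join's field selection is configured).

```python
# Hypothetical field names; the thread does not say what the real key columns are.
def split_join(new_rows, history_rows, key="record_id"):
    """Mimic the three Join output anchors, with the new data as the left input."""
    new_by_key = {row[key]: row for row in new_rows}
    hist_by_key = {row[key]: row for row in history_rows}
    # J: keys present on both sides; keep the new version so the status is current
    j = [new_by_key[k] for k in new_by_key if k in hist_by_key]
    # L: keys only in the new data (brand-new records)
    l = [new_by_key[k] for k in new_by_key if k not in hist_by_key]
    # R: keys only in the history (records that dropped out of the new files)
    r = [hist_by_key[k] for k in hist_by_key if k not in new_by_key]
    return j, l, r

history = [
    {"record_id": 1, "month": "2023-01", "status": "open"},
    {"record_id": 2, "month": "2023-02", "status": "open"},
]
new_data = [
    {"record_id": 2, "month": "2023-02", "status": "closed"},
    {"record_id": 3, "month": "2023-03", "status": "open"},
]

j, l, r = split_join(new_data, history)
# The Union step: stack the three streams to form the new history.
updated_history = sorted(j + l + r, key=lambda row: row["record_id"])
for row in updated_history:
    print(row)
```

Here record 2 appears once with its updated status, record 3 is appended, and record 1 is carried over from the old history.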

shikhar6339 · Answer

Hi,

Thank you for your response. Could you please attach a screenshot of the workflow so that I can see it? I won't be able to download the attached workflow due to compliance restrictions. That would be a great help!

Prometheus · Answer

@shikhar6339 I'm attaching two workflows. In the first workflow, you're creating OUTPUT Z to be used as your historical file. In the second workflow, you'll have new inputs A and B to join, and you can then join the result to your historical data. The data that comes out of the J output is already in history. The data that comes out of the L output is new. You can Union those data streams to create a new history that includes the old history. You can also add the R output to the Union if you're worried that there will be records in history that won't show up in the new data.
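As a rough, non-Alteryx illustration of the two-workflow cycle (create the history once, then merge into it and overwrite it each month), here is a minimal Python sketch. The column names, the `output_z.csv` file name, and the `monthly_update` helper are assumptions made for the example; the new rows stand in for the already-joined result of INPUT A and INPUT B.

```python
import csv
import os
import tempfile

FIELDS = ["record_id", "month", "status"]  # hypothetical columns

def load_history(path):
    """Return the stored history rows, or an empty list on the very first run."""
    if not os.path.exists(path):
        return []
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def monthly_update(history_path, new_rows, key="record_id"):
    """Merge this month's rows into the history file and overwrite it."""
    merged = {row[key]: row for row in load_history(history_path)}
    for row in new_rows:
        merged[row[key]] = row  # replaces an existing record or adds a new one
    with open(history_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(merged.values())
    return list(merged.values())

history_path = os.path.join(tempfile.mkdtemp(), "output_z.csv")

# First run: nothing stored yet, so this simply creates the historical file.
first = monthly_update(history_path, [
    {"record_id": "1", "month": "2023-01", "status": "open"},
    {"record_id": "2", "month": "2023-06", "status": "open"},
])

# Next month: record 1 has left the 6-month window, record 2 has a new status,
# and record 3 is new.
second = monthly_update(history_path, [
    {"record_id": "2", "month": "2023-06", "status": "closed"},
    {"record_id": "3", "month": "2023-07", "status": "open"},
])
print(len(second))
```

Keeping records that are absent from the new files corresponds to including the R output in the Union; leave that step out if such records should be dropped.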

Appending Historical Data1.yxzp

Appending Historical Data2.yxzp