I have something I am trying to build but not sure what the optimal way of doing it is and looking for advice.
I have several yxdb. files containing UK postcodes, where each file is from a different month. The files are in a format YYMM.yxdb so are named like:
- 1911.yxdb
- 1912.yxdb
- 2001.yxdb
- 2002.yxdb
The number of files going into the process can change each time so the process needs to be dynamic. I can easily merge all the files into one file with a MONTH field if this is better.
I want the process to find when a postcode was introduced, and when it was terminated. If a postcode is in all the files, it will not have an introduced or terminated value. The current (very outdated) process joins the first two files on postcode, assigns an introduced or terminated date if the postcode doesn't exist on either and then moves on. Postcodes that existed on both, and newly introduced postcodes are unioned and then joined with the next file in the process where it repeats the process. Then all records are finally unioned back together (including all the terminated postcodes) into one big file.
- Join first two files.
- Middle and right output are unioned and joined with third file.
- Middle and right output of that are unioned and joined with fourth file.
- Repeat until all files done.
- Create one file file of all the left outputs and the final middle and right output.

Currently as you can see, this takes a lot of manual updating before running and the joins need setting up for the number of files there are. I want this to run dynamically. This feels like it needs an iterative macro because it needs to repeat this process for the number of files.
Anyone have an optimal suggested method to tackle this? I am reasonably confident with macros so any pseudocode for any optimal method to get me going would be useful! There are millions of records in each file so processing time is important.