Hi -
I have a large (1.7 million records, 14GB) XML file. It has a lot of nested node structures. I've set up a parsing routine that does most of the work, but it takes ~3 hours to run on a pretty decent machine. I'll be running this daily, and I'm looking for strategies to improve performance. I was thinking of trying some or all of the following, but wanted to see if anyone has some good advice first:
1. split the file and run several smaller but identical jobs
2. since I've added a record id, after the main parse, split the resulting major OuterXML sections into their own jobs
3. somehow inspect the modified_time element within the XML and then (on day 2+) parse only the records that have been modified
Before I do this, I wonder if I'm missing a more fundamental change of approach.
Also for #3, any tips on how to inspect and use an element like that would be appreciated.
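A minimal sketch of what #3 could look like, in Python purely for illustration. It streams the file instead of loading it, and keeps only records changed since the last run. The element names `records`, `record` and `id`, the ISO 8601 timestamp format, and `modified_time` being a direct child of each record are all assumptions, not the real schema:

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime


def modified_records(source, since, record_tag="record", time_tag="modified_time"):
    """Yield record elements whose modified_time is later than `since`.

    The tag names are hypothetical; adjust them to the real schema.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Assumes modified_time is a direct child holding an ISO 8601 string.
            stamp = elem.findtext(time_tag)
            if stamp and datetime.fromisoformat(stamp) > since:
                # The caller must finish with elem before asking for the next one.
                yield elem
            # Drop records already seen so memory use does not grow with file size.
            root.clear()


# Tiny made-up sample standing in for the 14GB file.
sample = b"""<records>
  <record><id>1</id><modified_time>2024-01-01T08:00:00</modified_time></record>
  <record><id>2</id><modified_time>2024-01-03T09:30:00</modified_time></record>
  <record><id>3</id><modified_time>2024-01-05T12:00:00</modified_time></record>
</records>"""

since = datetime(2024, 1, 2)  # e.g. the time of the previous run
changed_ids = [rec.findtext("id") for rec in modified_records(io.BytesIO(sample), since)]
print(changed_ids)
```

The whole file is still read once in this sketch, but the expensive per-record work only happens for records that pass the timestamp check.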
Thanks!
Pete
