There are two images attached.
Yes, that's probably the problem. The field that I want to parse is a string from the concatenate of four other string variables (system_no,school_no,race_alpha,grade_no) The school systems reuse the numbering sequence (0010), and the system designations have leading zeros on a three-digit code. Converting to numbers prior to concatenating produces mismatched duplicates such that different schools would share the same number. To create a unique identifier for the school, we have used a concatenate of the strings. My match will also rely on race ( and inconsistent alphabetical designation that changes by year) and grade (two digits) is named CRZYCON_DSTSCHGDRCG. It's an ugly mess, but I need the student matches to occur within the same school, within the same race and grade.
I'm comparing assessments and discipline rates for children of military families to their peers in their respective schools. The state did not provide a poverty marker, so we cannot match on that dimension. I'm exploring using the tile function as a substitute.
That field (CRZYCON_DSTSCHGDRCG) sorts beautifully in Excel, with the military/non-military field as a second criterion. If I have five military students in a school with X characteristics, I need five non-military students from that same school with X characteristics (already randomly sorted), and I will discard the unmatched. I tried this manually in Excel, but need a more efficient way to do it (600,000+ records, but only 9,000 military kids).