Hello!
I am trying to automate the analysis of a survey - for context, we have a dataset of all the questions, the options, and the codes that are used to encode/decode the data. I have 5 years worth of survey responses that have been coded in different ways by different people, and I am establishing a way to automate the processing to standardize them as well as future surveys. I have a question order dataset for each year to match with those in the survey responses, as well as a master dataset to categorize the questions based on the analysis objectives - they have the question and answer code numbers in common.
The issue I am running into is, the survey has multiple choice questions, Likert scale questions, numeric entry questions, and text entry questions. If I Join them with both identifiers, the numeric and text entry questions are lost. If I join them with only one, there is too much duplication. I cannot join by 2 and then use a Union later, because it does not fully capture the other fields in the master datasets.
I made some fake data for this and attaching it as a .yxmd file.
I have a brute force way to do this with joins, unions, and filters, but I'm wondering if there's a more elegant solution - is there a tool that will capture partial matches? Am I overthinking this and/or missing something glaringly obvious?
Thanks!
Nivi