I have been using AYX for a couple of years now but am new to using the predictive tools.
Here is my situation: I am working with data on container specs. I have about 2,000 records, but only 1,500 have the full data (both categorical and quantitative variables). The remaining 500 are missing some fields but still have values in others. I was hoping to replace the missing data with substitute data from the closest matching container, based on the fields that are populated.
Is it possible to find a "closest match" for the missing data with the data that is available? Is this possible when conducting something like a K-centroids cluster analysis? If not, where do I start?
The first step would be to use a filter, or a series of filters, to isolate the 500 records that have missing data. Then you could either do a series of joins on a smaller subset of the matching fields, to find exact matches on that subset, or potentially use fuzzy matching. Do you have some sample (non-sensitive) data that you could provide?
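To make the filter-then-join idea concrete, here is a minimal sketch in plain Python rather than as an Alteryx workflow. The field names (`material`, `shape`, `height`, `width`) are assumptions made up for illustration, since the real columns are not listed in this thread, and taking the first complete record per key as the donor is an arbitrary choice.

```python
# Hypothetical field names: the real container columns are not given in the thread.
KEY_FIELDS = ["material", "shape"]   # fields assumed to be populated on every record
TARGET_FIELDS = ["height", "width"]  # fields that may be missing (None)


def split_records(records):
    """Separate complete records from those with any missing target field."""
    complete, incomplete = [], []
    for rec in records:
        if any(rec.get(f) is None for f in TARGET_FIELDS):
            incomplete.append(rec)
        else:
            complete.append(rec)
    return complete, incomplete


def fill_by_exact_match(records):
    """Fill missing targets from a complete record with identical key fields."""
    complete, incomplete = split_records(records)

    # Index complete records by their key fields, keeping the first donor per key.
    donors = {}
    for rec in complete:
        donors.setdefault(tuple(rec[f] for f in KEY_FIELDS), rec)

    filled, unmatched = [], []
    for rec in incomplete:
        donor = donors.get(tuple(rec[f] for f in KEY_FIELDS))
        if donor is None:
            unmatched.append(rec)  # no exact match; needs a looser method
            continue
        new_rec = dict(rec)  # copy so the input is not modified
        for f in TARGET_FIELDS:
            if new_rec.get(f) is None:
                new_rec[f] = donor[f]
        filled.append(new_rec)
    return filled, unmatched
```

Records that land in `unmatched` are the ones that would need a smaller set of key fields or a fuzzy or nearest-match approach.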
Attached is the dummy data with 1000 rows. The last three fields contain missing values that I will need to replace with substitute values from rows that are the closest match. Only rows that have missing values for all 3 need substitutions.
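For reference, the rule described here (substitute only where all three fields are missing, using the closest complete row) could be sketched roughly as below in plain Python. This is not an Alteryx tool or a K-centroids analysis; it is a simple nearest-donor lookup with a mixed distance (0/1 mismatch for categorical fields, range-scaled difference for numeric ones). All field names are hypothetical stand-ins for the columns in the attachment, and the sketch assumes the matching fields are populated on every row.

```python
# Hypothetical field names standing in for the columns in the attached data.
CATEGORICAL = ["material", "shape"]       # matching fields compared as equal / not equal
NUMERIC = ["capacity"]                    # matching fields compared by scaled difference
TARGETS = ["height", "width", "weight"]   # stand-ins for the last three fields


def numeric_ranges(records):
    """Range of each numeric matching field, used to scale differences to 0..1."""
    ranges = {}
    for f in NUMERIC:
        values = [r[f] for r in records]
        ranges[f] = (max(values) - min(values)) or 1.0  # avoid dividing by zero
    return ranges


def distance(a, b, ranges):
    """Average dissimilarity over all matching fields (0 means identical)."""
    d = sum(a[f] != b[f] for f in CATEGORICAL)
    d += sum(abs(a[f] - b[f]) / ranges[f] for f in NUMERIC)
    return d / (len(CATEGORICAL) + len(NUMERIC))


def impute_from_nearest(records):
    """Copy all three targets from the closest complete row, but only into
    rows where every target is missing. Returns new rows; input is untouched."""
    donors = [r for r in records if all(r[t] is not None for t in TARGETS)]
    ranges = numeric_ranges(records)
    result = []
    for rec in records:
        if donors and all(rec[t] is None for t in TARGETS):
            best = min(donors, key=lambda d: distance(rec, d, ranges))
            rec = {**rec, **{t: best[t] for t in TARGETS}}
        result.append(rec)
    return result
```

Rows that are missing only one or two of the three fields pass through unchanged, which matches the condition stated above.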