Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Substituting Missing Data with Closest Matched Data

paultno
8 - Asteroid

Hello,

 

I have been using AYX for a couple of years now but am new to using the predictive tools. 

 

Here is my situation: I am using data on container specs. I have about 2000 records, but only 1500 have the full data (both categorical and quantitative variables). The remaining 500 are missing some fields but still have others that match. I was hoping to replace the missing data with SUBSTITUTE DATA from the closest matching container, based on the fields that are present.

 

Is it possible to find a "closest match" for the missing data with the data that is available?  Is this possible when conducting something like a K-centroids cluster analysis?  If not, where do I start?

 

Any help is appreciated!

5 REPLIES
BrandonB
Alteryx

The first step would be to use a filter or series of filters to isolate the 500 records that have missing data. Then you could either do a series of joins against the complete records on a smaller subset of matching fields (exact matches on that subset), or potentially use fuzzy matching. Do you have some sample (non-sensitive) data that you could provide?
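
To illustrate the filter-then-join idea outside of Designer, here is a minimal Python sketch. The field names (`type`, `length_ft`, `tare_kg`, `max_payload_kg`) and the sample rows are made up for illustration and are not from the original poster's data. It only handles exact matches on the key fields; rows with no exact match are returned separately so they can go through a fuzzy or nearest-match step.

```python
# Hypothetical container records; field names and values are illustrative only.
records = [
    {"id": 1, "type": "dry", "length_ft": 20, "tare_kg": 2300, "max_payload_kg": 28200},
    {"id": 2, "type": "reefer", "length_ft": 40, "tare_kg": 4800, "max_payload_kg": 27700},
    {"id": 3, "type": "dry", "length_ft": 20, "tare_kg": None, "max_payload_kg": None},
    {"id": 4, "type": "tank", "length_ft": 20, "tare_kg": None, "max_payload_kg": None},
]

KEY_FIELDS = ("type", "length_ft")            # assumed to be present on every record
FILL_FIELDS = ("tare_kg", "max_payload_kg")   # fields that may be missing


def split_complete(rows):
    """Filter step: separate complete rows from rows missing any fill field."""
    complete = [r for r in rows if all(r[f] is not None for f in FILL_FIELDS)]
    incomplete = [r for r in rows if any(r[f] is None for f in FILL_FIELDS)]
    return complete, incomplete


def fill_by_exact_join(rows):
    """Join step: copy missing fields from the first complete row sharing the key."""
    complete, incomplete = split_complete(rows)
    lookup = {}
    for r in complete:
        # setdefault keeps the first complete row seen for each key
        lookup.setdefault(tuple(r[k] for k in KEY_FIELDS), r)

    filled, unmatched = [], []
    for r in incomplete:
        donor = lookup.get(tuple(r[k] for k in KEY_FIELDS))
        if donor is None:
            unmatched.append(r)  # no exact match; needs fuzzy/nearest matching
            continue
        new_row = dict(r)
        for f in FILL_FIELDS:
            if new_row[f] is None:
                new_row[f] = donor[f]
        filled.append(new_row)
    return filled, unmatched


filled, unmatched = fill_by_exact_join(records)
```

In this toy data, record 3 picks up its values from record 1, while record 4 has no complete record with the same key and lands in `unmatched`.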

wwatson
12 - Quasar

There is a tool called Imputation in the Preparation tool palette that might be worth checking out. It is designed to fill null values in a field with a replacement value.
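
For a rough idea of what this kind of simple imputation does, here is a small Python sketch. It is not the Imputation tool's implementation, just the general technique of replacing nulls in one column with a summary statistic; the function name and sample values are made up.

```python
from statistics import mean, median


def impute_column(values, strategy="median"):
    """Replace None entries with the median (or mean) of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to base a replacement on
    fill = median(known) if strategy == "median" else mean(known)
    return [fill if v is None else v for v in values]


# Hypothetical tare weights with two missing entries
tare_kg = [2300, None, 4800, 2200, None]
imputed_median = impute_column(tare_kg)
imputed_mean = impute_column(tare_kg, strategy="mean")
```

Note that every missing entry in the column receives the same value, so this does not use the other fields on the row to find a closest match.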

 

 


 

PedrodeOl
8 - Asteroid

Hello @paultno,

 

Could you share a sample?

 

Regards

paultno
8 - Asteroid

Hello Everyone,

 

Attached is the dummy data with 1000 rows.  The last three fields contain missing values that I will need to replace with substitute values from rows that are the closest match.  Only rows that have missing values for all 3 need substitutions.  
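
As a sketch of what "closest match" substitution could look like, here is a small nearest-donor example in Python. Everything in it is an assumption for illustration: the field names, the sample rows, and the choice of a Gower-style distance (range-scaled difference for numeric fields, 0/1 mismatch for categorical fields). Following the description above, only rows missing all of the fill fields get a substitute.

```python
def gower_distance(a, b, numeric_ranges, categorical_fields):
    """Average per-field distance: scaled absolute difference for numeric
    fields, 0/1 mismatch for categorical fields."""
    parts = []
    for field, value_range in numeric_ranges.items():
        parts.append(abs(a[field] - b[field]) / value_range if value_range else 0.0)
    for field in categorical_fields:
        parts.append(0.0 if a[field] == b[field] else 1.0)
    return sum(parts) / len(parts)


def impute_from_nearest(rows, numeric_fields, categorical_fields, fill_fields):
    """Copy fill_fields from the closest complete row into rows missing all of them."""
    donors = [r for r in rows if all(r[f] is not None for f in fill_fields)]
    ranges = {
        f: max(r[f] for r in rows) - min(r[f] for r in rows) for f in numeric_fields
    }
    result = []
    for r in rows:
        if all(r[f] is None for f in fill_fields):
            donor = min(
                donors,
                key=lambda d: gower_distance(r, d, ranges, categorical_fields),
            )
            # Record which row supplied the values so the substitution is auditable
            r = {**r, **{f: donor[f] for f in fill_fields}, "donor_id": donor["id"]}
        result.append(r)
    return result


# Hypothetical rows: the last three fields are the ones that may be missing
rows = [
    {"id": 1, "type": "dry", "length_ft": 20, "height_ft": 8.5,
     "tare_kg": 2300, "payload_kg": 28200, "volume_m3": 33.2},
    {"id": 2, "type": "dry", "length_ft": 40, "height_ft": 8.5,
     "tare_kg": 3750, "payload_kg": 27600, "volume_m3": 67.7},
    {"id": 3, "type": "reefer", "length_ft": 40, "height_ft": 9.5,
     "tare_kg": 4800, "payload_kg": 29500, "volume_m3": 67.3},
    {"id": 4, "type": "dry", "length_ft": 40, "height_ft": 9.0,
     "tare_kg": None, "payload_kg": None, "volume_m3": None},
]

result = impute_from_nearest(
    rows,
    numeric_fields=("length_ft", "height_ft"),
    categorical_fields=("type",),
    fill_fields=("tare_kg", "payload_kg", "volume_m3"),
)
```

Here row 4 is closest to row 2 (same type and length, half the height range apart), so it takes row 2's values. Ties go to whichever donor appears first, and all fields are weighted equally, which may or may not suit real container data.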

 

Any help is greatly appreciated!

 

 

 

TimothyL
Alteryx

Hi @paultno @PedrodeOl @wwatson @BrandonB

 

 

We have built some new missing value imputation macros here: https://community.alteryx.com/t5/Data-Science/Expand-Your-Predictive-Palette-IV-Imputation-Beyond-Me...

 

Based on your scenario, the missForest macro would be a great fit for predicting closest-match values for the missing fields. Let us know what you think!

 

TL
