Hello everyone,
I am in a bit of a dilemma. First, I am new to the Python tool, and second, I am not sure how it works beyond the fact that it looks like a normal Jupyter notebook.
My objective is simply to impute missing data using the following prebuilt function from scikit-learn, as an alternative to imputing with the mean, mode, or median. I have had positive results with this approach:
https://machinelearningmastery.com/knn-imputation-for-missing-values-in-machine-learning/
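To make the contrast with mean imputation concrete, here is a minimal sketch using scikit-learn's `SimpleImputer` and `KNNImputer` on a made-up four-row array (hypothetical data, not the Titanic set). Mean imputation fills a gap with the column average, while KNN imputation averages that column over only the most similar rows.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Hypothetical toy matrix: row 1 is missing its second feature
X = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [10.0, 20.0],
])

# Mean imputation: the NaN becomes the column mean of 2, 6 and 20
mean_filled = SimpleImputer(strategy="mean").fit_transform(X)

# KNN imputation: the NaN becomes the mean of that column over the
# 2 nearest rows (nan_euclidean distance on the features both rows have)
knn_filled = KNNImputer(n_neighbors=2).fit_transform(X)

print(mean_filled[1, 1])  # 9.333... (28 / 3)
print(knn_filled[1, 1])   # 4.0, since rows 0 and 2 are nearest: (2 + 6) / 2
```

The outlier row `[10, 20]` pulls the mean up, but it is too far away to be one of the two neighbors, so the KNN estimate stays close to the similar rows.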
So here is what I can do in Jupyter:
- Load a data frame from a CSV (using the Titanic dataset)
- Run a function that tries odd values for the number of neighbors, collects the Root Mean Squared Error (RMSE) for each one in a pandas data frame, and then averages the RMSE. That average is a float, which we round up to the nearest whole number and use as our choice for the number of nearest neighbors before imputing.
- Once imputed, the NaNs are replaced, and we can merge the result back with the rest of the dataset, which holds all the non-numeric categorical data.
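A minimal sketch of those Jupyter steps, assuming scikit-learn's `KNNImputer`. Note that it swaps in a different rule for choosing k than the one described above: it hides a random 10% of the known numeric cells, imputes them with each odd k, and keeps the k with the lowest RMSE against the hidden values, instead of averaging the RMSE values and rounding. The helper names `score_ks` and `knn_impute`, the candidate k values, and the Titanic-like toy data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def score_ks(numeric, candidate_ks=(1, 3, 5, 7, 9), mask_frac=0.1, seed=0):
    """Hypothetical helper: hide some known cells, impute them with each k,
    and return one row per k with the RMSE against the hidden true values."""
    rng = np.random.default_rng(seed)
    values = numeric.to_numpy(dtype=float)
    obs = np.argwhere(~np.isnan(values))           # (row, col) of known cells
    n_hide = max(1, int(mask_frac * len(obs)))
    pick = obs[rng.choice(len(obs), size=n_hide, replace=False)]
    r, c = pick[:, 0], pick[:, 1]
    masked = values.copy()
    masked[r, c] = np.nan                           # pretend these are missing
    rows = []
    for k in candidate_ks:
        filled = KNNImputer(n_neighbors=k).fit_transform(masked)
        rmse = float(np.sqrt(np.mean((filled[r, c] - values[r, c]) ** 2)))
        rows.append({"k": k, "rmse": rmse})
    return pd.DataFrame(rows)


def knn_impute(df, k):
    """Hypothetical helper: impute the numeric columns and leave every other
    column untouched. Assumes no numeric column is entirely empty, because
    KNNImputer would drop such a column from its output."""
    out = df.copy()
    numeric_cols = df.select_dtypes(include="number").columns
    filled = KNNImputer(n_neighbors=k).fit_transform(df[numeric_cols])
    for i, col in enumerate(numeric_cols):
        out[col] = filled[:, i]                     # same row order as df
    return out


# Hypothetical Titanic-like toy data (not the real dataset)
rng = np.random.default_rng(42)
n = 40
demo = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Age": rng.normal(30, 10, n),
    "Fare": rng.normal(30, 15, n),
    "Sex": rng.choice(["male", "female"], n),
})
demo.loc[::7, "Age"] = np.nan                       # some missing ages

scores = score_ks(demo.select_dtypes(include="number"))
best_k = int(scores.loc[scores["rmse"].idxmin(), "k"])
result = knn_impute(demo, best_k)                   # categorical "Sex" kept as-is
print(scores)
print("chosen k:", best_k)
```

Writing the imputed columns back into a copy of the original frame keeps the row order and the categorical columns, so no separate merge key is needed. Because the distances are computed on raw values, large-scale columns such as Fare dominate the neighbor search; scaling the numeric columns first is a common refinement.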
What I have gotten done in Alteryx:
- Modified the script to load the data frame from the incoming Alteryx data instead of the CSV
- Changed the data types and field selection
- Created the final data frame and wrote it to the output
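The Alteryx side of those three steps might look roughly like the sketch below, which uses the `ayx` module's `Alteryx.read("#1")` and `Alteryx.write(df, 1)` for input and output. It falls back to a few hypothetical stand-in rows when `ayx` is not importable, so the same code can be tried in plain Jupyter. The fixed `K = 2`, the `NUMERIC_FIELDS` list, and the stand-in data are assumptions for illustration; in practice k would come from the selection step.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# ayx is only available inside the Alteryx Python tool; elsewhere we use
# stand-in data so the script still runs.
try:
    from ayx import Alteryx
    IN_ALTERYX = True
except ImportError:
    IN_ALTERYX = False

K = 2                              # hypothetical k; use the value your selection step picks
NUMERIC_FIELDS = ["Age", "Fare"]   # hypothetical field selection

if IN_ALTERYX:
    df = Alteryx.read("#1")        # data on input connection #1, as a pandas DataFrame
else:
    df = pd.DataFrame({            # hypothetical stand-in rows
        "Age": [22.0, np.nan, 26.0, 35.0],
        "Fare": [7.25, 71.28, 7.93, 53.10],
        "Sex": ["male", "female", "female", "female"],
    })

# If numeric fields arrive as strings, coerce them; unparseable values become NaN
for col in NUMERIC_FIELDS:
    df[col] = pd.to_numeric(df[col], errors="coerce")

result = df.copy()
result[NUMERIC_FIELDS] = KNNImputer(n_neighbors=K).fit_transform(df[NUMERIC_FIELDS])

if IN_ALTERYX:
    Alteryx.write(result, 1)       # send the imputed frame to output anchor 1
else:
    print(result)
```

With the stand-in rows, the missing age is filled from the two rows whose fares are closest (35 and 26), giving 30.5.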
So far, no cigar. It seems that the code does not actually run every time I run the workflow, but I am not sure of that. Ultimately, I would like to turn this into something that can be used the same way as the Imputation tool. I have included the Jupyter notebook and the workflow.
Any help getting this to work would be appreciated, because the alternatives are to not use Alteryx for data prep at all, or to pre-rinse the data with Python before loading it, both of which seem to defeat the spirit of a one-source solution. Am I going about this wrong? Should I be trying to build this out in the Python SDK?
**Also, why does the Alteryx notebook keep failing to save the library I add to the code? Maybe I am missing something, but it seems like it does not save changes.
If you have issues with the code, you might need to add the following line to the imports:
from ayx import Alteryx