I have a product catalogue that contains different types of products, including their features, materials used, current price, etc. In total there are around 20 features (most of them categorical, such as material: aluminum, wood, steel, etc.) available as predictor variables.
What I want to do:
I want to create a model that can predict a price range based on the catalogue data. Given a new product, I could enter the nearest available features and the output would be a price range with a certain confidence level.
I tried linear regression with dummy coding to handle the categorical variables. However, the result is not good enough: it gives a very wide range, which is meaningless. I have also seen research papers suggesting NN/SVM to first reduce the number of predictor variables, but I am not sure how to implement that.
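For concreteness, here is a minimal sketch of that baseline in R, where lm() dummy-codes factor columns automatically; the column names (material, weight, price) and values are invented for illustration:

```r
# Minimal sketch of the dummy-coded linear regression baseline.
# Column names and toy values are invented for illustration;
# R's lm() dummy-codes factor columns automatically.
catalogue <- data.frame(
  material = factor(c("aluminum", "wood", "steel", "wood", "aluminum", "steel")),
  weight   = c(1.2, 3.4, 5.1, 2.8, 1.9, 4.6),
  price    = c(20, 35, 80, 30, 25, 70)
)

fit <- lm(price ~ material + weight, data = catalogue)

# A 95% prediction interval for a new product; the overly wide
# range I am describing shows up in the lwr/upr columns.
new_product <- data.frame(
  material = factor("wood", levels = levels(catalogue$material)),
  weight   = 3.0
)
predict(fit, newdata = new_product, interval = "prediction", level = 0.95)
```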
Do you have any ideas, or is there an alternative approach I could try?
It's hard to say much without data to play with, but 20 predictors doesn't seem like that many: I'm playing with a Kaggle dataset right now that has over 100 categorical predictors, plus another 20-some numeric predictors. A couple of random thoughts, which perhaps you've already looked into...
If prices span a wide range, you could fit the model to log(price) instead and exponentiate the predictions to get back to the original scale; a quick sketch follows.
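A minimal sketch of the log-price idea, reusing the toy catalogue and new_product from the question above; the prediction interval is exponentiated back to the price scale:

```r
# Fit on log(price); exponentiating the prediction interval gives
# a price range on the original scale.
fit_log <- lm(log(price) ~ material + weight, data = catalogue)

pi_log <- predict(fit_log, newdata = new_product,
                  interval = "prediction", level = 0.95)
exp(pi_log)  # columns: fit (point estimate), lwr, upr on the price scale
```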
Try some other model types; the nice thing about Alteryx is that it's mostly just drag and drop to try different models. (Boosted Model tends to do well in almost any setting, though parameter tweaking will be necessary; Google parameter tuning for R's "gbm", since that's what is used behind the scenes; the configuration panel in Alteryx's Boosted Model tool should match up pretty well with the docs you find online.) A standalone sketch is below.
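For reference outside Alteryx, here is a hedged sketch of the same idea using R's gbm directly, on synthetic stand-in data; the hyperparameter values are only illustrative starting points and would need tuning on the real catalogue:

```r
library(gbm)

# Synthetic stand-in for the catalogue, just to make the sketch runnable.
set.seed(1)
n <- 200
toy <- data.frame(
  material = factor(sample(c("aluminum", "wood", "steel"), n, replace = TRUE)),
  weight   = runif(n, 1, 6)
)
toy$price <- exp(1 + 0.3 * toy$weight +
                 0.2 * (toy$material == "steel") + rnorm(n, 0, 0.1))

# Gradient boosted trees on log(price); these hyperparameters are
# illustrative starting points, not tuned values.
fit_gbm <- gbm(
  log(price) ~ material + weight,
  data = toy,
  distribution = "gaussian",
  n.trees = 500,         # boosting iterations
  interaction.depth = 3, # depth of each tree
  shrinkage = 0.05,      # learning rate
  cv.folds = 5           # cross-validation to choose n.trees
)

# Use the CV-optimal number of trees when predicting.
best_iter <- gbm.perf(fit_gbm, method = "cv", plot.it = FALSE)
new_product <- data.frame(material = factor("wood", levels = levels(toy$material)),
                          weight = 3.0)
exp(predict(fit_gbm, newdata = new_product, n.trees = best_iter))
```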