Hi Team,
I am trying to put together a model that can predict whether an article should be selected for publishing. We write lots of articles, but only a handful are chosen for publishing. Selection is based on certain criteria, such as keywords and the content of the paper. In terms of data, I have one column containing the article content and another column indicating whether the article was chosen for publishing (i.e. a Yes/No column). I would like to use machine learning to predict which papers are more likely to be selected for publishing, based on what has been done in the past.
Any ideas on whether this can be done in Alteryx, and how you would go about it? How would you set up the workflow?
Many Thanks
B
I think you should check out the Data Science Learning Path, which covers the various models that are available in Alteryx.
https://community.alteryx.com/t5/Learning-Paths/Data-Science-Learning-Path/ta-p/504157
Dawn.
Shortest answer:
Given the lack of additional variables, with a single column you are better off just running a standard hypothesis test on proportions.
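To make the proportion test concrete, here is a minimal sketch in plain Python of a two-proportion z-test. It assumes you split articles into two groups (for example, those that contain a given keyword and those that do not) and compare their publication rates. The function name and the counts are hypothetical and only for illustration, and this is not an Alteryx tool configuration.

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for H0: both groups share one proportion."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 30 of 200 articles containing a keyword were published,
# versus 20 of 400 articles without it.
z, p = two_proportion_z_test(30, 200, 20, 400)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the publication rate differs between the two groups. The normal approximation is only reasonable when each group has a decent number of published and unpublished articles.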
Slightly short answer:
From what you have described, it's a classification problem, so just try all the models. Assuming you have fewer than 1 million rows, it shouldn't take long with just one feature column.
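As a rough illustration of what one such classifier does with a single text column, here is a minimal sketch of a bag-of-words Naive Bayes model in plain Python. The choice of Naive Bayes, the whitespace tokenizer, the function names, and the toy articles are all assumptions made for this example. It is not how any particular Alteryx tool is implemented.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Deliberately simple: lowercase and split on whitespace.
    return text.lower().split()

def train_naive_bayes(texts, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, text):
    """Return the class with the highest log-probability score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for word in tokenize(text):
            if word in vocab:  # ignore words never seen in training
                count = word_counts[label][word]
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data standing in for the article and Yes/No columns.
texts = [
    "quarterly earnings beat market forecast",
    "market outlook and earnings analysis",
    "office party photos and cake recipes",
    "team lunch menu and parking notice",
]
labels = ["Yes", "Yes", "No", "No"]

model = train_naive_bayes(texts, labels)
print(predict(model, "earnings forecast for the market"))
```

With real data you would hold out part of the articles for evaluation and compare several model types on that held-out set rather than relying on one.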
Long answer:
If you have more variables that can be used for prediction, then ML models are likely to be more useful. You also need to decide which question you want to answer: "Is this paper going to be published?" or "Why is this paper being published?" Also, given the low count of positive cases (publications), there is more work to be done on sampling your data.
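One common way to handle the low count of positive cases is to undersample the majority class before training. The sketch below is a minimal plain-Python version under stated assumptions: the `published` key, the `ratio` parameter, and the toy rows are hypothetical, and undersampling is only one option (oversampling or class weights are alternatives).

```python
import random

def undersample(rows, label_key="published", positive="Yes", ratio=1.0, seed=0):
    """Keep all positive rows and a random subset of the negative rows.

    ratio is the number of negatives kept per positive (1.0 gives a 50/50 mix).
    """
    positives = [r for r in rows if r[label_key] == positive]
    negatives = [r for r in rows if r[label_key] != positive]
    n_keep = min(len(negatives), int(len(positives) * ratio))
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = positives + rng.sample(negatives, n_keep)
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 5 published articles out of 100.
rows = [{"id": i, "published": "Yes" if i < 5 else "No"} for i in range(100)]
balanced = undersample(rows)
print(len(balanced), sum(r["published"] == "Yes" for r in balanced))
```

Resample only the training portion of the data and evaluate on an untouched sample that keeps the original Yes/No proportions, otherwise the measured accuracy will not reflect real-world performance.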