
Alteryx Designer Desktop Discussions


Prediction Classifier

Bushra_Akhtar
7 - Meteor

Hi Team,

I am trying to put together a model that can predict whether an article should be selected for publishing. We write lots of articles, but only a handful are chosen for publishing, based on certain criteria such as keywords and the content of the paper. In terms of the data, I have one column containing each article's content and another indicating whether the article was chosen for publishing (i.e., a Yes/No column). I would like to use machine learning to predict which papers are more likely to be selected for publishing, based on what has been done in the past.

Any ideas whether this can be done in Alteryx, and how you would go about it? How would you set up the workflow?

Many thanks

B

DawnDuong
13 - Pulsar

Hi @Bushra_Akhtar 

I think you should check out the Data Science Learning Path, which covers the various models available in Alteryx.

https://community.alteryx.com/t5/Learning-Paths/Data-Science-Learning-Path/ta-p/504157

Dawn.

leozhang2work
10 - Fireball

@Bushra_Akhtar 

Shortest answer:

Given the lack of additional variables, with a single column you are better off just doing a standard hypothesis test on proportions.
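
To illustrate what a test on proportions could look like here, below is a standard two-proportion z-test in plain Python. It assumes you first reduce the text column to a single yes/no flag (hypothetically, whether the article contains a given keyword) and then compare the publication rate of the two groups. `keyword_split_counts` and `two_proportion_z_test` are illustrative names, not Alteryx tools, and the keyword split is just one possible way to define the groups.

```python
import math
import re


def keyword_split_counts(rows, keyword):
    """Hypothetical helper: rows are (content, label) pairs, label 'Yes'/'No'.

    Returns (published_with, total_with, published_without, total_without),
    splitting articles on whether `keyword` appears as a whole word.
    """
    kw = keyword.lower()
    with_kw, without_kw = [], []
    for text, label in rows:
        words = set(re.findall(r"[a-z0-9]+", text.lower()))
        (with_kw if kw in words else without_kw).append(label)
    return (with_kw.count("Yes"), len(with_kw),
            without_kw.count("Yes"), len(without_kw))


def two_proportion_z_test(pub_a, n_a, pub_b, n_b):
    """Two-sided z-test of H0: p_a == p_b, using the pooled proportion."""
    if n_a == 0 or n_b == 0:
        raise ValueError("both groups need at least one article")
    p_a, p_b = pub_a / n_a, pub_b / n_b
    pooled = (pub_a + pub_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Everything (or nothing) was published: the rates cannot differ.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(*keyword_split_counts(rows, "ai"))` returns the z statistic and p-value; a small p-value suggests articles with that keyword are published at a different rate. The normal approximation behind this test needs reasonably large counts in each group, which may be tight given how few articles are published.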

Slightly longer answer:

From what you have described, it's a classification problem, so just try all the models. Assuming you have fewer than 1 million rows, it shouldn't take long with just one feature column.
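
As one concrete example of trying a model on a single text column, the sketch below is a minimal multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. Naive Bayes is picked here only as a common text-classification baseline; the reply does not name a specific model, and `NaiveBayesText` and `tokenize` are hypothetical names. Inside Alteryx you would use its own predictive tools rather than this code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts, with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        n = len(labels)
        # log P(class), estimated from class frequencies in the training data
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_log_scores(self, text):
        """Unnormalised log P(class | text) for each class."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in tokenize(text):
                if w in self.vocab:  # skip words never seen in training
                    s += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores

    def predict(self, text):
        scores = self.predict_log_scores(text)
        return max(scores, key=scores.get)
```

Usage: `NaiveBayesText().fit(contents, labels).predict(new_article_text)` returns "Yes" or "No". When comparing this against other classifiers, score them on a held-out set and look at precision and recall for "Yes" rather than plain accuracy, since a model that always says "No" can look accurate when few articles are published.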

Long answer: 

If you have more variables that can be used for prediction, then ML models are likely to be more useful. However, you need to decide which question you want to answer: is this paper going to be published, or why is this paper being published? Also, given the low count of positive cases (publications), more work needs to be done on sampling your data.
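
A minimal sketch of that sampling step, assuming the same (content, label) rows: hold out a stratified test set first, then randomly undersample the "No" rows in the training set only, so the evaluation still reflects the real publication rate. Random undersampling is just one option (oversampling the "Yes" rows or weighting classes are others), and `stratified_split` and `undersample` are hypothetical names.

```python
import random


def stratified_split(rows, label_of, test_frac=0.2, seed=0):
    """Split rows so each label keeps (roughly) the same share in train and test."""
    rng = random.Random(seed)
    by_label = {}
    for r in rows:
        by_label.setdefault(label_of(r), []).append(r)
    train, test = [], []
    for group in by_label.values():
        group = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_test = int(round(test_frac * len(group)))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def undersample(rows, label_of, positive="Yes", ratio=1.0, seed=0):
    """Keep every positive row plus a random subset of negatives (ratio * #positives)."""
    rng = random.Random(seed)
    pos = [r for r in rows if label_of(r) == positive]
    neg = [r for r in rows if label_of(r) != positive]
    k = min(len(neg), int(round(ratio * len(pos))))
    sampled = pos + rng.sample(neg, k)
    rng.shuffle(sampled)
    return sampled
```

Typical use: `train, test = stratified_split(rows, lambda r: r[1])`, then fit on `undersample(train, lambda r: r[1])` and evaluate on the untouched `test`.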
