Alteryx Designer Desktop Discussions

SOLVED

AUC value in ROC Chart

derekt
5 - Atom

Hi Experts,

 

When I run the Logistic Regression tool in Alteryx, the output includes a ROC chart.

 

I need the AUC (Area Under the Curve) to assess model performance. How can I get the AUC value for the ROC chart?

 

In R, it's very simple to get that value. Please find the R code below and the attached graph for your reference:

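# Fit a logistic regression of admission on GRE score
mydata <- read.csv("https://stats.idre.ucla.edu/stat/data/binary.csv")
mylogit <- glm(admit ~ gre, data = mydata, family = "binomial")
summary(mylogit)
# Predicted probabilities for the rows used to fit the model
mydata$prob <- predict(mylogit, type = "response")
# Build the ROC curve and plot it with the AUC printed on the chart
library(pROC)
g <- roc(admit ~ prob, data = mydata)
plot(g, print.auc = TRUE)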
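
As a cross-check on the number printed on the plot, the AUC can also be computed directly from the outcome and the predicted probabilities. The following is a minimal base-R sketch using the rank-based (Mann-Whitney) formula. The function name `auc_rank` and the four-row data set are made up for illustration and are not part of pROC or of the admissions data above.

```r
# Rank-based (Mann-Whitney) AUC in base R; no packages required.
# labels: 0/1 outcome vector; scores: predicted probabilities.
# auc_rank is a hypothetical helper name used only for this sketch.
auc_rank <- function(labels, scores) {
  r <- rank(scores)                      # ties receive average ranks
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Tiny made-up example (not the UCLA admissions data)
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_value <- auc_rank(labels, scores)
print(auc_value)
```

With the objects from the code above, `auc_rank(mydata$admit, mydata$prob)` should agree with the value pROC prints, provided higher probabilities correspond to `admit = 1`.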

 

I would much appreciate your advice. Many thanks!

 

Regards,
Derek

14 REPLIES
jamie1
5 - Atom

Ah yes - that seems to be what is happening with my model also.

 

Many thanks for the help, great to know what was causing the issue!

mbarone
16 - Nebula

Welcome!

carguerriero
5 - Atom

I have the same error!

 

I understand now that it depends on the user. Could you explain in detail how you fixed it?

jamie1
5 - Atom

Hi carguerriero,

 

Check to see if you have any levels in a categorical variable that exist in the test data set, but that did not appear in the training data set.

 

This was the case with my data, and the model did not know how to deal with this new categorical value. 
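
A quick way to run that check outside Alteryx is sketched below in base R. The data frames `train` and `test` and the column `colour` are hypothetical stand-ins for your own training and test data.

```r
# Hypothetical training and test data; replace with your own frames and column.
train <- data.frame(colour = c("red", "blue", "red"), stringsAsFactors = FALSE)
test  <- data.frame(colour = c("blue", "green"), stringsAsFactors = FALSE)

# Levels present in the test data that the model never saw during training
unseen_levels <- setdiff(unique(as.character(test$colour)),
                         unique(as.character(train$colour)))
print(unseen_levels)
```

An empty result means every test level was also present in the training data for that column.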

mbarone
16 - Nebula

carguerriero,

 

I posted an example in this thread, about 3 or 4 posts back. I've copied and pasted it here:

 

Jamie - after much research and playing around and digging... for me, this happened to be (ugh, hate to say it) user error.

 

What was happening is this.....

 

Some of the levels of the categorical variables that were in the Validation data set were NOT in the Evaluation data set, which is the one I used to build the model. So when you use the Model Comparison Tool, it can't find some values in the Eval set that were in the Validation set.

 

For example:

 

Variable "Business Type" in the Evaluation set has levels of "Pizza Shop, Auto Repair, Glass Cleaning".  But in the Validation set, the levels are "Pizza Shop, Auto Repair, Glass Cleaning, Car Wash".

 

When it goes to do the Model Comparison, it looks at all the levels and sees that it can't trace all the ones in the Validation set back to the Model itself, which was built using the Eval set.

 

I've built a step into my model builds where I check the categorical variable levels in the Eval set against those in the Validation set. If there are mismatches, I force some observations in so that all levels in the Eval and Validation sets are accounted for.
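
For readers who want to see the idea in code, here is a rough base-R sketch of such a check. It is only one possible reading of "forcing observations in": it moves one validation row per unseen level into the model-building set. The data frames `eval_set` and `valid_set`, the column names, and the values are all hypothetical, and this is not the actual Alteryx workflow described above.

```r
# Hypothetical data reusing the "Business Type" levels from the example.
eval_set <- data.frame(
  business_type = c("Pizza Shop", "Auto Repair", "Glass Cleaning", "Pizza Shop"),
  target = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)
valid_set <- data.frame(
  business_type = c("Pizza Shop", "Car Wash", "Car Wash", "Auto Repair"),
  target = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Names of the categorical (character or factor) columns
is_cat <- sapply(eval_set, function(x) is.character(x) || is.factor(x))
cat_cols <- names(eval_set)[is_cat]

# For each categorical column, validation levels missing from the model-building set
mismatches <- lapply(cat_cols, function(col) {
  setdiff(as.character(valid_set[[col]]), as.character(eval_set[[col]]))
})
names(mismatches) <- cat_cols
print(mismatches)

# Assumed fix: move the first validation row of each unseen level into eval_set
for (col in cat_cols) {
  for (lvl in mismatches[[col]]) {
    idx <- which(valid_set[[col]] == lvl)[1]
    eval_set <- rbind(eval_set, valid_set[idx, ])
    valid_set <- valid_set[-idx, ]
  }
}

# Levels still unseen after the fix (expected to be empty)
remaining <- setdiff(valid_set$business_type, eval_set$business_type)
print(remaining)
```

The model would need to be rebuilt on the adjusted set before running the comparison again.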

 
