
Alteryx Designer Desktop Discussions

SOLVED

BUG? Why do I get conflicting performance measures in my Logistic Regression?

Atabarezz
13 - Pulsar

 

 

I attached a sample workflow in which I:

  1. Load a sample dataset (German credit, from the regression sample)
  2. Calculate the Yes/No ratio of defaults
  3. Split the data into estimation and validation samples
  4. Train a logistic model on the estimation sample
  5. Check the performance measures on the estimation sample with the Lift Chart Tool and the Model Comparison Tool
  6. Check the performance measures on the validation sample for a proper measure, again with both the Lift Chart Tool and the Model Comparison Tool

 

This is the workflow:

Picture1.png

 

Here is the lift curve for the estimation sample: AUC is 0.69 and Gini is 0.395 (by the way, Gini = 2·AUC − 1 should give 0.38, so there is a minor mismatch).

 

Picture2.png

 

 

 

Something odd happened: the Model Comparison Tool gave an AUC of 0.8867, not the 0.690777 above.

 

Picture3.png

 

 

Here is the lift curve for the validation sample this time: AUC is 0.656 and Gini is 0.314 (Gini = 2·AUC − 1 should give 0.312; a minor mismatch again, but OK).

 

Picture4.png

 

 

 

A second odd thing: the Model Comparison Tool gave an AUC of 0.8221 for the validation sample, not the 0.656522 above!

Picture5.png

 

 

So which one is correct?

  • the measures from the Lift Chart Tool
  • or the measures from the Model Comparison Tool

 

#logistic #regression #liftchart #modelcomparison #AUC #Gini

Atabarezz
13 - Pulsar

Here is the workflow with data...

 

CristonS
Alteryx Alumni (Retired)

Hi @Atabarezz,

 

The cumulative captured response curve (Lift Chart Tool) and the ROC curve (Model Comparison Tool) are not based on the same metrics, and they handle the data differently. The cumulative captured response curve works with decile-level aggregates, while the ROC curve works with individual records. As a result, the two tools should not be expected to produce identical numbers.
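To make the difference concrete, here is a minimal Python sketch on synthetic data (not the German credit workflow above). It assumes a binary target where 1 marks the event of interest (e.g. a default) and a model score where higher means more likely. The hypothetical `roc_auc` computes the ROC AUC from individual records as the share of positive/negative pairs the score ranks correctly, while the hypothetical `decile_gains_area` sorts the records by score, aggregates them into ten equal-sized groups, and takes the trapezoidal area under the resulting cumulative captured-response curve. Both functions are illustrative assumptions and are not how the Alteryx tools are implemented internally.

```python
import random


def roc_auc(scores, labels):
    """Record-level ROC AUC: the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count as 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def decile_gains_area(scores, labels, bins=10):
    """Area under the cumulative captured-response (gains) curve built from
    `bins` equal-sized groups of records ranked by descending score.
    x = share of all records targeted, y = share of positives captured."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    n = len(ranked)
    total_pos = sum(y for _, y in ranked)
    xs, ys = [0.0], [0.0]
    captured = 0
    for b in range(1, bins + 1):
        start, end = (b - 1) * n // bins, b * n // bins
        captured += sum(y for _, y in ranked[start:end])  # positives in this group
        xs.append(end / n)
        ys.append(captured / total_pos)
    # Trapezoidal rule over the group boundaries only (the aggregation step).
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2 for i in range(1, len(xs)))


# Synthetic example: about 30% positives, positives score higher on average.
rng = random.Random(0)
labels = [1 if rng.random() < 0.3 else 0 for _ in range(1000)]
scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]

auc = roc_auc(scores, labels)
area = decile_gains_area(scores, labels)
print(f"ROC AUC (individual records):     {auc:.4f}")
print(f"Gini = 2*AUC - 1:                 {2 * auc - 1:.4f}")
print(f"Gains-chart area (decile groups): {area:.4f}")
```

On data like this the two areas come out noticeably different even though they summarize the same scores: mainly because the gains curve's x-axis is the share of all records targeted while the ROC curve's x-axis is the false positive rate, and to a lesser extent because of the decile aggregation. The pairwise loop keeps the ROC AUC definition visible; for large samples a rank-based formula would be faster.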

 

Please let me know if you have further questions, thanks!

Atabarezz
13 - Pulsar

Thanks for clarifying:

 

  • "Area" in the Lift Chart Tool is actually the "Area Under the Gains Chart"
  • "AUC" in the Model Comparison Tool is the "Area Under the Receiver Operating Characteristic (ROC) Curve"
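The two areas can be related with a short derivation, assuming smooth record-level curves. The Lift Chart Tool works with decile aggregates, so its figures will only approximate this, and the thread does not state how the tool normalizes its "Area" or computes its "Gini", so treat this as background rather than a description of the tool. Let π be the share of positives (e.g. defaults) and, for each score cut-off, let TPR and FPR be the true and false positive rates. The gains chart plots TPR against the share of records targeted, x = π·TPR + (1 − π)·FPR, while the ROC curve plots TPR against FPR.

```latex
\begin{aligned}
A_{\text{gains}} &= \int_0^1 \mathrm{TPR}\,dx
  = \pi \int_0^1 \mathrm{TPR}\,d\mathrm{TPR} + (1-\pi) \int_0^1 \mathrm{TPR}\,d\mathrm{FPR}
  = \frac{\pi}{2} + (1-\pi)\,\mathrm{AUC}_{\text{ROC}} \\
A_{\text{perfect}} &= 1 - \frac{\pi}{2}, \qquad A_{\text{random}} = \frac{1}{2} \\
\text{Gini} &= \frac{A_{\text{gains}} - \tfrac{1}{2}}{A_{\text{perfect}} - \tfrac{1}{2}}
  = \frac{(1-\pi)\left(\mathrm{AUC}_{\text{ROC}} - \tfrac{1}{2}\right)}{\tfrac{1-\pi}{2}}
  = 2\,\mathrm{AUC}_{\text{ROC}} - 1
\end{aligned}
```

Under these assumptions the raw gains-chart area is a weighted average of 1/2 and the ROC AUC, so it equals the ROC AUC only when π → 0 (or for a model no better than random), and the identity Gini = 2·AUC − 1 applies to the ROC AUC rather than to the raw gains-chart area.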

Got it. I suggest mentioning this in the tool and in the Help files as well, perhaps even with a comparison of the two measures.

 

Best

CristonS
Alteryx Alumni (Retired)

Thanks, @Atabarezz, you're totally right. I will work with our tech writing team on this, and perhaps on a KB post as well.
