Model Comparison - ROC Curves & AUC

INTRODUCTION

Whether you are a data professional or work in a job that requires data-driven decisions, predictive analytics and related products (aka machine learning, ML, artificial intelligence, or AI) are here, and they are being used to drive industry. Because of this, understanding how to compare predictive models is very important.

This post gets into a very popular method of describing how well a model performs: the Area Under the Curve (AUC) metric.

As the term implies, AUC is a measure of the area under a curve. The curve in question is the Receiver Operating Characteristic (ROC) curve, which visually represents how the True Positive Rate (TPR) increases as the False Positive Rate (FPR) increases.

In plain English, the ROC curve is a visualization of how well a predictive model orders the outcomes - can it separate the two classes (TRUE/FALSE)? If not (most of the time the separation is not perfect), how close does it get? That last question is what the AUC metric answers.

THE BACKGROUND

Before I explain, let’s take a step back and understand the foundations of TPR and FPR.

For this post we are talking about a binary prediction (TRUE/FALSE). This could be answering a question like: Is this fraud? (TRUE/FALSE).

In a predictive model, you get some right and some wrong for both the TRUE and FALSE. Thus, you have four categories of outcomes:

True positive (TP): I predicted TRUE and it was actually TRUE
False positive (FP): I predicted TRUE and it was actually FALSE
True negative (TN): I predicted FALSE and it was actually FALSE
False negative (FN): I predicted FALSE and it was actually TRUE

From these, you can create a number of additional metrics that measure various things. In ROC Curves, there are two that are important:

True Positive Rate aka Sensitivity (TPR): out of all the actual TRUE outcomes, how many did I predict TRUE?
- \(TPR = sensitivity = \frac{TP}{TP + FN}\)
- Higher is better!
False Positive Rate aka 1 - Specificity (FPR): out of all the actual FALSE outcomes, how many did I predict TRUE?
- \(FPR = 1 - specificity = 1 - \frac{TN}{TN + FP} = \frac{FP}{FP + TN}\)
- Lower is better!
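To make the four outcome categories and the two rates concrete, here is a minimal Python sketch. The function names `confusion_counts` and `tpr_fpr` and the toy labels are hypothetical, chosen only for illustration; they are not the code or data behind this post.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN for paired lists of TRUE/FALSE values."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1      # predicted TRUE, actually TRUE
        elif p and not a:
            fp += 1      # predicted TRUE, actually FALSE
        elif not p and not a:
            tn += 1      # predicted FALSE, actually FALSE
        else:
            fn += 1      # predicted FALSE, actually TRUE
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}


def tpr_fpr(counts):
    """Return (TPR, FPR); assumes at least one actual TRUE and one actual FALSE."""
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])
    fpr = counts["FP"] / (counts["FP"] + counts["TN"])
    return tpr, fpr


# Hypothetical toy labels, not data from this post.
actual = [True, False, True, False, True]
predicted = [True, True, False, False, True]

counts = confusion_counts(actual, predicted)
tpr, fpr = tpr_fpr(counts)
print(counts, tpr, fpr)
```

With these toy labels, two of the three actual TRUE outcomes are caught and one of the two actual FALSE outcomes is flagged by mistake.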

BUILDING THE ROC CURVE

For the sake of the example, I built 3 models to compare: Random Forest, Logistic Regression, and random prediction using a uniform distribution.

Step 1: Rank Order Predictions

To build the ROC curve for each model, you first rank order your predictions:

Actual	Predicted
FALSE	0.9291
FALSE	0.9200
TRUE	0.8518
TRUE	0.8489
TRUE	0.8462
TRUE	0.7391
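The ranking step is just a descending sort on the predicted probability. A small sketch, assuming the predictions are held as (actual, predicted) pairs; the six rows are the ones shown in the table, deliberately shuffled, and the variable names are my own:

```python
# The six rows from the table above, in shuffled order: (actual, predicted).
predictions = [
    (True, 0.8489),
    (False, 0.9200),
    (True, 0.7391),
    (False, 0.9291),
    (True, 0.8518),
    (True, 0.8462),
]

# Rank order: highest predicted probability of TRUE first.
ranked = sorted(predictions, key=lambda row: row[1], reverse=True)

for rank, (actual, predicted) in enumerate(ranked, start=1):
    print(rank, actual, predicted)
```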

Step 2: Calculate TPR & FPR for First Iteration

Now, we step through the table. Using a “cutoff” as the first row (effectively the most likely to be TRUE), we say that the first row is predicted TRUE and the remaining are predicted FALSE.

From the ranked table in Step 1, we can see that the first row is actually FALSE, though we are predicting it TRUE. This leads to the following metrics for our first iteration:

Iteration	TPR	FPR	Sensitivity	Specificity	True.Positive	False.Positive	True.Negative	False.Negative
1	0	0.037	0	0.963	0	1	26	11

This is what we’d expect. We have a 0% TPR on the first iteration because we got that single prediction wrong. Since we’ve only got 1 false positive out of 27 actual FALSE outcomes, our FPR is still low: 3.7%.
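As a quick check, plugging the first-iteration counts from the table (TP = 0, FP = 1, TN = 26, FN = 11) into the definitions of TPR and FPR gives:

\[ TPR = \frac{TP}{TP + FN} = \frac{0}{0 + 11} = 0, \qquad FPR = \frac{FP}{FP + TN} = \frac{1}{1 + 26} \approx 0.037 \]

The specificity in the same row follows as \(\frac{TN}{TN + FP} = \frac{26}{27} \approx 0.963\).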

Step 3: Iterate Through the Remaining Predictions

Now, let’s go through all of the possible cut points and calculate the TPR and FPR.

Actual Outcome	Predicted Outcome	Model	Rank	True Positive Rate	False Positive Rate	Sensitivity	Specificity	True Negative	True Positive	False Negative	False Positive
FALSE	0.9291	Logistic Regression	1	0.0000	0.0370	0.0000	0.9630	26	0	11	1
FALSE	0.9200	Logistic Regression	2	0.0000	0.0741	0.0000	0.9259	25	0	11	2
TRUE	0.8518	Logistic Regression	3	0.0909	0.0741	0.0909	0.9259	25	1	10	2
TRUE	0.8489	Logistic Regression	4	0.1818	0.0741	0.1818	0.9259	25	2	9	2
TRUE	0.8462	Logistic Regression	5	0.2727	0.0741	0.2727	0.9259	25	3	8	2
TRUE	0.7391	Logistic Regression	6	0.3636	0.0741	0.3636	0.9259	25	4	7	2
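The iteration can be sketched as a running count down the ranked list: each row moved above the cutoff adds either one true positive or one false positive. The function name `roc_prefix` is hypothetical. The totals of 11 actual TRUE and 27 actual FALSE outcomes are inferred from the counts in the table (TP + FN and TN + FP), not stated elsewhere in this post.

```python
def roc_prefix(ranked_actuals, n_pos, n_neg):
    """Return (TPR, FPR) after each cutoff, walking down a ranked list.

    ranked_actuals: actual outcomes, sorted by predicted probability (descending).
    n_pos, n_neg:   total actual TRUE / FALSE outcomes in the full data set.
    """
    tp = fp = 0
    points = []
    for is_true in ranked_actuals:
        if is_true:
            tp += 1  # this row is predicted TRUE and is actually TRUE
        else:
            fp += 1  # this row is predicted TRUE but is actually FALSE
        points.append((round(tp / n_pos, 4), round(fp / n_neg, 4)))
    return points


# Actual outcomes of the six top-ranked Logistic Regression rows shown above.
top_six = [False, False, True, True, True, True]

# Totals inferred from the table: 11 actual TRUE, 27 actual FALSE.
points = roc_prefix(top_six, n_pos=11, n_neg=27)
for rank, (tpr, fpr) in enumerate(points, start=1):
    print(rank, tpr, fpr)
```

Under those assumed totals, the printed pairs match the True Positive Rate and False Positive Rate columns for ranks 1 through 6.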

Step 4: Repeat Steps 1-3 for Each Model

Calculate the TPR & FPR for each rank and model!

Step 5: Plot the Results & Calculate AUC

As you can see below, the Random Forest does remarkably well. It perfectly separated the outcomes in this example (to be fair, this is really small data, and it is test data). What I mean is, when the data is rank ordered by the predicted likelihood of being TRUE, the actual TRUE outcomes are all grouped together at the top, so there are no false positives until every TRUE has been found. The curve therefore encloses the whole unit square, and the Area Under the Curve (AUC) is 1 (\(area = height * width\) for a rectangle/square).

Logistic Regression does well - ~80% AUC is nothing to sneeze at.

The random prediction does only slightly better than a coin flip (50% AUC), but that is just random chance in a small sample.
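One common way to turn the (FPR, TPR) points into an area is the trapezoidal rule. The sketch below is an assumption about how the area could be computed, not necessarily the method used for the numbers in this post. The function name `auc_trapezoid` is hypothetical, the toy rankings are made up, and the code assumes there are no tied predictions.

```python
def auc_trapezoid(ranked_actuals):
    """Area under the ROC curve for outcomes sorted by predicted probability.

    Assumes no tied predictions and at least one TRUE and one FALSE outcome.
    """
    n_pos = sum(1 for a in ranked_actuals if a)
    n_neg = len(ranked_actuals) - n_pos
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    area = 0.0
    for is_true in ranked_actuals:
        if is_true:
            tp += 1
        else:
            fp += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        # Trapezoid between the previous point and this one:
        # width along the FPR axis times the average TPR height.
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return area


# Hypothetical toy rankings, not the models from this post.
perfect = auc_trapezoid([True, True, True, False, False])  # all TRUEs ranked first
mixed = auc_trapezoid([True, False, True, False])          # partially separated
worst = auc_trapezoid([False, False, True, True, True])    # all FALSEs ranked first
print(perfect, mixed, worst)
```

A perfectly separated ranking gives an area of 1, a completely reversed one gives 0, and anything in between lands between the two.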

SUMMARY

The AUC is a very important metric for comparing models. To properly understand it, you need to understand the ROC curve and the underlying calculations.

In the end, AUC shows how good a model is at classifying. The better it can separate the TRUEs from the FALSEs, the closer to 1 the AUC will be. This means that, as the cutoff moves down the ranked predictions, the True Positive Rate increases faster than the False Positive Rate: the model picks up true positives before it picks up false positives.