Classification Metrics for Imbalanced Datasets
This blog is a continuation of Why Accuracy is a Bad Metric for Imbalanced Datasets.
Let us consider the same example as in the previous blog.
Example: Consider a loan default prediction problem with a total of 1000 data points, of which 100 are ‘default’ and the remaining 900 are ‘Not default’. The ratio of ‘default’ to ‘Not default’ is 1:9, so this is an imbalanced dataset. Let us consider a dumb model that predicts ‘Not default’ for every data point. Taking ‘default’ as the positive class, its confusion matrix is TP = 0, FP = 0, FN = 100, TN = 900.
Let us calculate the F1-score for the above model.
Precision tells us, of all the points predicted as positive, how many are actually positive: Precision = TP / (TP + FP). Recall tells us, of all the actually positive points, how many are correctly predicted: Recall = TP / (TP + FN). F1-score is the harmonic mean of Precision and Recall: F1 = 2 * Precision * Recall / (Precision + Recall). One important property of the harmonic mean is that it stays closer to the smaller of the two values.
Note: In our case, the positive class is the ‘default’ class.
As TP = 0, Precision, Recall and F1-score are all 0 (taking 0/0 as 0). So, our 90% accurate dumb model has an F1-score of zero.
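The calculation can be sketched in a few lines of Python. The helper function names below are my own, not from the original blog; the counts are those of the dumb model above, with ‘default’ as the positive class.

```python
# Precision, Recall and F1-score from confusion-matrix counts,
# returning 0 when the denominator is 0 (the dumb-model case).

def precision(tp, fp):
    # Of all points predicted positive, how many are truly positive.
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0

def recall(tp, fn):
    # Of all truly positive points, how many are predicted positive.
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0

def f1_score(p, r):
    # Harmonic mean of Precision and Recall.
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0

# Dumb model: predicts 'Not default' for all 1000 points.
tp, fp, fn = 0, 0, 100
p, r = precision(tp, fp), recall(tp, fn)
print(p, r, f1_score(p, r))  # 0.0 0.0 0.0
```

Despite 90% accuracy, every metric that cares about the minority class comes out as zero.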
Now, let us flip a fair coin and predict ‘default’ on HEADS and ‘Not default’ on TAILS. In expectation, half of each class gets predicted ‘default’, so we should get the following confusion matrix: TP = 50, FP = 450, FN = 50, TN = 450.
Now, the accuracy is 50%, while Precision is 0.1, Recall is 0.5 and F1-score is 0.1667. We can see that the F1-score is closer to the smaller value, i.e., Precision.
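These numbers can be verified directly from the expected coin-flip confusion matrix (a minimal sketch; counts are the in-expectation values for a fair coin on this dataset):

```python
# Expected confusion matrix for the fair-coin model:
# half of each class is predicted 'default' (HEADS).
tp, fn = 50, 50      # 100 actual defaults, half predicted 'default'
fp, tn = 450, 450    # 900 actual non-defaults, half predicted 'default'

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, round(f1, 4))  # 0.5 0.1 0.5 0.1667
```

The harmonic mean (0.1667) sits much nearer to Precision (0.1) than to Recall (0.5), which is exactly why F1 punishes a lopsided pair.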
Let us consider some trained model (call it the ML model) with the below confusion matrix:
For the ML model, Precision is 0.629, Recall is 0.85 and F1-score is 0.723.
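The original blog shows the ML model's confusion matrix as an image. The counts below are my reconstruction, chosen to be consistent with the reported Precision of 0.629 and Recall of 0.85 on 1000 points; treat them as an assumption:

```python
# Assumed confusion matrix consistent with the ML model's metrics.
tp, fp, fn, tn = 85, 50, 15, 850

precision = tp / (tp + fp)   # 85 / 135 ≈ 0.6296
recall = tp / (tp + fn)      # 85 / 100 = 0.85
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 4), recall, round(f1, 3))  # 0.6296 0.85 0.723
```

An F1 of 0.723 sits between Precision and Recall but, as before, closer to the smaller of the two.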
In this way, the F1-scores of various models can be compared. Sometimes, if False Positives are more costly, we look at Precision, and if False Negatives are more costly, we look at Recall. But looking at only Precision or Recall in isolation can mislead the comparison, especially when either FP = 0 or FN = 0. It is better to look at all 3 metrics (Precision, Recall and F1-score) before drawing a conclusion.
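To see how a single metric misleads, consider a hypothetical model on the same dataset that predicts ‘default’ exactly once, and that one prediction happens to be correct (FP = 0):

```python
# A model with a perfect Precision of 1.0 that is nearly useless:
# it catches only 1 of the 100 actual defaults.
tp, fp, fn = 1, 0, 99

precision = tp / (tp + fp)   # 1.0  (no false positives)
recall = tp / (tp + fn)      # 0.01 (misses 99 defaults)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 4))  # 1.0 0.01 0.0198
```

Precision alone says the model is perfect; the F1-score of about 0.02 immediately exposes the collapsed Recall.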
For model comparison, we can also look directly at the confusion matrix itself, from which Precision, Recall and F1-score are calculated.