[Review] The Relationship Between Precision-Recall and ROC Curves
This is the first post in the series "Statistics ❤️AI", in which I review papers, ideas, and advances in statistical methods for evaluating, understanding, and interpreting AI algorithms.
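When the number of negative examples greatly exceeds the number of positive examples, precision-recall (PR) curves give a more informative picture of an algorithm’s performance than ROC curves: a large change in the number of false positives barely moves the false positive rate, but it shows up clearly in Precision.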
For a given dataset of positive and negative examples, there exists a one-to-one correspondence between a curve in ROC space and a curve in PR space, such that the two curves contain exactly the same confusion matrices, provided Recall ≠ 0.
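The correspondence can be made concrete with a minimal sketch that maps a single operating point between the two spaces. The function names `roc_to_pr` and `pr_to_roc` and the example counts are illustrative assumptions, not taken from the paper.

```python
def roc_to_pr(fpr, tpr, n_pos, n_neg):
    """Map an ROC point (FPR, TPR) to a PR point (recall, precision).

    n_pos and n_neg are the fixed numbers of positive and negative examples.
    """
    tp = tpr * n_pos  # true positives implied by the true positive rate
    fp = fpr * n_neg  # false positives implied by the false positive rate
    if tp + fp == 0:
        raise ValueError("Precision is undefined when nothing is predicted positive")
    return tpr, tp / (tp + fp)  # recall equals TPR


def pr_to_roc(recall, precision, n_pos, n_neg):
    """Map a PR point (recall, precision) back to an ROC point (FPR, TPR)."""
    if recall <= 0 or precision <= 0:
        # With Recall = 0 there are no true positives, so FP cannot be recovered.
        raise ValueError("The mapping is only one-to-one when Recall != 0")
    tp = recall * n_pos
    fp = tp * (1 - precision) / precision  # solve precision = tp / (tp + fp) for fp
    return fp / n_neg, recall


# Illustrative counts (an assumption): 20 positives, 2000 negatives.
recall, precision = roc_to_pr(fpr=0.01, tpr=0.5, n_pos=20, n_neg=2000)
```

With these counts, a false positive rate of 0.01 looks excellent in ROC space, yet the same confusion matrix has a Precision of only about 0.33, which is the imbalance effect described earlier.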
For a fixed number of positive and negative examples, one curve dominates a second curve in ROC space if and only if the first dominates the second in Precision-Recall space.
Given a set of points in PR space, there exists an achievable PR curve that dominates the other valid curves that could be constructed with these points.
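One way to obtain the vertices of such a curve is to take the convex hull in ROC space and map its vertices into PR space. The sketch below follows that idea under stated assumptions: the helper names, the sample points, and the class counts are all hypothetical, and the segments between vertices are not straight lines in PR space, so only the vertices are returned.

```python
def _cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive means a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of ROC points (FPR, TPR), with (0, 0) and (1, 1) added."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last vertex while it lies on or below the line to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def achievable_pr_vertices(points, n_pos, n_neg):
    """Map ROC hull vertices with non-zero recall to (recall, precision) pairs."""
    vertices = []
    for fpr, tpr in roc_convex_hull(points):
        if tpr == 0:
            continue  # the ROC/PR mapping is not defined at Recall = 0
        tp, fp = tpr * n_pos, fpr * n_neg
        vertices.append((tpr, tp / (tp + fp)))
    return vertices


# Hypothetical ROC points; (0.3, 0.4) lies under the hull and is discarded.
roc_points = [(0.1, 0.5), (0.3, 0.4), (0.5, 0.9)]
hull = roc_convex_hull(roc_points)
pr_vertices = achievable_pr_vertices(roc_points, n_pos=20, n_neg=2000)
```

Working in ROC space keeps the geometry simple, because straight lines between ROC points are valid there, while the equivalent construction done directly in PR space would need non-linear segments.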
It would be methodologically incorrect to construct a convex hull or achievable PR curve by looking at performance on the test data and then building the curve from those same results, since that amounts to selecting operating points using the test set.
As the level of Recall varies, Precision does not necessarily change linearly, because FP replaces FN in the denominator of the Precision metric (Precision = TP / (TP + FP), whereas Recall = TP / (TP + FN)). Linear interpolation between PR points is therefore a mistake that yields an overly optimistic estimate of performance.
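A small sketch shows the size of the error. It interpolates between two PR points by adding one true positive at a time and adding false positives at the local rate (FP_B - FP_A) / (TP_B - TP_A), then compares the result with a straight line. The function name `interpolate_pr` and the counts are illustrative assumptions.

```python
def interpolate_pr(tp_a, fp_a, tp_b, fp_b, n_pos):
    """Interpolate PR points between confusion matrices A and B.

    Steps one true positive at a time from A to B and returns a list of
    (recall, precision) pairs, including both endpoints.
    """
    if tp_b <= tp_a:
        raise ValueError("B must have more true positives than A")
    if tp_a + fp_a == 0:
        raise ValueError("Precision is undefined at A when TP + FP = 0")
    # False positives gained per additional true positive between A and B.
    local_skew = (fp_b - fp_a) / (tp_b - tp_a)
    points = []
    for x in range(tp_b - tp_a + 1):
        tp = tp_a + x
        fp = fp_a + local_skew * x
        points.append((tp / n_pos, tp / (tp + fp)))
    return points


# Illustrative counts (an assumption): 20 positives; A has TP=5, FP=5; B has TP=10, FP=30.
curve = interpolate_pr(tp_a=5, fp_a=5, tp_b=10, fp_b=30, n_pos=20)

# Naive straight line between A (0.25, 0.5) and B (0.5, 0.25), evaluated at recall 0.3.
(r_a, p_a), (r_b, p_b) = curve[0], curve[-1]
linear_at_030 = p_a + (0.3 - r_a) / (r_b - r_a) * (p_b - p_a)
```

At Recall 0.3 the correct interpolation gives a Precision of 0.375, while the straight line gives 0.45, so the area under a linearly interpolated PR curve overstates performance.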
Algorithms that optimize the area under the ROC curve are not guaranteed to optimize the area under the PR curve.