Using the area under an estimated ROC curve to test the adequacy of binary predictors

Robert P. Lieli, Yu Chin Hsu

Research output: Contribution to journal (peer-reviewed article)

Abstract

We consider using the area under an empirical receiver operating characteristic curve to test the hypothesis that a predictive index combined with a range of cutoffs performs no better than pure chance in forecasting a binary outcome. This corresponds to the null hypothesis that the area in question, denoted AUC, is 1/2. We show that if the predictive index comes from a first-stage regression model estimated over the same data set, then testing the null based on the standard asymptotic normality results leads to severe size distortion in general settings. We then analytically derive the proper asymptotic null distribution of the empirical AUC in a special case; namely, when the first-stage regressors are Bernoulli random variables. This distribution can be utilised to construct a fully in-sample test of H0: AUC = 1/2 with correct size and more power than out-of-sample tests based on sample splitting, though practical application becomes cumbersome with more than two regressors.
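
The size distortion described in the abstract can be illustrated with a small simulation. The empirical AUC of an index equals the Mann-Whitney statistic: the share of (y = 1, y = 0) pairs in which the y = 1 observation receives the higher index value, with ties counted as 1/2. The Python sketch below rests on several assumptions that do not come from the paper. It uses a single standard normal regressor drawn independently of the outcome, so that H0: AUC = 1/2 holds. The hypothetical first stage is a linear probability model fitted by OLS. The test is a one-sided z-test based on the textbook Mann-Whitney null variance (m + n + 1)/(12mn). The sketch then compares rejection rates when the index is fitted and evaluated on the same data with rates under a 50/50 sample split. All function names are illustrative. This is not the corrected in-sample test the paper derives for Bernoulli regressors.

```python
import math
import random


def empirical_auc(scores, labels):
    """Empirical AUC (Mann-Whitney form): share of (y=1, y=0) pairs where the
    y=1 case has the higher score, counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcomes must occur in the sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def naive_auc_test(scores, labels):
    """One-sided z-test of H0: AUC = 1/2 vs AUC > 1/2 using the Mann-Whitney
    null variance (m + n + 1) / (12 m n). Valid when the scores are
    independent of the labels and (essentially) free of ties."""
    m = sum(1 for y in labels if y == 1)
    n = len(labels) - m
    auc = empirical_auc(scores, labels)
    se = math.sqrt((m + n + 1) / (12.0 * m * n))
    z = (auc - 0.5) / se
    return auc, 0.5 * math.erfc(z / math.sqrt(2.0))  # p-value = P(Z > z)


def ols_slope(x, y):
    """Slope of a linear probability model y = a + b*x fitted by OLS
    (assumed first stage). Only the sign of b affects the ranking, hence
    the AUC."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def rejection_rates(n_obs=100, reps=500, alpha=0.05, seed=0):
    """Simulate under H0 (y independent of x) and return the naive test's
    rejection rates for (a) an index fitted and evaluated on the same data
    and (b) an index fitted on the first half, evaluated on the second."""
    rng = random.Random(seed)
    half = n_obs // 2
    rej_in = rej_split = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        y = [1 if rng.random() < 0.5 else 0 for _ in range(n_obs)]
        # (a) in-sample: first stage and AUC use the same observations
        b_full = ols_slope(x, y)
        _, p_in = naive_auc_test([b_full * xi for xi in x], y)
        rej_in += p_in < alpha
        # (b) sample splitting: estimate on one half, evaluate on the other
        b_train = ols_slope(x[:half], y[:half])
        _, p_split = naive_auc_test([b_train * xi for xi in x[half:]], y[half:])
        rej_split += p_split < alpha
    return rej_in / reps, rej_split / reps
```

Calling `rejection_rates()` returns the two empirical rejection frequencies at nominal level 0.05. One would expect the in-sample rate to sit clearly above 0.05. With one regressor, fitting the slope on the same data essentially picks the sign that pushes the AUC above 1/2, so the in-sample AUC behaves like 1/2 + |AUC(x) - 1/2|. The split-sample rate should stay near 0.05, at the cost of evaluating the index on only half of the observations.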

Original language: English
Pages (from-to): 100-130
Number of pages: 31
Journal: Journal of Nonparametric Statistics
Volume: 31
Issue number: 1
State: Published, 2 Jan 2019

Keywords

  • Area under the ROC curve
  • binary classification
  • in-sample hypothesis testing
  • model evaluation
  • overfitting
