Statistics of risk: a novel framework for predictive models of binary events

by Michael Mackay

16:00 (40 min) in USB G.003

The need to predict the outcome of binary events arises in almost every domain, and a huge amount of work has gone into building models that do so. It is often useful to express such a prediction as the risk of a given outcome occurring. I will present a novel way to appraise the performance of these predictive models, one that improves the estimation of decision thresholds and enables the sample size calculations needed to detect predictive ability in a dataset.

Any binary outcome is influenced by k variables. Knowledge of all of these variables allows perfect prediction; knowledge of none allows the risk to be predicted only as the mean rate of occurrence. We can show that risk is an inherent property of a binary event, defined by the frequency of the event and the amount of information available to inform the prediction. I will show that from this starting point it is possible to simulate the population-level output of any predictive model of binary events. This can be used to analyse the output of any existing model, to derive a true R² value for its discriminatory performance, and to choose an optimal threshold for classification and testing.
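
The two limiting cases described above (all k variables known, none known) can be illustrated with a small simulation. This is only a sketch under an assumed toy model, not the framework presented in the talk: the outcome is taken to occur when the sum of k independent standard normal variables exceeds a threshold chosen to match the event frequency, and the predicted risk is the conditional probability of the outcome given the first m variables. The names simulate_risks and brier_score, the normal latent variables and the use of the Brier score are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist


def simulate_risks(k, m, prevalence, n=20000, seed=0):
    """Simulate n individuals under an ASSUMED toy model (not the talk's method).

    The outcome is 1 when the sum of k independent standard normal
    variables exceeds a threshold set so that P(outcome = 1) = prevalence.
    The predicted risk is P(outcome = 1 | first m variables).
    Returns (risks, outcomes).
    """
    if not 0 <= m <= k:
        raise ValueError("m must satisfy 0 <= m <= k")
    rng = random.Random(seed)
    std_normal = NormalDist()
    # The sum of k standard normals has standard deviation sqrt(k).
    threshold = math.sqrt(k) * std_normal.inv_cdf(1.0 - prevalence)
    risks, outcomes = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(k)]
        outcome = 1 if sum(x) > threshold else 0
        if m == k:
            # Every variable is known, so the outcome is known exactly.
            risk = float(outcome)
        else:
            # The k - m unknown variables sum to a normal with sd sqrt(k - m).
            known = sum(x[:m])
            z = (threshold - known) / math.sqrt(k - m)
            risk = 1.0 - std_normal.cdf(z)
        risks.append(risk)
        outcomes.append(outcome)
    return risks, outcomes


def brier_score(risks, outcomes):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((r - y) ** 2 for r, y in zip(risks, outcomes)) / len(risks)


if __name__ == "__main__":
    # Hypothetical settings: 10 variables, event frequency 0.2.
    for m in (0, 5, 10):
        risks, outcomes = simulate_risks(k=10, m=m, prevalence=0.2)
        print(m, round(brier_score(risks, outcomes), 4))
```

With m = 0 every individual receives the same risk, equal to the event frequency; with m = k every risk is exactly 0 or 1. Intermediate values of m spread the population's risks between those two extremes.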