Data partition methodology for validation of predictive models

by Rebecca E. Morrison, Corey M. Bryant, Gabriel Terejanu, Serge Prudhomme, Kenji Miki
Refereed Journals. Year: 2013


Rebecca E. Morrison, Corey M. Bryant, Gabriel Terejanu, Serge Prudhomme, Kenji Miki. Data partition methodology for validation of predictive models. Computers & Mathematics with Applications, Volume 66, Issue 10, December 2013, Pages 2114–2125.


In many cases, model validation requires that legacy data be partitioned into calibration and validation sets, but how to do so is a nontrivial and open area of research. We present a systematic procedure to partition the data, adapted from cross-validation and in the context of predictive modeling. By considering all possible partitions, we proceed with post-processing steps to find the optimal partition of the data subject to given constraints. We are concerned here with mathematical models of physical systems whose predictions of a given unobservable quantity of interest are the basis for critical decisions. Thus, the proposed approach addresses two critical issues: (1) that the model be evaluated with respect to its ability to reproduce the data and (2) that the model be highly challenged by the validation set with respect to predictions of the quantity of interest. This framework also relies on the interaction between the experimentalist and/or modeler, who understand the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientists, who strive to determine if the model satisfies both the modeler’s and decision-maker’s requirements. The framework is general and may be applied to a wide range of problems. It is illustrated here through an example using generated experiments of a nonlinear one degree-of-freedom oscillator.
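The idea of considering all possible partitions and then post-processing them can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the one-parameter model y = theta * x, the least-squares calibration, the two metrics (worst-case misfit on the validation data, and the shift in the predicted quantity of interest when the parameter is recalibrated on the validation set), the tolerance `data_tol`, and the function names `fit_slope` and `choose_partition`.

```python
from itertools import combinations


def fit_slope(points):
    """Least-squares slope for the hypothetical model y = theta * x."""
    num = sum(x * y for x, y in points)
    den = sum(x * x for x, _ in points)
    return num / den


def choose_partition(data, n_cal, x_qoi, data_tol):
    """Enumerate every calibration/validation split of `data`.

    A split is kept only if the calibrated model reproduces the validation
    data within `data_tol` (assumed constraint). Among the kept splits, the
    one whose validation set shifts the predicted quantity of interest the
    most is returned (assumed notion of "most challenging").
    Returns None if no split satisfies the constraint.
    """
    best = None
    indices = range(len(data))
    for cal_idx in combinations(indices, n_cal):
        cal = [data[i] for i in cal_idx]
        val = [data[i] for i in indices if i not in cal_idx]

        theta_cal = fit_slope(cal)  # calibrate on the calibration set
        theta_val = fit_slope(val)  # recalibrate on the validation set

        # Issue (1): can the calibrated model reproduce the validation data?
        data_misfit = max(abs(y - theta_cal * x) for x, y in val)
        if data_misfit > data_tol:
            continue

        # Issue (2): how strongly does the validation set challenge the
        # prediction of the quantity of interest q = theta * x_qoi?
        qoi_shift = abs(theta_cal - theta_val) * abs(x_qoi)

        if best is None or qoi_shift > best["qoi_shift"]:
            best = {
                "calibration": cal,
                "validation": val,
                "data_misfit": data_misfit,
                "qoi_shift": qoi_shift,
            }
    return best


if __name__ == "__main__":
    # Made-up observations (x, y); x_qoi lies outside the observed range.
    observations = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
    print(choose_partition(observations, n_cal=2, x_qoi=10.0, data_tol=0.5))
```

Exhaustive enumeration grows combinatorially with the number of data points, so a sketch like this is only practical for small data sets. In a real study the tolerance on the quantity of interest would come from the decision-maker rather than from a fixed constant.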



Cross-validation, Calibration, Parameter estimation, Inverse problems, Quantity of interest