TY - JOUR
T1 - Consistencies and inconsistencies between model selection and link prediction in networks
AU - Vallès-Català, Toni
AU - Peixoto, Tiago P.
AU - Sales-Pardo, Marta
AU - Guimerà, Roger
N1 - Publisher Copyright:
© 2018 American Physical Society.
PY - 2018/6/28
Y1 - 2018/6/28
AB - A principled approach to understand network structures is to formulate generative models. Given a collection of models, however, an outstanding key task is to determine which one provides a more accurate description of the network at hand, discounting statistical fluctuations. This problem can be approached using two principled criteria that at first may seem equivalent: selecting the most plausible model in terms of its posterior probability; or selecting the model with the highest predictive performance in terms of identifying missing links. Here we show that while these two approaches yield consistent results in most cases, there are also notable instances where they do not, that is, where the most plausible model is not the most predictive. We show that in the latter case the improvement of predictive performance can in fact lead to overfitting both in artificial and empirical settings. Furthermore, we show that, in general, the predictive performance is higher when we average over collections of models that are individually less plausible than when we consider only the single most plausible model.
UR - http://www.scopus.com/inward/record.url?scp=85049311255&partnerID=8YFLogxK
U2 - 10.1103/PhysRevE.97.062316
DO - 10.1103/PhysRevE.97.062316
M3 - Article
C2 - 30011606
AN - SCOPUS:85049311255
SN - 2470-0045
VL - 97
JO - Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics
JF - Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics
IS - 6
M1 - 062316
ER -