TY - JOUR
T1 - Assessment of model fit via network comparison methods based on subgraph counts
AU - Ospina-Forero, Luis
AU - Deane, Charlotte M.
AU - Reinert, Gesine
AU - Peixoto, Tiago
N1 - Publisher Copyright:
© 2018 The authors. Published by Oxford University Press. All rights reserved.
PY - 2019/4/1
Y1 - 2019/4/1
AB - While the number of network comparison methods is increasing, benchmarking of these methods is still in its infancy. The lack of understanding of complex dependencies among network characteristics makes it difficult to fully understand the meaning of the different network comparison methodologies and the relations between them. In this article, we use a Monte Carlo framework to address three general questions about network comparison methods based on subgraph counts: (1) Can the methods differentiate between networks generated by different network generation mechanisms? (2) Are the number of nodes or the average degree confounding factors for the comparison of networks? (3) Do all methods reach the same conclusions? We further use the Monte Carlo framework to test the fit of the Erdős–Rényi (ER) model, the Chung-Lu model and a duplication-divergence model to the protein-protein interaction (PPI) networks of yeast, fly, worm, human and Escherichia coli, to five herpes virus networks and to five social networks. In contrast to previous claims in the literature, we show that the large PPI networks are not well modelled by the Chung-Lu model according to any of our tested methods. We find that network comparison statistics are not completely invariant to changes in the number of nodes and edges. Some methods, such as graphlet correlation distance, focus on fine-grained similarities, while others, such as Netdis, can capture the similarities of networks despite their having different numbers of nodes and edges.
KW - model fit
KW - network comparison
KW - subgraph counts
UR - http://www.scopus.com/inward/record.url?scp=85071072836&partnerID=8YFLogxK
U2 - 10.1093/comnet/cny017
DO - 10.1093/comnet/cny017
M3 - Article
AN - SCOPUS:85071072836
SN - 2051-1310
VL - 7
SP - 226
EP - 253
JO - Journal of Complex Networks
JF - Journal of Complex Networks
IS - 2
M1 - cny017
ER -