TY - JOUR
T1 - Quantifying the impact of biobanks and cohort studies
AU - Dorantes-Gilardi, Rodrigo
AU - Ivey, Kerry L.
AU - Costa, Lauren
AU - Matty, Rachael
AU - Cho, Kelly
AU - Gaziano, John Michael
AU - Barabási, Albert László
N1 - Publisher Copyright: Copyright © 2025 the Author(s).
PY - 2025/4/22
Y1 - 2025/4/22
AB - Biobanks advance biomedical and clinical research by collecting and offering data and biological samples for numerous studies. However, the impact of these repositories varies greatly due to differences in their purpose, scope, governance, and data collected. Here, we computationally identified 2,663 biobanks and their textual mentions in 228,761 scientific articles, 16,210 grants, 15,469 patents, 1,769 clinical trials, and 9,468 public policy documents, helping characterize the academic communities that utilize and support them. We found a strong concentration of biobank-related research on a few diseases, including obesity, Alzheimer’s disease, breast cancer, and diabetes. Moreover, collaboration, rather than citation count, shapes the community’s recognition of a biobank. We show that, on average, 41.1% of articles fail to reference any of the biobank’s reference papers, but 59.6% include a biobank member as a coauthor. Using a generalized linear model, we identified the key factors that contribute to the impact of a biobank, finding that an impactful biobank tends to be more open to external researchers and that quality data—especially linked medical records—as opposed to large data, correlates with a higher impact in science, innovation, and disease. The collected data and findings are accessible through an open-access web application intended to inform strategies to expand access and maximize the value of these resources.
KW - biobanks
KW - hidden citations
KW - research impact
KW - science of science
UR - http://www.scopus.com/inward/record.url?scp=105003496833&partnerID=8YFLogxK
U2 - 10.1073/pnas.2427157122
DO - 10.1073/pnas.2427157122
M3 - Article
AN - SCOPUS:105003496833
SN - 0027-8424
VL - 122
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 16
M1 - e2427157122
ER -