Big data, big metadata and quantitative study of science: A workflow model for big scientometrics

Sarah Bratt, Jeff Hemsley, Jian Qin, Mark Costa

Research output: Contribution to journal, Article, peer-review

12 Scopus citations


Large cyberinfrastructure-enabled data repositories generate massive amounts of metadata. This enables big data analytics that leverage the intersection of technological and methodological advances in data science for the quantitative study of science. This paper introduces a definition of big metadata in the context of scientific data repositories. It also discusses the challenges that big metadata analytics faces because such metadata is messy, heterogeneous, and lacks structures suitable for analytics. The proposed methodological framework consists of conceptual and computational workflows that intersect through collaborative documentation. This workflow-based framework promotes transparency and contributes to research reproducibility. The paper also describes the experience and lessons learned from a four-year big metadata project that involved all aspects of the workflow-based methodology. The framework is a timely contribution to scientometrics and to the science of science and policy, as the potential value of big metadata draws increasing attention from the research and policymaker communities.

Original language: English (US)
Pages (from-to): 36-45
Number of pages: 10
Journal: Proceedings of the Association for Information Science and Technology
Issue number: 1
State: Published - Jan 2017

Keywords

  • big metadata analytics
  • methodology
  • scientometrics
  • workflows

ASJC Scopus subject areas

  • General Computer Science
  • Library and Information Sciences
