Data to knowledge in action: A longitudinal analysis of GenBank metadata

Jeff Hemsley, Jian Qin, Sarah E. Bratt

Research output: Contribution to journal › Article › peer-review

5 Scopus citations


Studies typically use publication-based authorship data to study the relationships between collaboration networks and knowledge diffusion. However, collaboration in research often starts long before publication, with data production efforts. In this project, we ask how collaboration in data production networks affects and contributes to knowledge diffusion as represented by patents. We drew our data from the metadata associated with genetic sequence records stored in the National Institutes of Health's GenBank database. After constructing networks for each year and aggregating summary statistics, we used regressions to test several hypotheses. Key among our findings is that data production team size is positively related to the number of patents each year. In addition, when actors have more links on average, we tend to see more patents. Our study contributes to the science of science by highlighting the important role of data production in the diffusion of knowledge as measured by patents.

Original language: English (US)
Article number: e253
Journal: Proceedings of the Association for Information Science and Technology
Issue number: 1
State: Published - 2020


Keywords

  • collaboration networks
  • data authors
  • knowledge diffusion
  • metadata analytics
  • scientometric measures

ASJC Scopus subject areas

  • General Computer Science
  • Library and Information Sciences

