Estimating a Null Model of Scientific Image Reuse to Support Research Integrity Investigations

Daniel E. Acuna, Ziyue Xiang

Research output: Contribution to journal › Article


When a suspicious case of figure reuse arises in science, research integrity investigators often find it difficult to rebut authors who claim that "it happened by chance". In other words, when image features "collide", it is hard to establish whether such a collision is rare. In this article, we provide a method to predict the rarity of an image feature by statistically estimating the chance that it occurs at random across all scientific imagery. Our method is based on high-dimensional density estimation of ORB features using more than 7 million images from the PubMed Open Access Subset dataset. We show that this method can provide meaningful feedback during research integrity investigations by supplying a null hypothesis for scientific image reuse and thus a p-value during deliberations. We apply the model to a sample of increasingly complex imagery and confirm that, as expected, it produces progressively smaller p-values. We discuss applications to research integrity investigations as well as future work.
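To make the idea of a feature-level p-value concrete, the following is a minimal sketch and not the authors' implementation. It replaces the high-dimensional density estimator described in the abstract with a much simpler empirical neighbor count: the p-value of a query descriptor is taken to be the fraction of reference descriptors within a Hamming radius of it. The function names, the radius, the synthetic reference set, and the planted "common" descriptor are all assumptions made for illustration; the only detail taken from ORB itself is that a descriptor is a 256-bit binary string.

```python
import random

N_BITS = 256  # an ORB descriptor is a 256-bit binary string


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def empirical_p_value(query: int, reference: list, radius: int) -> float:
    """Estimate how often a reference descriptor falls within `radius`
    bits of `query`. Add-one smoothing keeps the estimate above zero.
    (Hypothetical stand-in for a proper density estimate.)"""
    hits = sum(1 for r in reference if hamming(query, r) <= radius)
    return (hits + 1) / (len(reference) + 1)


# Synthetic stand-in for a reference corpus (assumption: real work would
# use descriptors extracted from a large image collection).
rng = random.Random(0)
reference = [rng.getrandbits(N_BITS) for _ in range(5000)]

# Plant one descriptor many times to mimic a commonly occurring feature,
# such as a pattern shared by many figures.
common = reference[0]
reference[:250] = [common] * 250

# A fresh random descriptor plays the role of a rare feature.
rare = rng.getrandbits(N_BITS)

p_common = empirical_p_value(common, reference, radius=20)
p_rare = empirical_p_value(rare, reference, radius=20)
print(f"common feature: p ~ {p_common:.4f}")
print(f"rare feature:   p ~ {p_rare:.6f}")
```

Under these assumptions the planted descriptor receives a much larger p-value than the fresh random one, which is the qualitative behavior the abstract describes: a match on a rare feature is harder to attribute to chance than a match on a common one.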
Original language: English (US)
State: Published - Feb 22 2020


  • cs.CV
  • cs.LG
  • stat.ML

