Developmental validation of PACE™: Automated artifact identification and contributor estimation for use with GlobalFiler™ and PowerPlex® fusion 6c generated data

Michael Marciano, Jonathan D. Adelman

Research output: Contribution to journal (Article)

Abstract

DNA mixture interpretation remains one of the major challenges in forensic DNA analysis. DNA mixture samples are inherently complex due to several factors including the variations in the quantity of DNA, the presence of non-allelic artifactual peaks and the presence of multiple contributors with variable levels of allele sharing. The Probabilistic Assessment for Contributor Estimation (PACE) is a fully continuous probabilistic machine learning-based method to predict the number of contributors (n) in a sample, and was previously developed for use with the Identifiler amplification kit. This system required manual preprocessing of data and was limited, exclusively, to samples amplified using said kit. This study introduces PACE™ v1.3.7 for use with both the GlobalFiler and PowerPlex Fusion 6c amplification kits. An automated artifact identification and management system has been added to accompany the rapid estimation of the number of donors in a given mixture. The artifact management module, when evaluated using previously unseen data, identified true allelic peaks and removed artifacts such as elevated baseline noise, stutter, and pull-up with accuracy over 93.5%. The systems yield the correct n classifications in over 90% of the samples, and demonstrate consistent accuracies as the number of donors and the overall mixture complexity increase. Misclassified samples generally exhibited high levels of allele sharing among donors, low DNA template amounts and high incidence of allelic dropout. This system offers a means for both artifact management and n estimation as well as a quantitative and reproducible method of assessing the quality of a profile.

Original language: English (US)
Article number: 102140
Journal: Forensic Science International: Genetics
Volume: 43
DOIs: 10.1016/j.fsigen.2019.102140
State: Published - Nov 1 2019

Fingerprint

Artifacts
DNA
Alleles
Noise
Incidence

Keywords

  • artifact identification
  • complex interpretation
  • DNA mixture
  • machine learning
  • number of contributors
  • random forest

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Genetics

Cite this

@article{4a7b8086b3604a9dbe2f9d1ebbad8490,
title = "Developmental validation of PACE™: Automated artifact identification and contributor estimation for use with GlobalFiler™ and PowerPlex{\circledR} fusion 6c generated data",
abstract = "DNA mixture interpretation remains one of the major challenges in forensic DNA analysis. DNA mixture samples are inherently complex due to several factors including the variations in the quantity of DNA, the presence of non-allelic artifactual peaks and the presence of multiple contributors with variable levels of allele sharing. The Probabilistic Assessment for Contributor Estimation (PACE) is a fully continuous probabilistic machine learning-based method to predict the number of contributors (n) in a sample, and was previously developed for use with the Identifiler amplification kit. This system required manual preprocessing of data and was limited, exclusively, to samples amplified using said kit. This study introduces PACE™ v1.3.7 for use with both the GlobalFiler and PowerPlex Fusion 6c amplification kits. An automated artifact identification and management system has been added to accompany the rapid estimation of the number of donors in a given mixture. The artifact management module, when evaluated using previously unseen data, identified true allelic peaks and removed artifacts such as elevated baseline noise, stutter, and pull-up with accuracy over 93.5{\%}. The systems yield the correct n classifications in over 90{\%} of the samples, and demonstrate consistent accuracies as the number of donors and the overall mixture complexity increase. Misclassified samples generally exhibited high levels of allele sharing among donors, low DNA template amounts and high incidence of allelic dropout. This system offers a means for both artifact management and n estimation as well as a quantitative and reproducible method of assessing the quality of a profile.",
keywords = "artifact identification, complex interpretation, DNA mixture, machine learning, number of contributors, random forest",
author = "Marciano, Michael and Adelman, {Jonathan D.}",
year = "2019",
month = "11",
day = "1",
doi = "10.1016/j.fsigen.2019.102140",
language = "English (US)",
volume = "43",
journal = "Forensic Science International: Genetics",
issn = "1872-4973",
publisher = "Elsevier",
}

TY - JOUR

T1 - Developmental validation of PACE™

T2 - Automated artifact identification and contributor estimation for use with GlobalFiler™ and PowerPlex® fusion 6c generated data

AU - Marciano, Michael

AU - Adelman, Jonathan D.

PY - 2019/11/1

Y1 - 2019/11/1

AB - DNA mixture interpretation remains one of the major challenges in forensic DNA analysis. DNA mixture samples are inherently complex due to several factors including the variations in the quantity of DNA, the presence of non-allelic artifactual peaks and the presence of multiple contributors with variable levels of allele sharing. The Probabilistic Assessment for Contributor Estimation (PACE) is a fully continuous probabilistic machine learning-based method to predict the number of contributors (n) in a sample, and was previously developed for use with the Identifiler amplification kit. This system required manual preprocessing of data and was limited, exclusively, to samples amplified using said kit. This study introduces PACE™ v1.3.7 for use with both the GlobalFiler and PowerPlex Fusion 6c amplification kits. An automated artifact identification and management system has been added to accompany the rapid estimation of the number of donors in a given mixture. The artifact management module, when evaluated using previously unseen data, identified true allelic peaks and removed artifacts such as elevated baseline noise, stutter, and pull-up with accuracy over 93.5%. The systems yield the correct n classifications in over 90% of the samples, and demonstrate consistent accuracies as the number of donors and the overall mixture complexity increase. Misclassified samples generally exhibited high levels of allele sharing among donors, low DNA template amounts and high incidence of allelic dropout. This system offers a means for both artifact management and n estimation as well as a quantitative and reproducible method of assessing the quality of a profile.

KW - artifact identification

KW - complex interpretation

KW - DNA mixture

KW - machine learning

KW - number of contributors

KW - random forest

UR - http://www.scopus.com/inward/record.url?scp=85072195379&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85072195379&partnerID=8YFLogxK

U2 - 10.1016/j.fsigen.2019.102140

DO - 10.1016/j.fsigen.2019.102140

M3 - Article

AN - SCOPUS:85072195379

VL - 43

JO - Forensic Science International: Genetics

JF - Forensic Science International: Genetics

SN - 1872-4973

M1 - 102140

ER -