TY - JOUR
T1 - Developmental validation of PACE™
T2 - Automated artifact identification and contributor estimation for use with GlobalFiler™ and PowerPlex® Fusion 6c generated data
AU - Marciano, Michael A.
AU - Adelman, Jonathan D.
N1 - Publisher Copyright: © 2019 The Authors
PY - 2019/11
Y1 - 2019/11
AB - DNA mixture interpretation remains one of the major challenges in forensic DNA analysis. DNA mixture samples are inherently complex due to several factors including the variations in the quantity of DNA, the presence of non-allelic artifactual peaks and the presence of multiple contributors with variable levels of allele sharing. The Probabilistic Assessment for Contributor Estimation (PACE) is a fully continuous probabilistic machine learning-based method to predict the number of contributors (n) in a sample, and was previously developed for use with the Identifiler amplification kit. This system required manual preprocessing of data and was limited, exclusively, to samples amplified using said kit. This study introduces PACE™ v1.3.7 for use with both the GlobalFiler and PowerPlex Fusion 6c amplification kits. An automated artifact identification and management system has been added to accompany the rapid estimation of the number of donors in a given mixture. The artifact management module, when evaluated using previously unseen data, identified true allelic peaks and removed artifacts such as elevated baseline noise, stutter, and pull-up with accuracy over 93.5%. The systems yield the correct n classifications in over 90% of the samples, and demonstrate consistent accuracies as the number of donors and the overall mixture complexity increase. Misclassified samples generally exhibited high levels of allele sharing among donors, low DNA template amounts and high incidence of allelic dropout. This system offers a means for both artifact management and n estimation as well as a quantitative and reproducible method of assessing the quality of a profile.
KW - DNA mixture
KW - artifact identification
KW - complex interpretation
KW - machine learning
KW - number of contributors
KW - random forest
UR - http://www.scopus.com/inward/record.url?scp=85072195379&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072195379&partnerID=8YFLogxK
U2 - 10.1016/j.fsigen.2019.102140
DO - 10.1016/j.fsigen.2019.102140
M3 - Article
C2 - 31536876
AN - SCOPUS:85072195379
SN - 1872-4973
VL - 43
JO - Forensic Science International: Genetics
JF - Forensic Science International: Genetics
M1 - 102140
ER -