TY - JOUR
T1 - A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning
AU - Marciano, Michael A.
AU - Williamson, Victoria R.
AU - Adelman, Jonathan D.
N1 - Publisher Copyright: © 2018 Elsevier B.V.
PY - 2018/7
Y1 - 2018/7
N2 - The interpretation of genetic profiles requires a robust and reliable method to discriminate true allelic information from noise, regardless of the instrumentation or methods used. Traditionally, static peak detection thresholds (analytical thresholds) have been applied to capillary electrophoresis-generated data to distinguish the true allelic peaks from noise. While these rigid thresholds attempt to conservatively account for baseline variability across instrument runs, samples, capillaries, dye channels, injection times, and voltage, their static nature leaves them unable to adapt, leading to a loss of allelic information that exists below the threshold. The method described herein is able to account for this variability by collectively minimizing the incorrect detection of non-allelic artifacts (false positives) and the threshold-induced dropout of true allelic information (false negatives). This is accomplished by using a dynamic locus- and sample-specific analytical threshold and a machine learning-derived probabilistic artifact detection model. The system produced an allele detection accuracy of 97.2%, an 11.4% increase from the lowest static threshold (50 RFU), with a low incidence of incorrectly identified artifacts (0.79%). This adaptive method outperformed static thresholds in the retention of allelic information content at minimal cost.
AB - The interpretation of genetic profiles requires a robust and reliable method to discriminate true allelic information from noise, regardless of the instrumentation or methods used. Traditionally, static peak detection thresholds (analytical thresholds) have been applied to capillary electrophoresis-generated data to distinguish the true allelic peaks from noise. While these rigid thresholds attempt to conservatively account for baseline variability across instrument runs, samples, capillaries, dye channels, injection times, and voltage, their static nature leaves them unable to adapt, leading to a loss of allelic information that exists below the threshold. The method described herein is able to account for this variability by collectively minimizing the incorrect detection of non-allelic artifacts (false positives) and the threshold-induced dropout of true allelic information (false negatives). This is accomplished by using a dynamic locus- and sample-specific analytical threshold and a machine learning-derived probabilistic artifact detection model. The system produced an allele detection accuracy of 97.2%, an 11.4% increase from the lowest static threshold (50 RFU), with a low incidence of incorrectly identified artifacts (0.79%). This adaptive method outperformed static thresholds in the retention of allelic information content at minimal cost.
KW - Analytical threshold
KW - Artifact removal
KW - Baseline
KW - Capillary electrophoresis
KW - Deep learning
KW - Forensic DNA
KW - Machine learning
KW - Random forest
KW - Support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85044931692&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85044931692&partnerID=8YFLogxK
U2 - 10.1016/j.fsigen.2018.03.017
DO - 10.1016/j.fsigen.2018.03.017
M3 - Article
C2 - 29627762
AN - SCOPUS:85044931692
SN - 1872-4973
VL - 35
SP - 26
EP - 37
JO - Forensic Science International: Genetics
JF - Forensic Science International: Genetics
ER -