A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning

Michael Marciano, Victoria R. Williamson, Jonathan D. Adelman

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

The interpretation of genetic profiles requires a robust and reliable method to discriminate true allelic information from noise, regardless of the instrumentation or methods used. Traditionally, static peak detection thresholds (analytical thresholds) have been applied to capillary electrophoresis-generated data to distinguish true allelic peaks from noise. While these rigid thresholds attempt to conservatively account for baseline variability across instrument runs, samples, capillaries, dye channels, injection times, and voltage, a static threshold cannot adapt, leading to the loss of allelic information that exists below the threshold. The method described herein accounts for this variability by collectively minimizing the incorrect detection of non-allelic artifacts (false positives) and the threshold-induced dropout of true allelic information (false negatives). This is accomplished by using a dynamic locus- and sample-specific analytical threshold and a machine learning-derived probabilistic artifact detection model. The system produced an allele detection accuracy of 97.2%, an 11.4% increase over the lowest static threshold (50 RFU), with a low incidence of incorrectly identified artifacts (0.79%). This adaptive method outperformed static thresholds in the retention of allelic information content at minimal cost.
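
The contrast between a static and a dynamic analytical threshold can be illustrated with a minimal sketch. The abstract does not state how the dynamic threshold is computed, so the rule used here (mean plus k standard deviations of baseline noise for one locus in one sample) is an assumption, as are the function names `dynamic_threshold` and `call_peaks`, the value of `k`, and the toy RFU values. Only the 50 RFU static threshold comes from the abstract. The machine learning artifact model is not sketched.

```python
from statistics import mean, stdev

# Lowest static threshold mentioned in the abstract.
STATIC_THRESHOLD_RFU = 50.0


def dynamic_threshold(baseline_rfu, k=3.0):
    """Hypothetical locus- and sample-specific threshold.

    Assumption: mean + k * sample SD of the baseline noise. This is a
    common textbook-style rule, not necessarily the authors' method.
    """
    return mean(baseline_rfu) + k * stdev(baseline_rfu)


def call_peaks(peaks_rfu, threshold):
    """Return the peak heights at or above the given threshold."""
    return [p for p in peaks_rfu if p >= threshold]


# Made-up illustrative values for one locus in one sample.
baseline = [4.0, 6.0, 5.0, 7.0, 3.0]
peaks = [12.0, 38.0, 120.0]

static_calls = call_peaks(peaks, STATIC_THRESHOLD_RFU)
dynamic_calls = call_peaks(peaks, dynamic_threshold(baseline))

print("static: ", static_calls)
print("dynamic:", dynamic_calls)
```

With these toy values the static threshold retains only the tallest peak, while the lower dynamic threshold retains all three. Peaks retained this way may include non-allelic artifacts, which is the role the abstract assigns to the probabilistic artifact detection model.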

Original language: English (US)
Pages (from-to): 26-37
Number of pages: 12
Journal: Forensic Science International: Genetics
Volume: 35
DOI: 10.1016/j.fsigen.2018.03.017
State: Published - Jul 1 2018

Fingerprint

Artifacts
Noise
Loss of Heterozygosity
Capillary Electrophoresis
Coloring Agents
Alleles
Costs and Cost Analysis
Injections
Machine Learning
Incidence

Keywords

  • Analytical threshold
  • Artifact removal
  • Baseline
  • Capillary electrophoresis
  • Deep learning
  • Forensic DNA
  • Machine learning
  • Random forest
  • Support vector machine

ASJC Scopus subject areas

  • Pathology and Forensic Medicine
  • Genetics

Cite this

A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning. / Marciano, Michael; Williamson, Victoria R.; Adelman, Jonathan D.

In: Forensic Science International: Genetics, Vol. 35, 01.07.2018, p. 26-37.

@article{581dff718a98444a9e483ee7c07c675e,
title = "A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning",
abstract = "The interpretation of genetic profiles requires a robust and reliable method to discriminate true allelic information from noise, regardless of the instrumentation or methods used. Traditionally, static peak detection thresholds (analytical thresholds) have been applied to capillary electrophoresis-generated data to distinguish true allelic peaks from noise. While these rigid thresholds attempt to conservatively account for baseline variability across instrument runs, samples, capillaries, dye channels, injection times, and voltage, a static threshold cannot adapt, leading to the loss of allelic information that exists below the threshold. The method described herein accounts for this variability by collectively minimizing the incorrect detection of non-allelic artifacts (false positives) and the threshold-induced dropout of true allelic information (false negatives). This is accomplished by using a dynamic locus- and sample-specific analytical threshold and a machine learning-derived probabilistic artifact detection model. The system produced an allele detection accuracy of 97.2{\%}, an 11.4{\%} increase over the lowest static threshold (50 RFU), with a low incidence of incorrectly identified artifacts (0.79{\%}). This adaptive method outperformed static thresholds in the retention of allelic information content at minimal cost.",
keywords = "Analytical threshold, Artifact removal, Baseline, Capillary electrophoresis, Deep learning, Forensic DNA, Machine learning, Random forest, Support vector machine",
author = "Marciano, Michael and Williamson, {Victoria R.} and Adelman, {Jonathan D.}",
year = "2018",
month = "7",
day = "1",
doi = "10.1016/j.fsigen.2018.03.017",
language = "English (US)",
volume = "35",
pages = "26--37",
journal = "Forensic Science International: Genetics",
issn = "1872-4973",
publisher = "Elsevier",

}

TY - JOUR

T1 - A hybrid approach to increase the informedness of CE-based data using locus-specific thresholding and machine learning

AU - Marciano, Michael

AU - Williamson, Victoria R.

AU - Adelman, Jonathan D.

PY - 2018/7/1

Y1 - 2018/7/1

N2 - The interpretation of genetic profiles requires a robust and reliable method to discriminate true allelic information from noise, regardless of the instrumentation or methods used. Traditionally, static peak detection thresholds (analytical thresholds) have been applied to capillary electrophoresis-generated data to distinguish true allelic peaks from noise. While these rigid thresholds attempt to conservatively account for baseline variability across instrument runs, samples, capillaries, dye channels, injection times, and voltage, a static threshold cannot adapt, leading to the loss of allelic information that exists below the threshold. The method described herein accounts for this variability by collectively minimizing the incorrect detection of non-allelic artifacts (false positives) and the threshold-induced dropout of true allelic information (false negatives). This is accomplished by using a dynamic locus- and sample-specific analytical threshold and a machine learning-derived probabilistic artifact detection model. The system produced an allele detection accuracy of 97.2%, an 11.4% increase over the lowest static threshold (50 RFU), with a low incidence of incorrectly identified artifacts (0.79%). This adaptive method outperformed static thresholds in the retention of allelic information content at minimal cost.

AB - The interpretation of genetic profiles requires a robust and reliable method to discriminate true allelic information from noise, regardless of the instrumentation or methods used. Traditionally, static peak detection thresholds (analytical thresholds) have been applied to capillary electrophoresis-generated data to distinguish true allelic peaks from noise. While these rigid thresholds attempt to conservatively account for baseline variability across instrument runs, samples, capillaries, dye channels, injection times, and voltage, a static threshold cannot adapt, leading to the loss of allelic information that exists below the threshold. The method described herein accounts for this variability by collectively minimizing the incorrect detection of non-allelic artifacts (false positives) and the threshold-induced dropout of true allelic information (false negatives). This is accomplished by using a dynamic locus- and sample-specific analytical threshold and a machine learning-derived probabilistic artifact detection model. The system produced an allele detection accuracy of 97.2%, an 11.4% increase over the lowest static threshold (50 RFU), with a low incidence of incorrectly identified artifacts (0.79%). This adaptive method outperformed static thresholds in the retention of allelic information content at minimal cost.

KW - Analytical threshold

KW - Artifact removal

KW - Baseline

KW - Capillary electrophoresis

KW - Deep learning

KW - Forensic DNA

KW - Machine learning

KW - Random forest

KW - Support vector machine

UR - http://www.scopus.com/inward/record.url?scp=85044931692&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85044931692&partnerID=8YFLogxK

U2 - 10.1016/j.fsigen.2018.03.017

DO - 10.1016/j.fsigen.2018.03.017

M3 - Article

C2 - 29627762

AN - SCOPUS:85044931692

VL - 35

SP - 26

EP - 37

JO - Forensic Science International: Genetics

JF - Forensic Science International: Genetics

SN - 1872-4973

ER -