TY - JOUR
T1 - Quality assessment parameters for EST-derived SNPs from catfish
AU - Wang, Shaolin
AU - Sha, Zhenxia
AU - Sonstegard, Tad S.
AU - Liu, Hong
AU - Xu, Peng
AU - Somridhivej, Benjaporn
AU - Peatman, Eric
AU - Kucuktas, Huseyin
AU - Liu, Zhanjiang
N1 - Funding Information:
This project was supported by a grant from USDA NRI Animal Genome Basic Genome Reagents and Tools Program (USDA/NRICGP award # 2006-35616-16685), and partially by an AAES Ag Initiatives grant.
PY - 2008/9/30
Y1 - 2008/9/30
N2 - Background: SNPs are abundant, codominantly inherited, and sequence-tagged markers. They are highly adaptable to large-scale automated genotyping, and therefore, are most suitable for association studies and applicable to comparative genome analysis. However, discovery of SNPs requires genome sequencing efforts through whole genome sequencing or deep sequencing of reduced representation libraries. Such genome resources are not yet available for many species including catfish. A large resource of ESTs is to become available in catfish, allowing identification of a large number of SNPs, but the reliability of EST-derived SNPs is relatively low because of sequencing errors. This project was designed to answer some of the questions relevant to quality assessment of EST-derived SNPs. Results: Two factors were found to be most significant for validation of EST-derived SNPs: the contig size (number of sequences in the contig) and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate, although the validation rate was reasonably high when the contigs contained four or more EST sequences with the minor allele sequence being represented at least twice in the contigs. Sequence quality surrounding the SNP under test is also crucially important. PCR extension appeared to be limited to a very short distance, prohibiting successful genotyping when an intron was present, a surprising finding. Conclusion: Stringent quality assessment measures should be used when working with EST-derived SNPs. In particular, contigs containing four or more ESTs should be used and the minor allele sequence should be represented at least twice. Genotyping primers should be designed from a single exon, completely avoiding introns. Application of such quality assessment measures, along with large resources of ESTs, should provide effective means for SNP identification in species where genome sequence resources are lacking.
AB - Background: SNPs are abundant, codominantly inherited, and sequence-tagged markers. They are highly adaptable to large-scale automated genotyping, and therefore, are most suitable for association studies and applicable to comparative genome analysis. However, discovery of SNPs requires genome sequencing efforts through whole genome sequencing or deep sequencing of reduced representation libraries. Such genome resources are not yet available for many species including catfish. A large resource of ESTs is to become available in catfish, allowing identification of a large number of SNPs, but the reliability of EST-derived SNPs is relatively low because of sequencing errors. This project was designed to answer some of the questions relevant to quality assessment of EST-derived SNPs. Results: Two factors were found to be most significant for validation of EST-derived SNPs: the contig size (number of sequences in the contig) and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate, although the validation rate was reasonably high when the contigs contained four or more EST sequences with the minor allele sequence being represented at least twice in the contigs. Sequence quality surrounding the SNP under test is also crucially important. PCR extension appeared to be limited to a very short distance, prohibiting successful genotyping when an intron was present, a surprising finding. Conclusion: Stringent quality assessment measures should be used when working with EST-derived SNPs. In particular, contigs containing four or more ESTs should be used and the minor allele sequence should be represented at least twice. Genotyping primers should be designed from a single exon, completely avoiding introns. Application of such quality assessment measures, along with large resources of ESTs, should provide effective means for SNP identification in species where genome sequence resources are lacking.
UR - http://www.scopus.com/inward/record.url?scp=54349120289&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=54349120289&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-9-450
DO - 10.1186/1471-2164-9-450
M3 - Article
C2 - 18826589
AN - SCOPUS:54349120289
SN - 1471-2164
VL - 9
JO - BMC Genomics
JF - BMC Genomics
M1 - 450
ER -