TY - JOUR
T1 - Generation of Physical Map Contig-Specific Sequences Useful for Whole Genome Sequence Scaffolding
AU - Jiang, Yanliang
AU - Ninwichian, Parichart
AU - Liu, Shikai
AU - Zhang, Jiaren
AU - Kucuktas, Huseyin
AU - Sun, Fanyue
AU - Kaltenboeck, Ludmilla
AU - Sun, Luyang
AU - Bao, Lisui
AU - Liu, Zhanjiang
PY - 2013/10/24
Y1 - 2013/10/24
N2 - Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.
AB - Along with the rapid advances of the nextgen sequencing technologies, more and more species are added to the list of organisms whose whole genomes are sequenced. However, the assembled draft genome of many organisms consists of numerous small contigs, due to the short length of the reads generated by nextgen sequencing platforms. In order to improve the assembly and bring the genome contigs together, more genome resources are needed. In this study, we developed a strategy to generate a valuable genome resource, physical map contig-specific sequences, which are randomly distributed genome sequences in each physical contig. Two-dimensional tagging method was used to create specific tags for 1,824 physical contigs, in which the cost was dramatically reduced. A total of 94,111,841 100-bp reads and 315,277 assembled contigs are identified containing physical map contig-specific tags. The physical map contig-specific sequences along with the currently available BAC end sequences were then used to anchor the catfish draft genome contigs. A total of 156,457 genome contigs (~79% of whole genome sequencing assembly) were anchored and grouped into 1,824 pools, in which 16,680 unique genes were annotated. The physical map contig-specific sequences are valuable resources to link physical map, genetic linkage map and draft whole genome sequences, consequently have the capability to improve the whole genome sequences assembly and scaffolding, and improve the genome-wide comparative analysis as well. The strategy developed in this study could also be adopted in other species whose whole genome assembly is still facing a challenge.
UR - http://www.scopus.com/inward/record.url?scp=84886257653&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84886257653&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0078872
DO - 10.1371/journal.pone.0078872
M3 - Article
C2 - 24205335
AN - SCOPUS:84886257653
SN - 1932-6203
VL - 8
JO - PloS one
JF - PloS one
IS - 10
M1 - e78872
ER -