A fast sorting algorithm for aptamer identification using deep sequencing

Yiou Xiao, Kishan G. Mehrotra, Damian G. Allis, Phillip N. Borer

Research output: Chapter in Book/Entry/Poem › Conference contribution

Abstract

In recent years, with the advent of fast sequencing technology, genomic databases have been growing rapidly. Researchers in bioinformatics need faster and more accurate tools to analyze these very large data sets. In aptamer search, the goal is to find the over-represented DNA sequences in randomly generated aptamer libraries. Hash functions are widely used in substring comparison, sequence alignment, and clustering tools. We have developed a lightweight tool that uses hash functions to reduce the size of genomic data and conducts η-neighbor searches on the centroid sequence. This greatly improves search efficiency compared with existing tools. Furthermore, precomputing the hash values of the η-neighbors decreases the search overhead. On a dataset of 2.23 million sequences, the proposed algorithm accurately counts the frequency of the human α-thrombin aptamer sequences in less than 40 seconds, whereas the current script-based method takes 2 hours and 18 minutes.
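
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' tool. It assumes that an η-neighbor is any sequence within Hamming distance η of the centroid, that all reads of interest have the same length as the centroid, and that a 2-bit-per-base integer encoding serves as the hash; the names `encode`, `neighbor_hashes`, and `count_neighbors` are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}


def encode(seq):
    """Pack a DNA string into an integer, two bits per base.

    Assumption: this stands in for the paper's hash function. For
    sequences of one fixed length it is collision-free.
    """
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def neighbor_hashes(centroid, eta):
    """Precompute hashes of all sequences within Hamming distance eta
    of the centroid (assumed meaning of "eta-neighbor")."""
    hashes = {encode(centroid)}
    for d in range(1, eta + 1):
        for positions in combinations(range(len(centroid)), d):
            # For each chosen position, try the three other bases.
            alternatives = [
                [b for b in BASES if b != centroid[p]] for p in positions
            ]
            for choice in product(*alternatives):
                mutated = list(centroid)
                for p, b in zip(positions, choice):
                    mutated[p] = b
                hashes.add(encode("".join(mutated)))
    return hashes


def count_neighbors(reads, centroid, eta):
    """Count reads that are eta-neighbors of the centroid.

    Reads are reduced to a hash -> frequency table in one pass; reads
    of a different length or with non-ACGT symbols are skipped.
    """
    counts = Counter(
        encode(r)
        for r in reads
        if len(r) == len(centroid) and set(r) <= set(BASES)
    )
    # Only the precomputed neighbor hashes are looked up, so the cost
    # of this step does not depend on the number of reads.
    return sum(counts[h] for h in neighbor_hashes(centroid, eta))
```

Under these assumptions, a call such as `count_neighbors(reads, centroid, 1)` would tally every read that differs from the centroid in at most one position. The number of neighbors grows quickly with η and sequence length, so a sketch like this is only practical for small η.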

Original language: English (US)
Title of host publication: ASONAM 2014 - Proceedings of the 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Editors: Xindong Wu, Martin Ester, Guandong Xu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 759-763
Number of pages: 5
ISBN (Electronic): 9781479958771
State: Published - Oct 10 2014
Event: 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2014 - Beijing, China
Duration: Aug 17 2014 to Aug 20 2014

Other

Other: 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2014
Country/Territory: China
City: Beijing
Period: 8/17/14 to 8/20/14

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
