A motif extraction algorithm based on hashing and modulo-4 arithmetic.

Huitao Sheng, Kishan Mehrotra, Chilukuri K Mohan, Ramesh Raina

Research output: Contribution to journal › Article

Abstract

We develop an algorithm to identify cis-elements in promoter regions of coregulated genes. This algorithm searches for subsequences of desired length whose frequency of occurrence is relatively high, while accounting for slightly perturbed variants using a hash table and modulo arithmetic. Motifs are evaluated using profile matrices and a higher-order Markov background model. Simulation results show that our algorithm discovers more of the motifs present in the test sequences than two well-known motif-discovery tools (MDScan and AlignACE). The algorithm produces very promising results on a real data set; its output contained many known motifs.
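The abstract does not spell out the hashing scheme, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes bases are encoded as base-4 digits (A=0, C=1, G=2, T=3), that each length-k word is hashed to its base-4 integer in a hash table, and that "slightly perturbed variants" means single-base substitutions, generated by adding 1 to 3 modulo 4 to one digit of the key. The function names (`kmer_counts`, `one_mismatch_neighbors`, `decode`, `top_motifs`) are hypothetical, and the profile-matrix and Markov-background evaluation steps are omitted.

```python
from collections import Counter

# Assumed encoding (not taken from the paper): each base is one base-4 digit.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def kmer_counts(sequences, k):
    """Count every length-k word, using its base-4 integer as the hash key."""
    modulus = 4 ** k  # keys live in [0, 4**k)
    counts = Counter()
    for seq in sequences:
        h, valid = 0, 0  # rolling key and number of consecutive valid bases
        for ch in seq.upper():
            if ch not in CODE:  # restart the window at N or other ambiguity codes
                h, valid = 0, 0
                continue
            # Shift in the new digit; "% modulus" drops the digit that left the window.
            h = (h * 4 + CODE[ch]) % modulus
            valid += 1
            if valid >= k:
                counts[h] += 1
    return counts


def one_mismatch_neighbors(key, k):
    """Yield the 3k keys that differ from `key` in exactly one base."""
    for pos in range(k):
        place = 4 ** pos
        digit = (key // place) % 4
        for delta in (1, 2, 3):
            new_digit = (digit + delta) % 4  # modulo-4 substitution of one base
            yield key + (new_digit - digit) * place


def decode(key, k):
    """Turn a base-4 key back into a DNA string."""
    chars = []
    for _ in range(k):
        chars.append(BASES[key % 4])
        key //= 4
    return "".join(reversed(chars))


def top_motifs(sequences, k, n=3):
    """Rank words by their own count plus the counts of all 1-mismatch variants."""
    counts = kmer_counts(sequences, k)
    scores = {
        key: c + sum(counts.get(nb, 0) for nb in one_mismatch_neighbors(key, k))
        for key, c in counts.items()
    }
    # Highest score first; ties broken by the smaller key for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(decode(key, k), s) for key, s in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical toy promoter fragments, for illustration only.
    seqs = ["TTGACGTCAA", "CCTGACGTCG", "ATGACTTCAT"]
    print(top_motifs(seqs, 6))
```

Because a word's key and its variants' keys are plain integers, the variant lookup is a constant number of hash-table probes per word (3k of them), rather than a comparison against every other word.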

Original language: English (US)
Pages (from-to): 185-199
Number of pages: 15
Journal: International Journal of Computational Biology and Drug Design
Volume: 1
Issue number: 2
DOI: 10.1504/IJCBDD.2008.020209
State: Published - 2008

ASJC Scopus subject areas

  • Computer Science Applications
  • Drug Discovery

Cite this

A motif extraction algorithm based on hashing and modulo-4 arithmetic. / Sheng, Huitao; Mehrotra, Kishan; Mohan, Chilukuri K; Raina, Ramesh.

In: International Journal of Computational Biology and Drug Design, Vol. 1, No. 2, 2008, p. 185-199.

@article{46670a383fe94040bc05e32e32874880,
title = "A motif extraction algorithm based on hashing and modulo-4 arithmetic.",
abstract = "We develop an algorithm to identify cis-elements in promoter regions of coregulated genes. This algorithm searches for subsequences of desired length whose frequency of occurrence is relatively high, while accounting for slightly perturbed variants using a hash table and modulo arithmetic. Motifs are evaluated using profile matrices and a higher-order Markov background model. Simulation results show that our algorithm discovers more of the motifs present in the test sequences than two well-known motif-discovery tools (MDScan and AlignACE). The algorithm produces very promising results on a real data set; its output contained many known motifs.",
author = "Huitao Sheng and Kishan Mehrotra and Mohan, {Chilukuri K} and Ramesh Raina",
year = "2008",
doi = "10.1504/IJCBDD.2008.020209",
language = "English (US)",
volume = "1",
pages = "185--199",
journal = "International Journal of Computational Biology and Drug Design",
issn = "1756-0756",
publisher = "Inderscience Enterprises Ltd",
number = "2",

}

TY  - JOUR
T1  - A motif extraction algorithm based on hashing and modulo-4 arithmetic.
AU  - Sheng, Huitao
AU  - Mehrotra, Kishan
AU  - Mohan, Chilukuri K
AU  - Raina, Ramesh
PY  - 2008
Y1  - 2008
N2  - We develop an algorithm to identify cis-elements in promoter regions of coregulated genes. This algorithm searches for subsequences of desired length whose frequency of occurrence is relatively high, while accounting for slightly perturbed variants using a hash table and modulo arithmetic. Motifs are evaluated using profile matrices and a higher-order Markov background model. Simulation results show that our algorithm discovers more of the motifs present in the test sequences than two well-known motif-discovery tools (MDScan and AlignACE). The algorithm produces very promising results on a real data set; its output contained many known motifs.
AB  - We develop an algorithm to identify cis-elements in promoter regions of coregulated genes. This algorithm searches for subsequences of desired length whose frequency of occurrence is relatively high, while accounting for slightly perturbed variants using a hash table and modulo arithmetic. Motifs are evaluated using profile matrices and a higher-order Markov background model. Simulation results show that our algorithm discovers more of the motifs present in the test sequences than two well-known motif-discovery tools (MDScan and AlignACE). The algorithm produces very promising results on a real data set; its output contained many known motifs.
UR  - http://www.scopus.com/inward/record.url?scp=75649107615&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=75649107615&partnerID=8YFLogxK
U2  - 10.1504/IJCBDD.2008.020209
DO  - 10.1504/IJCBDD.2008.020209
M3  - Article
C2  - 20058489
AN  - SCOPUS:75649107615
VL  - 1
SP  - 185
EP  - 199
JO  - International Journal of Computational Biology and Drug Design
JF  - International Journal of Computational Biology and Drug Design
SN  - 1756-0756
IS  - 2
ER  - 