Modeling promoter grammars with evolving hidden Markov models

Publication: Contribution to journal > Journal article > Research > Peer-reviewed

Standard

Modeling promoter grammars with evolving hidden Markov models. / Won, Kyoung-Jae; Sandelin, Albin; Marstrand, Troels Torben; Krogh, Anders.

In: Bioinformatics, Vol. 24, No. 15, 2008, p. 1669-75.

Harvard

Won, K-J, Sandelin, A, Marstrand, TT & Krogh, A 2008, 'Modeling promoter grammars with evolving hidden Markov models', Bioinformatics, vol. 24, no. 15, pp. 1669-75. https://doi.org/10.1093/bioinformatics/btn254

APA

Won, K-J., Sandelin, A., Marstrand, T. T., & Krogh, A. (2008). Modeling promoter grammars with evolving hidden Markov models. Bioinformatics, 24(15), 1669-1675. https://doi.org/10.1093/bioinformatics/btn254

Vancouver

Won K-J, Sandelin A, Marstrand TT, Krogh A. Modeling promoter grammars with evolving hidden Markov models. Bioinformatics. 2008;24(15):1669-75. https://doi.org/10.1093/bioinformatics/btn254

Author

Won, Kyoung-Jae ; Sandelin, Albin ; Marstrand, Troels Torben ; Krogh, Anders. / Modeling promoter grammars with evolving hidden Markov models. In: Bioinformatics. 2008 ; Vol. 24, No. 15. pp. 1669-75.

Bibtex

@article{bbc46670c79811dd9473000ea68e967b,
title = "Modeling promoter grammars with evolving hidden Markov models",
abstract = "MOTIVATION: Describing and modeling biological features of eukaryotic promoters remains an important and challenging problem within computational biology. The promoters of higher eukaryotes in particular display a wide variation in regulatory features, which are difficult to model. Often several factors are involved in the regulation of a set of co-regulated genes. If so, promoters can be modeled with connected regulatory features, where the network of connections is characteristic for a particular mode of regulation. RESULTS: With the goal of automatically deciphering such regulatory structures, we present a method that iteratively evolves an ensemble of regulatory grammars using a hidden Markov Model (HMM) architecture composed of interconnected blocks representing transcription factor binding sites (TFBSs) and background regions of promoter sequences. The ensemble approach reduces the risk of overfitting and generally improves performance. We apply this method to identify TFBSs and to classify promoters preferentially expressed in macrophages, where it outperforms other methods due to the increased predictive power given by the grammar. AVAILABILITY: The software and the datasets are available from http://modem.ucsd.edu/won/eHMM.tar.gz",
author = "Won, Kyoung-Jae and Sandelin, Albin and Marstrand, Troels Torben and Krogh, Anders",
note = "Keywords: Base Sequence; Binding Sites; Computer Simulation; Markov Chains; Models, Genetic; Models, Statistical; Molecular Sequence Data; Promoter Regions (Genetics); Protein Binding; Semantics; Sequence Analysis, DNA; Transcription Factors",
year = "2008",
doi = "10.1093/bioinformatics/btn254",
language = "English",
volume = "24",
pages = "1669--1675",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "15",
}

RIS

TY - JOUR

T1 - Modeling promoter grammars with evolving hidden Markov models

AU - Won, Kyoung-Jae

AU - Sandelin, Albin

AU - Marstrand, Troels Torben

AU - Krogh, Anders

N1 - Keywords: Base Sequence; Binding Sites; Computer Simulation; Markov Chains; Models, Genetic; Models, Statistical; Molecular Sequence Data; Promoter Regions (Genetics); Protein Binding; Semantics; Sequence Analysis, DNA; Transcription Factors

PY - 2008

Y1 - 2008

AB - MOTIVATION: Describing and modeling biological features of eukaryotic promoters remains an important and challenging problem within computational biology. The promoters of higher eukaryotes in particular display a wide variation in regulatory features, which are difficult to model. Often several factors are involved in the regulation of a set of co-regulated genes. If so, promoters can be modeled with connected regulatory features, where the network of connections is characteristic for a particular mode of regulation. RESULTS: With the goal of automatically deciphering such regulatory structures, we present a method that iteratively evolves an ensemble of regulatory grammars using a hidden Markov Model (HMM) architecture composed of interconnected blocks representing transcription factor binding sites (TFBSs) and background regions of promoter sequences. The ensemble approach reduces the risk of overfitting and generally improves performance. We apply this method to identify TFBSs and to classify promoters preferentially expressed in macrophages, where it outperforms other methods due to the increased predictive power given by the grammar. AVAILABILITY: The software and the datasets are available from http://modem.ucsd.edu/won/eHMM.tar.gz

U2 - 10.1093/bioinformatics/btn254

DO - 10.1093/bioinformatics/btn254

M3 - Journal article

C2 - 18535083

VL - 24

SP - 1669

EP - 1675

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 15

ER -

ID: 9068135