GeneMark.hmm: new solutions for gene finding
- PMID: 9461475
- PMCID: PMC147337
- DOI: 10.1093/nar/26.4.1107
Abstract
The number of completely sequenced bacterial genomes has been growing fast. Computer methods for finding genes are available, yet there is still a need for more accurate algorithms. The GeneMark.hmm algorithm presented here was designed to improve gene prediction quality in terms of finding exact gene boundaries. The idea was to embed the GeneMark models into a naturally derived hidden Markov model framework, with gene boundaries modeled as transitions between hidden states. We also used a specially derived ribosome binding site pattern to refine predictions of translation initiation codons. The algorithm was evaluated on several test sets, including 10 complete bacterial genomes. The new algorithm was shown to be significantly more accurate than GeneMark in exact gene prediction. Interestingly, high gene-finding accuracy was observed even when Markov models of order zero, one and two were used. We present an analysis of false positive and false negative predictions, with the caution that these categories are not precisely defined if public database annotation is used as a control.
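As an illustration of the central idea in the abstract, gene boundaries read off as transitions between hidden states, the following minimal Python sketch runs Viterbi decoding on a toy two-state HMM. It is not the GeneMark.hmm implementation. The state names, every probability, and the zero-order emission model are hypothetical values chosen for the example, and the sketch ignores strands, codon phase, state durations, and the ribosome binding site pattern.

```python
import math

# Toy two-state HMM. All parameters below are hypothetical illustration
# values, not taken from the paper.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.9, "coding": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
# Zero-order emissions (assumption): "coding" is GC-rich, "noncoding" AT-rich.
EMIT = {
    "noncoding": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq):
    """Return the most probable hidden-state path for a DNA string."""
    if not seq:
        return []
    # Log-probabilities of the best path ending in each state at position 0.
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            row[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def boundaries(path):
    """Positions where the hidden state changes, i.e. predicted boundaries."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


if __name__ == "__main__":
    seq = "AT" * 5 + "GC" * 5 + "AT" * 5
    print(boundaries(viterbi(seq)))
```

In this toy setting the decoded path switches state exactly where the base composition changes, which is the sense in which boundaries fall out of the state transitions rather than being located by a separate post-processing step.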