Mapping short DNA sequencing reads and calling variants using mapping quality scores
- PMID: 18714091
- PMCID: PMC2577856
- DOI: 10.1101/gr.078212.108
Abstract
New sequencing technologies promise a new era in the use of DNA sequence. However, some of these technologies produce very short reads, typically of a few tens of base pairs, and to use these reads effectively requires new algorithms and software. In particular, there is a major issue in efficiently aligning short reads to a reference genome and handling ambiguity or lack of accuracy in this alignment. Here we introduce the concept of mapping quality, a measure of the confidence that a read actually comes from the position it is aligned to by the mapping algorithm. We describe the software MAQ that can build assemblies by mapping shotgun short reads to a reference genome, using quality scores to derive genotype calls of the consensus sequence of a diploid genome, e.g., from a human sample. MAQ makes full use of mate-pair information and estimates the error probability of each read alignment. Error probabilities are also derived for the final genotype calls, using a Bayesian statistical model that incorporates the mapping qualities, error probabilities from the raw sequence quality scores, sampling of the two haplotypes, and an empirical model for correlated errors at a site. Both read mapping and genotype calling are evaluated on simulated data and real data. MAQ is accurate, efficient, versatile, and user-friendly. It is freely available at http://maq.sourceforge.net.
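To make the notion of mapping quality concrete, the following is a minimal sketch that assumes the conventional Phred scale, Q = -10 log10(P), where P is the estimated probability that a read is aligned to the wrong position. The abstract does not spell out the formula, so the function names and the cap of 99 are illustrative assumptions, not MAQ's actual code.

```python
import math


def mapq_from_error_prob(p_wrong: float, cap: int = 99) -> int:
    """Phred-scale a mis-mapping probability: Q = -10 * log10(p_wrong).

    `cap` is a hypothetical upper bound, used here only so that
    p_wrong == 0 still yields a finite score.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must lie in [0, 1]")
    if p_wrong == 0.0:
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_wrong))))


def error_prob_from_mapq(q: float) -> float:
    """Invert the Phred scale: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # A read with a 1-in-1000 chance of being misplaced.
    print(mapq_from_error_prob(0.001))
    # The error probability implied by a mapping quality of 20.
    print(error_prob_from_mapq(20))
```

On this scale, every 10 points of mapping quality corresponds to a tenfold drop in the estimated chance that the alignment is wrong, which is why downstream steps such as genotype calling can weight or filter reads by this score.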