Genetics. 2005 Jul;170(3):1411-21.
doi: 10.1534/genetics.104.035097. Epub 2005 May 6.

A composite-likelihood approach for detecting directional selection from DNA sequence data

Lan Zhu et al. Genetics. 2005 Jul.

Abstract

We present a novel composite-likelihood-ratio test (CLRT) for detecting genes and genomic regions that are subject to recurrent natural selection (either positive or negative). The method uses the likelihood functions of Hartl et al. (1994) for inference in a Wright-Fisher genic selection model and corrects for nonindependence among sites by application of coalescent simulations with recombination. Here, we (1) characterize the distribution of the CLRT statistic (Λ) as a function of the population recombination rate (R = 4Nₑr); (2) explore the effects of bias in estimation of R on the size (type I error) of the CLRT; (3) explore the robustness of the model to population growth, bottlenecks, and migration; (4) explore the power of the CLRT under varying levels of mutation, selection, and recombination; (5) explore the discriminatory power of the test in distinguishing negative selection from population growth; and (6) evaluate the performance of maximum composite-likelihood estimation (MCLE) of the selection coefficient. We find that the test has excellent power to detect weak negative selection and moderate power to detect positive selection. Moreover, the test is quite robust to bias in the estimate of local recombination rate, but not to certain demographic scenarios such as population growth or a recent bottleneck. Last, we demonstrate that the MCLE of the selection parameter has little bias for weak negative selection and has downward bias for positively selected mutations.
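The core of the test can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the Poisson random field form of the expected site-frequency spectrum under genic selection (as in Hartl et al. 1994 and Bustamante et al. 2001), with γ the scaled selection coefficient (the exact scaling convention is an assumption), and it conditions on the number of segregating sites so that the counts are treated as multinomial. The function names, the midpoint integration, the golden-section search, and the bracket [-20, 20] are all illustrative choices, and the search assumes the log-likelihood is unimodal on that bracket.

```python
import math

def sfs_probs(n, gamma, steps=1000):
    """Expected SFS proportions for derived-allele counts i = 1..n-1.

    Each class weight is the integral over x in (0, 1) of
    C(n, i) * x^(i-1) * (1-x)^(n-i-1) * w(x), with
    w(x) = (1 - exp(-2*gamma*(1-x))) / (1 - exp(-2*gamma)),
    and w(x) -> (1 - x) in the neutral limit gamma -> 0.
    """
    h = 1.0 / steps
    weights = []
    for i in range(1, n):
        binom = math.comb(n, i)
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h  # midpoint rule avoids the endpoints
            if abs(gamma) < 1e-8:
                w = 1.0 - x
            else:
                w = math.expm1(-2.0 * gamma * (1.0 - x)) / math.expm1(-2.0 * gamma)
            total += binom * x ** (i - 1) * (1.0 - x) ** (n - i - 1) * w
        weights.append(total * h)
    norm = sum(weights)
    return [wt / norm for wt in weights]

def composite_loglik(counts, n, gamma):
    """Multinomial composite log-likelihood (constant term dropped)."""
    probs = sfs_probs(n, gamma)
    return sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)

def golden_max(f, lo, hi, tol=1e-3):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - invphi * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

def clrt(counts, n, lo=-20.0, hi=20.0):
    """Return (gamma_hat, Lambda), Lambda = 2 * [logL(gamma_hat) - logL(0)]."""
    gamma_hat = golden_max(lambda g: composite_loglik(counts, n, g), lo, hi)
    lam = 2.0 * (composite_loglik(counts, n, gamma_hat) - composite_loglik(counts, n, 0.0))
    return gamma_hat, lam

if __name__ == "__main__":
    # Hypothetical SFS for n = 10 with an excess of singletons.
    counts = [60, 15, 8, 5, 4, 3, 2, 2, 1]
    gamma_hat, lam = clrt(counts, n=10)
    print(f"gamma_hat = {gamma_hat:.2f}, Lambda = {lam:.2f}")
```

Because linked sites are not independent, Λ computed this way should not be compared with a standard χ² cutoff when recombination is low; as the abstract describes, the null distribution is obtained from coalescent simulations with recombination, which this sketch does not perform.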

Figures

Figure 1.
Comparison of expected site-frequency spectra for three scenarios. “Neutral” is the expected SFS under the standard neutral model (see Hudson 1990). “Population structure” is the expected site-frequency spectrum for neutral mutations in a two-deme model with low symmetric migration rate (4Nm = 0.2) found via 1000 coalescent simulations using ms (Hudson 2002). “Selection” is the expected SFS under genic selection for the model described by Hartl et al. (1994). We use a value of 2Ns = 1.353, which maximizes the likelihood of the expected population structure data under the selected model. As one can see, the site-frequency spectrum under population structure can look similar to that under recurrent positive selection.
Figure 2.
Distribution of the test statistic (Λ) for the test assuming Hartl et al.'s (1994) model as a function of the population recombination rate (R). The y-axis shows quantiles of Λ calculated by the CLRT from sampled sequences; the x-axis shows quantiles of data drawn from a χ²₁-distribution. Λ converges to a χ²₁-distribution with large R. One thousand replicate data sets were sampled from Hudson's “ms” program, each with sample size n = 50, a fixed number of segregating sites S = 100, and various levels of recombination rate.
Figure 3.
95% critical value of the test statistic (Λ*) converges to formula image (plotted in log scale for both x- and y-axes). Data were drawn from Hudson's “ms” program with sample size n ∈ { 10, 50, 100} and fixed segregating sites S = 100.
Figure 4.
Effect of the bias of the recombination rate estimator on the size of the CLRT. Data were drawn from Hudson's “ms” program with sample size n = 50 and fixed segregating sites S = 100. Recombination rates were estimated by the “SITES” program (Hey and Wakeley 1997).
Figure 5.
Effect of population structure on the size of the CLRT. Data were drawn from the island model using Hudson's “ms” program with the number of demes D ∈ {2, 5, 10, 20, 50} and R = 0.
Figure 6.
Effect of population size changes on the size of the CLRT. Data were drawn from an exponential population growth model using Hudson's “ms” program with sample size n = 50, a fixed number of segregating sites S = 100, growth rate β ∈ {0.1, 0.2, 0.4, 0.8, 1.6, 3.2}, and various levels of recombination rate.
Figure 7.
Site-frequency spectrum of data from a single population that has undergone a recent bottleneck. The bottleneck occurred 0.1Nₑ generations ago and lasted 0.05Nₑ generations. Sample size n = 10, with a fixed number of segregating sites S = 100. f is the ratio of the population size during the bottleneck to the original size. α is the type I error of the CLRT. (A) Moderate bottleneck with f = 0.1; (B) strong bottleneck with f = 0.01.
Figure 8.
Effect of recent population bottleneck on the size of the CLRT. f is the ratio of population size during bottleneck to the original size. The data sampling scheme is the same as that described in Figure 7.
Figure 9.
Power of the CLRT under varying levels of selection. The x-axis is the value of the selection parameter in the PRF model under which the data were simulated.
Figure 10.
Site-frequency spectrum under recurrent negative selection, neutrality, and recurrent positive selection with varying levels of mutation and recombination rates. The y-axis is the proportion of SNP sites that were found at frequencies 1/15, 2/15, … , 14/15.
Figure 11.
Power of the CLRT in distinguishing negative selection from an exponential population growth model. Data were simulated by the “FISHER” program under the assumption of constant population size with sample size n = 50, θ = 30, and R = 100 under the forward simulation model with selection coefficient γ = −1, −5, and −10, respectively. The x-axis is the growth rate β of the simulated data from which the empirical distribution of the test statistic was obtained to set the critical value for the test.
Figure 12.
γ̂/γ for data drawn from forward simulation with the recombination model (by the “FISHER” program). γ̂ is the maximum-likelihood estimator of the selection coefficient, and γ is the true parameter value under which the data were simulated.
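
Several of the captions above compare Λ with a χ²₁ reference (Figure 2) or report a 95% critical value and the size of the test (Figures 3 to 6 and 8). A minimal sketch of those three quantities follows, assuming the null values of Λ have already been obtained from simulation. The function names are hypothetical and the list of null statistics is a placeholder, not data from the paper.

```python
import math
from statistics import NormalDist

def chi2_1_quantile(p):
    """Quantile of the chi-square distribution with 1 degree of freedom.

    If Z is standard normal then Z^2 is chi-square(1), so the p-quantile
    is the square of the normal (1 + p) / 2 quantile.
    """
    z = NormalDist().inv_cdf((1.0 + p) / 2.0)
    return z * z

def empirical_critical_value(null_stats, alpha=0.05):
    """Order statistic at (approximately) the 1 - alpha quantile of the null values."""
    ordered = sorted(null_stats)
    k = max(0, math.ceil((1.0 - alpha) * len(ordered)) - 1)
    return ordered[k]

def empirical_size(null_stats, critical_value):
    """Fraction of null replicates rejected at the given critical value."""
    return sum(s > critical_value for s in null_stats) / len(null_stats)

if __name__ == "__main__":
    print(f"chi-square(1) 95% quantile: {chi2_1_quantile(0.95):.3f}")
    # Placeholder null statistics; in practice these would come from
    # coalescent simulations at the estimated recombination rate.
    null_stats = [0.1, 0.4, 0.9, 1.7, 2.5, 3.3, 4.8, 6.2, 9.0, 14.5]
    crit = empirical_critical_value(null_stats)
    print(f"empirical critical value: {crit}")
    print(f"size at the chi-square(1) cutoff: {empirical_size(null_stats, chi2_1_quantile(0.95))}")
```

When the simulated null distribution has a heavier tail than χ²₁, as the captions suggest for low R, the size at the χ²₁ cutoff exceeds the nominal 5%, which is why an empirical critical value is used instead.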
