Nat Commun. 2010 Nov 30;1:131.
doi: 10.1038/ncomms1130.

Deep resequencing reveals excess rare recent variants consistent with explosive population growth



Alex Coventry et al. Nat Commun.

Abstract

Accurately determining the distribution of rare variants is an important goal of human genetics, but resequencing of a sample large enough for this purpose has been unfeasible until now. Here, we applied Sanger sequencing of genomic PCR amplicons to resequence the diabetes-associated genes KCNJ11 and HHEX in 13,715 people (10,422 European Americans and 3,293 African Americans) and validated amplicons potentially harbouring rare variants using 454 pyrosequencing. We observed far more variation (expected variant-site count ∼578) than would have been predicted on the basis of earlier surveys, which could only capture the distribution of common variants. By comparison with earlier estimates based on common variants, our model shows a clear genetic signal of accelerating population growth, suggesting that humanity harbours a myriad of rare, deleterious variants, and that disease risk and the burden of disease in contemporary populations may be heavily influenced by the distribution of rare variants.
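The "expected variant-site count" is an expectation taken over uncertain genotype calls. A minimal sketch, assuming each candidate site has already been assigned a posterior probability of being variant (the probabilities below are hypothetical, not the study's): by linearity of expectation the expected count is the sum of those probabilities, and a Monte Carlo draw over sites recovers the same mean.

```python
import random


def expected_variant_sites(posteriors):
    """Expected number of variant sites: by linearity of expectation,
    the sum of the per-site posterior probabilities of being variant."""
    return sum(posteriors)


def sampled_variant_sites(posteriors, n_draws=10_000, seed=1):
    """Monte Carlo check: draw each site's variant status independently
    (an assumption that affects only the spread of the count, not its
    mean) and average the resulting counts over n_draws replicates."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_draws):
        total += sum(1 for p in posteriors if rng.random() < p)
    return total / n_draws


# Hypothetical posteriors for five candidate sites (illustrative values only).
posteriors = [0.99, 0.95, 0.60, 0.30, 0.05]
print(round(expected_variant_sites(posteriors), 2))  # 2.89
print(sampled_variant_sites(posteriors))
```

Summing posteriors avoids thresholding calls at an arbitrary cut-off, which would tend to discard exactly the low-confidence rare variants the study is counting.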


Figures

Figure 1. Physical location of selected variants.
For each variant shown, the figure gives the reference residue, the location, the variant residue and, in parentheses, the variant's posterior probability. Variants that PolyPhen predicts to be potentially damaging to the protein product are shown in magenta; others are shown in cyan. (a) Variants that change the protein structure in KCNJ11. (b) Variants in HHEX. No crystal structure sufficiently homologous to HHEX is available for homology modelling, so we show the gene structure instead. Blue regions depict exons. Green regions depict neighbouring intronic/untranslated regions (30 base pairs in both directions). Black bars indicate excluded intronic sequence. Non-coding variants are shown in grey and are labelled with the reference allele, the build 36 coordinate on chromosome 10, the variant allele and the variant's posterior probability.
Figure 2. Number of variants as a function of sample size.
Counts of observed segregating sites as a function of sample size for (a) HHEX and (b) KCNJ11. The solid blue line shows the total number of segregating sites. Red shows singletons, and yellow, brown and purple lines show the numbers of variants with relative minor allele frequency <0.01, 0.01–0.05 and >0.05, respectively. Roughness in these curves reflects stochastic variation in the number of variants observed across multiple sample populations. Dashed lines show extrapolations of the expected number of segregating sites in larger samples according to Watterson's classical estimate. In all cases, we found far more segregating sites at larger sample sizes than Watterson's estimate would have predicted.
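The dashed extrapolations rest on Watterson's result that, under a constant-size neutral model, the expected number of segregating sites in a sample of n chromosomes is E[S_n] = θ·a_n, with a_n = Σ_{i=1}^{n−1} 1/i. The sketch below assumes that model; the function names and the counts in the example are illustrative, not the study's data. It estimates θ_W = S/a_n from a smaller sample and projects it to a larger one.

```python
def harmonic(n):
    """Watterson's normalising constant a_n = sum_{i=1}^{n-1} 1/i
    for a sample of n chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(segregating_sites, n_chromosomes):
    """Watterson's estimator theta_W = S / a_n."""
    return segregating_sites / harmonic(n_chromosomes)


def extrapolate_sites(segregating_sites, n_observed, n_target):
    """Expected segregating sites in a sample of n_target chromosomes under
    the constant-size neutral model: E[S_n'] = theta_W * a_n'."""
    return watterson_theta(segregating_sites, n_observed) * harmonic(n_target)


# Hypothetical example: 20 segregating sites seen in 200 chromosomes,
# projected to a sample 100 times larger.
print(extrapolate_sites(20, 200, 20_000))
```

Because a_n grows only logarithmically in n, a 100-fold increase in sample size raises the predicted count by less than a factor of two under this model. Note that n counts chromosomes, so the 13,715 people sequenced here correspond to 27,430 chromosomes.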
Figure 3. Site-frequency spectra.
Site-frequency spectra (SFS) in (a) HHEX and (b) KCNJ11 over 'neutral sites' (see Methods) in the two genes for the European sub-population. The x axis shows the number of variants observed at a site; the y axis shows the expected number of sites at which that many variants were seen. Green bars show the expected number of sites, as determined by sampling from the posterior genotype distributions for each sampled individual, and error bars show 99% confidence intervals from these samples. The black line shows the expected SFS under the constant-population-size Wright–Fisher model, with the mutation rate Θ estimated by Watterson's method (Equation 4.16 in Hartl & Clark (2007)). The blue line shows the mean posterior SFS under the population model used to estimate the mutation rate in Figure 4.
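The black reference line follows from the constant-size neutral model: the expected number of sites at which the derived allele is carried by i of n chromosomes is θ/i. If the x axis counts minor alleles (as the relative-minor-allele-frequency terminology in the other captions suggests), the folded spectrum, which pairs classes i and n − i, is the matching comparison. A minimal sketch, assuming θ comes from Watterson's estimator; the inputs in the example are hypothetical.

```python
def harmonic(n):
    """a_n = sum_{i=1}^{n-1} 1/i for n chromosomes."""
    return sum(1.0 / i for i in range(1, n))


def watterson_theta(segregating_sites, n_chromosomes):
    """Watterson's estimator theta_W = S / a_n."""
    return segregating_sites / harmonic(n_chromosomes)


def expected_folded_sfs(theta, n):
    """Expected folded SFS under the standard neutral model with constant
    population size. Unfolded: E[xi_i] = theta / i for i = 1..n-1.
    Folded entry j (minor-allele count j = 1..n//2) adds classes j and n - j."""
    unfolded = [theta / i for i in range(1, n)]  # unfolded[i - 1] <-> count i
    folded = []
    for j in range(1, n // 2 + 1):
        if j == n - j:
            # Middle class when n is even: it pairs with itself, count once.
            folded.append(unfolded[j - 1])
        else:
            folded.append(unfolded[j - 1] + unfolded[n - j - 1])
    return folded


# Hypothetical example: 30 segregating sites in 40 chromosomes.
theta = watterson_theta(30, 40)
for count, expected in enumerate(expected_folded_sfs(theta, 40)[:5], start=1):
    print(count, round(expected, 2))
```

With θ set to Watterson's estimate, the folded expectations sum to the observed S by construction, so a comparison against this line is about shape: an excess of low-count classes over the θ/i profile is what distinguishes a growth model from the constant-size one.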
Figure 4. Mutation rate estimates.
These estimates are based on averages over 100 coalescent trees per grid point. (a) Estimated marginal posterior distribution over per-generation growth rates during the exponential growth phase. The red error bar in the lower left-hand corner shows the 95% confidence interval of the growth rate in the European lineage estimated in Table 1 of Gutenkunst et al., which is much lower because the more common variants used in that estimate pertain to a more remote period of our history. (b) Estimated marginal posterior distributions of the time at which variants of various relative minor allele frequencies arose in the population, plotted against the logarithm of the number of generations ago. Blue, green, red, cyan and magenta lines correspond to variants with relative minor allele frequency (RMAF) of 5×10⁻⁵, 5×10⁻⁴, 5×10⁻³, 5×10⁻² and 5×10⁻¹, respectively. An RMAF of 5×10⁻⁵ corresponds to singletons in our data set, which, according to our model, mostly arose in the last 2,500 years. Most previous analyses have dealt with SNPs with an RMAF on the order of 5×10⁻², corresponding to much earlier mutations. (c) Estimated marginal posterior distribution over mutation rates given the SFS in the two genes. Blue and green lines are for HHEX and KCNJ11, respectively.
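Figure 4 rests on averaging coalescent trees under a demographic model that includes exponential growth. The sketch below is a deliberately simplified, hypothetical stand-in for that model: a single population growing exponentially at rate `growth` per coalescent time unit (not per generation, and without the earlier demographic phases the study's model includes). It simulates genealogies, accumulates the branch length subtending i sampled chromosomes (proportional to the expected number of sites with i copies under the infinite-sites assumption), and averages over a fixed number of trees.

```python
import math
import random


def coalescent_branch_lengths(n, growth, rng):
    """Simulate one coalescent tree for n sampled chromosomes and return
    lengths[i], the total branch length subtending exactly i leaves
    (time in coalescent units). Backwards in time the population size is
    N(t) = N0 * exp(-growth * t), so with k lineages the coalescence
    rate is C(k, 2) * exp(growth * t). Assumes growth >= 0."""
    if growth < 0:
        raise ValueError("this sketch assumes growth >= 0")
    lengths = [0.0] * n   # index i = number of leaves below a branch
    lineages = [1] * n    # leaf count below each current lineage
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        pairs = k * (k - 1) / 2
        e = rng.expovariate(1.0)
        if growth > 0:
            # Solve pairs / growth * (exp(growth*(t+tau)) - exp(growth*t)) = e.
            tau = math.log(1.0 + growth * e * math.exp(-growth * t) / pairs) / growth
        else:
            tau = e / pairs
        for size in lineages:
            lengths[size] += tau
        t += tau
        # Merge two lineages chosen uniformly at random.
        a, b = rng.sample(range(k), 2)
        merged = lineages[a] + lineages[b]
        lineages = [s for idx, s in enumerate(lineages) if idx not in (a, b)]
        lineages.append(merged)
    return lengths


def mean_singleton_fraction(n, growth, n_trees=100, seed=0):
    """Average branch lengths over n_trees simulated trees and return the
    expected share of mutations that are singletons, E[L_1] / E[L_total]
    (mutations fall on branches in proportion to their length)."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(n_trees):
        for i, length in enumerate(coalescent_branch_lengths(n, growth, rng)):
            totals[i] += length
    return totals[1] / sum(totals)


for g in (0.0, 10.0, 50.0):
    print(g, round(mean_singleton_fraction(20, g, n_trees=500), 3))
```

Stronger growth makes genealogies more star-like, lengthening external branches and raising the share of singletons. Scanning `growth` over a grid and comparing the simulated spectrum with an observed one is one way such a model could be fitted, though the study's own likelihood, priors and demographic phases are not reproduced here.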

