Evolutionary and Population Genomics of the Cavity Causing Bacteria Streptococcus mutans

Author Notes

Abstract

Streptococcus mutans is widely recognized as one of the key etiological agents of human dental caries. Despite its role in this important disease, our present knowledge of gene content variability across the species and its relationship to adaptation is minimal. Estimates of its demographic history are not available. In this study, we generated genome sequences of 57 S. mutans isolates, as well as representative strains of the most closely related species to S. mutans (S. ratti, S. macaccae, and S. criceti), to identify the overall structure and potential adaptive features of the dispensable and core components of the genome. We also performed population genetic analyses on the core genome of the species aimed at understanding the demographic history, and impact of selection shaping its genetic variation. The maximum gene content divergence among strains was approximately 23%, with the majority of strains diverging by 5–15%. The core genome consisted of 1,490 genes and the pan-genome approximately 3,296. Maximum likelihood analysis of the synonymous site frequency spectrum (SFS) suggested that the S. mutans population started expanding exponentially approximately 10,000 years ago (95% confidence interval [CI]: 3,268–14,344 years ago), coincidental with the onset of human agriculture. Analysis of the replacement SFS indicated that a majority of these substitutions are under strong negative selection, and the remainder evolved neutrally. A set of 14 genes was identified as being under positive selection, most of which were involved in either sugar metabolism or acid tolerance. Analysis of the core genome suggested that among 73 genes present in all isolates of S. mutans but absent in other species of the mutans taxonomic group, the majority can be associated with metabolic processes that could have contributed to the successful adaptation of S. mutans to its new niche, the human mouth, and with the dietary changes that accompanied the origin of agriculture.

Streptococcus mutans, demographic inference, cavities, bacterial evolution, pan and core genome, infectious disease

Introduction

Streptococcus mutans is known to be one of the most prevalent bacteria in human oral flora and is widely recognized as a key etiological agent of human dental caries (reviewed in Burne 1998). Evidence in support of this latter issue include the following key points: S. mutans is frequently isolated from caries lesions; it induces caries formation in animals fed a sucrose-rich diet; the species is highly acidogenic and aciduric (Hamada and Slade 1980; Loesche 1986; van Houte 1994); it can form an effective biofilm by producing surface antigens which promote adhesion to the tooth surface and other bacteria (Hamada and Slade 1980); and it flourishes in cariogenic plaque because it is better able to grow and metabolize carbohydrate in a low pH environment (Bender, et al. 1986; Bender and Marquis 1987; Marquis 1990; Belli and Marquis 1991; Arthur et al. 2011). Despite its recognized importance in this important human disease, there are very few publications using comparative genomics to gain insights on basic biology, evolutionary history and pathogenesis of this organism (Ajdic et al. 2002; Maruyama et al. 2009) and there are but three genome sequences currently on GenBank. A number of studies have demonstrated substantial genetic heterogeneity across clinical isolates of S. mutans (Zhang et al. 2009; Arthur et al. 2011; Cheon et al. 2011; Phattarataratip et al. 2011); however, our present knowledge of gene content variability across the species and its relationship to adaptation and virulence is minimal.

One of the more significant recent discoveries in bacterial genomics is that bacteria species appear to be comprised of both a set of core and dispensable genes, with only the former present in all isolates of that species and with the sum of the two components forming the species pan-genome (or supra-genome). This concept was first introduced for S. agalactiae in 2005 (Tettelin et al. 2005) and is now generally regarded as a principle common to most or all bacteria. Much speculation has centered on the origin, composition, and size of bacteria pan-genomes and whether they are finite or infinite (Tettelin et al. 2008; Lapierre and Gogarten 2009). Recently, we examined the role of the core and dispensable genes in defining two sympatric and closely related species of Campylobacter—C. jejuni and C. coli (Lefebure et al. 2010), and addressed whether their pan-genomes are finite or infinite. We demonstrated, through the analysis of 96 genome sequences, that their pan-genome is indeed finite and that there are unique and cohesive features to each of their genomes defining their genomic identity. The two species have a similar pan-genome size; however, C. coli has acquired a larger core genome and each species has evolved a number of species-specific core genes, possibly reflecting different adaptive strategies. Understanding the pan-genome components of any species of bacteria means that the core genome and the dispensable genome can be identified for the group in question. Comparisons made to the genomes of representative isolates of other closely related species, can then identify the unique core genome of the group—that is, the genes common to all isolates of that group, not present in its closest relatives. This set of unique core genes is of particular interest, because they are amongst the genes likely to define the essence of that group’s adaptive specifics.

Demographic models inferred from genetic data have an important role in modern population genetic analysis. Because demographic processes affect the accumulation of variation along the entire genome, the analysis of comparative population genome sequence data offers the possibility to address questions about the demographic history of populations. Of particular interest are genome-wide single nucleotide polymorphisms (SNPs) from multiple individuals of the same species representing many thousands of quasi-independent data points. Site frequency spectrum (SFS) methods for the analysis of such data have proven to be a powerful means of assessing demographic history and have recently been applied to questions involving a diversity of organisms (Caicedo et al. 2007; Gutenkunst et al. 2009). Demographic analyses of bacterial species based on population genetic analysis of whole genomes, using the SFS, have yet to be published, although such methods should be entirely applicable if the necessary data were available.

Most of the intraspecific evolutionary analyses of bacterial species have focused on understanding the phylogenetic relationships among bacterial isolates or have concentrated on assessments of population diversity based on seven gene multilocus sequence typing (MLST). Studies based on MLST suggest that recombination in S. mutans contributes eight times more to the maintenance of genetic diversity than mutation (r/u ∼8.3) (Do et al. 2010). Homologous recombination in S. mutans, as in the rest of naturally transformable species, occurs by a process analogous to gene conversion in eukaryotes, by replacement of short fragments of DNA (Smith et al. 1991). Members of the genus Streptococcus have recently been the subject of population genomic analyses (Donati et al. 2010; Croucher et al. 2011; Muzzi and Donati 2011), highlighting the importance of the dispensable gene component, which carries a high proportion of loci associated with different virulence and phenotypic attributes (Suzuki et al. 2011). These studies have also pointed out the large impact of homologous recombination in the maintenance of genetic diversity of the core genome (Donati et al. 2010; Croucher et al. 2011). Given the relatively high rates of homologous recombination in naturally transformable species like S. mutans, we propose employing methods based on the analysis of the SFS to provide insight on the evolutionary history of the species.

In this study, we generated genome sequences of 57 S. mutans isolates, as well as representative strains of the most closely related species to S. mutans, to identify the overall structure and potential adaptive features of the dispensable and core components of the genome, and performed population genetic analyses on the core genome of the species aimed at understanding the demographic history, and impact of selection shaping its genetic variation.

Results

SNPs and the Core Genome

Next generation technology was used to obtain genome sequences of an international collection of 57 clinical isolates of S. mutans. Comparison of gene content among strains indicated that the maximum gene content divergence was approximately 23%, with many strains diverging by 5–15% in number of annotated genes (supplementary fig. S1, Supplementary Material online). On average, each genome contained 1,636 genes. To conduct a population genomic analysis of demographic history in S. mutans, we needed to identify the core genome components because the necessary genetic information for reconstructing the demographic history of S. mutans is contained in those genes that are shared by all isolates of the species. Our comparisons indicated that there were 1,490 genes common to all 57 strains (supplementary fig. S2, Supplementary Material online, for an estimate of the core genome of S. mutans), out of which 1,430 had sufficient information (more than 90% of the gene length for all strains), to perform population genetic analyses; the pan-genome of the species approached 3,296 genes (supplementary fig. S3, Supplementary Material online). From the 1,430 core genes, we identified 29,805 silent and 21,997 replacement SNPs, which summarized across genes, resulted in estimates of sequence polymorphism ranging from 0.5% to 1.9%. We estimated summary statistics and found that the average GC content in S. mutans is approximately 37%, the average Watterson’s θ (0.008) is larger than the average nucleotide pairwise difference (0.005), and this translates into a prevalence of negative Tajima’s D across genes, as has been observed in other species of bacteria (Hughes 2005). A graphic representation with the estimates of these summary statistics is included in the Supplementary Material online (supplementary fig. S4, Supplementary Material online). Only 1.16% of the SNPs found across all 1,430 core genes used in the analysis were multi-allelic leaving us with 98.84% bi-allelic SNPs for the demographic and selection analysis. Principal component analysis (PCA) (Novembre and Stephens 2008), employing silent sites from the core genome, was used to inspect the structure of genetic variation. For this, we only considered synonymous polymorphic sites with minor allele frequencies above 5% and r²≤ 0.4; the correlation between alleles was used as a measure of linkage disequilibrium. The rationale behind the elimination of sites is that alleles in low frequency do not contribute to the clustering of strains and because the information in linked sites is redundant. Nevertheless, PCA performed with the complete data set (all synonymous polymorphic sites from the core genome) produced similar results. Consistent with the findings of other studies on S. mutans (Do et al. 2010), our analysis suggested little genetic differentiation among isolates sampled in different geographic locations (supplementary fig. S5, Supplementary Material online). In addition to the PCA, we performed pairwise Fst analyses with permutation procedures to assess their significance. None of the pairwise Fst between geographic locations was significant. This facilitates the work of historical demographic reconstruction because single population models can be explored and fit to the data with greater power, because there are fewer numbers of parameters. Additional analyses on the dispensable part of the genome suggest that there is no association between the sharing of genes and the geographic location of the samples and like the core genome, there is no indication of population genetic structure evident in the analysis of high frequency dispensable genes.

Demographic History

To reconstruct the demographic history of S. mutans, we employed a maximum likelihood inference method based on the distribution of allele frequencies across silent SNPs, or SFS, and estimated confidence intervals (CIs) by bootstrapping. Five different population models were explored (supplementary fig. S6, Supplementary Material online) in this framework and the selection of the best-fit model was performed using the Akaike Information Criteria. The large number of singleton (unique) substitutions observed in S. mutans SFS is consistent with a recent expansion (fig. 1a and b). Recently expanded populations leave a signature of mutations found in very low frequency, that have not had chance to disappear, or increase in frequency, by genetic drift (Slatkin and Hudson 1991; Griffiths and Tavare 1994). Estimations performed with δaδi recover two main parameters under the exponential growth model. The first is ν, the ratio of current to ancestral population size; and the second is τ, the time since the change in demographic, representing in our case the start of the demographic expansion. τ is equal to time (T) scaled by 2Na, which is the ancestral effective population size. After appropriate rescaling of the inferred parameters, the maximum likelihood analysis suggested that the SFS of S. mutans is consistent with a demographic scenario in which the population started expanding exponentially approximately 10,000 years ago (95% CI_{classic_boostrap}: 3,268–14,344 years ago; possible uncertainties in mutation rate and generation time were taken into consideration in the computation of this CI—see Supplementary Material online for details; fig. 1a and b and table 1) and the absolute fit of the observed and simulated SFS’s under this demographic model indicated no significant difference in their distributions (two-sided Kolmogorov–Smirnov D = 0.2069, P = 0.564). The fit of the observed data to our simulations suggested that the effective population size of S. mutans has increased 4.8 to 5.5 times since the origin of human agriculture (fig. 1c), estimates much larger than those reported for humans (Coventry et al. 2010). The likelihood surface of the parameter search was evaluated to assure that it did not fall into local maxima. Models were searched with different initial parameter values and evaluated if they all converged to the same maxima. The likelihood surface in figure 1c was obtained by finding the likelihood value for each combination of parameters in a grid. We estimated the CIs for the parameters by bootstrapping the synonymous data matrix and fitting the model (supplementary fig. S7, Supplementary Material online).

Fig. 1.

Demographic history of Streptococcus mutans. (a) Schematic representation of S. mutans population history. The timeline (in years before present) represents the start of the expansion of cariogenic bacteria after the onset of agriculture, calibrated using an experimentally determined mutation rate for bacteria (Drake 1991), concomitant with an in vivo determined generation time for oral flora bacteria (Gibbons 1964) (see Materials and Methods and Supplementary Material online, for details). (b) The observed distribution of number of synonymous SNPs at a given frequency in the sample of 58 isolates (blue) is shown, as well as the expectation under the parameters that generate the best fit demographic model (dark blue). The difference between the two distributions is not significant. The distribution under a standard neutral model with constant population size is shown in light blue (significant KS, P < 0.0001). (c) The bi-dimensional likelihood profile for combination of parameters ν (ratio of current to ancestral population size) in the x axis and the time at the beginning of the demographic expansion (scaled in generations/2Na) in the y axis. The maximum likelihood value is shown as a white dot and the 95% CI is highlighted as a white dotted line. 95% CI estimated from bootstrapped data can be found in supplementary figure S7, Supplementary Material online.

Open in new tab Download slide

Table 1.

Open in new tab

Selection of Demographic Models.

Model^a	Ln L^b	No. of Free^c Parameters	AIC^d
Exponential growth	−156.67	2	317.34
2 Epoch	−168.96	2	341.92
Bottleneck + growth	−156.67	3	319.34
3 Epoch	−168.97	4	345.94

Model^a	Ln L^b	No. of Free^c Parameters	AIC^d
Exponential growth	−156.67	2	317.34
2 Epoch	−168.96	2	341.92
Bottleneck + growth	−156.67	3	319.34
3 Epoch	−168.97	4	345.94

^aThe logarithm of the maximum likelihood (Ln) for each of the demographic models fit to the data.

^bThe models assessed were exponential growth or decay (exponential growth), 2 epoch (constant and instantaneous increase), a bottleneck in the past, combined with exponential growth (bottleneck + growth), and 3 epoch (bottleneck, followed by an instantaneous increase).

^cThe number of parameters for each model.

^dAkaike Information criteria [AIC = 2 × (no. free parameters) – 2 ln). The model with the minimum AIC (exponential growth) was selected as the model that best explains the data.

Table 1.

Open in new tab

Selection of Demographic Models.

Model^a	Ln L^b	No. of Free^c Parameters	AIC^d
Exponential growth	−156.67	2	317.34
2 Epoch	−168.96	2	341.92
Bottleneck + growth	−156.67	3	319.34
3 Epoch	−168.97	4	345.94

Model^a	Ln L^b	No. of Free^c Parameters	AIC^d
Exponential growth	−156.67	2	317.34
2 Epoch	−168.96	2	341.92
Bottleneck + growth	−156.67	3	319.34
3 Epoch	−168.97	4	345.94

^aThe logarithm of the maximum likelihood (Ln) for each of the demographic models fit to the data.

^cThe number of parameters for each model.

^dAkaike Information criteria [AIC = 2 × (no. free parameters) – 2 ln). The model with the minimum AIC (exponential growth) was selected as the model that best explains the data.

The expected SFS of variation is not affected by linkage, but that is not true of the variance (Bustamante et al. 2001; Zhu and Bustamante 2005). It is germane therefore to demonstrate that S. mutans, being a naturally transformable bacterial species, undergoes significant homologous gene exchange. We assessed the magnitude of recombination in S. mutans by identifying the number of significant gene conversion events among isolates using Sawyer’s algorithm (Sawyer 1989). This algorithm is conservative (Sawyer 1989) and therefore we regard it as an estimate of the minimum set of gene conversion events. Power analyses of the method have concluded that it tends to underestimate the gene conversion events for scenarios in which the rate of gene conversion is high (Mansai and Innan 2010). Our analysis indicates that there has been extensive gene exchange between lineages represented by the isolates in our sample (fig. 2a), with a wide distribution of gene conversion tract lengths. Additionally, our analysis with ClonalFrame suggests that on average, recombination contributes approximately six times more than mutation to the genetic variation in S. mutans (ρ/θ ≈ 6; 95% credibility interval ∼3.09–19.6; supplementary fig. S8, Supplementary Material online). We performed simulations in ms (Hudson 2002) assuming gene conversion rates similar to those estimated for S. mutans (six times larger than the mutation rate), all under the same demographic scenario estimated from the original data. The mean SFS over 600 simulations is similar to the observed SFS (supplementary fig. S9, Supplementary Material online). We then used the simulations to investigate the demographic inferences performed with δaδi. For each simulation, we estimated the SFS and inferred maximum likelihood parameters (τ and ν) for a demographic scenario of population expansion in δaδi. The distribution of maximum likelihood estimates of τ and ν present more variability than that obtained with classical bootstrap (τ: 95% CI = 0.2–2.3; ν: 95% CI = 2.5–11.3), and our maximum likelihood estimated parameters (τ = 0.355 and ν = 5.09) fall within the CIs obtained from the parametric bootstrap, suggesting that δaδi is producing results that are consistent with what would be observed in a 2 Mb chromosome evolving under such a scenario (supplementary fig. S10, Supplementary Material online). Furthermore, in our simulations, the patterns of decay of linkage disequilibrium with distance approximately resemble the decay pattern observed in the data.

Fig. 2.

Recombination in Streptococcus mutans. (a) The inferred distribution of recombination tracts (gene conversion) among isolates of S. mutans. Gene tracts of the core genome that served as alignment for the estimation of recombination along the genome are represented in blue and red. Tracts of significant gene conversion events detected along the genome are represented in green. (b) The distribution of gene conversion tract lengths, characterized by a wide range of values that follow a geometric distribution.

Open in new tab Download slide

Adaptation to the Postagriculture Habitat

To understand the role of natural selection in shaping genomic variation in S. mutans, we explored a variety of selection models under a similar maximum likelihood framework to that employed for the demographic fitting, used to explain the SFS of the replacement SNPs. Our analysis suggests that the majority of the changes (70%) that cause amino acid substitutions are under strong negative selection, and the remainder evolved neutrally (fig. 3). The frequency of rare variants is much higher, and the frequency of common variants much lower, than expected under a neutral model, even after correcting for demographic expansion. This is a pattern consistent with strong purifying selection acting genome-wide (Bustamante et al. 2001; Boyko et al. 2008), and is in agreement with what has been observed in other bacterial species (Hughes 2005).

Fig. 3.

Evidence of genome-wide selective constraints in Streptococcus mutans. The observed distribution of number of replacement SNPs at a given frequency in the sample of 58 isolates is shown in red. The expectation is that replacement changes will have an effect on the fitness of individuals, so it is unlikely that they behave neutrally. Correcting for population expansion inferred from the silent SNPs (fig. 1), does not account for the excess of singletons observed in the data (light green). On the other hand, a model that allows for selection affecting changes in allele frequency, after correcting for demography, yields a superior fit, suggesting that in the S. mutans genome 30% of the replacement changes are neutral and 70% are under strong selection (γ = −17, where γ = 2N_es, and N_e is the current population size and s is the coefficient of selection).

Open in new tab Download slide

We explored the molecular adaptation of individual genes of S. mutans in two ways: 1) by performing neutrality tests comparing the odds ratio of replacement to silent divergent versus polymorphic changes via McDonald–Kreitman (MK) tests, and a Bayesian generalization of the Log-linear model that is the basis for the MK test (SNIPER); and 2) by identifying the protein domains, as well as the putative metabolic pathways in which these proteins are involved, of the genes present in all isolates of S. mutans, but not present in the outgroup S. ratti and two other closely related species of the mutans group (namely S. macacae and S. criceti). In particular, we were looking for proteins involved in aciduricity (resistance to acid), sugar metabolism, resistance to oxidative stress, antibiotics, and adherence to human tissue. The annotated genes of S. ratti were compared with the 59 S. mutans genomes using all versus all BLAST and from the protein clusters we identified the orthologous genes present in both species. A total of 2,033 genes were annotated in S. ratti and of these, 1,282 genes were in a 1:1 relationship with S. mutans. On average, we estimated the divergence between S. mutans and S. ratti to be approximately 26%. We estimated very few of these proteins showing signatures of positive selection (more fixed replacement changes than synonymous). The combined estimate of MK and SNIPER tests identified 14 genes that were under selection (after Bonferroni correction), the majority of which are involved in either sugar metabolism or acid tolerance (supplementary table S1, Supplementary Material online). It is possible that the relatively high divergence between S. mutans and S. ratti reduced the power of these tests, as multiple hits could mask the signature of selection; however, our results are nonetheless consistent with the SFS-based analysis.

On the other hand, the analysis of proteins present in all isolates of S. mutans, but absent in their close relatives (the S. mutans unique core genome) suggests that most of these genes are involved in adaptation to the postagriculture human mouth niche. Of the 1,490 genes that conform to the core genome of S. mutans, 148 are present in S. mutans and absent in the putative sister group S. ratti (Tapp et al. 2003; Hung et al. 2005), whereas 73 loci are unique to S. mutans in comparisons involving the three species, S. ratti, S. macacae, and S. criceti (fig. 4a and supplementary table S2, Supplementary Material online). Within this set of S. mutans unique core genes, 36 are hypothetical proteins with no similarity to known domains or protein clusters (fig. 4a). The remaining proteins show similarity with domains of proteins involved in processes of carbohydrate metabolism, resistance to acidic environments, transcriptional regulation, oxidative stress, metal and peptide translocation, and adhesion to host tissue (fig. 4a and supplementary tables S2 and Supplementary Data, Supplementary Material online). In addition, some of these unique core genes contain domains potentially involved in resistance to antimicrobials, suggesting they could be of more recent acquisition (fig. 4a). Undoubtedly, one of the major challenges that S. mutans had to overcome as the carbohydrate content of the human diet increased was surviving at low pH. Although S. mutans does not constitute a significant proportion of the oral flora colonizing healthy dentition, it can become numerically significant when there is repeated and sustained acidification of the biofilms associated with excess dietary carbohydrates or impaired salivary function (Burne 1998). Interestingly, 14% of the gene products found in the S. mutans unique core genome have been shown to be upregulated in transcriptomic analyses at low pH (Gong et al. 2009) (binomial test comparison with core genome, P = 0.01). Among these are cation flux pumps that contribute to ionic equilibrium. Analysis of the functional groups (GO terms) of the dispensable genes present in 54, 55, 56, and 57 isolates suggest that there is no enrichment of any particular functional group (P value > 0.05 for all comparisons), when compared with the core genome and none of the genes presented mean ratio of replacement to synonymous changes (k_N/k_S) larger than 1.

Fig. 4.

Genome map of Streptococcus mutans. (a) Representation of the forward coding (light blue) and reverse coding (light red) genes comprising the core genome of S. mutans. The third inner circle, displays the unique core genes, present in S. mutans only, colored by the metabolic functions in which they are involved. The most inner circles present the unique genes shown to be upregulated or downregulated by the impacts coincident with the diet change of humans after the origin of agriculture: starch and sucrose metabolism and low environmental pH. (b) Putative origin of laterally transferred unique core genes in S. mutans..

Open in new tab Download slide

Discussion

The genomes of different strains of S. mutans, like those of many other species of Streptococcus, are highly dynamic and their overall gene composition differs markedly from one isolate to another because of lateral gene transfer (LGT). However, as with other bacteria, this difference in gene content involves only a portion of the genome, generally referred to as the dispensable component, in contrast to an alternative set of genes common to all strains, known as the core genome. Together these two components comprise the pan-genome of the species (Tettelin et al. 2005, 2008). The core genome is a clearly identifiable component of Streptococcus species, as well as species from other genera, and indeed may represent that set of genes which can best define bacterial species (Lan and Reeves 2000, 2001; Tettelin et al. 2008; Lapierre and Gogarten 2009; Lefebure et al. 2010). Our analyses suggest that the core genome of the species is readily identifiable and consists of 1,430 genes. The complete gene repertoire of the dispensable component of the genome is much harder to capture. Nevertheless, after examining the accumulation curve of the pan-genome in S. mutans (supplementary fig. S3, Supplementary Material online), we suggest that by sampling 58 genomes in the population we are reaching a point at which there is no appreciable increase in the number of new genes as more genomes are included in the analysis.

Demographic History

Our estimates for the timing of the demographic expansion in S. mutans coincide with the origin of human agriculture. It has been hypothesized that many infectious diseases could only originate and be maintained after humankind developed agriculture (Fiennes 1978; Dobson and Carper 1996; Diamond 2002). Numerous studies in physical anthropology have shown an increased prevalence of dental caries in human remains from post-agricultural societies (5–50%) when compared with remains of Mesolithic hunter–gatherers (0–2%) (Cohen and Armelagos 1984; Formicola 1987; Lukacs 1992). This pattern has been attributed to changes in diet and the consequent increase in consumption of carbohydrates in human populations after the development of starchy crops, leading to the establishment of infectious agents causing dental caries (Cohen and Armelagos 1984; Lukacs 1992). Our timing and expansion estimates are consistent with S. mutans as a possible etiological agent of human dental caries; however, it is not possible to rule out other oral bacteria species as also contributing to the development of this disease.

We have attempted to be thorough in our analysis, by including not only rigorous demonstration that S. mutans undergoes recombination but also that the methods of analysis are appropriate for the limited conditions of a small genome size, and other constraints imposed by the biology of this organism. Nevertheless, we have not explored all alternative possibilities that could give rise to patterns of variation like those observed in this data set. We have provided a parsimonious explanation for the pattern of variation found in the synonymous substitutions along the core genome; however, the biology of bacterial species is such that there are potential alternative conditions that could give rise to similar patterns. For instance, it is widely appreciated that selection can impact the population dynamics of clonal organisms like bacteria (Dykhuizen 1990a, 1990b, 1993; Guttman and Dykhuizen 1994; Buckee et al. 2008). Although it is possible that a scenario of multiple instances of selection and linkage, because of linkage to replacement sites under selection (Braverman et al. 1995), could explain the excess of singletons in the distribution of synonymous changes, it is difficult to envision a selection scenario, with specific selective pressures, that would explain our observations. Another potential scenario is the existence of different regimes of mutation in time. In laboratory experiments, it has been shown how clones with greater mutation rates (mutators) can facilitate adaptation to new environmental conditions (Sniegowski, et al. 1997; Shaver, et al. 2002; Gerrish et al. 2007). These mutators also have the effect of changing the overall mutation rate in the population, potentially affecting the pattern of observed variation, giving rise to an observed excess of singletons, via a combination of negative selection on replacement changes and variation of the rates at which new synonymous and replacement changes are being generated. Unfortunately, there is no information about the prevalence of mutator strains in natural populations (carrier or clinical) of S. mutans, or in experiments performed with laboratory strains. Our analyses of the population genetic structure of the data, consistent with results from previous work (Do et al. 2010), suggest that there is no geographic differentiation. We believe that our simulations under the inferred demographic scenario, which exclude selection, provide a sufficiently parsimonious explanation in agreement with the observed data, to explain the genome wide accumulation of synonymous variation in S. mutans. It has been suggested that S. mutans genotypes are maternally transmitted (Alaluusua et al. 1996; Emanuelsson et al. 1998; Emanuelsson and Thornqvist 2000), and this could act to produce a strong pattern of geographic differentiation of the isolates over time. Our results showing a lack of geographic structure are not consistent with such a scenario. Recent work by Klein et al. (2004) suggests a high turnover rate of lineages in the mouth of children after colonization from the mother, which would be consistent with our observations.

Adaptation in the Postagricultural Era

The distribution of replacement variants in the S. mutans population is consistent with strong purifying selection acting genome-wide, and it raises the question of what are the features of molecular adaptation that underlie S. mutans successful colonization of, and proliferation in, the human host more than 10,000 years ago. To adapt to the new niche of the “post-agricultural” human mouth, S. mutans faced several challenges. Among them, S. mutans needed to develop, or increase, efficiency in the metabolism of new sugars, successfully compete with bacterial species already present in the mouth of humans, develop defenses against increased oxidative stress, and resist the acidic by-products of its own new efficient carbohydrate metabolism (Jacobson et al. 1989). Or results suggest that only a handful of genes (14 genes) present a pattern that deviates from the neutral expectation. On the other hand, analysis of the function of the unique core genes suggests that a large fraction of these genes are involved in sugar metabolism and pH resistance. The absence of these putative adaptive genes in other species of the Mutans taxonomic group (Tapp et al. 2003) suggests their acquisition via LGT to the S. mutans lineage. Consistent with this hypothesis, these proteins tend to be similar to those arising from a wide variety of bacterial species including other oral flora bacteria, as well as taxa which produce lactic acid (fig. 4b, supplementary table S2, Supplementary Material online), and many of them appear to be involved in carbohydrate metabolism. As an example, the gene SMU.438 (supplementary fig. S11, Supplementary Material online) is closely related to a hypothetical protein present in Lactococcus lactis, which includes domains corresponding to an activator of 2-hydroxyglutaryl-CoA dehydratase. This protein includes three domains identified in Lactobacillus: FGGY family of carbohydrate kinases, N-terminal domain (2), and a CoA-substrate-specific enzyme activase, suggesting that this protein could be involved in carbohydrate metabolism (Klees et al. 1992). The protein SMU.1410 (supplementary fig. S12, Supplementary Material online) is very similar to a putative reductase of S. gallolyticus (formerly known as S. bovis). S. gallolyticus is found in the gastrointestinal tract of ruminants and humans, and has been linked to endocarditis in humans (Hoen et al. 2005). When ruminants consume a diet high in sugars, the fermentable carbohydrates allow for the growth of S. bovis (Asanuma and Hino 2002). This organism also exhibits lactic acid fermentation, reducing the pH of the rumen leading to ruminal acidosis (Asanuma and Hino 2002). Reductases are also involved as ion pumps in electron transport chains. We suggest that this putative reductase is involved in resistance to acidic conditions in S. mutans and was obtained via a LGT event from S. gallolyticus, or some very closely related taxon. The protein SMU.1561 has been shown to be up-regulated in S. mutans in acid conditions (Gong et al. 2009). Our phylogenetic analysis indicates that SMU.1561 (supplementary fig. S13, Supplementary Material online) is closely related to the product of the trkA gene from Ethanoligenens harbinense, which is another bacteria forming part of the oral flora of humans (Aas et al. 2005). The trkA gene encodes a protein involved in the transport of ions across the membrane (Anion permease ArsB/NhaD), which is consistent with the fact that it is upregulated in acidic conditions (Gong et al. 2009). An alternative explanation, to a history of LGT for these unique core genes, is that they arose through vertical descent from one of these close relatives of S. mutans, but the genes are not part of the core genome of these other taxa and instead are present in their dispensable genomes, and we simply have not yet sampled them in a single genome sequence. We have identified elsewhere (Lefebure et al. 2010) that core genes in one bacterial species can have their origins in the dispensable genome of closely related bacteria. Whatever their precise evolutionary history, these genes may be important loci in defining the caries-associated phenotype of S. mutans and its adaptation to the human mouth environment.

Although low pH has been considered a primary ecological determinant influencing oral biofilm ecology, oxygen is also a critical factor (Marquis 1995), and it appears to be much better tolerated by commensal streptococci and other members of the normal microbiota than by S. mutans (Marquis 1995). In fact, exposure to oxygen strongly inhibits biofilm formation by S. mutans and alters the transcriptome and metabolism in a way that renders it less cariogenic (Ahn and Burne 2007; Ahn et al. 2009). Thus, S. mutans likely does not compete well in conditions of high redox or oxygen tension. Recently, hydrogen peroxide production by health-associated streptococci, such as S. gordonii, has been demonstrated to strongly inhibit S. mutans in mixed culture (Kreth et al. 2008). In addition to genes involved in acid tolerance, the unique core genes of S. mutans contain a high proportion of small peptides and gene products that could potentially be involved in oxidative stress tolerance (supplementary table S3, Supplementary Material online). For example, the secreted peptide encoded by SMU.1131c (ciaX) has been shown to modulate signal transduction in the CiaRH two-component system to regulate acid and oxidative stress tolerance and genetic competence (He et al. 2008). A variety of other peptides encoded in the unique core genome are in operons with transporters (e.g., SMU.18, SMU.185, and SMU.390), with transcriptional regulators that control gene expression in response to stressors (e.g., the MarR-type regulators associated with SMU.390, SMU.438, SMU.444, and SMU.631), or with proteins that may influence oxidative stress tolerance (e.g., SMU.545 is genetically linked to the dpr gene that encodes a protein necessary for oxidative stress resistance [supplementary table S3, Supplementary Material online]). Of note, a distinct MarR transcriptional regulator of S. mutans (SMU.925) was recently demonstrated to regulate the expression of two ABC-exporters to link (p)ppGpp metabolism, genetic competence and oxidative stress tolerance, possibly through a peptide-based sensing system (Seaton et al. 2011). SMU.1047c is a small peptide genetically linked to the relQ operon, which encodes the RelQ (p)ppGpp synthase and three other gene products contributing to oxidative stress tolerance (Kim et al. 2012). Smu.185 is part of the slo operon, which regulates manganous ion internalization, oxidative stress tolerance, and expression of a variety of genes including superoxide dismutate. Likewise, SMU.1645 is predicted to mediate tellurite resistance, which is generally associated with thiol-mediated reduction of tellurite, whereas SMU.642 is cotranscribed in a complex operon with a predicted oxidoreductase, competence-associated peptide and peptidase; possibly linking oxidative stress and competence similar to the SMU.925 operon (Seaton et al. 2011). Thus, while low pH provides strong selective pressure for aciduric species, during fermentable carbohydrate consumption and caries initiation and progression, oxygen may be an equally important environmental factor influencing the composition, biochemistry, and pathogenic potential of oral biofilms (Abbe et al. 1991); and may be the underlying explanation for the relative prevalence of proteins with roles in oxidative stress carried in the unique core genome of S. mutans (supplementary table S3, Supplementary Material online).

S. mutans is also capable of mounting a substantial defense against commensal streptococci. In particular, strains of S. mutans produce a variety of lantibiotic and nonlantiobiotic bacteriocins that can kill related organisms (Balakrishnan et al. 2002). Peptide-based quorum-sensing systems, including the ComC competence cascade, multiple two-component systems, density-dependent signaling complexes and global regulatory systems all cooperate to influence the production of bacteriocin-like molecules (Martin et al. 2006). The hdrR gene (SMU.1854) of the S. mutans unique core genome encodes a DNA binding protein that regulates bacteriocin production in a density-dependent fashion through its interactions with the HdrM membrane protein (SMU.1855) to control genetic competence and bacteriocin production (Merritt et al. 2007; Okinaga et al. 2010). Interestingly, exposure to air uniformly activates the bacteriocin pathways and endogenous bacteriocin immunity systems, probably as a defense mechanism against competing organisms in immature, comparatively aerobic dental biofilms (Ahn et al. 2009). Therefore, it is significant that the unique core genes of S. mutans contain a higher proportion of small peptides and gene products (smaller than 100 amino acids) than the core genome as a whole (approximately 6:1 ratio) that could potentially be involved in signaling and/or gene regulation (binomial test comparison with core genome, P = 1.23e−10; supplementary table S5, Supplementary Material online).

Conclusion

Taken collectively, our findings suggest that the S. mutans unique core genes may represent important pathogen-specific factors that could be targeted with species-specific therapeutics that might decrease the competitive fitness of S. mutans without interfering with the propagation of health-associated commensal organisms. This study also suggests that one of the innovations that formed the basis of civilization may have precipitated a long-term association with an important human pathogen, highlighting the interconnections that exist between our sociocultural and biological evolution.

Materials and Methods

DNA Sequencing and Ortholog Recovery

A total of 57 strains of S. mutans were selected representing different MLST and countries of origin: Brazil (n = 16), UK (n = 29), Iceland (n = 5), Hong Kong (n = 2), South Africa (n = 2), Turkey (n = 2), and USA (n = 1) (supplementary table S4, Supplementary Material online). Single end sequencing was performed using the Illumina GA2 sequencer, with one lane per strain. The sequence read lengths averaged 36–46 bp. This ensured high coverage (∼70×) of the ∼2 MB genome of S. mutans. Sequence reads were aligned to the S. mutans UA159 and S. mutans NN2025 complete genomes, respectively, using MAQ (Li, Ruan, et al. 2008), with appropriate mapping quality and coverage filters applied to capture the sequence information. Consensus alignments generated from the MAQ mapping were employed to extract, realign, and perform SNP calls in core genome genes; that is, genes for which all strains could be mapped to the reference. De novo assemblies were performed using Velvet (Zerbino and Birney 2008). Several hash lengths (from 21 to 31) and coverage cut-offs (0 to 80) were used, and the best assembly per strain was selected based on a combination of the N50 (which refers to a weighted median with 50% of the entire assembly contained in contigs equal to or greater than this value), the total assembly and the longest contig size. A schematic diagram for the mapping/assembly pipeline is included in supplementary figure S14, Supplementary Material online. Assembled genomes were annotated using the NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP). This pipeline combines HMM-based predictions with sequence similarity approaches to generate comparisons of the query genes or coding sequences against the gene products available in nonredundant protein databases: Entrez Protein Clusters, the Conserved Domain Database and the COGs (Clusters of Orthologous Groups). The algorithms employed in the pipeline are those implemented in GeneMark and Glimmer (Borodovsky and Mcininch 1993; Lukashin and Borodovsky 1998; Delcher et al. 1999).

To identify the orthologous sequences present in S. ratti, the putative sister group to S. mutans (Tapp et al. 2003), we sequenced the genome of one isolate of S. ratti using 454 GS-FLX with paired end, followed by partial gap closure with PCR. The sequence of this isolate was assembled using Newbler and annotated using PGAAP (supplementary fig. S15, Supplementary Material online). The rationale for sequencing S. ratti and identifying S. ratii/S. mutans orthologs was 2-fold: 1) identify orthologous genes that could be used as outgroups in a selection analysis that involved comparison of polymorphism and divergence data; and 2) identify core genes present in S. mutans, but not the most closely related species—that is, the putative unique core genome of S. mutans. Orthologs were determined by performing an all-versus-all BLASTP search combined with clustering using OrthoMCL2 (Li et al. 2003), and included all the S. mutans de novo assembled genomes and the genome sequence for the sister group S. ratti. The history of the S. mutans unique genes could involve losses along the S. ratti lineage or gains via LGT along the S. mutans lineage. Core genes present in S. mutans but absent in other closely related mutans group streptococci, in addition to the sister group S. ratti, represent genes most likely to have been acquired by LGT. To identify such genes, we obtained draft genome sequences (454 GS-FLX, paired end) of two other mutans group streptococci, closely related to S. mutans: S. macacae, and S. criceti (Tapp et al. 2003). The sequences of S. criceti and S. macacae were assembled with Newbler and annotated using PGAAP. Once the protein clusters were identified, we performed an all versus all BLAST comparison of the putatively unique S. mutans genes with the annotated genes of these additional two species, using OrthoMCL2 (Li, Stoeckert, et al. 2003). A conservative similarity threshold of 50% was set as a measure of whether a gene in S. mutans had an ortholog in either of the other two more divergent species. The presence or absence of the genes putatively unique to S. mutans, in these other members of the mutans group, helped us distinguish which genes were lost in the S. ratti lineage and which ones were gained by S. mutans. Finally, additional BLAST searches against existing databases (NCBI nr), followed by maximum likelihood phylogenetic analysis of the aligned sequences, were employed to evaluate the evolutionary history and putative donor species of the unique genes identified as gained on the S. mutans lineage. A list of the additional streptococci genomes, included in this analysis (available at NCBI at the time), is provided in supplementary table S5, Supplementary Material online.

Genome sequence data for 57 strains of S. mutans and single strains each of S. ratti (FA-1), S. criceti (HS-6), and S. macacae (NCTC 11558) have been deposited in GenBank under the following accession numbers: S. ratti: AJTZ01000000; S. criceti: AEUV01000016.1; S. macacae: AEUW01000012.1. Accessions for the S. mutans isolates can be found in supplementary table S4, Supplementary Material online.

SNP Calling

The 1,430 genes constituting the core genome of S. mutans, were realigned at the protein level to ensure that the alignments were in frame. Synonymous and replacement changes (and potential sites) were estimated following an “in house” pipeline coupled to the dNdS routine implemented in the Libsequence suit (Thornton 2003). Because of the deep coverage of our data (>70×), we were confident in the call of rare variants (singletons) and no further sophisticated methods were employed for their identification. Each gene alignment was analyzed with the Polydnds program of the Libsequence suite developed by Thornton to estimate the number of synonymous and nonsynonymous unambiguous changes in the alignment. These data were used in three ways: 1) genome-wide synonymous and nonsynonymous changes were employed to estimate the SFS for the synonymous and nonsynonymous changes respectively, as well as for the PCA to assess structure; 2) the synonymous and nonsynonymous changes per gene, with the inclusion of the outgroup S. ratti were used in selection analyses; 3). the number of synonymous and replacement positions for each gene served as denominators in assessing the polymorphism per site, as well as in estimating the mutation parameter θ (2Neu in haploid organisms) for the demographic and selection analyses.

Estimation of Recombination

Recombination, or more formally gene conversion, was estimated on the core genome alignment of the full data set using Sawyer’s algorithm as implemented in GeneConv (Sawyer 1989), keeping only those tracts that were judged significant after Bonferroni correction for multiple comparisons, in a manner similar to which it was implemented in Leopold et al. (2009). This algorithm estimates the distribution of the length of identical sequences between any two pairs, considering the distribution of variant sites in the multiple sequence alignment. Using a permutation-based test, any identical fragment between pairs of sequences that is unusually long, as expected from the estimated distribution, is considered to be the result of a gene conversion (recombination) event. In addition to these analyses, we constructed three sets, each comprised of 600 genomic regions of 500-bp long. These sets were used to estimate the relative contribution of recombination to mutation as proposed by Didelot and Falush (2007), and implemented in the program ClonalFrame. The conditions of the run included 100,000 generations of burnin, 100,000 generations for each chain to run, and step sizes for the sampling of 100 generations.

Demographic and Selection Analyses

PCA (Patterson et al. 2006) of synonymous SNPs with frequencies greater than 5%, was performed using the R project for Statistical Computing (http://www.r-project.org/, last accessed December 15, 2012). Rare variants were not included, as these do not contribute to distinguishing relatedness among individuals in putative subpopulations. The frequency distribution of variants, or SFS, was calculated for synonymous and replacement changes independently in R. Demographic parameters for different competing models were estimated from the SFS of synonymous changes using a diffusion-based approximation implemented in the program δaδi (Gutenkunst et al. 2009) in a maximum likelihood framework. The selection of the best-fit model was done using the Akaike Information Criteria. Changes in effective population size and time since the change in demographics are estimated in 2Neu and 2Ne scaled parameters, respectively. To convert these values to actual population sizes (expressed in individuals) and time (in years) we assumed a mutation rate estimated experimentally for bacteria between 4.08e−10 and 6.93e−10 subs/site/generation (Drake 1991; Ochman 2003), corresponding to 0.112–0.379 subst/genome/year (given that there are 374,571 synonymous sites along the genome), under the conservative assumption of a generation time of two divisions per day, as estimated for oral flora in vivo (Gibbons 1964). This mutation rate is consistent with experimentally determined mutation rates, estimated via fluctuation tests, in wild type clones of another member of the genus, S. pyogenes, at approximately 5.3 × 10e−10 mutations/generation (Scott et al. 2008). CIs of the parameters were estimated by maximum likelihood fitting of 500 bootstrapped data sets (details in Supplementary Material online). We also performed simulations under the estimated demographics and recombination rates, and performed estimations of the parameters in the simulated data sets using δaδi.

Genome wide selection analyses were performed on the replacement SFS by a similar diffusion-based approximation as the one employed for the demographic analysis. The impact of selection on the distribution of replacement mutations genome-wide was evaluated considering models that treat selection as a point mass effect or as a distribution of selective effects. Of these, three main models (one, two, or three categories of sites under selection with constant selection coefficients) were evaluated by maximum likelihood and over bootstrapped data sets, in the program PrFreq (Boyko et al. 2008). Again, the best model was selected using the Akaike Information Criteria. We also performed a standard MK test (McDonald and Kreitman 1991), and an extension of the MK test approach based on a Bayesian Loglinear model, implemented in SnIPRE (Eilertson et al. 2012), to compare the polymorphism and divergent changes in synonymous and replacement sites in the genes for which an orthologous sequence could be identified in S. ratti (82% of the core genes in S. mutans). The model implemented in SnIPRE is a Bayesian generalization of the log-linear model underlying the MK test that makes use of both genome-wide and gene-specific effects to model the variability among categories of mutations, resulting in increased power to detect the effects of selection on a gene-by-gene basis. It is nonparametric in the sense that it makes no assumptions (and does not require estimation) of parameters such as mutation rate and species divergence time to identify genes under selection.

Acknowledgments

Scott Durkin of JCVI provided some comparative genomic analysis on early drafts of the S. criceti and S. macacae genome sequences. This work was supported by the National Institute of Allergy and Infectious Disease, US National Institutes of Health (grant number AI073368 to M.J.S.).

References

Aas

Paster

Stokes

Olsen

Dewhirst

Defining the normal bacterial flora of the oral cavity

J Clin Microbiol.

2005

, vol.

(pg.

5721

5732

)

Abbe

Carlsson

Takahashi-Abbe

Yamada

Oxygen and the sugar metabolism in oral streptococci

Proc Finn Dent Soc.

1991

, vol.

(pg.

477

487

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Ahn

Browngardt

Burne

Changes in biochemical and phenotypic properties of Streptococcus mutans during growth with aeration

Appl Environ Microbiol.

2009

, vol.

(pg.

2517

2527

)

Ahn

Burne

Effects of oxygen on biofilm formation and the AtlA autolysin of Streptococcus mutans

J Bacteriol.

2007

, vol.

189

(pg.

6293

6302

)

Ajdic

McShan

McLaughlin

, et al. ,

(19 co-authors)

Genome sequence of Streptococcus mutans UA159, a cariogenic dental pathogen

Proc Natl Acad Sci U S A.

2002

, vol.

(pg.

14434

14439

)

Alaluusua

Matto

Gronroos

Innila

Torkko

Asikainen

Jousimies-Somer

Saarela

Oral colonization by more than one clonal type of mutans Streptococcus in children with nursing-bottle dental caries

Arch Oral Biol.

1996

, vol.

(pg.

167

173

)

Arthur

Cury

Graner

Rosalen

Vale

Paes Leme

Cury

Tabchoury

Genotypic and phenotypic analysis of S. mutans isolated from dental biofilms formed in vivo under high cariogenic conditions

Braz Dent J.

2011

, vol.

(pg.

267

274

)

Asanuma

Hino

Regulation of fermentation in a ruminal bacterium, Streptococcus bovis, with special reference to rumen acidosis

Animal Sci J.

2002

, vol.

(pg.

313

325

)

Google Scholar

Crossref

WorldCat

Balakrishnan

Simmonds

Kilian

Tagg

Different bacteriocin activities of Streptococcus mutans reflect distinct phylogenetic lineages

J Med Microbiol.

2002

, vol.

(pg.

941

948

)

Belli

Marquis

Adaptation of Streptococcus mutans and Enterococcus hirae to acid stress in continuous culture

Appl Environ Microbiol.

1991

, vol.

(pg.

1134

1138

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Bender

Marquis

Membrane ATPases and acid tolerance of Actinomyces viscosus and Lactobacillus casei

Appl Environ Microbiol.

1987

, vol.

(pg.

2124

2128

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Bender

Sutton

Marquis

Acid tolerance, proton permeabilities, and membrane ATPases of oral streptococci

Infect Immun.

1986

, vol.

(pg.

331

338

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Borodovsky

Mcininch

Genmark—parallel gene recognition for both DNA strands

Comput Chem.

1993

, vol.

(pg.

123

133

)

Google Scholar

Crossref

WorldCat

Boyko

Williamson

Indap

, et al. ,

(14 co-authors)

Assessing the evolutionary impact of amino acid mutations in the human genome

PLoS Genet.

2008

, vol.

pg.

e1000083

Braverman

Hudson

Kaplan

Langley

Stephan

The hitchhiking effect on the site frequency spectrum of DNA polymorphisms

Genetics

1995

, vol.

140

(pg.

783

796

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Buckee

Jolley

Recker

Penman

Kriz

Gupta

Maiden

Role of selection in the emergence of lineages and the evolution of virulence in Neisseria meningitidis

Proc Natl Acad Sci U S A.

2008

, vol.

105

(pg.

15082

15087

)

Burne

Oral Streptococci: products of their environment

J Dental Res.

1998

, vol.

(pg.

445

452

)

Google Scholar

Crossref

WorldCat

Bustamante

Wakeley

Sawyer

Hartl

Directional selection and the site-frequency spectrum

Genetics

2001

, vol.

159

(pg.

1779

1788

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Caicedo

Williamson

Hernandez

, et al. ,

(12 co-authors)

Genome-wide patterns of nucleotide polymorphism in domesticated rice

PLoS Genet.

2007

, vol.

(pg.

1745

1756

)

Cheon

Moser

Whiddon

Osgood

Momeni

Ruby

Cutter

Allison

Childers

Genetic diversity of plaque mutans streptococci with rep-PCR

J Dent Res.

2011

, vol.

(pg.

331

335

)

Cohen

Armelagos

Paleopathology at the origins of agriculture

1984

Orlando (FL)

Academic Press

Google Scholar

Coventry

Bull-Otterson

Liu

, et al. ,

(29 co-authors)

Deep resequencing reveals excess rare recent variants consistent with explosive population growth

Nat Commun.

2010

, vol.

pg.

131

Croucher

Harris

Fraser

, et al. ,

(23 co-authors)

Rapid pneumococcal evolution in response to clinical interventions

Science

2011

, vol.

331

(pg.

430

434

)

Delcher

Harmon

Kasif

White

Salzberg

Improved microbial gene identification with GLIMMER

Nucleic Acids Res.

1999

, vol.

(pg.

4636

4641

)

Diamond

Evolution, consequences and future of plant and animal domestication

Nature

2002

, vol.

418

(pg.

700

707

)

Didelot

Falush

Inference of bacterial microevolution using multilocus sequence data

Genetics

2007

, vol.

175

(pg.

1251

1266

)

Gilbert

Clark

, et al. ,

(10 co-authors)

Generation of diversity in Streptococcus mutans genes demonstrated by MLST

PLoS One

2010

, vol.

pg.

e9073

Dobson

Carper

Infectious diseases and human population history

Bioscience

1996

, vol.

(pg.

115

126

)

Google Scholar

Crossref

WorldCat

Donati

Hiller

Tettelin

, et al. ,

(18 co-authors)

Structure and dynamics of the pan-genome of Streptococcus pneumoniae and closely related species

Genome Biol.

2010

, vol.

pg.

R107

Drake

A constant rate of spontaneous mutation in DNA-based microbes

Proc Natl Acad Sci U S A.

1991

, vol.

(pg.

7160

7164

)

Dykhuizen

Evolution. Mountaineering with microbes

Nature

1990a

, vol.

346

(pg.

)

Google Scholar

Crossref

WorldCat

Dykhuizen

Experimental studies of natural selection in bacteria

Annu Rev Ecol System.

1990b

, vol.

(pg.

373

398

)

Google Scholar

Crossref

WorldCat

Dykhuizen

Chemostats used for studying natural selection and adaptive evolution

Methods Enzymol.

1993

, vol.

224

(pg.

613

631

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Eilertson

Booth

Bustamante

SnIPRE: Selection inference using a Poisson random effects model

PLoS Comput Biol.

2012

, vol.

pg.

e1002806

Emanuelsson

Bratthall

Genotyping shows different strains of mutans Streptococci between father and child and within parental pairs in Swedish families

Oral Microbiol Immunol.

1998

, vol.

(pg.

271

277

)

Emanuelsson

Thornqvist

Genotypes of mutans Streptococci tend to persist in their host for several years

Caries Res.

2000

, vol.

(pg.

133

139

)

Fiennes

Zoonoses and the origins and ecology of human disease

1978

London

Academic Press

Google Scholar

Formicola

Neolithic transition and dental changes—the case of an Italian site

J Hum Evol.

1987

, vol.

(pg.

231

239

)

Google Scholar

Crossref

WorldCat

Gerrish

Colato

Perelson

Sniegowski

Complete genetic linkage can subvert natural selection

Proc Natl Acad Sci U S A.

2007

, vol.

104

(pg.

6266

6271

)

Gibbons

Bacteriology of dental caries

J Dent Res.

1964

, vol.

43(Suppl)

(pg.

1021

1028

)

Google Scholar

Crossref

WorldCat

Gong

Tian

Sutherland

Sisson

Mai

Ling

Global transcriptional analysis of acid-inducible genes in Streptococcus mutans: multiple two-component systems involved in acid adaptation

Microbiology

2009

, vol.

155

(pg.

3322

3332

)

Griffiths

Tavare

Sampling theory for neutral alleles in a varying environment

Philos Trans R Soc Lond B Biol Sci.

1994

, vol.

344

(pg.

403

410

)

Gutenkunst

Hernandez

Williamson

Bustamante

Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data

PLoS Genet.

2009

, vol.

pg.

e1000695

Guttman

Dykhuizen

Detecting selective sweeps in naturally occurring Escherichia coli

Genetics

1994

, vol.

138

(pg.

993

1003

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Hamada

Slade

Biology, immunology, and cariogenicity of Streptococcus mutans

Microbiol Rev.

1980

, vol.

(pg.

331

384

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Yarbrough

Sim

Niu

Merritt

Shi

Acid diffusion through extracellular polysaccharides produced by various mutants of Streptococcus mutans

Mol Microbiol.

2008

, vol.

(pg.

112

126

)

Hoen

Chirouze

Cabell

, et al. ,

(13 co-authors)

Emergence of endocarditis due to group D streptococci: findings derived from the merged database of the International Collaboration on Endocarditis

Eur J Clin Microbiol Infect Dis.

2005

, vol.

(pg.

)

Hudson

Generating samples under a Wright-Fisher neutral model of genetic variation

Bioinformatics

2002

, vol.

(pg.

337

338

)

Hughes

Evidence for abundant slightly deleterious polymorphisms in bacterial populations

Genetics

2005

, vol.

169

(pg.

533

538

)

Hung

Tsai

Hsueh

Chia

Teng

Species identification of mutans streptococci by groESL gene sequence

J Med Microbiol.

2005

, vol.

(pg.

857

862

)

Jacobson

Lodge

Poy

Carbohydrate uptake in the oral pathogen Streptococcus mutans: mechanisms and regulation by protein phosphorylation

Biochimie

1989

, vol.

(pg.

997

1004

)

Kim

Ahn

Seaton

Garrett

Burne

Transcriptional organization and physiological contributions of the relQ operon of Streptococcus mutans

J Bacteriol.

2012

, vol.

194

(pg.

1968

1978

)

Klees

Linder

Buckel

2-Hydroxyglutaryl-CoA dehydratase from Fusobacterium nucleatum (subsp. nucleatum): an iron-sulfur flavoprotein

Arch Microbiol.

1992

, vol.

158

(pg.

294

301

)

Klein

Florio

Pereira

Hofling

Goncalves

Longitudinal study of transmission, diversity, and stability of Streptococcus mutans and Streptococcus sobrinus genotypes in Brazilian nursery children

J Clin Microbiol.

2004

, vol.

(pg.

4620

4626

)

Kreth

Zhang

Herzberg

Streptococcal antagonism in oral biofilms: Streptococcus sanguinis and Streptococcus gordonii interference with Streptococcus mutans

J Bacteriol.

2008

, vol.

190

(pg.

4632

4640

)

Lan

Reeves

Intraspecies variation in bacterial genomes: the need for a species genome concept

Trends Microbiol.

2000

, vol.

(pg.

396

401

)

Lan

Reeves

When does a clone deserve a name? A perspective on bacterial species based on population genetics

Trends Microbiol.

2001

, vol.

(pg.

419

424

)

Lapierre

Gogarten

Estimating the size of the bacterial pan-genome

Trends Genet.

2009

, vol.

(pg.

107

110

)

Lefebure

Bitar

Suzuki

Stanhope

Evolutionary dynamics of complete Campylobacter pan-genomes and the bacterial species concept

Genome Biol Evol.

2010

, vol.

(pg.

646

655

)

Leopold

Magrini

Holt

, et al. ,

(15 co-authors)

A precise reconstruction of the emergence and constrained radiations of Escherichia coli O157 portrayed by backbone concatenomic analysis

Proc Natl Acad Sci U S A.

2009

, vol.

106

(pg.

8713

8718

)

Ruan

Durbin

Mapping short DNA sequencing reads and calling variants using mapping quality scores

Genome Res.

2008

, vol.

(pg.

1851

1858

)

Stoeckert

Roos

OrthoMCL: identification of ortholog groups for eukaryotic genomes

Genome Res.

2003

, vol.

(pg.

2178

2189

)

Loesche

Role of Streptococcus mutans in human dental decay

Microbiol Rev.

1986

, vol.

(pg.

353

380

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Lukacs

Dental paleopathology and agricultural intensification in south Asia: new evidence from Bronze Age Harappa

Am J Phys Anthropol.

1992

, vol.

(pg.

133

150

)

Lukashin

Borodovsky

GeneMark.hmm: new solutions for gene finding

Nucleic Acids Res.

1998

, vol.

(pg.

1107

1115

)

Mansai

Innan

The power of the methods for detecting interlocus gene conversion

Genetics

2010

, vol.

184

(pg.

517

527

)

Marquis

Diminished acid tolerance of plaque bacteria caused by fluoride

J Dent Res.

1990

, vol.

(Spec no. 672–675); discussion 682–683

Google Scholar

OpenURL Placeholder Text

WorldCat

Marquis

Oxygen metabolism, oxidative stress and acid-base physiology of dental plaque biofilms

J Ind Microbiol.

1995

, vol.

(pg.

198

207

)

Martin

Quentin

Fichant

Claverys

Independent evolution of competence regulatory cascades in streptococci?

Trends Microbiol.

2006

, vol.

(pg.

339

345

)

Maruyama

Kobata

Kurokawa

, et al. ,

(13 co-authors)

Comparative genomic analyses of Streptococcus mutans provide insights into chromosomal shuffling and species-specific content

BMC Genomics

2009

, vol.

pg.

358

McDonald

Kreitman

Adaptive protein evolution at the Adh locus in Drosophila

Nature

1991

, vol.

351

(pg.

652

654

)

Merritt

Zheng

Shi

Genetic characterization of the hdrRM operon: a novel high-cell-density-responsive regulator in Streptococcus mutans

Microbiology

2007

, vol.

153

(pg.

2765

2773

)

Muzzi

Donati

Population genetics and evolution of the pan-genome of Streptococcus pneumoniae

Int J Med Microbiol.

2011

, vol.

301

(pg.

619

622

)

Novembre

Stephens

Interpreting principal component analyses of spatial population genetic variation

Nat Genet.

2008

, vol.

(pg.

646

649

)

Ochman

Neutral mutations and neutral substitutions in bacterial genomes

Mol Biol Evol.

2003

, vol.

(pg.

2091

2096

)

Okinaga

Niu

Xie

Merritt

The hdrRM operon of Streptococcus mutans encodes a novel regulatory system for coordinated competence development and bacteriocin production

J Bacteriol.

2010

, vol.

192

(pg.

1844

1852

)

Patterson

Price

Reich

Population structure and eigenanalysis

PLoS Genet.

2006

, vol.

pg.

e190

Phattarataratip

Olson

Broffitt

Qian

Brogden

Drake

Levy

Banas

Streptococcus mutans strains recovered from caries-active or caries-free individuals differ in sensitivity to host antimicrobial peptides

Mol Oral Microbiol.

2011

, vol.

(pg.

187

199

)

Sawyer

Statistical tests for detecting gene conversion

Mol Biol Evol.

1989

, vol.

(pg.

526

538

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Scott

Thompson-Mayberry

Lahmamsi

King

McShan

Phage-associated mutator phenotype in group A Streptococcus

J Bacteriol.

2008

, vol.

190

(pg.

6290

6301

)

Seaton

Ahn

Sagstetter

Burne

A transcriptional regulator and ABC transporters link stress tolerance, (p)ppGpp, and genetic competence in Streptococcus mutans

J Bacteriol.

2011

, vol.

193

(pg.

862

874

)

Shaver

Dombrowski

Sweeney

Treis

Zappala

Sniegowski

Fitness evolution and the rise of mutator alleles in experimental Escherichia coli populations

Genetics

2002

, vol.

162

(pg.

557

566

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Slatkin

Hudson

Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations

Genetics

1991

, vol.

129

(pg.

555

562

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Smith

Dowson

Spratt

Localized sex in bacteria

Nature

1991

, vol.

349

(pg.

)

Sniegowski

Gerrish

Lenski

Evolution of high mutation rates in experimental populations of E

coli. Nature

1997

, vol.

387

(pg.

703

705

)

Google Scholar

Crossref

WorldCat

Suzuki

Lefebure

Hubisz

Pavinski Bitar

Lang

Siepel

Stanhope

Comparative genomic analysis of the Streptococcus dysgalactiae species group: gene content, molecular adaptation, and promoter evolution

Genome Biol Evol.

2011

, vol.

(pg.

168

185

)

Tapp

Thollesson

Herrmann

Phylogenetic relationships and genotyping of the genus Streptococcus by sequence determination of the RNase P RNA gene, rnpB

Int J Syst Evol Microbiol.

2003

, vol.

(pg.

1861

1871

)

Tettelin

Masignani

Cieslewicz

, et al. ,

(46 co-authors)

Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome.”

Proc Natl Acad Sci U S A.

2005

, vol.

102

(pg.

13950

13955

)

Tettelin

Riley

Cattuto

Medini

Comparative genomics: the bacterial pan-genome

Curr Opin Microbiol.

2008

, vol.

(pg.

472

477

)

Thornton

Libsequence: a C++ class library for evolutionary genetic analysis

Bioinformatics

2003

, vol.

(pg.

2325

2327

)

van Houte

Role of micro-organisms in caries etiology

J Dent Res.

1994

, vol.

(pg.

672

681

)

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Zerbino

Birney

Velvet: algorithms for de novo short read assembly using de Bruijn graphs

Genome Res.

2008

, vol.

(pg.

821

829

)

Zhang

Foxman

Drake

Srinivasan

Henderson

Olson

Marrs

Warren

Marazita

Comparative whole-genome analysis of Streptococcus mutans isolates within and among individuals of different caries status

Oral Microbiol Immunol.

2009

, vol.

(pg.

197

203

)

Zhu

Bustamante

A composite-likelihood approach for detecting directional selection from DNA sequence data

Genetics

2005

, vol.

170

(pg.

1411

1421

)

Author notes

^†Present address: Université de Lyon; CNRS, UMR5023 Ecologie des Hydrosystèmes Naturels et Anthropisés; Université Lyon 1, Villeurbanne, F-69622, France

^‡Present address: Department of Plant Pathology & Plant-Microbe Biology, Cornell University, Ithaca, NY

Associate editor: Ryan Hernandez

Download all slides

Month:	Total Views:
January 2017	17
February 2017	38
March 2017	39
April 2017	27
May 2017	52
June 2017	39
July 2017	27
August 2017	19
September 2017	53
October 2017	26
November 2017	63
December 2017	255
January 2018	182
February 2018	220
March 2018	245
April 2018	170
May 2018	164
June 2018	159
July 2018	154
August 2018	155
September 2018	129
October 2018	145
November 2018	176
December 2018	151
January 2019	162
February 2019	172
March 2019	228
April 2019	183
May 2019	167
June 2019	140
July 2019	157
August 2019	141
September 2019	93
October 2019	177
November 2019	127
December 2019	138
January 2020	147
February 2020	91
March 2020	52
April 2020	95
May 2020	48
June 2020	42
July 2020	57
August 2020	54
September 2020	89
October 2020	82
November 2020	111
December 2020	46
January 2021	43
February 2021	109
March 2021	120
April 2021	80
May 2021	78
June 2021	84
July 2021	64
August 2021	67
September 2021	63
October 2021	74
November 2021	63
December 2021	74
January 2022	55
February 2022	51
March 2022	80
April 2022	69
May 2022	72
June 2022	57
July 2022	70
August 2022	55
September 2022	93
October 2022	127
November 2022	96
December 2022	143
January 2023	62
February 2023	69
March 2023	77
April 2023	107
May 2023	99
June 2023	60
July 2023	35
August 2023	60
September 2023	70
October 2023	74
November 2023	96
December 2023	85
January 2024	63
February 2024	111
March 2024	64
April 2024	98
May 2024	210
June 2024	277
July 2024	44

Article Contents

Abstract

Introduction

Results

SNPs and the Core Genome

Demographic History

Adaptation to the Postagriculture Habitat

Discussion

Demographic History

Adaptation in the Postagricultural Era

Conclusion

Materials and Methods

DNA Sequencing and Ortholog Recovery

SNP Calling

Estimation of Recombination

Demographic and Selection Analyses

Acknowledgments

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Evolutionary and Population Genomics of the Cavity Causing Bacteria Streptococcus mutans

Abstract

Introduction

Results

SNPs and the Core Genome

Demographic History

Adaptation to the Postagriculture Habitat

Discussion

Demographic History

Adaptation in the Postagricultural Era

Conclusion

Materials and Methods

DNA Sequencing and Ortholog Recovery

SNP Calling

Estimation of Recombination

Demographic and Selection Analyses

Acknowledgments

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only