Comparative Study
Genome Biol Evol. 2011:3:168-85.
doi: 10.1093/gbe/evr006. Epub 2011 Jan 31.

Comparative genomic analysis of the Streptococcus dysgalactiae species group: gene content, molecular adaptation, and promoter evolution

Haruo Suzuki et al. Genome Biol Evol. 2011.

Abstract

Comparative genomics of closely related bacterial species with different pathogenesis and host preference can provide a means of identifying the specifics of adaptive differences. Streptococcus dysgalactiae (SD) comprises two subspecies: S. dysgalactiae subsp. equisimilis is both a human commensal organism and a human pathogen, and S. dysgalactiae subsp. dysgalactiae is strictly an animal pathogen. Here, we present complete genome sequences for both taxa, with analyses involving other species of Streptococcus but focusing on adaptation in the SD species group. We found little evidence for enrichment in biochemical categories of genes carried by each SD strain; however, differences in the virulence gene repertoire were apparent. Some of the differences could be ascribed to prophage and integrative conjugative elements. We identified approximately 9% of the nonrecombinant core genome to be under positive selection, some of which involved known virulence factors in other bacteria. Analyses of proteomes by pooling data across genes, by biochemical category, clade, or branch, provided evidence for increased rates of evolution in several gene categories, as well as external branches of the tree. Promoters were primarily evolving under purifying selection but with certain categories of genes evolving faster. Many of these fast-evolving categories were the same as those associated with rapid evolution in proteins. Overall, these results suggest that adaptation to changing environments and new hosts in the SD species group has involved the acquisition of key virulence genes along with selection of orthologous protein-coding loci and operon promoters.

Figures

FIG. 1.
A dendrogram constructed by hierarchical clustering (UPGMA) based on dissimilarities in gene content (binary data for presence or absence of protein families) among the 45 Streptococcus strains. The dissimilarities were measured using Jaccard distance (one minus the Jaccard coefficient), ranging from 0 to 1, represented by the horizontal bar at the base of the figure. Species that comprise the primary focus of this paper appear in color. Species abbreviations are as follows: Ssan, Streptococcus sanguinis; Sgor, Streptococcus gordonii; Smut, Streptococcus mutans; Sthe, Streptococcus thermophilus; Spne, Streptococcus pneumoniae; Ssui, Streptococcus suis; Saga, Streptococcus agalactiae; Sube, Streptococcus uberis; Spyo, Streptococcus pyogenes; Sequ_MGCS10565, Streptococcus equi subsp. zooepidemicus; Sequ_H70, Streptococcus equi subsp. zooepidemicus; Sequ_4047, Streptococcus equi subsp. equi.
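The clustering described in the Fig. 1 caption can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes each strain is represented as a set of protein-family identifiers, and the strain names and family IDs in the usage note are hypothetical. Distances are Jaccard distances (one minus the Jaccard coefficient), and clusters are merged by size-weighted average linkage, which is what UPGMA does.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """One minus the Jaccard coefficient of two sets of protein-family IDs."""
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch: two empty profiles are identical
    return 1.0 - len(a & b) / len(union)


def upgma(profiles):
    """Cluster strains by gene content using average linkage (UPGMA).

    profiles: dict mapping strain name -> set of protein-family IDs.
    Returns a nested tuple (left, right, merge_distance); leaves are strain names.
    """
    names = sorted(profiles)
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    # Pairwise distances, keyed by (smaller_id, larger_id).
    dist = {
        (i, j): jaccard_distance(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        # Pick the closest pair; ties are broken by cluster id for determinism.
        (i, j), d = min(dist.items(), key=lambda kv: (kv[1], kv[0]))
        new_dist = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Size-weighted mean keeps every original strain equally weighted.
            new_dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: v for pair, v in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        trees[next_id] = (trees.pop(i), trees.pop(j), d)
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))
```

For example, with the hypothetical profiles `{"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 6, 7, 8}}`, strains A and B share three of five families (distance 0.4) and merge first; C then joins at the average of its distances to A and B.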
FIG. 2.
Heatmap showing % identity of Blast best hit of the 45 Streptococcus proteomes, against the 88 Streptococcus virulence genes from VFDB.
FIG. 3.
Heatmap showing % identity of Blast best hit of the 45 Streptococcus proteomes, against the 47 kb pathogenicity island in Streptococcus pyogenes SF370; arrows refer to gene orientation.
FIG. 4.
Pairwise comparisons of prophages from SDD (φSdd_1 and φSdd_2) and SDE2 (φSde2_1 and φSde2_2) and Streptococcus pyogenes MGAS315 (φ315.3 and φ315.5). The colored bars separating sequences (red and green) represent similarity matches identified by Blast analysis. Red lines link matches in the same orientation; green lines link matches in the reverse orientation.
FIG. 5.
Heatmap showing % identity of Blast best hit of the 45 Streptococcus proteomes, against the ICEs from SDE NS3396 (ICESde3396); arrows refer to gene orientation.
FIG. 6.
(A) Streptococcus Genome Browser (http://strep-genome.bscb.cornell.edu/cgi-bin/hgGateway). The reference genome for the browser is SDE ATCC 12394 (denoted SDE1 in this article). The two other dysgalactiae strains, SDE GGS 124 (SDE2) and SDD, are shown via alignments with SDE1, as are two Streptococcus pyogenes strains (MGAS315 and MGAS10750) and an S. equi subsp. equi strain (4047). Selected tracks from the browser are shown, including a measure of G + C skew in 100-bp windows [computed as (C − G)/(C + G)], the gene annotations for SDE1, predicted operons from PathoLogic, genes predicted to be under positive selection (PS) using PAML, conservation scores produced by the phyloP (Pollard et al. 2010) and phastCons (Siepel et al. 2005) programs, predicted conserved elements from phastCons, and genome-wide multiple alignments produced with the multiz program (Blanchette et al. 2004). Notice the pronounced correlation between the direction of replication and both the G + C skew and the direction in which genes are transcribed (the origin of replication is at position 0; genes in red are transcribed on the positive strand and genes in blue on the negative strand), as has been observed with other Streptococcus genomes (Ferretti et al. 2001). Other tracks not shown here include predicted transcription factor binding sites and RNA genes, and alignment chains and nets revealing regions of conserved synteny (Kent et al. 2003). (B) Illustration of lineage-specific evolutionary patterns evident from aligned SD genomes. Shown is a browser display of a cluster of CRISPR-DR22 noncoding RNAs, which were annotated using the Rfam database and INFERNAL software (Gardner et al. 2009). Notice the relatively high levels of conservation inside the CRISPR elements (green peaks in phastCons track), contrasting with high levels of divergence in the spacer regions between them (red downward spikes in phyloP track), which is typical for CRISPR elements (Marraffini and Sontheimer 2010).
This array of noncoding RNAs appears to be present in both sequenced SDE genomes but does not align with the other genomes because of extensive rearrangements or gains and losses of elements or because high levels of sequence divergence prohibit an alignment from being obtained.
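The G + C skew track in panel A uses the formula (C − G)/(C + G) over 100-bp windows. The sketch below shows one way to compute such a track. It assumes non-overlapping windows, drops a trailing partial window, and reports 0.0 for windows that contain neither C nor G; the caption specifies none of these choices. It follows the caption's sign convention, whereas skew is often defined the other way round, as (G − C)/(G + C).

```python
def gc_skew_windows(seq, window=100):
    """Return (C - G) / (C + G) for consecutive non-overlapping windows.

    seq: DNA sequence as a string (case-insensitive).
    window: window length in bp; 100 matches the figure caption.
    A trailing partial window is ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        c = chunk.count("C")
        g = chunk.count("G")
        # Windows with no C or G have undefined skew; report 0.0 here.
        skews.append((c - g) / (c + g) if (c + g) else 0.0)
    return skews


# Toy sequence: a C-rich window, a G-only window, and an AT-only window.
toy = "C" * 75 + "G" * 25 + "G" * 100 + "AT" * 50
print(gc_skew_windows(toy))  # [0.5, -1.0, 0.0]
```

Plotting these values against window position should reveal the sign change around the replication origin and terminus that the caption describes.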
FIG. 7.
Rates of protein evolution. Estimates of ω (dN/dS) across all branches of the six-species phylogeny for all 673 genes (highlighted in purple) and for genes assigned to each of several Gene Ontology (GO) categories. Shown are the estimates for all genes (middle), the ten fastest-evolving categories (top), and the ten slowest-evolving categories (bottom). The numbers at right indicate ratios with respect to the estimate for all genes. For all categories shown, the differences from the average are highly statistically significant (P ≈ 0), according to a likelihood ratio test (LRT; see Materials and Methods).
FIG. 8.
Rates of clade-specific protein evolution. Estimates of ω (dN/dS) for three clades of interest (foreground; see supplementary fig. S2-B, Supplementary Material online) versus estimates for the remaining branches of the tree (background). Shown are estimates for all genes and for the ten GO categories showing the greatest increase in ω per clade. The categories are ranked by the ratio of foreground:background ω estimates (see labels at right). All these differences are highly statistically significant (P ≈ 0) by a LRT (see Materials and Methods).
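The foreground-versus-background comparison is assessed with an LRT, whose details are in the Materials and Methods rather than in this record. As a generic sketch only: it assumes the alternative model adds one free parameter (a separate foreground ω), so that twice the log-likelihood difference is compared with a chi-square distribution with 1 degree of freedom. The log-likelihood values below are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by ONE parameter.

    lnl_null: maximized log-likelihood of the simpler model (single rate).
    lnl_alt:  maximized log-likelihood of the richer model (foreground rate free).
    Returns (statistic, p_value). The 1-df assumption belongs to this sketch,
    not to the source.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p = lrt_pvalue(-1000.0, -998.0)   # hypothetical log-likelihoods
print(stat, round(p, 4))                # 4.0 0.0455
```

When categories pool many genes, the log-likelihood difference can be very large, and the resulting p-value underflows to 0.0 in double precision, that is, it becomes numerically indistinguishable from zero.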
FIG. 9.
Rates of promoter evolution. Estimated rates of evolution for promoter sequences as a fraction of the neutral rate (r). The neutral rate is estimated from 4-fold degenerate (4D) sites in coding regions, and the parameter r is estimated as a scaling factor by maximum likelihood (see Materials and Methods). Estimates are shown for all genes (middle), the ten fastest-evolving GO categories (top), and the ten slowest-evolving GO categories (bottom). Promoters were defined as upstream sequences of predicted transcriptional units and were assigned the GO categories of all constituent genes. The numbers at right indicate ratios with respect to the rate for all genes. Notice that, on average, promoter regions evolve at about 37% of the rate of 4D sites, suggesting that they have generally experienced fairly strong purifying selection. However, there is considerable variation across GO categories, with category-specific rates ranging from r = 0.23 (0.63 times the average) to r = 0.57 (1.53 times the average). The rates for the ten fastest and ten slowest categories are all significantly different from the rates at other promoters by a likelihood ratio test (P < 10^−11).
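The ratios quoted in this caption can be checked against the stated average with simple arithmetic. The snippet below uses only numbers from the caption and the helper name is illustrative. It back-calculates the all-gene rate implied by each extreme category; both values land on roughly 0.37, so the small mismatch with a direct division (0.23/0.37 ≈ 0.62 rather than 0.63) is consistent with rounding of the reported figures.

```python
def implied_average(category_rate, ratio_to_average):
    """All-gene rate implied by a category rate and its reported ratio to the average."""
    return category_rate / ratio_to_average


slow = implied_average(0.23, 0.63)  # slowest category in the caption
fast = implied_average(0.57, 1.53)  # fastest category in the caption
print(round(slow, 3), round(fast, 3))  # 0.365 0.373
```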
FIG. 10.
Rates of clade-specific promoter evolution. Estimated rates of promoter evolution for three clades of interest (foreground; see supplementary fig. S2-B, Supplementary Material online) versus estimates for the remaining branches of the tree (background). All estimates are obtained by maximum likelihood and are relative to a neutral rate estimated from 4D sites. The categories are ranked by the ratio of foreground:background rates (see labels at right). The ten categories with the highest ratios for each clade are shown. Notice that rates are somewhat elevated in SDD/SDE and SDD across all genes, but they are much more elevated for certain categories than for others. The foreground rate is significantly different from the background rate in all cases by a LRT, except for “FMN binding” in the SDE clade (P = 0.06).
