Proc Natl Acad Sci U S A. 2005 Sep 27;102(39):13950-5.
doi: 10.1073/pnas.0506758102. Epub 2005 Sep 19.

Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome"

Hervé Tettelin et al. Proc Natl Acad Sci U S A. 2005.

Erratum in

  • Proc Natl Acad Sci U S A. 2005 Nov 8;102(45):16530

Abstract

The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and also limits genome-wide screens for vaccine candidates or for antimicrobial targets. We have generated the genomic sequence of six strains representing the five major disease-causing serotypes of Streptococcus agalactiae, the main cause of neonatal infection in humans. Analysis of these genomes and those available in databases showed that the S. agalactiae species can be described by a pan-genome consisting of a core genome shared by all isolates, accounting for approximately 80% of any single genome, plus a dispensable genome consisting of partially shared and strain-specific genes. Mathematical extrapolation of the data suggests that the gene reservoir available for inclusion in the S. agalactiae pan-genome is vast and that unique genes will continue to be identified even after sequencing hundreds of genomes.
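The split into core, partially shared, and strain-specific genes described above can be illustrated with simple set operations. This is a minimal sketch, not the authors' pipeline: the strain names and gene identifiers are hypothetical, and it assumes that gene presence in each strain has already been determined.

```python
from collections import Counter


def partition_pan_genome(gene_sets):
    """Split genes into core, partially shared, and strain-specific sets.

    gene_sets maps a strain name to the set of gene IDs found in that strain.
    """
    n_strains = len(gene_sets)
    # Count in how many strains each gene occurs (sets guarantee one count per strain).
    counts = Counter(gene for genes in gene_sets.values() for gene in genes)
    core = {g for g, c in counts.items() if c == n_strains}
    specific = {g for g, c in counts.items() if c == 1}
    partial = set(counts) - core - specific
    return core, partial, specific


# Hypothetical toy data, for illustration only.
toy = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g3", "g5"},
    "strainC": {"g1", "g2", "g6"},
}
core, partial, specific = partition_pan_genome(toy)
pan_genome = core | partial | specific  # core plus dispensable genes
print(sorted(core), sorted(partial), sorted(specific), len(pan_genome))
```

In this toy case the dispensable genome is the union of `partial` and `specific`, and the pan-genome is everything seen in at least one strain.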

Figures

Fig. 1.
Whole genome alignment of GBS strains. The eight genomes are compared to each other by using COG (41) and nucmer analyses (see Materials and Methods). Each genome (shaded strain name) is colored with a gradient that ranges from yellow (nucleotide 1) to blue (end). Differences in color between a reference sequence (the last colored line in each genome) and the other genomes indicate conserved protein-coding regions that have been rearranged. Uncolored segments denote coding regions in which no conserved genes were detected. nucmer matches for contigs that do not contain protein-coding genes are displayed by red blocks (matches within the reference strain are displayed on the line directly above it). Genomic islands of diversity are boxed and numbered “x.y,” where x is the panel or strain number where the island first appeared and y is the island location in that genome from left to right. A + indicates an island that was not identified in a previous genome. Islands that overlap by at least 50% (based on the number of shared genes) with previously identified islands receive the same number as the initial island. The gene content of the 69 islands identified is listed in Table 2, which is published as supporting information on the PNAS web site. Strain-specific regions, free of COG or nucmer matches, are displayed in black at the bottom of each panel. Portions of these regions that harbor protein-coding genes are indicated in gray below the black blocks. The curves on top of each panel represent the nucleotide composition (χ² analysis) (see Materials and Methods) of the reference strain of the panel, and peaks indicate regions of atypical composition.
Fig. 2.
GBS core genome. The number of shared genes is plotted as a function of the number n of strains sequentially added (see Materials and Methods). For each n, circles are the 8!/[(n – 1)!·(8 – n)!] values obtained for the different strain combinations. Squares are the averages of such values. The continuous curve represents the least-squares fit of the function F_c(n) = κ_c exp[–n/τ_c] + Ω (see Eq. 1 in Supporting Text) to the data. The best fit was obtained with correlation r² = 0.990 for κ_c = 610 ± 38, τ_c = 2.16 ± 0.28, and Ω = 1,806 ± 16. The extrapolated GBS core genome size Ω is shown as a dashed line.
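To make the Fig. 2 caption concrete, the sketch below evaluates the fitted exponential-decay curve at the reported best-fit values (κ_c = 610, τ_c = 2.16, Ω = 1,806) and counts the strain combinations per n using the expression from the caption. It only evaluates the published fit; it does not refit the data, and the function names are illustrative.

```python
import math

# Best-fit parameters reported in the Fig. 2 caption (uncertainties omitted).
KAPPA_C = 610.0
TAU_C = 2.16
OMEGA = 1806.0
N_STRAINS = 8


def core_genome_fit(n):
    """Fitted number of shared genes after n strains: kappa_c * exp(-n / tau_c) + Omega."""
    return KAPPA_C * math.exp(-n / TAU_C) + OMEGA


def n_combinations(n, total=N_STRAINS):
    """Number of plotted values for a given n: total! / ((n - 1)! * (total - n)!)."""
    return math.factorial(total) // (math.factorial(n - 1) * math.factorial(total - n))


for n in range(1, N_STRAINS + 1):
    print(n, n_combinations(n), round(core_genome_fit(n)))
```

As n grows the exponential term vanishes, so the curve flattens toward Ω, which is why Ω is read as the extrapolated core genome size.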
Fig. 3.
GBS pan-genome. The number of specific genes is plotted as a function of the number n of strains sequentially added (see Materials and Methods). For each n, circles are the 8!/[(n – 1)!·(8 – n)!] values obtained for the different strain combinations; squares are the averages of such values. The blue curve is the least-squares fit of the function F_s(n) = κ_s exp[–n/τ_s] + tg(θ) (see Eq. 2 in Supporting Text) to the data. The best fit was obtained with correlation r² = 0.995 for κ_s = 476 ± 62, τ_s = 1.51 ± 0.15, and tg(θ) = 33 ± 3.5. The extrapolated average number tg(θ) of strain-specific genes is shown as a dashed line. (Inset) Size of the GBS pan-genome as a function of n. The red curve is the calculated pan-genome size (see Eq. 4 in Supporting Text), with values of the parameters obtained from the fit of F_s(n) (see Eq. 2 in Supporting Text).
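The fitted curve in the Fig. 3 caption can be evaluated the same way. The sketch below uses the reported values (κ_s = 476, τ_s = 1.51, tg(θ) = 33). The cumulative sum is an assumption made here for illustration: it adds up the fitted number of new genes for strains 2 through n and is not the Eq. 4 of the Supporting Text, which is not reproduced on this page.

```python
import math

# Best-fit parameters reported in the Fig. 3 caption (uncertainties omitted).
KAPPA_S = 476.0
TAU_S = 1.51
TG_THETA = 33.0


def specific_genes_fit(n):
    """Fitted number of new genes contributed by the n-th strain."""
    return KAPPA_S * math.exp(-n / TAU_S) + TG_THETA


def cumulative_new_genes(n):
    """Illustrative running total of new genes from strains 2..n (assumption, not Eq. 4)."""
    return sum(specific_genes_fit(j) for j in range(2, n + 1))


for n in (2, 4, 8, 50, 100):
    print(n, round(specific_genes_fit(n), 1), round(cumulative_new_genes(n)))
```

Because the fitted curve levels off at tg(θ) rather than at zero, the running total keeps growing by about 33 genes per added genome, which is the behavior the abstract describes as unique genes continuing to appear.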
Fig. 4.
Dendrogram of the eight GBS genomes. Shared gene information was used to cluster proteins into groups by using the single-linkage method of the program cluster (http://rana.lbl.gov). Groups were then converted into profiles of presence or absence of each gene (0 or 1) in the eight GBS strains and used as input to paup* 4.0b10 (Sinauer, Sunderland, MA) for dendrogram drawing and bootstrapping. Numbers at the nodes indicate bootstrap values. Serotypes and MLST types of each strain are within parentheses.
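The conversion of protein groups into presence/absence profiles mentioned in the Fig. 4 caption can be sketched as follows. The group and strain names are hypothetical, and the pairwise count of differing profile positions is only an illustrative summary chosen here; it is not a description of what cluster or paup* computes.

```python
def presence_absence_profiles(groups, strains):
    """Return {group_id: tuple of 0/1 flags}, ordered like `strains`."""
    return {
        gid: tuple(1 if s in members else 0 for s in strains)
        for gid, members in groups.items()
    }


def strain_distances(profiles, strains):
    """Count, for each strain pair, the groups present in one strain but not the other."""
    dist = {}
    for i, a in enumerate(strains):
        for j in range(i + 1, len(strains)):
            b = strains[j]
            dist[(a, b)] = sum(1 for p in profiles.values() if p[i] != p[j])
    return dist


# Hypothetical toy input: each group lists the strains that carry a member gene.
strains = ["strainA", "strainB", "strainC"]
groups = {
    "grp1": {"strainA", "strainB", "strainC"},
    "grp2": {"strainA", "strainB"},
    "grp3": {"strainC"},
}
profiles = presence_absence_profiles(groups, strains)
distances = strain_distances(profiles, strains)
print(profiles)
print(distances)
```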

References

    1. Wayne, L., Brenner, D., Colwell, R., Grimont, P., Kandler, O., Krichevsky, L., Moore, L., Moore, W., Murray, R., Stackebrandt, E., et al. (1987) Int. J. Syst. Bacteriol. 37, 463–464.
    2. Schuchat, A. & Wenger, J. D. (1994) Epidemiol. Rev. 16, 374–402.
    3. Tyrrell, G. J., Senzilet, L. D., Spika, J. S., Kertesz, D. A., Alagaratnam, M., Lovgren, M. & Talbot, J. A. (2000) J. Infect. Dis. 182, 168–173.
    4. Harrison, L. H., Elliott, J. A., Dwyer, D. M., Libonati, J. P., Ferrieri, P., Billmann, L. & Schuchat, A. (1998) J. Infect. Dis. 177, 998–1002.
    5. Lin, F. Y., Clemens, J. D., Azimi, P. H., Regan, J. A., Weisman, L. E., Philips, J. B., III, Rhoads, G. G., Clark, P., Brenner, R. A. & Ferrieri, P. (1998) J. Infect. Dis. 177, 790–792.