Nat Genet. 2008 May;40(5):646-9.
doi: 10.1038/ng.139. Epub 2008 Apr 20.

Interpreting principal component analyses of spatial population genetic variation

John Novembre et al. Nat Genet. 2008 May.

Abstract

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
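
The claim that sinusoidal patterns arise generally when PCA is applied to spatial data can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration, not the authors' analysis: the habitat is a line of 50 equally spaced sites, the expected similarity between two sites is taken to be `decay ** distance` with `decay = 0.9`, and the helper names `spatial_pcs` and `sign_changes` are hypothetical. No migration event or range expansion is modelled, yet the leading PCs come out as a gradient followed by progressively more oscillatory waves.

```python
import numpy as np


def spatial_pcs(n_sites=50, decay=0.9, n_pcs=4):
    """Leading PCs of a distance-decay similarity matrix on a 1-D line of sites."""
    positions = np.arange(n_sites)
    distance = np.abs(positions[:, None] - positions[None, :])
    # Assumed model: expected similarity between two sites is decay**distance.
    similarity = decay ** distance
    # Centring each marker across sites corresponds to double-centring
    # the site-by-site matrix.
    centering = np.eye(n_sites) - np.full((n_sites, n_sites), 1.0 / n_sites)
    centered = centering @ similarity @ centering
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(centered)
    order = np.argsort(eigenvalues)[::-1][:n_pcs]
    return eigenvectors[:, order]


def sign_changes(vector):
    """Count how often consecutive entries of a vector switch sign."""
    signs = np.sign(vector)
    return int(np.sum(signs[:-1] != signs[1:]))


pcs = spatial_pcs()
for k in range(pcs.shape[1]):
    print(f"PC{k + 1}: {sign_changes(pcs[:, k])} sign changes along the line")
```

Plotting each column of `pcs` against site position gives one-dimensional PC-maps of the kind described for Figure 2B. The sign-change count is only a crude measure of how wave-like each PC is; it does not test whether a real dataset reflects isolation by distance or a specific migration history.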

Figures

Figure 1
Comparison of the PC-maps of [3] with theoretical and empirical predictions. The first column shows the theoretical expected PC-maps for a class of models in which genetic similarity decays with geographic distance (see text for details). The second column shows PC-maps for population genetic data simulated with no range expansions but with a constant, homogeneous migration rate in a two-dimensional habitat. The columns marked Asia, Europe, and Africa are redrawn from the originals of [3]. Each map is labeled with the PC it represents. The order of maps in each of the last three columns was chosen to correspond with the shapes in the first two columns.
Figure 2
Results of PCA applied to data from a one-dimensional habitat. (A) Schematic of the one-dimensional habitat, with circles marking sampling locations and shades of blue marking order along the line. (B) One-dimensional PC-maps (i.e. plots of each PC element against the geographic position of the corresponding sample location). (C) Biplots of PC1 vs. PC2, PC2 vs. PC3, and PC3 vs. PC4. Colors correspond to those in Panel A. In many datasets without spatially referenced samples, the colors and the lines connecting neighboring points would not be observed; here they are shown to aid interpretation.

References

    1. Menozzi P, Piazza A, Cavalli-Sforza L. Science. 1978;201:786–792.
    2. Cavalli-Sforza L, Menozzi P, Piazza A. Science. 1993;259(5095):639–646.
    3. Cavalli-Sforza LL, Menozzi P, Piazza A. The History and Geography of Human Genes. Princeton University Press; 1994.
    4. Jobling M, Hurles M, Tyler-Smith C. Human evolutionary genetics. Garland Science; 2004.
    5. Rendine S, Piazza A, Cavalli-Sforza LL. The American Naturalist. 1986;128:681–706.
