Comparative Study
Genome Res. 2003 Sep;13(9):2178-89.
doi: 10.1101/gr.1224503.

OrthoMCL: identification of ortholog groups for eukaryotic genomes

Li Li et al. Genome Res. 2003 Sep.

Abstract

The identification of orthologous groups is useful for genome annotation, studies on gene/protein evolution, comparative genomics, and the identification of taxonomically restricted sequences. Methods successfully exploited for prokaryotic genome analysis have proved difficult to apply to eukaryotes, however, as larger genomes may contain multiple paralogous genes, and sequence information is often incomplete. OrthoMCL provides a scalable method for constructing orthologous groups across multiple eukaryotic taxa, using a Markov Cluster algorithm to group (putative) orthologs and paralogs. This method performs similarly to the INPARANOID algorithm when applied to two genomes, but can be extended to cluster orthologs from multiple species. OrthoMCL clusters are coherent with groups identified by EGO, but improved recognition of "recent" paralogs permits overlapping EGO groups representing the same gene to be merged. Comparison with previously assigned EC annotations suggests a high degree of reliability, implying utility for automated eukaryotic genome annotation. OrthoMCL has been applied to the proteome data set from seven publicly available genomes (human, fly, worm, yeast, Arabidopsis, the malaria parasite Plasmodium falciparum, and Escherichia coli). A Web interface allows queries based on individual genes or user-defined phylogenetic patterns (http://www.cbil.upenn.edu/gene-family). Analysis of clusters incorporating P. falciparum genes identifies numerous enzymes that were incompletely annotated in first-pass annotation of the parasite genome.
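The Markov Cluster step mentioned in the abstract can be illustrated with a toy sketch. This is not the OrthoMCL code or the MCL program itself; it is a minimal pure-Python version of the general expansion/inflation idea, and the function names, the inflation value, the iteration count, and the self-loop convention are all assumptions chosen for illustration.

```python
def normalize_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square-matrix product, adequate for toy-sized graphs."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, iterations=50):
    """Cluster an undirected weighted graph given as {(a, b): weight}.

    Assumes positive weights. Returns a list of sets of node names.
    """
    nodes = sorted({node for pair in weights for node in pair})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), w in weights.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = float(w)
    # Assumption: give each node a self-loop equal to its strongest edge.
    for i in range(n):
        m[i][i] = max(m[i])
    m = normalize_columns(m)
    for _ in range(iterations):
        m = matmul(m, m)                                  # expansion
        m = normalize_columns(
            [[x ** inflation for x in row] for row in m]  # inflation
        )
    # Rows that keep non-negligible mass act as attractors; the columns
    # they attract form one cluster.
    clusters = []
    for row in m:
        members = {nodes[j] for j, x in enumerate(row) if x > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

Calling `mcl` on a small graph with two tightly connected triples joined by one weak edge is a convenient way to see how inflation suppresses weak cross-group links. Higher inflation values generally yield smaller, tighter clusters.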

Figures

Figure 1
Flow chart of the OrthoMCL algorithm for clustering orthologous proteins.
Figure 2
Illustration of sequence relationships and similarity matrix construction. Dotted arrows represent “recent” paralogy (duplication subsequent to speciation); solid arrows represent orthology. The upper right half of the matrix contains initial weights calculated as average –log10(P-value) from pairwise WU-BLASTP similarities. The lower left half contains corrected weights supplied to the MCL algorithm; the edge weight w_ij connecting each pair of sequences is divided by W_ij/W, where W represents the average weight among all ortholog (underlined) and “recent” paralog (italicized) pairs, and W_ij represents the average edge weight among all ortholog pairs from species i and j. The net result of this normalization is to correct for systematic differences in comparisons between two species (e.g., differences attributable to nucleotide composition bias), and when i = j, to minimize the impact of “recent” paralogs (duplication within a given species) on the clustering of cross-species orthologs.
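The normalization described in this caption can be sketched as follows. This is an illustrative reading of the caption, not the authors' code: the function name and input layout are hypothetical, every edge passed in is assumed to be an ortholog or "recent" paralog pair, and for within-species pairs (i = j) the per-pair average is assumed to be taken over that species' "recent" paralog edges.

```python
from collections import defaultdict
from statistics import mean


def normalize_weights(edges, species_of):
    """Divide each edge weight by (species-pair average / overall average).

    edges:      {(protein_a, protein_b): weight}, e.g. average -log10(P-value)
    species_of: {protein: species name}
    """
    overall = mean(edges.values())  # W in the caption

    # Collect weights per unordered species pair (i, j), including i == j.
    by_species_pair = defaultdict(list)
    for (a, b), w in edges.items():
        key = tuple(sorted((species_of[a], species_of[b])))
        by_species_pair[key].append(w)
    pair_average = {k: mean(v) for k, v in by_species_pair.items()}

    corrected = {}
    for (a, b), w in edges.items():
        key = tuple(sorted((species_of[a], species_of[b])))
        corrected[(a, b)] = w / (pair_average[key] / overall)
    return corrected
```

After this correction, the average weight within every species pair equals the overall average, so no single pair of genomes dominates the clustering merely because its comparisons score systematically higher.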
Figure 3
Example of a group from the EGO subset that is extended by OrthoMCL. Five synaptobrevin genes were clustered together by OrthoMCL (GroupID #379767), including yeast SNC1 and SNC2, fly Syb and n-syb, and worm snb-1. Thick solid arrows represent orthology identified by reciprocal best matches, dotted arrows represent “recent” paralogs, and thin solid arrows represent one-way best matches indicating the direction from query to subject (based on BLASTP comparisons). Only snb-1, n-syb, and Syb (dark gray) were identified by the EGO subset (groups TOG257010, TOG272289, TOG273790), and these genes were only grouped because their gene index sequences (TC72314, TC140251, TC134828) formed “triangles” of reciprocal best matches based on BLASTN comparisons with other species not shown in this analysis.
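The distinction between reciprocal and one-way best matches in this caption can be made concrete with a small sketch. The function names, the input layout, and the tie-breaking rule (first hit seen wins) are assumptions for illustration; the identifiers in the usage note are placeholders, not sequences from the paper.

```python
def best_hits(scores, species_of):
    """Return {(query, subject_species): best-scoring subject}.

    scores: {(query, subject): similarity score}, higher is better.
    Same-species hits are ignored here.
    """
    best = {}
    for (query, subject), score in scores.items():
        if species_of[query] == species_of[subject]:
            continue
        key = (query, species_of[subject])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return {key: hit for key, (hit, _) in best.items()}


def reciprocal_best_matches(scores, species_of):
    """Return the set of unordered pairs that are each other's best hit."""
    best = best_hits(scores, species_of)
    pairs = set()
    for (query, _), subject in best.items():
        if best.get((subject, species_of[query])) == query:
            pairs.add(frozenset((query, subject)))
    return pairs
```

With placeholder proteins `y1`, `y2` (one species) and `f1`, `f2` (another), a pair such as `y1`/`f1` is reported only if each is the other's top cross-species hit. A hit such as `f2` to `y1` that is not returned in the reverse direction stays a one-way best match, like the thin arrows in the figure.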
Figure 4
Screenshots of the Web interface. A keyword search (top left) identifies 11 ortholog groups containing sequences with the word “tubulin” in sequence name or description (top right). Clicking the group ID pulls up a page describing sequences in the group (bottom left), a graphical display of relationships among these sequences (bottom right), and a CLUSTALW multiple sequence alignment (bottom center).

WEB SITE REFERENCES

    1. http://www.cbil.upenn.edu/gene-family; Putative ortholog groups generated by OrthoMCL, University of Pennsylvania.
    2. http://www.ncbi.nlm.nih.gov/COG/; The Clusters of Orthologous Groups (COG) database, NCBI.
    3. http://www.allgenes.org; The human and mouse gene index, University of Pennsylvania.
    4. http://www.tigr.org/tdb/tgi/; TIGR Gene Indices.
    5. http://www.tigr.org/tdb/tgi/ego/index.shtml; Eukaryotic Gene Orthologs (EGO), TIGR.
