The sequence similarity relationships among the members of a protein family contain information on its
evolutionary history, such as the relative timing of horizontal transfer events and the differential acceleration or
deceleration of evolution in particular organisms in response to selective pressures. This paper presents a quantitative
representation and comparison of evolutionary profiles of proteins, and finds correlations between evolutionary profile
similarities and evolutionary or functional links between proteins. Using a dataset of 84 orthologous protein families
ubiquitous in prokaryotes, we obtain the evolutionary profile of each family as a vector of inter-sequence distances. We
then compare the family-specific evolutionary vectors and quantify the evolutionary similarity between families. We use
two primary methods for vector comparison: the angle between vectors and the correlation distance between
vectors. Both approaches are powerful enough to recognize known evolutionary similarities and yield similar inter-family
relationships, but they also display important differences. These differences arise because the two methods
recognize different aspects of the evolutionary profile. The inter-vector angle is an effective measure of the difference in
the overall form of phylogenetic trees even in cases where the topology of the tree is not well-defined, whereas the
correlation distance is especially effective in recognizing similarities in topology. When the protein families are clustered
based on either the angle or the correlation distance between them, the cluster dendrogram shows a core cluster consisting
of ancient protein families with the standard phylogeny. Evolutionary profile comparison also detects
plausible evolutionary similarities between unannotated proteins and proteins of known function. For instance, the
bacterial yjeF gene and the ygjD/ydiE gene are both predicted to be involved in cell envelope biogenesis. In summary, we
describe quantitative comparisons of protein-family-specific evolutionary profiles and illustrate their power in detecting
broader evolutionary trends and specific functional relationships between proteins.
Keywords: comparative bioinformatics, functional motifs, phylogenetic methods, prokaryotic evolution, sequence profiles.
Departments of Bioengineering, Cellular and Molecular Medicine, and Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA.
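
The two vector comparisons named in the abstract can be illustrated with a minimal sketch. It assumes each family's evolutionary profile is a plain list of inter-sequence distances taken over the same ordered set of organism pairs, and it takes "correlation distance" to mean one minus the Pearson correlation coefficient, which is a common definition but may not be the exact one used in the paper. The function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def angle_between(u, v):
    """Angle (radians) between two distance vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = dot / (norm_u * norm_v)
    # Clamp to [-1, 1] to guard against floating-point overshoot.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def correlation_distance(u, v):
    """Assumed definition: 1 - Pearson correlation of the two vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    du = [a - mean_u for a in u]
    dv = [b - mean_v for b in v]
    cov = sum(a * b for a, b in zip(du, dv))
    var_u = sum(a * a for a in du)
    var_v = sum(b * b for b in dv)
    return 1.0 - cov / math.sqrt(var_u * var_v)


# Hypothetical toy profiles (not data from the paper).
family_a = [0.2, 0.5, 0.9, 0.4]
family_b = [2.0 * d for d in family_a]   # same profile, uniformly scaled
family_c = [d + 1.0 for d in family_a]   # same profile, uniformly offset

print(angle_between(family_a, family_b), correlation_distance(family_a, family_b))
print(angle_between(family_a, family_c), correlation_distance(family_a, family_c))
```

Under these assumptions, both measures treat a uniformly rescaled profile as essentially identical to the original, while only the correlation distance also ignores a uniform offset. This is a purely mathematical property of the two measures as defined here, and it gives one concrete sense in which they can respond to different aspects of a profile.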