A Review of Protein Function Prediction Under Machine Learning Perspective
Juliana S. Bernardes and Carlos E. Pedreira
Pages 122-141 (20)
Protein function prediction is one of the most challenging problems of the post-genomic era. With advances in
high-throughput techniques, the number of newly identified proteins has grown exponentially. However, the functional
characterization of these proteins has not kept pace. To fill this gap, a large number of computational methods have
been proposed in the literature. Early approaches exploited homology relationships to transfer known functions to newly
discovered proteins. Nevertheless, these approaches tend to fail when a new protein is considerably different (divergent)
from previously characterized ones. More accurate approaches, which use expressive data representations and sophisticated
computational techniques, are therefore required. To this end, this review provides a comprehensive description of the
machine learning approaches currently applied to protein function prediction. We start by defining several problems
involved in understanding aspects of protein function, and describe how machine learning can be applied to them. We aim
to present, within a systematic framework, the role of these techniques in protein function inference, which can be
difficult to follow because of the rapid evolution of the field. With this purpose in mind, we highlight the most
representative contributions and recent advances, and provide an insightful categorization of machine learning methods
in functional proteomics.
Keywords: protein, gene ontology, machine learning, classification, pattern recognition, high-throughput techniques.
COPPE-UFRJ, Av. Horácio Macedo, 2030, Prédio do CT, Bloco H, 3º andar, CEP 21941-914, Rio de Janeiro, Brazil.