Students studying evolutionary biology at Harvey Mudd College are learning about phylogenetic tree reconciliation, a computational method used to reconstruct the evolutionary histories of related pairs of organisms, such as hosts and parasites, or species and gene families. Work in this area earned Jordan Haack ’19 a Best Student Paper award at an international conference.
Haack, who enjoys using algorithms to solve computational biology problems, received the award at the 2018 Asia-Pacific Bioinformatics Conference in Yokohama, Japan, for the paper “Computing the Diameter of the Space of Maximum Parsimony Reconciliations in the Duplication-Transfer-Loss Model” on which he was the first author. Along with Harvey Mudd computer science professors Yi-Chieh (Jessica) Wu and Ran Libeskind-Hadas and fellow research students Eli Zupke (Cal Poly Pomona) and Andrew Ramirez (Caltech), Haack demonstrated a fast, polynomial-time algorithm that helps to measure differences between two evolutionary trees and compute them efficiently.
“The overall goal of the problem is to take as input two trees and a set of associations between their leaves, and output a mapping between the trees that minimizes the cost of implied evolutionary events,” Haack explains. “Our research focused on the duplication-transfer-loss (DTL) model, which is widely used, and covers four biological events: speciation, duplication, transfer and loss. Previous publications have computed sets of cost, minimizing reconciliations in polynomial time. Notably, there can potentially be an exponential number of maximum parsimony reconciliations (MPRs) with respect to the sizes of the gene and species trees.”
The researchers sought to compute the “diameter” of this space of maximum parsimony reconciliations. “This diameter metric is useful because it helps a biologist characterize the space of MPRs,” says Haack. “For example, it may be the case that there are a large number of MPRs, but they only differ on a few events. Or, there may be a small number of MPRs, but they are mostly different.”
“This paper provides some new insights into a well-known problem that arises in computational biology,” says Libeskind-Hadas, R. Michael Shanahan Professor of Computer Science. “The problem arises, for example, when trying to understand the evolutionary histories of a gene family and the species in which the genes are found. While there exist good algorithms for finding mappings between two such evolutionary ‘trees,’ it’s been known for some time that the number of ‘good’ mappings can be extremely large, much too large to examine them all. So, an open question has been: When there are many equally good mappings between two evolutionary trees, how different are all of those mappings? This work is likely to provide biologists with new insights into large datasets that they previously didn’t have.”
Libeskind-Hadas says that Haack made key contributions to this work, both in developing the algorithms and implementing and testing them on real datasets.
Libeskind-Hadas and Wu are continuing to work with students on a number of problems related to algorithms and software tools for phylogenetic tree reconciliation. Their work is supported by internal funds and grants from the National Science Foundation.