Dr. Bernard Moret
EPFL (Swiss Federal Institute of Technology),
Lausanne, Switzerland
Phylogenetic Analyses: Past, Present, and Future
Bernard M. Moret, School of Computer and Communication Sciences, EPFL
(Swiss Federal Institute of Technology), Lausanne, Switzerland
http://lcbb.epfl.ch/
Phylogenies are simplified histories of the evolution of a group of taxa
(organisms, genes, biological networks, computer malware, artistic styles,
etc.). These phylogenies are inferred from modern-day specimens, in a process
that starts by collecting comparable data about the taxa (such as the
sequences of a few genes), then devising an appropriate model of evolution
for the data, and finally running an inference procedure (machine-learning)
to obtain a tree and some parameter values about that tree. Each year,
thousands of citations are made to existing phylogenetic inference packages,
mostly in the life sciences, but also in computer science, linguistics, forensics,
and art history. As Th. Dobzhansky put it in the title of one of his papers,
"nothing in biology makes sense except in the light of evolution," and
phylogenetic analyses are our spotlights. Yet in this talk I will argue that
phylogenetic analyses are underused and in need of generalization. For the
last 80 years, phylogenies have used sequence data as the basis for
inference; at first these sequences coded for morphological characteristics
or simple genomic characteristics such as chromosomal banding; for the last
40 years, they have been RNA or DNA sequences. Phylogenetic analyses of
languages, artistic styles, criminal activities, biological networks, or
entire genomes have had to use tools developed to analyze relatively short
sequences with very simple evolutionary models: the complexity of
evolutionary models for other data, along with the relative paucity of
studies based on such data, has prevented the development of analysis techniques
better adapted to the data. We thus need to enlarge and generalize existing
techniques to improve the quality of phylogenetic analyses of data other than
genome sequence data and to enable phylogenetic analyses for entirely new
types of data. In particular, we need new models,
sophisticated preprocessing, and reasonable optimization criteria. Most
importantly, we need an enlightened view of phylogenetic analyses in science.
We are all familiar with comparative methods, but a comparison between two
taxa or a collection of pairwise comparisons among a collection of taxa is
just a degenerate phylogenetic analysis, one
that makes no (or minimal) use of evolution and models. In any area where the
objects of study are subject to some form of evolution, phylogenetic analyses
will yield much better results than simple comparative studies.