Computer cladistics / ¡Cladística a la lata!: Homology and parsimony

Today is a translation day ;). This post is a translation of my previous post in Spanish on the subject.

This post is somewhat inspired by a discussion by Ebach and Williams on their blog systematics & biogeography (they have many posts on the subject!). Here I want to show the relationship between parsimony (or cladistic analysis) and homology.

Homology

There is a long tradition of discussion about the definition of homology. I use a definition similar to the traditional one: two structures are homologs when both are considered the same structure in different organisms. There are countless arguments for accepting two structures as being the same: God's plan, natural order, morphotype, or the one I endorse, common ancestry.

If two structures are homologs, it implies that they are the same structure and that they were inherited from a common ancestor of the two examined organisms. Inheritance implies several things by definition, especially from genetics and developmental biology. Moreover, the structures need not be highly similar, even if they are the same, because they can be strongly modified. Nevertheless both are the same, which implies that structures change through time, so our assumptions include several processes that promote differentiation, such as genetic interaction and population dynamics.

Cladistic analysis does not have a direct interest in these phenomena associated with homology. The processes are of interest in other fields such as evolutionary biology, developmental biology, genetics, molecular biology, population genetics, ecology, etc. But lack of interest does not imply that they can be ignored! In the character definition for a cladistic analysis (i.e., the selection of homologous characters) several of those factors should be taken into account to propose character limits and character coding.

Unfortunately, the fact that process is not of direct interest for cladistics allows the erroneous idea that process is irrelevant in character definition. This leads the so-called pattern cladists (such as Nelson and Platnick [1] and, more recently, Pleijel [2], Brower [3], and Ebach and Williams) to think that cladistics is free of evolutionary thinking, but that is a totally wrong position: the use of homologous characters implies a framework based on origin by common ancestry, in which every character, and not only its apomorphic state, is the same structure [4, 5].

But there is more: the implications of evolution are large for several characters, especially when the character is well known, so the idea of Kluge [6], who claims that the only assumption of cladistic analysis is 'descent with modification', is also wrong. When you include inheritance and evolution, many things are included in the definition of a character. Of course, for some characters we only have morphological knowledge, so assuming a simple 'descent with modification' seems to be correct. But for other characters (as in many vertebrates) we have knowledge of the development and genetics of the structure; in such cases the assumptions are far more complex than descent with modification.

Parsimony: the algorithm

The basic principle of the parsimony algorithm is fairly simple. If you want to know the character state at node 'x', whose descendants are 'y' and 'z', and you know their character states, then the character state of 'x' is the intersection of the states of 'y' and 'z'; if the intersection is empty, then the state of 'x' is the union of the states of 'y' and 'z'. Using a union implies a character change (a step). Of course, optimization of states is more complex, but for the present discussion we only need the basic steps.

State assignment using the parsimony algorithm: (A) and (B) the state of ancestor 'x' is equal to the intersection of the states of descendants 'y' and 'z', in this case white; (C) 'y' and 'z' do not share any state, so the state assigned to 'x' is the union of the states of its descendants.
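The rule for a single node can be written down in a few lines. The following is only a minimal sketch: the function name `assign_ancestor` and the concrete state sets used for cases (A), (B) and (C) are assumptions made for illustration, since the original figure is not reproduced here.

```python
def assign_ancestor(y_states, z_states):
    """Return (states assigned to ancestor x, steps implied at x).

    y_states and z_states are sets of possible states of the two descendants.
    """
    shared = y_states & z_states
    if shared:
        # Non-empty intersection: x takes the shared states, no change needed.
        return shared, 0
    # Empty intersection: x takes the union, which implies one step.
    return y_states | z_states, 1


# Hypothetical cases in the spirit of the figure (states chosen for illustration).
cases = [
    ("A", {"white"}, {"white"}),
    ("B", {"white"}, {"white", "black"}),
    ("C", {"white"}, {"black"}),
]
for label, y, z in cases:
    x, steps = assign_ancestor(y, z)
    print(label, sorted(x), steps)
```

Note that nothing in the function knows what 'white' or 'black' mean; it only compares labels.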

As you can see, the algorithm is 'independent' of the data used: you can use any kind of character (in computer science this problem is known as the coloring problem, and the 'characters' are the colors of a map) and any kind of terminal. This is a common feature of every method formalized as an algorithm. It is therefore necessary to provide a proper justification for using the algorithm on a particular problem.

Parsimony: cladistic analysis

With the evolutionary concept of homology, its union with algorithmic parsimony is direct. If a character, no matter its state, is the same in two organisms that share a common ancestor, it implies that the character was inherited from the common ancestor, so the common ancestor would have the character.

Moreover, if both organisms share the same form of the character, that is, the same state, then that state would be present in their common ancestor (the character is the same!). But if the two organisms have different forms of the character, we do not know which form was present in the common ancestor, so we assume that it could be either of the two states; in that case, if both organisms really share the same character, then a transformation must have happened.

This is exactly the description of the parsimony algorithm. In cladistics the use of the parsimony algorithm is justified because the characters used are homologs. In this context the algorithm maximizes our homology propositions when it minimizes transformations: this makes most terminals with the same state contiguous. Thus the ad hoc hypotheses of homoplasy are minimized [7].
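To make the link between contiguity and the number of transformations concrete, here is a small sketch that applies the intersection/union rule from the terminals down to the root and counts the steps. The tree encoding (nested pairs of terminal names), the function name `count_steps`, and the four terminals with their states are all assumptions made for this example.

```python
def count_steps(tree, states):
    """Return (state set at the root of `tree`, number of steps).

    `tree` is either a terminal name (str) or a pair (left, right) of subtrees.
    `states` maps each terminal name to its observed state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    l_set, l_steps = count_steps(left, states)
    r_set, r_steps = count_steps(right, states)
    shared = l_set & r_set
    if shared:
        return shared, l_steps + r_steps
    # Empty intersection: take the union and count one transformation.
    return l_set | r_set, l_steps + r_steps + 1


# Hypothetical data: one character with two states in four terminals.
states = {"A": "white", "B": "white", "C": "black", "D": "black"}

contiguous = (("A", "B"), ("C", "D"))  # terminals with the same state together
scattered = (("A", "C"), ("B", "D"))   # terminals with the same state apart

print(count_steps(contiguous, states)[1])  # 1 step
print(count_steps(scattered, states)[1])   # 2 steps
```

The tree that keeps terminals with the same state together needs one transformation; the other needs two, so the second transformation is the kind of extra, ad hoc hypothesis that parsimony tries to avoid.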

Parsimony and homology tests

It is common among cladists to say that parsimony is a test of homology: congruence. I disagree, because, as I argue here, the basis of parsimony is assuming from the very beginning that characters are homologs! Any homology test must come before a rigorous cladistic analysis.

Homology tests can take several forms: morphological arguments (usually put under the 'similarity' label), anatomical position, structural organization, ontogeny, genetics. In most cases the 'test' is a conjunction of these procedures (for these reasons defining a character implies a strong theoretical background). After examining all of those alternatives you have a good character. It is for these reasons that homoplasy is an ad hoc hypothesis: homoplasy is only justified in relation to the cladogram.

Of course, character revision is always welcome, and homoplastic characters may demand a closer examination, but it is equally valid to examine every character. It is possible that the coding of some characters is dubious; in such cases it is possible to use different codings as an exploratory device (similar to the proposals of Ramirez [8] for morphology and Wheeler [9] for molecules). But beware: such a coding is not supported by the cladogram, because the argument used to defend it is the same one used for homoplasy: it is justified only in relation to the cladogram, so the coding is an ad hoc coding. In ambiguous cases I prefer a weighting scheme as proposed by Neff [10]: because we know little about the character and have some doubts about its coding, it is better that it has a lower weight than characters that we know better.

Bonus: a historical speculation

Here I presented the ideas of homology and parsimony separately and then fused them. I did it that way to clarify the argument. But historically their development was intertwined from the beginning. If you read Wagner [11] on the algorithmic side and Hennig [12] on the logical side, it is clear that both positions are very close. Both visions were fused in excellent fashion by Farris and his collaborators [7, 13, 14], whose ideas (especially those in [14]) agree in many points with those presented here. So from the beginning cladistic analysis and the parsimony algorithm have walked together.

Wagner, Hennig and Farris developed their ideas in a morphological context. At the same time Dayhoff [15] experimented with several algorithms for molecular sequences. At least today, the homology ideas of molecular biologists are different from the morphological concept. I do not know what homology ideas molecular biologists of the 1960s used, but it seems that she did not believe that two equal bases (in Dayhoff's case, two amino acids) in two organisms imply common origin; the idea of point mutations precludes that. It is worth noting that Dayhoff never found a way to assign states to ancestors.

References
[1] Nelson, G., Platnick, N. 1981. Systematics and biogeography. Columbia Univ., New York.
[2] Pleijel, F. 1995. On character coding for phylogeny reconstruction. Cladistics 11: 309-315.
[3] Brower, A.V.Z. 2000. Evolution is not a necessary assumption of cladistics. Cladistics 16: 143-154.
[4] Fitzhugh, K. 2006. The philosophical basis of character coding for inference of phylogenetic hypothesis. Zoologica Scripta 35: 261-286.
[5] Grant, T., Kluge, A.G. 2004. Transformation series as an ideographic character concept. Cladistics 20: 23-31.
[6] Kluge, A.G. 2003. On the deduction of species relationships: a précis. Cladistics 19: 233-239.
[7] Farris, J.S. 1983. The logical basis of phylogenetic systematics. In: Platnick, N.I., Funk, V.A. (Eds.), Advances in cladistics, vol. 2. Columbia Univ., New York, Pp. 7-36.
[8] Ramirez, M.J. 2007. Homology as a parsimony problem: a dynamic homology approach for morphological data. Cladistics 23: 588-612.
[9] Wheeler, W. 1996. Optimization alignment: the end of multiple sequence alignment in phylogenetics? Cladistics 12: 1-9.
[10] Neff, N.A. 1986. A rational basis for a priori character weighting. Systematic Zoology 35: 110-123.
[11] Wagner, W.H. 1961. Problems in the classification of ferns. In: Recent advances in botany, vol. 1, Univ. Toronto, Toronto, Pp. 841-844.
[12] Hennig, W. 1966. Phylogenetic systematics. Univ. Illinois, Urbana.
[13] Kluge, A.G., Farris, J.S. 1969. Quantitative phyletics and the evolution of anurans. Systematic Zoology 18: 1-32.
[14] Farris, J.S., Kluge, A.G., Eckardt, M.J. 1970. A numerical approach to phylogenetic systematics. Systematic Zoology 19: 172-189.
[15] Dayhoff, M.O. 1969. Computer analysis of protein evolution. Scientific American 221: 87-95.

Saturday, December 22, 2007
