Influenza-specific amino acid substitution model
Amino acid substitution models are a crucial component of protein sequence comparison systems, including protein sequence similarity searching, protein sequence alignment, and protein phylogenetic analysis. Although several general amino acid substitution models have been estimated from large protein databases, they may not be appropriate for analyzing specific species. In this paper, we apply the maximum likelihood approach to all influenza protein sequences to estimate an amino acid substitution model for influenza viruses, called I09. Comparing I09 with fourteen other widely used models, we obtain three main results: (1) phylogenetic trees based on I09 have better likelihoods than trees based on the other models; specifically, I09 yields the best likelihood in 436 out of 489 cases tested; (2) tree topologies constructed with I09 and with the other models frequently differ, indicating that I09 affects not only the likelihood but also the tree topology; (3) there are marked differences between I09 and the other models, revealing that existing models are not able to capture the amino acid substitution process of influenza viruses. © 2009 IEEE.
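As an illustration of how a per-case comparison such as "best likelihood in 436 out of 489 cases" can be tallied, the following minimal Python sketch counts, for each model, the number of test alignments on which it attains the highest log-likelihood. The function name `best_model_counts`, the alignment identifiers, the names `general_model_A` and `general_model_B`, and all numeric values are hypothetical placeholders, not data or code from this work.

```python
from collections import Counter


def best_model_counts(log_likelihoods):
    """Count how often each model gives the highest log-likelihood.

    log_likelihoods maps an alignment id to a dict of
    {model name: log-likelihood of the best tree found under that model}.
    Ties go to the model listed first for that alignment.
    """
    wins = Counter()
    for per_model in log_likelihoods.values():
        # Log-likelihoods are negative; the largest (closest to 0) is best.
        best = max(per_model, key=per_model.get)
        wins[best] += 1
    return wins


# Hypothetical toy values, for illustration only.
toy = {
    "aln_1": {"I09": -1200.5, "general_model_A": -1215.2, "general_model_B": -1221.9},
    "aln_2": {"I09": -845.1, "general_model_A": -843.7, "general_model_B": -850.0},
    "aln_3": {"I09": -2301.0, "general_model_A": -2340.8, "general_model_B": -2338.4},
}

wins = best_model_counts(toy)
for model, count in wins.most_common():
    print(f"{model}: best on {count} of {len(toy)} alignments")
```

A raw likelihood comparison of this kind assumes that every candidate model is a fixed empirical matrix evaluated on the same alignment, so no correction for differing numbers of free parameters is applied in the sketch.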