Title | Alignment of protein sequences by their profiles |
Publication Type | Journal Article |
Year of Publication | 2004 |
Authors | Marti-Renom, MA, Madhusudhan, MS, Sali, A |
Journal | Protein Sci |
Volume | 13 |
Pagination | 1071-87 |
Keywords | *Algorithms; Amino Acid Sequence; Computational Biology; Databases, Protein; Markov Chains; Molecular Sequence Data; *Protein Folding; Protein Structure, Tertiary; Proteins/*chemistry; *Sequence Alignment; Sequence Homology; *Software |
Abstract | The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences. |
Notes | Marti-Renom, Marc A; Madhusudhan, M S; Sali, Andrej. P50 GM62529/GM/NIGMS NIH HHS/United States; R01 GM54762/GM/NIGMS NIH HHS/United States. Research Support, Non-U.S. Gov’t; Research Support, U.S. Gov’t, P.H.S. United States. Protein science : a publication of the Protein Society. Protein Sci. 2004 Apr;13(4):1071-87. |
URL | http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15044736 |
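
The abstract reports accuracy as the fraction of correctly aligned residues relative to a structure-based reference alignment. The sketch below illustrates one common way to compute such a score for a pairwise alignment. It is not taken from the paper or from MODELLER: the function names `aligned_pairs` and `fraction_correct`, the gap character, and the choice of denominator (number of aligned pairs in the reference) are all assumptions made for illustration, and the paper's exact definition may differ.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue-index pairs matched by a pairwise alignment.

    row_a and row_b are the two gapped rows of the alignment (hypothetical
    input format); i and j are 0-based positions in the ungapped sequences.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def fraction_correct(test, reference, gap="-"):
    """Fraction of reference-aligned pairs that the test alignment reproduces.

    Assumption: the denominator is the number of aligned pairs in the
    reference (structure-based) alignment.
    """
    ref_pairs = aligned_pairs(*reference, gap=gap)
    if not ref_pairs:
        raise ValueError("reference alignment has no aligned pairs")
    test_pairs = aligned_pairs(*test, gap=gap)
    return len(ref_pairs & test_pairs) / len(ref_pairs)


# Toy example with made-up sequences (not data from the paper).
reference = ("ACDEFG", "AC-EFG")
test = ("ACDEFG", "ACE-FG")
score = fraction_correct(test, reference)
print(f"{score:.2f}")
```

In this toy case the test alignment shifts one gap by a single column, so it reproduces four of the five residue pairs in the reference.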