TY - JOUR T1 - Quantifying the relationship between sequence and three-dimensional structure conservation in RNA. JF - BMC bioinformatics Y1 - 2010 A1 - E. Capriotti A1 - M. A. Marti-Renom AB -

ABSTRACT: BACKGROUND: In recent years, the number of available RNA structures has grown rapidly, reflecting the increased interest in RNA biology. As in the studies carried out two decades ago for proteins, which gave the fundamental grounds for developing comparative protein structure prediction methods, we are now able to quantify the relationship between sequence and structure conservation in RNA. RESULTS: Here we introduce an all-against-all sequence- and three-dimensional (3D) structure-based comparison of a representative set of RNA structures, which has allowed us to quantitatively confirm that: (i) there is a measurable relationship between sequence and structure conservation that weakens for alignments with less than 60% sequence identity, (ii) evolution tends to conserve more RNA structure than sequence, and (iii) there is a twilight zone for RNA homology detection. DISCUSSION: The computational analysis presented here quantitatively describes the relationship between sequence and structure for RNA molecules and defines a twilight zone region for detecting RNA homology. Our work could represent the theoretical basis and limitations for future developments in comparative RNA 3D structure prediction.

VL - 11 ER - TY - JOUR T1 - A kernel for open source drug discovery in tropical diseases JF - PLoS Negl Trop Dis Y1 - 2009 A1 - Orti, L. A1 - Carbajo, R. J. A1 - Pieper, U. A1 - Eswar, N. A1 - Maurer, S. M. A1 - Rai, A. K. A1 - Taylor, G. A1 - Todd, M. H. A1 - Pineda-Lucena, A. A1 - Sali, A. A1 - M. A. Marti-Renom AB - BACKGROUND: Conventional patent-based drug development incentives work badly for the developing world, where commercial markets are usually small to non-existent. For this reason, the past decade has seen extensive experimentation with alternative R&D institutions ranging from private-public partnerships to development prizes. Despite extensive discussion, however, one of the most promising avenues, open source drug discovery, has remained elusive. We argue that the stumbling block has been the absence of a critical mass of preexisting work that volunteers can improve through a series of granular contributions. Historically, open source software collaborations have almost never succeeded without such "kernels". METHODOLOGY/PRINCIPAL FINDINGS: Here, we use a computational pipeline for: (i) comparative structure modeling of target proteins, (ii) predicting the localization of ligand binding sites on their surfaces, and (iii) assessing the similarity of the predicted ligands to known drugs. Our kernel currently contains 143 and 297 protein targets from ten pathogen genomes that are predicted to bind a known drug or a molecule similar to a known drug, respectively. The kernel provides a source of potential drug targets and drug candidates around which an online open source community can nucleate. Using NMR spectroscopy, we have experimentally tested our predictions for two of these targets, confirming one and invalidating the other. CONCLUSIONS/SIGNIFICANCE: The TDI kernel, which is being offered under the Creative Commons attribution share-alike license for free and unrestricted use, can be accessed on the World Wide Web at http://www.tropicaldisease.org.
We hope that the kernel will facilitate collaborative efforts towards the discovery of new drugs against parasites that cause tropical diseases. VL - 3 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19381286 N1 - Orti, Leticia Carbajo, Rodrigo J Pieper, Ursula Eswar, Narayanan Maurer, Stephen M Rai, Arti K Taylor, Ginger Todd, Matthew H Pineda-Lucena, Antonio Sali, Andrej Marti-Renom, Marc A United States PLoS neglected tropical diseases PLoS Negl Trop Dis. 2009;3(4):e418. Epub 2009 Apr 21. ER - TY - JOUR T1 - A kernel for the Tropical Disease Initiative JF - Nat Biotechnol Y1 - 2009 A1 - Orti, L. A1 - Carbajo, R. J. A1 - Pieper, U. A1 - Eswar, N. A1 - Maurer, S. M. A1 - Rai, A. K. A1 - Taylor, G. A1 - Todd, M. H. A1 - Pineda-Lucena, A. A1 - Sali, A. A1 - M. A. Marti-Renom VL - 27 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19352362 N1 -

Orti, Leticia Carbajo, Rodrigo J Pieper, Ursula Eswar, Narayanan Maurer, Stephen M Rai, Arti K Taylor, Ginger Todd, Matthew H Pineda-Lucena, Antonio Sali, Andrej Marti-Renom, Marc A P01 AI035707/AI/NIAID NIH HHS/United States P01 GM71790/GM/NIGMS NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States U54 GM074945/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t United States Nature biotechnology Nat Biotechnol. 2009 Apr;27(4):320-1.

ER - TY - JOUR T1 - MODBASE, a database of annotated comparative protein structure models and associated resources JF - Nucleic Acids Res Y1 - 2009 A1 - Pieper, U. A1 - Eswar, N. A1 - Webb, B. M. A1 - Eramian, D. A1 - Kelly, L. A1 - Barkan, D. T. A1 - Carter, H. A1 - Mankoo, P. A1 - Karchin, R. A1 - M. A. Marti-Renom A1 - Davis, F. P. A1 - Sali, A. KW - *Databases KW - Molecular Mutation Polymorphism KW - Protein Genomics Humans Ligands *Models KW - Protein User-Computer Interface KW - Single Nucleotide Protein Folding Protein Interaction Domains and Motifs *Protein Structure KW - Tertiary Proteins/genetics *Structural Homology AB - MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).
VL - 37 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18948282 N1 - Pieper, Ursula Eswar, Narayanan Webb, Ben M Eramian, David Kelly, Libusha Barkan, David T Carter, Hannah Mankoo, Parminder Karchin, Rachel Marti-Renom, Marc A Davis, Fred P Sali, Andrej GM08284/GM/NIGMS NIH HHS/United States P01 GM71790/GM/NIGMS NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States U01 GM61390/GM/NIGMS NIH HHS/United States U54 GM074929/GM/NIGMS NIH HHS/United States U54 GM074945/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, Non-P.H.S. England Nucleic acids research Nucleic Acids Res. 2009 Jan;37(Database issue):D347-54. Epub 2008 Oct 23. ER - TY - JOUR T1 - ModLink+: Improving fold recognition by using protein-protein interactions JF - Bioinformatics Y1 - 2009 A1 - Fornes, O. A1 - Aragues, R. A1 - Espadaler, J. A1 - M. A. Marti-Renom A1 - Sali, A. A1 - Oliva, B. KW - protein folding AB -

MOTIVATION: Several strategies have been developed to predict the fold of a target protein sequence, most of which are based on aligning the target sequence to other sequences of known structure. Previously, we demonstrated that the consideration of protein-protein interactions significantly increases the accuracy of fold assignment compared to PSI-BLAST sequence comparisons. A drawback of our method was the low number of proteins to which a fold could be assigned. Here, we present an improved version of the method that addresses this limitation. We also compare our method to other state-of-the-art fold assignment methodologies. RESULTS: Our approach (ModLink+) has been tested on 3,716 proteins with domain folds classified in the Structural Classification Of Proteins (SCOP) as well as known interacting partners in the Database of Interacting Proteins (DIP). For this test set, the ratio of success (PPV) on fold assignment increases from 75% for PSI-BLAST, 83% for HHSearch and 81% for PRC to more than 90% for ModLink+ at the e-value cutoff of 10^-3. At this e-value cutoff, ModLink+ can assign a fold to 30-45% of the proteins in the test set, while our previous method could cover less than 25%. When applied to 6,384 proteins with unknown fold in the yeast proteome, ModLink+ combined with PSI-BLAST assigns a fold for domains in 3,738 proteins, while PSI-BLAST alone only covers 2,122 proteins, HHSearch 2,969 and PRC 2,826 proteins, using a threshold e-value that would represent a PPV higher than 82% for each method in the test set. AVAILABILITY: The ModLink+ server is freely accessible on the World Wide Web at http://sbi.imim.es/modlink/. CONTACT: boliva@imim.es.

UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19357100 N1 -

Journal article Bioinformatics (Oxford, England) Bioinformatics. 2009 Apr 8.

ER - TY - JOUR T1 - SARA: a server for function annotation of RNA structures JF - Nucl. Acids Res. Y1 - 2009 A1 - Capriotti, Emidio A1 - M. A. Marti-Renom KW - RNA KW - RNA structure AB -

Recent interest in non-coding RNA transcripts has resulted in a rapid increase of deposited RNA structures in the Protein Data Bank. However, a characterization and functional classification of the RNA structure and function space have only been partially addressed. Here, we introduce the SARA program for pair-wise alignment of RNA structures as a web server for structure-based RNA function assignment. The SARA server relies on the SARA program, which aligns two RNA structures based on a unit-vector root-mean-square approach. The likely accuracy of the SARA alignments is assessed by three different P-values estimating the statistical significance of the sequence, secondary structure and tertiary structure identity scores, respectively. Our benchmarks, which relied on a set of 419 RNA structures with known SCOR structural class, indicate that at a negative logarithm of the mean P-value greater than or equal to 2.5, SARA can assign the correct or a similar SCOR class to 81.4% and 95.3% of the benchmark set, respectively. The SARA server is freely accessible via the World Wide Web at http://sgu.bioinfo.cipf.es/services/SARA/.

UR - http://nar.oxfordjournals.org/cgi/content/abstract/gkp433v1 ER - TY - JOUR T1 - Evolutionary potentials: structure specific knowledge-based potentials exploiting the evolutionary record of sequence homologs JF - Genome Biol Y1 - 2008 A1 - Panjkovich, A. A1 - Melo, F. A1 - M. A. Marti-Renom AB - ABSTRACT: We introduce a new type of knowledge-based potentials for protein structure prediction, called ’evolutionary potentials’, which are derived using a single experimental protein structure and all three-dimensional models of its homologous sequences. The new potentials have been benchmarked against other knowledge-based potentials, resulting in a significant increase in accuracy for model assessment. In contrast to standard knowledge-based potentials, we propose that evolutionary potentials capture key determinants of thermodynamic stability and specific sequence constraints required for fast folding. VL - 9 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18397517 N1 - Journal article Genome biology Genome Biol. 2008 Apr 8;9(4):R68. ER - TY - JOUR T1 - Prediction of enzyme function by combining sequence similarity and protein interactions JF - BMC Bioinformatics Y1 - 2008 A1 - Espadaler, J. A1 - Eswar, N. A1 - Querol, E. A1 - Aviles, F. X. A1 - Sali, A. A1 - M. A. Marti-Renom A1 - Oliva, B. KW - Amino Acid *Software Structure-Activity Relationship Substrate Specificity/genetics KW - Amino Acid Sequence/physiology Databases KW - Automated Predictive Value of Tests Protein Interaction Mapping Proteins/analysis/metabolism Sequence Alignment Sequence Analysis KW - Protein *Sequence Homology KW - Protein Enzymes/analysis/*metabolism Fuzzy Logic Pattern Recognition AB - BACKGROUND: A number of studies have used protein interaction data alone for protein function prediction. 
Here, we introduce a computational approach for annotation of enzymes, based on the observation that similar protein sequences are more likely to perform the same function if they share similar interacting partners. RESULTS: The method has been tested against the PSI-BLAST program using a set of 3,890 protein sequences for which interaction data were available. For protein sequences that align with at least 40% sequence identity to a known enzyme, the specificity of our method in predicting the first three EC digits increased from 80% to 90% at 80% coverage when compared to PSI-BLAST. CONCLUSION: Our method can also be used in proteins for which homologous sequences with known interacting partners can be detected. Thus, our method could increase by 10% the specificity of genome-wide enzyme predictions based on sequence matching by PSI-BLAST alone. VL - 9 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18505562 N1 - Espadaler, Jordi Eswar, Narayanan Querol, Enrique Aviles, Francesc X Sali, Andrej Marti-Renom, Marc A Oliva, Baldomero GM54762/GM/NIGMS NIH HHS/United States GM71790/GM/NIGMS NIH HHS/United States GM74929/GM/NIGMS NIH HHS/United States GM74945/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t England BMC bioinformatics BMC Bioinformatics. 2008 May 27;9:249. ER - TY - JOUR T1 - RNA structure alignment by a unit-vector approach JF - Bioinformatics Y1 - 2008 A1 - E. Capriotti A1 - M. A. Marti-Renom KW - Algorithms Base Sequence Computer Simulation *Models KW - Chemical *Models KW - Molecular Molecular Sequence Data Nucleic Acid Conformation RNA/*chemistry/*ultrastructure Sequence Alignment/*methods Sequence Analysis KW - RNA/*methods *Software AB - MOTIVATION: The recent discovery of tiny RNA molecules such as microRNAs and small interfering RNA is transforming the view of RNA as a simple information transfer molecule.
Similar to proteins, the native three-dimensional structure of RNA determines its biological activity. Therefore, classifying the current structural space is paramount for functionally annotating RNA molecules. The increasing number of RNA structures deposited in the PDB requires more accurate, automatic and benchmarked methods for RNA structure comparison. In this article, we introduce a new algorithm for RNA structure alignment based on a unit-vector approach. The algorithm has been implemented in the SARA program, which results in RNA structure pairwise alignments and their statistical significance. RESULTS: The SARA program has been implemented to be of general applicability even when no secondary structure can be calculated from the RNA structures. A benchmark against the ARTS program using a set of 1275 non-redundant pairwise structure alignments results in approximately 6% extra alignments with at least 50% structurally superposed nucleotides and base pairs. A first attempt to perform RNA automatic functional annotation based on structure alignments indicates that SARA can correctly assign the deepest SCOR classification to >60% of the query structures. AVAILABILITY: The SARA program is freely available through a World Wide Web server http://sgu.bioinfo.cipf.es/services/SARA/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. VL - 24 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18689811 N1 - Capriotti, Emidio Marti-Renom, Marc A Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2008 Aug 15;24(16):i112-8. ER - TY - JOUR T1 - Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans JF - Hum Mutat Y1 - 2008 A1 - E. Capriotti A1 - Arbiza, L. A1 - Casadio, R. A1 - Dopazo, J. A1 - H. Dopazo A1 - M. A. 
Marti-Renom KW - Algorithms Codon/genetics Computational Biology/*methods *DNA Mutational Analysis Databases KW - Human Humans Iduronic Acid/analogs & derivatives/metabolism *Point Mutation Polymorphism KW - Molecular *Genetic Predisposition to Disease Genetic Variation Genome KW - Protein *Evolution KW - Single Nucleotide Proteins/chemistry/*genetics Tumor Suppressor Protein p53/genetics AB - Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited their source data to either protein or gene information. Novel in this work, we take advantage of both and focus on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease or not. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single-point mutations. Moreover, this study demonstrates the synergic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. 
The results of large-scale application of SeqProfCod over all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007), could be used to support clinical studies. VL - 29 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17935148 N1 - Capriotti, Emidio Arbiza, Leonardo Casadio, Rita Dopazo, Joaquin Dopazo, Hernan Marti-Renom, Marc A Evaluation Studies Research Support, Non-U.S. Gov’t United States Human mutation Hum Mutat. 2008 Jan;29(1):198-204. ER - TY - JOUR T1 - The AnnoLite and AnnoLyze programs for comparative annotation of protein structures JF - BMC Bioinformatics Y1 - 2007 A1 - M. A. Marti-Renom A1 - Rossi, A. A1 - Fatima Al-Shahrour A1 - Davis, F. P. A1 - Pieper, U. A1 - Dopazo, J. A1 - Sali, A. KW - *Algorithms Amino Acid Sequence Confidence Intervals Data Interpretation KW - Amino Acid *Software Structure-Activity Relationship KW - Protein Information Storage and Retrieval/methods Molecular Sequence Data Proteins/*chemistry/classification/*metabolism Sensitivity and Specificity Sequence Alignment/*methods Sequence Analysis KW - Protein/*methods Sequence Homology KW - Statistical *Databases AB - BACKGROUND: Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. 
Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. DESCRIPTION: AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of 90% and average precision of 80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of 70% and average precision of 30%, correctly localizing binding sites for small molecules in 95% of its predictions. CONCLUSION: The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at http://salilab.org/DBAli/. VL - 8 Suppl 4 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17570147 N1 - Marti-Renom, Marc A Rossi, Andrea Al-Shahrour, Fatima Davis, Fred P Pieper, Ursula Dopazo, Joaquin Sali, Andrej Research Support, Non-U.S. Gov’t England BMC bioinformatics BMC Bioinformatics. 2007 May 22;8 Suppl 4:S4. ER - TY - JOUR T1 - Characterization of protein hubs by inferring interacting motifs from protein interactions JF - PLoS Comput Biol Y1 - 2007 A1 - Aragues, R. A1 - Sali, A. A1 - Bonet, J. A1 - M. A. Marti-Renom A1 - Oliva, B. KW - Amino Acid Motifs Amino Acid Sequence Binding Sites Computer Simulation *Models KW - Chemical *Models KW - Molecular Molecular Sequence Data Protein Binding Protein Interaction Mapping/*methods Proteins/*chemistry Sequence Analysis KW - Protein/*methods AB - The characterization of protein interactions is essential for understanding biological systems. While genome-scale methods are available for identifying interacting proteins, they do not pinpoint the interacting motifs (e.g., a domain, sequence segments, a binding site, or a set of residues). 
Here, we develop and apply a method for delineating the interacting motifs of hub proteins (i.e., highly connected proteins). The method relies on the observation that proteins with common interaction partners tend to interact with these partners through a common interacting motif. The sole input for the method is a set of binary protein interactions; neither sequence nor structure information is needed. The approach is evaluated by comparing the inferred interacting motifs with domain families defined for 368 proteins in the Structural Classification of Proteins (SCOP). The positive predictive value of the method for detecting proteins with common SCOP families is 75% at a sensitivity of 10%. Most of the inferred interacting motifs were significantly associated with sequence patterns, which could be responsible for the common interactions. We find that yeast hubs with multiple interacting motifs are more likely to be essential than hubs with one or two interacting motifs, thus rationalizing the previously observed correlation between essentiality and the number of interacting partners of a protein. We also find that yeast hubs with multiple interacting motifs evolve more slowly than the average protein, contrary to the hubs with one or two interacting motifs. The proposed method will help us discover unknown interacting motifs and provide biological insights about protein hubs and their roles in interaction networks. VL - 3 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17941705 N1 - Aragues, Ramon Sali, Andrej Bonet, Jaume Marti-Renom, Marc A Oliva, Baldo PN2 EY016525,/EY/NEI NIH HHS/United States U54 RR022220/RR/NCRR NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t United States PLoS computational biology PLoS Comput Biol. 2007 Sep;3(9):1761-71. Epub 2007 Jul 30. ER - TY - JOUR T1 - DBAli tools: mining the protein structure space JF - Nucleic Acids Res Y1 - 2007 A1 - M. A. 
Marti-Renom A1 - Pieper, U. A1 - Madhusudhan, M. S. A1 - Rossi, A. A1 - Eswar, N. A1 - Davis, F. P. A1 - Fatima Al-Shahrour A1 - Dopazo, J. A1 - Sali, A. KW - *Algorithms Amino Acid Sequence Computational Biology/*methods Data Interpretation KW - Amino Acid *Software Structure-Activity Relationship KW - Protein Internet Molecular Sequence Data Protein Conformation Proteins/*chemistry/classification/*metabolism Pseudomonas aeruginosa/*metabolism Sequence Alignment/*methods Sequence Analysis KW - Protein/*methods Sequence Homology KW - Statistical *Databases AB - The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions. 
VL - 35 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17478513 N1 - Marti-Renom, Marc A Pieper, Ursula Madhusudhan, M S Rossi, Andrea Eswar, Narayanan Davis, Fred P Al-Shahrour, Fatima Dopazo, Joaquin Sali, Andrej GM 62529/GM/NIGMS NIH HHS/United States GM074929/GM/NIGMS NIH HHS/United States GM54762/GM/NIGMS NIH HHS/United States GM71790/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2007 Jul;35(Web Server issue):W393-7. Epub 2007 May 3. ER - TY - JOUR T1 - Protein translocation into peroxisomes by ring-shaped import receptors JF - FEBS Lett Y1 - 2007 A1 - Stanley, W. A. A1 - Fodor, K. A1 - M. A. Marti-Renom A1 - Schliebs, W. A1 - Wilmanns, M. KW - Amino Acid Sequence Binding Sites Humans Molecular Sequence Data Peroxisomes/*metabolism Protein Structure KW - Cytoplasmic and Nuclear/*chemistry KW - Tertiary Protein Transport Receptors AB - Folded and functional proteins destined for translocation from the cytosol into the peroxisomal matrix are recognized by two different peroxisomal import receptors, Pex5p and Pex7p. Both cargo-loaded receptors dock on the same translocon components, followed by cargo release and receptor recycling, as part of the complete translocation process. Recent structural and functional evidence on the Pex5p receptor has provided insight into the molecular requirements of specific cargo recognition, while the remaining processes remain largely elusive. Comparison of experimental structures of Pex5p and a structural model of Pex7p reveals that both receptors are built by ring-like arrangements with cargo binding sites central to the respective structures. Although molecular insight into the complete peroxisomal translocon remains to be determined, emerging data allow the deduction of common molecular principles that may hold for other translocation systems as well.
VL - 581 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17884042 N1 - Stanley, Will A Fodor, Krisztian Marti-Renom, Marc A Schliebs, Wolfgang Wilmanns, Matthias Review Netherlands FEBS letters FEBS Lett. 2007 Oct 16;581(25):4795-802. Epub 2007 Sep 11. ER - TY - JOUR T1 - Accuracy of sequence alignment and fold assessment using reduced amino acid alphabets JF - Proteins Y1 - 2006 A1 - Melo, F. A1 - M. A. Marti-Renom KW - Amino Acid Sequence Amino Acids/*chemistry/classification/*metabolism Consensus Sequence Molecular Sequence Data Oxidation-Reduction *Protein Folding Proteins/*chemistry/*metabolism Sequence Alignment/*methods Structural Homology KW - Protein AB - Reduced or simplified amino acid alphabets group the 20 naturally occurring amino acids into a smaller number of representative protein residues. To date, several reduced amino acid alphabets have been proposed, which have been derived and optimized by a variety of methods. The resulting reduced amino acid alphabets have been applied to pattern recognition, generation of consensus sequences from multiple alignments, protein folding, and protein structure prediction. In this work, amino acid substitution matrices and statistical potentials were derived based on several reduced amino acid alphabets and their performance assessed in a large benchmark for the tasks of sequence alignment and fold assessment of protein structure models, using as a reference frame the standard alphabet of 20 amino acids. The results showed that a large reduction in the total number of residue types does not necessarily translate into a significant loss of discriminative power for sequence alignment and fold assessment. Therefore, some definitions of a few residue types are able to encode most of the relevant sequence/structure information that is present in the 20 standard amino acids. 
Based on these results, we suggest that the use of reduced amino acid alphabets may allow increasing the accuracy of current substitution matrices and statistical potentials for the prediction of protein structure of remote homologs. VL - 63 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16506243 N1 - Melo, Francisco Marti-Renom, Marc A Research Support, Non-U.S. Gov’t United States Proteins Proteins. 2006 Jun 1;63(4):986-95. ER - TY - JOUR T1 - Comparative protein structure modeling using Modeller JF - Curr Protoc Bioinformatics Y1 - 2006 A1 - Eswar, N. A1 - Webb, B. A1 - M. A. Marti-Renom A1 - Madhusudhan, M. S. A1 - Eramian, D. A1 - Shen, M. Y. A1 - Pieper, U. A1 - Sali, A. KW - Algorithms Amino Acid Sequence Computer Simulation Crystallography/*methods *Models KW - Chemical *Models KW - Molecular Molecular Sequence Data Protein Conformation Protein Folding Proteins/*chemistry/*ultrastructure Sequence Analysis KW - Protein/*methods *Software AB - Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example.
The download and installation of the MODELLER software is also described. VL - Chapter 5 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18428767 N1 - Eswar, Narayanan Webb, Ben Marti-Renom, Marc A Madhusudhan, M S Eramian, David Shen, Min-Yi Pieper, Ursula Sali, Andrej P01 A135707/PHS HHS/United States P01 GM71790/GM/NIGMS NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States U54 GM62529/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t United States Current protocols in bioinformatics / editoral board, Andreas D. Baxevanis ... [et al.] Curr Protoc Bioinformatics. 2006 Oct;Chapter 5:Unit 5.6. ER - TY - JOUR T1 - A composite score for predicting errors in protein structure models JF - Protein Sci Y1 - 2006 A1 - Eramian, D. A1 - Shen, M. Y. A1 - Devos, D. A1 - Melo, F. A1 - Sali, A. A1 - M. A. Marti-Renom KW - *Models KW - Molecular Models KW - Theoretical Proteins/*chemistry AB - Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. 
The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (DeltaRMSD) from 0.63 A to 0.45 A, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling. VL - 15 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16751606 N1 - Eramian, David Shen, Min-yi Devos, Damien Melo, Francisco Sali, Andrej Marti-Renom, Marc A GM 08284/GM/NIGMS NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, Non-P.H.S. United States Protein science : a publication of the Protein Society Protein Sci. 2006 Jul;15(7):1653-66. Epub 2006 Jun 2. ER - TY - JOUR T1 - Localization of binding sites in protein structures by optimization of a composite scoring function JF - Protein Sci Y1 - 2006 A1 - Rossi, A. A1 - M. A. Marti-Renom A1 - Sali, A. KW - Amino Acid Sequence Binding Sites Biomechanics Hydrophobicity Ligands *Monte Carlo Method Protein Conformation Proteins/*chemistry Static Electricity AB - The rise in the number of functionally uncharacterized protein structures is increasing the demand for structure-based methods for functional annotation. Here, we describe a method for predicting the location of a binding site of a given type on a target protein structure. 
The method begins by constructing a scoring function, followed by a Monte Carlo optimization, to find a good scoring patch on the protein surface. The scoring function is a weighted linear combination of the z-scores of various properties of protein structure and sequence, including amino acid residue conservation, compactness, protrusion, convexity, rigidity, hydrophobicity, and charge density; the weights are calculated from a set of previously identified instances of the binding-site type on known protein structures. The scoring function can easily incorporate different types of information useful in localization, thus increasing the applicability and accuracy of the approach. To test the method, 1008 known protein structures were split into 20 different groups according to the type of the bound ligand. For nonsugar ligands, such as various nucleotides, binding sites were correctly identified in 55%-73% of the cases. The method is completely automated (http://salilab.org/patcher) and can be applied on a large scale in a structural genomics setting. VL - 15 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16963645 N1 - Rossi, Andrea Marti-Renom, Marc A Sali, Andrej P01 AI035707/AI/NIAID NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t United States Protein science : a publication of the Protein Society Protein Sci. 2006 Oct;15(10):2366-80. Epub 2006 Sep 8. ER - TY - JOUR T1 - MODBASE: a database of annotated comparative protein structure models and associated resources JF - Nucleic Acids Res Y1 - 2006 A1 - Pieper, U. A1 - Eswar, N. A1 - Davis, F. P. A1 - Braberg, H. A1 - Madhusudhan, M. S. A1 - Rossi, A. A1 - M. A. Marti-Renom A1 - Karchin, R. A1 - Webb, B. M. A1 - Eramian, D. A1 - Shen, M. Y. A1 - Kelly, L. A1 - Melo, F. A1 - Sali, A. 
KW - Binding Sites *Databases KW - Molecular Polymorphism KW - Protein Humans Internet Ligands *Models KW - Protein Systems Integration User-Computer Interface KW - Single Nucleotide Protein Structure KW - Tertiary Proteins/*chemistry/genetics/metabolism Software *Structural Homology AB - MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB (http://salilab.org/modweb). Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli, http://salilab.org/dbali), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE, http://salilab.org/pibase) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP, http://salilab.org/LS-SNP). 
VL - 34 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16381869 N1 - Pieper, Ursula Eswar, Narayanan Davis, Fred P Braberg, Hannes Madhusudhan, M S Rossi, Andrea Marti-Renom, Marc Karchin, Rachel Webb, Ben M Eramian, David Shen, Min-Yi Kelly, Libusha Melo, Francisco Sali, Andrej GM 08284/GM/NIGMS NIH HHS/United States P50 GM62529/GM/NIGMS NIH HHS/United States R01 GM 54762/GM/NIGMS NIH HHS/United States R33 CA84699/CA/NCI NIH HHS/United States U54 GM074945/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2006 Jan 1;34(Database issue):D291-5. ER - TY - JOUR T1 - Refinement of protein structures by iterative comparative modeling and CryoEM density fitting JF - J Mol Biol Y1 - 2006 A1 - Topf, M. A1 - Baker, M. L. A1 - M. A. Marti-Renom A1 - Chiu, W. A1 - Sali, A. KW - Amino Acid Sequence Cryoelectron Microscopy *Models KW - Molecular Molecular Sequence Data Plant Viruses/chemistry *Protein Conformation Software Viral Proteins/*chemistry/genetics AB - We developed a method for structure characterization of assembly components by iterative comparative protein structure modeling and fitting into cryo-electron microscopy (cryoEM) density maps. Specifically, we calculate a comparative model of a given component by considering many alternative alignments between the target sequence and a related template structure while optimizing the fit of a model into the corresponding density map. The method relies on the previously developed Moulder protocol that iterates over alignment, model building, and model assessment. The protocol was benchmarked using 20 varied target-template pairs of known structures with less than 30% sequence identity and corresponding simulated density maps at resolutions from 5A to 25A. 
Relative to the models based on the best existing sequence profile alignment methods, the percentage of C(alpha) atoms that are within 5A of the corresponding C(alpha) atoms in the superposed native structure increases on average from 52% to 66%, which is half-way between the starting models and the models from the best possible alignments (82%). The test also reveals that despite the improvements in the accuracy of the fitness function, this function is still the bottleneck in reducing the remaining errors. To demonstrate the usefulness of the protocol, we applied it to the upper domain of the P8 capsid protein of rice dwarf virus that has been studied by cryoEM at 6.8A. The C(alpha) root-mean-square deviation of the model based on the remotely related template, bluetongue virus VP7, improved from 8.7A to 6.0A, while the best possible model has a C(alpha) RMSD value of 5.3A. Moreover, the resulting model fits better into the cryoEM density map than the initial template structure. The method is being implemented in our program MODELLER for protein structure modeling by satisfaction of spatial restraints and will be applicable to the rapidly increasing number of cryoEM density maps of macromolecular assemblies. VL - 357 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16490207 N1 - Topf, Maya Baker, Matthew L Marti-Renom, Marc A Chiu, Wah Sali, Andrej 2 PN2 EY016525-02/EY/NEI NIH HHS/United States P20RR020647/RR/NCRR NIH HHS/United States P41RR02250/RR/NCRR NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, Non-P.H.S. England Journal of molecular biology J Mol Biol. 2006 Apr 14;357(5):1655-68. Epub 2006 Feb 2. ER - TY - JOUR T1 - Variable gap penalty for protein sequence-structure alignment JF - Protein Eng Des Sel Y1 - 2006 A1 - Madhusudhan, M. S. A1 - M. A. Marti-Renom A1 - Sanchez, R. A1 - Sali, A. 
KW - Algorithms Amino Acid Sequence Models KW - Amino Acid *Software KW - Molecular Molecular Sequence Data Proteins/*chemistry Sequence Alignment/*methods Sequence Analysis KW - Protein/*methods *Sequence Homology AB - The penalty for inserting gaps into an alignment between two protein sequences is a major determinant of the alignment accuracy. Here, we present an algorithm for finding a globally optimal alignment by dynamic programming that can use a variable gap penalty (VGP) function of any form. We also describe a specific function that depends on the structural context of an insertion or deletion. It penalizes gaps that are introduced within regions of regular secondary structure, buried regions, straight segments and also between two spatially distant residues. The parameters of the penalty function were optimized on a set of 240 sequence pairs of known structure, spanning the sequence identity range of 20-40%. We then tested the algorithm on another set of 238 sequence pairs of known structures. The use of the VGP function increases the number of correctly aligned residues from 81.0 to 84.5% in comparison with the optimized affine gap penalty function; this difference is statistically significant according to Student’s t-test. We estimate that the new algorithm allows us to produce comparative models with an additional approximately 7 million accurately modeled residues in the approximately 1.1 million proteins that are detectably related to a known structure. VL - 19 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16423846 N1 - Madhusudhan, M S Marti-Renom, Marc A Sanchez, Roberto Sali, Andrej DE016274/DE/NIDCR NIH HHS/United States GM54762/GM/NIGMS NIH HHS/United States GM62529/GM/NIGMS NIH HHS/United States Comparative Study Research Support, N.I.H., Extramural England Protein engineering, design & selection : PEDS Protein Eng Des Sel. 2006 Mar;19(3):129-33. Epub 2006 Jan 19. 
ER - TY - JOUR T1 - The C-type lectin fold as an evolutionary solution for massive sequence variation JF - Nat Struct Mol Biol Y1 - 2005 A1 - McMahon, S. A. A1 - Miller, J. L. A1 - Lawton, J. A. A1 - Kerkow, D. E. A1 - Hodes, A. A1 - M. A. Marti-Renom A1 - Doulatov, S. A1 - Narayanan, E. A1 - Sali, A. A1 - Miller, J. F. A1 - Ghosh, P. KW - Amino Acid Sequence Bacterial Outer Membrane Proteins/*chemistry Bacteriophages/*metabolism Bordetella/*virology Evolution KW - Bordetella/*chemistry KW - C-Type/*chemistry Molecular Sequence Data Protein Conformation Protein Folding Viral Proteins/*chemistry/*genetics Virulence Factors KW - Molecular Genetic Variation Genome KW - Viral Lectins AB - Only few instances are known of protein folds that tolerate massive sequence variation for the sake of binding diversity. The most extensively characterized is the immunoglobulin fold. We now add to this the C-type lectin (CLec) fold, as found in the major tropism determinant (Mtd), a retroelement-encoded receptor-binding protein of Bordetella bacteriophage. Variation in Mtd, with its approximately 10(13) possible sequences, enables phage adaptation to Bordetella spp. Mtd is an intertwined, pyramid-shaped trimer, with variable residues organized by its CLec fold into discrete receptor-binding sites. The CLec fold provides a highly static scaffold for combinatorial display of variable residues, probably reflecting a different evolutionary solution for balancing diversity against stability from that in the immunoglobulin fold. Mtd variants are biased toward the receptor pertactin, and there is evidence that the CLec fold is used broadly for sequence variation by related retroelements. 
VL - 12 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16170324 N1 - McMahon, Stephen A Miller, Jason L Lawton, Jeffrey A Kerkow, Donald E Hodes, Asher Marti-Renom, Marc A Doulatov, Sergei Narayanan, Eswar Sali, Andrej Miller, Jeff F Ghosh, Partho F31AI061840/AI/NIAID NIH HHS/United States F32AI49695/AI/NIAID NIH HHS/United States T32GM008326/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. United States Nature structural & molecular biology Nat Struct Mol Biol. 2005 Oct;12(10):886-92. Epub 2005 Sep 18. ER - TY - JOUR T1 - Detecting remotely related proteins by their interactions and sequence similarity JF - Proc Natl Acad Sci U S A Y1 - 2005 A1 - Espadaler, J. A1 - Aragues, R. A1 - Eswar, N. A1 - M. A. Marti-Renom A1 - Querol, E. A1 - Aviles, F. X. A1 - Sali, A. A1 - Oliva, B. KW - Amino Acid KW - Computational Biology Databases KW - Molecular Protein Conformation Protein Folding Proteins/*genetics/*metabolism Proteomics/*methods *Sequence Homology KW - Protein *Evolution AB - The function of an uncharacterized protein is usually inferred either from its homology to, or its interactions with, characterized proteins. Here, we use both sequence similarity and protein interactions to identify relationships between remotely related protein sequences. We rely on the fact that homologous sequences share similar interactions, and, therefore, the set of interacting partners of the partners of a given protein is enriched by its homologs. The approach was bench-marked by assigning the fold and functional family to test sequences of known structure. Specifically, we relied on 1,434 proteins with known folds, as defined in the Structural Classification of Proteins (SCOP) database, and with known interacting partners, as defined in the Database of Interacting Proteins (DIP). 
For this subset, the specificity of fold assignment was increased from 54% for position-specific iterative BLAST to 75% for our approach, with a concomitant increase in sensitivity of a few percentage points. Similarly, the specificity of family assignment at the e-value threshold of 10(-8) was increased from 70% to 87%. The proposed method would be a useful tool for large-scale automated discovery of remote relationships between protein sequences, given its unique reliance on sequence similarity and protein-protein interactions. VL - 102 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15883372 N1 - Espadaler, Jordi Aragues, Ramon Eswar, Narayanan Marti-Renom, Marc A Querol, Enrique Aviles, Francesc X Sali, Andrej Oliva, Baldomero R01 GM54762/GM/NIGMS NIH HHS/United States Comparative Study Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. United States Proceedings of the National Academy of Sciences of the United States of America Proc Natl Acad Sci U S A. 2005 May 17;102(20):7151-6. Epub 2005 May 9. ER - TY - JOUR T1 - Alignment of protein sequences by their profiles JF - Protein Sci Y1 - 2004 A1 - M. A. Marti-Renom A1 - Madhusudhan, M. S. A1 - Sali, A. KW - *Algorithms Amino Acid Sequence Computational Biology Databases KW - Protein Markov Chains Molecular Sequence Data *Protein Folding Protein Structure KW - Tertiary Proteins/*chemistry *Sequence Alignment Sequence Homology *Software AB - The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. 
A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences. VL - 13 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15044736 N1 - Marti-Renom, Marc A Madhusudhan, M S Sali, Andrej P50 GM62529/GM/NIGMS NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. United States Protein science : a publication of the Protein Society Protein Sci. 2004 Apr;13(4):1071-87. ER - TY - JOUR T1 - MODBASE, a database of annotated comparative protein structure models, and associated resources JF - Nucleic Acids Res Y1 - 2004 A1 - Pieper, U. A1 - Eswar, N. A1 - Braberg, H. A1 - Madhusudhan, M. S. A1 - Davis, F. P. A1 - Stuart, A. C. A1 - Mirkovic, N. A1 - Rossi, A. A1 - M. A. Marti-Renom A1 - Fiser, A. A1 - Webb, B. A1 - Greenblatt, D. A1 - Huang, C. C. A1 - Ferrin, T. E. A1 - Sali, A. 
KW - Amino Acid Sequence Animals Binding Sites *Computational Biology *Databases KW - Molecular Molecular Sequence Data Polymorphism KW - Protein Genomics Humans Internet Ligands Models KW - Single Nucleotide Protein Binding Protein Conformation Proteins/*chemistry/genetics Sequence Alignment Software User-Computer Interface AB - MODBASE (http://salilab.org/modbase) is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on the MODELLER package for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE uses the MySQL relational database management system for flexible querying and CHIMERA for viewing the sequences and structures (http://www.cgl.ucsf.edu/chimera/). MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, as well as improvements in the software for calculating the models. For ease of access, MODBASE is organized into different data sets. The largest data set contains 1,26,629 models for domains in 659,495 out of 1,182,126 unique protein sequences in the complete Swiss-Prot/TrEMBL database (August 25, 2003); only models based on alignments with significant similarity scores and models assessed to have the correct fold despite insignificant alignments are included. Another model data set supports target selection and structure-based annotation by the New York Structural Genomics Research Consortium; e.g. the 53 new structures produced by the consortium allowed us to characterize structurally 24,113 sequences. MODBASE also contains binding site predictions for small ligands and a set of predicted interactions between pairs of modeled sequences from the same genome. 
Our other resources associated with MODBASE include a comprehensive database of multiple protein structure alignments (DBALI, http://salilab.org/dbali) as well as web servers for automated comparative modeling with MODPIPE (MODWEB, http://salilab.org/modweb), modeling of loops in protein structures (MODLOOP, http://salilab.org/modloop) and predicting functional consequences of single nucleotide polymorphisms (SNPWEB, http://salilab.org/snpweb). VL - 32 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=14681398 N1 - Pieper, Ursula Eswar, Narayanan Braberg, Hannes Madhusudhan, M S Davis, Fred P Stuart, Ashley C Mirkovic, Nebojsa Rossi, Andrea Marti-Renom, Marc A Fiser, Andras Webb, Ben Greenblatt, Daniel Huang, Conrad C Ferrin, Thomas E Sali, Andrej P41 RR01081/RR/NCRR NIH HHS/United States P50 GM62529/GM/NIGMS NIH HHS/United States R01 GM 54762/GM/NIGMS NIH HHS/United States R33 CA84699/CA/NCI NIH HHS/United States Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. England Nucleic acids research Nucleic Acids Res. 2004 Jan 1;32(Database issue):D217-22. ER - TY - JOUR T1 - Structure-based assessment of missense mutations in human BRCA1: implications for breast and ovarian cancer predisposition JF - Cancer Res Y1 - 2004 A1 - Mirkovic, N. A1 - M. A. Marti-Renom A1 - Weber, B. L. A1 - Sali, A. A1 - Monteiro, A. N. KW - BRCA1 Genetic Predisposition to Disease Humans *Mutation KW - BRCA1 Protein/*chemistry/genetics Breast Neoplasms/*genetics Female *Genes KW - Missense Ovarian Neoplasms/*genetics Pedigree Protein Conformation Structure-Activity Relationship Transcriptional Activation AB - The BRCA1 gene from individuals at risk of breast and ovarian cancers can be screened for the presence of mutations. However, the cancer association of most alleles carrying missense mutations is unknown, thus creating significant problems for genetic counseling. 
To increase our ability to identify cancer-associated mutations in BRCA1, we set out to use the principles of protein three-dimensional structure as well as the correlation between the cancer-associated mutations and those that abolish transcriptional activation. Thirty-one of 37 missense mutations of known impact on the transcriptional activation function of BRCA1 are readily rationalized in structural terms. Loss-of-function mutations involve nonconservative changes in the core of the BRCA1 C-terminus (BRCT) fold or are localized in a groove that presumably forms a binding site involved in the transcriptional activation by BRCA1; mutations that do not abolish transcriptional activation are either conservative changes in the core or are on the surface outside of the putative binding site. Next, structure-based rules for predicting functional consequences of a given missense mutation were applied to 57 germ-line BRCA1 variants of unknown cancer association. Such a structure-based approach may be helpful in an integrated effort to identify mutations that predispose individuals to cancer. VL - 64 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15172985 N1 - Mirkovic, Nebojsa Marti-Renom, Marc A Weber, Barbara L Sali, Andrej Monteiro, Alvaro N A CA92309/CA/NCI NIH HHS/United States GM54762/GM/NIGMS NIH HHS/United States GM61390/GM/NIGMS NIH HHS/United States Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, Non-P.H.S. Research Support, U.S. Gov’t, P.H.S. United States Cancer research Cancer Res. 2004 Jun 1;64(11):3790-7. ER - TY - JOUR T1 - EVA: Evaluation of protein structure prediction servers JF - Nucleic Acids Res Y1 - 2003 A1 - Koh, I. Y. A1 - Eyrich, V. A. A1 - M. A. Marti-Renom A1 - Przybylski, D. A1 - Madhusudhan, M. S. A1 - Eswar, N. A1 - Grana, O. A1 - Pazos, F. A1 - Valencia, A. A1 - Sali, A. A1 - Rost, B. 
KW - Automation Databases KW - Protein KW - Protein Internet *Protein Conformation Protein Folding Protein Structure KW - Protein Structural Homology KW - Secondary Proteins/chemistry Reproducibility of Results *Sequence Analysis AB - EVA (http://cubic.bioc.columbia.edu/eva/) is a web server for evaluation of the accuracy of automated protein structure prediction methods. The evaluation is updated automatically each week, to cope with the large number of existing prediction servers and the constant changes in the prediction methods. EVA currently assesses servers for secondary structure prediction, contact prediction, comparative protein structure modelling and threading/fold recognition. Every day, sequences of newly available protein structures in the Protein Data Bank (PDB) are sent to the servers and their predictions are collected. The predictions are then compared to the experimental structures once a week; the results are published on the EVA web pages. Over time, EVA has accumulated prediction results for a large number of proteins, ranging from hundreds to thousands, depending on the prediction method. This large sample assures that methods are compared reliably. As a result, EVA provides useful information to developers as well as users of prediction methods. VL - 31 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12824315 N1 - Koh, Ingrid Y Y Eyrich, Volker A Marti-Renom, Marc A Przybylski, Dariusz Madhusudhan, Mallur S Eswar, Narayanan Grana, Osvaldo Pazos, Florencio Valencia, Alfonso Sali, Andrej Rost, Burkhard 1-P50-GM62413-01/GM/NIGMS NIH HHS/United States 5-P20-LM7276/LM/NLM NIH HHS/United States P50 GM62529/GM/NIGMS NIH HHS/United States R01 GM54762/GM/NIGMS NIH HHS/United States R01-GM63029-01/GM/NIGMS NIH HHS/United States Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, Non-P.H.S. Research Support, U.S. Gov’t, P.H.S. England Nucleic acids research Nucleic Acids Res. 
2003 Jul 1;31(13):3311-5. ER - TY - JOUR T1 - ModView, visualization of multiple protein sequences and structures JF - Bioinformatics Y1 - 2003 A1 - Ilyin, V. A. A1 - Pieper, U. A1 - Stuart, A. C. A1 - M. A. Marti-Renom A1 - McMahan, L. A1 - Sali, A. KW - *Database Management Systems Documentation/methods Imaging KW - Protein/*methods *User-Computer Interface KW - Three-Dimensional/methods Protein Conformation Proteins/*chemistry/genetics Sequence Alignment/*methods Sequence Analysis AB - SUMMARY: We describe ModView, a web application for visualization of multiple protein sequences and structures. ModView integrates a multiple structure viewer, a multiple sequence alignment editor, and a database querying engine. It is possible to interactively manipulate hundreds of proteins, to visualize conservative and variable residues, active and binding sites, fragments, and domains in protein families, as well as to display large macromolecular complexes such as ribosomes or viruses. As a Netscape plug-in, ModView can be included in HTML pages along with text and figures, which makes it useful for teaching and presentations. ModView is also suitable as a graphical interface to various databases because it can be controlled through JavaScript commands and called from CGI scripts. AVAILABILITY: ModView is available at http://guitar.rockefeller.edu/modview. VL - 19 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12499313 N1 - Ilyin, Valentin A Pieper, Ursula Stuart, Ashley C Marti-Renom, Marc A McMahan, Linda Sali, Andrej P50-GM62529/GM/NIGMS NIH HHS/United States Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. England Bioinformatics (Oxford, England) Bioinformatics. 2003 Jan;19(1):165-6. ER - TY - JOUR T1 - Tools for comparative protein structure modeling and analysis JF - Nucleic Acids Res Y1 - 2003 A1 - Eswar, N. A1 - John, B. A1 - Mirkovic, N. A1 - Fiser, A. A1 - Ilyin, V. A. A1 - Pieper, U. 
A1 - Stuart, A. C. A1 - M. A. Marti-Renom A1 - Madhusudhan, M. S. A1 - Yerkovich, B. A1 - Sali, A. KW - Amino Acid *Software *Structural Homology KW - Internet Models KW - Molecular Protein Folding Proteins/chemistry Reproducibility of Results Sequence Alignment Sequence Homology KW - Protein Systems Integration AB - The following resources for comparative protein structure modeling and analysis are described (http://salilab.org): MODELLER, a program for comparative modeling by satisfaction of spatial restraints; MODWEB, a web server for automated comparative modeling that relies on PSI-BLAST, IMPALA and MODELLER; MODLOOP, a web server for automated loop modeling that relies on MODELLER; MOULDER, a CPU intensive protocol of MODWEB for building comparative models based on distant known structures; MODBASE, a comprehensive database of annotated comparative models for all sequences detectably related to a known structure; MODVIEW, a Netscape plugin for Linux that integrates viewing of multiple sequences and structures; and SNPWEB, a web server for structure-based prediction of the functional impact of a single amino acid substitution. VL - 31 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12824331 N1 - Eswar, Narayanan John, Bino Mirkovic, Nebojsa Fiser, Andras Ilyin, Valentin A Pieper, Ursula Stuart, Ashley C Marti-Renom, Marc A Madhusudhan, M S Yerkovich, Bozidar Sali, Andrej P50 GM62529/GM/NIGMS NIH HHS/United States R01 GM 54762/GM/NIGMS NIH HHS/United States R33 CA84699/CA/NCI NIH HHS/United States Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. England Nucleic acids research Nucleic Acids Res. 2003 Jul 1;31(13):3375-80. ER - TY - JOUR T1 - Reliability of assessment of protein structure prediction methods JF - Structure Y1 - 2002 A1 - M. A. Marti-Renom A1 - Madhusudhan, M. S. A1 - Fiser, A. A1 - Rost, B. A1 - Sali, A. 
KW - *Computer Simulation Humans *Models KW - Molecular *Protein Conformation Proteins/*chemistry Reproducibility of Results AB -

The reliability of ranking of protein structure modeling methods is assessed. The assessment is based on the parametric Student’s t test and the nonparametric Wilcoxon signed rank test of statistical significance of the difference between paired samples. The approach is applied to the ranking of the comparative modeling methods tested at the fourth meeting on Critical Assessment of Techniques for Protein Structure Prediction (CASP). It is shown that the 14 CASP4 test sequences may not be sufficient to reliably distinguish between the top eight methods, given the model quality differences and their standard deviations. We suggest that CASP needs to be supplemented by an assessment of protein structure prediction methods that is automated, continuous in time, based on several criteria applied to a large number of models, and with quantitative statistical reliability assigned to each characterization.

VL - 10 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12005441 N1 -

Marti-Renom, Marc A Madhusudhan, M S Fiser, Andras Rost, Burkhard Sali, Andrej GM 54762/GM/NIGMS NIH HHS/United States GM62413/GM/NIGMS NIH HHS/United States Comparative Study Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. United States Structure (London, England : 1993) Structure. 2002 Mar;10(3):435-40.

ER - TY - JOUR T1 - Use of single point mutations in domain I of beta 2-glycoprotein I to determine fine antigenic specificity of antiphospholipid autoantibodies JF - J Immunol Y1 - 2002 A1 - Iverson, G. M. A1 - Reddel, S. A1 - Victoria, E. J. A1 - Cockerill, K. A. A1 - Wang, Y. X. A1 - M. A. Marti-Renom A1 - Sali, A. A1 - Marquis, D. M. A1 - Krilis, S. A. A1 - Linnik, M. D. KW - Amino Acid Substitution/genetics Antibodies KW - Antibody/genetics Binding KW - Antiphospholipid/blood/*metabolism Antibodies KW - Competitive/genetics/immunology Enzyme-Linked Immunosorbent Assay/methods Epitopes/analysis/*immunology/metabolism Glycine/genetics Glycoproteins/biosynthesis/*genetics/*immunology/isolation & purification/metabolism Humans Models KW - Molecular Peptide Fragments/genetics/immunology/isolation & purification/metabolism *Point Mutation Protein Structure KW - Monoclonal/blood/metabolism Antiphospholipid Syndrome/immunology Arginine/genetics *Binding Sites KW - Tertiary/genetics Recombinant Proteins/biosynthesis/immunology/isolation & purification/metabolism Static Electricity beta 2-Glycoprotein I AB - Autoantibodies against beta(2)-glycoprotein I (beta(2)GPI) appear to be a critical feature of the antiphospholipid syndrome (APS). As determined using domain deletion mutants, human autoantibodies bind to the first of five domains present in beta(2)GPI. In this study the fine detail of the domain I epitope has been examined using 10 selected mutants of whole beta(2)GPI containing single point mutations in the first domain. The binding to beta(2)GPI was significantly affected by a number of single point mutations in domain I, particularly by mutations in the region of aa 40-43. Molecular modeling predicted these mutations to affect the surface shape and electrostatic charge of a facet of domain I. Mutation K19E also had an effect, albeit one less severe and involving fewer patients. 
Similar results were obtained in two different laboratories using affinity-purified anti-beta(2)GPI in a competitive inhibition ELISA and with whole serum in a direct binding ELISA. This study confirms that anti-beta(2)GPI autoantibodies bind to domain I, and that the charged surface patch defined by residues 40-43 contributes to a dominant target epitope. VL - 169 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12471146 N1 - Iverson, G Michael Reddel, Stephen Victoria, Edward J Cockerill, Keith A Wang, Ying-Xia Marti-Renom, Marc A Sali, Andrej Marquis, David M Krilis, Steven A Linnik, Matthew D GM54762/GM/NIGMS NIH HHS/United States Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. United States Journal of immunology (Baltimore, Md. : 1950) J Immunol. 2002 Dec 15;169(12):7097-103. ER - TY - JOUR T1 - Classification of protein disulphide-bridge topologies JF - J Comput Aided Mol Des Y1 - 2001 A1 - Mas, J. M. A1 - Aloy, P. A1 - M. A. Marti-Renom A1 - Oliva, B. A1 - de Llorens, R. A1 - Aviles, F. X. A1 - Querol, E. KW - Algorithms Computer Simulation Databases as Topic Disulfides/*chemistry Models KW - Molecular Protein Structure KW - Secondary Protein Structure KW - Tertiary Proteins/*chemistry/*classification Software AB - The preferential occurrence of certain disulphide-bridge topologies in proteins has prompted us to design a method and a program, KNOT-MATCH, for their classification. The program has been applied to a database of proteins with less than 65% homology and more than two disulphide bridges. We have investigated whether there are topological preferences that can be used to group proteins and if these can be applied to gain insight into the structural or functional relationships among them. The classification has been performed by Density Search and Hierarchical Clustering Techniques, yielding thirteen main protein classes from the superimposition and clustering process. 
It is noteworthy that besides the disulphide bridges, regular secondary structures and loops frequently become correctly aligned. Although the lack of significant sequence similarity among some clustered proteins precludes the easy establishment of evolutionary relationships, the program permits us to find out important structural or functional residues upon the superimposition of two protein structures apparently unrelated. The derived classification can be very useful for finding relationships among proteins which would escape detection by current sequence or topology-based analytical algorithms. VL - 15 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11394740 N1 - Mas, J M Aloy, P Marti-Renom, M A Oliva, B de Llorens, R Aviles, F X Querol, E Comparative Study Research Support, Non-U.S. Gov’t Netherlands Journal of computer-aided molecular design J Comput Aided Mol Des. 2001 May;15(5):477-87. ER - TY - JOUR T1 - DBAli: a database of protein structure alignments JF - Bioinformatics Y1 - 2001 A1 - M. A. Marti-Renom A1 - Ilyin, V. A. A1 - Sali, A. KW - Computational Biology *Databases KW - Protein Proteins/*chemistry/*genetics Sequence Alignment/*statistics & numerical data Software Software Design AB - SUMMARY: The DBAli database includes approximately 35000 alignments of pairs of protein structures from SCOP (Lo Conte et al., Nucleic Acids Res., 28, 257-259, 2000) and CE (Shindyalov and Bourne, Protein Eng., 11, 739-747, 1998). DBAli is linked to several resources, including Compare3D (Shindyalov and Bourne, http://www.sdsc.edu/pb/software.htm, 1999) and ModView (Ilyin and Sali, http://guitar.rockefeller.edu/ModView/, 2001) for visualizing sequence alignments and structure superpositions. 
A flexible search of DBAli by protein sequence and structure properties allows construction of subsets of alignments suitable for a number of applications, such as benchmarking of sequence-sequence and sequence-structure alignment methods under a variety of conditions. AVAILABILITY: http://guitar.rockefeller.edu/DBAli/ VL - 17 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11524379 N1 - Marti-Renom, M A Ilyin, V A Sali, A Research Support, Non-U.S. Gov’t Research Support, U.S. Gov’t, P.H.S. England Bioinformatics (Oxford, England) Bioinformatics. 2001 Aug;17(8):746-7. ER - TY - JOUR T1 - EVA: continuous automatic evaluation of protein structure prediction servers JF - Bioinformatics Y1 - 2001 A1 - Eyrich, V. A. A1 - M. A. Marti-Renom A1 - Przybylski, D. A1 - Madhusudhan, M. S. A1 - Fiser, A. A1 - Pazos, F. A1 - Valencia, A. A1 - Sali, A. A1 - Rost, B. KW - Automation Internet *Protein Conformation Proteins/*analysis *Software AB - Evaluation of protein structure prediction methods is difficult and time-consuming. Here, we describe EVA, a web server for assessing protein structure prediction methods, in an automated, continuous and large-scale fashion. Currently, EVA evaluates the performance of a variety of prediction methods available through the internet. Every week, the sequences of the latest experimentally determined protein structures are sent to prediction servers, results are collected, performance is evaluated, and a summary is published on the web. EVA has so far collected data for more than 3000 protein chains. These results may provide valuable insight to both developers and users of prediction methods. AVAILABILITY: http://cubic.bioc.columbia.edu/eva. 
CONTACT: eva@cubic.bioc.columbia.edu VL - 17 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11751240 N1 - Eyrich, V A Marti-Renom, M A Przybylski, D Madhusudhan, M S Fiser, A Pazos, F Valencia, A Sali, A Rost, B England Bioinformatics (Oxford, England) Bioinformatics. 2001 Dec;17(12):1242-3. ER -