TY - JOUR
T1 - N-glycosylation efficiency is determined by the distance to the C-terminus and the amino acid preceding an Asn-Ser-Thr sequon.
JF - Protein science : a publication of the Protein Society
Y1 - 2011
A1 - Bañó-Polo, Manuel
A1 - Baldin, Francesca
A1 - Tamborero, Silvia
A1 - Marti-Renom, Marc A
A1 - Mingarro, Ismael
AB -

N-glycosylation is the most common and versatile protein modification. In eukaryotic cells, this modification is catalyzed cotranslationally by the enzyme oligosaccharyltransferase, which targets the β-amide of the asparagine in an Asn-Xaa-Ser/Thr consensus sequon (where Xaa is any amino acid but proline) in nascent proteins as they enter the endoplasmic reticulum. Because modification of the glycosylation acceptor site on membrane proteins occurs in a compartment-specific manner, the presence of glycosylation is used to indicate membrane protein topology. Moreover, glycosylation sites can be added to gain topological information. In this study, we explored the determinants of N-glycosylation with the in vitro transcription/translation of a truncated model protein in the presence of microsomes and surveyed 25,488 glycoproteins, of which 2,533 glycosylation sites had been experimentally validated. We found that glycosylation efficiency was dependent on both the distance to the C-terminus and the nature of the amino acid that preceded the consensus sequon. These findings establish a broadly applicable method for membrane protein tagging in topological studies.

VL - 20
ER -

TY - JOUR
T1 - Structure determination of genomic domains by satisfaction of spatial restraints.
JF - Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology
Y1 - 2011
A1 - Baù, Davide
A1 - Marti-Renom, Marc A
AB -

The three-dimensional (3D) architecture of a genome is non-random and known to facilitate the spatial colocalization of regulatory elements with the genes they regulate. Determining the 3D structure of a genome may therefore prove an essential step in characterizing how genes are regulated. Currently, there are several experimental and theoretical approaches that aim at determining the 3D structure of genomes and genomic domains; however, approaches integrating experiments and computation to identify the most likely 3D folding of a genome at medium to high resolutions have not been widely explored. Here, we review existing methodologies and propose that the integrative modeling platform (http://www.integrativemodeling.org), a computational package developed for structurally characterizing protein assemblies, could be used for integrating diverse experimental data towards the determination of the 3D architecture of genomic domains and entire genomes at unprecedented resolution. Our approach, through the visualization of looping interactions between distal regulatory elements, will allow for the characterization of global chromatin features and their relation to gene expression. We illustrate our work by outlining the recent determination of the 3D architecture of the α-globin domain in the human genome.

VL - 19
ER -

TY - JOUR
T1 - The three-dimensional folding of the α-globin gene domain reveals formation of chromatin globules.
JF - Nature structural & molecular biology
Y1 - 2011
A1 - Baù, Davide
A1 - Sanyal, Amartya
A1 - Lajoie, Bryan R
A1 - Capriotti, Emidio
A1 - Byron, Meg
A1 - Lawrence, Jeanne B
A1 - Dekker, Job
A1 - Marti-Renom, Marc A
AB -

We developed a general approach that combines chromosome conformation capture carbon copy (5C) with the Integrated Modeling Platform (IMP) to generate high-resolution three-dimensional models of chromatin at the megabase scale. We applied this approach to the ENm008 domain on human chromosome 16, containing the α-globin locus, which is expressed in K562 cells and silenced in lymphoblastoid cells (GM12878). The models accurately reproduce the known looping interactions between the α-globin genes and their distal regulatory elements. Further, we find using our approach that the domain folds into a single globular conformation in GM12878 cells, whereas two globules are formed in K562 cells. The central cores of these globules are enriched for transcribed genes, whereas nontranscribed chromatin is more peripheral. We propose that globule formation represents a higher-order folding state related to clustering of transcribed genes around shared transcription machineries, as previously observed by microscopy.

VL - 18
ER -

TY - JOUR
T1 - Alignment of multiple protein structures based on sequence and structure features.
JF - Protein engineering, design & selection : PEDS
Y1 - 2009
A1 - Madhusudhan, M. S.
A1 - Webb, Benjamin M
A1 - Marti-Renom, Marc A
A1 - Eswar, Narayanan
A1 - Sali, Andrej
AB -

Comparing the structures of proteins is crucial to gaining insight into protein evolution and function. Here, we align the sequences of multiple protein structures by a dynamic programming optimization of a scoring function that is a sum of an affine gap penalty and terms dependent on various sequence and structure features (SALIGN). The features include amino acid residue type, residue position, residue accessible surface area, residue secondary structure state and the conformation of a short segment centered on the residue. The multiple alignment is built by following the 'guide' tree constructed from the matrix of all pairwise protein alignment scores. Importantly, the method does not depend on the exact values of various parameters, such as feature weights and gap penalties, because the optimal alignment across a range of parameter values is found. Using multiple structure alignments in the HOMSTRAD database, SALIGN was benchmarked against MUSTANG for multiple alignments as well as against TM-align and CE for pairwise alignments. On average, SALIGN produces a 15% improvement in structural overlap over HOMSTRAD and 14% over MUSTANG, and yields more equivalent structural positions than TM-align and CE in 90% and 95% of cases, respectively. The utility of accurate multiple structure alignment is illustrated by its application to comparative protein structure modeling.

VL - 22
ER -

TY - JOUR
T1 - Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans.
JF - Hum Mutat
Y1 - 2008
A1 - Capriotti, Emidio
A1 - Arbiza, Leonardo
A1 - Casadio, Rita
A1 - Dopazo, Joaquin
A1 - Dopazo, Hernán
A1 - Marti-Renom, Marc A
KW - Algorithms
KW - Codon
KW - Computational Biology
KW - Databases, Protein
KW - DNA Mutational Analysis
KW - Evolution, Molecular
KW - Genetic Predisposition to Disease
KW - Genetic Variation
KW - Genome, Human
KW - Humans
KW - Iduronic Acid
KW - Point Mutation
KW - Polymorphism, Single Nucleotide
KW - Proteins
KW - Tumor Suppressor Protein p53
AB -

Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited their source data to either protein or gene information. Novel in this work, we take advantage of both and focus on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease or not. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single-point mutations. Moreover, this study demonstrates the synergic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. The results of large-scale application of SeqProfCod over all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007), could be used to support clinical studies.

VL - 29
IS - 1
U1 - https://www.ncbi.nlm.nih.gov/pubmed/17935148?dopt=Abstract
ER -

TY - JOUR
T1 - DBAli tools: mining the protein structure space.
JF - Nucleic Acids Res
Y1 - 2007
A1 - Marti-Renom, Marc A
A1 - Pieper, Ursula
A1 - Madhusudhan, M S
A1 - Rossi, Andrea
A1 - Eswar, Narayanan
A1 - Davis, Fred P
A1 - Al-Shahrour, Fátima
A1 - Dopazo, Joaquin
A1 - Sali, Andrej
KW - Algorithms
KW - Amino Acid Sequence
KW - Computational Biology
KW - Data Interpretation, Statistical
KW - Databases, Protein
KW - Internet
KW - Molecular Sequence Data
KW - Protein Conformation
KW - Proteins
KW - Pseudomonas aeruginosa
KW - Sequence Alignment
KW - Sequence Analysis, Protein
KW - Sequence Homology, Amino Acid
KW - Software
KW - Structure-Activity Relationship
AB -

The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions.

VL - 35
IS - Web Server issue
U1 - https://www.ncbi.nlm.nih.gov/pubmed/17478513?dopt=Abstract
ER -