TY - JOUR T1 - A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. JF - Nature biotechnology Y1 - 2014 A1 - Su, Z. A1 - Labaj, P.P. A1 - .... A1 - Dopazo, J. A1 - .... A1 - Mason, C.E. A1 - Shi, L KW - NGS KW - RNA-seq KW - SEQC AB - We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. VL - 32 UR - http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2957.html ER - TY - JOUR T1 - Molecular interactions between sugar beet and Polymyxa betae during its life cycle JF - Annals of Applied Biology Y1 - 2014 A1 - N. Desoignies A1 - Carbonell, J. A1 - J.-S. Moreau A1 - A. Conesa A1 - Dopazo, J. A1 - A. Legrève AB - Polymyxa betae is a biotrophic obligate sugar beet parasite that belongs to plasmodiophorids. 
The infection of sugar beet roots by this parasite is asymptomatic, except when it transmits Beet necrotic yellow vein virus (BNYVV), the causal agent of rhizomania. To date, there has been little work on P. betae–sugar beet molecular interactions, mainly because of the obligate nature of the parasite and also because research on rhizomania has tended to focus on the virus. In this study, we investigated these interactions through differential transcript analysis, using suppressive subtractive hybridization. The analysis included 76 P. betae and 120 sugar beet expressed sequence tags (ESTs). The expression of selected ESTs from both organisms was monitored during the protist life cycle, revealing a potential role of two P. betae proteins, profilin and a Von Willebrand factor domain-containing protein, in the early phase of infection. This study also revealed an over-expression of some sugar beet genes involved in defence, such as those encoding PR proteins, stress resistance proteins or lectins, especially during the plasmodial stage of the P. betae life cycle. In addition to providing new information on the molecular aspects of P. betae–sugar beet interactions, this study also enabled previously unknown ESTs of P. betae to be sequenced, thus enhancing our knowledge of the genome of this protist. VL - 164 UR - http://onlinelibrary.wiley.com/doi/10.1111/aab.12095/abstract ER - TY - JOUR T1 - A novel locus for a hereditary recurrent neuropathy on chromosome 21q21. JF - Neuromuscular disorders : NMD Y1 - 2014 A1 - Calpena, E A1 - Martínez-Rubio, D A1 - Arpa, J A1 - García-Peñas, J J A1 - Montaner, D. A1 - Dopazo, J. A1 - Palau, F A1 - Espinós, C AB - Hereditary recurrent neuropathies are uncommon. 
Disorders with a known molecular basis falling within this group include hereditary neuropathy with liability to pressure palsies (HNPP) due to the deletion of the PMP22 gene or to mutations in this same gene, and hereditary neuralgic amyotrophy (HNA) caused by mutations in the SEPT9 gene. We report a three-generation family presenting a hereditary recurrent neuropathy without pathological changes in either PMP22 or SEPT9 genes. We performed a genome-wide mapping, which yielded a locus of 12.4Mb on chromosome 21q21. The constructed haplotype fully segregated with the disease and we found significant evidence of linkage. After mutational screening of genes located within this locus, encoding for proteins and microRNAs, as well as analysis of large deletions/insertions, we identified 71 benign polymorphisms. Our findings suggest a novel genetic locus for a recurrent hereditary neuropathy of which the molecular defect remains elusive. Our results further underscore the clinical and genetic heterogeneity of this group of neuropathies. VL - 24 UR - http://www.sciencedirect.com/science/article/pii/S0960896614001060# ER - TY - JOUR T1 - A large scale survey reveals that chromosomal copy-number alterations significantly affect gene modules involved in cancer initiation and progression JF - BMC Medical Genomics Y1 - 2011 A1 - Alloza, E. A1 - Fatima Al-Shahrour A1 - Cigudosa, J. C. A1 - Dopazo, J. AB -

Background

Recent observations point towards the existence of a large number of neighborhoods composed of functionally-related gene modules that lie together in the genome. This local component in the distribution of functionality across chromosomes probably affects chromosomal architecture itself by limiting the ways in which genes can be arranged and distributed across the genome. As a direct consequence, diseases such as cancer that harbor DNA copy number alterations (CNAs) can be presumed to have a symptomatology that depends strongly on modules of functionally-related genes rather than on a single "important" gene.

Methods

We carried out a systematic analysis of more than 140,000 observations of CNAs in cancers and searched for enrichment of functional gene modules associated with high frequencies of losses or gains.

Results

The analysis of CNAs in cancers clearly demonstrates the existence of a significant pattern of loss of gene modules functionally related to cancer initiation and progression, along with the amplification of modules of genes related to nonspecific defense against xenobiotics (probably chemotherapeutic agents). By extending this analysis to an Array-CGH dataset (glioblastomas) from The Cancer Genome Atlas, we demonstrate the validity of this approach for investigating the functional impact of CNAs.

Conclusions

The presented results indicate promising clinical and therapeutic implications. Our findings also point directly to the need to adopt a function-centric, rather than a gene-centric, view in understanding phenotypes or diseases harboring CNAs.

VL - 4 UR - http://www.biomedcentral.com/1755-8794/4/37 ER - TY - JOUR T1 - Analysis of chronic lymphotic leukemia transcriptomic profile: differences between molecular subgroups JF - Leuk Lymphoma Y1 - 2009 A1 - Jantus Lewintre, E. A1 - Reinoso Martin, C. A1 - Montaner, D. A1 - Marin, M. A1 - Jose Terol, M. A1 - Farras, R. A1 - Benet, I. A1 - Calvete, J. J. A1 - Dopazo, J. A1 - Garcia-Conde, J. KW - cancer KW - microarray data analysis AB -

B cell chronic lymphocytic leukemia (CLL) is a lymphoproliferative disorder with a variable clinical course. Patients with unmutated immunoglobulin heavy chain variable region (IgV(H)) genes show shorter progression-free and overall survival than patients with mutated IgV(H) genes. In addition, BCL6 mutations identify a subgroup of patients with high risk of progression. Gene expression was analysed in 36 early-stage patients using high-density microarrays. Around 150 differentially expressed genes were found according to IgV(H) mutation status, whereas no difference was found according to BCL6 mutations. Functional profiling methods allowed us to distinguish KEGG and gene ontology terms showing coordinated gene expression changes across subgroups of CLL. We validated a set of differentially expressed genes according to IgV(H) status, scoring them as putative prognostic markers in CLL. Among them, CRY1, LPL, CD82 and DUSP22 perform at least as well as ZAP70, which is currently the most widely used surrogate marker of IgV(H) status.

VL - 50 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19127482 N1 -

Jantus Lewintre, Eloisa Reinoso Martin, Cristina Montaner, David Marin, Miguel Jose Terol, Maria Farras, Rosa Benet, Isabel Calvete, Juan J Dopazo, Joaquin Garcia-Conde, Javier Research Support, Non-U.S. Gov’t England Leukemia & lymphoma Leuk Lymphoma. 2009 Jan;50(1):68-79.

ER - TY - JOUR T1 - Exploring the antimicrobial action of a carbon monoxide-releasing compound through whole-genome transcription profiling of Escherichia coli JF - Microbiology Y1 - 2009 A1 - Nobre, L. S. A1 - Fatima Al-Shahrour A1 - Dopazo, J. A1 - Saraiva, L. M. KW - Bacterial Genes KW - Bacterial/genetics KW - Biofilms Carbon Monoxide/*metabolism Escherichia coli/*genetics/metabolism Escherichia coli Proteins/genetics/metabolism *Gene Expression Profiling Gene Expression Regulation KW - Regulator Genetic Complementation Test Methionine/metabolism Microbial Viability Mutation Oligonucleotide Array Sequence Analysis Organometallic Compounds/*pharmacology Phenotype RNA AB -

We recently reported that carbon monoxide (CO) has bactericidal activity. To understand its mode of action we analysed the gene expression changes occurring when Escherichia coli, grown aerobically and anaerobically, is treated with the CO-releasing molecule CORM-2 (tricarbonyldichlororuthenium(II) dimer). Microarray analysis shows that the E. coli CORM-2 response is multifaceted, with a high number of differentially regulated genes spread through several functional categories, namely genes involved in inorganic ion transport and metabolism, regulators, and genes implicated in post-translational modification, such as chaperones. CORM-2 has a higher impact on E. coli cells grown anaerobically, as judged by the repression of genes belonging to eight functional classes which are not seen in the response of aerobically CORM-2-treated cells. The biological relevance of the variations caused by CORM-2 was substantiated by studying the CORM-2 sensitivity of selected E. coli mutants. The results show that the deletion of redox-sensing regulators SoxS and OxyR increased the sensitivity to CORM-2 and suggest that while SoxS plays an important role in protection against CORM-2 under both growth conditions, OxyR seems to participate only in the aerobic CORM-2 response. Under anaerobic conditions, we found that the heat-shock proteins IbpA and IbpB contribute to CORM-2 defence since the deletion of these genes increases the sensitivity of the strain. The induction of several met genes and the hypersensitivity to CORM-2 of the ΔmetR, ΔmetI and ΔmetN mutant strains suggest that CO has effects on the methionine metabolism of E. coli. CORM-2 also affects the transcription of several E. coli biofilm-related genes and increases biofilm formation in E. coli. In particular, the absence of tqsA or bhsA increases the resistance of E. coli to CORM-2, and deletion of tqsA leads to a strain that has lost its capacity to form biofilm upon treatment with CORM-2. 
In spite of the relatively stable nature of the CO molecule, our results show that CO is able to trigger a significant alteration in the transcriptome of E. coli which necessarily has effects in several key metabolic pathways.

VL - 155 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19246752 N1 -

Nobre, Ligia S Al-Shahrour, Fatima Dopazo, Joaquin Saraiva, Ligia M Research Support, Non-U.S. Gov’t England Microbiology (Reading, England) Microbiology. 2009 Mar;155(Pt 3):813-24.

ER - TY - JOUR T1 - Formulating and testing hypotheses in functional genomics JF - Artif Intell Med Y1 - 2009 A1 - Dopazo, J. KW - babelomics KW - gene set analysis AB -

OBJECTIVE: The ultimate goal of any genome-scale experiment is to provide a functional interpretation of the results, relating the available genomic information to the hypotheses that originated the experiment. METHODS AND RESULTS: Initially, this interpretation was made on a pre-selection of relevant genes, based on the experimental values, followed by the study of the enrichment in some functional properties. Nevertheless, functional enrichment methods were shown to have a flaw: the first step of gene selection was too stringent, given that the cooperation among genes was ignored. The assumption that modules of genes related by relevant biological properties (functionality, co-regulation, chromosomal location, etc.) are the real actors of cell biology led to the development of new procedures, inspired by systems biology criteria, generically known as gene-set methods. These methods have been successfully used to analyze transcriptomic and large-scale genotyping experiments as well as to test other genome-scale hypotheses in fields such as phylogenomics.

VL - 45 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18789659 N1 -

Dopazo, Joaquin Research Support, Non-U.S. Gov’t Netherlands Artificial intelligence in medicine Artif Intell Med. 2009 Feb-Mar;45(2-3):97-107. Epub 2008 Sep 11.

ER - TY - JOUR T1 - Babelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments JF - Nucleic Acids Res Y1 - 2008 A1 - Fatima Al-Shahrour A1 - Carbonell, J. A1 - Minguez, P. A1 - Goetz, S. A1 - A. Conesa A1 - Tarraga, J. A1 - Medina, Ignacio A1 - Alloza, E. A1 - Montaner, D. A1 - Dopazo, J. KW - babelomics KW - functional profiling AB -

We present a new version of Babelomics, a complete suite of web tools for the functional profiling of genome-scale experiments, with new and improved methods as well as more types of functional definitions. Babelomics includes different flavours of conventional functional enrichment methods as well as more advanced gene set analysis methods, which make it a unique tool among the similar resources available. In addition to the well-known functional definitions (GO, KEGG), Babelomics includes new ones such as Biocarta pathways or text mining-derived functional terms. Regulatory modules implemented include transcriptional control (Transfac, CisRed) and other levels of regulation such as miRNA-mediated interference. Moreover, Babelomics allows for sub-selection of terms in order to test more focused hypotheses. Gene annotation correspondence tables can also be imported, which allows testing with user-defined functional modules. Finally, a tool for the 'de novo' functional annotation of sequences has been included in the system, which allows as-yet unannotated organisms to be used in the program. Babelomics has been extensively re-engineered and now includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. Babelomics is available at http://www.babelomics.org.

VL - 36 UR - http://nar.oxfordjournals.org/content/36/suppl_2/W341.long N1 -

Al-Shahrour, Fatima Carbonell, Jose Minguez, Pablo Goetz, Stefan Conesa, Ana Tarraga, Joaquin Medina, Ignacio Alloza, Eva Montaner, David Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W341-6. Epub 2008 May 31.

ER - TY - JOUR T1 - Biological processes, properties and molecular wiring diagrams of candidate low-penetrance breast cancer susceptibility genes JF - BMC Med Genomics Y1 - 2008 A1 - Bonifaci, N. A1 - Berenguer, A. A1 - Diez, J. A1 - Reina, O. A1 - Medina, Ignacio A1 - Dopazo, J. A1 - Moreno, V. A1 - Pujana, M. A. KW - gene set KW - GWAS KW - SNP AB -

ABSTRACT: BACKGROUND: Recent advances in whole-genome association studies (WGASs) for human cancer risk are beginning to provide the part lists of low-penetrance susceptibility genes. However, statistical analysis in these studies is complicated by the vast number of genetic variants examined and the weak effects observed, as a result of which constraints must be incorporated into the study design and analytical approach. In this scenario, biological attributes beyond the adjusted statistics generally receive little attention and, more importantly, the fundamental biological characteristics of low-penetrance susceptibility genes have yet to be determined. METHODS: We applied an integrative approach for identifying candidate low-penetrance breast cancer susceptibility genes, their characteristics and molecular networks through the analysis of diverse sources of biological evidence. RESULTS: First, examination of the distribution of Gene Ontology terms in ordered WGAS results identified asymmetrical distribution of Cell Communication and Cell Death processes linked to risk. Second, analysis of 11 different types of molecular or functional relationships in genomic and proteomic data sets defined the "omic" properties of candidate genes: i/ differential expression in tumors relative to normal tissue; ii/ somatic genomic copy number changes correlating with gene expression levels; iii/ differentially expressed across age at diagnosis; and iv/ expression changes after BRCA1 perturbation. Finally, network modeling of the effects of variants on germline gene expression showed higher connectivity than expected by chance between novel candidates and with known susceptibility genes, which supports functional relationships and provides mechanistic hypotheses of risk. 
CONCLUSION: This study proposes that cell communication and cell death are major biological processes perturbed in risk of breast cancer conferred by low-penetrance variants, and defines the common omic properties, molecular interactions and possible functional effects of candidate genes and proteins.

VL - 1 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19094230 N1 -

Bonifaci, Nuria Berenguer, Antoni Diez, Javier Reina, Oscar Medina, Ignacio Dopazo, Joaquin Moreno, Victor Pujana, Miguel Angel England BMC medical genomics BMC Med Genomics. 2008 Dec 18;1:62.

ER - TY - JOUR T1 - CLEAR-test: combining inference for differential expression and variability in microarray data analysis JF - J Biomed Inform Y1 - 2008 A1 - Valls, J. A1 - Grau, M. A1 - Sole, X. A1 - Hernandez, P. A1 - Montaner, D. A1 - Dopazo, J. A1 - Peinado, M. A. A1 - Capella, G. A1 - Moreno, V. A1 - Pujana, M. A. KW - *Algorithms Artificial Intelligence *Data Interpretation KW - Statistical Gene Expression Profiling/*methods Gene Expression Regulation/*physiology Oligonucleotide Array Sequence Analysis/*methods Proteome/*metabolism Signal Transduction/*physiology AB -

A common goal of microarray experiments is to detect genes that are differentially expressed under distinct experimental conditions. Several statistical tests have been proposed to determine whether the observed changes in gene expression are significant. The t-test assigns a score to each gene on the basis of changes in its expression relative to its estimated variability, in such a way that genes with a higher score (in absolute values) are more likely to be significant. Most variants of the t-test use the complete set of genes to influence the variance estimate for each single gene. However, no inference is made in terms of the variability itself. Here, we highlight the problem of low observed variances in the t-test, when genes with relatively small changes are declared differentially expressed. Alternatively, the z-test could be used although, unlike the t-test, it can declare differentially expressed genes with high observed variances. To overcome this, we propose to combine the z-test, which focuses on large changes, with a chi(2) test to evaluate variability. We call this procedure CLEAR-test and we provide a combined p-value that offers a compromise between both aspects. Analysis of three publicly available microarray datasets reveals the greater performance of the CLEAR-test relative to the t-test and alternative methods. Finally, empirical and simulated data analyses demonstrate the greater reproducibility and statistical power of the CLEAR-test and z-test with respect to current alternative methods. In addition, the CLEAR-test improves the z-test by capturing reproducible genes with high variability.

VL - 41 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17597009 N1 -

Valls, Joan Grau, Monica Sole, Xavier Hernandez, Pilar Montaner, David Dopazo, Joaquin Peinado, Miguel A Capella, Gabriel Moreno, Victor Pujana, Miguel Angel Comparative Study Research Support, Non-U.S. Gov’t United States Journal of biomedical informatics J Biomed Inform. 2008 Feb;41(1):33-45. Epub 2007 May 17.

ER - TY - JOUR T1 - Controlled ovarian stimulation induces a functional genomic delay of the endometrium with potential clinical implications JF - J Clin Endocrinol Metab Y1 - 2008 A1 - Horcajadas, J. A. A1 - Minguez, P. A1 - Dopazo, J. A1 - Esteban, F. J. A1 - Dominguez, F. A1 - Giudice, L. C. A1 - Pellicer, A. A1 - Simon, C. KW - Algorithms Chorionic Gonadotropin/genetics Endometrium/cytology/pathology/*physiology/physiopathology Female Gene Expression Regulation Genome KW - Human Glutathione Peroxidase/genetics Humans Insulin-Like Growth Factor Binding Proteins/genetics Luteal Phase/physiology Luteinizing Hormone/genetics Menstrual Cycle Oligonucleotide Array Sequence Analysis Ovulation Induction/*methods RNA/genetics/isola AB -

CONTEXT: Controlled ovarian stimulation induces morphological, biochemical, and functional genomic modifications of the human endometrium during the window of implantation. OBJECTIVE: Our objective was to compare the gene expression profile of the human endometrium in natural vs. controlled ovarian stimulation cycles throughout the early-mid secretory transition using microarray technology. METHOD: Microarray data from 49 endometrial biopsies obtained from LH+1 to LH+9 (n=25) in natural cycles and from human chorionic gonadotropin (hCG) +1 to hCG+9 in controlled ovarian stimulation cycles (n=24) were analyzed using different methods, such as clustering, profiling of biological processes, and selection of differentially expressed genes, as implemented in Gene Expression Pattern Analysis Suite and Babelomics programs. RESULTS: Endometria from natural cycles followed different genomic patterns compared with controlled ovarian stimulation cycles in the transition from the pre-receptive (days LH/hCG+1 until LH/hCG+5) to the receptive phase (day LH+7/hCG+7). Specifically, we have demonstrated the existence of a 2-d delay in the activation/repression of two clusters composed by 218 and 133 genes, respectively, on day hCG+7 vs. LH+7. Many of these delayed genes belong to the class window of implantation genes affecting basic biological processes in the receptive endometrium. CONCLUSIONS: These results demonstrate that gene expression profiling of the endometrium is different between natural and controlled ovarian stimulation cycles in the receptive phase. Identification of these differentially regulated genes can be used to understand the different developmental profiles of receptive endometrium during controlled ovarian stimulation and to search for the best controlled ovarian stimulation treatment in terms of minimal endometrial impact.

VL - 93 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18697870 N1 -

Horcajadas, Jose A Minguez, Pablo Dopazo, Joaquin Esteban, Francisco J Dominguez, Francisco Giudice, Linda C Pellicer, Antonio Simon, Carlos Research Support, Non-U.S. Gov’t United States The Journal of clinical endocrinology and metabolism J Clin Endocrinol Metab. 2008 Nov;93(11):4500-10. Epub 2008 Aug 12.

ER - TY - JOUR T1 - Direct functional assessment of the composite phenotype through multivariate projection strategies JF - Genomics Y1 - 2008 A1 - A. Conesa A1 - Bro, R. A1 - Garcia-Garcia, F. A1 - Prats, J. M. A1 - Gotz, S. A1 - Kjeldahl, K. A1 - Montaner, D. A1 - Dopazo, J. KW - Breast Neoplasms/genetics Computational Biology/*methods Databases KW - Genetic Female Gene Expression Profiling/*statistics & numerical data Humans Mathematical Computing Multivariate Analysis Phenotype AB -

We present a novel approach for the analysis of transcriptomics data that integrates functional annotation of gene sets with expression values in a multivariate fashion, and directly assesses the relation of functional features to a multivariate space of response phenotypical variables. Multivariate projection methods are used to obtain new correlated variables for a set of genes that share a given function. These new functional variables are then related to the response variables of interest. The analysis of the principal directions of the multivariate regression allows for the identification of gene function features correlated with the phenotype. Two different transcriptomics studies are used to illustrate the statistical and interpretative aspects of the methodology. We demonstrate the superiority of the proposed method over equivalent approaches.

VL - 92 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18652888 N1 -

Conesa, Ana Bro, Rasmus Garcia-Garcia, Francisco Prats, Jose Manuel Gotz, Stefan Kjeldahl, Karin Montaner, David Dopazo, Joaquin Evaluation Studies Research Support, Non-U.S. Gov’t United States Genomics Genomics. 2008 Dec;92(6):373-83. Epub 2008 Sep 13.

ER - TY - JOUR T1 - GEPAS, a web-based tool for microarray data analysis and interpretation JF - Nucleic Acids Res Y1 - 2008 A1 - Tarraga, J. A1 - Medina, Ignacio A1 - Carbonell, J. A1 - Huerta-Cepas, J. A1 - Minguez, P. A1 - Alloza, E. A1 - Fatima Al-Shahrour A1 - Vegas-Azcarate, S. A1 - Goetz, S. A1 - Escobar, P. A1 - Garcia-Garcia, F. A1 - A. Conesa A1 - Montaner, D. A1 - Dopazo, J. KW - gepas KW - microarray data analysis AB -

Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During its more than 5 years of activity it has been continuously updated to keep pace with the state of the art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well-established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module allows automating the execution of sequential analysis steps by means of a simple but powerful graphic interface. An extensive re-engineering of GEPAS has been carried out, which includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. GEPAS is currently the most cited web tool in its field; it is extensively used by researchers in many countries, and its records indicate an average usage rate of 500 experiments per day. GEPAS is available at http://www.gepas.org.

VL - 36 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18508806 N1 -

Tarraga, Joaquin Medina, Ignacio Carbonell, Jose Huerta-Cepas, Jaime Minguez, Pablo Alloza, Eva Al-Shahrour, Fatima Vegas-Azcarate, Susana Goetz, Stefan Escobar, Pablo Garcia-Garcia, Francisco Conesa, Ana Montaner, David Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W308-14. Epub 2008 May 28.

ER - TY - JOUR T1 - Interoperability with Moby 1.0–it’s better than sharing your toothbrush! JF - Brief Bioinform Y1 - 2008 A1 - Wilkinson, M. D. A1 - Senger, M. A1 - Kawas, E. A1 - Bruskiewich, R. A1 - Gouzy, J. A1 - Noirot, C. A1 - Bardou, P. A1 - Ng, A. A1 - Haase, D. A1 - Saiz Ede, A. A1 - Wang, D. A1 - Gibbons, F. A1 - Gordon, P. M. A1 - Sensen, C. W. A1 - Carrasco, J. M. A1 - Fernandez, J. M. A1 - Shen, L. A1 - Links, M. A1 - Ng, M. A1 - Opushneva, N. A1 - Neerincx, P. B. A1 - Leunissen, J. A. A1 - Ernst, R. A1 - Twigger, S. A1 - Usadel, B. A1 - Good, B. A1 - Wong, Y. A1 - Stein, L. A1 - Crosby, W. A1 - Karlsson, J. A1 - Royo, R. A1 - Parraga, I. A1 - Ramirez, S. A1 - Gelpi, J. L. A1 - Trelles, O. A1 - Pisano, D. G. A1 - Jimenez, N. A1 - Kerhornou, A. A1 - Rosset, R. A1 - Zamacola, L. A1 - Tarraga, J. A1 - Huerta-Cepas, J. A1 - Carazo, J. M. A1 - Dopazo, J. A1 - R. Guigo A1 - Navarro, A. A1 - Orozco, M. A1 - Valencia, A. A1 - Claros, M. G. A1 - Perez, A. J. A1 - Aldana, J. A1 - Rojano, M. M. A1 - Fernandez-Santa Cruz, R. A1 - Navas, I. A1 - Schiltz, G. A1 - Farmer, A. A1 - Gessler, D. A1 - Schoof, H. A1 - Groscurth, A. KW - Computational Biology/*methods *Database Management Systems *Databases KW - Factual Information Storage and Retrieval/*methods *Internet *Programming Languages Systems Integration AB -

The BioMoby project was initiated in 2001 from within the model organism database community. It aimed to standardize methodologies to facilitate information exchange and access to analytical resources, using a consensus driven approach. Six years later, the BioMoby development community is pleased to announce the release of the 1.0 version of the interoperability framework, registry Application Programming Interface and supporting Perl and Java code-bases. Together, these provide interoperable access to over 1400 bioinformatics resources worldwide through the BioMoby platform, and this number continues to grow. Here we highlight and discuss the features of BioMoby that make it distinct from other Semantic Web Service and interoperability initiatives, and that have been instrumental to its deployment and use by a wide community of bioinformatics service providers. The standard, client software, and supporting code libraries are all freely available at http://www.biomoby.org/.

VL - 9 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18238804 N1 -

BioMoby Consortium Wilkinson, Mark D Senger, Martin Kawas, Edward Bruskiewich, Richard Gouzy, Jerome Noirot, Celine Bardou, Philippe Ng, Ambrose Haase, Dirk Saiz, Enrique de Andres Wang, Dennis Gibbons, Frank Gordon, Paul M K Sensen, Christoph W Carrasco, Jose Manuel Rodriguez Fernandez, Jose M Shen, Lixin Links, Matthew Ng, Michael Opushneva, Nina Neerincx, Pieter B T Leunissen, Jack A M Ernst, Rebecca Twigger, Simon Usadel, Bjorn Good, Benjamin Wong, Yan Stein, Lincoln Crosby, William Karlsson, Johan Royo, Romina Parraga, Ivan Ramirez, Sergio Gelpi, Josep Lluis Trelles, Oswaldo Pisano, David G Jimenez, Natalia Kerhornou, Arnaud Rosset, Roman Zamacola, Leire Tarraga, Joaquin Huerta-Cepas, Jaime Carazo, Jose Maria Dopazo, Joaquin Guigo, Roderic Navarro, Arcadi Orozco, Modesto Valencia, Alfonso Claros, M Gonzalo Perez, Antonio J Aldana, Jose Rojano, M Mar Fernandez-Santa Cruz, Raul Navas, Ismael Schiltz, Gary Farmer, Andrew Gessler, Damian Schoof, Heiko Groscurth, Andreas Research Support, Non-U.S. Gov’t Review England Briefings in bioinformatics Brief Bioinform. 2008 May;9(3):220-31. Epub 2008 Jan 31.

ER - TY - JOUR T1 - Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases JF - Nucleic Acids Res Y1 - 2008 A1 - Reumers, J. A1 - L. Conde A1 - Medina, Ignacio A1 - Maurer-Stroh, S. A1 - Van Durme, J. A1 - Dopazo, J. A1 - Rousseau, F. A1 - Schymkowitz, J. KW - Amino Acid Substitution Animals *Databases KW - Genetic Genetic Diseases KW - Inborn/genetics HSP70 Heat-Shock Proteins/metabolism Humans Internet Mice MicroRNAs/metabolism *Mutation *Polymorphism KW - Single Nucleotide Proteins/chemistry/genetics RNA Splice Sites Rats Transcription Factors/metabolism AB -

Single nucleotide polymorphisms (SNPs) are, together with copy number variation, the primary source of variation in the human genome. SNPs are associated with altered response to drug treatment, susceptibility to disease and other phenotypic variation. Furthermore, during genetic screens for disease-associated mutations in groups of patients and control individuals, the distinction between disease-causing mutation and polymorphism is often unclear. Annotation of the functional and structural implications of single nucleotide changes thus provides valuable information to interpret and guide experiments. The SNPeffect and PupaSuite databases are now synchronized to deliver annotations for both non-coding and coding SNPs, as well as annotations for the SwissProt set of human disease mutations. In addition, SNPeffect now contains predictions of Tango2: an improved aggregation detector, and Waltz: a novel predictor of amyloid-forming sequences, as well as improved predictors for regions that are recognized by the Hsp70 family of chaperones. The new PupaSuite version incorporates predictions for SNPs in silencers and miRNAs including their targets, as well as additional methods for predicting SNPs in TFBSs and splice sites. Predictions for mouse and rat genomes have also been added. In addition, a PupaSuite web service has been developed to enable programmatic data access. The combined database holds annotations for 4,965,073 regulatory as well as 133,505 coding human SNPs and 14,935 disease mutations, and phenotypic descriptions of 43,797 human proteins, and is accessible via http://snpeffect.vib.be and http://pupasuite.bioinfo.cipf.es/.

VL - 36 UR - http://nar.oxfordjournals.org/cgi/content/full/36/suppl_1/D825 N1 -

Reumers, Joke Conde, Lucia Medina, Ignacio Maurer-Stroh, Sebastian Van Durme, Joost Dopazo, Joaquin Rousseau, Frederic Schymkowitz, Joost Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2008 Jan;36(Database issue):D825-9. Epub 2007 Dec 17.

ER - TY - JOUR T1 - Molecular profiling related to poor prognosis in thyroid carcinoma. Combining gene expression data and biological information JF - Oncogene Y1 - 2008 A1 - Montero-Conde, C. A1 - Martin-Campos, J. M. A1 - Lerma, E. A1 - Gimenez, G. A1 - Martinez-Guitarte, J. L. A1 - Combalia, N. A1 - Montaner, D. A1 - Matias-Guiu, X. A1 - Dopazo, J. A1 - de Leiva, A. A1 - M. Robledo A1 - Mauricio, D. KW - Adenoma/genetics/metabolism/pathology Adolescent Adult Aged Carcinoma/genetics/metabolism/pathology Carcinoma KW - Biological/*genetics/metabolism KW - Neoplasm/genetics/metabolism Reverse Transcriptase Polymerase Chain Reaction Signal Transduction Thyroid Neoplasms/classification/*genetics/metabolism Tumor Markers KW - Neoplastic Humans Male Middle Aged *Oligonucleotide Array Sequence Analysis Prognosis RNA KW - Papillary/genetics/metabolism/pathology Cell Differentiation Female *Gene Expression Profiling *Gene Expression Regulation AB -

Undifferentiated and poorly differentiated thyroid tumors are responsible for more than half of thyroid cancer patient deaths in spite of their low incidence. Conventional treatments do not obtain substantial benefits, and the lack of alternative approaches limits patient survival. Additionally, the absence of prognostic markers for well-differentiated tumors complicates patient-specific treatments and favors the progression of recurrent forms. In order to recognize the molecular basis involved in tumor dedifferentiation and identify potential markers for thyroid cancer prognosis prediction, we analysed the expression profile of 44 thyroid primary tumors with different degrees of dedifferentiation and aggressiveness using cDNA microarrays. Transcriptome comparison of dedifferentiated and well-differentiated thyroid tumors identified 1031 genes with >2-fold difference in absolute values and false discovery rate of <0.15. According to known molecular interaction and reaction networks, the products of these genes were mainly clustered in the MAPkinase signaling pathway, the TGF-beta signaling pathway, focal adhesion and cell motility, activation of actin polymerization and cell cycle. An exhaustive search in several databases allowed us to identify various members of the matrix metalloproteinase, melanoma antigen A and collagen gene families within the upregulated gene set. We also identified a prognosis classifier comprising just 30 transcripts with an overall accuracy of 95%. These findings may clarify the molecular mechanisms involved in thyroid tumor dedifferentiation and provide a potential prognosis predictor as well as targets for new therapies.

VL - 27 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17873908 N1 -

Montero-Conde, C Martin-Campos, J M Lerma, E Gimenez, G Martinez-Guitarte, J L Combalia, N Montaner, D Matias-Guiu, X Dopazo, J de Leiva, A Robledo, M Mauricio, D Research Support, Non-U.S. Gov’t England Oncogene Oncogene. 2008 Mar 6;27(11):1554-61. Epub 2007 Sep 17.

ER - TY - JOUR T1 - PhylomeDB: a database for genome-wide collections of gene phylogenies JF - Nucleic Acids Res Y1 - 2008 A1 - Huerta-Cepas, J. A1 - Bueno, A. A1 - Dopazo, J. A1 - Gabaldón, T. KW - Ancient Humans *Phylogeny Proteins/classification/genetics Saccharomyces cerevisiae/classification/genetics Sequence Alignment KW - Base Sequence Escherichia coli/classification/genetics Genes *Genomics History AB - The complete collection of evolutionary histories of all genes in a genome, also known as phylome, constitutes a valuable source of information. The reconstruction of phylomes has been previously prevented by large demands of time and computer power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. For each genome, PhylomeDB provides the alignments, phylogenetic trees and tree-based orthology predictions for every single encoded protein. The current version of PhylomeDB includes the phylomes of Human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.es. VL - 36 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17962297 N1 - Huerta-Cepas, Jaime Bueno, Anibal Dopazo, Joaquin Gabaldon, Toni Historical Article Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2008 Jan;36(Database issue):D491-6. Epub 2007 Oct 25.
ER - TY - JOUR T1 - SNP and haplotype mapping for genetic analysis in the rat JF - Nat Genet Y1 - 2008 A1 - K. Saar A1 - A. Beck A1 - M. T. Bihoreau A1 - E. Birney A1 - D. Brocklebank A1 - Y. Chen A1 - E. Cuppen A1 - S. Demonchy A1 - Dopazo, J. A1 - P. Flicek A1 - M. Foglio A1 - A. Fujiyama A1 - I. G. Gut A1 - D. Gauguier A1 - R. Guigo A1 - V. Guryev A1 - M. Heinig A1 - O. Hummel A1 - N. Jahn A1 - S. Klages A1 - V. Kren A1 - M. Kube A1 - H. Kuhl A1 - Kuramoto, T. A1 - Kuroki, Y. A1 - Lechner, D. A1 - Lee, Y. A. A1 - Lopez-Bigas, N. A1 - Lathrop, G. M. A1 - Mashimo, T. A1 - Medina, Ignacio A1 - Mott, R. A1 - Patone, G. A1 - Perrier-Cornet, J. A. A1 - Platzer, M. A1 - Pravenec, M. A1 - Reinhardt, R. A1 - Sakaki, Y. A1 - Schilhabel, M. A1 - Schulz, H. A1 - Serikawa, T. A1 - Shikhagaie, M. A1 - Tatsumoto, S. A1 - Taudien, S. A1 - Toyoda, A. A1 - Voigt, B. A1 - Zelenika, D. A1 - Zimdahl, H. A1 - Hubner, N. KW - Animals Chromosome Mapping *Databases KW - Genetic KW - Genetic Genome *Haplotypes Linkage Disequilibrium Phylogeny *Polymorphism KW - Inbred Strains/*genetics Recombination KW - Single Nucleotide *Quantitative Trait Loci Rats/*genetics Rats AB -

The laboratory rat is one of the most extensively studied model organisms. Inbred laboratory rat strains originated from limited Rattus norvegicus founder populations, and the inherited genetic variation provides an excellent resource for the correlation of genotype to phenotype. Here, we report a survey of genetic variation based on almost 3 million newly identified SNPs. We obtained accurate and complete genotypes for a subset of 20,238 SNPs across 167 distinct inbred rat strains, two rat recombinant inbred panels and an F2 intercross. Using 81% of these SNPs, we constructed high-density genetic maps, creating a large dataset of fully characterized SNPs for disease gene mapping. Our data characterize the population structure and illustrate the degree of linkage disequilibrium. We provide a detailed SNP map and demonstrate its utility for mapping of quantitative trait loci. This community resource is openly available and augments the genetic tools for this workhorse of physiological studies.

VL - 40 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18443594 N1 -

STAR Consortium Saar, Kathrin Beck, Alfred Bihoreau, Marie-Therese Birney, Ewan Brocklebank, Denise Chen, Yuan Cuppen, Edwin Demonchy, Stephanie Dopazo, Joaquin Flicek, Paul Foglio, Mario Fujiyama, Asao Gut, Ivo G Gauguier, Dominique Guigo, Roderic Guryev, Victor Heinig, Matthias Hummel, Oliver Jahn, Niels Klages, Sven Kren, Vladimir Kube, Michael Kuhl, Heiner Kuramoto, Takashi Kuroki, Yoko Lechner, Doris Lee, Young-Ae Lopez-Bigas, Nuria Lathrop, G Mark Mashimo, Tomoji Medina, Ignacio Mott, Richard Patone, Giannino Perrier-Cornet, Jeanne-Antide Platzer, Matthias Pravenec, Michal Reinhardt, Richard Sakaki, Yoshiyuki Schilhabel, Markus Schulz, Herbert Serikawa, Tadao Shikhagaie, Medya Tatsumoto, Shouji Taudien, Stefan Toyoda, Atsushi Voigt, Birger Zelenika, Diana Zimdahl, Heike Hubner, Norbert 057733/Z/99/A/Wellcome Trust/United Kingdom 066780/Z/01/Z/Wellcome Trust/United Kingdom Research Support, Non-U.S. Gov’t Technical Report United States Nature genetics Nat Genet. 2008 May;40(5):560-6.

ER - TY - JOUR T1 - Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans JF - Hum Mutat Y1 - 2008 A1 - E. Capriotti A1 - Arbiza, L. A1 - Casadio, R. A1 - Dopazo, J. A1 - H. Dopazo A1 - M. A. Marti-Renom KW - Algorithms Codon/genetics Computational Biology/*methods *DNA Mutational Analysis Databases KW - Human Humans Iduronic Acid/analogs & derivatives/metabolism *Point Mutation Polymorphism KW - Molecular *Genetic Predisposition to Disease Genetic Variation Genome KW - Protein *Evolution KW - Single Nucleotide Proteins/chemistry/*genetics Tumor Suppressor Protein p53/genetics AB - Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited their source data to either protein or gene information. Novel in this work, we take advantage of both and focus on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease or not. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single-point mutations. 
Moreover, this study demonstrates the synergic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. The results of large-scale application of SeqProfCod over all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007), could be used to support clinical studies. VL - 29 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17935148 N1 - Capriotti, Emidio Arbiza, Leonardo Casadio, Rita Dopazo, Joaquin Dopazo, Hernan Marti-Renom, Marc A Evaluation Studies Research Support, Non-U.S. Gov’t United States Human mutation Hum Mutat. 2008 Jan;29(1):198-204. ER - TY - JOUR T1 - The AnnoLite and AnnoLyze programs for comparative annotation of protein structures JF - BMC Bioinformatics Y1 - 2007 A1 - M. A. Marti-Renom A1 - Rossi, A. A1 - Fatima Al-Shahrour A1 - Davis, F. P. A1 - Pieper, U. A1 - Dopazo, J. A1 - Sali, A. KW - *Algorithms Amino Acid Sequence Confidence Intervals Data Interpretation KW - Amino Acid *Software Structure-Activity Relationship KW - Protein Information Storage and Retrieval/methods Molecular Sequence Data Proteins/*chemistry/classification/*metabolism Sensitivity and Specificity Sequence Alignment/*methods Sequence Analysis KW - Protein/*methods Sequence Homology KW - Statistical *Databases AB - BACKGROUND: Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. 
We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. DESCRIPTION: AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of 90% and average precision of 80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of 70% and average precision of 30%, correctly localizing binding sites for small molecules in 95% of its predictions. CONCLUSION: The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at http://salilab.org/DBAli/. VL - 8 Suppl 4 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17570147 N1 - Marti-Renom, Marc A Rossi, Andrea Al-Shahrour, Fatima Davis, Fred P Pieper, Ursula Dopazo, Joaquin Sali, Andrej Research Support, Non-U.S. Gov’t England BMC bioinformatics BMC Bioinformatics. 2007 May 22;8 Suppl 4:S4. ER - TY - JOUR T1 - Association study of 69 genes in the ret pathway identifies low-penetrance loci in sporadic medullary thyroid carcinoma JF - Cancer Res Y1 - 2007 A1 - Ruiz-Llorente, S. A1 - Montero-Conde, C. A1 - Milne, R. L. A1 - Moya, C. M. A1 - Cebrian, A. A1 - Leton, R. A1 - Cascon, A. A1 - Mercadillo, F. A1 - Landa, I. A1 - Borrego, S. A1 - Perez de Nanclares, G. A1 - Alvarez-Escola, C. A1 - Diaz-Perez, J. A. A1 - Carracedo, A. A1 - Urioste, M. A1 - Gonzalez-Neira, A. A1 - Benitez, J. A1 - Santisteban, P. A1 - Dopazo, J. A1 - Ponder, B. A. A1 - M. 
Robledo KW - 80 and over Carcinoma KW - Adolescent Adult Aged Aged KW - Genetic KW - Genetic Proto-Oncogene Proteins c-ret/*genetics/metabolism Signal Transduction Thyroid Neoplasms/*genetics/metabolism Transcription KW - Medullary/*genetics/metabolism Case-Control Studies Cyclin-Dependent Kinase Inhibitor p15/biosynthesis/genetics Female Genetic Predisposition to Disease Germ-Line Mutation Haplotypes Humans Male Middle Aged Penetrance Polymorphism KW - Single Nucleotide Promoter Regions AB - To date, few association studies have been done to better understand the genetic basis for the development of sporadic medullary thyroid carcinoma (sMTC). To identify additional low-penetrance genes, we have done a two-stage case-control study in two European populations using high-throughput genotyping. We selected 417 single nucleotide polymorphisms (SNP) belonging to 69 genes either related to RET signaling pathway/functions or involved in key processes for cancer development. TagSNPs and functional variants were included where possible. These SNPs were initially studied in the largest known series of sMTC cases (n = 266) and controls (n = 422), all of Spanish origin. In stage II, an independent British series of 155 sMTC patients and 531 controls was included to validate the previous results. Associations were assessed by an exhaustive analysis of individual SNPs but also considering gene- and linkage disequilibrium-based haplotypes. This strategy allowed us to identify seven low-penetrance genes, six of them (STAT1, AURKA, BCL2, CDKN2B, CDK6, and COMT) consistently associated with sMTC risk in the two case-control series and a seventh (HRAS) with individual SNPs and haplotypes associated with sMTC in the Spanish data set. The potential role of CDKN2B was confirmed by a functional assay showing a role of a SNP (rs7044859) in the promoter region in altering the binding of the transcription factor HNF1. 
These results highlight the utility of association studies using homogeneous series of cases for better understanding complex diseases. VL - 67 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17909067 N1 - Ruiz-Llorente, Sergio Montero-Conde, Cristina Milne, Roger L Moya, Christian M Cebrian, Arancha Leton, Rocio Cascon, Alberto Mercadillo, Fatima Landa, Inigo Borrego, Salud Perez de Nanclares, Guiomar Alvarez-Escola, Cristina Diaz-Perez, Jose Angel Carracedo, Angel Urioste, Miguel Gonzalez-Neira, Anna Benitez, Javier Santisteban, Pilar Dopazo, Joaquin Ponder, Bruce A Robledo, Mercedes Medullary Thyroid Carcinoma Clinical Group Research Support, Non-U.S. Gov’t United States Cancer research Cancer Res. 2007 Oct 1;67(19):9561-7. ER - TY - JOUR T1 - DBAli tools: mining the protein structure space JF - Nucleic Acids Res Y1 - 2007 A1 - M. A. Marti-Renom A1 - Pieper, U. A1 - Madhusudhan, M. S. A1 - Rossi, A. A1 - Eswar, N. A1 - Davis, F. P. A1 - Fatima Al-Shahrour A1 - Dopazo, J. A1 - Sali, A. KW - *Algorithms Amino Acid Sequence Computational Biology/*methods Data Interpretation KW - Amino Acid *Software Structure-Activity Relationship KW - Protein Internet Molecular Sequence Data Protein Conformation Proteins/*chemistry/classification/*metabolism Pseudomonas aeruginosa/*metabolism Sequence Alignment/*methods Sequence Analysis KW - Protein/*methods Sequence Homology KW - Statistical *Databases AB - The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). 
These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions. VL - 35 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17478513 N1 - Marti-Renom, Marc A Pieper, Ursula Madhusudhan, M S Rossi, Andrea Eswar, Narayanan Davis, Fred P Al-Shahrour, Fatima Dopazo, Joaquin Sali, Andrej GM 62529/GM/NIGMS NIH HHS/United States GM074929/GM/NIGMS NIH HHS/United States GM54762/GM/NIGMS NIH HHS/United States GM71790/GM/NIGMS NIH HHS/United States Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2007 Jul;35(Web Server issue):W393-7. Epub 2007 May 3. ER - TY - JOUR T1 - Evidence for systems-level molecular mechanisms of tumorigenesis JF - BMC Genomics Y1 - 2007 A1 - Hernandez, P. A1 - Huerta-Cepas, J. A1 - Montaner, D. A1 - Fatima Al-Shahrour A1 - Valls, J. A1 - Gomez, L. A1 - Capella, G. A1 - Dopazo, J. A1 - Pujana, M. A. 
KW - *Cell Transformation KW - Biological Models KW - Genetic Models KW - Messenger/metabolism Signal Transduction Systems Biology KW - Neoplastic *Gene Expression Profiling *Gene Expression Regulation KW - Neoplastic Humans Male Models KW - Statistical Neoplasm Proteins/*physiology Neoplasms/etiology/*genetics Prostatic Neoplasms/genetics Protein Interaction Mapping RNA AB - BACKGROUND: Cancer arises from the consecutive acquisition of genetic alterations. Increasing evidence suggests that as a consequence of these alterations, molecular interactions are reprogrammed in the context of highly connected and regulated cellular networks. Coordinated reprogramming would allow the cell to acquire the capabilities for malignant growth. RESULTS: Here, we determine the coordinated function of cancer gene products (i.e., proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts, hereafter referred to as "CGPs") defined as their topological properties and organization in the interactome network. We show that CGPs are central to information exchange and propagation and that they are specifically organized to promote tumorigenesis. Centrality is identified by both local (degree) and global (betweenness and closeness) measures, and systematically appears in down-regulated CGPs. Up-regulated CGPs do not consistently exhibit centrality, but both types of cancer products determine the overall integrity of the network structure. In addition to centrality, down-regulated CGPs show topological association that correlates with common biological processes and pathways involved in tumorigenesis. 
CONCLUSION: Given the current limited coverage of the human interactome, this study proposes that tumorigenesis takes place in a specific and organized way at the molecular systems-level and suggests a model that comprises the precise down-regulation of groups of topologically-associated proteins involved in particular functions, orchestrated with the up-regulation of specific proteins. VL - 8 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17584915 N1 - Hernandez, Pilar Huerta-Cepas, Jaime Montaner, David Al-Shahrour, Fatima Valls, Joan Gomez, Laia Capella, Gabriel Dopazo, Joaquin Pujana, Miguel Angel Research Support, Non-U.S. Gov’t England BMC genomics BMC Genomics. 2007 Jun 20;8:185. ER - TY - JOUR T1 - FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments JF - Nucleic Acids Res Y1 - 2007 A1 - Fatima Al-Shahrour A1 - Minguez, P. A1 - Tarraga, J. A1 - Medina, Ignacio A1 - Alloza, E. A1 - Montaner, D. A1 - Dopazo, J. KW - babelomics KW - functional enrichment analysis AB -

The ultimate goal of any genome-scale experiment is to provide a functional interpretation of the data, relating the available information to the hypotheses that originated the experiment. Thus, functional profiling methods have become essential in diverse scenarios such as microarray experiments, proteomics, etc. We present FatiGO+, a web-based tool for the functional profiling of genome-scale experiments, especially oriented to the interpretation of microarray experiments. In addition to different functional annotations (gene ontology, KEGG pathways, Interpro motifs, Swissprot keywords and text-mining-based bioentities related to diseases and chemical compounds), FatiGO+ includes, as a novelty, regulatory and structural information. The regulatory information used includes predictions of targets for distinct regulatory elements (obtained from the Transfac and CisRed databases). Additionally, FatiGO+ uses predictions of target motifs of miRNA to infer which of these can be activated or deactivated in the sample of genes studied. Finally, properties of gene products related to their relative location and connections in the interactome have also been used. Also, enrichment of any of these functional terms can be directly analysed on chromosomal coordinates. FatiGO+ can be found at: http://www.fatigoplus.org and within the Babelomics environment http://www.babelomics.org.

VL - 35 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17478504 N1 -

Al-Shahrour, Fatima Minguez, Pablo Tarraga, Joaquin Medina, Ignacio Alloza, Eva Montaner, David Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2007 Jul;35(Web Server issue):W91-6. Epub 2007 May 3.

ER - TY - JOUR T1 - From genes to functional classes in the study of biological systems JF - BMC Bioinformatics Y1 - 2007 A1 - Fatima Al-Shahrour A1 - Arbiza, L. A1 - H. Dopazo A1 - Huerta-Cepas, J. A1 - Minguez, P. A1 - Montaner, D. A1 - Dopazo, J. KW - Algorithms Chromosome Mapping/*methods Computer Simulation Gene Expression Profiling/methods *Models KW - babelomics KW - Biological Multigene Family/*physiology Signal Transduction/*physiology *Software Systems Biology/*methods *User-Computer Interface AB -

BACKGROUND: With the popularization of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired by systems biology criteria have started to be developed. RESULTS: Here we present FatiScan, a web-based program which implements a threshold-independent test for the functional interpretation of large-scale experiments that does not depend on the pre-selection of genes based on the multiple application of independent tests to each gene. The test implemented aims to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes. In addition, the test does not depend on the type of data used for obtaining significance values, and consequently different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from CisRed) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. CONCLUSION: Methods for gene set enrichment which, in addition, are independent of the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. A web server that performs the test described and other similar ones can be found at: http://www.babelomics.org.

VL - 8 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17407596 N1 -

Al-Shahrour, Fatima Arbiza, Leonardo Dopazo, Hernan Huerta-Cepas, Jaime Minguez, Pablo Montaner, David Dopazo, Joaquin Research Support, Non-U.S. Gov’t England BMC bioinformatics BMC Bioinformatics. 2007 Apr 3;8:114.

ER - TY - JOUR T1 - Functional profiling and gene expression analysis of chromosomal copy number alterations JF - Bioinformation Y1 - 2007 A1 - L. Conde A1 - Montaner, D. A1 - Burguet-Castell, J. A1 - Tarraga, J. A1 - Fatima Al-Shahrour A1 - Dopazo, J. KW - babelomics AB -

Contrary to the traditional view, in which only one or a few key genes were supposed to be the causative factors of diseases, we discuss the importance of considering groups of functionally related genes in the study of pathologies characterised by chromosomal copy number alterations. Recent observations have reported the existence of regions in higher eukaryotic chromosomes (including humans) containing genes of related function that show a high degree of coregulation. Copy number alterations will consequently affect clusters of functionally related genes, which will, in many cases, be the final causative agents of the diseased phenotype. Therefore, we propose that the functional profiling of the regions affected by copy number alterations must be an important aspect to take into account in understanding this type of pathology. To illustrate this, we present an integrated study of DNA copy number variations and gene expression, along with the functional profiling of chromosomal regions, in a case of multiple myeloma.

VL - 1 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17597935 N1 -

Conde, Lucia Montaner, David Burguet-Castell, Jordi Tarraga, Joaquin Al-Shahrour, Fatima Dopazo, Joaquin Singapore Bioinformation Bioinformation. 2007 Apr 10;1(10):432-5.

ER - TY - JOUR T1 - Functional profiling of microarray experiments using text-mining derived bioentities JF - Bioinformatics Y1 - 2007 A1 - Minguez, P. A1 - Fatima Al-Shahrour A1 - Montaner, D. A1 - Dopazo, J. KW - Artificial Intelligence *Databases KW - babelomics KW - Protein Gene Expression Profiling/*methods Information Storage and Retrieval/*methods *Natural Language Processing Proteins/*classification/*metabolism Research/*methods Systems Integration AB -

MOTIVATION: The increasing use of microarray technologies has brought about a parallel demand for methods for the functional interpretation of the results. Beyond the conventional functional annotations for genes, such as gene ontology, pathways, etc., other sources of information are still to be exploited. Text-mining methods allow the extraction of informative terms (bioentities) with different functional, chemical, clinical, etc. meanings that can be associated with genes. We show how to use these associations within an appropriate statistical framework and how to apply them through easy-to-use, web-based environments to the functional interpretation of microarray experiments. Functional enrichment and gene set enrichment tests using bioentities are presented.

VL - 23 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17855415 N1 -

Minguez, Pablo Al-Shahrour, Fatima Montaner, David Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2007 Nov 15;23(22):3098-9. Epub 2007 Sep 13.

ER - TY - JOUR T1 - The human phylome JF - Genome Biol Y1 - 2007 A1 - Huerta-Cepas, J. A1 - H. Dopazo A1 - Dopazo, J. A1 - Gabaldón, T. KW - Animals *Evolution Evolution KW - DNA KW - Molecular Gene Duplication *Genome Humans *Phylogeny Proteins/genetics Sequence Analysis AB - BACKGROUND: Phylogenomics analyses serve to establish evolutionary relationships among organisms and their genes. A phylome, the complete collection of all gene phylogenies in a genome, constitutes a valuable source of information, but its use in large genomes still constitutes a technical challenge. The use of phylomes also requires the development of new methods that help us to interpret them. RESULTS: We reconstruct here the human phylome, which includes the evolutionary relationships of all human proteins and their homologs among 39 fully sequenced eukaryotes. Phylogenetic techniques used include alignment trimming, branch length optimization, evolutionary model testing and maximum likelihood and Bayesian methods. Although differences with alternative topologies are minor, most of the trees support the Coelomata and Unikont hypotheses as well as the grouping of primates with Laurasiatheria to the exclusion of rodents. We assess the extent of gene duplication events and their relationship with the functional roles of the protein families involved. We find support for at least one, and probably two, rounds of whole genome duplications before vertebrate radiation. Using a novel algorithm that is independent from a species phylogeny, we derive orthology and paralogy relationships of human proteins among eukaryotic genomes. CONCLUSION: Topological variations among phylogenies for different genes are to be expected, highlighting the danger of gene-sampling effects in phylogenomic analyses. Several links can be established between the functions of gene families duplicated at certain phylogenetic splits and major evolutionary transitions in those lineages.
The pipeline implemented here can be easily adapted for use in other organisms. VL - 8 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17567924 N1 - Huerta-Cepas, Jaime Dopazo, Hernan Dopazo, Joaquin Gabaldon, Toni Research Support, Non-U.S. Gov’t England Genome biology Genome Biol. 2007;8(6):R109. ER - TY - JOUR T1 - ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling JF - Nucleic Acids Res Y1 - 2007 A1 - L. Conde A1 - Montaner, D. A1 - Burguet-Castell, J. A1 - Tarraga, J. A1 - Medina, Ignacio A1 - Fatima Al-Shahrour A1 - Dopazo, J. KW - Animals Cluster Analysis Computational Biology/*methods Computer Graphics Gene Expression Profiling/*methods Humans Internet Models KW - Genetic *Nucleic Acid Hybridization Oligonucleotide Array Sequence Analysis/*methods Programming Languages *Software Systems Integration User-Computer Interface AB - We present the ISACGH, a web-based system that allows for the combination of genomic data with gene expression values and provides different options for functional profiling of the regions found. Several visualization options offer a convenient representation of the results. Different efficient methods for accurate estimation of genomic copy number from array-CGH hybridization data have been included in the program. Moreover, the connection to the gene expression analysis package GEPAS allows the use of different facilities for data pre-processing and analysis. A DAS server allows exporting the results to the Ensembl viewer where contextual genomic information can be obtained. The program is freely available at: http://isacgh.bioinfo.cipf.es or within http://www.gepas.org. 
VL - 35 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17468499 N1 - Conde, Lucia Montaner, David Burguet-Castell, Jordi Tarraga, Joaquin Medina, Ignacio Al-Shahrour, Fatima Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2007 Jul;35(Web Server issue):W81-5. Epub 2007 Apr 27. ER - TY - JOUR T1 - Phylemon: a suite of web tools for molecular evolution, phylogenetics and phylogenomics JF - Nucleic Acids Res Y1 - 2007 A1 - Tarraga, J. A1 - Medina, Ignacio A1 - Arbiza, L. A1 - Huerta-Cepas, J. A1 - Gabaldón, T. A1 - Dopazo, J. A1 - H. Dopazo KW - Animals Computational Biology/*methods Databases KW - DNA Sequence Analysis KW - Genetic Evolution KW - Molecular Genetic Techniques Humans *Internet Models KW - Protein Software User-Computer Interface KW - Statistical *Phylogeny Programming Languages Sequence Alignment Sequence Analysis AB - Phylemon is an online platform for phylogenetic and evolutionary analyses of molecular sequence data. It has been developed as a web server that integrates a suite of different tools selected among the most popular stand-alone programs in phylogenetic and evolutionary analysis. It has been conceived as a natural response to the increasing demand of data analysis of many experimental scientists wishing to add a molecular evolution and phylogenetics insight into their research. Tools included in Phylemon cover a wide yet selected range of programs: from the most basic for multiple sequence alignment to elaborate statistical methods of phylogenetic reconstruction including methods for evolutionary rates analyses and molecular adaptation. 
Phylemon has several features that differentiate it from other resources: (i) it offers an integrated environment that enables the direct concatenation of evolutionary analyses and the storage of results, and handles required data format conversions, (ii) once an outfile is produced, Phylemon suggests the next possible analyses, thus guiding the user and facilitating the integration of multi-step analyses, and (iii) users can define and save complete pipelines for specific phylogenetic analysis to be automatically used on many genes in subsequent sessions or multiple genes in a single session (phylogenomics). The Phylemon web server is available at http://phylemon.bioinfo.cipf.es. VL - 35 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17452346 N1 - Tarraga, Joaquin Medina, Ignacio Arbiza, Leonardo Huerta-Cepas, Jaime Gabaldon, Toni Dopazo, Joaquin Dopazo, Hernan Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2007 Jul;35(Web Server issue):W38-42. Epub 2007 Apr 22. ER - TY - JOUR T1 - Prophet, a web-based tool for class prediction using microarray data JF - Bioinformatics Y1 - 2007 A1 - Medina, Ignacio A1 - Montaner, D. A1 - Tarraga, J. A1 - Dopazo, J. KW - babelomics KW - gepas KW - predictors AB -

Sample classification and class prediction are the aims of many gene expression studies. We present a web-based application, Prophet, which builds prediction rules and allows using them for further sample classification. Prophet automatically chooses the best classifier, along with the optimal selection of genes, using a strategy that renders unbiased cross-validated errors. Prophet is linked to different microarray data analysis modules, and includes a unique feature: the possibility of performing the functional interpretation of the molecular signature found. Availability: Prophet can be found at the URL http://prophet.bioinfo.cipf.es/ or within the GEPAS package at http://www.gepas.org/. Supplementary information: http://gepas.bioinfo.cipf.es/tutorial/prophet.html.

VL - 23 UR - http://bioinformatics.oxfordjournals.org/cgi/content/full/23/3/390?view=long&pmid=17138587 N1 -

Medina, Ignacio Montaner, David Tarraga, Joaquin Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2007 Feb 1;23(3):390-1. Epub 2006 Nov 30.

ER - TY - JOUR T1 - BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments JF - Nucleic Acids Res Y1 - 2006 A1 - Fatima Al-Shahrour A1 - Minguez, P. A1 - Tarraga, J. A1 - Montaner, D. A1 - Alloza, E. A1 - Vaquerizas, J. M. A1 - L. Conde A1 - Blaschke, C. A1 - Vera, J. A1 - Dopazo, J. KW - babelomics KW - functional profiling AB -

We present a new version of Babelomics, a complete suite of web tools for functional analysis of genome-scale experiments, with new and improved tools. New functionally relevant terms have been included such as CisRed motifs or bioentities obtained by text-mining procedures. An improved indexing has considerably speeded up several of the modules. An improved version of the FatiScan method for studying the coordinate behaviour of groups of functionally related genes is presented, along with a similar tool, the Gene Set Enrichment Analysis. Babelomics is now more oriented to test systems biology inspired hypotheses. Babelomics can be found at http://www.babelomics.org.

VL - 34 UR - http://nar.oxfordjournals.org/content/34/suppl_2/W472.long N1 -

Al-Shahrour, Fatima Minguez, Pablo Tarraga, Joaquin Montaner, David Alloza, Eva Vaquerizas, Juan M Conde, Lucia Blaschke, Christian Vera, Javier Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W472-6.

ER - TY - JOUR T1 - Bioinformatics and cancer: an essential alliance JF - Clin Transl Oncol Y1 - 2006 A1 - Dopazo, J. AB -

Modern research in cancer has been revolutionized by the introduction of new high-throughput methodologies such as DNA microarrays. Keeping pace with these technologies, bioinformatics offers new solutions for data analysis and, more importantly, makes it possible to formulate a new class of hypotheses inspired by systems biology, more oriented to blocks of functionally related genes. Although software implementations of these new methodologies are recent, some options are already available. Bioinformatic solutions for other high-throughput techniques, such as array-CGH or large-scale genotyping, are also reviewed.

VL - 8 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16790393 N1 -

Dopazo, Joaquin Comparative Study Research Support, Non-U.S. Gov’t Review Spain Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico Clin Transl Oncol. 2006 Jun;8(6):409-15.

ER - TY - JOUR T1 - Discovery and hypothesis generation through bioinformatics JF - Genome Biol Y1 - 2006 A1 - Dopazo, J. A1 - Aloy, P. KW - *Computational Biology Genome KW - Genetic Phylogeny KW - Human *Genomics Humans *Models AB - A report on the 4th European Conference on Computational Biology and the 6th Spanish Annual Meeting on Bioinformatics, Madrid, Spain, 28 September-1 October 2005. VL - 7 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16522224 N1 - Dopazo, Joaquin Aloy, Patrick Congresses England Genome biology Genome Biol. 2006;7(2):307. Epub 2006 Feb 27. ER - TY - JOUR T1 - ERCC4 associated with breast cancer risk: a two-stage case-control study using high-throughput genotyping JF - Cancer Res Y1 - 2006 A1 - Milne, R. L. A1 - Ribas, G. A1 - Gonzalez-Neira, A. A1 - Fagerholm, R. A1 - Salas, A. A1 - Gonzalez, E. A1 - Dopazo, J. A1 - Nevanlinna, H. A1 - M. Robledo A1 - Benitez, J. KW - 80 and over Breast Neoplasms/epidemiology/*genetics/pathology Case-Control Studies DNA-Binding Proteins/genetics/*physiology Female Finland/epidemiology Genes KW - Adult Aged Aged KW - Recessive Genetic Predisposition to Disease Genotype Humans Introns/genetics Linkage Disequilibrium Middle Aged Neoplasm Proteins/genetics/*physiology Neoplasm Staging *Polymorphism KW - Single Nucleotide Risk Spain/epidemiology AB - The failure of linkage studies to identify further high-penetrance susceptibility genes for breast cancer points to a polygenic model, with more common variants having modest effects on risk, as the most likely candidate. We have carried out a two-stage case-control study in two European populations to identify low-penetrance genes for breast cancer using high-throughput genotyping. Single-nucleotide polymorphisms (SNPs) were selected across preselected cancer-related genes, choosing tagSNPs and functional variants where possible. 
In stage 1, genotype frequencies for 640 SNPs in 111 genes were compared between 864 breast cancer cases and 845 controls from the Spanish population. In stage 2, candidate SNPs identified in stage 1 (nominal P < 0.01) were tested in a Finnish series of 884 cases and 1,104 controls. Of the 10 candidate SNPs in seven genes identified in stage 1, one (rs744154) on intron 1 of ERCC4, a gene belonging to the nucleotide excision repair pathway, was associated with recessive protection from breast cancer after adjustment for multiple testing in stage 2 (odds ratio, 0.57; Bonferroni-adjusted P = 0.04). After considering potential functional SNPs in the region of high linkage disequilibrium that extends across the entire gene and upstream into the promoter region, we concluded that rs744154 itself could be causal. Although intronic, it is located on the first intron, in a region that is highly conserved across species, and could therefore be functionally important. This study suggests that common intronic variation in ERCC4 is associated with protection from breast cancer. VL - 66 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17018596 N1 - Milne, Roger Laughlin Ribas, Gloria Gonzalez-Neira, Anna Fagerholm, Rainer Salas, Antonio Gonzalez, Emilio Dopazo, Joaquin Nevanlinna, Heli Robledo, Mercedes Benitez, Javier Comparative Study Multicenter Study Research Support, Non-U.S. Gov’t United States Cancer research Cancer Res. 2006 Oct 1;66(19):9420-7. ER - TY - JOUR T1 - Exploring the reasons for the large density of triplex-forming oligonucleotide target sequences in the human regulatory regions JF - BMC Genomics Y1 - 2006 A1 - Goni, J. R. A1 - Vaquerizas, J. M. A1 - Dopazo, J. A1 - Orozco, M. 
KW - Animals Base Sequence Computational Biology DNA/chemistry/*genetics/*metabolism Genome KW - Genetic/genetics Regulatory Sequences KW - Human/genetics Humans Mice Nucleic Acid Conformation Nucleotides/genetics Oligonucleotides/chemistry/*genetics/*metabolism Promoter Regions KW - Nucleic Acid/*genetics Transcription Factors/metabolism AB - BACKGROUND: DNA duplex sequences that can be targets for triplex formation are highly over-represented in the human genome, especially in regulatory regions. RESULTS: Here we studied using bioinformatics tools several properties of triplex target sequences in an attempt to determine those that make these sequences so special in the genome. CONCLUSION: Our results strongly suggest that the unique physical properties of these sequences make them particularly suitable as "separators" between protein-recognition sites in the promoter region. VL - 7 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16566817 N1 - Goni, Josep Ramon Vaquerizas, Juan Manuel Dopazo, Joaquin Orozco, Modesto Research Support, Non-U.S. Gov’t England BMC genomics BMC Genomics. 2006 Mar 27;7:63. ER - TY - JOUR T1 - Functional interpretation of microarray experiments JF - OMICS Y1 - 2006 A1 - Dopazo, J. KW - babelomics KW - Diabetes Mellitus KW - microarray data analysis AB -

Over the past few years, due to the popularisation of high-throughput methodologies such as DNA microarrays, the possibility of obtaining experimental data has increased significantly. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, still remains a challenge. The methods and strategies used for this interpretation are in continuous evolution and new proposals are constantly arising. Initially, a two-step approach was used in which genes of interest were first selected, based on thresholds that consider only experimental values, and then, in a second, independent step, the enrichment of these genes in biologically relevant terms was analysed. For different reasons, these methods are relatively poor in terms of performance, and a new generation of procedures, which draw inspiration from systems biology criteria, is currently under development. Such procedures aim to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes.

VL - 10 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17069516 N1 -

Dopazo, Joaquin Research Support, Non-U.S. Gov’t United States Omics : a journal of integrative biology OMICS. 2006 Fall;10(3):398-410.

ER - TY - JOUR T1 - A function-centric approach to the biological interpretation of microarray time-series JF - Genome Inform Y1 - 2006 A1 - Minguez, P. A1 - Fatima Al-Shahrour A1 - Dopazo, J. KW - babelomics AB -

The interpretation of microarray experiments is commonly addressed by means of a two-step approach in which the relevant genes are first selected solely on the basis of their experimental values (ignoring their coordinate behaviors) and, in a second step, their functional properties are studied to hypothesize about the biological roles they are fulfilling in the cell. Recently, different methods (e.g. GSEA or FatiScan) have been proposed to study the coordinate behavior of blocks of functionally-related genes. These methods study the distribution of functional information across lists of genes ranked according to their different experimental values in a static situation, such as the comparison between two classes (e.g. healthy controls versus diseased cases). Nevertheless, there is no equivalent way of studying a dynamic situation from a functional point of view. We present a method for the functional analysis of microarray series in which the experiments display autocorrelation between successive points (e.g. time series, dose-response experiments, etc.). The method recovers the dynamics of the molecular roles fulfilled by the genes along the series, which provides a novel approach to functional interpretation of such experiments. The method finds blocks of functionally-related genes which are significantly and coordinately over-expressed at different points of the series. This method draws inspiration from systems biology given that the analysis does not focus on individual properties of genes but on collectively behaving blocks of functionally-related genes. The FatiScan algorithm used in the method proposed is available at: http://fatiscan.bioinfo.cipf.es, or within the Babelomics suite: http://www.babelomics.org. Additional material is available at: http://bioinfo.cipf.es/data/plasmodium.

VL - 17 N1 -

Minguez, Pablo Al-Shahrour, Fatima Dopazo, Joaquin Research Support, Non-U.S. Gov’t Japan Genome informatics. International Conference on Genome Informatics Genome Inform. 2006;17(2):57-66.

ER - TY - JOUR T1 - Identification of overexpressed genes in frequently gained/amplified chromosome regions in multiple myeloma JF - Haematologica Y1 - 2006 A1 - Largo, C. A1 - Alvarez, S. A1 - Saez, B. A1 - Blesa, D. A1 - Martin-Subero, J. I. A1 - Gonzalez-Garcia, I. A1 - Brieva, J. A. A1 - Dopazo, J. A1 - Siebert, R. A1 - Calasanz, M. J. A1 - Cigudosa, J. C. KW - B-Cell KW - Caspases Cell Line KW - Human *Gene Amplification Gene Dosage Gene Expression Profiling *Gene Expression Regulation KW - Marginal Zone/genetics Multiple Myeloma/*genetics Neoplasm Proteins/genetics Proto-Oncogene Proteins c-bcl-2/genetics KW - Neoplasm Humans Immunoglobulin Heavy Chains/genetics Lymphoma KW - Neoplastic Gene Rearrangement *Genes KW - Tumor *Chromosomes AB - BACKGROUND AND OBJECTIVES: Multiple myeloma (MM) is a malignancy characterized by clonal expansion of plasma cells. In 50% of the cases, the neoplastic transformation begins with a chromosomal translocation that juxtaposes the IGH gene locus to an oncogene. Gene copy number changes are also frequent in MM but less characterized than in other neoplasias. We aimed to characterize genes that are amplified and overexpressed in human myeloma cell lines (HMCL) to provide putative molecular targets for MM therapy. DESIGN AND METHODS: Nine HMCL were characterized by fluorescent in situ hybridization, comparative genomic hybridization (CGH) and cDNA microarrays for gene expression profiling and copy number changes. RESULTS: After defining the IGH-translocations present in the cell lines, we conducted expression-profiling analysis. Supervised analysis identified 166 genes with significantly different expression among the cell lines harboring MMSET/FGFR3 (4p16), MAF (16q) and CCND1 (11q13) rearrangements. Array-CGH was then performed. Five chromosomes recurrently affected by gains/amplifications in primary samples and cell lines were analyzed in detail. 
Sixty amplified and overexpressed genes were found and 25 (42%) of them were only overexpressed when amplified; moreover, six showed a significant association between overexpression and gain/amplification. We also found co-amplification and overexpression for genes located within the same amplicons, such as MALT1 and BCL2. INTERPRETATION AND CONCLUSIONS: Parallel analysis of gene copy numbers and expression levels by cDNA microarray in MM allowed efficient identification of genes whose expression levels are elevated because of increased copy number. This is the first time that MALT1 and BCL2 have been shown to be overexpressed and amplified in MM. VL - 91 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16461302 N1 - Largo, Cristina Alvarez, Sara Saez, Borja Blesa, David Martin-Subero, Jose I Gonzalez-Garcia, Ines Brieva, Jose A Dopazo, Joaquin Siebert, Reiner Calasanz, Maria J Cigudosa, Juan C Research Support, Non-U.S. Gov’t Italy Haematologica Haematologica. 2006 Feb;91(2):184-91. ER - TY - JOUR T1 - Next station in microarray data analysis: GEPAS JF - Nucleic Acids Res Y1 - 2006 A1 - Montaner, D. A1 - Tarraga, J. A1 - Huerta-Cepas, J. A1 - Burguet, J. A1 - Vaquerizas, J. M. A1 - L. Conde A1 - Minguez, P. A1 - Vera, J. A1 - Mukherjee, S. A1 - Valls, J. A1 - Pujana, M. A. A1 - Alloza, E. A1 - Herrero, J. A1 - Fatima Al-Shahrour A1 - Dopazo, J. KW - gepas KW - microarray data analysis AB -

The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. GEPAS has been designed to provide an intuitive yet powerful web-based interface that offers diverse analysis options from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options), to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and includes different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers from many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.

VL - 34 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16845056 N1 -

Montaner, David Tarraga, Joaquin Huerta-Cepas, Jaime Burguet, Jordi Vaquerizas, Juan M Conde, Lucia Minguez, Pablo Vera, Javier Mukherjee, Sach Valls, Joan Pujana, Miguel A G Alloza, Eva Herrero, Javier Al-Shahrour, Fatima Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W486-91.

ER - TY - JOUR T1 - Ontology-driven approaches to analyzing data in functional genomics JF - Methods Mol Biol Y1 - 2006 A1 - F. Azuaje A1 - Fatima Al-Shahrour A1 - Dopazo, J. KW - babelomics KW - Cluster Analysis KW - Cluster Analysis Computational Biology/*methods *Data Interpretation KW - Computational Biology KW - Statistical Gene Expression Profiling KW - Statistical Gene Expression Profiling *Genomics Humans AB -

Ontologies are fundamental knowledge representations that provide not only standards for annotating and indexing biological information, but also the basis for implementing functional classification and interpretation models. This chapter discusses the application of gene ontology (GO) for predictive tasks in functional genomics. It focuses on the problem of analyzing functional patterns associated with gene products. This chapter is divided into two main parts. The first part overviews GO and its applications for the development of functional classification models. The second part presents two methods for the characterization of genomic information using GO. It discusses methods for measuring functional similarity of gene products, and a tool for supporting gene expression clustering analysis and validation.

VL - 316 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16671401 N1 -

Azuaje, Francisco Al-Shahrour, Fatima Dopazo, Joaquin Research Support, N.I.H., Extramural Research Support, Non-U.S. Gov’t Review United States Methods in molecular biology (Clifton, N.J.) Methods Mol Biol. 2006;316:67-86.

ER - TY - JOUR T1 - Positive selection, relaxation, and acceleration in the evolution of the human and chimp genome JF - PLoS Comput Biol Y1 - 2006 A1 - Arbiza, L. A1 - Dopazo, J. A1 - H. Dopazo KW - Adaptation KW - Biological/genetics Animals *Evolution KW - Molecular Genome/*genetics Humans Pan troglodytes/*genetics *Selection (Genetics) AB - For years evolutionary biologists have been interested in searching for the genetic bases underlying humanness. Recent efforts at a large or a complete genomic scale have been conducted to search for positively selected genes in human and in chimp. However, recently developed methods allowing for a more sensitive and controlled approach in the detection of positive selection can be employed. Here, using 13,198 genes, we have deduced the sets of genes involved in rate acceleration, positive selection, and relaxation of selective constraints in human, in chimp, and in their ancestral lineage since the divergence from murids. Significant deviations from the strict molecular clock were observed in 469 human and in 651 chimp genes. The more stringent branch-site test of positive selection detected 108 human and 577 chimp positively selected genes. An important proportion of the positively selected genes did not show a significant acceleration in rates, and similarly, many of the accelerated genes did not show significant signals of positive selection. Functional differentiation of genes under rate acceleration, positive selection, and relaxation was not statistically significant between human and chimp with the exception of terms related to G-protein coupled receptors and sensory perception. Both of these were over-represented under relaxation in human in relation to chimp. Comparing differences between derived and ancestral lineages, a more conspicuous change in trends seems to have favored positive selection in the human lineage. 
Since most of the positively selected genes are different under the same functional categories between these species, we suggest that the individual roles of the alternative positively selected genes may be an important factor underlying biological differences between these species. VL - 2 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16683019 N1 - Arbiza, Leonardo Dopazo, Joaquin Dopazo, Hernan Research Support, Non-U.S. Gov’t United States PLoS computational biology PLoS Comput Biol. 2006 Apr;2(4):e38. Epub 2006 Apr 28. ER - TY - JOUR T1 - PupaSuite: finding functional single nucleotide polymorphisms for large-scale genotyping purposes JF - Nucleic Acids Res Y1 - 2006 A1 - L. Conde A1 - Vaquerizas, J. M. A1 - H. Dopazo A1 - Arbiza, L. A1 - Reumers, J. A1 - Rousseau, F. A1 - Schymkowitz, J. A1 - Dopazo, J. KW - Algorithms Computer Graphics Databases KW - Molecular Genotype Haplotypes Internet Linkage Disequilibrium *Polymorphism KW - Nucleic Acid Evolution KW - Single Nucleotide *Software User-Computer Interface AB -

We have developed a web tool, PupaSuite, for the selection of single nucleotide polymorphisms (SNPs) with potential phenotypic effect, specifically oriented to help in the design of large-scale genotyping projects. PupaSuite uses a collection of data on SNPs from heterogeneous sources and a large number of pre-calculated predictions to offer a flexible and intuitive interface for selecting an optimal set of SNPs. It improves the functionality of PupaSNP and PupasView programs and implements new facilities such as the analysis of user’s data to derive haplotypes with functional information. A new estimator of putative effect of polymorphisms has been included that uses evolutionary information. Also SNPeffect database predictions have been included. The PupaSuite web interface is accessible through http://pupasuite.bioinfo.cipf.es and through http://www.pupasnp.org.

VL - 34 UR - http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W621 N1 -

Conde, Lucia Vaquerizas, Juan M Dopazo, Hernan Arbiza, Leonardo Reumers, Joke Rousseau, Frederic Schymkowitz, Joost Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W621-5.

ER - TY - JOUR T1 - Selective pressures at a codon-level predict deleterious mutations in human disease genes JF - J Mol Biol Y1 - 2006 A1 - Arbiza, L. A1 - Duchi, S. A1 - Montaner, D. A1 - Burguet, J. A1 - Pantoja-Uceda, D. A1 - Pineda-Lucena, A. A1 - Dopazo, J. A1 - H. Dopazo KW - Amino Acid Sequence Amino Acid Substitution Codon/*genetics Databases KW - Genetic Evolution KW - Genetic Models KW - Human Humans Models KW - Inborn/*genetics Genome KW - Molecular Genes KW - Molecular Molecular Sequence Data *Mutation Neoplasms/genetics Proteins/genetics *Selection (Genetics) Tumor Suppressor Protein p53/chemistry/genetics KW - p53 Genetic Diseases AB - Deleterious mutations affecting biological function of proteins are constantly being rejected by purifying selection from the gene pool. The non-synonymous/synonymous substitution rate ratio (omega) is a measure of selective pressure on amino acid replacement mutations for protein-coding genes. Different methods have been developed in order to predict non-synonymous changes affecting gene function. However, none has considered the estimation of selective constraints acting on protein residues. Here, we have used codon-based maximum likelihood models in order to estimate the selective pressures on the individual amino acid residues of a well-known model protein: p53. We demonstrate that the number of residues under strong purifying selection in p53 is much higher than those that are strictly conserved during the evolution of the species. In agreement with theoretical expectations, residues that have been noted to be of structural relevance, or in direct association with DNA, were among those showing the highest signals of purifying selection. Conversely, those changing according to a neutral, or nearly neutral mode of evolution, were observed to be irrelevant for protein function. 
Finally, using more than 40 human disease genes, we demonstrate that residues evolving under strong selective pressures (omega<0.1) are significantly associated (p<0.01) with human disease. We hypothesize that non-synonymous change on amino acids showing omega<0.1 will most likely affect protein function. The application of this evolutionary prediction at a genomic scale will provide an a priori hypothesis of the phenotypic effect of non-synonymous coding single nucleotide polymorphisms (SNPs) in the human genome. VL - 358 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16584746 N1 - Arbiza, Leonardo Duchi, Serena Montaner, David Burguet, Jordi Pantoja-Uceda, David Pineda-Lucena, Antonio Dopazo, Joaquin Dopazo, Hernan Research Support, Non-U.S. Gov’t England Journal of molecular biology J Mol Biol. 2006 May 19;358(5):1390-404. Epub 2006 Mar 15. ER - TY - JOUR T1 - BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments JF - Nucleic Acids Res Y1 - 2005 A1 - Fatima Al-Shahrour A1 - Minguez, P. A1 - Vaquerizas, J. M. A1 - L. Conde A1 - Dopazo, J. KW - babelomics KW - functional profiling AB -

We present Babelomics, a complete suite of web tools for the functional analysis of groups of genes in high-throughput experiments, which includes the use of information on Gene Ontology terms, InterPro motifs, KEGG pathways, Swiss-Prot keywords, analysis of predicted transcription factor binding sites, chromosomal positions and presence in tissues with determined histological characteristics, through five integrated modules: FatiGO (fast assignment and transference of information), FatiWise, transcription factor association test, GenomeGO and tissues mining tool, respectively. Additionally, another module, FatiScan, provides a new procedure that integrates biological information in combination with experimental results in order to find groups of genes with modest but coordinate significant differential behaviour. FatiScan is highly sensitive and is capable of finding significant asymmetries in the distribution of genes of common function across a list of ordered genes even if these asymmetries are not extreme. The strong multiple-testing nature of the contrasts made by the tools is taken into account. All the tools are integrated in the gene expression analysis package GEPAS. Babelomics is the natural evolution of our tool FatiGO (which analysed almost 22,000 experiments during the last year) to include more sources of information and new modes of using it. Babelomics can be found at http://www.babelomics.org.

VL - 33 UR - http://nar.oxfordjournals.org/content/33/suppl_2/W460.long N1 -

Al-Shahrour, Fatima Minguez, Pablo Vaquerizas, Juan M Conde, Lucia Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W460-4.

ER - TY - JOUR T1 - Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information JF - Bioinformatics Y1 - 2005 A1 - Fatima Al-Shahrour A1 - Diaz-Uriarte, R. A1 - Dopazo, J. KW - babelomics KW - Biological Neoplasm Proteins/genetics/*metabolism Phenotype Software Structure-Activity Relationship Systems Integration Tumor Markers KW - Biological/genetics/*metabolism KW - Breast Neoplasms/genetics/*metabolism Computer Simulation *Database Management Systems *Databases KW - Protein Documentation/methods Gene Expression Profiling/*methods Humans *Models AB -

MOTIVATION: The analysis of genome-scale data from different high throughput techniques can be used to obtain lists of genes ordered according to their different behaviours under distinct experimental conditions corresponding to different phenotypes (e.g. differential gene expression between diseased samples and controls, different response to a drug, etc.). The order in which the genes appear in the list is a consequence of the biological roles that the genes play within the cell, which account, at molecular scale, for the macroscopic differences observed between the phenotypes studied. Typically, two steps are followed for understanding the biological processes that differentiate phenotypes at molecular level: first, genes with significant differential expression are selected on the basis of their experimental values and subsequently, the functional properties of these genes are analysed. Instead, we present a simple procedure which combines experimental measurements with available biological information in a way that genes are simultaneously tested in groups related by common functional properties. The method proposed constitutes a very sensitive tool for selecting genes with significant differential behaviour in the experimental conditions tested. RESULTS: We propose the use of a method to scan ordered lists of genes. The method allows the understanding of the biological processes operating at molecular level behind the macroscopic experiment from which the list was generated. This procedure can be useful in situations where it is not possible to obtain statistically significant differences based on the experimental measurements (e.g. low prevalence diseases, etc.). Two examples demonstrate its application in two microarray experiments and the type of information that can be extracted.

VL - 21 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15840702 N1 -

Al-Shahrour, Fatima Diaz-Uriarte, Ramon Dopazo, Joaquin Evaluation Studies Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2005 Jul 1;21(13):2988-93. Epub 2005 Apr 19.

ER - TY - JOUR T1 - Genome-scale evidence of the nematode-arthropod clade JF - Genome Biol Y1 - 2005 A1 - H. Dopazo A1 - Dopazo, J. KW - Animals Arthropods/*classification/genetics Caenorhabditis elegans/classification/genetics Evolution KW - Molecular *Genome Genomics Nematoda/*classification/genetics *Phylogeny AB - BACKGROUND: The issue of whether coelomates form a single clade, the Coelomata, or whether all animals that moult an exoskeleton (such as the coelomate arthropods and the pseudocoelomate nematodes) form a distinct clade, the Ecdysozoa, is the most puzzling issue in animal systematics and a major open-ended subject in evolutionary biology. Previous single-gene and genome-scale analyses designed to resolve the issue have produced contradictory results. Here we present the first genome-scale phylogenetic evidence that strongly supports the Ecdysozoa hypothesis. RESULTS: Through the most extensive phylogenetic analysis carried out to date, the complete genomes of 11 eukaryotic species have been analyzed in order to find homologous sequences derived from 18 human chromosomes. Phylogenetic analysis of datasets showing an increased adjustment to equal evolutionary rates between nematode and arthropod sequences produced a gradual change from support for Coelomata to support for Ecdysozoa. Transition between topologies occurred when fast-evolving sequences of Caenorhabditis elegans were removed. When chordate, nematode and arthropod sequences were constrained to fit equal evolutionary rates, the Ecdysozoa topology was statistically accepted whereas Coelomata was rejected. CONCLUSIONS: The reliability of a monophyletic group clustering arthropods and nematodes was unequivocally accepted in datasets where traces of the long-branch attraction effect were removed. This is the first phylogenomic evidence to strongly support the ’moulting clade’ hypothesis. 
VL - 6 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15892869 N1 - Dopazo, Hernan Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Genome biology Genome Biol. 2005;6(5):R41. Epub 2005 Apr 28. ER - TY - JOUR T1 - GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data JF - Nucleic Acids Res Y1 - 2005 A1 - Vaquerizas, J. M. A1 - L. Conde A1 - Yankilevich, P. A1 - Cabezon, A. A1 - Minguez, P. A1 - Diaz-Uriarte, R. A1 - Fatima Al-Shahrour A1 - Herrero, J. A1 - Dopazo, J. KW - gepas KW - microarray data analysis AB -

The Gene Expression Profile Analysis Suite, GEPAS, has been running for more than three years. With >76,000 experiments analysed during the last year and a daily average of almost 300 analyses, GEPAS can be considered a well-established and widely used platform for gene expression microarray data analysis. GEPAS is oriented to the analysis of whole series of experiments. Its design and development have been driven by the demands of the biomedical community, probably the most active collective in the field of microarray users. Although clustering methods have obviously been implemented in GEPAS, our interest has focused more on methods for finding genes differentially expressed among distinct classes of experiments or correlated to diverse clinical outcomes, as well as on building predictors. There is also a great interest in CGH-arrays which fostered the development of the corresponding tool in GEPAS: InSilicoCGH. Much effort has been invested in GEPAS for developing and implementing efficient methods for functional annotation of experiments in the proper statistical framework. Thus, the popular FatiGO has expanded to a suite of programs for functional annotation of experiments, including information on transcription factor binding sites, chromosomal location and tissues. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.

VL - 33 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15980548 N1 -

Vaquerizas, Juan M Conde, Lucia Yankilevich, Patricio Cabezon, Amaya Minguez, Pablo Diaz-Uriarte, Ramon Al-Shahrour, Fatima Herrero, Javier Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W616-20.

ER - TY - JOUR T1 - HCAD, closing the gap between breakpoints and genes JF - Nucleic Acids Res Y1 - 2005 A1 - Hoffmann, R. A1 - Dopazo, J. A1 - Cigudosa, J. C. A1 - Valencia, A. KW - *Chromosome Breakage Chromosome Disorders/diagnosis/*genetics *Databases KW - Genetic Genes *Genetic Predisposition to Disease Humans PubMed Systems Integration AB - Recurrent chromosome aberrations are an important resource when associating human pathologies to specific genes. However, for technical reasons a large number of chromosome breakpoints are defined only at the level of cytobands and many of the genes involved remain unidentified. We developed a web-based information system that mines the scientific literature and generates textual and comprehensive information on all human breakpoints. We show that the statistical analysis of this textual information and its combination with genomic data can identify genes directly involved in DNA rearrangements. The Human Chromosome Aberration Database (HCAD) is publicly accessible at http://www.pdg.cnb.uam.es/UniPub/HCAD/. VL - 33 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15608250 N1 - Hoffmann, Robert Dopazo, Joaquin Cigudosa, Juan C Valencia, Alfonso Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2005 Jan 1;33(Database issue):D511-3. ER - TY - JOUR T1 - Highly specific and accurate selection of siRNAs for high-throughput functional assays JF - Bioinformatics Y1 - 2005 A1 - J. Santoyo A1 - Vaquerizas, J. M. A1 - Dopazo, J. KW - *Algorithms Base Sequence *Gene Silencing Molecular Sequence Data RNA KW - RNA/*methods *Software *User-Computer Interface KW - Small Interfering/*genetics Sequence Alignment/*methods Sequence Analysis AB - MOTIVATION: Small interfering RNA (siRNA) is widely used in functional genomics to silence genes by decreasing their expression to study the resulting phenotypes. 
The possibility of performing large-scale functional assays by gene silencing accentuates the need for software capable of the high-throughput design of highly specific siRNA. The main objective sought was the design of a large number of siRNAs with appropriate thermodynamic properties and, especially, high specificity. Since all the available procedures require, to some extent, manual processing of the results to guarantee specific results, specificity constitutes, to date, the major obstacle to the complete automation of all the steps necessary for the selection of optimal candidate siRNAs. RESULT: Here, we present a program that for the first time completely automates the search for siRNAs. In SiDE, the most complete set of rules for the selection of siRNA candidates (including G+C content, nucleotides at determined positions, thermodynamic properties, propensity to form internal hairpins, etc.) is implemented and, moreover, specificity is achieved by a conceptually new method. After selecting possible siRNA candidates with the optimal functional properties, putative unspecific matches, which can cause cross-hybridization, are checked in databases containing a unique entry for each gene. These truly non-redundant databases are constructed from the genome annotations (Ensembl). Intron/exon boundaries, presence of polymorphisms (single nucleotide polymorphisms), specificity for either gene or transcript, and other features can also be selected to be considered in the design of siRNAs. AVAILABILITY: The program is available as a web server at http://side.bioinfo.cnio.es. The program was written under the GPL license. CONTACT: jdopazo@cnio.es. VL - 21 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15591357 N1 - Santoyo, Javier Vaquerizas, Juan M Dopazo, Joaquin Comparative Study Evaluation Studies Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2005 Apr 15;21(8):1376-82. 
Epub 2004 Dec 10. ER - TY - JOUR T1 - A novel candidate region linked to development of both pheochromocytoma and head/neck paraganglioma JF - Genes Chromosomes Cancer Y1 - 2005 A1 - Cascon, A. A1 - Ruiz-Llorente, S. A1 - Rodriguez-Perales, S. A1 - Honrado, E. A1 - Martinez-Ramirez, A. A1 - Leton, R. A1 - Montero-Conde, C. A1 - Benitez, J. A1 - Dopazo, J. A1 - Cigudosa, J. C. A1 - M. Robledo KW - 80 and over Child Chromosomes KW - Adolescent Adrenal Gland Neoplasms/*genetics Adult Aged Aged KW - Biological/*genetics KW - Human KW - Pair 1/genetics Chromosomes KW - Pair 11/genetics Chromosomes KW - Pair 3/genetics Chromosomes KW - Pair 8/genetics Female Gene Deletion Head and Neck Neoplasms/*genetics Humans Male Middle Aged Nucleic Acid Hybridization Paraganglioma/*genetics Pheochromocytoma/*genetics Tumor Markers AB - Although the histologic distinction between pheochromocytomas and head and neck paragangliomas is clear, little is known about the genetic differences between them. To date, various sets of genes have been found to be involved in inherited susceptibility to developing both tumor types, but the genes involved in sporadic pathogenesis are still unknown. To define new candidate regions, we performed CGH analysis on 29 pheochromocytomas and on 24 paragangliomas mainly of head and neck origin (20 of 24), which allowed us to differentiate between the two tumor types. Loss of 3q was significantly more frequent in pheochromocytomas, and loss of 1q appeared only in paragangliomas. We also found gain of 11q13 to be a significantly frequent alteration in malignant cases of both types. In addition, recurrent loss of 8p22-23 was found in 62% of pheochromocytomas (including all malignant cases) versus in 33% of paragangliomas, suggesting that this region contains candidate genes involved in the pathogenesis of this abnormality. 
Using FISH analysis on tissue microarrays, we confirmed genomic deletion of this region in 55% of pheochromocytomas compared to 12% of paragangliomas. Loss of 8p22-23 appears to be an important event in the sporadic development of these tumors, and additional molecular studies are necessary to identify candidate genes in this chromosomal region. VL - 42 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15609347 N1 - Cascon, Alberto Ruiz-Llorente, Sergio Rodriguez-Perales, Sandra Honrado, Emiliano Martinez-Ramirez, Angel Leton, Rocio Montero-Conde, Cristina Benitez, Javier Dopazo, Joaquin Cigudosa, Juan C Robledo, Mercedes Research Support, Non-U.S. Gov’t United States Genes, chromosomes & cancer Genes Chromosomes Cancer. 2005 Mar;42(3):260-8. ER - TY - JOUR T1 - Phenotypic characterization of BRCA1 and BRCA2 tumors based in a tissue microarray study with 37 immunohistochemical markers JF - Breast Cancer Res Treat Y1 - 2005 A1 - Palacios, J. A1 - Honrado, E. A1 - Osorio, A. A1 - Cazorla, A. A1 - Sarrio, D. A1 - Barroso, A. A1 - Rodriguez, S. A1 - Cigudosa, J. C. A1 - Diez, O. A1 - Alonso, C. A1 - Lerma, E. A1 - Dopazo, J. A1 - Rivas, C. A1 - Benitez, J. KW - Adult Apoptosis Breast Neoplasms/*genetics/*pathology Cell Cycle Proteins Cluster Analysis Female *Genes KW - Biological/genetics/metabolism KW - BRCA1 *Genes KW - BRCA2 Humans Immunohistochemistry In Situ Hybridization KW - Fluorescence Phenotype Spain *Tissue Array Analysis *Tumor Markers AB - Familial breast cancers that are associated with BRCA1 or BRCA2 germline mutations differ in both their morphological and immunohistochemical characteristics. To further characterize the molecular difference between genotypes, the authors evaluated the expression of 37 immunohistochemical markers in a tissue microarray (TMA) containing cores from 20 BRCA1, 14 BRCA2, and 59 sporadic age-matched breast carcinomas. 
Markers analyzed included, among others, common markers in breast cancer, such as hormone receptors, p53 and HER2, along with 15 molecules involved in cell cycle regulation, such as cyclins, cyclin dependent kinases (CDK) and CDK inhibitors (CDKI), apoptosis markers, such as BCL2 and active caspase 3, and two basal/myoepithelial markers (CK 5/6 and P-cadherin). In addition, we analyzed the amplification of CCND1, CCNE, HER2 and MYC by FISH. Unsupervised cluster data analysis of both hereditary and sporadic cases using the complete set of immunohistochemical markers demonstrated that most BRCA1-associated carcinomas grouped in a branch of ER-, HER2-negative tumors that expressed basal cell markers and/or p53 and had higher expression of activated caspase 3. The cell cycle proteins associated with these tumors were E2F6, cyclins A, B1 and E, SKP2 and Topo IIalpha. In contrast, most BRCA2-associated carcinomas grouped in a branch composed of ER/PR/BCL2-positive tumors with a higher expression of the cell cycle proteins cyclin D1, cyclin D3, p27, p16, p21, CDK4, CDK2 and CDK1. In conclusion, our study of hereditary breast cancer tumors, analyzing 37 immunohistochemical markers, defines the molecular differences between BRCA1 and BRCA2 tumors with respect to hormonal receptors, cell cycle, apoptosis and basal cell markers. VL - 90 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15770521 N1 - Palacios, Jose Honrado, Emiliano Osorio, Ana Cazorla, Alicia Sarrio, David Barroso, Alicia Rodriguez, Sandra Cigudosa, Juan C Diez, Orland Alonso, Carmen Lerma, Enrique Dopazo, Joaquin Rivas, Carmen Benitez, Javier Research Support, Non-U.S. Gov’t Netherlands Breast cancer research and treatment Breast Cancer Res Treat. 2005 Mar;90(1):5-14. 
ER - TY - JOUR T1 - A predictor based on the somatic genomic changes of the BRCA1/BRCA2 breast cancer tumors identifies the non-BRCA1/BRCA2 tumors with BRCA1 promoter hypermethylation JF - Clin Cancer Res Y1 - 2005 A1 - Alvarez, S. A1 - Diaz-Uriarte, R. A1 - Osorio, A. A1 - Barroso, A. A1 - Melchor, L. A1 - Paz, M. F. A1 - Honrado, E. A1 - Rodriguez, R. A1 - Urioste, M. A1 - Valle, L. A1 - Diez, O. A1 - Cigudosa, J. C. A1 - Dopazo, J. A1 - Esteller, M. A1 - Benitez, J. KW - BRCA1 Protein/*genetics BRCA2 Protein/*genetics Breast Neoplasms/*genetics/pathology Chromosomes KW - Genetic/*genetics KW - Human KW - Human Humans Male Mutation Nucleic Acid Hybridization/methods Promoter Regions KW - Pair 12/genetics Chromosomes KW - Pair 15/genetics Chromosomes KW - Pair 18/genetics Chromosomes KW - Pair 2/genetics Chromosomes KW - Pair 8/genetics *DNA Methylation Female Genome AB - The genetic changes underlying the development and progression of familial breast cancer are poorly understood. To identify a somatic genetic signature of tumor progression for each familial group, BRCA1, BRCA2, and non-BRCA1/BRCA2 (BRCAX) tumors, by high-resolution comparative genomic hybridization, we have analyzed 77 tumors previously characterized for BRCA1 and BRCA2 germ line mutations. Based on a combination of the somatic genetic changes observed at the six most different chromosomal regions and the status of the estrogen receptor, we used random forests to develop a molecular classifier, which assigns to a given tumor a probability of belonging either to the BRCA1 or to the BRCA2 class. Because 76.5% (26 of 34) of the BRCAX cases were classified with our predictor to the BRCA1 class with a probability of >50%, we analyzed the BRCA1 promoter region for aberrant methylation in all the BRCAX cases. We found that 15 of the 34 BRCAX tumors analyzed had hypermethylation of the BRCA1 gene. 
When we considered the predictor, we observed that all the cases with this epigenetic event were assigned to the BRCA1 class with a probability of >50%. Interestingly, 84.6% of the cases (11 of 13) assigned to the BRCA1 class with a probability >80% had an aberrant methylation of the BRCA1 promoter. This fact suggests that somatic BRCA1 inactivation could modify the profile of tumor progression in most of the BRCAX cases. VL - 11 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15709182 N1 - Alvarez, Sara Diaz-Uriarte, Ramon Osorio, Ana Barroso, Alicia Melchor, Lorenzo Paz, Maria Fe Honrado, Emiliano Rodriguez, Raquel Urioste, Miguel Valle, Laura Diez, Orland Cigudosa, Juan Cruz Dopazo, Joaquin Esteller, Manel Benitez, Javier Comparative Study Research Support, Non-U.S. Gov’t United States Clinical cancer research : an official journal of the American Association for Cancer Research Clin Cancer Res. 2005 Feb 1;11(3):1146-53. ER - TY - JOUR T1 - PupasView: a visual tool for selecting suitable SNPs, with putative pathological effect in genes, for genotyping purposes JF - Nucleic Acids Res Y1 - 2005 A1 - L. Conde A1 - Vaquerizas, J. M. A1 - Ferrer-Costa, C. A1 - de la Cruz, X. A1 - Orozco, M. A1 - Dopazo, J. KW - Computer Graphics Genes *Genetic Predisposition to Disease Genotype Internet Phenotype *Polymorphism KW - Single Nucleotide *Software User-Computer Interface AB - We have developed a web tool, PupasView, for the selection of single nucleotide polymorphisms (SNPs) with potential phenotypic effect. PupasView constitutes an interactive environment in which functional information and population frequency data can be used as sequential filters over linkage disequilibrium parameters to obtain a final list of SNPs optimal for genotyping purposes. PupasView is the first resource that integrates phenotypic effects caused by SNPs at both the translational and the transcriptional level. 
PupasView retrieves SNPs that could affect conserved regions that the cellular machinery uses for the correct processing of genes (intron/exon boundaries or exonic splicing enhancers), predicted transcription factor binding sites and changes in amino acids in the proteins for which a putative pathological effect is calculated. The program uses the mapping of SNPs in the genome provided by Ensembl. PupasView will be of much help in studies of multifactorial disorders, where the use of functional SNPs will increase the sensitivity of the identification of the genes responsible for the disease. The PupasView web interface is accessible through http://pupasview.ochoa.fib.es and through http://www.pupasnp.org. VL - 33 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15980522 N1 - Conde, Lucia Vaquerizas, Juan M Ferrer-Costa, Carles de la Cruz, Xavier Orozco, Modesto Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W501-5. ER - TY - JOUR T1 - DNMAD: web-based diagnosis and normalization for microarray data JF - Bioinformatics Y1 - 2004 A1 - Vaquerizas, J. M. A1 - Dopazo, J. A1 - Diaz-Uriarte, R. KW - Algorithms Database Management Systems Gene Expression Profiling/*methods/standards Information Storage and Retrieval/*methods *Internet Oligonucleotide Array Sequence Analysis/*methods/standards Sequence Alignment/methods Sequence Analysis KW - DNA/*methods *Software *User-Computer Interface AB - SUMMARY: We present a web server for Diagnosis and Normalization of MicroArray Data (DNMAD). DNMAD includes several common data transformations such as spatial and global robust local regression or multiple slide normalization, and allows for detecting several kinds of errors that result from the manipulation and the image analysis of the arrays. 
This tool offers a user-friendly interface, and is completely integrated within the Gene Expression Pattern Analysis Suite (GEPAS). AVAILABILITY: The tool is accessible on-line at http://dnmad.bioinfo.cnio.es. VL - 20 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15247094 N1 - Vaquerizas, Juan M Dopazo, Joaquin Diaz-Uriarte, Ramon Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2004 Dec 12;20(18):3656-8. Epub 2004 Jul 9. ER - TY - JOUR T1 - FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes JF - Bioinformatics Y1 - 2004 A1 - Fatima Al-Shahrour A1 - Diaz-Uriarte, R. A1 - Dopazo, J. KW - *Algorithms Artificial Intelligence Databases KW - babelomics KW - DNA/*methods *Software KW - Genetic Gene Expression Profiling/*methods *Hypermedia Information Storage and Retrieval/*methods *Internet *Phylogeny Sequence Alignment/methods Sequence Analysis AB -

We present a simple but powerful procedure to extract Gene Ontology (GO) terms that are significantly over- or under-represented in sets of genes within the context of a genome-scale experiment (DNA microarray, proteomics, etc.). Said procedure has been implemented as a web application, FatiGO, allowing for easy and interactive querying. FatiGO, which takes the multiple-testing nature of statistical contrast into account, currently includes GO associations for diverse organisms (human, mouse, fly, worm and yeast) and the TrEMBL/Swissprot GOAnnotations@EBI correspondences from the European Bioinformatics Institute.

VL - 20 UR - http://bioinformatics.oxfordjournals.org/content/20/4/578.abstract N1 -

Al-Shahrour, Fatima Diaz-Uriarte, Ramon Dopazo, Joaquin England Bioinformatics (Oxford, England) Bioinformatics. 2004 Mar 1;20(4):578-80. Epub 2004 Jan 22.

ER - TY - JOUR T1 - Gene expression analysis of chromosomal regions with gain or loss of genetic material detected by comparative genomic hybridization JF - Genes Chromosomes Cancer Y1 - 2004 A1 - Melendez, B. A1 - Diaz-Uriarte, R. A1 - Cuadros, M. A1 - Martinez-Ramirez, A. A1 - Fernandez-Piqueras, J. A1 - Dopazo, A. A1 - Cigudosa, J. C. A1 - Rivas, C. A1 - Dopazo, J. A1 - Martinez-Delgado, B. A1 - Benitez, J. KW - Chromosomes KW - Fluorescence Lymphoma KW - Human KW - Pair 13/*genetics Chromosomes KW - Pair 19/*genetics Chromosomes KW - Pair 6/*genetics Expressed Sequence Tags *Gene Dosage Gene Expression Profiling Humans In Situ Hybridization KW - T-Cell/*genetics Nucleic Acid Hybridization Oligonucleotide Array Sequence Analysis AB - Comparative genomic hybridization (CGH) has been widely used to detect copy number alterations in cancer and to identify regions containing candidate tumor-responsible genes; however, gene expression changes have been described only in highly amplified regions (amplicons). To study the overall impact of slight copy number changes on gene expression, we analyzed 16 T-cell lymphomas by using CGH and a custom-designed cDNA microarray containing 7,657 genes and expressed sequence tags related to tumorigenesis. We evaluated mean gene expression and variability within CGH-altered regions and explored the relationship between the effects of the gene and its position within these regions. Minimally overlapping CGH candidate areas (6q25, 13q21-q22, and 19q13.1) revealed a weak relationship between altered genomic content and gene expression. However, some candidate genes showed modified expression within these regions in the majority of tumors; these candidate genes were evaluated and confirmed in another independent series of 23 T-cell lymphomas by use of the same cDNA microarray and by FISH on a tissue microarray. 
When all the CGH regions detected for each tumor were considered, we found a significant increase or decrease in the mean expression of the genes contained in gained or lost regions, respectively. In addition, we found that the expression of a gene was dependent not only on its position within an altered region but also on its own mechanism of regulation: genes in the same altered region responded very differently to the gain or loss of genetic material. Supplementary material for this article can be found on the Genes, Chromosomes, and Cancer website at http://www.interscience.wiley.com/jpages/1045-2257/suppmat/index.html. VL - 41 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15382261 N1 - Melendez, Barbara Diaz-Uriarte, Ramon Cuadros, Marta Martinez-Ramirez, Angel Fernandez-Piqueras, Jose Dopazo, Ana Cigudosa, Juan-Cruz Rivas, Carmen Dopazo, Joaquin Martinez-Delgado, Beatriz Benitez, Javier Research Support, Non-U.S. Gov’t United States Genes, chromosomes & cancer Genes Chromosomes Cancer. 2004 Dec;41(4):353-65. ER - TY - JOUR T1 - New challenges in gene expression data analysis and the extended GEPAS JF - Nucleic Acids Res Y1 - 2004 A1 - Herrero, J. A1 - Vaquerizas, J. M. A1 - Fatima Al-Shahrour A1 - L. Conde A1 - A. Mateos A1 - J. Santoyo A1 - Diaz-Uriarte, R. A1 - Dopazo, J. KW - gepas KW - microarray data analysis AB -

Since the first papers published in the late nineties, which included, for the first time, a comprehensive analysis of microarray data, the questions addressed through this technique have both increased in number and diversified. Initially, interest focussed on genes coexpressing across sets of experimental conditions, implying, essentially, the use of clustering techniques. Recently, however, interest has focussed more on finding genes differentially expressed among distinct classes of experiments, or correlated to diverse clinical outcomes, as well as on building predictors. In addition to this, the availability of accurate genomic data and the recent implementation of CGH arrays have made it possible to map expression and genomic data onto the chromosomes. There is also a clear demand for methods that allow the automatic transfer of biological information to the results of microarray experiments. Different initiatives, such as the Gene Ontology (GO) consortium, pathways databases, protein functional motifs, etc., provide curated annotations for genes. Whereas many resources on the web focus mainly on clustering methods, GEPAS has evolved to cope with the aforementioned new challenges that have recently arisen in the field of microarray data analysis. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://gepas.bioinfo.cnio.es.

VL - 32 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15215434 N1 -

Herrero, Javier Vaquerizas, Juan M Al-Shahrour, Fatima Conde, Lucia Mateos, Alvaro Santoyo, Javier Diaz-Uriarte, Ramon Dopazo, Joaquin England Nucleic acids research Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W485-91.

ER - TY - JOUR T1 - Phylogenomics and the number of characters required for obtaining an accurate phylogeny of eukaryote model species JF - Bioinformatics Y1 - 2004 A1 - H. Dopazo A1 - J. Santoyo A1 - Dopazo, J. AB -

MOTIVATION: Through the most extensive phylogenomic analysis carried out to date, complete genomes of 11 eukaryotic species have been examined in order to find the homologues of more than 25,000 amino acid sequences. These sequences correspond to the exons of more than 3000 genes and were used as presence/absence characters to test one of the most controversial hypotheses concerning animal evolution, namely the Ecdysozoa hypothesis. Distance, maximum parsimony and Bayesian methods of phylogenetic reconstruction were used to test the hypothesis. RESULTS: The reliability of the Ecdysozoa, grouping arthropods and nematodes in a single clade, was unequivocally rejected in all the consensus trees. The Coelomata clade, grouping arthropods and chordates, was supported by the highest statistical confidence in all the reconstructions. The study of the dependence of the genomes’ tree accuracy on the number of exons used demonstrated that an unexpectedly large number of characters is necessary to obtain robust phylogenies. Previous studies supporting Ecdysozoa could not guarantee an accurate phylogeny because the number of characters used was clearly below the minimum required.

VL - 20 Suppl 1 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15262789 N1 -

Dopazo, Hernan Santoyo, Javier Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2004 Aug 4;20 Suppl 1:i116-21.

ER - TY - JOUR T1 - PupaSNP Finder: a web tool for finding SNPs with putative effect at transcriptional level JF - Nucleic Acids Res Y1 - 2004 A1 - L. Conde A1 - Vaquerizas, J. M. A1 - J. Santoyo A1 - Fatima Al-Shahrour A1 - Ruiz-Llorente, S. A1 - M. Robledo A1 - Dopazo, J. KW - Amino Acid Substitution Binding Sites Humans Internet Phenotype *Polymorphism KW - Genetic KW - Single Nucleotide RNA Splicing *Software Transcription Factors/metabolism *Transcription AB - We have developed a web tool, PupaSNP Finder (PupaSNP for short), for high-throughput searching for single nucleotide polymorphisms (SNPs) with potential phenotypic effect. PupaSNP takes as its input lists of genes (or generates them from chromosomal coordinates) and retrieves SNPs that could affect the conserved regions that the cellular machinery uses for the correct processing of genes (intron/exon boundaries or exonic splicing enhancers), predicted transcription factor binding sites (TFBS) and changes in amino acids in the proteins. The program uses the mapping of SNPs in the genome provided by Ensembl. Additionally, user-defined SNPs (not yet mapped in the genome) can be easily provided to the program. Also, additional functional information from Gene Ontology, OMIM and homologies in other model organisms is provided. In contrast to other programs already available, which focus only on SNPs with possible effect in the protein, PupaSNP includes SNPs with possible transcriptional effect. PupaSNP will be of significant help in studies of multifactorial disorders, where the use of functional SNPs will increase the sensitivity of identification of the genes responsible for the disease. The PupaSNP web interface is accessible through http://pupasnp.bioinfo.cnio.es. 
VL - 32 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15215388 N1 - Conde, Lucia Vaquerizas, Juan M Santoyo, Javier Al-Shahrour, Fatima Ruiz-Llorente, Sergio Robledo, Mercedes Dopazo, Joaquin England Nucleic acids research Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W242-8. ER - TY - JOUR T1 - An approach to inferring transcriptional regulation among genes from large-scale expression data JF - Comp Funct Genomics Y1 - 2003 A1 - Herrero, J. A1 - Diaz-Uriarte, R. A1 - Dopazo, J. AB - The use of DNA microarrays opens up the possibility of measuring the expression levels of thousands of genes simultaneously under different conditions. Time-course experiments allow researchers to study the dynamics of gene interactions. The inference of genetic networks from such measures can give important insights for the understanding of a variety of biological problems. Most of the existing methods for genetic network reconstruction require many experimental data points, or can only be applied to the reconstruction of small subnetworks. Here we present a method that reduces the dimensionality of the dataset and then extracts the significant dynamic correlations among genes. The method requires a number of points achievable in common time-course experiments. VL - 4 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18629097 N1 - Herrero, Javier Diaz-Uriarte, Ramon Dopazo, Joaquin Egypt Comparative and functional genomics Comp Funct Genomics. 2003;4(1):148-54. ER - TY - JOUR T1 - Comparing bacterial genomes through conservation profiles JF - Genome Res Y1 - 2003 A1 - Martin, M. J. A1 - Herrero, J. A1 - A. Mateos A1 - Dopazo, J. 
KW - Bacterial Genotype Models KW - Bacterial/genetics Cluster Analysis Conserved Sequence/*genetics DNA KW - Bacterial/genetics Escherichia coli/classification/*genetics Evolution KW - Bacterial/genetics Gene Order/genetics Genes KW - Bacterial/genetics/physiology *Genome KW - Chromosome Mapping/methods Chromosomes KW - Genetic Phenotype Phylogeny Sequence Homology KW - Molecular Gene Expression Profiling/methods Gene Expression Regulation KW - Nucleic Acid Species Specificity Terminology as Topic AB - We constructed two-dimensional representations of profiles of gene conservation across different genomes using the genome of Escherichia coli as a model. These profiles permit both the visualization at the genome level of different traits in the organism studied and, at the same time, reveal features related to the genomes analyzed (such as defective genomes or genomes that lack a particular system). Conserved genes are not uniformly distributed along the E. coli genome but tend to cluster together. The study of gene distribution patterns across genomes is important for the understanding of how sets of genes seem to be dependent on each other, probably having some functional link. This provides additional evidence that can be used for the elucidation of the function of unannotated genes. Clustering these patterns produces families of genes which can be arranged in a hierarchy of closeness. In this way, functions can be defined at different levels of generality depending on the level of the hierarchy that is studied. The combined study of conservation and phenotypic traits opens up the possibility of defining phenotype/genotype associations, and ultimately inferring the gene or genes responsible for a particular trait. VL - 13 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12695324 N1 - Martin, Maria J Herrero, Javier Mateos, Alvaro Dopazo, Joaquin Comparative Study United States Genome research Genome Res. 
2003 May;13(5):991-8. Epub 2003 Apr 14. ER - TY - JOUR T1 - Gene expression data preprocessing JF - Bioinformatics Y1 - 2003 A1 - Herrero, J. A1 - Diaz-Uriarte, R. A1 - Dopazo, J. KW - *Database Management Systems Gene Expression Profiling/*methods Information Storage and Retrieval/methods Internet Oligonucleotide Array Sequence Analysis/*methods Sequence Alignment/*methods Sequence Analysis KW - DNA/*methods *Software *User-Computer Interface AB - We present an interactive web tool for preprocessing microarray gene expression data. It analyses the data, suggests the most appropriate transformations and proceeds with them after user agreement. The normal preprocessing steps include scale transformations, management of missing values, replicate handling, flat pattern filtering and pattern standardization and they are required before performing any pattern analysis. The processed data set can be sent to other pattern analysis tools. VL - 19 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12651726 N1 - Herrero, J Diaz-Uriarte, R Dopazo, J England Bioinformatics (Oxford, England) Bioinformatics. 2003 Mar 22;19(5):655-6. ER - TY - JOUR T1 - GEPAS: A web-based resource for microarray gene expression data analysis JF - Nucleic Acids Res Y1 - 2003 A1 - Herrero, J. A1 - Fatima Al-Shahrour A1 - Diaz-Uriarte, R. A1 - A. Mateos A1 - Vaquerizas, J. M. A1 - J. Santoyo A1 - Dopazo, J. KW - gepas KW - microarray data analysis AB -

We present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-conditions comparison, unsupervised and supervised clustering (which include some of the most popular methods as well as home made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es).

VL - 31 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12824345 N1 -

Herrero, Javier Al-Shahrour, Fatima Diaz-Uriarte, Ramon Mateos, Alvaro Vaquerizas, Juan M Santoyo, Javier Dopazo, Joaquin Research Support, Non-U.S. Gov’t England Nucleic acids research Nucleic Acids Res. 2003 Jul 1;31(13):3461-7.

ER - TY - JOUR T1 - Bioinformatics methods for the analysis of expression arrays: data clustering and information extraction JF - J Biotechnol Y1 - 2002 A1 - J. Tamames A1 - Clark, D. A1 - Herrero, J. A1 - Dopazo, J. A1 - Blaschke, C. A1 - Fernandez, J. M. A1 - Oliveros, J. C. A1 - Valencia, A. KW - Abstracting and Indexing as Topic/methods *Cluster Analysis *Database Management Systems Databases KW - Computer-Assisted/methods Information Storage and Retrieval/*methods Internet Medline National Library of Medicine (U.S.) Oligonucleotide Array Sequence Analysis/*methods United States KW - Genetic Gene Expression Gene Expression Profiling/*methods Image Processing AB - Expression arrays facilitate the monitoring of changes in the expression patterns of large collections of genes. The analysis of expression array data has become a computationally-intensive task that requires the development of bioinformatics technology for a number of key stages in the process, such as image analysis, database storage, gene clustering and information extraction. Here, we review the current trends in each of these areas, with particular emphasis on the development of the related technology being carried out within our groups. VL - 98 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12141992 N1 - Tamames, Javier Clark, Dominic Herrero, Javier Dopazo, Joaquin Blaschke, Christian Fernandez, Jose M Oliveros, Juan C Valencia, Alfonso Review Netherlands Journal of biotechnology J Biotechnol. 2002 Sep 25;98(2-3):269-83. ER - TY - JOUR T1 - Combining hierarchical clustering and self-organizing maps for exploratory analysis of gene expression patterns JF - J Proteome Res Y1 - 2002 A1 - Herrero, J. A1 - Dopazo, J. 
KW - Cluster Analysis Computational Biology/methods *Gene Expression Genes KW - Fungal/genetics *Genome Oligonucleotide Array Sequence Analysis/*methods Statistics as Topic/*methods Time Factors AB - Self-organizing maps (SOM) constitute an alternative to classical clustering methods because of their linear run times and superior performance in dealing with noisy data. Nevertheless, the clustering obtained with SOM is dependent on the relative sizes of the clusters. Here, we show how the combination of SOM with hierarchical clustering methods constitutes an excellent tool for exploratory analysis of massive data like DNA microarray expression patterns. VL - 1 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12645919 N1 - Herrero, Javier Dopazo, Joaquin Research Support, Non-U.S. Gov’t United States Journal of proteome research J Proteome Res. 2002 Sep-Oct;1(5):467-70. ER - TY - JOUR T1 - Identification of genes involved in resistance to interferon-alpha in cutaneous T-cell lymphoma JF - Am J Pathol Y1 - 2002 A1 - Tracey, L. A1 - Villuendas, R. A1 - Ortiz, P. A1 - Dopazo, A. A1 - Spiteri, I. A1 - Lombardia, L. A1 - Rodriguez-Peralto, J. L. A1 - Fernandez-Herrera, J. A1 - Hernandez, A. A1 - Fraga, J. A1 - Dominguez, O. A1 - Herrero, J. A1 - Alonso, M. A. A1 - Dopazo, J. A1 - Piris, M. A. 
KW - Antineoplastic Agents/*pharmacology/therapeutic use Carrier Proteins/biosynthesis/genetics DNA-Binding Proteins/biosynthesis/genetics Drug Resistance KW - Biological Oligonucleotide Array Sequence Analysis RNA KW - Cultured KW - Cutaneous/diagnosis/drug therapy/*genetics/metabolism *Membrane Glycoproteins Models KW - Interleukin-1 Reproducibility of Results STAT1 Transcription Factor STAT3 Transcription Factor Trans-Activators/biosynthesis/genetics Tumor Cells KW - Neoplasm Gene Expression Profiling *Gene Expression Regulation KW - Neoplasm/biosynthesis *Receptors KW - Neoplastic Humans Interferon-alpha/*pharmacology/therapeutic use Kinetics Lymphoma KW - T-Cell AB - Interferon-alpha therapy has been shown to be active in the treatment of mycosis fungoides although the individual response to this therapy is unpredictable and dependent on essentially unknown factors. In an effort to better understand the molecular mechanisms of interferon-alpha resistance we have developed an interferon-alpha resistant variant from a sensitive cutaneous T-cell lymphoma cell line. We have performed expression analysis to detect genes differentially expressed between both variants using a cDNA microarray including 6386 cancer-implicated genes. The experiments showed that resistance to interferon-alpha is consistently associated with changes in the expression of a set of 39 genes, involved in signal transduction, apoptosis, transcription regulation, and cell growth. Additional studies performed confirm that STAT1 and STAT3 expression and interferon-alpha induction and activation are not altered between both variants. The gene MAL, highly overexpressed by resistant cells, was also found to be expressed by tumoral cells in a series of cutaneous T-cell lymphoma patients treated with interferon-alpha and/or photochemotherapy. MAL expression was associated with longer time to complete remission. 
Time-course experiments of the sensitive and resistant cells showed a differential expression of a subset of genes involved in interferon-response (1 to 4 hours), cell growth and apoptosis (24 to 48 hours), and signal transduction. VL - 161 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12414529 N1 - Tracey, Lorraine Villuendas, Raquel Ortiz, Pablo Dopazo, Ana Spiteri, Inmaculada Lombardia, Luis Rodriguez-Peralto, Jose L Fernandez-Herrera, Jesus Hernandez, Almudena Fraga, Javier Dominguez, Orlando Herrero, Javier Alonso, Miguel A Dopazo, Joaquin Piris, Miguel A Research Support, Non-U.S. Gov’t United States The American journal of pathology Am J Pathol. 2002 Nov;161(5):1825-37. ER - TY - JOUR T1 - Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons JF - Genome Res Y1 - 2002 A1 - A. Mateos A1 - Dopazo, J. A1 - Jansen, R. A1 - Tu, Y. A1 - Gerstein, M. A1 - Stolovitzky, G. KW - Algorithms Artificial Intelligence Citric Acid Cycle/genetics Cluster Analysis Computational Biology/methods Gene Expression Profiling/*methods/statistics & numerical data Genes/*physiology Genetic Heterogeneity Neural Networks (Computer) Oligonucleotide AB - Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. 
Our study is novel in that we report systematic results for 100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only 10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnections among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suited for machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with considerably low rates of false positives and negatives and contains genes that are biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle. VL - 12 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12421757 N1 - Mateos, Alvaro Dopazo, Joaquin Jansen, Ronald Tu, Yuhai Gerstein, Mark Stolovitzky, Gustavo Research Support, Non-U.S. Gov’t Validation Studies United States Genome research Genome Res. 2002 Nov;12(11):1703-15. ER - TY - JOUR T1 - Annotated draft genomic sequence from a Streptococcus pneumoniae type 19F clinical isolate JF - Microb Drug Resist Y1 - 2001 A1 - Dopazo, J. A1 - Mendoza, A. A1 - Herrero, J. A1 - Caldara, F. A1 - Humbert, Y. A1 - Friedli, L. A1 - Guerrier, M. A1 - Grand-Schenk, E. A1 - Gandin, C. A1 - de Francesco, M. A1 - Polissi, A. A1 - Buell, G. A1 - Feger, G. A1 - Garcia, E. A1 - Peitsch, M. 
A1 - Garcia-Bustos, J. F. KW - Bacterial Molecular Sequence Data Pneumococcal Infections/*microbiology Prokaryotic Cells RNA KW - Bacterial/chemistry/genetics Genes KW - Bacterial/genetics *Genome KW - DNA KW - Transfer/metabolism Streptococcus pneumoniae/*genetics AB - The public availability of numerous microbial genomes is enabling the analysis of bacterial biology in great detail and with an unprecedented, organism-wide and taxon-wide, broad scope. Streptococcus pneumoniae is one of the most important bacterial pathogens throughout the world. We present here sequences and functional annotations for 2.1-Mbp of pneumococcal DNA, covering more than 90% of the total estimated size of the genome. The sequenced strain is a clinical isolate resistant to macrolides and tetracycline. It carries a type 19F capsular locus, but multilocus sequence typing for several conserved genetic loci suggests that the strain sequenced belongs to a pneumococcal lineage that most often expresses a serotype 15 capsular polysaccharide. A total of 2,046 putative open reading frames (ORFs) longer than 100 amino acids were identified (average of 1,009 bp per ORF), including all described two-component systems and aminoacyl tRNA synthetases. Comparisons to other complete, or nearly complete, bacterial genomes were made and are presented in a graphical form for all the predicted proteins. VL - 7 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11442348 N1 - Dopazo, J Mendoza, A Herrero, J Caldara, F Humbert, Y Friedli, L Guerrier, M Grand-Schenk, E Gandin, C de Francesco, M Polissi, A Buell, G Feger, G Garcia, E Peitsch, M Garcia-Bustos, J F United States Microbial drug resistance (Larchmont, N.Y.) Microb Drug Resist. 2001 Summer;7(2):99-125. ER - TY - JOUR T1 - A hierarchical unsupervised growing neural network for clustering gene expression patterns JF - Bioinformatics Y1 - 2001 A1 - Herrero, J. A1 - Valencia, A. A1 - Dopazo, J. 
KW - *Algorithms Automatic Data Processing *Gene Expression Profiling *Neural Networks (Computer) *Oligonucleotide Array Sequence Analysis AB - MOTIVATION: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm, (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol., 44, 226-233), is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. RESULTS: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. 
The method proposed is very general and applies to any data providing that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used. AVAILABILITY: A server running the program can be found at: http://bioinfo.cnio.es/sotarray. VL - 17 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11238068 N1 - Herrero, J Valencia, A Dopazo, J Research Support, Non-U.S. Gov’t England Bioinformatics (Oxford, England) Bioinformatics. 2001 Feb;17(2):126-36. ER - TY - JOUR T1 - Identification of optimal regions for phylogenetic studies on VP1 gene of foot-and-mouth disease virus: analysis of types A and O Argentinean viruses JF - Vet Res Y1 - 2001 A1 - Nunez, J. I. A1 - Martin, M. J. A1 - Piccone, M. E. A1 - Carrillo, E. A1 - Palma, E. L. A1 - Dopazo, J. A1 - Sobrino, F. KW - Amino Acid Sequence Animals Aphthovirus/classification/*genetics Base Sequence Capsid/chemistry/*genetics Capsid Proteins DNA KW - Complementary/chemistry Molecular Sequence Data *Phylogeny Polymerase Chain Reaction RNA KW - Viral/chemistry/genetics Serotyping Viral Proteins/analysis/*genetics AB - An analysis of the informative content of sequence stretches on the foot-and-mouth disease virus (FMDV) VP1 gene was applied to two important viral serotypes: A and O. Several sequence regions were identified to allow the reconstruction of phylogenetic trees equivalent to those derived from the whole VP1 gene. The optimal informative regions for sequence windows of 150 to 250 nt were predicted between positions 250 and 550 of the gene. The sequences spanning the 250 nt of the 3’ end (positions 400 to 650), extensively used for FMDV phylogenetic analyses, showed a lower informative content. In spite of this, the use of sequences from this region allowed the derivation of phylogenetic trees for type A and type O FMDVs which showed topologies similar to those previously reported for the whole VP1 gene. 
When the sequences determined for viruses isolated in Argentina, between 1990 and 1993, were included in these analyses, the results obtained revealed features of the circulation of type A and type O viruses in the field, in the months that preceded the eradication of the disease in this country. Type A viruses were closely related to an Argentinean vaccine strain, and defined an independent cluster within this serotype. Among the type O viruses analysed, two groups were distinguished; one was closely related to the South American vaccine strains, while the other was grouped with viruses of the O3 subtype. In addition, a detailed phylogeny for type A FMDV is presented. VL - 32 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11254175 N1 - Nunez, J I Martin, M J Piccone, M E Carrillo, E Palma, E L Dopazo, J Sobrino, F Research Support, Non-U.S. Gov’t France Veterinary research Vet Res. 2001 Jan-Feb;32(1):31-45. ER - TY - JOUR T1 - Methods and approaches in the analysis of gene expression data JF - J Immunol Methods Y1 - 2001 A1 - Dopazo, J. A1 - Zanders, E. A1 - Dragoni, I. A1 - Amphlett, G. A1 - Falciani, F. AB -

The application of high-density DNA array technology to monitor gene transcription has been responsible for a real paradigm shift in biology. The majority of research groups now have the ability to measure the expression of a significant proportion of the human genome in a single experiment, resulting in an unprecedented volume of data being made available to the scientific community. As a consequence of this, the storage, analysis and interpretation of this information present a major challenge. In the field of immunology the analysis of gene expression profiles has opened new areas of investigation. The study of cellular responses has revealed that cells respond to an activation signal with waves of co-ordinated gene expression profiles and that the components of these responses are the key to understanding the specific mechanisms which lead to phenotypic differentiation. The discovery of ‘cell type specific’ gene expression signatures has also helped the interpretation of the mechanisms leading to disease progression. Here we review the principles behind the most commonly used data analysis methods and discuss the approaches that have been employed in immunological research.

VL - 250 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11251224 N1 -

Dopazo, J Zanders, E Dragoni, I Amphlett, G Falciani, F Comparative Study Review Netherlands Journal of immunological methods J Immunol Methods. 2001 Apr;250(1-2):93-112.

ER - TY - JOUR T1 - Phylogenetic analysis of viroid and viroid-like satellite RNAs from plants: a reassessment JF - J Mol Evol Y1 - 2001 A1 - Elena, S. F. A1 - Dopazo, J. A1 - de la Pena, M. A1 - Flores, R. A1 - Diener, T. O. A1 - Moya, A. KW - Evolution KW - Molecular *Phylogeny Plant Viruses/*genetics RNA KW - Satellite/*genetics RNA KW - Viral/genetics Viroids/*genetics AB - The proposed monophyletic origin of a group of subviral plant pathogens (viroids and viroid-like satellite RNAs), as well as the phylogenetic relationships and the resulting taxonomy of these entities, has been recently questioned. The criticism comes from the (apparent) lack of sequence similarity among these RNAs necessary to reliably infer a phylogeny. Here we show that, despite their low overall sequence similarity, a sequence alignment manually adjusted to take into account all the local similarities and the insertions/deletions and duplications/rearrangements described in the literature for viroids and viroid-like satellite RNAs, along with the use of an appropriate estimator of genetic distances, constitutes a data set suitable for a phylogenetic reconstruction. When the likelihood-mapping method was applied to this data set, the tree-likeness obtained was higher than that corresponding to a sequence alignment that does not take into consideration the local similarities. In addition, bootstrap analysis also supports the major groups previously proposed and the reconstruction is consistent with the biological properties of these RNAs. VL - 53 UR - http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11479686 N1 - Elena, S F Dopazo, J de la Pena, M Flores, R Diener, T O Moya, A Letter Research Support, Non-U.S. Gov’t United States Journal of molecular evolution J Mol Evol. 2001 Aug;53(2):155-9. ER -