%0 Journal Article %J Nature biotechnology %D 2014 %T A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. %A Su, Z. %A Labaj, P.P. %A .... %A Dopazo, J. %A .... %A Mason, C.E. %A Shi, L %K NGS %K RNA-seq %K SEQC %X We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. %B Nature biotechnology %V 32 %P 903–914 %8 2014 Aug 24 %G eng %U http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2957.html %R 10.1038/nbt.2957 %0 Journal Article %J Annals of Applied Biology %D 2014 %T Molecular interactions between sugar beet and Polymyxa betae during its life cycle %A N. Desoignies %A Carbonell, J. %A J.-S. Moreau %A A. Conesa %A Dopazo, J. %A A. Legrève %X Polymyxa betae is a biotrophic obligate sugar beet parasite that belongs to plasmodiophorids. 
The infection of sugar beet roots by this parasite is asymptomatic, except when it transmits Beet necrotic yellow vein virus (BNYVV), the causal agent of rhizomania. To date, there has been little work on P. betae–sugar beet molecular interactions, mainly because of the obligate nature of the parasite and also because research on rhizomania has tended to focus on the virus. In this study, we investigated these interactions through differential transcript analysis, using suppressive subtractive hybridization. The analysis included 76 P. betae and 120 sugar beet expressed sequence tags (ESTs). The expression of selected ESTs from both organisms was monitored during the protist life cycle, revealing a potential role of two P. betae proteins, profilin and a Von Willebrand factor domain-containing protein, in the early phase of infection. This study also revealed an over-expression of some sugar beet genes involved in defence, such as those encoding PR proteins, stress resistance proteins or lectins, especially during the plasmodial stage of the P. betae life cycle. In addition to providing new information on the molecular aspects of P. betae–sugar beet interactions, this study also enabled previously unknown ESTs of P. betae to be sequenced, thus enhancing our knowledge of the genome of this protist. %B Annals of Applied Biology %V 164 %P 244–256 %G eng %U http://onlinelibrary.wiley.com/doi/10.1111/aab.12095/abstract %R 10.1111/aab.12095 %0 Journal Article %J Neuromuscular disorders : NMD %D 2014 %T A novel locus for a hereditary recurrent neuropathy on chromosome 21q21. %A Calpena, E %A Martínez-Rubio, D %A Arpa, J %A García-Peñas, J J %A Montaner, D. %A Dopazo, J. %A Palau, F %A Espinós, C %X Hereditary recurrent neuropathies are uncommon. 
Disorders with a known molecular basis falling within this group include hereditary neuropathy with liability to pressure palsies (HNPP) due to the deletion of the PMP22 gene or to mutations in this same gene, and hereditary neuralgic amyotrophy (HNA) caused by mutations in the SEPT9 gene. We report a three-generation family presenting a hereditary recurrent neuropathy without pathological changes in either PMP22 or SEPT9 genes. We performed a genome-wide mapping, which yielded a locus of 12.4Mb on chromosome 21q21. The constructed haplotype fully segregated with the disease and we found significant evidence of linkage. After mutational screening of genes located within this locus, encoding for proteins and microRNAs, as well as analysis of large deletions/insertions, we identified 71 benign polymorphisms. Our findings suggest a novel genetic locus for a recurrent hereditary neuropathy of which the molecular defect remains elusive. Our results further underscore the clinical and genetic heterogeneity of this group of neuropathies. %B Neuromuscular disorders : NMD %V 24 %P 660-5 %8 2014 May 9 %G eng %U http://www.sciencedirect.com/science/article/pii/S0960896614001060# %R 10.1016/j.nmd.2014.04.004 %0 Journal Article %J BMC Medical Genomics %D 2011 %T A large scale survey reveals that chromosomal copy-number alterations significantly affect gene modules involved in cancer initiation and progression %A Alloza, E. %A Fatima Al-Shahrour %A Cigudosa, J. C. %A Dopazo, J. %X

Background

Recent observations point towards the existence of a large number of neighborhoods composed of functionally-related gene modules that lie together in the genome. This local component in the distribution of functionality across chromosomes probably affects chromosomal architecture itself by limiting the ways in which genes can be arranged and distributed across the genome. As a direct consequence, diseases such as cancer that harbor DNA copy number alterations (CNAs) can be expected to have a symptomatology strongly dependent on modules of functionally-related genes rather than on a unique "important" gene.

Methods

We carried out a systematic analysis of more than 140,000 observations of CNAs in cancers and searched for enrichment in gene functional modules associated with high frequencies of losses or gains.

Results

The analysis of CNAs in cancers clearly demonstrates the existence of a significant pattern of loss of gene modules functionally related to cancer initiation and progression along with the amplification of modules of genes related to unspecific defense against xenobiotics (probably chemotherapeutical agents). With the extension of this analysis to an Array-CGH dataset (glioblastomas) from The Cancer Genome Atlas we demonstrate the validity of this approach to investigate the functional impact of CNAs.

Conclusions

The presented results indicate promising clinical and therapeutic implications. Our findings also point directly to the necessity of adopting a function-centric, rather than a gene-centric, view in the understanding of phenotypes or diseases harboring CNAs.

%B BMC Medical Genomics %V 4 %P 37 %8 06/05/2011 %G eng %U http://www.biomedcentral.com/1755-8794/4/37 %9 Research article %R 10.1186/1755-8794-4-37 %0 Journal Article %J Leuk Lymphoma %D 2009 %T Analysis of chronic lymphotic leukemia transcriptomic profile: differences between molecular subgroups %A Jantus Lewintre, E. %A Reinoso Martin, C. %A Montaner, D. %A Marin, M. %A Jose Terol, M. %A Farras, R. %A Benet, I. %A Calvete, J. J. %A Dopazo, J. %A Garcia-Conde, J. %K cancer %K microarray data analysis %X

B cell chronic lymphocytic leukemia (CLL) is a lymphoproliferative disorder with a variable clinical course. Patients with unmutated immunoglobulin heavy chain variable region (IgV(H)) genes show shorter progression-free and overall survival than patients with mutated IgV(H) genes. In addition, BCL6 mutations identify a subgroup of patients with high risk of progression. Gene expression was analysed in 36 early-stage patients using high-density microarrays. Around 150 differentially expressed genes were found according to IgV(H) mutations, whereas no difference was found according to BCL6 mutations. Functional profiling methods allowed us to distinguish KEGG and gene ontology terms showing coordinated gene expression changes across subgroups of CLL. We validated a set of differentially expressed genes according to IgV(H) status, scoring them as putative prognostic markers in CLL. Among them, CRY1, LPL, CD82 and DUSP22 perform at least as well as ZAP70, which is currently the most widely used surrogate marker of IgV(H) status.

%B Leuk Lymphoma %V 50 %P 68-79 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19127482 %0 Journal Article %J Microbiology %D 2009 %T Exploring the antimicrobial action of a carbon monoxide-releasing compound through whole-genome transcription profiling of Escherichia coli %A Nobre, L. S. %A Fatima Al-Shahrour %A Dopazo, J. %A Saraiva, L. M. %K Bacterial Genes %K Bacterial/genetics %K Biofilms Carbon Monoxide/*metabolism Escherichia coli/*genetics/metabolism Escherichia coli Proteins/genetics/metabolism *Gene Expression Profiling Gene Expression Regulation %K Regulator Genetic Complementation Test Methionine/metabolism Microbial Viability Mutation Oligonucleotide Array Sequence Analysis Organometallic Compounds/*pharmacology Phenotype RNA %X

We recently reported that carbon monoxide (CO) has bactericidal activity. To understand its mode of action we analysed the gene expression changes occurring when Escherichia coli, grown aerobically and anaerobically, is treated with the CO-releasing molecule CORM-2 (tricarbonyldichlororuthenium(II) dimer). Microarray analysis shows that the E. coli CORM-2 response is multifaceted, with a high number of differentially regulated genes spread through several functional categories, namely genes involved in inorganic ion transport and metabolism, regulators, and genes implicated in post-translational modification, such as chaperones. CORM-2 has a higher impact in E. coli cells grown anaerobically, as judged by the repression of genes belonging to eight functional classes which are not seen in the response of aerobically CORM-2-treated cells. The biological relevance of the variations caused by CORM-2 was substantiated by studying the CORM-2 sensitivity of selected E. coli mutants. The results show that the deletion of redox-sensing regulators SoxS and OxyR increased the sensitivity to CORM-2 and suggest that while SoxS plays an important role in protection against CORM-2 under both growth conditions, OxyR seems to participate only in the aerobic CORM-2 response. Under anaerobic conditions, we found that the heat-shock proteins IbpA and IbpB contribute to CORM-2 defence since the deletion of these genes increases the sensitivity of the strain. The induction of several met genes and the hypersensitivity to CORM-2 of the ΔmetR, ΔmetI and ΔmetN mutant strains suggest that CO has effects on the methionine metabolism of E. coli. CORM-2 also affects the transcription of several E. coli biofilm-related genes and increases biofilm formation in E. coli. In particular, the absence of tqsA or bhsA increases the resistance of E. coli to CORM-2, and deletion of tqsA leads to a strain that has lost its capacity to form biofilm upon treatment with CORM-2.
In spite of the relatively stable nature of the CO molecule, our results show that CO is able to trigger a significant alteration in the transcriptome of E. coli, which necessarily has effects on several key metabolic pathways.

%B Microbiology %V 155 %P 813-24 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19246752 %0 Journal Article %J Artif Intell Med %D 2009 %T Formulating and testing hypotheses in functional genomics %A Dopazo, J. %K babelomics %K gene set analysis %X

OBJECTIVE: The ultimate goal of any genome-scale experiment is to provide a functional interpretation of the results, relating the available genomic information to the hypotheses that originated the experiment. METHODS AND RESULTS: Initially, this interpretation was made on a pre-selection of relevant genes, based on the experimental values, followed by the study of the enrichment in some functional properties. Nevertheless, functional enrichment methods were shown to have a flaw: the first step of gene selection was too stringent given that the cooperation among genes was ignored. The assumption that modules of genes related by relevant biological properties (functionality, co-regulation, chromosomal location, etc.) are the real actors of cell biology led to the development of new procedures, inspired by systems biology criteria, generically known as gene-set methods. These methods have been successfully used to analyze transcriptomic and large-scale genotyping experiments as well as to test other genome-scale hypotheses in other fields such as phylogenomics.

%B Artif Intell Med %V 45 %P 97-107 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18789659 %0 Journal Article %J Nucleic Acids Res %D 2008 %T Babelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments %A Fatima Al-Shahrour %A Carbonell, J. %A Minguez, P. %A Goetz, S. %A A. Conesa %A Tarraga, J. %A Medina, Ignacio %A Alloza, E. %A Montaner, D. %A Dopazo, J. %K babelomics %K functional profiling %X

We present a new version of Babelomics, a complete suite of web tools for the functional profiling of genome scale experiments, with new and improved methods as well as more types of functional definitions. Babelomics includes different flavours of conventional functional enrichment methods as well as more advanced gene set analysis methods that make it a unique tool among the similar resources available. In addition to the well-known functional definitions (GO, KEGG), Babelomics includes new ones such as Biocarta pathways or text mining-derived functional terms. Regulatory modules implemented include transcriptional control (Transfac, CisRed) and other levels of regulation such as miRNA-mediated interference. Moreover, Babelomics allows for sub-selection of terms in order to test more focused hypotheses. Gene annotation correspondence tables can also be imported, which allows testing with user-defined functional modules. Finally, a tool for the 'de novo' functional annotation of sequences has been included in the system, which allows as yet unannotated organisms to be used in the program. Babelomics has been extensively re-engineered and now includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. Babelomics is available at http://www.babelomics.org.

%B Nucleic Acids Res %V 36 %P W341-6 %G eng %U http://nar.oxfordjournals.org/content/36/suppl_2/W341.long %0 Journal Article %J BMC Med Genomics %D 2008 %T Biological processes, properties and molecular wiring diagrams of candidate low-penetrance breast cancer susceptibility genes %A Bonifaci, N. %A Berenguer, A. %A Diez, J. %A Reina, O. %A Medina, Ignacio %A Dopazo, J. %A Moreno, V. %A Pujana, M. A. %K gene set %K GWAS %K SNP %X

ABSTRACT: BACKGROUND: Recent advances in whole-genome association studies (WGASs) for human cancer risk are beginning to provide the part lists of low-penetrance susceptibility genes. However, statistical analysis in these studies is complicated by the vast number of genetic variants examined and the weak effects observed, as a result of which constraints must be incorporated into the study design and analytical approach. In this scenario, biological attributes beyond the adjusted statistics generally receive little attention and, more importantly, the fundamental biological characteristics of low-penetrance susceptibility genes have yet to be determined. METHODS: We applied an integrative approach for identifying candidate low-penetrance breast cancer susceptibility genes, their characteristics and molecular networks through the analysis of diverse sources of biological evidence. RESULTS: First, examination of the distribution of Gene Ontology terms in ordered WGAS results identified asymmetrical distribution of Cell Communication and Cell Death processes linked to risk. Second, analysis of 11 different types of molecular or functional relationships in genomic and proteomic data sets defined the "omic" properties of candidate genes: i/ differential expression in tumors relative to normal tissue; ii/ somatic genomic copy number changes correlating with gene expression levels; iii/ differentially expressed across age at diagnosis; and iv/ expression changes after BRCA1 perturbation. Finally, network modeling of the effects of variants on germline gene expression showed higher connectivity than expected by chance between novel candidates and with known susceptibility genes, which supports functional relationships and provides mechanistic hypotheses of risk. 
CONCLUSION: This study proposes that cell communication and cell death are major biological processes perturbed in risk of breast cancer conferred by low-penetrance variants, and defines the common omic properties, molecular interactions and possible functional effects of candidate genes and proteins.

%B BMC Med Genomics %V 1 %P 62 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=19094230 %0 Journal Article %J J Biomed Inform %D 2008 %T CLEAR-test: combining inference for differential expression and variability in microarray data analysis %A Valls, J. %A Grau, M. %A Sole, X. %A Hernandez, P. %A Montaner, D. %A Dopazo, J. %A Peinado, M. A. %A Capella, G. %A Moreno, V. %A Pujana, M. A. %K *Algorithms Artificial Intelligence *Data Interpretation %K Statistical Gene Expression Profiling/*methods Gene Expression Regulation/*physiology Oligonucleotide Array Sequence Analysis/*methods Proteome/*metabolism Signal Transduction/*physiology %X

A common goal of microarray experiments is to detect genes that are differentially expressed under distinct experimental conditions. Several statistical tests have been proposed to determine whether the observed changes in gene expression are significant. The t-test assigns a score to each gene on the basis of changes in its expression relative to its estimated variability, in such a way that genes with a higher score (in absolute values) are more likely to be significant. Most variants of the t-test use the complete set of genes to influence the variance estimate for each single gene. However, no inference is made in terms of the variability itself. Here, we highlight the problem of low observed variances in the t-test, when genes with relatively small changes are declared differentially expressed. Alternatively, the z-test could be used although, unlike the t-test, it can declare differentially expressed genes with high observed variances. To overcome this, we propose to combine the z-test, which focuses on large changes, with a chi(2) test to evaluate variability. We call this procedure CLEAR-test and we provide a combined p-value that offers a compromise between both aspects. Analysis of three publicly available microarray datasets reveals the greater performance of the CLEAR-test relative to the t-test and alternative methods. Finally, empirical and simulated data analyses demonstrate the greater reproducibility and statistical power of the CLEAR-test and z-test with respect to current alternative methods. In addition, the CLEAR-test improves the z-test by capturing reproducible genes with high variability.

%B J Biomed Inform %V 41 %P 33-45 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17597009 %0 Journal Article %J J Clin Endocrinol Metab %D 2008 %T Controlled ovarian stimulation induces a functional genomic delay of the endometrium with potential clinical implications %A Horcajadas, J. A. %A Minguez, P. %A Dopazo, J. %A Esteban, F. J. %A Dominguez, F. %A Giudice, L. C. %A Pellicer, A. %A Simon, C. %K Algorithms Chorionic Gonadotropin/genetics Endometrium/cytology/pathology/*physiology/physiopathology Female Gene Expression Regulation Genome %K Human Glutathione Peroxidase/genetics Humans Insulin-Like Growth Factor Binding Proteins/genetics Luteal Phase/physiology Luteinizing Hormone/genetics Menstrual Cycle Oligonucleotide Array Sequence Analysis Ovulation Induction/*methods RNA/genetics/isola %X

CONTEXT: Controlled ovarian stimulation induces morphological, biochemical, and functional genomic modifications of the human endometrium during the window of implantation. OBJECTIVE: Our objective was to compare the gene expression profile of the human endometrium in natural vs. controlled ovarian stimulation cycles throughout the early-mid secretory transition using microarray technology. METHOD: Microarray data from 49 endometrial biopsies obtained from LH+1 to LH+9 (n=25) in natural cycles and from human chorionic gonadotropin (hCG) +1 to hCG+9 in controlled ovarian stimulation cycles (n=24) were analyzed using different methods, such as clustering, profiling of biological processes, and selection of differentially expressed genes, as implemented in Gene Expression Pattern Analysis Suite and Babelomics programs. RESULTS: Endometria from natural cycles followed different genomic patterns compared with controlled ovarian stimulation cycles in the transition from the pre-receptive (days LH/hCG+1 until LH/hCG+5) to the receptive phase (day LH+7/hCG+7). Specifically, we have demonstrated the existence of a 2-d delay in the activation/repression of two clusters composed by 218 and 133 genes, respectively, on day hCG+7 vs. LH+7. Many of these delayed genes belong to the class window of implantation genes affecting basic biological processes in the receptive endometrium. CONCLUSIONS: These results demonstrate that gene expression profiling of the endometrium is different between natural and controlled ovarian stimulation cycles in the receptive phase. Identification of these differentially regulated genes can be used to understand the different developmental profiles of receptive endometrium during controlled ovarian stimulation and to search for the best controlled ovarian stimulation treatment in terms of minimal endometrial impact.

%B J Clin Endocrinol Metab %V 93 %P 4500-10 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18697870 %0 Journal Article %J Genomics %D 2008 %T Direct functional assessment of the composite phenotype through multivariate projection strategies %A A. Conesa %A Bro, R. %A Garcia-Garcia, F. %A Prats, J. M. %A Gotz, S. %A Kjeldahl, K. %A Montaner, D. %A Dopazo, J. %K Breast Neoplasms/genetics Computational Biology/*methods Databases %K Genetic Female Gene Expression Profiling/*statistics & numerical data Humans Mathematical Computing Multivariate Analysis Phenotype %X

We present a novel approach for the analysis of transcriptomics data that integrates functional annotation of gene sets with expression values in a multivariate fashion, and directly assesses the relation of functional features to a multivariate space of response phenotypical variables. Multivariate projection methods are used to obtain new correlated variables for a set of genes that share a given function. These new functional variables are then related to the response variables of interest. The analysis of the principal directions of the multivariate regression allows for the identification of gene function features correlated with the phenotype. Two different transcriptomics studies are used to illustrate the statistical and interpretative aspects of the methodology. We demonstrate the superiority of the proposed method over equivalent approaches.

%B Genomics %V 92 %P 373-83 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18652888 %0 Journal Article %J Nucleic Acids Res %D 2008 %T GEPAS, a web-based tool for microarray data analysis and interpretation %A Tarraga, J. %A Medina, Ignacio %A Carbonell, J. %A Huerta-Cepas, J. %A Minguez, P. %A Alloza, E. %A Fatima Al-Shahrour %A Vegas-Azcarate, S. %A Goetz, S. %A Escobar, P. %A Garcia-Garcia, F. %A A. Conesa %A Montaner, D. %A Dopazo, J. %K gepas %K microarray data analysis %X

Gene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During its more than 5 years of activity it has continuously been updated to keep pace with the state-of-the-art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module allows automating the execution of sequential analysis steps by means of a simple but powerful graphic interface. An extensive re-engineering of GEPAS has been carried out which includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. GEPAS is currently the most cited web tool in its field; it is extensively used by researchers in many countries, and its records indicate an average usage rate of 500 experiments per day. GEPAS is available at http://www.gepas.org.

%B Nucleic Acids Res %V 36 %P W308-14 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18508806 %0 Journal Article %J Brief Bioinform %D 2008 %T Interoperability with Moby 1.0–it’s better than sharing your toothbrush! %A Wilkinson, M. D. %A Senger, M. %A Kawas, E. %A Bruskiewich, R. %A Gouzy, J. %A Noirot, C. %A Bardou, P. %A Ng, A. %A Haase, D. %A Saiz Ede, A. %A Wang, D. %A Gibbons, F. %A Gordon, P. M. %A Sensen, C. W. %A Carrasco, J. M. %A Fernandez, J. M. %A Shen, L. %A Links, M. %A Ng, M. %A Opushneva, N. %A Neerincx, P. B. %A Leunissen, J. A. %A Ernst, R. %A Twigger, S. %A Usadel, B. %A Good, B. %A Wong, Y. %A Stein, L. %A Crosby, W. %A Karlsson, J. %A Royo, R. %A Parraga, I. %A Ramirez, S. %A Gelpi, J. L. %A Trelles, O. %A Pisano, D. G. %A Jimenez, N. %A Kerhornou, A. %A Rosset, R. %A Zamacola, L. %A Tarraga, J. %A Huerta-Cepas, J. %A Carazo, J. M. %A Dopazo, J. %A R. Guigo %A Navarro, A. %A Orozco, M. %A Valencia, A. %A Claros, M. G. %A Perez, A. J. %A Aldana, J. %A Rojano, M. M. %A Fernandez-Santa Cruz, R. %A Navas, I. %A Schiltz, G. %A Farmer, A. %A Gessler, D. %A Schoof, H. %A Groscurth, A. %K Computational Biology/*methods *Database Management Systems *Databases %K Factual Information Storage and Retrieval/*methods *Internet *Programming Languages Systems Integration %X

The BioMoby project was initiated in 2001 from within the model organism database community. It aimed to standardize methodologies to facilitate information exchange and access to analytical resources, using a consensus driven approach. Six years later, the BioMoby development community is pleased to announce the release of the 1.0 version of the interoperability framework, registry Application Programming Interface and supporting Perl and Java code-bases. Together, these provide interoperable access to over 1400 bioinformatics resources worldwide through the BioMoby platform, and this number continues to grow. Here we highlight and discuss the features of BioMoby that make it distinct from other Semantic Web Service and interoperability initiatives, and that have been instrumental to its deployment and use by a wide community of bioinformatics service providers. The standard, client software, and supporting code libraries are all freely available at http://www.biomoby.org/.

%B Brief Bioinform %V 9 %P 220-31 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18238804 %0 Journal Article %J Nucleic Acids Res %D 2008 %T Joint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases %A Reumers, J. %A L. Conde %A Medina, Ignacio %A Maurer-Stroh, S. %A Van Durme, J. %A Dopazo, J. %A Rousseau, F. %A Schymkowitz, J. %K Amino Acid Substitution Animals *Databases %K Genetic Genetic Diseases %K Inborn/genetics HSP70 Heat-Shock Proteins/metabolism Humans Internet Mice MicroRNAs/metabolism *Mutation *Polymorphism %K Single Nucleotide Proteins/chemistry/genetics RNA Splice Sites Rats Transcription Factors/metabolism %X

Single nucleotide polymorphisms (SNPs) are, together with copy number variation, the primary source of variation in the human genome. SNPs are associated with altered response to drug treatment, susceptibility to disease and other phenotypic variation. Furthermore, during genetic screens for disease-associated mutations in groups of patients and control individuals, the distinction between disease-causing mutation and polymorphism is often unclear. Annotation of the functional and structural implications of single nucleotide changes thus provides valuable information to interpret and guide experiments. The SNPeffect and PupaSuite databases are now synchronized to deliver annotations for both non-coding and coding SNPs, as well as annotations for the SwissProt set of human disease mutations. In addition, SNPeffect now contains predictions of Tango2: an improved aggregation detector, and Waltz: a novel predictor of amyloid-forming sequences, as well as improved predictors for regions that are recognized by the Hsp70 family of chaperones. The new PupaSuite version incorporates predictions for SNPs in silencers and miRNAs including their targets, as well as additional methods for predicting SNPs in TFBSs and splice sites. Predictions for mouse and rat genomes have also been added. In addition, a PupaSuite web service has been developed to enable programmatic data access. The combined database holds annotations for 4,965,073 regulatory as well as 133,505 coding human SNPs and 14,935 disease mutations, and phenotypic descriptions of 43,797 human proteins and is accessible via http://snpeffect.vib.be and http://pupasuite.bioinfo.cipf.es/.

%B Nucleic Acids Res %V 36 %P D825-9 %G eng %U http://nar.oxfordjournals.org/cgi/content/full/36/suppl_1/D825 %0 Journal Article %J Oncogene %D 2008 %T Molecular profiling related to poor prognosis in thyroid carcinoma. Combining gene expression data and biological information %A Montero-Conde, C. %A Martin-Campos, J. M. %A Lerma, E. %A Gimenez, G. %A Martinez-Guitarte, J. L. %A Combalia, N. %A Montaner, D. %A Matias-Guiu, X. %A Dopazo, J. %A de Leiva, A. %A M. Robledo %A Mauricio, D. %K Adenoma/genetics/metabolism/pathology Adolescent Adult Aged Carcinoma/genetics/metabolism/pathology Carcinoma %K Biological/*genetics/metabolism %K Neoplasm/genetics/metabolism Reverse Transcriptase Polymerase Chain Reaction Signal Transduction Thyroid Neoplasms/classification/*genetics/metabolism Tumor Markers %K Neoplastic Humans Male Middle Aged *Oligonucleotide Array Sequence Analysis Prognosis RNA %K Papillary/genetics/metabolism/pathology Cell Differentiation Female *Gene Expression Profiling *Gene Expression Regulation %X

Undifferentiated and poorly differentiated thyroid tumors are responsible for more than half of thyroid cancer patient deaths in spite of their low incidence. Conventional treatments do not obtain substantial benefits, and the lack of alternative approaches limits patient survival. Additionally, the absence of prognostic markers for well-differentiated tumors complicates patient-specific treatments and favors the progression of recurrent forms. In order to recognize the molecular basis involved in tumor dedifferentiation and identify potential markers for thyroid cancer prognosis prediction, we analysed the expression profile of 44 thyroid primary tumors with different degrees of dedifferentiation and aggressiveness using cDNA microarrays. Transcriptome comparison of dedifferentiated and well-differentiated thyroid tumors identified 1031 genes with >2-fold difference in absolute values and false discovery rate of <0.15. According to known molecular interaction and reaction networks, the products of these genes were mainly clustered in the MAPkinase signaling pathway, the TGF-beta signaling pathway, focal adhesion and cell motility, activation of actin polymerization and cell cycle. An exhaustive search in several databases allowed us to identify various members of the matrix metalloproteinase, melanoma antigen A and collagen gene families within the upregulated gene set. We also identified a prognosis classifier comprising just 30 transcripts with an overall accuracy of 95%. These findings may clarify the molecular mechanisms involved in thyroid tumor dedifferentiation and provide a potential prognosis predictor as well as targets for new therapies.

%B Oncogene %V 27 %P 1554-61 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17873908 %0 Journal Article %J Nucleic Acids Res %D 2008 %T PhylomeDB: a database for genome-wide collections of gene phylogenies %A Huerta-Cepas, J. %A Bueno, A. %A Dopazo, J. %A Gabaldón, T. %K Ancient Humans *Phylogeny Proteins/classification/genetics Saccharomyces cerevisiae/classification/genetics Sequence Alignment %K Base Sequence Escherichia coli/classification/genetics Genes *Genomics History %X The complete collection of evolutionary histories of all genes in a genome, also known as the phylome, constitutes a valuable source of information. The reconstruction of phylomes has been previously prevented by large demands of time and computer power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. For each genome, PhylomeDB provides the alignments, phylogenetic trees and tree-based orthology predictions for every single encoded protein. The current version of PhylomeDB includes the phylomes of Human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.es. 
%B Nucleic Acids Res %V 36 %P D491-6 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17962297 %0 Journal Article %J Nat Genet %D 2008 %T SNP and haplotype mapping for genetic analysis in the rat %A K. Saar %A A. Beck %A M. T. Bihoreau %A E. Birney %A D. Brocklebank %A Y. Chen %A E. Cuppen %A S. Demonchy %A Dopazo, J. %A P. Flicek %A M. Foglio %A A. Fujiyama %A I. G. Gut %A D. Gauguier %A R. Guigo %A V. Guryev %A M. Heinig %A O. Hummel %A N. Jahn %A S. Klages %A V. Kren %A M. Kube %A H. Kuhl %A Kuramoto, T. %A Kuroki, Y. %A Lechner, D. %A Lee, Y. A. %A Lopez-Bigas, N. %A Lathrop, G. M. %A Mashimo, T. %A Medina, Ignacio %A Mott, R. %A Patone, G. %A Perrier-Cornet, J. A. %A Platzer, M. %A Pravenec, M. %A Reinhardt, R. %A Sakaki, Y. %A Schilhabel, M. %A Schulz, H. %A Serikawa, T. %A Shikhagaie, M. %A Tatsumoto, S. %A Taudien, S. %A Toyoda, A. %A Voigt, B. %A Zelenika, D. %A Zimdahl, H. %A Hubner, N. %K Animals Chromosome Mapping *Databases %K Genetic %K Genetic Genome *Haplotypes Linkage Disequilibrium Phylogeny *Polymorphism %K Inbred Strains/*genetics Recombination %K Single Nucleotide *Quantitative Trait Loci Rats/*genetics Rats %X

The laboratory rat is one of the most extensively studied model organisms. Inbred laboratory rat strains originated from limited Rattus norvegicus founder populations, and the inherited genetic variation provides an excellent resource for the correlation of genotype to phenotype. Here, we report a survey of genetic variation based on almost 3 million newly identified SNPs. We obtained accurate and complete genotypes for a subset of 20,238 SNPs across 167 distinct inbred rat strains, two rat recombinant inbred panels and an F2 intercross. Using 81% of these SNPs, we constructed high-density genetic maps, creating a large dataset of fully characterized SNPs for disease gene mapping. Our data characterize the population structure and illustrate the degree of linkage disequilibrium. We provide a detailed SNP map and demonstrate its utility for mapping of quantitative trait loci. This community resource is openly available and augments the genetic tools for this workhorse of physiological studies.

%B Nat Genet %V 40 %P 560-6 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18443594 %0 Journal Article %J Hum Mutat %D 2008 %T Use of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans %A E. Capriotti %A Arbiza, L. %A Casadio, R. %A Dopazo, J. %A H. Dopazo %A M. A. Marti-Renom %K Algorithms Codon/genetics Computational Biology/*methods *DNA Mutational Analysis Databases %K Human Humans Iduronic Acid/analogs & derivatives/metabolism *Point Mutation Polymorphism %K Molecular *Genetic Predisposition to Disease Genetic Variation Genome %K Protein *Evolution %K Single Nucleotide Proteins/chemistry/*genetics Tumor Suppressor Protein p53/genetics %X Predicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited their source data to either protein or gene information. Novel in this work, we take advantage of both and focus on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease or not. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. 
It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single-point mutations. Moreover, this study demonstrates the synergic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. The results of large-scale application of SeqProfCod over all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007), could be used to support clinical studies. %B Hum Mutat %V 29 %P 198-204 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17935148 %0 Journal Article %J BMC Bioinformatics %D 2007 %T The AnnoLite and AnnoLyze programs for comparative annotation of protein structures %A M. A. Marti-Renom %A Rossi, A. %A Fatima Al-Shahrour %A Davis, F. P. %A Pieper, U. %A Dopazo, J. %A Sali, A. %K *Algorithms Amino Acid Sequence Confidence Intervals Data Interpretation %K Amino Acid *Software Structure-Activity Relationship %K Protein Information Storage and Retrieval/methods Molecular Sequence Data Proteins/*chemistry/classification/*metabolism Sensitivity and Specificity Sequence Alignment/*methods Sequence Analysis %K Protein/*methods Sequence Homology %K Statistical *Databases %X BACKGROUND: Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. 
We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. DESCRIPTION: AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of 90% and average precision of 80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of 70% and average precision of 30%, correctly localizing binding sites for small molecules in 95% of its predictions. CONCLUSION: The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at http://salilab.org/DBAli/. %B BMC Bioinformatics %V 8 Suppl 4 %P S4 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17570147 %0 Journal Article %J Cancer Res %D 2007 %T Association study of 69 genes in the ret pathway identifies low-penetrance loci in sporadic medullary thyroid carcinoma %A Ruiz-Llorente, S. %A Montero-Conde, C. %A Milne, R. L. %A Moya, C. M. %A Cebrian, A. %A Leton, R. %A Cascon, A. %A Mercadillo, F. %A Landa, I. %A Borrego, S. %A Perez de Nanclares, G. %A Alvarez-Escola, C. %A Diaz-Perez, J. A. %A Carracedo, A. %A Urioste, M. %A Gonzalez-Neira, A. %A Benitez, J. %A Santisteban, P. %A Dopazo, J. %A Ponder, B. A. %A M. 
Robledo %K 80 and over Carcinoma %K Adolescent Adult Aged Aged %K Genetic %K Genetic Proto-Oncogene Proteins c-ret/*genetics/metabolism Signal Transduction Thyroid Neoplasms/*genetics/metabolism Transcription %K Medullary/*genetics/metabolism Case-Control Studies Cyclin-Dependent Kinase Inhibitor p15/biosynthesis/genetics Female Genetic Predisposition to Disease Germ-Line Mutation Haplotypes Humans Male Middle Aged Penetrance Polymorphism %K Single Nucleotide Promoter Regions %X To date, few association studies have been done to better understand the genetic basis for the development of sporadic medullary thyroid carcinoma (sMTC). To identify additional low-penetrance genes, we have done a two-stage case-control study in two European populations using high-throughput genotyping. We selected 417 single nucleotide polymorphisms (SNP) belonging to 69 genes either related to RET signaling pathway/functions or involved in key processes for cancer development. TagSNPs and functional variants were included where possible. These SNPs were initially studied in the largest known series of sMTC cases (n = 266) and controls (n = 422), all of Spanish origin. In stage II, an independent British series of 155 sMTC patients and 531 controls was included to validate the previous results. Associations were assessed by an exhaustive analysis of individual SNPs but also considering gene- and linkage disequilibrium-based haplotypes. This strategy allowed us to identify seven low-penetrance genes, six of them (STAT1, AURKA, BCL2, CDKN2B, CDK6, and COMT) consistently associated with sMTC risk in the two case-control series and a seventh (HRAS) with individual SNPs and haplotypes associated with sMTC in the Spanish data set. The potential role of CDKN2B was confirmed by a functional assay showing a role of a SNP (rs7044859) in the promoter region in altering the binding of the transcription factor HNF1. 
These results highlight the utility of association studies using homogeneous series of cases for better understanding complex diseases. %B Cancer Res %V 67 %P 9561-7 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17909067 %0 Journal Article %J Nucleic Acids Res %D 2007 %T DBAli tools: mining the protein structure space %A M. A. Marti-Renom %A Pieper, U. %A Madhusudhan, M. S. %A Rossi, A. %A Eswar, N. %A Davis, F. P. %A Fatima Al-Shahrour %A Dopazo, J. %A Sali, A. %K *Algorithms Amino Acid Sequence Computational Biology/*methods Data Interpretation %K Amino Acid *Software Structure-Activity Relationship %K Protein Internet Molecular Sequence Data Protein Conformation Proteins/*chemistry/classification/*metabolism Pseudomonas aeruginosa/*metabolism Sequence Alignment/*methods Sequence Analysis %K Protein/*methods Sequence Homology %K Statistical *Databases %X The DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions. 
%B Nucleic Acids Res %V 35 %P W393-7 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17478513 %0 Journal Article %J BMC Genomics %D 2007 %T Evidence for systems-level molecular mechanisms of tumorigenesis %A Hernandez, P. %A Huerta-Cepas, J. %A Montaner, D. %A Fatima Al-Shahrour %A Valls, J. %A Gomez, L. %A Capella, G. %A Dopazo, J. %A Pujana, M. A. %K *Cell Transformation %K Biological Models %K Genetic Models %K Messenger/metabolism Signal Transduction Systems Biology %K Neoplastic *Gene Expression Profiling *Gene Expression Regulation %K Neoplastic Humans Male Models %K Statistical Neoplasm Proteins/*physiology Neoplasms/etiology/*genetics Prostatic Neoplasms/genetics Protein Interaction Mapping RNA %X BACKGROUND: Cancer arises from the consecutive acquisition of genetic alterations. Increasing evidence suggests that as a consequence of these alterations, molecular interactions are reprogrammed in the context of highly connected and regulated cellular networks. Coordinated reprogramming would allow the cell to acquire the capabilities for malignant growth. RESULTS: Here, we determine the coordinated function of cancer gene products (i.e., proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts, hereafter referred to as "CGPs") defined as their topological properties and organization in the interactome network. We show that CGPs are central to information exchange and propagation and that they are specifically organized to promote tumorigenesis. Centrality is identified by both local (degree) and global (betweenness and closeness) measures, and systematically appears in down-regulated CGPs. Up-regulated CGPs do not consistently exhibit centrality, but both types of cancer products determine the overall integrity of the network structure. 
In addition to centrality, down-regulated CGPs show topological association that correlates with common biological processes and pathways involved in tumorigenesis. CONCLUSION: Given the current limited coverage of the human interactome, this study proposes that tumorigenesis takes place in a specific and organized way at the molecular systems-level and suggests a model that comprises the precise down-regulation of groups of topologically-associated proteins involved in particular functions, orchestrated with the up-regulation of specific proteins. %B BMC Genomics %V 8 %P 185 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17584915 %0 Journal Article %J Nucleic Acids Res %D 2007 %T FatiGO +: a functional profiling tool for genomic data. Integration of functional annotation, regulatory motifs and interaction data with microarray experiments %A Fatima Al-Shahrour %A Minguez, P. %A Tarraga, J. %A Medina, Ignacio %A Alloza, E. %A Montaner, D. %A Dopazo, J. %K babelomics %K functional enrichment analysis %X

The ultimate goal of any genome-scale experiment is to provide a functional interpretation of the data, relating the available information with the hypotheses that originated the experiment. Thus, functional profiling methods have become essential in diverse scenarios such as microarray experiments, proteomics, etc. We present FatiGO+, a web-based tool for the functional profiling of genome-scale experiments, especially oriented to the interpretation of microarray experiments. In addition to different functional annotations (gene ontology, KEGG pathways, Interpro motifs, Swissprot keywords and text-mining based bioentities related to diseases and chemical compounds), FatiGO+ includes, as a novelty, regulatory and structural information. The regulatory information used includes predictions of targets for distinct regulatory elements (obtained from the Transfac and CisRed databases). Additionally, FatiGO+ uses predictions of target motifs of miRNA to infer which of these can be activated or deactivated in the sample of genes studied. Finally, properties of gene products related to their relative location and connections in the interactome have also been used. Also, enrichment of any of these functional terms can be directly analysed on chromosomal coordinates. FatiGO+ can be found at: http://www.fatigoplus.org and within the Babelomics environment http://www.babelomics.org.

%B Nucleic Acids Res %V 35 %P W91-6 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17478504 %0 Journal Article %J BMC Bioinformatics %D 2007 %T From genes to functional classes in the study of biological systems %A Fatima Al-Shahrour %A Arbiza, L. %A H. Dopazo %A Huerta-Cepas, J. %A Minguez, P. %A Montaner, D. %A Dopazo, J. %K Algorithms Chromosome Mapping/*methods Computer Simulation Gene Expression Profiling/methods *Models %K babelomics %K Biological Multigene Family/*physiology Signal Transduction/*physiology *Software Systems Biology/*methods *User-Computer Interface %X

BACKGROUND: With the popularization of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired by systems biology criteria have started to be developed. RESULTS: Here we present FatiScan, a web-based program which implements a threshold-independent test for the functional interpretation of large-scale experiments that does not depend on the pre-selection of genes based on the multiple application of independent tests to each gene. The test implemented aims to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes. In addition, the test does not depend on the type of the data used for obtaining significance values, and consequently different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from CisRed) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. CONCLUSION: Methods for gene set enrichment which, in addition, are independent of the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. A web server that performs the test described and other similar ones can be found at: http://www.babelomics.org.

%B BMC Bioinformatics %V 8 %P 114 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17407596 %0 Journal Article %J Bioinformation %D 2007 %T Functional profiling and gene expression analysis of chromosomal copy number alterations %A L. Conde %A Montaner, D. %A Burguet-Castell, J. %A Tarraga, J. %A Fatima Al-Shahrour %A Dopazo, J. %K babelomics %X

Contrary to the traditional view in which only one or a few key genes were supposed to be the causative factors of diseases, we discuss the importance of considering groups of functionally related genes in the study of pathologies characterised by chromosomal copy number alterations. Recent observations have reported the existence of regions in higher eukaryotic chromosomes (including humans) containing genes of related function that show a high degree of coregulation. Copy number alterations will consequently affect clusters of functionally related genes, which will, in many cases, be the final causative agents of the diseased phenotype. Therefore, we propose that the functional profiling of the regions affected by copy number alterations must be an important aspect to take into account in the understanding of this type of pathology. To illustrate this, we present an integrated study of DNA copy number variations and gene expression, along with the functional profiling of chromosomal regions, in a case of multiple myeloma.

%B Bioinformation %V 1 %P 432-5 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17597935 %0 Journal Article %J Bioinformatics %D 2007 %T Functional profiling of microarray experiments using text-mining derived bioentities %A Minguez, P. %A Fatima Al-Shahrour %A Montaner, D. %A Dopazo, J. %K Artificial Intelligence *Databases %K babelomics %K Protein Gene Expression Profiling/*methods Information Storage and Retrieval/*methods *Natural Language Processing Proteins/*classification/*metabolism Research/*methods Systems Integration %X

MOTIVATION: The increasing use of microarray technologies brought about a parallel demand for methods for the functional interpretation of the results. Beyond the conventional functional annotations for genes, such as gene ontology, pathways, etc., other sources of information are still to be exploited. Text-mining methods allow extracting informative terms (bioentities) with different functional, chemical, clinical, etc. meanings, that can be associated with genes. We show how to use these associations within an appropriate statistical framework and how to apply them through easy-to-use, web-based environments to the functional interpretation of microarray experiments. Functional enrichment and gene set enrichment tests using bioentities are presented.

%B Bioinformatics %V 23 %P 3098-9 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17855415 %0 Journal Article %J Genome Biol %D 2007 %T The human phylome %A Huerta-Cepas, J. %A H. Dopazo %A Dopazo, J. %A Gabaldón, T. %K Animals *Evolution Evolution %K DNA %K Molecular Gene Duplication *Genome Humans *Phylogeny Proteins/genetics Sequence Analysis %X BACKGROUND: Phylogenomics analyses serve to establish evolutionary relationships among organisms and their genes. A phylome, the complete collection of all gene phylogenies in a genome, constitutes a valuable source of information, but its use in large genomes still constitutes a technical challenge. The use of phylomes also requires the development of new methods that help us to interpret them. RESULTS: We reconstruct here the human phylome, which includes the evolutionary relationships of all human proteins and their homologs among 39 fully sequenced eukaryotes. Phylogenetic techniques used include alignment trimming, branch length optimization, evolutionary model testing and maximum likelihood and Bayesian methods. Although differences with alternative topologies are minor, most of the trees support the Coelomata and Unikont hypotheses as well as the grouping of primates with Laurasiatheria to the exclusion of rodents. We assess the extent of gene duplication events and their relationship with the functional roles of the protein families involved. We find support for at least one, and probably two, rounds of whole genome duplications before vertebrate radiation. Using a novel algorithm that is independent from a species phylogeny, we derive orthology and paralogy relationships of human proteins among eukaryotic genomes. CONCLUSION: Topological variations among phylogenies for different genes are to be expected, highlighting the danger of gene-sampling effects in phylogenomic analyses. 
Several links can be established between the functions of gene families duplicated at certain phylogenetic splits and major evolutionary transitions in those lineages. The pipeline implemented here can be easily adapted for use in other organisms. %B Genome Biol %V 8 %P R109 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17567924 %0 Journal Article %J Nucleic Acids Res %D 2007 %T ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling %A L. Conde %A Montaner, D. %A Burguet-Castell, J. %A Tarraga, J. %A Medina, Ignacio %A Fatima Al-Shahrour %A Dopazo, J. %K Animals Cluster Analysis Computational Biology/*methods Computer Graphics Gene Expression Profiling/*methods Humans Internet Models %K Genetic *Nucleic Acid Hybridization Oligonucleotide Array Sequence Analysis/*methods Programming Languages *Software Systems Integration User-Computer Interface %X We present the ISACGH, a web-based system that allows for the combination of genomic data with gene expression values and provides different options for functional profiling of the regions found. Several visualization options offer a convenient representation of the results. Different efficient methods for accurate estimation of genomic copy number from array-CGH hybridization data have been included in the program. Moreover, the connection to the gene expression analysis package GEPAS allows the use of different facilities for data pre-processing and analysis. A DAS server allows exporting the results to the Ensembl viewer where contextual genomic information can be obtained. The program is freely available at: http://isacgh.bioinfo.cipf.es or within http://www.gepas.org. 
%B Nucleic Acids Res %V 35 %P W81-5 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17468499 %0 Journal Article %J Nucleic Acids Res %D 2007 %T Phylemon: a suite of web tools for molecular evolution, phylogenetics and phylogenomics %A Tarraga, J. %A Medina, Ignacio %A Arbiza, L. %A Huerta-Cepas, J. %A Gabaldón, T. %A Dopazo, J. %A H. Dopazo %K Animals Computational Biology/*methods Databases %K DNA Sequence Analysis %K Genetic Evolution %K Molecular Genetic Techniques Humans *Internet Models %K Protein Software User-Computer Interface %K Statistical *Phylogeny Programming Languages Sequence Alignment Sequence Analysis %X Phylemon is an online platform for phylogenetic and evolutionary analyses of molecular sequence data. It has been developed as a web server that integrates a suite of different tools selected among the most popular stand-alone programs in phylogenetic and evolutionary analysis. It has been conceived as a natural response to the increasing demand of data analysis of many experimental scientists wishing to add a molecular evolution and phylogenetics insight into their research. Tools included in Phylemon cover a wide yet selected range of programs: from the most basic for multiple sequence alignment to elaborate statistical methods of phylogenetic reconstruction including methods for evolutionary rates analyses and molecular adaptation. 
Phylemon has several features that differentiate it from other resources: (i) it offers an integrated environment that enables the direct concatenation of evolutionary analyses and the storage of results, and handles required data format conversions; (ii) once an outfile is produced, Phylemon suggests the next possible analyses, thus guiding the user and facilitating the integration of multi-step analyses; and (iii) users can define and save complete pipelines for specific phylogenetic analysis to be automatically used on many genes in subsequent sessions or multiple genes in a single session (phylogenomics). The Phylemon web server is available at http://phylemon.bioinfo.cipf.es. %B Nucleic Acids Res %V 35 %P W38-42 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17452346 %0 Journal Article %J Bioinformatics %D 2007 %T Prophet, a web-based tool for class prediction using microarray data %A Medina, Ignacio %A Montaner, D. %A Tarraga, J. %A Dopazo, J. %K babelomics %K gepas %K predictors %X

Sample classification and class prediction are the aims of many gene expression studies. We present a web-based application, Prophet, which builds prediction rules and allows using them for further sample classification. Prophet automatically chooses the best classifier, along with the optimal selection of genes, using a strategy that renders unbiased cross-validated errors. Prophet is linked to different microarray data analysis modules, and includes a unique feature: the possibility of performing the functional interpretation of the molecular signature found. Availability: Prophet can be found at the URL http://prophet.bioinfo.cipf.es/ or within the GEPAS package at http://www.gepas.org/ Supplementary information: http://gepas.bioinfo.cipf.es/tutorial/prophet.html.

%B Bioinformatics %V 23 %P 390-1 %G eng %U http://bioinformatics.oxfordjournals.org/cgi/content/full/23/3/390?view=long&pmid=17138587 %0 Journal Article %J Nucleic Acids Res %D 2006 %T BABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments %A Fatima Al-Shahrour %A Minguez, P. %A Tarraga, J. %A Montaner, D. %A Alloza, E. %A Vaquerizas, J. M. %A L. Conde %A Blaschke, C. %A Vera, J. %A Dopazo, J. %K babelomics %K functional profiling %X

We present a new version of Babelomics, a complete suite of web tools for functional analysis of genome-scale experiments, with new and improved tools. New functionally relevant terms have been included such as CisRed motifs or bioentities obtained by text-mining procedures. Improved indexing has considerably sped up several of the modules. An improved version of the FatiScan method for studying the coordinate behaviour of groups of functionally related genes is presented, along with a similar tool, the Gene Set Enrichment Analysis. Babelomics is now more oriented to testing hypotheses inspired by systems biology. Babelomics can be found at http://www.babelomics.org.

%B Nucleic Acids Res %V 34 %P W472-6 %G eng %U http://nar.oxfordjournals.org/content/34/suppl_2/W472.long %0 Journal Article %J Clin Transl Oncol %D 2006 %T Bioinformatics and cancer: an essential alliance %A Dopazo, J. %X

Modern research in cancer has been revolutionized by the introduction of new high-throughput methodologies such as DNA microarrays. Keeping pace with these technologies, bioinformatics offers new solutions for data analysis and, more importantly, makes it possible to formulate a new class of hypotheses inspired by systems biology, oriented toward blocks of functionally related genes. Although software implementations of these new methodologies are recent, some options are already available. Bioinformatic solutions for other high-throughput techniques, such as array-CGH or large-scale genotyping, are also reviewed.

%B Clin Transl Oncol %V 8 %P 409-15 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16790393 %0 Journal Article %J Genome Biol %D 2006 %T Discovery and hypothesis generation through bioinformatics %A Dopazo, J. %A Aloy, P. %K *Computational Biology Genome %K Genetic Phylogeny %K Human *Genomics Humans *Models %X A report on the 4th European Conference on Computational Biology and the 6th Spanish Annual Meeting on Bioinformatics, Madrid, Spain, 28 September-1 October 2005. %B Genome Biol %V 7 %P 307 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16522224 %0 Journal Article %J Cancer Res %D 2006 %T ERCC4 associated with breast cancer risk: a two-stage case-control study using high-throughput genotyping %A Milne, R. L. %A Ribas, G. %A Gonzalez-Neira, A. %A Fagerholm, R. %A Salas, A. %A Gonzalez, E. %A Dopazo, J. %A Nevanlinna, H. %A M. Robledo %A Benitez, J. %K 80 and over Breast Neoplasms/epidemiology/*genetics/pathology Case-Control Studies DNA-Binding Proteins/genetics/*physiology Female Finland/epidemiology Genes %K Adult Aged Aged %K Recessive Genetic Predisposition to Disease Genotype Humans Introns/genetics Linkage Disequilibrium Middle Aged Neoplasm Proteins/genetics/*physiology Neoplasm Staging *Polymorphism %K Single Nucleotide Risk Spain/epidemiology %X The failure of linkage studies to identify further high-penetrance susceptibility genes for breast cancer points to a polygenic model, with more common variants having modest effects on risk, as the most likely candidate. We have carried out a two-stage case-control study in two European populations to identify low-penetrance genes for breast cancer using high-throughput genotyping. Single-nucleotide polymorphisms (SNPs) were selected across preselected cancer-related genes, choosing tagSNPs and functional variants where possible. 
In stage 1, genotype frequencies for 640 SNPs in 111 genes were compared between 864 breast cancer cases and 845 controls from the Spanish population. In stage 2, candidate SNPs identified in stage 1 (nominal P < 0.01) were tested in a Finnish series of 884 cases and 1,104 controls. Of the 10 candidate SNPs in seven genes identified in stage 1, one (rs744154) on intron 1 of ERCC4, a gene belonging to the nucleotide excision repair pathway, was associated with recessive protection from breast cancer after adjustment for multiple testing in stage 2 (odds ratio, 0.57; Bonferroni-adjusted P = 0.04). After considering potential functional SNPs in the region of high linkage disequilibrium that extends across the entire gene and upstream into the promoter region, we concluded that rs744154 itself could be causal. Although intronic, it is located on the first intron, in a region that is highly conserved across species, and could therefore be functionally important. This study suggests that common intronic variation in ERCC4 is associated with protection from breast cancer. %B Cancer Res %V 66 %P 9420-7 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17018596 %0 Journal Article %J BMC Genomics %D 2006 %T Exploring the reasons for the large density of triplex-forming oligonucleotide target sequences in the human regulatory regions %A Goni, J. R. %A Vaquerizas, J. M. %A Dopazo, J. %A Orozco, M. %K Animals Base Sequence Computational Biology DNA/chemistry/*genetics/*metabolism Genome %K Genetic/genetics Regulatory Sequences %K Human/genetics Humans Mice Nucleic Acid Conformation Nucleotides/genetics Oligonucleotides/chemistry/*genetics/*metabolism Promoter Regions %K Nucleic Acid/*genetics Transcription Factors/metabolism %X BACKGROUND: DNA duplex sequences that can be targets for triplex formation are highly over-represented in the human genome, especially in regulatory regions. 
RESULTS: Here we studied using bioinformatics tools several properties of triplex target sequences in an attempt to determine those that make these sequences so special in the genome. CONCLUSION: Our results strongly suggest that the unique physical properties of these sequences make them particularly suitable as "separators" between protein-recognition sites in the promoter region. %B BMC Genomics %V 7 %P 63 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16566817 %0 Journal Article %J OMICS %D 2006 %T Functional interpretation of microarray experiments %A Dopazo, J. %K babelomics %K Diabetes Mellitus %K microarray data analysis %X

Over the past few years, due to the popularisation of high-throughput methodologies such as DNA microarrays, the possibility of obtaining experimental data has increased significantly. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, still remains a challenge. The methods and strategies used for this interpretation are in continuous evolution and new proposals are constantly arising. Initially, a two-step approach was used in which genes of interest were first selected, based on thresholds that consider only experimental values, and then, in a second, independent step, the enrichment of these genes in biologically relevant terms was analysed. For different reasons, these methods are relatively poor in terms of performance, and a new generation of procedures, which draw inspiration from systems biology criteria, is currently under development. Such procedures aim to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes.

%B OMICS %V 10 %P 398-410 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=17069516 %0 Journal Article %J Genome Inform %D 2006 %T A function-centric approach to the biological interpretation of microarray time-series %A Minguez, P. %A Fatima Al-Shahrour %A Dopazo, J. %K babelomics %X

The interpretation of microarray experiments is commonly addressed by means of a two-step approach in which the relevant genes are first selected solely on the basis of their experimental values (ignoring their coordinate behaviors) and, in a second step, their functional properties are studied to hypothesize about the biological roles they fulfil in the cell. Recently, different methods (e.g. GSEA or FatiScan) have been proposed to study the coordinate behavior of blocks of functionally related genes. These methods study the distribution of functional information across lists of genes ranked according to their different experimental values in a static situation, such as the comparison between two classes (e.g. healthy controls versus diseased cases). Nevertheless, there is no equivalent way of studying a dynamic situation from a functional point of view. We present a method for the functional analysis of microarray series in which the experiments display autocorrelation between successive points (e.g. time series, dose-response experiments, etc.). The method makes it possible to recover the dynamics of the molecular roles fulfilled by the genes along the series, which provides a novel approach to the functional interpretation of such experiments. The method finds blocks of functionally related genes which are significantly and coordinately over-expressed at different points of the series. This method draws inspiration from systems biology, given that the analysis does not focus on individual properties of genes but on collectively behaving blocks of functionally related genes. The FatiScan algorithm used in the proposed method is available at: http://fatiscan.bioinfo.cipf.es, or within the Babelomics suite: http://www.babelomics.org. Additional material is available at: http://bioinfo.cipf.es/data/plasmodium.

%B Genome Inform %V 17 %P 57-66 %G eng %0 Journal Article %J Haematologica %D 2006 %T Identification of overexpressed genes in frequently gained/amplified chromosome regions in multiple myeloma %A Largo, C. %A Alvarez, S. %A Saez, B. %A Blesa, D. %A Martin-Subero, J. I. %A Gonzalez-Garcia, I. %A Brieva, J. A. %A Dopazo, J. %A Siebert, R. %A Calasanz, M. J. %A Cigudosa, J. C. %K B-Cell %K Caspases Cell Line %K Human *Gene Amplification Gene Dosage Gene Expression Profiling *Gene Expression Regulation %K Marginal Zone/genetics Multiple Myeloma/*genetics Neoplasm Proteins/genetics Proto-Oncogene Proteins c-bcl-2/genetics %K Neoplasm Humans Immunoglobulin Heavy Chains/genetics Lymphoma %K Neoplastic Gene Rearrangement *Genes %K Tumor *Chromosomes %X BACKGROUND AND OBJECTIVES: Multiple myeloma (MM) is a malignancy characterized by clonal expansion of plasma cells. In 50% of the cases, the neoplastic transformation begins with a chromosomal translocation that juxtaposes the IGH gene locus to an oncogene. Gene copy number changes are also frequent in MM but less characterized than in other neoplasias. We aimed to characterize genes that are amplified and overexpressed in human myeloma cell lines (HMCL) to provide putative molecular targets for MM therapy. DESIGN AND METHODS: Nine HMCL were characterized by fluorescent in situ hybridization, comparative genomic hybridization (CGH) and cDNA microarrays for gene expression profiling and copy number changes. RESULTS: After defining the IGH-translocations present in the cell lines, we conducted expression-profiling analysis. Supervised analysis identified 166 genes with significantly different expression among the cell lines harboring MMSET/FGFR3 (4p16), MAF (16q) and CCND1 (11q13) rearrangements. Array-CGH was then performed. Five chromosomes recurrently affected by gains/amplifications in primary samples and cell lines were analyzed in detail. 
Sixty amplified and overexpressed genes were found and 25 (42%) of them were only overexpressed when amplified; moreover, six showed a significant association between overexpression and gain/amplification. We also found co-amplification and overexpression for genes located within the same amplicons, such as MALT1 and BCL2. INTERPRETATION AND CONCLUSIONS: Parallel analysis of gene copy numbers and expression levels by cDNA microarray in MM allowed efficient identification of genes whose expression levels are elevated because of increased copy number. This is the first time that MALT1 and BCL2 have been shown to be overexpressed and amplified in MM. %B Haematologica %V 91 %P 184-91 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16461302 %0 Journal Article %J Nucleic Acids Res %D 2006 %T Next station in microarray data analysis: GEPAS %A Montaner, D. %A Tarraga, J. %A Huerta-Cepas, J. %A Burguet, J. %A Vaquerizas, J. M. %A L. Conde %A Minguez, P. %A Vera, J. %A Mukherjee, S. %A Valls, J. %A Pujana, M. A. %A Alloza, E. %A Herrero, J. %A Fatima Al-Shahrour %A Dopazo, J. %K gepas %K microarray data analysis %X

The Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. GEPAS has been designed to provide an intuitive yet powerful web-based interface that offers diverse analysis options, from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options) to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts, etc.), and includes different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers from many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.

%B Nucleic Acids Res %V 34 %P W486-91 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16845056 %0 Journal Article %J Methods Mol Biol %D 2006 %T Ontology-driven approaches to analyzing data in functional genomics %A F. Azuaje %A Fatima Al-Shahrour %A Dopazo, J. %K babelomics %K Cluster Analysis %K Cluster Analysis Computational Biology/*methods *Data Interpretation %K Computational Biology %K Statistical Gene Expression Profiling %K Statistical Gene Expression Profiling *Genomics Humans %X

Ontologies are fundamental knowledge representations that provide not only standards for annotating and indexing biological information, but also the basis for implementing functional classification and interpretation models. This chapter discusses the application of gene ontology (GO) for predictive tasks in functional genomics. It focuses on the problem of analyzing functional patterns associated with gene products. This chapter is divided into two main parts. The first part overviews GO and its applications for the development of functional classification models. The second part presents two methods for the characterization of genomic information using GO. It discusses methods for measuring functional similarity of gene products, and a tool for supporting gene expression clustering analysis and validation.

%B Methods Mol Biol %V 316 %P 67-86 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16671401 %0 Journal Article %J PLoS Comput Biol %D 2006 %T Positive selection, relaxation, and acceleration in the evolution of the human and chimp genome %A Arbiza, L. %A Dopazo, J. %A H. Dopazo %K Adaptation %K Biological/genetics Animals *Evolution %K Molecular Genome/*genetics Humans Pan troglodytes/*genetics *Selection (Genetics) %X For years evolutionary biologists have been interested in searching for the genetic bases underlying humanness. Recent efforts at a large or a complete genomic scale have been conducted to search for positively selected genes in human and in chimp. However, recently developed methods allowing for a more sensitive and controlled approach in the detection of positive selection can be employed. Here, using 13,198 genes, we have deduced the sets of genes involved in rate acceleration, positive selection, and relaxation of selective constraints in human, in chimp, and in their ancestral lineage since the divergence from murids. Significant deviations from the strict molecular clock were observed in 469 human and in 651 chimp genes. The more stringent branch-site test of positive selection detected 108 human and 577 chimp positively selected genes. An important proportion of the positively selected genes did not show a significant acceleration in rates, and similarly, many of the accelerated genes did not show significant signals of positive selection. Functional differentiation of genes under rate acceleration, positive selection, and relaxation was not statistically significant between human and chimp with the exception of terms related to G-protein coupled receptors and sensory perception. Both of these were over-represented under relaxation in human in relation to chimp. 
Comparing differences between derived and ancestral lineages, a more conspicuous change in trends seems to have favored positive selection in the human lineage. Since most of the positively selected genes are different under the same functional categories between these species, we suggest that the individual roles of the alternative positively selected genes may be an important factor underlying biological differences between these species. %B PLoS Comput Biol %V 2 %P e38 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16683019 %0 Journal Article %J Nucleic Acids Res %D 2006 %T PupaSuite: finding functional single nucleotide polymorphisms for large-scale genotyping purposes %A L. Conde %A Vaquerizas, J. M. %A H. Dopazo %A Arbiza, L. %A Reumers, J. %A Rousseau, F. %A Schymkowitz, J. %A Dopazo, J. %K Algorithms Computer Graphics Databases %K Molecular Genotype Haplotypes Internet Linkage Disequilibrium *Polymorphism %K Nucleic Acid Evolution %K Single Nucleotide *Software User-Computer Interface %X

We have developed a web tool, PupaSuite, for the selection of single nucleotide polymorphisms (SNPs) with potential phenotypic effect, specifically oriented to help in the design of large-scale genotyping projects. PupaSuite uses a collection of data on SNPs from heterogeneous sources and a large number of pre-calculated predictions to offer a flexible and intuitive interface for selecting an optimal set of SNPs. It improves the functionality of PupaSNP and PupasView programs and implements new facilities such as the analysis of user’s data to derive haplotypes with functional information. A new estimator of putative effect of polymorphisms has been included that uses evolutionary information. Also SNPeffect database predictions have been included. The PupaSuite web interface is accessible through http://pupasuite.bioinfo.cipf.es and through http://www.pupasnp.org.

%B Nucleic Acids Res %V 34 %P W621-5 %G eng %U http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W621 %0 Journal Article %J J Mol Biol %D 2006 %T Selective pressures at a codon-level predict deleterious mutations in human disease genes %A Arbiza, L. %A Duchi, S. %A Montaner, D. %A Burguet, J. %A Pantoja-Uceda, D. %A Pineda-Lucena, A. %A Dopazo, J. %A H. Dopazo %K Amino Acid Sequence Amino Acid Substitution Codon/*genetics Databases %K Genetic Evolution %K Genetic Models %K Human Humans Models %K Inborn/*genetics Genome %K Molecular Genes %K Molecular Molecular Sequence Data *Mutation Neoplasms/genetics Proteins/genetics *Selection (Genetics) Tumor Suppressor Protein p53/chemistry/genetics %K p53 Genetic Diseases %X Deleterious mutations affecting biological function of proteins are constantly being rejected by purifying selection from the gene pool. The non-synonymous/synonymous substitution rate ratio (omega) is a measure of selective pressure on amino acid replacement mutations for protein-coding genes. Different methods have been developed in order to predict non-synonymous changes affecting gene function. However, none has considered the estimation of selective constraints acting on protein residues. Here, we have used codon-based maximum likelihood models in order to estimate the selective pressures on the individual amino acid residues of a well-known model protein: p53. We demonstrate that the number of residues under strong purifying selection in p53 is much higher than those that are strictly conserved during the evolution of the species. In agreement with theoretical expectations, residues that have been noted to be of structural relevance, or in direct association with DNA, were among those showing the highest signals of purifying selection. Conversely, those changing according to a neutral, or nearly neutral mode of evolution, were observed to be irrelevant for protein function. 
Finally, using more than 40 human disease genes, we demonstrate that residues evolving under strong selective pressures (omega<0.1) are significantly associated (p<0.01) with human disease. We hypothesize that non-synonymous change on amino acids showing omega<0.1 will most likely affect protein function. The application of this evolutionary prediction at a genomic scale will provide an a priori hypothesis of the phenotypic effect of non-synonymous coding single nucleotide polymorphisms (SNPs) in the human genome. %B J Mol Biol %V 358 %P 1390-404 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=16584746 %0 Journal Article %J Nucleic Acids Res %D 2005 %T BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments %A Fatima Al-Shahrour %A Minguez, P. %A Vaquerizas, J. M. %A L. Conde %A Dopazo, J. %K babelomics %K functional profiling %X

We present Babelomics, a complete suite of web tools for the functional analysis of groups of genes in high-throughput experiments, which includes the use of information on Gene Ontology terms, InterPro motifs, KEGG pathways, Swiss-Prot keywords, analysis of predicted transcription factor binding sites, chromosomal positions and presence in tissues with determined histological characteristics, through five integrated modules: FatiGO (fast assignment and transference of information), FatiWise, transcription factor association test, GenomeGO and tissues mining tool, respectively. Additionally, another module, FatiScan, provides a new procedure that integrates biological information in combination with experimental results in order to find groups of genes with modest but coordinated and significant differential behaviour. FatiScan is highly sensitive and is capable of finding significant asymmetries in the distribution of genes of common function across a list of ordered genes, even if these asymmetries are not extreme. The strong multiple-testing nature of the contrasts made by the tools is taken into account. All the tools are integrated in the gene expression analysis package GEPAS. Babelomics is the natural evolution of our tool FatiGO (which analysed almost 22,000 experiments during the last year) to include more sources of information and new modes of using it. Babelomics can be found at http://www.babelomics.org.

%B Nucleic Acids Res %V 33 %P W460-4 %G eng %U http://nar.oxfordjournals.org/content/33/suppl_2/W460.long %0 Journal Article %J Bioinformatics %D 2005 %T Discovering molecular functions significantly related to phenotypes by combining gene expression data and biological information %A Fatima Al-Shahrour %A Diaz-Uriarte, R. %A Dopazo, J. %K babelomics %K Biological Neoplasm Proteins/genetics/*metabolism Phenotype Software Structure-Activity Relationship Systems Integration Tumor Markers %K Biological/genetics/*metabolism %K Breast Neoplasms/genetics/*metabolism Computer Simulation *Database Management Systems *Databases %K Protein Documentation/methods Gene Expression Profiling/*methods Humans *Models %X

MOTIVATION: The analysis of genome-scale data from different high throughput techniques can be used to obtain lists of genes ordered according to their different behaviours under distinct experimental conditions corresponding to different phenotypes (e.g. differential gene expression between diseased samples and controls, different response to a drug, etc.). The order in which the genes appear in the list is a consequence of the biological roles that the genes play within the cell, which account, at molecular scale, for the macroscopic differences observed between the phenotypes studied. Typically, two steps are followed for understanding the biological processes that differentiate phenotypes at molecular level: first, genes with significant differential expression are selected on the basis of their experimental values and subsequently, the functional properties of these genes are analysed. Instead, we present a simple procedure which combines experimental measurements with available biological information in a way that genes are simultaneously tested in groups related by common functional properties. The method proposed constitutes a very sensitive tool for selecting genes with significant differential behaviour in the experimental conditions tested. RESULTS: We propose the use of a method to scan ordered lists of genes. The method allows the understanding of the biological processes operating at molecular level behind the macroscopic experiment from which the list was generated. This procedure can be useful in situations where it is not possible to obtain statistically significant differences based on the experimental measurements (e.g. low prevalence diseases, etc.). Two examples demonstrate its application in two microarray experiments and the type of information that can be extracted.

%B Bioinformatics %V 21 %P 2988-93 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15840702 %0 Journal Article %J Genome Biol %D 2005 %T Genome-scale evidence of the nematode-arthropod clade %A H. Dopazo %A Dopazo, J. %K Animals Arthropods/*classification/genetics Caenorhabditis elegans/classification/genetics Evolution %K Molecular *Genome Genomics Nematoda/*classification/genetics *Phylogeny %X BACKGROUND: The issue of whether coelomates form a single clade, the Coelomata, or whether all animals that moult an exoskeleton (such as the coelomate arthropods and the pseudocoelomate nematodes) form a distinct clade, the Ecdysozoa, is the most puzzling issue in animal systematics and a major open-ended subject in evolutionary biology. Previous single-gene and genome-scale analyses designed to resolve the issue have produced contradictory results. Here we present the first genome-scale phylogenetic evidence that strongly supports the Ecdysozoa hypothesis. RESULTS: Through the most extensive phylogenetic analysis carried out to date, the complete genomes of 11 eukaryotic species have been analyzed in order to find homologous sequences derived from 18 human chromosomes. Phylogenetic analysis of datasets showing an increased adjustment to equal evolutionary rates between nematode and arthropod sequences produced a gradual change from support for Coelomata to support for Ecdysozoa. Transition between topologies occurred when fast-evolving sequences of Caenorhabditis elegans were removed. When chordate, nematode and arthropod sequences were constrained to fit equal evolutionary rates, the Ecdysozoa topology was statistically accepted whereas Coelomata was rejected. CONCLUSIONS: The reliability of a monophyletic group clustering arthropods and nematodes was unequivocally accepted in datasets where traces of the long-branch attraction effect were removed. 
This is the first phylogenomic evidence to strongly support the ’moulting clade’ hypothesis. %B Genome Biol %V 6 %P R41 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15892869 %0 Journal Article %J Nucleic Acids Res %D 2005 %T GEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data %A Vaquerizas, J. M. %A L. Conde %A Yankilevich, P. %A Cabezon, A. %A Minguez, P. %A Diaz-Uriarte, R. %A Fatima Al-Shahrour %A Herrero, J. %A Dopazo, J. %K gepas %K microarray data analysis %X

The Gene Expression Profile Analysis Suite, GEPAS, has been running for more than three years. With >76,000 experiments analysed during the last year and a daily average of almost 300 analyses, GEPAS can be considered a well-established and widely used platform for gene expression microarray data analysis. GEPAS is oriented to the analysis of whole series of experiments. Its design and development have been driven by the demands of the biomedical community, probably the most active collective in the field of microarray users. Although clustering methods have obviously been implemented in GEPAS, our interest has focused more on methods for finding genes differentially expressed among distinct classes of experiments or correlated to diverse clinical outcomes, as well as on building predictors. There is also a great interest in CGH-arrays which fostered the development of the corresponding tool in GEPAS: InSilicoCGH. Much effort has been invested in GEPAS for developing and implementing efficient methods for functional annotation of experiments in the proper statistical framework. Thus, the popular FatiGO has expanded to a suite of programs for functional annotation of experiments, including information on transcription factor binding sites, chromosomal location and tissues. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.

%B Nucleic Acids Res %V 33 %P W616-20 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15980548 %0 Journal Article %J Nucleic Acids Res %D 2005 %T HCAD, closing the gap between breakpoints and genes %A Hoffmann, R. %A Dopazo, J. %A Cigudosa, J. C. %A Valencia, A. %K *Chromosome Breakage Chromosome Disorders/diagnosis/*genetics *Databases %K Genetic Genes *Genetic Predisposition to Disease Humans PubMed Systems Integration %X Recurrent chromosome aberrations are an important resource when associating human pathologies to specific genes. However, for technical reasons a large number of chromosome breakpoints are defined only at the level of cytobands and many of the genes involved remain unidentified. We developed a web-based information system that mines the scientific literature and generates textual and comprehensive information on all human breakpoints. We show that the statistical analysis of this textual information and its combination with genomic data can identify genes directly involved in DNA rearrangements. The Human Chromosome Aberration Database (HCAD) is publicly accessible at http://www.pdg.cnb.uam.es/UniPub/HCAD/. %B Nucleic Acids Res %V 33 %P D511-3 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15608250 %0 Journal Article %J Bioinformatics %D 2005 %T Highly specific and accurate selection of siRNAs for high-throughput functional assays %A J. Santoyo %A Vaquerizas, J. M. %A Dopazo, J. %K *Algorithms Base Sequence *Gene Silencing Molecular Sequence Data RNA %K RNA/*methods *Software *User-Computer Interface %K Small Interfering/*genetics Sequence Alignment/*methods Sequence Analysis %X MOTIVATION: Small interfering RNA (siRNA) is widely used in functional genomics to silence genes by decreasing their expression to study the resulting phenotypes. 
The possibility of performing large-scale functional assays by gene silencing accentuates the necessity of a software capable of the high-throughput design of highly specific siRNA. The main objective sought was the design of a large number of siRNAs with appropriate thermodynamic properties and, especially, high specificity. Since all the available procedures require, to some extent, manual processing of the results to guarantee specific results, specificity constitutes to date, the major obstacle to the complete automation of all the steps necessary for the selection of optimal candidate siRNAs. RESULT: Here, we present a program that for the first time completely automates the search for siRNAs. In SiDE, the most complete set of rules for the selection of siRNA candidates (including G+C content, nucleotides at determined positions, thermodynamic properties, propensity to form internal hairpins, etc.) is implemented and moreover, specificity is achieved by a conceptually new method. After selecting possible siRNA candidates with the optimal functional properties, putative unspecific matches, which can cause cross-hybridization, are checked in databases containing a unique entry for each gene. These truly non-redundant databases are constructed from the genome annotations (Ensembl). Also intron/exon boundaries, presence of polymorphisms (single nucleotide polymorphisms) specificity for either gene or transcript, and other features can be selected to be considered in the design of siRNAs. AVAILABILITY: The program is available as a web server at http://side.bioinfo.cnio.es. The program was written under the GPL license. CONTACT: jdopazo@cnio.es. %B Bioinformatics %V 21 %P 1376-82 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15591357 %0 Journal Article %J Genes Chromosomes Cancer %D 2005 %T A novel candidate region linked to development of both pheochromocytoma and head/neck paraganglioma %A Cascon, A. 
%A Ruiz-Llorente, S. %A Rodriguez-Perales, S. %A Honrado, E. %A Martinez-Ramirez, A. %A Leton, R. %A Montero-Conde, C. %A Benitez, J. %A Dopazo, J. %A Cigudosa, J. C. %A M. Robledo %K 80 and over Child Chromosomes %K Adolescent Adrenal Gland Neoplasms/*genetics Adult Aged Aged %K Biological/*genetics %K Human %K Pair 1/genetics Chromosomes %K Pair 11/genetics Chromosomes %K Pair 3/genetics Chromosomes %K Pair 8/genetics Female Gene Deletion Head and Neck Neoplasms/*genetics Humans Male Middle Aged Nucleic Acid Hybridization Paraganglioma/*genetics Pheochromocytoma/*genetics Tumor Markers %X Although the histologic distinction between pheochromocytomas and head and neck paragangliomas is clear, little is known about the genetic differences between them. To date, various sets of genes have been found to be involved in inherited susceptibility to developing both tumor types, but the genes involved in sporadic pathogenesis are still unknown. To define new candidate regions, we performed CGH analysis on 29 pheochromocytomas and on 24 paragangliomas mainly of head and neck origin (20 of 24), which allowed us to differentiate between the two tumor types. Loss of 3q was significantly more frequent in pheochromocytomas, and loss of 1q appeared only in paragangliomas. We also found gain of 11q13 to be a significantly frequent alteration in malignant cases of both types. In addition, recurrent loss of 8p22-23 was found in 62% of pheochromocytomas (including all malignant cases) versus in 33% of paragangliomas, suggesting that this region contains candidate genes involved in the pathogenesis of this abnormality. Using FISH analysis on tissue microarrays, we confirmed genomic deletion of this region in 55% of pheochromocytomas compared to 12% of paragangliomas. Loss of 8p22-23 appears to be an important event in the sporadic development of these tumors, and additional molecular studies are necessary to identify candidate genes in this chromosomal region. 
%B Genes Chromosomes Cancer %V 42 %P 260-8 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15609347 %0 Journal Article %J Breast Cancer Res Treat %D 2005 %T Phenotypic characterization of BRCA1 and BRCA2 tumors based in a tissue microarray study with 37 immunohistochemical markers %A Palacios, J. %A Honrado, E. %A Osorio, A. %A Cazorla, A. %A Sarrio, D. %A Barroso, A. %A Rodriguez, S. %A Cigudosa, J. C. %A Diez, O. %A Alonso, C. %A Lerma, E. %A Dopazo, J. %A Rivas, C. %A Benitez, J. %K Adult Apoptosis Breast Neoplasms/*genetics/*pathology Cell Cycle Proteins Cluster Analysis Female *Genes %K Biological/genetics/metabolism %K BRCA1 *Genes %K BRCA2 Humans Immunohistochemistry In Situ Hybridization %K Fluorescence Phenotype Spain *Tissue Array Analysis *Tumor Markers %X Familial breast cancers that are associated with BRCA1 or BRCA2 germline mutations differ in both their morphological and immunohistochemical characteristics. To further characterize the molecular difference between genotypes, the authors evaluated the expression of 37 immunohistochemical markers in a tissue microarray (TMA) containing cores from 20 BRCA1, 14 BRCA2, and 59 sporadic age-matched breast carcinomas. Markers analyzed included, among others, common markers in breast cancer, such as hormone receptors, p53 and HER2, along with 15 molecules involved in cell cycle regulation, such as cyclins, cyclin dependent kinases (CDK) and CDK inhibitors (CDKI), apoptosis markers, such as BCL2 and active caspase 3, and two basal/myoepithelial markers (CK 5/6 and P-cadherin). 
In addition, we analyzed the amplification of CCND1, CCNE, HER2 and MYC by FISH. Unsupervised cluster data analysis of both hereditary and sporadic cases using the complete set of immunohistochemical markers demonstrated that most BRCA1-associated carcinomas grouped in a branch of ER-, HER2-negative tumors that expressed basal cell markers and/or p53 and had higher expression of activated caspase 3. The cell cycle proteins associated with these tumors were E2F6, cyclins A, B1 and E, SKP2 and Topo IIalpha. In contrast, most BRCA2-associated carcinomas grouped in a branch composed of ER/PR/BCL2-positive tumors with a higher expression of the cell cycle proteins cyclin D1, cyclin D3, p27, p16, p21, CDK4, CDK2 and CDK1. In conclusion, our study of hereditary breast cancer tumors, analyzing 37 immunohistochemical markers, defines the molecular differences between BRCA1 and BRCA2 tumors with respect to hormonal receptors, cell cycle, apoptosis and basal cell markers. %B Breast Cancer Res Treat %V 90 %P 5-14 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15770521 %0 Journal Article %J Clin Cancer Res %D 2005 %T A predictor based on the somatic genomic changes of the BRCA1/BRCA2 breast cancer tumors identifies the non-BRCA1/BRCA2 tumors with BRCA1 promoter hypermethylation %A Alvarez, S. %A Diaz-Uriarte, R. %A Osorio, A. %A Barroso, A. %A Melchor, L. %A Paz, M. F. %A Honrado, E. %A Rodriguez, R. %A Urioste, M. %A Valle, L. %A Diez, O. %A Cigudosa, J. C. %A Dopazo, J. %A Esteller, M. %A Benitez, J. 
%K BRCA1 Protein/*genetics BRCA2 Protein/*genetics Breast Neoplasms/*genetics/pathology Chromosomes %K Genetic/*genetics %K Human %K Human Humans Male Mutation Nucleic Acid Hybridization/methods Promoter Regions %K Pair 12/genetics Chromosomes %K Pair 15/genetics Chromosomes %K Pair 18/genetics Chromosomes %K Pair 2/genetics Chromosomes %K Pair 8/genetics *DNA Methylation Female Genome %X The genetic changes underlying the development and progression of familial breast cancer are poorly understood. To identify a somatic genetic signature of tumor progression for each familial group, BRCA1, BRCA2, and non-BRCA1/BRCA2 (BRCAX) tumors, by high-resolution comparative genomic hybridization, we have analyzed 77 tumors previously characterized for BRCA1 and BRCA2 germ line mutations. Based on a combination of the somatic genetic changes observed at the six most different chromosomal regions and the status of the estrogen receptor, we used random forests to develop a molecular classifier, which assigns to a given tumor a probability of belonging either to the BRCA1 or to the BRCA2 class. Because 76.5% (26 of 34) of the BRCAX cases were classified with our predictor to the BRCA1 class with a probability of >50%, we analyzed the BRCA1 promoter region for aberrant methylation in all the BRCAX cases. We found that 15 of the 34 BRCAX analyzed tumors had hypermethylation of the BRCA1 gene. When we considered the predictor, we observed that all the cases with this epigenetic event were assigned to the BRCA1 class with a probability of >50%. Interestingly, 84.6% of the cases (11 of 13) assigned to the BRCA1 class with a probability >80% had an aberrant methylation of the BRCA1 promoter. This fact suggests that somatic BRCA1 inactivation could modify the profile of tumor progression in most of the BRCAX cases. 
%B Clin Cancer Res %V 11 %P 1146-53 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15709182 %0 Journal Article %J Nucleic Acids Res %D 2005 %T PupasView: a visual tool for selecting suitable SNPs, with putative pathological effect in genes, for genotyping purposes %A L. Conde %A Vaquerizas, J. M. %A Ferrer-Costa, C. %A de la Cruz, X. %A Orozco, M. %A Dopazo, J. %K Computer Graphics Genes *Genetic Predisposition to Disease Genotype Internet Phenotype *Polymorphism %K Single Nucleotide *Software User-Computer Interface %X We have developed a web tool, PupasView, for the selection of single nucleotide polymorphisms (SNPs) with potential phenotypic effect. PupasView constitutes an interactive environment in which functional information and population frequency data can be used as sequential filters over linkage disequilibrium parameters to obtain a final list of SNPs optimal for genotyping purposes. PupasView is the first resource that integrates phenotypic effects caused by SNPs at both the translational and the transcriptional level. PupasView retrieves SNPs that could affect conserved regions that the cellular machinery uses for the correct processing of genes (intron/exon boundaries or exonic splicing enhancers), predicted transcription factor binding sites and changes in amino acids in the proteins for which a putative pathological effect is calculated. The program uses the mapping of SNPs in the genome provided by Ensembl. PupasView will be of much help in studies of multifactorial disorders, where the use of functional SNPs will increase the sensitivity of the identification of the genes responsible for the disease. The PupasView web interface is accessible through http://pupasview.ochoa.fib.es and through http://www.pupasnp.org. 
%B Nucleic Acids Res %V 33 %P W501-5 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15980522 %0 Journal Article %J Bioinformatics %D 2004 %T DNMAD: web-based diagnosis and normalization for microarray data %A Vaquerizas, J. M. %A Dopazo, J. %A Diaz-Uriarte, R. %K Algorithms Database Management Systems Gene Expression Profiling/*methods/standards Information Storage and Retrieval/*methods *Internet Oligonucleotide Array Sequence Analysis/*methods/standards Sequence Alignment/methods Sequence Analysis %K DNA/*methods *Software *User-Computer Interface %X SUMMARY: We present a web server for Diagnosis and Normalization of MicroArray Data (DNMAD). DNMAD includes several common data transformations such as spatial and global robust local regression or multiple slide normalization, and allows for detecting several kinds of errors that result from the manipulation and the image analysis of the arrays. This tool offers a user-friendly interface, and is completely integrated within the Gene Expression Pattern Analysis Suite (GEPAS). AVAILABILITY: The tool is accessible on-line at http://dnmad.bioinfo.cnio.es. %B Bioinformatics %V 20 %P 3656-8 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15247094 %0 Journal Article %J Bioinformatics %D 2004 %T FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes %A Fatima Al-Shahrour %A Diaz-Uriarte, R. %A Dopazo, J. %K *Algorithms Artificial Intelligence Databases %K babelomics %K DNA/*methods *Software %K Genetic Gene Expression Profiling/*methods *Hypermedia Information Storage and Retrieval/*methods *Internet *Phylogeny Sequence Alignment/methods Sequence Analysis %X

We present a simple but powerful procedure to extract Gene Ontology (GO) terms that are significantly over- or under-represented in sets of genes within the context of a genome-scale experiment (DNA microarray, proteomics, etc.). Said procedure has been implemented as a web application, FatiGO, allowing for easy and interactive querying. FatiGO, which takes the multiple-testing nature of statistical contrast into account, currently includes GO associations for diverse organisms (human, mouse, fly, worm and yeast) and the TrEMBL/Swissprot GOAnnotations@EBI correspondences from the European Bioinformatics Institute.

%B Bioinformatics %V 20 %P 578-80 %G eng %U http://bioinformatics.oxfordjournals.org/content/20/4/578.abstract %0 Journal Article %J Genes Chromosomes Cancer %D 2004 %T Gene expression analysis of chromosomal regions with gain or loss of genetic material detected by comparative genomic hybridization %A Melendez, B. %A Diaz-Uriarte, R. %A Cuadros, M. %A Martinez-Ramirez, A. %A Fernandez-Piqueras, J. %A Dopazo, A. %A Cigudosa, J. C. %A Rivas, C. %A Dopazo, J. %A Martinez-Delgado, B. %A Benitez, J. %K Chromosomes %K Fluorescence Lymphoma %K Human %K Pair 13/*genetics Chromosomes %K Pair 19/*genetics Chromosomes %K Pair 6/*genetics Expressed Sequence Tags *Gene Dosage Gene Expression Profiling Humans In Situ Hybridization %K T-Cell/*genetics Nucleic Acid Hybridization Oligonucleotide Array Sequence Analysis %X Comparative genomic hybridization (CGH) has been widely used to detect copy number alterations in cancer and to identify regions containing candidate tumor-responsible genes; however, gene expression changes have been described only in highly amplified regions (amplicons). To study the overall impact of slight copy number changes on gene expression, we analyzed 16 T-cell lymphomas by using CGH and a custom-designed cDNA microarray containing 7,657 genes and expressed sequence tags related to tumorigenesis. We evaluated mean gene expression and variability within CGH-altered regions and explored the relationship between the effects of the gene and its position within these regions. Minimally overlapping CGH candidate areas (6q25, 13q21-q22, and 19q13.1) revealed a weak relationship between altered genomic content and gene expression. However, some candidate genes showed modified expression within these regions in the majority of tumors; these candidate genes were evaluated and confirmed in another independent series of 23 T-cell lymphomas by use of the same cDNA microarray and by FISH on a tissue microarray. 
When all the CGH regions detected for each tumor were considered, we found a significant increase or decrease in the mean expression of the genes contained in gained or lost regions, respectively. In addition, we found that the expression of a gene was dependent not only on its position within an altered region but also on its own mechanism of regulation: genes in the same altered region responded very differently to the gain or loss of genetic material. Supplementary material for this article can be found on the Genes, Chromosomes, and Cancer website at http://www.interscience.wiley.com/jpages/1045-2257/suppmat/index.html. %B Genes Chromosomes Cancer %V 41 %P 353-65 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15382261 %0 Journal Article %J Nucleic Acids Res %D 2004 %T New challenges in gene expression data analysis and the extended GEPAS %A Herrero, J. %A Vaquerizas, J. M. %A Fatima Al-Shahrour %A L. Conde %A A. Mateos %A Diaz-Uriarte, J. S. %A Dopazo, J. %K gepas %K microarray data analysis %X

Since the first papers published in the late nineties, including, for the first time, a comprehensive analysis of microarray data, the number of questions that have been addressed through this technique has both increased and diversified. Initially, interest focussed on genes coexpressing across sets of experimental conditions, implying, essentially, the use of clustering techniques. Recently, however, interest has focussed more on finding genes differentially expressed among distinct classes of experiments, or correlated to diverse clinical outcomes, as well as on building predictors. In addition to this, the availability of accurate genomic data and the recent implementation of CGH arrays have made mapping expression and genomic data on the chromosomes possible. There is also a clear demand for methods that allow the automatic transfer of biological information to the results of microarray experiments. Different initiatives, such as the Gene Ontology (GO) consortium, pathways databases, protein functional motifs, etc., provide curated annotations for genes. Whereas many resources on the web focus mainly on clustering methods, GEPAS has evolved to cope with the aforementioned new challenges that have recently arisen in the field of microarray data analysis. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://gepas.bioinfo.cnio.es.

%B Nucleic Acids Res %V 32 %P W485-91 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15215434 %0 Journal Article %J Bioinformatics %D 2004 %T Phylogenomics and the number of characters required for obtaining an accurate phylogeny of eukaryote model species %A H. Dopazo %A J. Santoyo %A Dopazo, J. %X

MOTIVATION: Through the most extensive phylogenomic analysis carried out to date, complete genomes of 11 eukaryotic species have been examined in order to find the homologues of more than 25,000 amino acid sequences. These sequences correspond to the exons of more than 3000 genes and were used as presence/absence characters to test one of the most controversial hypotheses concerning animal evolution, namely the Ecdysozoa hypothesis. Distance, maximum parsimony and Bayesian methods of phylogenetic reconstruction were used to test the hypothesis. RESULTS: The reliability of the Ecdysozoa, grouping arthropods and nematodes in a single clade, was unequivocally rejected in all the consensus trees. The Coelomata clade, grouping arthropods and chordates, was supported by the highest statistical confidence in all the reconstructions. The study of the dependence of the genomes’ tree accuracy on the number of exons used demonstrated that an unexpectedly large number of characters is necessary to obtain robust phylogenies. Previous studies supporting Ecdysozoa could not guarantee an accurate phylogeny because the number of characters used was clearly below the minimum required.

%B Bioinformatics %V 20 Suppl 1 %P i116-21 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15262789 %0 Journal Article %J Nucleic Acids Res %D 2004 %T PupaSNP Finder: a web tool for finding SNPs with putative effect at transcriptional level %A L. Conde %A Vaquerizas, J. M. %A J. Santoyo %A Fatima Al-Shahrour %A Ruiz-Llorente, S. %A M. Robledo %A Dopazo, J. %K Amino Acid Substitution Binding Sites Humans Internet Phenotype *Polymorphism %K Genetic %K Single Nucleotide RNA Splicing *Software Transcription Factors/metabolism *Transcription %X We have developed a web tool, PupaSNP Finder (PupaSNP for short), for high-throughput searching for single nucleotide polymorphisms (SNPs) with potential phenotypic effect. PupaSNP takes as its input lists of genes (or generates them from chromosomal coordinates) and retrieves SNPs that could affect the conserved regions that the cellular machinery uses for the correct processing of genes (intron/exon boundaries or exonic splicing enhancers), predicted transcription factor binding sites (TFBS) and changes in amino acids in the proteins. The program uses the mapping of SNPs in the genome provided by Ensembl. Additionally, user-defined SNPs (not yet mapped in the genome) can be easily provided to the program. Also, additional functional information from Gene Ontology, OMIM and homologies in other model organisms is provided. In contrast to other programs already available, which focus only on SNPs with possible effect in the protein, PupaSNP includes SNPs with possible transcriptional effect. PupaSNP will be of significant help in studies of multifactorial disorders, where the use of functional SNPs will increase the sensitivity of identification of the genes responsible for the disease. The PupaSNP web interface is accessible through http://pupasnp.bioinfo.cnio.es. 
%B Nucleic Acids Res %V 32 %P W242-8 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=15215388 %0 Journal Article %J Comp Funct Genomics %D 2003 %T An approach to inferring transcriptional regulation among genes from large-scale expression data %A Herrero, J. %A Diaz-Uriarte, R. %A Dopazo, J. %X The use of DNA microarrays opens up the possibility of measuring the expression levels of thousands of genes simultaneously under different conditions. Time-course experiments allow researchers to study the dynamics of gene interactions. The inference of genetic networks from such measures can give important insights for the understanding of a variety of biological problems. Most of the existing methods for genetic network reconstruction require many experimental data points, or can only be applied to the reconstruction of small subnetworks. Here we present a method that reduces the dimensionality of the dataset and then extracts the significant dynamic correlations among genes. The method requires a number of points achievable in common time-course experiments. %B Comp Funct Genomics %V 4 %P 148-54 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=18629097 %0 Journal Article %J Genome Res %D 2003 %T Comparing bacterial genomes through conservation profiles %A Martin, M. J. %A Herrero, J. %A A. Mateos %A Dopazo, J. 
%K Bacterial Genotype Models %K Bacterial/genetics Cluster Analysis Conserved Sequence/*genetics DNA %K Bacterial/genetics Escherichia coli/classification/*genetics Evolution %K Bacterial/genetics Gene Order/genetics Genes %K Bacterial/genetics/physiology *Genome %K Chromosome Mapping/methods Chromosomes %K Genetic Phenotype Phylogeny Sequence Homology %K Molecular Gene Expression Profiling/methods Gene Expression Regulation %K Nucleic Acid Species Specificity Terminology as Topic %X We constructed two-dimensional representations of profiles of gene conservation across different genomes using the genome of Escherichia coli as a model. These profiles permit both the visualization at the genome level of different traits in the organism studied and, at the same time, reveal features related to the genomes analyzed (such as defective genomes or genomes that lack a particular system). Conserved genes are not uniformly distributed along the E. coli genome but tend to cluster together. The study of gene distribution patterns across genomes is important for the understanding of how sets of genes seem to be dependent on each other, probably having some functional link. This provides additional evidence that can be used for the elucidation of the function of unannotated genes. Clustering these patterns produces families of genes which can be arranged in a hierarchy of closeness. In this way, functions can be defined at different levels of generality depending on the level of the hierarchy that is studied. The combined study of conservation and phenotypic traits opens up the possibility of defining phenotype/genotype associations, and ultimately inferring the gene or genes responsible for a particular trait. %B Genome Res %V 13 %P 991-8 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12695324 %0 Journal Article %J Bioinformatics %D 2003 %T Gene expression data preprocessing %A Herrero, J. %A Diaz-Uriarte, R. %A Dopazo, J. 
%K *Database Management Systems Gene Expression Profiling/*methods Information Storage and Retrieval/methods Internet Oligonucleotide Array Sequence Analysis/*methods Sequence Alignment/*methods Sequence Analysis %K DNA/*methods *Software *User-Computer Interface %X We present an interactive web tool for preprocessing microarray gene expression data. It analyses the data, suggests the most appropriate transformations and proceeds with them after user agreement. The normal preprocessing steps include scale transformations, management of missing values, replicate handling, flat pattern filtering and pattern standardization and they are required before performing any pattern analysis. The processed data set can be sent to other pattern analysis tools. %B Bioinformatics %V 19 %P 655-6 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12651726 %0 Journal Article %J Nucleic Acids Res %D 2003 %T GEPAS: A web-based resource for microarray gene expression data analysis %A Herrero, J. %A Fatima Al-Shahrour %A Diaz-Uriarte, R. %A A. Mateos %A Vaquerizas, J. M. %A J. Santoyo %A Dopazo, J. %K gepas %K microarray data analysis %X

We present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-condition comparison, unsupervised and supervised clustering (which include some of the most popular methods as well as home-made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple-purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es).

%B Nucleic Acids Res %V 31 %P 3461-7 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12824345 %0 Journal Article %J J Biotechnol %D 2002 %T Bioinformatics methods for the analysis of expression arrays: data clustering and information extraction %A J. Tamames %A Clark, D. %A Herrero, J. %A Dopazo, J. %A Blaschke, C. %A Fernandez, J. M. %A Oliveros, J. C. %A Valencia, A. %K Abstracting and Indexing as Topic/methods *Cluster Analysis *Database Management Systems Databases %K Computer-Assisted/methods Information Storage and Retrieval/*methods Internet Medline National Library of Medicine (U.S.) Oligonucleotide Array Sequence Analysis/*methods United States %K Genetic Gene Expression Gene Expression Profiling/*methods Image Processing %X Expression arrays facilitate the monitoring of changes in the expression patterns of large collections of genes. The analysis of expression array data has become a computationally-intensive task that requires the development of bioinformatics technology for a number of key stages in the process, such as image analysis, database storage, gene clustering and information extraction. Here, we review the current trends in each of these areas, with particular emphasis on the development of the related technology being carried out within our groups. %B J Biotechnol %V 98 %P 269-83 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12141992 %0 Journal Article %J J Proteome Res %D 2002 %T Combining hierarchical clustering and self-organizing maps for exploratory analysis of gene expression patterns %A Herrero, J. %A Dopazo, J. 
%K Cluster Analysis Computational Biology/methods *Gene Expression Genes %K Fungal/genetics *Genome Oligonucleotide Array Sequence Analysis/*methods Statistics as Topic/*methods Time Factors %X Self-organizing maps (SOM) constitute an alternative to classical clustering methods because of their linear run times and superior performance in dealing with noisy data. Nevertheless, the clustering obtained with SOM is dependent on the relative sizes of the clusters. Here, we show how the combination of SOM with hierarchical clustering methods constitutes an excellent tool for exploratory analysis of massive data like DNA microarray expression patterns. %B J Proteome Res %V 1 %P 467-70 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12645919 %0 Journal Article %J Am J Pathol %D 2002 %T Identification of genes involved in resistance to interferon-alpha in cutaneous T-cell lymphoma %A Tracey, L. %A Villuendas, R. %A Ortiz, P. %A Dopazo, A. %A Spiteri, I. %A Lombardia, L. %A Rodriguez-Peralto, J. L. %A Fernandez-Herrera, J. %A Hernandez, A. %A Fraga, J. %A Dominguez, O. %A Herrero, J. %A Alonso, M. A. %A Dopazo, J. %A Piris, M. A. 
%K Antineoplastic Agents/*pharmacology/therapeutic use Carrier Proteins/biosynthesis/genetics DNA-Binding Proteins/biosynthesis/genetics Drug Resistance %K Biological Oligonucleotide Array Sequence Analysis RNA %K Cultured %K Cutaneous/diagnosis/drug therapy/*genetics/metabolism *Membrane Glycoproteins Models %K Interleukin-1 Reproducibility of Results STAT1 Transcription Factor STAT3 Transcription Factor Trans-Activators/biosynthesis/genetics Tumor Cells %K Neoplasm Gene Expression Profiling *Gene Expression Regulation %K Neoplasm/biosynthesis *Receptors %K Neoplastic Humans Interferon-alpha/*pharmacology/therapeutic use Kinetics Lymphoma %K T-Cell %X Interferon-alpha therapy has been shown to be active in the treatment of mycosis fungoides although the individual response to this therapy is unpredictable and dependent on essentially unknown factors. In an effort to better understand the molecular mechanisms of interferon-alpha resistance we have developed an interferon-alpha resistant variant from a sensitive cutaneous T-cell lymphoma cell line. We have performed expression analysis to detect genes differentially expressed between both variants using a cDNA microarray including 6386 cancer-implicated genes. The experiments showed that resistance to interferon-alpha is consistently associated with changes in the expression of a set of 39 genes, involved in signal transduction, apoptosis, transcription regulation, and cell growth. Additional studies performed confirm that STAT1 and STAT3 expression and interferon-alpha induction and activation are not altered between both variants. The gene MAL, highly overexpressed by resistant cells, was also found to be expressed by tumoral cells in a series of cutaneous T-cell lymphoma patients treated with interferon-alpha and/or photochemotherapy. MAL expression was associated with longer time to complete remission. 
Time-course experiments of the sensitive and resistant cells showed a differential expression of a subset of genes involved in interferon-response (1 to 4 hours), cell growth and apoptosis (24 to 48 hours), and signal transduction. %B Am J Pathol %V 161 %P 1825-37 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12414529 %0 Journal Article %J Genome Res %D 2002 %T Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons %A A. Mateos %A Dopazo, J. %A Jansen, R. %A Tu, Y. %A Gerstein, M. %A Stolovitzky, G. %K Algorithms Artificial Intelligence Citric Acid Cycle/genetics Cluster Analysis Computational Biology/methods Gene Expression Profiling/*methods/statistics & numerical data Genes/*physiology Genetic Heterogeneity Neural Networks (Computer) Oligonucleotide %X Recent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. Our study is novel in that we report systematic results for 100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only 10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. 
We show that the high degree of interconnections among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suited for machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with considerably low rates of false positives and negatives and contains genes that are biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. We exemplify this methodology using the well-studied tricarboxylic acid cycle. %B Genome Res %V 12 %P 1703-15 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=12421757 %0 Journal Article %J Microb Drug Resist %D 2001 %T Annotated draft genomic sequence from a Streptococcus pneumoniae type 19F clinical isolate %A Dopazo, J. %A Mendoza, A. %A Herrero, J. %A Caldara, F. %A Humbert, Y. %A Friedli, L. %A Guerrier, M. %A Grand-Schenk, E. %A Gandin, C. %A de Francesco, M. %A Polissi, A. %A Buell, G. %A Feger, G. %A Garcia, E. %A Peitsch, M. %A Garcia-Bustos, J. F. %K Bacterial Molecular Sequence Data Pneumococcal Infections/*microbiology Prokaryotic Cells RNA %K Bacterial/chemistry/genetics Genes %K Bacterial/genetics *Genome %K DNA %K Transfer/metabolism Streptococcus pneumoniae/*genetics %X The public availability of numerous microbial genomes is enabling the analysis of bacterial biology in great detail and with an unprecedented, organism-wide and taxon-wide, broad scope. Streptococcus pneumoniae is one of the most important bacterial pathogens throughout the world. 
We present here sequences and functional annotations for 2.1-Mbp of pneumococcal DNA, covering more than 90% of the total estimated size of the genome. The sequenced strain is a clinical isolate resistant to macrolides and tetracycline. It carries a type 19F capsular locus, but multilocus sequence typing for several conserved genetic loci suggests that the strain sequenced belongs to a pneumococcal lineage that most often expresses a serotype 15 capsular polysaccharide. A total of 2,046 putative open reading frames (ORFs) longer than 100 amino acids were identified (average of 1,009 bp per ORF), including all described two-component systems and aminoacyl tRNA synthetases. Comparisons to other complete, or nearly complete, bacterial genomes were made and are presented in a graphical form for all the predicted proteins. %B Microb Drug Resist %V 7 %P 99-125 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11442348 %0 Journal Article %J Bioinformatics %D 2001 %T A hierarchical unsupervised growing neural network for clustering gene expression patterns %A Herrero, J. %A Valencia, A. %A Dopazo, J. %K *Algorithms Automatic Data Processing *Gene Expression Profiling *Neural Networks (Computer) *Oligonucleotide Array Sequence Analysis %X MOTIVATION: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm, (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol., 44, 226-233), is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. 
RESULTS: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data providing that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used. AVAILABILITY: A server running the program can be found at: http://bioinfo.cnio.es/sotarray. %B Bioinformatics %V 17 %P 126-36 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11238068 %0 Journal Article %J Vet Res %D 2001 %T Identification of optimal regions for phylogenetic studies on VP1 gene of foot-and-mouth disease virus: analysis of types A and O Argentinean viruses %A Nunez, J. I. %A Martin, M. J. %A Piccone, M. E. %A Carrillo, E. %A Palma, E. L. %A Dopazo, J. %A Sobrino, F. 
%K Amino Acid Sequence Animals Aphthovirus/classification/*genetics Base Sequence Capsid/chemistry/*genetics Capsid Proteins DNA, Complementary/chemistry Molecular Sequence Data *Phylogeny Polymerase Chain Reaction RNA, Viral/chemistry/genetics Serotyping Viral Proteins/analysis/*genetics %X An analysis of the informative content of sequence stretches on the foot-and-mouth disease virus (FMDV) VP1 gene was applied to two important viral serotypes: A and O. Several sequence regions were identified to allow the reconstruction of phylogenetic trees equivalent to those derived from the whole VP1 gene. The optimal informative regions for sequence windows of 150 to 250 nt were predicted between positions 250 and 550 of the gene. The sequences spanning the 250 nt of the 3’ end (positions 400 to 650), extensively used for FMDV phylogenetic analyses, showed a lower informative content. In spite of this, the use of sequences from this region allowed the derivation of phylogenetic trees for type A and type O FMDVs which showed topologies similar to those previously reported for the whole VP1 gene. When the sequences determined for viruses isolated in Argentina, between 1990 and 1993, were included in these analyses, the results obtained revealed features of the circulation of type A and type O viruses in the field, in the months that preceded the eradication of the disease in this country. Type A viruses were closely related to an Argentinean vaccine strain, and defined an independent cluster within this serotype. Among the type O viruses analysed, two groups were distinguished; one was closely related to the South American vaccine strains, while the other was grouped with viruses of the O3 subtype. In addition, a detailed phylogeny for type A FMDV is presented. 
%B Vet Res %V 32 %P 31-45 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11254175 %0 Journal Article %J J Immunol Methods %D 2001 %T Methods and approaches in the analysis of gene expression data %A Dopazo, J. %A Zanders, E. %A Dragoni, I. %A Amphlett, G. %A Falciani, F. %X

The application of high-density DNA array technology to monitor gene transcription has been responsible for a real paradigm shift in biology. The majority of research groups now have the ability to measure the expression of a significant proportion of the human genome in a single experiment, resulting in an unprecedented volume of data being made available to the scientific community. As a consequence of this, the storage, analysis and interpretation of this information present a major challenge. In the field of immunology the analysis of gene expression profiles has opened new areas of investigation. The study of cellular responses has revealed that cells respond to an activation signal with waves of co-ordinated gene expression profiles and that the components of these responses are the key to understanding the specific mechanisms which lead to phenotypic differentiation. The discovery of ‘cell type specific’ gene expression signatures has also helped the interpretation of the mechanisms leading to disease progression. Here we review the principles behind the most commonly used data analysis methods and discuss the approaches that have been employed in immunological research.

%B J Immunol Methods %V 250 %P 93-112 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11251224 %0 Journal Article %J J Mol Evol %D 2001 %T Phylogenetic analysis of viroid and viroid-like satellite RNAs from plants: a reassessment %A Elena, S. F. %A Dopazo, J. %A de la Pena, M. %A Flores, R. %A Diener, T. O. %A Moya, A. %K Evolution, Molecular *Phylogeny Plant Viruses/*genetics RNA, Satellite/*genetics RNA, Viral/genetics Viroids/*genetics %X The proposed monophyletic origin of a group of subviral plant pathogens (viroids and viroid-like satellite RNAs), as well as the phylogenetic relationships and the resulting taxonomy of these entities, has been recently questioned. The criticism comes from the (apparent) lack of sequence similarity among these RNAs necessary to reliably infer a phylogeny. Here we show that, despite their low overall sequence similarity, a sequence alignment manually adjusted to take into account all the local similarities and the insertions/deletions and duplications/rearrangements described in the literature for viroids and viroid-like satellite RNA, along with the use of an appropriate estimator of genetic distances, constitutes a data set suitable for a phylogenetic reconstruction. When the likelihood-mapping method was applied to this data set, the tree-likeness obtained was higher than that corresponding to a sequence alignment that does not take into consideration the local similarities. In addition, bootstrap analysis also supports the major groups previously proposed and the reconstruction is consistent with the biological properties of these RNAs. %B J Mol Evol %V 53 %P 155-9 %G eng %U http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11479686