01916nas a2200253 4500008004100000022001400041245013800055210006900193260001600262300001400278490000700292520118400299653000801483653001201491653000901503100001101512700001601523700000501539700001501544700000501559700001601564700001101580856007101591 2014 eng d a1546-169600aA comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium.0 acomprehensive assessment of RNAseq accuracy reproducibility and c2014 Aug 24 a903–9140 v323 aWe present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. 
The complete SEQC data sets, comprising >100 billion reads (10Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings.10aNGS10aRNA-seq10aSEQC1 aSu, Z.1 aLabaj, P.P.1 a1 aDopazo, J.1 a1 aMason, C.E.1 aShi, L uhttp://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2957.html01989nas a2200181 4500008004100000245008700041210006900128300001400197490000800211520142000219100001901639700001801658700001801676700001501694700001501709700001701724856006601741 2014 eng d00aMolecular interactions between sugar beet and Polymyxa betae during its life cycle0 aMolecular interactions between sugar beet and Polymyxa betae dur a244–2560 v1643 aPolymyxa betae is a biotrophic obligate sugar beet parasite that belongs to plasmodiophorids. The infection of sugar beet roots by this parasite is asymptomatic, except when it transmits Beet necrotic yellow vein virus (BNYVV), the causal agent of rhizomania. To date, there has been little work on P. betae–sugar beet molecular interactions, mainly because of the obligate nature of the parasite and also because research on rhizomania has tended to focus on the virus. In this study, we investigated these interactions through differential transcript analysis, using suppressive subtractive hybridization. The analysis included 76 P. betae and 120 sugar beet expressed sequence tags (ESTs). The expression of selected ESTs from both organisms was monitored during the protist life cycle, revealing a potential role of two P. betae proteins, profilin and a Von Willebrand factor domain-containing protein, in the early phase of infection. This study also revealed an over-expression of some sugar beet genes involved in defence, such as those encoding PR proteins, stress resistance proteins or lectins, especially during the plasmodial stage of the P. betae life cycle. In addition to providing new information on the molecular aspects of P. 
betae–sugar beet interactions, this study also enabled previously unknown ESTs of P. betae to be sequenced, thus enhancing our knowledge of the genome of this protist.1 aDesoignies, N.1 aCarbonell, J.1 aMoreau, J.-S.1 aConesa, A.1 aDopazo, J.1 aLegrève, A. uhttp://onlinelibrary.wiley.com/doi/10.1111/aab.12095/abstract01770nas a2200229 4500008004100000022001400041245007700055210006900132260001500201300001000216490000700226520109900233100001501332700002301347700001201370700002501382700001701407700001501424700001301439700001601452856007201468 2014 eng d a1873-236400aA novel locus for a hereditary recurrent neuropathy on chromosome 21q21.0 anovel locus for a hereditary recurrent neuropathy on chromosome c2014 May 9 a660-50 v243 aHereditary recurrent neuropathies are uncommon. Disorders with a known molecular basis falling within this group include hereditary neuropathy with liability to pressure palsies (HNPP) due to the deletion of the PMP22 gene or to mutations in this same gene, and hereditary neuralgic amyotrophy (HNA) caused by mutations in the SEPT9 gene. We report a three-generation family presenting a hereditary recurrent neuropathy without pathological changes in either PMP22 or SEPT9 genes. We performed a genome-wide mapping, which yielded a locus of 12.4Mb on chromosome 21q21. The constructed haplotype fully segregated with the disease and we found significant evidence of linkage. After mutational screening of genes located within this locus, encoding for proteins and microRNAs, as well as analysis of large deletions/insertions, we identified 71 benign polymorphisms. Our findings suggest a novel genetic locus for a recurrent hereditary neuropathy of which the molecular defect remains elusive. 
Our results further underscore the clinical and genetic heterogeneity of this group of neuropathies.1 aCalpena, E1 aMartínez-Rubio, D1 aArpa, J1 aGarcía-Peñas, J, J1 aMontaner, D.1 aDopazo, J.1 aPalau, F1 aEspinós, C uhttp://www.sciencedirect.com/science/article/pii/S0960896614001060#02349nas a2200181 4500008004100000022001400041245015400055210006900209260001500278300000700293490000600300520173800306100001502044700002402059700002102083700001502104856004802119 2011 eng d a1755-879400aA large scale survey reveals that chromosomal copy-number alterations significantly affect gene modules involved in cancer initiation and progression0 alarge scale survey reveals that chromosomal copynumber alteratio c06/05/2011 a370 v43 a
Recent observations point towards the existence of a large number of neighborhoods composed of functionally-related gene modules that lie together in the genome. This local component in the distribution of the functionality across chromosomes is probably affecting the own chromosomal architecture by limiting the possibilities in which genes can be arranged and distributed across the genome. As a direct consequence of this fact it is therefore presumable that diseases such as cancer, harboring DNA copy number alterations (CNAs), will have a symptomatology strongly dependent on modules of functionally-related genes rather than on a unique "important" gene.
We carried out a systematic analysis of more than 140,000 observations of CNAs in cancers and searched by enrichments in gene functional modules associated to high frequencies of loss or gains.
The analysis of CNAs in cancers clearly demonstrates the existence of a significant pattern of loss of gene modules functionally related to cancer initiation and progression along with the amplification of modules of genes related to unspecific defense against xenobiotics (probably chemotherapeutical agents). With the extension of this analysis to an Array-CGH dataset (glioblastomas) from The Cancer Genome Atlas we demonstrate the validity of this approach to investigate the functional impact of CNAs.
The presented results indicate promising clinical and therapeutic implications. Our findings also directly point out to the necessity of adopting a function-centric, rather a gene-centric, view in the understanding of phenotypes or diseases harboring CNAs.
1 aAlloza, E.1 aAl-Shahrour, Fatima1 aCigudosa, J., C.1 aDopazo, J. uhttp://www.biomedcentral.com/1755-8794/4/3701884nas a2200253 4500008004100000245010700041210006900148300001000217490000700227520107700234653001101311653002901322100002101351700002001372700001701392700001401409700001601423700001501439700001401454700002001468700001501488700002101503856010601524 2009 eng d00aAnalysis of chronic lymphotic leukemia transcriptomic profile: differences between molecular subgroups0 aAnalysis of chronic lymphotic leukemia transcriptomic profile di a68-790 v503 aB cell chronic lymphocytic leukemia (CLL) is a lymphoproliferative disorder with a variable clinical course. Patients with unmutated IgV(H) gene show a shorter progression-free and overall survival than patients with immunoglobulin heavy chain variable regions (IgV(H)) gene mutated. In addition, BCL6 mutations identify a subgroup of patients with high risk of progression. Gene expression was analysed in 36 early-stage patients using high-density microarrays. Around 150 genes differentially expressed were found according to IgV(H) mutations, whereas no difference was found according to BCL6 mutations. Functional profiling methods allowed us to distinguish KEGG and gene ontology terms showing coordinated gene expression changes across subgroups of CLL. We validated a set of differentially expressed genes according to IgV(H) status, scoring them as putative prognostic markers in CLL. Among them, CRY1, LPL, CD82 and DUSP22 are the ones with at least equal or superior performance to ZAP70 which is actually the most used surrogate marker of IgV(H) status.
10acancer10amicroarray data analysis1 aLewintre, Jantus1 aMartin, Reinoso1 aMontaner, D.1 aMarin, M.1 aTerol, Jose1 aFarras, R.1 aBenet, I.1 aCalvete, J., J.1 aDopazo, J.1 aGarcia-Conde, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1912748203290nas a2200205 4500008004100000245014400041210006900185300001100254490000800265520221900273653002002492653002302512653017902535653018702714100001802901700002402919700001502943700002002958856010602978 2009 eng d00aExploring the antimicrobial action of a carbon monoxide-releasing compound through whole-genome transcription profiling of Escherichia coli0 aExploring the antimicrobial action of a carbon monoxidereleasing a813-240 v1553 aWe recently reported that carbon monoxide (CO) has bactericidal activity. To understand its mode of action we analysed the gene expression changes occurring when Escherichia coli, grown aerobically and anaerobically, is treated with the CO-releasing molecule CORM-2 (tricarbonyldichlororuthenium(II) dimer). Microarray analysis shows that the E. coli CORM-2 response is multifaceted, with a high number of differentially regulated genes spread through several functional categories, namely genes involved in inorganic ion transport and metabolism, regulators, and genes implicated in post-translational modification, such as chaperones. CORM-2 has a higher impact in E. coli cells grown anaerobically, as judged by the repression of genes belonging to eight functional classes which are not seen in the response of aerobically CORM-2-treated cells. The biological relevance of the variations caused by CORM-2 was substantiated by studying the CORM-2 sensitivity of selected E. coli mutants. 
The results show that the deletion of redox-sensing regulators SoxS and OxyR increased the sensitivity to CORM-2 and suggest that while SoxS plays an important role in protection against CORM-2 under both growth conditions, OxyR seems to participate only in the aerobic CORM-2 response. Under anaerobic conditions, we found that the heat-shock proteins IbpA and IbpB contribute to CORM-2 defence since the deletion of these genes increases the sensitivity of the strain. The induction of several met genes and the hypersensitivity to CORM-2 of the DeltametR, DeltametI and DeltametN mutant strains suggest that CO has effects on the methionine metabolism of E. coli. CORM-2 also affects the transcription of several E. coli biofilm-related genes and increases biofilm formation in E. coli. In particular, the absence of tqsA or bhsA increases the resistance of E. coli to CORM-2, and deletion of tqsA leads to a strain that has lost its capacity to form biofilm upon treatment with CORM-2. In spite of the relatively stable nature of the CO molecule, our results show that CO is able to trigger a significant alteration in the transcriptome of E. coli which necessarily has effects in several key metabolic pathways.
10aBacterial Genes10aBacterial/genetics10aBiofilms Carbon Monoxide/*metabolism Escherichia coli/*genetics/metabolism Escherichia coli Proteins/genetics/metabolism *Gene Expression Profiling Gene Expression Regulation10aRegulator Genetic Complementation Test Methionine/metabolism Microbial Viability Mutation Oligonucleotide Array Sequence Analysis Organometallic Compounds/*pharmacology Phenotype RNA1 aNobre, L., S.1 aAl-Shahrour, Fatima1 aDopazo, J.1 aSaraiva, L., M. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1924675201598nas a2200145 4500008004100000245006200041210006200103300001100165490000700176520111100183653001501294653002201309100001501331856010601346 2009 eng d00aFormulating and testing hypotheses in functional genomics0 aFormulating and testing hypotheses in functional genomics a97-1070 v453 aOBJECTIVE: The ultimate goal of any genome-scale experiment is to provide a functional interpretation of the results, relating the available genomic information to the hypotheses that originated the experiment. METHODS AND RESULTS: Initially, this interpretation has been made on a pre-selection of relevant genes, based on the experimental values, followed by the study of the enrichment in some functional properties. Nevertheless, functional enrichment methods, demonstrated to have a flaw: the first step of gene selection was too stringent given that the cooperation among genes was ignored. The assumption that modules of genes related by relevant biological properties (functionality, co-regulation, chromosomal location, etc.) are the real actors of the cell biology lead to the development of new procedures, inspired in systems biology criteria, generically known as gene-set methods. These methods have been successfully used to analyze transcriptomic and large-scale genotyping experiments as well as to test other different genome-scale hypothesis in other fields such as phylogenomics.
10ababelomics10agene set analysis1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1878965902142nas a2200253 4500008004100000245010200041210006900143300001100212490000700223520138600230653001501616653002401631100002401655700001801679700001601697700001401713700001501727700001601742700002001758700001501778700001701793700001501810856006301825 2008 eng d00aBabelomics: advanced functional profiling of transcriptomics, proteomics and genomics experiments0 aBabelomics advanced functional profiling of transcriptomics prot aW341-60 v363 aWe present a new version of Babelomics, a complete suite of web tools for the functional profiling of genome scale experiments, with new and improved methods as well as more types of functional definitions. Babelomics includes different flavours of conventional functional enrichment methods as well as more advanced gene set analysis methods that makes it a unique tool among the similar resources available. In addition to the well-known functional definitions (GO, KEGG), Babelomics includes new ones such as Biocarta pathways or text mining-derived functional terms. Regulatory modules implemented include transcriptional control (Transfac, CisRed) and other levels of regulation such as miRNA-mediated interference. Moreover, Babelomics allows for sub-selection of terms in order to test more focused hypothesis. Also gene annotation correspondence tables can be imported, which allows testing with user-defined functional modules. Finally, a tool for the ’de novo’ functional annotation of sequences has been included in the system. This allows using yet unannotated organisms in the program. Babelomics has been extensively re-engineered and now it includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. Babelomics is available at http://www.babelomics.org.
10ababelomics10afuntional profiling1 aAl-Shahrour, Fatima1 aCarbonell, J.1 aMinguez, P.1 aGoetz, S.1 aConesa, A.1 aTarraga, J.1 aMedina, Ignacio1 aAlloza, E.1 aMontaner, D.1 aDopazo, J. uhttp://nar.oxfordjournals.org/content/36/suppl_2/W341.long02872nas a2200241 4500008004100000245013000041210006900171300000700240490000600247520211000253653001302363653000902376653000802385100001702393700001802410700001302428700001402441700002002455700001502475700001502490700001902505856010602524 2008 eng d00aBiological processes, properties and molecular wiring diagrams of candidate low-penetrance breast cancer susceptibility genes0 aBiological processes properties and molecular wiring diagrams of a620 v13 aABSTRACT: BACKGROUND: Recent advances in whole-genome association studies (WGASs) for human cancer risk are beginning to provide the part lists of low-penetrance susceptibility genes. However, statistical analysis in these studies is complicated by the vast number of genetic variants examined and the weak effects observed, as a result of which constraints must be incorporated into the study design and analytical approach. In this scenario, biological attributes beyond the adjusted statistics generally receive little attention and, more importantly, the fundamental biological characteristics of low-penetrance susceptibility genes have yet to be determined. METHODS: We applied an integrative approach for identifying candidate low-penetrance breast cancer susceptibility genes, their characteristics and molecular networks through the analysis of diverse sources of biological evidence. RESULTS: First, examination of the distribution of Gene Ontology terms in ordered WGAS results identified asymmetrical distribution of Cell Communication and Cell Death processes linked to risk. 
Second, analysis of 11 different types of molecular or functional relationships in genomic and proteomic data sets defined the "omic" properties of candidate genes: i/ differential expression in tumors relative to normal tissue; ii/ somatic genomic copy number changes correlating with gene expression levels; iii/ differentially expressed across age at diagnosis; and iv/ expression changes after BRCA1 perturbation. Finally, network modeling of the effects of variants on germline gene expression showed higher connectivity than expected by chance between novel candidates and with known susceptibility genes, which supports functional relationships and provides mechanistic hypotheses of risk. CONCLUSION: This study proposes that cell communication and cell death are major biological processes perturbed in risk of breast cancer conferred by low-penetrance variants, and defines the common omic properties, molecular interactions and possible functional effects of candidate genes and proteins.
10agene set10aGWAS10aSNP1 aBonifaci, N.1 aBerenguer, A.1 aDiez, J.1 aReina, O.1 aMedina, Ignacio1 aDopazo, J.1 aMoreno, V.1 aPujana, M., A. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1909423002659nas a2200253 4500008004100000245010800041210006900149300001000218490000700228520165100235653006101886653019201947100001402139700001302153700001302166700001802179700001702197700001502214700002002229700001602249700001502265700001902280856010602299 2008 eng d00aCLEAR-test: combining inference for differential expression and variability in microarray data analysis0 aCLEARtest combining inference for differential expression and va a33-450 v413 aA common goal of microarray experiments is to detect genes that are differentially expressed under distinct experimental conditions. Several statistical tests have been proposed to determine whether the observed changes in gene expression are significant. The t-test assigns a score to each gene on the basis of changes in its expression relative to its estimated variability, in such a way that genes with a higher score (in absolute values) are more likely to be significant. Most variants of the t-test use the complete set of genes to influence the variance estimate for each single gene. However, no inference is made in terms of the variability itself. Here, we highlight the problem of low observed variances in the t-test, when genes with relatively small changes are declared differentially expressed. Alternatively, the z-test could be used although, unlike the t-test, it can declare differentially expressed genes with high observed variances. To overcome this, we propose to combine the z-test, which focuses on large changes, with a chi(2) test to evaluate variability. We call this procedure CLEAR-test and we provide a combined p-value that offers a compromise between both aspects. 
Analysis of three publicly available microarray datasets reveals the greater performance of the CLEAR-test relative to the t-test and alternative methods. Finally, empirical and simulated data analyses demonstrate the greater reproducibility and statistical power of the CLEAR-test and z-test with respect to current alternative methods. In addition, the CLEAR-test improves the z-test by capturing reproducible genes with high variability.
10a*Algorithms Artificial Intelligence *Data Interpretation10aStatistical Gene Expression Profiling/*methods Gene Expression Regulation/*physiology Oligonucleotide Array Sequence Analysis/*methods Proteome/*metabolism Signal Transduction/*physiology1 aValls, J.1 aGrau, M.1 aSole, X.1 aHernandez, P.1 aMontaner, D.1 aDopazo, J.1 aPeinado, M., A.1 aCapella, G.1 aMoreno, V.1 aPujana, M., A. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1759700903014nas a2200229 4500008004100000245012600041210006900167300001200236490000700248520187400255653014702129653025902276100002302535700001602558700001502574700002002589700001802609700002002627700001702647700001402664856010602678 2008 eng d00aControlled ovarian stimulation induces a functional genomic delay of the endometrium with potential clinical implications0 aControlled ovarian stimulation induces a functional genomic dela a4500-100 v933 aCONTEXT: Controlled ovarian stimulation induces morphological, biochemical, and functional genomic modifications of the human endometrium during the window of implantation. OBJECTIVE: Our objective was to compare the gene expression profile of the human endometrium in natural vs. controlled ovarian stimulation cycles throughout the early-mid secretory transition using microarray technology. METHOD: Microarray data from 49 endometrial biopsies obtained from LH+1 to LH+9 (n=25) in natural cycles and from human chorionic gonadotropin (hCG) +1 to hCG+9 in controlled ovarian stimulation cycles (n=24) were analyzed using different methods, such as clustering, profiling of biological processes, and selection of differentially expressed genes, as implemented in Gene Expression Pattern Analysis Suite and Babelomics programs. 
RESULTS: Endometria from natural cycles followed different genomic patterns compared with controlled ovarian stimulation cycles in the transition from the pre-receptive (days LH/hCG+1 until LH/hCG+5) to the receptive phase (day LH+7/hCG+7). Specifically, we have demonstrated the existence of a 2-d delay in the activation/repression of two clusters composed by 218 and 133 genes, respectively, on day hCG+7 vs. LH+7. Many of these delayed genes belong to the class window of implantation genes affecting basic biological processes in the receptive endometrium. CONCLUSIONS: These results demonstrate that gene expression profiling of the endometrium is different between natural and controlled ovarian stimulation cycles in the receptive phase. Identification of these differentially regulated genes can be used to understand the different developmental profiles of receptive endometrium during controlled ovarian stimulation and to search for the best controlled ovarian stimulation treatment in terms of minimal endometrial impact.
10aAlgorithms Chorionic Gonadotropin/genetics Endometrium/cytology/pathology/*physiology/physiopathology Female Gene Expression Regulation Genome10aHuman Glutathione Peroxidase/genetics Humans Insulin-Like Growth Factor Binding Proteins/genetics Luteal Phase/physiology Luteinizing Hormone/genetics Menstrual Cycle Oligonucleotide Array Sequence Analysis Ovulation Induction/*methods RNA/genetics/isola1 aHorcajadas, J., A.1 aMinguez, P.1 aDopazo, J.1 aEsteban, F., J.1 aDominguez, F.1 aGiudice, L., C.1 aPellicer, A.1 aSimon, C. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1869787001778nas a2200229 4500008004100000245010300041210006900144300001100213490000700224520087500231653007101106653013601177100001501313700001201328700002201340700001801362700001301380700001701393700001701410700001501427856010601442 2008 eng d00aDirect functional assessment of the composite phenotype through multivariate projection strategies0 aDirect functional assessment of the composite phenotype through a373-830 v923 aWe present a novel approach for the analysis of transcriptomics data that integrates functional annotation of gene sets with expression values in a multivariate fashion, and directly assesses the relation of functional features to a multivariate space of response phenotypical variables. Multivariate projection methods are used to obtain new correlated variables for a set of genes that share a given function. These new functional variables are then related to the response variables of interest. The analysis of the principal directions of the multivariate regression allows for the identification of gene function features correlated with the phenotype. Two different transcriptomics studies are used to illustrate the statistical and interpretative aspects of the methodology. We demonstrate the superiority of the proposed method over equivalent approaches.
10aBreast Neoplasms/genetics Computational Biology/*methods Databases10aGenetic Female Gene Expression Profiling/*statistics & numerical data Humans Mathematical Computing Multivariate Analysis Phenotype1 aConesa, A.1 aBro, R.1 aGarcia-Garcia, F.1 aPrats, J., M.1 aGotz, S.1 aKjeldahl, K.1 aMontaner, D.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1865288802205nas a2200301 4500008004100000245007600041210006900117300001200186490000700198520130100205653001001506653002901516100001601545700002001561700001801581700002101599700001601620700001501636700002401651700002301675700001401698700001601712700002201728700001501750700001701765700001501782856010601797 2008 eng d00aGEPAS, a web-based tool for microarray data analysis and interpretation0 aGEPAS a webbased tool for microarray data analysis and interpret aW308-140 v363 aGene Expression Profile Analysis Suite (GEPAS) is one of the most complete and extensively used web-based packages for microarray data analysis. During its more than 5 years of activity it has continuously been updated to keep pace with the state-of-the-art in the changing microarray data analysis arena. GEPAS offers diverse analysis options that include well established as well as novel algorithms for normalization, gene selection, class prediction, clustering and functional profiling of the experiment. New options for time-course (or dose-response) experiments, microarray-based class prediction, new clustering methods and new tests for differential expression have been included. The new pipeliner module allows automating the execution of sequential analysis steps by means of a simple but powerful graphic interface. An extensive re-engineering of GEPAS has been carried out which includes the use of web services and Web 2.0 technology features, a new user interface with persistent sessions and a new extended database of gene identifiers. 
GEPAS is nowadays the most quoted web tool in its field and it is extensively used by researchers of many countries and its records indicate an average usage rate of 500 experiments per day. GEPAS, is available at http://www.gepas.org.
10agepas10amicroarray data analysis1 aTarraga, J.1 aMedina, Ignacio1 aCarbonell, J.1 aHuerta-Cepas, J.1 aMinguez, P.1 aAlloza, E.1 aAl-Shahrour, Fatima1 aVegas-Azcarate, S.1 aGoetz, S.1 aEscobar, P.1 aGarcia-Garcia, F.1 aConesa, A.1 aMontaner, D.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1850880603310nas a2200841 4500008004100000245008100041210007100122300001100193490000600204520100100210653007501211653010801286100002201394700001501416700001401431700002001445700001401465700001501479700001501494700001101509700001401520700001401534700001301548700001601561700001901577700001901596700002101615700002201636700001301658700001401671700001101685700001801696700002101714700002201735700001401757700001601771700001501787700001301802700001301815700001401828700001501842700001701857700001301874700001601887700001601903700001801919700001601937700001901953700001601972700001801988700001502006700001702021700001602038700002102054700001902075700001502094700001402109700001602123700001502139700001702154700001902171700001802190700001502208700001902223700002602242700001402268700001602282700001502298700001602313700001502329700001802344856010602362 2008 eng d00aInteroperability with Moby 1.0–it’s better than sharing your toothbrush!0 aInteroperability with Moby 10–it s better than sharing your toot a220-310 v93 aThe BioMoby project was initiated in 2001 from within the model organism database community. It aimed to standardize methodologies to facilitate information exchange and access to analytical resources, using a consensus driven approach. Six years later, the BioMoby development community is pleased to announce the release of the 1.0 version of the interoperability framework, registry Application Programming Interface and supporting Perl and Java code-bases. Together, these provide interoperable access to over 1400 bioinformatics resources worldwide through the BioMoby platform, and this number continues to grow. 
Here we highlight and discuss the features of BioMoby that make it distinct from other Semantic Web Service and interoperability initiatives, and that have been instrumental to its deployment and use by a wide community of bioinformatics service providers. The standard, client software, and supporting code libraries are all freely available at http://www.biomoby.org/.
10aComputational Biology/*methods *Database Management Systems *Databases10aFactual Information Storage and Retrieval/*methods *Internet *Programming Languages Systems Integration1 aWilkinson, M., D.1 aSenger, M.1 aKawas, E.1 aBruskiewich, R.1 aGouzy, J.1 aNoirot, C.1 aBardou, P.1 aNg, A.1 aHaase, D.1 aEde, Saiz1 aWang, D.1 aGibbons, F.1 aGordon, P., M.1 aSensen, C., W.1 aCarrasco, J., M.1 aFernandez, J., M.1 aShen, L.1 aLinks, M.1 aNg, M.1 aOpushneva, N.1 aNeerincx, P., B.1 aLeunissen, J., A.1 aErnst, R.1 aTwigger, S.1 aUsadel, B.1 aGood, B.1 aWong, Y.1 aStein, L.1 aCrosby, W.1 aKarlsson, J.1 aRoyo, R.1 aParraga, I.1 aRamirez, S.1 aGelpi, J., L.1 aTrelles, O.1 aPisano, D., G.1 aJimenez, N.1 aKerhornou, A.1 aRosset, R.1 aZamacola, L.1 aTarraga, J.1 aHuerta-Cepas, J.1 aCarazo, J., M.1 aDopazo, J.1 aGuigo, R.1 aNavarro, A.1 aOrozco, M.1 aValencia, A.1 aClaros, M., G.1 aPerez, A., J.1 aAldana, J.1 aRojano, M., M.1 aCruz, Fernandez-Santa1 aNavas, I.1 aSchiltz, G.1 aFarmer, A.1 aGessler, D.1 aSchoof, H.1 aGroscurth, A. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1823880402702nas a2200253 4500008004100000245013300041210006900174300001100243490000700254520167500261653004701936653002901983653012302012653010502135100001602240700001402256700002002270700002102290700001802311700001502329700001702344700002002361856006702381 2008 eng d00aJoint annotation of coding and non-coding single nucleotide polymorphisms and mutations in the SNPeffect and PupaSuite databases0 aJoint annotation of coding and noncoding single nucleotide polym aD825-90 v363 aSingle nucleotide polymorphisms (SNPs) are, together with copy number variation, the primary source of variation in the human genome. SNPs are associated with altered response to drug treatment, susceptibility to disease and other phenotypic variation. 
Furthermore, during genetic screens for disease-associated mutations in groups of patients and control individuals, the distinction between disease causing mutation and polymorphism is often unclear. Annotation of the functional and structural implications of single nucleotide changes thus provides valuable information to interpret and guide experiments. The SNPeffect and PupaSuite databases are now synchronized to deliver annotations for both non-coding and coding SNP, as well as annotations for the SwissProt set of human disease mutations. In addition, SNPeffect now contains predictions of Tango2: an improved aggregation detector, and Waltz: a novel predictor of amyloid-forming sequences, as well as improved predictors for regions that are recognized by the Hsp70 family of chaperones. The new PupaSuite version incorporates predictions for SNPs in silencers and miRNAs including their targets, as well as additional methods for predicting SNPs in TFBSs and splice sites. Also predictions for mouse and rat genomes have been added. In addition, a PupaSuite web service has been developed to enable data access, programmatically. The combined database holds annotations for 4,965,073 regulatory as well as 133,505 coding human SNPs and 14,935 disease mutations, and phenotypic descriptions of 43,797 human proteins and is accessible via http://snpeffect.vib.be and http://pupasuite.bioinfo.cipf.es/.
10aAmino Acid Substitution Animals *Databases10aGenetic Genetic Diseases10aInborn/genetics HSP70 Heat-Shock Proteins/metabolism Humans Internet Mice MicroRNAs/metabolism *Mutation *Polymorphism10aSingle Nucleotide Proteins/chemistry/genetics RNA Splice Sites Rats Transcription Factors/metabolism1 aReumers, J.1 aConde, L.1 aMedina, Ignacio1 aMaurer-Stroh, S.1 aVan Durme, J.1 aDopazo, J.1 aRousseau, F.1 aSchymkowitz, J. uhttp://nar.oxfordjournals.org/cgi/content/full/36/suppl_1/D82503141nas a2200313 4500008004100000245013000041210006900171300001200240490000700252520169500259653011401954653003602068653016902104653009402273653012702367100002202494700002602516700001402542700001602556700003002572700001702602700001702619700002002636700001502656700001702671700001602688700001702704856010602721 2008 eng d00aMolecular profiling related to poor prognosis in thyroid carcinoma. Combining gene expression data and biological information0 aMolecular profiling related to poor prognosis in thyroid carcino a1554-610 v273 aUndifferentiated and poorly differentiated thyroid tumors are responsible for more than half of thyroid cancer patient deaths in spite of their low incidence. Conventional treatments do not obtain substantial benefits, and the lack of alternative approaches limits patient survival. Additionally, the absence of prognostic markers for well-differentiated tumors complicates patient-specific treatments and favors the progression of recurrent forms. In order to recognize the molecular basis involved in tumor dedifferentiation and identify potential markers for thyroid cancer prognosis prediction, we analysed the expression profile of 44 thyroid primary tumors with different degrees of dedifferentiation and aggressiveness using cDNA microarrays. Transcriptome comparison of dedifferentiated and well-differentiated thyroid tumors identified 1031 genes with >2-fold difference in absolute values and false discovery rate of <0.15. 
According to known molecular interaction and reaction networks, the products of these genes were mainly clustered in the MAPkinase signaling pathway, the TGF-beta signaling pathway, focal adhesion and cell motility, activation of actin polymerization and cell cycle. An exhaustive search in several databases allowed us to identify various members of the matrix metalloproteinase, melanoma antigen A and collagen gene families within the upregulated gene set. We also identified a prognosis classifier comprising just 30 transcripts with an overall accuracy of 95%. These findings may clarify the molecular mechanisms involved in thyroid tumor dedifferentiation and provide a potential prognosis predictor as well as targets for new therapies.
10aAdenoma/genetics/metabolism/pathology Adolescent Adult Aged Carcinoma/genetics/metabolism/pathology Carcinoma10aBiological/*genetics/metabolism10aNeoplasm/genetics/metabolism Reverse Transcriptase Polymerase Chain Reaction Signal Transduction Thyroid Neoplasms/classification/*genetics/metabolism Tumor Markers10aNeoplastic Humans Male Middle Aged *Oligonucleotide Array Sequence Analysis Prognosis RNA10aPapillary/genetics/metabolism/pathology Cell Differentiation Female *Gene Expression Profiling *Gene Expression Regulation1 aMontero-Conde, C.1 aMartin-Campos, J., M.1 aLerma, E.1 aGimenez, G.1 aMartinez-Guitarte, J., L.1 aCombalia, N.1 aMontaner, D.1 aMatias-Guiu, X.1 aDopazo, J.1 ade Leiva, A.1 aRobledo, M.1 aMauricio, D. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1787390802003nas a2200181 4500008004100000245007400041210006900115300001100184490000700195520123100202653013101433653008301564100002101647700001401668700001501682700001801697856010601715 2008 eng d00aPhylomeDB: a database for genome-wide collections of gene phylogenies0 aPhylomeDB a database for genomewide collections of gene phylogen aD491-60 v363 aThe complete collection of evolutionary histories of all genes in a genome, also known as phylome, constitutes a valuable source of information. The reconstruction of phylomes has been previously prevented by large demands of time and computer power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. 
For each genome, PhylomeDB provides the alignments, phylogentic trees and tree-based orthology predictions for every single encoded protein. The current version of PhylomeDB includes the phylomes of Human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.es.10aAncient Humans *Phylogeny Proteins/classification/genetics Saccharomyces cerevisiae/classification/genetics Sequence Alignment10aBase Sequence Escherichia coli/classification/genetics Genes *Genomics History1 aHuerta-Cepas, J.1 aBueno, A.1 aDopazo, J.1 aGabaldón, T. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1796229703097nas a2200757 4500008004100000245006200041210006200103300001000165490000700175520101900182653004201201653001201243653007801255653004301333653006701376100001301443700001301456700002101469700001501490700002001505700001301525700001501538700001701553700001501570700001501585700001501600700001701615700001601632700001701648700001401665700001501679700001501694700001501709700001301724700001501737700001301752700001301765700001301778700001701791700001501808700001601823700001601839700002001855700002001875700001601895700002001911700001301931700001501944700002701959700001601986700001702002700001802019700001502037700001902052700001502071700001702086700001902103700001802122700001602140700001502156700001402171700001702185700001602202700001502218856010602233 2008 eng d00aSNP and haplotype mapping for genetic analysis in the rat0 aSNP and haplotype mapping for genetic analysis in the rat a560-60 v403 aThe laboratory rat is one of the most extensively studied model organisms. 
Inbred laboratory rat strains originated from limited Rattus norvegicus founder populations, and the inherited genetic variation provides an excellent resource for the correlation of genotype to phenotype. Here, we report a survey of genetic variation based on almost 3 million newly identified SNPs. We obtained accurate and complete genotypes for a subset of 20,238 SNPs across 167 distinct inbred rat strains, two rat recombinant inbred panels and an F2 intercross. Using 81% of these SNPs, we constructed high-density genetic maps, creating a large dataset of fully characterized SNPs for disease gene mapping. Our data characterize the population structure and illustrate the degree of linkage disequilibrium. We provide a detailed SNP map and demonstrate its utility for mapping of quantitative trait loci. This community resource is openly available and augments the genetic tools for this workhorse of physiological studies.
10aAnimals Chromosome Mapping *Databases10aGenetic10aGenetic Genome *Haplotypes Linkage Disequilibrium Phylogeny *Polymorphism10aInbred Strains/*genetics Recombination10aSingle Nucleotide *Quantitative Trait Loci Rats/*genetics Rats1 aSaar, K.1 aBeck, A.1 aBihoreau, M., T.1 aBirney, E.1 aBrocklebank, D.1 aChen, Y.1 aCuppen, E.1 aDemonchy, S.1 aDopazo, J.1 aFlicek, P.1 aFoglio, M.1 aFujiyama, A.1 aGut, I., G.1 aGauguier, D.1 aGuigo, R.1 aGuryev, V.1 aHeinig, M.1 aHummel, O.1 aJahn, N.1 aKlages, S.1 aKren, V.1 aKube, M.1 aKuhl, H.1 aKuramoto, T.1 aKuroki, Y.1 aLechner, D.1 aLee, Y., A.1 aLopez-Bigas, N.1 aLathrop, G., M.1 aMashimo, T.1 aMedina, Ignacio1 aMott, R.1 aPatone, G.1 aPerrier-Cornet, J., A.1 aPlatzer, M.1 aPravenec, M.1 aReinhardt, R.1 aSakaki, Y.1 aSchilhabel, M.1 aSchulz, H.1 aSerikawa, T.1 aShikhagaie, M.1 aTatsumoto, S.1 aTaudien, S.1 aToyoda, A.1 aVoigt, B.1 aZelenika, D.1 aZimdahl, H.1 aHubner, N. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1844359402909nas a2200241 4500008004100000245013300041210006900174300001200243490000700255520182100262653009602083653009302179653007402272653002302346653008902369100001802458700001502476700001602491700001502507700001502522700002402537856010602561 2008 eng d00aUse of estimated evolutionary strength at the codon level improves the prediction of disease-related protein mutations in humans0 aUse of estimated evolutionary strength at the codon level improv a198-2040 v293 aPredicting the functional impact of protein variation is one of the most challenging problems in bioinformatics. A rapidly growing number of genome-scale studies provide large amounts of experimental data, allowing the application of rigorous statistical approaches for predicting whether a given single point mutation has an impact on human health. Up until now, existing methods have limited their source data to either protein or gene information. 
Novel in this work, we take advantage of both and focus on protein evolutionary information by using estimated selective pressures at the codon level. Here we introduce a new method (SeqProfCod) to predict the likelihood that a given protein variant is associated with human disease or not. Our method relies on a support vector machine (SVM) classifier trained using three sources of information: protein sequence, multiple protein sequence alignments, and the estimation of selective pressure at the codon level. SeqProfCod has been benchmarked with a large dataset of 8,987 single point mutations from 1,434 human proteins from SWISS-PROT. It achieves 82% overall accuracy and a correlation coefficient of 0.59, indicating that the estimation of the selective pressure helps in predicting the functional impact of single-point mutations. Moreover, this study demonstrates the synergic effect of combining two sources of information for predicting the functional effects of protein variants: protein sequence/profile-based information and the evolutionary estimation of the selective pressures at the codon level. The results of large-scale application of SeqProfCod over all annotated point mutations in SWISS-PROT (available for download at http://sgu.bioinfo.cipf.es/services/Omidios/; last accessed: 24 August 2007), could be used to support clinical studies.10aAlgorithms Codon/genetics Computational Biology/*methods *DNA Mutational Analysis Databases10aHuman Humans Iduronic Acid/analogs & derivatives/metabolism *Point Mutation Polymorphism10aMolecular *Genetic Predisposition to Disease Genetic Variation Genome10aProtein *Evolution10aSingle Nucleotide Proteins/chemistry/*genetics Tumor Suppressor Protein p53/genetics1 aCapriotti, E.1 aArbiza, L.1 aCasadio, R.1 aDopazo, J.1 aDopazo, H.1 aMarti-Renom, M., A. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1793514802500nas a2200253 4500008004100000245008800041210006900129300000700198490001400205520139900219653007701618653005701695653019901752653003901951653002701990100002402017700001402041700002402055700001802079700001502097700001502112700001302127856010602140 2007 eng d00aThe AnnoLite and AnnoLyze programs for comparative annotation of protein structures0 aAnnoLite and AnnoLyze programs for comparative annotation of pro aS40 v8 Suppl 43 aBACKGROUND: Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. DESCRIPTION: AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of 90% and average precision of 80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of 70% and average precision of 30%, correctly localizing binding sites for small molecules in 95% of its predictions. CONCLUSION: The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. 
The programs are fully accessible via the Internet as part of the DBAli suite of tools at http://salilab.org/DBAli/.10a*Algorithms Amino Acid Sequence Confidence Intervals Data Interpretation10aAmino Acid *Software Structure-Activity Relationship10aProtein Information Storage and Retrieval/methods Molecular Sequence Data Proteins/*chemistry/classification/*metabolism Sensitivity and Specificity Sequence Alignment/*methods Sequence Analysis10aProtein/*methods Sequence Homology10aStatistical *Databases1 aMarti-Renom, M., A.1 aRossi, A.1 aAl-Shahrour, Fatima1 aDavis, F., P.1 aPieper, U.1 aDopazo, J.1 aSali, A. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1757014703226nas a2200433 4500008004100000245012400041210006900165300001100234490000700245520156900252653002601821653003101847653001201878653013601890653023502026653003902261100002202300700002202322700001802344700001702362700001602379700001402395700001502409700001902424700001402443700001602457700002402473700002302497700002302520700001802543700001602561700002302577700001602600700002002616700001502636700001902651700001602670856010602686 2007 eng d00aAssociation study of 69 genes in the ret pathway identifies low-penetrance loci in sporadic medullary thyroid carcinoma0 aAssociation study of 69 genes in the ret pathway identifies lowp a9561-70 v673 aTo date, few association studies have been done to better understand the genetic basis for the development of sporadic medullary thyroid carcinoma (sMTC). To identify additional low-penetrance genes, we have done a two-stage case-control study in two European populations using high-throughput genotyping. We selected 417 single nucleotide polymorphisms (SNP) belonging to 69 genes either related to RET signaling pathway/functions or involved in key processes for cancer development. TagSNPs and functional variants were included where possible. 
These SNPs were initially studied in the largest known series of sMTC cases (n = 266) and controls (n = 422), all of Spanish origin. In stage II, an independent British series of 155 sMTC patients and 531 controls was included to validate the previous results. Associations were assessed by an exhaustive analysis of individual SNPs but also considering gene- and linkage disequilibrium-based haplotypes. This strategy allowed us to identify seven low-penetrance genes, six of them (STAT1, AURKA, BCL2, CDKN2B, CDK6, and COMT) consistently associated with sMTC risk in the two case-control series and a seventh (HRAS) with individual SNPs and haplotypes associated with sMTC in the Spanish data set. The potential role of CDKN2B was confirmed by a functional assay showing a role of a SNP (rs7044859) in the promoter region in altering the binding of the transcription factor HNF1. These results highlight the utility of association studies using homogeneous series of cases for better understanding complex diseases.10a80 and over Carcinoma10aAdolescent Adult Aged Aged10aGenetic10aGenetic Proto-Oncogene Proteins c-ret/*genetics/metabolism Signal Transduction Thyroid Neoplasms/*genetics/metabolism Transcription10aMedullary/*genetics/metabolism Case-Control Studies Cyclin-Dependent Kinase Inhibitor p15/biosynthesis/genetics Female Genetic Predisposition to Disease Germ-Line Mutation Haplotypes Humans Male Middle Aged Penetrance Polymorphism10aSingle Nucleotide Promoter Regions1 aRuiz-Llorente, S.1 aMontero-Conde, C.1 aMilne, R., L.1 aMoya, C., M.1 aCebrian, A.1 aLeton, R.1 aCascon, A.1 aMercadillo, F.1 aLanda, I.1 aBorrego, S.1 ade Nanclares, Perez1 aAlvarez-Escola, C.1 aDiaz-Perez, J., A.1 aCarracedo, A.1 aUrioste, M.1 aGonzalez-Neira, A.1 aBenitez, J.1 aSantisteban, P.1 aDopazo, J.1 aPonder, B., A.1 aRobledo, M. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1790906702151nas a2200277 4500008004100000245005200041210005100093300001100144490000700155520104000162653008701202653005701289653019401346653003901540653002701579100002401606700001501630700002401645700001401669700001401683700001801697700002401715700001501739700001301754856010601767 2007 eng d00aDBAli tools: mining the protein structure space0 aDBAli tools mining the protein structure space aW393-70 v353 aThe DBAli tools use a comprehensive set of structural alignments in the DBAli database to leverage the structural information deposited in the Protein Data Bank (PDB). These tools include (i) the DBAlit program that allows users to input the 3D coordinates of a protein structure for comparison by MAMMOTH against all chains in the PDB; (ii) the AnnoLite and AnnoLyze programs that annotate a target structure based on its stored relationships to other structures; (iii) the ModClus program that clusters structures by sequence and structure similarities; (iv) the ModDom program that identifies domains as recurrent structural fragments and (v) an implementation of the COMPARER method in the SALIGN command in MODELLER that creates a multiple structure alignment for a set of related protein structures. 
Thus, the DBAli tools, which are freely accessible via the World Wide Web at http://salilab.org/DBAli/, allow users to mine the protein structure space by establishing relationships between protein structures and their functions.10a*Algorithms Amino Acid Sequence Computational Biology/*methods Data Interpretation10aAmino Acid *Software Structure-Activity Relationship10aProtein Internet Molecular Sequence Data Protein Conformation Proteins/*chemistry/classification/*metabolism Pseudomonas aeruginosa/*metabolism Sequence Alignment/*methods Sequence Analysis10aProtein/*methods Sequence Homology10aStatistical *Databases1 aMarti-Renom, M., A.1 aPieper, U.1 aMadhusudhan, M., S.1 aRossi, A.1 aEswar, N.1 aDavis, F., P.1 aAl-Shahrour, Fatima1 aDopazo, J.1 aSali, A. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1747851302784nas a2200301 4500008004100000245006900041210006800110300000800178490000600186520165900192653002501851653002201876653001901898653006101917653007001978653003402048653013602082100001802218700002102236700001702257700002402274700001402298700001402312700001602326700001502342700001902357856010602376 2007 eng d00aEvidence for systems-level molecular mechanisms of tumorigenesis0 aEvidence for systemslevel molecular mechanisms of tumorigenesis a1850 v83 aBACKGROUND: Cancer arises from the consecutive acquisition of genetic alterations. Increasing evidence suggests that as a consequence of these alterations, molecular interactions are reprogrammed in the context of highly connected and regulated cellular networks. Coordinated reprogramming would allow the cell to acquire the capabilities for malignant growth. RESULTS: Here, we determine the coordinated function of cancer gene products (i.e., proteins encoded by differentially expressed genes in tumors relative to healthy tissue counterparts, hereafter referred to as "CGPs") defined as their topological properties and organization in the interactome network. 
We show that CGPs are central to information exchange and propagation and that they are specifically organized to promote tumorigenesis. Centrality is identified by both local (degree) and global (betweenness and closeness) measures, and systematically appears in down-regulated CGPs. Up-regulated CGPs do not consistently exhibit centrality, but both types of cancer products determine the overall integrity of the network structure. In addition to centrality, down-regulated CGPs show topological association that correlates with common biological processes and pathways involved in tumorigenesis. CONCLUSION: Given the current limited coverage of the human interactome, this study proposes that tumorigenesis takes place in a specific and organized way at the molecular systems-level and suggests a model that comprises the precise down-regulation of groups of topologically-associated proteins involved in particular functions, orchestrated with the up-regulation of specific proteins.10a*Cell Transformation10aBiological Models10aGenetic Models10aMessenger/metabolism Signal Transduction Systems Biology10aNeoplastic *Gene Expression Profiling *Gene Expression Regulation10aNeoplastic Humans Male Models10aStatistical Neoplasm Proteins/*physiology Neoplasms/etiology/*genetics Prostatic Neoplasms/genetics Protein Interaction Mapping RNA1 aHernandez, P.1 aHuerta-Cepas, J.1 aMontaner, D.1 aAl-Shahrour, Fatima1 aValls, J.1 aGomez, L.1 aCapella, G.1 aDopazo, J.1 aPujana, M., A. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1758491502196nas a2200217 4500008004100000245016500041210006900206300001000275490000700285520140700292653001501699653003501714100002401749700001601773700001601789700002001805700001501825700001701840700001501857856010601872 2007 eng d00aFatiGO +: a functional profiling tool for genomic data. 
Integration of functional annotation, regulatory motifs and interaction data with microarray experiments0 aFatiGO a functional profiling tool for genomic data Integration aW91-60 v353 aThe ultimate goal of any genome-scale experiment is to provide a functional interpretation of the data, relating the available information with the hypotheses that originated the experiment. Thus, functional profiling methods have become essential in diverse scenarios such as microarray experiments, proteomics, etc. We present the FatiGO+, a web-based tool for the functional profiling of genome-scale experiments, specially oriented to the interpretation of microarray experiments. In addition to different functional annotations (gene ontology, KEGG pathways, Interpro motifs, Swissprot keywords and text-mining based bioentities related to diseases and chemical compounds) FatiGO+ includes, as a novelty, regulatory and structural information. The regulatory information used includes predictions of targets for distinct regulatory elements (obtained from the Transfac and CisRed databases). Additionally FatiGO+ uses predictions of target motifs of miRNA to infer which of these can be activated or deactivated in the sample of genes studied. Finally, properties of gene products related to their relative location and connections in the interactome have also been used. Also, enrichment of any of these functional terms can be directly analysed on chromosomal coordinates. FatiGO+ can be found at: http://www.fatigoplus.org and within the Babelomics environment http://www.babelomics.org.
10ababelomics10afunctional enrichment analysis1 aAl-Shahrour, Fatima1 aMinguez, P.1 aTarraga, J.1 aMedina, Ignacio1 aAlloza, E.1 aMontaner, D.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1747850402367nas a2200229 4500008004100000245007200041210006900113300000800182490000600190520145600196653010501652653001501757653013601772100002401908700001501932700001501947700002101962700001601983700001701999700001502016856010602031 2007 eng d00aFrom genes to functional classes in the study of biological systems0 aFrom genes to functional classes in the study of biological syst a1140 v83 aBACKGROUND: With the popularization of high-throughput techniques, the need for procedures that help in the biological interpretation of results has increased enormously. Recently, new procedures inspired in systems biology criteria have started to be developed. RESULTS: Here we present FatiScan, a web-based program which implements a threshold-independent test for the functional interpretation of large-scale experiments that does not depend on the pre-selection of genes based on the multiple application of independent tests to each gene. The test implemented aims to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes. In addition, the test does not depend on the type of the data used for obtaining significance values, and consequently different types of biologically informative terms (gene ontology, pathways, functional motifs, transcription factor binding sites or regulatory sites from CisRed) can be applied to different classes of genome-scale studies. We exemplify its application in microarray gene expression, evolution and interactomics. CONCLUSION: Methods for gene set enrichment which, in addition, are independent from the original data and experimental design constitute a promising alternative for the functional profiling of genome-scale experiments. 
A web server that performs the test described and other similar ones can be found at: http://www.babelomics.org.
10aAlgorithms Chromosome Mapping/*methods Computer Simulation Gene Expression Profiling/methods *Models10ababelomics10aBiological Multigene Family/*physiology Signal Transduction/*physiology *Software Systems Biology/*methods *User-Computer Interface1 aAl-Shahrour, Fatima1 aArbiza, L.1 aDopazo, H.1 aHuerta-Cepas, J.1 aMinguez, P.1 aMontaner, D.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1740759601696nas a2200193 4500008004100000245009300041210006900134300001000203490000600213520105200219653001501271100001401286700001701300700002401317700001601341700002401357700001501381856010601396 2007 eng d00aFunctional profiling and gene expression analysis of chromosomal copy number alterations0 aFunctional profiling and gene expression analysis of chromosomal a432-50 v13 aContrarily to the traditional view in which only one or a few key genes were supposed to be the causative factors of diseases, we discuss the importance of considering groups of functionally related genes in the study of pathologies characterised by chromosomal copy number alterations. Recent observations have reported the existence of regions in higher eukaryotic chromosomes (including humans) containing genes of related function that show a high degree of coregulation. Copy number alterations will consequently affect to clusters of functionally related genes, which will be the final causative agents of the diseased phenotype, in many cases. Therefore, we propose that the functional profiling of the regions affected by copy number alterations must be an important aspect to take into account in the understanding of this type of pathologies. To illustrate this, we present an integrated study of DNA copy number variations, gene expression along with the functional profiling of chromosomal regions in a case of multiple myeloma.
10ababelomics1 aConde, L.1 aMontaner, D.1 aBurguet-Castell, J.1 aTarraga, J.1 aAl-Shahrour, Fatima1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1759793501608nas a2200193 4500008004100000245008900041210006900130300001100199490000700210520077100217653003900988653001501027653019401042100001601236700002401252700001701276700001501293856010601308 2007 eng d00aFunctional profiling of microarray experiments using text-mining derived bioentities0 aFunctional profiling of microarray experiments using textmining a3098-90 v233 aMOTIVATION: The increasing use of microarray technologies brought about a parallel demand in methods for the functional interpretation of the results. Beyond the conventional functional annotations for genes, such as gene ontology, pathways, etc. other sources of information are still to be exploited. Text-mining methods allow extracting informative terms (bioentities) with different functional, chemical, clinical, etc. meanings, that can be associated to genes. We show how to use these associations within an appropriate statistical framework and how to apply them through easy-to-use, web-based environments to the functional interpretation of microarray experiments. Functional enrichment and gene set enrichment tests using bioentities are presented.
10aArtificial Intelligence *Databases10ababelomics10aProtein Gene Expression Profiling/*methods Information Storage and Retrieval/*methods *Natural Language Processing Proteins/*classification/*metabolism Research/*methods Systems Integration1 aMinguez, P.1 aAl-Shahrour, Fatima1 aMontaner, D.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1785541502363nas a2200193 4500008004100000245002200041210001800063300000900081490000600090520176400096653003301860653000801893653009301901100002101994700001502015700001502030700001802045856010602063 2007 eng d00aThe human phylome0 ahuman phylome aR1090 v83 aBACKGROUND: Phylogenomics analyses serve to establish evolutionary relationships among organisms and their genes. A phylome, the complete collection of all gene phylogenies in a genome, constitutes a valuable source of information, but its use in large genomes still constitutes a technical challenge. The use of phylomes also requires the development of new methods that help us to interpret them. RESULTS: We reconstruct here the human phylome, which includes the evolutionary relationships of all human proteins and their homologs among 39 fully sequenced eukaryotes. Phylogenetic techniques used include alignment trimming, branch length optimization, evolutionary model testing and maximum likelihood and Bayesian methods. Although differences with alternative topologies are minor, most of the trees support the Coelomata and Unikont hypotheses as well as the grouping of primates with laurasatheria to the exclusion of rodents. We assess the extent of gene duplication events and their relationship with the functional roles of the protein families involved. We find support for at least one, and probably two, rounds of whole genome duplications before vertebrate radiation. 
Using a novel algorithm that is independent from a species phylogeny, we derive orthology and paralogy relationships of human proteins among eukaryotic genomes. CONCLUSION: Topological variations among phylogenies for different genes are to be expected, highlighting the danger of gene-sampling effects in phylogenomic analyses. Several links can be established between the functions of gene families duplicated at certain phylogenetic splits and major evolutionary transitions in those lineages. The pipeline implemented here can be easily adapted for use in other organisms.10aAnimals *Evolution Evolution10aDNA10aMolecular Gene Duplication *Genome Humans *Phylogeny Proteins/genetics Sequence Analysis1 aHuerta-Cepas, J.1 aDopazo, H.1 aDopazo, J.1 aGabaldón, T. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1756792401792nas a2200217 4500008004100000245012200041210006900163300001000232490000700242520078800249653013601037653016501173100001401338700001701352700002401369700001601393700002001409700002401429700001501453856010601468 2007 eng d00aISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling0 aISACGH a webbased environment for the analysis of Array CGH and aW81-50 v353 aWe present the ISACGH, a web-based system that allows for the combination of genomic data with gene expression values and provides different options for functional profiling of the regions found. Several visualization options offer a convenient representation of the results. Different efficient methods for accurate estimation of genomic copy number from array-CGH hybridization data have been included in the program. Moreover, the connection to the gene expression analysis package GEPAS allows the use of different facilities for data pre-processing and analysis. A DAS server allows exporting the results to the Ensembl viewer where contextual genomic information can be obtained. 
The program is freely available at: http://isacgh.bioinfo.cipf.es or within http://www.gepas.org.10aAnimals Cluster Analysis Computational Biology/*methods Computer Graphics Gene Expression Profiling/*methods Humans Internet Models10aGenetic *Nucleic Acid Hybridization Oligonucleotide Array Sequence Analysis/*methods Programming Languages *Software Systems Integration User-Computer Interface1 aConde, L.1 aMontaner, D.1 aBurguet-Castell, J.1 aTarraga, J.1 aMedina, Ignacio1 aAl-Shahrour, Fatima1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1746849902437nas a2200265 4500008004100000245009200041210006900133300001100202490000700213520143600220653005301656653002601709653002201735653005701757653004501814653008601859100001601945700002001961700001501981700002101996700001802017700001502035700001502050856010602065 2007 eng d00aPhylemon: a suite of web tools for molecular evolution, phylogenetics and phylogenomics0 aPhylemon a suite of web tools for molecular evolution phylogenet aW38-420 v353 aPhylemon is an online platform for phylogenetic and evolutionary analyses of molecular sequence data. It has been developed as a web server that integrates a suite of different tools selected among the most popular stand-alone programs in phylogenetic and evolutionary analysis. It has been conceived as a natural response to the increasing demand of data analysis of many experimental scientists wishing to add a molecular evolution and phylogenetics insight into their research. Tools included in Phylemon cover a wide yet selected range of programs: from the most basic for multiple sequence alignment to elaborate statistical methods of phylogenetic reconstruction including methods for evolutionary rates analyses and molecular adaptation. 
Phylemon has several features that differentiates it from other resources: (i) It offers an integrated environment that enables the direct concatenation of evolutionary analyses, the storage of results and handles required data format conversions, (ii) Once an outfile is produced, Phylemon suggests the next possible analyses, thus guiding the user and facilitating the integration of multi-step analyses, and (iii) users can define and save complete pipelines for specific phylogenetic analysis to be automatically used on many genes in subsequent sessions or multiple genes in a single session (phylogenomics). The Phylemon web server is available at http://phylemon.bioinfo.cipf.es.10aAnimals Computational Biology/*methods Databases10aDNA Sequence Analysis10aGenetic Evolution10aMolecular Genetic Techniques Humans *Internet Models10aProtein Software User-Computer Interface10aStatistical *Phylogeny Programming Languages Sequence Alignment Sequence Analysis1 aTarraga, J.1 aMedina, Ignacio1 aArbiza, L.1 aHuerta-Cepas, J.1 aGabaldón, T.1 aDopazo, J.1 aDopazo, H. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1745234601384nas a2200193 4500008004100000245007300041210006900114300001000183490000700193520078700200653001500987653001001002653001501012100002001027700001701047700001601064700001501080856009501095 2007 eng d00aProphet, a web-based tool for class prediction using microarray data0 aProphet a webbased tool for class prediction using microarray da a390-10 v233 aSample classification and class prediction is the aim of many gene expression studies. We present a web-based application, Prophet, which builds prediction rules and allows using them for further sample classification. Prophet automatically chooses the best classifier, along with the optimal selection of genes, using a strategy that renders unbiased cross-validated errors. 
Prophet is linked to different microarray data analysis modules, and includes a unique feature: the possibility of performing the functional interpretation of the molecular signature found. Availability: Prophet can be found at the URL http://prophet.bioinfo.cipf.es/ or within the GEPAS package at http://www.gepas.org/ Supplementary information: http://gepas.bioinfo.cipf.es/tutorial/prophet.html.
10ababelomics10agepas10apredictors1 aMedina, Ignacio1 aMontaner, D.1 aTarraga, J.1 aDopazo, J. uhttp://bioinformatics.oxfordjournals.org/cgi/content/full/23/3/390?view=long&pmid=1713858701445nas a2200253 4500008004100000245010300041210006900144300001100213490000700224520068700231653001500918653002500933100002400958700001600982700001600998700001701014700001501031700002301046700001401069700001701083700001301100700001501113856006301128 2006 eng d00aBABELOMICS: a systems biology perspective in the functional annotation of genome-scale experiments0 aBABELOMICS a systems biology perspective in the functional annot aW472-60 v343 aWe present a new version of Babelomics, a complete suite of web tools for functional analysis of genome-scale experiments, with new and improved tools. New functionally relevant terms have been included such as CisRed motifs or bioentities obtained by text-mining procedures. An improved indexing has considerably speeded up several of the modules. An improved version of the FatiScan method for studying the coordinate behaviour of groups of functionally related genes is presented, along with a similar tool, the Gene Set Enrichment Analysis. Babelomics is now more oriented to test systems biology inspired hypotheses. Babelomics can be found at http://www.babelomics.org.
10ababelomics10afunctional profiling1 aAl-Shahrour, Fatima1 aMinguez, P.1 aTarraga, J.1 aMontaner, D.1 aAlloza, E.1 aVaquerizas, J., M.1 aConde, L.1 aBlaschke, C.1 aVera, J.1 aDopazo, J. uhttp://nar.oxfordjournals.org/content/34/suppl_2/W472.long01043nas a2200121 4500008004100000245005300041210005200094300001100146490000600157520063700163100001500800856010600815 2006 eng d00aBioinformatics and cancer: an essential alliance0 aBioinformatics and cancer an essential alliance a409-150 v83 aModern research in cancer has been revolutionized by the introduction of new high-throughput methodologies such as DNA microarrays. Keeping the pace with these technologies, the bioinformatics offer new solutions for data analysis and, what is more important, it permits to formulate a new class of hypothesis inspired in systems biology, more oriented to blocks of functionally-related genes. Although software implementations for this new methodologies is new there are some options already available. Bioinformatic solutions for other high-throughput techniques such as array-CGH of large-scale genotyping is also revised.
1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1679039300743nas a2200169 4500008004100000245006300041210006300104300000800167490000600175520016700181653003400348653002200382653003500404100001500439700001300454856010600467 2006 eng d00aDiscovery and hypothesis generation through bioinformatics0 aDiscovery and hypothesis generation through bioinformatics a3070 v73 aA report on the 4th European Conference on Computational Biology and the 6th Spanish Annual Meeting on Bioinformatics, Madrid, Spain, 28 September-1 October 2005.10a*Computational Biology Genome10aGenetic Phylogeny10aHuman *Genomics Humans *Models1 aDopazo, J.1 aAloy, P. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1652222402830nas a2200277 4500008004100000245011000041210006900151300001100220490000700231520162300238653016301861653002002024653018602044653004602230100001802276700001402294700002302308700001802331700001402349700001702363700001502380700001902395700001602414700001602430856010602446 2006 eng d00aERCC4 associated with breast cancer risk: a two-stage case-control study using high-throughput genotyping0 aERCC4 associated with breast cancer risk a twostage casecontrol a9420-70 v663 aThe failure of linkage studies to identify further high-penetrance susceptibility genes for breast cancer points to a polygenic model, with more common variants having modest effects on risk, as the most likely candidate. We have carried out a two-stage case-control study in two European populations to identify low-penetrance genes for breast cancer using high-throughput genotyping. Single-nucleotide polymorphisms (SNPs) were selected across preselected cancer-related genes, choosing tagSNPs and functional variants where possible. In stage 1, genotype frequencies for 640 SNPs in 111 genes were compared between 864 breast cancer cases and 845 controls from the Spanish population. 
In stage 2, candidate SNPs identified in stage 1 (nominal P < 0.01) were tested in a Finnish series of 884 cases and 1,104 controls. Of the 10 candidate SNPs in seven genes identified in stage 1, one (rs744154) on intron 1 of ERCC4, a gene belonging to the nucleotide excision repair pathway, was associated with recessive protection from breast cancer after adjustment for multiple testing in stage 2 (odds ratio, 0.57; Bonferroni-adjusted P = 0.04). After considering potential functional SNPs in the region of high linkage disequilibrium that extends across the entire gene and upstream into the promoter region, we concluded that rs744154 itself could be causal. Although intronic, it is located on the first intron, in a region that is highly conserved across species, and could therefore be functionally important. This study suggests that common intronic variation in ERCC4 is associated with protection from breast cancer.10a80 and over Breast Neoplasms/epidemiology/*genetics/pathology Case-Control Studies DNA-Binding Proteins/genetics/*physiology Female Finland/epidemiology Genes10aAdult Aged Aged10aRecessive Genetic Predisposition to Disease Genotype Humans Introns/genetics Linkage Disequilibrium Middle Aged Neoplasm Proteins/genetics/*physiology Neoplasm Staging *Polymorphism10aSingle Nucleotide Risk Spain/epidemiology1 aMilne, R., L.1 aRibas, G.1 aGonzalez-Neira, A.1 aFagerholm, R.1 aSalas, A.1 aGonzalez, E.1 aDopazo, J.1 aNevanlinna, H.1 aRobledo, M.1 aBenitez, J. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1701859601522nas a2200205 4500008004100000245013200041210006900173300000700242490000600249520054800255653009100803653004200894653014400936653006001080100001701140700002301157700001501180700001501195856010601210 2006 eng d00aExploring the reasons for the large density of triplex-forming oligonucleotide target sequences in the human regulatory regions0 aExploring the reasons for the large density of triplexforming ol a630 v73 aBACKGROUND: DNA duplex sequences that can be targets for triplex formation are highly over-represented in the human genome, especially in regulatory regions. RESULTS: Here we studied using bioinformatics tools several properties of triplex target sequences in an attempt to determine those that make these sequences so special in the genome. CONCLUSION: Our results strongly suggest that the unique physical properties of these sequences make them particularly suitable as "separators" between protein-recognition sites in the promoter region.10aAnimals Base Sequence Computational Biology DNA/chemistry/*genetics/*metabolism Genome10aGenetic/genetics Regulatory Sequences10aHuman/genetics Humans Mice Nucleic Acid Conformation Nucleotides/genetics Oligonucleotides/chemistry/*genetics/*metabolism Promoter Regions10aNucleic Acid/*genetics Transcription Factors/metabolism1 aGoni, J., R.1 aVaquerizas, J., M.1 aDopazo, J.1 aOrozco, M. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1656681701588nas a2200157 4500008004100000245005600041210005600097300001200153490000700165520107100172653001501243653002201258653002901280100001501309856010601324 2006 eng d00aFunctional interpretation of microarray experiments0 aFunctional interpretation of microarray experiments a398-4100 v103 aOver the past few years, due to the popularisation of high-throughput methodologies such as DNA microarrays, the possibility of obtaining experimental data has increased significantly. Nevertheless, the interpretation of the results, which involves translating these data into useful biological knowledge, still remains a challenge. The methods and strategies used for this interpretation are in continuous evolution and new proposals are constantly arising. Initially, a two-step approach was used in which genes of interest were initially selected, based on thresholds that consider only experimental values, and then in a second, independent step the enrichment of these genes in biologically relevant terms, was analysed. For different reasons, these methods are relatively poor in terms of performance and a new generation of procedures, which draw inspiration from systems biology criteria, are currently under development. Such procedures, aim to directly test the behaviour of blocks of functionally related genes, instead of focusing on single genes.
10ababelomics10aDiabetes Mellitus10amicroarray data analysis1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1706951602399nas a2200157 4500008004100000245009100041210006900132300001000201490000700211520184000218653001502058100001602073700002402089700001502113856011302128 2006 eng d00aA function-centric approach to the biological interpretation of microarray time-series0 afunctioncentric approach to the biological interpretation of mic a57-660 v173 aThe interpretation of microarray experiments is commonly addressed by means a two-step approach in which the relevant genes are firstly selected uniquely on the basis of their experimental values (ignoring their coordinate behaviors) and in a second step their functional properties are studied to hypothesize about the biological roles they are fulfilling in the cell. Recently, different methods (e.g. GSEA or FatiScan) have been proposed to study the coordinate behavior of blocks of functionally-related genes. These methods study the distribution of functional information across lists of genes ranked according their different experimental values in a static situation, such as the comparison between two classes (e.g. healthy controls versus diseased cases). Nevertheless there is no an equivalent way of studying a dynamic situation from a functional point of view. We present a method for the functional analysis of microarrays series in which the experiments display autocorrelation between successive points (e.g. time series, dose-response experiments, etc.) The method allows to recover the dynamics of the molecular roles fulfilled by the genes along the series which provides a novel approach to functional interpretation of such experiments. The method finds blocks of functionally-related genes which are significantly and coordinately over-expressed at different points of the series. 
This method draws inspiration from systems biology given that the analysis does not focus on individual properties of genes but on collective behaving blocks of functionally-related genes. The FatiScan algorithm used in the method proposed is available at: http://fatiscan.bioinfo.cipf.es, or within the Babelomics suite: http://www.babelomics.org. Additional material is available at: http://bioinfo.cipf.es/data/plasmodium.
10ababelomics1 aMinguez, P.1 aAl-Shahrour, Fatima1 aDopazo, J. uhttp://clinbioinfosspa.es/content/function-centric-approach-biological-interpretation-microarray-time-series03074nas a2200325 4500008004100000245011200041210006900153300001100222490000700233520182100240653001102061653002302072653009602095653012202191653006602313653004102379653002302420100001402443700001602457700001302473700001402486700002602500700002402526700001902550700001502569700001602584700002102600700002102621856010602642 2006 eng d00aIdentification of overexpressed genes in frequently gained/amplified chromosome regions in multiple myeloma0 aIdentification of overexpressed genes in frequently gainedamplif a184-910 v913 aBACKGROUND AND OBJECTIVES: Multiple myeloma (MM) is a malignancy characterized by clonal expansion of plasma cells. In 50% of the cases, the neoplastic transformation begins with a chromosomal translocation that juxtaposes the IGH gene locus to an oncogene. Gene copy number changes are also frequent in MM but less characterized than in other neoplasias. We aimed to characterize genes that are amplified and overexpressed in human myeloma cell lines (HMCL) to provide putative molecular targets for MM therapy. DESIGN AND METHODS: Nine HMCL were characterized by fluorescent in situ hybridization, comparative genomic hybridization (CGH) and cDNA microarrays for gene expression profiling and copy number changes. RESULTS: After defining the IGH-translocations present in the cell lines, we conducted expression-profiling analysis. Supervised analysis identified 166 genes with significantly different expression among the cell lines harboring MMSET/FGFR3 (4p16), MAF (16q) and CCND1 (11q13) rearrangements. Array-CGH was then performed. Five chromosomes recurrently affected by gains/amplifications in primary samples and cell lines were analyzed in detail. 
Sixty amplified and overexpressed genes were found and 25 (42%) of them were only overexpressed when amplified; moreover, six showed a significant association between overexpression and gain/amplification. We also found co-amplification and overexpression for genes located within the same amplicons, such as MALT1 and BCL2. INTERPRETATION AND CONCLUSIONS: Parallel analysis of gene copy numbers and expression levels by cDNA microarray in MM allowed efficient identification of genes whose expression levels are elevated because of increased copy number. This is the first time that MALT1 and BCL2 have been shown to be overexpressed and amplified in MM.10aB-Cell10aCaspases Cell Line10aHuman *Gene Amplification Gene Dosage Gene Expression Profiling *Gene Expression Regulation10aMarginal Zone/genetics Multiple Myeloma/*genetics Neoplasm Proteins/genetics Proto-Oncogene Proteins c-bcl-2/genetics10aNeoplasm Humans Immunoglobulin Heavy Chains/genetics Lymphoma10aNeoplastic Gene Rearrangement *Genes10aTumor *Chromosomes1 aLargo, C.1 aAlvarez, S.1 aSaez, B.1 aBlesa, D.1 aMartin-Subero, J., I.1 aGonzalez-Garcia, I.1 aBrieva, J., A.1 aDopazo, J.1 aSiebert, R.1 aCalasanz, M., J.1 aCigudosa, J., C. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1646130201882nas a2200313 4500008004100000245005200041210005100093300001200144490000700156520100300163653001001166653002901176100001701205700001601222700002101238700001601259700002301275700001401298700001601312700001301328700001801341700001401359700001901373700001501392700001601407700002401423700001501447856010601462 2006 eng d00aNext station in microarray data analysis: GEPAS0 aNext station in microarray data analysis GEPAS aW486-910 v343 aThe Gene Expression Profile Analysis Suite (GEPAS) has been running for more than four years. During this time it has evolved to keep pace with the new interests and trends in the still changing world of microarray data analysis. 
GEPAS has been designed to provide an intuitive although powerful web-based interface that offers diverse analysis options from the early step of preprocessing (normalization of Affymetrix and two-colour microarray experiments and other preprocessing options), to the final step of the functional annotation of the experiment (using Gene Ontology, pathways, PubMed abstracts etc.), and include different possibilities for clustering, gene selection, class prediction and array-comparative genomic hybridization management. GEPAS is extensively used by researchers of many countries and its records indicate an average usage rate of 400 experiments per day. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.
10agepas10amicroarray data analysis1 aMontaner, D.1 aTarraga, J.1 aHuerta-Cepas, J.1 aBurguet, J.1 aVaquerizas, J., M.1 aConde, L.1 aMinguez, P.1 aVera, J.1 aMukherjee, S.1 aValls, J.1 aPujana, M., A.1 aAlloza, E.1 aHerrero, J.1 aAl-Shahrour, Fatima1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1684505601646nas a2200217 4500008004100000245007200041210006900113300001000182490000800192520083200200653001501032653002101047653007301068653002601141653004201167653005901209100001501268700002401283700001501307856010601322 2006 eng d00aOntology-driven approaches to analyzing data in functional genomics0 aOntologydriven approaches to analyzing data in functional genomi a67-860 v3163 aOntologies are fundamental knowledge representations that provide not only standards for annotating and indexing biological information, but also the basis for implementing functional classification and interpretation models. This chapter discusses the application of gene ontology (GO) for predictive tasks in functional genomics. It focuses on the problem of analyzing functional patterns associated with gene products. This chapter is divided into two main parts. The first part overviews GO and its applications for the development of functional classification models. The second part presents two methods for the characterization of genomic information using GO. It discusses methods for measuring functional similarity of gene products, and a tool for supporting gene expression clustering analysis and validation.
10ababelomics10aCluster Analysis10aCluster Analysis Computational Biology/*methods *Data Interpretation10aComputational Biology10aStatistical Gene Expression Profiling10aStatistical Gene Expression Profiling *Genomics Humans1 aAzuaje, F.1 aAl-Shahrour, Fatima1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1667140102542nas a2200181 4500008004100000245010000041210006900141300000800210490000600218520184100224653001502065653004302080653008602123100001502209700001502224700001502239856010602254 2006 eng d00aPositive selection, relaxation, and acceleration in the evolution of the human and chimp genome0 aPositive selection relaxation and acceleration in the evolution ae380 v23 aFor years evolutionary biologists have been interested in searching for the genetic bases underlying humanness. Recent efforts at a large or a complete genomic scale have been conducted to search for positively selected genes in human and in chimp. However, recently developed methods allowing for a more sensitive and controlled approach in the detection of positive selection can be employed. Here, using 13,198 genes, we have deduced the sets of genes involved in rate acceleration, positive selection, and relaxation of selective constraints in human, in chimp, and in their ancestral lineage since the divergence from murids. Significant deviations from the strict molecular clock were observed in 469 human and in 651 chimp genes. The more stringent branch-site test of positive selection detected 108 human and 577 chimp positively selected genes. An important proportion of the positively selected genes did not show a significant acceleration in rates, and similarly, many of the accelerated genes did not show significant signals of positive selection. 
Functional differentiation of genes under rate acceleration, positive selection, and relaxation was not statistically significant between human and chimp with the exception of terms related to G-protein coupled receptors and sensory perception. Both of these were over-represented under relaxation in human in relation to chimp. Comparing differences between derived and ancestral lineages, a more conspicuous change in trends seems to have favored positive selection in the human lineage. Since most of the positively selected genes are different under the same functional categories between these species, we suggest that the individual roles of the alternative positively selected genes may be an important factor underlying biological differences between these species.10aAdaptation10aBiological/genetics Animals *Evolution10aMolecular Genome/*genetics Humans Pan troglodytes/*genetics *Selection (Genetics)1 aArbiza, L.1 aDopazo, J.1 aDopazo, H. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1668301901783nas a2200253 4500008004100000245010200041210006900143300001100212490000700223520089100230653004301121653008001164653002701244653005601271100001401327700002301341700001501364700001501379700001601394700001701410700002001427700001501447856006701462 2006 eng d00aPupaSuite: finding functional single nucleotide polymorphisms for large-scale genotyping purposes0 aPupaSuite finding functional single nucleotide polymorphisms for aW621-50 v343 aWe have developed a web tool, PupaSuite, for the selection of single nucleotide polymorphisms (SNPs) with potential phenotypic effect, specifically oriented to help in the design of large-scale genotyping projects. PupaSuite uses a collection of data on SNPs from heterogeneous sources and a large number of pre-calculated predictions to offer a flexible and intuitive interface for selecting an optimal set of SNPs. 
It improves the functionality of PupaSNP and PupasView programs and implements new facilities such as the analysis of user’s data to derive haplotypes with functional information. A new estimator of putative effect of polymorphisms has been included that uses evolutionary information. Also SNPeffect database predictions have been included. The PupaSuite web interface is accessible through http://pupasuite.bioinfo.cipf.es and through http://www.pupasnp.org.
10aAlgorithms Computer Graphics Databases10aMolecular Genotype Haplotypes Internet Linkage Disequilibrium *Polymorphism10aNucleic Acid Evolution10aSingle Nucleotide *Software User-Computer Interface1 aConde, L.1 aVaquerizas, J., M.1 aDopazo, H.1 aArbiza, L.1 aReumers, J.1 aRousseau, F.1 aSchymkowitz, J.1 aDopazo, J. uhttp://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W62102859nas a2200301 4500008004100000245009400041210006900135300001300204490000800217520172300225653007401948653002202022653001902044653002402063653002802087653002002115653015502135653002502290100001502315700001402330700001702344700001602361700002202377700002202399700001502421700001502436856010602451 2006 eng d00aSelective pressures at a codon-level predict deleterious mutations in human disease genes0 aSelective pressures at a codonlevel predict deleterious mutation a1390-4040 v3583 aDeleterious mutations affecting biological function of proteins are constantly being rejected by purifying selection from the gene pool. The non-synonymous/synonymous substitution rate ratio (omega) is a measure of selective pressure on amino acid replacement mutations for protein-coding genes. Different methods have been developed in order to predict non-synonymous changes affecting gene function. However, none has considered the estimation of selective constraints acting on protein residues. Here, we have used codon-based maximum likelihood models in order to estimate the selective pressures on the individual amino acid residues of a well-known model protein: p53. We demonstrate that the number of residues under strong purifying selection in p53 is much higher than those that are strictly conserved during the evolution of the species. In agreement with theoretical expectations, residues that have been noted to be of structural relevance, or in direct association with DNA, were among those showing the highest signals of purifying selection. 
Conversely, those changing according to a neutral, or nearly neutral mode of evolution, were observed to be irrelevant for protein function. Finally, using more than 40 human disease genes, we demonstrate that residues evolving under strong selective pressures (omega<0.1) are significantly associated (p<0.01) with human disease. We hypothesize that non-synonymous change on amino acids showing omega<0.1 will most likely affect protein function. The application of this evolutionary prediction at a genomic scale will provide an a priori hypothesis of the phenotypic effect of non-synonymous coding single nucleotide polymorphisms (SNPs) in the human genome.10aAmino Acid Sequence Amino Acid Substitution Codon/*genetics Databases10aGenetic Evolution10aGenetic Models10aHuman Humans Models10aInborn/*genetics Genome10aMolecular Genes10aMolecular Molecular Sequence Data *Mutation Neoplasms/genetics Proteins/genetics *Selection (Genetics) Tumor Suppressor Protein p53/chemistry/genetics10ap53 Genetic Diseases1 aArbiza, L.1 aDuchi, S.1 aMontaner, D.1 aBurguet, J.1 aPantoja-Uceda, D.1 aPineda-Lucena, A.1 aDopazo, J.1 aDopazo, H. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1658474602090nas a2200193 4500008004100000245012600041210006900167300001100236490000700247520144700254653001501701653002501716100002401741700001601765700002301781700001401804700001501818856006301833 2005 eng d00aBABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments0 aBABELOMICS a suite of web tools for functional annotation and an aW460-40 v333 aWe present Babelomics, a complete suite of web tools for the functional analysis of groups of genes in high-throughput experiments, which includes the use of information on Gene Ontology terms, interpro motifs, KEGG pathways, Swiss-Prot keywords, analysis of predicted transcription factor binding sites, chromosomal positions and presence in tissues with determined histological characteristics, through five integrated modules: FatiGO (fast assignment and transference of information), FatiWise, transcription factor association test, GenomeGO and tissues mining tool, respectively. Additionally, another module, FatiScan, provides a new procedure that integrates biological information in combination with experimental results in order to find groups of genes with modest but coordinate significant differential behaviour. FatiScan is highly sensitive and is capable of finding significant asymmetries in the distribution of genes of common function across a list of ordered genes even if these asymmetries were not extreme. The strong multiple-testing nature of the contrasts made by the tools is taken into account. All the tools are integrated in the gene expression analysis package GEPAS. Babelomics is the natural evolution of our tool FatiGO (which analysed almost 22,000 experiments during the last year) to include more sources on information and new modes of using it. Babelomics can be found at http://www.babelomics.org.
10ababelomics10afunctional profiling1 aAl-Shahrour, Fatima1 aMinguez, P.1 aVaquerizas, J., M.1 aConde, L.1 aDopazo, J. uhttp://nar.oxfordjournals.org/content/33/suppl_2/W460.long02842nas a2200205 4500008004100000245013300041210006900174300001200243490000700255520183200262653001502094653013902109653003602248653010202284653008402386100002402470700002102494700001502515856010602530 2005 eng d00aDiscovering molecular functions significantly related to phenotypes by combining gene expression data and biological information0 aDiscovering molecular functions significantly related to phenoty a2988-930 v213 aMOTIVATION: The analysis of genome-scale data from different high throughput techniques can be used to obtain lists of genes ordered according to their different behaviours under distinct experimental conditions corresponding to different phenotypes (e.g. differential gene expression between diseased samples and controls, different response to a drug, etc.). The order in which the genes appear in the list is a consequence of the biological roles that the genes play within the cell, which account, at molecular scale, for the macroscopic differences observed between the phenotypes studied. Typically, two steps are followed for understanding the biological processes that differentiate phenotypes at molecular level: first, genes with significant differential expression are selected on the basis of their experimental values and subsequently, the functional properties of these genes are analysed. Instead, we present a simple procedure which combines experimental measurements with available biological information in a way that genes are simultaneously tested in groups related by common functional properties. The method proposed constitutes a very sensitive tool for selecting genes with significant differential behaviour in the experimental conditions tested. RESULTS: We propose the use of a method to scan ordered lists of genes. 
The method allows the understanding of the biological processes operating at molecular level behind the macroscopic experiment from which the list was generated. This procedure can be useful in situations where it is not possible to obtain statistically significant differences based on the experimental measurements (e.g. low prevalence diseases, etc.). Two examples demonstrate its application in two microarray experiments and the type of information that can be extracted.
10ababelomics10aBiological Neoplasm Proteins/genetics/*metabolism Phenotype Software Structure-Activity Relationship Systems Integration Tumor Markers10aBiological/genetics/*metabolism10aBreast Neoplasms/genetics/*metabolism Computer Simulation *Database Management Systems *Databases10aProtein Documentation/methods Gene Expression Profiling/*methods Humans *Models1 aAl-Shahrour, Fatima1 aDiaz-Uriarte, R.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1584070202221nas a2200157 4500008004100000245005800041210005600099300000800155490000600163520157700169653010501746653007601851100001501927700001501942856010601957 2005 eng d00aGenome-scale evidence of the nematode-arthropod clade0 aGenomescale evidence of the nematodearthropod clade aR410 v63 aBACKGROUND: The issue of whether coelomates form a single clade, the Coelomata, or whether all animals that moult an exoskeleton (such as the coelomate arthropods and the pseudocoelomate nematodes) form a distinct clade, the Ecdysozoa, is the most puzzling issue in animal systematics and a major open-ended subject in evolutionary biology. Previous single-gene and genome-scale analyses designed to resolve the issue have produced contradictory results. Here we present the first genome-scale phylogenetic evidence that strongly supports the Ecdysozoa hypothesis. RESULTS: Through the most extensive phylogenetic analysis carried out to date, the complete genomes of 11 eukaryotic species have been analyzed in order to find homologous sequences derived from 18 human chromosomes. Phylogenetic analysis of datasets showing an increased adjustment to equal evolutionary rates between nematode and arthropod sequences produced a gradual change from support for Coelomata to support for Ecdysozoa. Transition between topologies occurred when fast-evolving sequences of Caenorhabditis elegans were removed. 
When chordate, nematode and arthropod sequences were constrained to fit equal evolutionary rates, the Ecdysozoa topology was statistically accepted whereas Coelomata was rejected. CONCLUSIONS: The reliability of a monophyletic group clustering arthropods and nematodes was unequivocally accepted in datasets where traces of the long-branch attraction effect were removed. This is the first phylogenomic evidence to strongly support the ’moulting clade’ hypothesis.10aAnimals Arthropods/*classification/genetics Caenorhabditis elegans/classification/genetics Evolution10aMolecular *Genome Genomics Nematoda/*classification/genetics *Phylogeny1 aDopazo, H.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1589286902188nas a2200241 4500008004100000245009500041210006900136300001200205490000700217520141200224653001001636653002901646100002301675700001401698700002001712700001601732700001601748700002101764700002401785700001601809700001501825856010601840 2005 eng d00aGEPAS, an experiment-oriented pipeline for the analysis of microarray gene expression data0 aGEPAS an experimentoriented pipeline for the analysis of microar aW616-200 v333 aThe Gene Expression Profile Analysis Suite, GEPAS, has been running for more than three years. With >76,000 experiments analysed during the last year and a daily average of almost 300 analyses, GEPAS can be considered a well-established and widely used platform for gene expression microarray data analysis. GEPAS is oriented to the analysis of whole series of experiments. Its design and development have been driven by the demands of the biomedical community, probably the most active collective in the field of microarray users. Although clustering methods have obviously been implemented in GEPAS, our interest has focused more on methods for finding genes differentially expressed among distinct classes of experiments or correlated to diverse clinical outcomes, as well as on building predictors. 
There is also a great interest in CGH-arrays which fostered the development of the corresponding tool in GEPAS: InSilicoCGH. Much effort has been invested in GEPAS for developing and implementing efficient methods for functional annotation of experiments in the proper statistical framework. Thus, the popular FatiGO has expanded to a suite of programs for functional annotation of experiments, including information on transcription factor binding sites, chromosomal location and tissues. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://www.gepas.org.
10agepas10amicroarray data analysis1 aVaquerizas, J., M.1 aConde, L.1 aYankilevich, P.1 aCabezon, A.1 aMinguez, P.1 aDiaz-Uriarte, R.1 aAl-Shahrour, Fatima1 aHerrero, J.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1598054801413nas a2200181 4500008004100000245005600041210005500097300001100152490000700163520072100170653007700891653008700968100001701055700001501072700002101087700001701108856010601125 2005 eng d00aHCAD, closing the gap between breakpoints and genes0 aHCAD closing the gap between breakpoints and genes aD511-30 v333 aRecurrent chromosome aberrations are an important resource when associating human pathologies to specific genes. However, for technical reasons a large number of chromosome breakpoints are defined only at the level of cytobands and many of the genes involved remain unidentified. We developed a web-based information system that mines the scientific literature and generates textual and comprehensive information on all human breakpoints. We show that the statistical analysis of this textual information and its combination with genomic data can identify genes directly involved in DNA rearrangements. The Human Chromosome Aberration Database (HCAD) is publicly accessible at http://www.pdg.cnb.uam.es/UniPub/HCAD/.10a*Chromosome Breakage Chromosome Disorders/diagnosis/*genetics *Databases10aGenetic Genes *Genetic Predisposition to Disease Humans PubMed Systems Integration1 aHoffmann, R.1 aDopazo, J.1 aCigudosa, J., C.1 aValencia, A. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1560825002607nas a2200181 4500008004100000245009100041210006900132300001200201490000700213520184100220653007402061653005202135653007802187100001602265700002302281700001502304856010602319 2005 eng d00aHighly specific and accurate selection of siRNAs for high-throughput functional assays0 aHighly specific and accurate selection of siRNAs for highthrough a1376-820 v213 aMOTIVATION: Small interfering RNA (siRNA) is widely used in functional genomics to silence genes by decreasing their expression to study the resulting phenotypes. The possibility of performing large-scale functional assays by gene silencing accentuates the necessity of a software capable of the high-throughput design of highly specific siRNA. The main objective sought was the design of a large number of siRNAs with appropriate thermodynamic properties and, especially, high specificity. Since all the available procedures require, to some extent, manual processing of the results to guarantee specific results, specificity constitutes to date, the major obstacle to the complete automation of all the steps necessary for the selection of optimal candidate siRNAs. RESULT: Here, we present a program that for the first time completely automates the search for siRNAs. In SiDE, the most complete set of rules for the selection of siRNA candidates (including G+C content, nucleotides at determined positions, thermodynamic properties, propensity to form internal hairpins, etc.) is implemented and moreover, specificity is achieved by a conceptually new method. After selecting possible siRNA candidates with the optimal functional properties, putative unspecific matches, which can cause cross-hybridization, are checked in databases containing a unique entry for each gene. These truly non-redundant databases are constructed from the genome annotations (Ensembl). 
Also intron/exon boundaries, presence of polymorphisms (single nucleotide polymorphisms) specificity for either gene or transcript, and other features can be selected to be considered in the design of siRNAs. AVAILABILITY: The program is available as a web server at http://side.bioinfo.cnio.es. The program was written under the GPL license. CONTACT: jdopazo@cnio.es.10a*Algorithms Base Sequence *Gene Silencing Molecular Sequence Data RNA10aRNA/*methods *Software *User-Computer Interface10aSmall Interfering/*genetics Sequence Alignment/*methods Sequence Analysis1 aSantoyo, J.1 aVaquerizas, J., M.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1559135702687nas a2200337 4500008004100000245010400041210006900145300001000214490000700224520138200231653003401613653006501647653002501712653001001737653003201747653003301779653003201812653019101844100001502035700002202050700002602072700001602098700002502114700001402139700002202153700001602175700001502191700002102206700001602227856010602243 2005 eng d00aA novel candidate region linked to development of both pheochromocytoma and head/neck paraganglioma0 anovel candidate region linked to development of both pheochromoc a260-80 v423 aAlthough the histologic distinction between pheochromocytomas and head and neck paragangliomas is clear, little is known about the genetic differences between them. To date, various sets of genes have been found to be involved in inherited susceptibility to developing both tumor types, but the genes involved in sporadic pathogenesis are still unknown. To define new candidate regions, we performed CGH analysis on 29 pheochromocytomas and on 24 paragangliomas mainly of head and neck origin (20 of 24), which allowed us to differentiate between the two tumor types. Loss of 3q was significantly more frequent in pheochromocytomas, and loss of 1q appeared only in paragangliomas. 
We also found gain of 11q13 to be a significantly frequent alteration in malignant cases of both types. In addition, recurrent loss of 8p22-23 was found in 62% of pheochromocytomas (including all malignant cases) versus in 33% of paragangliomas, suggesting that this region contains candidate genes involved in the pathogenesis of this abnormality. Using FISH analysis on tissue microarrays, we confirmed genomic deletion of this region in 55% of pheochromocytomas compared to 12% of paragangliomas. Loss of 8p22-23 appears to be an important event in the sporadic development of these tumors, and additional molecular studies are necessary to identify candidate genes in this chromosomal region.10a80 and over Child Chromosomes10aAdolescent Adrenal Gland Neoplasms/*genetics Adult Aged Aged10aBiological/*genetics10aHuman10aPair 1/genetics Chromosomes10aPair 11/genetics Chromosomes10aPair 3/genetics Chromosomes10aPair 8/genetics Female Gene Deletion Head and Neck Neoplasms/*genetics Humans Male Middle Aged Nucleic Acid Hybridization Paraganglioma/*genetics Pheochromocytoma/*genetics Tumor Markers1 aCascon, A.1 aRuiz-Llorente, S.1 aRodriguez-Perales, S.1 aHonrado, E.1 aMartinez-Ramirez, A.1 aLeton, R.1 aMontero-Conde, C.1 aBenitez, J.1 aDopazo, J.1 aCigudosa, J., C.1 aRobledo, M. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1560934702968nas a2200337 4500008004100000245012900041210006900170300000900239490000700248520175600255653010902011653003502120653001702155653006002172653007102232100001702303700001602320700001502336700001602351700001502367700001602382700001802398700002102416700001302437700001502450700001402465700001502479700001402494700001602508856010602524 2005 eng d00aPhenotypic characterization of BRCA1 and BRCA2 tumors based in a tissue microarray study with 37 immunohistochemical markers0 aPhenotypic characterization of BRCA1 and BRCA2 tumors based in a a5-140 v903 aFamilial breast cancers that are associated with BRCA1 or BRCA2 germline mutations differ in both their morphological and immunohistochemical characteristics. To further characterize the molecular difference between genotypes, the authors evaluated the expression of 37 immunohistochemical markers in a tissue microarray (TMA) containing cores from 20 BRCA1, 14 BRCA2, and 59 sporadic age-matched breast carcinomas. Markers analyzed included, amog others, common markers in breast cancer, such as hormone receptors, p53 and HER2, along with 15 molecules involved in cell cycle regulation, such as cyclins, cyclin dependent kinases (CDK) and CDK inhibitors (CDKI), apoptosis markers, such as BCL2 and active caspase 3, and two basal/myoepithelial markers (CK 5/6 and P-cadherin). In addition, we analyzed the amplification of CCND1, CCNE, HER2 and MYC by FISH.Unsupervised cluster data analysis of both hereditary and sporadic cases using the complete set of immunohistochemical markers demonstrated that most BRCA1-associated carcinomas grouped in a branch of ER-, HER2-negative tumors that expressed basal cell markers and/or p53 and had higher expression of activated caspase 3. The cell cycle proteins associated with these tumors were E2F6, cyclins A, B1 and E, SKP2 and Topo IIalpha. 
In contrast, most BRCA2-associated carcinomas grouped in a branch composed by ER/PR/BCL2-positive tumors with a higher expression of the cell cycle proteins cyclin D1, cyclin D3, p27, p16, p21, CDK4, CDK2 and CDK1. In conclusion, our study in hereditary breast cancer tumors analyzing 37 immunohistochemical markers, define the molecular differences between BRCA1 and BRCA2 tumors with respect to hormonal receptors, cell cycle, apoptosis and basal cell markers.10aAdult Apoptosis Breast Neoplasms/*genetics/*pathology Cell Cycle Proteins Cluster Analysis Female *Genes10aBiological/genetics/metabolism10aBRCA1 *Genes10aBRCA2 Humans Immunohistochemistry In Situ Hybridization10aFluorescence Phenotype Spain *Tissue Array Analysis *Tumor Markers1 aPalacios, J.1 aHonrado, E.1 aOsorio, A.1 aCazorla, A.1 aSarrio, D.1 aBarroso, A.1 aRodriguez, S.1 aCigudosa, J., C.1 aDiez, O.1 aAlonso, C.1 aLerma, E.1 aDopazo, J.1 aRivas, C.1 aBenitez, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1577052102881nas a2200397 4500008004100000245016800041210006900209300001200278490000700290520143600297653010101733653002201834653001001856653008301866653003301949653003301982653003302015653003202048653005102080100001602131700002102147700001502168700001602183700001602199700001602215700001602231700001802247700001602265700001402281700001302295700002102308700001502329700001702344700001602361856010602377 2005 eng d00aA predictor based on the somatic genomic changes of the BRCA1/BRCA2 breast cancer tumors identifies the non-BRCA1/BRCA2 tumors with BRCA1 promoter hypermethylation0 apredictor based on the somatic genomic changes of the BRCA1BRCA2 a1146-530 v113 aThe genetic changes underlying in the development and progression of familial breast cancer are poorly understood. 
To identify a somatic genetic signature of tumor progression for each familial group, BRCA1, BRCA2, and non-BRCA1/BRCA2 (BRCAX) tumors, by high-resolution comparative genomic hybridization, we have analyzed 77 tumors previously characterized for BRCA1 and BRCA2 germ line mutations. Based on a combination of the somatic genetic changes observed at the six most different chromosomal regions and the status of the estrogen receptor, we developed using random forests a molecular classifier, which assigns to a given tumor a probability to belong either to the BRCA1 or to the BRCA2 class. Because 76.5% (26 of 34) of the BRCAX cases were classified with our predictor to the BRCA1 class with a probability of >50%, we analyzed the BRCA1 promoter region for aberrant methylation in all the BRCAX cases. We found that 15 of the 34 BRCAX analyzed tumors had hypermethylation of the BRCA1 gene. When we considered the predictor, we observed that all the cases with this epigenetic event were assigned to the BRCA1 class with a probability of >50%. Interestingly, 84.6% of the cases (11 of 13) assigned to the BRCA1 class with a probability >80% had an aberrant methylation of the BRCA1 promoter. This fact suggests that somatic BRCA1 inactivation could modify the profile of tumor progression in most of the BRCAX cases.10aBRCA1 Protein/*genetics BRCA2 Protein/*genetics Breast Neoplasms/*genetics/pathology Chromosomes10aGenetic/*genetics10aHuman10aHuman Humans Male Mutation Nucleic Acid Hybridization/methods Promoter Regions10aPair 12/genetics Chromosomes10aPair 15/genetics Chromosomes10aPair 18/genetics Chromosomes10aPair 2/genetics Chromosomes10aPair 8/genetics *DNA Methylation Female Genome1 aAlvarez, S.1 aDiaz-Uriarte, R.1 aOsorio, A.1 aBarroso, A.1 aMelchor, L.1 aPaz, M., F.1 aHonrado, E.1 aRodriguez, R.1 aUrioste, M.1 aValle, L.1 aDiez, O.1 aCigudosa, J., C.1 aDopazo, J.1 aEsteller, M.1 aBenitez, J. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1570918202075nas a2200205 4500008004100000245012600041210006900167300001100236490000700247520124100254653010501495653005601600100001401656700002301670700002101693700001901714700001501733700001501748856010601763 2005 eng d00aPupasView: a visual tool for selecting suitable SNPs, with putative pathological effect in genes, for genotyping purposes0 aPupasView a visual tool for selecting suitable SNPs with putativ aW501-50 v333 aWe have developed a web tool, PupasView, for the selection of single nucleotide polymorphisms (SNPs) with potential phenotypic effect. PupasView constitutes an interactive environment in which functional information and population frequency data can be used as sequential filters over linkage disequilibrium parameters to obtain a final list of SNPs optimal for genotyping purposes. PupasView is the first resource that integrates phenotypic effects caused by SNPs at both the translational and the transcriptional level. PupasView retrieves SNPs that could affect conserved regions that the cellular machinery uses for the correct processing of genes (intron/exon boundaries or exonic splicing enhancers), predicted transcription factor binding sites and changes in amino acids in the proteins for which a putative pathological effect is calculated. The program uses the mapping of SNPs in the genome provided by Ensembl. PupasView will be of much help in studies of multifactorial disorders, where the use of functional SNPs will increase the sensitivity of the identification of the genes responsible for the disease. 
The PupasView web interface is accessible through http://pupasview.ochoa.fib.es and through http://www.pupasnp.org.10aComputer Graphics Genes *Genetic Predisposition to Disease Genotype Internet Phenotype *Polymorphism10aSingle Nucleotide *Software User-Computer Interface1 aConde, L.1 aVaquerizas, J., M.1 aFerrer-Costa, C.1 ade la Cruz, X.1 aOrozco, M.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1598052201390nas a2200169 4500008004100000245006900041210006700110300001100177490000700188520056300195653024500758653005201003100002301055700001501078700002101093856010601114 2004 eng d00aDNMAD: web-based diagnosis and normalization for microarray data0 aDNMAD webbased diagnosis and normalization for microarray data a3656-80 v203 aSUMMARY: We present a web server for Diagnosis and Normalization of MicroArray Data (DNMAD). DNMAD includes several common data transformations such as spatial and global robust local regression or multiple slide normalization, and allows for detecting several kinds of errors that result from the manipulation and the image analysis of the arrays. This tool offers a user-friendly interface, and is completely integrated within the Gene Expression Pattern Analysis Suite (GEPAS). AVAILABILITY: The tool is accessible on-line at http://dnmad.bioinfo.cnio.es.10aAlgorithms Database Management Systems Gene Expression Profiling/*methods/standards Information Storage and Retrieval/*methods *Internet Oligonucleotide Array Sequence Analysis/*methods/standards Sequence Alignment/methods Sequence Analysis10aDNA/*methods *Software *User-Computer Interface1 aVaquerizas, J., M.1 aDopazo, J.1 aDiaz-Uriarte, R. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1524709401450nas a2200193 4500008004100000245010400041210006900145300001100214490000700225520063300232653005000865653001500915653002700930653016800957100002401125700002101149700001501170856007101185 2004 eng d00aFatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes0 aFatiGO a web tool for finding significant associations of Gene O a578-800 v203 aWe present a simple but powerful procedure to extract Gene Ontology (GO) terms that are significantly over- or under-represented in sets of genes within the context of a genome-scale experiment (DNA microarray, proteomics, etc.). Said procedure has been implemented as a web application, FatiGO, allowing for easy and interactive querying. FatiGO, which takes the multiple-testing nature of statistical contrast into account, currently includes GO associations for diverse organisms (human, mouse, fly, worm and yeast) and the TrEMBL/Swissprot GOAnnotations@EBI correspondences from the European Bioinformatics Institute.
10a*Algorithms Artificial Intelligence Databases10ababelomics10aDNA/*methods *Software10aGenetic Gene Expression Profiling/*methods *Hypermedia Information Storage and Retrieval/*methods *Internet *Phylogeny Sequence Alignment/methods Sequence Analysis1 aAl-Shahrour, Fatima1 aDiaz-Uriarte, R.1 aDopazo, J. uhttp://bioinformatics.oxfordjournals.org/content/20/4/578.abstract02990nas a2200325 4500008004100000245013600041210006900177300001100246490000700257520176100264653001602025653002602041653001002067653003402077653003402111653011302145653008802258100001702346700002102363700001602384700002502400700002702425700001502452700002102467700001402488700001502502700002502517700001602542856010602558 2004 eng d00aGene expression analysis of chromosomal regions with gain or loss of genetic material detected by comparative genomic hybridization0 aGene expression analysis of chromosomal regions with gain or los a353-650 v413 aComparative genomic hybridization (CGH) has been widely used to detect copy number alterations in cancer and to identify regions containing candidate tumor-responsible genes; however, gene expression changes have been described only in highly amplified regions (amplicons). To study the overall impact of slight copy number changes on gene expression, we analyzed 16 T-cell lymphomas by using CGH and a custom-designed cDNA microarray containing 7,657 genes and expressed sequence tags related to tumorigenesis. We evaluated mean gene expression and variability within CGH-altered regions and explored the relationship between the effects of the gene and its position within these regions. Minimally overlapping CGH candidate areas (6q25, 13q21-q22, and 19q13.1) revealed a weak relationship between altered genomic content and gene expression. 
However, some candidate genes showed modified expression within these regions in the majority of tumors; these candidate genes were evaluated and confirmed in another independent series of 23 T-cell lymphomas by use of the same cDNA microarray and by FISH on a tissue microarray. When all the CGH regions detected for each tumor were considered, we found a significant increase or decrease in the mean expression of the genes contained in gained or lost regions, respectively. In addition, we found that the expression of a gene was dependent not only on its position within an altered region but also on its own mechanism of regulation: genes in the same altered region responded very differently to the gain or loss of genetic material. Supplementary material for this article can be found on the Genes, Chromosomes, and Cancer website at http://www.interscience.wiley.com/jpages/1045-2257/suppmat/index.html.10aChromosomes10aFluorescence Lymphoma10aHuman10aPair 13/*genetics Chromosomes10aPair 19/*genetics Chromosomes10aPair 6/*genetics Expressed Sequence Tags *Gene Dosage Gene Expression Profiling Humans In Situ Hybridization10aT-Cell/*genetics Nucleic Acid Hybridization Oligonucleotide Array Sequence Analysis1 aMelendez, B.1 aDiaz-Uriarte, R.1 aCuadros, M.1 aMartinez-Ramirez, A.1 aFernandez-Piqueras, J.1 aDopazo, A.1 aCigudosa, J., C.1 aRivas, C.1 aDopazo, J.1 aMartinez-Delgado, B.1 aBenitez, J. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1538226102102nas a2200217 4500008004100000245007500041210006900116300001200185490000700197520140300204653001001607653002901617100001601646700002301662700002401685700001401709700001501723700002501738700001501763856010601778 2004 eng d00aNew challenges in gene expression data analysis and the extended GEPAS0 aNew challenges in gene expression data analysis and the extended aW485-910 v323 aSince the first papers published in the late nineties, including, for the first time, a comprehensive analysis of microarray data, the number of questions that have been addressed through this technique have both increased and diversified. Initially, interest focussed on genes coexpressing across sets of experimental conditions, implying, essentially, the use of clustering techniques. Recently, however, interest has focussed more on finding genes differentially expressed among distinct classes of experiments, or correlated to diverse clinical outcomes, as well as in building predictors. In addition to this, the availability of accurate genomic data and the recent implementation of CGH arrays has made mapping expression and genomic data on the chromosomes possible. There is also a clear demand for methods that allow the automatic transfer of biological information to the results of microarray experiments. Different initiatives, such as the Gene Ontology (GO) consortium, pathways databases, protein functional motifs, etc., provide curated annotations for genes. Whereas many resources on the web focus mainly on clustering methods, GEPAS has evolved to cope with the aforementioned new challenges that have recently arisen in the field of microarray data analysis. The web-based pipeline for microarray gene expression data, GEPAS, is available at http://gepas.bioinfo.cnio.es.
10agepas10amicroarray data analysis1 aHerrero, J.1 aVaquerizas, J., M.1 aAl-Shahrour, Fatima1 aConde, L.1 aMateos, A.1 aDiaz-Uriarte, J., S.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1521543401756nas a2200145 4500008004100000245011900041210006900160300001200229490001500241520120200256100001501458700001601473700001501489856010601504 2004 eng d00aPhylogenomics and the number of characters required for obtaining an accurate phylogeny of eukaryote model species0 aPhylogenomics and the number of characters required for obtainin ai116-210 v20 Suppl 13 aMOTIVATION: Through the most extensive phylogenomic analysis carried out to date, complete genomes of 11 eukaryotic species have been examined in order to find the homologous of more than 25,000 amino acid sequences. These sequences correspond to the exons of more than 3000 genes and were used as presence/absence characters to test one of the most controversial hypotheses concerning animal evolution, namely the Ecdysozoa hypothesis. Distance, maximum parsimony and Bayesian methods of phylogenetic reconstruction were used to test the hypothesis. RESULTS: The reliability of the ecdysozoa, grouping arthropods and nematodes in a single clade was unequivocally rejected in all the consensus trees. The Coelomata clade, grouping arthropods and chordates, was supported by the highest statistical confidence in all the reconstructions. The study of the dependence of the genomes’ tree accuracy on the number of exons used, demonstrated that an unexpectedly larger number of characters are necessary to obtain robust phylogenies. Previous studies supporting ecdysozoa, could not guarantee an accurate phylogeny because the number of characters used was clearly below the minimum required.
1 aDopazo, H.1 aSantoyo, J.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1526278902164nas a2200229 4500008004100000245009400041210006900135300001100204490000700215520128900222653008201511653001201593653009301605100001401698700002301712700001601735700002401751700002201775700001601797700001501813856010601828 2004 eng d00aPupaSNP Finder: a web tool for finding SNPs with putative effect at transcriptional level0 aPupaSNP Finder a web tool for finding SNPs with putative effect aW242-80 v323 aWe have developed a web tool, PupaSNP Finder (PupaSNP for short), for high-throughput searching for single nucleotide polymorphisms (SNPs) with potential phenotypic effect. PupaSNP takes as its input lists of genes (or generates them from chromosomal coordinates) and retrieves SNPs that could affect the conserved regions that the cellular machinery uses for the correct processing of genes (intron/exon boundaries or exonic splicing enhancers), predicted transcription factor binding sites (TFBS) and changes in amino acids in the proteins. The program uses the mapping of SNPs in the genome provided by Ensembl. Additionally, user-defined SNPs (not yet mapped in the genome) can be easily provided to the program. Also, additional functional information from Gene Ontology, OMIM and homologies in other model organisms is provided. In contrast to other programs already available, which focus only on SNPs with possible effect in the protein, PupaSNP includes SNPs with possible transcriptional effect. PupaSNP will be of significant help in studies of multifactorial disorders, where the use of functional SNPs will increase the sensitivity of identification of the genes responsible for the disease. 
The PupaSNP web interface is accessible through http://pupasnp.bioinfo.cnio.es.10aAmino Acid Substitution Binding Sites Humans Internet Phenotype *Polymorphism10aGenetic10aSingle Nucleotide RNA Splicing *Software Transcription Factors/metabolism *Transcription1 aConde, L.1 aVaquerizas, J., M.1 aSantoyo, J.1 aAl-Shahrour, Fatima1 aRuiz-Llorente, S.1 aRobledo, M.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1521538801309nas a2200145 4500008004100000245010100041210006900142300001100211490000600222520077700228100001601005700002101021700001501042856010601057 2003 eng d00aAn approach to inferring transcriptional regulation among genes from large-scale expression data0 aapproach to inferring transcriptional regulation among genes fro a148-540 v43 aThe use of DNA microarrays opens up the possibility of measuring the expression levels of thousands of genes simultaneously under different conditions. Time-course experiments allow researchers to study the dynamics of gene interactions. The inference of genetic networks from such measures can give important insights for the understanding of a variety of biological problems. Most of the existing methods for genetic network reconstruction require many experimental data points, or can only be applied to the reconstruction of small subnetworks. Here we present a method that reduces the dimensionality of the dataset and then extracts the significant dynamic correlations among genes. The method requires a number of points achievable in common time-course experiments.1 aHerrero, J.1 aDiaz-Uriarte, R.1 aDopazo, J. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1862909702356nas a2200265 4500008004100000245006200041210006200103300001000165490000700175520124200182653003001424653007301454653007501527653004901602653004201651653004301693653005001736653007501786653005801861100001901919700001601938700001501954700001501969856010601984 2003 eng d00aComparing bacterial genomes through conservation profiles0 aComparing bacterial genomes through conservation profiles a991-80 v133 aWe constructed two-dimensional representations of profiles of gene conservation across different genomes using the genome of Escherichia coli as a model. These profiles permit both the visualization at the genome level of different traits in the organism studied and, at the same time, reveal features related to the genomes analyzed (such as defective genomes or genomes that lack a particular system). Conserved genes are not uniformly distributed along the E. coli genome but tend to cluster together. The study of gene distribution patterns across genomes is important for the understanding of how sets of genes seem to be dependent on each other, probably having some functional link. This provides additional evidence that can be used for the elucidation of the function of unannotated genes. Clustering these patterns produces families of genes which can be arranged in a hierarchy of closeness. In this way, functions can be defined at different levels of generality depending on the level of the hierarchy that is studied. 
The combined study of conservation and phenotypic traits opens up the possibility of defining phenotype/genotype associations, and ultimately inferring the gene or genes responsible for a particular trait.10aBacterial Genotype Models10aBacterial/genetics Cluster Analysis Conserved Sequence/*genetics DNA10aBacterial/genetics Escherichia coli/classification/*genetics Evolution10aBacterial/genetics Gene Order/genetics Genes10aBacterial/genetics/physiology *Genome10aChromosome Mapping/methods Chromosomes10aGenetic Phenotype Phylogeny Sequence Homology10aMolecular Gene Expression Profiling/methods Gene Expression Regulation10aNucleic Acid Species Specificity Terminology as Topic1 aMartin, M., J.1 aHerrero, J.1 aMateos, A.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1269532401226nas a2200169 4500008004100000245003900041210003900080300001000119490000700129520049600136653021400632653005200846100001600898700002100914700001500935856010600950 2003 eng d00aGene expression data preprocessing0 aGene expression data preprocessing a655-60 v193 aWe present an interactive web tool for preprocessing microarray gene expression data. It analyses the data, suggests the most appropriate transformations and proceeds with them after user agreement. The normal preprocessing steps include scale transformations, management of missing values, replicate handling, flat pattern filtering and pattern standardization and they are required before performing any pattern analysis. The processed data set can be sent to other pattern analysis tools.10a*Database Management Systems Gene Expression Profiling/*methods Information Storage and Retrieval/methods Internet Oligonucleotide Array Sequence Analysis/*methods Sequence Alignment/*methods Sequence Analysis10aDNA/*methods *Software *User-Computer Interface1 aHerrero, J.1 aDiaz-Uriarte, R.1 aDopazo, J. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1265172601500nas a2200217 4500008004100000245007700041210006900118300001100187490000700198520080200205653001001007653002901017100001601046700002401062700002101086700001501107700002301122700001601145700001501161856010601176 2003 eng d00aGEPAS: A web-based resource for microarray gene expression data analysis0 aGEPAS A webbased resource for microarray gene expression data an a3461-70 v313 aWe present a web-based pipeline for microarray gene expression profile analysis, GEPAS, which stands for Gene Expression Profile Analysis Suite (http://gepas.bioinfo.cnio.es). GEPAS is composed of different interconnected modules which include tools for data pre-processing, two-conditions comparison, unsupervised and supervised clustering (which include some of the most popular methods as well as home made algorithms) and several tests for differential gene expression among different classes, continuous variables or survival analysis. A multiple purpose tool for data mining, based on Gene Ontology, is also linked to the tools, which constitutes a very convenient way of analysing clustering results. On-line tutorials are available from our main web server (http://bioinfo.cnio.es).
10agepas10amicroarray data analysis1 aHerrero, J.1 aAl-Shahrour, Fatima1 aDiaz-Uriarte, R.1 aMateos, A.1 aVaquerizas, J., M.1 aSantoyo, J.1 aDopazo, J. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1282434501648nas a2200241 4500008004100000245010900041210006900150300001100219490000700230520055300237653010300790653018900893653008001082100001601162700001401178700001601192700001501208700001701223700002201240700002101262700001701283856010601300 2002 eng d00aBioinformatics methods for the analysis of expression arrays: data clustering and information extraction0 aBioinformatics methods for the analysis of expression arrays dat a269-830 v983 aExpression arrays facilitate the monitoring of changes in the expression patterns of large collections of genes. The analysis of expression array data has become a computationally-intensive task that requires the development of bioinformatics technology for a number of key stages in the process, such as image analysis, database storage, gene clustering and information extraction. Here, we review the current trends in each of these areas, with particular emphasis on the development of the related technology being carried out within our groups.10aAbstracting and Indexing as Topic/methods *Cluster Analysis *Database Management Systems Databases10aComputer-Assisted/methods Information Storage and Retrieval/*methods Internet Medline National Library of Medicine (U.S.) Oligonucleotide Array Sequence Analysis/*methods United States10aGenetic Gene Expression Gene Expression Profiling/*methods Image Processing1 aTamames, J.1 aClark, D.1 aHerrero, J.1 aDopazo, J.1 aBlaschke, C.1 aFernandez, J., M.1 aOliveros, J., C.1 aValencia, A. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1214199201191nas a2200157 4500008004100000245011600041210006900157300001100226490000600237520046000243653007400703653011900777100001600896700001500912856010600927 2002 eng d00aCombining hierarchical clustering and self-organizing maps for exploratory analysis of gene expression patterns0 aCombining hierarchical clustering and selforganizing maps for ex a467-700 v13 aSelf-organizing maps (SOM) constitute an alternative to classical clustering methods because of its linear run times and superior performance to deal with noisy data. Nevertheless, the clustering obtained with SOM is dependent on the relative sizes of the clusters. Here, we show how the combination of SOM with hierarchical clustering methods constitutes an excellent tool for exploratory analysis of massive data like DNA microarray expression patterns.10aCluster Analysis Computational Biology/methods *Gene Expression Genes10aFungal/genetics *Genome Oligonucleotide Array Sequence Analysis/*methods Statistics as Topic/*methods Time Factors1 aHerrero, J.1 aDopazo, J. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1264591903136nas a2200397 4500008004100000245010000041210006900141300001200210490000800222520146400230653015401694653005901848653001301907653008901920653015002009653006702159653003702226653008702263653001102350100001502361700001902376700001402395700001502409700001602424700001802440700003002458700002602488700001802514700001402532700001802546700001602564700001902580700001502599700001802614856010602632 2002 eng d00aIdentification of genes involved in resistance to interferon-alpha in cutaneous T-cell lymphoma0 aIdentification of genes involved in resistance to interferonalph a1825-370 v1613 aInterferon-alpha therapy has been shown to be active in the treatment of mycosis fungoides although the individual response to this therapy is unpredictable and dependent on essentially unknown factors. In an effort to better understand the molecular mechanisms of interferon-alpha resistance we have developed an interferon-alpha resistant variant from a sensitive cutaneous T-cell lymphoma cell line. We have performed expression analysis to detect genes differentially expressed between both variants using a cDNA microarray including 6386 cancer-implicated genes. The experiments showed that resistance to interferon-alpha is consistently associated with changes in the expression of a set of 39 genes, involved in signal transduction, apoptosis, transcription regulation, and cell growth. Additional studies performed confirm that STAT1 and STAT3 expression and interferon-alpha induction and activation are not altered between both variants. The gene MAL, highly overexpressed by resistant cells, was also found to be expressed by tumoral cells in a series of cutaneous T-cell lymphoma patients treated with interferon-alpha and/or photochemotherapy. MAL expression was associated with longer time to complete remission. 
Time-course experiments of the sensitive and resistant cells showed a differential expression of a subset of genes involved in interferon-response (1 to 4 hours), cell growth and apoptosis (24 to 48 hours.), and signal transduction.10aAntineoplastic Agents/*pharmacology/therapeutic use Carrier Proteins/biosynthesis/genetics DNA-Binding Proteins/biosynthesis/genetics Drug Resistance10aBiological Oligonucleotide Array Sequence Analysis RNA10aCultured10aCutaneous/diagnosis/drug therapy/*genetics/metabolism *Membrane Glycoproteins Models10aInterleukin-1 Reproducibility of Results STAT1 Transcription Factor STAT3 Transcription Factor Trans-Activators/biosynthesis/genetics Tumor Cells10aNeoplasm Gene Expression Profiling *Gene Expression Regulation10aNeoplasm/biosynthesis *Receptors10aNeoplastic Humans Interferon-alpha/*pharmacology/therapeutic use Kinetics Lymphoma10aT-Cell1 aTracey, L.1 aVilluendas, R.1 aOrtiz, P.1 aDopazo, A.1 aSpiteri, I.1 aLombardia, L.1 aRodriguez-Peralto, J., L.1 aFernandez-Herrera, J.1 aHernandez, A.1 aFraga, J.1 aDominguez, O.1 aHerrero, J.1 aAlonso, M., A.1 aDopazo, J.1 aPiris, M., A. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1241452902866nas a2200193 4500008004100000245011400041210006900155300001200224490000700236520197100243653025902214100001502473700001502488700001502503700001102518700001702529700002002546856010602566 2002 eng d00aSystematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons0 aSystematic learning of gene functional classes from DNA array ex a1703-150 v123 aRecent advances in microarray technology have opened new ways for functional annotation of previously uncharacterised genes on a genomic scale. This has been demonstrated by unsupervised clustering of co-expressed genes and, more importantly, by supervised learning algorithms. 
Using prior knowledge, these algorithms can assign functional annotations based on more complex expression signatures found in existing functional classes. Previously, support vector machines (SVMs) and other machine-learning methods have been applied to a limited number of functional classes for this purpose. Here we present, for the first time, the comprehensive application of supervised neural networks (SNNs) for functional annotation. Our study is novel in that we report systematic results for 100 classes in the Munich Information Center for Protein Sequences (MIPS) functional catalog. We found that only 10% of these are learnable (based on the rate of false negatives). A closer analysis reveals that false positives (and negatives) in a machine-learning context are not necessarily "false" in a biological sense. We show that the high degree of interconnections among functional classes confounds the signatures that ought to be learned for a unique class. We term this the "Borges effect" and introduce two new numerical indices for its quantification. Our analysis indicates that classification systems with a lower Borges effect are better suited for machine learning. Furthermore, we introduce a learning procedure for combining false positives with the original class. We show that in a few iterations this process converges to a gene set that is learnable with considerably low rates of false positives and negatives and contains genes that are biologically related to the original class, allowing for a coarse reconstruction of the interactions between associated biological pathways. 
We exemplify this methodology using the well-studied tricarboxylic acid cycle.10aAlgorithms Artificial Intelligence Citric Acid Cycle/genetics Cluster Analysis Computational Biology/methods Gene Expression Profiling/*methods/statistics & numerical data Genes/*physiology Genetic Heterogeneity Neural Networks (Computer) Oligonucleotide1 aMateos, A.1 aDopazo, J.1 aJansen, R.1 aTu, Y.1 aGerstein, M.1 aStolovitzky, G. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1242175702311nas a2200361 4500008004100000245009500041210006900136300001100205490000600216520111600222653009801338653003901436653003101475653000801506653005901514100001501573700001601588700001601604700001601620700001601636700001601652700001701668700002101685700001501706700002101721700001601742700001401758700001401772700001501786700001601801700002601817856010601843 2001 eng d00aAnnotated draft genomic sequence from a Streptococcus pneumoniae type 19F clinical isolate0 aAnnotated draft genomic sequence from a Streptococcus pneumoniae a99-1250 v73 aThe public availability of numerous microbial genomes is enabling the analysis of bacterial biology in great detail and with an unprecedented, organism-wide and taxon-wide, broad scope. Streptococcus pneumoniae is one of the most important bacterial pathogens throughout the world. We present here sequences and functional annotations for 2.1-Mbp of pneumococcal DNA, covering more than 90% of the total estimated size of the genome. The sequenced strain is a clinical isolate resistant to macrolides and tetracycline. It carries a type 19F capsular locus, but multilocus sequence typing for several conserved genetic loci suggests that the strain sequenced belongs to a pneumococcal lineage that most often expresses a serotype 15 capsular polysaccharide. 
A total of 2,046 putative open reading frames (ORFs) longer than 100 amino acids were identified (average of 1,009 bp per ORF), including all described two-component systems and aminoacyl tRNA synthetases. Comparisons to other complete, or nearly complete, bacterial genomes were made and are presented in a graphical form for all the predicted proteins.10aBacterial Molecular Sequence Data Pneumococcal Infections/*microbiology Prokaryotic Cells RNA10aBacterial/chemistry/genetics Genes10aBacterial/genetics *Genome10aDNA10aTransfer/metabolism Streptococcus pneumoniae/*genetics1 aDopazo, J.1 aMendoza, A.1 aHerrero, J.1 aCaldara, F.1 aHumbert, Y.1 aFriedli, L.1 aGuerrier, M.1 aGrand-Schenk, E.1 aGandin, C.1 ade Francesco, M.1 aPolissi, A.1 aBuell, G.1 aFeger, G.1 aGarcia, E.1 aPeitsch, M.1 aGarcia-Bustos, J., F. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1144234802638nas a2200157 4500008004100000245009500041210006900136300001100205490000700216520196500223653013802188100001602326700001702342700001502359856010602374 2001 eng d00aA hierarchical unsupervised growing neural network for clustering gene expression patterns0 ahierarchical unsupervised growing neural network for clustering a126-360 v173 aMOTIVATION: We describe a new approach to the analysis of gene expression data coming from DNA array experiments, using an unsupervised neural network. DNA array technologies allow monitoring thousands of genes rapidly and efficiently. One of the interests of these studies is the search for correlated gene expression patterns, and this is usually achieved by clustering them. The Self-Organising Tree Algorithm, (SOTA) (Dopazo,J. and Carazo,J.M. (1997) J. Mol. Evol., 44, 226-233), is a neural network that grows adopting the topology of a binary tree. The result of the algorithm is a hierarchical cluster obtained with the accuracy and robustness of a neural network. 
RESULTS: SOTA clustering confers several advantages over classical hierarchical clustering methods. SOTA is a divisive method: the clustering process is performed from top to bottom, i.e. the highest hierarchical levels are resolved before going to the details of the lowest levels. The growing can be stopped at the desired hierarchical level. Moreover, a criterion to stop the growing of the tree, based on the approximate distribution of probability obtained by randomisation of the original data set, is provided. By means of this criterion, a statistical support for the definition of clusters is proposed. In addition, obtaining average gene expression patterns is a built-in feature of the algorithm. Different neurons defining the different hierarchical levels represent the averages of the gene expression patterns contained in the clusters. Since SOTA runtimes are approximately linear with the number of items to be classified, it is especially suitable for dealing with huge amounts of data. The method proposed is very general and applies to any data providing that they can be coded as a series of numbers and that a computable measure of similarity between data items can be used. AVAILABILITY: A server running the program can be found at: http://bioinfo.cnio.es/sotarray.10a*Algorithms Automatic Data Processing *Gene Expression Profiling *Neural Networks (Computer) *Oligonucleotide Array Sequence Analysis1 aHerrero, J.1 aValencia, A.1 aDopazo, J. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1123806802532nas a2200229 4500008004100000245015400041210006900195300001000264490000700274520149500281653013001776653009301906653007401999100001802073700001902091700002002110700001702130700001802147700001502165700001602180856010602196 2001 eng d00aIdentification of optimal regions for phylogenetic studies on VP1 gene of foot-and-mouth disease virus: analysis of types A and O Argentinean viruses0 aIdentification of optimal regions for phylogenetic studies on VP a31-450 v323 aAn analysis of the informative content of sequence stretches on the foot-and-mouth disease virus (FMDV) VP1 gene was applied to two important viral serotypes: A and O. Several sequence regions were identified to allow the reconstruction of phylogenetic trees equivalent to those derived from the whole VP1 gene. The optimal informative regions for sequence windows of 150 to 250 nt were predicted between positions 250 and 550 of the gene. The sequences spanning the 250 nt of the 3’ end (positions 400 to 650), extensively used for FMDV phylogenetic analyses, showed a lower informative content. In spite of this, the use of sequences from this region allowed the derivation of phylogenetic trees for type A and type O FMDVs which showed topologies similar to those previously reported for the whole VP1 gene. When the sequences determined for viruses isolated in Argentina, between 1990 and 1993, were included in these analyses, the results obtained revealed features of the circulation of type A and type O viruses in the field, in the months that preceded the eradication of the disease in this country. Type A viruses were closely related to an Argentinean vaccine strain, and defined an independent cluster within this serotype. Among the type O viruses analysed, two groups were distinguished; one was closely related to the South American vaccine strains, while the other was grouped with viruses of the O3 subtype. 
In addition, a detailed phylogeny for type A FMDV is presented.10aAmino Acid Sequence Animals Aphthovirus/classification/*genetics Base Sequence Capsid/chemistry/*genetics Capsid Proteins DNA10aComplementary/chemistry Molecular Sequence Data *Phylogeny Polymerase Chain Reaction RNA10aViral/chemistry/genetics Serotyping Viral Proteins/analysis/*genetics1 aNunez, J., I.1 aMartin, M., J.1 aPiccone, M., E.1 aCarrillo, E.1 aPalma, E., L.1 aDopazo, J.1 aSobrino, F. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1125417501764nas a2200169 4500008004100000245006700041210006700108300001100175490000800186520121300194100001501407700001601422700001601438700001701454700001701471856010601488 2001 eng d00aMethods and approaches in the analysis of gene expression data0 aMethods and approaches in the analysis of gene expression data a93-1120 v2503 aThe application of high-density DNA array technology to monitor gene transcription has been responsible for a real paradigm shift in biology. The majority of research groups now have the ability to measure the expression of a significant proportion of the human genome in a single experiment, resulting in an unprecedented volume of data being made available to the scientific community. As a consequence of this, the storage, analysis and interpretation of this information present a major challenge. In the field of immunology the analysis of gene expression profiles has opened new areas of investigation. The study of cellular responses has revealed that cells respond to an activation signal with waves of co-ordinated gene expression profiles and that the components of these responses are the key to understanding the specific mechanisms which lead to phenotypic differentiation. The discovery of ’cell type specific’ gene expression signatures have also helped the interpretation of the mechanisms leading to disease progression. 
Here we review the principles behind the most commonly used data analysis methods and discuss the approaches that have been employed in immunological research.
1 aDopazo, J.1 aZanders, E.1 aDragoni, I.1 aAmphlett, G.1 aFalciani, F. uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=1125122401950nas a2200229 4500008004100000245009500041210006900136300001000205490000700215520116100222653001401383653005301397653002801450653003701478100001801515700001501533700001901548700001501567700001901582700001301601856010601614 2001 eng d00aPhylogenetic analysis of viroid and viroid-like satellite RNAs from plants: a reassessment0 aPhylogenetic analysis of viroid and viroidlike satellite RNAs fr a155-90 v533 aThe proposed monophyletic origin of a group of subviral plant pathogens (viroids and viroid-like satellite RNAs), as well as the phylogenetic relationships and the resulting taxonomy of these entities, has been recently questioned. The criticism comes from the (apparent) lack of sequence similarity among these RNAs necessary to reliably infer a phylogeny. Here we show that, despite their low overall sequence similarity, a sequence alignment manually adjusted to take into account all the local similarities and the insertions/deletions and duplications/rearrangements described in the literature for viroids and viroid-like satellite RNA, along with the use of an appropriate estimator of genetic distances, constitutes a data set suitable for a phylogenetic reconstruction. When the likelihood-mapping method was applied to this data set, the tree-likeness obtained was higher than that corresponding to a sequence alignment that does not take into consideration the local similarities. In addition, bootstrap analysis also supports the major groups previously proposed and the reconstruction is consistent with the biological properties of this RNAs.10aEvolution10aMolecular *Phylogeny Plant Viruses/*genetics RNA10aSatellite/*genetics RNA10aViral/genetics Viroids/*genetics1 aElena, S., F.1 aDopazo, J.1 ade la Pena, M.1 aFlores, R.1 aDiener, T., O.1 aMoya, A. 
uhttp://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&dopt=Citation&list_uids=11479686