TY - JOUR T1 - A new parallel pipeline for DNA methylation analysis of long reads datasets. JF - BMC bioinformatics Y1 - 2017 A1 - Olanda, Ricardo A1 - Pérez, Mariano A1 - Orduña, Juan M A1 - Tárraga, Joaquín A1 - Dopazo, Joaquín KW - Methyl-Seq KW - NGS AB - BACKGROUND: DNA methylation is an important mechanism of epigenetic regulation in development and disease. New-generation sequencers allow genome-wide measurements of methylation status by reading short stretches of the DNA sequence (Methyl-seq). Several software tools for methylation analysis have been proposed in recent years. However, the trend is that new sequencers, and those expected in the near future, yield sequences of increasing length, which makes these software tools inefficient and obsolete. RESULTS: In this paper, we propose a new software tool based on a strategy for methylation analysis of Methyl-seq sequencing data that requires much shorter execution times while yielding a better level of sensitivity, particularly for datasets composed of long reads. This strategy can be exported to other methylation, DNA and RNA analysis tools. CONCLUSIONS: The developed software tool achieves execution times one order of magnitude shorter than the existing tools, while yielding equal sensitivity for short reads and even better sensitivity for long reads. VL - 18 UR - http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1574-3 ER - TY - JOUR T1 - 267 Spanish exomes reveal population-specific differences in disease-related genetic variation.
JF - Molecular biology and evolution Y1 - 2016 A1 - Dopazo, Joaquín A1 - Amadoz, Alicia A1 - Bleda, Marta A1 - García-Alonso, Luz A1 - Alemán, Alejandro A1 - Garcia-Garcia, Francisco A1 - Rodriguez, Juan A A1 - Daub, Josephine T A1 - Muntané, Gerard A1 - Rueda, Antonio A1 - Vela-Boza, Alicia A1 - López-Domingo, Francisco J A1 - Florido, Javier P A1 - Arce, Pablo A1 - Ruiz-Ferrer, Macarena A1 - Méndez-Vidal, Cristina A1 - Arnold, Todd E A1 - Spleiss, Olivia A1 - Alvarez-Tejado, Miguel A1 - Navarro, Arcadi A1 - Bhattacharya, Shomi S A1 - Borrego, Salud A1 - Santoyo-López, Javier A1 - Antiñolo, Guillermo KW - disease KW - NGS KW - polymorphisms KW - Population genomics KW - prioritization KW - SNP AB - Recent results from large-scale genomic projects suggest that allele frequencies, which are highly relevant for medical purposes, differ considerably across populations. The need for a detailed catalogue of local variability motivated the whole exome sequencing of 267 unrelated individuals representative of the healthy Spanish population. As in other studies, a considerable number of rare variants were found (almost one third of the described variants). There were also relevant differences in allelic frequencies in polymorphic variants, including about 10,000 polymorphisms private to the Spanish population. The allelic frequencies of variants conferring susceptibility to complex diseases (including cancer, schizophrenia, Alzheimer disease, type 2 diabetes and other pathologies) were overall similar to those of other populations. However, the trend is the opposite for variants linked to Mendelian and rare diseases (including several retinal degenerative dystrophies and cardiomyopathies), which show marked frequency differences between populations. Interestingly, a correspondence between differences in allelic frequencies and disease prevalence was found, highlighting the relevance of frequency differences in disease risk.
These differences are also observed in variants that disrupt known drug binding sites, suggesting an important role for local variability in population-specific drug resistance or adverse effects. We have made publicly available the Spanish Population Variant Server web page (http://spv.babelomics.org/), which contains population frequency information for the complete list of 170,888 variant positions we found. We show that it is fundamental to determine population-specific variant frequencies in order to distinguish real disease associations from population-specific polymorphisms. UR - https://mbe.oxfordjournals.org/content/early/2016/02/17/molbev.msw005.full ER - TY - JOUR T1 - Assessment of Targeted Next-Generation Sequencing as a Tool for the Diagnosis of Charcot-Marie-Tooth Disease and Hereditary Motor Neuropathy. JF - The Journal of molecular diagnostics : JMD Y1 - 2016 A1 - Lupo, Vincenzo A1 - Garcia-Garcia, Francisco A1 - Sancho, Paula A1 - Tello, Cristina A1 - García-Romero, Mar A1 - Villarreal, Liliana A1 - Alberti, Antonia A1 - Sivera, Rafael A1 - Dopazo, Joaquín A1 - Pascual-Pascual, Samuel I A1 - Márquez-Infante, Celedonio A1 - Casasnovas, Carlos A1 - Sevilla, Teresa A1 - Espinós, Carmen KW - Charcot-Marie-Tooth KW - CMT KW - Diagnostic KW - NGS KW - Panels KW - rare diseases KW - Targeted resequencing AB - Charcot-Marie-Tooth disease is characterized by broad genetic heterogeneity with >50 known disease-associated genes. Mutations in some of these genes can cause a pure motor form of hereditary motor neuropathy, the genetics of which are poorly characterized. We designed a panel comprising 56 genes associated with Charcot-Marie-Tooth disease/hereditary motor neuropathy. We validated this diagnostic tool by first testing 11 patients with pathological mutations. A cohort of 33 affected subjects was selected for this study.
The DNAJB2 c.352+1G>A mutation was detected in two cases; novel changes and/or variants with low frequency (<1%) were found in 12 cases. There were no candidate variants in 18 cases, and amplification failed for one sample. The DNAJB2 c.352+1G>A mutation was also detected in three additional families. On haplotype analysis, all of the patients from these five families shared the same haplotype; therefore, the DNAJB2 c.352+1G>A mutation may be a founder event. Our gene panel allowed us to perform a very rapid and cost-effective screening of genes involved in Charcot-Marie-Tooth disease/hereditary motor neuropathy. Our diagnostic strategy was robust in terms of both coverage and read depth for all of the genes and patient samples. These findings demonstrate the difficulty in achieving a definitive molecular diagnosis because of the complexity of interpreting new variants and the genetic heterogeneity that is associated with these neuropathies. UR - http://www.sciencedirect.com/science/article/pii/S1525157815002615 ER - TY - JOUR T1 - HPG pore: an efficient and scalable framework for nanopore sequencing data. JF - BMC bioinformatics Y1 - 2016 A1 - Tárraga, Joaquín A1 - Gallego, Asunción A1 - Arnau, Vicente A1 - Medina, Ignacio A1 - Dopazo, Joaquin KW - hadoop KW - HPC KW - nanopore KW - NGS AB - BACKGROUND: The use of nanopore technologies is expected to spread in the future because they are portable and can sequence long fragments of DNA molecules without prior amplification. The first nanopore sequencer available, the MinION™ from Oxford Nanopore Technologies, is a USB-connected, portable device that allows real-time DNA analysis. In addition, other new instruments are expected to be released soon, which promise to outperform the current short-read technologies in terms of throughput. Despite the flood of data expected from this technology, the data analysis solutions currently available are only designed to manage small projects and are not scalable. 
RESULTS: Here we present HPG Pore, a toolkit for exploring and analysing nanopore sequencing data. HPG Pore can run both on individual computers and in the Hadoop distributed computing framework, which allows easy scale-up to manage the large amounts of data expected to result from extensive use of nanopore technologies in the future. CONCLUSIONS: HPG Pore allows for virtually unlimited sequencing data scalability, thus ensuring that such data can continue to be managed in near-future scenarios. HPG Pore is available on GitHub at http://github.com/opencb/hpg-pore. VL - 17 UR - http://www.biomedcentral.com/1471-2105/17/107 ER - TY - JOUR T1 - Assessing the impact of mutations found in next generation sequencing data over human signaling pathways. JF - Nucleic acids research Y1 - 2015 A1 - Hernansaiz-Ballesteros, Rosa D A1 - Salavert, Francisco A1 - Sebastián-Leon, Patricia A1 - Alemán, Alejandro A1 - Medina, Ignacio A1 - Dopazo, Joaquín KW - NGS KW - pathways KW - signalling KW - Systems biology AB - Modern sequencing technologies produce increasingly detailed data on genomic variation. However, conventional methods for relating either individual variants or mutated genes to phenotypes present known limitations given the complex, multigenic nature of many diseases or traits. Here we present PATHiVar, a web-based tool that integrates genomic variation data with gene expression tissue information. PATHiVar constitutes a new generation of genomic data analysis methods that allow studying variants found in next generation sequencing experiments in the context of signaling pathways. Simple Boolean models of pathways provide detailed descriptions of the impact of mutations on cell functionality, so that recurrent failures in functionality can easily be related to diseases, even if they are produced by mutations in different genes.
Patterns of changes in signal transmission circuits, often unpredictable from the individual mutated genes, correspond to patterns of affected functionalities that can be related to complex traits such as disease progression, drug response, etc. PATHiVar is available at: http://pathivar.babelomics.org. VL - 43 UR - http://nar.oxfordjournals.org/content/43/W1/W270 ER - TY - JOUR T1 - Babelomics 5.0: functional interpretation for new generations of genomic data. JF - Nucleic acids research Y1 - 2015 A1 - Alonso, Roberto A1 - Salavert, Francisco A1 - Garcia-Garcia, Francisco A1 - Carbonell-Caballero, José A1 - Bleda, Marta A1 - García-Alonso, Luz A1 - Sanchis-Juan, Alba A1 - Perez-Gil, Daniel A1 - Marin-Garcia, Pablo A1 - Sánchez, Rubén A1 - Cubuk, Cankut A1 - Hidalgo, Marta R A1 - Amadoz, Alicia A1 - Hernansaiz-Ballesteros, Rosa D A1 - Alemán, Alejandro A1 - Tárraga, Joaquín A1 - Montaner, David A1 - Medina, Ignacio A1 - Dopazo, Joaquin KW - babelomics KW - data integration KW - gene set analysis KW - interactome KW - network analysis KW - NGS KW - RNA-seq KW - Systems biology KW - transcriptomics AB - Babelomics has been running for more than a decade, offering a user-friendly interface for the functional analysis of gene expression and genomic data. Here we present its fifth release, which includes support for Next Generation Sequencing data including gene expression (RNA-seq), exome or genome resequencing. Babelomics now has a simplified, more intuitive interface. Improved visualization options, such as a genome viewer as well as an interactive network viewer, have been implemented. New technical enhancements on both the client and server sides make the user experience faster and more dynamic.
Babelomics offers user-friendly access to a full range of methods that cover: (i) primary data analysis, (ii) a variety of tests for different experimental designs and (iii) different enrichment and network analysis algorithms for the interpretation of the results of such tests in the proper functional context. In addition to the public server, local copies of Babelomics can be downloaded and installed. Babelomics is freely available at: http://www.babelomics.org. VL - 43 UR - http://nar.oxfordjournals.org/content/43/W1/W117 ER - TY - JOUR T1 - Combining tumor genome simulation with crowdsourcing to benchmark somatic single-nucleotide-variant detection. JF - Nature methods Y1 - 2015 A1 - Ewing, Adam D A1 - Houlahan, Kathleen E A1 - Hu, Yin A1 - Ellrott, Kyle A1 - Caloian, Cristian A1 - Yamaguchi, Takafumi N A1 - Bare, J Christopher A1 - P’ng, Christine A1 - Waggott, Daryl A1 - Sabelnykova, Veronica Y A1 - Kellen, Michael R A1 - Norman, Thea C A1 - Haussler, David A1 - Friend, Stephen H A1 - Stolovitzky, Gustavo A1 - Margolin, Adam A A1 - Stuart, Joshua M A1 - Boutros, Paul C ED - ICGC-TCGA DREAM Somatic Mutation Calling Challenge participants ED - Liu Xi ED - Ninad Dewal ED - Yu Fan ED - Wenyi Wang ED - David Wheeler ED - Andreas Wilm ED - Grace Hui Ting ED - Chenhao Li ED - Denis Bertrand ED - Niranjan Nagarajan ED - Qing-Rong Chen ED - Chih-Hao Hsu ED - Ying Hu ED - Chunhua Yan ED - Warren Kibbe ED - Daoud Meerzaman ED - Kristian Cibulskis ED - Mara Rosenberg ED - Louis Bergelson ED - Adam Kiezun ED - Amie Radenbaugh ED - Anne-Sophie Sertier ED - Anthony Ferrari ED - Laurie Tonton ED - Kunal Bhutani ED - Nancy F Hansen ED - Difei Wang ED - Lei Song ED - Zhongwu Lai ED - Liao, Yang ED - Shi, Wei ED - Carbonell-Caballero, José ED - Joaquín Dopazo ED - Cheryl C K Lau ED - Justin Guinney KW - cancer KW - NGS KW - variant calling AB - The detection of somatic mutations from cancer genome sequences is key to understanding the genetic basis of disease progression, 
patient survival and response to therapy. Benchmarking is needed for tool assessment and improvement but is complicated by a lack of gold standards, by extensive resource requirements and by difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge, a crowdsourced benchmark of somatic mutation detection algorithms. Here we report the BAMSurgeon tool for simulating cancer genomes and the results of 248 analyses of three in silico tumors created with it. Different algorithms exhibit characteristic error profiles, and, intriguingly, false positives show a trinucleotide profile very similar to one found in human tumors. Although the three simulated tumors differ in sequence contamination (deviation from normal cell sequence) and in subclonality, an ensemble of pipelines outperforms the best individual pipeline in all cases. BAMSurgeon is available at https://github.com/adamewing/bamsurgeon/. UR - http://www.nature.com/nmeth/journal/vaop/ncurrent/full/nmeth.3407.html ER - TY - JOUR T1 - Concurrent and Accurate Short Read Mapping on Multicore Processors. JF - IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM Y1 - 2015 A1 - Martinez, Hector A1 - Tárraga, Joaquín A1 - Medina, Ignacio A1 - Barrachina, Sergio A1 - Castillo, Maribel A1 - Dopazo, Joaquin A1 - Quintana-Orti, Enrique S KW - HPC KW - NGS KW - short read mapping AB - We introduce a parallel aligner with a workflow organization for fast and accurate mapping of RNA sequences on servers equipped with multicore processors. Our software, an open-source application available at http://www.opencb.org, exploits a suffix array to rapidly map a large fraction of the RNA fragments (reads) and leverages the accuracy of the Smith-Waterman algorithm to deal with conflictive reads.
The aligner is enhanced with a careful strategy to detect splice junctions based on an adaptive division of RNA reads into small segments (or seeds), which are then mapped onto a number of candidate alignment locations, providing crucial information for the successful alignment of the complete reads. The experimental results on a platform with Intel multicore technology report the parallel performance of the aligner on RNA reads of 100-400 nucleotides, which excels in execution time and sensitivity compared with state-of-the-art aligners such as TopHat 2+Bowtie 2, MapSplice and STAR. VL - 12 UR - http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=7010005 ER - TY - JOUR T1 - Exome sequencing reveals a high genetic heterogeneity on familial Hirschsprung disease. JF - Scientific reports Y1 - 2015 A1 - Luzón-Toro, Berta A1 - Gui, Hongsheng A1 - Ruiz-Ferrer, Macarena A1 - Tang, Clara Sze-Man A1 - Fernández, Raquel M A1 - Sham, Pak-Chung A1 - Torroglosa, Ana A1 - Tam, Paul Kwong-Hang A1 - Espino-Paisán, Laura A1 - Cherny, Stacey S A1 - Bleda, Marta A1 - Enguix-Riego, María Del Valle A1 - Dopazo, Joaquín A1 - Antiñolo, Guillermo A1 - Garcia-Barceló, Maria-Mercè A1 - Borrego, Salud KW - babelomics KW - Hirschsprung KW - NGS KW - prioritization AB - Hirschsprung disease (HSCR; OMIM 142623) is a developmental disorder characterized by aganglionosis along variable lengths of the distal gastrointestinal tract, which results in intestinal obstruction. Interactions among known HSCR genes and/or unknown disease susceptibility loci lead to variable severity of phenotype. Neither linkage nor genome-wide association studies have been able to fully dissect the genetic pathways underlying this complex genetic disorder. We have performed whole exome sequencing of 16 HSCR patients from 8 unrelated families with the SOLiD platform. Variants shared by affected relatives were validated by Sanger sequencing. We searched for genes recurrently mutated across families.
Only variations in the FAT3 gene were significantly enriched in five families. Within-family analysis identified compound heterozygotes for AHNAK and several genes (N = 23) with heterozygous variants that co-segregated with the phenotype. Network and pathway analyses facilitated the discovery of polygenic inheritance involving FAT3, known HSCR genes and their gene partners. Altogether, our approach has facilitated the detection of more than one damaging variant in biologically plausible genes that could jointly contribute to the phenotype. Our data may contribute to the understanding of the complex interactions that occur during enteric nervous system development and the etiopathology of familial HSCR. VL - 5 UR - http://www.nature.com/articles/srep16473 ER - TY - JOUR T1 - A Parallel and Sensitive Software Tool for Methylation Analysis on Multicore Platforms. JF - Bioinformatics (Oxford, England) Y1 - 2015 A1 - Tárraga, Joaquín A1 - Pérez, Mariano A1 - Orduña, Juan M A1 - Duato, José A1 - Medina, Ignacio A1 - Dopazo, Joaquín KW - BS-seq KW - HPC KW - methylation KW - NGS AB - MOTIVATION: DNA methylation analysis suffers from very long processing times, since the advent of Next-Generation Sequencers (NGS) has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently either with the size of the dataset or with the length of the reads to be analyzed. Since sequencers are expected to provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. RESULTS: We present a new software tool, called HPG-Methyl, which efficiently maps bisulfite sequencing reads on DNA, analyzing DNA methylation.
The strategy used by this software consists of leveraging the speed of the Burrows-Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith-Waterman algorithm, which is exclusively employed to deal with the most ambiguous and shortest reads. Experimental results on platforms with Intel multicore processors show that HPG-Methyl significantly outperforms state-of-the-art software such as Bismark, BS-Seeker or BSMAP in both execution time and sensitivity, particularly for long bisulfite reads. AVAILABILITY: The software is provided as C libraries and functions, together with instructions to compile and execute it, and is available by sftp to anonymous@clariano.uv.es (password "anonymous"). CONTACT: Juan.Orduna@uv.es. VL - 31 UR - http://bioinformatics.oxfordjournals.org/content/31/19/3130.long ER - TY - JOUR T1 - Acceleration of short and long DNA read mapping without loss of accuracy using suffix array. JF - Bioinformatics (Oxford, England) Y1 - 2014 A1 - Tárraga, Joaquín A1 - Arnau, Vicente A1 - Martinez, Hector A1 - Moreno, Raul A1 - Cazorla, Diego A1 - Salavert-Torres, José A1 - Blanquer-Espert, Ignacio A1 - Dopazo, Joaquín A1 - Medina, Ignacio KW - NGS KW - short read mapping KW - HPC KW - suffix arrays AB - HPG Aligner applies suffix arrays for DNA read mapping. This implementation produces a highly sensitive and extremely fast mapping of DNA reads that scales up almost linearly with read length. The approach presented here is faster (over 20x for long reads) and more sensitive (over 98% in a wide range of read lengths) than the current state-of-the-art mappers. HPG Aligner is not only an optimal alternative for current sequencers but also the only solution available to cope with longer reads and growing throughputs produced by forthcoming sequencing technologies.
VL - 30 UR - http://bioinformatics.oxfordjournals.org/content/early/2014/08/19/bioinformatics.btu553.long ER - TY - JOUR T1 - A comprehensive assessment of RNA-seq accuracy, reproducibility and information content by the Sequencing Quality Control Consortium. JF - Nature biotechnology Y1 - 2014 A1 - Su, Z. A1 - Labaj, P.P. A1 - .... A1 - Dopazo, J. A1 - .... A1 - Mason, C.E. A1 - Shi, L. KW - NGS KW - RNA-seq KW - SEQC AB - We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA sequencing (RNA-seq) performance for junction discovery and differential expression profiling and compare it to microarray and quantitative PCR (qPCR) data using complementary metrics. At all sequencing depths, we discover unannotated exon-exon junctions, with >80% validated by qPCR. We find that measurements of relative expression are accurate and reproducible across sites and platforms if specific filters are used. In contrast, RNA-seq and microarrays do not provide accurate absolute measurements, and gene-specific biases are observed for all examined platforms, including qPCR. Measurement performance depends on the platform and data analysis pipeline, and variation is large for transcript-level profiling. The complete SEQC data sets, comprising >100 billion reads (10 Tb), provide unique resources for evaluating RNA-seq analyses for clinical and regulatory settings. VL - 32 UR - http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.2957.html ER - TY - JOUR T1 - A New Overgrowth Syndrome is Due to Mutations in RNF125.
JF - Human mutation Y1 - 2014 A1 - Tenorio, Jair A1 - Mansilla, Alicia A1 - Valencia, María A1 - Martínez-Glez, Víctor A1 - Romanelli, Valeria A1 - Arias, Pedro A1 - Castrejón, Nerea A1 - Poletta, Fernando A1 - Guillén-Navarro, Encarna A1 - Gordo, Gema A1 - Mansilla, Elena A1 - García-Santiago, Fé A1 - González-Casado, Isabel A1 - Vallespín, Elena A1 - Palomares, María A1 - Mori, María A A1 - Santos-Simarro, Fernando A1 - García-Miñaur, Sixto A1 - Fernández, Luis A1 - Mena, Rocío A1 - Benito-Sanz, Sara A1 - Del Pozo, Angela A1 - Silla, Juan Carlos A1 - Ibañez, Kristina A1 - López-Granados, Eduardo A1 - Martín-Trujillo, Alex A1 - Montaner, David A1 - Heath, Karen E A1 - Campos-Barros, Angel A1 - Dopazo, Joaquín A1 - Nevado, Julián A1 - Monk, David A1 - Ruiz-Pérez, Víctor L A1 - Lapunzina, Pablo KW - NGS KW - prioritization KW - Rare Disease AB - Overgrowth syndromes (OGS) are a group of disorders in which all parameters of growth and physical development are above the mean for age and sex. We evaluated a series of 270 families from the Spanish Overgrowth Syndrome Registry with no known overgrowth syndrome. We identified one de novo deletion and three missense mutations in RNF125 in six patients from four families with overgrowth, macrocephaly, intellectual disability, mild hydrocephaly, hypoglycaemia and inflammatory diseases resembling Sjögren syndrome. RNF125 encodes an E3 ubiquitin ligase and is a novel OGS gene. Our studies of the RNF125 pathway point to upregulation of RIG-I-IPS1-MDA5 and/or disruption of the PI3K-AKT and interferon signaling pathways as the putative final effectors. VL - 35 UR - http://onlinelibrary.wiley.com/doi/10.1002/humu.22689/abstract ER - TY - JOUR T1 - Genome Maps, a new generation genome browser.
JF - Nucleic acids research Y1 - 2013 A1 - Medina, Ignacio A1 - Salavert, Francisco A1 - Sánchez, Rubén A1 - De Maria, Alejandro A1 - Alonso, Roberto A1 - Escobar, Pablo A1 - Bleda, Marta A1 - Dopazo, Joaquín KW - BAM KW - genome viewer KW - HTML5 KW - javascript KW - Next Generation Sequencing KW - NGS KW - SVG KW - VCF AB - Genome browsers have gained importance as more genomes and related genomic information become available. However, the increase of information brought about by new generation sequencing technologies is, at the same time, causing a subtle but continuous decrease in the efficiency of conventional genome browsers. Here, we present Genome Maps, a genome browser that implements an innovative model of data transfer and management. The program uses highly efficient technologies from the new HTML5 standard, such as scalable vector graphics, that optimize workloads on both the server and client sides and ensure future scalability. Thus, data management and representation are entirely carried out by the browser, without the need for any Java applet, Flash or other plug-in installation. Relevant biological data on genes, transcripts, exons, regulatory features, single-nucleotide polymorphisms, karyotype and so forth, are imported from web services and are available as tracks. In addition, several DAS servers are already included in Genome Maps. As a novelty, this web-based genome browser allows the local upload of huge genomic data files (e.g. VCF or BAM) that can be dynamically visualized in real time at the client side, thus facilitating the management of medical data affected by privacy restrictions. Finally, Genome Maps can easily be integrated into any web application by including only a few lines of code. Genome Maps is an open-source collaborative initiative available in the GitHub repository (https://github.com/compbio-bigdata-viz/genome-maps). Genome Maps is available at: http://www.genomemaps.org.
VL - 41 UR - http://nar.oxfordjournals.org/content/41/W1/W41 ER - TY - JOUR T1 - A map of human microRNA variation uncovers unexpectedly high levels of variability. JF - Genome medicine Y1 - 2012 A1 - Carbonell, José A1 - Alloza, Eva A1 - Arce, Pablo A1 - Borrego, Salud A1 - Santoyo, Javier A1 - Ruiz-Ferrer, Macarena A1 - Medina, Ignacio A1 - Jiménez-Almazán, Jorge A1 - Méndez-Vidal, Cristina A1 - González-del Pozo, María A1 - Vela, Alicia A1 - Bhattacharya, Shomi S A1 - Antiñolo, Guillermo A1 - Dopazo, Joaquin KW - NGS AB - BACKGROUND: MicroRNAs (miRNAs) are key components of the gene regulatory network in many species. During the past few years, these regulatory elements have been shown to be involved in an increasing number and range of diseases. Consequently, the compilation of a comprehensive map of natural variability in healthy populations seems an obvious requirement for future research on miRNA-related pathologies. METHODS: Data on 14 populations from the 1000 Genomes Project were analysed, along with new data extracted from 60 exomes of healthy individuals from a southern Spanish population, sequenced in the context of the Medical Genome Project, to derive an accurate map of miRNA variability. RESULTS: Despite the common belief that miRNAs are highly conserved elements, analysis of the sequences of the 1,152 individuals indicated that the observed level of variability is double what was expected. A total of 527 variants were found. Among these, 45 variants affected the recognition region of the corresponding miRNA and were found in 43 different miRNAs, 26 of which are known to be involved in 57 diseases. Different parts of the mature structure of the miRNA were affected to different degrees by variants, which suggests the existence of a selective pressure related to the relative functional impact of the change.
Moreover, 41 variants showed a significant deviation from the Hardy-Weinberg equilibrium, which supports the existence of a selective process against some alleles. The average number of variants per individual in miRNAs was 28. CONCLUSIONS: Despite an expectation that miRNAs would be highly conserved genomic elements, our study reports a level of variability comparable to that observed for coding genes. VL - 4 UR - http://genomemedicine.com/content/4/8/62/abstract ER - TY - JOUR T1 - Qualimap: evaluating next-generation sequencing alignment data. JF - Bioinformatics (Oxford, England) Y1 - 2012 A1 - García-Alcalde, Fernando A1 - Okonechnikov, Konstantin A1 - Carbonell, José A1 - Cruz, Luis M A1 - Götz, Stefan A1 - Tarazona, Sonia A1 - Dopazo, Joaquín A1 - Meyer, Thomas F A1 - Conesa, Ana KW - NGS AB - MOTIVATION: The sequence alignment/map (SAM) and the binary alignment/map (BAM) formats have become the standard method of representation of nucleotide sequence alignments for next-generation sequencing data. SAM/BAM files usually contain information from tens to hundreds of millions of reads. Often, the sequencing technology, protocol and/or the selected mapping algorithm introduce some unwanted biases in these data. The systematic detection of such biases is a non-trivial task that is crucial to drive appropriate downstream analyses. RESULTS: We have developed Qualimap, a Java application that supports user-friendly quality control of mapping data, by considering sequence features and their genomic properties. Qualimap takes sequence alignment data and provides graphical and statistical analyses for the evaluation of data. Such quality-control data are vital for highlighting problems in the sequencing and/or mapping processes, which must be addressed prior to further analyses. AVAILABILITY: Qualimap is freely available from http://www.qualimap.org. CONTACT: aconesa@cipf.es SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
VL - 28 UR - http://bioinformatics.oxfordjournals.org/content/28/20/2678.long ER - TY - JOUR T1 - Using GPUs for the Exact Alignment of Short-read Genetic Sequences by Means of the Burrows–Wheeler Transform. JF - IEEE/ACM transactions on computational biology and bioinformatics / IEEE, ACM Y1 - 2012 A1 - Salavert Torres, Jose A1 - Blanquer Espert, Ignacio A1 - Tomas Dominguez, Andres A1 - Hernández, Vicente A1 - Medina, Ignacio A1 - Tárraga, Joaquín A1 - Dopazo, Joaquin KW - Burrows-Wheeler transform KW - CPU execution KW - GPGPU KW - NGS AB - General Purpose Graphic Processing Units (GPGPUs) constitute an inexpensive resource for computing-intensive applications that could exploit an intrinsic fine-grain parallelism. This paper presents the design and implementation in GPGPUs of an exact alignment tool for nucleotide sequences based on the Burrows-Wheeler Transform. We compare this algorithm with state-of-the-art implementations of the same algorithm on standard CPUs, under the same conditions in terms of I/O. Excluding disk transfers, the implementation of the algorithm in GPUs shows a speedup larger than 12x when compared to CPU execution. This implementation exploits the parallelism by concurrently searching different sequences on the same reference search tree, maximising memory locality and ensuring symmetric access to the data. The article describes the behaviour of the algorithm in GPU, showing good scalability in performance, limited only by the size of the GPU inner memory. VL - 9 UR - http://ieeexplore.ieee.org.sire.ub.edu/xpl/articleDetails.jsp?reload=true&arnumber=6175888 ER - TY - JOUR T1 - Initial genomics of the human nucleolus. JF - PLoS genetics Y1 - 2010 A1 - Németh, Attila A1 - Conesa, Ana A1 - Santoyo-López, Javier A1 - Medina, Ignacio A1 - Montaner, David A1 - Péterfia, Bálint A1 - Solovei, Irina A1 - Cremer, Thomas A1 - Dopazo, Joaquin A1 - Längst, Gernot KW - NGS KW - nucleolus AB -
We report for the first time the genomics of a nuclear compartment of the eukaryotic cell. 454 sequencing and microarray analysis revealed the pattern of nucleolus-associated chromatin domains (NADs) in the linear human genome and identified different gene families and certain satellite repeats as the major building blocks of NADs, which constitute about 4% of the genome. Bioinformatic evaluation showed that NAD-localized genes take part in specific biological processes, such as the response to other organisms, odor perception, and tissue development. 3D FISH and immunofluorescence experiments illustrated the spatial distribution of NAD-specific chromatin within interphase nuclei and its alteration upon transcriptional changes. Altogether, our findings describe the nature of DNA sequences associated with the human nucleolus and provide insights into the function of the nucleolus in genome organization and establishment of nuclear architecture.
VL - 6 UR - http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.1000889 ER -