Computational Biology Unit
We are committed to developing algorithms, models and tools for the analysis of genomic data that enable researchers to understand which biological processes or variants are involved in different phenotypes or diseases. To do so, we apply the techniques, concepts and analyses of Scientific Computing, the subfield of computer science concerned with constructing mathematical models and quantitative analysis techniques and with using computers to analyze and solve scientific problems.
The main lines of research are:
- Discovery, characterization and analysis of genomic variants: Understanding the relationship between genomic variants and phenotype is one of the main goals in biology and medicine. New findings show that genomic variants are more common than expected, which makes the analysis and interpretation of results more difficult. We are interested in characterizing these genomic variants and in developing new methods and algorithms to help researchers study them. These methods range from statistical tests to prioritization and biological network-based algorithms.
- Biological networks and Systems Biology modeling: Genes, proteins and regulatory elements operate within an intricate network of interactions. A new, holistic paradigm has emerged to study these biological systems. It aims to understand how the interactions among the components of a biological system give rise to its function and how they contribute to phenotypes and diseases. We are interested in developing new algorithms and tools to model and analyze these biological networks.
- Software development for data analysis: Recent advances in high-throughput technologies such as NGS make the analysis of genomic data harder than ever because of the size and heterogeneity of the data. Today the bottleneck is not data generation but data analysis: data is generated in a few hours or days, but analysis can take weeks or months. We need new computing solutions that allow researchers to work with this huge volume of data efficiently and in distributed environments such as a cloud. We are developing DNA and RNA aligners and solutions to analyze genomic variants.
- Integrative information system and WEB services: In recent years the number of biological databases has grown exponentially. Today biological information is spread over more than 1000 databases, which makes retrieving, integrating and accessing the data difficult because these databases use different standards. We are developing a centralized database with the most useful biological information from different sources and making all this information accessible through WEB services. In this way biological information will be easily available to researchers, which should accelerate data analysis.
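As an illustration of the network-based prioritization mentioned in the research lines above, the following minimal Python sketch models an interaction network as an undirected graph and ranks candidate genes by how many of their direct neighbours are already associated with a phenotype. It is not the unit's actual method: the function names, the scoring rule and the gene names (geneA to geneE) are hypothetical placeholders, and the interactions are made-up toy data.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected adjacency map from (node_a, node_b) pairs."""
    network = defaultdict(set)
    for a, b in interactions:
        network[a].add(b)
        network[b].add(a)
    return network


def prioritize(network, seed_genes, candidates):
    """Rank candidates by the number of direct neighbours that are seed genes."""
    seeds = set(seed_genes)
    scores = {gene: len(network.get(gene, set()) & seeds) for gene in candidates}
    # Sort by descending score, then by name, so the order is reproducible.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy, made-up interactions (hypothetical placeholders, not real data).
interactions = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneB", "geneD"),
    ("geneC", "geneD"),
    ("geneD", "geneE"),
]
network = build_network(interactions)

# Assumption: geneB and geneC are the genes already linked to the phenotype.
ranking = prioritize(
    network,
    seed_genes=["geneB", "geneC"],
    candidates=["geneA", "geneD", "geneE"],
)
print(ranking)
```

Counting direct neighbours is the simplest possible score. Real prioritization methods typically propagate evidence further through the network and weight interactions by confidence.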
To achieve these goals we use advanced computing solutions:
- WEB applications and cloud-based solutions for data analysis: The size of the data currently produced by new high-throughput technologies such as NGS forces us to think of new ways to store and analyze genomic data. The size of new experiments, up to several terabytes, makes it impractical to store the data on current workstations or even to move it over the network. We are exploring and developing new strategies and WEB applications to efficiently store, analyze and explore data in a cloud.
- HPC software development for data analysis: Recent advances in high-throughput technologies such as NGS make the analysis of genomic data harder than ever because of the size and heterogeneity of the data. We are developing next-generation Bioinformatics software using HPC that exploits current hardware and computing technologies such as multi-core CPUs and GPGPUs. This software aims to run the most useful analyses in a matter of minutes without sacrificing any features.
- Machine learning software development: In recent years high-throughput sequencing technologies have been increasingly used in clinics to develop diagnostic, prognostic and decision-making tools. Machine learning algorithms are becoming pervasive both for building predictors based on different omic data and for knowledge discovery.
To develop our solutions we use the most modern computer technologies available:
- WEB applications and WEB services: We use modern HTML5 technologies for building rich client applications and RESTful WEB services to make data and analyses available in an efficient way from our servers.
- HPC technologies: We combine OpenMP with the SSE/AVX instructions of the CPU to implement algorithms and analyses efficiently. We have also used Nvidia CUDA implementations when appropriate.
- Distributed and Cloud computing: We use the Apache Hadoop framework to analyze large volumes of data with MapReduce and to store them in the NoSQL database HBase. We are also using Amazon AWS to offer some of our services.
- Machine learning and Computational modeling: We use machine learning algorithms such as predictors and clustering to look for patterns in data. We also use graph theory to model biological networks and to develop methods for their analysis.
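To make the MapReduce model mentioned above more concrete, here is a small, self-contained Python sketch of its three phases (map, shuffle, reduce), used to count variants per chromosome. It only imitates the programming model in plain Python and does not use the Hadoop API. The record layout (tab-separated chromosome, position, reference allele, alternative allele), the function names and the sample records are assumptions made for illustration.

```python
from collections import defaultdict


def map_variant(line):
    """Map step: emit a (chromosome, 1) pair for one tab-separated record."""
    chrom = line.split("\t")[0]
    yield chrom, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def count_variants_per_chromosome(lines):
    """Run map, shuffle and reduce in sequence on an iterable of records."""
    pairs = (pair for line in lines for pair in map_variant(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


# Made-up records (hypothetical layout: chrom, position, ref, alt).
records = [
    "1\t1000\tA\tG",
    "1\t2000\tC\tT",
    "2\t1500\tG\tA",
]
counts = count_variants_per_chromosome(records)
print(counts)
```

In a real Hadoop job the map and reduce functions would run in parallel on different nodes and the framework would perform the shuffle. The point of the sketch is only that each step works on independent keys, which is what makes the computation easy to distribute.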
CSVS is a crowdsourcing initiative to provide information about the genomic variability of the Spanish population to the scientific/medical community. It can be found at http://csvs.babelomics.org/.
SPACNACS is a crowdsourcing initiative to provide information about Copy Number Variations of the Spanish population to the scientific and medical community: http://csvs.clinbioinfosspa.es/spacnacs/.
Hipathia is a web tool for interpreting the consequences of combined changes in gene expression levels and/or genomic mutations in the context of signalling pathways. It can be found at http://hipathia.babelomics.org/.