Pipeline Copy Number Variations

This page describes the process followed to detect and analyze copy number variants (CNVs) in germline or somatic samples. It is specifically adapted for Illumina paired-end sequencing of both exomes and whole genomes.

 

This pipeline is based on ExomeDepth [1] best practices.

 

General Pipeline Structure

The pipeline is written in the Workflow Description Language (WDL), and the WDL code is specifically prepared to be executed with Cromwell. To facilitate execution, the pipeline is wrapped in a bash script that automates each of the steps.
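
As a rough illustration of what such a wrapper has to do at its core, the sketch below assembles a Cromwell `run` command for a WDL file. The file names (`cromwell.jar`, `cnv_pipeline.wdl`, `inputs.json`) and the helper `build_cromwell_command` are hypothetical placeholders, not the names used by this pipeline. The actual wrapper is a bash script and may perform additional steps.

```python
import shlex


def build_cromwell_command(cromwell_jar, workflow_wdl, inputs_json):
    """Return the argument list for running a WDL workflow with Cromwell.

    All three paths are hypothetical placeholders supplied by the caller.
    """
    return [
        "java", "-jar", cromwell_jar,
        "run", workflow_wdl,
        "--inputs", inputs_json,
    ]


if __name__ == "__main__":
    cmd = build_cromwell_command("cromwell.jar", "cnv_pipeline.wdl", "inputs.json")
    # Print the command instead of executing it, so the sketch runs anywhere.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps paths with spaces intact if the list is later passed to `subprocess.run`.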

The complete pipeline is represented in the following diagram:

Diagram available in: cnv_workflow_diagram.odp

Requirements and recommendations

  • Analysis of 5 to 10 samples from the same sequencing run is recommended.

  • For the prediction of non-autosomal CNVs, all samples in the set must be of the same sex.

  • Since ExomeDepth is designed to detect rare CNVs, it is recommended not to include relatives or samples with similar phenotypes in a given batch analysis.

  • The standard error output and, especially, the correlation coefficient with the reference set should be carefully inspected in order to detect possible low-confidence results.

  • This pipeline is specifically developed for Illumina paired-end sequencing, although it could work with other technologies.

  • It is recommended not to remove reads from the BAM files (duplicates, reads with low alignment quality, etc.).

  • It is not advisable to call CNVs in regions with low mappability, segmental duplications or high GC content.
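
Some of the recommendations above can be checked automatically before or after a run. The following is a minimal sketch, not part of the pipeline: the function names, the sex labels and the `min_correlation` cutoff are all assumptions made for illustration. ExomeDepth reports its own correlation with the reference set; the `pearson` helper only shows the kind of quantity being inspected, and no particular cutoff is implied by this page.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def check_batch(sexes, min_samples=5, max_samples=10):
    """Return warnings for a batch, given one (hypothetical) sex label per sample."""
    warnings = []
    if not min_samples <= len(sexes) <= max_samples:
        warnings.append(
            f"batch has {len(sexes)} samples; {min_samples}-{max_samples} recommended"
        )
    if len(set(sexes)) > 1:
        warnings.append("mixed sexes: non-autosomal CNV prediction requires matching sex")
    return warnings


def low_confidence(test_counts, reference_counts, min_correlation):
    """True if the test/reference correlation falls below the caller's cutoff."""
    return pearson(test_counts, reference_counts) < min_correlation


if __name__ == "__main__":
    # Illustrative data only: three samples of mixed sex trigger both warnings.
    for message in check_batch(["F", "M", "F"]):
        print("WARNING:", message)
```

The cutoff is deliberately a required argument, so that whoever runs the check chooses a value appropriate for their data instead of relying on a default invented here.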

[1] Plagnol V., Curtis J., Epstein M., Mok K., Stebbings E., Grigoriadou S., Wood N., Hambleton S., Burns S., Thrasher A., et al. A robust model for read count data in exome sequencing experiments and implications for copy number variant calling. Bioinformatics. 2012;28:2747–2754. doi: 10.1093/bioinformatics/bts526. https://github.com/vplagnol/ExomeDepth