Whole Genome and Transcriptome (WGTS) version 6.0
Whole Genome
Fig. 3 Whole Genome Sequencing Analysis Pipeline
The whole genome pipeline commences once the bcl2fastq workflow is completed and FASTQ files are available (not shown).
FASTQ files are quality controlled using FastQC, which produces read-level quality control metrics (e.g. the total number of reads).
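As a rough illustration of the simplest read-level metric mentioned above, the sketch below counts the reads in a FASTQ stream. It is not how FastQC is implemented, and it assumes unwrapped FASTQ records of exactly four lines each; the function names and the two example reads are hypothetical.

```python
import gzip
import io
from typing import TextIO


def count_fastq_reads(handle: TextIO) -> int:
    """Count reads in a FASTQ stream, assuming four lines per record."""
    line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of four")
    return line_count // 4


def count_reads_in_file(path: str) -> int:
    """Open a plain or gzip-compressed FASTQ file and count its reads."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_fastq_reads(handle)


# Two-record in-memory example (made-up reads, not real data).
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
print(count_fastq_reads(example))  # prints 2
```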
FASTQ files are aligned with BwaMem2 to generate an unprocessed lane-level BAM file.
Cases are quality controlled with the bamQC workflow, which generates a JSON file of lane-level alignment QC metrics for review. The quality control metrics include the insert size distribution, amount of duplication, mapping percentage, and other WG ‘Single Lane’ metrics described in QM. Quality Control and Calibration Procedures. Genomic fingerprints are generated from lane-level alignments and made available to sample authentication procedures.
Cases are quality controlled again with bamQC running on the merged set of all lane-level alignments, generating a JSON file of call-ready alignment QC metrics for review. In addition to the lane-level QC metrics, this includes an assessment of the per-sample depth of coverage (QM. Quality Control and Calibration Procedures).
All lane-level BAM files are collected and processed via BamMergePreProcessing, which merges and sorts lane-level BAMs, marks duplicates, and recalibrates base quality scores to generate a call-ready sample-level BAM.
These normal and tumour BAM files are used as input for the variant calling workflows.
Mutect2 generates SNV and INDEL mutation calls in VCF format, which are annotated by VariantEffectPredictor to produce a MAF file of annotated calls.
GRIDSS and Delly generate somatic structural alteration calls in VCF format. The Delly VCF is post-processed by MAVIS to generate calls in TSV format, along with graphical representations of each structural event in SVG format.
The GRIDSS VCF is post-processed by Purple, which uses it as evidence to support copy number calls and loss of heterozygosity status, and to estimate tumour purity.
msisensor calculates the proportion of microsatellite sites with evidence of variation between the tumour and normal samples, producing a microsatellite instability (MSI) score recorded in a .TXT file.
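The score described above is a simple proportion, which can be sketched as follows. This is only the arithmetic, expressed as a percentage; it does not reproduce the per-site statistical test msisensor applies, and the function name and example counts are hypothetical.

```python
def msi_score(total_sites: int, unstable_sites: int) -> float:
    """Return the percentage of evaluated microsatellite sites called unstable."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    if not 0 <= unstable_sites <= total_sites:
        raise ValueError("unstable_sites must lie between 0 and total_sites")
    return 100.0 * unstable_sites / total_sites


# Made-up counts for illustration: 75 unstable sites out of 640 evaluated.
print(round(msi_score(640, 75), 2))  # prints 11.72
```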
HRDetect calls the homologous recombination deficiency (HRD) status using the Mutect2 VCF file and the GRIDSS VCF file as input. The output is a JSON file containing the HRD results.
T1K reports germline HLA typing alleles by estimating allele abundances from input read alignments. The output is a TSV file that includes the identified HLA alleles, their abundance, quality, and any secondary alleles.
All alteration files are provided to Djerba to generate a provisional clinical report for review by genome interpreters.
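The step ordering described above can be summarized as a dependency graph. The sketch below is an illustrative simplification, not the production Shesmu configuration: the step identifiers are hypothetical labels, and it assumes that call-ready bamQC, the variant callers, msisensor, and T1K all read the merged call-ready BAM, and that the T1K output is among the files passed to Djerba.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is a step; its value is the set of steps whose output it consumes.
WGS_DEPENDENCIES = {
    "fastqc": {"bcl2fastq"},
    "bwamem2": {"bcl2fastq"},
    "bamqc_lane_level": {"bwamem2"},
    "bam_merge_preprocessing": {"bwamem2"},
    # Assumption: these steps all consume the call-ready sample-level BAM.
    "bamqc_call_ready": {"bam_merge_preprocessing"},
    "mutect2": {"bam_merge_preprocessing"},
    "gridss": {"bam_merge_preprocessing"},
    "delly": {"bam_merge_preprocessing"},
    "msisensor": {"bam_merge_preprocessing"},
    "t1k": {"bam_merge_preprocessing"},
    # Post-processing and annotation steps named in the text.
    "variant_effect_predictor": {"mutect2"},
    "mavis": {"delly"},
    "purple": {"gridss"},
    "hrdetect": {"mutect2", "gridss"},
    # Assumption: every alteration output, including T1K, feeds the report.
    "djerba": {
        "variant_effect_predictor", "mavis", "purple",
        "msisensor", "hrdetect", "t1k",
    },
}


def execution_order(dependencies):
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


def respects_dependencies(order, dependencies):
    """Check that every step appears after all of the steps it depends on."""
    position = {step: index for index, step in enumerate(order)}
    return all(
        position[dep] < position[step]
        for step, deps in dependencies.items()
        for dep in deps
    )


order = execution_order(WGS_DEPENDENCIES)
print(" -> ".join(order))
```

Several orders are valid for this graph, so the printed sequence is one possibility rather than the order the scheduler would necessarily choose.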
WGS Workflows and Software
More information about the analysis pipelines is available in the ‘Procedure’ section below. Workflow parameterization is automated through the linked Shesmu configuration. The configuration repository is restricted to authorized individuals.
| Workflow | Version | Parameterization | Reference Data | Bioinformatics Software |
|---|---|---|---|---|
| | 2.1.1 | | | python/3.7 python/2.7 gatk/4.1.6.0 gatk/3.6-0 samtools/1.9 |
| | 5.1.3 | vidarr-u20-bamqc-call-ready.shesmu vidarr-u20-bamqc-lane-level.shesmu | | bam-qc-metrics/0.2.5 python/3.6 mosdepth/0.2.9 gatk/4.1.6.0 picard/2.21.2 samtools/1.14 samtools/1.9 |
| | 1.0.2 | | | bcl2fastq/2.20.0.422 htslib/1.9 |
| | 3.1.3 | | | barcodex-rs/0.1.2 bcl2fastq-jail/3.1.2b bcl2fastq/2.20.0.422 |
| | 1.0.1 | | hg19-bwamem2-index/2.2.1 hg38-bwamem2-index-with-alt/2.2.1 | barcodex-rs/0.1.2 python/3.7 slicer/0.3.0 rust/1.45.1 cutadapt/1.8.3 bwa-mem2/2.2.1 samtools/1.9 |
| | 1.3.0 | | | python/3.7 mosdepth/0.2.9 bedtools/2.27 |
| | 1.1.0 | vidarr-u20-crosscheckFingerprintsCollector_fastq_exceptions.shesmu vidarr-u20-crosscheckFingerprintsCollector_bam.shesmu vidarr-u20-crosscheckFingerprintsCollector_fastq.shesmu | hg38-star-index100/2.7.3a hg19-bwa-index/0.7.17 hg19-star-index100/2.7.3a hg38-bwa-index-with-alt/0.7.17 | samtools/1.15 crosscheckfingerprints-haplotype-map/20230324 tabix/0.2.6 seqtk/1.3 star/2.7.3a bwa/0.7.17 gatk/4.1.6.0 samtools/1.14 gatk/4.2.0.0 samtools/1.9 |
| | 2.6.1 | | hg38-delly/1.0 hg19-delly/1.0 | vcftools/0.1.16 tabix/0.2.6 java/8 bcftools/1.9 picard/2.19.2 delly/0.9.1 |
| | 1.1.0 | | hg19-bwa-index/0.7.12 hg38-bwa-index/0.7.12 mm10-bwa-index/0.7.12 | bwa/0.7.12 mosdepth/0.2.9 bam-qc-metrics/0.2.5 slicer/0.3.0 python/3.6 cutadapt/1.8.3 picard/2.21.2 samtools/1.9 |
| | 3.2.0 | | | java/11 fastqc/0.11.9 perl/5.28 |
| | 1.3.1 | | hg38-gridss-index/1.0 hmftools-data/53138 | gatk/4.1.6.0 samtools/1.14 hmftools/1.1 gridss/2.13.2m |
| | 1.6.0 | | hg38-dac-exclusion/1.0 sigtools-data/1.0 | hrdetect-rscript/1.5.8 bcftools/1.9 tabix/1.9 sigtools/2.4.1 |
| | 3.3.3 | vidarr-u20-mavis_clinical.shesmu vidarr-u20-mavis_non_clinical.shesmu | hg38v110-mavis/2.2.6 | bcftools/1.9 mavis-config/1.2 mavis/2.2.6 |
| | 1.2.0 | | | msisensorpro/1.2.0 |
| | 1.0.9 | vidarr-u20-mutect2_tumor_only.shesmu vidarr-u20-mutect2_normal_only.shesmu vidarr-u20-mutect2_matched-bysample.shesmu | hg38-gatk-gnomad/2.0 | samtools/1.9 |
| | 1.1.3 | | hg38-gridss-index/1.0 hmftools-data/53138 hg38-dac-exclusion/1.0 | python/3.10.6 bcftools/1.9 gatk/4.1.6.0 hmftools/1.1 |
| | 2.5.0 | vidarr-u20-variantEffectPredictor_matched-bysample.shesmu vidarr-u20-variantEffectPredictor_tumor_only.shesmu | vep-hg19-cache/105 vep-mm39-cache/105 vep-hg38-cache/105 | bedtools/2.27 tabix/0.2.6 vep/105.0 bcftools/1.9 vcf2maf/1.6.21b gatk/4.1.7.0 |
| | 1.1.0 | | | picard/2.21.2 |
Whole Transcriptome
Fig. 4 Whole Transcriptome Sequencing Analysis Pipeline
As with the WGS informatics pipeline, the whole transcriptome pipeline commences once FASTQ files are generated from bcl2fastq.
FASTQ files are aligned with the STAR workflow, generating genome-aligned and transcriptome-aligned BAM files. STAR also outputs a TSV file of chimeric junctions, which is used as input for the STAR-Fusion workflow.
The FASTQ files are also provided to RNASeqQc, which generates a JSON file of QC metrics for plotting via Dashi. The quality control metrics include the WT ‘Single Lane’ metrics described in QM. Quality Control and Calibration Procedures. Genomic fingerprints are generated from lane-level alignments and made available to sample authentication procedures.
The transcriptome-aligned BAM file is provided as input to RSEM, generating FPKM values and normalized expression counts in tabular format.
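For readers unfamiliar with the units, the sketch below shows the standard FPKM and TPM normalization arithmetic. It is not RSEM itself: RSEM estimates expected counts with an EM procedure and uses effective transcript lengths, whereas this sketch assumes counts and lengths are already given and takes the sum of the supplied counts as the total number of mapped fragments. The gene names and numbers are made up.

```python
from typing import Dict


def fpkm(counts: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total fragment count must be positive")
    # count * 1e9 / (length_bp * total) == (count / (length_kb)) / (total / 1e6)
    return {gene: c * 1e9 / (lengths[gene] * total) for gene, c in counts.items()}


def tpm(counts: Dict[str, float], lengths: Dict[str, float]) -> Dict[str, float]:
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {gene: c / lengths[gene] for gene, c in counts.items()}
    denominator = sum(rates.values())
    if denominator <= 0:
        raise ValueError("sum of length-normalized rates must be positive")
    return {gene: r * 1e6 / denominator for gene, r in rates.items()}


# Made-up example: geneB has three times the fragments and three times the length.
counts = {"geneA": 500.0, "geneB": 1500.0}
lengths = {"geneA": 1000.0, "geneB": 3000.0}
print(fpkm(counts, lengths))  # both genes: 250000.0
print(tpm(counts, lengths))   # both genes: 500000.0
```

Both genes end up with the same value because the larger fragment count of geneB is offset exactly by its greater length.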
RNA fusion calls are generated by STAR-Fusion and Arriba. Both sets of calls are used as input to MAVIS for validation and annotation.
All alteration files are provided to Djerba to generate a provisional clinical report for review by genome interpreters.
WTS Workflows and Software
More information about the analysis pipelines is available in the ‘Procedure’ section below. Workflow parameterization is automated through the linked Shesmu configuration. The configuration repository is restricted to authorized individuals.
| Workflow | Version | Parameterization | Reference Data | Bioinformatics Software |
|---|---|---|---|---|
| | 2.4.0 | | | gencode/31 rarriba/0.1 arriba/2.4.0 samtools/1.16.1 |
| | 1.0.2 | | | bcl2fastq/2.20.0.422 htslib/1.9 |
| | 3.1.3 | | | barcodex-rs/0.1.2 bcl2fastq-jail/3.1.2b bcl2fastq/2.20.0.422 |
| | 1.1.0 | vidarr-u20-crosscheckFingerprintsCollector_fastq_exceptions.shesmu vidarr-u20-crosscheckFingerprintsCollector_bam.shesmu vidarr-u20-crosscheckFingerprintsCollector_fastq.shesmu | hg38-star-index100/2.7.3a hg19-bwa-index/0.7.17 hg19-star-index100/2.7.3a hg38-bwa-index-with-alt/0.7.17 | samtools/1.15 crosscheckfingerprints-haplotype-map/20230324 tabix/0.2.6 seqtk/1.3 star/2.7.3a bwa/0.7.17 gatk/4.1.6.0 samtools/1.14 gatk/4.2.0.0 samtools/1.9 |
| | 3.2.0 | | | java/11 fastqc/0.11.9 perl/5.28 |
| | 3.3.3 | vidarr-u20-mavis_clinical.shesmu vidarr-u20-mavis_non_clinical.shesmu | hg38v110-mavis/2.2.6 | bcftools/1.9 mavis-config/1.2 mavis/2.2.6 |
| | 1.3.0 | vidarr-u20-rnaseqqc-lane_level.shesmu vidarr-u20-rnaseqqc-call_ready.shesmu | hg38-star-index100/2.7.3a hg19-star-index100/2.6.0c | production-tools-python/2 bwa/0.7.17 star/2.7.3a picard/2.19.2 star/2.6.0c bam-qc-metrics/0.2.5 rnaseqqc-ribosome-grch38-bwa-index/1.0.0 jq/1.6 picard/2.21.2 samtools/1.9 |
| | 1.0.1 | | hg38-rsem-index/1.3.0 hg19-rsem-index/1.3.3 | rsem/1.3.3 |
| | 2.3.0 | vidarr-u20-star_lane_level.shesmu vidarr-u20-star_call_ready.shesmu | hg38-star-index100/2.7.10b hg19-star-index100/2.7.10b | picard/2.19.2 |
| | 2.0.2 | | star-fusion-genome/1.8.1-hg38 | star-fusion/1.8.1 |
Change Log |