(V)ariant (A)nalysis (P)ipeline

Thank you for your interest in using the Variant Analysis Pipeline. VAP is a comprehensive workflow for reference mapping and variant detection of genomic and transcriptomic reads using a suite of bioinformatics tools.

Article Source:

Adetunji MO, Lamont SJ, Abasht B, Schmidt CJ (2019) Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data. PLOS ONE 14(9): e0216838. https://doi.org/10.1371/journal.pone.0216838

Bioinformatic tools

Bioinformatic tools are grouped by the type of sequencing reads they process:

Genomic Sequencing: BOWTIE2, BWA-mem

Transcriptomic Sequencing: TopHat2, HiSAT2, STAR

Variant Calling (for both Genomic/Transcriptomic Sequencing)

N.B.: All tools are run with their default parameters.

The VAP workflow was designed using the following software versions:

Software       Version
TopHat2        2.1.1
HiSAT2         2.1.0
STAR           2.5.2b
SAMtools       1.4.1
Picard tools   2.13.2
GATK           3.8
BWA-mem        0.7.17
BOWTIE2        2.3.5.1

The current pipeline is not compatible with GATK v4.

Contact the maintainer to make custom changes to the different tools.
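
Before a run, it can help to confirm that the command-line tools from the table are installed. The following is a minimal sketch, assuming the tools are on the PATH under their usual executable names; VAP itself may locate them differently, and the helper names `find_tools` and `missing_tools` are hypothetical. Picard tools and GATK are distributed as Java archives, so they are not covered by this check, and tool versions are not verified.

```python
import shutil

# Usual executable names for each tool (an assumption of this sketch;
# VAP may instead read tool locations from its job file).
EXECUTABLES = {
    "TopHat2": "tophat2",
    "HiSAT2": "hisat2",
    "STAR": "STAR",
    "SAMtools": "samtools",
    "BWA-mem": "bwa",
    "BOWTIE2": "bowtie2",
}


def find_tools(executables=EXECUTABLES, which=shutil.which):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: which(exe) for tool, exe in executables.items()}


def missing_tools(found):
    """Return the sorted names of tools that could not be resolved."""
    return sorted(tool for tool, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for tool, path in found.items():
        print(f"{tool:10s} {path or 'NOT FOUND'}")
```

Only the tools needed for the chosen read type have to be present, so a missing transcriptomic aligner is not a problem for a genomic run.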

Things to be aware of

Job File

Indexes for Assembly tools and Variant Calling tools

Before running the pipeline, create reference genome indexes for the different assemblers specified. Reference genome index syntax:
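
As an illustration only, the sketch below assembles the standard index-building commands documented by each tool. It is not necessarily the exact syntax VAP expects. The function name `index_commands`, the file names, and the default `picard.jar` location are hypothetical. The commands are returned as argument lists and are not executed.

```python
import os


def index_commands(fasta, prefix, star_dir, picard_jar="picard.jar"):
    """Return index-building commands as argument lists (nothing is run).

    fasta      -- reference genome FASTA file
    prefix     -- output prefix for the Bowtie2 and HISAT2 indexes
    star_dir   -- output directory for the STAR genome index
    picard_jar -- path to the Picard jar (hypothetical default)
    """
    dict_path = os.path.splitext(fasta)[0] + ".dict"
    return {
        # TopHat2 aligns with Bowtie2, so both use a Bowtie2 index.
        "BOWTIE2/TopHat2": ["bowtie2-build", fasta, prefix],
        "HiSAT2": ["hisat2-build", fasta, prefix],
        "STAR": ["STAR", "--runMode", "genomeGenerate",
                 "--genomeDir", star_dir, "--genomeFastaFiles", fasta],
        "BWA-mem": ["bwa", "index", fasta],
        # GATK needs a .fai index and a .dict sequence dictionary.
        "SAMtools": ["samtools", "faidx", fasta],
        "Picard": ["java", "-jar", picard_jar, "CreateSequenceDictionary",
                   f"R={fasta}", f"O={dict_path}"],
    }


if __name__ == "__main__":
    for tool, cmd in index_commands("genome.fa", "genome", "star_index").items():
        print(f"{tool}: {' '.join(cmd)}")
```

Each list can be passed to `subprocess.run` once the paths have been checked. Only the indexes for the aligners selected for a run need to be built.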

Downstream Merge and Filter Step (runMergeFilter)

The downstream step performs the following:

  1. Merges the SNPs called from each of the mapping tools initially specified to execute (TopHat2/HiSAT2/STAR or BOWTIE2/BWA-mem).
  2. Applies pre-set filtering criteria using the GATK VariantFiltration tool. Variants meeting any of the following conditions fail the filter:
    1. Read position rank-sum test (ReadPosRankSum) < -8
    2. Quality by depth (QD) < 5
    3. Read depth (DP) < 10
    4. Fisher’s exact test for strand bias, Phred-scaled p-value (FS) > 60
    5. Mapping quality (MQ) < 40
    6. SnpCluster (3 SNPs within 35 bp)
    7. Mapping quality rank-sum test (MQRankSum) < -12.5
  3. Generates exploratory statistics for all variant files.
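
The filtering criteria listed above can be sketched in plain Python to show what each threshold means. This is not how VAP applies them, since the pipeline delegates filtering to GATK. The function names are hypothetical, the dictionary keys use GATK's INFO annotation names (ReadPosRankSum for the read position rank-sum test), and the window boundary handling in the cluster check is an assumption that may differ slightly from GATK's.

```python
# Thresholds copied from the list above. A variant fails a filter when the
# comparison is true.
FILTERS = {
    "ReadPosRankSum": lambda v: v < -8,
    "QD": lambda v: v < 5,
    "DP": lambda v: v < 10,
    "FS": lambda v: v > 60,
    "MQ": lambda v: v < 40,
    "MQRankSum": lambda v: v < -12.5,
}


def failed_filters(info):
    """Return the names of the threshold filters a variant fails.

    `info` maps annotation names to numeric values. Annotations that are
    absent are skipped (an assumption of this sketch).
    """
    return [name for name, fails in FILTERS.items()
            if name in info and fails(info[name])]


def clustered_positions(positions, cluster_size=3, window=35):
    """Return SNP positions on one chromosome that fall in a cluster.

    A cluster is `cluster_size` consecutive SNPs that all fit inside a
    window of `window` bases (inclusive of both end positions).
    """
    pos = sorted(positions)
    flagged = set()
    for i in range(len(pos) - cluster_size + 1):
        first, last = pos[i], pos[i + cluster_size - 1]
        if last - first + 1 <= window:
            flagged.update(pos[i:i + cluster_size])
    return sorted(flagged)


if __name__ == "__main__":
    print(failed_filters({"QD": 3.2, "DP": 25, "FS": 72.0, "MQ": 60}))
    print(clustered_positions([100, 110, 120, 500]))
```

A variant with an empty result from `failed_filters` and a position absent from `clustered_positions` would pass all seven criteria under these assumptions.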

To run the workflow

perl VariantAnalysisPipeline.pl -c config_job.file