Variant Calling Pipeline

Comprehensive variant detection: QC, alignment to reference, and SNP/indel calling with FreeBayes.

Overview

The Variant Calling pipeline identifies genetic variants (SNPs, insertions, deletions) by comparing sequencing reads to a reference genome:

- **Quality control**: fastp trims and filters reads
- **Alignment**: BWA-MEM aligns reads to the reference
- **Post-processing**: samtools sorts alignments and marks duplicates
- **Variant calling**: FreeBayes calls variants
- **Output**: VCF and a human-readable variant report

The pipeline accepts single-end or paired-end FASTQ, as well as Sanger AB1/ABI traces, which are converted to FASTQ before processing.
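
The stages listed above could be chained roughly as in the sketch below. It only builds the command lines and does not run them. The function name, the `prefix` argument, and all output file names are assumptions made for illustration, and the exact flags this pipeline passes to each tool are not documented here. The `samtools fixmate -m` step is included for paired-end data because `samtools markdup` relies on the mate tags it adds.

```python
import shlex
from typing import List, Optional


def build_pipeline_commands(
    reference: str,
    r1: str,
    r2: Optional[str] = None,
    threads: int = 4,
    prefix: str = "sample",  # hypothetical output prefix
) -> List[str]:
    """Return one shell command per stage, in execution order."""
    t = str(threads)
    paired = r2 is not None

    # Quality control: fastp trims/filters reads and writes its report.
    trimmed = [f"{prefix}.trimmed.R1.fastq.gz"]
    fastp = ["fastp", "-i", r1, "-o", trimmed[0], "-w", t,
             "-j", f"{prefix}.fastp.json", "-h", f"{prefix}.fastp.html"]
    if paired:
        trimmed.append(f"{prefix}.trimmed.R2.fastq.gz")
        fastp += ["-I", r2, "-O", trimmed[1]]

    # Alignment: BWA-MEM output is piped straight into samtools.
    stages = [["bwa", "mem", "-t", t, reference, *trimmed]]
    if paired:
        # fixmate -m adds the mate tags that markdup uses for pairs.
        stages.append(["samtools", "fixmate", "-m", "-", "-"])
    sorted_bam = f"{prefix}.sorted.bam"
    stages.append(["samtools", "sort", "-@", t, "-o", sorted_bam, "-"])
    align = " | ".join(shlex.join(s) for s in stages)

    dedup_bam = f"{prefix}.markdup.bam"
    vcf = f"{prefix}.vcf"
    return [
        shlex.join(fastp),
        shlex.join(["bwa", "index", reference]),  # index needed by bwa mem
        align,
        shlex.join(["samtools", "markdup", sorted_bam, dedup_bam]),
        shlex.join(["samtools", "index", dedup_bam]),
        # Quality thresholds are left out of this call for brevity.
        shlex.join(["freebayes", "-f", reference, dedup_bam])
        + " > " + shlex.quote(vcf),
    ]
```

Printing the result of `build_pipeline_commands("ref.fa", "reads_R1.fastq.gz", "reads_R2.fastq.gz")` shows six commands that can be reviewed before being run by hand or passed to a shell.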

Input Requirements

Reference genome (required)

FASTA (.fasta, .fa, .fas)

Sample reads (required)

- FASTQ (.fastq, .fq, .fastq.gz, .fq.gz), or
- AB1/ABI (Sanger; converted automatically)
- Optional R2 file for paired-end
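
As a rough illustration of the input rules above, the sketch below classifies files by extension before a run starts. The function names are hypothetical, and the `.ab1` and `.abi` spellings for Sanger traces are an assumption, since the section only names the formats.

```python
from pathlib import Path
from typing import Dict, Optional

FASTA_EXTS = (".fasta", ".fa", ".fas")
FASTQ_EXTS = (".fastq", ".fq", ".fastq.gz", ".fq.gz")
AB1_EXTS = (".ab1", ".abi")  # assumed extensions for Sanger traces


def classify_input(path: str) -> str:
    """Return 'fasta', 'fastq', or 'ab1' based on the file extension."""
    name = Path(path).name.lower()
    if name.endswith(FASTQ_EXTS):
        return "fastq"
    if name.endswith(AB1_EXTS):
        return "ab1"
    if name.endswith(FASTA_EXTS):
        return "fasta"
    raise ValueError(f"unsupported input file: {path}")


def check_inputs(reference: str, r1: str, r2: Optional[str] = None) -> Dict[str, object]:
    """Validate the reference and read files and describe the run."""
    if classify_input(reference) != "fasta":
        raise ValueError("reference must be a FASTA file")
    kind = classify_input(r1)
    if kind == "fasta":
        raise ValueError("sample reads must be FASTQ or AB1/ABI")
    if r2 is not None and classify_input(r2) == "fasta":
        raise ValueError("R2 must be a read file, not FASTA")
    return {
        "read_format": kind,
        "paired_end": r2 is not None,
        "needs_conversion": kind == "ab1",  # AB1/ABI is converted to FASTQ
    }
```

For the conversion itself, Biopython's `SeqIO.convert("trace.ab1", "abi", "trace.fastq", "fastq")` is one option; whether this pipeline uses Biopython internally is not stated.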

Parameters

- **Quality threshold**, **trimming window**, **threads**: Same as other pipelines
- **Min mapping quality** (default: 20): Minimum MAPQ a read alignment needs to be used in variant calling
- **Min base quality** (default: 10): Minimum base quality for a base to be used in variant calling
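
The two calling thresholds correspond naturally to FreeBayes's `--min-mapping-quality` (`-m`) and `--min-base-quality` (`-q`) options. The sketch below shows one way to hold the defaults listed above and turn them into command-line arguments. The `CallingParams` class is hypothetical, and the assumption that the pipeline forwards these values to exactly these flags is not confirmed by this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CallingParams:
    """Variant-calling thresholds with the defaults listed above."""
    min_mapping_quality: int = 20
    min_base_quality: int = 10

    def __post_init__(self) -> None:
        # Phred-scaled qualities cannot be negative.
        for name in ("min_mapping_quality", "min_base_quality"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} must be >= 0")

    def freebayes_args(self) -> List[str]:
        """Return the thresholds as FreeBayes command-line arguments."""
        return [
            "--min-mapping-quality", str(self.min_mapping_quality),
            "--min-base-quality", str(self.min_base_quality),
        ]


# Example: stricter mapping quality, default base quality.
args = ["freebayes", "-f", "ref.fa", *CallingParams(30).freebayes_args(), "sample.bam"]
```

Raising either threshold discards more evidence, which tends to reduce false positives at the cost of sensitivity in low-coverage regions.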

Outputs

- **VCF file**: Standard variant call format with genotypes and quality metrics
- **Variant report**: Summary of variants (counts, types, positions)
- **fastp report**: QC and trimming summary
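
To show how a summary like the variant report relates to the VCF, the sketch below tallies variants by type from VCF text lines. It is not the pipeline's report generator. The length-based classification is a simplification, so multi-base substitutions and complex alleles are grouped under `other`.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def classify_allele(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # same length, longer than 1 (e.g. MNP)


def summarize_vcf(lines: Iterable[str]) -> Dict[str, object]:
    """Count variants by type and collect their positions."""
    counts: Counter = Counter()
    positions: List[Tuple[str, int, str]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one entry per alternate allele
            if alt in (".", "*"):
                continue  # no alternate allele / spanning deletion
            kind = classify_allele(ref, alt)
            counts[kind] += 1
            positions.append((chrom, pos, kind))
    return {
        "total": sum(counts.values()),
        "by_type": dict(counts),
        "positions": positions,
    }
```

With a real file, the function can be called as `summarize_vcf(open("sample.vcf"))`, where `sample.vcf` stands in for the pipeline's VCF output.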

Best Practices

1. Use a reference that matches your organism and build
2. Higher coverage improves variant sensitivity and precision
3. For somatic calling, consider additional filtering on allele frequency and depth
4. Review BAM/alignment metrics if variant counts seem unusual
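
The additional filtering suggested for somatic calling could look like the sketch below, which uses the `DP` (total depth) and `AO` (alternate observation count) fields that FreeBayes writes to the VCF INFO column. The function names and both thresholds are illustrative assumptions, not recommended values.

```python
from typing import Dict


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key/value dictionary."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value  # flag-only entries get an empty value
    return out


def passes_filter(
    vcf_line: str,
    min_depth: int = 20,             # hypothetical threshold
    min_alt_fraction: float = 0.05,  # hypothetical threshold
) -> bool:
    """Keep a record if depth and alternate allele fraction are high enough."""
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])
    depth = int(info["DP"])
    if depth == 0 or depth < min_depth:
        return False
    # AO holds one count per alternate allele; use the best-supported one.
    alt_obs = max(int(x) for x in info["AO"].split(","))
    return alt_obs / depth >= min_alt_fraction
```

Applying `passes_filter` to each non-header line of the VCF gives a quick view of how many calls survive a given depth and allele-fraction cutoff, which helps when choosing thresholds for a dataset.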