Genome Assembly Pipeline
Complete genomic workflow: quality control, trimming, and de novo assembly of sequencing reads to produce assembled contigs.
The Genome Assembly pipeline takes raw sequencing reads and produces assembled contigs through a standardized workflow:
- **Quality control**: fastp trims low-quality bases and adapters
- **De novo assembly**: SPAdes performs assembly without a reference genome
- **Output**: FASTA contigs, assembly statistics, and optional FASTQ of processed reads
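
The two stages above can be sketched as a pair of command lines. This is a minimal illustration, not the pipeline's actual invocation: the function name `build_commands`, the output file names, and the choice of fastp's sliding-window trimming flags are assumptions. Only the tool names (fastp, SPAdes) come from the description above.

```python
from pathlib import Path


def build_commands(reads: str, out_dir: str, threads: int = 4) -> list[list[str]]:
    """Return fastp and SPAdes command lines for one single-end FASTQ file.

    File names and the flag selection are illustrative assumptions.
    """
    out = Path(out_dir)
    trimmed = out / "trimmed.fastq.gz"  # hypothetical name for processed reads

    fastp_cmd = [
        "fastp",
        "-i", reads,                        # raw reads in
        "-o", str(trimmed),                 # quality-filtered reads out
        "--cut_right",                      # sliding-window quality trimming
        "--cut_window_size", "5",           # window size (documented default)
        "--cut_mean_quality", "20",         # Phred threshold (documented default)
        "--json", str(out / "fastp.json"),  # QC and trimming report
        "--thread", str(threads),
    ]
    spades_cmd = [
        "spades.py",
        "-s", str(trimmed),                 # single-end library
        "-o", str(out / "spades"),          # SPAdes output directory
        "-t", str(threads),
    ]
    return [fastp_cmd, spades_cmd]


if __name__ == "__main__":
    for cmd in build_commands("reads.fastq.gz", "results"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` in order, so that SPAdes only starts once fastp has finished successfully. Interleaved input would need different flags (fastp has `--interleaved_in` and SPAdes has `--12`); that case is left out of this sketch.
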
Supported input: FASTQ (.fastq, .fq, .fastq.gz, .fq.gz) or Sanger AB1/ABI files. AB1/ABI files are automatically converted to FASTQ before processing.
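
As an illustration of how the supported inputs might be told apart, the sketch below classifies a file by its extension. The function name `detect_input_format`, the `.ab1`/`.abi` suffixes, and the case-insensitive matching are assumptions; the FASTQ suffixes are the ones listed above.

```python
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")
SANGER_SUFFIXES = (".ab1", ".abi")  # assumed extensions for AB1/ABI traces


def detect_input_format(filename: str) -> str:
    """Return "fastq" or "sanger" based on the file extension."""
    name = filename.lower()  # assumption: extensions match case-insensitively
    if name.endswith(FASTQ_SUFFIXES):
        return "fastq"
    if name.endswith(SANGER_SUFFIXES):
        return "sanger"  # would be converted to FASTQ before processing
    raise ValueError(f"Unsupported input file: {filename}")


if __name__ == "__main__":
    for example in ("sample.fq.gz", "trace.ab1"):
        print(example, "->", detect_input_format(example))
```

One way to perform the conversion itself is Biopython's `SeqIO.convert(path, "abi", out_path, "fastq")`. Whether this pipeline uses Biopython is not stated here.
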
Reads (required)
- Single FASTQ file (single-end or interleaved), or
- Sanger AB1/ABI file (converted to FASTQ automatically)
Parameters
- **Quality threshold** (default: 20): Phred score below which bases are trimmed
- **Trimming window size** (default: 5): Sliding window for quality trimming
- **Minimum overlap** (default: 30): Minimum overlap for assembly (SPAdes)
- **Threads** (default: 4): Number of CPU threads
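
The parameters above, with their documented defaults, could be represented as a small validated configuration object. The class name `AssemblyParams`, the field names, and the validation bounds are hypothetical; only the four defaults come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyParams:
    """Pipeline parameters with the documented defaults (names are hypothetical)."""

    quality_threshold: int = 20  # Phred score below which bases are trimmed
    window_size: int = 5         # sliding window for quality trimming
    min_overlap: int = 30        # minimum overlap for assembly
    threads: int = 4             # number of CPU threads

    def __post_init__(self) -> None:
        # Assumed bounds: 0-93 is the Phred range representable in
        # Sanger-encoded FASTQ; the other values must be positive.
        if not 0 <= self.quality_threshold <= 93:
            raise ValueError("quality_threshold must be between 0 and 93")
        for name in ("window_size", "min_overlap", "threads"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be at least 1")


if __name__ == "__main__":
    print(AssemblyParams())            # all defaults
    print(AssemblyParams(threads=16))  # override one value
```

Validating at construction time means a bad value is rejected before any tool is launched, rather than surfacing as a failure partway through a run.
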
- **Assembled contigs**: FASTA file of contigs
- **Assembly statistics**: Contig count, total length, N50
- **Processed reads** (optional): Quality-filtered FASTQ
- **fastp report**: JSON summary of QC and trimming
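
The assembly statistics listed above can be recomputed from the contigs FASTA. The sketch below uses the standard N50 definition: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly length. The function names are illustrative and not part of the pipeline.

```python
def read_contig_lengths(fasta_text: str) -> list[int]:
    """Return the length of each record in FASTA-formatted text."""
    lengths: list[int] = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: list[int]) -> dict:
    """Compute contig count, total length, and N50 from contig lengths."""
    total = sum(lengths)
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached half of the assembly
            n50 = length
            break
    return {"contig_count": len(lengths), "total_length": total, "n50": n50}


if __name__ == "__main__":
    demo = ">c1\nACGTACGT\n>c2\nACGT\n>c3\nACG\n>c4\nAC\n"
    print(assembly_stats(read_contig_lengths(demo)))
```

For contigs of length 8, 4, 3, and 2 (total 17), the cumulative sum reaches half the total at the second contig, so N50 is 4.
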
1. Use high-quality reads; low coverage or high error rates reduce assembly quality
2. For bacterial/small genomes, default parameters often work well
3. Increase threads for large datasets to reduce runtime
4. Check fastp reports to confirm adapter removal and quality trimming
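
Checking the fastp report can be scripted. The sketch below pulls a few figures from the JSON report, assuming fastp's usual layout with `summary.before_filtering`, `summary.after_filtering`, and an `adapter_cutting` section; verify the key names against a report from your own run. The function names are hypothetical.

```python
import json


def load_fastp_report(path: str) -> dict:
    """Load a fastp JSON report from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_fastp_report(report: dict) -> dict:
    """Extract read retention, post-filter Q20 rate, and adapter-trimmed reads."""
    before = report["summary"]["before_filtering"]
    after = report["summary"]["after_filtering"]
    # The adapter section may be absent if no adapter trimming was recorded.
    adapters = report.get("adapter_cutting", {})
    total_before = before["total_reads"]
    return {
        "reads_retained": after["total_reads"] / total_before if total_before else 0.0,
        "q20_rate_after": after["q20_rate"],
        "adapter_trimmed_reads": adapters.get("adapter_trimmed_reads", 0),
    }


if __name__ == "__main__":
    # Synthetic example values, not real results.
    example = {
        "summary": {
            "before_filtering": {"total_reads": 1000, "q20_rate": 0.91},
            "after_filtering": {"total_reads": 900, "q20_rate": 0.98},
        },
        "adapter_cutting": {"adapter_trimmed_reads": 120},
    }
    print(summarize_fastp_report(example))
```

A very low retention fraction or a report with no adapter-trimmed reads on a library known to contain adapters would be a reason to inspect the input before trusting the assembly.
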