RNA-Seq Pipeline

transcriptomics

Transcriptome analysis: QC, alignment with STAR, and gene quantification with featureCounts.

Overview

The RNA-Seq pipeline processes RNA sequencing data for gene-level quantification:

- **Quality control**: fastp trims and filters reads
- **Alignment**: STAR aligns reads to the reference genome (with splice awareness)
- **Quantification**: featureCounts assigns reads to genes using a GTF annotation
- **Output**: Count matrix, alignment stats, and fastp report
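The three tool stages above can be sketched as command lines assembled in Python. This is a minimal illustration, not the pipeline's actual implementation: the function name `build_commands`, the output file names, and the default values are assumptions, and the pipeline may pass different or additional options. The flags themselves are standard fastp, STAR, and featureCounts options.

```python
from typing import List, Optional


def build_commands(
    sample: str,
    reads1: str,
    reads2: Optional[str],
    star_index: str,
    gtf: str,
    threads: int = 4,
    min_quality: int = 20,
) -> List[List[str]]:
    """Return fastp, STAR, and featureCounts argument lists for one sample.

    All names and defaults here are hypothetical; only the tool flags are real.
    """
    paired = reads2 is not None
    trimmed1 = f"{sample}.trimmed_R1.fastq.gz"
    trimmed2 = f"{sample}.trimmed_R2.fastq.gz"

    # fastp: -q is the per-base quality threshold, -w the worker thread count,
    # -j / -h are the JSON and HTML report paths.
    fastp = ["fastp", "-i", reads1, "-o", trimmed1,
             "-q", str(min_quality), "-w", str(threads),
             "-j", f"{sample}.fastp.json", "-h", f"{sample}.fastp.html"]
    if paired:
        fastp += ["-I", reads2, "-O", trimmed2]

    # STAR: align the trimmed reads against a prebuilt index and write a
    # coordinate-sorted BAM. zcat is needed because the trimmed reads are gzipped.
    star = ["STAR", "--runThreadN", str(threads),
            "--genomeDir", star_index,
            "--readFilesIn", trimmed1]
    if paired:
        star.append(trimmed2)
    star += ["--readFilesCommand", "zcat",
             "--outSAMtype", "BAM", "SortedByCoordinate",
             "--outFileNamePrefix", f"{sample}."]

    # STAR appends this fixed suffix to the output prefix.
    bam = f"{sample}.Aligned.sortedByCoord.out.bam"

    # featureCounts: -a is the annotation, -o the count table.
    featurecounts = ["featureCounts", "-T", str(threads),
                     "-a", gtf, "-o", f"{sample}.counts.txt"]
    if paired:
        # --countReadPairs is only available in recent Subread releases.
        featurecounts += ["-p", "--countReadPairs"]
    featurecounts.append(bam)

    return [fastp, star, featurecounts]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` in order, so that a failure in one stage stops the run before the next stage starts.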

The pipeline requires a reference genome (FASTA) and a gene annotation (GTF/GFF).

Input Requirements

Reference genome (required)

FASTA (.fasta, .fa, .fas, or gzipped)

Gene annotation (required)

GTF or GFF (.gtf, .gff, .gff3)

Reads (required)

FASTQ (single or paired-end)
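As a rough illustration of the input types listed above, the following sketch classifies a file by its extension. The function name `classify_input` is hypothetical, the FASTQ extensions are an assumption (this page does not list them), and accepting a `.gz` suffix on every input type is also an assumption, since gzip is only mentioned for FASTA.

```python
from pathlib import Path

FASTA_EXT = {".fasta", ".fa", ".fas"}
ANNOTATION_EXT = {".gtf", ".gff", ".gff3"}
FASTQ_EXT = {".fastq", ".fq"}  # assumption: not listed in the documentation


def classify_input(filename: str) -> str:
    """Return 'reference', 'annotation', 'reads', or 'unknown' for a file name."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Ignore a trailing .gz so that genome.fa.gz is treated like genome.fa.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    ext = suffixes[-1] if suffixes else ""
    if ext in FASTA_EXT:
        return "reference"
    if ext in ANNOTATION_EXT:
        return "annotation"
    if ext in FASTQ_EXT:
        return "reads"
    return "unknown"
```

A check like this only looks at names; it does not confirm that the file contents are valid FASTA, GTF/GFF, or FASTQ.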

Parameters

- **Quality threshold**, **trimming window**, **threads**: Standard options for read quality filtering, sliding-window trimming, and the number of CPU threads
- **Sample name**: Used for output file naming

Outputs

- **Gene counts**: Table of read counts per gene (featureCounts output)
- **Alignment statistics**: STAR summary (mapped, multi-mapped, etc.)
- **fastp report**: QC and trimming summary
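The two text outputs can be read with a few lines of Python. This is a sketch under stated assumptions: it expects the default featureCounts table layout (a leading `#` comment line, then a tab-separated header with `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length`, followed by one count column per BAM file) and the `key | value` layout of STAR's `Log.final.out`. The function names are hypothetical, and the files the pipeline actually exposes may be named or post-processed differently.

```python
import csv
from typing import Dict, Iterable


def parse_feature_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each gene ID to its read count in the first sample column."""
    # Skip the leading comment line(s) that record the program and command.
    rows = csv.reader(
        (ln for ln in lines if not ln.startswith("#")), delimiter="\t"
    )
    next(rows)  # discard the header row
    count_col = 6  # columns 0-5 are Geneid, Chr, Start, End, Strand, Length
    return {row[0]: int(row[count_col]) for row in rows if row}


def parse_star_log(lines: Iterable[str]) -> Dict[str, str]:
    """Collect 'key | value' pairs from a STAR Log.final.out file."""
    stats: Dict[str, str] = {}
    for line in lines:
        # Section headings such as 'UNIQUE READS:' have no '|' and are skipped.
        if "|" in line:
            key, value = line.split("|", 1)
            stats[key.strip()] = value.strip()
    return stats
```

Both functions accept any iterable of lines, so an open file handle can be passed directly, for example `parse_star_log(open(path))`. Values from the STAR log are kept as strings because they mix counts, percentages, and timestamps.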

Best Practices

1. Use a reference and GTF that match (same organism and version)
2. STAR builds an index from the FASTA; first run may take longer
3. For differential expression, use the count matrix with tools like DESeq2 or edgeR
4. Check STAR logs for mapping rate and splice junction counts
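One common symptom of a mismatched reference and annotation (the first practice above) is that sequence names differ, for example `chr1` in the FASTA and `1` in the GTF, which typically leaves most reads unassigned. The sketch below compares the two sets of names. It is an illustrative check, not part of the pipeline; the function names are hypothetical, and matching names do not by themselves prove that the genome build and annotation version agree.

```python
from typing import Iterable, Set


def fasta_sequence_names(lines: Iterable[str]) -> Set[str]:
    """Return sequence names: the first word after '>' on each header line."""
    return {
        ln[1:].split()[0]
        for ln in lines
        if ln.startswith(">") and len(ln.strip()) > 1
    }


def gtf_sequence_names(lines: Iterable[str]) -> Set[str]:
    """Return the values of the first (seqname) column of a GTF/GFF file."""
    names: Set[str] = set()
    for ln in lines:
        # Skip comment/directive lines and blank lines.
        if ln.startswith("#") or not ln.strip():
            continue
        names.add(ln.split("\t", 1)[0])
    return names


def missing_from_reference(
    fasta_lines: Iterable[str], gtf_lines: Iterable[str]
) -> Set[str]:
    """Return annotation sequence names that do not occur in the FASTA."""
    return gtf_sequence_names(gtf_lines) - fasta_sequence_names(fasta_lines)
```

An empty result means every annotated sequence exists in the reference. A non-empty result lists the names to investigate before running alignment and quantification.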