RNA-Seq Pipeline

transcriptomics

Transcriptome analysis: QC, alignment with STAR, and gene quantification with featureCounts.

Overview

The RNA-Seq pipeline processes RNA sequencing data for gene-level quantification:

- **Quality control**: fastp trims and filters reads
- **Alignment**: STAR aligns reads to the reference genome (with splice awareness)
- **Quantification**: featureCounts assigns reads to genes using a GTF annotation
- **Output**: Count matrix, alignment stats, and fastp report
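
The three steps can be sketched as plain command lines. This is a minimal illustration, not the pipeline's actual invocation: the function names, output file names, and default values are hypothetical, and only commonly documented fastp, STAR, and featureCounts options are used.

```python
import shlex
from typing import List, Optional


def build_index_command(fasta: str, gtf: str, genome_dir: str, threads: int = 4) -> List[str]:
    """STAR index build (assumes an uncompressed FASTA and a GTF annotation)."""
    return ["STAR", "--runMode", "genomeGenerate",
            "--genomeDir", genome_dir,
            "--genomeFastaFiles", fasta,
            "--sjdbGTFfile", gtf,
            "--runThreadN", str(threads)]


def build_commands(sample: str, reads1: str, reads2: Optional[str],
                   genome_dir: str, gtf: str,
                   threads: int = 4, min_quality: int = 20) -> List[List[str]]:
    """Return fastp, STAR, and featureCounts command lines for one sample."""
    paired = reads2 is not None
    # Hypothetical naming scheme for intermediate files.
    trimmed1 = f"{sample}_R1.trimmed.fastq.gz"
    trimmed2 = f"{sample}_R2.trimmed.fastq.gz"

    # Step 1: quality control and trimming with fastp.
    fastp = ["fastp", "-i", reads1, "-o", trimmed1,
             "-q", str(min_quality), "-w", str(threads),
             "-j", f"{sample}.fastp.json", "-h", f"{sample}.fastp.html"]
    if paired:
        fastp += ["-I", reads2, "-O", trimmed2]

    # Step 2: splice-aware alignment with STAR against an existing index.
    star = ["STAR", "--runThreadN", str(threads),
            "--genomeDir", genome_dir,
            "--readFilesIn", trimmed1]
    if paired:
        star.append(trimmed2)
    star += ["--readFilesCommand", "zcat",  # trimmed reads are gzipped
             "--outSAMtype", "BAM", "SortedByCoordinate",
             "--outFileNamePrefix", f"{sample}."]

    # Step 3: gene-level counting with featureCounts.
    counts = ["featureCounts", "-T", str(threads), "-a", gtf,
              "-o", f"{sample}.counts.txt"]
    if paired:
        # -p marks the input as paired-end; recent Subread releases may also
        # need --countReadPairs to count fragments instead of reads.
        counts.append("-p")
    counts.append(f"{sample}.Aligned.sortedByCoord.out.bam")
    return [fastp, star, counts]


if __name__ == "__main__":
    for cmd in build_commands("s1", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "star_index", "genes.gtf"):
        print(shlex.join(cmd))
```

Building the commands as argument lists keeps them safe to pass to `subprocess.run` without shell quoting issues; `shlex.join` is used only for display.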

The pipeline requires a reference genome (FASTA) and a gene annotation (GTF/GFF).

Input Requirements

Reference genome (required)

FASTA (.fasta, .fa, .fas, or gzipped)

Gene annotation (required)

GTF or GFF (.gtf, .gff, .gff3)

Reads (required)

FASTQ (single or paired-end)
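
A quick pre-flight check of input file names might look like the following sketch. The helper `classify_input` is hypothetical, the FASTQ extensions are an assumption (the list above does not name them), and accepting gzip for every input type is also an assumption.

```python
from pathlib import Path

FASTA_EXT = {".fasta", ".fa", ".fas"}
ANNOTATION_EXT = {".gtf", ".gff", ".gff3"}
FASTQ_EXT = {".fastq", ".fq"}  # assumption: not listed in the requirements


def classify_input(path: str) -> str:
    """Return 'reference', 'annotation', 'reads', or 'unknown' for a file name."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Ignore a trailing .gz so that genome.fa.gz is treated like genome.fa.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    ext = suffixes[-1] if suffixes else ""
    if ext in FASTA_EXT:
        return "reference"
    if ext in ANNOTATION_EXT:
        return "annotation"
    if ext in FASTQ_EXT:
        return "reads"
    return "unknown"
```

The check looks only at file names; it does not validate file contents.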

Parameters

- **Quality threshold**, **trimming window**, **threads**: Standard options
- **Sample name**: Used for output file naming

Outputs

- **Gene counts**: Table of read counts per gene (featureCounts output)
- **Alignment statistics**: STAR summary (mapped, multi-mapped, etc.)
- **fastp report**: QC and trimming summary
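
As an illustration of reading the gene counts table, the sketch below assumes the standard featureCounts layout: a leading `#` comment line, then a tab-separated header starting with `Geneid, Chr, Start, End, Strand, Length`, followed by one count column per BAM file. The function name and the example gene IDs and values are made up.

```python
import csv
import io
from typing import Dict, TextIO


def read_gene_counts(handle: TextIO, sample_column: int = 6) -> Dict[str, int]:
    """Map gene ID to read count from a featureCounts table."""
    # Skip the '#' comment line that featureCounts writes before the header.
    rows = csv.reader((line for line in handle if not line.startswith("#")),
                      delimiter="\t")
    header = next(rows)
    if header[0] != "Geneid":
        raise ValueError("not a featureCounts table: missing Geneid header")
    # Column 6 is the first sample column, after the six annotation columns.
    return {row[0]: int(row[sample_column]) for row in rows if row}


# Illustrative table; gene IDs and counts are invented for the example.
EXAMPLE = (
    "# Program:featureCounts (illustrative header line)\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
    "geneA\tchr1\t100\t900\t+\t801\t42\n"
    "geneB\tchr1\t2000\t2500\t-\t501\t0\n"
)

if __name__ == "__main__":
    print(read_gene_counts(io.StringIO(EXAMPLE)))
```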

Best Practices

1. Use a reference and GTF that match (same organism and version)
2. STAR builds an index from the FASTA; first run may take longer
3. For differential expression, use the count matrix with tools like DESeq2 or edgeR
4. Check STAR logs for mapping rate and splice junction counts
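
For the log check in the last item, the mapping rate and splice counts can be pulled from STAR's `Log.final.out`, which stores each statistic as `name | value`. The sketch below is one way to do it; the function names are hypothetical, the key names follow STAR's usual summary wording, and the numbers in the example are invented.

```python
from typing import Dict


def parse_star_log(text: str) -> Dict[str, str]:
    """Collect 'name | value' pairs from a STAR Log.final.out file."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("|")
        if sep:  # section headings such as 'UNIQUE READS:' have no '|'
            stats[key.strip()] = value.strip()
    return stats


def mapping_summary(stats: Dict[str, str]) -> Dict[str, float]:
    """Extract the mapping rates and total splice count as numbers."""
    return {
        "uniquely_mapped_pct": float(stats["Uniquely mapped reads %"].rstrip("%")),
        "multi_mapped_pct": float(stats["% of reads mapped to multiple loci"].rstrip("%")),
        "total_splices": int(stats["Number of splices: Total"]),
    }


# Illustrative excerpt; the values are invented for the example.
EXAMPLE_LOG = """\
                          Number of input reads |\t1000
                                   UNIQUE READS:
                   Uniquely mapped reads number |\t900
                        Uniquely mapped reads % |\t90.00%
                       Number of splices: Total |\t1234
                             MULTI-MAPPING READS:
             % of reads mapped to multiple loci |\t5.00%
"""

if __name__ == "__main__":
    print(mapping_summary(parse_star_log(EXAMPLE_LOG)))
```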

Related Resources