bulk-rnaseq-nf is a bioinformatics pipeline for analysing RNA sequencing data. It takes a samplesheet and FASTQ files as input, performs lane concatenation, quality control (QC), trimming, alignment, assembly, and quantification, and prepares the results for differential expression analysis in downstream packages (e.g. DESeq2).
- Lane concatenation for samples sequenced on multiple lanes
- Adapter trimming and read QC (Trim Galore!)
- HISAT2 index generation if not readily available
- HISAT2 alignment
- Sort and index alignments (SAMtools)
- Transcript assembly and quantification (StringTie)
- Present QC for raw reads, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)
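
The lane concatenation step can be illustrated with a minimal shell sketch. It relies on the fact that gzip files can be joined byte-for-byte and still decompress as a single stream. The sample name `sampleA`, the lane file names, and the scratch directory are hypothetical and only serve the illustration; the pipeline's own preprocessing module may do this differently.

```shell
#!/usr/bin/env bash
set -eu

# Work in a scratch directory so the sketch does not touch real data.
workdir="$(mktemp -d)"

# Create two tiny per-lane FASTQ files for a hypothetical sample "sampleA".
printf '@read1\nACGT\n+\nIIII\n' | gzip -c > "$workdir/sampleA_L001_R1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip -c > "$workdir/sampleA_L002_R1.fastq.gz"

# Concatenate the lanes in sorted order; the result is still a valid gzip file.
cat "$workdir"/sampleA_L00*_R1.fastq.gz > "$workdir/sampleA_R1.merged.fastq.gz"

# Count reads in the merged file (a FASTQ record is four lines).
merged_reads=$(( $(gzip -dc "$workdir/sampleA_R1.merged.fastq.gz" | wc -l) / 4 ))
echo "merged reads: $merged_reads"
```

For paired-end data, the R1 and R2 files would need to be concatenated in the same lane order so that read pairs stay aligned between the two merged files.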
Each directory and file is structured to facilitate the processing pipeline.
- conf/: Configuration files related to the project.
- modules/: Contains sub-modules, each serving specific roles like preprocessing, alignment, and transcript assembly.
  - preprocess/: Preprocessing scripts, such as concatenating and trimming FASTQs.
  - align/: Contains scripts for indexing and alignment using HISAT2.
  - transcript_assembly/: Scripts for transcript assembly and quantification using StringTie.
  - qc: Quality control.
- workflows/: Main pipeline scripts.
- bin/: Directory for helper scripts.
- params.yaml: Configuration file specifying input parameters.
- nextflow.config: Pipeline-wide configuration settings.
nextflow run workflows/main.nf -params-file params.yaml
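
When a run is interrupted, Nextflow's `-resume` option reuses cached results from completed tasks instead of recomputing them. The wrapper below is a small sketch of that usage, not part of the pipeline: the `PARAMS_FILE` environment variable and the dry-run fallback are assumptions added for illustration. It launches the pipeline only if the parameters file exists and `nextflow` is on the `PATH`, and otherwise prints the command it would have run.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical override: PARAMS_FILE lets the caller point at another parameters file.
params_file="${PARAMS_FILE:-params.yaml}"

# Store the command in the positional parameters so each argument stays one word.
set -- nextflow run workflows/main.nf -params-file "$params_file" -resume

if [ -f "$params_file" ] && command -v nextflow >/dev/null 2>&1; then
    # Both prerequisites are present: launch (or resume) the pipeline.
    "$@"
else
    # Missing parameters file or Nextflow installation: show the command only.
    echo "Would run: $*"
fi
```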