Slurm and Snakemake for Dummies

To use Slurm on Crunchomics, first check which nodes are in use, then in the next step choose the one with the least traffic (from cn001 to cn005)

squeue
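To compare the nodes at a glance, the squeue output can be summarised per node. The sketch below is one possible way, not a tool provided by the cluster: count_jobs_per_node is a hypothetical helper name, and it assumes each running job occupies a single node, so that the %N field of squeue holds one node name per line.

```shell
# Hypothetical helper: reads one node name per line on stdin and prints
# "<count> <node>" lines, least busy node first.
count_jobs_per_node() {
    sort | uniq -c | sort -n | awk '{ print $1, $2 }'
}

# Usage on the cluster (only runs where squeue is available):
#   -h drops the header, -t RUNNING keeps running jobs, -o "%N" prints the node list.
if command -v squeue >/dev/null 2>&1; then
    squeue -h -t RUNNING -o "%N" | count_jobs_per_node
fi
```

A node with no running jobs does not appear in this list at all, so a node from cn001 to cn005 that is missing from the output is idle.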

Use salloc to allocate the compute resources you need: one node (-N 1), the node you chose (-w), the number of CPUs (--cpus-per-task) and the amount of memory (--mem)

salloc -N 1 -w omics-cn005 --cpus-per-task 30 --mem=30G

Activate the environment

conda activate rnaseq

To try out the pipeline, do a dry run in the snakemake_rnaseq folder (-n shows what would be run without executing it, -p prints the shell commands)

snakemake -np

Run the pipeline, specifying the number of cores to be used (no more than the --cpus-per-task value given to salloc)

srun snakemake --cores 30

If the run fails, unlock the working directory with the command below, then try again

srun snakemake -np --unlock
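The run and unlock steps can be combined so that a failed run leaves the directory unlocked for the next attempt. This is only a sketch: run_pipeline is a hypothetical function name, and it assumes no other snakemake instance is working in the same directory, since unlocking under a running instance is unsafe.

```shell
# Hypothetical wrapper: run the pipeline with the given number of cores and,
# if it fails, remove the stale lock so the next attempt can start.
run_pipeline() {
    cores="$1"
    if srun snakemake --cores "$cores"; then
        echo "pipeline finished"
    else
        echo "pipeline failed, unlocking the working directory" >&2
        srun snakemake --unlock
        return 1
    fi
}

# Usage inside the salloc session, from the snakemake_rnaseq folder:
#   run_pipeline 30
```

The function returns a non-zero status after a failure, so the cause still has to be found in the snakemake output before running again.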

In another terminal window, check that the pipeline is working (make sure the node is the same as in salloc; -t 1 limits this session to one minute).

srun -n 1 -t 1 --pty -w omics-cn005 bash -i
top

Check the %CPU and %MEM (RAM) columns. The job will not use more CPUs than you specified (--cpus-per-task 30 and --cores 30), but it will try to use as much RAM as it needs to complete, which will cause problems if you did not allocate enough memory (--mem=30G).
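One way to keep the two CPU settings in step is to read the allocation size from the environment instead of typing the number twice. A minimal sketch, assuming the shell is running inside the salloc session, where Slurm sets SLURM_CPUS_PER_TASK when --cpus-per-task was given; pick_cores is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of cores to hand to snakemake.
# Uses SLURM_CPUS_PER_TASK when Slurm has set it, otherwise falls back to 1.
pick_cores() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Usage inside the salloc session:
#   srun snakemake --cores "$(pick_cores)"
```

The fallback of 1 is a cautious choice for the case where the variable is not set, for example outside an allocation.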

To stop top (or any other running command), press Ctrl+C

The output files will be in two different folders:

  • Temp

In ‘mapped’ - the RNA-Seq read alignment files: *.bam

  • Results

Unscaled RNA-Seq read counts: counts.txt

Differential expression file: results.tsv (this is redundant when using the version of the pipeline which counts at the exon level)

In ‘fastp’ - the fastqc report files: *.html

In ‘logs’ - the mapping statistics: *sum.txt
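After a run, a quick check that the main result files exist can save a look through the folders. The sketch below takes the results folder as an argument, since the exact folder names and capitalisation depend on how the pipeline is set up; check_results is a hypothetical helper name.

```shell
# Hypothetical helper: report which of the expected result files are
# missing or empty. $1 is the path to the results folder.
check_results() {
    dir="$1"
    missing=0
    for f in counts.txt results.tsv; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Usage from the snakemake_rnaseq folder (the folder name is an assumption):
#   check_results results && echo "all result files present"
```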