Settings and Configuration
Environment Variables
The activation script of NGS-Pipeline sets the following environment variables in the activated shell:
- VVG_BASEDIR
- VVG_REPODIR
- VVGBIN
- NGS_PIPELINE_BASE
- NGSENV_BASEDIR
Users can set some environment variables in a profile file under the
$VVG_BASEDIR/etc/bashrc.d/ directory, in $NGSENV_BASEDIR/profile, or
directly in the shell using the ENVVAR_NAME=value notation.
The available environment variables are listed below:
| Variable name | Remarks |
|---|---|
| NGS_PIPELINE_LOGLEVEL | |
| NGS_PIPELINE_LOGFILE | |
| NGS_PIPELINE_JOBS | |
| NGS_PIPELINE_FORCE | |
| NGS_PIPELINE_NO_CONFIG_CASCADE | |
| NGS_PIPELINE_CMD_MODS | |
| SNAKEMAKE_CLUSTER_EXTRA_FLAGS | |
Configuration Files
Users can set custom configurations in YAML-format config files, usually named
config.yaml.
Cascading Configuration
By default, NGS-Pipeline applies cascading configuration.
When a Snakemake workflow process is executed, it reads the config.yaml file in
the current working directory, if that file exists.
The workflow process then also tries to read config.yaml in the parent
directory, the parent of the parent directory, and so on, until it reaches
NGSENV_BASEDIR as the root of the hierarchy.
Settings from a config.yaml in a directory farther from the current working
directory are overridden by settings from a config.yaml in a directory closer
to it (up to and including the working directory itself).
With this scheme, it is easy to arrange configurations that apply to the whole
project and then customize them for specific sample sets, down to individual
samples.
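The following Python code sketches the lookup order described above. It is a minimal illustration under stated assumptions and is not NGS-Pipeline's own code. The helper name `cascade_config_files` and its `exists` callback are hypothetical. NGSENV_BASEDIR is passed in as `basedir`. The sketch only decides which config.yaml files apply and in what order, returning them farthest first so that merging them in list order lets nearer files override farther ones.

```python
from pathlib import PurePosixPath
from typing import Callable, List


def cascade_config_files(
    workdir: str,
    basedir: str,
    exists: Callable[[PurePosixPath], bool],
) -> List[PurePosixPath]:
    """Return the config.yaml files that apply to `workdir`, farthest first.

    Hypothetical helper: walks from `workdir` up to `basedir`
    (NGSENV_BASEDIR) and keeps every config.yaml for which `exists()`
    returns True.
    """
    work = PurePosixPath(workdir)
    base = PurePosixPath(basedir)
    if work != base and base not in work.parents:
        raise ValueError(f"{work} is not inside {base}")

    # Collect directories from the working directory up to the base.
    dirs = [work]
    while dirs[-1] != base:
        dirs.append(dirs[-1].parent)

    # Reverse so the base directory (farthest from workdir) comes first.
    candidates = [d / "config.yaml" for d in reversed(dirs)]
    return [c for c in candidates if exists(c)]
```

On a real filesystem, `exists` could be `lambda p: Path(p).is_file()` using `pathlib.Path`. Passing the check in as a parameter keeps the directory walk itself free of I/O.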
To illustrate how the cascading configuration works, assume that we have the following directory layout:
NGSENV_BASEDIR
├── config.yaml (1)
└── sets
├── clinical-samples
│ ├── analysis
│ │ └── samples
│ │ ├── patient-001
│ │ │ └── config.yaml (2)
│ │ ├── patient-002
│ │ └── patient-003
│ └── config.yaml (3)
├── joint-varcall
│ └── config.yaml (4)
├── public-samples
│ └── analysis
│ └── samples
│ ├── P0001
│ └── P0002
└── study-A
├── analysis
│ └── samples
│ ├── A001
│ └── A002
└── config.yaml (5)
When the run-sample-variant-caller command is executed to perform per-sample
processing (mapping, genotyping) on the samples in the clinical-samples,
public-samples and study-A directories, a Snakemake workflow is run for each
sample, with the respective sample directory as the working directory.
For sample patient-001, the workflow encounters config.yaml (2),
then config.yaml (3), and finally the base config.yaml (1).
The settings in config.yaml (1) are overridden by any settings in
config.yaml (3), which in turn are overridden by any settings in
config.yaml (2).
Likewise, for sample A001, the applied settings are those from
config.yaml (1), overridden by those from config.yaml (5).
For sample P0001, only the settings from config.yaml (1) apply.
With this scheme, it is easy to set up general settings for all samples in
config.yaml (1). Custom settings for the in-house-sequenced study-A samples
(such as keeping the proper-paired BAM files for SRA submission) go in
config.yaml (5). Custom settings for all clinical samples (such as keeping the
final BAM files for manual inspection) go in config.yaml (3). Settings specific
to sample patient-001 alone (such as lower thresholds, because the sample is of
lower quality) go in config.yaml (2).
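The override order in this example can be sketched as a merge of the loaded config dictionaries, farthest first. This is an illustration under stated assumptions and is not NGS-Pipeline's implementation:

- `merge_cascade` is a hypothetical helper.
- The values are made up. Only deduplicate, keep_paired_bam and keep_final_bam are keys from the configuration tables, and config.yaml (1) uses their listed defaults.
- `min_depth` is a hypothetical threshold key.
- Merging nested mappings key by key is an assumed behavior.

```python
from typing import Any, Dict, Iterable

Config = Dict[str, Any]


def _deep_update(base: Config, override: Config) -> Config:
    """Return a copy of `base` with `override` applied on top."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Assumption: nested mappings are merged key by key.
            result[key] = _deep_update(result[key], value)
        else:
            # Scalars and lists from the nearer file replace farther ones.
            result[key] = value
    return result


def merge_cascade(configs: Iterable[Config]) -> Config:
    """Merge configs ordered farthest-first, so nearer files win."""
    merged: Config = {}
    for cfg in configs:
        merged = _deep_update(merged, cfg)
    return merged


# Illustrative contents of the numbered config.yaml files in the layout.
config_1 = {  # NGSENV_BASEDIR, applies to every sample
    "deduplicate": True,
    "keep_paired_bam": False,
    "keep_final_bam": False,
    "min_depth": 10,  # hypothetical threshold key
}
config_3 = {"keep_final_bam": True}   # all clinical samples
config_2 = {"min_depth": 5}           # patient-001 only, lowered threshold
config_5 = {"keep_paired_bam": True}  # study-A samples

patient_001 = merge_cascade([config_1, config_3, config_2])
a001 = merge_cascade([config_1, config_5])
p0001 = merge_cascade([config_1])
```

With these inputs:

- patient-001 ends up with keep_final_bam set to True and the lowered min_depth of 5.
- A001 keeps its proper-paired BAM files.
- P0001 uses config.yaml (1) unchanged.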
Cascading configuration can be disabled with the --no-config-cascade
argument, which most NGS-Pipeline commands accept.
Configurations
| Config Name | Remarks | Default Value | Available Values |
|---|---|---|---|
| read_trimmer_wf | | ssf_trimmer_null.smk | ssf_trimmer_fastp.smk, ssf_trimmer_cutadapt.smk |
| reads_mapper_wf | | ssf_mapper_bwa.smk | ssf_mapper_minimap2.smk, ssf_mapper_bowtie2.smk |
| variant_caller_wf | | ssf_varcall_gatk.smk | |
| joint_variant_caller_wf | | jointvarcall_gatk.smk | jointvarcall_freebayes.smk, jointvarcall_bcftools.smk, jointvarcall_clair3.smk |
| Config Name | Remarks | Default Value | Available Values |
|---|---|---|---|
| refseq_file | | | |
| refmap_file | | | |
| deduplicate | | True | False |
| keep_paired_bam | | False | True |
| keep_final_bam | | False | True |