Installation
Manual-mode
You'll need wfmash, seqwish, smoothxg,
odgi, gfaffix, bcftools and vg
in your shell's PATH
. They can be build from source, or installed via Bioconda.
Then, add the pggb
bash script to your PATH
to complete the installation.
How to add a binary to my path?
Optionally, install MultiQC for reporting or pigz to compress the output files of the pipeline.
Docker
To simplify installation and versioning, we have an automated GitHub action that pushes the current docker build to the GitHub registry. To use it, first pull the actual image (IMPORTANT: see also how to build docker locally):
docker pull ghcr.io/pangenome/pggb:latest
Or if you want to pull a specific snapshot from https://github.com/orgs/pangenome/packages/container/package/pggb:
docker pull ghcr.io/pangenome/pggb:TAG
You can pull the docker image also from dockerhub:
docker pull pangenome/pggb
As an example, going in the pggb
directory
git clone --recursive https://github.com/pangenome/pggb.git
cd pggb
you can run the container using the human leukocyte antigen (HLA) data provided in this repo:
docker run -it -v ${PWD}/data/:/data ghcr.io/pangenome/pggb:latest /bin/bash -c "pggb -i /data/HLA/DRB1-3123.fa.gz -p 70 -s 3000 -G 2000 -n 10 -t 16 -v -V 'gi|568815561:#' -o /data/out -M -C cons,100,1000,10000 -m"
The -v
argument of docker run
always expects a full path.
If you intended to pass a host directory, use absolute path.
This is taken care of by using ${PWD}
.
build docker locally
Multiple pggb
's tools use SIMD instructions that require AVX (like abPOA
) or need it to improve performance.
The currently built docker image has -Ofast -march=sandybridge
set.
This means that the docker image can run on processors that support AVX or later, improving portability, but preventing your system hardware from being fully exploited.
In practice, this could mean that specific tools are up to 9 times slower.
And that a pipeline runs ~30% slower compared to when using a native build docker image.
To achieve better performance, it is STRONGLY RECOMMENDED to build the docker image locally after replacing -march=sandybridge
with -march=native
and the Generic` build type with `Release
in the Dockerfile
:
sed -i 's/-march=sandybridge/-march=native/g' Dockerfile
sed -i 's/Generic/Release/g' Dockerfile
To build a docker image locally using the Dockerfile
, execute:
docker build --target binary -t ${USER}/pggb:latest .
Staying in the pggb
directory, we can run pggb
with the locally build image:
docker run -it -v ${PWD}/data/:/data ${USER}/pggb /bin/bash -c "pggb -i /data/HLA/DRB1-3123.fa.gz -p 70 -s 3000 -G 2000 -n 10 -t 16 -v -V 'gi|568815561:#' -o /data/out -M -C cons,100,1000,10000 -m"
A script that handles the whole building process automatically can be found at https://github.com/nf-core/pangenome#building-a-native-container <https://github.com/nf-core/pangenome#building-a-native-container>`_.
Singularity
Many managed HPCs utilize Singularity as a secure alternative to docker. Fortunately, docker images can be run through Singularity seamlessly.
First pull the docker file and create a Singularity SIF image from the dockerfile. This might take a few minutes.
singularity pull docker://ghcr.io/pangenome/pggb:latest
Next clone the pggb repo and cd into it
git clone --recursive https://github.com/pangenome/pggb.git
cd pggb
Finally, run pggb from the Singularity image. For Singularity to be able to read and write files to a directory on the host operating system, we need to 'bind' that directory using the -B option and pass the pggb command as an argument.
A script that handles the whole building process automatically can be found at https://github.com/nf-core/pangenome#building-a-native-container <https://github.com/nf-core/pangenome#building-a-native-container>`_.
Bioconda
A pggb
recipe for Bioconda
is available at https://anaconda.org/bioconda/pggb.
To install the latest version using Conda
execute:
conda install -c bioconda pggb
GUIX
git clone https://github.com/ekg/guix-genomics
cd guix-genomics
GUIX_PACKAGE_PATH=. guix package -i pggb
Nextflow
A Nextflow DSL2 port of pggb
is actively developed by the nf-core community.
See nf-core/pangenome for more details. The aim is to implement a cluster-scalable version of pggb
.
The Nextflow version can run the precise base-level alignment step of wfmash
in parallel across the nodes of a cluster.
This makes it already faster than this bash implementation.