.. _installation:

############
Installation
############

Manual-mode
===========

.. |br| raw:: html

   <br />

You'll need ``wfmash``, ``seqwish``, ``smoothxg``, ``odgi``, and ``gfaffix`` in your shell's ``PATH``. They can be built from source or installed via Bioconda. Then, add the ``pggb`` bash script to your ``PATH`` to complete the installation.

|br|

Optionally, install ``bcftools``, ``vcfbub``, ``vcfwave``, and ``vg`` for calling and normalizing variants, ``MultiQC`` for generating summarized statistics in a MultiQC report, or ``pigz`` to compress the output files of the pipeline.

Docker
======

To simplify installation and versioning, an automated GitHub action pushes the current docker build to the GitHub registry. To use it, first pull the image (**IMPORTANT**: see also how to :ref:`build_docker_locally`):

.. code-block:: bash

    docker pull ghcr.io/pangenome/pggb:latest

Or, to pull a specific snapshot from https://github.com/orgs/pangenome/packages/container/package/pggb:

.. code-block:: bash

    docker pull ghcr.io/pangenome/pggb:TAG

You can also pull the docker image from Docker Hub:

.. code-block:: bash

    docker pull pangenome/pggb

As an example, clone the repository and change into the ``pggb`` directory:

.. code-block:: bash

    git clone --recursive https://github.com/pangenome/pggb.git
    cd pggb

Then run the container using the human leukocyte antigen (HLA) data provided in this repo:

.. code-block:: bash

    docker run -it -v ${PWD}/data/:/data ghcr.io/pangenome/pggb:latest /bin/bash -c "pggb -i /data/HLA/DRB1-3123.fa.gz -p 70 -s 3000 -n 10 -t 16 -V 'gi|568815561' -o /data/out"

The ``-v`` argument of ``docker run`` always expects a full path. **If you intend to pass a host directory, use an absolute path.** This is taken care of by using ``${PWD}``.

.. _build_docker_locally:

build docker locally
--------------------

Multiple ``pggb`` tools use SIMD instructions that require AVX (like ``abPOA``) or need it to improve performance. The currently built docker image has ``-Ofast -march=sandybridge`` set.
This means that the docker image can run on processors that support AVX or later, improving portability, but preventing your system hardware from being fully exploited. In practice, this could mean that specific tools are up to 9 times slower, and that a pipeline runs ~30% slower than with a natively built docker image.

To achieve better performance, it is **STRONGLY RECOMMENDED** to build the docker image locally after replacing ``-march=sandybridge`` with ``-march=native`` and the ``Generic`` build type with ``Release`` in the ``Dockerfile``:

.. code-block:: bash

    sed -i 's/-march=sandybridge/-march=native/g' Dockerfile
    sed -i 's/Generic/Release/g' Dockerfile

To build a docker image locally using the ``Dockerfile``, execute:

.. code-block:: bash

    docker build --target binary -t ${USER}/pggb:latest .

Staying in the ``pggb`` directory, we can run ``pggb`` with the locally built image:

.. code-block:: bash

    docker run -it -v ${PWD}/data/:/data ${USER}/pggb /bin/bash -c "pggb -i /data/HLA/DRB1-3123.fa.gz -p 70 -s 3000 -n 10 -t 16 -V 'gi|568815561' -o /data/out"

A script that handles the whole building process automatically can be found at https://github.com/nf-core/pangenome#building-a-native-container.

Singularity
===========

Many managed HPCs use Singularity as a secure alternative to docker. Fortunately, docker images can be run through Singularity seamlessly.

First, pull the docker image and create a Singularity SIF image from it. This might take a few minutes.

.. code-block:: bash

    singularity pull docker://ghcr.io/pangenome/pggb:latest

Next, clone the ``pggb`` repo and ``cd`` into it:

.. code-block:: bash

    git clone --recursive https://github.com/pangenome/pggb.git
    cd pggb

Finally, run ``pggb`` from the Singularity image. For Singularity to be able to read and write files in a directory on the host operating system, we need to 'bind' that directory using the ``-B`` option and pass the ``pggb`` command as an argument.

.. code-block:: bash

    singularity run -B ${PWD}/data:/data ../pggb_latest.sif "pggb -i /data/HLA/DRB1-3123.fa.gz -p 70 -s 3000 -n 10 -t 16 -V 'gi|568815561' -o /data/out"

A script that handles the whole building process automatically can be found at https://github.com/nf-core/pangenome#building-a-native-container.

Bioconda
========

A ``pggb`` recipe for ``Bioconda`` is available at https://anaconda.org/bioconda/pggb. To install the latest version using ``Conda``, execute:

.. code-block:: bash

    conda install -c bioconda pggb

GUIX
====

.. code-block:: bash

    git clone https://github.com/ekg/guix-genomics
    cd guix-genomics
    GUIX_PACKAGE_PATH=. guix package -i pggb

Nextflow
========

A Nextflow DSL2 port of ``pggb`` is actively developed by the nf-core community. See `nf-core/pangenome <https://github.com/nf-core/pangenome>`_ for more details. The aim is to implement a cluster-scalable version of ``pggb``. The Nextflow version can run the precise base-level alignment step of ``wfmash`` in parallel across the nodes of a cluster. This already makes it faster than this ``bash`` implementation.
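
Independently of Nextflow, if you installed ``pggb`` outside a container (manual-mode, Bioconda, or GUIX), it can help to confirm that the required tools named under Manual-mode are actually reachable from your shell. The following is only a minimal sketch: the ``missing_tools`` helper and the ``required`` array are hypothetical names introduced here for illustration and are not part of ``pggb``. It relies solely on the shell built-in ``command -v``.

.. code-block:: bash

    #!/usr/bin/env bash
    # Hypothetical helper (not part of pggb): print every given command
    # name that cannot be resolved on PATH, one name per line.
    missing_tools() {
        local tool
        for tool in "$@"; do
            # `command -v` succeeds only if the shell can resolve the name.
            command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
        done
    }

    # Required tools plus the pggb script itself, as listed under Manual-mode.
    required=(wfmash seqwish smoothxg odgi gfaffix pggb)

    missing=$(missing_tools "${required[@]}")
    if [ -z "$missing" ]; then
        echo "All required pggb tools were found on PATH."
    else
        # Unquoted on purpose so the names are joined on one line.
        echo "Not found on PATH:" $missing
    fi

The same helper can be called with the optional tools (for example ``missing_tools bcftools vcfbub vcfwave vg pigz``) if you plan to call variants or compress the output.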