.. meta::
:description: pggb: the pangenome graph builder
:keywords: variation graph, pangenome graph
==================================
Welcome to the PGGB world!
==================================
In standard genomic approaches, sequences are related to a single linear reference genome, which introduces reference bias.
`Pangenome graphs `__ encoded in the variation graph data model describe the all-versus-all alignment of many sequences.
`pggb `_ renders a collection of sequences into a pangenome graph, in the variation graph model.
Its goal is to build a graph that is locally directed and acyclic while preserving large-scale variation.
Maintaining local linearity is important for the interpretation, visualization, and reuse of pangenome variation graphs.
Core packages
=============
.. |wfmash| image:: img/wfmash.png
:target: rst/wfmash.html
.. |seqwish| image:: img/seqwish.jpg
:target: rst/seqwish.html
.. |smoothxg| image:: img/smoothxg.png
:target: rst/smoothxg.html
.. list-table::
:widths: 100 100
:align: center
* - |wfmash|
- **Pairwise sequence alignment with** `wfmash `_
+ `mashmap `_ variant for approximate mappings
+ `wavefront-guided `_ global alignment for long sequences
+ `wavefront `_ algorithm for base-level alignment
+ Pairwise alignments in `PAF `_ format
* - |seqwish|
- **Graph induction with** `seqwish `_
+ Build alignment graph with interval trees
+ Compute transitive closure of bases
+ Path tracing yields variation graph
+ Raw pangenome graph in `GFAv1 `_ format
* - |smoothxg|
- **Graph normalization with** `smoothxg `_
+ Global graph sorting with `PG-SGD `_
+ Break graph into blocks
+ Smooth blocks via `POA `_
+ Graph has partial local order
+ Smoothed graph in `GFAv1 `_ format
..
+ Consensus paths and graph
+ Whole genome alignment in `MAF `_ format
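The "transitive closure of bases" step can be illustrated with a small union-find sketch. This is a toy model rather than seqwish's implementation: it assumes all input sequences are laid end to end so that every base has one integer position, and that pairwise alignments have already been reduced to pairs of matching positions. The function name ``transitive_closure`` and the example data are hypothetical.

.. code-block:: python

   def transitive_closure(num_bases, matches):
       """Group base positions that are connected by pairwise matches."""
       parent = list(range(num_bases))

       def find(x):
           # Follow parent links to the representative, halving the path.
           while parent[x] != x:
               parent[x] = parent[parent[x]]
               x = parent[x]
           return x

       for a, b in matches:
           root_a, root_b = find(a), find(b)
           if root_a != root_b:
               # Keep the smaller position as the representative.
               parent[max(root_a, root_b)] = min(root_a, root_b)

       classes = {}
       for pos in range(num_bases):
           classes.setdefault(find(pos), []).append(pos)
       return sorted(classes.values())


   # Three toy sequences of three bases each: positions 0-2, 3-5, 6-8.
   # Positions 0 and 6 are never aligned directly, only through 3.
   matches = [(0, 3), (3, 6), (1, 4), (5, 8)]
   closure = transitive_closure(9, matches)
   print(closure)  # [[0, 3, 6], [1, 4], [2], [5, 8], [7]]

Each resulting class corresponds to one base in the graph. Positions 0, 3, and 6 collapse into a single class even though only two of the three possible pairs were aligned, which is what makes the closure transitive.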
Contributed packages
====================
The pipeline also supports identifying and collapsing redundant structure with `GFAffix `_.
Optional post-processing steps with `ODGI `_ provide 1D and 2D diagnostic visualizations of the graph and basic graph metrics.
Variant calling is also possible with `vg `_ ``deconstruct`` to obtain a VCF file relative to any set of reference sequences used in the construction.
It uses a `path Jaccard concept `_ to correctly localize variants in segmental duplications and variable number tandem repeats.
In the HPRC data, this greatly improved variant calling performance.
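The set-similarity measure behind the path Jaccard idea can be sketched as follows. This is only an illustration of Jaccard similarity applied to node contexts, not the scoring used inside ``vg deconstruct``: the node IDs, the ``copy_A``/``copy_B`` labels, and the helper names ``jaccard`` and ``best_context`` are all hypothetical.

.. code-block:: python

   def jaccard(a, b):
       """Jaccard similarity of two collections of node IDs."""
       a, b = set(a), set(b)
       if not a and not b:
           return 1.0  # two empty contexts are treated as identical
       return len(a & b) / len(a | b)


   def best_context(query_nodes, candidates):
       """Return the candidate whose node context best matches the query."""
       return max(candidates, key=lambda name: jaccard(query_nodes, candidates[name]))


   # Nodes surrounding a variant site on a non-reference path.
   query = [10, 11, 12, 13]
   # Node contexts of two copies of a duplicated region on the reference.
   candidates = {
       "copy_A": [10, 11, 12, 20],
       "copy_B": [30, 31, 12, 13],
   }
   print(jaccard(query, candidates["copy_A"]))  # 0.6
   print(best_context(query, candidates))       # copy_A

In a repetitive region, several reference locations can touch the same site. Comparing the surrounding node context, rather than the site alone, gives a way to prefer the copy that the query path actually follows.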
The output graph (``*.smooth.fix.gfa``) is suitable for read mapping in `vg `_ or with `GraphAligner `_.
A Nextflow version of ``pggb`` is currently being developed at `nf-core/pangenome `_.
That implementation scales better on a cluster.
Pipeline Workflow
=================
.. image:: img/pggb-flow-diagram.png
.. toctree::
:maxdepth: 1
:hidden:
Welcome
rst/installation
rst/quick_start
rst/tutorials
rst/faqs
.. toctree::
:maxdepth: 1
:caption: Parameters
:hidden:
rst/essential_parameters
rst/optional_parameters
rst/organism_example_parameters
rst/larger_pangenomes_parameters
.. toctree::
:maxdepth: 1
:caption: Core Packages
:hidden:
rst/wfmash
rst/seqwish
rst/smoothxg
Citation
--------
| Erik Garrison*, Andrea Guarracino*, Simon Heumos, Flavia Villani, Zhigui Bao, Lorenzo Tattini, Jörg Hagmann, Sebastian Vorbrugg, Santiago Marco-Sola, Christian Kubica, David G. Ashbrook, Kaisa Thorell, Rachel L. Rusholme-Pilcher, Gianni Liti, Emilio Rudbeck, Sven Nahnsen, Zuyu Yang, Mwaniki N. Moses, Franklin L. Nobrega, Yi Wu, Hao Chen, Joep de Ligt, Peter H. Sudmant, Nicole Soranzo, Vincenza Colonna, Robert W. Williams, Pjotr Prins, `Building pangenome graphs `_, bioRxiv 2023.04.05.535718; doi: https://doi.org/10.1101/2023.04.05.535718
------
Index
------
* :ref:`genindex`
* :ref:`search`