Welcome to the PGGB world!

In standard genomic approaches, sequences are related to a single linear reference genome, which introduces reference bias. Pangenome graphs encoded in the variation graph data model instead describe the all-versus-all alignment of many sequences.

pggb renders a collection of sequences into a pangenome graph, in the variation graph model. Its goal is to build a graph that is locally directed and acyclic while preserving large-scale variation. Maintaining local linearity is important for the interpretation, visualization, and reuse of pangenome variation graphs.

Core packages

wfmash

Pairwise sequence alignment with wfmash

seqwish

Graph induction with seqwish

  • Build alignment graph with interval trees

  • Compute transitive closure of bases

  • Path tracing yields variation graph

  • Raw pangenome graph in GFAv1 format
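The transitive closure step can be illustrated with a toy union-find: every base of every input sequence is an element, each aligned pair of bases is merged, and each resulting equivalence class becomes one base of the graph. This is only a minimal sketch under stated assumptions. The function names, the match tuple layout, and the example sequences are hypothetical, and seqwish itself relies on interval trees rather than the in-memory structure shown here.

```python
# Toy illustration of transitive closure over aligned bases.
# NOT seqwish's implementation; names and data layout are assumptions.

def find(parent, x):
    """Return the representative of x, shortening the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the classes of a and b, keeping the smaller index as root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def close_bases(seqs, matches):
    """Group bases that are connected through any chain of matches.

    seqs: dict mapping sequence name -> sequence string.
    matches: list of (name_a, pos_a, name_b, pos_b, length) match runs.
    Returns a dict mapping (name, pos) -> class id.
    """
    bases = [(name, i) for name, s in seqs.items() for i in range(len(s))]
    index = {b: k for k, b in enumerate(bases)}
    parent = list(range(len(bases)))
    for name_a, pos_a, name_b, pos_b, length in matches:
        for off in range(length):
            union(parent,
                  index[(name_a, pos_a + off)],
                  index[(name_b, pos_b + off)])
    return {b: find(parent, index[b]) for b in bases}


# Hypothetical input: x aligns to y and x aligns to z, but y and z
# are never aligned to each other directly.
seqs = {"x": "ACGT", "y": "ACGA", "z": "CGT"}
matches = [("x", 0, "y", 0, 3), ("x", 1, "z", 0, 3)]
classes = close_bases(seqs, matches)
n_classes = len(set(classes.values()))
```

In this example, base 1 of `y` and base 0 of `z` end up in the same class even though they were only ever aligned to `x`, which is the point of taking the closure.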

smoothxg

Graph normalization with smoothxg

  • Global graph sorting with PG-SGD

  • Break graph into blocks

  • Smooth blocks via POA

  • Graph has partial local order

  • Smoothed graph in GFAv1 format

Contributed packages

The pipeline also supports identification and collapse of redundant structure with GFAffix. Optional post-processing steps with ODGI provide 1D and 2D diagnostic visualizations of the graph and basic graph metrics. Variant calling is also possible with vg deconstruct, which produces a VCF file relative to any set of reference sequences used in the construction. vg deconstruct uses a path Jaccard concept to correctly localize variants in segmental duplications and variable number tandem repeats. In the HPRC data, this greatly improved variant calling performance.
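The path Jaccard idea can be sketched as a similarity score between the node content of two path segments: the size of their intersection divided by the size of their union. The multiset variant below, the function name, and the node IDs are illustrative assumptions, not the code used by vg deconstruct.

```python
from collections import Counter


def path_jaccard(path_a, path_b):
    """Multiset Jaccard similarity between two lists of node IDs."""
    a, b = Counter(path_a), Counter(path_b)
    inter = sum((a & b).values())  # per-node minimum of the two counts
    union = sum((a | b).values())  # per-node maximum of the two counts
    return inter / union if union else 1.0


# Hypothetical node walks through the same region of a graph.
ref = [1, 2, 3, 4]
near = [1, 2, 5, 4]      # differs from ref at one node
far = [7, 8, 9]          # shares no nodes with ref
repeat = [1, 2, 2, 3]    # visits node 2 twice, as in a tandem repeat

score_near = path_jaccard(ref, near)
score_far = path_jaccard(ref, far)
score_repeat = path_jaccard(repeat, [1, 2, 3])
```

Counting nodes as a multiset means that a walk through extra copies of a repeat scores lower than an exact match, so in this toy setting a candidate would be matched to the context with the highest score.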

The output graph (*.smooth.fix.gfa) is suitable for read mapping in vg or with GraphAligner.
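Since the output is plain GFAv1 text, a few lines of Python are enough to get a first look at it. The sketch below counts segment (`S`), link (`L`), and path (`P`) records and sums the segment sequence lengths. The function name and the tiny example graph are hypothetical, and the sketch is no substitute for the graph metrics that ODGI reports.

```python
def summarize_gfa(lines):
    """Count S, L and P records and total segment bases in GFAv1 lines."""
    stats = {"segments": 0, "links": 0, "paths": 0, "bases": 0}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record = fields[0]
        if record == "S":
            stats["segments"] += 1
            # In GFAv1 a '*' means the sequence is not stored in the file.
            if fields[2] != "*":
                stats["bases"] += len(fields[2])
        elif record == "L":
            stats["links"] += 1
        elif record == "P":
            stats["paths"] += 1
    return stats


# Hypothetical three-node graph with two paths.
example = [
    "H\tVN:Z:1.0",
    "S\t1\tACG",
    "S\t2\tT",
    "S\t3\tGA",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "P\tsample1\t1+,2+,3+\t*",
    "P\tsample2\t1+,3+\t*",
]
stats = summarize_gfa(example)
```

Because the function accepts any iterable of lines, an open file handle on a `*.smooth.fix.gfa` file can be passed in directly.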

A Nextflow version of pggb is being developed as nf-core/pangenome. This pipeline offers an implementation that scales better on a cluster.

Pipeline Workflow

[Figure: pggb flow diagram (_images/pggb-flow-diagram.png)]

Citation

Erik Garrison*, Andrea Guarracino*, Simon Heumos, Flavia Villani, Zhigui Bao, Lorenzo Tattini, Jörg Hagmann, Sebastian Vorbrugg, Santiago Marco-Sola, Christian Kubica, David G. Ashbrook, Kaisa Thorell, Rachel L. Rusholme-Pilcher, Gianni Liti, Emilio Rudbeck, Sven Nahnsen, Zuyu Yang, Mwaniki N. Moses, Franklin L. Nobrega, Yi Wu, Hao Chen, Joep de Ligt, Peter H. Sudmant, Nicole Soranzo, Vincenza Colonna, Robert W. Williams, Pjotr Prins, Building pangenome graphs, bioRxiv 2023.04.05.535718; doi: https://doi.org/10.1101/2023.04.05.535718
