Welcome to the PGGB world!

In standard genomic approaches, sequences are related to a single linear reference genome, which introduces reference bias. Pangenome graphs encoded in the variation graph data model instead describe the all-versus-all alignment of many sequences.

pggb renders a collection of sequences into a pangenome graph, in the variation graph model. Its goal is to build a graph that is locally directed and acyclic while preserving large-scale variation. Maintaining local linearity is important for the interpretation, visualization, and reuse of pangenome variation graphs.

Core packages

Pairwise sequence alignment with wfmash

mashmap variant for approximate mappings
wavefront-guided global alignment for long sequences
wavefront algorithm for base-level alignment
Pairwise alignments in PAF format
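
As a minimal sketch of what consuming the PAF output can look like, the snippet below parses the 12 mandatory PAF columns of a single record and derives a simple block identity. The names PafRecord, parse_paf_line and block_identity are illustrative helpers and are not part of wfmash; any optional tags after column 12 are ignored here.

```python
from typing import NamedTuple


class PafRecord(NamedTuple):
    """The 12 mandatory columns of a PAF line (hypothetical helper type)."""
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int
    block_length: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse one tab-separated PAF line, ignoring optional trailing tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 PAF columns, got {len(fields)}")
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]),
        fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]),
    )


def block_identity(rec: PafRecord) -> float:
    """Matching bases divided by alignment block length (0.0 if empty)."""
    return rec.matches / rec.block_length if rec.block_length else 0.0


# Made-up record for illustration only.
example = "seqA\t1000\t0\t500\t+\tseqB\t1200\t100\t600\t450\t500\t60"
record = parse_paf_line(example)
print(record.query_name, record.target_name, block_identity(record))
```

Keeping the record as a named tuple makes later filtering, for example by identity or block length, straightforward without pulling in extra dependencies.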

Graph induction with seqwish

Build alignment graph with interval trees
Compute transitive closure of bases
Path tracing yields variation graph
Raw pangenome graph in GFAv1 format
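
The idea of a transitive closure of bases can be illustrated with a toy union-find structure: every base of every input sequence starts as its own element, and each aligned pair of bases is merged into one class, so bases that were never aligned directly still end up together if a chain of alignments connects them. This is only a sketch under simplifying assumptions (in-memory, exact-match blocks given as start positions and a length); it is not how seqwish itself is implemented, and the names UnionFind and closure_classes are hypothetical.

```python
class UnionFind:
    """Minimal disjoint-set structure with path halving."""

    def __init__(self, size):
        self.parent = list(range(size))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the smaller index as the representative.
            self.parent[max(root_a, root_b)] = min(root_a, root_b)


def closure_classes(seqs, matches):
    """Group bases into classes implied by pairwise match blocks.

    seqs:    dict mapping sequence name -> sequence string
    matches: list of (name_a, start_a, name_b, start_b, length) blocks,
             assumed to be gap-free runs of matching bases
    Returns a list of classes, each a list of global base offsets.
    """
    offsets, total = {}, 0
    for name, seq in seqs.items():
        offsets[name] = total
        total += len(seq)

    uf = UnionFind(total)
    for name_a, start_a, name_b, start_b, length in matches:
        for i in range(length):
            uf.union(offsets[name_a] + start_a + i,
                     offsets[name_b] + start_b + i)

    classes = {}
    for idx in range(total):
        classes.setdefault(uf.find(idx), []).append(idx)
    return list(classes.values())


# x is aligned to y, and y to z; x and z are never aligned directly.
seqs = {"x": "ACGT", "y": "ACGT", "z": "ACGT"}
matches = [("x", 0, "y", 0, 4), ("y", 0, "z", 0, 4)]
for members in closure_classes(seqs, matches):
    print(members)
```

In this toy input each class collects one base from x, y and z, which corresponds to a single base of the induced graph shared by all three sequences.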

Graph normalization with smoothxg

Global graph sorting with PG-SGD
Break graph into blocks
Smooth blocks via POA
Graph has partial local order
Smoothed graph in GFAv1 format
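
To make the GFAv1 output more concrete, here is a small sketch that reads segment (S), link (L) and path (P) records from GFAv1 text and spells out the sequence of each path. It assumes blunt-ended links (0M overlaps) and handles only these three record types; the functions read_gfa and spell_path and the tiny EXAMPLE_GFA graph are made up for illustration and are not part of smoothxg or seqwish.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_gfa(lines):
    """Collect S, L and P records from GFAv1 lines; other records are skipped."""
    segments, links, paths = {}, [], {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S":
            segments[fields[1]] = fields[2]
        elif fields[0] == "L":
            # (from, from_orientation, to, to_orientation)
            links.append((fields[1], fields[2], fields[3], fields[4]))
        elif fields[0] == "P":
            # Each step is a segment name followed by '+' or '-'.
            paths[fields[1]] = [(s[:-1], s[-1]) for s in fields[2].split(",")]
    return segments, links, paths


def spell_path(segments, steps):
    """Concatenate segment sequences along a path, assuming no overlaps."""
    parts = []
    for name, orientation in steps:
        seq = segments[name]
        parts.append(seq if orientation == "+" else reverse_complement(seq))
    return "".join(parts)


# A made-up bubble: two haplotypes differing by one base.
EXAMPLE_GFA = """H\tVN:Z:1.0
S\t1\tAC
S\t2\tG
S\t3\tT
S\t4\tTT
L\t1\t+\t2\t+\t0M
L\t1\t+\t3\t+\t0M
L\t2\t+\t4\t+\t0M
L\t3\t+\t4\t+\t0M
P\thap1\t1+,2+,4+\t*
P\thap2\t1+,3+,4+\t*
"""

segments, links, paths = read_gfa(EXAMPLE_GFA.splitlines())
for name, steps in paths.items():
    print(name, spell_path(segments, steps))
```

Spelling each path back into a sequence is a quick sanity check that a graph still encodes its input sequences.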

Contributed packages

The pipeline supports identification and collapse of redundant structure with GFAffix. Optional post-processing steps with ODGI provide 1D and 2D diagnostic visualizations of the graph and basic graph metrics. Variant calling is also possible with vg deconstruct, which produces a VCF file relative to any set of reference sequences used in the construction. vg deconstruct uses a path Jaccard concept to correctly localize variants in segmental duplications and variable number tandem repeats. In the HPRC data, this greatly improved variant calling performance.
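
The intuition behind a Jaccard-based comparison of paths can be sketched as follows: when a haplotype passes through a repeated region, the set of graph nodes it visits around a site is compared with the node sets of candidate reference contexts, and the candidate with the highest Jaccard similarity is taken as the matching copy. This is a simplified illustration on plain node sets, not the procedure implemented in vg deconstruct; the function names and the node identifiers are hypothetical.

```python
def jaccard(nodes_a, nodes_b):
    """Jaccard similarity of two collections of node identifiers."""
    set_a, set_b = set(nodes_a), set(nodes_b)
    if not set_a and not set_b:
        return 1.0  # Two empty contexts are treated as identical.
    return len(set_a & set_b) / len(set_a | set_b)


def best_matching_reference(query_context, reference_contexts):
    """Return the name of the reference context most similar to the query.

    query_context:      node identifiers visited by the query path near a site
    reference_contexts: dict mapping a label -> node identifiers visited by
                        one candidate reference context
    """
    return max(
        reference_contexts,
        key=lambda name: jaccard(query_context, reference_contexts[name]),
    )


# Made-up contexts: two copies of a duplication share nodes 1 to 3.
query = [1, 2, 3, 7, 8]
references = {
    "copy_a": [1, 2, 3, 4, 5],
    "copy_b": [1, 2, 3, 7, 9],
}
print(best_matching_reference(query, references))
```

Here the query shares more of its flanking nodes with copy_b than with copy_a, so copy_b is selected even though both copies overlap the query on the repeated nodes.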

The output graph (*.smooth.fix.gfa) is suitable for read mapping in vg or with GraphAligner.

A Nextflow version of pggb is currently being developed at nf-core/pangenome. That implementation scales better on a cluster.

Pipeline Workflow

Citation

Erik Garrison*, Andrea Guarracino*, Simon Heumos, Flavia Villani, Zhigui Bao, Lorenzo Tattini, Jörg Hagmann, Sebastian Vorbrugg, Santiago Marco-Sola, Christian Kubica, David G. Ashbrook, Kaisa Thorell, Rachel L. Rusholme-Pilcher, Gianni Liti, Emilio Rudbeck, Sven Nahnsen, Zuyu Yang, Mwaniki N. Moses, Franklin L. Nobrega, Yi Wu, Hao Chen, Joep de Ligt, Peter H. Sudmant, Nicole Soranzo, Vincenza Colonna, Robert W. Williams, Pjotr Prins, Building pangenome graphs, bioRxiv 2023.04.05.535718; doi: https://doi.org/10.1101/2023.04.05.535718
