Description

Methods

The MAF was exported from the HPRC v1.0 Minigraph-Cactus HAL file with cactus v2.6.4, then filtered with mafDuplicateFilter so that each genome contributes at most one row per alignment block:

cactus-hal2maf ./js ./hprc-v1.0-mc-grch38.hal \
  hprc-v1.0-mc-grch38.maf.gz --noAncestors --refGenome GRCh38 \
  --filterGapCausingDupes --chunkSize 100000 --batchCores 96 \
  --batchCount 10 --batchParallelTaf 32 \
  --batchSystem slurm --logFile hprc-v1.0-mc-grch38.maf.gz.log

zcat hprc-v1.0-mc-grch38.maf.gz | mafDuplicateFilter -m - -k \
   | bgzip > hprc-v1.0-mc-grch38-single-copy.maf.gz
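
The filtered file can be checked with a small script. The sketch below is not part of the original pipeline: count_dup_blocks is a hypothetical helper that reads MAF text on standard input and prints the number of alignment blocks in which some genome appears on more than one "s" line. It assumes the genome name is the part of the source field before the first "."; haplotype-resolved names may need a different key.

```shell
# Hypothetical helper (not part of cactus or mafTools): count MAF blocks
# that contain more than one row for the same genome.
# Assumption: the genome name is the text before the first '.' in field 2.
count_dup_blocks() {
  awk '
    # An "a" line starts a new block: tally the previous one, then reset.
    /^a/ { if (dup) n++; split("", seen); dup = 0; next }
    # An "s" line is one aligned row: flag the block if its genome repeats.
    /^s/ { split($2, part, "."); if (seen[part[1]]++) dup = 1 }
    # Tally the final block and print the count (0 if none).
    END  { if (dup) n++; print n + 0 }
  '
}
```

For example, zcat hprc-v1.0-mc-grch38-single-copy.maf.gz | count_dup_blocks would be expected to print 0 if the filter behaved as described and the naming assumption holds.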

Credits

Thank you to Glenn Hickey for providing the HAL file from the HPRC project.

References

Wen-Wei Liao, Mobin Asri, Jana Ebler, ... et al., Heng Li, Benedict Paten. A draft human pangenome reference. Nature. 2023 May;617(7960):312-324. PMID: 37165242; PMC: PMC10172123; DOI: 10.1038/s41586-023-05896-x

Glenn Hickey, Jean Monlong, Jana Ebler, Adam M Novak, Jordan M Eizenga, Yan Gao; Human Pangenome Reference Consortium; Tobias Marschall, Heng Li, Benedict Paten. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nature Biotechnology. 2023 May 10. PMID: 37165083; DOI: 10.1038/s41587-023-01793-w

Armstrong J, Hickey G, Diekhans M, Fiddes IT, Novak AM, Deran A, Fang Q, Xie D, Feng S, Stiller J et al. Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature. 2020 Nov;587(7833):246-251. PMID: 33177663; PMC: PMC7673649; DOI: 10.1038/s41586-020-2871-y

Paten B, Earl D, Nguyen N, Diekhans M, Zerbino D, Haussler D. Cactus: Algorithms for genome multiple sequence alignment. Genome Res. 2011 Sep;21(9):1512-28. PMID: 21665927; PMC: PMC3166836; DOI: 10.1101/gr.123356.111