Description

The Arquivo Brasileiro Online de Mutações (ABraOM) provides genomic variants obtained by whole-genome sequencing of SABE, a census-based sample of elderly individuals from São Paulo, Brazil's largest city. The Brazilian population results from ~500 years of admixture between Africans, Europeans, and Native Americans. In addition, ~3% of the individuals in the cohort have non-admixed Japanese ancestry, reflecting early 20th century migration. The average sequencing coverage is 38.6x. Transposable elements (TEs), HLA alleles, and new sequence are also available.

Data Access

The data can be explored interactively with the Table Browser or the Data Integrator. For programmatic access, our REST API can be used; the track name is abraom. For bulk download, the VCF file can be obtained from our download server.
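As a minimal sketch of programmatic access, the Python snippet below builds a request URL for the REST API's getData/track endpoint and fetches the result as JSON. The track name abraom comes from this page; the genome name hg38 is an assumption based on the reads being mapped to GRCh38, and the helper names build_track_url and fetch_track are hypothetical. Coordinates are assumed to follow the usual convention of a 0-based start and a 1-based end.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(chrom, start, end, genome="hg38", track="abraom"):
    """Return a getData/track URL for one region.

    genome="hg38" is an assumption (the Methods state GRCh38 mapping);
    track="abraom" is the track name given on this page.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", str(start)),
        ("end", str(end)),
    ]
    # Join the parameters with semicolons, percent-encoding each value.
    query = ";".join(f"{key}={quote(value)}" for key, value in params)
    return f"{API_BASE}/getData/track?{query}"


def fetch_track(chrom, start, end, **kwargs):
    """Fetch the region and return the decoded JSON response as a dict."""
    url = build_track_url(chrom, start, end, **kwargs)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


# Example (requires network access):
#   data = fetch_track("chr1", 1_000_000, 1_010_000)
#   print(sorted(data.keys()))
```

Keeping the region small is advisable, since the size of the response grows with the number of variants in the requested interval.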

The original data can also be downloaded from the ABraOM website.

Methods

For academic use only. Licensing for commercial use may be available upon request and agreement. By using this resource you agree to cite the flagship paper (Naslavsky et al. Nat Commun 2022).

Whole-genome sequencing was performed at Human Longevity Inc. using TruSeq Nano DNA HT libraries sequenced on Illumina HiSeq X instruments with 150 bp paired-end reads, targeting 30x coverage. Reads were mapped to GRCh38 using ISIS software.

Sample sex was validated by comparing the counts per million (CPM) of X chromosome and male-specific Y (MSY) reads relative to autosomes, yielding the expected female (~55,000 X CPM, <200 MSY CPM) and male (~27,500 X CPM, >550 MSY CPM) patterns.

Germline SNVs and indels were called following GATK Best Practices (GATK v3.7) via per-sample GVCFs (HaplotypeCaller), joint genotyping (CombineGVCFs, GenotypeGVCFs), and Variant Quality Score Recalibration (VQSR-AS). Multiallelic variants were split with an in-house script and left-aligned with BCFtools. Variants were annotated using ANNOVAR and custom scripts against dbSNP, 1000 Genomes, and gnomAD, and putative loss-of-function variants were identified using LOFTEE v0.3-beta irrespective of confidence labels.

Variant and genotype quality was further assessed using the in-house CEGH-Filter, a two-step algorithm based on depth and allele balance. Analyses retained only GATK VQSR-AS PASS variants and higher-confidence CEGH-Filter calls.

Relatedness was assessed using KING and PC-Relate (GENESIS), retaining a single proband per related pair. One contaminated sample (>3% by verifyBamID) was excluded, resulting in a final dataset of 1,171 unrelated individuals. Mean coverage of the final samples ranged from 31.3x to 64.8x, with an average of 38.65x and a median of 36.6x.

The makeDoc file of the track documents how all source files of the varFreqs track were converted. For some tracks, Python scripts were necessary and are also available from GitHub.
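The sex validation step described in the Methods can be illustrated with a short Python sketch. It is not the pipeline used for this dataset: the function names are hypothetical, and normalizing by the total number of autosomal reads is an assumption about what "relative to autosomes" means. The only values taken from the text are the MSY thresholds (<200 CPM for females, >550 CPM for males); samples falling between them are reported as ambiguous.

```python
def cpm(region_reads, autosomal_reads):
    """Counts per million: region reads per one million autosomal reads.

    Normalizing by autosomal reads is an assumption; the exact
    normalization used by the authors is not given on this page.
    """
    if autosomal_reads <= 0:
        raise ValueError("autosomal_reads must be positive")
    return region_reads * 1_000_000 / autosomal_reads


def infer_sex(x_reads, msy_reads, autosomal_reads,
              female_msy_max=200.0, male_msy_min=550.0):
    """Classify a sample from its MSY CPM, using the thresholds quoted above.

    Returns (label, x_cpm, msy_cpm); label is "female", "male", or
    "ambiguous". X CPM is returned for inspection against the expected
    ~55,000 (female) and ~27,500 (male) values, but is not used to classify.
    """
    x_cpm = cpm(x_reads, autosomal_reads)
    msy_cpm = cpm(msy_reads, autosomal_reads)
    if msy_cpm < female_msy_max:
        label = "female"
    elif msy_cpm > male_msy_min:
        label = "male"
    else:
        label = "ambiguous"
    return label, x_cpm, msy_cpm


# Hypothetical read counts, chosen only to illustrate the three outcomes.
for sample in [(55_000, 100, 1_000_000),
               (27_500, 600, 1_000_000),
               (40_000, 300, 1_000_000)]:
    print(infer_sex(*sample))
```

A sample labeled ambiguous, or one whose X CPM disagrees with its MSY-based label, would be a candidate for manual review rather than automatic assignment.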

References

Naslavsky MS, Scliar MO, Yamamoto GL, Wang JYT, Zverinova S, Karp T, Nunes K, Ceroni JRM, de Carvalho DL, da Silva Simões CE et al. Whole-genome sequencing of 1,171 elderly admixed individuals from São Paulo, Brazil. Nat Commun. 2022 Mar 4;13(1):1004. PMID: 35246524; PMC: PMC8897431