The Mexico Biobank (MXB) project genotyped 6,011 individuals sampled across all 32 states of Mexico during the 2000 National Health Survey (ENSA 2000), conducted by the National Institute of Public Health (INSP). Genotyping was performed with the Illumina Multi-Ethnic Global Array (MEGA, ~1.8M SNPs), an array designed for admixed populations and enriched for ancestry-informative and medically relevant variants. Only autosomal, biallelic SNPs passing quality control are included. Samples were selected from 898 recruitment sites, prioritizing speakers of Indigenous languages.
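As an illustration of the site-level inclusion criterion above, the following Python sketch tests whether a VCF record is an autosomal, biallelic SNP. It is a hypothetical helper, not the MXB quality-control code: the name `is_autosomal_biallelic_snp` is our own, it assumes standard VCF conventions (comma-separated ALT alleles, chromosome names with or without a `chr` prefix), and it does not reproduce the array-level QC thresholds, which are not specified here.

```python
import re

# Hypothetical pattern: chromosomes 1..22, with or without a "chr" prefix.
AUTOSOME = re.compile(r"(chr)?([1-9]|1[0-9]|2[0-2])")
BASES = {"A", "C", "G", "T"}


def is_autosomal_biallelic_snp(chrom: str, ref: str, alt: str) -> bool:
    """Return True if a VCF site is an autosomal, biallelic SNP.

    Assumption: multiple ALT alleles are comma-separated (VCF convention),
    and a SNP has single-base REF and ALT alleles.
    """
    if not AUTOSOME.fullmatch(chrom):
        return False  # drops chrX, chrY, chrM, unplaced/alt contigs, etc.
    alts = alt.split(",")
    if len(alts) != 1:
        return False  # multi-allelic site
    r, a = ref.upper(), alts[0].upper()
    # Rejects indels (multi-base alleles) and monomorphic sites (ALT ".").
    return r in BASES and a in BASES and r != a
```

For example, `is_autosomal_biallelic_snp("chr1", "A", "G")` is accepted, while `("chrX", "A", "G")`, `("chr2", "A", "G,T")` and `("chr2", "AT", "A")` are rejected.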
This track shows allele frequencies computed from the phased genotypes. The full phased genotypes, with a haplotype-clustering display, are available in the Mexico Biobank track under Phased Variants. Frequencies can also be plotted onto a map on the MexVar platform. The hg38 data were lifted over from hg19 by UCSC (see below).
We are not permitted to redistribute the VCF file. Allele frequencies by geographical state and ancestry are available via the MexVar platform. Raw genotype data are available under controlled access at the EGA (Study: EGAS00001005797; Dataset: EGAD00010002361). To obtain the VCF files, contact andres.moreno@cinvestav.mx.
Data processing included conversion from GenomeStudio to PLINK format, strand alignment, updating of map positions to dbSNP Build 151, removal of duplicates and of low-quality variants and individuals, and relatedness filtering. At UCSC, the phased VCF was lifted from hg19 to hg38 with CrossMap; allele counts and frequencies (AC, AN, AF) were then computed with the bcftools fill-tags plugin, and genotypes were stripped to produce a sites-only frequency VCF.
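The allele-count and genotype-stripping steps can be sketched in Python for a single biallelic site. This is an illustrative approximation of the AC, AN and AF values that `bcftools +fill-tags` reports, not UCSC's actual pipeline; the helper names `allele_counts` and `to_sites_only` are hypothetical, and the sketch assumes a `GT` key in the FORMAT column, the ALT allele coded as `1`, and an INFO column that does not already carry AC/AN/AF.

```python
import re


def allele_counts(genotypes: list[str]) -> tuple[int, int, float]:
    """Return (AC, AN, AF) for the ALT allele of a biallelic site.

    genotypes: GT strings such as "0|1", "1|1", "0/0" or ".|.".
    Missing alleles (".") are excluded from AN.
    """
    ac = an = 0
    for gt in genotypes:
        for allele in re.split(r"[|/]", gt):
            if allele == ".":
                continue  # uncalled allele: not counted in AN
            an += 1
            if allele == "1":
                ac += 1
    af = ac / an if an else 0.0
    return ac, an, af


def to_sites_only(vcf_line: str) -> str:
    """Turn one tab-separated VCF data line into a sites-only line.

    Keeps the eight fixed columns (CHROM..INFO), appends AC/AN/AF to INFO,
    and drops the FORMAT column and all sample columns.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fixed, fmt, samples = fields[:8], fields[8], fields[9:]
    gt_index = fmt.split(":").index("GT")  # assumes GT is present
    gts = [sample.split(":")[gt_index] for sample in samples]
    ac, an, af = allele_counts(gts)
    af_str = f"{af:.6g}" if an else "."  # this sketch writes "." when no calls
    tags = f"AC={ac};AN={an};AF={af_str}"
    info = fixed[7]
    fixed[7] = tags if info in ("", ".") else f"{info};{tags}"
    return "\t".join(fixed)
```

Because missing alleles are left out of AN, AF in this sketch is computed only over called haplotypes. A phased site with genotypes `0|1`, `1|1`, `0|0` and `.|.` yields `AC=3;AN=6;AF=0.5`.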
The makeDoc file of the varFreqs track documents how each of its source files was converted. Where Python scripts were needed for the conversion, they are also available from GitHub.
We thank the Center for Research and Advanced Studies (Cinvestav) of Mexico for generating and providing the frequency data, the National Institute of Medical Sciences and Nutrition (INCMNSZ) for DNA extraction, and the Ministry of Health together with the National Institute of Public Health (INSP) for the design and implementation of the National Health Survey 2000 (ENSA 2000). We also thank the ENSA-Genomics Consortium for their contributions to sample collection and data processing, which made the construction of the MXB genomic resource possible.
Barberena-Jonas C, Medina-Muñoz SG, Cedillo-Castelán V, Sepúlveda-Morales T, Gonzaga-Jáuregui C, ENSA Genomics Consortium, García-García L, Ioannidis AG, Moreno-Estrada A. Clinical genetic variation across Hispanic populations in the Mexican Biobank. Nat Med. 2026 Jan 21. DOI: 10.1038/s41591-025-04100-z; PMID: 41566040
Sohail M, Palma-Martínez MJ, Chong AY, Quinto-Cortés CD, Barberena-Jonas C, Medina-Muñoz SG, Ragsdale A, Delgado-Sánchez G, Cruz-Hervert LP, Ferreyra-Reyes L, et al. Mexican Biobank advances population and medical genomics of diverse ancestries. Nature. 2023 Oct;622(7984):775-783. PMID: 37821706; PMC: PMC10600006