The Mexico Biobank (MXB) project genotyped 6,011 individuals sampled across all 32 states of Mexico during the 2000 National Health Survey (ENSA 2000) conducted by the National Institute of Public Health (INSP). Genotyping used the Illumina Multi-Ethnic Global Array (MEGA, ~1.8M SNPs), which is optimized for admixed populations and enriched for ancestry-informative and medically relevant variants. Only autosomal, biallelic SNPs that passed quality control are included. Samples came from 898 recruitment sites, and indigenous language speakers were prioritized.
This track shows allele frequencies computed from the phased genotypes. The full phased genotype data with haplotype clustering display is available in the Mexico Biobank track under Phased Variants. Frequencies can also be plotted onto a map on the MexVar platform. The hg38 data was lifted from hg19 by UCSC (see below).
Due to license restrictions, the data for this track cannot be downloaded from the UCSC Genome Browser. The Table Browser, Data Integrator, and download server are not available for this track.
Allele frequencies by geographical state and ancestry are available via the MexVar platform. Raw genotype data are available under controlled access at the EGA (Study: EGAS00001005797; Dataset: EGAD00010002361). To obtain the VCFs, email andres.moreno@cinvestav.mx.
Data processing included GenomeStudio → PLINK conversion, strand alignment, removal of duplicates and low-quality variants/individuals, update of map positions using dbSNP Build 151, and relatedness filtering. At UCSC, the phased VCF was lifted from hg19 to hg38 with CrossMap. The allele count (AC), allele number (AN), and allele frequency (AF) tags were then computed with the bcftools fill-tags plugin, and genotypes were stripped to produce a sites-only frequency VCF.
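As an illustration of what the AC, AN, and AF values mean for a biallelic site, here is a minimal Python sketch that derives them from phased genotype strings. It is not the UCSC pipeline, which used bcftools; the function name allele_tags and the example genotypes are hypothetical, and the sketch assumes that missing alleles (".") are excluded from AN.

```python
from typing import Dict, Iterable, Optional, Union

Number = Union[int, float, None]


def allele_tags(genotypes: Iterable[str]) -> Dict[str, Number]:
    """Compute AC, AN, and AF for one biallelic site.

    genotypes: VCF-style GT strings such as "0|1" (phased) or "0/1".
    Hypothetical helper for illustration only.
    """
    ac = 0  # number of alternate alleles observed
    an = 0  # number of called (non-missing) alleles
    for gt in genotypes:
        # Treat phased and unphased separators the same way.
        for allele in gt.replace("/", "|").split("|"):
            if allele == ".":
                continue  # assumption: missing alleles do not count toward AN
            an += 1
            if allele == "1":
                ac += 1
    # AF is undefined when no alleles were called.
    af: Optional[float] = ac / an if an else None
    return {"AC": ac, "AN": an, "AF": af}


# Four samples, one with a missing genotype.
tags = allele_tags(["0|0", "0|1", "1|1", ".|."])
print(tags)
```

Once these per-site values are stored, the per-sample genotype columns are no longer needed to report frequencies, which is why a sites-only VCF is sufficient for this track.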
The makeDoc file documents how the source files of the varFreqs track were converted. For some tracks, Python scripts were needed; these are also available from GitHub.
We thank the Center for Research and Advanced Studies (Cinvestav) of Mexico for generating and providing the frequency data, the National Institute of Medical Sciences and Nutrition (INCMNSZ) for DNA extraction, and the Ministry of Health together with the National Institute of Public Health (INSP) for the design and implementation of the National Health Survey 2000 (ENSA 2000). We also thank the ENSA-Genomics Consortium for their contributions to sample collection and data processing that made possible the construction of the MXB genomic resource.
Barberena-Jonas C, Medina-Muñoz SG, Cedillo-Castelán V, Sepúlveda-Morales T, Gonzaga-Jáuregui C, ENSA Genomics Consortium, García-García L, Ioannidis AG, Moreno-Estrada A. Clinical genetic variation across Hispanic populations in the Mexican Biobank. Nat Med. 2026 Jan 21. DOI: 10.1038/s41591-025-04100-z; PMID: 41566040
Sohail M, Palma-Martínez MJ, Chong AY, Quinto-Corés CD, Barberena-Jonas C, Medina-Muñoz SG, Ragsdale A, Delgado-Sánchez G, Cruz-Hervert LP, Ferreyra-Reyes L et al. Mexican Biobank advances population and medical genomics of diverse ancestries. Nature. 2023 Oct;622(7984):775-783. PMID: 37821706; PMC: PMC10600006