Description of genome variant meta-data

This document describes the meta-data of the 11,841,785 (about 11.8 million) genome variants in the UK Biobank (UKBB) that we analyzed.

Because the meta-data differ for each of the 4 races we analyzed, we put them in 4 separate TSV files named "staticprependp.tsv", where {p} denotes one of the 4 races.

All four TSV files have exactly 11,841,786 rows (11,841,785 variants plus one header row) and 9 columns.
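The row and column counts stated above can be checked before using a file. The following is a minimal sketch using only the Python standard library; the function name `tsv_shape` is hypothetical and not part of the original data release.

```python
import csv

def tsv_shape(path):
    """Return (number of rows including the header, number of header columns)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)               # first row is the header
        n_rows = 1 + sum(1 for _ in reader)  # header plus all data rows
    return n_rows, len(header)
```

For each of the four files, the result would be expected to equal `(11841786, 9)` if the file is intact.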

The coordinates, alleles, and imputation information are identical across all 4 races, but the HWE p-values differ for each race, which is why we need 4 separate genome variant meta-data files.

Besides the bases "ATCG", the reference and alternative alleles (ref/alt) also use "I" and "D" to denote insertions and deletions (i.e., indels).
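The allele coding can be used to tell indels apart from single-base variants. The sketch below assumes each allele is stored as a single character, which this document does not state explicitly; the function name `classify_allele_pair` and the labels it returns are hypothetical.

```python
INDEL_CODES = {"I", "D"}        # insertion / deletion codes described above
BASES = {"A", "C", "G", "T"}    # ordinary nucleotide codes

def classify_allele_pair(ref, alt):
    """Label a ref/alt pair as 'indel', 'snv', or 'other'."""
    alleles = {ref.upper(), alt.upper()}
    if alleles & INDEL_CODES:
        return "indel"          # at least one allele is coded I or D
    if alleles <= BASES:
        return "snv"            # both alleles are plain bases
    return "other"              # anything outside the assumed coding
```

The "other" label is a safeguard for values outside the assumed single-character coding, so that unexpected entries are surfaced rather than silently counted.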

The UK Biobank genotypes were aligned to Hg19/GRCh37 coordinates, so columns c19 and p19 are complete. The Hg38/GRCh38 coordinates, however, were lifted over from Hg19/GRCh37 via NCBI's online tool, and 9,884 of the 11,841,785 variants failed the liftover. These 9,884 variants were given chromosome 0 and position 0 in columns c38 and p38, respectively.
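Any analysis on Hg38/GRCh38 coordinates should set aside the variants carrying the zero placeholder. The following is a minimal sketch, assuming the header uses the column names c38 and p38 exactly as written above and that the placeholder is stored as the text "0"; the function name `split_by_liftover` is hypothetical, and the toy rows are made up for illustration.

```python
import csv
import io

def split_by_liftover(rows):
    """Split rows into (mapped, failed) using the c38/p38 zero placeholder."""
    mapped, failed = [], []
    for row in rows:
        # A variant that failed liftover has both c38 and p38 set to 0.
        if row["c38"].strip() == "0" and row["p38"].strip() == "0":
            failed.append(row)
        else:
            mapped.append(row)
    return mapped, failed

# Toy data for illustration only; not taken from the real files.
toy_tsv = (
    "c19\tp19\tc38\tp38\n"
    "1\t1000\t1\t1100\n"
    "2\t2000\t0\t0\n"
)
rows = list(csv.DictReader(io.StringIO(toy_tsv), delimiter="\t"))
mapped, failed = split_by_liftover(rows)
print(len(mapped), len(failed))
```

On a real file, the number of rows in `failed` would be expected to be 9,884 if the assumptions above hold.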

Author: Xiaoran Tong

Created: 2021-10-06 Wed 12:40
