Human genetics has largely assumed that ever-larger genome-wide association studies (GWAS) are the only way to identify new genetic risk variants. This focus on sample size is time-consuming, expensive, prone to artifacts (e.g., hidden population substructure), and often of limited use for risk prediction across human populations. Because the variation in any genomic region is organized into a finite number of haplotypes in human populations, we propose that summary statistics from existing GWAS suffice to accurately recover the effect that each haplotype present in a population exerts on the trait of interest. A haplotype-centered approach can also capture secondary and potentially non-linear effects, providing for the first time a methodology to ascertain the effects of causal alleles without interference from linkage disequilibrium (LD).
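As a minimal illustration of this premise, and not of the AI approach proposed in this project, the following sketch assumes a purely additive model, random mating, no sampling noise, and haplotype frequencies and allele compositions known from a reference panel. Under these assumptions, the expected per-allele GWAS effect at each SNP equals the mean effect of the haplotypes carrying that allele minus the mean effect of those that do not. This is a linear map of the haplotype effects, so inverting it by least squares recovers the haplotype effects (relative to a reference haplotype) from summary statistics alone. The function names and the toy haplotypes, frequencies, and effects are hypothetical.

```python
import numpy as np

# Hypothetical illustration: additive haplotype effects, random mating,
# known haplotype frequencies/compositions, noise-free summary statistics.

def mapping_matrix(H, p):
    """J x K matrix M with beta = M @ a.
    H: K x J 0/1 matrix (allele carried by haplotype k at SNP j).
    p: length-K haplotype frequencies (summing to 1)."""
    f = p @ H                                   # allele frequency per SNP
    denom = f * (1.0 - f)                       # allele variance per SNP
    return (p[:, None] * (H - f)).T / denom[:, None]

def marginal_snp_effects(H, p, a):
    """Expected marginal per-allele effect at each SNP, i.e.
    mean effect among carriers minus mean effect among non-carriers."""
    return mapping_matrix(H, p) @ a

def haplotype_effects_from_summary(H, p, beta):
    """Least-squares recovery of haplotype effects from marginal SNP effects.
    Adding a constant to all haplotype effects leaves beta unchanged,
    so haplotype 0 is fixed at 0 as the reference."""
    M = mapping_matrix(H, p)
    sol, *_ = np.linalg.lstsq(M[:, 1:], beta, rcond=None)
    return np.concatenate([[0.0], sol])

# Toy region: 4 haplotypes (rows) over 3 SNPs (columns).
H = np.array([[0, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [0, 0, 1]])
p = np.array([0.4, 0.3, 0.2, 0.1])
a_true = np.array([0.0, 0.1, 0.5, -0.2])

beta = marginal_snp_effects(H, p, a_true)        # ≈ [0.3, 0.4875, -0.344]
a_hat = haplotype_effects_from_summary(H, p, beta)
print(beta)
print(a_hat)
```

Here the first two SNPs are in LD through haplotype 2, so their marginal effects mix contributions from several haplotypes; solving the linear system separates them. Real summary statistics add sampling noise and reference-panel mismatch, and non-linear effects fall outside this additive toy model.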
We will develop an in silico approach using artificial intelligence (AI) to characterize the genetic component of a disease from the summary statistics of existing GWAS. After tuning the approach on simulations of complex phenotypic architectures, we will use the algorithm to i) fine-map and estimate the effects of known risk alleles in specific diseases, ii) discover and characterize new risk variants, iii) evaluate the presence of population stratification, iv) refine individual risk estimation through better-powered polygenic scores, and v) improve the transferability of predictors across ancestries.
By offering a new perspective on how to analyze and interpret genetic studies, this project will deliver new software for studying the genomic architecture of a phenotype, a new catalogue of genetic variants associated with biomedical traits, and the weights needed to compute individual disease-risk estimators with enhanced transferability across human ancestries, whose current lack is arguably one of the main obstacles to progress towards precision medicine.