Unravelling the molecular mechanisms causal to type 2 diabetes across global populations and disease-relevant tissues

MR analyses were performed using QTLs as IVs. Throughout this paper, we refer to ‘genes’ for gene expression levels genetically predicted from cis-eQTLs and to ‘proteins’ for protein levels genetically predicted from cis-pQTLs. We use ‘molecular traits’ to refer to genes and proteins identified in any MR analysis.

Study design overview

We performed blood MR analyses by using IVs defined from blood eQTLs and plasma pQTLs from multiple cohorts (Fig. 1 and Supplementary Table 1). Analyses were conducted in an ancestry-aware manner; that is, within ancestry groups genetically similar to EUR26,27, Africans (AFR28,29), admixed Americans (AMR30) and East-Asians (EAS31) (Fig. 1). Causal effects were considered based on several lines of evidence coming from statistical significance corrected for multiple testing, sensitivity analysis and colocalization (STROBE guidelines; see Methods). Replication was performed in matched genetic ancestry groups28,29,30,32,33, depending on data availability (Fig. 1). We meta-analysed MR results across genetic ancestry groups, where possible, for either genes or proteins, separately (Methods). We also performed cis-eQTL MR analysis in seven additional T2D-relevant tissues34,35 (whole pancreas, pancreatic islets, brain hypothalamus, visceral adipose, subcutaneous adipose, liver and skeletal muscle). We report causal estimates as odds ratios (ORs) per unit of genetically predicted gene expression and protein levels on T2D risk. We refer to protective effects of molecular traits for which the OR is lower than 1; that is, for which increased levels of expression are associated with reduced T2D risk, and to deleterious effects otherwise. Although blood eQTL MR is not the most tissue-relevant for T2D, we performed this analysis to allow comparison between global populations, given that non-EUR eQTL data from other tissues are not currently available.

Fig. 1: Overview of the cohorts and tissues used to perform single-ancestry MR analyses in populations genetically similar to EUR, AFR, AMR and EAS based on the 1000 Genomes Project phase 3 (ref. 18).

Discovery cohorts are indicated in bold, and replication cohorts for blood MR analyses are in italics. Created in BioRender.com.

Assessing QTLs in global populations improves the detection of causal effects

We defined IVs in blood for a total of 20,307 genes and 1,630 proteins in at least one genetic ancestry group (Extended Data Fig. 1a). Causal effects of 331 genes were detected in at least one single-ancestry MR analysis, of which 316 (95.5%) were found in the EUR analysis (Fig. 2). This high proportion reflects the higher expected statistical power in EUR owing to the higher sample size of both the exposure and outcome datasets (Methods and Extended Data Fig. 1b). Yet a total of 15 genes showed significant causal effects only in non-EUR (ten in AMR and five in AFR). Of these 15 genes, nine (60%) were replicated in independent cohorts from matched genetic ancestry groups, strengthening the evidence for their potential causal role in T2D risk (Fig. 2a and Supplementary Tables 4 and 5). Two examples include TOLLIP-AS1, detected in AMR (OR = 0.87, q = 2.25 × 10−4) and PTGES2, detected in AFR (OR = 0.90, q = 2.40 × 10−2), with protective effects against T2D risk (Extended Data Fig. 2a,b). When investigating the allele frequencies and effect sizes of the IVs of these two genes, we observed different patterns across genetic ancestry groups. The allele frequencies of the IVs used for PTGES2 across AFR and EUR were similar, while the effect sizes on gene expression were larger in AFR than in EUR. Conversely, for TOLLIP-AS1, the effect sizes on gene expression were of the same magnitude among AMR and EUR, but the allele frequencies of the IVs were higher in AMR than in EUR (Extended Data Fig. 2c). For these two genes, the expected statistical power was the highest in EUR, probably because of the higher sample size of the outcome GWAS in this genetic ancestry group. Despite having the highest statistical power, we were not able to detect causal effects of these two genes in EUR. More comprehensive and diverse data will be needed to claim potential ancestry-specific causal effects.

Fig. 2: Molecular traits with causal effects identified in the blood eQTL and pQTL MR analyses.

a, Venn diagrams representing the number of genes and proteins with causal effects identified in the three ancestry groups. b, Forest plots of the causal effects identified only in non-EUR from the blood eQTL and pQTL MR analyses. Causal estimates from the single-ancestry MR in the discovery cohorts (matched genetic ancestry group between the exposure and the outcome) are represented. Filled dots represent causal estimates from MR analyses that have an FDR-corrected p value (q value) of <0.05 and pass the sensitivity criteria and show evidence of colocalization (PPH4 > 0.8) in single-ancestry analyses. Genes and proteins with causal effects identified in single-ancestry analyses and replicated in independent cohorts from the same genetic ancestry group are denoted with a star. Points represent MR causal estimates derived from summary statistics (OR for T2D per 1 s.d. change in genetically predicted gene expression or protein levels, measure of centre) and error bars represent 95% confidence interval (CI). Sample sizes of the QTL datasets used as exposure datasets: (1) eQTL EUR (n = 31,684), AFR (n = 1,032), AMR (n = 893); (2) pQTL EUR (n = 35,559), AFR (n = 1,871), EAS (n = 1,823). Sample size of the T2D GWAS meta-analysis used as outcome datasets: EUR (n = 242,283 cases and n = 1,569,734 controls), AFR (n = 50,251 cases and n = 103,909 controls), AMR (n = 29,375 cases and n = 59,368 controls), EAS (n = 88,109 cases and n = 339,395 controls).

Source data

Similarly, there were 46 proteins with causal effects detected in single-ancestry MR analyses, of which 39 (85%) were detected in EUR. The remaining seven proteins with causal effects detected only in non-EUR were detected in EAS (Fig. 2b). Two of them were tested only in EAS (ANXA7 and CALCOCO2), while the remaining five were tested in multiple ancestries but were significant only in EAS: CTRB2 (OR = 1.08, q = 3.60 × 10−8), FAM20B (OR = 0.87, q = 3.80 × 10−4), BOC (OR = 1.17, q = 2.17 × 10−4), NRP1 (OR = 1.08, q = 6.59 × 10−3) and ALDH2 (OR = 1.05, q = 6.70 × 10−7). Several of these proteins presented causal estimates in EUR close to the estimates in EAS, while not being considered causal in EUR. This is the case for BOC and FAM20B, which presented significant causal estimates in the EUR MR analyses but did not show evidence of colocalization in this ancestry group (BOC: posterior probability of hypothesis four (PPH4) = 0.068 in EUR vs PPH4 = 0.919 in EAS; FAM20B: PPH4 = 3.2 × 10−5 in EUR vs PPH4 = 0.974 in EAS). The EUR pQTL dataset used in our study comes from an Icelandic population dataset (deCODE). The lack of colocalization in EUR for these proteins may be the result of a linkage disequilibrium mismatch between Icelandic pQTL data and the EUR GWAS meta-analysis from T2DGGI. For EAS, colocalization was identified possibly owing to a more similar linkage disequilibrium pattern between the pQTL and EAS GWAS meta-analysis from T2DGGI (Extended Data Fig. 3). None of the proteins with causal effects identified in EAS could be tested for replication in this genetic ancestry group because of the lack of a replication cohort with Somascan proteomics data and the lack of IVs in the EAS UK Biobank36 proteomics data (Supplementary Table 6).

Our results further reveal the benefits of investigating non-EUR QTLs, given the increased number of genes and proteins for which IVs are available only in non-EUR. Here, a total of 6,431 genes and 570 proteins were tested only in one genetic ancestry group, of which 3,648 genes (56.7%) and 302 proteins (53.0%) were only tested in non-EUR (Extended Data Fig. 1a). Most of these IVs show minor allele frequencies lower in EUR than in non-EUR, especially when considering IVs of genes and proteins tested only in AFR (Fig. 3). It would therefore require larger sample sizes in EUR QTL studies to detect these IVs, given the expected homogeneous effect sizes across genetic ancestry groups37. Examples include ENOX1 (OR = 1.07, q = 3.80 × 10−2 in AFR eQTL MR), ANXA7 (OR = 0.94, q = 1.16 × 10−2 in EAS pQTL MR) and CALCOCO2 (OR = 0.90, q = 3.61 × 10−3 in EAS pQTL MR). CALCOCO2 is a gene previously suggested to be associated with T2D38, with a knockdown decreasing insulin content in the human pancreatic beta cell line EndoC-βH1 (ref. 39). Here, we corroborate this finding with evidence of a protective effect of increased blood expression levels of CALCOCO2 against T2D risk.

Fig. 3: Distribution of differences between EUR and non-EUR in the minor allele frequency of IVs for genes and proteins only tested in non-EUR.

The differences were computed as \({\mathrm{MAF}}_{\mathrm{non}-\mathrm{EUR}}-{\mathrm{MAF}}_{\mathrm{EUR}}\). Minor allele frequencies (MAFs) were obtained from the Genome Aggregation Database (gnomAD)78. gnomADg_NFE_MAF refers to the MAF observed in gnomAD genomes in the non-Finnish European (NFE) population. Positive differences—that is, IVs for which the MAF is lower in NFE than in the corresponding non-EUR ancestry group—are represented in red, and negative differences in grey. Boxplots represent the median (centre line) and the 25th and 75th percentiles (box bounds), with whiskers extending to 1.5× the interquartile range (IQR) from the box. Number of IVs included genes only tested in AFR (n = 1,539 IVs) and AMR (n = 7,262 IVs), and proteins only tested in AFR (n = 462 IVs) and EAS (n = 231 IVs).

Source data

Low ancestry-related heterogeneity for blood molecular traits causal to T2D risk

To detect potential causal effects that could be shared across genetic ancestry groups, we performed a meta-analysis of the single-ancestry MR causal estimates. Identifying shared effects that could potentially be targeted by drugs would increase the equitability of benefits across populations, a major current challenge for the translation of genomics findings into the clinic40. A total of 13,878 genes and 1,094 proteins had IVs in at least two genetic ancestry groups and were meta-analysed using random effect models to obtain a cross-ancestry causal effect estimate. We identify causal effects of 81 genes and five unique proteins on T2D risk in the MR meta-analysis across genetic ancestry groups (Extended Data Fig. 4 and Supplementary Tables 2 and 3). Most of these causal effects of genes (93.8%) and proteins (80%) were also detected in the EUR single-ancestry MR analysis, highlighting the driving power of this genetic ancestry group in the meta-analysis, linked to the larger sample size of the corresponding QTL and GWAS datasets (Fig. 4a). The meta-analysis identified four genes not detected in any single-ancestry MR analyses: MUTYH, NEIL1, ZFP36L2 and TUFM (Fig. 4b). For MUTYH and NEIL1, the meta-analysis was driven by the statistical power in EUR (Fig. 4b). For ZFP36L2 and TUFM, all genetic ancestries presented limited statistical power and similar estimates of causal effects, highlighting the power gain of meta-analysing MR results from multiple genetic ancestry groups.

Fig. 4: Molecular traits with causal effects identified in the blood eQTL and pQTL MR meta-analyses across ancestries.

a, Upset plots representing genes and proteins with causal effects identified in the three ancestry groups and in the cross-ancestry meta-analysis. b, Forest plots of the causal effects identified only in the blood eQTL MR meta-analysis. Filled dots represent causal estimates from MR analyses that have q < 0.05 and (1) pass the sensitivity criteria and show evidence of colocalization (PPH4 > 0.8) in single-ancestry analyses or (2) present nominal significance and meet criterion 1 in at least one cohort entering the meta-analysis. Points represent MR causal estimates derived from summary statistics (OR for T2D per 1 s.d. change in genetically predicted gene expression or protein levels, measure of centre); error bars, 95% CI. Sample sizes of the QTL datasets used as exposure datasets: (1) eQTL EUR (n = 31,684), AFR (n = 1,032), AMR (n = 893); (2) pQTL EUR (n = 35,559), AFR (n = 1,871), EAS (n = 1,823). Sample size of the T2D GWAS meta-analysis used as outcome datasets: EUR (n = 242,283 cases and n = 1,569,734 controls), AFR (n = 50,251 cases and n = 103,909 controls), AMR (n = 29,375 cases and n = 59,368 controls), EAS (n = 88,109 cases and n = 339,395 controls). c, Minimal detectable effect (OR) to achieve a statistical power of 80% in the three genetic ancestry groups for the four genes identified only in the eQTL meta-analysis.

Source data

Of the 81 genes with significant causal effects on T2D risk in the meta-analysis, evidence of ancestry-related heterogeneity, as determined by Cochran’s Q statistic, was found for none of the proteins and for only one gene, KLHL42, albeit with a P value very close to the nominal significance threshold (Cochran’s P = 3.96 × 10−2). Together with the single-ancestry analyses, these results indicate low levels of ancestry-related heterogeneity of causal effects of gene and protein expression levels on T2D risk.

Causal effects present high tissue-related heterogeneity

To increase the granularity of our causal inference analyses and the likelihood of detecting causal effects relevant for T2D15, MR analyses were conducted in EUR in seven tissues relevant to T2D: subcutaneous adipose, visceral adipose, brain hypothalamus, liver, skeletal muscle, whole pancreas and pancreatic islets. Causal effects on T2D risk were identified for 70 to 243 genes, depending on the tissue (Fig. 5a and Supplementary Table 7). More than 90% of the causal effects were identified in tissues in which the corresponding gene is also expressed, highlighting the validity of our agnostic gene-centric approach (Fig. 5a and Methods). One example is MTNR1B, a gene found to be impaired in early insulin secretion and expressed mostly in pancreatic islets41, the only tissue tested in our study with a strong causal effect (q = 2.17 × 10−97, PPH4 = 1; Supplementary Fig. 1). Given that molecular QTL data in blood are more widely available across multiple genetic ancestry groups and with higher sample sizes than in other tissues, we compared findings from non-blood tissues to the 335 genes identified in any blood MR analyses. A total of 928 genes showed significant causal effects on T2D risk in at least one T2D-relevant tissue and/or in blood. We observed the largest causal effects in blood, followed by visceral adipose tissue, skeletal muscle and subcutaneous adipose tissue (Extended Data Fig. 5).

Fig. 5: Overview of the results from the eQTL EUR MR analyses in T2D-relevant tissues.

a, Number of genes tested in MR analyses from T2D-relevant tissues with significant causal effects on T2D risk, percentage of causal effects also detected in blood eQTL MR and percentage of causal effects detected in a tissue where the gene is also expressed. b, Pairwise overlap of significant causal effects across T2D-relevant tissues and blood eQTL MR. For easier reading, only pairwise overlap between different tissues is represented. c, Distribution of I2 values representing the heterogeneity of causal estimates for genes tested in at least two tissues (including blood).

Source data

We report 328 genes (36%) with significant causal effects in at least two tissues (Fig. 5b). Only 18% of the genes with causal effects identified in a T2D primary tissue also showed significant causal effects in blood, highlighting the additional information provided by eQTL MR analyses in non-blood tissues. The highest overlap was observed between subcutaneous and visceral adipose tissues, as well as between subcutaneous adipose tissue and skeletal muscle. We observed low overlap between whole pancreas and pancreatic islets, underscoring the importance of investigating fine-scale tissue molecular profiles. Only 28 out of the 328 genes with causal effects on T2D risk in multiple tissues showed concordant directions of effect across tissues. Accordingly, we observed a large tissue-related heterogeneity, as highlighted by the high I2 values depicted in Fig. 5c, which represents the percentage of variance caused by study heterogeneity. Among the genes tested in at least two tissues, 75% (14,304 out of 19,020) show nominally significant heterogeneity. This tissue-related heterogeneity could be underestimated given the lack of shared IVs across tissues when investigating cis-eQTLs13, with 11,707 genes (21%) in our analyses being tested in only one non-blood tissue. Evaluating the causal effects of all genes in disease-relevant tissues will be needed to comprehensively capture tissue-related heterogeneity and characterize shared and tissue-specific causal mechanisms.

Contextualizing causal effects into the T2D landscape

In this study, we performed a gene-centric agnostic approach; that is, without being constrained to T2D loci. To validate this approach in light of previous T2D findings, we compared the molecular traits with causal effects on T2D risk from this study to established diabetes-related effector genes, using eight different gene sets (Methods and Supplementary Table 9). Between 8% and 40% of the diabetes-related genes could not be tested in any of our MR analyses, depending on the gene set considered (Fig. 6a). The proportion of genes with a causal effect on T2D risk identified among those tested increased with the level of supporting evidence for T2D, ranging from less than 5% in congenital and neonatal diabetes forms to 20% or more when focusing on T2D effector genes as defined by Mahajan & McCarthy42 and Human Genetic Evidence Calculator (HuGE) scores38 (Fig. 6a and Supplementary Table 10).

Fig. 6: Causal effects among established diabetes-related gene sets.

a, Stacked bars showing the number of genes not tested, or tested with and without causal effects across molecular traits. A gene was considered to have causal effects if it had a q value of <0.05, passed sensitivity analyses and showed evidence of colocalization (PPH4 > 0.8). Bars are ordered by the number of genes with causal effects. b, Enrichment of established 1,079 T2D effector genes (Mahajan & McCarthy42) and genes with HuGE scores of >30. The table shows the number of genes tested; that is, those with at least one IV, as well as the number of genes with causal effects for the list of 1,079 T2D effector genes and for the rest of the genome. The table also shows the fold enrichment of the percentage of causal effects among tested genes and the corresponding P value from Fisher’s exact test (two-sided). Results are reported overall and by tissue.

Source data

To extend these analyses, we included the list of 1,079 T2D genes using the combined definition of T2D effector genes from Mahajan & McCarthy42 and HuGE scores of >30, representing the gene set with the highest number of curated T2D-relevant genes with causal effects38. For this gene set, we showed an enrichment of causal effects compared to the rest of the genes tested in our analyses: causal effects were identified for 19.03% of T2D-related genes, but for only 2.28% of the rest of the genes (P = 8.62 × 10−74; Fig. 6b and Methods). The same trends were found when considering causal effects at the single-tissue level (P values ranging from 1.06 × 10−4 to 2.07 × 10−25). When investigating the pattern of association of tested T2D-related genes in the different tissues, we observed that strong effects were often found in pancreatic islets (Extended Data Fig. 6a), most of the time in a similar direction as in pancreas (Extended Data Fig. 6b). On the contrary, the direction of causal effect in pancreatic islets was opposite to the one observed in subcutaneous tissue for 75% of the genes with causal effects across the two tissues. For example, this is the case for BAK1 (coding a pro-apoptotic protein from the BCL-2 family), which is significant and passes sensitivity analyses in all tissues, but is confirmed by colocalization evidence only in pancreas and pancreatic islets. Based on MR causal estimates, increased expression of this gene is associated with decreased T2D risk in skeletal muscle and adipose tissues, with increased risk in all the other tissues tested.

To further investigate how tissue-informed causal inference analyses can shed light on established T2D genes, we compared our findings to the eight mechanistic clusters derived in previous work1 (Methods, Extended Data Fig. 7 and Supplementary Table 11). Genes assigned to the metabolic syndrome clusters were more likely to be causal in visceral adipose tissue, with visceral adiposity being a main driver of the metabolic syndrome43 (Fisher’s P value = 7.23 × 10−3, OR = 3.12). Similarly, genes assigned to the beta cell PI+ cluster were more likely to show evidence of a causal effect in pancreatic islets (Fisher’s P value = 1.46 × 10−2, OR = 3.35). These examples show that pinpointing the tissues driving causal effects can help unravel the molecular mechanisms of T2D heterogeneity.

eQTL and pQTL analyses offer complementary and non-redundant insights

To assess the complementarity between eQTL and pQTL MR findings, we compared the genes and proteins with significant causal effects identified by each analysis (Table 1).

Table 1 Molecular traits (genes and proteins) with evidence of causal effects on T2D risk from eQTL and pQTL MR

The absolute number of molecular traits with overlapping evidence of causal effects of gene expression and protein abundance levels was small (seven out of 1,563 molecular traits tested in both eQTL and pQTL MR analyses), in line with the expected complementary information captured by the two molecular levels44. One example is CPXM1, for which we identified causal effects on T2D risk in the plasma pQTL MR analysis (OR = 1.05, q = 1.45 × 10−2), replicated in two independent cohorts from EUR, as well as in the eQTL MR analysis in skeletal muscle (OR = 1.07, q = 1.23 × 10−4) and subcutaneous adipose tissue (OR = 1.07, q = 5.69 × 10−5), always with increasing expression levels associated with increased T2D risk (Fig. 7a). CPXM1 has a role in adipose tissue production45 and is associated with insulin resistance in polycystic ovary syndrome46. CPXM1 is expressed in multiple tissues, especially in subcutaneous adipose tissue, but tissue-specific pQTL data will be needed to assess the origin of circulating CPXM1 abundance. Only HIBCH was identified in eQTL-based and pQTL-based causal inference analyses in blood (eQTL OR = 0.96, q = 1.37 × 10−2; pQTL OR = 0.95, q = 4.94 × 10−4). Increased expression levels of this gene in pancreatic islets (OR = 0.98, q = 2.9 × 10−3) and visceral adipose tissue (OR = 0.96, q = 1.15 × 10−3) also showed a significant protective causal effect against T2D risk (Fig. 7b). By contrast, higher expression levels of HIBCH in subcutaneous adipose tissue (OR = 1.06, q = 5.68 × 10−4) and skeletal muscle (OR = 1.04, q = 5.31 × 10−4) were causally associated with increased T2D risk.

Fig. 7: Results for CPXM1 and HIBCH.

a,b, For each molecular trait, causal estimates from blood eQTL, plasma pQTL and T2D-relevant tissue eQTL MR analyses are shown for CPXM1 (a) and HIBCH (b). Filled dots represent causal estimates from MR analyses that have a q value <0.05 and (1) pass the sensitivity criteria and show evidence of colocalization (PPH4 > 0.8) in single-ancestry analyses or (2) present nominal significance and meet criteria (1) in at least one cohort entering the meta-analysis. Points represent MR causal estimates derived from summary statistics (OR for T2D per s.d. change in genetically predicted gene expression or protein levels, a measure of centre); error bars, 95% CI. Sample sizes of the QTL datasets used as exposure datasets: (1) eQTL EUR (discovery n = 31,684, replication n = 801), AFR (discovery n = 1,032, replication n = 757), AMR (discovery n = 893, replication n = 784); (2) pQTL EUR (discovery n = 35,559, replication ARIC n = 7,123, replication UKB n = 54,219), AFR (discovery n = 1,871, replication AASK n = 466, replication UKB n = 262), EAS (discovery n = 1,823, replication UKB n = 262). Sample size of the T2D-relevant tissues eQTL: subcutaneous adipose (n = 711), visceral adipose (n = 584), brain hypothalamus (n = 256), liver (n = 261), skeletal muscle (n = 816), pancreas (n = 362), pancreatic islets (n = 446). Sample size of the T2D GWAS meta-analysis used as outcome datasets: EUR (n = 242,283 cases and n = 1,569,734 controls), AFR (n = 50,251 cases and n = 103,909 controls), AMR (n = 29,375 cases and n = 59,368 controls), EAS (n = 88,109 cases and n = 339,395 controls). The median TPM observed in GTEx in the eight tissues tested in our MR analysis is represented for CPXM1. For HIBCH, the LocusCompare and LocusZoom plots demonstrating the colocalization evidence from eQTL in T2D-relevant tissues are displayed. ARIC, Atherosclerosis Risk in Communities study; UKB, UK Biobank; AASK, African American Study of Kidney Disease and Hypertension; TPM, transcripts per million.

Source data

Six additional molecular traits presented significant causal effects on T2D risk in both eQTL from T2D-relevant tissues and in the blood pQTL MR analyses. Although we investigated non-blood tissue MR in EUR only, owing to data availability, we found overlap with blood pQTL findings from EAS for FAM20B and BOC. Future studies are needed to characterize non-blood molecular QTLs in non-EUR to validate these results. Obtaining non-blood QTLs in global populations will also provide insights into the ancestry-related heterogeneity in non-blood tissues.

For subcutaneous adipose tissue, skeletal muscle and liver, we found a significant enrichment of overlapping molecular traits with causal effects in blood eQTL and pQTL causal inference analyses (Methods). This highlights the non-redundant information captured by gene expression and protein abundance levels, for example, owing to post-transcriptional changes not captured at the RNA level44 (Methods and Supplementary Table 8).

Related posts

WVU Medicine endocrinologist highlights steps to prevent and manage diabetes

Landmark study shows Libre technology helps people with Type 2 diabetes on basal insulin improve glucose management

Dexcom unveils Type 2 diabetes CGM data at ATTD