An SG10K Health study investigating the associations of various epigenetic clocks with ageing in a multi-ethnic Singaporean cohort.
Explore the Singapore National Precision Medicine Strategy »
Report bug · Request data · Themes · Blog
Principal Investigators (SG10K Health Aging Study)
Neerja Karnani · Joanne Ngeow Yuen Yie · Brian Kennedy
Singapore's National Precision Medicine (NPM) strategy seeks to accelerate biomedical research, improve health outcomes, and enhance opportunities for economic value across sectors through a decade-long roadmap. The first phase of this strategy is a "proof of concept" delivered through the SG10K Health study, which primarily generates a genomic reference database of 10,000 healthy Singaporeans and demonstrates the feasibility of large-scale genomic data generation. As part of the SG10K Health study, this study investigates ageing-related clinical phenotypes alongside genetic, epigenetic, telomere length, and epigenetic clock data, to provide a comprehensive overview of the molecular landscape of age-related phenotypes in an Asian multi-ethnic cohort.
This git repository houses the code used for the analysis of the NPM Aging Study.
placeholder for link to manuscript.
- Study Cohorts
- DNA Methylation
- Genomics
- EpiAge Estimates
- Telomere Length
- Analysis Overview
- Status
- Authors
A birth cohort that is one of the most carefully phenotyped parent-offspring studies, enabling examination of the potential roles of fetal, developmental, and epigenetic factors in pathways to disease.
An adult cohort which aims to identify the genetic and environmental factors that underpin development of obesity, diabetes, cardiovascular disease and other complex diseases in Singapore.
An adult cohort which aims to discover how lifestyle factors, physiological factors, genetic factors and their interactions impact the development of common health conditions, and to monitor risk factors in the population and gain insight into determinants of health-related behaviours.
An adult cohort and flagship initiative of the academic medical centre in precision medicine, a discipline in which medical treatments and procedures are tailored to individual patients based on their detailed genetic, molecular and clinical profiles.
An adult cohort which aims to provide novel knowledge of population eye health to enable the dissection, detection and prevention of eye diseases in Singapore and Asia, and to promote and improve global eye health.
An adult cohort collected between 2015 and 2016 through TTSH Health Screening Programmes to support health-related studies at TTSH.
- Illumina EPIC Array pre-processing was performed by Marie Loh's lab.
- Single-sample CSV files per study were obtained following standard Type 1/Type 2 and Red/Green channel normalizations.
- Independently of the whole-genome sequencing (WGS) data, Marie Loh's lab applied a PCA-based ethnicity QC to determine population structure and stratify samples by ethnic group. Because genomic data take precedence over epigenetic information, we do not apply these methylation-based ethnicity QCs and classifications. The two classifications are very similar but not identical, so for samples that fail genomic QC it does not make sense to superimpose the epigenetic ethnicity classification onto the subject.
- QC parameters employed in this study include:
- Sample call rate (samples with < 90% of CpGs passing QC were removed)
- Sex QC
- Subject duplication
- Age (is NA or not)
- Kinship (cryptic relationships)
- Cohort resolution
- 10,019 samples passed the initial QCs in Marie Loh's lab.
- Multimodal CpGs were removed (nmode.mc (modedist=0.2) > 1)
- Non-variable CpGs were removed (IQR < 0.05)
- CpGs failing marker call rates were removed (Det P > 0.01)
- Sex chromosomes were removed.
- CpGs with ethnic-specific variants (based on SG10K MAF < 5%) within the single-base extension were removed.
- Cross hybridizing probes and probes recommended to be removed under the Illumina EPIC manifest (v1_0_b5) were removed.
- 747,992 CpGs passed this QC.
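As an illustration of one of the CpG filters above, the following is a minimal sketch of the non-variable CpG filter (IQR < 0.05). It is not the repository's own script: the function names, the toy beta values, and the choice of the "inclusive" quartile definition are assumptions made for the example.

```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (assumed 'inclusive' quartile definition)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def keep_cpg(betas, min_iqr=0.05):
    """Return True when a CpG is variable enough to keep (IQR >= min_iqr)."""
    return iqr(betas) >= min_iqr


# Hypothetical toy data: CpG identifier -> beta values across samples.
toy_betas = {
    "cg_variable": [0.10, 0.20, 0.35, 0.50, 0.65],
    "cg_flat": [0.90, 0.91, 0.90, 0.92, 0.91],
}

kept = [cpg for cpg, betas in toy_betas.items() if keep_cpg(betas)]
```

In this toy input only `cg_variable` survives, because the beta values of `cg_flat` barely vary between samples.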
- Whole Genome Sequencing of 10,258 healthy Singaporeans was performed.
- Single-sample gVCF files were obtained following the parameters and companion files defined by the GATK4 "germline short variant per-sample calling" reference implementation (GATK resource bundle GRCh38).
- msVCF files were obtained by performing a joint-calling step.
Sample QC & annotation
- 9,770 samples passed the initial genomic coverage requirements per study.
- Variants failing VQSR filter were removed.
- Sex was imputed based on the mean depth ratio of chrX/chr20 and chrY/chr20 of each sample, and samples with abnormal ploidy were excluded.
- Samples with call rate < 95%, contamination rate > 2%, error rate > 1.5%, extreme heterozygosity (> 3SD) were excluded.
- Only non-monomorphic autosomal biallelic SNPs were included; SNPs deviating from HWE (P < 10e-8) were excluded.
- Low complexity regions were excluded after LD pruning (r^2 > 0.2).
- Samples with cryptic relationships were excluded (pi-hat > 0.2).
- Samples showing evidence of admixture between ethnicities through PCA outliers were excluded.
- 8,118 samples passed this WGS QC.
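The per-sample thresholds listed above (call rate < 95%, contamination rate > 2%, error rate > 1.5%, heterozygosity beyond 3 SD) can be sketched as a simple filter. This is an illustrative sketch only: the `SampleMetrics` fields and the `failing_samples` helper are hypothetical names, and treating the heterozygosity rule as a two-sided deviation from the cohort mean is an assumption.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class SampleMetrics:
    """Hypothetical per-sample QC metrics, all expressed as fractions."""
    sample_id: str
    call_rate: float
    contamination: float
    error_rate: float
    heterozygosity: float


def failing_samples(samples, n_sd=3.0):
    """Map sample_id -> list of failed checks, using the README thresholds."""
    het = [s.heterozygosity for s in samples]
    mu, sd = mean(het), stdev(het)
    failed = {}
    for s in samples:
        reasons = []
        if s.call_rate < 0.95:
            reasons.append("call_rate")
        if s.contamination > 0.02:
            reasons.append("contamination")
        if s.error_rate > 0.015:
            reasons.append("error_rate")
        # Assumption: "extreme" means more than n_sd SDs from the mean, either side.
        if abs(s.heterozygosity - mu) > n_sd * sd:
            reasons.append("heterozygosity")
        if reasons:
            failed[s.sample_id] = reasons
    return failed
```

Returning the reasons rather than a plain pass/fail flag makes it easier to report how many samples each criterion removes.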
- Subjects passing WGS QCs (having WGS-derived ethnicity classifications) were combined with those passing DNA Methylation QCs. This gave us 6,240 unique subjects.
- Post QC DNA methylation betas were used in the generation of epigenetic ages.
- Age acceleration estimates were adjusted for chronological age.
- Three of the epigenetic age variants were obtained from the DNA methylation Age Calculator developed by Steve Horvath:
- Horvath
- PhenoAge
- GrimAge
- Cell type proportions were obtained from AdvancedAnalysisBlood from the calculator.
- The output was also normalized by the calculator to make the data comparable between clocks.
- The corSampleVSgoldstandard metric was also applied as a sample filter with a < 0.8 threshold, as recommended by the calculator.
- Zhang (Elastic Net) age estimates were calculated from scripts available online.
- Zhang Age acceleration values were obtained by regressing Zhang DNA methylation age on chronological age.
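The regression-based definition of age acceleration used above can be illustrated with a short sketch: age acceleration is taken as the residual from an ordinary least squares fit of DNA methylation age on chronological age. The function name and the plain-Python implementation are assumptions for illustration, not the scripts used in the study.

```python
from statistics import mean


def age_acceleration(dnam_age, chrono_age):
    """Residuals of the OLS fit dnam_age ~ intercept + slope * chrono_age."""
    x_bar, y_bar = mean(chrono_age), mean(dnam_age)
    sxx = sum((x - x_bar) ** 2 for x in chrono_age)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(chrono_age, dnam_age))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Positive residual: epigenetically "older" than expected for that age.
    return [y - (intercept + slope * x) for x, y in zip(chrono_age, dnam_age)]


# Hypothetical toy values (years).
chrono = [30.0, 40.0, 50.0, 60.0]
dnam = [32.0, 41.0, 53.0, 60.0]
accel = age_acceleration(dnam, chrono)
```

By construction the residuals average to zero and are uncorrelated with chronological age, which is what makes them usable as an age-adjusted phenotype.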
- TelSeq was conducted on SG10K genomic data passing genomic QC, evaluating the frequency of telomeric repeats (TTAGGG) with a default parameter of k=7.
- TelSeq estimates were correlated with qPCR-measured telomere lengths in a subset of the same study samples, as well as with other WGS-based telomere length estimates (Telomerecat).
- Raw estimates were normalized through rank-based z-scores.
- 8,045 samples passed this QC.
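The rank-based z-score normalization mentioned above can be sketched as a rank-based inverse normal transform. The exact variant used in the study is not stated, so the offset of 0.5, the average-rank handling of ties, and the function names below are assumptions.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_based_z(values, offset=0.5):
    """Map each value to the normal quantile of (rank - offset) / n."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / n) for r in average_ranks(values)]


# Hypothetical raw telomere length estimates.
z_scores = rank_based_z([3.1, 1.2, 2.5])
```

The transform keeps the ordering of the raw estimates while forcing an approximately standard normal distribution, which reduces the influence of skewed or heavy-tailed raw values.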
- Linear regression models were employed, adjusting for cohort, ethnicity, sex, and cell type proportions.
- 5,497 unique adult samples had genetic and age acceleration (any one of the four) passing QC.
- Samples were stratified by cohort and ethnicity prior to analyses.
- Associations were adjusted for sex and genetic principal components (PCs).
- For common SNPs,
- Linear Wald test analysis was conducted using the Efficient and Parallelizable Association Container Toolbox (EPACTS v3.3.0).
- Cohort-specific association summaries were meta-analyzed using a standard error approach in METAL.
- SNPs present in 3 or fewer (out of 5) of the cohorts, or having a VQSLOD < 0, were excluded from the meta-analysis.
- For rare SNPs,
- Autosomal, low-frequency (MAF < 0.05), protein-altering variants (as annotated by SnpEff) present in at least 4 cohorts were analyzed under gene-based tests using RAREMETAL (SKAT method).
- RVTEST was used to calculate the single-SNP summary statistics and covariance matrices from each cohort required for the meta-analysis.
- Genes with < 5 aggregated SNPs in the SKAT test were omitted.
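For the common-SNP meta-analysis, the standard error approach corresponds to fixed-effect, inverse-variance weighting of the cohort-specific effect sizes. The sketch below illustrates that arithmetic for a single SNP; it is not METAL itself, and the function name and the `min_cohorts` guard (mirroring the exclusion of SNPs present in 3 or fewer cohorts) are assumptions for illustration.

```python
from math import erfc, sqrt


def inverse_variance_meta(betas, ses, min_cohorts=4):
    """Fixed-effect meta-analysis of per-cohort effects for one SNP.

    Returns (beta, se, z, p), or None when the SNP is present in
    fewer than min_cohorts cohorts.
    """
    if len(betas) < min_cohorts:
        return None
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under a standard normal
    return beta, se, z, p


# Hypothetical per-cohort estimates for one SNP.
result = inverse_variance_meta([0.1, 0.3, 0.1, 0.3], [0.1, 0.1, 0.1, 0.1])
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precise cohort-level results.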
- To mitigate the influence of DNA methylation outliers, we truncate outlier values lying beyond 2×IQR to the nearest value (see PMID: 34633450).
- To mitigate batch effects, we conduct linear regression analyses via study-specific ethnic and sex sub-cohorts (N>50), before combining the results via meta-analysis under a standard error approach. Per linear regression, we adjust for chip and cell type proportion. This gave us 112,472 EWAS significant CpGs.
- Subsequently, we conduct a leave-one-out (LOO) analysis and remove CpGs which do not pass LOO meta-EWAS significance or have mostly non-concordant effect size directionalities across all LOOs.
- This gave us 36,878 Age meta-EWAS significant CpGs.
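The outlier truncation step can be sketched as a winsorization of each CpG's beta values. The README does not spell out the exact rule, so this sketch assumes fences at Q1 − 2×IQR and Q3 + 2×IQR, clamps values to the fence itself, and uses the "inclusive" quartile definition; the function name is hypothetical.

```python
from statistics import quantiles


def winsorize_iqr(values, k=2.0):
    """Clamp values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    # Values inside the fences are returned unchanged.
    return [min(max(v, lower), upper) for v in values]


# Hypothetical beta values for one CpG, with one extreme sample.
raw = [0.50, 0.52, 0.54, 0.56, 0.58, 0.99]
truncated = winsorize_iqr(raw)
```

Truncating rather than dropping extreme values keeps the sample size constant across CpGs, which simplifies the downstream per-cohort regressions.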
- Linear regression was used to assess telomere length differences between datasets within the SG10K cohort, as well as associations with the various phenotypes, adjusting for cohort (where applicable), age, ethnicity, and sex.
Manuscript under review.