
Fine mapping TWAS associations

Nicholas Mancuso edited this page Mar 31, 2019 · 14 revisions

The main aim of FOCUS is to fine-map TWAS associations at GWAS risk regions. FOCUS takes as input 1) GWAS summary statistics, 2) reference LD, and 3) an eQTL weight database. Given these data, FOCUS can fine-map using either a tissue-agnostic or a tissue-prioritized approach.

The basic command for fine-mapping is

focus finemap SUMSTATS PLINK_REFLD WEIGHT_DB

where SUMSTATS is the GWAS summary file, PLINK_REFLD is the path to PLINK-formatted genotype data for computing reference LD, and WEIGHT_DB is the path to a FOCUS weight database. Help on all the options and functionality can be listed by entering

focus finemap --help
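Because PLINK_REFLD is a path prefix rather than a single file, a quick pre-flight check can catch a mistyped path before a long run starts. The sketch below is not part of FOCUS. It assumes the reference genotypes are a binary PLINK fileset (.bed, .bim, .fam), and the function name check_inputs is hypothetical.

```sh
# Sketch: verify that the three finemap inputs exist before running FOCUS.
# Assumption: PLINK_REFLD is the prefix of a binary PLINK fileset
# (<prefix>.bed, <prefix>.bim, <prefix>.fam). check_inputs is a hypothetical helper.
check_inputs() {
    sumstats="$1"
    ref_prefix="$2"
    weights="$3"
    status=0
    for f in "$sumstats" "$ref_prefix.bed" "$ref_prefix.bim" "$ref_prefix.fam" "$weights"; do
        if [ ! -f "$f" ]; then
            # Report every missing file rather than stopping at the first one
            echo "missing input: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

It can be called with the same three positional arguments as focus finemap, for example: check_inputs SUMSTATS PLINK_REFLD WEIGHT_DB && focus finemap SUMSTATS PLINK_REFLD WEIGHT_DB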

For example, the following command performs tissue-agnostic fine-mapping on chromosome 1 for the GWAS summary data LDL_2010.clean.sumstats.gz, using the 1000G.EUR.QC.1 reference genotypes and the gtex_v7.db eQTL weights:

focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --chr 1 --out LDL_2010.chr1

This command scans LDL_2010.clean.sumstats.gz for risk regions and then performs TWAS and fine-mapping using LD estimated from the PLINK-formatted 1000G.EUR.QC.1 genotypes and eQTL weights from gtex_v7.db.
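The example covers chromosome 1 only. A minimal sketch for repeating it across all autosomes is given here. It uses only the --chr and --out options shown above. It assumes the reference genotypes are split per chromosome and named PREFIX.CHR (as the name 1000G.EUR.QC.1 suggests). The function name run_all_chromosomes and the FOCUS_BIN override are hypothetical conveniences, not FOCUS features.

```sh
# Sketch: run tissue-agnostic fine-mapping on chromosomes 1-22 in turn.
# Assumptions: reference genotypes are stored per chromosome as <ref_prefix>.<chr>;
# FOCUS_BIN is a hypothetical override (e.g. FOCUS_BIN=echo for a dry run).
FOCUS_BIN="${FOCUS_BIN:-focus}"

run_all_chromosomes() {
    sumstats="$1"
    ref_prefix="$2"
    weights="$3"
    out_prefix="$4"
    chr=1
    while [ "$chr" -le 22 ]; do
        # Stop at the first chromosome that fails so errors are not buried
        "$FOCUS_BIN" finemap "$sumstats" "${ref_prefix}.${chr}" "$weights" \
            --chr "$chr" --out "${out_prefix}.chr${chr}" || return 1
        chr=$((chr + 1))
    done
}
```

With the file names from the example, the call would be: run_all_chromosomes LDL_2010.clean.sumstats.gz 1000G.EUR.QC gtex_v7.db LDL_2010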

To take the tissue-prioritized approach, add the flag --tissue TISSUE:

focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --chr 1 --tissue LIVER --out LDL_2010.chr1

FOCUS can generate a figure for each region that shows the predicted expression correlation, the TWAS summary statistics, and the posterior inclusion probability (PIP) for each gene. To do this, add the --plot flag:

focus finemap LDL_2010.clean.sumstats.gz 1000G.EUR.QC.1 gtex_v7.db --chr 1 --tissue LIVER --plot --out LDL_2010.chr1

The resulting figure illustrates, for each model, the local correlation structure, TWAS p-values, and PIPs.

The output from the finemap operation is a table:

Column        Description
ens_gene_id   Ensembl gene ID
ens_tx_id     Ensembl transcript ID
mol_name      Name of the gene/lincRNA/pseudogene
tissue        Tissue the original expression was measured in
ref_name      Name of the QTL reference panel
type          Type of molecular feature (gene, lncRNA, lincRNA, pseudogene)
chrom         Chromosome
tx_start      Transcription start site
tx_stop       Transcription stop site
inference     Inference procedure for model (e.g., LASSO, BSLMM)
cv.R2         Cross-validation predictive R-squared
cv.R2.pval    P-value of the cross-validation R-squared
twas_z        Marginal TWAS Z score
pip           Marginal posterior inclusion probability
in_cred_set   Flag indicating whether or not the model is included in the credible set
region        Identifier for the genomic region
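To pull the strongest candidates out of this table, the columns can be looked up by name and filtered on pip. The sketch below is one way to do that with awk. It assumes the results file is tab-delimited with a header row containing the column names listed above, which should be checked against the actual output. The function name high_pip_models and the default threshold of 0.9 are illustrative choices, not FOCUS defaults.

```sh
# Sketch: print mol_name, tissue, twas_z and pip for models with PIP >= threshold.
# Assumptions: the results file is tab-delimited and its first line is a header
# with the column names from the table above. high_pip_models is hypothetical.
high_pip_models() {
    results="$1"
    threshold="${2:-0.9}"
    awk -F '\t' -v OFS='\t' -v thr="$threshold" '
        NR == 1 {
            # Map each header name to its column index
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("pip" in col)) {
                print "no pip column found in header" > "/dev/stderr"
                exit 1
            }
            next
        }
        # Adding 0 forces a numeric comparison, including scientific notation
        ($(col["pip"]) + 0) >= (thr + 0) {
            print $(col["mol_name"]), $(col["tissue"]), $(col["twas_z"]), $(col["pip"])
        }
    ' "$results"
}
```

Looking columns up by header name rather than by position keeps the filter working if the column order differs between FOCUS versions.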

We recommend using reference LD from LDSC.

We recommend using a weight database that spans multiple tissues and multiple eQTL reference panels. It combines GTEx v7 weights from PrediXcan with METSIM, NTR, YFS, and CMC weights from the FUSION software into a single database usable by FOCUS.
