vignettes/TFregulomeR.Rmd
27 additions & 21 deletions
@@ -43,21 +43,27 @@ source, TF source ID, number of peaks and number of peaks with motif.
43
43
In the TFregulomeR project, we
44
44
used MEME-ChIP to perform de novo motif discovery in each ChIP-seq dataset. Highly and centrally enriched
45
45
motifs were selected and compared with the existing TFBS databases, such as HOCOMOCO and JASPAR.
46
-
Around 6% of highly enriched motifs were not consistent with the TFBS databases, and it might
47
-
be due to the fact that in those cell types, the given TFs are indirectly recruited to genome.
48
-
Besides, approximately 9% of motifs were not recorded for their corresponding TFs in the databases.
49
-
In order to confirm the reliability of these 15% motifs, we used HOMER to perform a de novo motif
46
+
89 highly enriched motifs were not consistent with the TFBS databases. This might
47
+
be due to the fact that in those cell types, the given TFs are indirectly recruited to the genome,
48
+
and/or that the highly abundant presence of cohesin and polycomb group proteins masks the motif
49
+
enrichment of the ChIP’ed TF.
50
+
In addition, 136 motifs were not recorded for their corresponding TFs in the databases.
51
+
In order to confirm the reliability of these 225 motifs, we used HOMER to perform a de novo motif
50
52
discovery again. Motif results by HOMER were compared with those by MEME-ChIP and their similarity
51
-
were measured by normalised Pearson correlation coefficient using the formula: Ncor = cor * w / w_smaller,
53
+
was measured by the normalised Pearson correlation coefficient, using the compare-matrices function in
54
+
RSAT with the formula: Ncor = cor * w / w_smaller,
52
55
where cor is the raw Pearson correlation coefficient, w is the alignment width of two matrices from
53
56
MEME-ChIP and HOMER (the minimum value of w was set as 5), and w_smaller is the width of the smaller of the
54
57
motifs from MEME-ChIP and HOMER. We found that the majority of the PWMs generated by MEME-ChIP,
55
58
a combined algorithm suite of expectation maximization and regular expressions, could be
56
-
recapitulated by HOMER, which takes advantage of hypergeometric enrichment. We have added the
57
-
information into the last two columns of `dataBrowser` output (from v1.2.0).
59
+
recapitulated by HOMER, which takes advantage of hypergeometric enrichment (Figure 1). We have added this information to the last two columns of the `dataBrowser` output (from v1.2.0).
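
The normalisation formula above can be illustrated with a small R sketch. This is not code from TFregulomeR or RSAT: `normalised_cor` and its `min_w` argument are hypothetical names introduced here, and returning `NA` for alignments narrower than the minimum width is an assumption about how the threshold of 5 might be applied.

```r
# Hypothetical helper (not part of TFregulomeR or RSAT): computes
# Ncor = cor * w / w_smaller as described in the text.
#   cor       raw Pearson correlation between the two aligned matrices
#   w         alignment width of the two matrices
#   w_smaller width of the smaller of the two motifs
#   min_w     minimum accepted alignment width (5 in the text)
normalised_cor <- function(cor, w, w_smaller, min_w = 5) {
  stopifnot(is.numeric(cor), is.numeric(w), is.numeric(w_smaller),
            all(w_smaller > 0))
  ncor <- cor * w / w_smaller
  # Assumption: alignments narrower than min_w are treated as not comparable.
  ncor[w < min_w] <- NA_real_
  ncor
}

# Illustrative values only, not taken from the TFregulomeR compendium.
normalised_cor(cor = 0.9, w = 8, w_smaller = 10)   # 0.72
normalised_cor(cor = 0.95, w = 4, w_smaller = 6)   # NA (alignment too short)
```

In the first call, a raw correlation of 0.9 over 8 of the 10 columns of the smaller motif is scaled to 0.72. That sits just above the 0.7 reference line drawn in Figure 1.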
58
60
59
61
In particular, if no input is given for the function, all records in the TFregulomeR compendium will be returned.
60
62
63
+
```{r echo=FALSE, fig.cap="Figure 1. Similarity of de novo enriched motifs by MEME-ChIP and HOMER. The beeswarm and violin plots show the normalised Pearson correlation coefficient of de novo motifs called by MEME-ChIP and HOMER, and the red dash denotes normalised Pearson correlation coefficient value 0.7.", out.width = '80%',fig.align="center"}
#> ... ... ... Cofactor report for id 'MM1_HSA_K562_CEBPB' has been saved as MM1_HSA_K562_CEBPB_cofactor_report.pdf
970
976
```
971
977
972
-
```{r echo=FALSE, fig.cap="Figure 7. MethMotif logo of K562 CEBPB common peaks intersected with K562 CEBPD peaks", out.width = '40%', fig.align="center"}
978
+
```{r echo=FALSE, fig.cap="Figure 8. MethMotif logo of K562 CEBPB common peaks intersected with K562 CEBPD peaks", out.width = '40%', fig.align="center"}
```{r echo=FALSE, fig.cap="Figure 8. MethMotif logo of K562 CEBPB exclusive peaks intersected with K562 ATF4 peaks", out.width = '40%', fig.align="center"}
982
+
```{r echo=FALSE, fig.cap="Figure 9. MethMotif logo of K562 CEBPB exclusive peaks intersected with K562 ATF4 peaks", out.width = '40%', fig.align="center"}
```{r echo=FALSE, fig.cap="Figure 11. HTML annotation report of the genomic locations of K562 CEBPB exclusive peak", out.width = '100%', fig.align="center"}
1133
+
```{r echo=FALSE, fig.cap="Figure 12. HTML annotation report of the genomic locations of K562 CEBPB exclusive peak", out.width = '100%', fig.align="center"}
The key function of transcription factors is to regulate gene expression. By working with Genomic Regions Enrichment of Annotations Tool (GREAT), TFregulomeR allows users to annotate the functions of TFBSs using `greatAnnotate`. Given that GREAT server doesn't support hg38, liftOver R package has been incorporated in TFregulomeR to convert hg38 to hg19. The annotation output of `greatAnnotate` is intuitive, not only will a data.frame containing annotation results be returned, but also an HTML report will be saved. The HTML report takes advantage of `rbokeh` package, which presents a vivid and dynamic interface (Figure 12).
1139
+
The key function of transcription factors is to regulate gene expression. By working with the Genomic Regions Enrichment of Annotations Tool (GREAT), TFregulomeR allows users to annotate the functions of TFBSs using `greatAnnotate`. Given that the GREAT server doesn't support hg38, the liftOver R package has been incorporated in TFregulomeR to convert hg38 to hg19. The annotation output of `greatAnnotate` is intuitive: not only is a data.frame containing the annotation results returned, but an HTML report is also saved. The HTML report takes advantage of the `rbokeh` package, which presents a vivid and dynamic interface (Figure 13).
1134
1140
1135
1141
```{r eval=FALSE}
1136
1142
# annotate the functions of K562 CEBPB exclusive peaks
```{r echo=FALSE, fig.cap="Figure 12. HTML annotation report of the genes targeted by K562 CEBPB exclusive peak", out.width = '100%', fig.align="center"}
1181
+
```{r echo=FALSE, fig.cap="Figure 13. HTML annotation report of the genes targeted by K562 CEBPB exclusive peak", out.width = '100%', fig.align="center"}