Commit c94ca7b

add demo figs
1 parent 996c35e commit c94ca7b

File tree

1 file changed: +8 −0 lines changed


docs/src/index.md

Lines changed: 8 additions & 0 deletions
@@ -6,6 +6,14 @@
 
 Testing for differences in features between clusters in various applications often leads to inflated false positives when practitioners use the same dataset to identify clusters and then test features, an issue commonly known as “double dipping”.
 
+![dd](https://github.com/user-attachments/assets/e5383503-2e4d-45d0-adff-77f3a0f82899)
+
+![xkcd](https://github.com/user-attachments/assets/8de07b78-8346-4316-ae8c-855c305d625f)
+
+> The xkcd-style cartoon is drawn with the help of the R package [xkcd](https://xkcd.r-forge.r-project.org/).
+
 To address this challenge, inspired by data-splitting strategies for controlling the false discovery rate (FDR) in regressions ([Dai et al., 2023](https://www.tandfonline.com/doi/abs/10.1080/01621459.2022.2060113)), we present a novel method that applies data-splitting to control FDR while maintaining high power in unsupervised clustering.
 
 We first divide the dataset into two halves, then apply the conventional testing-after-clustering procedure to each half separately and combine the resulting test statistics to form a new statistic for each feature. The new statistic can help control the FDR due to its property of having a sampling distribution that is symmetric around zero for any null feature.
+
+![mds](https://github.com/user-attachments/assets/7f75a845-0c9b-41cd-982c-b5fba53c5a71)
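
A minimal sketch of the split-and-combine idea described in the added text, for readers who want something concrete. This is not the package's own code: it assumes two clusters found by k-means, a two-sample t-test per feature, and one common mirror-style combination in the spirit of Dai et al. (2023); the names `mirror_statistics` and `fdr_cutoff` are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans


def mirror_statistics(X, n_clusters=2, seed=0):
    """Split samples in half, cluster and test each half separately,
    and combine the per-feature t-statistics into a mirror statistic."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    t_stats = []
    for half in np.array_split(idx, 2):
        Xh = X[half]
        labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(Xh)
        # Two-sample t-statistic per feature between clusters 0 and 1.
        # Assumption: labels are treated as aligned across the two halves;
        # a full implementation would match cluster labels between halves.
        t, _ = stats.ttest_ind(Xh[labels == 0], Xh[labels == 1], axis=0)
        t_stats.append(np.asarray(t))
    t1, t2 = t_stats
    # Large and positive when both halves agree on a signal;
    # roughly symmetric around zero for a null feature.
    return np.sign(t1 * t2) * (np.abs(t1) + np.abs(t2))


def fdr_cutoff(m, q=0.1):
    """Smallest cutoff t whose estimated false discovery proportion
    #{m_j <= -t} / max(#{m_j >= t}, 1) is at most the target level q."""
    for t in np.sort(np.abs(m)):
        if np.sum(m <= -t) / max(np.sum(m >= t), 1) <= q:
            return t
    return np.inf


# Usage (hypothetical data): report features whose mirror statistic
# clears the data-driven cutoff.
# X = np.loadtxt("expression.csv", delimiter=",")   # samples x features
# m = mirror_statistics(X)
# selected = np.where(m >= fdr_cutoff(m, q=0.1))[0]
```

The cutoff search leans on the symmetry property stated above: null features fall on both sides of zero at comparable rates, so the left tail serves as an estimate of the false discoveries in the right tail.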
