This project develops an interpretable, biologically grounded diabetes risk screener using structured signal logic and hybrid probabilistic gating. Unlike black-box models, this system prioritizes high recall, transparency, and modular reasoning, which suits real-world healthcare screening and research environments.
BiqDS (Biophysics-Informed Quantitative Diabetes Screener) is based on the insight that:
Early onset of diabetes is associated with a measurable reduction in projected longevity within phenotypic clusters.
Using this principle, the system builds a `FinalSignal` that blends:
- Cluster-level longevity estimates,
- Observed diabetes prevalence,
- Delta in life expectancy between diabetic and non-diabetic patients,
- And hybrid filters using Random Forest model confidence when needed.
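
The exact blending formula is not stated here, so the following is only a minimal sketch of one way the three cluster-level quantities could be combined. The function name `blend_final_signal`, the mixing parameter `alpha`, the choice to invert the longevity term, and the multiplicative use of the weight are all assumptions for illustration, not the notebook's method.

```python
import numpy as np

def blend_final_signal(longevity_score, diabetes_probability,
                       delta_longevity_weight, alpha=0.5):
    """Hypothetical blend of the three cluster-level inputs into one score.

    Assumed form (not taken from the notebook):
        base   = alpha * prevalence + (1 - alpha) * (1 - longevity)
        signal = clip(base * weight, 0, 1)
    """
    longevity = np.asarray(longevity_score, dtype=float)
    prevalence = np.asarray(diabetes_probability, dtype=float)
    weight = np.asarray(delta_longevity_weight, dtype=float)

    # Convex combination of prevalence and inverted longevity (assumption).
    base = alpha * prevalence + (1.0 - alpha) * (1.0 - longevity)
    # Scale by the delta-longevity weight and keep the score in [0, 1].
    return np.clip(base * weight, 0.0, 1.0)

# Two illustrative clusters with made-up values.
signal = blend_final_signal([0.8, 0.2], [0.1, 0.6], [1.0, 1.2])
print(signal)
```

Because every term is computed per cluster and combined with plain arithmetic, each score can be traced back to its inputs, which is the property the interpretability claim depends on.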
- ✅ Unsupervised phenotypic clustering (`k=35`) to define biological groupings
- ✅ Longevity modeling from cluster-level age averages
- ✅ Δ-longevity weights to adjust signal confidence
- ✅ FinalSignal: interpretable risk score derived from biophysical and statistical logic
- ✅ Threshold sweep and F1 optimization
- ✅ Hybrid filtering with Random Forest probability for robust screening
- ✅ High recall configuration for maximum sensitivity
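
As a rough illustration of the clustering and cluster-level features listed above, the sketch below runs KMeans with `k=35` on scaled synthetic data and derives per-cluster statistics. The synthetic data, the three feature columns, the min-max normalization, and the use of mean age as a longevity proxy for the delta are assumptions; the column names follow the common PIMA CSV layout.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 700
# Synthetic stand-in for the real dataset (illustrative values only).
df = pd.DataFrame({
    "Glucose": rng.normal(120, 30, n),
    "BMI": rng.normal(32, 7, n),
    "Age": rng.integers(21, 80, n),
    "Outcome": rng.integers(0, 2, n),
})

# Cluster patients on scaled features.
features = ["Glucose", "BMI", "Age"]
X = StandardScaler().fit_transform(df[features])
df["Cluster"] = KMeans(n_clusters=35, n_init=10, random_state=0).fit_predict(X)

# Cluster-level mean age and observed diabetes prevalence.
stats = df.groupby("Cluster").agg(
    mean_age=("Age", "mean"),
    DiabetesProbability=("Outcome", "mean"),
)

# Min-max normalize mean age into a LongevityScore (assumed normalization).
age_min, age_max = stats["mean_age"].min(), stats["mean_age"].max()
stats["LongevityScore"] = (stats["mean_age"] - age_min) / (age_max - age_min)

# Delta: mean age of non-diabetic minus diabetic patients within each cluster.
# Clusters missing one of the two groups get a delta of 0 (assumption).
age_by_outcome = df.pivot_table(index="Cluster", columns="Outcome",
                                values="Age", aggfunc="mean")
delta = age_by_outcome[0] - age_by_outcome[1]
stats["DeltaLongevity"] = delta.reindex(stats.index).fillna(0.0)

print(stats.head())
```

How `DeltaLongevity` is converted into `DeltaLongevityWeight` is left open here, since that mapping is not specified.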
- Data Cleaning & Clipping: Remove impossible values and apply domain-informed outlier clipping to enhance medical validity.
- Clustering (KMeans): Segment patients into biological phenotype clusters using scaled input features.
- Longevity Modeling: Compute average age per cluster and normalize to build a `LongevityScore`.
- Diabetes Prevalence: Measure cluster-level diabetes probability.
- Delta-Longevity Weighting: Estimate how much diabetes shortens life within each cluster to compute `DeltaLongevityWeight`.
- FinalSignal Construction: Blend `LongevityScore` and `DiabetesProbability` with `DeltaLongevityWeight`.
- Threshold Optimization: Sweep fixed thresholds (e.g. 0.10–0.25) to select high-recall, high-F1 configurations.
- Hybrid Rule (Optional): Apply a probability veto using a Random Forest trained on correlation-ranked features.
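
The last two steps can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's code: the synthetic `final_signal`, the 0.05 step size, the `veto_below` cutoff, and the two stand-in features are assumptions, and the correlation-based feature ranking is omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, recall_score

rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, n)
# Synthetic FinalSignal, loosely higher for positives (illustrative only).
final_signal = np.clip(0.15 + 0.10 * y + rng.normal(0, 0.08, n), 0, 1)
# Two stand-in features for the Random Forest (hypothetical).
X = np.column_stack([final_signal + rng.normal(0, 0.05, n),
                     rng.normal(size=n)])

def sweep(signal, y_true, thresholds):
    """Return (threshold, recall, F1) for each fixed threshold."""
    rows = []
    for t in thresholds:
        pred = (signal >= t).astype(int)
        rows.append((float(t),
                     recall_score(y_true, pred, zero_division=0),
                     f1_score(y_true, pred, zero_division=0)))
    return rows

# Sweep 0.10, 0.15, 0.20, 0.25 and keep the threshold with the best F1.
results = sweep(final_signal, y, np.arange(0.10, 0.251, 0.05))
best_t, best_recall, best_f1 = max(results, key=lambda r: r[2])

# Hybrid veto: a signal-positive case is cleared only when the Random
# Forest is confident it is negative. A real evaluation would fit the
# forest on a training split and score a held-out split instead.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
rf_proba = rf.predict_proba(X)[:, 1]
veto_below = 0.2  # hypothetical cutoff
hybrid_pred = ((final_signal >= best_t) & (rf_proba >= veto_below)).astype(int)

for t, rec, f1 in results:
    print(f"threshold={t:.2f} recall={rec:.2f} f1={f1:.2f}")
```

Since the veto can only remove positives, it trades some recall for precision; a low `veto_below` keeps that trade small for a screening use case.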
| Threshold | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|
| 0.20 | 0.85 | 0.68 | 0.88 | 0.77 |
| 0.10 (high recall) | 0.77 | 0.53 | 0.98 | 0.69 |
The confusion matrix and classification reports are included in the notebook.
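
As a quick consistency check, the F1 scores in the table can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The helper name `f1_from` is just for this example.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# threshold -> (precision, recall, reported F1), copied from the table.
table = {
    0.20: (0.68, 0.88, 0.77),
    0.10: (0.53, 0.98, 0.69),
}

for threshold, (precision, recall, reported) in table.items():
    recomputed = round(f1_from(precision, recall), 2)
    print(f"threshold={threshold:.2f} recomputed={recomputed} reported={reported}")
```

Both recomputed values round to the reported figures, so the table is internally consistent.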
- Clone the repo: `git clone https://github.com/yourusername/biqds-diabetes-screener`, then `cd biqds-diabetes-screener`.
- Run the notebook: `BiqDS.ipynb`
- Dataset required: place `diabetes.csv` (PIMA dataset) in the same directory.
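
For orientation, a cleaning pass on the dataset might look like the sketch below. It uses a small inline frame so it runs without the file; in the notebook the frame would come from `pd.read_csv("diabetes.csv")`. The column names follow the common PIMA CSV layout, and both the set of columns treated as invalid at zero and the insulin clipping bound are assumptions, not the notebook's actual rules.

```python
import pandas as pd

# Stand-in rows; replace with pd.read_csv("diabetes.csv") for real data.
df = pd.DataFrame({
    "Glucose": [148, 85, 0, 89, 137],
    "BloodPressure": [72, 66, 64, 0, 40],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
    "Insulin": [0, 94, 30, 168, 846],
    "Age": [50, 31, 32, 21, 33],
    "Outcome": [1, 0, 1, 0, 1],
})

# A zero is physiologically impossible for these measurements, so rows
# containing one are dropped (assumed rule).
zero_invalid = ["Glucose", "BloodPressure", "BMI"]
valid = (df[zero_invalid] != 0).all(axis=1)
df = df[valid].reset_index(drop=True)

# Clip extreme insulin readings to a domain-informed ceiling
# (600 is a hypothetical bound chosen for illustration).
df["Insulin"] = df["Insulin"].clip(upper=600)

print(df)
```

Dropping rows is the simplest option; imputing the invalid values instead would keep more patients at the cost of an extra modeling assumption.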