# 🩺 Week 1 — Disease Prediction Using Patient Data

This project is part of the DevelopersHub Internship (AI/ML).
The task for Week 1 was to learn the basic ML workflow by predicting heart disease using patient data from the UCI Cleveland Heart Disease dataset.

## 📂 Project Structure

```
WEEK1-disease-prediction/
│
├── data/
│   └── cleveland.csv               # Dataset (renamed from processed.cleveland.data)
│
├── notebooks/                      # Jupyter notebooks for each step
│   ├── 01_load_and_explore.ipynb   # Step 1: Load & Explore dataset
│   ├── 02_preprocessing.ipynb      # Step 2: Preprocessing (imputation, scaling, binary target)
│   ├── 03_eda.ipynb                # Step 3: Exploratory Data Analysis (EDA)
│   ├── 04_model_training.ipynb     # Step 4: Model Training (Logistic Regression & Random Forest)
│   └── 05_evaluation_report.ipynb  # Step 5: One-page summary notebook
│
├── week1_report.md                 # Short 1-page Markdown report
├── week1_report.pdf                # Exported PDF report (for submission)
└── README.md                       # Project documentation (this file)
```

---

## 📊 Dataset
- **Source:** UCI Machine Learning Repository (Cleveland subset, processed version)  
- **Size:** 303 rows × 14 columns  
- **Target:** `target` (0 = no disease, 1–4 = disease present), binarized to `target_bin` (0 = healthy, 1 = disease)  
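
A minimal loading sketch, assuming `data/cleveland.csv` kept the headerless, comma-separated layout of `processed.cleveland.data`, with `?` marking missing values. The column names follow the UCI attribute list, except that the last column (called `num` by UCI) is named `target` to match this README. The helpers `load_cleveland` and `add_binary_target` are hypothetical names, not taken from the notebooks.

```python
import pandas as pd

# UCI attribute names for the processed Cleveland file (it has no header row).
# UCI calls the last column "num"; it is renamed to `target` to match this README.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def load_cleveland(path_or_buffer) -> pd.DataFrame:
    """Read the headerless Cleveland file, parsing '?' as a missing value (NaN)."""
    return pd.read_csv(path_or_buffer, header=None, names=COLUMNS, na_values="?")


def add_binary_target(df: pd.DataFrame) -> pd.DataFrame:
    """Collapse the 0-4 label into target_bin: 0 = healthy, 1 = disease (any value > 0)."""
    out = df.copy()
    out["target_bin"] = (out["target"] > 0).astype(int)
    return out
```

Usage: `df = add_binary_target(load_cleveland("data/cleveland.csv"))` should yield the 303 × 14 frame described above plus the extra `target_bin` column.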

---

## ⚙️ Preprocessing
- Missing values:
  - `ca`, `thal` → filled with **mode** (most frequent value)
  - other numeric columns → filled with **median**
- Features scaled to **[0, 1]** using `MinMaxScaler`
- Final dataset: **13 features + 1 binary target**
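
The preprocessing rules above could look roughly like the sketch below. It is not the code from `02_preprocessing.ipynb`: the function names are hypothetical, and it assumes every column is numeric, which holds for the processed Cleveland file once `?` is parsed as missing. In that file only `ca` and `thal` contain gaps, so the median branch acts as a fallback.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Columns imputed with the mode, per this README; any other gaps get the median.
MODE_COLUMNS = ["ca", "thal"]


def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values: mode for MODE_COLUMNS, median for other numeric columns."""
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if col in MODE_COLUMNS:
            # mode() ignores NaN; on ties it returns sorted values, so take the first.
            out[col] = out[col].fillna(out[col].mode().iloc[0])
        else:
            out[col] = out[col].fillna(out[col].median())
    return out


def scale_features(X: pd.DataFrame) -> pd.DataFrame:
    """Rescale each feature to [0, 1] (MinMaxScaler's default feature_range)."""
    scaler = MinMaxScaler()
    return pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)
```

`scale_features` is meant for the 13 feature columns only, not `target_bin`. It fits the scaler on whatever frame it receives; fitting on the training split alone and reusing that scaler on the test split keeps test-set minima and maxima out of training. The same applies to the imputation statistics.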

---

## 🔍 Exploratory Data Analysis (EDA)
- Class balance: ~54% healthy, ~46% disease  
- Feature distributions plotted (histograms)  
- Correlation heatmap to study feature relationships  
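
The numbers behind the EDA plots can be computed directly with pandas. This hypothetical pair of helpers returns the class proportions and the features most correlated with the binary target; on the binarized Cleveland data, `class_balance(df["target_bin"])` should reproduce the roughly 54/46 split above.

```python
import pandas as pd


def class_balance(y: pd.Series) -> pd.Series:
    """Proportion of each class label, e.g. 0 = healthy, 1 = disease."""
    return y.value_counts(normalize=True).sort_index()


def top_correlations(df: pd.DataFrame, target: str, k: int = 5) -> pd.Series:
    """The k features with the largest absolute Pearson correlation with `target`."""
    # One column of the same matrix a correlation heatmap would display.
    corr = df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order).head(k)
```

When ranking against `target_bin`, drop the original 0–4 `target` column first (for example `top_correlations(df.drop(columns="target"), "target_bin")`), since it trivially correlates with its own binarized version. For the heatmap itself, `df.corr()` can be passed to a plotting helper such as `seaborn.heatmap`; that library is an assumption, as the README does not name one.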

---

## 🤖 Models & Results
Two models were trained and evaluated:

| Model                | Accuracy |
|-----------------------|----------|
| Logistic Regression   | **0.8525** |
| Random Forest         | **0.9016** ✅ |

**Selected Model:** Random Forest (higher accuracy)
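
A sketch of the model comparison under stated assumptions. The README does not give the split, seed, or hyperparameters. Both reported accuracies are consistent with a 61-sample test set (52/61 ≈ 0.8525, 55/61 ≈ 0.9016), which is what a 20% hold-out of 303 rows produces, so the gap between the models amounts to three predictions; that is an inference, not a documented setting. Stratification, `random_state=42`, `max_iter=1000`, and `n_estimators=200` are illustrative choices, and the sketch will not necessarily reproduce the table. `X` is the 13 scaled features and `y` is `target_bin`; `compare_models` is a hypothetical name.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def compare_models(X: pd.DataFrame, y: pd.Series, seed: int = 42) -> dict:
    """Fit both models on one stratified 80/20 split and return test accuracy per model."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores
```

With a test set this small, a single split can swing by a few predictions depending on the seed, so repeating the comparison over several seeds or folds gives a fairer basis for picking a model.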

---

## 📄 Outcome
- Learned a complete **ML workflow**:
  - Data loading → preprocessing → EDA → model training → evaluation  
- Produced a **1-page report** (`week1_report.pdf`) for submission  
- Random Forest performed best and is the selected baseline model.  
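
The workflow listed above can also be chained into a single scikit-learn `Pipeline`. This is a sketch assuming the same imputation and scaling choices as the Preprocessing section. Because the imputers and the scaler sit inside the pipeline, `cross_val_score` refits them on each training fold only, and the mean over five stratified folds is a steadier estimate than one split of 303 rows. `build_pipeline` and `cv_accuracy` are hypothetical names; the fold count, seed, and `n_estimators` are assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler


def build_pipeline(mode_cols: list, median_cols: list, seed: int = 42) -> Pipeline:
    """Impute, scale to [0, 1], then classify; every step is refit per training fold."""
    impute = ColumnTransformer([
        # Mode imputation for the columns the README lists (ca, thal).
        ("mode", SimpleImputer(strategy="most_frequent"), mode_cols),
        # Median imputation for the remaining numeric columns.
        ("median", SimpleImputer(strategy="median"), median_cols),
    ])  # columns in neither list are dropped (ColumnTransformer's default remainder="drop")
    return Pipeline([
        ("impute", impute),
        ("scale", MinMaxScaler()),
        ("model", RandomForestClassifier(n_estimators=200, random_state=seed)),
    ])


def cv_accuracy(X: pd.DataFrame, y: pd.Series, mode_cols: list, median_cols: list,
                folds: int = 5, seed: int = 42) -> tuple:
    """Mean and standard deviation of accuracy over stratified k-fold cross-validation."""
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=seed)
    pipeline = build_pipeline(mode_cols, median_cols, seed)
    scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
    return float(scores.mean()), float(scores.std())
```

Usage, assuming `X` holds the 13 raw (unimputed, unscaled) features and `y` is `target_bin`: `cv_accuracy(X, y, ["ca", "thal"], [c for c in X.columns if c not in ("ca", "thal")])`.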

---

Repository: `A-iftikhar02/-Disease-Prediction-Using-Patient-Data`
