ML-powered student performance prediction system with end-to-end pipeline. Predicts math scores using demographic & academic features through automated model selection (Random Forest, XGBoost, CatBoost) and Flask web deployment.

preetham-11/student_performance_predictor

Student Performance Predictor

Overview

A machine learning project that predicts student math scores from demographic and academic features. The system implements a complete MLOps pipeline, from data ingestion through model training to deployment, and exposes predictions through a web interface.

Features

  • End-to-End ML Pipeline: Complete workflow from data ingestion to model deployment
  • Multiple Algorithm Comparison: Evaluates seven regression algorithms, each with hyperparameter tuning
  • Real-time Predictions: Flask web application for instant score predictions
  • Automated Model Selection: Selects the best-performing model based on R² score
  • Data Preprocessing: Handles categorical encoding and feature scaling
  • Modular Architecture: Well-structured codebase with separate components for each pipeline stage

Project Structure

student_performance/
├── artifacts/                    # Stored models and preprocessors
│   ├── model.pkl                # Trained ML model
│   ├── preprocessor.pkl         # Data preprocessing pipeline
│   ├── train.csv               # Training dataset
│   ├── test.csv                # Testing dataset
│   └── data.csv                # Raw dataset
├── notebook/
│   └── data/
│       └── stud.csv            # Original dataset
├── src/
│   ├── components/             # Core ML pipeline components
│   │   ├── __init__.py
│   │   ├── data_ingestion.py   # Data loading and splitting
│   │   ├── data_transformation.py  # Data preprocessing
│   │   └── model_trainer.py    # Model training and selection
│   ├── pipeline/               # Prediction pipeline
│   │   ├── __init__.py
│   │   └── predict_pipeline.py # Inference pipeline
│   ├── __init__.py
│   ├── exception.py            # Custom exception handling
│   ├── logger.py              # Logging configuration
│   └── utils.py               # Utility functions
├── templates/                  # HTML templates for web app
│   ├── index.html             # Homepage template
│   └── home.html              # Prediction form template
├── app.py                     # Flask web application
├── requirements.txt           # Project dependencies
└── README.md                  # Project documentation

Machine Learning Pipeline

Data Ingestion

  • Loads student performance dataset
  • Splits data into training (80%) and testing (20%) sets
  • Saves processed datasets to artifacts folder
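The ingestion step above can be sketched roughly as follows. This is a minimal illustration, not the code in `data_ingestion.py`: the function name `split_dataset`, the seed, and the column names in the toy frame are assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_dataset(df: pd.DataFrame, test_size: float = 0.2, seed: int = 42):
    """Return (train_df, test_df); the default gives an 80/20 split."""
    train_df, test_df = train_test_split(df, test_size=test_size, random_state=seed)
    return train_df, test_df


# Tiny made-up frame standing in for the real dataset (column names assumed).
toy = pd.DataFrame({"reading_score": range(10), "math_score": range(10, 20)})
train_df, test_df = split_dataset(toy)
print(len(train_df), len(test_df))

# Each split could then be written out with DataFrame.to_csv(path, index=False).
```

Fixing `random_state` keeps the split reproducible between runs, which makes model comparisons easier to interpret.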

Data Transformation

  • Handles categorical variables (gender, ethnicity, education level, etc.)
  • Applies feature scaling using StandardScaler
  • Creates preprocessing pipeline for consistent data transformation
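One common way to combine these steps in scikit-learn is a `ColumnTransformer`. The sketch below assumes one-hot encoding for the categorical columns (the README does not say which encoder is used) and uses a reduced, hypothetical set of column names.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed column names; the real dataset may name them differently.
NUMERIC = ["reading_score", "writing_score"]
CATEGORICAL = ["gender", "lunch"]


def build_preprocessor() -> ColumnTransformer:
    """Scale numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), NUMERIC),
            # handle_unknown="ignore" avoids errors on categories unseen in training.
            ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ]
    )


# Made-up rows for illustration only.
toy = pd.DataFrame(
    {
        "gender": ["female", "male", "female", "male"],
        "lunch": ["standard", "free/reduced", "standard", "standard"],
        "reading_score": [72, 90, 57, 78],
        "writing_score": [74, 88, 44, 75],
    }
)
preprocessor = build_preprocessor()
features = preprocessor.fit_transform(toy)
print(features.shape)
```

Fitting the preprocessor once on the training data and reusing the fitted object at inference time is what keeps the transformation consistent between training and prediction.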

Model Training

The system evaluates multiple regression algorithms:

  • Random Forest Regressor
  • Decision Tree Regressor
  • Gradient Boosting Regressor
  • Linear Regression
  • XGBoost Regressor
  • CatBoost Regressor
  • AdaBoost Regressor

Each model undergoes hyperparameter tuning using GridSearchCV to find optimal parameters.
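A tuning loop of this kind might look like the sketch below. It is restricted to three scikit-learn models to stay self-contained; the parameter grids, the `evaluate_models` name, and the synthetic data are all assumptions for illustration, not the project's actual settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical candidates and grids; an empty grid means "fit with defaults".
CANDIDATES = {
    "Linear Regression": (LinearRegression(), {}),
    "Decision Tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [3, 5]}),
    "Random Forest": (RandomForestRegressor(random_state=0), {"n_estimators": [16, 32]}),
}


def evaluate_models(X_train, y_train, X_test, y_test):
    """Tune each candidate with GridSearchCV and return its test-set R²."""
    report = {}
    for name, (model, grid) in CANDIDATES.items():
        search = GridSearchCV(model, grid, cv=3, scoring="r2")
        search.fit(X_train, y_train)
        predictions = search.best_estimator_.predict(X_test)
        report[name] = r2_score(y_test, predictions)
    return report


# Synthetic regression data standing in for the preprocessed student features.
X, y = make_regression(n_samples=120, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
report = evaluate_models(X_train, y_train, X_test, y_test)
for name, score in report.items():
    print(f"{name}: {score:.3f}")
```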

Model Selection

  • Automatically selects the best-performing model based on R² score
  • Requires a minimum R² score of 0.6 for a model to be accepted
  • Saves the best model for production use
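The selection rule above can be expressed in a few lines. This is a sketch under assumptions: `select_best_model` is a hypothetical helper, and the scores in the usage example are made up for illustration, not results from this project.

```python
from typing import Dict, Tuple


def select_best_model(report: Dict[str, float], min_r2: float = 0.6) -> Tuple[str, float]:
    """Pick the highest-scoring model, rejecting it if R² is below min_r2."""
    if not report:
        raise ValueError("No models were evaluated")
    best_name = max(report, key=report.get)
    best_score = report[best_name]
    if best_score < min_r2:
        raise ValueError(f"Best model {best_name!r} scored {best_score:.2f}, below {min_r2}")
    return best_name, best_score


# Made-up scores, used only to show the call.
example_report = {"Linear Regression": 0.88, "Random Forest": 0.85, "Decision Tree": 0.74}
print(select_best_model(example_report))
```

Raising an error when no model clears the threshold prevents a weak model from being saved and served silently.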

Input Features

  • Gender: Male/Female
  • Race/Ethnicity: Student's ethnic background
  • Parental Level of Education: Education level of parents
  • Lunch: Standard or free/reduced lunch
  • Test Preparation Course: Completed or not completed
  • Reading Score: Student's reading test score
  • Writing Score: Student's writing test score

Output

  • Math Score Prediction: Predicted math test score (0-100)
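At inference time, the seven inputs need to be turned into a single-row table that the saved preprocessor can consume. The sketch below shows one way to do that; `StudentInput`, `clip_score`, the column names, and the sample values are hypothetical, and the README does not state whether predictions are clamped to the 0-100 range.

```python
from dataclasses import asdict, dataclass

import pandas as pd


@dataclass
class StudentInput:
    """One student's features (field names are assumed, not from the repo)."""

    gender: str
    race_ethnicity: str
    parental_level_of_education: str
    lunch: str
    test_preparation_course: str
    reading_score: float
    writing_score: float

    def to_frame(self) -> pd.DataFrame:
        """Return a one-row DataFrame with one column per feature."""
        return pd.DataFrame([asdict(self)])


def clip_score(value: float) -> float:
    """Clamp a raw regression output to the 0-100 score range (an assumption)."""
    return float(min(100.0, max(0.0, value)))


# Placeholder values for illustration.
student = StudentInput(
    gender="female",
    race_ethnicity="group B",
    parental_level_of_education="bachelor's degree",
    lunch="standard",
    test_preparation_course="none",
    reading_score=72,
    writing_score=74,
)
frame = student.to_frame()
print(frame.shape)
```

The resulting frame would be passed through the fitted preprocessor and then to the model's `predict` method.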

Technologies Used

  • Python 3.8
  • Scikit-learn: Machine learning algorithms and preprocessing
  • XGBoost & CatBoost: Advanced boosting algorithms
  • Flask: Web application framework
  • Pandas & NumPy: Data manipulation and analysis
  • HTML/CSS: Frontend interface

Model Performance

The system selects the best-performing model by its R² score on the test data. A model is accepted only if that score is at least 0.6.
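
For reference, R² here is assumed to be the standard coefficient of determination (the quantity computed by scikit-learn's `r2_score`), where $y_i$ is the true math score, $\hat{y}_i$ the predicted score, and $\bar{y}$ the mean of the true scores over the $n$ test samples:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
$$

A value of 1 means perfect predictions, 0 matches always predicting the mean, and negative values are worse than predicting the mean.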
