
ML-Algorithm-Library

ML-Algorithm-Library: A curated collection of classic machine learning algorithms implemented from scratch in Python + NumPy. Designed for clarity, extensibility, and learning—empowering you to inspect, modify, and build on foundational algorithms.


Table of Contents

  1. Overview
  2. Features
  3. Badges & CI
  4. Installation
  5. Usage
  6. Repository Structure
  7. Development & Contributing
  8. Roadmap
  9. License
  10. Acknowledgments
  11. Contact & Support

Overview

Machine learning is often treated as a black box. ML-Algorithm-Library demystifies core algorithms by providing pure-Python (+ NumPy) implementations with clear, well-documented code. This fosters deep understanding: inspect every operation, trace the math-to-code mapping, experiment with variations, and extend for your research or production prototypes.

Key goals:

  • Educational clarity: Code is written & commented to teach intuition as well as implementation details.
  • Minimal dependencies: Only NumPy and standard library; avoids hiding logic behind heavy frameworks.
  • Consistent interface: Uniform fit / predict (or analogous) APIs across models (see the sketch after this list).
  • Modular design: Organized by category (Regression, Classification, Clustering, etc.), making it easy to extend.
  • Production-awareness: While focusing on clarity, highlights performance considerations and when to scale up to optimized libraries.
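
To make the consistent-interface goal concrete, the sketch below shows the fit / predict contract that models in this style follow. The class names here (BaseEstimator, MeanRegressor) are illustrative, not part of the library:

import numpy as np

class BaseEstimator:
    """Sketch of the shared fit/predict contract (hypothetical)."""

    def fit(self, X, y=None):
        # Learn model parameters from training data; return self for chaining.
        raise NotImplementedError

    def predict(self, X):
        # Produce predictions (or labels, components, etc.) for new data.
        raise NotImplementedError

class MeanRegressor(BaseEstimator):
    """Toy model that always predicts the training mean, to show the API shape."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)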

Features

  • Core Algorithms Implemented from Scratch
    • Regression: Simple Linear, Multiple Linear, Polynomial, Ridge, Lasso
    • Classification: Logistic Regression, k-NN, SVM (linear & kernel), Naive Bayes, Decision Trees, Ensemble methods (Bagging, Random Forest, AdaBoost, Gradient Boosting)
    • Clustering: k-Means, Hierarchical Agglomerative, Gaussian Mixture Models (EM), optional DBSCAN
    • Dimensionality Reduction: PCA, LDA, SVD demonstrations
    • Neural Networks: Perceptron, Multilayer Perceptron with backpropagation, Autoencoders
    • Recommendation Basics: Collaborative Filtering, Content-Based Filtering, Matrix Factorization prototypes
    • (Optional/Advanced) Reinforcement Learning: Multi-armed bandit, Q-learning demos
  • Utilities
    • Data preprocessing: train-test split, normalization/standardization, one-hot encoding (sketched after this list)
    • Metrics: MSE, RMSE, MAE, R², accuracy, precision/recall/F1, confusion matrix, silhouette score, etc.
    • Helpers: reproducible random_state usage, logging/progress for iterative algorithms
  • Jupyter Notebooks & Examples
    • Intuitive tutorials for each algorithm: theory → code walkthrough → visualization → “what if” experiments
    • Standalone scripts for quick trials
  • Testing
    • Pytest-based unit tests on synthetic datasets to verify correctness and edge conditions
  • Extensible Structure
    • Encourage contributions: add new algorithms or improvements following the established template.
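
As an illustration of the utilities bullet above, the preprocessing helpers can be written in a few lines of NumPy. The function names below (train_test_split, standardize) are a sketch of the idea, not necessarily the library's exact API:

import numpy as np

def train_test_split(X, y, test_size=0.2, random_state=None):
    # Shuffle indices reproducibly, then slice into train and test partitions.
    rng = np.random.default_rng(random_state)
    idx = rng.permutation(len(X))
    cut = int(len(X) * (1 - test_size))
    return X[idx[:cut]], X[idx[cut:]], y[idx[:cut]], y[idx[cut:]]

def standardize(X):
    # Zero-mean, unit-variance scaling per feature (column); guards against zero variance.
    mean, std = X.mean(axis=0), X.std(axis=0)
    return (X - mean) / np.where(std == 0, 1.0, std)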

Badges & CI

  • Build Status: Automated tests run on GitHub Actions for every push/PR.
  • Coverage (optional): Integrate Codecov or Coveralls to monitor test coverage.
  • License: MIT.
  • Python Version Support: 3.7 and above.

Badges appear at the top of this README; the CI configuration file (.github/workflows/ci.yml) defines the following steps:

  1. Check out the code
  2. Set up the Python environment
  3. Install dependencies (numpy, pytest)
  4. Run pytest and, optionally, coverage reporting.

Installation

This library is lightweight and not currently published to PyPI. You can install it directly from GitHub:

# Clone repository
git clone https://github.com/AdilShamim8/ML-Algorithm-Library.git
cd ML-Algorithm-Library

# (Optional) Create a virtual environment
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# Install dependencies
pip install numpy
# If you plan to run tests or notebooks:
pip install pytest jupyter matplotlib

To integrate this library into another project, install in “editable” mode:

pip install -e .

(An editable install requires a setup.py or pyproject.toml in the project root; a minimal example is sketched below.)
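
If you go with setup.py, a minimal file along these lines is enough for an editable install. The metadata below is a sketch, not the repository's actual packaging configuration:

# setup.py (minimal sketch)
from setuptools import setup, find_packages

setup(
    name="ml-algorithm-library",
    version="0.1.0",
    packages=find_packages(include=["ml_algorithm_library", "ml_algorithm_library.*"]),
    python_requires=">=3.7",
    install_requires=["numpy"],
)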


Usage

Importing and Running an Algorithm

All algorithms follow a consistent pattern:

from ml_algorithm_library.regression.simple_linear import SimpleLinearReg

# Prepare data
X_train = [1, 2, 3, 4, 5]
y_train = [2, 4, 5, 4, 5]

# Instantiate & fit
model = SimpleLinearReg()
model.fit(X_train, y_train)

# Predict
X_test = [6, 7, 8]
y_pred = model.predict(X_test)
print("Predictions:", y_pred)

Example Notebooks

  • Regression Tutorial: notebooks/Regression/SimpleLinearRegression.ipynb

    • Intuition: fitting a line as minimizing squared errors.
    • Code walkthrough: calculating the slope and intercept (see the sketch after this list).
    • Visualization: plot data & fitted line.
    • Experiments: add noise, outliers, regularization.
  • Classification Tutorial: notebooks/Classification/LogisticRegression.ipynb

    • Derive sigmoid, loss, gradient descent.
    • Compare with scikit-learn output.
    • Plot decision boundary on synthetic data.
  • Clustering & Dimensionality Reduction: Similar structure with plots showing cluster assignments, PCA projections.
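
The slope/intercept calculation walked through in the regression notebook reduces to the ordinary least-squares closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. A sketch in plain NumPy, using the same toy data as the Usage section:

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

# Closed-form OLS for a single feature: slope = cov(x, y) / var(x).
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(f"y ~= {slope:.2f} * x + {intercept:.2f}")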

Run jupyter notebook to open and explore these demos. They serve both as learning material and as a quick reference for usage patterns.


Repository Structure

ML-Algorithm-Library/
│
├── ml_algorithm_library/           # Core package
│   ├── __init__.py
│   ├── regression/
│   │   ├── __init__.py
│   │   ├── simple_linear.py
│   │   ├── multiple_linear.py
│   │   ├── polynomial.py
│   │   ├── ridge.py
│   │   └── lasso.py
│   ├── classification/
│   │   ├── __init__.py
│   │   ├── logistic_regression.py
│   │   ├── knn.py
│   │   ├── svm.py
│   │   ├── naive_bayes.py
│   │   ├── decision_tree.py
│   │   └── ensemble/
│   │       ├── bagging.py
│   │       ├── random_forest.py
│   │       ├── adaboost.py
│   │       └── gradient_boosting.py
│   ├── clustering/
│   │   ├── __init__.py
│   │   ├── kmeans.py
│   │   ├── hierarchical.py
│   │   └── gmm.py
│   ├── dimensionality_reduction/
│   │   ├── __init__.py
│   │   ├── pca.py
│   │   └── lda.py
│   ├── neural_networks/
│   │   ├── __init__.py
│   │   ├── perceptron.py
│   │   ├── mlp.py
│   │   └── autoencoder.py
│   ├── recommendation/
│   │   ├── __init__.py
│   │   ├── collaborative_filtering.py
│   │   └── content_based.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── metrics.py
│   │   ├── data_preprocessing.py
│   │   └── helpers.py
│   └── ...                          # optional advanced modules
│
├── notebooks/                      # Jupyter tutorials & demos
│   ├── Regression/
│   ├── Classification/
│   ├── Clustering/
│   └── NeuralNetworks/
│
├── examples/                       # Standalone scripts (e.g., run_simple_linear.py)
│
├── tests/                          # Pytest tests
│   ├── test_simple_linear.py
│   ├── test_logistic_regression.py
│   └── ...
│
├── .github/
│   ├── workflows/
│   │   └── ci.yml                 # GitHub Actions CI config
│   ├── ISSUE_TEMPLATE.md
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── CODE_OF_CONDUCT.md
│
├── requirements.txt                # numpy, pytest, jupyter, matplotlib (optional)
├── setup.py / pyproject.toml       # For packaging (optional)
├── README.md                       # This file
└── LICENSE                         # MIT License

Development & Contributing

Contributions are welcome! Follow these guidelines to ensure consistency and high quality.

Code Style & Standards

  • PEP 8 compliant: readable, concise, clear.
  • Docstrings: Use triple-quoted docstrings with a top-level summary, followed by parameters, returns, and a brief explanation of the algorithm’s intuition or references (see the example after this list).
  • Inline comments: Explain non-obvious steps (e.g., derivation points, vectorization decisions).
  • Type hints (optional): You may include type hints for clarity, but keep code beginner-friendly.
  • Dependencies: Only NumPy and standard library in core modules. Development dependencies (pytest, jupyter, matplotlib) listed in requirements.txt.
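
A docstring following this convention might look like the example below; the function is a hypothetical metrics helper, shown only to illustrate the format:

import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels.

    Parameters
    ----------
    y_true : array-like of shape (n_samples,)
        Ground-truth class labels.
    y_pred : array-like of shape (n_samples,)
        Predicted class labels.

    Returns
    -------
    float
        Value in [0, 1]; higher is better. Intuition: the count of correct
        predictions divided by the total number of samples.
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))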

Branching & Workflow

  • Main branch: Always passing CI, stable.

  • Feature branches: Branch from main, using names such as feature/<algorithm-name> or fix/<issue-number>.

  • Pull Requests:

    • Base branch: main.
    • Provide clear PR title and description: what was added/changed, motivation, references.
    • Link any related issue.
    • Ensure all tests pass locally before opening PR.

Testing

  • Use pytest.
  • Write tests for new algorithms under tests/, e.g., test_<algorithm>.py (an example sketch follows this list).
  • Synthetic datasets: small, deterministic. Validate core functionality and edge cases.
  • Run locally: pytest --maxfail=1 --disable-warnings -q.
  • CI will run pytest on each push/PR; ensure coverage doesn’t drop significantly.
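
A test in this style might look like the sketch below. The import path mirrors the Usage section; adjust the names to match the actual module under test:

# tests/test_simple_linear.py (sketch)
import numpy as np
from ml_algorithm_library.regression.simple_linear import SimpleLinearReg

def test_recovers_exact_line():
    # Deterministic synthetic data lying exactly on y = 2x + 1.
    X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = 2.0 * X + 1.0

    model = SimpleLinearReg()
    model.fit(X, y)

    # A correct least-squares fit should reproduce the line (near) exactly.
    assert np.allclose(model.predict(X), y, atol=1e-6)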

Continuous Integration

  • GitHub Actions config in .github/workflows/ci.yml. Example steps:

    1. Check out the code.
    2. Set up Python (versions 3.7, 3.8, 3.9, 3.10+).
    3. Install dependencies (pip install numpy pytest).
    4. Run pytest.
    5. (Optional) Report coverage to Codecov.
  • Keep CI fast; avoid heavy dependencies in CI. Notebooks are not executed in CI by default (to save time), but code cells can be spot-tested if desired.

Issue & PR Templates

  • Issue template: Encourage clear bug reports or feature requests:

    • Title prefix: [BUG], [FEATURE], [DOC].
    • Description: Expected behavior, actual behavior, minimal reproduction code.
  • PR template: Checklist for authors:

    • Code follows style guidelines.
    • New tests added and passing.
    • Documentation/notebook updated.
    • Relevant issue linked.

Templates reside under .github/ISSUE_TEMPLATE.md and .github/PULL_REQUEST_TEMPLATE.md.

Code of Conduct

  • A CODE_OF_CONDUCT.md in .github/ defines community standards.
  • Encourage respectful, inclusive collaboration.
  • Clearly state reporting guidelines for unacceptable behavior.

Contributor Recognition

  • The contributor list is surfaced in the README via a badge or a link to the GitHub contributors graph.
  • Acknowledge significant contributions in the release notes or the acknowledgments section.

Testing Locally

  1. Clone the repository and set up the environment:

    git clone https://github.com/AdilShamim8/ML-Algorithm-Library.git
    cd ML-Algorithm-Library
    python -m venv venv
    source venv/bin/activate
    pip install numpy pytest
  2. Run tests:

    pytest
  3. (Optional) Run notebooks:

    pip install jupyter matplotlib
    jupyter notebook
  4. Ensure new contributions include tests and docs, and pass CI.


Roadmap

Track progress via GitHub Projects or Issues. Possible milestones:

  • v0.1: Core Regression & Classification (Simple Linear, Multiple Linear, Logistic, k-NN, Decision Tree)
  • v0.2: Regularized Models & Ensembles (Ridge, Lasso, Bagging, Random Forest)
  • v0.3: Clustering & Dimensionality Reduction (k-Means, PCA, Hierarchical Clustering)
  • v0.4: Neural Networks from Scratch (Perceptron, MLP)
  • v0.5: Advanced Methods (Gradient Boosting basics, GMM, Recommendation prototypes)
  • v1.0: Stable release with comprehensive tests, documentation, and example notebooks.

Each version is tagged following semantic versioning (v0.x.y), with release notes summarizing additions, fixes, and any breaking changes.


License

This project is licensed under the MIT License. See LICENSE for full text.


Acknowledgments

  • Classic machine learning textbooks and open-source implementations that inspired this library.
  • The broader ML community’s tutorials, papers, and discussions—guiding clarity and pedagogy.
  • Contributors who enhance the library with new algorithms, tests, and improved explanations.

Contact & Support

  • Issues & Feature Requests: Open an issue in this repository.
  • Discussion: Use GitHub Discussions (if enabled) or open an issue with a “Discussion” label.
  • Maintainer: Adil Shamim (@AdilShamim8).

Stay engaged: star the repository ⭐, share feedback, and help others learn by contributing examples or improvements.
