# ML-Algorithm-Library

A curated collection of classic machine learning algorithms implemented from scratch in Python + NumPy. Designed for clarity, extensibility, and learning, empowering you to inspect, modify, and build on foundational algorithms.
## Table of Contents

- Overview
- Features
- Badges & CI
- Installation
- Usage
- Repository Structure
- Development & Contributing
- Roadmap
- License
- Acknowledgments
- Contact & Support
## Overview

Machine learning is often treated as a black box. ML-Algorithm-Library demystifies core algorithms by providing pure-Python (+ NumPy) implementations with clear, well-documented code. This fosters deep understanding: inspect every operation, trace the math-to-code mapping, experiment with variations, and extend it for your research or production prototypes.
Key goals:
- Educational clarity: Code is written & commented to teach intuition as well as implementation details.
- Minimal dependencies: Only NumPy and standard library; avoids hiding logic behind heavy frameworks.
- Consistent interface: Uniform `fit`/`predict` (or analogous) APIs across models.
- Modular design: Organized by category (Regression, Classification, Clustering, etc.), making it easy to extend.
- Production awareness: While clarity comes first, the code notes performance considerations and when to scale up to optimized libraries.
## Features

- **Core Algorithms Implemented from Scratch**
  - Regression: Simple Linear, Multiple Linear, Polynomial, Ridge, Lasso
  - Classification: Logistic Regression, k-NN, SVM (linear & kernel), Naive Bayes, Decision Trees, Ensemble methods (Bagging, Random Forest, AdaBoost, Gradient Boosting)
  - Clustering: k-Means, Hierarchical Agglomerative, Gaussian Mixture Models (EM), optional DBSCAN
  - Dimensionality Reduction: PCA, LDA, SVD demonstrations
  - Neural Networks: Perceptron, Multilayer Perceptron with backpropagation, Autoencoders
  - Recommendation Basics: Collaborative Filtering, Content-Based Filtering, Matrix Factorization prototypes
  - (Optional/Advanced) Reinforcement Learning: Multi-armed bandit, Q-learning demos
- **Utilities**
  - Data preprocessing: train-test split, normalization/standardization, one-hot encoding (see the sketch after this list)
  - Metrics: MSE, RMSE, MAE, R², accuracy, precision/recall/F1, confusion matrix, silhouette score, etc.
  - Helpers: reproducible `random_state` usage, logging/progress reporting for iterative algorithms
- **Jupyter Notebooks & Examples**
  - Intuitive tutorials for each algorithm: theory → code walkthrough → visualization → “what if” experiments
  - Standalone scripts for quick trials
- **Testing**
  - Pytest-based unit tests on synthetic datasets to verify correctness and edge conditions
- **Extensible Structure**
  - Contributions are encouraged: add new algorithms or improvements following the established template.
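A hedged sketch of how the preprocessing utilities are meant to be used (function names such as `train_test_split` and `standardize` are assumptions; check `ml_algorithm_library/utils/data_preprocessing.py` for the exact API):

```python
import numpy as np

# Function names below are illustrative guesses at the utils API, not confirmed signatures.
from ml_algorithm_library.utils.data_preprocessing import train_test_split, standardize

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Split reproducibly, then standardize features to zero mean / unit variance.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_test = standardize(X_train), standardize(X_test)
```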
## Badges & CI

- Build Status: Automated tests run on GitHub Actions for every push/PR.
- Coverage (optional): Integrate Codecov or Coveralls to monitor test coverage.
- License: MIT.
- Python Version Support: 3.7 and above.
Badges appear at the top of this README. The CI configuration file (`.github/workflows/ci.yml`) defines the following steps:

- Check out the code
- Set up the Python environment
- Install dependencies (`numpy`, `pytest`)
- Run `pytest` and, optionally, coverage reporting
## Installation

This library is lightweight and not currently published to PyPI. You can install it directly from GitHub:
```bash
# Clone the repository
git clone https://github.com/AdilShamim8/ML-Algorithm-Library.git
cd ML-Algorithm-Library

# (Optional) Create a virtual environment
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# Install the core dependency
pip install numpy

# If you plan to run tests or notebooks:
pip install pytest jupyter matplotlib
```
To integrate this library into another project, install it in “editable” mode:

```bash
pip install -e .
```

(This requires a `setup.py` or `pyproject.toml`; a minimal sketch follows if the repository does not yet ship one.)
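If you want `pip install -e .` to work and no packaging file exists yet, a minimal `setup.py` along these lines is enough (the package name, version, and dependency pins below are illustrative assumptions):

```python
# setup.py — minimal packaging sketch; adjust name/version to the actual project metadata.
from setuptools import setup, find_packages

setup(
    name="ml-algorithm-library",
    version="0.1.0",
    packages=find_packages(exclude=["tests", "notebooks", "examples"]),
    install_requires=["numpy"],
    python_requires=">=3.7",
)
```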
## Usage

All algorithms follow a consistent pattern:
```python
import numpy as np

from ml_algorithm_library.regression.simple_linear import SimpleLinearReg

# Prepare data (NumPy arrays; plain lists may also work, depending on the implementation)
X_train = np.array([1, 2, 3, 4, 5])
y_train = np.array([2, 4, 5, 4, 5])

# Instantiate & fit
model = SimpleLinearReg()
model.fit(X_train, y_train)

# Predict
X_test = np.array([6, 7, 8])
y_pred = model.predict(X_test)
print("Predictions:", y_pred)
```
Example notebooks:

- **Regression Tutorial:** `notebooks/Regression/SimpleLinearRegression.ipynb`
  - Intuition: fitting a line as minimizing squared errors.
  - Code walkthrough: calculating the slope and intercept (see the sketch after this list).
  - Visualization: plot the data and the fitted line.
  - Experiments: add noise, outliers, regularization.
- **Classification Tutorial:** `notebooks/Classification/LogisticRegression.ipynb`
  - Derive the sigmoid, the loss, and gradient descent (see the sketch after this list).
  - Compare with scikit-learn output.
  - Plot the decision boundary on synthetic data.
- **Clustering & Dimensionality Reduction:** similar structure, with plots of cluster assignments and PCA projections (a standalone PCA sketch appears at the end of this section).
Use `jupyter notebook` to open and run these demos. They serve both as learning material and as a quick reference for usage patterns.
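And for the dimensionality reduction notebooks, the core PCA projection is similarly compact. A standalone NumPy sketch, not the library's own `pca.py`:

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project X onto its top principal components via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X_2d = pca_project(X, n_components=2)
print(X_2d.shape)  # (200, 2)
```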
## Repository Structure

```
ML-Algorithm-Library/
│
├── ml_algorithm_library/              # Core package
│   ├── __init__.py
│   ├── regression/
│   │   ├── __init__.py
│   │   ├── simple_linear.py
│   │   ├── multiple_linear.py
│   │   ├── polynomial.py
│   │   ├── ridge.py
│   │   └── lasso.py
│   ├── classification/
│   │   ├── __init__.py
│   │   ├── logistic_regression.py
│   │   ├── knn.py
│   │   ├── svm.py
│   │   ├── naive_bayes.py
│   │   ├── decision_tree.py
│   │   └── ensemble/
│   │       ├── bagging.py
│   │       ├── random_forest.py
│   │       ├── adaboost.py
│   │       └── gradient_boosting.py
│   ├── clustering/
│   │   ├── __init__.py
│   │   ├── kmeans.py
│   │   ├── hierarchical.py
│   │   └── gmm.py
│   ├── dimensionality_reduction/
│   │   ├── __init__.py
│   │   ├── pca.py
│   │   └── lda.py
│   ├── neural_networks/
│   │   ├── __init__.py
│   │   ├── perceptron.py
│   │   ├── mlp.py
│   │   └── autoencoder.py
│   ├── recommendation/
│   │   ├── __init__.py
│   │   ├── collaborative_filtering.py
│   │   └── content_based.py
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── metrics.py
│   │   ├── data_preprocessing.py
│   │   └── helpers.py
│   └── ...                            # Optional advanced modules
│
├── notebooks/                         # Jupyter tutorials & demos
│   ├── Regression/
│   ├── Classification/
│   ├── Clustering/
│   └── NeuralNetworks/
│
├── examples/                          # Standalone scripts (e.g., run_simple_linear.py)
│
├── tests/                             # Pytest tests
│   ├── test_simple_linear.py
│   ├── test_logistic_regression.py
│   └── ...
│
├── .github/
│   ├── workflows/
│   │   └── ci.yml                     # GitHub Actions CI config
│   ├── ISSUE_TEMPLATE.md
│   ├── PULL_REQUEST_TEMPLATE.md
│   └── CODE_OF_CONDUCT.md
│
├── requirements.txt                   # numpy, pytest, jupyter, matplotlib (optional)
├── setup.py / pyproject.toml          # For packaging (optional)
├── README.md                          # This file
└── LICENSE                            # MIT License
```
## Development & Contributing

Contributions are welcome! Follow these guidelines to keep the codebase consistent and of high quality.
### Code Style

- PEP 8 compliant: readable, concise, clear.
- Docstrings: use triple-quoted docstrings with a top-level summary, followed by parameters, returns, and a brief explanation of the algorithm's intuition or references (see the sketch after this list).
- Inline comments: explain non-obvious steps (e.g., derivation points, vectorization decisions).
- Type hints (optional): you may include type hints for clarity, but keep the code beginner-friendly.
- Dependencies: only NumPy and the standard library in core modules. Development dependencies (pytest, jupyter, matplotlib) are listed in `requirements.txt`.
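A sketch of the intended docstring and interface style on a hypothetical estimator (the class and its math are illustrative, not lifted from the library):

```python
import numpy as np

class RidgeRegression:
    """Ridge (L2-regularized) linear regression via the normal equations.

    Intuition: the L2 penalty shrinks the weights, stabilizing the
    solution when features are correlated (a little bias for less variance).

    Parameters
    ----------
    alpha : float
        Regularization strength; larger values mean stronger shrinkage.
    """

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.w = None

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        n_features = X.shape[1]
        # Closed form: w = (X^T X + alpha * I)^{-1} X^T y
        self.w = np.linalg.solve(X.T @ X + self.alpha * np.eye(n_features), X.T @ y)
        return self

    def predict(self, X):
        return np.asarray(X, dtype=float) @ self.w
```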
### Branching & Pull Requests

- Main branch: always passing CI, stable.
- Feature branches: branch from `main`, named `feature/<algorithm-name>` or `fix/<issue-number>`.
- Pull requests:
  - Base branch: `main`.
  - Provide a clear PR title and description: what was added/changed, motivation, references.
  - Link any related issue.
  - Ensure all tests pass locally before opening the PR.
### Testing

- Use pytest.
- Write tests for new algorithms under `tests/`, e.g., `test_<algorithm>.py` (see the example test after this list).
- Synthetic datasets: small and deterministic. Validate core functionality and edge cases.
- Run locally: `pytest --maxfail=1 --disable-warnings -q`.
- CI runs `pytest` on each push/PR; ensure coverage does not drop significantly.
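A sketch of what such a test can look like (the import path mirrors the usage example above; adapt it to the module you are testing):

```python
# tests/test_simple_linear.py — illustrative sketch
import numpy as np

from ml_algorithm_library.regression.simple_linear import SimpleLinearReg

def test_recovers_exact_line():
    # y = 2x + 1 with no noise: the fitted line should be (near-)exact.
    X = np.array([0, 1, 2, 3, 4], dtype=float)
    y = 2 * X + 1
    model = SimpleLinearReg()
    model.fit(X, y)
    pred = model.predict(np.array([5.0, 6.0]))
    assert np.allclose(pred, [11.0, 13.0], atol=1e-6)
```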
### Continuous Integration

- The GitHub Actions config lives in `.github/workflows/ci.yml`. Example steps:
  - Check out the code.
  - Set up Python (versions: 3.7, 3.8, 3.9, 3.10+).
  - Install dependencies (`pip install numpy pytest`).
  - Run `pytest`.
  - (Optional) Report coverage to Codecov.
- Keep CI fast; avoid heavy dependencies. Notebooks are not executed in CI by default (to save time), but code cells can be spot-tested if desired.
### Issue & PR Templates

- Issue template: encourages clear bug reports and feature requests.
  - Title prefix: `[BUG]`, `[FEATURE]`, or `[DOC]`.
  - Description: expected behavior, actual behavior, minimal reproduction code.
- PR template: checklist for authors.
  - Code follows the style guidelines.
  - New tests added and passing.
  - Documentation/notebook updated.
  - Relevant issue linked.

Templates reside under `.github/ISSUE_TEMPLATE.md` and `.github/PULL_REQUEST_TEMPLATE.md`.
### Code of Conduct

- A `CODE_OF_CONDUCT.md` in `.github/` defines community standards.
- Encourages respectful, inclusive collaboration.
- Clearly states reporting guidelines for unacceptable behavior.
### Contributors

- The contributor list is reflected in the README via a badge or a link to the GitHub contributors graph.
- Significant contributions are acknowledged in the release notes or the acknowledgments section.
### Development Setup

1. Clone the repository and set up the environment:

   ```bash
   git clone https://github.com/AdilShamim8/ML-Algorithm-Library.git
   cd ML-Algorithm-Library
   python -m venv venv
   source venv/bin/activate
   pip install numpy pytest
   ```

2. Run the tests:

   ```bash
   pytest
   ```

3. (Optional) Run the notebooks:

   ```bash
   pip install jupyter matplotlib
   jupyter notebook
   ```

4. Ensure new contributions include tests and docs, and pass CI.
## Roadmap

Track progress via GitHub Projects or Issues. Possible milestones:
- v0.1: Core Regression & Classification (Simple Linear, Multiple Linear, Logistic, k-NN, Decision Tree)
- v0.2: Regularized Models & Ensembles (Ridge, Lasso, Bagging, Random Forest)
- v0.3: Clustering & Dimensionality Reduction (k-Means, PCA, Hierarchical Clustering)
- v0.4: Neural Networks from Scratch (Perceptron, MLP)
- v0.5: Advanced Methods (Gradient Boosting basics, GMM, Recommendation prototypes)
- v1.0: Stable release with comprehensive tests, documentation, and example notebooks.
Each version is tagged using semantic versioning (`v0.x.y`), with release notes summarizing additions, fixes, and any breaking changes.
## License

This project is licensed under the MIT License. See `LICENSE` for the full text.
## Acknowledgments

- Classic machine learning textbooks and open-source implementations that inspired this library.
- The broader ML community’s tutorials, papers, and discussions—guiding clarity and pedagogy.
- Contributors who enhance the library with new algorithms, tests, and improved explanations.
- Issues & Feature Requests: Open an issue in this repository.
- Discussion: Use GitHub Discussions (if enabled) or open an issue with a “Discussion” label.
- Maintainer: Adil Shamim (@AdilShamim8).
Stay engaged: star the repository ⭐, share feedback, and help others learn by contributing examples or improvements.