Important
FSML is in a pre-alpha state. Existing procedures and API may change significantly.
FSML (Fortran Statistics and Machine Learning) is a scientific toolkit consisting of common statistical and machine learning procedures, including basic descriptive statistics (e.g., mean, variance, correlation), common statistical tests (e.g., t-test, Mann–Whitney U), linear parametric methods and models (e.g., multiple OLS regression, discriminant analysis), and non-linear statistical and machine learning procedures (e.g., k-means clustering).
Key features:
- Common statistics and machine learning techniques (as used in modern research).
- Familiar/intuitive interface (similarities to popular Python or R libs).
- Core procedures are kept pure (to simplify parallelisation and testing), while impure wrappers handle optional arguments and errors for safe conventional use.
- Minimal requirements/dependencies (Fortran 2008 or later, and stdlib).
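
The pure-core/impure-wrapper split mentioned in the key features can be illustrated with a minimal sketch. It does not use FSML itself: the module `demo_stats`, the functions `var_core` and `var`, and the `ddof` argument are hypothetical names chosen for illustration, and FSML's actual interfaces may differ (see the API documentation). The core function is `pure` and takes only required arguments, while the wrapper supplies a default for the optional argument and guards against invalid input.

```fortran
module demo_stats
  ! Hypothetical module for illustration only; not part of FSML.
  use, intrinsic :: iso_fortran_env, only: dp => real64, error_unit
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  implicit none
  private
  public :: dp, var_core, var

contains

  ! Pure core: no I/O, no optional arguments, no side effects.
  ! Safe to call from other pure procedures and parallel constructs.
  pure function var_core(x, ddof) result(v)
    real(dp), intent(in) :: x(:)   ! sample values
    real(dp), intent(in) :: ddof   ! delta degrees of freedom
    real(dp) :: v
    real(dp) :: xbar

    xbar = sum(x) / real(size(x), dp)
    v = sum((x - xbar)**2) / (real(size(x), dp) - ddof)
  end function var_core

  ! Impure wrapper: fills in the default for the optional argument,
  ! validates the input, and reports problems before calling the core.
  function var(x, ddof) result(v)
    real(dp), intent(in) :: x(:)
    real(dp), intent(in), optional :: ddof
    real(dp) :: v
    real(dp) :: ddof_w

    ! Default to the sample variance (divide by n - 1).
    ddof_w = 1.0_dp
    if (present(ddof)) ddof_w = ddof

    ! Guard against a zero or negative denominator.
    if (real(size(x), dp) - ddof_w <= 0.0_dp) then
      write(error_unit, '(a)') "var: too few observations; returning NaN"
      v = ieee_value(1.0_dp, ieee_quiet_nan)
      return
    end if

    v = var_core(x, ddof_w)
  end function var

end module demo_stats

program demo
  use demo_stats, only: dp, var
  implicit none
  real(dp) :: x(4)

  x = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp]
  print *, "sample variance:    ", var(x)
  print *, "population variance:", var(x, ddof=0.0_dp)
end program demo
```

Keeping the numerical work in a `pure` function means it can be unit-tested in isolation and called inside `do concurrent` or other parallel constructs, while everyday callers get defaults and input checks from the wrapper.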
The FSML Handbook includes a short tutorial, detailed API documentation, information for contributors, and licence (MIT) details. The documentation pages were generated by FORD.
The aim is to create an easy-to-use library for modern Fortran applications that covers many statistics and machine learning procedures that are commonly used in research.
FSML started as an effort to rewrite, restructure, clean up, and enhance old Fortran code I have written over the past 15 years, and to bundle and publish it as a well-organised and well-documented library.
The published research below uses some of the to-be-reworked code and demonstrates some applications of the above-mentioned methods:
- Mutz and Ehlers (2019) (k-means and hierarchical clustering, and discriminant analysis).
- Mutz et al. (2015) (multiple regression in cross-validation and bootstrap settings, principal component analysis, and Bayesian classifier).
Currently covered are procedures for sample statistics (STS), statistical distributions (DST) and statistical tests (TST). See the full list here. Additionally planned are procedures that rely heavily on linear algebra (e.g., PCA), nonlinear algorithmic procedures (e.g., k-means clustering), and machine learning framework extensions (e.g., cross-validation).
I will consider the library to be in "alpha" once FSML covers all of the originally planned functionality.
This stage is reached once FSML:
- has undergone substantial testing (incl. comparisons to other libraries).
- has proper documentation.
- fully works with GFortran and LFortran compilers.