This project classifies metal subgenres using features extracted from audio and a neural network. It focuses on four subgenres:
- heavy metal
- melodeath
- thrash metal
- nu metal

The audio was segmented and analyzed to extract key time-based features.
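
The segmentation step could look roughly like the sketch below. The function name `split_into_segments`, the 3-second segment length, and the choice of non-overlapping windows are assumptions for illustration; the project's actual values are not stated here.

```python
import numpy as np


def split_into_segments(signal, sample_rate, segment_seconds=3.0):
    """Split a 1-D audio signal into equal-length, non-overlapping segments.

    segment_seconds is an assumed value, not taken from the project.
    Returns an array of shape (n_segments, samples_per_segment).
    """
    segment_len = int(segment_seconds * sample_rate)
    n_segments = len(signal) // segment_len
    # Drop the trailing samples that do not fill a whole segment.
    trimmed = signal[: n_segments * segment_len]
    return trimmed.reshape(n_segments, segment_len)


# Example: 10 seconds of placeholder audio at 22,050 Hz.
sample_rate = 22050
signal = np.arange(sample_rate * 10, dtype=np.float32)
segments = split_into_segments(signal, sample_rate)
```

Each row of `segments` can then be passed to feature extraction on its own.
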
Two approaches were tested:
- MFCC-only: Each segment’s mel-frequency cepstral coefficients (MFCCs) were used as input to an LSTM-based neural network. This approach consistently achieved strong results, with classification accuracy reaching 84% on the validation set.
- MFCC + Chroma: Combining MFCCs with chroma vectors (representing harmonic content) produced similar accuracy while keeping the model efficient, though it showed no clear gain over MFCCs alone.
The classifier operates on short segments of songs, allowing it to detect subgenre-relevant patterns across time.
Both approaches yield similar results, with accuracy around 80% (the MFCC-only model peaked at 84% on the validation set).
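
A minimal sketch of an LSTM-based classifier over per-segment feature sequences is shown below, written in PyTorch. The class name `SubgenreLSTM`, the single LSTM layer, and the hidden size of 64 are assumptions; the project's actual architecture and framework are not described here.

```python
import torch
from torch import nn


class SubgenreLSTM(nn.Module):
    """Hypothetical classifier: one LSTM layer followed by a linear head."""

    def __init__(self, n_features, hidden_size=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_classes)

    def forward(self, x):
        # x has shape (batch, frames, n_features).
        _, (h_n, _) = self.lstm(x)
        # Use the final hidden state of the last layer as a segment summary.
        return self.head(h_n[-1])  # logits of shape (batch, n_classes)


# Example: a batch of 8 segments, 130 frames each, 13 MFCCs per frame.
model = SubgenreLSTM(n_features=13)
logits = model(torch.randn(8, 130, 13))
predicted = logits.argmax(dim=1)  # one class index (0..3) per segment
```

Switching to the MFCC + Chroma input would only require a different `n_features` (25 if 13 MFCCs and 12 chroma bins are used).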