This project was developed for the Machine Learning course as part of my master's degree in Computer Engineering.
The code was developed and tested on the Google Colab platform.
The assignment and the data were hosted on the Kaggle platform as a competition owned by the University of Napoli "Federico II".
Important note about the data: The dataset used in this project was provided through a private Kaggle competition hosted by the aforementioned university. Due to the competition's intellectual property rules, the data is not included in this repository.
The dataset consists of features extracted from various frames of videos depicting different types of cones used in Formula SAE races. These types are:
- Big orange cones, delimiting the beginning and ending of the track
- Little orange cones, delimiting the finish area
- Blue cones, delimiting the right border of the track
- Yellow cones, delimiting the left border of the track
The aim of this project is to correctly classify a cone detected by the sensors, given its extracted features. Each type of cone is assigned a label from 1 to 4.
The datasets used to train and test the model are in CSV format.
The following picture shows three blocks: all the libraries used, the loading of the dataset through pandas, and the dropping of columns that won't be used. The first two rows of the CSV file are also visible, giving an idea of how the dataset is structured.
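A minimal sketch of these operations; the file and column names are hypothetical, since the actual ones come from the private competition data:

```python
import pandas as pd

# Load the competition data (the file name is a placeholder).
train_df = pd.read_csv("train.csv")

# Drop columns that won't be used; the column name here is hypothetical.
train_df = train_df.drop(columns=["id"])

# Inspect the first two rows to get an idea of the structure.
print(train_df.head(2))
```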
Next, the dataset was split into training, validation, and test sets following an 80/20 rule, with the 80% portion further split 75%/25% into training and validation sets.
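A sketch of the split with scikit-learn's `train_test_split`; the label column name, the stratification, and the random seed are assumptions:

```python
from sklearn.model_selection import train_test_split

X = train_df.drop(columns=["label"])  # "label" is an assumed column name
y = train_df["label"]

# First split: 80% train+validation, 20% local test.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# Second split: the 80% portion is divided 75%/25% into train and validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)
```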
A class imbalance was noticed: class 1 had only 927 samples and class 3 only 916. Thus, the SMOTE technique was used to oversample the minority classes.
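A sketch using SMOTE from the imbalanced-learn package, applied to the training split only so that the validation and test sets keep the original class distribution:

```python
from imblearn.over_sampling import SMOTE

# Oversample the minority classes in the training data only.
smote = SMOTE(random_state=42)
X_train, y_train = smote.fit_resample(X_train, y_train)
```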
Before training the model, both input features and target labels needed to be converted into a format compatible with Keras.
The pipeline that performs these operations is the following (a sketch is shown after the list):
- The training and validation Pandas dataframes were converted into NumPy arrays, because Keras expects inputs as arrays;
- Encoding of the validation labels into integers via LabelEncoder;
- Application of one-hot encoding through get_dummies; the result is converted to a NumPy array, as needed for training with the categorical_crossentropy loss function;
- Application of the same encoding to the training labels.
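A sketch of this pipeline; the variable names are hypothetical, and fitting the LabelEncoder on the validation labels mirrors the order described above:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Keras expects inputs as NumPy arrays, not DataFrames.
X_train_np = X_train.to_numpy()
X_val_np = X_val.to_numpy()

# Encode the labels (1..4) into integers (0..3).
le = LabelEncoder()
y_val_int = le.fit_transform(y_val)
y_train_int = le.transform(y_train)

# One-hot encode for categorical_crossentropy; get_dummies returns a
# DataFrame, so it is converted to a float NumPy array.
y_val_oh = pd.get_dummies(y_val_int).to_numpy().astype("float32")
y_train_oh = pd.get_dummies(y_train_int).to_numpy().astype("float32")
```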
The neural network used is a simple feedforward model, with the following features (a sketch follows the list):
- One dense hidden layer with 512 neurons and ReLU activation;
- Dropout layer (0.3) to prevent overfitting;
- Output layer with 4 neurons and softmax activation, suitable for multiclass classification;
- categorical_crossentropy as loss function, which works with one-hot encoded labels;
- Stochastic Gradient Descent with momentum as optimizer.
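A sketch of such a model in Keras; the learning rate and momentum values are assumptions, since the original hyperparameters are only visible in the code:

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = X_train_np.shape[1]  # number of input features

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(512, activation="relu"),   # single hidden layer
    layers.Dropout(0.3),                    # regularization against overfitting
    layers.Dense(4, activation="softmax"),  # one neuron per cone class
])

model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```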
Then, a custom callback, SOMT, was used to intervene at different stages of the training process. In particular, the one shown automatically stops training once the model reaches a desired training and validation accuracy.
Training stops once the training and validation accuracies exceed, respectively, 93% and 91%.
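A minimal sketch of such a callback; the original SOMT implementation may differ, and the metric names assume the model was compiled with `metrics=["accuracy"]`:

```python
from tensorflow import keras

class SOMT(keras.callbacks.Callback):
    """Custom callback that halts training once both the training
    and validation accuracy exceed given thresholds."""

    def __init__(self, train_thresh=0.93, val_thresh=0.91):
        super().__init__()
        self.train_thresh = train_thresh
        self.val_thresh = val_thresh

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if (logs.get("accuracy", 0.0) > self.train_thresh
                and logs.get("val_accuracy", 0.0) > self.val_thresh):
            self.model.stop_training = True

history = model.fit(
    X_train_np, y_train_oh,
    validation_data=(X_val_np, y_val_oh),
    epochs=500,
    callbacks=[SOMT()],
)
```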
The following picture shows the results of the training. It can be seen that, out of the maximum of 500 epochs, training stopped at epoch 299:
Here, the training and validation accuracy and loss curves are shown:
There is an error in the picture: where it says "test", it should say "validation".
Furthermore, the local test dataframe obtained from the initial split underwent the same preprocessing operations as the training and validation ones.
To test the model, a CSV file was provided by the competition moderators. Its structure is identical to that of the training file, except that it is missing the label column.
The operations done on the training, validation, and local test dataframes were also applied to the test set provided by the moderators; thus, the process (although visible in the provided code) is omitted from this description for simplicity.
The custom callback used for this process is slightly different in terms of thresholds: the only one set is for the loss, which had to drop below 0.205.
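A sketch of this variant, again an assumption about the original code (the class name is hypothetical):

```python
from tensorflow import keras

class SOMTLoss(keras.callbacks.Callback):
    """Variant that halts training once the training loss
    drops below a threshold."""

    def __init__(self, loss_thresh=0.205):
        super().__init__()
        self.loss_thresh = loss_thresh

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        if logs.get("loss", float("inf")) < self.loss_thresh:
            self.model.stop_training = True
```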
The results produced are the following:
In these final three blocks, what's shown is the following (a sketch follows the list):
- The trained model predicts the class probabilities;
- Each softmax vector's values are rounded to the nearest 0 or 1, and the index of the highest value is found;
- 1 is added, because the required labels go from 1 to 4 while the vector indices go from 0 to 3.
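A sketch of these steps; the variable holding the moderators' preprocessed test features (`X_test_comp` here) is hypothetical:

```python
import numpy as np

# Predict class probabilities for the competition test set.
probs = model.predict(X_test_comp)   # shape: (n_samples, 4)

# Take the index of the highest value in each softmax vector
# (equivalent to rounding the vector and locating the 1)...
pred_idx = np.argmax(probs, axis=1)

# ...then shift from 0..3 to the required 1..4 labels.
pred_labels = pred_idx + 1
```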