Retsuko64/Anomaly-Detection-Using-CSE-CIC-IDS2018

Anomaly-Detection-Using-CSE-CIC-2018

The notebook loads the dataset with the pandas read_csv() function and expects the CSV files to be inside a folder named 'archive' (e.g., pd.read_csv('./archive/03-02-2018.csv')). Either place the dataset in that folder, or change the path so that it points to your dataset (.csv) file.
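One way to avoid editing every read_csv() call is to keep the folder in a single variable. The snippet below is a minimal sketch, not the notebook's code: the names DATA_DIR and load_day are hypothetical, and only the './archive' default comes from the note above.

```python
from pathlib import Path

import pandas as pd

# Default location taken from the note above; change it once here if your
# dataset lives somewhere else.
DATA_DIR = Path("./archive")


def load_day(filename, data_dir=DATA_DIR):
    """Load one daily CSV file (hypothetical helper, not part of the notebook)."""
    path = Path(data_dir) / filename
    if not path.is_file():
        # Fail early with a readable message instead of a long pandas traceback.
        raise FileNotFoundError(f"Dataset file not found: {path}")
    return pd.read_csv(path)


# Example call, assuming the file exists in ./archive:
# df = load_day("03-02-2018.csv")
```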

In this project, we build an anomaly-based intrusion detection system (IDS) using Machine Learning algorithms. To do that, we used the CSE-CIC-IDS2018 dataset from the University of New Brunswick, which is a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC).
The dataset from the original source (https://www.unb.ca/cic/datasets/ids-2018.html) is approximately 450 GB in size, so we rely instead on the version available on Kaggle: https://www.kaggle.com/datasets/solarmainframe/ids-intrusion-csv

  • About the dataset:

The CSE-CIC-IDS2018 dataset from Kaggle contains 10 files, each captured on a specific day and covering different types of attacks:

  • 03-02-2018: ['Benign', 'Bot']
  • 02-28-2018 - 03-01-2018: ['Benign', 'Label', 'Infilteration']
  • 02-22-2018 - 02-23-2018: ['Benign', 'Brute Force -Web', 'Brute Force -XSS', 'SQL Injection']
  • 02-21-2018: ['Benign', 'DDOS attack-LOIC-UDP', 'DDOS attack-HOIC']
  • 02-20-2018: ['Benign', 'DDoS attacks-LOIC-HTTP']
  • 02-16-2018: ['Benign', 'DoS attacks-SlowHTTPTest', 'DoS attacks-Hulk', 'Label']
  • 02-15-2018: ['Benign', 'DoS attacks-GoldenEye', 'DoS attacks-Slowloris']
  • 02-14-2018: ['Benign', 'FTP-BruteForce', 'SSH-Bruteforce']

In our study, we relied on the data from 03-02-2018, 03-01-2018, 02-23-2018, and 02-14-2018, which covers the following attacks: Infilteration, Brute Force -XSS, Brute Force -Web, SQL Injection, Bot, FTP-BruteForce, and SSH-Bruteforce. These datasets were merged, preprocessed, and fed to different models. Note: if you download the datasets manually from Kaggle, you may run into issues with the files 02-28-2018, 03-01-2018, and 02-16-2018, because the header row (first row) is repeated multiple times inside them, which makes pandas assign dtype 'object' to all the columns. This is also why 'Label' shows up as a class in the lists above. If that is the case, you can convert the columns to dtype 'float' (sample code is included in our notebook). Our sample data has no such issue.
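The repeated-header issue and the merge step can be sketched as follows. This is an illustration under assumptions, not the sample code from the notebook: the helper name drop_repeated_headers is hypothetical, the tiny in-memory CSV stands in for a real daily file, and treating 'Timestamp' and 'Label' as the only non-numeric columns is an assumption about the Kaggle files.

```python
import io

import pandas as pd


def drop_repeated_headers(df, label_col="Label", text_cols=("Timestamp", "Label")):
    """Remove rows that are copies of the header and convert the rest to float.

    Hypothetical helper; `text_cols` lists columns assumed to be non-numeric.
    """
    # A repeated header row carries the column name as its value.
    cleaned = df[df[label_col] != label_col].copy()
    for col in cleaned.columns:
        if col not in text_cols:
            # Unparseable values become NaN instead of raising.
            cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce").astype(float)
    return cleaned.reset_index(drop=True)


# Tiny stand-in for a daily file whose header row is repeated in the middle.
raw = io.StringIO(
    "Dst Port,Flow Duration,Label\n"
    "80,1200,Benign\n"
    "Dst Port,Flow Duration,Label\n"
    "22,350,SSH-Bruteforce\n"
)
cleaned = drop_repeated_headers(pd.read_csv(raw))

# A second, already clean day, used only to illustrate the merge.
other_day = pd.DataFrame(
    {"Dst Port": [443.0], "Flow Duration": [90.0], "Label": ["Benign"]}
)
merged = pd.concat([cleaned, other_day], ignore_index=True)
print(merged.dtypes)
```

Filtering on the label column before converting is the simpler order: once the header copies are gone, the remaining values parse as numbers.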

  • Machine Learning Algorithms Used:

    • Decision Trees
    • Voting Classifier (Made up of Decision Trees, Random Forest, and AdaBoost)
    • Isolation Forest

The code also includes K-NN and Random Forest, but our report covers only the algorithms listed above.
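For readers unfamiliar with these models, the following scikit-learn sketch shows how a voting classifier built from the three listed estimators and an Isolation Forest can be set up. It is not the configuration from our notebook: the synthetic data from make_classification, the hard voting mode, and every hyperparameter here (including contamination=0.1) are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    IsolationForest,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (about 10% "attack" samples).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Supervised: majority vote over three tree-based classifiers.
voting = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
    ],
    voting="hard",  # assumption: the report may use a different voting mode
)
voting.fit(X_train, y_train)
voting_accuracy = voting.score(X_test, y_test)

# Unsupervised: Isolation Forest never sees the labels.
iso = IsolationForest(contamination=0.1, random_state=0)
iso.fit(X_train)
iso_pred = iso.predict(X_test)  # +1 = inlier, -1 = anomaly
anomaly_flags = (iso_pred == -1).astype(int)  # 1 marks a flagged sample

print(f"Voting accuracy: {voting_accuracy:.3f}")
print(f"Samples flagged as anomalies: {anomaly_flags.sum()}")
```

Note that Isolation Forest returns -1 and +1 rather than class labels, so its output has to be mapped (as done with anomaly_flags) before it can be compared against 0/1 ground truth.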

  • Libraries Used:

    • Numpy
    • Scikit-Learn
    • Matplotlib (For data visualization)
    • Seaborn (For data visualization)
    • Pandas

These libraries are already installed in Colab. If you run the notebook locally, you have to install them yourself by opening a terminal (CMD on Windows) and typing:

pip install -r requirements.txt

This installs all the packages at once. Alternatively, you can install them one by one:

pip install numpy

And so on for the remaining packages.
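After installing, a quick check like the one below can confirm that everything is importable. It is a small sketch using only the standard library; the function name missing_packages is hypothetical. Keep in mind that the import name can differ from the pip name: scikit-learn is imported as sklearn.

```python
import importlib.util

# Import names of the libraries listed above (scikit-learn imports as "sklearn").
REQUIRED = ["numpy", "sklearn", "matplotlib", "seaborn", "pandas"]


def missing_packages(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```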

For more details on how we preprocessed the dataset, how we created the different models and what results they achieved, and our Exploratory Data Analysis, read our report!
