Anomaly-Detection-Using-CSE-CIC-2018

Group Number: 23
Link to the sample data: https://drive.google.com/file/d/1EMe83pWkccHg3mD5LLuCaHURjYjx7fjo/view?usp=sharing (this is a compressed .rar archive, so decompress it before use)

The notebook loads the dataset with the pandas read_csv() function and looks for it inside a folder named 'archive' (e.g. pd.read_csv('./archive/03-02-2018.csv')). Either place the dataset inside such a folder, or change the path so that it points to the dataset (.csv) file.
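The loading step described above can be sketched as follows. This is only an illustration, not the notebook's code: the helper name load_day and the constants DATA_DIR and CSV_NAME are hypothetical, and the folder layout is the one assumed in the paragraph above.

```python
from pathlib import Path

import pandas as pd

# Assumed layout: the decompressed CSV files sit in ./archive.
DATA_DIR = Path("./archive")   # change this if your data lives elsewhere
CSV_NAME = "03-02-2018.csv"    # file name taken from the example above


def load_day(data_dir: Path = DATA_DIR, name: str = CSV_NAME) -> pd.DataFrame:
    """Read one daily capture, failing early with a clear message if it is missing."""
    path = data_dir / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected the dataset at {path.resolve()}")
    return pd.read_csv(path)
```

Calling load_day() with no arguments reads ./archive/03-02-2018.csv; passing a different folder or file name avoids editing the path in several places.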

In this project, we build an anomaly-based intrusion detection system (IDS) using machine learning algorithms. We use the CSE-CIC-IDS2018 dataset from the University of New Brunswick, a collaborative project between the Communications Security Establishment (CSE) and the Canadian Institute for Cybersecurity (CIC).
The dataset from the original source (https://www.unb.ca/cic/datasets/ids-2018.html) is approximately 450 GB in size, so we rely instead on the version available on Kaggle: https://www.kaggle.com/datasets/solarmainframe/ids-intrusion-csv

About the dataset:

The CSE-CIC-IDS2018 dataset from Kaggle contains 10 files, each captured on a specific day and covering different types of attacks:

03-02-2018: ['Benign', 'Bot']
02-28-2018 - 03-01-2018: ['Benign', 'Label', 'Infilteration']
02-22-2018 - 02-23-2018: ['Benign', 'Brute Force -Web', 'Brute Force -XSS', 'SQL Injection']
02-21-2018: ['Benign', 'DDOS attack-LOIC-UDP', 'DDOS attack-HOIC']
02-20-2018: ['Benign', 'DDoS attacks-LOIC-HTTP']
02-16-2018: ['Benign', 'DoS attacks-SlowHTTPTest', 'DoS attacks-Hulk', 'Label']
02-15-2018: ['Benign', 'DoS attacks-GoldenEye', 'DoS attacks-Slowloris']
02-14-2018: ['Benign', 'FTP-BruteForce', 'SSH-Bruteforce']

In our study, we relied on data from four dates: 03-02-2018, 03-01-2018, 02-23-2018, and 02-14-2018. Together they cover the following attacks: Infilteration, Brute Force -XSS, Brute Force -Web, SQL Injection, Bot, FTP-BruteForce, and SSH-Bruteforce. These datasets were merged, preprocessed, and used to train the different models.

Note: If you download the datasets manually from Kaggle, you may run into issues with the files 02-28-2018, 03-01-2018, and 02-16-2018, because the header row (first row) is repeated multiple times, which makes pandas assign dtype 'object' to all the columns. If that is the case, you can convert the columns to dtype 'float' (sample code is included in our notebook). There is no such issue with our sample data.
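A minimal sketch of the repeated-header fix and the merge, under stated assumptions: it is not the sample code from the notebook, the helper names drop_repeated_headers and to_float are hypothetical, and the small frame below is made-up data standing in for a real file. It assumes the label column is named 'Label' and that 'Timestamp' is the only other non-numeric column. Repeated header rows are presumably also why 'Label' shows up as a class for some dates in the list above.

```python
import pandas as pd


def drop_repeated_headers(df: pd.DataFrame, label_col: str = "Label") -> pd.DataFrame:
    """Remove rows that are copies of the header row.

    In such rows every cell holds its own column name, so the label
    column contains the literal string 'Label'.
    """
    return df[df[label_col] != label_col].reset_index(drop=True)


def to_float(df: pd.DataFrame, skip=("Label", "Timestamp")) -> pd.DataFrame:
    """Convert every column except those in `skip` to float64."""
    out = df.copy()
    for col in out.columns:
        if col not in skip:
            # Values that cannot be parsed become NaN instead of raising.
            out[col] = pd.to_numeric(out[col], errors="coerce").astype("float64")
    return out


# Made-up rows imitating a file whose header is repeated in the middle.
raw = pd.DataFrame({
    "Dst Port": ["80", "Dst Port", "443"],
    "Flow Duration": ["1200", "Flow Duration", "350"],
    "Label": ["Benign", "Label", "Infilteration"],
})

clean = to_float(drop_repeated_headers(raw))

# Merging several cleaned days is a row-wise concatenation.
merged = pd.concat([clean, clean], ignore_index=True)
```

Dropping the header copies before converting matters: otherwise each of them would turn into a row of NaN values with the bogus class 'Label'.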

Machine Learning Algorithms Used:
- Decision Trees
- Voting Classifier (Made up of Decision Trees, Random Forest, and AdaBoost)
- Isolation Forest
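
For readers unfamiliar with the ensemble on this list, the sketch below shows how a voting classifier made up of a decision tree, a random forest, and AdaBoost can be assembled in scikit-learn. It is an illustration only: the data is synthetic (make_classification), and the hyperparameters and the choice of hard voting are assumptions, not the settings used in the notebook or the report.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed flow features (assumption):
# roughly 80% "benign" (class 0) and 20% "attack" (class 1).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Hard voting: each base model casts one vote and the majority class wins.
voter = VotingClassifier(
    estimators=[
        ("dt", DecisionTreeClassifier(random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("ada", AdaBoostClassifier(random_state=0)),
    ],
    voting="hard",
)
voter.fit(X_train, y_train)

accuracy = voter.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

With voting="soft" the ensemble would average predicted probabilities instead of counting votes; which variant suits the real data better is something to check empirically.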

The code also includes K-NN and Random Forest, but our report covers only the algorithms listed above.
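
Isolation Forest differs from the other listed models in that it is fitted without labels: it scores how easily each sample can be isolated by random splits. The sketch below is a hedged illustration on made-up data; the contamination value and the way the outliers are generated are assumptions, not the notebook's configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Made-up data: 500 "normal" points around the origin plus 10 points
# scattered far away from it in every dimension.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
signs = rng.choice([-1.0, 1.0], size=(10, 4))
outliers = signs * rng.uniform(6.0, 9.0, size=(10, 4))
X = np.vstack([normal, outliers])
outlier_idx = np.arange(500, 510)

# contamination is the expected fraction of anomalies (an assumption here).
iso = IsolationForest(n_estimators=100, contamination=0.02, random_state=0)
pred = iso.fit_predict(X)          # +1 = inlier, -1 = anomaly

flagged = np.where(pred == -1)[0]
caught = int(np.isin(outlier_idx, flagged).sum())
print(f"flagged {len(flagged)} samples, {caught} of the 10 injected outliers among them")
```

On real traffic the attack labels would be used only afterwards, to evaluate how well the flagged samples line up with the known attacks.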

Libraries Used:
- NumPy
- scikit-learn
- Matplotlib (For data visualization)
- Seaborn (For data visualization)
- Pandas

These libraries are already installed in Colab. To run the project locally, install the packages by opening a terminal (CMD) and typing:

pip install -r requirements.txt

This installs all the packages for you. Alternatively, you can install them one by one:

pip install numpy

and so on for the remaining libraries.

For more details on how we preprocessed the dataset, how we created the different models and what results they achieved, and our Exploratory Data Analysis, read our report!

