Predict Network Intruder Using Machine Learning Model and Classification

Pp: 150-171 (22)


Abstract

The large number of sensors deployed in the IoT generates very large volumes of data for a broad range of applications, such as smart homes, smart healthcare, smart manufacturing, smart transportation, smart grids, and smart agriculture. Analyzing such data to support better decision making and to increase productivity and accuracy is a critical process for businesses and for quality-of-life applications. Machine learning plays a vital role in creating smarter techniques for predicting intruders from such data. It has shown remarkable results in many fields, including network security, image recognition, information retrieval, speech recognition, natural language processing, indoor localization, and physiological and psychological state detection. Intrusion detection is accordingly becoming a research focus in information security. In our experiment, we used the CICIDS2017 data set to predict network intruders. The data set, released by the Canadian Institute for Cybersecurity, consists of eight separate files and contains five days’ worth of normal and abnormal network packet data. The goal of this research is to identify the relevant and significant features of large network packet data in order to increase attack detection accuracy and reduce execution time. We select important and meaningful features by applying Information Gain to the CICIDS2017 data set, ranking and grouping the features according to their minimum weight values, and then apply the Random Forest (RF), Random Tree (RT), Naive Bayes (NB), Bayes Net (BN), and J48 classifier algorithms. The experimental findings show that the number of relevant and significant features selected by Information Gain has a substantial impact on detection accuracy and execution time. The Random Forest algorithm, for example, achieves its best accuracy, with 0.14% negative results, when using 22 selected features, whereas the Random Tree algorithm achieves a slightly higher accuracy, with 0.13% negative results, when using 52 selected features, but takes longer to execute.
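
As an illustration of the feature-selection step described above, the following minimal sketch ranks features by Information Gain and keeps the top-ranked ones. It is not the chapter's implementation: the function names, the toy records, and the cutoff `top_k` are assumptions made for this example, and the features are assumed to be already discretized (the CICIDS2017 flow features are largely numeric and would need binning first).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG = H(class) - H(class | feature), for a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the class within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [(name, information_gain([row[i] for row in rows], labels))
             for i, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy, made-up records for illustration only (NOT CICIDS2017 data).
names = ["flag", "port_class", "noise"]
rows = [
    ("syn", "high", "a"), ("syn", "high", "b"),
    ("syn", "low", "a"), ("syn", "high", "b"),
    ("ack", "low", "a"), ("ack", "low", "b"),
    ("ack", "high", "a"), ("ack", "low", "b"),
]
labels = ["attack"] * 4 + ["benign"] * 4

ranked = rank_features(rows, labels, names)
top_k = 2  # hypothetical cutoff; the chapter compares groups such as 22 and 52 features
selected = [name for name, _ in ranked[:top_k]]

for name, gain in ranked:
    print(f"{name}: {gain:.4f}")
print("selected:", selected)
```

The selected columns would then be passed to a classifier such as Random Forest; a smaller feature group generally shortens training and prediction time, which is the trade-off the abstract reports.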

Keywords: Accuracy, CICIDS2017, Classification, Execution time, Information Gain, Model Prediction, Recent Data Set.
