Clustering and Principal Feature Selection Impact for Internet Traffic Classification Using K-NN
Paramita, Adi Suryaputra
Abstract
K-NN is a classification algorithm that is suitable for large amounts of data and achieves high accuracy for internet traffic classification. Its disadvantage is computation time, because K-NN calculates the distance from each query to every record in the dataset. This research proposes an alternative solution to reduce K-NN computation time: a clustering step is performed before classification. The clustering step does not require high computation time. The Fuzzy C-Means algorithm is used to cluster the input dataset. Fuzzy C-Means has a drawback of its own: its results are often not the same even when the input data are the same, and the initial dataset given to it is not optimal. To optimize the initial dataset, this research applies a feature selection algorithm; after the main features of the dataset are selected, the output of Fuzzy C-Means becomes consistent. Feature selection is thus expected to provide an initial dataset that is optimal for Fuzzy C-Means. The feature selection algorithm used in this study is Principal Component Analysis (PCA). PCA removes nonsignificant attributes to create an optimal dataset and can improve the performance of the clustering and classification algorithms. The results show that clustering and principal feature selection have a significant impact on accuracy and computation time for internet traffic classification. The combination of these three methods was successfully modeled to produce a classification method for internet bandwidth usage data.
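
The three stages named in the abstract (PCA, Fuzzy C-Means, K-NN) could be chained roughly as in the sketch below. This is only an illustration under stated assumptions, not the author's implementation: the abstract does not say how the clusters are used to shorten the K-NN step, so the sketch assumes that each query is compared only against training records in its nearest cluster. The function names (`pca_reduce`, `fuzzy_c_means`, `knn_in_cluster`), the fuzzifier `m=2.0`, the number of components and clusters, and the synthetic data are all hypothetical choices made for the example.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X onto its top principal components (computed via SVD)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    components = vt[:n_components]
    return Xc @ components.T, mean, components

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Plain Fuzzy C-Means; returns cluster centers and the membership matrix."""
    rng = np.random.default_rng(seed)
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)          # memberships of each row sum to 1
    for _ in range(n_iter):
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)         # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

def knn_in_cluster(x, X_train, y_train, train_cluster, centers, k=3):
    """Assumed speed-up: vote only among neighbours in the query's nearest cluster."""
    c = np.argmin(np.linalg.norm(centers - x, axis=1))
    idx = np.flatnonzero(train_cluster == c)
    if idx.size == 0:                          # fall back to the full training set
        idx = np.arange(X_train.shape[0])
    d = np.linalg.norm(X_train[idx] - x, axis=1)
    nearest = idx[np.argsort(d)[:k]]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Synthetic stand-in for traffic flow features (NOT the paper's dataset).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (50, 6)), rng.normal(10.0, 1.0, (50, 6))])
y = np.array([0] * 50 + [1] * 50)

Z, mean, components = pca_reduce(X, n_components=2)   # feature reduction
centers, u = fuzzy_c_means(Z, n_clusters=2)           # clustering
hard = u.argmax(axis=1)                               # hard cluster label per record

query = (np.full(6, 10.0) - mean) @ components.T      # project a new flow the same way
print(knn_in_cluster(query, Z, y, hard, centers, k=3))
```

Fixing the random seed in `fuzzy_c_means` is one simple way to make repeated runs comparable; the abstract instead attributes the improved consistency to the PCA step, which this sketch does not attempt to measure.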