MSc(Engg) Thesis

Fast Identification of Structured P2P Botnets using Community Detection Algorithms

Botnets are a global problem, and effective botnet detection requires cooperation of large Internet Ser- vice Providers, allowing near global visibility of traffic that can be exploited to detect them. The global visibility comes with huge challenges, especially in the amount of data that has to be analysed. To handle such large volumes of data, a robust and effective detection method is the need of the hour and it must rely primarily on a reduced or abstracted form of data such as a graph of hosts, with the presence of an edge between two hosts if there is any data communication between them. Such an abstraction would be easy to construct and store, as very little of the packet needs to be looked at.

Structured P2P command and control have been shown to be robust against targeted and random node failures, thus are ideal mechanisms for botmasters to organize and command their botnets effec- tively. Thus this thesis develops a scalable, efficient and robust algorithm for the detection of structured P2P botnets in large traffic graphs. It draws from the advances in the state of the art in Community Detection, which aim to partition a graph into dense communities.

Popular Community Detection Algorithms with low theoretical time complexities such as Label Propa- gation, Infomap and Louvain Method have been implemented and compared on large LFR benchmark graphs to study their efficiency. Louvain method is found to be capable of handling graphs of millions of vertices and billions of edges. This thesis analyses the performance of this method with two objective functions, Modularity and Stability and found that neither of them are robust and general.

In order to overcome the limitations of these objective functions, a third objective function proposed in the literature is considered. This objective function has previously been used in the case of Protein Interaction Networks successfully, and used in this thesis to detect structured P2P botnets for the first time. Further, the differences in the topological properties - assortativity and density, of structured P2P botnet communities and benign communities are discussed. In order to exploit these differences, a novel measure based on mean regular degree is proposed, which captures both the assortativity and the density of a graph and its properties are studied.

This thesis proposes a robust and efficient algorithm that combines the use of greedy community detec- tion and community filtering using the proposed measure mean regular degree. The proposed algorithm is tested extensively on a large number of datasets and found to be comparable in performance in most cases to an existing botnet detection algorithm called BotGrep and found to be significantly faster.

Bharath Venkatesh