GraTFEL: Link Prediction in Dynamic Networks using Graphlet

Abstract:

A novel method for graphlet transitions based feature representation of the node-pair instances. GraTFEL uses unsupervised feature learning methodologies on graphlet transition based features to give a low-dimensional feature representation of the node-pair instances. GraTFEL models the feature learning task as an optimal coding task where the objective is to minimize the reconstruction error, and it solves this optimization task by using a gradient descent method. We validate the effectiveness of the learned feature representations by utilizing it for link prediction in real-life dynamic networks. Specifically, we show that GraTFEL, which use the extracted feature representation of graphlet transition events, outperforms existing methods that use well-known link prediction features. 

Contribution:

In this work, we propose a novel learning method GraTFEL (Graphlet Transition and Feature Extraction for Link Prediction) for obtaining feature representation of node-pair instances from graphlet transition events in the observed snapshots of the given network. GraTFEL considers the feature learning task as an optimal coding problem such that the optimal code of a node-pair is the desired feature representation. The learning can be considered as a two-step process (compression and reconstruction), where the first step compresses the input representation of a node-pair into a code by a non-linear transformation, and the second step reconstructs the input representation from the code by a reverse process and the optimal code is the one which yields the least amount of reconstruction error. The input representation of a node-pair is given as a vector of graphlet transition events (GTEs) associated with the corresponding node-pair. After obtaining an appropriate feature representation of the node-pairs, a traditional supervised learning technique is used (we use SVM and AdaBoost) for predicting link states at future times in the given dynamic network. 

Method:

A key challenge for dynamic link prediction is choosing an effective feature set for this task. Earlier works choose features by adapting topological features for static link prediction or by considering the feature values of different snapshots as a time series. GraTFEL uses graphlet transition events (GTEs) as features for link prediction. For a given node-pair, the value of a specific GTE feature is a normalized count of the observed GTE involving those node-pairs over the training data. The strength of GTEs as feature for dynamic link prediction comes from the fact that for a given node-pair, GTEs involving those nodes capture both the local topology and their transition over the temporal snapshots. 

Figure6

Feature representation for a node-pair in a dynamic network is constructed by concatenating GTE features from a continuous set of graph snapshots. Concatenation, being the simplest form of feature aggregation across a set of graph snapshots is not essentially the best feature representation to capture temporal characteristics of a node-pair. So, GraTFEL uses unsupervised feature extraction (UFE) to get optimal feature representation from GTE features. UFE provides a better feature representation by discovering dependency among different data dimensions, which cannot be achieved by simple aggregation. It also reduces the data dimension and overcomes the sparsity issue in GTE features. Once the optimal feature representation of a node-pair is known, GraTFEL uses that for solving the link prediction task using a supervised classification model. 

Figure7

Link Prediction Method (GraTFEL):

The pseudo-code of GraTFEL is given in Algorithm 1. For training link prediction model, we split the available network snapshots into two overlapping time windows, [1, t − 1] and [2, t]. GTE features and Ē are constructed in Lines 2 and 4, respectively. Then we learn optimal coding for node-pairs using graphlet transition features (Eˆ ∪ E ̄ ) in Line 5. Unsupervised feature representations are constructed using learned optimal coding (Lines 6 and 7). Finally, a classification model C is learned (Line 8), which is used for predicting link in G_(t+1) (Line 9). 

Figure8

Dataset:

Figure9

We use three real world dynamic networks for evaluating the performance of GraTFEL. These datasets are Enron, Collaboration and Facebook. The Enron email corpus consists of email exchanges between 184 Enron employees; the vertices of the network are employees, and the edges are email events between a pair of employees. The dataset has 11 temporal snapshots. The Collaboration dataset is constructed by using citation data from ArnetMiner (arnetminer.org). Each vertex in this dataset is an author and the edges represent co-authorship. We consider the data from years 2000-2009—total 10 snapshots considering each year as a time stamp. Many authors in the original dataset have published in only one or two years out of all ten years, so we select only those who participate in a minimum of 2 collaboration edges in at least 7 time stamps. The final dataset has 315 authors. Lastly, the dynamic social network of Facebook wall posts has 9 time stamps and 663 nodes. The dynamic link prediction task of all three datasets is to predict links on last snapshot, given the link information of all the previous snapshots. Although the number of vertices in these networks are in hundreds, these are large datasets considering the possible node-pairs and multiple temporal snapshots. 

Result:

Figure10

We present performance comparison of GraTFEL with several competing methods. The bar charts in the top, middle, and bottom rows of Figure 6 display the results for Enron, Collaboration, and Facebook datasets, respectively. The bar charts in a row show comparison results using AUC, PRAUC, and NDCG50 (from left to right). In total, there are 9 bars in a chart, each representing a link prediction method, where the height of a bar is indicative of the performance metric value of the corresponding method. From left to right, the first four bars (blue) correspond to the topological feature based methods, the next four (green) represent time series based methods, and the final bar (brown) represents GraTFEL

Contribution of Unsupervised Feature Extraction:

Figure11

GraTFEL has two novel aspects: first, utilization of graphlet transition events (GTEs) as features, and the second is unsupervised feature learning by optimal coding. In this section, we compare the relative contribution of these two aspects in the performance of GraTFEL. For this comparison, we build a version of GraTFEL which we call GTLiP. GTLiP uses GTEs just like GraTFEL, but it does not use optimal coding, rather it uses the GTEs directly as features. In Figure 7, we show the comparison between GraTFEL, and GTLiP using NDCGp for different p values for all the datasets. The superiority of GraTFEL over GTLiP for all the datasets over a range of p values is clearly evident from the three charts. GTLiP also outperforms all the competing methods. Comparison of GTLiP and other competing methods is not shown in this figure, but the NDCG50 from this figure can be compared with NDCG50 charts in Figure 6 for confirming this claim. This shows that GTE based features, irrespective of unsupervised coding, improve the dynamic link prediction performance over the existing state of-art dynamic link prediction methods. 

Code and Data:

 https://github.com/DMGroup-IUPUI/GraTFEL-Source

Contact:

1. Mohammad Al Hasan (alhasan@cs.iupui.edu)

2. Mahmudur Rahman (prime059@gmail.com)

Grant: NSF CAREER Award (IIS-1149851)

Paper:

Citation:

@Inbook{Rahman2016,
author="Rahman, Mahmudur and Hasan, Mohammad Al",
editor="Frasconi, Paolo and Landwehr, Niels and Manco, Giuseppe and Vreeken, Jilles",
title="Link Prediction in Dynamic Networks Using Graphlet",
bookTitle="Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2016, Riva del Garda, Italy, September 19-23, 2016, Proceedings, Part I",
year="2016",
publisher="Springer International Publishing",
address="Cham",
pages="394--409",
isbn="978-3-319-46128-1",
doi="10.1007/978-3-319-46128-1_25",
url="http://dx.doi.org/10.1007/978-3-319-46128-1_25"
}