NSF Career Award (Award# IIS-1149851)
Mining frequent patterns from a hidden dataset is an important task with various real-life applications. In this research, we propose a solution to this problem that is based on Markov Chain Monte Carlo (MCMC) sampling of frequent patterns.
In this work, we propose an interactive pattern discovery framework named PRIIME, which identifies a set of interesting patterns for a specific user without requiring any prior input from the user on the interestingness measure of patterns. The proposed framework is generic enough to support the discovery of interesting set, sequence, and graph patterns.
In this paper, we introduce a new home discovery tool called RAVEN. It uses interactive feedback over a collection of home feature-sets to learn a buyer's interestingness profile. It then recommends a small list of homes that match the buyer's interests.
In this work, we propose a frequent subgraph mining algorithm called FSM-H which uses an iterative MapReduce-based framework. FSM-H is complete as it returns all the frequent subgraphs for a given user-defined support, and it is efficient as it applies all the optimizations that the latest FSM algorithms adopt.
In this work, we propose a novel approach for solving graph classification using two alternative graph representations: the bag of vertices and the bag of partitions. For the first representation, we use deep-learning-based node features, and for the second, we use traditional metric-based features.
In this work, we propose a supervised regression (Cox regression) model inspired by survival analysis to predict the sale probability of a house given historical home sale information within an observation time window.
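As a rough illustration of the objective behind a Cox regression model, the sketch below computes the standard log partial likelihood for a toy set of listings. It assumes no tied sale times, and the names `cox_log_partial_likelihood`, `features`, `durations` and `sold` are hypothetical; the feature set and fitting procedure of the actual project are not described here.

```python
import math

def cox_log_partial_likelihood(beta, features, durations, sold):
    """Log partial likelihood of a Cox model, assuming no tied event times.

    beta      : list of coefficients, one per feature
    features  : list of feature rows, one row per home (hypothetical features)
    durations : observed time on market for each home
    sold      : True if the home sold (event), False if still listed (censored)
    """
    # Linear risk score beta . x for every home.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in features]
    total = 0.0
    for i, (t_i, event) in enumerate(zip(durations, sold)):
        if not event:
            continue  # censored homes only contribute through risk sets
        # Risk set: homes still on the market at time t_i.
        risk = sum(math.exp(scores[j])
                   for j, t_j in enumerate(durations) if t_j >= t_i)
        total += scores[i] - math.log(risk)
    return total

# Toy data: three homes, two made-up features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
durations = [10, 20, 30]

# With beta = 0 every home has the same risk, so each sale contributes
# -log(size of its risk set).
all_sold = cox_log_partial_likelihood([0.0, 0.0], features, durations,
                                      [True, True, True])
one_censored = cox_log_partial_likelihood([0.0, 0.0], features, durations,
                                          [True, False, True])
```

Fitting the model would mean maximizing this quantity over `beta`; the sketch only evaluates it.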
A novel method for metric embedding of node-pair instances in a dynamic network. DyLink2Vec models the metric embedding task as an optimal coding problem whose objective is to minimize the reconstruction error, and it solves this optimization task using gradient descent.
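The idea of learning a code by minimizing reconstruction error with gradient descent can be sketched with a tiny linear encoder/decoder. This is a generic illustration under stated assumptions, not DyLink2Vec itself: the linear model, the fixed initial weights, the toy data, and the name `train_linear_coder` are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_coder(data, code_dim=1, lr=0.1, epochs=500):
    """Full-batch gradient descent on mean squared reconstruction error."""
    n = len(data[0])
    m = len(data)
    # Small fixed initial weights (an arbitrary choice for this sketch).
    enc = [[0.1] * n for _ in range(code_dim)]   # code_dim x n
    dec = [[0.1] * code_dim for _ in range(n)]   # n x code_dim
    history = []                                  # loss before each update
    for _ in range(epochs):
        g_enc = [[0.0] * n for _ in range(code_dim)]
        g_dec = [[0.0] * code_dim for _ in range(n)]
        loss = 0.0
        for x in data:
            code = [dot(row, x) for row in enc]        # low-dimensional code
            recon = [dot(row, code) for row in dec]    # reconstruction
            err = [r - xi for r, xi in zip(recon, x)]
            loss += 0.5 * dot(err, err)
            # Error pushed back through the decoder into code space.
            back = [sum(dec[i][k] * err[i] for i in range(n))
                    for k in range(code_dim)]
            for i in range(n):
                for k in range(code_dim):
                    g_dec[i][k] += err[i] * code[k]
            for k in range(code_dim):
                for j in range(n):
                    g_enc[k][j] += back[k] * x[j]
        history.append(loss / m)
        for i in range(n):
            for k in range(code_dim):
                dec[i][k] -= lr * g_dec[i][k] / m
        for k in range(code_dim):
            for j in range(n):
                enc[k][j] -= lr * g_enc[k][j] / m
    return enc, dec, history

# Toy data lying on a line in 3-D, so a 1-D code can reconstruct it.
direction = [1 / 3, 2 / 3, 2 / 3]
data = [[t * d for d in direction] for t in (-1.0, -0.5, 0.5, 1.0)]
enc, dec, history = train_linear_coder(data)
```

Here `history` records the mean reconstruction error per epoch, and the encoder output `dot(enc[0], x)` plays the role of the learned embedding of an instance `x`.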
A novel method for graphlet-transition-based feature representation of node-pair instances. GraTFEL applies unsupervised feature learning to graphlet-transition-based features to produce a low-dimensional feature representation of the node-pair instances.
A simple yet powerful algorithm that obtains the approximate graphlet frequency for all graphlets that have up to 5 vertices.
A Uniform Sampler for Constructing Frequency Histogram of Graphlets. GUISE uses a Markov Chain Monte Carlo (MCMC) sampling method to construct the approximate graphlet frequency distribution (GFD) of a large network.
An approximate triangle counting algorithm that runs on multi-core computers through a multi-threaded implementation.
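The description above does not say which estimator is used, so the following is only a generic sketch of approximate triangle counting by edge sampling with the per-edge work spread over a thread pool. The function names and the `sample_prob` and `workers` parameters are hypothetical. In CPython the threads mainly illustrate how the work is partitioned, since the interpreter lock limits the actual speedup.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def build_adjacency(edges):
    """Adjacency sets for an undirected simple graph (self-loops dropped)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def approx_triangle_count(edges, sample_prob=0.5, workers=4, seed=0):
    """Estimate the triangle count from a random sample of edges."""
    rng = random.Random(seed)
    adj = build_adjacency(edges)
    unique_edges = sorted({(min(u, v), max(u, v))
                           for u, v in edges if u != v})
    # Keep each edge independently with probability sample_prob.
    sampled = [e for e in unique_edges if rng.random() < sample_prob]
    if not sampled:
        return 0.0

    def triangles_on(edge):
        u, v = edge
        # Common neighbours of u and v close a triangle over this edge.
        return len(adj[u] & adj[v])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(triangles_on, sampled))
    # Every triangle has 3 edges and each edge is kept with probability
    # sample_prob, so the expected total is 3 * sample_prob * (true count).
    return total / (3.0 * sample_prob)

# Complete graph on 4 vertices.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
exact = approx_triangle_count(k4, sample_prob=1.0)
estimate = approx_triangle_count(k4, sample_prob=0.5)
```

With `sample_prob=1.0` every edge is kept and the estimate equals the exact count; smaller values trade accuracy for less work.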
Bayesian Non-Exhaustive Classification A Case Study: Online Name Disambiguation using Temporal Record Streams
A Bayesian non-exhaustive classification framework for solving the online name disambiguation task in the digital library domain.
Two indirect triple sampling methods based on a Markov Chain Monte Carlo (MCMC) sampling strategy. Triple-MCMC samples a triple by performing an MCMC walk on an imaginary triple sample space. Vertex-MCMC samples a triple by performing an MCMC walk on the original network to sample a node and then sampling a triple centered at the selected node.
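The vertex-based variant can be sketched as a Metropolis-Hastings walk over nodes followed by a draw of two neighbours. The sketch assumes the walk targets a distribution proportional to the number of triples centered at each node, which is what would make the final triple uniform; that target, the uniform-neighbour proposal, and the names `triple_weight` and `vertex_mcmc_triple` are assumptions rather than details stated above. The graph is assumed to be undirected, with every node having at least one neighbour.

```python
import random

def triple_weight(adj, node):
    """Number of triples (paths of length 2) centered at node."""
    d = len(adj[node])
    return d * (d - 1) // 2

def vertex_mcmc_triple(adj, start, steps=1000, rng=None):
    """Walk on the network, then sample a triple centered at the final node.

    adj maps each node to a list of its neighbours.
    """
    rng = rng or random.Random()
    current = start
    for _ in range(steps):
        proposal = rng.choice(adj[current])  # uniform neighbour proposal
        # Metropolis-Hastings ratio for target weight w(x) and proposal
        # probability 1/deg(x): [w(y) / deg(y)] / [w(x) / deg(x)].
        num = triple_weight(adj, proposal) * len(adj[current])
        den = triple_weight(adj, current) * len(adj[proposal])
        # A node with no centered triples has zero target mass: always leave.
        if den == 0 or rng.random() < min(1.0, num / den):
            current = proposal
    if len(adj[current]) < 2:
        return None  # walk never reached a node that centers a triple
    a, b = rng.sample(adj[current], 2)
    return (a, current, b)

# Star graph: node 0 is the only node that centers any triple.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
triple = vertex_mcmc_triple(star, start=1, steps=10, rng=random.Random(7))
```

The returned tuple is `(end, center, end)`. In practice the walk needs enough steps to mix before the final node is used as a sample.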