RAVEN: Smart Home Exploration Through Interactive Pattern Discovery

Abstract:

Being a homeowner is undoubtedly a significant milestone in a person's life. In that pursuit, a prospective buyer spends an enormous amount of time evaluating potential homes on the market. A study shows that 90% of home buyers rely on the Internet as their primary resource for home-related information. However, existing online home search tools, such as search engines, listing sites, and forums, require the user to formulate appropriate search queries to discover the most desired home, which is a complicated task, especially for first-time home buyers. Another challenge for home buyers is filtering the search results, which typically contain hundreds of homes for a single query. In such a process, the prospective home buyer becomes a victim of the well-known “information overload” problem. In this paper, we introduce a new home discovery tool called RAVEN. It uses interactive feedback over a collection of home feature sets to learn a buyer's interestingness profile. It then recommends a small list of homes that match the buyer's interests, thus addressing the information overload problem and eventually shortening the interval between the start of the home search and the purchase.

Method:

RAVEN models the home exploration task as an interactive itemset pattern discovery problem. Each house is represented as an itemset: the set of features that the house possesses. RAVEN considers many features per house, including those extracted from the house's textual description, so the dataset is high-dimensional and sparse. To discard insignificant or rare features, RAVEN considers only the itemsets that are maximal frequent itemsets for a system-defined minimum support threshold. The problem of identifying the feature set of the desired house then becomes a pattern classification problem, in which the itemsets representing a desired home belong to the positive class and the remaining itemsets belong to the negative class. RAVEN solves this classification task interactively. It presents the user with a small collection of itemsets, and she provides binary feedback (like or dislike) on each itemset in the collection. Using each feedback as a label, the interactive pattern discovery task learns a classification model that classifies all the itemsets in the search space. The performance of the interactive pattern discovery model is measured by the amount of feedback it needs to reach the expected level of recommendation accuracy.
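A minimal sketch of the itemset view, under stated assumptions: the houses and feature names below are hypothetical toy data, support is normalized (a fraction of houses), and maximal frequent itemsets are found with a simple level-wise (Apriori-style) enumeration followed by a maximality filter. This textbook technique is swapped in for illustration only; it is not necessarily the miner RAVEN uses, and a real high-dimensional dataset would call for a dedicated maximal-pattern miner.

```python
from itertools import combinations

# Hypothetical toy data: each house is an itemset (a set of feature names).
HOUSES = [
    {"fireplace", "basement", "fenced_yard"},
    {"fireplace", "basement", "porch"},
    {"fireplace", "basement", "fenced_yard", "porch"},
    {"basement", "porch"},
    {"fireplace", "pool"},
]


def support(itemset, transactions):
    """Normalized support: fraction of houses containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(transactions, min_sup):
    """Level-wise (Apriori-style) enumeration of all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_sup]
    frequent = list(level)
    k = 2
    while level:
        # Join frequent (k-1)-itemsets to form candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent.
        prev = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c, transactions) >= min_sup]
        frequent.extend(level)
        k += 1
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < t for t in frequent)]


if __name__ == "__main__":
    M = maximal_itemsets(frequent_itemsets(HOUSES, min_sup=0.4))
    for pattern in sorted(M, key=sorted):
        print(sorted(pattern))
```

With min_sup=0.4 (at least 2 of the 5 toy houses), the script prints the two maximal patterns ['basement', 'fenced_yard', 'fireplace'] and ['basement', 'fireplace', 'porch']; every other frequent itemset, e.g. {basement, porch}, is a subset of one of them, which is why the maximal set acts as a compact summary.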

System Architecture:

Figure 1: RAVEN architecture

The proposed system has two main blocks, Train and Recommendation, shown in the upper and lower boxes of Figure 1, respectively. Train learns a model for a user u; it contains three modules: Frequent Maximal Pattern Miner (FMPM), Learner, and Feedback-Collector (FC). The FMPM module is an off-the-shelf pattern mining algorithm that mines maximal frequent patterns for a given normalized minimum support threshold. The Learner learns u’s interestingness function f over the maximal pattern set M (a summary of the dataset) in multiple stages, where each stage uses the user’s feedback on a small number of patterns selected from one partition of M. The FC identifies the patterns that are sent to the user for feedback. The Recommendation block suggests houses to the user using the learned model.
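The staged Train loop and the Recommendation step can be sketched as below. This is a hypothetical illustration, not RAVEN's implementation: the Learner is assumed to be a simple perceptron over binary feature vectors, the FC is assumed to pick the patterns whose current score is closest to zero (a common active-learning heuristic), the user is simulated, and the toy houses and pattern set M are invented.

```python
# Hypothetical sketch of RAVEN's Train and Recommendation blocks.
# Assumptions (not specified by RAVEN): a perceptron as the Learner,
# least-certain selection in the Feedback-Collector, and a simulated user.


def vectorize(itemset, vocab):
    """Binary indicator vector of an itemset over a fixed feature vocabulary."""
    return [1.0 if f in itemset else 0.0 for f in vocab]


class Learner:
    """Linear stand-in for the user's interestingness function f."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.w = [0.0] * len(vocab)
        self.b = 0.0

    def score(self, itemset):
        x = vectorize(itemset, self.vocab)
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def fit(self, labeled, epochs=10):
        """Perceptron passes over all (itemset, label) pairs, label in {+1, -1}."""
        for _ in range(epochs):
            for itemset, y in labeled:
                if y * self.score(itemset) <= 0:  # misclassified or on the boundary
                    x = vectorize(itemset, self.vocab)
                    self.w = [wi + y * xi for wi, xi in zip(self.w, x)]
                    self.b += y


def feedback_collector(partition, learner, k):
    """Pick the k patterns of a partition whose current score is closest to 0."""
    return sorted(partition, key=lambda p: abs(learner.score(p)))[:k]


def train(M, ask_user, vocab, partition_size=3, k=2):
    """Multi-stage training: one partition of M per stage, k feedback queries each."""
    learner, labeled = Learner(vocab), []
    for start in range(0, len(M), partition_size):
        partition = M[start:start + partition_size]
        for pattern in feedback_collector(partition, learner, k):
            labeled.append((pattern, 1 if ask_user(pattern) else -1))
        learner.fit(labeled)
    return learner, len(labeled)


def recommend(houses, learner, n=2):
    """Rank houses (also itemsets) by their learned interestingness score."""
    return sorted(houses, key=lambda h: learner.score(houses[h]), reverse=True)[:n]


# Toy data: houses, and a stand-in for the maximal pattern set M from FMPM.
HOUSES = {
    "h1": {"fireplace", "basement", "porch"},
    "h2": {"pool", "basement"},
    "h3": {"fireplace", "fenced_yard"},
    "h4": {"pool", "porch"},
}
M = [{"fireplace", "basement"}, {"pool", "porch"}, {"fireplace", "porch"},
     {"pool", "basement"}, {"fireplace", "fenced_yard"}, {"basement", "porch"}]
VOCAB = sorted({f for h in HOUSES.values() for f in h})

if __name__ == "__main__":
    # Simulated user who likes any pattern that contains a fireplace.
    model, n_feedback = train(M, lambda p: "fireplace" in p, VOCAB)
    print(n_feedback, recommend(HOUSES, model))
```

Here the simulated user answers 4 queries (two per stage), and the learned model ranks h3 and h1, the two toy houses with a fireplace, at the top. Because houses are itemsets over the same features as the patterns, one model serves both the Train and Recommendation blocks in this sketch.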

Data:

We crawl trulia.com from November 2015 to January 2016 (3 months) for five major cities in Central Indiana: Carmel, Fishers, Indianapolis, Zionsville, and Noblesville. The crawled information includes basic house information, the text of the house details, and school and crime information. We extract the address of each house from its URL. The data crawler is written in Python.

To clean the crawled text data, we use a standard text-cleaning approach: stop-word removal, stemming, and lemmatization. We also manually clean some keywords that are written in an unstructured abbreviated form; for example, the keyword “fireplace” is written as “frplc” or “firplc” in many house details. To compile the house feature set, we use RAKE (Rapid Automatic Keyword Extraction) to find key phrases. Next, we group the key phrases, placing two key phrases in the same group if they share at least one keyword. This way, all the features of a category end up together; for example, all the key phrases related to the basement are grouped together in this step. Finally, we manually inspect these groups and identify specific house features. In the end, we retain only those house features that appear in at least 25 houses.
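The grouping and filtering steps can be sketched as follows. This is a minimal illustration under assumptions: key phrases are taken as already extracted (RAKE itself is not called), the abbreviation map and sample phrases are hypothetical, and "share at least one keyword" is read transitively, so phrases are merged into connected components with union-find. Only the 25-house threshold comes from the text above.

```python
from collections import Counter

# Hypothetical abbreviation map; RAVEN's manual normalization list is not given here.
ABBREVIATIONS = {"frplc": "fireplace", "firplc": "fireplace"}


def normalize(phrase):
    """Lower-case a key phrase and expand known ad hoc abbreviations."""
    return " ".join(ABBREVIATIONS.get(w, w) for w in phrase.lower().split())


def group_key_phrases(phrases):
    """Union-find: phrases sharing at least one keyword end up in the same group."""
    parent = list(range(len(phrases)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # keyword -> index of the first phrase containing it
    for i, phrase in enumerate(phrases):
        for word in set(phrase.split()):
            if word in first_seen:
                parent[find(i)] = find(first_seen[word])
            else:
                first_seen[word] = i

    groups = {}  # root index -> phrases, in order of first appearance
    for i, phrase in enumerate(phrases):
        groups.setdefault(find(i), []).append(phrase)
    return list(groups.values())


def frequent_features(house_features, min_houses=25):
    """Keep features that appear in at least min_houses houses."""
    counts = Counter(f for feats in house_features for f in set(feats))
    return {f for f, c in counts.items() if c >= min_houses}


if __name__ == "__main__":
    raw = ["Finished basement", "gas frplc", "basement bar", "stone firplc", "fenced yard"]
    print(group_key_phrases([normalize(p) for p in raw]))
```

Running the script prints [['finished basement', 'basement bar'], ['gas fireplace', 'stone fireplace'], ['fenced yard']]. Under the transitive reading, a chain of phrases can merge two phrases that share no keyword directly, which is one reason the manual inspection of the groups remains useful.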

In total, we crawl 7,216 houses, from which we extract 123 house features. The house features are categorized into three groups. The first group describes the interior and exterior of a house. Interior features fall into several categories, e.g., details of bedrooms, bathrooms, kitchen, garage, and various amenities. Information about the fence, yard, porch, etc. is treated as exterior features. The second group consists of neighborhood features, such as nearby schools, shopping malls, restaurants, and trails. The final group is the school quality and crime rate information for the house's neighborhood.

Result:

The performance of the system improves as the amount of feedback increases. After 10 interactive sessions, RAVEN reaches reasonable performance.

Code, Data, and Demo:

System: http://raven123123.pythonanywhere.com/Raven/

Code and Data: https://github.com/mansurul11/Raven

Demo: https://youtu.be/e2w3nqM6mnw

Contact:

1. Mohammad Al Hasan (alhasan@cs.iupui.edu)

2. Mansurul Bhuiyan (rabbi_buet@yahoo.com)