Enhanced bag-of-words model for web forum question post detection

Bibliographic Details
Main Authors: Obasa, Adekunle Isiaka, Salim, Naomie, Khan, Atif
Format: Conference or Workshop Item
Published: 2015
Online Access: http://eprints.utm.my/61201/
Description
Summary: A web forum is an online discussion board that brings together people with common interests. It is a problem-solving platform that has proved useful for tackling technical issues with the help of experts across the globe. Research in this domain has concentrated on answer detection, under the assumption that the starting post is a question post. The quality of web forum question posts, however, varies from excellent to mediocre or even spam. Detecting good question posts requires the use of salient features. In this paper, we enhance the popular bag-of-words (BoW) model with web forum metadata and a simple rule based on question marks and question words to mine question posts. We empirically address the following questions. Does integrating the simple rule of question marks and question words with forum metadata perform better than either of the two alone? Can dimensionality reduction of BoW using chi-square enhance question post detection in web forums? Can combining BoW with the simple rule of question marks and question words and with forum metadata further enhance question post detection? We used three publicly available datasets of varying technical degrees for the experiments. The experimental results revealed that an enhanced BoW can perform better than complex techniques that implement higher-order N-grams with part-of-speech tagging.
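
The feature construction described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the question-word list, the tokenizer, the function names, and the example metadata key `is_first_post` are all assumptions made for illustration, and the record does not say which metadata fields or classifier the paper uses. The sketch scores each BoW term with the standard chi-square statistic for a 2x2 term/class table, keeps the top-scoring terms, and appends the question-mark and question-word rule features together with any metadata supplied.

```python
import re
from collections import Counter

# Assumed question-word list; the paper's actual list is not given in this record.
QUESTION_WORDS = {"what", "why", "how", "when", "where", "which", "who", "can", "does", "is"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def rule_features(text):
    """Simple rule: does the post contain a question mark or a question word?"""
    tokens = tokenize(text)
    return {
        "has_question_mark": int("?" in text),
        "has_question_word": int(any(t in QUESTION_WORDS for t in tokens)),
    }


def chi_square(posts, labels, term):
    """Chi-square score of a term against the binary question/non-question label."""
    a = b = c = d = 0  # cells of the 2x2 contingency table
    for text, label in zip(posts, labels):
        present = term in set(tokenize(text))
        if present and label:
            a += 1  # term present, question post
        elif present:
            b += 1  # term present, non-question post
        elif label:
            c += 1  # term absent, question post
        else:
            d += 1  # term absent, non-question post
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_terms(posts, labels, k):
    """Keep the k vocabulary terms with the highest chi-square score."""
    vocab = sorted({t for p in posts for t in tokenize(p)})
    ranked = sorted(vocab, key=lambda t: (-chi_square(posts, labels, t), t))
    return ranked[:k]


def featurize(text, terms, metadata=None):
    """Combine reduced BoW counts, rule features, and (hypothetical) metadata."""
    counts = Counter(tokenize(text))
    features = {f"bow:{t}": counts[t] for t in terms}
    features.update(rule_features(text))
    features.update(metadata or {})
    return features
```

The resulting feature dictionaries could then be passed to any standard classifier; which classifier the paper evaluates is not stated in this record.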