City University of Hong Kong


Title: A short text modeling method and its applications in an interactive question answering system
Other Titles: Duan wen ben jian mo fang fa ji qi zai jiao hu shi wen da xi tong zhong de ying yong
Authors: Feng, Min (馮敏)
Department: Dept. of Computer Science
Degree: Master of Philosophy
Issue Date: 2007
Publisher: City University of Hong Kong
Subjects: Data mining; Question-answering systems; Text processing (Computer science)
Notes: CityU Call Number: QA76.9.T48 F46 2007
Includes bibliographical references (leaves 79-91)
Thesis (M.Phil.)--City University of Hong Kong, 2007
x, 91 leaves : ill. ; 30 cm.
Type: Thesis
Abstract: Text similarity measures play an increasingly important role in text-related research and applications such as text mining, Web information retrieval, and question answering. Most existing methods for computing text similarity are suitable only for long text documents. This thesis focuses on how to model a collection of short text snippets and measure the similarity between them. Our novel short text modeling method takes into account both the semantic and the statistical information implied in the short text snippets. It consists of three steps: initialization of the similarity between words; iterative calculation of the similarity between short text snippets and the similarity between words; and construction of the proximity matrix, with optional dimension reduction. Given a set of raw short text snippets, our method first initializes the similarity between words by using a lexical database. We then iteratively calculate both word similarity and short text similarity. Finally, we construct the proximity matrix based on word similarity and use it to convert the raw text snippets into vectors in our model. The resulting vectors, which represent the original short text snippets, can be stored in memory for further processing by other information retrieval (IR) techniques. It is also feasible to embed a dimension reduction technique into the final step. Our experiments on word similarity calculation, text classification, and text clustering show that the proposed short text modeling method can improve the performance of existing text-related IR techniques. The thesis also discusses the method’s applications in a user-interactive question answering (QA) system, BuyAns, including answer clustering and fusion, answer automation, and user modeling, where both questions and answers are short text snippets.
Our approaches to answer clustering and fusion, which include a new style of user interface (UI) and its supporting techniques, aim to make the answers easier for users to read. The enabling technologies include an incremental soft-moVMF clustering algorithm and a hybrid answer fusion approach. The clustering algorithm is an incremental variant of the soft-moVMF clustering algorithm. Our fusion approach, which integrates conceptually similar answers after clustering, consists of two parts: extracting a summary using concept vectors, and resolving possible data inconsistencies using fusion rules. We believe the proposed approaches to answer clustering and fusion can also be applied directly to other virtual communities and QA systems. Experiments show that the proposed methods are effective and that the novel user interface can greatly improve users’ efficiency in reading and understanding answers. The automatic QA method developed in BuyAns is simple and intuitive. For a newly posted question, it finds the most similar question in the accumulated question/answer database by applying the short text modeling method directly to measure the similarity between questions. Although the method is very simple and involves no NLP techniques, our experiment shows that it performs well in practice. This thesis also presents an approach to modeling (or estimating) users’ authority and interest in a user-interactive QA system such as BuyAns. Because our short text modeling method can effectively handle the synonymy and homonymy problems, the user modeling approach (for both user authority and interest) works quite well. Our experiments show that the results accord with the situation in human society.
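Illustrative sketch (not part of the catalog record or the thesis): the abstract does not give the update formulas, so the Python code below is only one plausible reading of the three steps, written under stated assumptions. The function names, the `mix` weight, the toy vocabulary, and the hand-written initial word similarities (standing in for values looked up in a lexical database) are all hypothetical. It assumes NumPy is available.

```python
import numpy as np

def cosine_normalize(m):
    """Rescale a symmetric similarity matrix so that nonzero diagonal entries become 1."""
    d = np.sqrt(np.clip(np.diag(m), 1e-12, None))
    return m / np.outer(d, d)

def model_short_texts(counts, init_word_sim, n_iter=5, mix=0.5, n_dims=None):
    """Assumed three-step pipeline; the update rules are illustrative only.

    counts        : (n_snippets, n_words) term-count matrix
    init_word_sim : (n_words, n_words) word similarity, e.g. from a lexical database
    mix           : hypothetical weight kept on the initial word similarity
    n_dims        : if given, reduce the snippet vectors with a truncated SVD
    """
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    a = counts / np.clip(norms, 1e-12, None)       # unit-length snippet rows

    # Step 1: start from the lexical (semantic) word similarity.
    word_sim = np.asarray(init_word_sim, dtype=float).copy()

    # Step 2: alternate between snippet similarity and word similarity.
    for _ in range(n_iter):
        text_sim = cosine_normalize(a @ word_sim @ a.T)
        stat_sim = cosine_normalize(a.T @ text_sim @ a)   # statistical evidence
        word_sim = mix * init_word_sim + (1.0 - mix) * stat_sim
    text_sim = cosine_normalize(a @ word_sim @ a.T)

    # Step 3: use word similarity as the proximity matrix to map snippets to vectors.
    vectors = a @ word_sim
    if n_dims is not None:
        u, s, _ = np.linalg.svd(vectors, full_matrices=False)
        vectors = u[:, :n_dims] * s[:n_dims]
    return vectors, word_sim, text_sim

# Toy data (hypothetical): three snippets over a five-word vocabulary.
vocab = ["cheap", "inexpensive", "laptop", "notebook", "pizza"]
counts = np.array([
    [1, 0, 1, 0, 0],   # "cheap laptop"
    [0, 1, 0, 1, 0],   # "inexpensive notebook" (shares no word with snippet 0)
    [0, 0, 0, 0, 1],   # "pizza"
])
init_word_sim = np.eye(len(vocab))
init_word_sim[0, 1] = init_word_sim[1, 0] = 0.9   # cheap ~ inexpensive
init_word_sim[2, 3] = init_word_sim[3, 2] = 0.8   # laptop ~ notebook

vectors, word_sim, text_sim = model_short_texts(counts, init_word_sim, n_dims=2)
```

In this toy setting the first two snippets have no word in common, so a plain bag-of-words cosine would score them as unrelated, while `text_sim[0, 1]` is positive because the word similarity links their vocabularies. The same `text_sim` lookup could in principle serve the question-matching use described above, by ranking stored questions against a new one.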
Appears in Collections: CS - Master of Philosophy


Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

