City University of Hong Kong


Title: Applications of short text similarity assessment in user-interactive question answering
Other Titles: Duan wen ben xiang si du ji suan zai yong hu jiao hu shi wen da xi tong zhong de ying yong
Authors: Song, Wanpeng (宋萬鵬)
Department: Department of Computer Science
Degree: Doctor of Philosophy
Issue Date: 2010
Publisher: City University of Hong Kong
Subjects: Question-answering systems; Text processing (Computer science)
Notes: CityU Call Number: QA76.9.Q4 S66 2010
101 leaves : ill. 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2010.
Includes bibliographical references (leaves 91-101)
Type: thesis
Abstract: With the rapid development of the Internet and the emergence of Web 2.0, Question Answering (QA) has become a new Information Retrieval (IR) technology. Unlike search engines, which return many relevant documents, QA systems give one or several exact answers to each user question, which is preferable. However, traditional automatic QA systems suffer from poor answer quality because it is very difficult for machines to understand natural-language questions well. To solve this problem, user-interactive QA systems have been developed and have become very popular Web-based services. Unlike traditional automatic QA systems, which obtain answers entirely automatically, user-interactive QA systems serve as interactive platforms on which users help each other with human-provided answers, overcoming the poor quality of automatic answers.

Short text similarity assessment is widely used in IR and text-mining applications, such as content-based image retrieval, text summarization, text categorization and machine translation. It is also very important in user-interactive QA systems because questions and answers are usually short texts. Question and answer processing depends on understanding the semantics of questions and answers and on measuring their similarity. Hence, it is necessary to develop an effective and efficient short text similarity calculation method and apply it to user-interactive QA systems.

The applications of short text similarity assessment in user-interactive QA systems mainly include frequently asked question (FAQ) answering, question categorization and answer clustering.

First, FAQ answering can make full use of redundant information and shorten user response time by retrieving similar questions in user-interactive QA systems. In this dissertation, we propose a novel question similarity calculation method based on a semantic space for FAQ answering.
The semantic space is constructed from accumulated questions that were previously posted and classified into predefined categories. First, all the content words in the accumulated questions are extracted as features and weighted by an entropy-based weighting strategy. These features are then semantically clustered by a hierarchical clustering algorithm, and a representative feature is selected for each cluster. The representative features together make up the semantic space in which question similarity is calculated. Questions are mapped into the semantic space and represented as vectors, and question similarity is finally calculated from these vectors. Experimental results show that our method outperforms the baseline on TREC-9 test questions.

Second, question categorization is a very useful module in user-interactive QA systems. It can potentially save users time and effort by recommending suitable categories. In this dissertation, we propose an automatic method of question categorization for user-interactive QA systems. In this method, we select "important" terms extracted from accumulated questions as features to construct a feature space. Each category is then represented by a vector in the feature space. These vectors need to be calculated only once and are then loaded into memory for later categorization. Given a new question, we first represent it as an initial word vector using the Term Frequency weighting method. Words that directly indicate the question's topic are identified using semantic patterns and given a higher weight by multiplying their original weight by a coefficient greater than 1. After that, a word similarity matrix is constructed by calculating, based on WordNet, the similarity between each word in the question and each word of the feature space.
The initial question vector is then mapped into the feature space, enriching its semantic representation, by multiplying it by the word similarity matrix. To categorize a new question, we calculate the similarity between the new question vector and each category vector. The similarity scores are sorted in descending order and the top k ranked categories are recommended to the user. Experimental results show that our proposed method achieves good precision and outperforms traditional categorization methods on selected test questions.

Finally, answer clustering is a very effective way to organize answers in user-interactive QA systems. In this dissertation, an answer clustering method is proposed in which all the answers are grouped into clusters according to their content or meaning. After clustering, answers in the same cluster are semantically similar to each other, while the meaning or content differs considerably between clusters. Moreover, a representative answer is selected for each cluster, so users can quickly grasp the content of each cluster by reading only the representative answer. Answer clustering has two important parts: answer similarity calculation and the clustering algorithm. For answer similarity calculation, we use a method that combines statistical similarity and semantic similarity. For the clustering algorithm, we design an incremental algorithm to reduce the computational complexity. Experimental results show that answer clustering lets users browse answers more conveniently in real time.
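
Illustrative sketch (not taken from the thesis): the question categorization steps described in the abstract can be approximated in a few lines of Python. Everything concrete here is an assumption made for illustration: the function names, the toy feature space and category vectors, the coefficient value 2.0, the use of cosine similarity (the abstract does not name the similarity measure), and the hand-written word similarity table that stands in for the WordNet-based word similarity.

```python
import math
from collections import Counter

def tf_vector(tokens, topic_words=frozenset(), coeff=2.0):
    """Term-frequency weights; topic-indicating words are multiplied by coeff (> 1)."""
    counts = Counter(tokens)
    return {w: c * (coeff if w in topic_words else 1.0) for w, c in counts.items()}

def map_to_feature_space(question_vec, features, word_sim):
    """Multiply the question vector by the word similarity matrix.

    Entry (w, f) of the matrix is word_sim(w, f); the result has one
    component per feature of the feature space.
    """
    return [sum(wt * word_sim(w, f) for w, wt in question_vec.items())
            for f in features]

def cosine(a, b):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(tokens, features, category_vecs, word_sim, k=2,
              topic_words=frozenset(), coeff=2.0):
    """Return the top-k categories, sorted by descending similarity."""
    q = map_to_feature_space(tf_vector(tokens, topic_words, coeff),
                             features, word_sim)
    ranked = sorted(category_vecs,
                    key=lambda c: cosine(q, category_vecs[c]),
                    reverse=True)
    return ranked[:k]

# Toy data (hypothetical): three features and one vector per category.
FEATURES = ["computer", "health", "travel"]
CATEGORY_VECS = {
    "Computers": [1.0, 0.0, 0.0],
    "Health": [0.0, 1.0, 0.0],
    "Travel": [0.0, 0.0, 1.0],
}

# Hand-written stand-in for a WordNet-based word similarity.
TOY_SIM = {
    ("laptop", "computer"): 0.9,
    ("virus", "computer"): 0.5,
    ("virus", "health"): 0.6,
    ("flight", "travel"): 0.9,
}

def toy_word_sim(word, feature):
    return 1.0 if word == feature else TOY_SIM.get((word, feature), 0.0)

if __name__ == "__main__":
    # "laptop" is treated as the topic word and boosted by the coefficient.
    print(recommend(["laptop", "virus"], FEATURES, CATEGORY_VECS,
                    toy_word_sim, k=2, topic_words={"laptop"}))
```

With the toy data above, a question tokenized as ["laptop", "virus"] maps to the vector [2.3, 0.6, 0.0], so "Computers" is ranked ahead of "Health". In a real system the category vectors would be precomputed from accumulated questions and the similarity function would query WordNet.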
Appears in Collections: CS - Doctor of Philosophy


Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

