Please use this identifier to cite or link to this item:
http://hdl.handle.net/2031/6092
| Title: | Applications of short text similarity assessment in user-interactive question answering |
| Other Titles: | Duan wen ben xiang si du ji suan zai yong hu jiao hu shi wen da xi tong zhong de ying yong 短文本相似度計算在用戶交互式問答系統中的應用 |
| Authors: | Song, Wanpeng (宋萬鵬) |
| Department: | Department of Computer Science |
| Degree: | Doctor of Philosophy |
| Issue Date: | 2010 |
| Publisher: | City University of Hong Kong |
| Subjects: | Question-answering systems; Text processing (Computer science) |
| Notes: | CityU Call Number: QA76.9.Q4 S66 2010. 101 leaves : ill. ; 30 cm. Thesis (Ph.D.)--City University of Hong Kong, 2010. Includes bibliographical references (leaves 91-101). |
| Type: | thesis |
| Abstract: | With the rapid development of the Internet and the emergence of Web 2.0,
Question Answering (QA) has become a new Information Retrieval (IR) technology.
Unlike search engines, which return many relevant documents, QA systems return one
or a few exact answers to each user question, which is preferable for users. However,
traditional automatic QA systems suffer from poor answer quality because it
is very difficult for machines to understand questions posed in natural language. To
solve this problem, user-interactive QA systems have been developed and have become
very popular Web-based services. Unlike traditional QA systems, which
obtain all of their answers automatically, user-interactive QA systems serve as interactive
platforms on which users help each other with human-provided answers, which
overcomes the poor quality of automatically generated answers.
Short text similarity assessment is widely used in IR and text-mining applications
such as content-based image retrieval, text summarization, text categorization
and machine translation. It is also very important in user-interactive QA systems,
because questions and answers are usually short texts. Processing questions and answers depends
on understanding their semantics and measuring
their similarity. Hence, it is necessary to develop an effective and efficient short text
similarity calculation method and apply it to user-interactive QA systems. The main applications
of short text similarity assessment in user-interactive QA systems
are frequently asked question (FAQ) answering, question categorization and answer
clustering.
First, FAQ answering can make full use of redundant information and shorten
user response time by retrieving similar questions in user-interactive QA systems. In
this dissertation, we propose a novel question similarity calculation method for FAQ answering
based on a semantic space. The semantic space is constructed from accumulated
questions that were previously posted and classified into predefined categories.
All the content words in the accumulated questions are first extracted as features
and weighted by an entropy-based weighting strategy. These features are then semantically clustered by a hierarchical clustering algorithm, and a representative feature
is selected for each cluster. Together, the representative features form the semantic
space in which question similarity is calculated. Questions are mapped
into the semantic space and represented as vectors, and the question similarity
is finally calculated from these vectors. The experimental results show that our method outperforms
the baseline on the TREC-9 test questions.
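The semantic-space pipeline described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the dissertation's implementation: the entropy weight is assumed to be w(t) = 1 - H(t)/log C, where H(t) is the entropy of term t's distribution over the C categories; the hierarchical clustering is single-link agglomerative clustering cut at a fixed threshold; each cluster's representative is assumed to be its highest-weighted term; question vectors are compared with cosine similarity; and the term-similarity function, threshold, and toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_weights(questions_by_category):
    """Assumed weighting: w(t) = 1 - H(t) / log(C), where H(t) is the entropy of
    term t's frequency distribution over the C categories. A term confined to one
    category gets weight 1.0; one spread evenly over all categories gets 0.0."""
    counts = defaultdict(Counter)  # term -> Counter(category -> frequency)
    for category, questions in questions_by_category.items():
        for question in questions:
            for term in question:
                counts[term][category] += 1
    n_cats = len(questions_by_category)
    weights = {}
    for term, by_cat in counts.items():
        total = sum(by_cat.values())
        h = -sum((f / total) * math.log(f / total) for f in by_cat.values())
        weights[term] = 1.0 - h / math.log(n_cats) if n_cats > 1 else 1.0
    return weights


def cluster_features(terms, term_sim, threshold):
    """Single-link agglomerative clustering cut at `threshold` (terms: a list).
    Linking every pair with similarity >= threshold via union-find yields the
    same flat clusters as cutting the single-link dendrogram at that level."""
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if term_sim(a, b) >= threshold:
                parent[find(a)] = find(b)
    clusters = defaultdict(list)
    for t in terms:
        clusters[find(t)].append(t)
    return list(clusters.values())


def build_semantic_space(clusters, weights):
    """Map each term to its cluster's representative (assumed: highest weight)."""
    to_rep = {}
    for members in clusters:
        rep = max(members, key=lambda t: (weights[t], t))
        for t in members:
            to_rep[t] = rep
    return to_rep


def question_vector(question, to_rep, weights):
    """Represent a question in the semantic space; unknown terms are dropped."""
    vec = Counter()
    for term in question:
        if term in to_rep:
            vec[to_rep[term]] += weights[term]
    return vec


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy usage: hypothetical categories, questions, and synonym table.
CATS = {
    "travel": [["best", "cheap", "flight", "paris"], ["airfare", "paris", "hotel"]],
    "cooking": [["best", "bread", "recipe"], ["bread", "flour", "recipe"]],
}
SYNONYMS = {frozenset({"flight", "airfare"})}
weights = entropy_weights(CATS)
vocab = sorted(weights)
clusters = cluster_features(
    vocab, lambda a, b: 1.0 if frozenset({a, b}) in SYNONYMS else 0.0, 0.5)
to_rep = build_semantic_space(clusters, weights)
q1 = question_vector(["cheap", "flight", "paris"], to_rep, weights)
q2 = question_vector(["cheap", "airfare", "paris"], to_rep, weights)
q3 = question_vector(["bread", "recipe"], to_rep, weights)
print(cosine(q1, q2), cosine(q1, q3))
```

Because `airfare` and `flight` fall into the same cluster, the first two toy questions map to the same vector in the semantic space even though their surface words differ, whereas a plain bag-of-words comparison would treat those two words as unrelated. The term `best`, which occurs equally in both categories, receives weight 0 and so contributes nothing.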
Second, question categorization is a very useful module in user-interactive QA
systems. It can save users time and effort by recommending suitable categories.
In this dissertation, we propose an automatic method of question categorization
for user-interactive QA systems. In this method, we select "important" terms extracted
from accumulated questions as features to construct a feature space. Each
category is then represented by a vector in the feature space. These category vectors
need to be calculated only once and can then be kept in memory for subsequent categorization.
Given a new question, we first represent it as an initial word vector using
term frequency (TF) weighting. Words that directly indicate the question's
topic are identified using semantic patterns and given a higher weight by
multiplying their original weight by a coefficient greater than 1. After that,
a word similarity matrix is constructed by calculating the WordNet-based similarity between each
word in the question and each word of the feature space. The initial
question vector is then mapped into the feature space to enrich its semantic representation
by multiplying it by the word similarity matrix. To
categorize a new question, we calculate the similarity between the new question vector
and each category vector. The similarity scores are sorted in descending order
and the top k categories are recommended to the user. Experimental results
show that the proposed method achieves good precision and outperforms traditional
categorization methods on the selected test questions.
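The categorization steps above can be sketched as follows. This is an illustrative sketch only, and several parts are assumptions: topic words are passed in by the caller (the semantic patterns that detect them are not reproduced), the boost coefficient of 2.0 is hypothetical, the toy `toy_sim` table stands in for a WordNet-based word similarity (NLTK's WordNet interface, for instance, offers `wup_similarity` between synsets, which would additionally need a word-to-synset step), the question and category vectors are compared with cosine similarity, and the feature space and category vectors are invented for the example.

```python
import math
from collections import Counter


def tf_vector(words, topic_words, boost=2.0):
    """Term-frequency vector; words flagged as topic words are multiplied by
    `boost` (> 1). Detecting topic words with semantic patterns is not shown."""
    return {w: c * (boost if w in topic_words else 1.0)
            for w, c in Counter(words).items()}


def map_to_feature_space(q_vec, features, word_sim):
    """Multiply the 1 x |Q| question vector by the |Q| x |F| word similarity
    matrix, giving a 1 x |F| vector in the feature space."""
    words = list(q_vec)
    sim = [[word_sim(w, f) for f in features] for w in words]
    return [sum(q_vec[w] * sim[i][j] for i, w in enumerate(words))
            for j in range(len(features))]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(words, topic_words, features, category_vectors, word_sim, k=2):
    """Score the question against every precomputed category vector and
    return the top-k category names in descending order of similarity."""
    q = map_to_feature_space(tf_vector(words, topic_words), features, word_sim)
    scored = sorted(((cosine(q, vec), cat) for cat, vec in category_vectors.items()),
                    key=lambda pair: pair[0], reverse=True)
    return [cat for _, cat in scored[:k]]


# Toy usage: hypothetical feature space, category vectors, and word similarity.
FEATURES = ["football", "goal", "recipe", "oven"]
CATEGORY_VECTORS = {
    "Sports": [1.0, 1.0, 0.0, 0.0],
    "Cooking": [0.0, 0.0, 1.0, 1.0],
}
RELATED = {frozenset({"soccer", "football"}): 0.9, frozenset({"bake", "oven"}): 0.8}


def toy_sim(a, b):
    # Stand-in for a WordNet-based word similarity measure.
    return 1.0 if a == b else RELATED.get(frozenset({a, b}), 0.0)


question = ["soccer", "goal", "bake"]
print(map_to_feature_space(tf_vector(question, {"soccer"}), FEATURES, toy_sim))
print(recommend(question, {"soccer"}, FEATURES, CATEGORY_VECTORS, toy_sim, k=2))
```

Without the mapping step, `soccer` and `bake` would not match any feature exactly, so the question would score only on `goal`; the similarity matrix lets related words contribute to nearby features, and the boosted topic word `soccer` pulls the question further toward the Sports category.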
Finally, answer clustering is a very effective way to organize answers in
user-interactive QA systems. In this dissertation, we propose an answer clustering method
in which answers are grouped into clusters according to their
content or meaning. After clustering, answers in the same cluster are semantically similar to each other, while answers in different clusters differ substantially in content or meaning.
Moreover, a representative answer is selected for each cluster, so users can quickly grasp the content
of each cluster by reading only its representative answer. Answer
clustering has two important components: answer similarity calculation and the clustering
algorithm. For answer similarity, we use a method that combines statistical
similarity and semantic similarity. For clustering, we design an
incremental algorithm to reduce computational complexity. Experimental results
show that answer clustering enables users to browse answers more conveniently in
real time. |
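The answer-clustering procedure summarized above can be sketched as follows. Every concrete choice here is an assumption, not something the abstract confirms: statistical similarity is taken to be cosine similarity over term frequencies; semantic similarity is a Jaccard overlap after normalizing words with a hypothetical `SYNONYMS` table (a stand-in for whatever lexical resource the dissertation uses); the two are mixed linearly with an assumed weight `alpha`; the incremental algorithm is a single-pass leader clustering with an assumed `threshold`; and the representative answer is the member with the highest total similarity to the rest of its cluster.

```python
import math
from collections import Counter

# Hypothetical stand-in for a lexical resource such as a thesaurus.
SYNONYMS = {"automobile": "car", "vehicle": "car"}


def statistical_sim(a, b):
    """Cosine similarity of raw term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_sim(a, b):
    """Jaccard overlap after normalizing words to a canonical synonym."""
    sa = {SYNONYMS.get(w, w) for w in a}
    sb = {SYNONYMS.get(w, w) for w in b}
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def combined_sim(a, b, alpha=0.5):
    """Linear combination of the two measures (alpha is an assumed parameter)."""
    return alpha * statistical_sim(a, b) + (1 - alpha) * semantic_sim(a, b)


def incremental_cluster(answers, threshold=0.4):
    """Single-pass (leader) clustering: each answer is compared only with the
    first member of each existing cluster and joins the most similar one if
    that similarity reaches `threshold`; otherwise it starts a new cluster.
    This needs O(n * K) comparisons for n answers and K clusters."""
    clusters, leaders = [], []
    for i, ans in enumerate(answers):
        best, best_sim = None, threshold
        for c, leader in enumerate(leaders):
            s = combined_sim(ans, answers[leader])
            if s >= best_sim:
                best, best_sim = c, s
        if best is None:
            clusters.append([i])
            leaders.append(i)
        else:
            clusters[best].append(i)
    return clusters


def representative(cluster, answers):
    """Pick the member with the highest total similarity to the others."""
    return max(cluster, key=lambda i: sum(
        combined_sim(answers[i], answers[j]) for j in cluster if j != i))


# Toy usage with tokenized, stopword-free answers (hypothetical data).
answers = [
    ["buy", "used", "car"],
    ["buy", "used", "car", "online"],
    ["take", "train", "instead"],
    ["buy", "cheap", "vehicle"],
]
clusters = incremental_cluster(answers)
reps = [representative(c, answers) for c in clusters]
print(clusters, reps)
```

The single-pass design compares each new answer with one leader per cluster rather than with every earlier answer, which is one way to keep clustering cheap enough for real-time browsing; selecting the representative afterwards is quadratic, but only within each cluster.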
| Online Catalog Link: | http://lib.cityu.edu.hk/record=b3947524 |
| Appears in Collections: | CS - Doctor of Philosophy |
Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.