|Title: ||Research of passage retrieval for question answering system|
|Other Titles: ||Mian xiang wen da xi tong de duan luo jian suo ji shu yan jiu|
|Authors: ||Li, Xin (黎新)|
|Department: ||Department of Computer Science|
|Degree: ||Doctor of Philosophy|
|Issue Date: ||2010|
|Publisher: ||City University of Hong Kong|
|Subjects: ||Question-answering systems.|
Text processing (Computer science)
|Notes: ||CityU Call Number: QA76.9.Q4 L5 2010|
viii, 86 leaves : ill. 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2010.
Includes bibliographical references (leaves 77-86)
|Abstract: ||The rapid development of the Web has made it a huge information source and an important platform on which people exchange and share knowledge. For example, users can easily acquire information from the Web with the help of search engines. However, the volume of information on the Web is so large that users find it difficult to identify and select what is valuable. Hence, how to accurately retrieve and extract the information users need has become an important research topic. Question Answering (QA) systems are an important research direction for next-generation search engines. A QA system has two distinguishing features: first, it allows users to submit a query as a natural language question instead of keywords; second, it responds with a concise and precise answer instead of a list of documents. Users can describe their information needs accurately, and the QA system can understand those needs and respond correctly.
The document retrieval module is an important component of a Web-based QA system. Usually, the retrieved documents undergo several computation-intensive procedures, including natural language processing, information extraction, and pattern matching, to determine the most likely answers. A QA system could be more efficient if it reduced the amount of text in each document that has to be processed. Passage retrieval, which aims to find the text excerpts that may contain the exact answer to a given question, has long been studied in information retrieval (IR) and has recently become an important component of QA systems. The passage retrieval module is usually added as an intermediate stage between the document retrieval module and the answer extraction module. It speeds up answer extraction and makes it easier for users to find answers.
First, this dissertation analyzes the methods used to evaluate document relevance. It finds that document relevance is mainly density-based lexical relevance. Hence, document retrieval methods cannot be applied directly to passage retrieval. We discuss the definition of passage retrieval for question answering and demonstrate how document retrieval and passage retrieval differ in terms of topic, length, and keywords. We then propose heuristics for designing passage retrieval formulas that are better suited to the requirements of QA systems.
Second, this dissertation proposes a Web-based passage retrieval method for question answering. It gives the definition of passage retrieval and introduces the basic workflow and the function of each component. It then proposes a heuristic query rewriting method in which keywords are not considered independently; instead, the constraint relations among them are used to perform keyword matching and calculate the relevance score.
Finally, this dissertation proposes a novel mixture relevance model based on multiple features. It explores the effectiveness of lexical similarity, topic similarity, and structure similarity for passage retrieval. A Web-based method of computing the similarity between words is proposed and used to calculate the lexical similarity between a question and a passage. The dissertation then proposes a probabilistic topic language model to calculate the topic similarity between a question and a passage. For structure similarity, two structures are mainly considered: "wh-movement" and "predicate-argument". We then integrate the three similarity metrics into a single weighted-average metric for evaluating the relevance between a passage and a question.
Key Words: Web, Question Answering, Passage Retrieval, Lexical Similarity, Topic Language Model, Structure Similarity, Relevance|
|Online Catalog Link: ||http://lib.cityu.edu.hk/record=b3947519|
|Appears in Collections:||CS - Doctor of Philosophy |