|Title: ||Information extraction from text for question answering systems|
|Other Titles: ||Wen da xi tong zhong de wen ben xin xi chou qu|
|Authors: ||Li, Huan (李欢)|
|Department: ||Department of Computer Science|
|Degree: ||Doctor of Philosophy|
|Issue Date: ||2010|
|Publisher: ||City University of Hong Kong|
|Subjects: ||Question-answering systems.|
Text processing (Computer science)
|Notes: ||CityU Call Number: QA76.9.Q4 L47 2010|
xiii, 119 leaves : ill. 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2010.
Includes bibliographical references (leaves 105-119)
|Abstract: ||With the rapid development of the Web, people can easily store data, exchange information, and share knowledge on this platform. However, the sheer volume of data on the Web makes it difficult for users to obtain the knowledge they need efficiently. Hence, web-based Information Retrieval and Information Extraction have become important research topics. As search engines become inadequate to meet people's growing needs, how to make appropriate use of these abundant resources and enable machines to understand the information they contain has become a popular research area in the Internet age. Question Answering, which builds on Information Retrieval and Natural Language Understanding, has flourished under these circumstances. A Question Answering system takes questions expressed in natural language, rather than keywords, as input, so that users can state their requirements conveniently and clearly. It returns short answers instead of relevant documents, which facilitates information acquisition.
Question Answering (QA) systems can be categorized into automatic QA systems and user-interactive QA systems, according to whether the answers are provided by machines or by human users. They can also be categorized into open domain QA systems and specific domain QA systems, according to the domain(s) of questions they can handle. The former impose no restrictions on the scope of questions and attempt to find answers to questions on any topic. The latter accept only questions from a certain domain, and domain knowledge guides the QA process. In this dissertation, we focus on applying Information Extraction to QA and conduct research on both open domain and specific domain QA systems. In open domain QA, we investigate how to improve the semantic analysis of questions and how to make use of historical databases, in order to enhance machine intelligence. In specific domain QA, we study how to use past experience to solve new problems, in order to increase the precision of the returned answers. The main research work and contributions are as follows:
First, correct semantic analysis of a question is the key to capturing the user's requirement. In this dissertation, we study semantic constraint detection in text, aiming to correctly detect the semantic constraint parts that are denoted by signal words and to correctly disambiguate among multiple semantic constraints when they are denoted by the same signal word. We propose a method for multiple constraint relation detection based on dependency tree matching. For every kind of semantic constraint, we collect signal words and relevant example sentences to build our case base. We define a partial dependency tree (PDT) kernel to compute the similarity between two objects. The Apriori algorithm is applied to reduce the complexity of computing this kernel function.
Second, a large amount of historical data accumulates in both open domain and specific domain QA systems. To reuse these data efficiently, we study knowledge extraction solely from historical databases. We aim to translate short-answer QA pairs into structured expressions and to return answers automatically by retrieval from the knowledge base. This has two benefits. First, it avoids the time-consuming manual work of building a knowledge base. Second, our user-interactive system gives the user a reference answer automatically, which is convenient for users. The workflow for transforming question-answer pairs into a knowledge base is described. We combine semantic pattern matching with the multiple constraint relation detection described above to extract information from the question sentences. A semantic-network-based structure is used to express the interconnected pieces of knowledge. A user-interactive prototype is implemented to demonstrate the whole process of knowledge base construction, management, and utilization.
Finally, domain knowledge plays an important role in specific domain QA. In some domains, past experience is the best basis for solving new problems. In this dissertation, we take the task of designing growth environments for garden flowers as a background and study the reuse of experience in a specific domain using the Case-Based Reasoning method. A method of learning adaptation rules for case-based reasoning (CBR) is proposed. The Resource Space Model and the Semantic Link Network are applied in case base construction for efficient resource management and reuse. Adaptation rules are generated solely from the case base, through case comparison. Relations between cases and general domain knowledge provide guidance during similarity computation. The adaptation rules are refined before they are applied in the revision process. Distance measurement and confidence values are used to improve the accuracy of the adaptation rules. After each new problem is solved, the adaptation rule set is updated by an evolution module in the retention process.|
|Online Catalog Link: ||http://lib.cityu.edu.hk/record=b3947517|
|Appears in Collections:||CS - Doctor of Philosophy |
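Illustrative note on the abstract: the idea of generating adaptation rules from the case base by comparing cases can be pictured with a small sketch. The code below is not taken from the thesis. It assumes a simple case-difference heuristic (pairs of cases that differ in exactly one feature yield a rule mapping that feature change to the observed change in the solution), and the flower-growing attributes, the watering values, and the names `diff`, `learn_rules`, and `solve` are all hypothetical. The confidence value here is just the share of observations agreeing with the chosen adjustment, which may differ from the measure used in the dissertation.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy case base: problem features -> solution (waterings per week).
CASES = [
    ({"light": "full_sun", "soil": "sandy"}, 4),
    ({"light": "shade", "soil": "sandy"}, 2),
    ({"light": "full_sun", "soil": "clay"}, 3),
]


def diff(a, b):
    """Feature changes needed to turn problem a into problem b."""
    return frozenset((k, a[k], b[k]) for k in a if a[k] != b[k])


def learn_rules(cases):
    """Compare every ordered pair of cases that differ in one feature and
    record the solution change observed for that feature change."""
    votes = {}
    for (p1, s1), (p2, s2) in permutations(cases, 2):
        change = diff(p1, p2)
        if len(change) == 1:
            votes.setdefault(change, Counter())[s2 - s1] += 1
    rules = {}
    for change, counter in votes.items():
        delta, count = counter.most_common(1)[0]
        # Confidence: fraction of observations that agree with this delta.
        rules[change] = (delta, count / sum(counter.values()))
    return rules


def solve(query, cases, rules):
    """Retrieve the case with the fewest differing features, then revise
    its solution with every learned rule that matches a difference."""
    base_problem, base_solution = min(
        cases, key=lambda case: len(diff(case[0], query))
    )
    solution = base_solution
    for change in diff(base_problem, query):
        rule = rules.get(frozenset([change]))
        if rule is not None:
            solution += rule[0]
    return solution


rules = learn_rules(CASES)
answer = solve({"light": "shade", "soil": "clay"}, CASES, rules)
```

In this toy setting the query (shade, clay) is not in the case base; the nearest stored case (shade, sandy) is retrieved and the sandy-to-clay rule lowers its solution by one watering. A fuller system along the lines of the abstract would also refine the rules, weight them by distance and confidence, and update the rule set after each solved problem.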
Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.