Please use this identifier to cite or link to this item:
http://hdl.handle.net/2031/6507
| Title: | A corpus-based study of adjectives in contemporary English |
| Other Titles: | Ji yu yu liao ku de dang dai Ying yu xing rong ci yan jiu 基於語料庫的當代英語形容詞研究 |
| Authors: | Cao, Jing (曹競) |
| Department: | Department of Chinese, Translation and Linguistics |
| Degree: | Doctor of Philosophy |
| Issue Date: | 2011 |
| Publisher: | City University of Hong Kong |
| Subjects: | English language -- Adjective. Corpora (Linguistics) |
| Notes: | CityU Call Number: PE1241 .C36 2011 xvi, 227 leaves : ill. 30 cm. Thesis (Ph.D.)--City University of Hong Kong, 2011. Includes bibliographical references (leaves 184-190) |
| Type: | thesis |
| Abstract: | This thesis describes a systematic study of adjectives in contemporary English. In
particular, the study employs large collections of naturally occurring texts to investigate
the use of adjectives against different text categories as embodied in the corpora. A
major objective is to chart the correlation between different types of texts and the
variational characteristics of adjectival constructions. The study draws on the most
recent development in corpus linguistics as well as the state of the art in artificial
intelligence in general and machine learning techniques in particular for the extraction
and analysis of differentiating adjectival features in relation to a taxonomy of text
categories. Three major research issues are identified, including 1) the correlation
between adjectives and text categories, 2) the correlation between adjectives and subject
domains, and 3) the possible application of adjectives in the field of automatic text
classification.
Two corpora are employed in the investigation. The British National Corpus (BNC) is
used for lexical investigation and the British component of the International Corpus of
English (ICE-GB) is used for the exploration of syntactic features. While the empirical
data obtained from the BNC confirm the findings of previous studies that adjectives
are more frequent in writing than in speech, and that adjectives tend to occur more often in
more formal texts than in less formal texts, an analysis of the textual dispersion of
adjectives across different text categories further suggests that adjectives are more
likely to be a stylistic factor than a content factor. Linear regression analyses also show
a strong positive correlation between the distributional probability of adjective use
and text formality. Cluster analyses reveal that, using the value of Chi by degrees of
freedom computed on the combination of chi-square and adjective frequency, text
categories that are stylistically related to each other can be meaningfully grouped
together. Machine learning techniques available in Weka are used to evaluate the
performance of adjectival probability in automatic text classification in terms of
precision, recall and F-measure. The results suggest that adjectives can be used as a
strong characteristic for automatic text classification.
The syntactic features of adjective phrases (AJPs) in the ICE-GB are examined in terms
of their internal structures and external, clausal functions. In addition to the survey of
distributions of syntactic features, multiple regression analyses suggest a significant
correlation between the distributional probability of syntactic properties and text
formality. Such a correlation is then tested by using machine learning techniques.
Results show that the AJP syntactic probability can contribute to automatic text
classification, and that the performance of AJP syntactic probability differs in different
classification tasks.
In addition, adjectives in terminology are investigated to provide additional information
about adjectives in specialized domains. A sub-corpus is constructed specially for the
study out of the ICE-GB, and is annotated at the terminological level. The distribution
of adjectives in term expressions, defined as term-ADJs, is examined across eight
subject domains. Linear regression analyses of the observed distributional features
suggest a strong positive correlation between the use of term-ADJs and text formality,
and results obtained from machine learning techniques show that term-ADJs in general
contribute more to text classification than nouns, verbs and adverbs.
To sum up, the research described in this thesis makes the following unique
contributions to the existing body of knowledge. First, it is the first large-scale corpus-based
empirical study of adjectives in present-day British English. Second, it is the
first corpus-based linguistic study that proposes the use of machine learning techniques
to help provide linguistic insight into language use. Significant results have been
achieved. Firstly, the investigations have answered some of the fundamental questions
concerning the distributional behavior of adjectives across a broad range of text
categories and subject domains. Secondly, the study has also offered some profound insight
into the relationship between the distributional probability of adjectives and text
formality, hence addressing at the same time the issue of applying adjectives to the
automatic measurement of text similarity for automatic text classification. Thus, this
research has made several contributions through 1) in-depth experiments on the
correlation between adjectives and formality degree of texts, 2) use of adjectives to
measure text similarity, 3) in-depth investigation of adjective use in terminology, and 4)
in-depth experiments on the use of adjectival characteristics in the field of automatic
text classification. The findings of this research will serve as the basis for future
work on building a regression model that expresses the relationship between adjectives
and text categories. |
| Online Catalog Link: | http://lib.cityu.edu.hk/record=b4086054 |
| Appears in Collections: | CTL - Doctor of Philosophy |
Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.