City University of Hong Kong

Title: A corpus-based study of adjectives in contemporary English
Other Titles: Ji yu yu liao ku de dang dai Ying yu xing rong ci yan jiu
Authors: Cao, Jing (曹競)
Department: Department of Chinese, Translation and Linguistics
Degree: Doctor of Philosophy
Issue Date: 2011
Publisher: City University of Hong Kong
Subjects: English language -- Adjective.
Corpora (Linguistics)
Notes: CityU Call Number: PE1241 .C36 2011
xvi, 227 leaves : ill. 30 cm.
Thesis (Ph.D.)--City University of Hong Kong, 2011.
Includes bibliographical references (leaves 184-190)
Type: thesis
Abstract: This thesis describes a systematic study of adjectives in contemporary English. In particular, the study employs large collections of naturally occurring texts to investigate the use of adjectives across the different text categories represented in the corpora. A major objective is to chart the correlation between different types of texts and the variational characteristics of adjectival constructions. The study draws on recent developments in corpus linguistics and on the state of the art in artificial intelligence, machine learning techniques in particular, to extract and analyse differentiating adjectival features in relation to a taxonomy of text categories. Three major research issues are identified: 1) the correlation between adjectives and text categories, 2) the correlation between adjectives and subject domains, and 3) the possible application of adjectives in the field of automatic text classification. Two corpora are employed in the investigation. The British National Corpus (BNC) is used for lexical investigation and the British component of the International Corpus of English (ICE-GB) is used for the exploration of syntactic features. The empirical data obtained from the BNC confirm the findings of previous studies that adjectives are used more in writing than in speech, and that they tend to occur more often in more formal texts than in less formal texts. An analysis of the textual dispersion of adjectives across different text categories further suggests that adjectives are more likely to be a stylistic factor than a content factor. Linear regression analyses also show a strong positive correlation between the distributional probability of adjective use and text formality.
Cluster analyses reveal that, using the value of Chi by degrees of freedom computed on the combination of chi-square and adjective frequency, text categories that are stylistically related to each other can be meaningfully grouped together. Machine learning techniques available in Weka are used to evaluate the performance of adjectival probability in automatic text classification in terms of precision, recall and F-measure. The results suggest that adjectives can serve as a strong feature for automatic text classification. The syntactic features of adjective phrases (AJPs) in the ICE-GB are examined in terms of their internal structures and their external, clausal functions. In addition to the survey of the distributions of syntactic features, multiple regression analyses suggest a significant correlation between the distributional probability of syntactic properties and text formality. This correlation is then tested using machine learning techniques. Results show that AJP syntactic probability can contribute to automatic text classification, and that its performance varies across classification tasks. In addition, adjectives in terminology are investigated to provide further information about adjectives in specialized domains. A sub-corpus is constructed from the ICE-GB specially for the study and annotated at the terminological level. The distribution of adjectives in term expressions, defined as term-ADJs, is examined across eight subject domains. Linear regression analyses of the observed distributional features suggest a strong positive correlation between the use of term-ADJs and text formality, and results obtained from machine learning techniques show that term-ADJs in general contribute more to text classification than nouns, verbs and adverbs. To sum up, the research described in this thesis makes the following unique contributions to the existing body of knowledge.
First, it is the first large-scale corpus-based empirical study of adjectives in present-day British English. Second, it is the first corpus-based linguistic study that proposes the use of machine learning techniques to help provide linguistic insight into language use. Significant results have been achieved. First, the investigations have answered some of the fundamental questions concerning the distributional behavior of adjectives across a broad range of text categories and subject domains. Second, the study has yielded considerable insight into the relationship between the distributional probability of adjectives and text formality, and in doing so has addressed the issue of applying adjectives to the automatic measurement of text similarity for automatic text classification. Thus, this research has made several contributions through 1) in-depth experiments on the correlation between adjectives and the degree of text formality, 2) the use of adjectives to measure text similarity, 3) an in-depth investigation of adjective use in terminology, and 4) in-depth experiments on the use of adjectival characteristics in the field of automatic text classification. The findings of this research will serve as the basis for future work on building a regression model that expresses the relationship between adjectives and text categories.
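The abstract reports classification performance in terms of precision, recall and F-measure. As a minimal sketch of how these three measures are conventionally computed for one class, the Python below uses a hypothetical helper `precision_recall_f1` and invented toy labels ("written", "spoken"); it is not taken from the thesis and does not reproduce the Weka setup or any of its results.

```python
def precision_recall_f1(gold, predicted, label):
    """Precision, recall and F-measure (F1) for one class label.

    `gold` and `predicted` are parallel lists of class labels.
    This helper is a hypothetical illustration, not the thesis's code.
    """
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == label and p == label)
    false_pos = sum(1 for g, p in pairs if g != label and p == label)
    false_neg = sum(1 for g, p in pairs if g == label and p != label)

    # Guard against division by zero when a class is never predicted or never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1


# Invented toy data: five texts with gold and predicted categories.
gold = ["written", "written", "spoken", "spoken", "written"]
predicted = ["written", "spoken", "spoken", "written", "written"]

p, r, f = precision_recall_f1(gold, predicted, "written")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Per-class scores of this kind are usually averaged over all text categories to give a single figure for a classifier.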
Appears in Collections: CTL - Doctor of Philosophy

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.