Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/9528
Title: | Academic and formal writing style rewriter |
Authors: | Hsu, To-Liang |
Department: | Department of Computer Science |
Issue Date: | 2021 |
Course: | CS4514 Project |
Programme: | Bachelor of Science (Honours) in Computer Science |
Supervisor: | Dr. Song, Linqi |
Citation: | Hsu, T.-L. (2021). Academic and formal writing style rewriter (Outstanding Academic Papers by Students (OAPS), City University of Hong Kong). |
Abstract: | The two perspectives of this project, "Academic Writing Style" and "Formal Writing Style," have long been important in the field of applied English. The academic writing style taught in colleges and universities aims to help scholars and students communicate precisely. The formal writing style, on the other hand, has wider application in business and industry. Given that little research has been done on academic writing style rewriting, this project proposes a new task: rewriting informal sentences in a formal style with academic writing features, which is by nature a sequence-to-sequence task in the field of Natural Language Generation (NLG). No existing dataset is suitable for the proposed task. Therefore, Grammarly's Yahoo Answers Formality Corpus (GYAFC), a corpus widely used in text formality transfer, is rewritten programmatically and manually to incorporate academic writing features. With 100K sentences from the train/evaluation set and 2K sentences from the test set rewritten in the academic writing style, a new corpus, GYAFC-academic, is generated and used in the training process. The main approach proposed in this project is the state-of-the-art Transformer model, which outperforms Recurrent Neural Network models in many areas of Natural Language Processing. Another essential mechanism applied is "warm-starting," the process of loading pre-trained model checkpoints into encoder-decoder models. By using pre-trained models, the time and computation power needed for training are reduced. Three evaluation methods are proposed in this project to compare model performance with the benchmark models: two style classifiers (Academic Style Classifier, Formality Style Classifier) built as indicators of style transfer accuracy, and one Python grammar-checking package (Language Tool Python) for fluency assessment.
The results demonstrate that the models proposed in this project perform well in style transfer accuracy and outperform the benchmark models by a significant margin in terms of grammar accuracy. |
Appears in Collections: | OAPS - Dept. of Computer Science |
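
The evaluation scheme outlined in the abstract (classifier-based style transfer accuracy plus a grammar check for fluency) can be illustrated with a minimal sketch. This is not the project's implementation: the function names, the toy classifier, and the toy error counter below are all hypothetical stand-ins, and the exact metric definitions (fraction of outputs labelled with the target style, mean grammar issues per sentence) are assumptions about what such an evaluation commonly computes.

```python
from typing import Callable, Iterable


def style_transfer_accuracy(outputs: Iterable[str],
                            classify: Callable[[str], str],
                            target_label: str) -> float:
    """Fraction of outputs that the classifier assigns to the target style."""
    sentences = list(outputs)
    if not sentences:
        return 0.0
    hits = sum(1 for s in sentences if classify(s) == target_label)
    return hits / len(sentences)


def mean_grammar_errors(outputs: Iterable[str],
                        count_errors: Callable[[str], int]) -> float:
    """Average number of grammar issues per output sentence (lower is better)."""
    sentences = list(outputs)
    if not sentences:
        return 0.0
    return sum(count_errors(s) for s in sentences) / len(sentences)


# --- Hypothetical stand-ins, for illustration only ---------------------------

INFORMAL_TOKENS = {"gonna", "wanna", "u", "lol"}


def toy_formality_classifier(sentence: str) -> str:
    """Label a sentence 'informal' if it contains an obviously informal token."""
    tokens = {t.strip(".,!?").lower() for t in sentence.split()}
    return "informal" if tokens & INFORMAL_TOKENS else "formal"


def toy_error_counter(sentence: str) -> int:
    """Count two crude issues: double spaces and missing end punctuation."""
    errors = sentence.count("  ")
    if not sentence.rstrip().endswith((".", "?", "!")):
        errors += 1
    return errors


samples = [
    "I am going to submit the report.",
    "i'm gonna submit it lol",
]
accuracy = style_transfer_accuracy(samples, toy_formality_classifier, "formal")
avg_errors = mean_grammar_errors(samples, toy_error_counter)
```

In a realistic setup, `classify` would wrap a trained style classifier and `count_errors` would wrap a grammar checker; with the `language_tool_python` package, for instance, the number of matches returned by `LanguageTool("en-US").check(sentence)` could serve as the error count. Passing both as callables keeps the metric code independent of any particular model or checker.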
Files in This Item:
File | Size | Format | |
---|---|---|---|
fulltext.html | 153 B | HTML | View/Open |
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.