Run Run Shaw Library, City University of Hong Kong

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/9528
Full metadata record
DC Field | Value | Language
dc.contributor.author | Hsu, To-Liang | en_US
dc.date.accessioned | 2022-04-27T03:04:43Z | -
dc.date.available | 2022-04-27T03:04:43Z | -
dc.date.issued | 2021 | en_US
dc.identifier.citation | To-Liang, H. (2021). Academic and formal writing style rewriter (Outstanding Academic Papers by Students (OAPS), City University of Hong Kong). | en_US
dc.identifier.other | cs2021-4514-ht320 | en_US
dc.identifier.uri | http://dspace.cityu.edu.hk/handle/2031/9528 | -
dc.description.abstract | The two perspectives of this project, "Academic Writing Style" and "Formal Writing Style," have long been important in the field of applied English. The academic writing style taught in colleges and universities aims to help scholars and students communicate precisely. The formal writing style, on the other hand, has wider applications in business and industry. Given that little research has been done on academic-style rewriting, this project proposes a new task: rewriting informal sentences in a formal style with academic writing features, which is by nature a sequence-to-sequence task in Natural Language Generation (NLG). No existing dataset is suitable for the proposed task. Therefore, Grammarly's Yahoo Answers Formality Corpus (GYAFC), a corpus widely used in text formality transfer, is rewritten programmatically and manually to incorporate academic writing features. With 100K sentences from the train/evaluation set and 2K sentences from the test set rewritten in the academic writing style, a new corpus, GYAFC-academic, is generated and used for training. The main approach proposed in this project is the state-of-the-art transformer model, which outperforms Recurrent Neural Network models in many areas of Natural Language Processing. Another essential mechanism is "warm-starting," the process of initializing encoder-decoder models from pre-trained model checkpoints. Using pre-trained models reduces the time and computing power needed for training. Three evaluation methods are proposed to compare model performance with the benchmark models: two style classifiers (an Academic Style Classifier and a Formality Style Classifier) are built as indicators of style transfer accuracy, and a Python grammar-checking package (Language Tool Python) is used for fluency assessment.
The results demonstrate that the models proposed in this project perform well in style transfer accuracy and outperform the benchmark models by a significant margin in terms of grammar accuracy. | en_US
dc.rights | This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner. | en_US
dc.rights | Access is unrestricted. | en_US
dc.title | Academic and formal writing style rewriter | en_US
dc.contributor.department | Department of Computer Science | en_US
dc.description.course | CS4514 Project | en_US
dc.description.programme | Bachelor of Science (Honours) in Computer Science | en_US
dc.description.supervisor | Dr. Song, Linqi | en_US
Appears in Collections: OAPS - Dept. of Computer Science
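
The abstract in the record above names two kinds of evaluation measure: style transfer accuracy judged by a style classifier, and fluency judged by a grammar checker. The following is a minimal sketch of how such scores could be aggregated, not the project's actual code. The function names, the toy classifier, and the toy issue counter are all hypothetical stand-ins introduced for illustration; a real setup would pass in a trained classifier and a wrapper around a grammar-checking package instead.

```python
from typing import Callable, Sequence


def style_transfer_accuracy(
    outputs: Sequence[str],
    classify: Callable[[str], str],
    target_label: str,
) -> float:
    """Fraction of rewritten sentences the classifier labels as the target style."""
    if not outputs:
        return 0.0
    hits = sum(1 for sentence in outputs if classify(sentence) == target_label)
    return hits / len(outputs)


def mean_grammar_issues(
    outputs: Sequence[str],
    count_issues: Callable[[str], int],
) -> float:
    """Average number of grammar issues per rewritten sentence (lower is better)."""
    if not outputs:
        return 0.0
    return sum(count_issues(sentence) for sentence in outputs) / len(outputs)


# Toy stand-ins (assumptions), NOT the project's classifiers or grammar checker.
def toy_formality_classifier(sentence: str) -> str:
    informal_markers = ("gonna", "wanna")
    lowered = sentence.lower()
    return "informal" if any(m in lowered for m in informal_markers) else "formal"


def toy_issue_counter(sentence: str) -> int:
    # Counts only two trivial problems: no leading capital, no end punctuation.
    issues = 0
    if sentence and not sentence[0].isupper():
        issues += 1
    if not sentence.rstrip().endswith((".", "?", "!")):
        issues += 1
    return issues


rewrites = [
    "I would like to attend the meeting.",
    "i am gonna be late",
    "The results are presented in the table.",
]
accuracy = style_transfer_accuracy(rewrites, toy_formality_classifier, "formal")
fluency = mean_grammar_issues(rewrites, toy_issue_counter)
```

Passing the classifier and the issue counter as plain callables keeps the aggregation independent of any particular model or grammar-checking library, so the same two functions could score both a proposed model and a benchmark model on the same test sentences.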

Files in This Item:
File | Size | Format
fulltext.html | 153 B | HTML


Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.
