Stylometric Query Processing using Deep Learning

Tungare, Ninad

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/62

Title:	Stylometric Query Processing using Deep Learning
Authors:	Tungare, Ninad
Department:	Department of Computer Science
Issue Date:	2017
Supervisor:	Dr. Nutanong, Sarana; First Reader: Dr. Wong, Ka Chun; Second Reader: Prof. Wang, Lusheng
Abstract:	Stylometry is defined as the statistical analysis of variations in literary style between different authors. The technique has been used for many years for authorship identification and authorship verification with high accuracy. It is believed that every person has a unique writing style that can be used to identify the true author of a work. However, these studies are limited in terms of text length and the size of the author set: they give better results with documents of similar length and a small set of authors. With books now available in electronic form and abundant computing power, a great deal of research is being done in the field of stylometry, and machine learning techniques are being applied to this problem. According to research by LeCun et al. [2], character-level convolutional networks (ConvNets) show promising results, but the major drawback of the method is that it requires a lot of data: character-level ConvNets start to outperform traditional methods only when the dataset reaches the scale of several million samples. In real situations, however, such large amounts of data are rarely available. In this project, we aim to handle large amounts of text of varying length written by a considerably large number of authors, while keeping the results accurate and the computational cost as low as possible. Authorship identification is to be performed using emerging big data and machine learning techniques: by detecting the style of an author and comparing it against previously written documents, the real author of a work can be identified. Stylometric Query Processing will support not only simple database interactions with the underlying stylometric data but also complex queries such as: given a set of authors, which of them could be the writer of a given text? We carried out an exploratory study to make the original model work and managed to improve its accuracy.
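The closed-set query described in the abstract ("given a set of authors, who could be the writer of the given text?") can be illustrated with a classic stylometric baseline rather than the deep-learning model the project itself uses. This is a minimal sketch, assuming each author's style is represented by the relative frequencies of character trigrams and that candidates are ranked by cosine similarity to the query text. The function names `char_ngram_profile`, `cosine` and `who_wrote`, the trigram size and the toy corpus are hypothetical illustrations, not part of the project.

```python
import math


def char_ngram_profile(text: str, n: int = 3) -> dict:
    """Relative frequencies of overlapping, lower-cased character n-grams."""
    text = text.lower()
    counts: dict = {}
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        counts[gram] = counts.get(gram, 0) + 1
    total = sum(counts.values())
    # An empty or too-short text yields an empty profile.
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine(p: dict, q: dict) -> float:
    """Cosine similarity between two sparse frequency profiles."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def who_wrote(query_text: str, corpus: dict, candidates: list, n: int = 3) -> list:
    """Rank only the given candidate authors by stylistic similarity.

    corpus maps author -> list of texts known to be by that author
    (hypothetical data layout). Returns (author, score) pairs, best first.
    """
    query = char_ngram_profile(query_text, n)
    scores = {}
    for author in candidates:
        known = "\n".join(corpus[author])
        scores[author] = cosine(query, char_ngram_profile(known, n))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy corpus for illustration only.
    corpus = {
        "A": ["the quick brown fox jumps over the lazy dog"],
        "B": ["zzzz yyyy zzzz yyyy"],
        "C": ["an author who is not among the candidates"],
    }
    print(who_wrote("the quick brown fox", corpus, ["A", "B"]))
```

Restricting the ranking to `candidates` is what turns a plain similarity search into the "given a set of authors" query; a character-level ConvNet as in [2] could replace the hand-built n-gram profile with learned features while leaving the shape of the query the same.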
Appears in Collections:	Computer Science - Undergraduate Final Year Projects

Files in This Item:

File	Description	Size	Format
fulltext.html		147 B	HTML
