Machine Learning Application: Classification and Summarization of Legal Documents

Liunardo, Mikhael

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/9093

Full metadata record

DC Field	Value	Language
dc.contributor.author	Liunardo, Mikhael	en_US
dc.date.accessioned	2019-01-29T06:57:22Z
dc.date.accessioned	2019-02-12T06:54:10Z	-
dc.date.available	2019-01-29T06:57:22Z
dc.date.available	2019-02-12T06:54:10Z	-
dc.date.issued	2018	en_US
dc.identifier.other	2018cslm374	en_US
dc.identifier.uri	http://144.214.8.231/handle/2031/9093	-
dc.description.abstract	Text mining is a widely researched area of machine learning. With the development of models such as text categorization (Moulinier, 1996), the Naïve Bayes model (McCallum and Nigam, 1998) and word-to-vector models (Mikolov, Chen, Corrado and Dean, 2013), text mining has become a versatile solution to many real-world problems. The wide range of models available gives text mining broad scope for application. Numerous studies have addressed the processing of legal case documents. Each model incorporates different algorithms and approaches, which gives each its own particular strength. Although many models have been established, their acceptance in real-life situations remains relatively low (Remus and Levy, 2015). One of the reasons these models are rejected is the limited ability of machine learning to process high-context information. This research project aims to use text mining and machine learning to address that concern. Legal documents are often complicated and difficult for laypeople to understand (Howe and Wogalter, 1994). The project is intended to create a machine learning system that produces categorized and summarized information derived from the original legal documents. The simplified document produced by the model is designed to make the legal documents easier to understand. The research aims to build a predictive machine learning model that uses a series of algorithms to produce a comprehensive automatic summarization system. Blei, Ng and Jordan's (2003) Latent Dirichlet Allocation algorithm is used to identify the major topics of the legal documents.
The Word2vec technique (Mikolov et al., 2013) is then applied to convert sentences into vector matrices, generating a feature space in which the LexRank algorithm (Erkan and Radev, 2004) computes a connectivity matrix between sentences, based on the IDF-modified cosine formula, in order to summarize the corpus. The extracted information is consolidated into a single coherent document at the final stage.	en_US
dc.title	Machine Learning Application: Classification and Summarization of Legal Documents	en_US
dc.contributor.department	Department of Computer Science	en_US
dc.description.supervisor	Supervisor: Dr. Chun, Hon Wai Andy; First Reader: Dr. Hou, Junhui; Second Reader: Dr. Yu, Yuen Tak	en_US
Appears in Collections:	Computer Science - Undergraduate Final Year Projects
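
The abstract names LexRank with the IDF-modified cosine but gives no implementation details. The following is a minimal sketch of that summarization step as described by Erkan and Radev (2004), not the project's own code. It assumes plain token counts rather than the Word2vec vectors mentioned in the abstract, since the record does not say how the two were combined. The function names, the tokenizer, and the `threshold` and `damping` values are hypothetical choices made for illustration.

```python
import math
from collections import Counter

PUNCT = ".,;:()\"'"


def tokenize(sentence):
    # Hypothetical tokenizer: split on whitespace, strip punctuation, lowercase.
    return [w for w in (t.strip(PUNCT).lower() for t in sentence.split()) if w]


def idf_table(tokenized):
    # idf_w = log(N / n_w), where n_w is the number of sentences containing w.
    n = len(tokenized)
    df = Counter(w for sent in tokenized for w in set(sent))
    return {w: math.log(n / count) for w, count in df.items()}


def idf_modified_cosine(x, y, idf):
    # Cosine similarity of two sentences with term frequencies weighted by idf.
    tf_x, tf_y = Counter(x), Counter(y)
    num = sum(tf_x[w] * tf_y[w] * idf[w] ** 2 for w in tf_x.keys() & tf_y.keys())
    norm_x = math.sqrt(sum((tf * idf[w]) ** 2 for w, tf in tf_x.items()))
    norm_y = math.sqrt(sum((tf * idf[w]) ** 2 for w, tf in tf_y.items()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return num / (norm_x * norm_y)


def lexrank(sentences, threshold=0.1, damping=0.15, iterations=100):
    n = len(sentences)
    if n == 0:
        return []
    tokenized = [tokenize(s) for s in sentences]
    idf = idf_table(tokenized)
    # Connectivity matrix: 1 where similarity exceeds the threshold.
    adj = [
        [
            1.0 if idf_modified_cosine(tokenized[i], tokenized[j], idf) > threshold else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    for i in range(n):
        adj[i][i] = 1.0  # keep a self-link so every sentence has degree >= 1
    degree = [sum(row) for row in adj]
    # Power iteration: p(u) = d/N + (1 - d) * sum over neighbours v of p(v) / deg(v).
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            damping / n
            + (1 - damping) * sum(adj[v][u] / degree[v] * scores[v] for v in range(n))
            for u in range(n)
        ]
    return scores


def summarize(sentences, k):
    # Pick the k highest-scoring sentences and return them in document order.
    scores = lexrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k)` on a list of sentence strings returns the `k` most central sentences in their original order. In a pipeline like the one the abstract outlines, this step would presumably run on the sentences of each topic group after topic identification.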

Files in This Item:

File	Size	Format
fulltext.html	147 B	HTML
